<img alt="" src="https://secure.agile365enterprise.com/790157.png" style="display:none;">
28526

Organizations across the globe rely on analytics products that can sift through complex datasets to derive actionable insights, make informed decisions, optimize operations, and stay ahead in competitive markets. Robust data architecture is at the heart of these solutions: it provides a solid foundation that ensures data is not only accessible and manageable but also transformed into a strategic asset for driving innovation and competitive advantage.

Client

Our client is a renowned global management consulting firm known for its data-driven, research-oriented approach and teamwork. The firm works across different industries, offering services such as strategy development, organizational design, operational improvement, digital transformation, and risk management.

Our client leverages its in-house data product to support its B2B and B2C clients. With interactive, customizable dashboards, the product lets users visualize complex datasets. Its market intelligence capabilities keep users informed about industry trends, competitor activities, and market dynamics, improving marketing effectiveness and enhancing customer experiences. The product also enables strategic decision making for optimizing pricing, promotions, and sales in both B2B and B2C markets.

Highlights

The key solution highlights include:

  • Adopted Databricks as the core platform for big data needs
  • Implemented a standard development framework that enabled feature reuse
  • Enabled near real-time support for selected pipelines

These initiatives aim to streamline data processing and ensure a unified view of the data product across all microservices and clients.

Challenges

Our client faced these challenges: 

  • The discrete data lake team struggled to keep a consistent development pace, resulting in delays and inconsistencies in feature releases and updates.
  • The process of setting up new clients to use the product was time-consuming, involving extensive configuration and customization, which hindered swift adoption and deployment.
  • Due to variations in client requirements and evolving product features, the team had to frequently reconcile and synchronize the codebase across different client instances, resulting in complexity and potential errors.
  • The use of disparate data management tools and platforms across clients led to fragmentation and inefficiencies in maintaining the data lake infrastructure, complicating data integration and management processes.
  • For their ETL pipelines, processing large data batches took about 10-12 hours, leading to delays.

Requirements 

To address their existing challenges, our client wanted to revamp the product's data architecture. This involved creating a unified and performance-oriented data pipeline to ensure consistency across various client engagements and deployment setups. The key requirements were to:

  • Centralize the data pipeline codebase across different clients to streamline development and maintenance processes
  • Migrate the storage platform from multiple technologies (such as Exasol and Snowflake) to a single, efficient Delta Lake solution, enhancing data management and integration capabilities
  • Speed up the onboarding of new clients

The Solution

To enhance the efficiency and functionality of the data architecture, a comprehensive solution was implemented with several key components:

  • Standardization of Data Hub Development: We established a uniform approach to developing data hubs across all microservices and products within the organization.
  • Databricks Adoption: Migrated to Databricks as the main platform for handling all big data requirements. Its powerful data processing capabilities became central to the client's data strategy, enabling more efficient data handling.
  • Standard Development Framework: Implemented a standard development framework to allow for the reuse of features across the pipeline.
  • Deployment Process Standardization: We standardized the deployment process across all clients, making it easier for them to access and benefit from new features without complex integration efforts.
  • Hub-and-Spoke Data Exchange Model: Set up a hub-and-spoke model to efficiently exchange large data volumes between microservices, with the data lake acting as the central hub (illustrated in the first sketch after this list).
  • Reduction of Processing Time: By optimizing multiple data pipelines, we dramatically reduced processing times, with some pipelines now completing in under 10 minutes, a significant improvement over the previous 8-10 hours.
  • Near Real-Time Data Support: For selected pipelines, we introduced near real-time processing capabilities using Delta Live Tables (DLT) pipelines where feasible (see the second sketch after this list).
  • Unified Data Product View: A single, consolidated view of the data product was provided, ensuring consistency across microservices and for all clients.
  • Direct Data Access via SQL Warehouse: Leveraging SQL Warehouse technology, some microservices can now directly access output data from the data lake, bypassing the need for complex export processes.
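
As a rough illustration of the hub-and-spoke exchange described above, the first sketch below shows how a producer microservice might publish its output to a shared Delta table in the lake, and how a consumer microservice might read it back directly rather than relying on point-to-point exports. The table names and the grouping column are hypothetical, and the snippet assumes PySpark running on Databricks.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Producer microservice job: append its output to a shared Delta table in the
# central data lake (the hub). The source and target table names are hypothetical.
enriched_orders = spark.read.table("staging.orders_raw")
(enriched_orders
    .write
    .format("delta")
    .mode("append")
    .saveAsTable("hub.sales.orders_enriched"))

# Consumer microservice job: read the same Delta table straight from the hub
# instead of waiting for a file export from the producer.
orders = spark.read.table("hub.sales.orders_enriched")
orders.groupBy("region").count().show()
```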
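
For the near real-time pipelines, a minimal Delta Live Tables sketch in Python might look like the following. The landing path, table names, and validation rule are assumptions for illustration only, not the client's actual pipeline definitions.

```python
import dlt
from pyspark.sql.functions import col

# `spark` is provided by the Delta Live Tables runtime.

# Continuously ingest new files from a (hypothetical) landing zone using Auto Loader.
@dlt.table(comment="Raw events streamed from the landing zone")
def raw_events():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/events/")
    )

# Incrementally validate the stream so downstream microservices see fresh data
# within minutes rather than after a long nightly batch.
@dlt.table(comment="Validated events ready for downstream consumers")
def cleaned_events():
    return (
        dlt.read_stream("raw_events")
        .where(col("event_id").isNotNull())
    )
```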

Tech Stack

The technology stack used for the project was:

  • Databricks
  • Delta Lake
  • Microsoft Azure
  • Python
  • Event Streaming

Benefits

Here are the business advantages:

  • Streamlined Processes: Standardization and adoption of efficient frameworks and Databricks streamlined development and maintenance processes.
  • Efficient Data Processing: Achieved a significant reduction in ETL data processing times, with some tasks now taking under 10 minutes instead of the previous 8-10 hours.
  • Reduced Onboarding Time: Significantly decreased the time required to onboard new clients, enhancing client satisfaction and service efficiency.
  • Unified Data Product View: Established a consistent and unified view of data across the organization, improving data usability and access.
  • Real-Time ETL Support: Set the foundation for achieving near real-time support for ETL processes, promising even quicker data processing and analytics.
