Unlocking Data Unity: Strategies to Tackle the Disconnect Dilemma
Gregor Sieber
April 16, 2024
·
9 min reading time


Straddling the line between the ambitious visions we see in the media and the reality of delivering actionable data insights can feel like navigating a maze. EBCONT is here to tackle some of the hurdles you might face in today's IT world. In this series, we're diving deep into the trenches, shedding light on the real-world challenges and presenting practical solutions. Whether it's taming unruly data sets or optimizing your IT infrastructure, EBCONT has a roadmap to guide you.

Disconnected data and common data models

Every organization faces challenges that could be addressed by harnessing the full potential of the data within its grasp. In most organizations, there are sources of data that would create actionable insight if they were joined together in a common analytic context. Frank Blau, Data Architect and IT Consultant at EBCONT, points out that this situation is not unusual, but that it should be addressed by an effective data architecture strategy.

In today's data-driven world, organizations are inundated with large amounts of data flowing in from multiple sources. This creates a set of distinct, commonly faced problems:

  • Data is stored in different systems (often for good reasons) that do not communicate with each other and cannot be used to uncover the insights that their alignment would reveal.
  • Data arrives at such high velocity that it cannot be consumed effectively by business stakeholders.
  • Data arrives in different formats and structures, so more than one approach is needed to understand and consume it.

This is where modern data architecture steps in, offering a framework to bring together disparate data into a cohesive structure that can fuel analytics, business intelligence, and the power of (Gen)AI.

Data is not always in the most convenient location to deliver the insight it may reveal. This may be due to regulatory constraints or even sound design principles. Addressing the need for holistic insight requires understanding the fundamental requirements driving those location choices and how best to enable the data to contribute to the desired analytic outcomes. Modern data architecture supports several approaches to dealing with these concerns:

Data Federation: the creation of virtual views across multiple sources of data to offer the user a unified perspective.

Data Modeling using Modelstorming: Rather than building monolithic waterfall models, applying Agile techniques to data modeling, focused on process outcomes, produces more adaptable and functional data models. Modelstorming is a collaborative process in which cross-functional teams brainstorm and iterate on data models and analytical approaches. By bringing together diverse perspectives and expertise, organizations can uncover novel insights and drive innovation.

ETL and ELT processes: Both are processes for extracting, transforming and loading data from one system (typically transactional) for use in another system (typically analytic). The difference between them is where and when the transformation occurs: in ETL it happens before the data is loaded into the target system, in ELT it happens afterwards.

Data Hubs: A data hub is a centralized repository designed to facilitate the storage, management, integration, and distribution of data from various sources. It serves as a central point in an organization's data architecture and allows for significant improvements over a data federation approach. 

Driving Innovation: Iterative Modelstorming

While delivery of a system-level data architecture model is still a core activity in the overall data management process, it has limitations that are exposed as teams begin to work with it in agile processes. When one team works holistically to realize a product (the data model) across the entire architecture while developers work on discrete stories, features and sprints, there is a mismatch in goals and activities that is neither effective nor productive. To accommodate this (necessary) evolution of the modeling process, we have embraced the iterative techniques of agile development to better align with these principles. Iterative processing involves a cyclical approach to data analysis, where insights inform subsequent iterations of data modeling and analysis. This enables organizations to adapt to changing business requirements, refine their analytical models, and uncover deeper insights over time.

The process that delivers this goal is called modelstorming. It is a collaborative effort between data architects, developers and, importantly, the users of the data architecture. It is a workshop-based approach that strives to deliver a shared understanding between all relevant parties. These participants should be able to deliver the following inputs:

  • Participate in discussions about business processes.
  • Identify key elements of processes to be recorded and measured.
  • Describe data related to the processes being discussed.
  • Collaborate on a consensus for the goals and outcomes of the project.

While there should still be well-considered system architecture and data management artifacts, the deliverables for the data modeling team will also include more discrete data models, sometimes just at the level of a single inter-application process, such as an API interface or message queue. These component artifacts should also be consistent in form and content with the broader data modeling deliverables. Examples of iterative modeling outputs include the following (a small sketch follows the list):

  • Business Event Analysis and Modeling (BEAM) diagrams that uncover and document discrete business processes and events.
  • Hierarchy charts that describe relationships between dimensional attributes.
  • An event matrix that describes the relationships between events and dimensions.
  • Data mart schemas that describe business-facing perspectives for consumption.
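
To make the last two artifacts more concrete, here is a minimal sketch in Python of how an event matrix and a simple star-schema data mart might be captured during a workshop. The event names, dimensions and measures are invented for illustration, not outputs of a real engagement.

```python
# Hypothetical event matrix: which dimensions describe which business events.
event_matrix = {
    "order_placed":     ["customer", "product", "date", "channel"],
    "order_shipped":    ["customer", "product", "date", "warehouse"],
    "payment_received": ["customer", "date", "payment_method"],
}

# A simple star-schema data mart derived from one of the events
# (grain, dimensions and measures are purely illustrative).
data_mart = {
    "fact_orders": {
        "grain": "one row per order line",
        "dimensions": ["dim_customer", "dim_product", "dim_date", "dim_channel"],
        "measures": ["quantity", "net_amount"],
    },
    "dim_customer": ["customer_id", "name", "segment", "region"],
    "dim_product":  ["product_id", "name", "category"],
    "dim_date":     ["date_id", "day", "month", "year"],
    "dim_channel":  ["channel_id", "name"],
}

# In a workshop, a structure like this can be walked through with business users:
for event, dims in event_matrix.items():
    print(f"{event}: {', '.join(dims)}")
```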


From Silos to Synthesis: Conquering Data Fragmentation with the Data Hub Pattern

Fragmented and Siloed Data
One of the primary hurdles organizations face is the existence of data that may provide more insight when joined with other data in the broader landscape. Departments often store data in isolated systems, leading to fragmented information and inhibiting cross-functional insights. Silos can hinder collaboration, innovation, and ultimately the ability to derive meaningful value from data. For example, a web platform may contain the analytics used to derive customer intent, while an ERP system contains your customers' transactional activity. Without synthesizing these disparate data sources, it is difficult to understand the quantitative relationship between intent and purchasing. In another example, there may be incoming data related to patient status that is generated by machines and sensors. Naturally this data is used operationally for the monitoring of patients, but if it is brought into a data architecture where it can be analyzed across a broader population (with appropriate regulatory compliance), there may be opportunities to uncover treatment modalities that offer better outcomes for more patients.

Crafting the Data Lifecycle: Modeling, Extracting, and Transforming
At the heart of modern data architecture lies data modeling. This involves designing a blueprint that defines how data is structured, stored, and accessed within the organization. A well-designed data model provides a common language for stakeholders and ensures consistency and accuracy in data interpretation. ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) processes play a crucial role in data integration. These fundamental practices involve extracting data from various sources, transforming it into a consistent format, and loading it into a centralized repository or data lake. By automating these workflows, organizations can streamline data ingestion and preparation, enabling faster insights.
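
As a rough illustration of the difference, the sketch below contrasts the two patterns in Python using pandas and an in-memory SQLite database as a stand-in for the analytic target; the data, table names and transformation are hypothetical placeholders, not part of any specific EBCONT delivery.

```python
# Minimal ETL vs. ELT sketch (data, table names and the cast are placeholders).
import sqlite3
import pandas as pd

source = pd.DataFrame(
    {"customer_id": [1, 2], "amount": ["10.5", "7.25"]}  # raw, untyped extract
)
target = sqlite3.connect(":memory:")  # stand-in for the analytic system

# ETL: transform first, then load the already-conformed data into the target.
etl_df = source.assign(amount=source["amount"].astype(float))
etl_df.to_sql("sales_conformed_etl", target, index=False, if_exists="replace")

# ELT: load the raw extract as-is, then transform inside the target system.
source.to_sql("sales_raw", target, index=False, if_exists="replace")
target.execute(
    """
    CREATE TABLE sales_conformed_elt AS
    SELECT customer_id, CAST(amount AS REAL) AS amount
    FROM sales_raw
    """
)
```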

Data federation has emerged as one solution to break down these silos. By creating virtual views of data across disparate sources, organizations can access and analyze information seamlessly. This approach fosters agility and flexibility, allowing for real-time insights without the need to physically consolidate data into a single repository, which can yield considerable time and cost savings. It should be noted that many popular tools, such as Power BI and Tableau, allow you to materialize this approach directly within the visualization tool, including transformations, aggregations and joins across multiple sources simultaneously. While this virtual approach can work well when your data is relatively co-located, the weakest link in your data chain will ultimately determine the quality and performance of the system, often making it unsuitable for tasks such as building new applications on top of such harmonized data.
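
To illustrate the federated idea, the sketch below uses DuckDB from Python to define a virtual view that joins a Parquet export from a web-analytics platform with a CSV extract from an ERP system, without physically consolidating either source; the file names and columns are assumptions made for the example.

```python
# Minimal federated-view sketch (file names and columns are hypothetical;
# any engine that can query external sources could play the same role).
import duckdb

con = duckdb.connect()

# A virtual view across two heterogeneous sources; nothing is copied into a warehouse.
con.execute("""
    CREATE VIEW intent_vs_purchases AS
    SELECT e.customer_id,
           count(*)           AS tracked_events,
           sum(o.order_value) AS revenue
    FROM read_parquet('web_events.parquet') AS e
    JOIN read_csv_auto('erp_orders.csv')    AS o
      ON e.customer_id = o.customer_id
    GROUP BY e.customer_id
""")

print(con.execute("SELECT * FROM intent_vs_purchases LIMIT 5").fetchdf())
```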

Data Hub: The Centralized Nexus for Actionable Insights
One emergent approach is the development of a data hub, where multiple services and repositories are combined into a single point of interaction. This data hub serves as a centralized nexus for storing, managing, and accessing integrated data. It acts as a single source of actionable truth, providing a unified view of the organization's data assets, regardless of format or schema. While the data hub is not a replacement for a data warehouse, it can be delivered as a means to collect and conform complex source data arriving in different formats, granularities and velocities. By leveraging a data hub's ability to deliver harmonized data at scale, organizations can simplify data management, improve data quality, and empower users with smart data mastering and self-service analytics capabilities. High-performance data hubs can also be deployed as a component in a hybrid architecture, offering a best-of-breed solution between federated data, data mesh and data warehouse platforms.
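
As a simplified sketch of the "collect and conform" role a hub can play, the Python example below maps customer records arriving in two different source shapes onto one canonical schema; the field names and source systems are invented for illustration and do not reflect any particular hub product.

```python
# Hypothetical harmonization step in a data hub: conform differently shaped
# source records onto one canonical schema.
from dataclasses import dataclass

@dataclass
class CanonicalCustomer:
    customer_id: str
    full_name: str
    country: str
    source_system: str

def from_crm(record: dict) -> CanonicalCustomer:
    # The CRM delivers nested records (illustrative shape).
    return CanonicalCustomer(
        customer_id=str(record["id"]),
        full_name=f'{record["first_name"]} {record["last_name"]}',
        country=record["address"]["country"],
        source_system="crm",
    )

def from_erp(record: dict) -> CanonicalCustomer:
    # The ERP delivers flat, differently named fields (illustrative shape).
    return CanonicalCustomer(
        customer_id=record["CUST_NO"],
        full_name=record["NAME"],
        country=record["COUNTRY_CODE"],
        source_system="erp",
    )

hub = [
    from_crm({"id": 42, "first_name": "Ada", "last_name": "Muster",
              "address": {"country": "AT"}}),
    from_erp({"CUST_NO": "43", "NAME": "Max Beispiel", "COUNTRY_CODE": "DE"}),
]
print(hub)
```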

EBCONT has demonstrated considerable success delivering data hub solutions across a broad range of markets, platforms and architectures. We offer to collaborate on a deep-dive analysis of how your business requirements can inform an intelligent decision about where this approach fits into your data management strategy.

Summary

While there are often good reasons for an organization to separate data into different tiers of storage and access, these tiers and their artifacts frequently become an impediment to a holistic analytic perspective. Understanding these processes and roles, and creating a data management strategy to mitigate this, is at the heart of an effective data architecture. Understanding the ways data can become disconnected, and how to bring it back into a performant, conformed perspective, is a key deliverable of your architecture. Collaborative data modeling (modelstorming), together with techniques such as iterative processing, ETL/ELT and data hubs, provides the fundamental building blocks for addressing these challenges.

Frank Blau explains: “Bringing together a customer's disconnected data into a common data model and data access layer is crucial for analytics, business intelligence, and for harnessing the power of (Gen)AI. Data architecture is the practice of deciding how to build this data model, and with what tools. And that is where we at EBCONT can help.”

And Gregor Sieber (Executive Vice President / Head of Consulting and Innovation at EBCONT) delves deeper into best practice: “One of our customers in the finance and regulatory space faced the challenge of needing to access data stored in a large number of IT systems whenever they wanted to roll out a new web-based customer product, which created a lot of overhead and duplicated dependencies. As parts of the data were inconsistent, there was also a danger that the validation logic for this data would become inconsistent between the different customer products. By moving to a data hub approach, we were able to centralize the data mastering approach and provide a single, consistent, and scalable API for all new customer products. This has greatly accelerated the creation and roll-out of new products and improved governance on the data and processing level, which is an essential advantage for organizations in the finance space.”

For more information please contact gregor.sieber@ebcont.com.