A common catchphrase in the IT community is “Garbage in, Garbage out.” Most current data architectures are best characterized as collections of disparate datasets that do not work together effectively. So when you combine data from siloed datasets, you typically get invalid results! Collectively, these siloed datasets contain conflicting master data representations that cannot be reconciled well enough to join them reliably. When you use multiple disparate datasets as data sources, you get the ‘Garbage In’ effect.
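A minimal sketch can make the ‘Garbage In’ effect concrete. The two silos below are hypothetical: one keys customers by an internal code, the other by a legacy numeric ID, so a naive join finds nothing and fails silently.

```python
# Two hypothetical siloed datasets describing the SAME customers,
# but with conflicting master data: different keys, different name spellings.
crm_silo = {
    "CUST-001": {"name": "Acme Corp", "region": "EMEA"},
    "CUST-002": {"name": "Globex", "region": "APAC"},
}
billing_silo = {
    "10001": {"name": "ACME CORPORATION", "balance": 1200.0},
    "10002": {"name": "Globex Inc.", "balance": 450.0},
}

# A naive join on the record keys matches nothing at all: the result
# is empty, and nothing downstream flags that anything went wrong.
joined = {
    key: {**crm_silo[key], **billing_silo[key]}
    for key in crm_silo.keys() & billing_silo.keys()
}
print(joined)  # -> {} : zero rows, silently
```

The failure mode matters as much as the failure: an empty (or partially wrong) join propagates downstream looking like a valid result.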
You should not trust any data administration method applied downstream of these disparate source datasets; by then it is too late to correct the ‘Garbage In’ problem. Popular downstream methods include ETL/ELT, Master Data Management, Data Mesh, and Semantic Knowledge Graphs. Have you ever wondered why your data quality is so poor and your analytical results are not trusted? ‘Garbage In’ is often the answer: there is no way to audit, validate, or reconcile the resulting dataset against its disparate sources.
Our independent research confirms that a dataset’s contextual characteristics play a crucial role in determining its interoperability with other datasets. We define a dataset’s data context as the shell of master metadata and data content that encapsulates a properly formed dataset. Siloed datasets have disparate data contexts because data integrity is not enforced across silos. In contrast, Universally Interoperable Datasets share a common data context, with data integrity enforced across all of them. When we implement data integrity across datasets, we resolve the discrepancies in their data contexts.
Fortunately, siloed datasets can be individually enriched to become universally interoperable simply by adding Data Compatibility Standards. These standards are added without altering the original dataset’s metadata or data content, providing a universal data context that encapsulates the dataset. Beyond enabling Universal Dataset Interoperability, this universal data context enriches each dataset, making it ready for analytics and AI.
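One way to picture this enrichment is a wrapper layer around each silo. The sketch below is illustrative only: the `enrich` function and the canonical-key mapping are assumptions standing in for actual Data Compatibility Standards, but they show the key property that the original records are left untouched while a shared context makes them joinable.

```python
# Hypothetical siloed datasets, unchanged from their sources.
crm_silo = {"CUST-001": {"name": "Acme Corp", "region": "EMEA"}}
billing_silo = {"10001": {"name": "ACME CORPORATION", "balance": 1200.0}}

def enrich(dataset, local_to_master):
    """Wrap a dataset in a shared data context without altering it:
    each original record is kept verbatim, keyed by a canonical master key."""
    return {
        local_to_master[key]: {"source_key": key, "record": record}
        for key, record in dataset.items()
    }

# The canonical-key registry is the (assumed) universal data context.
crm = enrich(crm_silo, {"CUST-001": "M-1"})
billing = enrich(billing_silo, {"10001": "M-1"})

# The same join that failed across raw silos now reconciles cleanly.
joined = {
    key: (crm[key]["record"], billing[key]["record"])
    for key in crm.keys() & billing.keys()
}
print(sorted(joined))  # -> ['M-1']
```

Because the original key is retained as `source_key`, every joined row can be traced back to its source silo, which is what makes downstream auditing and reconciliation possible.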
To eliminate your ‘Garbage In’ problems, the data context of each source dataset must be enriched to ensure universal interoperability. These Universally Interoperable Datasets are so compatible that together they form a Modular Data Fabric, in which each dataset is a modular, plug-and-play component. Within your fabric, all of your data content is readily accessible for end-to-end auditing, justification, and problem resolution. These essential data governance functions are absent when datasets are siloed. With data compatibility, you finally have the solid data foundation required to support information building, analytics, and modern technologies such as AI/ML and Knowledge Graphs.