A common catchphrase in the IT community is “Garbage in, Garbage out.” Most current data architectures are best characterized as collections of disparate datasets that do not work together effectively. So when you combine data from siloed datasets, you typically get invalid results! Collectively, these siloed datasets contain conflicting master data representations that cannot be reconciled well enough to join them reliably. When you use multiple disparate datasets as data sources, you get the ‘Garbage In’ effect.
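A minimal sketch can make the ‘Garbage In’ effect concrete. The two silos below are hypothetical: one keys customers by an internal code, the other by a legacy numeric ID, so a naive join finds nothing and fails silently.

```python
# Two hypothetical siloed datasets describing the SAME customers,
# but with conflicting master data: different keys, different name spellings.
crm_silo = {
    "CUST-001": {"name": "Acme Corp", "region": "EMEA"},
    "CUST-002": {"name": "Globex", "region": "APAC"},
}
billing_silo = {
    "10001": {"name": "ACME CORPORATION", "balance": 1200.0},
    "10002": {"name": "Globex Inc.", "balance": 450.0},
}

# A naive join on the record keys matches nothing at all: the result
# is empty, and nothing downstream flags that anything went wrong.
joined = {
    key: {**crm_silo[key], **billing_silo[key]}
    for key in crm_silo.keys() & billing_silo.keys()
}
print(joined)  # -> {} : zero rows, silently
```

The failure mode matters as much as the failure: an empty (or partially wrong) join propagates downstream looking like a valid result.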
You should not trust any data administration method applied downstream of these disparate source datasets; by then it is too late to correct the ‘Garbage In’ problem. Popular downstream methods include ETL/ELT, Master Data Management, Data Mesh, and Semantic Knowledge Graphs. Have you ever wondered why your data quality is so poor and your analytical results are not trusted? ‘Garbage In’ is often the answer: there is no way to audit, validate, or reconcile the resulting dataset against its disparate sources.
Our independent research confirms that a dataset’s contextual characteristics play a crucial role in determining its interoperability with other datasets. We define a dataset’s data context as the shell of master metadata and data content that encapsulates a properly formed dataset. Siloed datasets have disparate data contexts because data integrity is not enforced across silos. In contrast, Universally Interoperable Datasets share a common data context, with data integrity enforced across all of them. When we implement data integrity across datasets, we resolve the discrepancies in their data contexts.
Fortunately, siloed datasets can be individually enriched to become universally interoperable simply by adding Data Compatibility Standards. These standards are added without altering the original dataset’s metadata or data content, providing a universal data context that encapsulates the dataset. Beyond enabling Universal Dataset Interoperability, this universal data context enriches each dataset, making it ready for analytics and AI.
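One way to picture this enrichment is a wrapper layer around each silo. The sketch below is illustrative only: the `enrich` function and the canonical-key mapping are assumptions standing in for actual Data Compatibility Standards, but they show the key property that the original records are left untouched while a shared context makes them joinable.

```python
# Hypothetical siloed datasets, unchanged from their sources.
crm_silo = {"CUST-001": {"name": "Acme Corp", "region": "EMEA"}}
billing_silo = {"10001": {"name": "ACME CORPORATION", "balance": 1200.0}}

def enrich(dataset, local_to_master):
    """Wrap a dataset in a shared data context without altering it:
    each original record is kept verbatim, keyed by a canonical master key."""
    return {
        local_to_master[key]: {"source_key": key, "record": record}
        for key, record in dataset.items()
    }

# The canonical-key registry is the (assumed) universal data context.
crm = enrich(crm_silo, {"CUST-001": "M-1"})
billing = enrich(billing_silo, {"10001": "M-1"})

# The same join that failed across raw silos now reconciles cleanly.
joined = {
    key: (crm[key]["record"], billing[key]["record"])
    for key in crm.keys() & billing.keys()
}
print(sorted(joined))  # -> ['M-1']
```

Because the original key is retained as `source_key`, every joined row can be traced back to its source silo, which is what makes downstream auditing and reconciliation possible.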
To eliminate your ‘Garbage In’ problems, the data context of each source dataset must be enriched to ensure universal interoperability. These Universally Interoperable Datasets are so compatible that together they form a Modular Data Fabric, in which each dataset is a modular, plug-and-play component. Within your fabric, all of your data content is readily accessible for end-to-end auditing, justification, and problem resolution. These essential data governance functions are absent when datasets are siloed. With data compatibility, you finally have the solid data foundation required to support information building, analytics, and modern technologies such as AI/ML and Knowledge Graphs.