
Why Definitions Are the Foundation of Every Data Warehouse

Figure: The Definition hierarchy (Concept, Characteristic, Domain, Value) as the foundation, with multiple source systems converging above it into a single integrated information surface.

Organizations rarely have just one source system. They have dozens - sometimes hundreds - spanning departments, divisions, and countries, each producing overlapping information under different names and structures. When they need to report across these boundaries, they build a Data Warehouse. But the Data Warehouse's primary job is not storage or performance or dashboards. It is Information Integration - creating a shared view of information that works regardless of where the data originated.

Integration has two sides: model integration (how we structure the common view) and semantic integration (what the information actually means). These are tightly intertwined, but semantic integration is where most teams struggle - and it is the focus of this article. Semantic integration depends on one foundational ingredient: Definitions.

Without clear Definitions, we cannot determine whether data from different systems refers to the same thing. Without that determination, there is no integration - just an expensive copy of source systems.

This is the first article in a three-part series. Here, we establish what Definitions are and why they form the foundation of any Data Warehouse. In Part 2, we apply this hierarchy to integration work in practice: where Definitions break down, how teams negotiate them, and what happens when they are missing. In Part 3, we explore how this integration discipline translates into declarative architectures and AI-driven platforms.

The Definition Hierarchy

We need Definitions at four levels, each building on the previous one:

  1. Concept - What kind of thing are we describing?
  2. Characteristic - What properties describe instances of this Concept?
  3. Domain - What values can a Characteristic take?
  4. Value - What does a specific value within a Domain mean?

Two additional elements complete the picture: Terms (the labels we use to refer to Concepts) and Instances (the individual things that fulfill a Concept's Definition). We will walk through each, starting with the most fundamental distinction - the difference between a name and a meaning.
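To make the four levels concrete, here is a minimal sketch of the hierarchy as plain Python data classes. The class names, field names, and the `missing_definitions` helper are our own illustration, not part of any standard or product, and the Definition texts marked as placeholders are stand-ins rather than agreed wording.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Value:
    code: str        # the stored value, e.g. "SEK"
    term: str        # the label people use for it
    definition: str  # what this specific value means


@dataclass(frozen=True)
class Domain:
    term: str
    definition: str
    values: Tuple[Value, ...] = ()  # empty means no enumerated values


@dataclass(frozen=True)
class Characteristic:
    term: str
    definition: str
    domain: Domain


@dataclass(frozen=True)
class Concept:
    term: str
    definition: str
    characteristics: Tuple[Characteristic, ...] = ()


def missing_definitions(concept: Concept) -> List[str]:
    """List every level under the Concept that has no written Definition."""
    missing = []
    if not concept.definition.strip():
        missing.append(f"Concept '{concept.term}'")
    for ch in concept.characteristics:
        if not ch.definition.strip():
            missing.append(f"Characteristic '{ch.term}'")
        if not ch.domain.definition.strip():
            missing.append(f"Domain '{ch.domain.term}'")
        for v in ch.domain.values:
            if not v.definition.strip():
                missing.append(f"Value '{v.code}'")
    return missing


currency = Domain(
    term="ISO 4217 Currency Code",
    definition="Three-letter alphabetic currency codes defined by ISO 4217.",
    values=(
        Value("SEK", "Swedish krona", "Official currency of Sweden."),
        Value("USD", "United States dollar", ""),  # Definition deliberately left blank
    ),
)
bank_account = Concept(
    term="Bank Account",
    definition="Placeholder wording: an account a bank holds for a customer.",
    characteristics=(
        Characteristic(
            term="Account Balance Currency",
            definition="Placeholder wording: the currency the balance is expressed in.",
            domain=currency,
        ),
    ),
)

print(missing_definitions(bank_account))  # ["Value 'USD'"]
```

Note that every level carries both a Term and a Definition, which keeps the name and the meaning as separate fields.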

Term: A Label, Not a Definition

A Term is simply a name - a word or group of words that refers to a Concept. The Term "Car" by itself tells us nothing about what a car is. When someone says "I have a car," we understand only because we already carry a Definition of the Concept that the Term labels.

Terms are how we communicate, but they are not how we understand. A Term works as a "knowledge bearer" for those who already know the Definition behind it. For those who do not, it is meaningless - and communication about it becomes impossible.

In data warehousing, this plays out constantly. Two systems might use different Terms for the same Concept, or - more dangerously - the same Term for different Concepts. Without explicit Definitions, we cannot tell which case we are looking at.
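The two cases can be told apart mechanically once each Term has been tied to a defined Concept. The sketch below assumes a small, hypothetical glossary in which every entry records the source system, the Term it uses, and the Concept that Term was defined to mean; the system names and entries are invented for illustration.

```python
from collections import defaultdict

# Hypothetical glossary: (source system, Term, Concept the Term is defined to mean).
glossary = [
    ("crm", "Client", "Customer"),
    ("billing", "Customer", "Customer"),
    ("crm", "Account", "Customer Relationship"),
    ("core_banking", "Account", "Bank Account"),
]


def classify_terms(entries):
    """Return (synonyms, homonyms) found in the glossary.

    synonyms: Concept -> the different Terms that refer to it
    homonyms: Term -> the different Concepts it refers to
    """
    concepts_by_term = defaultdict(set)
    terms_by_concept = defaultdict(set)
    for _system, term, concept in entries:
        concepts_by_term[term].add(concept)
        terms_by_concept[concept].add(term)
    synonyms = {c: sorted(t) for c, t in terms_by_concept.items() if len(t) > 1}
    homonyms = {t: sorted(c) for t, c in concepts_by_term.items() if len(c) > 1}
    return synonyms, homonyms


synonyms, homonyms = classify_terms(glossary)
print(synonyms)  # {'Customer': ['Client', 'Customer']}
print(homonyms)  # {'Account': ['Bank Account', 'Customer Relationship']}
```

The third column is the part that cannot be automated: someone has to decide, from the Definitions, which Concept each Term actually means.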

Concept: The Core of Integration

A Concept represents a "kind of thing." We naturally understand that every individual thing can be seen as an instance of some kind, and a Concept captures that abstraction. While the Concept itself never exists in the real world, it carries the Definition we rely on to sort and interpret information.

This is the central theme of Information Integration: finding commonality among information, regardless of where it was created in the organization. We do this by identifying shared Concepts and agreeing on their Definitions. How else would we know whether data from different source systems truly describes the same kind of thing?

A Concept's Definition should answer one question: What is that? The clearer the answer, the more reliable the integration. A poorly crafted Definition leads to incorrect groupings - records that look similar get integrated despite representing fundamentally different things.

This requirement for clear Definitions is what we call Semantic Clarity - the principle that every entity, attribute, relationship, and metric must have a clear, business-oriented Definition that is accessible to both humans and AI systems across the organization. Without it, ambiguity grows with scale and integration becomes guesswork.

Characteristic: Describing the Instance

A Characteristic holds information that describes an instance of a Concept. If the Concept is "Customer," a Characteristic might be "Date of Birth." That value describes a specific customer, but it is not part of the Definition of "Customer" itself.

The key insight: each Characteristic is its own Concept. We need to define it independently because we need to know what data can legitimately populate it. This recursive principle - that each level in the hierarchy is itself a Concept requiring its own Definition - applies at every level from here on.

Consider "Bank Account" with the Characteristic "Account Balance." What does that actually mean? A simplified Definition: the balance at a specific point in time, representing the monetary value of settled transactions, not including accrued interest.

Without that Definition, we might load values that include unsettled transactions or accrued interest - mixing data with different meanings into the same field. Practitioners call this "mixing apples with pears," and it leads to reporting that looks correct but produces wrong conclusions.
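As a rough sketch, the simplified Definition above can be written as a calculation. The `Entry` record, its field names, and the `kind` labels are assumptions made for this example; a real ledger will look different.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    booked_on: date
    amount: Decimal
    kind: str       # "transaction" or "accrued_interest" (hypothetical labels)
    settled: bool


def account_balance(entries, as_of: date) -> Decimal:
    """Monetary value of settled transactions up to as_of, excluding accrued interest."""
    return sum(
        (
            e.amount
            for e in entries
            if e.kind == "transaction" and e.settled and e.booked_on <= as_of
        ),
        Decimal("0"),
    )


entries = [
    Entry(date(2024, 1, 1), Decimal("100.00"), "transaction", True),
    Entry(date(2024, 1, 5), Decimal("-30.00"), "transaction", True),
    Entry(date(2024, 1, 6), Decimal("50.00"), "transaction", False),      # not settled
    Entry(date(2024, 1, 6), Decimal("1.25"), "accrued_interest", False),  # excluded by Definition
    Entry(date(2024, 1, 20), Decimal("20.00"), "transaction", True),      # after the as-of date
]

print(account_balance(entries, as_of=date(2024, 1, 10)))  # 70.00
```

A source system that reports 121.25 for the same account on the same date is not wrong; it is using a different Definition, and the two figures should not share a field.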

Domain: Where Values Come From

Each Characteristic draws its values from a Domain. There are two main types:

Open Domains, whose possible values are not enumerated in advance, require little integration effort. Date is a good example - "Date of Birth" takes values from the Date Domain. The values themselves do not need integration; only the format does.

Closed Domains, which consist of a fixed set of permitted values, often require significant integration work. A Domain, like every other level, is also a Concept and needs its own Definition. Consider "Account Balance Currency," which uses the Domain "ISO 4217 Currency Code" - a standardized set of values with its own Definition.

Values within a Domain also need a defined format. A date must follow its specified format regardless of how the source system stored it. Format integration is mechanical, but skipping it breaks downstream consistency.
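A minimal sketch of both kinds of work follows. The source system names, their date formats, and the choice of ISO 8601 as the target format are assumptions for illustration, and the currency set is a deliberately tiny excerpt rather than the full ISO 4217 list.

```python
from datetime import datetime

# Assumed date formats per source system (illustrative only).
SOURCE_DATE_FORMATS = {"crm": "%d/%m/%Y", "billing": "%Y%m%d"}

# Tiny excerpt of the closed Domain "ISO 4217 Currency Code".
CURRENCY_CODES = frozenset({"EUR", "SEK", "USD"})


def to_iso_date(raw: str, source: str) -> str:
    """Open Domain: only the format is integrated, here to YYYY-MM-DD."""
    parsed = datetime.strptime(raw.strip(), SOURCE_DATE_FORMATS[source])
    return parsed.date().isoformat()


def validate_currency(raw: str) -> str:
    """Closed Domain: the value must be a member of the defined set."""
    code = raw.strip().upper()
    if code not in CURRENCY_CODES:
        raise ValueError(f"{raw!r} is not in the currency Domain")
    return code


print(to_iso_date("31/12/1999", "crm"))    # 1999-12-31
print(to_iso_date("19991231", "billing"))  # 1999-12-31
print(validate_currency(" sek "))          # SEK
```

Rejecting an unknown code outright, instead of passing it through, is a design choice: it surfaces the gap in the Domain at load time, where it is cheapest to resolve.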

Value: When Individual Entries Carry Meaning

Individual Values can also represent their own Concepts. If we have a Concept "Organization" classified by the Global Industry Classification Standard (GICS), one Value might be code "10" - the Term "Energy Sector," defined as:

Companies whose businesses are dominated by the construction or provision of oil rigs, drilling equipment, energy-related services, or the exploration, production, marketing, refining, and transportation of oil, gas, coal, and other consumable fuels.

This Value, like the levels above it, is its own Concept with its own Definition. The same applies to ISO 4217 Currency Codes, where each code carries a Definition.

The ability to integrate Values - not just Concepts, Characteristics, and Domains - is what completes the common information view in the Data Warehouse.
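In practice, Value integration often comes down to an explicit mapping from each source system's local codes to the shared Domain. The sketch below uses the GICS code "10" from the example above; the source system names and their local codes are hypothetical.

```python
# Shared Domain excerpt: code -> Term (only the Value discussed above).
SECTOR_TERMS = {"10": "Energy Sector"}

# Hypothetical local codes, each mapped to a Value in the shared Domain.
VALUE_MAP = {
    ("erp", "OIL_GAS"): "10",
    ("crm", "Energy"): "10",
}


def integrate_value(source: str, raw: str) -> str:
    """Translate a source-specific code into the shared Domain's Value."""
    try:
        return VALUE_MAP[(source, raw)]
    except KeyError:
        raise LookupError(f"no mapping for {raw!r} from {source!r}") from None


code = integrate_value("erp", "OIL_GAS")
print(code, SECTOR_TERMS[code])  # 10 Energy Sector
```

Each row in the mapping is a claim that the local code and the shared Value have the same Definition, so unmapped codes raise an error instead of being guessed.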

Instance: Where Definitions Meet Reality

An Instance is an individual thing that fulfills the Definition of a Concept - a specific bank account, a particular customer, an individual organization. The only way to determine whether something qualifies is to check it against the Concept's Definition.
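One way to picture this check is to express a Concept's Definition as a test that each candidate record must pass. The Definition used here ("a party with at least one active contract") is purely hypothetical, as are the record fields and source names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartyRecord:
    source: str
    name: str
    active_contracts: int


def is_customer(record: PartyRecord) -> bool:
    """Hypothetical Definition: a Customer is a party with at least one active contract."""
    return record.active_contracts >= 1


records = [
    PartyRecord("crm", "Prospect AB", 0),     # looks like a customer record, but is not an Instance
    PartyRecord("billing", "Client Ltd", 2),
]

customers = [r.name for r in records if is_customer(r)]
print(customers)  # ['Client Ltd']
```

Both records may sit in tables called "customer" in their source systems; only the Definition decides which of them is an Instance of the Concept.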

Through the Values each Instance holds across its Characteristics, we can analyze and report. When the Concept, Characteristics, Domains, and Values are all integrated according to their Definitions, the information becomes usable throughout the organization - regardless of which system or country created it. That is the primary purpose of a Data Warehouse.

Acceptance: The Organizational Dimension

The hierarchy above assumes Definitions already exist and are agreed upon. But agreed upon by whom, and at what scope?

A Definition's usability depends on how widely it is accepted. A Definition created for a single division cannot serve as an Enterprise Definition. This sounds obvious, but it is one of the most common problems in practice - expectations about data usability that exceed the scope of the Definitions behind it.

If someone claims to run an "Enterprise Data Warehouse," its Definitions must be accepted throughout the enterprise. They seldom are. This gap creates frustration, mistrust, and failed projects. Being explicit about acceptance scope - and honest about its limitations - is essential for realistic expectations.

Where This Leads

The conceptual hierarchy above is the foundation, but it is only the starting point. These Definitions shape how both humans and AI systems work with data - from integration decisions and data modeling to platform construction. In Part 2, we look at how Definitions are applied in practice when doing integration work: where they break down, how teams negotiate them, and what happens when they are missing. In Part 3, we explore how this integration discipline translates into declarative architectures and AI-driven platforms.

Conclusion

Information Integration is the discipline of determining which Concept each data record belongs to. The only way to do this reliably is through Definitions at every level:

  • Concept Definition - what kind of thing is this?
  • Characteristic Definition - what describes instances of this Concept?
  • Domain Definition - what values are valid, and in what format?
  • Value Definition - what does a specific value mean?

The organizational challenge is just as real as the technical one. Definitions need broad acceptance to deliver their promised scope, and that acceptance is hard-won. But the investment compounds: Semantic Clarity - every Concept, Characteristic, Domain, and Value clearly defined and broadly accepted - creates a foundation that the rest of the organization can build on.

Where to start? Before integrating data from two systems, write down the Definition of each shared Concept and verify that both teams agree on it. If they do not, you have found the exact place where integration will break.
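Even a very small register can make this step visible. The sketch below assumes each shared Concept gets one written Definition and a record of which teams have accepted it; the class, the team names, and the placeholder Definition texts are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class DefinitionReview:
    concept: str
    definition: str
    accepted_by: Set[str] = field(default_factory=set)


def unresolved(
    reviews: Iterable[DefinitionReview], teams: Iterable[str]
) -> Dict[str, List[str]]:
    """Map each Concept lacking full acceptance to the teams that have not agreed yet."""
    required = set(teams)
    return {
        r.concept: sorted(required - r.accepted_by)
        for r in reviews
        if not required <= r.accepted_by
    }


reviews = [
    DefinitionReview("Customer", "Placeholder wording agreed in a workshop.", {"crm", "billing"}),
    DefinitionReview("Bank Account", "Placeholder wording proposed by one team.", {"billing"}),
]

open_items = unresolved(reviews, teams=["crm", "billing"])
print(open_items)  # {'Bank Account': ['crm']}
```

The same structure extends to wider acceptance scopes: listing every division under `teams` shows how far a Definition is from being an Enterprise Definition.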

For a deeper treatment of Definitions and their management - including formal methods for creating, validating, and governing them across an organization - see Definitions in Information Management by Malcolm D. Chisholm, Ph.D. It remains one of the clearest formal frameworks on this subject.