The Architectural Blueprint

Author: Patrik Lager
Introduction
The term "architectural blueprint" shifts meaning depending on who you ask. This article describes how I define and use it when building Data Warehouses -- not as universal truth, but as a practical approach that has served well over the years. If it differs from how you work, maybe it sparks an idea or two.

Stand the Test of Time
The architectural blueprint describes the vision of what we want to achieve with our architecture. It contains nothing about technology, processes, modeling methods, or design patterns. We might change all of those, but the fundamental vision should still hold true.
Deals in Absolutes
The blueprint deals in absolutes -- no compromises, no exceptions. It describes the best possible logical architecture, the one that lets the system perform at its best throughout its whole life cycle. It does not account for objections like "but we need to do this because of this or that", or "what do we do if the business wants the data now?", or "when we work with Machine Learning we should be able to do it this way". Those are requirements. When requirements need to break the fundamental principles, that is when the blueprint's power comes into play.
The Gatekeeper
Anyone who has worked with architecture knows it is all about compromising when things do not work as we want. So what value does the blueprint give us when reality wants to break the absolutes?
"When you argue with reality, you lose, but only 100% of the time." -- Byron Katie
So we do not argue with reality; instead, we define the cost that reality wants to impose on us.
The principles define the way we want the system to work. When a requirement does not fit, we need to make an informed decision. But if we have no described absolute principle -- why we have it, what effect it has -- then we have nothing to break. We will simply act, with no deeper thought about how the requirement will affect the system's life cycle, because nothing is described.
As a gatekeeper, the blueprint describes how we want the system to work. When we know that, we also know which requirements force us to go against which principles. We can measure the effect of breaking a principle -- whether we should break it, why, how, and the future cost it will incur for the system.
Time and again, discussions about breaking an architectural principle have ended the same way: once the effect and cost are laid out clearly, the business themselves back down and say, "No, let's follow the principle. We can wait to get this done properly -- it is not that important to get it done this quickly. It is not worth it!"
But other times, we have gone ahead and broken the principle, with everyone involved -- the product owner, the requirement owner, the data warehouse architect, and the data engineer -- all understanding why and what cost it will incur on the system's life cycle.
In both cases, the blueprint has done its work: it forced us to make informed decisions.
The System Architectural Blueprint -- Example
How can you create a blueprint that stands the test of time when technology and processes constantly change? By dealing in "logic" -- undeniable logic that holds true regardless of technology, process, or tooling.
The Vision Statement
We set up a vision statement for the system as an IT artifact -- not detailed business requirements, but what the system itself must achieve:
"The Data Warehouse has to be
- Maintainable
- Scalable
- Adaptable
- Good time to market
throughout its whole life cycle."
Each point is then broken down. No technology, no process, no specific use-cases -- just actionable points.
Maintainable -- Complexity has to be controlled at all times and grow deliberately, not organically.
Scalable -- The system has to grow in data and data concepts without affecting maintainability.
Adaptable -- The system has to handle change of requirements (internal/external, large/small) without affecting maintainability.
Good time to market -- The system has to respond to changes and produce value within acceptable time periods, while upholding the fundamental principles. Not the fastest way, but always within acceptable timing.
Throughout its whole life cycle -- Estimated at 15-25 years.
Does this seem abstract? It is -- this is where we start. Now we set up the logical architecture to support the vision and describe the principles.
This is not the only architectural document -- it is the overarching one. From it, other documents spawn covering technology, tooling, processes, modeling techniques, and so on. But they should always support the vision statement.
High Level Logical Architecture
To support the vision, we set up a layered approach.
Three-Layer Architecture for a Data Warehouse
- Data According to System (DAS) -- Receives and stores data from source systems in its raw format
- Data According to Business (DAB) -- Data stored semantically and structurally source-system agnostic
- Data According to Requirement (DAR) -- Prepares and structures data for end users based on specific requirements
If these layers sound familiar, they should. DAS/DAB/DAR map roughly to staging/core/marts in dbt terminology, or bronze/silver/gold in the medallion architecture. The concepts overlap, but the naming here is deliberate -- it emphasizes the purpose of each layer (what the data represents) rather than its position in a sequence or a quality grade. That distinction matters when the names are also your principles.
The three layers exist to:
- Break down complexity in code, so data movement between layers has a specific purpose and does not try to do too many things at once.
- Limit the effects of changes. Each layer's code will not affect the other layers.
- Specialize each layer so the system avoids sub-optimization.
Overarching Principles for the Three Layers
Data usage for analytical purposes is only allowed from the DAR layer.
Reason -- All analytical consumption goes through DAR so DAB remains a stable, reusable foundation. If users query DAB directly, their usage patterns create implicit dependencies that make the system harder to change.
Data in the DAR layer can only come from the DAB layer.
Reason -- Every analytical product must be built on business-integrated, quality-assured data. Bypassing DAB means bypassing the single source of truth, leading to inconsistent numbers and eroded trust.
Data in the DAB layer can only come from the DAS layer.
Reason -- Clean separation between raw data capture and business integration. DAS absorbs all source-system volatility, so changes in data delivery never force changes in business logic.
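These flow rules lend themselves to automated checks. As a minimal sketch (the function, the naming convention, and the model names are hypothetical illustrations, not part of the blueprint), a model dependency graph can be validated against the allowed layer-to-layer edges:

```python
# Allowed data-flow edges between layers, per the principles above:
# DAB may only read from DAS, and DAR may only read from DAB.
ALLOWED_SOURCES = {
    "DAS": set(),       # DAS reads only from external source systems
    "DAB": {"DAS"},
    "DAR": {"DAB"},
}

def layer_of(model_name: str) -> str:
    """Derive the layer from an assumed naming convention like 'das_orders'."""
    return model_name.split("_", 1)[0].upper()

def validate_dependencies(deps: dict[str, list[str]]) -> list[str]:
    """Return one violation message for every edge that breaks the blueprint."""
    violations = []
    for model, upstreams in deps.items():
        layer = layer_of(model)
        for upstream in upstreams:
            if layer_of(upstream) not in ALLOWED_SOURCES.get(layer, set()):
                violations.append(
                    f"{model} ({layer}) must not read from {upstream}"
                )
    return violations

# Example: a DAR model reading straight from DAS is flagged.
deps = {
    "dab_customer": ["das_crm_customer"],
    "dar_churn_report": ["dab_customer", "das_crm_customer"],
}
print(validate_dependencies(deps))
# -> ['dar_churn_report (DAR) must not read from das_crm_customer']
```

A check like this could run in CI, so a broken principle surfaces as an explicit decision rather than slipping in unnoticed.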
The following sections describe the principles for each layer. These are the rules the gatekeeper enforces -- the absolutes we measure every deviation against.
Data According to System
The DAS layer captures data in its raw format and persists it.
Principles
| Principle | Rule | Why It Matters |
|---|---|---|
| Persist | Non-volatile. Not allowed to update or delete data. | The data has to be auditable and the system must be able to reload downstream layers with all historical data at any time. |
| Simplicity | Capture and persist only. Transformation logic resides in the code that loads DAB. | A simple ingestion layer means new sources can be onboarded with minimal effort. |
| Metadata | Every data point entering DAS has to be recorded for auditability. | Always have the ability to see from where and when data came into the system. |
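As a minimal sketch of the Persist and Metadata principles (the function name, the metadata columns, and the in-memory list standing in for a table are all illustrative assumptions, not prescribed by the blueprint), a DAS load might append raw records untouched, stamped with audit metadata:

```python
from datetime import datetime, timezone

def ingest_to_das(das_table: list[dict], raw_records: list[dict],
                  source_system: str, batch_id: str) -> None:
    """Append raw records to DAS with audit metadata. Never update or delete."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    for record in raw_records:
        das_table.append({
            **record,                         # raw payload, stored as delivered
            "_source_system": source_system,  # where the data came from
            "_batch_id": batch_id,            # which delivery it arrived in
            "_loaded_at": loaded_at,          # when it entered the system
        })

das_orders: list[dict] = []
ingest_to_das(das_orders, [{"order_id": 1, "amount": "99.50"}], "erp", "b001")
# The raw field values are untouched; only audit columns are added.
```

Note there is deliberately no transformation logic here: per the Simplicity principle, that belongs in the code that loads DAB.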
Data According to Business
The DAB layer serves two purposes:
- Break dependencies between the sourcing of data and the usage of data.
- Ensure consistency -- the same numbers in all reports when using the same data.
Principles
| Principle | Rule | Why It Matters |
|---|---|---|
| Persist | Non-volatile. Not allowed to update or delete data. | Business-integrated data has to be auditable and traceable over time. Immutability ensures we can reproduce any analytical result and understand how business data evolved. |
| Reusability | Produce clear, well-defined atomic business semantics without focus on analytical use-case. | Straightforward business definitions keep the layer focused on the ability to support any use case. |
| Atomic | Always has the atomic data point as the base of its semantic meaning. | Data has to be reusable, independent of analytical use-case. Aggregations happen when data moves to DAR. |
| Business Oriented | No semantic or structural representation that is source-system specific or use-case specific. Data has a business representation independent of both. | The system has to adapt to change from both sourcing and usage perspectives. DAB buffers against changes in either from cascading to other layers. |
| Metadata | Every data point entering DAB has to be recorded for auditability. | Always have the ability to trace data lineage from source through integration, to understand what business rules were applied and when. |
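The Business Oriented principle can be illustrated with a small sketch (the source systems, field names, and mapping table are invented for the example): two sources deliver customers under different field names, and DAB stores a single, source-agnostic representation at the atomic grain.

```python
# Hypothetical mappings from source-specific shapes to business semantics.
FIELD_MAPS = {
    "crm": {"cust_no": "customer_id", "nm": "customer_name"},
    "erp": {"kundnr": "customer_id", "namn": "customer_name"},
}

def to_dab_customer(das_record: dict, source_system: str) -> dict:
    """Translate a raw DAS record into the business representation."""
    mapping = FIELD_MAPS[source_system]
    return {business: das_record[raw] for raw, business in mapping.items()}

row = to_dab_customer({"kundnr": 42, "namn": "Acme AB"}, "erp")
# -> {'customer_id': 42, 'customer_name': 'Acme AB'}
```

If a source system changes its field names, only its mapping changes; the DAB representation, and everything built on it, stays stable.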
Data According to Requirement
The DAR layer creates and supports data products that user groups want.
Principles
| Principle | Rule | Why It Matters |
|---|---|---|
| Persist | Unlike DAS and DAB, DAR structures may be volatile where the requirement demands it (e.g. full refreshes of aggregated views), but all changes must be traceable. | Reproducibility is essential. Users and auditors need to understand what data was served, when, and how it was derived from DAB. |
| Specialization | Data structures and data points are specialized to support the specific requirement(s). | Data should be easy to use, shaped exactly to what each user group needs. |
| Requirement Oriented | Contains the exact semantic and structural representation for specific requirement(s). No "reuse" between different requirements. Different user groups must not share structures even when requirements align. No intra-dependencies between requirements of different user groups. | Reusability in DAB ensures ease of creating unique analytical structures. DAR therefore focuses on specialization per user group, ensuring no contention between requirements. This supports good time to market throughout the system's life cycle. |
| Metadata | Every data point entering DAR has to be recorded for auditability. | Always have the ability to see what data was delivered to which user group, when, and how it was derived from the business-integrated layer. |
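As a sketch of the Specialization principle (the aggregate, the field names, and the user group are hypothetical), a DAR structure is built only from atomic DAB data and shaped for exactly one requirement:

```python
from collections import defaultdict

def build_dar_sales_per_region(dab_orders: list[dict]) -> dict[str, float]:
    """A requirement-specific aggregate for one user group, built only from
    atomic DAB data. Another user group with a similar need would get its
    own structure rather than sharing this one."""
    totals: defaultdict[str, float] = defaultdict(float)
    for order in dab_orders:
        totals[order["region"]] += order["amount"]
    return dict(totals)

dab_orders = [
    {"region": "north", "amount": 100.0},
    {"region": "north", "amount": 50.0},
    {"region": "south", "amount": 70.0},
]
print(build_dar_sales_per_region(dab_orders))
# -> {'north': 150.0, 'south': 70.0}
```

The aggregation happens here, not in DAB: DAB keeps the atomic grain, and each user group's DAR structure can aggregate it differently without contention.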
We will stop here. Normally the blueprint covers much more, but this illustrates the idea. No mention of technology, modeling methods, or deployment processes -- only the logical approach and the principles that ensure the vision will work.
Too Smart for Your Own Good -- Why Exceptions Kill the Blueprint
Many of you will immediately see that these principles will not hold in the real world. Requirements and analytical use-cases will struggle to fit within them. The temptation is to add "exceptions" -- since deviations will happen, why not cover all possible cases up front?
This might sound responsible. But what actually happens is that we open Pandora's box: we give people the "right" to break fundamental principles without discussion, and the document loses its mandate.
Each "exception" breaks the fundamental principles, so it produces a weaker system from a maintainability, scalability, and adaptability point of view. Time to market erodes as more exceptions are deployed, and the system's lifetime shortens.
Let's make this concrete.
Machine Learning/AI Needs Raw Data?
One classic myth is that Machine Learning and/or AI need raw, unintegrated data to work. That is not the case. Machine Learning and AI thrive on well-defined and integrated data, where quality is high and the ML/AI models get a picture of how the company works, not how one system works. The reason we (think we) need raw data is often twofold:
- It takes time to define and integrate data into a business representation in DAB.
- The integrated data in DAB is not "atomic" enough, since we have broken the fundamental "Atomic" principle described above.
So the argument used is "we do not have time to wait for the data needed in DAB". Instead we develop our own integration and definitions as a side track, which often takes more time and produces less accurate results than adding the missing data to DAB and building the solution in DAR -- because the ML team ends up re-creating integration work that DAB already handles, but without the benefit of cross-domain consistency or reuse.
Also: since the raw data used in the ML/AI solution is not integrated into DAB, we lose twice over -- had we done the work and integrated the data into DAB, its reusability would have grown even further. We are sub-optimizing our system and platform.
Now imagine we had a classic "exception" in our blueprint: "Machine Learning applications are allowed to use raw data because they will not always work with the data in DAB."
That exception removes the mandate. We have already given the right to bypass the architecture and many fundamental principles.
But without the "exception", the ML project would have to explain why they need to short-circuit the system. Everyone involved could take into account:
- The cost for the system architecture's life cycle
- The loss of reusability (since we short-circuit the architecture)
- Cost of maintainability, scalability, adaptability, and time to market for the whole system
We can also make an informed decision based on questions such as:
- What time and cost would it take to build the ML application against raw data?
- What time and cost to define and integrate the data needed into DAB?
- What time and cost to build the ML application against DAR?
- What future value would integrating the data into DAB first give our company and the system?
- ...and many more questions.
That is the gatekeeper doing its work. Not blocking, but forcing the conversation.
Conclusion
An architectural document that deals in absolutes does not hinder development -- there will always be use-cases that do not fit. The purpose is to make informed decisions before we short-circuit the system architecture, and to understand the cost of doing so.
When the blueprint works as intended, it protects the vision: a system that stays maintainable, scalable, adaptable, and delivers good time to market throughout its whole life cycle. The gatekeeper does not say "no" -- it says "let's understand what this costs us before we decide." That single shift, from reflexive action to informed decision, is what separates systems that last fifteen years from systems that need replacing in five.
