Why We Built Daana: Reclaiming Lost Knowledge

By Siavoush Mohammadi

The Knowledge That Was Lost

I started my career in Stockholm around 2010, learning from some of the greatest data architects in the industry. Data modeling wasn't optional - it was foundational. Every data professional learned Kimball's dimensional modeling, Inmon's enterprise warehouses, normal forms, ensemble techniques like Data Vault, Focal and Anchor modeling. We understood that how you structured data mattered as much as what you captured.

These weren't abstract theories but proven patterns for building systems that could grow without collapsing under complexity. Proper modeling meant source system changes didn't cascade chaotically. Business logic lived in well-defined places. New team members understood the system by examining the model.

Then came the "big data" revolution. A generation of data professionals was told: "You don't need to model. Just throw it in the data lake." Schema on read, not schema on write. ELT replaced ETL. Store everything raw and figure it out later. The lake became a swamp.

The problem wasn't the technology - distributed storage and processing unlocked genuine new capabilities. The problem was discarding decades of architectural knowledge because the new tools didn't enforce it. We went from "model carefully" to "modeling is old-school, just dump it all in."

Teams stopped teaching data modeling because it seemed irrelevant. Senior architects who carried this knowledge retired or moved on. Junior engineers never learned it because companies didn't practice it. The skills that made data systems comprehensible became concentrated in a shrinking pool of practitioners.

The pendulum swung too far. We gained powerful tools but lost the discipline that made data platforms sustainable. Now, 10-15 years later, we're living with the consequences.

The Consequences We're Living With

I've worked across dozens of organizations, and the pattern is impossible to ignore. Data engineers drowning in unmaintainable hand-written SQL. Pipelines that break constantly - 3 AM alerts because some upstream column changed. Data quality issues that erode trust until stakeholders abandon the platform entirely.

The chaos manifests predictably. Point-to-point solutions with no coherent structure. Poorly defined "medallion architectures" where bronze, silver, and gold layers exist in name only - nobody can articulate what belongs where or why. Transformation logic scattered randomly. Business rules duplicated in five places, each implementation uniquely wrong.

Junior engineers never learned data modeling because it's not taught. They write SQL reflecting no understanding of separation of concerns or semantic clarity. They're not at fault - they're doing what they were trained to do in environments that never exposed them to these principles.

Senior engineers hold critical knowledge in their heads with no way to encode it. They know which transformations are fragile, which sources can't be trusted, where hidden dependencies lurk. But this exists as tribal lore, not preserved in any system. When they leave, that knowledge vanishes.

Every company rebuilds the same transformation patterns badly. Same mistakes. Same brittle approaches. Same late discovery that mixing source representation with business logic makes everything unmaintainable. With no encoded knowledge to draw from, each team starts from scratch and suffers through identical problems.

The symptom everyone sees is brittle pipelines requiring constant manual intervention. But the root cause runs deeper - architectural knowledge that's been lost. We stopped teaching and practicing the disciplines that made data systems maintainable, and now we're paying the price in toil and fragility.

What We Lost (And Why It Mattered)

Data modeling isn't about resisting innovation. It's about understanding what you're building semantically. It's separation of concerns - knowing what belongs where and why. Clear interfaces between layers so changes don't cascade chaotically. Systems comprehensible to humans and AI.

Proven modeling approaches exist for different purposes. Ensemble models like Data Vault, Focal and Anchor modeling excel at capturing business perspective in the Data As Business (DAB) layer - handling changing source systems while preserving historical context. Dimensional modeling remains powerful for analytics in the Data As Requirements (DAR) layer, organizing data around how people actually analyze it. Normalized models work for operational reporting. Event-driven patterns serve real-time needs.

These aren't competing philosophies - they're complementary techniques for different layers. The three-layer architecture (DAS: data as the system sees it; DAB: data as the business sees it; DAR: data as the requirements need it) provides clear separation. Source changes stay isolated in DAS. Business semantics stabilize in DAB. Consumption patterns in DAR evolve without touching upstream layers.

[Figure: Three-layer architecture showing data flow through DAS (raw ingestion), DAB (business entity modeling with ensemble patterns), and DAR (analytics consumption), with clear separation of concerns]
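
A minimal sketch of that separation, in BigQuery-flavored SQL with illustrative schema, table, and column names (this shows the shape of the pattern, not Daana's generated code):

```sql
-- DAS: data as the system sees it. Raw payloads land here untouched,
-- so source-side changes never break ingestion.
CREATE TABLE IF NOT EXISTS das.orders_raw (
  loaded_at TIMESTAMP,
  payload   JSON
);

-- DAB: data as the business sees it. Stable business keys and semantics;
-- when the source renames a field, only this DAS-to-DAB mapping changes.
CREATE OR REPLACE VIEW dab.customer_order AS
SELECT
  JSON_VALUE(payload, '$.order_id')    AS order_id,     -- business key
  JSON_VALUE(payload, '$.customer_id') AS customer_id,
  loaded_at
FROM das.orders_raw;

-- DAR: data as the requirements need it. Consumption models evolve here
-- without touching DAS or DAB.
CREATE OR REPLACE VIEW dar.fct_orders AS
SELECT order_id, customer_id, loaded_at AS order_loaded_at
FROM dab.customer_order;
```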

Data contracts as implementation make this practical: forgiving ingestion accepts all data, strict unpacking via contracts enforces structure, and self-healing pipelines automatically reflect corrections when sources fix mistakes. These patterns have been proven at scale.
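
As a sketch of what "forgiving in, strict out" can look like, again with illustrative names and a hypothetical contract (payment_id required, amount numeric, paid_at timestamp): the landing table accepts anything, and the unpacking view applies the contract's types without ever failing the load.

```sql
-- Forgiving ingestion: the landing table accepts every payload as-is,
-- so a malformed message never blocks the load.
CREATE TABLE IF NOT EXISTS das.payments_raw (
  loaded_at TIMESTAMP,
  payload   JSON
);

-- Strict unpacking: the hypothetical contract (payment_id required,
-- amount NUMERIC, paid_at TIMESTAMP) is enforced here. SAFE_CAST yields
-- NULL instead of failing, and violations are flagged rather than dropped.
CREATE OR REPLACE VIEW dab.payment AS
SELECT
  JSON_VALUE(payload, '$.payment_id')                       AS payment_id,
  SAFE_CAST(JSON_VALUE(payload, '$.amount')  AS NUMERIC)    AS amount,
  SAFE_CAST(JSON_VALUE(payload, '$.paid_at') AS TIMESTAMP)  AS paid_at,
  loaded_at,
  (JSON_VALUE(payload, '$.payment_id') IS NULL
    OR SAFE_CAST(JSON_VALUE(payload, '$.amount') AS NUMERIC) IS NULL)
                                                            AS violates_contract
FROM das.payments_raw;
```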

But accessing this knowledge requires massive upfront investment - months of design work before seeing results. Most teams can't justify that timeline, no matter how motivated their engineers are.

Want to build proper three-layer architecture? You'll spend months designing, implementing, debugging hand-written transformations. By the time you deliver value, stakeholders have lost patience. The business case for doing it right can't compete with shipping something broken next week.

This is why knowledge stays concentrated. Only organizations with sufficient resources can invest in building these systems properly. Everyone else cobbles together point-to-point solutions because they have no practical alternative.

What We Gained (And What's Still Missing)

While we lost modeling knowledge, we gained something valuable. Tools like dbt brought modern software engineering to data work - version control with Git, CI/CD pipelines, testing frameworks, code review workflows, infrastructure as code. These professionalized data teams.

This matters. These practices made teams more effective and collaborative, and their codebases more maintainable. Anyone who's worked on data projects before and after this shift knows the difference. We finally got the engineering rigor that software teams had enjoyed for years.

But here's the gap: we have modern engineering practices without modeling knowledge to guide what we're building. Excellent tools for version-controlling SQL that lacks architectural coherence. Testing frameworks validating brittle point-to-point transformations instead of well-designed semantic models. CI/CD pipelines deploying code reflecting no understanding of separation of concerns.

The problem isn't the tools - they work as designed. The problem is using modern engineering practices to build poorly-architected systems more efficiently. We've gotten very good at deploying bad architecture faster.

What if we could have both? Modeling knowledge and modern practices together?

This is where a declarative approach matters: separate the knowledge from the ceremony. What if proven patterns were accessible without requiring everyone to become ensemble modeling experts? What if understanding business semantics didn't mean months of entity-relationship diagrams before seeing value?

Declarative tooling changes the equation. Declare what you want - business entities, relationships, semantic meaning - and let the system generate how to build it following proven patterns. You get the benefits of good architecture without the traditional barriers.

Documentation becomes implementation. They can't drift because they're the same thing. Change the semantic definition, transformations regenerate automatically. Not documentation gathering dust in Confluence - documentation that directly drives your platform.

LLMs become dramatically more effective. Feed them hand-written SQL and they can maybe help debug. Feed them semantic definitions of business entities and they become tailored to your organization - answering questions, suggesting transformations, identifying inconsistencies. But only if you have semantic clarity to begin with.

The path forward isn't choosing between modeling knowledge and modern practices. It's combining them. Declarative definitions encoding proven patterns. Version control and CI/CD for those definitions. Testing frameworks validating semantic correctness. The craft of data modeling, made accessible through modern tooling.

Standing on the Shoulders of Giants

We need to be clear: we're not inventing new modeling techniques or claiming some revolutionary approach invalidates decades of data architecture knowledge.

We're standing on the shoulders of giants. Kimball's dimensional modeling principles still hold - they align with how humans actually analyze data. Three-layer architecture is proven - separating source representation from business semantics from consumption patterns creates stable, comprehensible systems. Data Vault, Focal and Anchor modeling provide powerful patterns for the business semantic layer, handling change while preserving history. Data contracts, semantic clarity, self-healing pipelines - all battle-tested concepts.

What's new isn't the patterns. What's new is making these principles accessible through declarative YAML. Encoding the knowledge so teams don't rebuild it from scratch. Designing for the AI age where semantic definitions matter because LLMs need structure to be effective.

We're bringing back the craft of data modeling with modern tooling - the ability to iterate quickly while maintaining quality. The combination is new; the underlying principles are not.

Humility matters here. The data architecture community built incredible knowledge over decades. Practitioners like Ralph Kimball, Bill Inmon, Dan Linstedt, and countless others solved hard problems and shared insights. Our contribution isn't the patterns - it's creating a path for teams to actually use them without years of specialized training or massive upfront investment.

We're restoring lost knowledge, not inventing it. That restoration is only possible because the foundations were built by those who came before us.

Why Daana Exists

We built Daana because I kept seeing the same problems repeated across dozens of organizations. Teams suffering from brittle pipelines. Architectural knowledge concentrated in a few senior people. Junior engineers never exposed to modeling principles. The same transformation patterns rebuilt poorly everywhere.

Tools existed for writing SQL and orchestrating pipelines. But there was no tool for encoding architectural knowledge itself. No way to capture proven patterns and make them repeatable. No path for teams to adopt good architecture without hiring the rare senior architect who carried this knowledge.

I learned data modeling from Stockholm's greatest architects early in my career. I saw firsthand how powerful good architecture could be - systems that remained stable as businesses grew, pipelines that didn't break constantly, platforms that new team members could understand and extend. This worked.

But I also saw how inaccessible this knowledge had become. Most teams couldn't afford the upfront investment. Even teams wanting to do things properly couldn't justify months of design work before delivering value. The business case for quality couldn't compete with pressure to ship something immediately.

Daana is declarative data modeling for modern platforms. You declare your business entities in YAML - what they are, how they relate, what they mean semantically. The system generates transformation pipelines following proven architectural patterns. It works on BigQuery, Snowflake, and other cloud platforms.

The goal: let data professionals focus on understanding the business. Capturing business processes in information models. Creating value iteratively. Not becoming experts in ensemble modeling techniques - the tooling still ensures you model with clear purpose and semantic meaning. Not choosing between speed and quality, but getting both.

It's ELv2 open source because this knowledge should belong to everyone. Not locked behind proprietary tools or expensive consulting. The data community built these patterns over decades - they should be accessible to everyone building data platforms.

That's why Daana exists: to encode proven architectural knowledge and make it accessible. To combine the craft of data modeling with modern engineering practices. To reclaim what was lost.

What This Means in Practice

What does this look like when building a data platform?

You define your Core Business Concepts - Account, Subscription, Order, whatever entities matter for your business - in YAML metadata. You describe what they mean, how they relate, what attributes they have. You map them to source systems. This is where you encode your business understanding.

Daana generates transformation SQL following proven patterns automatically: forgiving ingestion accepting all data without breaking, strict unpacking via data contracts, the three-layer architecture separating concerns cleanly. Both historized and latest views for every entity. Self-healing behavior when sources correct themselves - the latest view automatically reflects corrected data.
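
A hand-written equivalent of the historized and latest views might look like the sketch below (illustrative names again; the Account attributes and the das.accounts_raw landing table are assumptions, not Daana's output). The historized view keeps every version ever received; the latest view picks the most recent row per business key, which is what makes the pipeline self-healing when a source re-sends a corrected record.

```sql
-- das.accounts_raw is a forgiving landing table like the ones sketched
-- earlier: (loaded_at TIMESTAMP, payload JSON).

-- Historized view: every version of every Account ever received,
-- keyed by business key plus load time. Nothing is overwritten or lost.
CREATE OR REPLACE VIEW dab.account_history AS
SELECT
  JSON_VALUE(payload, '$.account_id')                          AS account_id,
  JSON_VALUE(payload, '$.status')                              AS status,
  SAFE_CAST(JSON_VALUE(payload, '$.credit_limit') AS NUMERIC)  AS credit_limit,
  loaded_at
FROM das.accounts_raw;

-- Latest view: one row per business key, taken from the most recent load.
-- When the source corrects a record and re-sends it, the corrected version
-- wins automatically - no manual patching of downstream tables.
CREATE OR REPLACE VIEW dab.account AS
SELECT account_id, status, credit_limit, loaded_at
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY account_id       -- one row per Account
      ORDER BY loaded_at DESC       -- most recent version wins
    ) AS version_rank
  FROM dab.account_history
) AS versions
WHERE version_rank = 1;
```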

[Figure: Comparison of traditional hand-written SQL pipelines (tangled dependencies, scattered logic, fragility) with Daana's declarative approach (organized YAML definitions generating a structured three-layer architecture)]

Your semantic definitions serve triple duty: what you read to understand the business model, what LLMs read when answering questions about your data, and what drives transformation generation. These aren't three separate artifacts that can drift - they're the same source. Update the model, transformations regenerate automatically. The architecture becomes both visible and maintainable.

This isn't about replacing data engineers but elevating the work. Focus on understanding the business problem, modeling semantics correctly, creating the right entities and relationships, defining clear data contracts. These are the skills that matter - human judgment about what data means and how it should be organized.

Let the system handle generating consistent, high-quality transformation code following best practices. The ceremony of writing SQL that unpacks JSON according to contracts. Creating both historized and latest views. Implementing three-layer architecture.

There are trade-offs. Learning declarative modeling requires shifting from imperative SQL thinking. You gain consistency but sacrifice some flexibility in custom transformations. The generated SQL may not match exactly what you'd hand-write, though it follows proven patterns you'd want anyway.

We've seen one person outperform traditional 15-person teams. Not through superhuman effort, but through working at the right level of abstraction. Declaring business semantics. Letting proven patterns handle implementation. Getting quality and speed together instead of choosing between them.

The Invitation

This isn't finished. It's a beginning.

We're launching closed alpha on November 24, 2025. We need data engineers and analytics engineers who've felt this pain to help shape what Daana becomes. Your feedback will determine whether we're solving the real problem or building something nobody needs.

If you recognize this problem - if you've watched teams struggle with brittle pipelines and lost architectural knowledge - we need your perspective. Not to sell you something, but to learn from your experience. What patterns matter most in your context? What barriers prevent adoption? What did we get wrong?

This is ELv2 open source because knowledge should belong to the community. The data architecture community built these patterns over decades - they shouldn't be locked behind proprietary tools. But open source doesn't mean we know all the answers. It means we're building this together.

This is alpha because we're still learning. The core patterns work - we've used them across organizations with proven results. But translating those patterns into a tool that works for diverse teams and contexts requires feedback from people building real systems under real constraints.

This is an invitation because we can't reclaim lost knowledge alone. It takes a community. Data engineers who understand the frustration of unmaintainable pipelines. Analytics engineers who've felt the pain of drifting documentation. Architects who've tried teaching these patterns and hit barriers. People who believe data platforms should be better than what we've settled for.

Comment 'ALPHA' on the LinkedIn post or reach out directly. Help us bring back the craft of data modeling, accessible to everyone who builds data platforms.

The data architecture community built incredible knowledge over decades. Let's make sure the next generation can access it.