The Rise of the Model-Driven Data Engineer
By Siavoush Mohammadi
From Writing SQL to Designing Models: How Data Engineering Is Changing
I used to write SQL. A lot of SQL. Transformations for every source, business logic scattered across hundreds of models, hand-written joins that broke whenever upstream schemas changed. Now I write YAML. I declare what entities should exist, and the system generates transformations. Same results, one-tenth the code, zero maintenance headaches.
Data engineering is transforming - from writing code to declaring intent. Not because new tools make writing code optional, but because we've learned the hard way that hand-written transformations don't scale. Every data engineer has watched pipelines break at 3 AM because some upstream column changed. We've all debugged SQL only to realize the real problem was scattered business logic that drifted across twelve different models.
A Different Way to Build Data Systems
In 2020, building a Kafka ingestion pipeline meant writing Python connection code, SQL transformations, Airflow orchestration, more SQL for entities, and metric definitions for BI tools. I'd spend three days writing code, two days debugging, and a week later realize my documentation already didn't match what I built. Every. Single. Time.
In 2025, I declare what I want in YAML: incoming data structure, business entities, relationships, and metrics. Systems generate pipelines, transformations, and data structures. Documentation and implementation become the same artifact. What took weeks now takes days.
We've seen similar patterns before. Infrastructure as Code transformed operations from manual configuration to declarative Terraform. Kubernetes shifted deployment from imperative scripts to declarative manifests. OpenAPI moved API development from writing and documenting endpoints separately to declaring them once.
Data engineering is undergoing the same evolution, but with broader scope. Where declarative thinking typically applies to specific layers, modern data platforms can be declarative everywhere - from ingestion contracts through business entity definitions to metrics for analysis.
The results are dramatic: single engineers outperforming teams of fifteen, pipeline costs dropping to 20% of hand-written alternatives, systems comprehensible to both humans and AI. Work shifts from implementing pipelines to designing semantic models, from writing transformations to architecting layered systems.
Beyond productivity, fully declarative platforms create semantic layers that large language models can understand. Documentation never drifts because it is the implementation. Quality improves because generated code follows proven patterns rather than varying with each engineer's interpretation.
We're watching data engineers become model-driven practitioners - people who think in entities, relationships, and contracts rather than code, functions, and scripts. To understand what drives this change and why it matters, we need to unpack what "declarative" actually means in data engineering.
- A Different Way to Build Data Systems
- What "Declarative" Actually Means
- Model-First Architecture: The Three Layers
- The Work Changes
- Where Declarative Appears Today
- Benefits, Tradeoffs, and When to Stay Imperative
- What Changes for Platforms and Teams
- Data Engineering Is Growing Up
What "Declarative" Actually Means
The word "declarative" gets thrown around, but here's what it actually means: describing what you want instead of specifying how to build it. Simple concept, massive implications.
Consider ingestion. Imperative means writing Python that connects to Kafka, reads messages, parses JSON, handles errors, transforms fields, and loads results. You specify every step:
# Imperative approach: Hand-written ingestion code
from kafka import KafkaConsumer
import json

consumer = KafkaConsumer(
    'user_events',
    bootstrap_servers=['localhost:9092'],
    value_deserializer=lambda m: json.loads(m.decode('utf-8'))
)

for message in consumer:
    try:
        data = message.value
        user_id = data['payload']['userId']
        event_type = data['eventType']
        timestamp = parse_timestamp(data['timestamp'])
        # ... 50+ more lines of parsing, validation, error handling
    except Exception as e:
        handle_error(e, message)
Declarative means writing a data contract in YAML defining the structure you expect and schema you want. Systems generate all connection, parsing, transformation, and error handling code:
# Declarative approach: Data contract
version: "1.0"
valid_from: "2025-01-01"
endpoints:
  source:
    provider: kafka
    entity: user_events
schema:
  primary_keys:
    - user_id
    - event_id
  columns:
    - source_path: payload.userId
      target_name: user_id
      type: INTEGER
      mode: REQUIRED
      description: "User account identifier"
    - source_path: eventType
      target_name: event_type
      type: STRING
      mode: REQUIRED
      description: "Event type (USER_CREATED, USER_UPDATED, etc.)"
    - source_path: timestamp
      target_name: event_ts
      type: TIMESTAMP_MILLIS
      mode: REQUIRED
      description: "Event timestamp in milliseconds"
The pattern repeats at every layer. For business logic, imperative is SQL that joins source tables, applies rules, handles slowly changing dimensions, and produces entity tables. Declarative is defining entities with attributes and relationships in metadata. Systems generate transformations.
For metrics, imperative means writing SQL calculations and maintaining definitions wherever your BI tool expects them. Declarative means defining metrics once with calculation logic, grain, and dimensions. Systems expose definitions to any consumption layer.
Not everything that looks declarative actually is. The difference is worth understanding, because true declarative architecture's benefits only materialize when you're declaring intent, not templating code.
Take dbt, which brought software engineering practices to analytics. When you write a dbt model, you write SQL - a SELECT statement referencing other models with {{ ref() }} syntax. Better than scattered scripts, yes. Dependency graph explicit, version controlled, maintainable. But you're still writing transformations step by step: how to join tables, calculate fields, filter records. Improved imperative code, not declarative architecture.
Contrast with declaring a Subscription entity with attributes like start_date, status, acquisition_type, and relationships to Account and Base plan. You're not writing joins, not specifying transformations, not implementing slowly changing dimension logic. You declare what should exist; systems determine how to build it.
SQL with nice references versus entity declarations that generate SQL - these are architecturally different. One improves how you write imperative code. The other makes code an artifact generated from semantic definitions.
Why does separation matter? Separating "what you want" from "how to get it" enables consistency, portability, and comprehensibility that better-organized imperative code cannot achieve. When you declare a metric as "monthly recurring revenue, calculated as sum of active subscription values, dimensioned by plan type and region," that declaration generates SQL for your warehouse, API responses for applications, and documentation for business users. One declaration serves all three needs. Hand-written SQL for each consumption pattern means maintaining three implementations that will drift.
Declarative approaches make semantic definitions the source of truth and let code be generated from them. That principle can apply at every layer of a data platform, which brings us to what fully model-driven architecture looks like.
Model-First Architecture: The Three Layers
A fully declarative data platform separates concerns into three semantic layers: Data As System sees it (DAS), Data As Business sees it (DAB), and Data As Requirements needs it (DAR). Each serves a different purpose and can be made declarative.

DAS: Where Raw Data Lands
DAS represents source systems without interpretation through two stages: landing zone and staging. Landing receives raw data exactly as produced - often JSON strings with no schema enforcement. Staging unpacks to tabular format while remaining faithful to source structure.
In declarative DAS, data contracts define unpacking. A 50-line YAML contract specifies source structure, target schema, and transformation rules. Systems generate 100+ lines of SQL to extract nested JSON fields, cast types, handle arrays, and create historized and latest views. When sources change, you update the contract and regenerate SQL. Contracts are truth; SQL is artifact.
Pipelines become self-healing. Ingestion is forgiving (accept all data as JSON); unpacking is strict (contract-driven). Data is never lost when schemas change. If a source sends incorrect data and later sends corrections, the "latest" views automatically reflect the corrected state.
DAB: Business Semantic Model
DAB is where business entities live. Here's what this looks like in practice. Instead of writing SQL that joins customer, subscription, and payment tables - handling active status logic, managing slowly changing dimensions, producing a final table with business knowledge embedded in transformation code - you declare what a Subscription entity should be using Daana's Model Description Language (DMDL):
# Declarative entity definition in DMDL
entities:
  - id: "SUBSCRIPTION"
    name: "SUBSCRIPTION"
    definition: "A customer subscription"
    description: "Represents an active or historical subscription to a service plan"
    attributes:
      - id: "SUBSCRIPTION_ID"
        name: "SUBSCRIPTION_ID"
        definition: "Unique subscription identifier"
        type: "STRING"
        effective_timestamp: false
      - id: "ACQUISITION_TYPE"
        name: "ACQUISITION_TYPE"
        definition: "How the subscription was acquired"
        description: "Origin channel: ORGANIC, CAMPAIGN, REFERRAL, TRIAL_CONVERSION"
        type: "STRING"
        effective_timestamp: true
      - id: "STATUS"
        name: "STATUS"
        definition: "Current subscription status"
        description: "Status derived from dates: ACTIVE, CANCELLED, EXPIRED"
        type: "STRING"
        effective_timestamp: true
      - id: "SUBSCRIPTION_START_DATE"
        name: "SUBSCRIPTION_START_DATE"
        definition: "When subscription activated"
        type: "START_TIMESTAMP"
      - id: "SUBSCRIPTION_END_DATE"
        name: "SUBSCRIPTION_END_DATE"
        definition: "When subscription expires or was cancelled"
        type: "END_TIMESTAMP"
      - id: "MONTHLY_VALUE"
        name: "MONTHLY_VALUE"
        definition: "Monthly subscription value with currency"
        effective_timestamp: true
        group:
          - id: "MONTHLY_VALUE"
            name: "MONTHLY_VALUE"
            definition: "The monetary amount"
            type: "NUMBER"
          - id: "MONTHLY_VALUE_CURRENCY"
            name: "MONTHLY_VALUE_CURRENCY"
            definition: "Currency code (USD, EUR, SEK)"
            type: "UNIT"
relationships:
  - id: "BELONGS_TO_ACCOUNT"
    name: "BELONGS_TO_ACCOUNT"
    definition: "Subscription belongs to an account"
    source_entity_id: "SUBSCRIPTION"
    target_entity_id: "ACCOUNT"
  - id: "HAS_BASE_PLAN"
    name: "HAS_BASE_PLAN"
    definition: "Subscription is based on a plan"
    source_entity_id: "SUBSCRIPTION"
    target_entity_id: "BASE_PLAN"
From these declarations and a corresponding one for data mappings, systems generate transformation logic. They join source tables based on defined relationships, implement slowly changing dimension logic, and create both SUBSCRIPTION_HIST (complete history) and SUBSCRIPTION_LATEST (current state) views. Generated transformations follow proven patterns consistently.
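For illustration, the corresponding data mapping declaration might look roughly like the sketch below. The field names here (mappings, source_table, attribute_mappings) are assumptions made for this example, not DMDL's documented mapping syntax.

# Hypothetical data mapping declaration (illustrative fields, not DMDL's actual mapping syntax)
mappings:
  - entity_id: "SUBSCRIPTION"
    source_table: "STG_SUBSCRIPTIONS"        # staging table produced by the DAS contract
    attribute_mappings:
      - attribute_id: "SUBSCRIPTION_ID"
        source_column: "subscription_id"
      - attribute_id: "STATUS"
        source_column: "status"
      - attribute_id: "MONTHLY_VALUE"
        source_column: "monthly_amount"
      - attribute_id: "MONTHLY_VALUE_CURRENCY"
        source_column: "currency_code"
    relationship_mappings:
      - relationship_id: "BELONGS_TO_ACCOUNT"
        source_column: "account_id"          # resolved against the ACCOUNT entity's key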
A philosophical shift happens here: you design the system you want rather than implement how to build it. Semantic definitions become source of truth. Documentation and implementation can't drift because they're the same artifact.
DAR: Consumption Patterns
DAR optimizes for actual usage - metrics, dimensions, aggregations, and serving patterns. Also declarative.
Instead of writing SQL to calculate monthly recurring revenue in your BI tool, implementing similar logic in your API, and then documenting it separately, you declare the metric once: "MRR equals sum of monthly_value for active subscriptions, dimensioned by plan_type and region, grain of monthly." That single declaration generates SQL for warehouse queries, drives API responses, and serves as documentation.
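Expressed as a declaration, that metric might look roughly like this; the field names are illustrative assumptions rather than any specific tool's metric syntax.

# Hypothetical metric declaration (field names are illustrative, not a specific tool's syntax)
metrics:
  - id: "MONTHLY_RECURRING_REVENUE"
    name: "MRR"
    definition: "Sum of monthly subscription value for active subscriptions"
    entity: "SUBSCRIPTION"
    measure:
      aggregation: SUM
      attribute: "MONTHLY_VALUE"
    filters:
      - attribute: "STATUS"
        operator: EQUALS
        value: "ACTIVE"
    grain: MONTH
    dimensions:
      - "PLAN_TYPE"
      - "REGION"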
Why Separation Matters
Three-layer architecture provides stability. When a source system renames customerID to account_id, only DAS changes - you update the contract mapping. DAB's Account entity remains unchanged because it's defined in business concepts, not source fields. DAR consumers are unaffected.
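Concretely, absorbing the rename could be a one-line change to the contract's column mapping, as in this sketch based on the contract format shown earlier (the version bump and field values are assumed for illustration):

# Sketch: contract update absorbing a source rename in DAS (values assumed for illustration)
version: "1.1"
valid_from: "2025-06-01"
schema:
  columns:
    - source_path: account_id      # was: customerID in version 1.0
      target_name: customer_id     # staging column name unchanged, so DAB and DAR are untouched
      type: INTEGER
      mode: REQUIRED
      description: "Customer account identifier"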
Layer separation makes systems comprehensible to humans and AI. Anyone - or anything - examining your platform understands: these are source systems (DAS), these are business concepts they feed (DAB), these are consumption patterns (DAR). Each layer is documented through declarative definitions. Large language models fed these definitions gain structured understanding of your data semantics, becoming dramatically more effective at data work.
Model-first architecture captures what should be consistent (entity definitions, transformation patterns, quality rules) while preserving flexibility where it matters (business logic, metrics, consumption patterns). Working in these systems changes the job.
The Work Changes
Working in a declarative architecture changes the job. I've watched data engineers spend less time writing SQL transformations, Python pipelines, and Airflow DAGs - more time on what looks like architecture and product design. It's not easier work. It's different work, requiring different skills.
New Skills Required
In model-driven environments, data engineers design entity models that accurately represent business concepts. That means understanding not just the technical data structure, but the business semantics. What is a "subscription"? When does it begin - at signup, payment, or access grant? What makes it "active"? These are business domain questions with technical implications.
Defining semantic relationships becomes central work. A subscription belongs to an account, contains events, references a plan, tracks payment through transactions. Relationships need names, cardinalities, and clear definitions. Getting them right means generated transformations correctly join data. Getting them wrong means debugging entity mismatches - harder than debugging your own SQL because you need to understand how the generator works.
Creating data contracts becomes interface design. You define contracts between your platform and source systems - expected structure, required types, valid values. API design for data flows, requiring thought about versioning, backward compatibility, and error handling at the contract level.
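Treating contracts as interfaces suggests contract-level fields for versioning and error behavior, roughly along these lines; the compatibility and error_policy fields are hypothetical illustrations of the idea, not part of any existing specification.

# Sketch: contract-level interface concerns (compatibility and error_policy are hypothetical)
version: "2.0"
valid_from: "2025-09-01"
compatibility: BACKWARD                     # new versions may add optional columns, never drop required ones
error_policy:
  on_missing_required_field: QUARANTINE     # route bad records to a dead-letter table instead of failing
  on_type_mismatch: CAST_OR_NULL
endpoints:
  source:
    provider: kafka
    entity: user_events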
Work requires architectural thinking about layered systems with clear boundaries. Changes in DAS shouldn't cascade to DAB. DAB entities should be stable despite evolving sources. DAR optimizations shouldn't leak business logic. Separation requires discipline and design.
Not Easier, Just Different
Model-driven approaches raise the bar. You can't just hack together SQL that happens to work. You must consider the semantic model you're building, whether entity definitions are correct, whether relationships are properly understood. Abstraction layers help you scale but require understanding how they work.
Debugging differs. Instead of stepping through SQL, you examine whether entity definitions are correct, whether contracts match source data, whether generation logic handles edge cases. You need both business semantics AND technical implementation patterns.
Model-driven data engineers develop dual fluency: thinking in business entities and relationships while understanding how declarations translate to implementations. They explain to product managers why Subscription needs particular attributes, and to platform teams why generated transformations need specific join patterns.
What Stays the Same
Deep technical knowledge remains crucial - perhaps more than before. Understanding data types, when normalization helps versus hurts, performance implications of join patterns, incremental processing logic. Application differs: instead of directly writing transformations, you design models that generate good transformations.
You still write code, but less and differently. Custom business logic for non-standard patterns. Edge case handling. Integration scripts. But repetitive, pattern-following transformation code - the kind accumulating bugs when hand-written by multiple people - gets generated from models.
Work becomes more about architecture and semantics, less about implementation mechanics. You design the data platform as a cohesive system rather than assembling pipelines. Stepping back from code to consider the semantic layer - what concepts matter, how they relate, what definitions enable clear organizational communication.
Where Declarative Appears Today
Declarative approaches are emerging across data engineering, though with varying commitment to the paradigm. Understanding where and how declarative thinking appears clarifies both potential and current limitations.
Ingestion Layer Declarations
Data contracts have gained traction for defining ingestion interfaces. Organizations use YAML, JSON schemas, or similar formats to specify expected data from source systems. Some treat these as documentation driving validation. Others use contracts as source of truth generating transformation code.
The distinction matters. If your contract is documentation manually kept in sync with hand-written SQL, you've improved documentation but haven't achieved declarative architecture. If your contract directly generates unpacking logic, you've unified documentation and implementation - changing the contract changes the code, forcing accuracy.
What's new is using these not just for documentation but as source of truth for downstream transformations, making the entire ingestion-to-staging flow declarative.
Business Layer Approaches
The landscape varies most widely here. Some approaches provide better templating for SQL - macros, packages, reusable patterns. These improve consistency but remain imperative: you're still writing transformations with better tools.
Other approaches move toward true entity modeling. dbt's semantic layer allows defining metrics and entities with relationships, but the transformation layer below still requires hand-written SQL models. You declare entities but implement their construction.
Fully declarative business layers - where you define entities and relationships, and transformations are generated - remain less common in mainstream practice. They exist in specialized tools and custom platforms, particularly where organizations have committed to model-driven approaches. They demonstrate the potential: entity definitions in metadata, transformation logic generated, historized and latest views automatically maintained.
Metrics Layer Declarations
Semantic layers for metrics have seen considerable development. Tools like dbt's semantic layer, Cube, Malloy, and headless BI approaches allow declaring metrics once with calculation logic, grain, and dimensions. Declarations drive multiple consumption patterns - BI tools, APIs, embedded analytics.
This is truly declarative architecture for consumption: the metric definition is the source of truth, and implementations for various tools are generated or derived from it. It solves the problem of metric calculations scattered across BI tools, application code, and documentation, each drifting from the others.
Parallels Throughout Software Engineering
Patterns appear throughout modern software engineering. Terraform declares desired infrastructure state; tools determine API calls to make. Kubernetes manifests declare what should run; control planes achieve that state. OpenAPI specifications declare API structures; frameworks generate server stubs, client libraries, and documentation.
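For comparison, a minimal Kubernetes Deployment manifest shows the same shape of thinking: you declare the desired state - here, three replicas of a hypothetical service image - and the control plane works out how to reach and maintain it.

# Minimal Kubernetes Deployment: declared desired state, reconciled by the control plane
apiVersion: apps/v1
kind: Deployment
metadata:
  name: events-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: events-api
  template:
    metadata:
      labels:
        app: events-api
    spec:
      containers:
        - name: events-api
          image: registry.example.com/events-api:1.4.2   # hypothetical image reference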
Each illustrates the same principle: declare intent, let systems generate implementation. Data engineering follows this established pattern, applying it to pipelines, transformations, and business logic.
The Spectrum
"Declarative" isn't binary. A spectrum runs from "imperative with good tooling" to "fully declarative with code generation." Many organizations operate in the middle: declarative contracts for ingestion, hand-written SQL with good practices for transformations, declarative metrics for consumption.
Pragmatic hybrid approaches make sense for many contexts. Declarative architecture's benefits must outweigh the investment in tooling and new practices. For small teams with simple pipelines, better-organized imperative code may suffice. For teams managing dozens of sources, hundreds of entities, and constantly changing business logic, the consistency and scalability of declarative approaches become compelling.
The critical distinction remains: are you writing SQL with better references and macros, or declaring entities and relationships that generate SQL? Improving your imperative approach versus adopting a different architectural paradigm with different benefits and trade-offs.
Benefits, Tradeoffs, and When to Stay Imperative
Declarative approaches deliver meaningful benefits, but they're not universally superior. Understanding when each makes sense requires examining what you gain and what you give up.
What You Gain and Give Up
| Aspect | Declarative Approach | Imperative Approach |
|---|---|---|
| Consistency | Generated code follows same patterns for every entity. Same historization, incremental processing, error handling across all transformations | Code quality varies by engineer and context. Each implementation subtly different, accumulating technical debt |
| Quality | Proven patterns encoded once, applied everywhere. Hundredth entity gets same error handling as the first | Resilient pipelines require specialized knowledge. Later implementations often cut corners under time pressure |
| Documentation | Cannot drift - model definitions ARE the implementation. Update model, transformations regenerate automatically | Requires manual synchronization. Every engineer has wasted hours reconciling docs with reality |
| Testing | Validate models: "Does this entity definition represent the business concept correctly?" Meaningful semantic questions | Test code syntax and logic. Less insight into whether you're building the right thing |
| AI Effectiveness | LLMs read structured entity definitions with clear relationships and semantic meaning. Dramatically more effective | AI struggles with hand-written SQL scattered across repositories. Hard to extract intent from implementation |
| Productivity | Single engineers matching 15-person teams. Leverage from semantic design vs. implementation mechanics | Linear scaling with complexity. More sources/entities = proportionally more people needed |
| Upfront Investment | Weeks to months: framework setup, code generation tooling, entity modeling patterns, team training | Days: Start writing SQL immediately with familiar tools and practices |
| Learning Curve | Must learn to think in entity models and semantic relationships. Different problem-solving paradigm | Familiar SQL/Python patterns. Incremental learning within known approaches |
| Edge Cases | Common patterns elegant; edge cases awkward. Need escape hatches, but too many undermine consistency | Easy to handle any custom logic. Full flexibility to write exactly what's needed |
| Flexibility | Work within framework constraints. When constraints align with best practices, they help. When they don't, frustration | Write arbitrary SQL/code. Full control over every detail of implementation |
| Tooling Maturity | Emerging tools, especially for full entity modeling. May need custom solutions. Early adopters pay pioneering tax | Mature ecosystem: dbt, Airflow, Fivetran, etc. Well-documented, battle-tested tools |
| Debugging | Understand both model definition AND generation logic. Indirection adds complexity, requires good error messages | Debug your own code directly. Straightforward relationship between what you wrote and what runs |
When Imperative Still Wins
Declarative architecture makes sense when patterns exist and repetition is high. But clear cases favor imperative approaches:
One-off analyses: Exploratory queries answering specific business questions don't need entity definitions and generated transformations. Write SQL, get the answer, move on.
Genuinely novel algorithms: Implementing new recommendation algorithms or complex statistical models requires flexibility to write custom logic. Forcing this into a declarative framework adds overhead without benefit.
Performance-critical optimization: Sometimes you need hand-tuned SQL exploiting specific warehouse features or data characteristics. Generated code follows general patterns; custom code optimizes for specific cases.
Early-stage exploration: Before patterns emerge, forcing declarative structure is premature. Build a few pipelines imperatively, discover patterns, then consider declarative approaches for scaling.
Very small teams with simple needs: With three sources, ten entities, and one data engineer, declarative architecture overhead may exceed benefits. Better-organized imperative code could serve you well.
Declarative and imperative approaches can coexist. Use declarative architecture for repetitive, pattern-following work comprising most of a data platform. Preserve imperative flexibility for the edges, the novel, the optimized. Mature platforms provide both with clear guidance on when to use each.
Context matters immensely. Benefits that make declarative architecture compelling for growing companies with dozens of data sources may not outweigh costs for startups with three tables. Understanding your context - team size, complexity, rate of change, engineering maturity - guides decisions better than universal recommendations.
What Changes for Platforms and Teams
Moving to declarative architecture has cascading implications for how data platforms are built and teams operate. Changes affect tools, organizational structure, skill development, and the relationship between central platform teams and domain teams.
Platform Requirements Shift
Platforms enabling declarative workflows must support model definition, code generation, and clear layer separation. That means building or integrating:
Model registries storing entity definitions, relationships, and semantic descriptions as queryable metadata. Not just documentation sites, but active systems that drive generation and enable programmatic access for AI assistants.
Generation frameworks that reliably translate models into correct transformations. Handling common patterns (slowly changing dimensions, incremental processing, relationship joins) consistently while providing escape hatches for custom logic.
Clear layer boundaries implemented through architecture, not documentation. DAS transformations shouldn't apply business logic. DAB entities shouldn't contain consumption optimizations. Platforms enforce separation of concerns.
Validation at model level that catches errors before code generation. Does this entity definition reference non-existent source fields? Are relationships correctly specified? Is the contract consistent with landing zone data? Catching issues at model level prevents cascading failures.
Self-Service Through Models
Declarative approaches enable different self-service than traditional platforms. Domain teams don't get access to write arbitrary SQL against production tables. They declare domain models that platforms implement according to standards.
This addresses a central tension in platform design. Domain teams have business context but often lack data engineering expertise. Central platform teams have expertise but lack domain context. Declarative architecture lets domain teams contribute what they know (business entities, relationships, semantics) while platforms apply what they know (correct transformation patterns, quality controls, performance optimization).
A product team can declare a Session entity with specific attributes and relationships to User and Event entities. Platforms generate transformations, apply standard historization, ensure incremental processing works correctly, and integrate the entity into the broader data model. The product team contributed domain knowledge; the platform contributed engineering expertise.
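In the DMDL-style format shown earlier, the product team's contribution might look roughly like this; the Session attributes and relationships are invented for illustration.

# Hypothetical Session entity declared by a product team (attributes and relationships invented for illustration)
entities:
  - id: "SESSION"
    name: "SESSION"
    definition: "A single user session in the product"
    attributes:
      - id: "SESSION_ID"
        name: "SESSION_ID"
        definition: "Unique session identifier"
        type: "STRING"
      - id: "SESSION_START"
        name: "SESSION_START"
        definition: "When the session began"
        type: "START_TIMESTAMP"
      - id: "SESSION_END"
        name: "SESSION_END"
        definition: "When the session ended"
        type: "END_TIMESTAMP"
relationships:
  - id: "BELONGS_TO_USER"
    name: "BELONGS_TO_USER"
    definition: "Session belongs to a user"
    source_entity_id: "SESSION"
    target_entity_id: "USER"
  - id: "CONTAINS_EVENT"
    name: "CONTAINS_EVENT"
    definition: "Session contains events"
    source_entity_id: "SESSION"
    target_entity_id: "EVENT"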
Federated Ownership With Centralized Patterns
This creates a natural model for federated data ownership. Domain teams own their models - defining entities relevant to their domain, maintaining contracts with source systems, and evolving definitions as domains change. Central platform teams own patterns, standards, and generation frameworks ensuring consistency.
Quality control shifts from code review to model review. Instead of checking SQL for correctness, data engineers review entity definitions for semantic accuracy. Does this entity definition make sense? Are relationships properly specified? Is the contract consistent with what source systems provide? Technical implementation is handled by generation.
Practice Changes
Skills that matter shift. Domain modeling becomes more valuable than SQL optimization. Understanding business semantics and representing them in entity definitions matters more than knowing warehouse-specific SQL dialects. Best data practitioners bridge business concepts and technical implementation through good semantic modeling.
Architectural thinking becomes central. Understanding layer separation, relationship patterns, and when to use declarative versus imperative approaches matters more than remembering SQL syntax. Work becomes more about designing coherent systems than implementing individual pipelines.
Communication skills gain importance. Data engineers work more closely with domain teams to understand business concepts and translate them into entity models. Explaining why particular entity definitions make sense, or why certain relationships need explicit modeling, requires clear communication about semantics.
AI Integration Becomes Natural
When platforms are fully declarative, integrating AI assistance becomes straightforward. Large language models can read entity definitions, understand relationships, answer questions about business logic, and suggest model improvements. Semantic clarity that makes systems understandable to humans makes them understandable to AI.
LLM effectiveness increases dramatically when working with declarative definitions versus imperative code. An AI assistant can explain what the Subscription entity is and how it relates to Account when relationships are explicitly declared. It struggles extracting the same understanding from SQL joins scattered across transformation files.
Data Mesh Through Declarations
Declarative approaches provide a practical implementation path for data mesh principles. Domain teams own their domain models (source-aligned data products). Models are declared with clear contracts, making them discoverable and understandable to other domains (consumer-aligned). Central platforms provide infrastructure and standards (federated governance). Models serve as product interfaces.
This addresses one of data mesh's persistent challenges: enabling domain ownership without fragmenting into chaos. Declarative definitions provide both the autonomy (teams declare models) and the consistency (generation follows standards) needed to make federated ownership work.
The Productivity Evidence
Organizations implementing these approaches report results that seem implausible until you understand the leverage: single engineers matching fifteen-person traditional teams, and pipeline costs dropping to 20% of hand-written alternatives in many cases, because the underlying pattern can be optimized once instead of each pipeline separately. These metrics reflect compounding benefits: consistency reducing debugging time, generation eliminating repetitive coding, clear semantics accelerating understanding.
A data engineer spending 80% of time writing and debugging SQL versus 80% designing entity models and reviewing domain semantics operates on different productivity curves. The first scales linearly with complexity. The second benefits from patterns, generation, and semantic definition reusability.
Technical Debt Shifts
Technical debt's nature changes. Instead of unmaintainable SQL scattered across repositories, you risk rigid abstractions that don't accommodate legitimate edge cases. Instead of undocumented transformations, you risk overly generic models lacking domain specificity. Problems shift to different areas requiring different mitigation strategies.
Mature platforms need hybrid approaches during transitions. Support both declarative models for common patterns and imperative code for edge cases. Provide clear guidance on when to use each. Design escape hatches carefully so they don't undermine consistency benefits.
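One way such an escape hatch could look, sketched against the entity format used earlier; the custom_sql field and the churn-risk logic are hypothetical.

# Hypothetical escape hatch: most attributes are generated, one falls back to hand-written SQL
entities:
  - id: "SUBSCRIPTION"
    attributes:
      - id: "CHURN_RISK_SCORE"
        definition: "Model-specific churn risk score"
        type: "NUMBER"
        # escape hatch: generation is bypassed for this one attribute
        custom_sql: |
          CASE
            WHEN days_since_last_login > 60 THEN 0.9
            WHEN days_since_last_login > 30 THEN 0.6
            ELSE 0.2
          END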
Data Engineering Is Growing Up
Data engineering is growing up. We're moving from "write SQL to solve each problem" to "design semantic models that generate solutions." Software engineering went through similar transitions: assembly to high-level languages, manual configuration to infrastructure as code, monolithic apps to declarative orchestration. Every transition traded direct control for higher-level abstractions that let systems scale.
Data engineering is making that transition now. Imperative approaches won't become obsolete - assembly language still exists for specific use cases - but the default approach for most work shifts to higher abstraction.
We're in a transition period where hybrid approaches dominate, and that's appropriate. Organizations can't flip a switch from fully imperative to fully declarative. The path involves identifying where patterns exist and repetition is high, applying declarative approaches there first, and preserving imperative flexibility for genuine edge cases.
Practical Guidance
For teams considering this shift: start with layers where patterns are clearest. Ingestion contracts often provide high return - transformation from landing zone to staging follows consistent patterns across sources. Core business entities in the DAB layer offer similar benefits once you've established your domain model. Metrics definitions benefit from declarative approaches when serving multiple consumption patterns.
Maintain escape hatches for custom code. Not everything fits into generated patterns, and forcing it creates more problems than it solves. Effectiveness matters, not purity.
Invest in learning model-driven thinking, not just specific tools. Understanding how to design good entity models, specify clear semantic relationships, and when to use declarative versus imperative approaches matters more than mastering any particular framework. Tools evolve; principles endure.
Be thoughtful about what to abstract. Declarative approaches create leverage by encoding patterns, but premature abstraction before patterns emerge adds overhead without benefit. Build a few implementations, discover commonalities, then abstract.
What's Next for Data Engineers
Data engineers of the next decade will be fluent in both paradigms - comfortable designing declarative models and writing imperative code, knowing when each serves better. They'll think in semantic layers and entity relationships rather than just tables and joins. They'll spend more time on architecture and domain modeling, less on transformation mechanics.
This isn't about making data engineering "easier" in the sense of requiring less skill. It's about applying skills at a different abstraction level. Work becomes more about understanding business semantics, designing coherent architectures, and building systems that remain comprehensible as they scale. Different challenges than writing correct SQL, but not simpler ones.
Organizations that thrive will recognize declarative approaches not as dogma but as tools - powerful where patterns exist, unnecessary where they don't. They'll build platforms enabling model-driven development while preserving room for imperative solutions. They'll develop data engineers who understand both the semantic models they design and the technical implementations those models generate.
Model-driven data engineers aren't replacing code - they're building richer stacks of semantic definitions, architectural patterns, and generated implementations. That's how engineering disciplines mature.