Meet Lars Fredholm: Principal Data Engineer at Daana

[Illustration: isometric line-art wireframe of a data platform in Daana blue on near-black, with sources flowing into ingestion layers, data contracts and an information model in the center, and semantic outputs on the right.]

Lars's path through the data ecosystem is deeper than most. Product development. Data engineering. Head of Data at Advisa, where Siamak first hired him - which is how the rest of us got to know him years before Daana existed. After Advisa exited, he went into angel investing and startups - including Validio, where he built the kind of data quality monitoring most teams don't even know is possible. Then independent consulting. Then Daana - first as an investor, until he decided he'd rather build the thing than fund it.

The thread: he kept watching the same data platform problems get rebuilt at every company he touched, decided he was tired of it, and joined the team trying to encode the solution into tooling instead of solving it once per engagement.

What's actually rare about Lars is the combination. Most data engineers are deep on one axis. Lars reads business, product, and data engineering at the same depth. He writes Kullback-Leibler divergence in ANSI SQL for anomaly detection - because apparently "not null" and "is unique" weren't enough - and also designs sales motions in his evenings. He invested in Daana, helped raise client capital, and now writes Go for the CLI on the side. The guy who funds the company also builds the product after hours.

Right now he's embedded full-time at one of our clients building their data stack. Not advising from a distance. Building.

You invested in Daana before you joined. You've invested in several tech companies - what was different about this one that made you want to be on the inside?

Three things stood out with Daana.

Product idea: Daana is of course special to me since the product solves exactly the problem I've experienced over and over again for the last 12 years or so. Metadata-driven data modeling, starting from the actual real-world (aka business) logic, has so many advantages.

Team: The product idea is great, and the team is super competent - people with long experience from different angles of the data space, and all very invested in modern agentic workflows, which makes the product advance at incredible speed.

Timing: The timing now is perfect. AI will automate everything, but as with any automation, to get unambiguous output you need unambiguous input. A dozen partially overlapping, hand-crafted dbt models are not unambiguous input, and any AI will struggle to produce reliable output from such data. Daana is the opposite: with its automated, metadata-driven, unambiguous data modeling, an AI knows exactly what the data is and how to work with it. Both the upstream layers (ingest) and the downstream layers (metrics layer, BI) essentially become a function of what Daana has defined.

What patterns keep repeating across companies that shouldn't?

We could talk for hours about this. Some troublesome patterns that come to mind:

Data modeling is forgotten. The main purpose of data modeling (Kimball, Data Vault etc.) is to transform raw data into a truthful and sustainable shape that satisfies consumption. Sustainable is the challenging word here, because it takes about a year before you see the real pains of poor or missing data modeling. 30 years ago this was much less of a problem because you had to do proper data modeling to solve more immediate challenges - performance, storage, BI compatibility. But most of those immediate challenges are solved today by technology, mainly by close to unlimited (and decoupled) storage and compute. So you're no longer punished short-term for poor modeling.

Add to that the patterns and tools (looking at you, dbt and Medallion) that loosen the discipline that once governed data modeling. They basically say "it's not that important how you write and layer your models, just make more of them, and everyone can do it." So data engineers are essentially encouraged to not care too much about modeling, and they're not punished short-term for poor modeling either. But one year down the line, you get a complete mess. No one knows what the truth is or where it lives, half of your models are redundant or orphaned, and your code base has as many conventions as there are data engineers in the team. I haven't seen any company in the last 10 years where this is not a big and expensive problem (but most don't even realize the problem because they haven't experienced the payoff of "good" data modeling).

Hot take: what's the most overrated practice in data engineering right now?

Data Engineering has adopted a huge number of practices from Software Engineering (most things DevOps, for example). And this is mainly great. But I often see a naive assumption that you should go all in on this - that Data Engineering equals Software Engineering. This is overrated.

A common example is trying to build a data stack with the same modularity (microservices etc.) as a software application. Doing so makes it very hard to maintain a single source of truth and to process massive data sets, not to mention that it requires teaching DEs completely new ways of working. I think Data Engineering should continue approaching Software Engineering, but it's a delicate maneuver that takes time and will be very costly if rushed.

What's the worst data architecture you've ever inherited?

One client had decided to modernize their stack by moving from SQL Server to the cloud. So they moved all raw data to Parquet on S3-equivalent storage. The problem was that the DWH still had to be on SQL Server on-prem for security reasons. And that version of SQL Server didn't support pruning when reading Parquet from S3, so all data had to be read in full on ingest. That essentially eliminated all the benefits of Parquet and resulted in a more complex architecture with worse performance than before.

What does your typical day at Daana look like right now?

I'm currently helping one of Daana's clients full-time, building their data stack and implementing agentic workflows. In the evenings I usually build or explore some new feature in Daana CLI, work on sales and business development, or polish Daana's docs.

You wrote a Kullback-Leibler divergence implementation in pure ANSI SQL. What's the story there and why should data engineers care about statistical anomaly detection?

I worked at a company in the Data Observability space, where we developed lots of sophisticated data quality monitoring. Distribution analysis can uncover a wide range of interesting data quality insights, and having an ANSI SQL starting point is very beneficial for compatibility. Engineers are becoming better and better at implementing tests, but they are often limited to "not null" and "is unique." Depending on the use case, more sophisticated tests will pay off.
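For readers who have not met it before, here is a rough idea of what a distribution test looks like. This is a minimal Python sketch, not Lars's ANSI SQL implementation: the function name `kl_divergence`, the `smoothing` parameter, and the sample data are all hypothetical and chosen only for illustration. It compares the category frequencies of a column in a current batch against a reference batch and returns the Kullback-Leibler divergence, which is zero when the two distributions match and grows as they drift apart.

```python
import math
from collections import Counter


def kl_divergence(reference, current, smoothing=1e-9):
    """Return KL(current || reference) in nats for two lists of categorical values.

    `smoothing` is a small pseudo-count (an assumption of this sketch) that keeps
    categories present in only one batch from producing log(0) or division by zero.
    """
    ref_counts = Counter(reference)
    cur_counts = Counter(current)
    categories = set(ref_counts) | set(cur_counts)

    # Totals include the pseudo-counts so both distributions still sum to 1.
    ref_total = sum(ref_counts.values()) + smoothing * len(categories)
    cur_total = sum(cur_counts.values()) + smoothing * len(categories)

    divergence = 0.0
    for category in categories:
        p = (cur_counts[category] + smoothing) / cur_total  # current share
        q = (ref_counts[category] + smoothing) / ref_total  # reference share
        divergence += p * math.log(p / q)
    return divergence


# Hypothetical example: country codes in yesterday's load vs. today's load.
baseline = ["SE"] * 50 + ["NO"] * 50
same_mix = ["SE"] * 50 + ["NO"] * 50
shifted_mix = ["SE"] * 90 + ["NO"] * 10

score_same = kl_divergence(baseline, same_mix)
score_shifted = kl_divergence(baseline, shifted_mix)
```

A "not null" or "is unique" test would pass on both batches here, while the divergence score separates them. What threshold should raise an alert depends on the column and the use case, so it is deliberately left out of the sketch.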

Lars is currently embedded at a client site building their data platform with Daana - which is exactly what a typical engagement looks like. When he's not doing that, he's extending the CLI, working on business development, and making sure the docs are sharp. The guy who invested in the company now builds the product in the evenings because he can't stop.

Want to see what we're building? Check out daana.dev.