Stop Being the Naming Convention Police

By Siavoush Mohammadi

You go to find a metric in your semantic layer that someone else built. You know it exists. You need the count of paying subscribers, so you search for subscription_paid_count. Nothing. You try paid_subscriptions_count. Still nothing. After ten minutes of increasingly creative searches, you finally discover it's called nbr_of_paid_subs.

This is the same naming convention you spent two hours workshopping with the team last week. Everyone agreed. Everyone nodded along. And then someone went and did exactly what you all agreed not to do.

The urge to fire off an angry Slack message is strong. But you've been here before. You've been the naming convention police. You've sent those messages, had those conversations, felt the friction it creates. It's exhausting, and honestly, it doesn't even work that well. People agree in the moment, then forget when they're actually writing code.

Here's the thing: the problem isn't that people are careless or don't care about conventions. The problem is that enforcement is completely disconnected from the conventions themselves. Your style guide lives in Confluence. Your code lives in your IDE. And the two never talk to each other.

What if there was a way to enforce naming conventions automatically, in your CI/CD pipeline, using the same documentation you already maintain? No linter configs to keep in sync. No manual policing. Just your Confluence page becoming the actual source of truth that gets enforced on every pull request.

Why Conventions Fail (And Why Linters Don't Fix This)

The gap between agreeing on conventions and following them is not a discipline problem. It's a systems problem.

Your naming conventions live in a Confluence page or a Notion doc somewhere. When a developer is writing code, that document is out of sight. They might remember the general idea, but the specifics? The rule about not using "total" as a prefix? The requirement for plural entity names? Those details slip away when you're focused on solving an actual problem.

Traditional linters can help, but they come with their own challenges. To use a linter, you need to translate your documentation into configuration files. This means maintaining two versions of your conventions: the human-readable one that people actually consult, and the machine-readable one that enforces rules. These two inevitably drift apart. Someone updates the Confluence page but forgets the linter config. Or the linter config gets tweaked to handle an edge case, but the documentation never reflects the change.

Worse, linters work through pattern matching. They're looking for string patterns that violate rules. But naming conventions often require context. Is total_orders wrong because it uses a forbidden word, or is it acceptable because it's a column in a source table you don't control? A regex can't make that distinction.
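
To see the limitation concretely, here is a toy pattern-matching check in Python; the rule, table names, and SQL snippets are made up for illustration, and a real linter is more sophisticated, but the core problem is the same:

import re

# Toy lint rule: flag any identifier that starts with a scope word.
FORBIDDEN = re.compile(r"\b(?:total|sum)_[a-z_]+\b")

metric_definition = "SUM(order_count) AS total_orders"      # a new metric we do control
source_column = "SELECT total_orders FROM raw.shop_orders"  # an upstream column we don't

for snippet in (metric_definition, source_column):
    # Both lines match identically; the regex has no way of knowing that the
    # second total_orders is a source column the team cannot rename.
    print(snippet, "->", FORBIDDEN.findall(snippet))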

The result is a frustrating cycle: you put effort into documentation, you put effort into linter configuration, and you still end up being the convention police because neither system actually solves the problem.

The Insight: What If AI Could Read Your Docs?

The idea struck me while working with an AI coding agent on a different project. I had manually pointed the agent at a naming convention we'd written and stored in Confluence, asking it to fetch the page through the Atlassian MCP server and apply the convention to some code we were generating together. It worked beautifully. The agent read the convention, understood the rules, and applied them correctly.

What if this happened automatically, in CI/CD, on every pull request?

This is where MCP, the Model Context Protocol, becomes interesting. MCP lets AI agents connect to external tools and data sources. There's an Atlassian MCP server that can fetch Confluence pages. A Notion server for Notion docs. GitHub, Slack, you name it. The agent can reach out and grab information from these systems just like a human would.

So imagine this: a developer opens a pull request. The CI/CD pipeline triggers and spawns a specialized AI agent. That agent uses MCP to fetch your latest naming convention document from Confluence. It reads the rules, not through pattern matching, but through actual comprehension of the text. Then it reviews the changed files against those rules and posts its findings.

The documentation you already maintain becomes the single source of truth that's actually enforced. Update the Confluence page, and the next pull request automatically follows the new rules. No linter config changes needed. No sync problems. No drift.

Your documentation becomes executable.

How This Actually Works

The architecture is simpler than you might expect. Here's the flow, with a minimal sketch of the CI step just after the list:

  1. A developer commits code and opens a pull request
  2. CI/CD triggers a specialized AI agent (via something like Claude Code CLI)
  3. The agent uses MCP to fetch the naming convention from your documentation platform
  4. It analyzes the changed files against the fetched rules
  5. The agent posts a review with any violations found, suggested fixes, and explanations
  6. Optionally, it can auto-fix simple violations and create a fix commit
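
To make step 2 concrete, here is a minimal sketch of what the CI job could run, assuming the Claude Code CLI is installed and authenticated on the runner and an MCP configuration like the one sketched in the setup section further down is already in place; the prompt wording and page ID are placeholders, not a prescribed format.

import subprocess

# Placeholder prompt: point the agent at the convention page and ask for a review.
prompt = (
    "Fetch Confluence page 123456 (our metric naming conventions) via the "
    "Atlassian MCP server, then review the files changed in this pull request "
    "against those conventions. Report each violation with an explanation, "
    "a suggested fix, and a severity."
)

# Run Claude Code non-interactively (-p prints the response and exits) and
# capture the review so a later CI step can post it as a PR comment.
result = subprocess.run(
    ["claude", "-p", prompt],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)

Posting result.stdout back to the pull request is then ordinary CI plumbing for whichever platform you use.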

What makes this fundamentally different from traditional linting? Let me break it down:

| Traditional Linting | AI + MCP |
| --- | --- |
| Rules hardcoded in config files | Reads your living documentation directly |
| Pattern matching with regex | Understands context and intent |
| Binary pass/fail messages | Helpful explanations that reference your standards |
| Config files require maintenance | Self-updating as documentation changes |
| Same rules for all situations | Context-aware, can handle edge cases |

The key difference is comprehension. A linter sees total_orders and either matches or doesn't match a pattern. The AI agent reads your documentation that says "scope words like 'total' and 'sum' should not appear in metric names because the semantic layer handles aggregation scope through filters." It understands why the rule exists, which means it can explain violations clearly and even recognize when an exception might be warranted.

This also means the agent can handle conventions that are difficult or impossible to express as regex patterns. Rules like "metric names should be self-explanatory without requiring documentation lookup" or "names should reflect business concepts, not technical implementation" are easy for humans to understand but nightmarish to encode as pattern matching. The AI agent handles them naturally.

What This Looks Like in Practice

Let me walk you through a concrete example. Say your naming convention specifies that metrics should follow the pattern <descriptor>_<entity>_<unit>, with entities in plural form and without words like "total" or "sum" (because scope is handled elsewhere in the semantic layer).

A developer writes this SQL:

SELECT
  subscription_id,
  SUM(order_count) AS total_orders,
  SUM(revenue) AS sum_revenue
FROM orders
GROUP BY subscription_id

The agent fetches your convention from Confluence, analyzes the code, and produces feedback like this:

VIOLATION: Line 3: `total_orders`
- Issue: Contains forbidden word "total"
- Standard: "Scope words never appear in metric names"
- Suggested fix: `orders_count`
- Severity: Medium

VIOLATION: Line 4: `sum_revenue`
- Issue: Contains forbidden word "sum"
- Standard: "Scope words never appear in metric names"
- Suggested fix: `revenue_amount`
- Severity: Medium

Notice what's happening here. The feedback isn't just "error on line 3." It explains what the issue is, references the specific rule from your documentation, and suggests a concrete fix. The developer learns something, and they're much more likely to remember the convention next time.

The agent can also assign severity levels. A metric name that breaks a semantic layer contract might be critical and block the PR. A style preference violation might be minor and just generate a warning. This granularity matters because not all conventions are equally important, and treating them all as blocking errors creates friction that makes people resent the system.
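
If you want the pipeline itself to respect that distinction, here is one rough way to gate the build, assuming you've asked the agent to tag each finding with a "- Severity:" line like the example above; the threshold is yours to tune.

import sys

BLOCKING = {"Critical"}  # severities that should fail the build; tune to taste

def gate(review_text: str) -> int:
    """Exit non-zero only when a blocking violation appears in the review."""
    severities = [
        line.split(":", 1)[1].strip()
        for line in review_text.splitlines()
        if line.strip().startswith("- Severity:")
    ]
    print(f"{len(severities)} violation(s) found: {severities}")
    return 1 if any(s in BLOCKING for s in severities) else 0

if __name__ == "__main__":
    # Pipe the agent's review into this script as a CI step.
    sys.exit(gate(sys.stdin.read()))

Start with BLOCKING empty and you effectively get the warning-only mode described in the next section.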

Getting Started: The Minimum Viable Setup

If this sounds useful, here's how to start without getting overwhelmed.

First, you probably already have your most important conventions documented somewhere. If not, write up the one that causes you the most pain. For most data teams, metric naming is a good starting point because it's high impact and the rules tend to be clear.

Second, set up MCP configuration in your CI environment. There are existing MCP servers for Atlassian, Notion, and other platforms. The configuration is straightforward, mostly a matter of providing API credentials and pointing to the right server.
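
As an illustration, here is roughly what that can look like when the agent is Claude Code, which reads project-scoped MCP servers from a .mcp.json file; the server command, package name, and environment variable names below are placeholders, so substitute whatever the Atlassian MCP server you pick actually documents.

import json
import os

# MCP config pointing the agent at an Atlassian MCP server. The command,
# package name, and env var names are placeholders for your chosen server.
mcp_config = {
    "mcpServers": {
        "atlassian": {
            "command": "npx",
            "args": ["-y", "some-atlassian-mcp-server"],
            "env": {
                "CONFLUENCE_URL": os.environ["CONFLUENCE_URL"],
                "CONFLUENCE_API_TOKEN": os.environ["CONFLUENCE_API_TOKEN"],
            },
        }
    }
}

# Write it where the CI job runs the agent; credentials come from CI secrets.
with open(".mcp.json", "w") as f:
    json.dump(mcp_config, f, indent=2)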

Third, create a focused prompt for the agent. Something like: "Fetch page X from Confluence. Review the changed files in this PR against the conventions described in that page. Report any violations with explanations and suggested fixes." You don't need anything elaborate. The agent knows how to read documentation and compare it against code.
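
Spelled out as a template, and purely as a starting point (the page ID and severity labels are placeholders), that prompt might look like this:

# Hypothetical prompt template; page_id is a placeholder for your Confluence page.
REVIEW_PROMPT = """\
Fetch Confluence page {page_id} (our metric naming conventions) via the
Atlassian MCP server and read it in full.

Review the files changed in this pull request against those conventions.
For each violation, report the file and line, the rule it breaks (quote it),
a suggested fix, and a severity: Critical, Medium, or Low. Treat anything
the page labels a guideline rather than a hard rule as Low.
"""

print(REVIEW_PROMPT.format(page_id="123456"))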

Fourth, and this is important, run in warning-only mode first. Don't block PRs immediately. Let the system run for a week or two and see what it catches. You'll likely need to iterate. Maybe your documentation isn't as clear as you thought and the agent is catching things you didn't intend. Maybe there are false positives you need to address by clarifying your conventions. This calibration period is valuable and will make your documentation clearer, for instance by distinguishing between hard rules and guidelines.

Finally, once you're confident in the results, turn on blocking for critical violations and keep warnings for lesser issues. Then expand to other conventions as you see fit.

Start with one convention, prove the value, and grow from there.

A Different Way to Think About Documentation

Here's what really changed for me: for a long time, I thought of documentation and enforcement as fundamentally separate concerns. Documentation was for humans. Enforcement was for machines. And keeping them in sync was an eternal, never-quite-solved problem. I'm a firm believer in metadata-driven data engineering, the idea that "harder" metadata like data contracts and entity definitions can keep documentation and implementation aligned. What I was still struggling with was "softer" things like naming conventions and style guides.

This approach solves that as well. Documentation is enforcement. The same artifact serves both purposes. Humans can read and understand it easily because it's written in natural language with explanations and examples. Machines can enforce it because an AI agent can read that same natural language and apply it intelligently.

When you update the documentation, enforcement changes automatically. When something gets flagged that shouldn't be, you clarify the documentation, and that clarification immediately takes effect. The two are no longer separate systems that drift apart. They're the same system.

And here's the part that matters most: you stop being the naming convention police. The agent handles enforcement consistently, on every pull request, without fatigue and without creating interpersonal friction. When standards evolve, they're enforced immediately. When someone new joins the team, they get the same feedback as everyone else.

So next time you're about to send that frustrated Slack message, consider setting up an agent instead. Your blood pressure and your team relationships will thank you.
