Beware the ineffective stack.
From "Tool Sprawl" to an Operational Data Platform: How AI Clarifies (and Fixes) the Modern Data Stack
Now with the security layer everyone "totally already had," right?
The modern data stack is simultaneously incredible and... exhausting. Incredible because cloud elasticity, open table formats, and modular tooling let small teams ship analytics, ML, and products that used to require a data center. Exhausting because that same modularity has turned "our data platform" into a Pinterest board of vendor logos, half-finished pipelines, and a Slack channel called #data-incidents that never sleeps.
Eric Flaningam's "Data Industry Primer" does a solid job mapping the market - warehouses, lakes/lakehouses, BI, governance, security, observability, and the vendor constellations around each. But operators don't win by naming categories. Operators win by turning that map into an actual system your company can run - reliably, cheaply, and without accidentally emailing the entire customer list to an LLM.
MIT's "How is this time different?" adds the subtext we all feel: AI adoption is going bottom-up. People already have tools. They will use them. If you block them entirely, you don't stop adoption - you just create shadow IT with worse audit trails.
So the real question becomes:
How do we operationalize the stack when the interface to data is becoming conversational and agentic - without turning our data platform into a compliance incident generator?
Let's complete the primer with the parts that matter: runtime control, contracts, semantics, observability, cost... and security that's real (not "we have SSO").
The post-LLM reality check: the model is the least interesting problem
After building an enterprise AI platform that connects LLMs to real organizational data, here's the punchline:
The model is the least interesting problem you'll solve.
The hard problems live in four unglamorous categories - aka the stuff nobody wants on a keynote slide, but everyone pays for in production.
1) Data access is a political problem disguised as a technical one
The first challenge wasn't generating correct SQL. It was answering: who gets to see what?
Connecting an LLM to your data is easy. Connecting it responsibly - per-user document isolation, query validation, read-only enforcement, row-level restrictions - that's where the work lives. None of it has anything to do with model quality.
Do this before you evaluate a single model: map your data access topology (systems, domains, sensitivity, owners, consumers, and the "special exceptions" everyone swears are temporary).
2) Governance isn't a feature. It's the foundation.
You cannot ship enterprise AI without answering: if something goes wrong, can we trace exactly what happened?
Permissions aren't a boolean. They're a graph: users, groups, granular capabilities, time-limited grants, individual overrides - plus an audit log for every change.
Design your permission model before your prompt templates. The fastest path to production is through your security strategy, not around it. (Yes, I know that's disappointing if you wanted to spend this week prompt-engineering your way out of RBAC.)
3) Integration complexity compounds silently
The individual integrations weren't hard. What's hard is orchestrating them:
- query classification
- parallel knowledge base + web retrieval
- permission-filtered context
- real-time streaming
- token tracking
- context window management
- reliability, timeouts, retries... all at once
Budget 3x more time for the "glue" than you think you'll need. That glue is where enterprise AI lives or dies.
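To make "the glue" less abstract, here is one sketch of the orchestration shape, assuming retrieval functions you would supply for your own stack (`kb_fetch` and `web_fetch` are placeholders): parallel knowledge-base and web retrieval, with timeouts and retries so one flaky source doesn't sink the whole request.

```python
import asyncio

async def with_retry(coro_fn, attempts=3, timeout=2.0):
    """Run a coroutine factory with a per-attempt timeout and retries."""
    last_err = None
    for _ in range(attempts):
        try:
            return await asyncio.wait_for(coro_fn(), timeout)
        except (asyncio.TimeoutError, ConnectionError) as err:
            last_err = err
    raise last_err

async def gather_context(question, kb_fetch, web_fetch):
    # Knowledge-base and web retrieval run in parallel; either may fail
    # independently without failing the whole request.
    results = await asyncio.gather(
        with_retry(lambda: kb_fetch(question)),
        with_retry(lambda: web_fetch(question)),
        return_exceptions=True,
    )
    return [r for r in results if not isinstance(r, Exception)]
```

Notice how much of the code is error handling rather than retrieval - that ratio is roughly why the 3x glue budget exists.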
4) Automation without trust doesn't get adopted
Build systems with graduated autonomy: from fully manual -> fully autonomous, with a risk-scoring engine in between.
Key insight: users didn't want full autonomy. They wanted legible autonomy - the ability to see what the AI is doing, understand why, and dial trust up or down.
Design for trust gradients, not trust switches. The goal isn't removing humans from the loop; it's making the loop faster.
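A trust gradient can be as simple as a risk score mapped to autonomy tiers. The traits, weights, and thresholds below are entirely illustrative - the structure (score in the middle, graduated tiers on the outside) is the point.

```python
# Illustrative weights: risky traits raise the score, reversibility lowers it.
RISK_WEIGHTS = {"writes_data": 3, "touches_pii": 4, "external_egress": 3, "reversible": -2}

def risk_score(action_traits):
    return sum(RISK_WEIGHTS[t] for t in action_traits if t in RISK_WEIGHTS)

def autonomy_level(score):
    # Trust gradient, not a switch: low-risk actions run alone,
    # mid-risk actions require review, high-risk actions stay manual.
    if score <= 2:
        return "autonomous"
    if score <= 5:
        return "human_review"
    return "manual_only"
```

Dialing trust up or down then means adjusting weights and thresholds - visible, auditable numbers - rather than flipping an all-or-nothing switch.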
The real competitive advantage? Every vendor has access to the same foundation models. Differentiation is everything around the model: data connectivity, security posture, audit trails, and user trust.
The model is the easy part. Everything else is the work.
The stack is not a diagram. It's a set of promises (including security promises)
A data platform exists to keep a few promises:
- Discoverable - people can find the right dataset and trust what it means
- Dependable - freshness, quality, lineage, SLAs
- Usable - consistent metrics/semantics across BI, apps, ML
- Economical - cost is observable, allocated, governed
- Safe - access control, auditability, policy enforcement, isolation
- AI-safe - governed model access, prompt/tool controls, data egress protection, traceability
Most "modern stack" diagrams focus on tools. Operators need the missing layer: contracts + controls + feedback loops. Security is not a checkbox - security is a control plane.
The "missing layers" most primers underweight (aka why your platform feels haunted)
These gaps show up repeatedly - even when you bought the "right" tools.
1) Orchestration and runtime control
Pipelines are production systems: scheduling, dependencies, retries, idempotency, backfills, safe deploys.
Failure mode: pipelines exist, but nobody can answer: "What ran, what changed, what broke, and what data did it corrupt before we noticed?"
2) Contracts between producers and consumers
"The table exists" is not a promise. A promise is schema + definitions + cadence + quality rules + ownership - enforced.
Failure mode: dashboards silently change and you're blamed for "moving the goalposts," even though nobody wrote down where the goalposts were.
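"Enforced" is the load-bearing word, so here is a minimal sketch of what enforcement can look like - an assumed contract shape (the fields and the `orders` example are invented) checked in CI or at load time:

```python
# Assumed contract shape: schema + cadence + ownership, versioned alongside code.
CONTRACT = {
    "table": "orders",
    "owner": "orders-team@example.com",
    "freshness_hours": 24,
    "columns": {"order_id": int, "amount": float, "status": str},
}

def validate_row(row, contract=CONTRACT):
    """Return a list of violations; an empty list means the row conforms."""
    violations = []
    for col, col_type in contract["columns"].items():
        if col not in row:
            violations.append(f"missing column: {col}")
        elif not isinstance(row[col], col_type):
            violations.append(f"wrong type for {col}: {type(row[col]).__name__}")
    return violations
```

The goalposts are now written down: a producer who drops `status` breaks a test, not a dashboard.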
3) Semantic consistency (metrics + definitions)
A semantic/metrics layer makes "revenue" mean one thing across BI, finance reporting, experimentation, and product analytics.
Failure mode: metric wars. The CFO has one number, the CRO has another, and your team becomes a therapist with SQL.
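The fix for metric wars is structural: define the metric once, and make every consumer call the definition. A toy sketch (the metric registry shape and the "refunds excluded" rule are invented for illustration):

```python
# One canonical definition of "revenue"; BI, finance, and product analytics
# all call compute_metric instead of re-deriving the logic in their own SQL.
METRICS = {
    "revenue": {
        "description": "Sum of captured order amounts, refunds excluded",
        "filter": lambda order: order["status"] == "captured",
        "value": lambda order: order["amount"],
    }
}

def compute_metric(name, orders):
    metric = METRICS[name]
    return sum(metric["value"](o) for o in orders if metric["filter"](o))
```

Real semantic layers (dbt's, for instance) compile definitions like this to SQL, but the contract is the same: one definition, many consumers.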
4) Lineage as a first-class primitive
Lineage contains blast radius and shortens incidents.
Failure mode: debugging becomes archaeology.
5) Observability for data, not just compute
Freshness, volume, distribution shifts, completeness, downstream impact.
Failure mode: you learn about problems when a VP forwards a screenshot with "???" as the subject line.
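What those checks look like in practice: a monitor that fires before the VP does. The thresholds here are invented placeholders - real ones come from your SLAs.

```python
from datetime import datetime, timedelta, timezone

def check_dataset(last_loaded_at, row_count, expected_rows,
                  max_staleness=timedelta(hours=6), min_ratio=0.5):
    """Return alerts for freshness and volume; empty list means healthy."""
    alerts = []
    # Freshness: has anything landed within the staleness window?
    if datetime.now(timezone.utc) - last_loaded_at > max_staleness:
        alerts.append("stale: no load within staleness window")
    # Volume: did row count collapse relative to expectation?
    if expected_rows and row_count / expected_rows < min_ratio:
        alerts.append("volume drop: row count far below expected")
    return alerts
```

Distribution-shift and completeness checks follow the same pattern: compare telemetry to an expectation, alert on the delta, and route the alert to the owner from the contract.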
6) Cost governance and allocation
Cloud makes it easy to ship value and easy to leak money.
Failure mode: the warehouse bill becomes a jump-scare.
7) Security as an operational system (not a policy PDF)
The minute you add AI (or just more self-serve), your threat surface expands:
- more people and tools touching data
- more "unstructured" outputs leaving systems (summaries, reports, chat answers)
- more ways to mix sensitive + non-sensitive data
- more automation that can take real actions (write, email, provision, deploy)
Failure mode: "We didn't think the assistant could access that." It could. It did.
The security model you actually need for a modern, AI-enabled data platform
Think in three layers: Identity, Policy, Proof. If you don't have all three, you have vibes.
1) Identity: who/what is acting?
- Human identities: SSO is table stakes; least privilege isn't optional
- Service identities: pipelines, dbt jobs, reverse ETL, BI extracts
- Agent identities: LLM tools/digital workers with scoped privileges
Non-negotiables
- least privilege by default
- per-workload/service accounts (no shared "data_bot")
- secrets management + rotation (agents don't get permanent creds "because it's easier")
2) Policy: what actions are allowed?
Security isn't just "read access." In an agentic world, it's "what tools can be used, on what data, with what output controls."
Core controls
- data classification + tagging
- row/column-level security where appropriate
- network egress controls
- DLP on outputs (summaries leak too)
- tool allowlists for agents
AI-specific controls
- prompt injection defenses (treat retrieved content as untrusted)
- grounding requirements (answers cite catalog/contracts/lineage)
- action gating (read -> write -> admin requires approvals)
- sandboxing for execution
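Action gating is the easiest of these to sketch. One possible shape (the tier names and approval counts are illustrative): each tier up the read -> write -> admin ladder demands strictly more ceremony.

```python
# Illustrative gate: reads run freely, writes need one approval,
# admin actions need two.
TIERS = {"read": 0, "write": 1, "admin": 2}

def gate(action_tier, approvals):
    required = TIERS[action_tier]  # approvals needed for this tier
    if len(approvals) < required:
        raise PermissionError(f"{action_tier} needs {required} approval(s)")
    return True
```

The agent never decides its own tier - the policy layer classifies the action, and the gate runs before the tool does.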
3) Proof: how do we audit and respond?
If you can't prove what happened, you don't have governance - you have hope.
- end-to-end audit trails (who asked, what data was accessed, what tools ran, what outputs were produced)
- immutable logs for investigations
- lineage + run logs that connect outputs back to sources
- playbooks for AI incidents (injection, exfiltration attempts, policy violations)
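"Immutable" can be approximated in software with a hash chain: each entry commits to the one before it, so rewriting history breaks the chain. A sketch with an assumed log shape (not a standard format):

```python
import hashlib
import json

def append_entry(log, event):
    """Append an event; each entry's hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else "genesis"
    body = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    log.append({"event": event, "prev": prev_hash,
                "hash": hashlib.sha256(body.encode()).hexdigest()})
    return log

def verify(log):
    """Recompute every hash; any tampering anywhere breaks verification."""
    prev_hash = "genesis"
    for entry in log:
        body = json.dumps({"event": entry["event"], "prev": prev_hash},
                          sort_keys=True)
        if (entry["prev"] != prev_hash
                or entry["hash"] != hashlib.sha256(body.encode()).hexdigest()):
            return False
        prev_hash = entry["hash"]
    return True
```

In production you'd land this in append-only storage with retention locks; the chain just makes tampering detectable rather than silent.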
Where AI helps: clarify, enforce, and reduce blast radius
Generative AI is best at bridging human intent and system execution - but only if you ground it in metadata and controls.
1) Clarify the stack into a single operational truth (including security truth)
AI can inventory tools/datasets/identities, explain "who can access what" with evidence, and map data flows for risk reviews.
Practical output:
"If we deprecate customer_status, which dashboards and features break, who owns them, and what's the approved migration plan?"
2) Turn contracts into enforceable security boundaries
Contracts shouldn't stop at schema and SLAs. They should include classification, allowed consumers, approved outputs (e.g., aggregated only), retention and masking.
AI can draft:
- contract templates (schema + policy)
- test suites validating both quality and policy expectations
- migration plans that avoid accidental widening of access
Practical output:
"Orders v2 introduces customer_email. That's PII. Here are the masking rules, access groups, and downstream systems that must be blocked or modified."
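A masking rule like the one in that example can be drafted mechanically once the contract flags the column. A sketch, assuming `customer_email` is the flagged PII column:

```python
def mask_email(value):
    """Keep the first character and the domain; hide the rest."""
    local, _, domain = value.partition("@")
    return (local[0] + "***@" + domain) if local and domain else "***"

def apply_masking(row, pii_columns=("customer_email",)):
    # Non-PII columns pass through untouched; flagged columns are masked.
    return {k: mask_email(v) if k in pii_columns else v for k, v in row.items()}
```

The important part is where the `pii_columns` list comes from: the contract's classification tags, not a hand-maintained list in application code.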
3) Security-aware observability and incident response
With telemetry + lineage, AI can flag suspicious access patterns, identify blast radius, recommend containment steps, and draft incident timelines.
Practical output:
"Spike in full-table scans on employees_hr from a new service principal. Revoke token, rotate secret, and review tool permissions."
4) Governed self-serve that doesn't turn into "everyone is root"
AI can suggest safe datasets/aggregations, block restricted access with alternatives, generate approved metrics, and produce outputs with citations + redaction.
Practical output: "You don't have access to raw salary data. Here's an approved, aggregated compensation band report and canonical definitions."
The architecture that actually works: metadata spine + security control plane + AI layer
If you want a clean mental model:
- Metadata spine
- catalog + glossary
- lineage (OpenLineage-like events)
- orchestration run logs
- data quality results
- cost telemetry
- Security control plane
- identity (human/service/agent)
- policy-as-code (classification, RBAC/ABAC, masking, DLP)
- approvals/workflows for sensitive changes
- audit logs + retention
- AI layer (grounded + gated)
- retrieval from approved sources only
- action gating (read -> write -> admin)
- tool allowlists + sandboxing
- redaction and output controls
- full traceability
That's how you get speed without brand risk - and avoid the "Open-Claw" problem where agents have sharp tools and no guardrails.
Build order: AI that improves security before it automates everything
If you're serious, don't start with "agent that writes to prod." Start with "agent that makes prod safer."
- unify metadata spine (catalog + lineage + run logs + permissions context)
- add contracts that include security (classification, consumers, output constraints)
- security-aware incident copilot (triage + containment + audit trail)
- semantic copilot with approvals (metrics/definitions + governance)
- cost copilot (allocation + anomaly detection + guardrails)
- agentic workflows gated by policy (backfills, deprecations, doc updates)
This sequence keeps AI grounded and makes your platform more reliable and safer before it starts doing higher-risk actions. (Let's not start with "AI that writes to prod" unless your favorite hobby is incident response.)
Quick “ship / no-ship” gate (60 seconds)
If you can’t answer these confidently, you’re not ready: (Yes, it's boring. Yes, it's the whole game.)
- Who can see what - and why?
- If something goes wrong, can we trace exactly what happened?
- Can we stop it instantly?
- Can we prove what data left the system (if any)?
Need the full version? See the Enterprise AI Security Pre-Flight Checklist.
LLM-to-warehouse security patterns (the "SQL is not the scary part" card)
- Query firewall: enforce limits, denylist patterns, block raw PII selection, restrict joins
- Row/column enforcement: apply policies in the warehouse and validate in the agent layer
- Read-only by default: separate read vs write credentials; write requires gated workflow
- Scoped datasets: curated views per role/domain instead of "here's the whole lakehouse"
- Result constraints: cap rows returned, aggregate where possible, redact sensitive columns
- Audit everything: prompt -> retrieval -> query -> results -> output -> actions taken
- No raw exports: restrict COPY/UNLOAD; require approved destinations with logging
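Several of these patterns can live in one small validator in the agent layer. A sketch - the deny patterns and row cap are invented examples, and this is defense in depth on top of warehouse-side row/column policies, never a replacement for them:

```python
import re

# Illustrative denylist: write/export verbs plus example sensitive columns.
DENY_PATTERNS = [
    r"\bdrop\b", r"\bdelete\b", r"\bupdate\b", r"\binsert\b",
    r"\bcopy\b", r"\bunload\b",      # block raw exports
    r"\bssn\b", r"\bsalary\b",       # example sensitive columns
]

def check_query(sql, max_rows=1000):
    """Validate agent-generated SQL before it ever reaches the warehouse."""
    lowered = sql.lower()
    for pattern in DENY_PATTERNS:
        if re.search(pattern, lowered):
            return f"blocked: matches {pattern}"
    match = re.search(r"\blimit\s+(\d+)", lowered)
    if not match:
        return "blocked: missing LIMIT clause"
    if int(match.group(1)) > max_rows:
        return f"blocked: LIMIT exceeds cap of {max_rows}"
    return "allowed"
```

Regex checks are bypassable in isolation (comments, casing tricks, dialect quirks), which is exactly why the same policies must also be enforced in the warehouse and logged end to end.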
AI doesn't replace controls. It makes controls usable.
The stack will stay modular because the domains are real: ingestion, storage, compute, governance, observability, BI, ML, security. AI doesn't remove them. It connects them, if you give it shared primitives: contracts, lineage, semantics, telemetry, and policy.
The win isn't "an AI tool." The win is an operational platform where:
- change is safe
- meaning is consistent
- reliability is measurable
- access is governed
- and no one can "accidentally" export your customer list into a chat window and call it innovation
Sources (credible references)
- Eric Flaningam, "Data Industry Primer" (Generative Value): https://www.generativevalue.com/p/data-industry-primer
- MIT Work of the Future (Generation AI), "How is this time different?": https://mitgenerationai.substack.com/p/how-is-this-time-different
- NIST, AI Risk Management Framework (AI RMF 1.0): https://www.nist.gov/itl/ai-risk-management-framework
- OWASP, Top 10 for Large Language Model Applications (v1.1): https://owasp.org/www-project-top-10-for-large-language-model-applications/
- Lewis et al., "Retrieval-Augmented Generation..." (NeurIPS 2020): https://proceedings.neurips.cc/paper/2020/file/6b493230205f780e1bc26945df7481e5-Paper.pdf
- Martin Fowler / Zhamak Dehghani, "Data Mesh Principles..." (2020): https://martinfowler.com/articles/data-mesh-principles.html
- OpenLineage project + spec docs: https://openlineage.io/
- OpenTelemetry, "What is OpenTelemetry?": https://opentelemetry.io/docs/what-is-opentelemetry/
- Bitol, Open Data Contract Standard (ODCS) v3.1.0: https://bitol-io.github.io/open-data-contract-standard/v3.1.0/
- Great Expectations documentation: https://docs.greatexpectations.io/docs/home/
- Apache Airflow documentation: https://airflow.apache.org/docs/apache-airflow/stable/
- dbt Semantic Layer overview: https://www.getdbt.com/blog/semantic-layer-introduction
Photo by Ajda Zinber on Unsplash