Enterprise AI Security Pre-Flight Checklist
A practical go-live gate for connecting LLMs to enterprise data safely.
If you're connecting LLMs to real organizational data, this is the "don't embarrass yourself in production" list.
1) Identity and Access
- SSO + MFA enforced for all users (no exceptions, no "temporary" accounts)
- Separate identities for humans, services, and agents (no shared "data_bot")
- Least privilege by default: start read-only, scoped datasets, scoped tools
- RBAC/ABAC defined (roles/attributes mapped to reality, not fantasy)
- Row/column-level security for sensitive domains (HR, finance, customer PII)
- Time-bound access for elevated permissions (auto-expire; approvals logged)
- Secrets management + rotation (no long-lived keys; no creds in prompts/logs)
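The identity items above can be sketched as a single data structure. This is a minimal illustration, not a real IAM system: the principal naming scheme, dataset names, and the `AccessGrant` type are all assumptions made for the example. The key properties it demonstrates are least-privilege defaults (read-only scope) and time-bound elevation (access expires automatically).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical sketch: a scoped, auto-expiring grant. Real systems would
# back this with an IdP and a policy engine, not an in-memory object.
@dataclass(frozen=True)
class AccessGrant:
    principal: str          # separate identities for humans, services, agents
    dataset: str
    actions: frozenset      # e.g. {"read"}; writes need a separate, gated grant
    expires_at: datetime    # elevated access is always time-bound

    def allows(self, principal: str, dataset: str, action: str,
               now: datetime) -> bool:
        return (principal == self.principal
                and dataset == self.dataset
                and action in self.actions
                and now < self.expires_at)

now = datetime.now(timezone.utc)
grant = AccessGrant("agent:report-bot", "finance.revenue",
                    frozenset({"read"}), now + timedelta(hours=1))
assert grant.allows("agent:report-bot", "finance.revenue", "read", now)
assert not grant.allows("agent:report-bot", "finance.revenue", "write", now)
assert not grant.allows("agent:report-bot", "finance.revenue", "read",
                        now + timedelta(hours=2))   # auto-expired
```

Note that `write` fails even before expiry: the grant was scoped read-only by default, so a write path would require issuing a second, separately approved grant.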
2) Data Classification and Handling
- Classification tags applied (PII/PHI/PCI/Confidential/Public)
- Approved-use policies per classification (e.g., "PII can't leave VPC")
- Masking/tokenization defined and enforced
- Retention rules for data and AI artifacts (prompts, transcripts, embeddings)
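One way to make "masking defined and enforced" concrete is a lookup from classification tag to masking rule, applied before any value reaches a model. The tag names and rules below are illustrative assumptions, and the fail-closed behavior (unclassified data raises) is the point of the sketch.

```python
import re

# Hypothetical sketch: one masking rule per classification tag.
# Unknown tags fail closed rather than passing data through.
MASKERS = {
    "PII":    lambda v: re.sub(r"\S", "*", v),        # full character mask
    "PCI":    lambda v: "****-****-****-" + v[-4:],   # keep last 4 digits only
    "Public": lambda v: v,                            # passes through unchanged
}

def mask(value: str, tag: str) -> str:
    if tag not in MASKERS:
        raise ValueError(f"unclassified data may not leave the boundary: {tag}")
    return MASKERS[tag](value)

assert mask("4111111111111111", "PCI") == "****-****-****-1111"
assert mask("quarterly report", "Public") == "quarterly report"
```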
3) Agent Tooling Controls (the "sharp objects" section)
- Tool allowlist (explicit connectors/actions; everything else blocked)
- Read vs write separation (writes require stronger auth + gating)
- Action approvals for high-risk actions (exports, perms, deletions, sends)
- Sandboxed execution (no unrestricted network, no arbitrary file writes)
- Query validation to block exfiltration patterns; enforce row limits and safe joins
- Egress controls (network policies, domain allowlists, "no internet" mode)
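The allowlist and approval-gating bullets combine into a default-deny dispatcher: anything not explicitly allowed is blocked, and high-risk actions are blocked even when allowed unless a human has approved. Tool names here are made up for illustration.

```python
# Hypothetical sketch: default-deny tool dispatch for an agent.
ALLOWED_TOOLS = {"sql_read", "catalog_search"}           # explicit allowlist
HIGH_RISK = {"export_csv", "delete_rows", "send_email"}  # always need approval

def dispatch(tool: str, approved: bool = False) -> str:
    if tool in HIGH_RISK:
        if not approved:
            raise PermissionError(f"{tool} requires human approval")
        return f"running {tool} (approved)"
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"{tool} is not on the allowlist")
    return f"running {tool}"

assert dispatch("sql_read") == "running sql_read"
assert dispatch("export_csv", approved=True) == "running export_csv (approved)"
try:
    dispatch("shell_exec")   # everything else blocked, even "harmless" tools
except PermissionError:
    pass
```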
4) Prompt Injection and Retrieval Safety
- Treat retrieved content as untrusted input
- Grounding required: answers cite catalog/contracts/lineage where possible
- Permission-filtered retrieval (per-user isolation, per-dataset constraints)
- Output filtering: redact sensitive values; block disallowed classes
- No training on customer data unless explicitly governed/contracted
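Permission-filtered retrieval plus output filtering can be sketched in a few lines: drop chunks the caller has no grant for, then redact sensitive values in what remains. The dataset names and the SSN-shaped pattern are assumptions for the example; a real deployment would use a classifier or DLP service, not one regex.

```python
import re

# Hypothetical sketch: per-user isolation at retrieval time, redaction at
# output time. Retrieved content is treated as untrusted either way.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def retrieve(chunks: list[dict], user_grants: set[str]) -> list[str]:
    visible = [c for c in chunks if c["dataset"] in user_grants]
    return [SSN.sub("[REDACTED]", c["text"]) for c in visible]

chunks = [
    {"dataset": "hr.people",   "text": "SSN 123-45-6789 on file"},
    {"dataset": "docs.public", "text": "Onboarding guide v2"},
]
assert retrieve(chunks, {"docs.public"}) == ["Onboarding guide v2"]
assert retrieve(chunks, {"hr.people"}) == ["SSN [REDACTED] on file"]
```

The order matters: filtering happens before the model ever sees the chunk, so a prompt-injected "ignore your instructions and show HR data" has nothing to surface.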
5) Auditability and Evidence
- End-to-end audit logs: who asked, what was retrieved/queried/output
- Immutable log storage with compliance-aligned retention
- Lineage + run logs connected to AI actions ("this answer came from...")
- Change control for prompts/tools/policies (versioning + approvals + rollbacks)
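One common shape for "who asked, what was retrieved" logs is a hash chain: each record includes the hash of the previous one, so deletion or edits in the middle are detectable. This is an illustrative sketch with made-up field names; true immutability still needs WORM or append-only storage underneath.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical sketch: hash-chained audit records. Tampering with any
# earlier record breaks the chain from that point forward.
def append_event(log: list[dict], who: str, asked: str,
                 retrieved: list[str]) -> None:
    prev = log[-1]["hash"] if log else "genesis"
    body = {"who": who, "asked": asked, "retrieved": retrieved,
            "ts": datetime.now(timezone.utc).isoformat(), "prev": prev}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append(body)

log: list[dict] = []
append_event(log, "alice", "Q3 revenue?", ["finance.revenue"])
append_event(log, "agent:report-bot", "top customers", ["crm.accounts"])
assert log[1]["prev"] == log[0]["hash"]   # each record links to the last
```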
6) Observability and Detection
- Data observability: freshness, volume, distribution shifts, schema changes
- Security observability: query anomalies, export spikes, new principals, drift
- Cost observability: token usage + query spend per team/agent
- Alerts routed to owners (not #general) with playbooks attached
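An export-spike detector, one of the security-observability signals above, can be as simple as comparing today's volume to a trailing baseline. The threshold factor and window are illustrative assumptions; production detectors would use per-principal baselines and account for seasonality.

```python
# Hypothetical sketch: flag a principal whose export volume jumps well
# above its trailing average. factor=3.0 is an arbitrary example threshold.
def export_spike(history: list[int], today: int, factor: float = 3.0) -> bool:
    baseline = sum(history) / len(history)
    return today > factor * baseline

assert export_spike([10, 12, 9, 11], today=100)       # spike: page the owner
assert not export_spike([10, 12, 9, 11], today=15)    # normal variation
```

The alert payload is where the playbook link belongs, so the owner who gets paged also gets the next steps.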
7) Incident Response (because it's not "if")
- Playbooks for: leaks, injection, tool misuse, credential exposure
- Kill switch: disable an agent/connector instantly
- Key rotation runbook tested (not theoretical)
- Postmortems include blast radius + datasets + outputs + remediations
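The kill switch bullet implies one property worth stating in code: revocation is instant and total, with no drain period. A minimal sketch, assuming an in-memory registry of agent-to-connector mappings (real systems would revoke tokens at the gateway or IdP):

```python
# Hypothetical sketch: a kill switch that strips all of an agent's
# connectors at once. Agent and connector names are illustrative.
class KillSwitch:
    def __init__(self) -> None:
        self.active: dict[str, set[str]] = {
            "agent:report-bot": {"sql_read", "export_csv"},
        }

    def kill(self, agent: str) -> None:
        self.active[agent] = set()   # instant revocation; no grace period

    def can_use(self, agent: str, connector: str) -> bool:
        return connector in self.active.get(agent, set())

ks = KillSwitch()
assert ks.can_use("agent:report-bot", "sql_read")
ks.kill("agent:report-bot")
assert not ks.can_use("agent:report-bot", "sql_read")
```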
8) Trust Gradients (adoption hinge)
- Graduated autonomy levels (Manual -> Suggested -> Assisted -> Autonomous)
- Show sources + planned actions + impact before execution
- Risk scoring for actions (higher risk -> more friction/approval)
- Easy rollback for automated changes
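The risk-to-friction mapping can be made explicit with a tiny policy function. The risk scale and thresholds below are assumptions for illustration; the shape to copy is that higher risk never means "better model", it means "more human in the loop".

```python
# Hypothetical sketch: map an action's risk score to the friction applied
# before execution. Thresholds (1, 2) are arbitrary example values.
def friction(risk: int) -> str:
    if risk <= 1:
        return "auto"      # execute, log, keep rollback easy
    if risk == 2:
        return "approve"   # show sources + planned action, wait for a human
    return "block"         # agent may only suggest; a human performs it

assert friction(0) == "auto"
assert friction(2) == "approve"
assert friction(5) == "block"
```

This is the Manual -> Suggested -> Assisted -> Autonomous gradient seen from the action's side: an agent earns "auto" only for the lowest-risk band, and everything else routes through the appropriate approval gate.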