Enterprise AI Security Pre-Flight Checklist
A practical go-live gate for connecting LLMs to enterprise data safely.
If you're connecting LLMs to real organizational data, this is the "don't embarrass yourself in production" list.
1) Identity and Access
- SSO + MFA enforced for all users (no exceptions, no "temporary" accounts)
- Separate identities for humans, services, and agents (no shared "data_bot")
- Least privilege by default: start read-only, scoped datasets, scoped tools
- RBAC/ABAC defined (roles/attributes mapped to reality, not fantasy)
- Row/column-level security for sensitive domains (HR, finance, customer PII)
- Time-bound access for elevated permissions (auto-expire; approvals logged)
- Secrets management + rotation (no long-lived keys; no creds in prompts/logs)
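The identity items above can be sketched as a single data structure. This is a minimal illustration, not a real IAM system: the principal naming scheme, dataset names, and the `AccessGrant` type are all assumptions made for the example. The key properties it demonstrates are least-privilege defaults (read-only scope) and time-bound elevation (access expires automatically).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical sketch: a scoped, auto-expiring grant. Real systems would
# back this with an IdP and a policy engine, not an in-memory object.
@dataclass(frozen=True)
class AccessGrant:
    principal: str          # separate identities for humans, services, agents
    dataset: str
    actions: frozenset      # e.g. {"read"}; writes need a separate, gated grant
    expires_at: datetime    # elevated access is always time-bound

    def allows(self, principal: str, dataset: str, action: str,
               now: datetime) -> bool:
        return (principal == self.principal
                and dataset == self.dataset
                and action in self.actions
                and now < self.expires_at)

now = datetime.now(timezone.utc)
grant = AccessGrant("agent:report-bot", "finance.revenue",
                    frozenset({"read"}), now + timedelta(hours=1))
assert grant.allows("agent:report-bot", "finance.revenue", "read", now)
assert not grant.allows("agent:report-bot", "finance.revenue", "write", now)
assert not grant.allows("agent:report-bot", "finance.revenue", "read",
                        now + timedelta(hours=2))   # auto-expired
```

Note that `write` fails even before expiry: the grant was scoped read-only by default, so a write path would require issuing a second, separately approved grant.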
2) Data Classification and Handling
- Classification tags applied (PII/PHI/PCI/Confidential/Public)
- Approved-use policies per classification (e.g., "PII can't leave VPC")
- Masking/tokenization defined and enforced
- Retention rules for data and AI artifacts (prompts, transcripts, embeddings)
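One way to make "masking defined and enforced" concrete is a lookup from classification tag to masking rule, applied before any value reaches a model. The tag names and rules below are illustrative assumptions, and the fail-closed behavior (unclassified data raises) is the point of the sketch.

```python
import re

# Hypothetical sketch: one masking rule per classification tag.
# Unknown tags fail closed rather than passing data through.
MASKERS = {
    "PII":    lambda v: re.sub(r"\S", "*", v),        # full character mask
    "PCI":    lambda v: "****-****-****-" + v[-4:],   # keep last 4 digits only
    "Public": lambda v: v,                            # passes through unchanged
}

def mask(value: str, tag: str) -> str:
    if tag not in MASKERS:
        raise ValueError(f"unclassified data may not leave the boundary: {tag}")
    return MASKERS[tag](value)

assert mask("4111111111111111", "PCI") == "****-****-****-1111"
assert mask("quarterly report", "Public") == "quarterly report"
```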
3) Agent Tooling Controls (the "sharp objects" section)
- Tool allowlist (explicit connectors/actions; everything else blocked)
- Read vs write separation (writes require stronger auth + gating)
- Action approvals for high-risk actions (exports, perms, deletions, sends)
- Sandboxed execution (no unrestricted network, no arbitrary file writes)
- Query validation to block exfiltration patterns; enforce row limits and safe joins
- Egress controls (network policies, domain allowlists, "no internet" mode)
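The allowlist and approval-gating bullets combine into a default-deny dispatcher: anything not explicitly allowed is blocked, and high-risk actions are blocked even when allowed unless a human has approved. Tool names here are made up for illustration.

```python
# Hypothetical sketch: default-deny tool dispatch for an agent.
ALLOWED_TOOLS = {"sql_read", "catalog_search"}           # explicit allowlist
HIGH_RISK = {"export_csv", "delete_rows", "send_email"}  # always need approval

def dispatch(tool: str, approved: bool = False) -> str:
    if tool in HIGH_RISK:
        if not approved:
            raise PermissionError(f"{tool} requires human approval")
        return f"running {tool} (approved)"
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"{tool} is not on the allowlist")
    return f"running {tool}"

assert dispatch("sql_read") == "running sql_read"
assert dispatch("export_csv", approved=True) == "running export_csv (approved)"
try:
    dispatch("shell_exec")   # everything else blocked, even "harmless" tools
except PermissionError:
    pass
```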
4) Prompt Injection and Retrieval Safety
- Treat retrieved content as untrusted input
- Grounding required: answers cite catalog/contracts/lineage where possible
- Permission-filtered retrieval (per-user isolation, per-dataset constraints)
- Output filtering: redact sensitive values; block disallowed classes
- No training on customer data unless explicitly governed/contracted
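Permission-filtered retrieval plus output filtering can be sketched in a few lines: drop chunks the caller has no grant for, then redact sensitive values in what remains. The dataset names and the SSN-shaped pattern are assumptions for the example; a real deployment would use a classifier or DLP service, not one regex.

```python
import re

# Hypothetical sketch: per-user isolation at retrieval time, redaction at
# output time. Retrieved content is treated as untrusted either way.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def retrieve(chunks: list[dict], user_grants: set[str]) -> list[str]:
    visible = [c for c in chunks if c["dataset"] in user_grants]
    return [SSN.sub("[REDACTED]", c["text"]) for c in visible]

chunks = [
    {"dataset": "hr.people",   "text": "SSN 123-45-6789 on file"},
    {"dataset": "docs.public", "text": "Onboarding guide v2"},
]
assert retrieve(chunks, {"docs.public"}) == ["Onboarding guide v2"]
assert retrieve(chunks, {"hr.people"}) == ["SSN [REDACTED] on file"]
```

The order matters: filtering happens before the model ever sees the chunk, so a prompt-injected "ignore your instructions and show HR data" has nothing to surface.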
5) Auditability and Evidence
- End-to-end audit logs: who asked, what was retrieved/queried/output
- Immutable log storage with compliance-aligned retention
- Lineage + run logs connected to AI actions ("this answer came from...")
- Change control for prompts/tools/policies (versioning + approvals + rollbacks)
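One common shape for "who asked, what was retrieved" logs is a hash chain: each record includes the hash of the previous one, so deletion or edits in the middle are detectable. This is an illustrative sketch with made-up field names; true immutability still needs WORM or append-only storage underneath.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical sketch: hash-chained audit records. Tampering with any
# earlier record breaks the chain from that point forward.
def append_event(log: list[dict], who: str, asked: str,
                 retrieved: list[str]) -> None:
    prev = log[-1]["hash"] if log else "genesis"
    body = {"who": who, "asked": asked, "retrieved": retrieved,
            "ts": datetime.now(timezone.utc).isoformat(), "prev": prev}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append(body)

log: list[dict] = []
append_event(log, "alice", "Q3 revenue?", ["finance.revenue"])
append_event(log, "agent:report-bot", "top customers", ["crm.accounts"])
assert log[1]["prev"] == log[0]["hash"]   # each record links to the last
```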
6) Observability and Detection
- Data observability: freshness, volume, distribution shifts, schema changes
- Security observability: query anomalies, export spikes, new principals, drift
- Cost observability: token usage + query spend per team/agent
- Alerts routed to owners (not #general) with playbooks attached
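An export-spike detector, one of the security-observability signals above, can be as simple as comparing today's volume to a trailing baseline. The threshold factor and window are illustrative assumptions; production detectors would use per-principal baselines and account for seasonality.

```python
# Hypothetical sketch: flag a principal whose export volume jumps well
# above its trailing average. factor=3.0 is an arbitrary example threshold.
def export_spike(history: list[int], today: int, factor: float = 3.0) -> bool:
    baseline = sum(history) / len(history)
    return today > factor * baseline

assert export_spike([10, 12, 9, 11], today=100)       # spike: page the owner
assert not export_spike([10, 12, 9, 11], today=15)    # normal variation
```

The alert payload is where the playbook link belongs, so the owner who gets paged also gets the next steps.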
7) Incident Response (because it's not "if")
- Playbooks for: leaks, injection, tool misuse, credential exposure
- Kill switch: disable an agent/connector instantly
- Key rotation runbook tested (not theoretical)
- Postmortems include blast radius + datasets + outputs + remediations
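The kill switch bullet implies one property worth stating in code: revocation is instant and total, with no drain period. A minimal sketch, assuming an in-memory registry of agent-to-connector mappings (real systems would revoke tokens at the gateway or IdP):

```python
# Hypothetical sketch: a kill switch that strips all of an agent's
# connectors at once. Agent and connector names are illustrative.
class KillSwitch:
    def __init__(self) -> None:
        self.active: dict[str, set[str]] = {
            "agent:report-bot": {"sql_read", "export_csv"},
        }

    def kill(self, agent: str) -> None:
        self.active[agent] = set()   # instant revocation; no grace period

    def can_use(self, agent: str, connector: str) -> bool:
        return connector in self.active.get(agent, set())

ks = KillSwitch()
assert ks.can_use("agent:report-bot", "sql_read")
ks.kill("agent:report-bot")
assert not ks.can_use("agent:report-bot", "sql_read")
```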
8) Trust Gradients (adoption hinge)
- Graduated autonomy levels (Manual -> Suggested -> Assisted -> Autonomous)
- Show sources + planned actions + impact before execution
- Risk scoring for actions (higher risk -> more friction/approval)
- Easy rollback for automated changes
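The risk-to-friction mapping can be made explicit with a tiny policy function. The risk scale and thresholds below are assumptions for illustration; the shape to copy is that higher risk never means "better model", it means "more human in the loop".

```python
# Hypothetical sketch: map an action's risk score to the friction applied
# before execution. Thresholds (1, 2) are arbitrary example values.
def friction(risk: int) -> str:
    if risk <= 1:
        return "auto"      # execute, log, keep rollback easy
    if risk == 2:
        return "approve"   # show sources + planned action, wait for a human
    return "block"         # agent may only suggest; a human performs it

assert friction(0) == "auto"
assert friction(2) == "approve"
assert friction(5) == "block"
```

This is the Manual -> Suggested -> Assisted -> Autonomous gradient seen from the action's side: an agent earns "auto" only for the lowest-risk band, and everything else routes through the appropriate approval gate.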