AI agents aren't experimental anymore. They write code, run shell commands, call external APIs, and orchestrate complex workflows — usually with the same OS privileges as the developer who launched them. That convenience is real. So is the risk.
This post covers the concrete security problems that arise when LLM-powered agents operate inside development environments and production pipelines, and two open-source projects we've built to address them:
AI Risk Compliance Platform — static analysis (SAST) tuned for AI usage in codebases
AgentShield — runtime policy enforcement for AI agent actions
Together they form a detect-then-enforce loop: scan the code to find AI risks, then generate and enforce policies that stop those risks at runtime.
The Problem: No Guardrails Between the LLM and the OS
A typical AI coding agent — Windsurf, Cursor, Claude Code — runs with whatever permissions the developer has, which means it can:
Read ~/.ssh/id_rsa, ~/.aws/credentials, and .env files
Execute arbitrary shell commands (rm -rf /, curl | bash)
Call MCP tool servers with unrestricted arguments
Install packages from arbitrary registries
Modify IDE configuration and hook files
There's no enforcement layer. The agent decides; the OS obeys.
The code that calls AI APIs is equally ungoverned. Teams adopt OpenAI, Anthropic, LangChain, and dozens of other SDKs across multiple languages, with no centralized tracking of which models touch customer data, where API keys are hardcoded, or whether data flows meet EU AI Act, SOC 2, or OWASP LLM Top 10 requirements. When an enterprise prospect asks for a risk assessment, nobody has the answers.
The Threat Landscape
Through our research, we identified eight threat categories that cover the AI agent attack surface:
| Kingdom | Examples |
|---|---|
| Destructive Ops | rm -rf /, disk wipes, fork bombs |
| Credential Exposure | SSH key theft, AWS credential harvesting, API token leaks |
| Data Exfiltration | Network egress, base64 encoding, cloud data copy |
| Unauthorized Execution | curl \| bash, eval injection, uncontrolled model calls |
| Privilege Escalation | sudo abuse, setuid manipulation, container escape |
| Persistence & Evasion | Crontab injection, log deletion, disabling security tools |
| Supply Chain | Dependency confusion, registry tampering, lock file manipulation |
| Reconnaissance | Network scanning, process enumeration, lateral movement |
Each kingdom is defined in a shared machine-readable YAML taxonomy — the common vocabulary that connects static scan findings to runtime enforcement rules.
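To make "machine-readable taxonomy" concrete, here is a hypothetical sketch of one kingdom, written as a Python literal for illustration; the real project stores these in YAML, and the field names here are assumptions, not its actual schema.

```python
# Hypothetical taxonomy entry (illustrative field names, not the real schema).
kingdom = {
    "id": "credential_exposure",
    "severity": "critical",
    "examples": [
        "SSH key theft",
        "AWS credential harvesting",
        "API token leaks",
    ],
}

# Both tools can key off the same id: the scanner tags findings with it,
# and the runtime policy references it when deciding what to block.
print(kingdom["id"])
```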
Part 1: Static Analysis for AI Usage
The AI Risk Compliance Platform extends the SAST paradigm to AI-specific risks. Traditional tools find SQL injection and XSS. We find uncontrolled model invocations, leaked API keys, customer data flowing into LLM calls, and ungoverned agent framework usage — across Python, TypeScript, Go, Java, and Ruby.
Architecture
The platform is built around six bounded contexts.
What Gets Detected
The scanner uses Semgrep CE with custom rule packs built for AI risk:
| Rule Pack | What It Finds | Languages |
|---|---|---|
| ai-sdk-imports | OpenAI, Anthropic, LangChain, Vertex AI, Bedrock usage | Python, TS, Go, Java, Ruby |
| ai-api-keys | Hardcoded API keys and env var references | Python, TS, Java |
| ai-model-refs | Model name strings (gpt-4, claude-3, etc.) | Python |
| ai-data-flows | Customer data flowing into LLM API calls | Python, JS, TS |
The taint analysis is especially powerful. Using Semgrep's labeled taint mode, we track data from HTTP inputs, database queries, and auth sessions through to LLM API sinks:
```python
# DETECTED: customer data reaches LLM call
@app.route("/chat")
def chat():
    user_msg = request.json["message"]            # source: CUSTOMER_DATA
    response = openai.chat.completions.create(    # sink: LLM API
        model="gpt-4",
        messages=[{"role": "user", "content": user_msg}],
    )
    return response.choices[0].message.content
```
Risk Scoring
Raw findings are mapped to the threat taxonomy and scored:
RiskScore = baseScore × confidenceWeight × taxonomyBoost

Base score by finding type: api_key = 8, sdk_import = 7, model_ref = 6
Confidence weight from Semgrep rule metadata
Taxonomy boost from threat severity: critical = 1.5×, high = 1.3×
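Read literally, the scoring formula is a few lines of code. This is a minimal sketch using the base scores and boosts quoted above; the confidence weight is a placeholder for whatever the Semgrep rule metadata supplies.

```python
# Base scores and taxonomy boosts as stated in the post; anything a rule
# doesn't specify defaults to a neutral 1.0 boost.
BASE_SCORE = {"api_key": 8, "sdk_import": 7, "model_ref": 6}
TAXONOMY_BOOST = {"critical": 1.5, "high": 1.3}

def risk_score(finding_type: str, confidence_weight: float, severity: str) -> float:
    """RiskScore = baseScore x confidenceWeight x taxonomyBoost."""
    return (BASE_SCORE[finding_type]
            * confidence_weight
            * TAXONOMY_BOOST.get(severity, 1.0))

# A hardcoded API key, full rule confidence, in a critical kingdom:
print(risk_score("api_key", 1.0, "critical"))  # 12.0
```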
The risk engine then maps findings to compliance standards — OWASP LLM Top 10 2025, EU AI Act, and SOC 2 — and surfaces coverage gaps.
Part 2: Runtime Enforcement with AgentShield
Static analysis tells you what risks exist. AgentShield stops them from materializing.
AgentShield is a deterministic policy gate between AI agents and the OS. Every shell command and MCP tool call passes through a six-layer analyzer pipeline before execution:
| Layer | What It Does | Example |
|---|---|---|
| Regex | Fast pattern matching | rm -rf /, curl \| bash |
| Structural | Shell AST parsing, flag normalization | --recursive → -r, sudo unwrapping |
| Semantic | Intent classification | "file-delete", "network-exfil", "code-execute" |
| Dataflow | Source→sink taint through pipes | cat ~/.ssh/id_rsa \| base64 \| curl |
| Stateful | Multi-step attack chain detection | wget ... && chmod +x && ./payload |
| Guardian | Prompt injection and obfuscation | Unicode smuggling, base64 payloads |
When layers disagree, most-restrictive-wins: BLOCK > AUDIT > ALLOW.
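The merge rule is deliberately simple. A sketch of most-restrictive-wins, assuming each layer emits one of the three verdicts named above:

```python
# Order verdicts by restrictiveness; the most restrictive one wins.
SEVERITY = {"ALLOW": 0, "AUDIT": 1, "BLOCK": 2}

def combine(verdicts: list[str]) -> str:
    """Return the most restrictive verdict among all analyzer layers."""
    return max(verdicts, key=SEVERITY.__getitem__)

print(combine(["ALLOW", "AUDIT", "ALLOW"]))   # AUDIT
print(combine(["ALLOW", "BLOCK", "AUDIT"]))   # BLOCK
```

Determinism is the point: a single BLOCK from any layer cannot be outvoted by five ALLOWs.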
MCP Mediation
AgentShield also intercepts Model Context Protocol tool calls — the emerging standard for agent-to-server communication. It evaluates:
Blocked tools — an always-deny list for dangerous tool names
Argument patterns — glob/regex matching on call arguments
Content scanning — detects SSH keys, AWS credentials, base64 blobs in arguments
Value limits — numeric thresholds against uncontrolled resource commitment
Config file guards — blocks writes to IDE hooks, shell dotfiles, and AgentShield's own policy
Tool description poisoning — scans tools/list for hidden instructions
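A stripped-down sketch of what evaluating a single MCP call against such checks could look like. The tool names, patterns, and threshold are illustrative assumptions, not AgentShield's actual configuration schema:

```python
import re

# Illustrative deny rules (hypothetical names, not the real policy format).
BLOCKED_TOOLS = {"shell_exec", "file_delete"}
ARG_DENY_PATTERNS = [
    re.compile(r"-----BEGIN (RSA|OPENSSH) PRIVATE KEY-----"),  # SSH keys
    re.compile(r"AKIA[0-9A-Z]{16}"),                           # AWS access key IDs
]
MAX_NUMERIC_VALUE = 100  # example value limit

def evaluate(tool: str, args: dict) -> str:
    """Return BLOCK or ALLOW for one MCP tool call."""
    if tool in BLOCKED_TOOLS:
        return "BLOCK"
    for value in args.values():
        if isinstance(value, str) and any(p.search(value) for p in ARG_DENY_PATTERNS):
            return "BLOCK"
        if isinstance(value, (int, float)) and value > MAX_NUMERIC_VALUE:
            return "BLOCK"
    return "ALLOW"

print(evaluate("send_payment", {"amount": 5000}))  # BLOCK
```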
Red-Team Results
| Test Suite | Cases | Pass Rate |
|---|---|---|
| Shell threat detection | 123 | 100% (0 false negatives) |
| Guardian (prompt injection, obfuscation) | 21 | 100% |
| MCP stdio integration (real proxy, real server) | 24 | 100% |
The Closed Loop: Detect → Enforce → Monitor
The real value isn't either tool in isolation — it's the governance loop they form together:
Step 1 — Detect. Scan your codebase with comply scan. The platform finds every AI SDK import, API key reference, model invocation, and unsafe data flow.
Step 2 — Assess. The risk engine maps findings to the threat taxonomy, computes risk scores, and identifies compliance gaps against OWASP LLM Top 10, EU AI Act, and SOC 2.
Step 3 — Enforce. The policy generator produces an AgentShield-compatible YAML policy pack. Install it, and AgentShield blocks the specific threats your codebase is exposed to.
Step 4 — Monitor. AgentShield's audit log feeds back into the compliance platform for trend analysis and audit-ready reporting.
The full loop in four commands
bin/comply scan -path ./my-project -taxonomy third_party/agentshield/taxonomy
bin/comply generate-policy -path ./my-project -org "my-startup" -output policy.yaml
cp policy.yaml ~/.agentshield/mcp-packs/
bin/comply report -path ./my-project -output compliance-report.md
Why Now
The AI governance market is projected to grow from $308M in 2025 to $3.6B by 2033. The EU AI Act is entering enforcement. Enterprise buyers and investors are increasingly requiring documented AI risk assessments as part of due diligence.
Yet existing tools weren't built for this:
| Tool | The Gap |
|---|---|
| Vanta, Drata | General compliance — no AI-specific scanning or agent mediation |
| Fortify, Snyk, Semgrep | Find SQLi and XSS — not LLM data flows or agent threats |
| Guardrails AI, NeMo Guardrails | Filter prompts/outputs — don't govern shell commands or MCP calls |
| Manual audits | Stale the moment the code changes |
The combination of AI-specific SAST and deterministic runtime enforcement is a new category. Nothing else covers both the code level and the runtime layer for AI agent security.
Getting Started
AI Risk Compliance Platform
git clone --recurse-submodules https://github.com/security-researcher-ca/AI_risk_compliance.git
cd AI_risk_compliance
make setup-semgrep # downloads bundled Semgrep CE binary
make test-rules # verify 88+ rule test assertions pass
make build # build the comply CLI
bin/comply scan -path /path/to/your/project
AgentShield
brew tap security-researcher-ca/tap
brew install agentshield
Protect your IDE agent
agentshield setup claude-code # or: windsurf, cursor, openclaw
Protect MCP tool servers
agentshield setup mcp
Generate Policies from Scan Results
bin/comply generate-policy \
-path ./my-project \
-taxonomy third_party/agentshield/taxonomy \
-org "my-startup" \
-output ~/.agentshield/mcp-packs/comply-rules.yaml
What's Next
Cross-file taint analysis — tracking data flows across module boundaries
Additional language analyzers — expanding SAST coverage beyond the current six languages
eBPF-based enforcement — kernel-level interception for agents that bypass user-space wrappers
Remote audit log forwarding — shipping enforcement logs to SIEM systems for enterprise SOC integration
CI/CD integration — failing builds on AI risk threshold violations (GitHub Action already available)
Closing Thoughts
AI agents are powerful, unsupervised, and overprivileged. Most security tooling was designed for a pre-agent world. Closing that gap requires coverage at two layers:
Static analysis to discover and assess AI risks in the codebase before deployment
Runtime enforcement to block dangerous actions as agents execute
The AI Risk Compliance Platform and AgentShield provide this as open-source tools, connected by a shared threat taxonomy and a policy generation pipeline. Scan findings produce the policies that the runtime enforces — a loop that gets smarter as the codebase evolves.
The era of "trust the agent" is over. Verify, then enforce.
Both projects are open-source (Apache 2.0).
Questions or collaboration? Open an issue on either repo.