Securing AI Agents: From Code Scanning to Runtime Enforcement

9 min read
With contributions from: security-researcher-ca
Updated March 14, 2026

AI agents aren't experimental anymore. They write code, run shell commands, call external APIs, and orchestrate complex workflows — usually with the same OS privileges as the developer who launched them. That convenience is real. So is the risk.


This post covers the concrete security problems that arise when LLM-powered agents operate inside development environments and production pipelines, and the two open-source projects we've built to address them:

  • AI Risk Compliance Platform — static analysis that finds AI-specific risks in a codebase

  • AgentShield — a runtime policy gate between AI agents and the OS

Together they form a detect-then-enforce loop: scan the code to find AI risks, then generate and enforce policies that stop those risks at runtime.


The Problem: No Guardrails Between the LLM and the OS

A typical AI coding agent — Windsurf, Cursor, Claude Code — runs with whatever permissions the developer has, which means it can:

  • Read ~/.ssh/id_rsa, ~/.aws/credentials, and .env files

  • Execute arbitrary shell commands (rm -rf /, curl | bash)

  • Call MCP tool servers with unrestricted arguments

  • Install packages from arbitrary registries

  • Modify IDE configuration and hook files

There's no enforcement layer. The agent decides; the OS obeys.

The code that calls AI APIs is equally ungoverned. Teams adopt OpenAI, Anthropic, LangChain, and dozens of other SDKs across multiple languages, with no centralized tracking of which models touch customer data, where API keys are hardcoded, or whether data flows meet EU AI Act, SOC 2, or OWASP LLM Top 10 requirements. When an enterprise prospect asks for a risk assessment, nobody has the answers.


The Threat Landscape

Through our research, we identified eight threat categories that cover the AI agent attack surface:

| Kingdom | Examples |
|---|---|
| Destructive Ops | rm -rf /, disk wipes, fork bombs |
| Credential Exposure | SSH key theft, AWS credential harvesting, API token leaks |
| Data Exfiltration | Network egress, base64 encoding, cloud data copy |
| Unauthorized Execution | curl \| bash, eval injection, uncontrolled model calls |
| Privilege Escalation | sudo abuse, setuid manipulation, container escape |
| Persistence & Evasion | Crontab injection, log deletion, disabling security tools |
| Supply Chain | Dependency confusion, registry tampering, lock file manipulation |
| Reconnaissance | Network scanning, process enumeration, lateral movement |

Each kingdom is defined in a shared machine-readable YAML taxonomy — the common vocabulary that connects static scan findings to runtime enforcement rules.
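
To make this concrete, a taxonomy entry might look like the following. This is an illustrative sketch only — the field names and IDs are assumptions, not the shipped schema:

```yaml
# Hypothetical taxonomy entry; real field names may differ.
kingdoms:
  - id: credential-exposure
    severity: critical
    description: Theft or leakage of secrets and credentials
    examples:
      - ssh-key-theft
      - aws-credential-harvesting
      - api-token-leak
```

Because both the scanner and the enforcer read the same file, a finding tagged `credential-exposure` at scan time maps directly to an enforcement rule in the same kingdom at runtime.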


Part 1: Static Analysis for AI Usage

The AI Risk Compliance Platform extends the SAST paradigm to AI-specific risks. Traditional tools find SQL injection and XSS. We find uncontrolled model invocations, leaked API keys, customer data flowing into LLM calls, and ungoverned agent framework usage — across Python, TypeScript, Go, Java, and Ruby.

Architecture

The platform is built around six bounded contexts.

What Gets Detected

The scanner uses Semgrep CE with custom rule packs built for AI risk:

| Rule Pack | What It Finds | Languages |
|---|---|---|
| ai-sdk-imports | OpenAI, Anthropic, LangChain, Vertex AI, Bedrock usage | Python, TS, Go, Java, Ruby |
| ai-api-keys | Hardcoded API keys and env var references | Python, TS, Java |
| ai-model-refs | Model name strings (gpt-4, claude-3, etc.) | Python |
| ai-data-flows | Customer data flowing into LLM API calls | Python, JS, TS |

The taint analysis is especially powerful. Using Semgrep's labeled taint mode, we track data from HTTP inputs, database queries, and auth sessions through to LLM API sinks:

DETECTED: customer data reaches LLM call

@app.route("/chat")
def chat():
    user_msg = request.get_json()["message"]    # source: CUSTOMER_DATA
    response = openai.chat.completions.create(  # sink: LLM API
        model="gpt-4",
        messages=[{"role": "user", "content": user_msg}]
    )
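
A Semgrep taint rule of roughly this shape catches the flow above. The rule ID and patterns here are illustrative, not the shipped rule pack (which also uses labeled sources for finer-grained tracking):

```yaml
rules:
  - id: customer-data-to-llm   # illustrative id, not the actual rule
    mode: taint
    languages: [python]
    severity: ERROR
    message: Customer data reaches an LLM API call
    pattern-sources:
      - pattern: flask.request.get_json()
    pattern-sinks:
      - pattern: openai.chat.completions.create(...)
```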

Risk Scoring

Raw findings are mapped to the threat taxonomy and scored:

RiskScore = baseScore × confidenceWeight × taxonomyBoost
  • Base score by finding type: api_key = 8, sdk_import = 7, model_ref = 6

  • Confidence weight from Semgrep rule metadata

  • Taxonomy boost from threat severity: critical = 1.5×, high = 1.3×

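
A minimal sketch of the formula, using the example weights from the post (the medium/low boost values shown are assumptions):

```python
# Base scores and critical/high boosts are the post's examples;
# medium/low boosts are assumed defaults.
BASE_SCORE = {"api_key": 8, "sdk_import": 7, "model_ref": 6}
TAXONOMY_BOOST = {"critical": 1.5, "high": 1.3, "medium": 1.0, "low": 1.0}

def risk_score(finding_type: str, confidence_weight: float, severity: str) -> float:
    """RiskScore = baseScore x confidenceWeight x taxonomyBoost."""
    return BASE_SCORE[finding_type] * confidence_weight * TAXONOMY_BOOST[severity]

# A hardcoded API key, found by a full-confidence rule, in a critical kingdom:
print(risk_score("api_key", 1.0, "critical"))  # 8 x 1.0 x 1.5 = 12.0
```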
The risk engine then maps findings to compliance standards — OWASP LLM Top 10 2025, EU AI Act, and SOC 2 — and surfaces coverage gaps.


Part 2: Runtime Enforcement with AgentShield

Static analysis tells you what risks exist. AgentShield stops them from materializing.

AgentShield is a deterministic policy gate between AI agents and the OS. Every shell command and MCP tool call passes through a six-layer analyzer pipeline before execution:

| Layer | What It Does | Example |
|---|---|---|
| Regex | Fast pattern matching | rm -rf /, curl \| bash |
| Structural | Shell AST parsing, flag normalization | --recursive → -r, sudo unwrapping |
| Semantic | Intent classification | "file-delete", "network-exfil", "code-execute" |
| Dataflow | Source→sink taint through pipes | cat ~/.ssh/id_rsa \| base64 \| curl |
| Stateful | Multi-step attack chain detection | wget ... && chmod +x && ./payload |
| Guardian | Prompt injection and obfuscation | Unicode smuggling, base64 payloads |

When layers disagree, most-restrictive-wins: BLOCK > AUDIT > ALLOW.
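
The aggregation rule is simple enough to sketch in a few lines. The type names here are hypothetical — AgentShield's internals may differ — but the ordering is the point:

```python
from enum import IntEnum

# Hypothetical names; the key property is that a higher value
# is more restrictive, so max() implements most-restrictive-wins.
class Verdict(IntEnum):
    ALLOW = 0
    AUDIT = 1
    BLOCK = 2

def combine(layer_verdicts: list) -> Verdict:
    """Most-restrictive-wins: BLOCK > AUDIT > ALLOW."""
    return max(layer_verdicts, default=Verdict.ALLOW)

# Four layers allow, one audits, one blocks -- the block wins:
print(combine([Verdict.ALLOW] * 4 + [Verdict.AUDIT, Verdict.BLOCK]).name)  # BLOCK
```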

MCP Mediation

AgentShield also intercepts Model Context Protocol tool calls — the emerging standard for agent-to-server communication. It evaluates:

  • Blocked tools — an always-deny list for dangerous tool names

  • Argument patterns — glob/regex matching on call arguments

  • Content scanning — detects SSH keys, AWS credentials, base64 blobs in arguments

  • Value limits — numeric thresholds against uncontrolled resource commitment

  • Config file guards — blocks writes to IDE hooks, shell dotfiles, and AgentShield's own policy

  • Tool description poisoning — scans tools/list for hidden instructions

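The content-scanning check, for instance, boils down to pattern matching over argument values. This is an illustrative sketch, not AgentShield's implementation — real secret detection needs many more signatures plus entropy heuristics:

```python
import re

# Two illustrative signatures; a production scanner carries far more.
SECRET_PATTERNS = {
    "ssh_private_key": re.compile(r"-----BEGIN (?:RSA |OPENSSH |EC )?PRIVATE KEY-----"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def scan_mcp_arguments(arguments: dict) -> list:
    """Return the names of secret patterns found in any string argument."""
    hits = []
    for value in arguments.values():
        if not isinstance(value, str):
            continue
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(value):
                hits.append(name)
    return hits

print(scan_mcp_arguments({"body": "key=AKIAABCDEFGHIJKLMNOP"}))  # ['aws_access_key_id']
```
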
Red-Team Results

| Test Suite | Cases | Pass Rate |
|---|---|---|
| Shell threat detection | 123 | 100% (0 false negatives) |
| Guardian (prompt injection, obfuscation) | 21 | 100% |
| MCP stdio integration (real proxy, real server) | 24 | 100% |

The Closed Loop: Detect → Enforce → Monitor

The real value isn't either tool in isolation — it's the governance loop they form together:

Step 1 — Detect. Scan your codebase with comply scan. The platform finds every AI SDK import, API key reference, model invocation, and unsafe data flow.

Step 2 — Assess. The risk engine maps findings to the threat taxonomy, computes risk scores, and identifies compliance gaps against OWASP LLM Top 10, EU AI Act, and SOC 2.

Step 3 — Enforce. The policy generator produces an AgentShield-compatible YAML policy pack. Install it, and AgentShield blocks the specific threats your codebase is exposed to.

Step 4 — Monitor. AgentShield's audit log feeds back into the compliance platform for trend analysis and audit-ready reporting.

The full loop in four commands

bin/comply scan -path ./my-project -taxonomy third_party/agentshield/taxonomy
bin/comply generate-policy -path ./my-project -org "my-startup" -output policy.yaml
cp policy.yaml ~/.agentshield/mcp-packs/
bin/comply report -path ./my-project -output compliance-report.md

Why Now

The AI governance market is projected to grow from $308M in 2025 to $3.6B by 2033. The EU AI Act is entering enforcement. Enterprise buyers and investors are increasingly requiring documented AI risk assessments as part of due diligence.

Yet existing tools weren't built for this:

| Tool | The Gap |
|---|---|
| Vanta, Drata | General compliance — no AI-specific scanning or agent mediation |
| Fortify, Snyk, Semgrep | Find SQLi and XSS — not LLM data flows or agent threats |
| Guardrails AI, NeMo Guardrails | Filter prompts/outputs — don't govern shell commands or MCP calls |
| Manual audits | Stale the moment the code changes |

The combination of AI-specific SAST and deterministic runtime enforcement is a new category. Nothing else covers both the code level and the runtime layer for AI agent security.


Getting Started

AI Risk Compliance Platform

git clone --recurse-submodules https://github.com/security-researcher-ca/AI_risk_compliance.git
cd AI_risk_compliance
make setup-semgrep   # downloads bundled Semgrep CE binary
make test-rules      # verify 88+ rule test assertions pass
make build           # build the comply CLI
bin/comply scan -path /path/to/your/project

AgentShield

brew tap security-researcher-ca/tap
brew install agentshield

Protect your IDE agent

agentshield setup claude-code   # or: windsurf, cursor, openclaw

Protect MCP tool servers

agentshield setup mcp

Generate Policies from Scan Results

bin/comply generate-policy \
  -path ./my-project \
  -taxonomy third_party/agentshield/taxonomy \
  -org "my-startup" \
  -output ~/.agentshield/mcp-packs/comply-rules.yaml

What's Next

  • Cross-file taint analysis — tracking data flows across module boundaries

  • Additional language analyzers — expanding SAST coverage beyond the current six languages

  • eBPF-based enforcement — kernel-level interception for agents that bypass user-space wrappers

  • Remote audit log forwarding — shipping enforcement logs to SIEM systems for enterprise SOC integration

  • CI/CD integration — failing builds on AI risk threshold violations (GitHub Action already available)


Closing Thoughts

AI agents are powerful, unsupervised, and overprivileged. Most security tooling was designed for a pre-agent world. Closing that gap requires coverage at two layers:

  1. Static analysis to discover and assess AI risks in the codebase before deployment

  2. Runtime enforcement to block dangerous actions as agents execute

The AI Risk Compliance Platform and AgentShield provide this as open-source tools, connected by a shared threat taxonomy and a policy generation pipeline. Scan findings produce the policies that the runtime enforces — a loop that gets smarter as the codebase evolves.

The era of "trust the agent" is over. Verify, then enforce.


Both projects are open-source (Apache 2.0).

Questions or collaboration? Open an issue on either repo.
