Securing AI Agents: From Code Scanning to Runtime Enforcement

9 min read
With contributions from: security-researcher-ca
Updated March 14, 2026

AI agents aren't experimental anymore. They write code, run shell commands, call external APIs, and orchestrate complex workflows — usually with the same OS privileges as the developer who launched them. That convenience is real. So is the risk.


This post covers the concrete security problems that arise when LLM-powered agents operate inside development environments and production pipelines, and the two open-source projects we've built to address them:

  • AI Risk Compliance Platform — static analysis that finds AI-specific risks in a codebase

  • AgentShield — a runtime policy gate between AI agents and the OS

Together they form a detect-then-enforce loop: scan the code to find AI risks, then generate and enforce policies that stop those risks at runtime.


The Problem: No Guardrails Between the LLM and the OS

A typical AI coding agent — Windsurf, Cursor, Claude Code — runs with whatever permissions the developer has, which means it can:

  • Read ~/.ssh/id_rsa, ~/.aws/credentials, and .env files

  • Execute arbitrary shell commands (rm -rf /, curl | bash)

  • Call MCP tool servers with unrestricted arguments

  • Install packages from arbitrary registries

  • Modify IDE configuration and hook files

There's no enforcement layer. The agent decides; the OS obeys.

The code that calls AI APIs is equally ungoverned. Teams adopt OpenAI, Anthropic, LangChain, and dozens of other SDKs across multiple languages, with no centralized tracking of which models touch customer data, where API keys are hardcoded, or whether data flows meet EU AI Act, SOC 2, or OWASP LLM Top 10 requirements. When an enterprise prospect asks for a risk assessment, nobody has the answers.


The Threat Landscape

Through our research, we identified eight threat categories that cover the AI agent attack surface:

| Kingdom | Examples |
|---|---|
| Destructive Ops | rm -rf /, disk wipes, fork bombs |
| Credential Exposure | SSH key theft, AWS credential harvesting, API token leaks |
| Data Exfiltration | Network egress, base64 encoding, cloud data copy |
| Unauthorized Execution | curl \| bash, eval injection, uncontrolled model calls |
| Privilege Escalation | sudo abuse, setuid manipulation, container escape |
| Persistence & Evasion | Crontab injection, log deletion, disabling security tools |
| Supply Chain | Dependency confusion, registry tampering, lock file manipulation |
| Reconnaissance | Network scanning, process enumeration, lateral movement |

Each kingdom is defined in a shared machine-readable YAML taxonomy — the common vocabulary that connects static scan findings to runtime enforcement rules.
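
To make this concrete, a taxonomy entry might look like the following. This is an illustrative sketch only — the field names and IDs are assumptions, not the shipped schema:

```yaml
# Hypothetical taxonomy entry; real field names may differ.
kingdoms:
  - id: credential-exposure
    severity: critical
    description: Theft or leakage of secrets and credentials
    examples:
      - ssh-key-theft
      - aws-credential-harvesting
      - api-token-leak
```

Because both the scanner and the enforcer read the same file, a finding tagged `credential-exposure` at scan time maps directly to an enforcement rule in the same kingdom at runtime.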


Part 1: Static Analysis for AI Usage

The AI Risk Compliance Platform extends the SAST paradigm to AI-specific risks. Traditional tools find SQL injection and XSS. We find uncontrolled model invocations, leaked API keys, customer data flowing into LLM calls, and ungoverned agent framework usage — across Python, TypeScript, Go, Java, and Ruby.

Architecture

The platform is built around six bounded contexts.

What Gets Detected

The scanner uses Semgrep CE with custom rule packs built for AI risk:

| Rule Pack | What It Finds | Languages |
|---|---|---|
| ai-sdk-imports | OpenAI, Anthropic, LangChain, Vertex AI, Bedrock usage | Python, TS, Go, Java, Ruby |
| ai-api-keys | Hardcoded API keys and env var references | Python, TS, Java |
| ai-model-refs | Model name strings (gpt-4, claude-3, etc.) | Python |
| ai-data-flows | Customer data flowing into LLM API calls | Python, JS, TS |

The taint analysis is especially powerful. Using Semgrep's labeled taint mode, we track data from HTTP inputs, database queries, and auth sessions through to LLM API sinks:

DETECTED: customer data reaches LLM call

@app.route("/chat")
def chat():
    user_msg = request.get_json()["message"]    # source: CUSTOMER_DATA
    response = openai.chat.completions.create(  # sink: LLM API
        model="gpt-4",
        messages=[{"role": "user", "content": user_msg}]
    )
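
A Semgrep taint rule of roughly this shape catches the flow above. The rule ID and patterns here are illustrative, not the shipped rule pack (which also uses labeled sources for finer-grained tracking):

```yaml
rules:
  - id: customer-data-to-llm   # illustrative id, not the actual rule
    mode: taint
    languages: [python]
    severity: ERROR
    message: Customer data reaches an LLM API call
    pattern-sources:
      - pattern: flask.request.get_json()
    pattern-sinks:
      - pattern: openai.chat.completions.create(...)
```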

Risk Scoring

Raw findings are mapped to the threat taxonomy and scored:

RiskScore = baseScore × confidenceWeight × taxonomyBoost
  • Base score by finding type: api_key = 8, sdk_import = 7, model_ref = 6

  • Confidence weight from Semgrep rule metadata

  • Taxonomy boost from threat severity: critical = 1.5×, high = 1.3×

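
A minimal sketch of the formula, using the example weights from the post (the medium/low boost values shown are assumptions):

```python
# Base scores and critical/high boosts are the post's examples;
# medium/low boosts are assumed defaults.
BASE_SCORE = {"api_key": 8, "sdk_import": 7, "model_ref": 6}
TAXONOMY_BOOST = {"critical": 1.5, "high": 1.3, "medium": 1.0, "low": 1.0}

def risk_score(finding_type: str, confidence_weight: float, severity: str) -> float:
    """RiskScore = baseScore x confidenceWeight x taxonomyBoost."""
    return BASE_SCORE[finding_type] * confidence_weight * TAXONOMY_BOOST[severity]

# A hardcoded API key, found by a full-confidence rule, in a critical kingdom:
print(risk_score("api_key", 1.0, "critical"))  # 8 x 1.0 x 1.5 = 12.0
```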
The risk engine then maps findings to compliance standards — OWASP LLM Top 10 2025, EU AI Act, and SOC 2 — and surfaces coverage gaps.


Part 2: Runtime Enforcement with AgentShield

Static analysis tells you what risks exist. AgentShield stops them from materializing.

AgentShield is a deterministic policy gate between AI agents and the OS. Every shell command and MCP tool call passes through a six-layer analyzer pipeline before execution:

| Layer | What It Does | Example |
|---|---|---|
| Regex | Fast pattern matching | rm -rf /, curl \| bash |
| Structural | Shell AST parsing, flag normalization | --recursive → -r, sudo unwrapping |
| Semantic | Intent classification | "file-delete", "network-exfil", "code-execute" |
| Dataflow | Source→sink taint through pipes | cat ~/.ssh/id_rsa \| base64 \| curl |
| Stateful | Multi-step attack chain detection | wget ... && chmod +x && ./payload |
| Guardian | Prompt injection and obfuscation | Unicode smuggling, base64 payloads |

When layers disagree, most-restrictive-wins: BLOCK > AUDIT > ALLOW.
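
The aggregation rule is simple enough to sketch in a few lines. The type names here are hypothetical — AgentShield's internals may differ — but the ordering is the point:

```python
from enum import IntEnum

# Hypothetical names; the key property is that a higher value
# is more restrictive, so max() implements most-restrictive-wins.
class Verdict(IntEnum):
    ALLOW = 0
    AUDIT = 1
    BLOCK = 2

def combine(layer_verdicts: list) -> Verdict:
    """Most-restrictive-wins: BLOCK > AUDIT > ALLOW."""
    return max(layer_verdicts, default=Verdict.ALLOW)

# Four layers allow, one audits, one blocks -- the block wins:
print(combine([Verdict.ALLOW] * 4 + [Verdict.AUDIT, Verdict.BLOCK]).name)  # BLOCK
```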

MCP Mediation

AgentShield also intercepts Model Context Protocol tool calls — the emerging standard for agent-to-server communication. It evaluates:

  • Blocked tools — an always-deny list for dangerous tool names

  • Argument patterns — glob/regex matching on call arguments

  • Content scanning — detects SSH keys, AWS credentials, base64 blobs in arguments

  • Value limits — numeric thresholds against uncontrolled resource commitment

  • Config file guards — blocks writes to IDE hooks, shell dotfiles, and AgentShield's own policy

  • Tool description poisoning — scans tools/list for hidden instructions

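The content-scanning check, for instance, boils down to pattern matching over argument values. This is an illustrative sketch, not AgentShield's implementation — real secret detection needs many more signatures plus entropy heuristics:

```python
import re

# Two illustrative signatures; a production scanner carries far more.
SECRET_PATTERNS = {
    "ssh_private_key": re.compile(r"-----BEGIN (?:RSA |OPENSSH |EC )?PRIVATE KEY-----"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def scan_mcp_arguments(arguments: dict) -> list:
    """Return the names of secret patterns found in any string argument."""
    hits = []
    for value in arguments.values():
        if not isinstance(value, str):
            continue
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(value):
                hits.append(name)
    return hits

print(scan_mcp_arguments({"body": "key=AKIAABCDEFGHIJKLMNOP"}))  # ['aws_access_key_id']
```
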
Red-Team Results

| Test Suite | Cases | Pass Rate |
|---|---|---|
| Shell threat detection | 123 | 100% (0 false negatives) |
| Guardian (prompt injection, obfuscation) | 21 | 100% |
| MCP stdio integration (real proxy, real server) | 24 | 100% |

The Closed Loop: Detect → Enforce → Monitor

The real value isn't either tool in isolation — it's the governance loop they form together:

Step 1 — Detect. Scan your codebase with comply scan. The platform finds every AI SDK import, API key reference, model invocation, and unsafe data flow.

Step 2 — Assess. The risk engine maps findings to the threat taxonomy, computes risk scores, and identifies compliance gaps against OWASP LLM Top 10, EU AI Act, and SOC 2.

Step 3 — Enforce. The policy generator produces an AgentShield-compatible YAML policy pack. Install it, and AgentShield blocks the specific threats your codebase is exposed to.

Step 4 — Monitor. AgentShield's audit log feeds back into the compliance platform for trend analysis and audit-ready reporting.

The full loop in four commands

bin/comply scan -path ./my-project -taxonomy third_party/agentshield/taxonomy
bin/comply generate-policy -path ./my-project -org "my-startup" -output policy.yaml
cp policy.yaml ~/.agentshield/mcp-packs/
bin/comply report -path ./my-project -output compliance-report.md

Why Now

The AI governance market is projected to grow from $308M in 2025 to $3.6B by 2033. The EU AI Act is entering enforcement. Enterprise buyers and investors are increasingly requiring documented AI risk assessments as part of due diligence.

Yet existing tools weren't built for this:

| Tool | The Gap |
|---|---|
| Vanta, Drata | General compliance — no AI-specific scanning or agent mediation |
| Fortify, Snyk, Semgrep | Find SQLi and XSS — not LLM data flows or agent threats |
| Guardrails AI, NeMo Guardrails | Filter prompts/outputs — don't govern shell commands or MCP calls |
| Manual audits | Stale the moment the code changes |

The combination of AI-specific SAST and deterministic runtime enforcement is a new category. Nothing else covers both the code level and the runtime layer for AI agent security.


Getting Started

AI Risk Compliance Platform

git clone --recurse-submodules https://github.com/security-researcher-ca/AI_risk_compliance.git
cd AI_risk_compliance
make setup-semgrep   # downloads bundled Semgrep CE binary
make test-rules      # verify 88+ rule test assertions pass
make build           # build the comply CLI
bin/comply scan -path /path/to/your/project

AgentShield

brew tap security-researcher-ca/tap
brew install agentshield

Protect your IDE agent

agentshield setup claude-code   # or: windsurf, cursor, openclaw

Protect MCP tool servers

agentshield setup mcp

Generate Policies from Scan Results

bin/comply generate-policy \
  -path ./my-project \
  -taxonomy third_party/agentshield/taxonomy \
  -org "my-startup" \
  -output ~/.agentshield/mcp-packs/comply-rules.yaml

What's Next

  • Cross-file taint analysis — tracking data flows across module boundaries

  • Additional language analyzers — expanding SAST coverage beyond the current six languages

  • eBPF-based enforcement — kernel-level interception for agents that bypass user-space wrappers

  • Remote audit log forwarding — shipping enforcement logs to SIEM systems for enterprise SOC integration

  • CI/CD integration — failing builds on AI risk threshold violations (GitHub Action already available)


Closing Thoughts

AI agents are powerful, unsupervised, and overprivileged. Most security tooling was designed for a pre-agent world. Closing that gap requires coverage at two layers:

  1. Static analysis to discover and assess AI risks in the codebase before deployment

  2. Runtime enforcement to block dangerous actions as agents execute

The AI Risk Compliance Platform and AgentShield provide this as open-source tools, connected by a shared threat taxonomy and a policy generation pipeline. Scan findings produce the policies that the runtime enforces — a loop that gets smarter as the codebase evolves.

The era of "trust the agent" is over. Verify, then enforce.


Both projects are open-source (Apache 2.0).

Questions or collaboration? Open an issue on either repo.
