Preproduction Control for AI Agents

Diving deeper into

CISO at F500 Company on automating security operations with AI agents

Interview
We also use that same environment to perform activities like penetration testing and prompt injection testing to verify that it would be safe to use in the production environment.
Analyzed 5 sources

This testing setup shows that enterprise AI security is becoming a pre production control problem, not just a vendor review problem. The important shift is that agents are treated like semi trusted operators with permissions, tool access, and failure modes that need live rehearsal. In practice, the test environment is where teams see whether an agent misreads alerts, takes the wrong action, or follows hostile instructions before it ever touches Splunk, Jira, or production data.

  • Penetration testing and prompt injection testing cover different risks. Pen testing checks whether the surrounding system can be broken into. Prompt injection testing checks whether crafted text can steer the model into ignoring rules, leaking data, or misusing tools. OWASP now treats prompt injection as the top LLM application risk.
  • The reason this can happen in one staging environment is that the company is building agents inside an existing security stack, on top of Splunk, Jira, GitHub, and Bitbucket, with humans still approving outcomes. That makes the safest first step a closed replica where the team can log every suggestion and compare agent behavior to analyst judgment.
  • This mirrors a broader market shift toward red teaming and control layers for agentic systems. Security products like Promptfoo and DryRun are built around finding prompt injection, tool misuse, and unsafe output paths before deployment, while OpenAI recommends structured outputs, tool approvals, and evals to limit malicious instruction flow.

The next stage is that these staging environments become permanent proving grounds for higher autonomy. As security teams move from alert triage to auto closing duplicates, false positives, and eventually remediation, the winners will be the teams that can continuously test agents against new attack prompts and operational edge cases before expanding production permissions.