Promptfoo
Valuation & Funding
Promptfoo's most recent disclosed post-money valuation was approximately $85.5M, set at the time of its $18.4M Series A in July 2025, led by Insight Partners with participation from Andreessen Horowitz.
The company raised a $5M seed round in July 2024, led by Andreessen Horowitz, with angel participation from Tobi Lütke, Stanislav Vishnevskiy, and Frederic Kerrest.
Total disclosed primary funding raised across both rounds is $23.4M.
Product
Promptfoo is an AI application security platform for engineering and security teams to identify how their LLM-powered products can fail before those products reach users.
The core workflow starts in a command-line interface. A developer installs Promptfoo via npm, pip, or Homebrew, writes a YAML configuration file that describes the target AI system, and then runs either an evaluation or a red-team scan. That YAML config is version-controllable and reproducible, so the same test suite can be re-run in CI/CD as a regression gate before every deployment.
Promptfoo connects to a target system through a generic HTTP API endpoint, a Python or JavaScript wrapper, a shell script, a browser session, or a direct provider integration. It can test a customer-support chatbot, an internal knowledge assistant, an agent with tool access, or a raw model, without requiring any changes to the application being tested.
The red-team capability is a core part of the platform. Rather than requiring a human security engineer to manually invent adversarial prompts, Promptfoo generates them automatically using a library of 134 plugins organized into categories: brand risk, compliance and legal, dataset issues, security and access control, trust and safety, and custom. Each plugin is a trained model that produces targeted attack payloads for a specific vulnerability class, prompt injection, jailbreaks, PII leakage, insecure tool use, RAG poisoning, document exfiltration, business rule violations, and more.
The distinction between testing a raw model and testing an application matters here. Promptfoo is designed to find failures that only appear when a model is embedded in a real workflow. If a company has an internal knowledge assistant built on a RAG pipeline, Promptfoo can test whether a malicious document in the knowledge base can override the system prompt, whether the assistant leaks confidential source documents, whether retrieved content can be used to exfiltrate data through tool calls, and whether citations can be fabricated under adversarial pressure.
For agent systems connected via Model Context Protocol, Promptfoo can test the MCP server itself as the target, evaluating whether the agent exercises excessive permissions, exposes sensitive data through tool calls, or accepts instructions from unapproved external servers. The enterprise MCP Proxy product sits between users and MCP servers as a network control layer, enforcing whitelists, logging all tool interactions, and alerting on policy violations.
The code scanning product moves the security check earlier in the development process. A VS Code extension scans code on save and surfaces inline diagnostics for LLM-specific issues like prompt injection vectors, PII exposure, and improper output handling. The same checks run in GitHub Actions, commenting on pull requests with findings and suggested fixes before code is merged.
Once a red-team scan completes, Promptfoo generates a remediation report that maps each discovered vulnerability to a prioritized fix, includes real attack examples from the scan, and provides code-level remediation guidance. Enterprise teams get a shared dashboard where findings can be filtered by severity, target, and risk category, and where security and compliance stakeholders can track posture across a portfolio of AI applications.
The adaptive guardrails feature closes the loop from discovery to defense. Vulnerabilities found during red-team scans are used to generate or update blocking policies that apply at inference time, covering model inputs, outputs, tool-call inputs, and tool-call outputs. Promptfoo also integrates with third-party guardrail systems including OpenAI Moderation, Microsoft Presidio, Azure AI Content Safety, and AWS Bedrock Guardrails.
The ModelAudit engine, open-sourced in early March 2026, extends the platform's scope to model artifact security. It statically scans ML model files across 42+ formats for unsafe loading behaviors, known CVEs, and suspicious artifacts, without executing the model or importing ML frameworks. This covers the supply-chain risk of using third-party or open-source model weights, not just the runtime behavior of deployed applications.
Promptfoo supports providers including OpenAI, Anthropic, Google, Azure, AWS Bedrock, Mistral, Cohere, Hugging Face, IBM watsonx, LiteLLM, and OpenRouter, as well as custom Python, JavaScript, and HTTP providers. Teams can use it to test applications regardless of which model or cloud they've standardized on.
Business Model
Promptfoo operates as an open-core B2B platform. The open-source CLI is MIT-licensed, runs entirely locally for core evaluation workflows, and can be installed in minutes without sending prompts off-machine. That free tier is not a stripped-down demo. It includes all LLM evaluation features, all provider integrations, red teaming up to the probe limit, and local vulnerability scanning. The free tier functions as the primary acquisition channel rather than a cost center.
The economic logic is that broad developer adoption at zero CAC creates a large funnel of teams that have already validated the product in real workflows before any sales conversation happens. When those teams need shared visibility, policy enforcement, compliance documentation, or production deployment controls, the upgrade path to enterprise is natural rather than disruptive.
Enterprise monetization combines a platform subscription with usage-based expansion. The platform subscription covers the governance and collaboration layer: shared dashboards, RBAC, SSO, centralized compliance reporting, API access, managed cloud or on-prem deployment, SLA-backed support, and professional services. The usage expansion layer is driven by probes. Each probe is a single request made to the target system during red-team testing, and enterprise customers can purchase additional probe capacity beyond the free tier's 10,000 monthly limit.
The probe model has a structural advantage. Because dynamic attack generation and grading require inference compute, Promptfoo's costs scale with usage in a way that flat seat pricing would not capture. By metering on probes, the company aligns its pricing with both the value delivered to customers and its own underlying compute costs. A team running weekly red-team scans across 50 AI applications consumes probes at a fundamentally different rate than a team running a single pre-release check, and the pricing reflects that difference.
The cost structure is mixed. High-margin software features, dashboards, RBAC, reporting, API access, integrations, sit alongside lower-margin inference-heavy capabilities like dynamic red-team generation and managed scanning. On-prem deployment and professional services add implementation complexity and some labor intensity, but they are necessary for regulated buyers in healthcare, financial services, and government who cannot use multi-tenant SaaS.
The architecture's durability comes from the community-to-threat-intelligence loop. A large developer user base exposes Promptfoo to a wider range of real-world failure modes faster than any internal research team could replicate. New attack patterns discovered in the wild get incorporated into the plugin library and deployed automatically to all users. That keeps the product relevant as the threat landscape evolves and gives enterprise buyers confidence that their test coverage is current, a differentiator in a category where the attack surface changes as fast as the underlying technology.
Workflow stickiness compounds over time. Once a team uses Promptfoo in their IDE, in pull request reviews, in CI/CD pipelines, in shared security dashboards, and in compliance reporting workflows, the platform becomes embedded in the software development lifecycle in a way that is difficult to replace without disrupting multiple teams simultaneously.
Competition
Cybersecurity incumbents absorbing the category
The most significant competitive pressure on Promptfoo comes from large cybersecurity platforms that are acquiring AI security specialists and bundling their capabilities into broader enterprise security architectures.
Check Point acquired Lakera, which combines Lakera Red for automated red teaming with Lakera Guard for runtime protection. That combination, pre-deployment testing plus in-line enforcement from a single vendor with global channel reach, is an alternative for enterprises that prefer to consolidate AI security under an existing cybersecurity relationship rather than add a new vendor.
SentinelOne announced an agreement to acquire Prompt Security, which similarly spans red teaming, discovery, remediation, and runtime controls. SentinelOne's endpoint and cloud security go-to-market motion gives Prompt Security distribution into security operations teams that Promptfoo's developer-led motion does not naturally reach.
Cisco's AI Defense product, built around the Robust Intelligence acquisition, brings algorithmic red teaming, runtime guardrails, AI bill of materials, and MCP governance into Cisco's existing enterprise security architecture. The threat here is not product superiority, it is procurement bundling. Regulated enterprises with existing Cisco relationships can add AI security without a new vendor evaluation.
AI-native security specialists
Protect AI competes directly on automated red teaming with its Recon product, which markets a 450+ attack library with weekly updates and framework mappings to OWASP and other standards. Protect AI's broader platform and government credibility, including a Leidos partnership for U.S. government deployments, give it advantages in regulated and public-sector accounts where procurement complexity and compliance packaging matter as much as product quality.
SPLX competes on continuous AI red teaming with a platform narrative oriented toward no-code enterprise workflows and centralized security buying centers. Where Promptfoo's historical strength is developer adoption, SPLX is more naturally aligned with CISO-led purchasing motions.
Noma Security approaches the market from an AI security posture management angle, with an emphasis on discovery, inventory, and governance of sprawling agent deployments. Its control-plane orientation is less developer-native than Promptfoo but more attractive to Fortune 500 security teams trying to govern AI adoption at scale rather than test individual applications.
HiddenLayer adds automated red teaming as one feature of a broader AI security platform that includes model scanning, GenAI detection and response, AI bill of materials, and model genealogy. That platform framing lets HiddenLayer sell red teaming as part of a larger AI security program rather than as a standalone tool.
Open-source substitutes and cloud-native guardrails
Garak, maintained under NVIDIA's GitHub, is the most important open-source substitute. NVIDIA's NeMo Guardrails documentation explicitly uses Garak for vulnerability-scanning workflows, which gives it credibility with infrastructure-heavy AI teams. Garak pressures Promptfoo's free tier by raising the baseline expectation that testing primitives should be available at no cost, which increases pressure on Promptfoo to win on workflow, reporting, collaboration, and managed updates rather than raw scan capability.
Meta's LlamaFirewall is a runtime guardrail alternative for agent and LLM applications, positioned as a final defense layer that teams can deploy without a commercial platform.
AWS Bedrock Guardrails and Microsoft Azure AI Content Safety both include native prompt-attack detection for jailbreaks, prompt injection, and prompt leakage. For enterprises already standardized on a single cloud, these built-in controls can satisfy lighter-weight security requirements without adding a new vendor. The substitution risk is material for first-party use cases where depth of adversarial testing matters less than ease of procurement.
DataRobot occupies adjacent territory in AI observability and monitoring, with prompt and security controls layered into a broader MLOps platform. It is less focused on adversarial red teaming than Promptfoo but competes for the same AI governance budget in enterprises that want a single platform for model monitoring and security.
TAM Expansion
Runtime control and continuous monitoring
Promptfoo's current product set is weighted toward pre-deployment testing, but the adaptive guardrails capability points toward a larger opportunity in always-on runtime defense. The shift from one-time red-team scans to continuous monitoring and policy enforcement changes the product from a testing tool into an operational control plane, a stickier and higher-value category.
The path runs through SIEM and SOAR integrations, incident response hooks, anomaly detection, and audit logging that satisfies compliance requirements under frameworks like the EU AI Act, NIST AI RMF, and ISO 42001. Enterprises moving from AI pilots to production fleets of agents need pre-deployment assurance and ongoing visibility into whether deployed systems are behaving within policy. That demand is growing as the EU AI Act's broad applicability deadline approaches in August 2026, creating a concrete compliance forcing function for enterprises building AI systems for European markets.
Shift-left code security
The VS Code extension, GitHub Action, and CLI code scanning products represent a TAM expansion from security teams validating finished applications to engineering teams securing AI features during development. This broadens the addressable seat count, every developer writing code that touches an LLM becomes a potential user, not just the security engineers running pre-release scans.
The AppSec market analogy is instructive here. Tools like Semgrep and DryRun Security have shown that catching security issues at the code review stage is cheaper for developers and more effective than finding them at runtime. Promptfoo is applying the same shift-left logic to LLM-specific vulnerabilities: prompt injection vectors, PII exposure, excessive agency, and indirect injection pathways that lead to data exfiltration through tool calls. As AI features become standard components of ordinary application code rather than specialized ML projects, the addressable market for this layer expands accordingly.
Agent and MCP governance
The emergence of Model Context Protocol as a standard for connecting AI agents to external tools creates a new governance surface for Promptfoo. The enterprise MCP Proxy product, which sits between users and MCP servers to enforce whitelists, log tool interactions, and alert on policy violations, is not a testing tool but a network control layer. That puts Promptfoo in the same conceptual space as identity and access management platforms like SailPoint, Microsoft Entra, and AWS IAM, but applied to AI agent permissions rather than human user permissions.
As enterprises adopt MCP to connect agents to databases, APIs, and internal tools, the security question shifts from what the model says to what the agent does. Promptfoo's MCP testing and proxying capabilities address that expanded attack surface directly. The adjacent opportunity is in tool discovery and authorization governance, understanding which MCP servers exist in an enterprise environment, which agents can reach them, and whether those connections are approved. Keycard and Defakto are building in this space, and Promptfoo's MCP Proxy gives it a natural wedge into the same budget.
Model supply-chain security
ModelAudit's ability to scan 42+ ML model file formats for unsafe loading behaviors, known CVEs, and suspicious artifacts without executing the model opens a new TAM in AI software supply-chain security. As enterprises increasingly pull open-source model weights from Hugging Face and other repositories, the risk of malicious or compromised model artifacts entering production environments is a growing concern. This is a different buyer motion than prompt security, it touches DevSecOps, artifact scanning, and model governance teams, but it is addressable with the same platform and expands Promptfoo's relevance from the application layer to the infrastructure layer.
Risks
Platform neutrality: Promptfoo's historical value proposition rests on being a model-agnostic testing layer that works across OpenAI, Anthropic, Google, Azure, AWS, and custom providers. Following the OpenAI acquisition, enterprises running multi-model stacks may view Promptfoo as biased toward OpenAI's ecosystem, shifting demand toward neutral alternatives like Protect AI, SPLX, or Noma Security. The risk is not that Promptfoo becomes technically incapable of testing non-OpenAI systems, but that procurement teams in multi-model enterprises choose vendors without a perceived conflict of interest.
Inference cost compression: Dynamic red-team test generation and grading require inference compute, which means Promptfoo's gross margins on its most differentiated capabilities are structurally lower than pure software. As the volume and frequency of automated red-team scans grows, especially for enterprises running continuous monitoring across large agent fleets, the cost of attack generation and grading inference could compress margins unless the company manages model selection, caches attack patterns, or offloads compute to customer infrastructure in on-prem deployments.
Regulatory commoditization: OWASP's LLM Top 10, NIST AI RMF, and the EU AI Act are pushing compliance mapping and audit trail generation toward table stakes rather than differentiators. As AI security vendors add framework mappings and compliance exports, the governance and reporting layer that currently justifies enterprise contract values becomes harder to price at a premium. Promptfoo must deepen its technical coverage of emerging attack surfaces, particularly in agents, MCP, and model supply chains, to remain differentiated from the compliance-checkbox tier of the market.