Monitoring and A/B Testing Stack
Chris Lu, co-founder of Copy.ai, on the future of generative AI
This is what turns an AI writing app into real software infrastructure. Once Copy.ai was routing work across 20 to 30 fine-tuned models, the hard part stopped being generation and became quality control: deciding which model to call, measuring whether users actually liked the output, and shipping improvements fast enough to keep pace with model changes. A large user base matters because it makes these tests statistically useful within a day.
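To make the "statistically useful within a day" point concrete, here is a minimal sketch of the kind of significance check an A/B test between two model variants might use. Everything here is illustrative, not Copy.ai's actual stack: the function name, the "copied the output" metric, and the traffic numbers are assumptions; the statistic itself is a standard two-proportion z-test.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-proportion z-test: did variant B's acceptance rate differ from A's?"""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# With heavy traffic, even a small lift clears significance quickly:
# 10,000 impressions per arm, 21% vs 23% "copied the output" rate.
z = two_proportion_z(2100, 10000, 2300, 10000)
print(abs(z) > 1.96)  # True -> significant at the 5% level
```

At a tenth of that traffic (1,000 impressions per arm) the same 2-point lift would not clear the 1.96 threshold, which is the sense in which scale buys faster iteration.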
-
The monitoring stack is tied to concrete user signals. Copy.ai tracks actions like copying, saving, and rewriting output, then uses those signals as training data for newer models. That turns normal product usage into a feedback loop for ranking prompts, swapping models, and retraining task-specific systems.
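The feedback loop described above can be sketched as follows. This is a minimal illustration, not Copy.ai's implementation: the signal weights, class name, and event shape are all assumptions. The core idea is that the same event stream feeds both a per-model preference score (for routing) and labeled examples (for retraining).

```python
from collections import defaultdict

# Implicit feedback: copying or saving suggests the user accepted the output,
# rewriting suggests it missed the mark. These weights are illustrative.
SIGNAL_WEIGHTS = {"copied": 1.0, "saved": 1.0, "rewrote": -1.0}

class FeedbackTracker:
    """Aggregates user actions per model into a preference score,
    and turns the same events into labeled examples for retraining."""

    def __init__(self):
        self.scores = defaultdict(float)
        self.counts = defaultdict(int)

    def record(self, model_id, action):
        self.scores[model_id] += SIGNAL_WEIGHTS.get(action, 0.0)
        self.counts[model_id] += 1

    def preference(self, model_id):
        # Mean signal in [-1, 1]; higher means users liked the output more.
        n = self.counts[model_id]
        return self.scores[model_id] / n if n else 0.0

    def training_examples(self, events):
        # Map raw events to (prompt, output, label) tuples for fine-tuning.
        return [(e["prompt"], e["output"], SIGNAL_WEIGHTS.get(e["action"], 0.0))
                for e in events]
```

In a routing setup, `preference()` would bias traffic toward models users respond well to, while `training_examples()` feeds the retraining pipeline, so ordinary usage does double duty as evaluation data.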
-
This became more important as Copy.ai moved from one-off copy generation into repeatable GTM workflows. In the newer product, the system researches accounts, drafts outreach, enriches CRM records, and can run inside tools like Salesforce and HubSpot. In that setup, a bad model call breaks a business process, not just a paragraph.
-
The closest comparable is Jasper, but the strategic fork is different. Jasper leaned toward an AI layer that follows users across apps, while Copy.ai pushed toward workflow automation inside revenue teams. In both cases, iteration speed matters because model quality keeps improving and the app layer has limited raw model defensibility.
Going forward, the winners in application-layer AI will look less like template libraries and more like orchestration systems with built-in evals, routing, and guardrails. As Copy.ai keeps moving into enterprise workflows, the monitoring and A/B testing layer becomes part of the product moat, because it helps the company improve output quality, lower model costs, and prove ROI inside real business systems.
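The routing-plus-guardrails pattern can be sketched briefly. This is a hypothetical shape, not Copy.ai's code: the registry, `generate` stub, and guardrail rule are all assumptions. The point is the one made above about workflows: when output feeds a CRM, a bad model call should fail loudly before it reaches the business system.

```python
# Illustrative task-to-model registry; names are invented for the sketch.
MODEL_REGISTRY = {
    "account_research": "research-model-v3",
    "outreach_draft": "outreach-model-v7",
}

def generate(model_id, prompt):
    # Stand-in for a real model call.
    return f"[{model_id}] draft for: {prompt}"

def guardrail(output, max_len=2000):
    # Reject empty or oversized output before it touches a business system.
    return bool(output.strip()) and len(output) <= max_len

def run_step(task, prompt):
    model_id = MODEL_REGISTRY.get(task)
    if model_id is None:
        raise ValueError(f"no model routed for task {task!r}")
    output = generate(model_id, prompt)
    if not guardrail(output):
        # Fail the workflow step rather than silently writing bad data.
        raise RuntimeError(f"guardrail rejected output for task {task!r}")
    return output
```

In a real orchestration system the guardrail would be richer (schema checks, eval scores, policy filters), but the control flow is the same: route, generate, validate, and only then commit the result.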