Background
Mike Knoop is co-founder and CPO at Zapier. We talked to Mike about Zapier's new OpenAI integration and ChatGPT plugin, how LLMs will change the Zapier paradigm of trigger-and-action, and how Zapier rebuilt their API to serve input from LLMs.
Questions
- How did the partnership between Zapier and OpenAI come about?
- How does the Zapier Natural Language Actions API work at a high level?
- How does the Zapier Natural Language Actions API work on the backend?
- Can you talk about the design decisions you made in the user workflow based on the fact that Natural Language Actions is powered by AI that “guesses” what the intended action is, versus having that being created explicitly by the user when they create a Zap? Audit trail, undo, embedded point and click interfaces—how does the user gain confidence in NLA?
- The Zapier ChatGPT plugin works on the NLA API. What was involved in building the plugin and how does it work? Is it generalizable, e.g., to a Zapier Slackbot powered by the NLA API and any arbitrary chat interface (vs only working with chat interfaces to LLMs)?
- Will we see every app having a chatbot/assistant interface?
- On the flipside of every product having a chat interface, might we see every API having some kind of natural language endpoint?
- Zapier was an early launch partner with OpenAI for ChatGPT plugins. How might that collaboration evolve over time?
- Zapier is something of a productivity App Store. Can you talk about how you see the landscape of productivity app stores, particularly with the new addition from ChatGPT Plugins?
- What does a post-chat centric AI future look like in terms of UI?
- There are other LLMs besides OpenAI’s—Cohere, Google’s Bard and more popping up every day. How does Zapier think about the LLM space and where to partner? Is it important for Zapier to support open source LLMs to ensure there’s an alternative to closed-source systems? Might you look to vertically integrate with your own LLM trained for a Zapier specific context?
- A Zapier native chat assistant seems like an obvious move. Can you talk about the benefits to launching Zapier’s own vertically integrated chat experience on its own site with its own LLM or fine-tuned model? What do you think happens to the point and click GUI for creating zaps in a world of LLMs?
- With NLA, you can accomplish with one action what might take you multiple different steps in a traditional zap. How do you think about the ‘risk’ that the NLA API would ‘cannibalize’ usage of regular zaps in the long run given Zapier charges per task?
- Auth is a point of friction in all of these app workflows that won’t go away with LLMs, where Zapier already has a large user base with tens of millions of auths into apps. Can you talk about Zapier’s single-sign-on-esque moat? How does it grow where Zapier is likely the first point of integration into competing chat & productivity interfaces?
- Security/privacy has become one of the big topics of concern around ChatGPT since its launch. How does Zapier ensure the security and privacy of user data when using the NLA API, particularly when accessing multiple third-party apps and how do you think about security more broadly?
- The rise of AI and LLMs seems like a massive opportunity for Zapier to drive another step function in growth. If Zapier plays its hand right over the next few years, what does success look like and how does the company evolve?
- Is there a risk that OpenAI turns Zapier into just another provider of services inside their chat interface?
Interview
I went all in on AI language model research last summer. My primary focus has been on reasoning and complex tool use. I was particularly excited about the ReAct paper, published in January, which highlighted capabilities of GPT-3 that most people were not aware of, such as self-reasoning and looping, which enable more complex behavior than what's available from the base model.
One of the big problems with GPT-3 (or any LLM right now) is that it's frozen in time the moment training is finished, which means it can't access more recent information. It also can't interact with the data or systems users generally want it to integrate with.
That seemed like the next big frontier and what the whole industry was racing towards. We saw the first evidence of this with search in Bing and Bard, but complex tool use for LLMs goes far beyond search.
We started playing around with how to expose Zapier's ecosystem of thousands of actions to LLMs. We also encountered some problems—one of the big ones being that AI language models have limited prompt context windows today, which constrains both how many tools you can equip them with and how you handle arbitrary return data.
For example, the Gmail API is 10,000 tokens, which couldn't fit in any off-the-shelf language model as of last November. And most of the data coming back from that API is for machines, not humans: things like payload headers, HTML, ID numbers, etc. We spent a great deal of work on sanitizing returns so it's safe to inject back into arbitrary prompts.
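That sanitizing step can be illustrated as a simple whitelist filter over the raw payload; the field names below are invented for illustration, not the actual Gmail schema:

```python
# Illustrative sketch: reduce a raw Gmail-style API payload to the few
# human- and LLM-relevant fields, dropping machine noise like payload
# headers, thread IDs, and internal IDs. Field names are hypothetical.

def sanitize_email_payload(raw: dict) -> dict:
    """Keep only the fields a person (or a prompt) actually needs."""
    keep = ("from", "to", "subject", "date", "snippet")
    return {k: raw[k] for k in keep if k in raw}

raw = {
    "id": "187c9a2f1b",            # internal ID -- machine noise
    "threadId": "187c9a2f1a",      # more machine noise
    "payload": {"headers": ["..."]},
    "from": "jan@example.com",
    "subject": "Q3 report",
    "snippet": "Here is the report you asked for...",
}
print(sanitize_email_payload(raw))
```

The result is small and safe to splice back into another prompt, which is the point of the trimming described above.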
We've known Sam and Greg at OpenAI for a while now (I recall Sam was a partner while we were going through YC in 2012).
They approached us and were like, "Hey, we're going to be working on this plug-in thing. Zapier seems relevant for this." We were definitely interested and had a proof of concept from our work, so we decided to turn our learnings into an API that was publicly consumable by both ChatGPT and anyone else trying to equip LLMs with tools. ChatGPT helped us accelerate our plans to put that out in the world.
We believe that the natural language API will be a useful tool for OpenAI with ChatGPT, but it's also important for us in the broader context of everyone building chatbots and large language model-based products. We kind of expected the mimetic response from the market to be like, “Wow this is possible, how do we do this too? How do we plug this in?” And from Bing to Slack to Discord, you name it, you can come up with tons of these emerging chat agents.
How does the Zapier Natural Language Actions API work at a high level?
Natural Language Actions works as a universal Zapier API. Our first ever, in fact. It's an API that lets developers directly call the thousands of actions on Zapier via a sort of passthrough API. We've talked about doing this for a long time internally. It's somewhat ironic that Zapier hasn't had a universal general API before, despite being so well known as an API company. One of the big hesitations around creating a universal API was that it would simply pass all the complexity of Zapier's platform directly on to developers to deal with.
In our Zap editor, we deal with things like long argument lists, custom fields, and dynamic dropdowns where APIs require ID numbers, while humans (and language models as it turns out) prefer readable labels to choose from. All this complexity has been built into the black box around Zapier's Zap editor. If developers had to rebuild Zapier's editor and relearn our 10 years of pain, it would be a huge waste of time for everyone. That's why we hesitated to publish a universal API.
With natural language APIs and large language model-based tools, we can expose APIs in a way that language models can use, and use the models to make that black box better and faster so that developers don't have to rebuild an entire editor themselves. In fact, the only required parameter to call our new NLA API is a plain-text "instructions" string! We've been able to simplify the experience in many ways. Our ideal user experience feels like going through an OAuth prompt flow rather than a configuration setup. And by default, thanks to LLMs, field values get guessed.
Enter ChatGPT. ChatGPT is sort of the reference implementation of our new API, in my mind. If you haven't tried it yet, here's how it works. You're inside ChatGPT and you click install Zapier. There is an OAuth-style flow where a popup asks if you want to allow ChatGPT access to your Zapier data. You approve and then explicitly expose actions you want ChatGPT to have access to.
We designed it this way to put users in control because as good as language models are, you still need a human in the loop for most use cases, and users need that explicit control today still to instill confidence and trust in the system.
Here's an example. Say you want to give ChatGPT the ability to send a Slack message. There are two required inputs: which channel to send the message to, and the message contents. You might be okay with letting an LLM write the message but not with letting it guess the channel name, where a wrong guess could have unintended consequences. NLA offers a way for the user to set specific values during setup which override any guessing.
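A minimal sketch of that override behavior, assuming guessed and user-pinned values are both flat dictionaries (the function and field names are illustrative, not Zapier's actual internals):

```python
# Hypothetical sketch of the override behavior described above: values the
# user pins during setup take precedence over anything the model guesses.

def resolve_params(guessed: dict, user_overrides: dict) -> dict:
    """User-supplied values always win over LLM-guessed values."""
    return {**guessed, **user_overrides}

guessed = {"channel": "#random", "message": "Standup in 5 minutes!"}
overrides = {"channel": "#engineering"}  # user pinned the channel at setup
print(resolve_params(guessed, overrides))
```

The message stays model-written while the channel is forced to the pinned value, which matches the split in risk tolerance the example describes.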
Broadly, the API supports searches, writes, and updates across tens of thousands of actions and over 5,000 unique apps, with more being added every day. The API has a few other bells and whistles too, like being able to preview actions before committing. You can read more about it here: https://nla.zapier.com/api/v1/dynamic/docs
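As a hedged sketch of what a call might look like: the endpoint path, header name, and action ID below are assumptions for illustration only; the linked docs above are the authority.

```python
# Hedged sketch of calling the NLA API. The only required field is a
# plain-text "instructions" string. The endpoint path ("/exposed/.../execute/")
# and the "x-api-key" header are assumptions -- verify against the docs.
import json
import urllib.request

NLA_BASE = "https://nla.zapier.com/api/v1"

def build_execute_request(action_id: str, instructions: str, api_key: str):
    """Build (but do not send) an NLA 'execute action' request."""
    return urllib.request.Request(
        url=f"{NLA_BASE}/exposed/{action_id}/execute/",
        data=json.dumps({"instructions": instructions}).encode(),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )

req = build_execute_request(
    "01GXXXXX",  # hypothetical ID of an action the user has exposed
    "Send a Slack message to #engineering saying the deploy is done",
    "YOUR_API_KEY",
)
print(req.full_url)
```

Sending the request requires a real API key and an exposed action; the sketch only builds the request so the shape of the call is visible.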
How does the Zapier Natural Language Actions API work on the backend?
A way to think about it is that we are taking a natural language input string, which could come from various sources. It could be generated by a language model, typed by a user into a text box, or hardcoded by a developer in their codebase. This input is a natural language instruction for one action. Currently, it is constrained to one action, but we are looking into expanding this into multi-step actions. A straightforward use case would be "Find the latest email from Jan in my inbox." This natural language instruction is passed to us over an API.
We have a language model on our end that interprets this instruction string into the required parameters for calling the Gmail API to perform that action. Our system takes your authentication key, which you've already connected with Zapier, signs the request, and sends it. We receive the raw payload back from Gmail and then trim it down into fields most useful for humans and language models. We guarantee that the result will be 350 tokens or less, making it safe to be used as input for another language model prompt.
The process involves receiving natural language instructions, interpreting them through our API, using a language model to break them down into parameters, signing the request to the appropriate API, and summarizing the payload before returning it. This allows the language model to determine the next steps, such as sending a message to the user or calling another tool.
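The receive, interpret, call, and trim loop described above can be sketched as a small pipeline; every function here is a hypothetical stand-in, not real Zapier internals, and the parameter mapping is hardcoded to the worked Gmail example from the text.

```python
# Hypothetical end-to-end sketch of the backend flow described above.
# Each step is a stand-in; none of these are real Zapier internals.

MAX_RESULT_TOKENS = 350  # the return-size guarantee mentioned above

def run_action(instructions: str, auth_key: str) -> str:
    params = llm_fill_parameters(instructions)           # 1. LLM -> API params
    raw = call_partner_api(params, auth_key)             # 2. signed API request
    return trim_to_token_budget(raw, MAX_RESULT_TOKENS)  # 3. <=350 tokens back

def llm_fill_parameters(instructions: str) -> dict:
    # In reality a language model maps the instruction onto the action's
    # required fields; here we hardcode the worked example from the text.
    return {"api": "gmail", "op": "search", "query": "from:jan", "limit": 1}

def call_partner_api(params: dict, auth_key: str) -> dict:
    # Stand-in for signing the request with the user's stored auth and
    # sending it; returns a fake raw payload.
    return {"from": "jan@example.com", "subject": "Q3 report",
            "snippet": "Here is the report...", "threadId": "187c9a2f1a"}

def trim_to_token_budget(raw: dict, budget: int) -> str:
    # Crude stand-in: drop machine fields, then truncate. A real
    # implementation would count tokens with a tokenizer, not characters.
    keep = {k: v for k, v in raw.items() if k in ("from", "subject", "snippet")}
    text = "; ".join(f"{k}: {v}" for k, v in keep.items())
    return text[: budget * 4]  # ~4 chars/token as a rough proxy

print(run_action("Find the latest email from Jan in my inbox", "auth-123"))
```

The short string that comes back is what makes the result safe to feed into the next prompt of whatever model called the API.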
Can you talk about the design decisions you made in the user workflow based on the fact that Natural Language Actions is powered by AI that “guesses” what the intended action is, versus having that being created explicitly by the user when they create a Zap? Audit trail, undo, embedded point and click interfaces—how does the user gain confidence in NLA?
One of the things we learned in the first three or four months of exploration is that language models can hallucinate. That's a known problem these days, but there are times when it's riskier than others. I shared the Slack example where choosing the wrong channel might be very bad for some users but acceptable for others. Another example is Gmail. Zapier supports both sending emails and creating drafts. For certain use cases you might find LLMs are reliable enough to send the email directly, but for others you might want them only to create drafts instead of sending automatically.
Ultimately, we want to push more and more override levers into the user's hands to give them more control. That's our general design philosophy: the user needs to be in control at the end of the day because they need to make a local decision about their own risk tolerance for the automation and the action, and where things can go wrong. That's where most of the product education comes in—trying to explain the downsides and what can happen so that users can make informed choices about what they want.
The Zapier ChatGPT plugin works on the NLA API. What was involved in building the plugin and how does it work? Is it generalizable, e.g., to a Zapier Slackbot powered by the NLA API and any arbitrary chat interface (vs only working with chat interfaces to LLMs)?
Yes, any product can use the NLA API to create the exact same experience as in ChatGPT within their own chatbot or product. I think this technology will proliferate rapidly wherever there's a text box. There are two durable trends you should expect: everyone is going to have some sort of chat-augmented interface, and every search box is going to be powered by embeddings. These are the most obvious ways to deploy this technology; there's a lot of value there, and everyone will try it.
Will we see every app having a chatbot/assistant interface?
How many people put Intercom on their website? That's one way to think about it from a CS angle, although it is a conservative estimate. This new AI language model technology excels at taking ill-formed ideas and translating them into structure because these models have some basic ability to do reasoning now. That's the killer implication. What does it mean now that software can do reasoning?
While chat is going to go far, I don't expect it to take over the world as the primary input method. You won't see Salesforce changing its default UI to a simple text box, for example. When users are well-versed in software, point-and-click is often faster than typing into a chat interface. It will be exciting to watch the new multi-modal models coming online later this year, though, with image and video input.
On the flipside of every product having a chat interface, might we see every API having some kind of natural language endpoint?
What we're seeing right now is that language models don't necessarily need that. To the extent that they need it right now, it's a short-term crutch, not a long-term solution. In the long term, AI models, not just language models, will be able to interact with software the same way humans do—through the same input and output mechanisms like pixels, keyboard, mouse clicks, and keystrokes. The endgame is for this technology to operate in a very similar way to how humans do. So, any of the API-related adaptations are likely short-lived.
You might see it in the short term. That's actually how ChatGPT plugins work—you have to expose a more limited version of your APIs in a plugin form for a model to access. What we found with GPT-3.5 was that we didn't have to do much for it to be able to fill out parameters correctly.
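For context, a ChatGPT plugin is described by a small manifest (ai-plugin.json) that points the model at a limited OpenAPI surface, which is the "more limited version of your APIs" mentioned above. A sketch of its shape, written as a Python dict for illustration; all values are invented, and the field names follow OpenAI's published plugin manifest format:

```python
# Hedged sketch of a ChatGPT plugin manifest (ai-plugin.json), written as a
# Python dict for illustration. Values are invented; the field names follow
# OpenAI's published plugin manifest format.
import json

manifest = {
    "schema_version": "v1",
    "name_for_human": "Example Actions",
    "name_for_model": "example_actions",
    "description_for_human": "Run actions in your connected apps.",
    "description_for_model": (
        "Perform searches, writes, and updates in the user's connected "
        "apps from a plain-text instruction."
    ),
    "auth": {"type": "oauth"},  # simplified; real auth config has more fields
    "api": {
        # The plugin exposes a *limited* API surface via an OpenAPI spec,
        # which is what the model reads to learn the available operations.
        "type": "openapi",
        "url": "https://example.com/openapi.yaml",
    },
}

print(json.dumps(manifest, indent=2))
```

The model never sees your full product, only the operations the OpenAPI spec chooses to describe, which is why providers end up curating a subset.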
Zapier was an early launch partner with OpenAI for ChatGPT plugins. How might that collaboration evolve over time?
In the short run, OpenAI and Zapier accelerate each other. The ChatGPT app and the OpenAI app on Zapier are the fastest-growing apps ever. And we're seeing massive adoption of the plugin we just launched. In the long run, software automation is going to look completely different with the ability for software to do reasoning. That's the key technology showcased in GPT-3.5 and GPT-4: the nascent ability for software reasoning. This changes the paradigm for what automation looks like and what it can do in many ways—not just making it easier to translate user needs into software actions but actually running the software too.
I don't think anyone knows for sure yet how this is going to unfold, but one area we are excited to explore is plugging machine reasoning into the execution flow of automation, not just making setup easier. Everyone is starting with the setup flow because it's safe; there's a human in the loop. But we have millions of automation use cases on Zapier right now that I also know are safe and could be way better: more error-resistant and easier to build with reasoning in the loop. The ultimate goal is to figure out how to put this technology to work when we're not working. That's what people use Zapier for—they want it to work while they sleep. That's the frontier to explore.
Zapier is something of a productivity App Store. Can you talk about how you see the landscape of productivity app stores, particularly with the new addition from ChatGPT Plugins?
The chat plugin store is interesting because providers still must create subsets of their products and expose them through APIs for chat. It optimizes for the use cases it's really good at, but the full version of a product doesn't always condense well into a command-line-style interface. Once you understand a piece of software and have context over what you're trying to do, natural language can be slower. For example, nobody likes having to call phone support to cancel a service—it's a frustrating and slow experience. Forcing every interaction through the chat paradigm might not be the endgame for this technology.
Natural language is great for initial onboarding, but there may be a spectrum between starting with natural language and graduating to a UI once you know what you're doing. It might even involve the model building a UI for you. There's likely a continuum of starting with chat and ramping up to more tool-based software that you can come back to and use.
Regarding app stores, if I look at Zapier's ecosystem with 5,000+ apps, I'm a bit concerned for those that are just thin productivity wrappers around integrations. For example, a small niche productivity app that lets you triage your email faster might find themselves "default dead" instead of "default alive", to use YC's terminology, in the future.
I suspect we'll see software developers getting pushed to either focus on unique ownership of the interface, the front end where users do work and interact with the software, or to become capability providers on the other side.
What does a post-chat centric AI future look like in terms of UI?
We could be looking at a future where models output deterministic software. For example, a model could generate deterministic Python code that gets hosted on a service somewhere and functions as a point-and-click database or form frontend. One UX paradigm that I find inspiring, and which feels directionally accurate, is how WhatsApp, Facebook Messenger, and iMessage for Business allow providers to define embeddable widgets in the chat stream.
For instance, with iMessage for Business, if you text Apple support and want to buy a product, they can render a purchase widget inline that pops up with Apple Pay, allowing you to go through the checkout flow and provide your billing address. This is much more efficient than having an agent ask a series of questions to gather your information.
The more we can put UI and UX around those routine tasks that humans can do quickly with software, the better the experience will be. It doesn't necessarily have to be limited to rendering widgets in the chat context. These tools can be long-lived and not just stateful within the chat.
Imagine interacting with a Salesforce bot, asking it to build a lead intake form and reporting dashboard. The bot could create those tools instantly in your Salesforce instance, generate a link to the form connected to it, and link the data. This is an example of how the tools built by AI could be long-lived beyond the initial chat context.
There are other LLMs besides OpenAI’s—Cohere, Google’s Bard and more popping up every day. How does Zapier think about the LLM space and where to partner? Is it important for Zapier to support open source LLMs to ensure there’s an alternative to closed-source systems? Might you look to vertically integrate with your own LLM trained for a Zapier specific context?
In the next 12 months, there will likely be five to seven major providers for language models through APIs. There will be a lot of choices available soon. Our users tell us they want choice so we're going to try and do what's best for them.
Zapier is a little unique because we both enable Zapier users to use LLMs in their workflows and use LLMs to power our own products, such as the NLA API we talked about. With the current generation of models, most users really shouldn't be trying to train their own model, at least not as a first step. Off-the-shelf models are really good at "being trained" simply by including a few examples in the prompt itself.
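A minimal illustration of that in-prompt "training": a few worked examples steer the model's output format with no fine-tuning. The task and the action names below are hypothetical, chosen only to make the pattern concrete:

```python
# Few-shot prompting sketch: "train" an off-the-shelf model by prepending
# labeled input/output examples before the real query. No fine-tuning needed.

def build_few_shot_prompt(examples: list, query: str) -> str:
    """Assemble a prompt from (request, action) example pairs plus a query."""
    lines = ["Map each request to a Zapier-style action name."]
    for request, action in examples:
        lines.append(f"Request: {request}\nAction: {action}")
    lines.append(f"Request: {query}\nAction:")  # model completes from here
    return "\n\n".join(lines)

examples = [
    ("Email the Q3 report to Jan", "gmail.send_email"),
    ("Post the release notes in #general", "slack.send_channel_message"),
]
prompt = build_few_shot_prompt(examples, "Add this lead to my CRM")
print(prompt)
```

Sending this string to any completion API would nudge the model to answer with an action name in the same format as the examples, which is what "being trained in the prompt" amounts to.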
I think it is important for the world that many AI language model providers emerge, from a diversity, cost, and freedom standpoint. Given how transformative and useful software reasoning is, I think it needs to be widely shared and available. I have no problem with big AI labs and organizations keeping the latest generation of this new tech proprietary in order to fund next-generation R&D, but I would like to see a world where the last generation is open source, including weights. I'm particularly excited about localized LLMs and the energy we're seeing across the world to run last-gen AI language models on commodity hardware.
A Zapier native chat assistant seems like an obvious move. Can you talk about the benefits to launching Zapier’s own vertically integrated chat experience on its own site with its own LLM or fine-tuned model? What do you think happens to the point and click GUI for creating zaps in a world of LLMs?
It's on the list. I expect language models to eventually be integrated into Zapier's entire end-user experience. Everyone is going through the same thought process right now: software can do reasoning at a basic level, so what were the problems we faced before that were intractable without this capability, and which can we now solve? That's the exciting part. You should expect that.
As far as the GUI goes, the short answer is that natural language is clearly much better than our current GUI. It feels so much better to describe in words what you want in a setup flow rather than doing a hundred clicks to get it exactly right. We're going to try to build the best experience possible, one that just wasn't possible before.
One way to think about Zapier's future is that we get millions of people coming to our site every month, but we convert only a tiny percentage of them into active users and paying customers. Our big challenge is helping people connect their intrinsic problems and needs with what our software can do and then getting them set up for it. These are two problems that are completely upended by the ability for software to do reasoning and understand natural language.
We're not talking about optimizing on margins here; we're talking about a 10x or 100x increase in conversion rates on the most important metrics we have. That's the upside and the opportunity I see with this technology in the near term—helping millions more users actually use and deploy automation for themselves so they can spend time on more important things.
With NLA, you can accomplish with one action what might take you multiple different steps in a traditional zap. How do you think about the ‘risk’ that the NLA API would ‘cannibalize’ usage of regular zaps in the long run given Zapier charges per task?
For people who want to use the old way, I think they can still do it; that software is not going to go away. We're talking about an alternative, and I believe, a better paradigm. However, the old way will continue to exist for a long time. When we started Zapier, we created this concept of a "Zap" and a "Task". Throw away your notions about what those things are for a moment. Just think of a Zap as a black box of software that does something automatically for me, and a Task as the log of what the software did. These primitives fit well into the future where automation software looks and feels more like virtual assistants than configuring software.
As AI tech progresses, the experience will shift from setting up and configuring software to training, teaching, and shaping your software to do what you want it to do. To speculate a bit about the future, I would not be surprised to see users giving their Zaps anthropomorphic characteristics, like calling a Zap "Joe" and saying that you turned Joe on, gave him instructions, and he did everything you wanted.
Monetization eventually gets interesting too. Think about how a virtual assistant charges, which is per hour, or maybe a contractor you hire per day or per week. That's going to feel more like the natural pricing model for this technology. It actually aligns with the long-term direction of the software and technology as well. You could have a smart, intelligent agent that you give instructions to, and you pay per hour, per day, or per week, or whatever time unit you choose for it to be turned on and helping you. You can scale it up or down, having multiple agents all doing different things.
Auth is a point of friction in all of these app workflows that won’t go away with LLMs, where Zapier already has a large user base with tens of millions of auths into apps. Can you talk about Zapier’s single-sign-on-esque moat? How does it grow where Zapier is likely the first point of integration into competing chat & productivity interfaces?
Authentications are a proxy measure for user trust. The fact that we have millions of users connected to Zapier signals that users trust us. User trust is certainly a differentiator, but it isn't valuable if we don't do anything with that trust. If we just sit on it, it's easy for another provider to grow their own trust and brand. So, I think of it as having a lot of user trust today, which is great. This trust has stemmed from our philosophy of serving users first and foremost, standing by them regardless of the software they want to use and helping them achieve their goals.
However, if we don't do anything useful to meet users' needs, then it doesn't matter at all. I think trust is more of an accelerant for the plans and ideas we have rather than an asset in itself. Zapier is a trusted brand, so users might give it a try because they already have their accounts connected and trust that we won't make poor decisions, like automatically spamming their Slack channels. This trust means we've earned the chance for users to try our services once. Whether they try it again depends on the usefulness and value of what we've put out into the world.
Security/privacy has become one of the big topics of concern around ChatGPT since its launch. How does Zapier ensure the security and privacy of user data when using the NLA API, particularly when accessing multiple third-party apps and how do you think about security more broadly?
At the end of the day, we want to do what our users expect and want. When we started beta testing the NLA API, one of the first pieces of feedback was about data privacy policies. At the time, OpenAI didn't have a data privacy agreement, so we went to them and expressed our concerns. We couldn't use their APIs in production unless they changed how they treated user input and committed to not training models on user data that hit their API, as well as getting GDPR data processing policies in place.
To their credit, they responded quickly and made the necessary changes. This satisfied our users who had privacy concerns. We follow what our users expect and want at the end of the day. I wouldn't be surprised if users have higher expectations for privacy in the future, like wanting to bring their own language model or using a local language model with our software. If that's what our users want, we'll try to make it happen.
At the very least, we need to offer choice, like supporting different providers such as Anthropic, OpenAI, Hugging Face, etc., so users can choose when building and setting up workflows. Ultimately, we're going to follow our users' needs and stand with them, even if it's against bigger software players in the world.
The rise of AI and LLMs seems like a massive opportunity for Zapier to drive another step function in growth. If Zapier plays its hand right over the next few years, what does success look like and how does the company evolve?
There's a game here of identifying what's changed and what hasn't. What hasn't changed is that users want software to work while they sleep and they want choice in the software and tools they use. What has changed is the availability of reasoning capability in software, which wasn't possible before. We need to explore and chart a course for what automation looks like with reasoning capability.
Initially, you'll see it applied to setup, which is the low-hanging fruit. Next, it will be applied to the execution side, where the software makes smart choices about actions, instead of being 100% deterministic. The end game is where the experience of automation feels more like hiring, training, and providing feedback to a virtual assistant.
Using a VA or training someone on your team is completely different from using Zapier today. You typically create a document outlining the job, have a video call to demonstrate the software, and provide screenshots with arrows pointing to important aspects. That's the world we're moving towards in terms of getting software to do what we want. We need to jump to the end game to ensure Zapier remains relevant long term.
Is there a risk that OpenAI turns Zapier into just another provider of services inside their chat interface?
I mean, there's certainly a world where Zapier becomes more of a capability provider. We are exploring it with the NLA API. But is it the one that I'm super excited about long term? I like having direct relationships with our users. It helps us learn and get feedback. When I get up in the morning, the folks that I get really, really excited to serve are those who've told us how much Zapier has helped them over the last decade and how much they've been able to do because it existed—the ones for whom Zapier was not just a "time saver" but an unlocking product. That said, I think the reality is that AI is going to enable that same unlock for hundreds of millions more people very quickly. The race is on.
Disclaimers
This transcript is for information purposes only and does not constitute advice of any type or trade recommendation and should not form the basis of any investment decision. Sacra accepts no liability for the transcript or for any errors, omissions or inaccuracies in respect of it. The views of the experts expressed in the transcript are those of the experts and they are not endorsed by, nor do they represent the opinion of Sacra. Sacra reserves all copyright, intellectual property rights in the transcript. Any modification, copying, displaying, distributing, transmitting, publishing, licensing, creating derivative works from, or selling any transcript is strictly prohibited.