
What is the typical configuration of Census for customers who use the platform for data integration?

Sean Lynch

Co-founder & CPO at Census

A typical Census deployment is into a modern data stack. We sit on top of a data warehouse, usually one of the big three: Redshift, Snowflake, or BigQuery. There's usually something already in place that loads data into the warehouse, and some amount of transformation, cleaning, or manipulation of that data going on.

There's usually also a data team in play, though I would say that's starting to shift a little bit. We're seeing the data warehouse increasingly being set up by marketing teams on their own. More often than not, if it's a marketing team, they're doing it with BigQuery -- I suspect it's like, "Oh, you're on Google Ads, are you using Google Analytics?" There's probably a "deploy BigQuery" button as part of that. It's also sometimes sales operations deciding, "Hey, these are the types of reporting I want to do, I can't do it entirely in Salesforce. I'm going to do it in something like a Mode, so let me put this into a data warehouse." But typically there's a data team and Census is sitting on top of that.

You can break down the types of data we typically see companies working with along two axes. One is use case, and the most common use cases for us fall along the customer journey. That's getting back to the CDP aspect of things: advertising, marketing, SMS marketing, and so on, which you hook up with Google, Facebook, and the like. On the marketing side, it's the new school ones like Braze and Klaviyo, as well as some of the old school ones like Marketo, Pardot, that sort of thing. For sales, Salesforce is our biggest destination. Support, customer success, even through to the finance and billing side of things with a Stripe or NetSuite. So one axis of use cases is the different teams, destination tools, or customer-facing functions. Usually they're custom more than anything else.

The other interesting axis to look at is the type of data. When we started, we were really focused on syncing what we call "profile data." This is the profile of a user or the profile of a company. You may have other profiles or objects and nouns in your business model that matter. A common one in PLG (product-led growth) is the concept of a team or a workspace, which is different from a company. For example, Disney may have many workspaces, and many of those may exist independently of each other because different people set up workspaces over time. Understanding that Disney as a company, from a sales perspective, is different from the dozen different workspaces that exist separately is a really valuable insight.
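To make that distinction concrete, here's a minimal sketch of how a warehouse model might roll individual workspaces up into a single company-level profile. The tables and columns (accounts, workspaces, and their fields) are hypothetical, not an actual Census or customer schema:

```sql
-- Hypothetical model: roll per-team workspaces up into one company-level profile.
-- Table and column names are illustrative only.
select
    a.account_id,
    a.company_name,
    count(w.workspace_id)        as workspace_count,
    sum(w.weekly_active_users)   as total_weekly_active_users,
    max(w.last_activity_at)      as last_activity_at
from accounts a
left join workspaces w
    on w.account_id = a.account_id
group by a.account_id, a.company_name;
```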

Getting back to your earlier question about what's changing in a lot of these tools: we're increasingly seeing tools support the concept of custom objects. So not just their core "whatever their equivalent of a company is" -- an account or an organization or a business -- and not just the person, lead, or contact, but also custom objects. Salesforce has done this forever, but now HubSpot, Customer.io and Marketo do this too. We're seeing services that offer this popping up. The use case for us is representing those related auxiliary objects: workspace is one we see over and over again, subscriptions, that sort of thing -- objects that somebody might care about.

When we're talking about profile data, I would say the use cases tend to be a little bit more of "a human is looking at that profile in a tool." In a CRM, I'm going to go look at a company and its context. That's true in the customer success case: "Hey, is this company a churn risk? I'm going to look at their usage profile." Support is the same thing; finance is as well, to a certain extent.

The marketing and advertising side tends to be interested less in the specific instances of a profile and more in data in aggregate, in what you might call segments, lists or audiences. That's another type of data that we do a lot of work with, where you're going to define, for example, your VIP segment. In the past, the way you would do that is just by exporting a CSV of those, or if you had the properties in those different tools, you could set up the segment definitions in them individually.

One thing that we see our customers doing more and more now is defining segments in one place in SQL. So I'll define what VIP looks like as a segment and get the list of all the user profiles that qualify for that segment, and then I'll sync that list definition into the various destination tools from Census.
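As a rough illustration -- the users and orders tables and the thresholds here are made up, and the date function is Snowflake-style -- a VIP segment defined once in SQL might look something like this, with Census then syncing the resulting list into each destination tool:

```sql
-- Hypothetical "VIP" segment defined once in the warehouse.
-- The qualifying rows get synced to every destination that needs the list.
select
    u.user_id,
    u.email
from users u
join orders o
    on o.user_id = u.user_id
group by u.user_id, u.email
having sum(o.amount) >= 1000                                  -- $1,000+ lifetime spend
   and max(o.ordered_at) >= dateadd(day, -90, current_date);  -- ordered in the last 90 days
```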

The third category of data we see a lot of is event data. When we got started at Census, event data didn't feel like a particularly good fit for the data warehouse, but we've seen this shift so much, especially over the last year. When we started out, we thought you'd want something like Segment to pick up the event and pass it into the destination system as quickly as possible. More and more, though, we are seeing companies say, "I'm going to put my events through the data warehouse as well." This is a bit unintuitive, especially if you think of a warehouse as being slow and creaky. Why would you send events there if it's going to take 24 hours to load them in and do anything with them? Part of the reason is that you're going to put the events in the data warehouse anyway, because you're going to use them for analysis, to generate profiles, and that sort of thing.

We're seeing a lot of customers generating different types of events they couldn't before. We call them "synthetic events." These might be events like, "Hey, a user made a purchase offline in a store that didn't generate an event on the iPhone app, but we do still want that to flow through the system as an event."
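As a sketch of what that can look like in the warehouse -- the pos_transactions table and its columns are hypothetical -- a synthetic event is often just a SQL model that reshapes records that never produced an app event into the same shape as one:

```sql
-- Hypothetical model: turn offline point-of-sale records into "synthetic"
-- purchase events so they flow downstream like any app-generated event.
select
    customer_id          as user_id,
    'offline_purchase'   as event_name,
    purchased_at         as event_timestamp,
    store_id,
    amount
from pos_transactions;
```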

Another really interesting thing is enriched events, which means taking an event that's happening through the system, but adding more data and context that you have in the data warehouse at the time. One of the early versions of this was: "Hey, we're going to send a welcome email, but we want to send it from the sales rep that owns the account. While we have the user-signed-up event, we want to add the email address of our sales rep so that we can send a targeted email. And we want to do that join before the event propagates through." We're seeing these really interesting events where the data still has the event shape -- a user took an action, a subscription expired, whatever -- and it's going to be consumed by a service that consumes event-based data, but it's actually generated in the data warehouse.
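A minimal sketch of that enrichment, again with made-up table and column names (signup_events, account_owners), might be a join that attaches the owning rep's email to each signup event before it gets sent on:

```sql
-- Hypothetical enriched event: attach the owning sales rep's email to each
-- signup event so the downstream email tool can send from that rep.
select
    e.user_id,
    e.event_name,
    e.event_timestamp,
    r.rep_email          as account_owner_email
from signup_events e
join account_owners r
    on r.account_id = e.account_id;
```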

The reason this was unintuitive at the start was that you didn't think of warehouses as particularly fast. But one of the interesting things that's changed with these cloud warehouses over the last couple of years is that they've gotten faster and faster at indexing the data that's being inserted into them. There are still use cases where you really do want the "if this, then that" behavior: "This thing happened, I fired a webhook, another thing happened a sub-millisecond afterwards." If you're sending a password reset email, you don't want to wait that long. But we're now regularly seeing customers do the feedback loop through the modern data stack in about 15 minutes, with data flowing into the data warehouse, being ingested and transformed with something like dbt, and being sent back out.

Find this answer in Sean Lynch, co-founder of Census, on reverse ETL's role in the modern data stack