
What are the AI and data model components of Levity's system?

Thilo Huellmann

Co-founder & CTO at Levity

We have a very standard React web application on the frontend that people interact with. Our backend is built on Python/Django to process the requests that they make. Apart from that, we have built dedicated services for the “heavy-lifting” components of our technology, such as data import, data processing, training and deploying machine learning models.
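As a rough illustration of that split, the web backend can hand heavy jobs to a dedicated worker instead of doing them inside the request/response cycle. The Celery/Redis setup and all names below are assumptions made for the sketch, not Levity's actual services.

```python
# Minimal sketch of the hand-off pattern described above. Celery/Redis and
# every name here are illustrative assumptions, not Levity's actual code.
from celery import Celery

queue = Celery("heavy_lifting", broker="redis://localhost:6379/0")

@queue.task
def train_model(dataset_id: str) -> str:
    # Placeholder for the work a dedicated training service would do.
    print(f"training a model on dataset {dataset_id}")
    return f"model-for-{dataset_id}"

# The Django backend would only enqueue the job and return immediately:
# train_model.delay("dataset-123")
```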

Currently, our ML stack consists of PyTorch as the baseline framework, and we work with Valohai as an MLOps solution, which runs inside our AWS environment. We run data pre-processing and model training on Valohai, which makes it very easy to spin up GPU instances and train models in a timely manner. It's a great solution.
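For a sense of what the training side involves, here is a minimal PyTorch training step of the kind that would run on a GPU instance. The toy model and random batch are placeholders for illustration, not Levity's pipeline.

```python
# Illustrative PyTorch training step, not Levity's actual pipeline; the model
# and data are stand-ins so the snippet runs on CPU or GPU.
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Linear(128, 4).to(device)                  # toy classifier head
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

features = torch.randn(32, 128, device=device)        # fake batch of embeddings
labels = torch.randint(0, 4, (32,), device=device)    # fake class labels

optimizer.zero_grad()
loss = loss_fn(model(features), labels)
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```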

The other part concerns data upload and data processing. This can be quite challenging, especially if a customer comes to us with historical data of a million rows in an Excel sheet that they want to upload through our web application.
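One common way to handle an upload of that size is to stream the file in chunks instead of loading it all at once. The pandas-based sketch below is an illustrative assumption about the approach, not a description of Levity's importer.

```python
# Sketch of chunked ingestion for a very large tabular upload; pandas is
# assumed here purely for illustration.
import pandas as pd

def ingest_csv(path: str, chunk_rows: int = 50_000) -> int:
    total = 0
    # Stream the file in fixed-size chunks so a million-row upload never
    # has to fit into memory at once.
    for chunk in pd.read_csv(path, chunksize=chunk_rows):
        total += len(chunk)
        # ...validate, normalize, and hand each chunk to the ML pipeline...
    return total
```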

You’ll often see SaaS applications limit CSV import to a couple of thousand rows, because it’s a technical challenge, and it becomes even harder when people want to upload thousands of PDF documents from the past. Then you need to do all sorts of different things, like splitting the PDFs into pages, running OCR on them, and transforming everything into a format that the ML pipeline can consume.
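A hedged sketch of that PDF path, using pdf2image and pytesseract purely as illustrative libraries (both need system dependencies such as Poppler and Tesseract installed); this is not necessarily Levity's tooling.

```python
# Sketch of the PDF path described above: split into pages, OCR each page,
# and emit plain text the ML pipeline can consume.
from pdf2image import convert_from_path
import pytesseract

def pdf_to_page_texts(pdf_path: str) -> list[str]:
    pages = convert_from_path(pdf_path)                # one PIL image per page
    return [pytesseract.image_to_string(img) for img in pages]
```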

Therefore, data upload and data processing are a major part of what we're building. Connected to that are data import integrations, because you may not always have the perfect dataset sitting on your PC ready to be uploaded. Often, it's stuck in one or multiple apps and not structured the right way.

Earlier, when we didn't have these integrations, people often told us, "I need a developer to get me that export, clean it, and transform it before I can upload it to you." So, we started integrating with those data sources to make it easier to extract and ingest data into our system.

The difference here is that it's not like a normal ETL where you have databases with rows and columns that you want to import. It's unstructured data, like images, PDF documents, or raw text that needs to be processed a certain way. That also makes it more challenging because there isn't that much existing tooling out there.
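One way to think about such connectors is that each one normalizes whatever it pulls into a common record the pipeline can consume. The types and field names below are hypothetical, sketched only to illustrate the idea.

```python
# Hypothetical shape a connector might emit: each unstructured item
# (image, PDF, raw text) becomes one record the pipeline can consume.
from dataclasses import dataclass
from typing import Iterator, Optional, Protocol

@dataclass
class Document:
    source: str                   # e.g. "gmail", "dropbox", "upload"
    content_type: str             # e.g. "text/plain", "application/pdf", "image/png"
    payload: bytes                # raw bytes, converted downstream (OCR, decoding, ...)
    label: Optional[str] = None   # optional historical label for training

class Connector(Protocol):
    def fetch(self) -> Iterator[Document]: ...
```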

Another major component is the actual workflow integration. Usually there's a trigger like “new email”; then you run that data through your Levity classifier and get a prediction. After that, you want to do something with that prediction. That's where we need to integrate with lots of tools, so our users can put whatever they have trained to use and plug it into their actual workflow.
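Put together, the flow might look like the following sketch. The endpoint URL, labels, and routing actions are made up for illustration and are not Levity's real API.

```python
# Hypothetical end-to-end flow: a "new email" trigger calls a classifier and
# routes on the prediction. URL and labels are placeholders.
import requests

def handle_new_email(subject: str, body: str) -> None:
    resp = requests.post(
        "https://example.com/classifiers/support-email/predict",  # placeholder endpoint
        json={"text": f"{subject}\n\n{body}"},
        timeout=10,
    )
    prediction = resp.json().get("label", "other")

    if prediction == "complaint":
        print("escalate to support tool")      # e.g. create a ticket
    elif prediction == "invoice":
        print("forward to accounting inbox")
    else:
        print("leave in inbox for manual review")
```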

This answer appears in the Sacra report: Thilo Huellmann, CTO of Levity, on using no-code AI for workflow automation.