SuperDuperDB - The Ruby on Rails for AI?
SuperDuperDB deep dive, how to think about the Vector Embeddings wars, and the role of Snowflake in the generative AI landscape
This week, we speak with the team behind SuperDuperDB. As usual, together with Max, we do a deep dive into the product, business model, and space. Could they become the next Ruby on Rails of genAI app building?
Also, the ongoing pricing war in the vector embedding space gives us a lot to think about. How should we think about this space, and how do we decide when to choose one embedding model over another? We share our thoughts.
Plus, we share our perspective on the role that Snowflake plays in the generative AI space. This is a company that is not immediately associated with building AI applications. However, it is extremely active there, and AI engineers cannot ignore it. We explain why Snowflake should be on your radar!
And as usual, if you like this issue, subscribe, share it, and leave your comments!
Let’s get started!
Vector Embeddings: Understanding the Foundation of Generative AI
Vector embeddings are a crucial data format powering generative AI models
Options for embeddings are emerging from OpenAI, startups like Together AI, and local hosting
Mastery of embeddings will distinguish senior from junior AI engineers
Vector embedding models are rapidly evolving, with announcements this week from OpenAI and startups like Together AI that promise better performance and lower pricing. Rather than get lost in the details of which model is cheapest today, we advise focusing on the big picture.
When should we use OpenAI's models versus alternatives like Together AI or hosting embeddings locally? Here are some guidelines:
OpenAI provides simplicity if you want everything in one place, even if not the cheapest option.
Alternatives like Together AI allow for optimizing each part of your stack.
Hosting locally demands more effort but allows total flexibility and understanding.
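One practical way to keep this choice open is to hide the provider behind a single interface, so the rest of the stack never knows which service produced a vector. The sketch below is a toy illustration of that design, not any provider's real API: `local_embedder` is a stand-in that counts character frequencies, where a real setup would call OpenAI, Together AI, or a locally hosted model.

```python
# Minimal sketch: keep the embedding provider behind one callable type,
# so swapping OpenAI, Together AI, or a local model is a one-line change.
# local_embedder is a toy stand-in, NOT a real embedding model.
from typing import Callable, List

Embedder = Callable[[str], List[float]]

def local_embedder(text: str) -> List[float]:
    # Toy stand-in: a 26-dim character-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def embed_corpus(texts: List[str], embedder: Embedder) -> List[List[float]]:
    # The pipeline depends only on the Embedder signature,
    # never on a specific provider.
    return [embedder(t) for t in texts]

vectors = embed_corpus(["duck", "typing"], local_embedder)
print(len(vectors), len(vectors[0]))  # 2 26
```

Whether you later default to OpenAI for convenience or self-host for flexibility, only the embedder function changes.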
Crucially, we predict the vector embedding space will form a "zoo of models" we can tap into for different needs, from video to text and more. Understanding these options will distinguish senior from junior AI engineers.
So, while beginners may default to OpenAI for convenience, experts will know how to customize and get the most from this foundational AI building block.
Mastering vector embeddings is essential to understanding the generative wave sweeping AI. Even a browser can generate embeddings locally with JavaScript, as we demonstrate in a recent article published on VectorHub. Check it out!
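To see why embeddings are such a foundational building block: once text is mapped to vectors, "find related content" reduces to plain vector math. Here is a minimal cosine-similarity lookup in pure Python, with toy 3-d vectors standing in for real model embeddings.

```python
# Toy nearest-neighbor search over embeddings using cosine similarity.
# The 3-d vectors are illustrative stand-ins for real model embeddings.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

corpus = {
    "feathered friend": [0.9, 0.1, 0.0],
    "database index":   [0.1, 0.8, 0.3],
}
query = [0.8, 0.2, 0.1]
# The best match is simply the corpus entry with the highest cosine score.
best = max(corpus, key=lambda k: cosine(query, corpus[k]))
print(best)  # feathered friend
```

Everything a vector database does at scale is an optimized version of this loop.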
In the world of AI, few components are more critical than vector embeddings right now.
AI Pro Ducks: SuperDuperDB
SuperDuperDB abstracts infrastructure complexity so developers focus purely on AI logic
Early community traction shows promise for expanding access
Questions remain around business model, comparables, and enterprise functionality
Together with Max, we chat about SuperDuperDB - a startup aiming to simplify deploying AI applications.
As Max highlights, SuperDuperDB essentially "re-bundles" all the disjointed components needed for ML into one neat package. It means that instead of needing specialized expertise across data pipelines, hosting, deployment, etc., developers can focus purely on the AI logic.
In our view, if SuperDuperDB can become the "lingua franca" for AI development like Ruby on Rails for web applications - it would transform accessibility. But it's early days, which raises questions on the business model, comparables, and more that we discuss in the show.
Max draws an interesting parallel - elite football academies allow prospects to focus purely on their game. By removing infrastructure headaches, SuperDuperDB similarly lets AI engineers practice their craft—democratization at its finest.
SuperDuperDB will need to keep simplicity and community building at its core as the space matures. Adding fancy features could increase stickiness - but also complexity. Walking this tightrope between power and usability will determine if SuperDuperDB can fulfill its mission of opening AI to many more builders.
So what do you think? How can we keep expanding access to the power of AI? What questions do you still have about SuperDuperDB's approach? Let us know!
Duncan Blythe and Timo Hagenow from SuperDuperDB
SuperDuperDB is an open-source framework to simplify deploying AI applications
It "super dupers" your database to handle infrastructure, enabling AI focus
Longer-term vision is becoming the "lingua franca" for AI development
In this week’s episode of our show, we chat with Timo and Duncan, the founders of SuperDuperDB - the open-source framework on a mission to simplify deploying AI.
Timo explains that current ML pipelines involve disjointed parts that take effort to set up and maintain. SuperDuperDB essentially "super dupers" your database to enable AI readiness through a Python interface. Whether MongoDB, Postgres, or other databases - it handles the heavy lifting, so you focus purely on the AI logic.
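To make the pattern concrete, here is a hypothetical sketch of what "wrapping a database so it runs your models for you" looks like. Names like `SuperStore` and `attach_model` are illustrative, not SuperDuperDB's actual API; the point is that the wrapper, rather than the caller, keeps model outputs in sync with the data.

```python
# Hypothetical sketch of the "wrap your database" pattern: attached
# models run automatically on every insert, so application code only
# defines the AI logic. Not SuperDuperDB's real API.
from typing import Any, Callable, Dict, List

class SuperStore:
    """Toy wrapper that runs attached models on every inserted record."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []
        self.models: Dict[str, Callable[[Dict[str, Any]], Any]] = {}

    def attach_model(self, name: str, fn: Callable[[Dict[str, Any]], Any]) -> None:
        self.models[name] = fn

    def insert(self, record: Dict[str, Any]) -> None:
        # The wrapper, not the caller, handles running the models.
        for name, fn in self.models.items():
            record[name] = fn(record)
        self.records.append(record)

db = SuperStore()
db.attach_model("word_count", lambda r: len(r["text"].split()))
db.insert({"text": "ducks type in python"})
print(db.records[0]["word_count"])  # 4
```

Swap the toy lambda for an embedding model or an LLM call, and the same shape covers the use cases discussed in the episode.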
Both founders bring interesting backgrounds spanning business, research, startups, and more - coming together around the democratization of AI. As Duncan highlights, empowering work should not require elite resources. SuperDuperDB lets small teams leverage AI through open-source software, just like the big players.
We discuss example apps the community has built, from video search to conversational podcast search. While novel use cases using generative AI excite people, Timo stresses even standard applications of ML provide immense value. It's about finding the proper use case rather than chasing hype.
As for the road ahead, the team will improve SQL support and self-hosting of large language models. In the longer term, becoming the "lingua franca" for AI development would fulfill their vision of vastly increasing access.
It's still early, but the community momentum makes the founders hopeful that SuperDuperDB can drive mainstream AI adoption. What use cases could you enable through its simplicity? However your application's needs evolve, this team aims to keep innovators focused on creating, not on infrastructure.
So what do you think? How can we expand access to AI's possibilities? What questions do you still have about SuperDuperDB's approach? Let us know!
The AI Monologues: Why AI Engineers Should Know Snowflake
Snowflake is emerging as a foundation for custom AI development
Already present in most companies for data, well-suited for fine-tuning
Architecturally positioned to support real-time applications
In our AI monologue of the week, we discuss an unexpected player relevant to the world of generative AI - the cloud data platform Snowflake.
You may wonder - isn't Snowflake just about massive data warehouses for business intelligence? How does it connect to innovations like LLMs?
As it turns out, Snowflake plays a growing role as the foundation for real-time, customized AI applications. With most companies already using it for data infrastructure, it's emerging as the natural place to leverage that data for fine-tuning.
Want to create a chatbot that answers questions on company revenue? You need access to those finance systems. Are you building an LLM for your industry? Snowflake has all the specialized documents to train it.
So, while Snowflake targets data engineers today, AI engineers also need awareness. Last week, we discussed the friction emerging over which datasets we are allowed to use to train models due to privacy or intellectual property concerns. Snowflake provides data lineage and governance features that help address these concerns.
It also owns Streamlit, which is increasingly the standard for creating interfaces to data applications. Streamlit's shortcuts for activities that power AI use cases, like embedding generation, continue to expand.
As real-time LLMs gain traction, Snowflake's streaming architecture uniquely positions it to incorporate live data. Processing and adding new information continuously remains non-trivial for most systems.
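The incremental-update problem is worth making concrete: as new rows stream in, the vector index must absorb them immediately, without a full rebuild. The sketch below is a pure-Python toy of that behavior; a real deployment would sit on top of Snowflake's streaming features and a proper vector index, which this does not model.

```python
# Toy streaming index: each new row is searchable the moment it is
# added, with no batch rebuild. Illustrative only; not a real system.
import math
from typing import List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class StreamingIndex:
    """Absorbs rows one at a time; queries always see the latest data."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, List[float]]] = []

    def add(self, key: str, vec: List[float]) -> None:
        self.rows.append((key, vec))  # live immediately, no rebuild

    def nearest(self, query: List[float]) -> str:
        return max(self.rows, key=lambda kv: cosine(query, kv[1]))[0]

index = StreamingIndex()
for key, vec in [("q3_revenue", [1.0, 0.0]), ("hr_policy", [0.0, 1.0])]:
    index.add(key, vec)
print(index.nearest([0.9, 0.2]))  # q3_revenue
```

Doing this continuously at warehouse scale is exactly the non-trivial part the monologue highlights.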
In summary, Snowflake is already in the middle of most companies' data stacks. So whether you work on enterprise AI internally or across industries, familiarity with its capabilities is becoming an essential background.
The tools are sometimes challenging, as I discovered struggling through Snowflake's LLM workshop myself last year! However, Snowflake will prioritize more AI focus, given the massive potential of using enterprise data to build AI apps.
So what do you think? Are you leveraging Snowflake for AI use cases today, or is this new territory? What other questions do you have, or what could smooth the path to get started? Let me know in the comments!
This is everything for this week. Remember to like us, subscribe, share your insights, and be part of The AI Engineer; we are the AI community for hackers! We are the AI Ducktypers!