Databricks brings large language models (LLMs) to SQL and MLflow 2.3
Databricks is expanding its efforts to democratize artificial intelligence (AI) today with a pair of technology updates. They are designed to make it easier for enterprises to benefit from large language models (LLMs), including by using SQL to apply LLMs to data analysis.
The updates include the open-source MLflow 2.3 release, which will make it easier for organizations to manage and deploy machine learning (ML) models, particularly transformer-based models hosted on Hugging Face. MLflow is a widely used open-source project led by Databricks that simplifies ML life cycle management, from experimentation to deployment, by providing tools for tracking, packaging and sharing models.
Databricks is also opening up LLMs to data analysts by letting them call the models from SQL (Structured Query Language) queries. SQL is commonly used for querying databases and performing data analytics.
The new updates are the latest in a series of AI efforts Databricks has made in recent weeks to help organizations benefit from AI. On March 24, Databricks announced the initial release of Dolly, its open-source ChatGPT-style project, and followed it on April 12 with Dolly 2.0. The MLflow and SQL updates announced today will help advance the use of Dolly and other LLMs by making the technology easier to implement and run, so that enterprises can gain real business benefits from their data.
Databricks isn’t just about AI. At its core, the company is about data, having coined the term data lakehouse and offering a cloud-based data lakehouse platform based on its open-source Delta Lake technology. According to Databricks cofounder and VP of engineering Patrick Wendell, organizations turn to his company to do “interesting things” with data.
“There’s two big categories of stuff people do with data: one is they ask questions about what happened in the past, so they’re doing some analytical processing,” Wendell told VentureBeat. “The other one is they’re building models to predict the future and, you know, we call that machine learning.”
Going with the MLflow to Hugging Face
Wendell said a common refrain his company has heard from users is that while LLMs may be powerful, what users really want is to build applications with their own data.
What users are looking for, more often than not, is a way to bridge their enterprise data and LLMs in a way that’s useful to the business. That’s part of the reason why Databricks built Dolly, and it’s also the foundation of what MLflow 2.3 is all about.
There’s a whole set of things a user needs to do to get started with ML on a business use case, including experimenting with different types of models and configurations. Figuring out how to deploy a model and then iterating on it over time are all part of what is commonly called the machine learning workflow, and managing that workflow is what MLflow is built for.
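To make the "experiment, compare, iterate" loop more concrete: an experiment tracker such as MLflow's tracking component records the parameters and metrics of each run so the best configuration can be picked out later. The sketch below imitates that idea in plain Python. The RunLog class and the toy threshold "model" are hypothetical illustrations, not MLflow's API.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One experiment run: the configuration tried and the metrics it produced."""
    params: dict
    metrics: dict = field(default_factory=dict)


class RunLog:
    """Hypothetical, minimal stand-in for an experiment tracker (not MLflow's API)."""

    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        # Copy the dicts so later changes by the caller don't alter the record.
        run = Run(params=dict(params), metrics=dict(metrics))
        self.runs.append(run)
        return run

    def best_run(self, metric, higher_is_better=True):
        # Return the run whose recorded metric is best.
        if higher_is_better:
            return max(self.runs, key=lambda r: r.metrics[metric])
        return min(self.runs, key=lambda r: r.metrics[metric])


def train_and_evaluate(threshold):
    """Toy 'model': label a number positive if it exceeds the threshold; return accuracy."""
    data = [(-2, 0), (-1, 0), (0.5, 1), (1, 1), (3, 1), (0.2, 0)]
    correct = sum(int((x > threshold) == bool(y)) for x, y in data)
    return correct / len(data)


log = RunLog()
# Try several configurations and record each one as a run.
for threshold in (-1.0, 0.0, 0.3, 1.0):
    log.log_run({"threshold": threshold}, {"accuracy": train_and_evaluate(threshold)})

best = log.best_run("accuracy")
print(best.params, best.metrics)  # {'threshold': 0.3} {'accuracy': 1.0}
```

A real tracker also persists runs, artifacts and model versions so that a team can revisit them. The point here is only that recording every configuration and its score makes "iterating over time" a comparison rather than guesswork.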
With MLflow 2.3, Wendell said, there is now native support for packaging Hugging Face models in the standard MLflow format, making it much easier for people to deploy them and build applications. Hugging Face has emerged in recent years as one of the most popular repositories of open ML models. According to Wendell, MLflow 2.3 will significantly lower the barrier to entry for organizations looking to operationalize LLMs, including Databricks’ own Dolly model.
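Why does a "standard format" ease deployment? In MLflow, a saved model carries a descriptor that lists its "flavors", and the generic python_function flavor lets serving tools call predict() without knowing which library produced the model. The sketch below imitates that idea in plain Python. Under the assumptions made here, a JSON descriptor names a loader, and every loaded model exposes the same predict signature. The LOADERS registry, save_model and load_model are hypothetical names, and the toy models are simple string functions, not Hugging Face models.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Predictor = Callable[[List[str]], List[str]]

# Hypothetical registry: flavor name -> function that builds a predictor from a config.
LOADERS: Dict[str, Callable[[dict], Predictor]] = {
    "toy_uppercase": lambda cfg: lambda texts: [t.upper() for t in texts],
    "toy_truncate": lambda cfg: lambda texts: [t[: cfg["max_chars"]] for t in texts],
}


def save_model(path: Path, flavor: str, config: dict) -> None:
    """Write a descriptor file recording how the model should be loaded later."""
    path.mkdir(parents=True, exist_ok=True)
    (path / "model.json").write_text(json.dumps({"flavor": flavor, "config": config}))


def load_model(path: Path) -> Predictor:
    """Read the descriptor and return a predict function with a uniform signature."""
    spec = json.loads((path / "model.json").read_text())
    return LOADERS[spec["flavor"]](spec["config"])


with tempfile.TemporaryDirectory() as tmp:
    save_model(Path(tmp) / "a", "toy_uppercase", {})
    save_model(Path(tmp) / "b", "toy_truncate", {"max_chars": 5})
    # Deployment code treats both models identically: load, then call predict.
    for name in ("a", "b"):
        predict = load_model(Path(tmp) / name)
        print(name, predict(["hello world"]))
```

The serving loop never branches on the model type. That uniformity is what a shared packaging format buys, whatever the underlying model is.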
“This [MLflow 2.3 update] pretty much makes it point and click for anyone that wants to consider using these large language models as part of an MLflow deployment,” Wendell said, adding that most of the beneficiaries “tend to be companies that are deploying their own ML infrastructure.”
SQL comes to LLMs
SQL is commonly used for data analytics, but until now it hasn’t been easy to use SQL alongside ML models and datasets.
That’s a situation Databricks is now looking to solve.
“We’re basically building the ability in SQL to directly call into these large language models,” Wendell said.
For example, data analysts will be able to use SQL to call a model for common tasks such as sentiment analysis on a particular dataset, or on a single column within a dataset. Analysts could also use a SQL query to have a model summarize text from a dataset.
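The article does not give the exact syntax Databricks uses, so the sketch below illustrates only the interface idea: a model exposed as a function that an analyst can call inside an ordinary SQL query. It uses Python's built-in sqlite3 module. The registered sentiment function is hypothetical, and a keyword rule stands in for an LLM call. A summarization function could be registered the same way.

```python
import sqlite3


def toy_sentiment(text: str) -> str:
    """Keyword-based stand-in for an LLM call (hypothetical; a real setup would query a model)."""
    lowered = text.lower()
    positive = sum(w in lowered for w in ("great", "love", "fast"))
    negative = sum(w in lowered for w in ("slow", "broken", "hate"))
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL queries can call it like a built-in function.
conn.create_function("sentiment", 1, toy_sentiment)
conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, review TEXT)")
conn.executemany(
    "INSERT INTO reviews (review) VALUES (?)",
    [
        ("Great product, I love it",),
        ("Shipping was slow and the box was broken",),
        ("It arrived on Tuesday",),
    ],
)

# The analyst writes plain SQL; the model is invoked once per row of the review column.
rows = conn.execute(
    "SELECT id, sentiment(review) AS label FROM reviews ORDER BY id"
).fetchall()
print(rows)  # [(1, 'positive'), (2, 'negative'), (3, 'neutral')]
```

The analyst never leaves SQL. The model call looks like any other column expression, so it composes with WHERE, GROUP BY and joins.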
“SQL integration is really about coming up with good interfaces for how people can use the models,” he said.