The Databricks Data and AI Summit 2023 was packed with announcements of new partnerships, technologies, and innovations across the Databricks ecosystem. In this blog, we will walk through the highlights, and if you want to learn more about Databricks, check out our recent post: Databricks 360 – How to Build a Unified Analytics Platform with Databricks.
Generative AI
Databricks has been investing heavily in generative AI, introducing new tools and features that simplify development and make it easier to build, deploy, and manage large language model (LLM) applications directly within the Databricks Lakehouse Platform.
- LakehouseIQ: The AI-powered knowledge engine of Lakehouse that understands your business and provides insights to drive decision-making. It uses machine learning to analyze your data and provide practical findings, helping you make data-driven decisions.
- Databricks Assistant: AI-based companion pair programmer to make you more efficient as you create notebooks, queries, and files. It can help you rapidly answer questions by generating, optimizing, completing, explaining, and fixing code and queries.
- Lakehouse AI: A new set of capabilities for building, training, and deploying AI models directly on your Lakehouse, in the same place where your data lives, without creating separate data silos. It includes:
  - Vector Index: Easily create auto-updating vector search indexes from data in Unity Catalog.
  - Model Serving: GPU-enabled, real-time inference of LLMs at reduced cost and up to 10x lower latency.
  - Curated Open-Source Models: Backed by optimized Model Serving for high performance.
  - MLflow 2.5: Manage your end-to-end LLM operations (LLMOps) effectively and reliably.
Unity Catalog and Data Governance
Databricks has announced multiple new features in Unity Catalog, simplifying the process of managing and discovering data across your organization. This includes new data governance and metadata management features, helping you keep track of your data and ensure it’s used responsibly.
- Lakehouse Federation Capabilities: New capabilities in Unity Catalog allow you to access and analyze data across multiple data sources. This means you can now query data from different sources as if it were in a single database and quickly analyze and understand your data.
- AI Governance: Features in Unity Catalog for managing all AI/ML assets, including Feature Store, Model Registry, and Volumes. These features help simplify the DataOps and MLOps/LLMOps processes and prepare organizations for AI compliance.
- Lakehouse Monitoring and Observability: An AI-driven monitoring service that tracks the quality and integrity of all your data and AI assets. It also exposes billing, audit, lineage, and security information as system tables for enhanced observability.
Data Sharing and Collaboration
Databricks has enhanced data sharing and collaboration capabilities in Lakehouse, increasing the efficiency of teams working together and sharing insights on data projects. This includes new features for sharing queries, visualizations, and dashboards.
- Databricks Marketplace: An open marketplace for all your data, analytics, and AI models. It allows you to quickly discover and evaluate external data with prebuilt notebooks without going through a prolonged procurement cycle.
- Lakehouse Apps: A new and secure way to build, distribute, and run innovative data and AI applications directly on the Databricks Lakehouse Platform.
- Databricks Clean Room: A privacy-safe collaboration environment where multiple participants can join their first-party data and perform analysis on the data without the risk of exposing their data to other participants.
- Delta Sharing: Delta Sharing is an open protocol developed by Databricks for secure data sharing with other organizations, regardless of their computing platforms. Databricks is expanding the Delta Sharing ecosystem with new partners, including Cloudflare, Dell, Oracle, and Twilio.
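Under the Delta Sharing protocol, a recipient authenticates with a small JSON "profile" file from the data provider and addresses shared tables with a `<profile>#<share>.<schema>.<table>` coordinate string. The sketch below, using only the Python standard library, illustrates that convention; the endpoint, token, and share/table names are placeholders, not real credentials.

```python
import json

# A Delta Sharing recipient profile: credentials version, server endpoint,
# and a bearer token issued by the data provider (placeholders here).
profile = {
    "shareCredentialsVersion": 1,
    "endpoint": "https://sharing.example.com/delta-sharing/",
    "bearerToken": "<token-from-provider>",
}

with open("config.share", "w") as f:
    json.dump(profile, f, indent=2)

# Tables are addressed as <profile>#<share>.<schema>.<table>.
table_url = "config.share#retail_share.sales.transactions"
print(table_url)
```

With the open-source `delta-sharing` Python client, a call such as `delta_sharing.load_as_pandas(table_url)` would then read the shared table into a DataFrame, regardless of which platform the provider runs on.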
Developer Experience and Performance
Databricks has introduced new features to improve performance, compatibility, scalability, and usability within its data development environment.
- English SDK for Apache Spark: Databricks has introduced the English SDK for Apache Spark, which lets you write Spark code using plain-English instructions. The SDK translates natural-language descriptions of complex data transformations into Spark code, making Spark far more accessible to non-technical users.
- Delta Lake 3.0: Databricks has announced Delta Lake 3.0, introducing Universal Format (UniForm) and liquid clustering. UniForm lets Delta tables be read as Apache Iceberg or Apache Hudi tables for broader ecosystem compatibility, while liquid clustering automatically optimizes data layout for better performance.
- Materialized Views and Streaming Tables: Databricks SQL now supports materialized views and streaming tables, improving performance and facilitating real-time data analysis. You can now create views that store the result of a query and can be refreshed as needed and tables that continuously update with new data.
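As a rough sketch of what the new Databricks SQL DDL looks like, the statements below create a materialized view and a streaming table; the catalog, table, and column names are illustrative, not from the announcement.

```sql
-- Materialized view: stores the result of the query and can be
-- refreshed as needed (names here are illustrative).
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT order_date, SUM(amount) AS revenue
FROM sales.orders
GROUP BY order_date;

-- Streaming table: continuously ingests new records as files
-- arrive in the source location.
CREATE STREAMING TABLE raw_orders AS
SELECT * FROM STREAM read_files('/Volumes/sales/landing/orders/');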
- Project Lightspeed: Databricks has made significant advances in Apache Spark Structured Streaming with Project Lightspeed, including improvements in performance and scalability that streamline ingesting large volumes of data in real time.
- Data Engineering and Streaming: Databricks has introduced new features for data engineering and streaming, improving performance and scalability. This includes enhancements for Delta Live Tables and Databricks Workflows to build and manage data pipelines more efficiently.
Leverage the Latest Innovations at the Databricks Data + AI Summit
The Databricks Data and AI Summit 2023 was a true testament to its commitment to innovation and vision for the future of data and AI. With all these exciting new features and advancements, Databricks is poised to redefine how organizations manage and leverage their data, driving the next wave of innovation and democratization in the data and AI landscape.
Excited to learn more about how Databricks can elevate your business? Don’t hesitate to contact us below!