
Large Language Models (LLMs) are transforming how businesses operate by offering unparalleled capabilities for human-like text generation and communication, search augmentation, and question-answering. However, LLMs often face challenges, particularly around accuracy and relevance. Retrieval Augmented Generation (RAG) is emerging as a popular solution that helps mitigate these problems. RAG combines targeted information retrieval with the advanced generation capabilities of LLMs, resulting in more accurate and contextually rich output. By integrating domain- or organization-specific data with pretrained LLMs, it offers the following benefits over pretraining or fine-tuning LLMs:
- No retraining required: new or updated documents become available to the model as soon as they are indexed, at a fraction of the cost of fine-tuning or pretraining
- Grounded responses: answers are based on retrieved source material, reducing hallucinations
- Traceability: the retrieved passages can be surfaced alongside the answer, so users can verify where a response came from
A RAG system consists of three key components:
- A retriever, which searches a knowledge base (typically an index of document chunks) for content relevant to the user's query
- An augmentation step, which combines the retrieved content with the user's query into an enriched prompt
- A generator, the LLM that produces the final response from the augmented prompt
The RAG process can be broken down into the following steps:
1. The user submits a query.
2. The retriever searches the indexed documents and returns the most relevant chunks.
3. The retrieved chunks are combined with the query into an augmented prompt.
4. The LLM generates a response grounded in the retrieved context.
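To make the retrieve/augment/generate flow concrete, here is a minimal sketch in Python; the keyword-overlap retriever and the stubbed generate function are toy stand-ins for a real search service and a hosted LLM:

```python
def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (toy stand-in for vector search)."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def augment(query: str, context: list[str]) -> str:
    """Combine retrieved context and the user question into a grounded prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """Stub for an LLM call; a real system would send the prompt to a hosted model."""
    return f"[LLM response to a {len(prompt)}-character prompt]"

docs = [
    "Employees accrue 20 vacation days per year.",
    "Expense reports are due within 30 days.",
]
answer = generate(augment("How many vacation days do I get?", retrieve("vacation days", docs)))
```

Swapping the toy retriever for a managed search service and the stub for a real model call is exactly what the Snowflake-based design described below does.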
In one of our recent client engagements, we successfully developed a cutting-edge RAG Proof of Concept (PoC). The PoC chatbot incorporated complex client documents to respond to user queries regarding the client’s internal policies, while also offering chat history and a feedback mechanism for continuous improvement of the LLM. By leveraging Snowflake’s Cortex framework for the backend RAG chain and seamlessly integrating it with Dataiku’s web apps, we delivered a state-of-the-art solution that transformed how our client interacts with their data.
To design this RAG PoC, we leveraged three key features of the Snowflake and Dataiku environments:
Cortex Search is Snowflake’s fully managed hybrid (vector and keyword) search service for documents and unstructured data that optimizes retrieval with automated semantic reranking. It processes natural language queries and returns the most relevant text results and corresponding metadata. Handling everything from text data embedding and infrastructure maintenance to search quality parameter tuning, Cortex Search allows you to focus on crafting exceptional chat experiences. Additionally, it simplifies document filtering based on metadata tags, ensuring secure access and streamlined document access management. We utilized Cortex Search to handle retrieval in our RAG design.
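As a sketch of what a Cortex Search query can look like from Python, using the Snowflake Python API: the database, schema, and service names below (DOCS_DB, PUBLIC, POLICY_SEARCH) are illustrative placeholders, and the metadata filter assumes a `department` tag was defined on the indexed documents.

```python
def department_filter(department: str) -> dict:
    """Build a Cortex Search metadata filter restricting results to one department."""
    return {"@eq": {"department": department}}

def search_documents(session, query: str, department: str, limit: int = 5):
    """Query an existing Cortex Search service (service and column names are illustrative)."""
    from snowflake.core import Root  # Snowflake Python API; requires an active Snowpark session
    service = (
        Root(session)
        .databases["DOCS_DB"]
        .schemas["PUBLIC"]
        .cortex_search_services["POLICY_SEARCH"]
    )
    response = service.search(
        query=query,
        columns=["chunk_text", "doc_title"],
        filter=department_filter(department),
        limit=limit,
    )
    return response.results
```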
Snowflake Cortex’s LLM functions are a suite of AI tools that leverage state-of-the-art language models to enable a range of natural language processing tasks. You can access leading LLMs through these functions, which are fully hosted and managed by Snowflake, requiring no setup on your end. Moreover, your data remains within the Snowflake environment, ensuring governance and security. In our PoC, we used the COMPLETE function to access generation LLMs such as Mistral, Llama, and Snowflake Arctic.
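As an illustration, prompt augmentation plus a call to COMPLETE from Python might look like the following. This assumes the `snowflake.cortex` helper from Snowflake's Python packages and an active Snowpark session; `mistral-large` is just one of the available model names, and the prompt template is our own.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Augment the user question with retrieved document chunks."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

def ask_cortex(question: str, chunks: list[str], model: str = "mistral-large") -> str:
    """Generate an answer with a Cortex-hosted model via the COMPLETE function."""
    from snowflake.cortex import Complete  # requires an active Snowpark session
    return Complete(model, build_prompt(question, chunks))
```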
For a breakdown of how to use the COMPLETE function, view a recent video in our Snowflake Cortex AI YouTube series below:
By using a Dataiku webapp, we eliminated the need to manage and host an external webapp. For this PoC, we created a Dash webapp in Dataiku to serve as the frontend of the chatbot. Dash is an open-source Python framework for building web applications. The webapp featured a chat history mechanism that stored the user's previous questions to inform answers to subsequent questions. It also incorporated a thumbs up/thumbs down feature that let the user give feedback on each response. If you are interested in a Snowflake webapp solution, check out our blog post discussing Streamlit.
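The chat history and feedback mechanics are independent of Dash itself; a minimal in-memory version of that state, with illustrative field names, which Dash callbacks could read and update, might look like:

```python
from dataclasses import dataclass, field

@dataclass
class ChatSession:
    """In-memory chat state of the kind a Dash callback can read and update."""
    history: list = field(default_factory=list)   # items: {"role": ..., "content": ...}
    feedback: list = field(default_factory=list)  # items: {"turn": ..., "thumbs_up": ...}

    def add_turn(self, question: str, answer: str) -> None:
        """Record one user/assistant exchange."""
        self.history.append({"role": "user", "content": question})
        self.history.append({"role": "assistant", "content": answer})

    def record_feedback(self, turn_index: int, thumbs_up: bool) -> None:
        """Store a thumbs up/down rating for a given turn."""
        self.feedback.append({"turn": turn_index, "thumbs_up": thumbs_up})

    def context_window(self, max_turns: int = 5):
        """Return the most recent turns to prepend to the next prompt."""
        return self.history[-2 * max_turns:]
```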
As illustrated in the RAG architecture diagram below, we implemented a solution that utilizes Snowflake for the backend and Dataiku for the frontend user interface (UI).
The Snowflake RAG architecture backend is orchestrated through a SQL stored procedure, which chains together the Cortex Search vector database, the augmented prompt, and the Cortex LLM to produce the final response to the end user.
Setting up the Snowflake backend for the RAG consists of the following two steps:
1. Create a Cortex Search service over the chunked client documents; the service handles embedding and indexing automatically.
2. Create the SQL stored procedure that chains together the Cortex Search retrieval, prompt augmentation, and Cortex LLM call.
Once the backend is built, from the Dataiku side we make a call to the Snowflake stored procedure. A call to the stored procedure triggers the following steps:
1. The user's question, along with recent chat history, is sent to the Cortex Search service, which returns the most relevant document chunks.
2. The retrieved chunks are combined with the question into an augmented prompt.
3. The augmented prompt is passed to the Cortex LLM through the COMPLETE function.
4. The generated response is returned to the Dataiku webapp.
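Assuming the stored procedure takes the question and a serialized chat history and returns the generated answer (the name `RAG_CHAT` and its signature are illustrative, not the client's actual procedure), the call from the Dataiku side could be sketched as:

```python
import json

def call_rag_procedure(conn, question: str, history: list) -> str:
    """Invoke the backend stored procedure (RAG_CHAT is an illustrative name)
    over a DB-API connection and return the generated answer."""
    with conn.cursor() as cur:
        cur.execute("CALL RAG_CHAT(%s, %s)", (question, json.dumps(history)))
        (answer,) = cur.fetchone()
    return answer
```

In Dataiku, `conn` would come from the project's configured Snowflake connection rather than hard-coded credentials.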
The responses and chat history are held in memory on the Dataiku side. Upon each conversational exchange between the user and the AI chatbot, the chat history, latency, and feedback are written back to a Snowflake logs table.
Snowflake streamlines the process of building vector databases by offering a managed solution called Cortex Search that automatically generates embeddings. Out of the box, it provides a hybrid retrieval mechanism by combining semantic similarity search with filtering capabilities. With the comprehensive features of Cortex AI, Snowflake has become an exceptional choice for customers looking to accelerate the development timeline of Retrieval Augmented Generation (RAG) solutions.
Retrieval Augmented Generation (RAG) offers an effective way to demonstrate the potential of GenAI in your organization. Unlock your full GenAI capabilities with the support of the Aimpoint Digital team’s deep industry expertise in delivering GenAI solutions across GenAI strategy, RAG applications, fine-tuning, and pretraining.
Whether you need advanced AI solutions, strategic data expertise, or tailored insights, our team is here to help.