First Line Software is a premier provider of software engineering, software enablement, and digital transformation services. Headquartered in Cambridge, Massachusetts, its global staff of 450 technical experts serves clients across North America, Europe, Asia, and Australia.
Client
The client is a leading manufacturer and seller of audio systems, including speakers, amplifiers, and sound systems. Their product line is extensive, ranging from consumer audio devices to professional-grade audio equipment. The client approached our company with a request to develop an AI-powered website assistant capable of performing various tasks, such as retrieving product specifications and technical design files for installing the systems, and answering requests from customers and sales representatives. In the future, the assistant is also expected to perform a variety of calculations related to product performance. Our team decided that an Agentic RAG architecture was a perfect fit for this challenge.
Challenge
With the standard Naive RAG architecture, one of the safest and most widely used approaches for implementing Gen AI assistants, all requests are served in the same way: retrieve from the data store, then generate. However, some requests, such as finding specifications by ID or advising on the best acoustic solution, are better served by dedicated tools. The same goes for calculations, which cannot be completed by searching a data store. We realized that Naive RAG could not handle every task and that a more powerful solution was needed.
Solution
After thorough analysis, our team proposed an architecture that combined Agentic Retrieval-Augmented Generation (RAG) with Azure AI Service. Agentic RAG is flexible and allows the assistant to address different types of requests.
What is Agentic RAG?
Agentic RAG is an advanced AI architecture that combines the strengths of Retrieval-Augmented Generation (RAG) with an agent-based decision-making approach.
- Retrieval-Augmented Generation (RAG): RAG is a technique that enhances the capabilities of AI models by allowing them to fetch relevant information from enterprise-owned proprietary data sources before generating a response. Instead of relying solely on pre-trained knowledge, the model can query databases, APIs, or other resources to retrieve up-to-date and contextually relevant information. This is particularly useful in scenarios where the knowledge base is vast or often changing.
- Agentic Approach: The agentic aspect refers to the system’s ability to autonomously decide how to handle a user’s query. It operates like an orchestrator, determining whether to use a language model (such as Azure AI LLM) for generating a response, query an external database for factual information, or invoke a specialized service for precise calculations. This decision-making capability makes the system more flexible and capable of handling a wide variety of tasks efficiently.
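The retrieve-then-generate flow described in the first bullet can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: `build_grounded_prompt`, `answer_with_rag`, the two stand-in functions, and the "X100" product are all hypothetical names introduced here, and a real system would call an actual retriever and LLM in their place.

```python
from typing import Callable, List


def build_grounded_prompt(question: str, passages: List[str]) -> str:
    """Combine retrieved passages and the user question into one prompt."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


def answer_with_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve first, then generate from the augmented prompt."""
    passages = retrieve(question)
    return generate(build_grounded_prompt(question, passages))


# Hypothetical stand-ins for a real retriever and a real LLM call.
def fake_retrieve(question: str) -> List[str]:
    return ["X100 speaker: 8 ohm nominal impedance."]


def fake_generate(prompt: str) -> str:
    return f"(model reply to a prompt of {len(prompt)} characters)"


print(answer_with_rag("What is the impedance of the X100?", fake_retrieve, fake_generate))
```

Passing the retriever and generator in as callables keeps the sketch independent of any particular vector store or model provider.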
Implementation Overview
The solution incorporated the following key components:
- Agentic RAG Layer: As shown in the provided schema, the Agentic RAG layer serves as the core decision-making unit. When a user submits a query, it first passes through the front end and then to the agent, which refines the query and decides on the appropriate action. The layer is implemented using LangChain’s Python support for agents.
- LLM (Large Language Model): The agent may direct general queries to the LLM (the Azure OpenAI GPT-4o model was selected), which generates responses based on pre-trained data. This is typically used for standard queries that do not require external data retrieval or complex calculations. This model supports Agents and Function Calling and is one of the most powerful models for this task.
- Knowledge Base Access: For queries that require specific product details or documentation, the agent accesses the company’s Vector Database. An additional business process was implemented to regularly update the Vector Database with the most recent marketing materials. A process known as vector calculation (embeddings) is used to index marketing information with semantic tags. These components allow the agent to fetch the most relevant and up-to-date information to include in the response. A domain-specific chunking strategy was applied to further improve the quality of the context passed to the LLM.
- Tools for Precise Operations: If a query requires complex calculations or the use of external APIs, the agent redirects the query to the appropriate computation tools or API integrations. This ensures that calculations are accurate and that the information provided is reliable.
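The chunking and embedding-based indexing mentioned under Knowledge Base Access can be illustrated with a toy example. The source does not describe the actual chunking rules or embedding model, so everything below is an assumption: documents are split per `## ` heading (one chunk per product), and a bag-of-words count vector stands in for a real embedding. The sample products are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk_by_heading(document: str) -> List[str]:
    """Split a document into one chunk per '## ' heading section."""
    parts = re.split(r"(?m)^(?=## )", document)
    return [p.strip() for p in parts if p.strip()]


def embed(text: str) -> Counter:
    """Toy embedding: word counts (an assumption, not a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: List[str]) -> List[Tuple[str, Counter]]:
    """Store each chunk next to its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def search(index: List[Tuple[str, Counter]], query: str, k: int = 1) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Hypothetical marketing material with one section per product.
DOC = """## X100 speaker
Nominal impedance 8 ohm, sensitivity 90 dB.
## A200 amplifier
Output power 200 W per channel.
"""

index = build_index(chunk_by_heading(DOC))
print(search(index, "amplifier output power"))
```

Keeping each product's section in a single chunk means a retrieved chunk carries the product name together with its figures, which is one plausible reading of a "domain-specific" strategy.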
How the Architecture Works
- Query Processing: The user’s query is submitted via the front-end chatbot interface and is initially processed by the Agentic RAG. The agent refines the query, deciding which subsystem (LLM, Knowledge Base, or Tools) should handle it.
- Decision-Making:
  - For general questions, the query is passed to the LLM for response generation.
  - For specific product design information, the agent queries the company’s structured data in the form of an SQL database.
  - For complex calculations, the agent calls upon external computation tools and API integrations.
- Response Generation: Once the appropriate data or computation is completed, the agent compiles the information and sends the final response back to the user through the chatbot interface.
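The routing flow above can be sketched as a small dispatcher. In the described system the decision is made by the agent (an LLM with function calling); the keyword rules in `choose_route` below are only a stand-in so that the example runs without a model. All function names and the three placeholder tools are hypothetical.

```python
import re
from typing import Callable, Dict


def llm_tool(query: str) -> str:
    return f"LLM answer for: {query}"


def knowledge_base_tool(query: str) -> str:
    return f"Knowledge base lookup for: {query}"


def calculation_tool(query: str) -> str:
    return f"Calculation result for: {query}"


# Registry of subsystems the agent can dispatch to.
TOOLS: Dict[str, Callable[[str], str]] = {
    "llm": llm_tool,
    "knowledge_base": knowledge_base_tool,
    "calculation": calculation_tool,
}


def choose_route(query: str) -> str:
    """Stand-in for the agent's decision: keyword rules instead of an LLM."""
    q = query.lower()
    if re.search(r"\b(calculate|compute)\b", q):
        return "calculation"
    if re.search(r"\b(specifications?|specs?|datasheet|design file)\b", q):
        return "knowledge_base"
    return "llm"


def handle(query: str) -> str:
    """Pick a subsystem, run it, and return its response."""
    return TOOLS[choose_route(query)](query)


for example in (
    "What is a subwoofer?",
    "Show the specifications for the X100",
    "Calculate the SPL at 10 m",
):
    print(choose_route(example), "->", handle(example))
```

Adding a new capability then amounts to registering one more entry in `TOOLS` and teaching the decision step when to select it.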
Agentic RAG Schema
The schema below visually represents how Agentic RAG interacts with various components in our project:
This schema illustrates the flow of information from the user’s query through the agent to various subsystems, ensuring that the most appropriate resource is utilized for each request.
Results
The GenAI PoC successfully showcased the following:
- Accurate Product Information: The AI assistant provided precise and up-to-date specifications for the client’s audio systems.
- Complex Calculations: The agent correctly routed complex calculations to the appropriate tools, ensuring reliable results.
- Handling Tricky Queries: The system effectively managed questions about competitors, providing neutral, fact-based responses.
- Quick Turnaround: The two-week development period for the PoC demonstrated our ability to rapidly deliver a functional prototype, significantly enhancing our standing in the tender process.
Technologies
- LangChain-based Agentic RAG: Central decision-making AI that manages query routing.
- Azure OpenAI LLM: Language model for generating natural language responses.
- Azure Cognitive Services: Used for integrating external systems and databases.
- Custom Calculation Tools: Python code integrated for performing precise calculations.
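As an illustration of what a custom calculation tool might look like, the sketch below estimates sound pressure level at a listening distance. The source does not say which calculations were implemented, so the function `spl_at_distance` and the formula choice are assumptions: it uses the common free-field approximation in which level rises by 10·log10 of the power in watts and falls by 20·log10 of the distance in meters, relative to a sensitivity rated at 1 W and 1 m.

```python
import math


def spl_at_distance(sensitivity_db: float, power_w: float, distance_m: float) -> float:
    """Estimate SPL in dB for a speaker in free field.

    sensitivity_db: rated level at 1 W and 1 m.
    power_w: amplifier power delivered to the speaker, in watts.
    distance_m: listening distance, in meters.
    """
    if power_w <= 0 or distance_m <= 0:
        raise ValueError("power_w and distance_m must be positive")
    return sensitivity_db + 10 * math.log10(power_w) - 20 * math.log10(distance_m)


# 90 dB sensitivity, 100 W, 10 m: +20 dB from power, -20 dB from distance.
print(round(spl_at_distance(90.0, 100.0, 10.0), 1))  # prints 90.0
```

Keeping arithmetic like this in deterministic code, and letting the agent only choose when to call it, is what allows the results to be checked independently of the language model.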
What’s Next?
Start exploring the AI opportunities with our GenAI PoC package or learn more about RAG Implementation with First Line Software.