Building Enterprise-Scale RAG Chatbots Using Azure AI Foundry

AI DEMO

1:20 PM – 2:05 PM

Room 260-262

SPEAKER

Mehul Bhuva

Senior Software Engineer (Data & AI), QCI

## Session: Building Enterprise-Scale RAG Chatbots Using Azure AI Foundry
**Track:** AI Demo | **Time:** 1:20 PM–2:05 PM | **Room:** 260-262 | **Type:** AI Demo
**Conference:** CIRAS AI Summit for Iowa — May 6, 2026, Scheman Building, Iowa State University, Ames IA

### Speaker(s)

**Mehul Bhuva** — Senior Software Engineer (Data & AI), QCI (West Des Moines, IA)
I’m a Data & AI Platform Engineer with 22+ years of experience building and modernizing enterprise solutions using Databricks, Azure AI Foundry, .NET Core, Azure Functions, Azure Data Factory, Blazor, Angular and SharePoint. My work spans metadata-driven ETL frameworks, cloud modernization initiatives, and production-grade AI systems that deliver real business value at scale.

I’m the creator of SharePointFix.com, a technical blog with 1M+ views, and I run the SharePointFix Facebook community with 1,000+ members. My community contributions include technical publications on SQL Server Central, Kobai, and multiple peer-reviewed journals, along with conference speaking and mentoring 50+ developers across Azure, AI, and .NET technologies.

I was nominated to be a Microsoft Azure Developer Influencer (MADI) by Microsoft, recognizing my leadership and impact in driving Azure and AI adoption within the developer community.

I’m passionate about sharing real-world AI practices, collaborating with industry leaders, and helping teams accelerate digital transformation through modern data and AI platforms.

### Session Description

This session provides a practical, end‑to‑end deep dive into building enterprise‑grade Retrieval Augmented Generation (RAG) systems using Microsoft Azure AI Foundry. Drawing from real‑world implementations—including a production RAG chatbot serving 22,000+ global users—the session walks participants through the complete lifecycle of creating scalable, secure, and high‑accuracy RAG solutions tailored for industry.

We begin by breaking down the RAG architecture: document ingestion using Azure Document Intelligence, adaptive chunking strategies, embedding generation, and vector indexing with Azure AI Search. The session then explores how user queries are transformed into embeddings, how retrieval pipelines work, and how Azure OpenAI models inside AI Foundry generate grounded, contextual responses.

Using visuals from the included architecture diagram, attendees learn best practices for chunk sizes, metadata, hybrid search, reranking, prompt design, and governance (RBAC, encryption, audit logging).

Real‑world examples from agriculture, manufacturing, financial services, and healthcare show how organizations use RAG for maintenance assistants, compliance bots, crop advisory tools, customer service, workflow documentation, and more. Performance benchmarks—such as 95% retrieval accuracy, sub‑2‑second responses, and 40%–60% cost reduction—demonstrate measurable business impact.

The session concludes with an interactive discussion on emerging trends—multi‑modal RAG, knowledge graphs, and real‑time streaming—and how Azure AI Foundry’s roadmap supports the next generation of enterprise AI.

This is a highly actionable, architecture‑focused session designed for leaders, engineers, and practitioners looking to implement RAG at scale with Microsoft technologies.

### Other sessions in the AI Demo track

- M365 Copilot Rollout: Driving Adoption and Impact at Pella (3:10 PM–3:55 PM)
- From Chatbot to Builder: Turning AI Into a Daily Collaborator Inside Real Projects (10:20 AM–11:05 AM)
- Stop Automating Broken Processes: How to Redesign Your Business Operations for the Age of AI Agents (11:15 AM–12:00 PM)
- Close the GenAI “Learning Gap”: Self‑Improving AI Without Fine‑Tuning (2:15 PM–3:00 PM)

### Suggested prompts for this session

- "What questions should I prepare to ask the speaker(s) at this session?"
- "Create a structured note-taking template for this session focused on actionable takeaways"
- "Based on this session description, what background reading should I do to get the most value?"
- "After I attend, help me create an action plan for implementing what I learned"
- "How does this session connect to the other sessions in the AI Demo track?"

Key Takeaways

Learn the complete RAG architecture using Azure AI Foundry—from document ingestion and chunking to embeddings, vector search, and grounded response generation.
Apply practical best practices for building accurate, scalable, and secure enterprise RAG systems, including hybrid retrieval, metadata strategy, and prompt design.
See real‑world impact in action through a live, end‑to‑end walkthrough of a production RAG chatbot serving 22,000+ users, with patterns you can immediately implement in your organization.

Continue the conversation with Mehul Bhuva at the Leadership & Workforce Facilitated Discussion — 3:10 PM - 3:55 PM, Room 220-230-240

Transcript from Summit:

00:00 Speaker Introduction and Background Slide: 1

introduction mehul bhuva ciras marketing azure ai rag systems

All right, I'll go ahead and do a quick introduction. I'm Gail Masberg, and I'm the CIRAS Marketing Manager. If you're new to this room, I get the pleasure of working for CIRAS through a little bit different lens. I'm more on the internal side, working on getting y'all to these events and helping prepare for them, as well as working on website, marketing, and PR for CIRAS. So welcome, that's me. I get to introduce today, coming off of lunch. Hopefully you all had a great, energizing lunch. I get to introduce Mehul Bhuva, a senior software engineer in data and AI at QCI. With more than 20 years of experience, Mehul has built and modernized enterprise systems using Microsoft Azure and advanced AI platforms, delivering scalable solutions across industries. He is also a recognized contributor to the developer community, author of SharePointFix.com (I've been there), a widely read technical blog, and a Microsoft-nominated Azure Developer Influencer.

01:03 Mehul's Professional Journey and Experience Slide: 1

software development microsoft stack full-stack sharepoint dotnet

His work spans production-grade AI systems, including large-scale implementations. In today's session, Mehul will walk us through how to design and deploy enterprise-grade (read it all out) retrieval augmented generation, or RAG, systems on Azure, sharing practical architecture insights, real-world examples, and best practices you can apply in your own organization. So please join me in welcoming Mehul. Thank you. Hello, can you guys hear me back there? Good? Okay. Let me know if I'm being too loud. I have that habit of being loud. So thank you, Gail, for introducing me. I'm Mehul Bhuva. I have 22 years of software development experience. I started with the Microsoft technology stack as a full-stack app developer working on several Microsoft technologies, including VB, ASP, customizing SharePoint, C#, .NET, the classic world, the modern world, .NET Core, Angular, React, you name it.

02:08 Session Agenda Overview Slide: 2

agenda demos rag architecture azure ai foundry production design

I've done all kinds of development. But recently, I've stepped into the data and AI world. It's been three or four years now since I stepped from the app world into the data world, and I could bring in a lot of experience and realized that I was able to contribute a lot there. So that's the background of my experience. Moving on to the next slides. Again, Gail has already covered this part, so I'm not going to cover it. So the agenda for today: I don't want to bore you with slide decks and slide shows. I know you guys are here for the demo, and I have three interactive demos. In fact, I built two agents so I can show you the capabilities of an agent, and then the third demo is going to be us building an agent together. So the agenda is that we cover what RAG is and why RAG, the RAG architecture, Azure AI Foundry, what Azure AI Foundry is all about, how to build on Azure AI Foundry, what the building blocks are, and production design: how to design a production-grade platform or framework for your RAG-based scenarios.

03:15 RAG Definition and Enterprise Use Case Slide: 2

rag definition retrieval augmented generation onboarding employee information sop

And a case study which I recently implemented at my client location, which is Corteva. I've been working with them for the last six years. We implemented an enterprise-wide chatbot for them that scales up to 22,000 users across the globe. And then, what are some of the best practices that I have learned from my experience? So moving on. So what is RAG? Are you guys aware of RAG? First of all, let me ask you this. Okay. So, a pretty informed audience. R stands for retrieval, A for augmented, and G for generation. So coming back here to RAG, right? Let's assume, and every company has this problem, right? You have an employee who joins on day one, right? He needs information right away to get started. It could be any kind of information. It could be onboarding information, right? I need to sign up for a 401(k) plan.

04:17 Enterprise Document Management Challenges Slide: 2

unstructured documents sharepoint onedrive document management plant operators

I need to sign up for dental, medical, and vision plans. How does the employee know all this? Secondly, if you have a plant operator at a site, he needs a checklist on day one for how he should learn the whole process. There are SOPs. There are PDFs. There are hundreds and hundreds of pages of documents that he has to go through. A majority of companies, small, big, medium, doesn't matter, have large-scale unstructured and semi-structured documents. They have fewer databases, but more documents sitting out there in a repository somewhere. It could be a OneDrive location. It could be a SharePoint site. Or it could be a person's own physical machine. There are documents everywhere, and there are several versions of these documents, making it very difficult to find relevant information when you need it the most. That's the key here. So the employee who's starting on day one needs a lot of information on day one to get started on his job. Of course, there'll be onboarding, there'll be training, things like that, but if he's on the shop floor and he does not know what to do, then we have a bigger problem,

05:21 RAG Chatbot Benefits at Corteva Slide: 2

corteva chatbot implementation plant operators multilingual llm context

because then the enterprise or the organization is not enabling him to be productive from day one. That's why the whole concept of a RAG-based chatbot. When I say this, it is very serious, because we have seen the benefits of implementing a RAG-based chatbot at Corteva. We have these hundreds of plant sites and operators across the globe. And they don't like to read documents, especially those in Europe. They don't like to read documents that are in English. They like to go and translate them into their local language, and then they can make sense out of it. So when you put the LLM, which is a large language model, in front of your own enterprise documents, it all of a sudden has more context because it is not looking at the World Wide Web. It is just looking at your own documents that you care about. Right? So coming back to this situation where this employee doesn't know what to do: if an LLM chatbot can basically look at a 200-page document and give him a checklist of things that he needs to do on day one, like dry the seeds in this way, go to this plant, follow this process, follow this conditioning process, follow this quality process, those kinds of things will help him get started.

06:42 Traditional LLM Gaps Addressed by RAG Slide: 2

llm gaps enterprise data hallucination reduction azure ai foundry data privacy

Now, what are the gaps that we have tried to address? I mean, talk about the traditional LLM gaps. It's a large language model, like a ChatGPT model that you've used or a Claude model that you've used. It does not know anything about your enterprise data. It knows a lot about the rest of the world's data. When you ask questions to the LLM in general, it can give you generic responses, not tied to your own process. But RAG in the enterprise fixes that problem, because now you have your LLM pointing to your data and not the rest of the world. And with Azure AI Foundry, the contract with Microsoft is such that your data will not leave, and the LLM will not be trained on your data, right? Or will not be retrained on your data outside your enterprise. So we solved that problem. There is 80% less hallucination because it is looking at a specific context. It is 10 to 100 times cheaper because you don't have to retrain the model each time. It takes 2 years to retrain a large language model. But when you give it enough context, it's just going to give you a meaningful response.

07:42 RAG Architecture Components Slide: 5

rag architecture vectorization chunking embedding azure ai search

And you can also fine-tune your LLM in Azure AI Foundry, which I'll show you in a bit, in a very cheap way. You don't have to spend millions of dollars for that. Days to production. Basically, you can go to production in a matter of, so at Corteva, if someone comes up with a new use case, like training, onboarding, or knowledge assistance, we can onboard them on their data with a RAG-based profile in four hours. Full audit trail of who's asking what questions. There are safeguards and guardrails around it. There are evaluations that we do. If someone is asking a harmful question, we flag that, right? If someone is asking an explicit question, we flag them. So we have the full audit trail with Azure AI Foundry. Now, this is the architecture that I want to go through real quick. You have your source documents. They could be living anywhere in your enterprise. It could be SharePoint, and the major part of it is going to be on SharePoint. I believe everyone knows about SharePoint, or a knowledge repository somewhere, right?

08:46 RAG Pipeline Extract and Index Steps Slide: 5

data extraction chunk ids embedding models vectorization azure ai search

OneDrive, your physical folders, Blob containers. I've seen a lot of unstructured or semi-structured documents living in a Blob container. So the way the pipeline works is you extract the data that you have. It could be hundreds or thousands of PDFs. Foundry will automatically chunk it into several chunks with chunk IDs, and then you embed it using a large language embedding model. Then you vectorize and index it through the Azure AI Search service. And I'm going to take you through that. These are just the terms that you have to remember for now. I'm going to show you what I mean. The retrieve, generate, and grounded answer steps are the last and final pieces, where essentially your question also gets converted to a vector. Do you know what a vector index is here? Any idea what a vector index is?

09:54 Vector Index Explained Slide: 5

vector index floating point similarity search embedding numerical representation

Okay, so I can explain. So for a vector index, let's assume that you have a set of floating point numbers. Okay, so if you ask a question, what does the weather outside look like today? That question gets converted into floating point numbers, or coordinates like latitudes and longitudes, which do not make sense to a human but make a lot of sense to the computer. Now your grounding information, which is your source data, is also vectorized. So the documents that you have uploaded, your enterprise documents, your SOPs, plans, procedures, information manuals, your codes, everything is vectorized in the vector database. Those vectorizations are nothing but floating point numbers. So when you ask a question to the chatbot, where can I find a plant operating manual? Or what is the process of drying a seed?

10:51 Hybrid Search and Re-Ranking Slide: 5

hybrid search vector search keyword search bm25 re-ranking

It basically gets converted into a numerical representation, so we have a big floating point number, and it goes back and looks at the vector index of your documents and tries to do a similarity search, which means it tries to go to the number which is closest to your question. Everything gets converted into a floating point number. So if you have a number like 999, it's going to look at a range of the closest numbers to 999. And then it gives you more information and context. What that means is that it's basically not doing a keyword search; it's going to go and search for the meaning inside the document, right? And then when you combine the keyword search and the vector search, that's known as hybrid retrieval, so that it gives you more context and meaning. And then we also have another concept of re-ranking on Foundry, which gives you even better quality of search outcome because it's a combination of both hybrid plus re-ranking.
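As a rough illustration of the similarity search described above, here is a minimal Python sketch. It assumes cosine similarity as the closeness measure, which the talk does not specify, and the chunk IDs and three-dimensional vectors are made up for the example. A real embedding is a list of hundreds or thousands of floating point numbers rather than a single number like 999.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """Return the k chunk IDs whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in index.items()]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]

# Hypothetical toy index: chunk ID -> embedding (assumed values).
toy_index = {
    "seed-drying-sop#1": [0.9, 0.1, 0.0],
    "benefits-401k#4": [0.0, 0.2, 0.9],
    "seed-conditioning#2": [0.7, 0.3, 0.1],
}

# Stands in for the embedded question "What is the process of drying a seed?"
query = [0.8, 0.2, 0.0]
print(top_k(query, toy_index))
```

The two seed-related chunks rank ahead of the benefits chunk even though no keywords are compared, which is the "search for the meaning" behavior the speaker describes.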

11:59 Semantic Search vs Keyword Search Slide: 6

semantic search vector search bm25 keyword search natural language

So semantic or vector search is incredibly powerful for natural language questions. It can handle paraphrased queries and conceptual lookups. This is what makes the chatbot feel like it actually understands you rather than doing a Ctrl+F, like finding your information. And BM25, which is another concept, is a keyword-based search. It's like a keyword specialist that covers the blind spots, information which is more specific, like a product code. A vector search will not be good for specific information. It'll only be good for a contextual or meaning-based search. That's where the keyword search comes into the picture as well. And when you combine both of these, it becomes even more powerful. So this is an example of a document intelligence pipeline. As I said, for the source documents, we have something known as Azure Document Intelligence, which basically looks at all your documents, thousands of documents. There's no limit to how many documents you can vectorize.
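One way to picture combining keyword and vector results is Reciprocal Rank Fusion (RRF), a common fusion method. The talk does not say which fusion method the speaker's system uses, so the sketch below is an assumption, and the document IDs are hypothetical. Each document earns 1 / (k + rank) from every result list it appears in, with k = 60 as the conventional constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results for a query that mentions a product code.
keyword_hits = ["part-AX-4471", "seed-drying-sop", "safety-manual"]   # BM25-style ranking
vector_hits = ["seed-drying-sop", "seed-conditioning", "part-AX-4471"]  # vector ranking

print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

Documents that appear in both lists rise to the top, so an exact product-code match from keyword search and a meaning-based match from vector search can both survive into the final result set.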

12:53 Azure Document Intelligence Pipeline Slide: 6

azure document intelligence token size chunking metadata enrichment openai embeddings

It then automatically decides what documents need to be chunked with what token size. Now, tokens are the number of words that it will basically combine into a chunk. So, depending on whether you have short policies, the token size is chosen accordingly. By default, I think the Foundry interface picks a default of 512 tokens, which is approximately 500 words. It does some metadata enrichment: the security, timestamp, ACL, things like that. And then the OpenAI embeddings convert that into a vector index and store it in the search service, the Azure AI Search service. There are several components. It's a whole enterprise architecture framework. Within Foundry, you are also connecting to an Azure AI Search service, which actually does the vectorization process for you. Foundry hosts all the models and LLMs, which I'll show you in a little bit. I just wanted the concepts to be clear before that. Now, what is the benefit of using Azure AI Foundry?
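The chunking and metadata enrichment step can be sketched roughly as follows. This is not how Foundry or Azure Document Intelligence implements it. The `Chunk` class, the `chunk_document` function, the overlap value, and the metadata field names are all hypothetical. Whitespace-separated words stand in for tokens here as a simplification; real tokenizers split text into sub-word pieces, so 512 tokens is usually somewhat fewer than 512 English words.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Chunk:
    chunk_id: str
    text: str
    metadata: dict = field(default_factory=dict)

def chunk_document(doc_id, text, max_tokens=512, overlap=50, allowed_groups=()):
    """Split text into overlapping windows of at most max_tokens words."""
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    words = text.split()  # assumption: one word approximates one token
    step = max_tokens - overlap
    indexed_at = datetime.now(timezone.utc).isoformat()
    chunks = []
    for n, start in enumerate(range(0, len(words), step)):
        window = words[start:start + max_tokens]
        chunks.append(Chunk(
            chunk_id=f"{doc_id}#{n}",
            text=" ".join(window),
            # Hypothetical enrichment fields: source, timestamp, access control list.
            metadata={"source": doc_id, "indexed_at": indexed_at,
                      "acl": list(allowed_groups)},
        ))
        if start + max_tokens >= len(words):
            break  # the last window already reached the end of the document
    return chunks

doc = " ".join(f"w{i}" for i in range(1200))  # synthetic 1,200-word document
chunks = chunk_document("seed-drying-sop.pdf", doc, allowed_groups=["plant-operators"])
print(len(chunks), chunks[0].chunk_id, len(chunks[-1].text.split()))
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk, and the ACL list on each chunk is what would let a retrieval layer filter results by the caller's group membership.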

13:53 Azure AI Foundry Platform Features Slide: 6

azure ai foundry model catalog playgrounds mcp connectors fabric lakehouse

There's a model catalog. There are different kinds of playgrounds. There's an agent playground. There are tools that you can use. And there is also a chat playground. There are hubs and projects in Foundry. I'm not going to go into the details. This is all high-level information. There is a concept of MCP connectors. Now, for example, how many of you here know about lakehouses? Do you have lakehouses in your enterprise? So a lakehouse is a concept where you bring in data for analytics, machine learning, and AI purposes into a central location. So from all your transactional systems, you bring the data into that main reporting engine. They used to call them warehouses, but now the term has changed to lakehouse, where, you know, you can get billions and billions of records in your lakehouse. Those are mostly structured data, not unstructured data. So with MCP connectors in Foundry, you can tap into those, like a Fabric lakehouse. Have you heard about Fabric? It's an Azure concept.

14:53 Central IT Infrastructure Management Slide: 6

central it infrastructure management foundry project teams agents

Have you heard about Databricks? Who has heard about Databricks? Databricks has something known as Genie spaces, which basically connect to structured data. So you can bring in that data. So you can marry the structured data, which is the relational data, with unstructured and semi-structured data using AI Foundry. So it's a platform and a tool to build on top of. So again, this is central IT. This is the best approach, where a central IT team manages the infrastructure for Foundry and enables the different teams to use the Foundry infrastructure. So it's like provisioning the infrastructure first, and then your project teams can pick and choose the projects they want to create, the agents they want to create, right, the resources they want to create. So what powers your RAG system is the AI hub, the AI projects that we have in Foundry, the MCP connectors, playgrounds, vector stores, and then of course you have observability through App Insights and OpenTelemetry.

16:00 Building a RAG Chatbot with Zero Code Slide: 6

zero code foundry project model selection knowledge sources index configuration

Now this is how you are going to build a RAG-based chatbot. I'll show you in a demo. We create the resources for a project, you select a model, you connect your knowledge. This is all zero code. Okay, you can do everything through Foundry. So basically, you create a project in Foundry, then you select the model from a catalog of 11,000 models. When I say model, it is your open source GPT models that we have, LLMs, right? Like ChatGPT or Claude or, you know, Gemini, right? Meta models, DeepSeek. Then you connect your knowledge. It can be a SharePoint site or it can be Blob storage. You configure the index. You vectorize and configure your index through the Azure AI Search service. You can tune and evaluate your models. You can fine-tune your models with your data by giving it a training data set. And then publish and integrate, which means you can publish your agents. You would need a custom integration, either through a Teams channel. Have you used Teams? Yes.

16:58 Production-Grade Implementation Architecture Slide: 11

production architecture sso mfa cosmos db chat history

So you can create a Teams channel directly through Foundry. You don't have to write a single line of code for that. And then you can also call the APIs that you have created on Foundry for a custom interface, just like a regular API call with key-based authentication. So again, this is a production-grade implementation that I've done at Corteva, and this is just an example of what we have done. So we use the inbuilt SSO for a single sign-on experience and multi-factor authentication from the user's perspective when he signs in. On the back end, we have Cosmos DB. You can store your chat responses and history in any database. We chose Cosmos DB because it's a big database. It can store, you know, terabytes of data. So whatever interactions you have with your chatbot, we wanted to capture all the interactions that the user had, what kind of questions they were asking, what kind of ratings they were giving to the responses. Whether it was a thumbs up or a thumbs down, we wanted to capture all that.
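To make the chat history idea concrete, here is a small sketch of what one logged interaction could look like as a JSON document. The `ChatInteraction` class and every field name are hypothetical, since the talk does not describe the actual schema, and no database call is shown; the resulting JSON is simply the kind of record a document store such as Cosmos DB can hold.

```python
import json
import uuid
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone

VALID_RATINGS = {"up", "down", "none"}

@dataclass
class ChatInteraction:
    user_id: str
    profile: str          # hypothetical: which chatbot profile answered
    question: str
    answer: str
    rating: str = "none"  # thumbs "up", thumbs "down", or "none" if unrated
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self):
        if self.rating not in VALID_RATINGS:
            raise ValueError(f"rating must be one of {sorted(VALID_RATINGS)}")

    def to_document(self):
        """Serialize the interaction as a JSON string for storage."""
        return json.dumps(asdict(self))

record = ChatInteraction(
    user_id="operator-042",
    profile="seed-plant-onboarding",
    question="What is the process of drying a seed?",
    answer="Follow the drying steps in the plant SOP. [1]",
    rating="up",
)
print(record.to_document())
```

Keeping the rating on the same record as the question and answer makes it straightforward to later ask which kinds of questions attract thumbs-down responses.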

18:01 Corteva's Sprout Chatbot Implementation Slide: 11

sprout chatbot corteva seeds 22000 users knowledge transfer employee training

So we use Cosmos DB. The knowledge part of it, the other part that you see in purple here, is all based on Foundry. It's all a part of the Foundry framework. Now, just at a glance, one of the implementations I made at Corteva was Sprout. We call it Sprout. It's an enterprise-grade chatbot meant for 22,000 global users for our seeds business. So we have a seeds business and a crop protection business. This is for the seeds business. What we found is that there were a couple of use cases that we were able to enable on Foundry. One of them was an employee who had been with the company for 25 years and decided to leave. And he had a bunch of documents that he had created. Those were all semi-structured or unstructured documents. Now we had to train a new employee in his position in 15 days. They asked us to create a profile on Sprout. That's how the whole evolution started. We were able to create that profile and bring the new person coming in onto the chatbot.

19:02 Multiple Use Cases and Profiles Slide: 11

use cases onboarding excel analysis charts graphs code interpreter

That was our first validation that yes, chatbots can do a lot more than we think. So he got trained in 15 days and he is right now performing really well. So the feedback that we are getting is that yes, the chatbot was giving accurate answers. We were also increasing the confidence of the end user by citing each and every source. So citations are very important when you implement chatbots in your companies: you cite the responses so that the users can trace them back to the original document. That way they have high trust in what they are seeing, and it is also a good indicator of hallucinations. So we implemented that, and we quickly created a framework so that we can scale to multiple different use cases: onboarding, assistance. We have at least 10 different kinds of profiles now serving hundreds of customers. So another profile which was very interesting: there were leaders who had to present something every Monday morning. And they had these Excel sheets, and sometimes they spent hours making diagrams and charts, you know, things like that you do in Excel. You create a bar chart, you create a pie chart, you create a histogram, all those regular sales reports, marketing reports.
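The citation practice described above usually starts with how the prompt is assembled. The sketch below is one plausible approach, not the speaker's implementation: it numbers each retrieved chunk so the model can refer to it as [1], [2], and so on. The function name, the instruction wording, the chunk fields, and the sample texts are all assumptions.

```python
def build_grounded_prompt(question, retrieved):
    """Number each retrieved chunk so the answer can cite it as [1], [2], ..."""
    sources = []
    for n, chunk in enumerate(retrieved, start=1):
        sources.append(
            f"[{n}] ({chunk['source']}, page {chunk['page']}) {chunk['text']}")
    return (
        "Answer using only the sources below. "
        "Cite each claim with its source number, for example [1]. "
        "If the sources do not contain the answer, say so.\n\n"
        "Sources:\n" + "\n".join(sources) +
        f"\n\nQuestion: {question}"
    )

# Hypothetical retrieved chunks with made-up contents.
retrieved_chunks = [
    {"source": "seed-drying-sop.pdf", "page": 12,
     "text": "Dry seed lots before conditioning."},
    {"source": "quality-manual.pdf", "page": 3,
     "text": "Record moisture readings for every lot."},
]
prompt = build_grounded_prompt(
    "What is the process of drying a seed?", retrieved_chunks)
print(prompt)
```

Because each number maps back to a file and page, the user interface can turn [1] into a link to the original document, and an answer with no citations becomes a visible warning sign of a possible hallucination.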

20:16 Structured Data Integration and Multilingual Support Slide: 11

databricks genie spaces structured data sql queries multilingual cost reduction

So we enabled that profile where they upload the Excel file to the chatbot, and the chatbot infers the metrics that they want and creates the charts and graphs for them. So you could also use it for scenarios other than the regular chatbot experience, for more of an agentic AI. So we used tools like Code Interpreter to enable those things. We also have other kinds of agentic profiles where we are asking them to go to Databricks Genie Spaces and give back structured data or structured information, like a SQL query from natural language. We ask questions in natural language, and it comes back with a SQL query. So these are some of the use cases we have enabled at Corteva. We've seen 95% retrieval accuracy, because we log and monitor all of our responses. Less than two second response time and 40 to 60% cost reduction, especially for places like Brazil, South America, or Europe, where the documents that we have are in English. They don't need to translate, because LLMs are multilingual.
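The talk does not say how the 95% retrieval accuracy figure is computed. One common way to measure retrieval quality from logged queries is the hit rate: the share of queries where at least one retrieved chunk is among the chunks a reviewer marked as relevant. The sketch below assumes that definition, and the log format and IDs are hypothetical.

```python
def hit_rate(logged_queries):
    """Fraction of queries whose retrieved chunks include a relevant chunk."""
    if not logged_queries:
        return 0.0
    hits = sum(
        1 for q in logged_queries
        if set(q["retrieved"]) & set(q["relevant"])
    )
    return hits / len(logged_queries)

# Hypothetical log: retrieved chunk IDs vs. reviewer-labeled relevant IDs.
logs = [
    {"retrieved": ["sop#1", "sop#2"], "relevant": ["sop#1"]},
    {"retrieved": ["hr#4", "hr#9"], "relevant": ["hr#9"]},
    {"retrieved": ["qa#3", "qa#5"], "relevant": ["qa#7"]},   # miss
    {"retrieved": ["sop#8"], "relevant": ["sop#8", "sop#9"]},
]
print(f"hit rate: {hit_rate(logs):.0%}")
```

A metric like this only works if every question and its retrieved sources are logged, which matches the speaker's point about logging and monitoring all responses.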

21:23 RAG Use Cases Across Industries Slide: 11

healthcare manufacturing logistics financial services compliance

So they respond back in your own language, not the English language. So that was very helpful because users are now interacting in their own language there. In the past, they had to translate. So we saved a lot of cost there. Now, RAG can be implemented not just in the seed business, but also across the board. And if you have gone to several websites, like Amazon, they use RAG all the time. There are all these, you know, chatbots that you see on your credit card and your bank apps, right? They use RAG all the time because they are looking at FAQs and responding back to you. So, healthcare: policy retrieval. Manufacturing: maintenance, SOP assistance, troubleshooting guides for your workers. If you have a shop floor and you're in the manufacturing business, you have so many processes that the employee needs to follow to create a product, right, or to basically build a product on your shop floor. Those things can be automated through RAG-based chatbots.

22:25 Industry Benchmarks and Security Slide: 14

benchmarks retrieval efficiency hallucination reduction cost cutting scalability

Logistics: route planning, customs documentation, things like that. Now, we have one of the profiles where... and I'll also show you in a demo where it can basically write up something for you. If you give it enough context that, hey, you know, I have this requirement, can you write up a draft e-mail for me based on the context I'm giving you? Financial services use compliance bots all the time. Public sector industries are using it: citizen service assistance, guidance grounded in regulations, et cetera. So benchmarks that we have seen, as I said: 95% retrieval efficiency, 80% hallucination reduction, and 40 to 60% cost cutting, you know, versus fine-tuning or retraining agents. These are some high-level numbers, industry-wide. Scalability and security at enterprise: as I said, we can support 10,000-plus concurrent users, less than 100 millisecond vector search, bring-your-own-key encryption, and zero public exposure, because it's all self-contained within the enterprise. Best practices: it is garbage in and garbage out.

23:30 Best Practices for RAG Implementation Slide: 14

best practices data quality hybrid search embedding models observability

Your chatbot is as good as your data. If your data is crappy, the chatbot is also going to be crappy. So it's like a no-brainer. So you have to understand that the data has to be accurate for your chatbots to work accurately, especially in the RAG world where we are telling the LLM: don't use your intelligence, use our intelligence. Don't retrieve information from your memory, use our data. So for retrieval design, hybrid search and re-ranking are the best practices. Using a large text embedding model is the best practice currently. Observability: you have to track what people are asking questions on. Security is very important. We have basically exposed the custom chatbot interface only to people within the enterprise. So that is also very important, that you don't want exposure. Rotating those keys is also very important because this is all key-based authentication from your endpoints' perspective. Cost optimization: we have a way to monitor cost.

24:31 Cost Monitoring and Automation Slide: 14

cost monitoring token cost devops infrastructure as code iac

Now how do you know what's your input and output token cost? And based on how many users are hitting your chatbot every day, there is visibility for that too on the Foundry interface, which I'll show you. Automation. It's very important now. This might not be important for business users, but for IT or technical users, you should be automating your deployments. You should not be creating Foundry infrastructure manually every time, like standing up a search service or standing up a model, creating a model, adding a model. Those can all be automated through DevOps and IaC, infrastructure as code. Have you heard about infrastructure as code? Yes. So these can all be automated for you. So this is just a slide on trends in Azure. So the current trend is that Foundry IQ is very important because it connects to several different kinds of systems. Foundry IQ can connect to structured data, unstructured data, semi-structured data, lying anywhere in your enterprise. It doesn't have to be Microsoft enabled.

25:34 Current Trends in Azure AI Foundry Slide: 14

foundry iq mcp servers agentic ai multi-agent supervisory agents

It can also be Oracle databases. It can be other types of databases or other types of disparate systems that you can connect to through MCP servers. Agentic AI: basically Microsoft is now focused on doing agentic RAG-based systems where you have an agent which basically calls another agent and then they basically synchronize with each other and come back with the information. That's where they're going: multi-agent systems with a supervisory agent that can delegate the tasks and come back with the accurate response. I'll show you an example of that as well, how we can do that easily through a workflow in Foundry. Efficient models, again, this is just for information purposes. There are some smaller language models that you can use for smaller tasks. You don't have to use a large language model all the time. There's also a small LLM that you can use. And with streaming, even indexes can be updated at a given time or at a frequency.

26:39 Vector Index Example with IMDB Dataset Slide: 19

vector index imdb dataset floating point chunk ids csv

Now, let's go to the most important section, which is the demo, which is what I think you guys are waiting for. So, let me show you how a RAG works. So, first starting with what's a vector, what does a vector index look like? So, if you see my screen here, this is what a vector index looks like. This is an IMDB movie data set, and this is a set of the 5,000 best or top movies rated on IMDB. And if you see, these are the vectors for that. And if you see, these are the chunk IDs, chunk title. So it's a CSV file. Out of that CSV file, we have so many vectors. So I was telling you that a vector is a floating point numerical value. And every word that's in that CSV file gets converted into a vector or a numerical floating point. And when you ask a question like, hey, give me the list of top 10 highly rated movies, it's going to go and convert that to a numerical representation and find the closest match in this vector database.
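To make "find the closest match" concrete, here is a minimal sketch of a similarity search over a tiny in-memory index. It assumes cosine similarity as the distance measure and uses made-up 3-dimensional vectors and chunk IDs; real embedding models usually produce one vector with hundreds or thousands of dimensions per chunk of text, and this is not how Azure AI Search is implemented internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical index: chunk ID -> embedding (values invented for illustration).
index = {
    "chunk-1": [0.9, 0.1, 0.0],
    "chunk-2": [0.1, 0.8, 0.2],
    "chunk-3": [0.7, 0.3, 0.1],
}

def top_k(query_vector, index, k=2):
    """Score every chunk against the query and return the k best matches."""
    scored = [(chunk_id, cosine_similarity(query_vector, vec))
              for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# In a real system the query vector comes from the same embedding model
# that produced the index vectors.
query = [1.0, 0.0, 0.0]
results = top_k(query, index)
for chunk_id, score in results:
    print(chunk_id, round(score, 3))
```

A production index uses approximate nearest-neighbor structures instead of scoring every chunk, but the ranking idea is the same.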

27:37 Document Ingestion and Index Automation Slide: 2

document ingestion blob container sharepoint automation vectorization pipeline

This is the IMDB movie data set that's hosted on a BLOB container. And I'll show you, we have just vectorized that here as a knowledge source. So, this is the index I was talking about. The first step is the grounding data, and the grounding data is actually your enterprise documents. Now, you can automate this by creating some pipelines where you can extract data from your SharePoint at a frequency, because there could be hundreds of new documents every day. You don't have to do this manually. I'm showing it for demo purposes, but in enterprise scenarios you should create a document ingestion pipeline if you have a lot of unstructured data, and then automate the vector indexing part of it. So the way I created this index was pretty simple. I imported the data from the BLOB container. Yeah, I'm out of the quota. So I have 3 indexes. That's why it doesn't allow me, but that's OK. So the way I do it is basically I've created these indexes, and this tier of search service, which is a basic tier, has only three indexes that I can create.

28:42 Azure AI Search Service Limitations Slide: 2

azure ai search service tiers index limits basic tier manual import

So I couldn't show it to you, but there are these three indexes that I've created through an import process, which is a manual process, and you can automate that easily with very little code. So now what I'm going to do is I'm going to create an agent where we tie the agent to this particular IMDB index. Right, let's call it. So this is an OK. This is the Foundry interface. Now coming back to Foundry, I think I showed you the Azure AI search service. This is what I was talking about. And then the BLOB container and the vector store. We go to the Foundry interface, which is where the playground is for creating agents and fine-tuning models and creating evaluations and looking at the tools and things like that. So if you see, there are several tools available. Let me see, let me discover. These are the models that you can pick and choose from. Let me show you some of these models here. There are more than 11,000 models, and every day there are at least 2 or 3,000 models that are being added.

29:45 Model Catalog and Data Privacy Slides: 19, 2

model catalog 11000 models claude gpt-4 gemini

Here we go. So Claude is there, and then you have GPT 5.4. So these are all available to you. And then the data, as I said: these models are available exclusively for your subscription or tenant in your organization. They are not going to train these models with your data and expose it to the outside world. That's why we have Foundry, right? Because everything is self-contained in the framework. So let's go back and build an agent. Let's create an IMDB agent, movie advisor agent. Now what's the first step? The first step is to connect your data source, right? Because how does the agent know what to do if you don't give it the data, the grounding data? So there is this knowledge section right here. So it's the Foundry IQ I was talking about.

30:48 Creating IMDB Movie Advisor Agent Slides: 18, 19

imdb agent movie advisor foundry iq web search rotten tomatoes

You can connect any kind of services. I want to connect my search service to the knowledge base. If you see, I have my index that I was showing you listed here. I connect my index to this Foundry IQ. I automatically have a web search here because I also want to go and look at Rotten Tomatoes. And whatever movie recommendations I get from the IMDB data set, I want to validate that with Rotten Tomatoes. You know Rotten Tomatoes, right? It's just a public crowdsourcing site for movies. So now I have instructions here that I have predefined. I don't want to create them by hand. So now these instructions are very important because this tells the agent what to do. So if you read this in a nutshell, what it tells you is: "You are an intelligent movie discovery assistant powered by a vectorized knowledge base of 5,000 movies from the TMDB." Now, the data set, they call it the movies database because of copyright infringement, right? So the data set I have calls it TMDB, but it's actually IMDB data.

31:52 System Prompt Engineering Importance Slide: 19

system prompt prompt engineering agent behavior persona tmdb

And then we tell it to basically go and first look at our knowledge source and then do a live web search on Rotten Tomatoes for real-time scores, audience scores, critics, et cetera, movie reviews. These are the instructions. These are very important. The model will follow them exactly, and this is an exercise on its own. When we created these profiles for our user base at Corteva, we had to go through and talk to domain experts for each and every profile that we were creating: hey, how do you want this profile to behave? Every profile is going to behave in a different way. And it's all configured through this prompt template. This is known as system prompt engineering. This is a very important aspect, the way that you tell the agent in natural language how it needs to behave or what persona it needs to take when someone asks a specific question. So we also have examples here and the outputs, how it should generate the output.

32:52 Voice Mode and Deployment Options Slide: 19

voice mode teams integration microsoft 365 deployment endpoints

Like when you generate the output, show the title, genre, director, cast, et cetera, et cetera. Right? So special capabilities, semantic discovery, and I give it more specific information, right? So then there's also voice mode, which, I mean, it's cool to play with, but I've not played around with it too much. This is a new feature where you can basically speak to it and it'll respond back and also speak back to you. So this is all out-of-the-box, guys. You don't have to code for it. You don't have to use things like LangGraph and LangChain and write anything new. And then the other part which is interesting is you can directly publish the endpoint or you can publish to Teams and Microsoft 365 as an agent. So this is out-of-the-box. And it requires very little technical insight. We just need to know how to set it up the first time. That took us quite a while to figure out. So then there is also memory. Now some people also like to keep the past chat history as memory for the chatbot. Based on the questions that people have been asking, it goes back and looks at the memory.

33:57 Guardrails and Safety Features Slide: 17

guardrails safety features unethical questions self-harm hacking

That's why if you have seen, if you go to Claude or ChatGPT, the longer the conversations you have with it, the slower it becomes. Because it has to go back to your past questions and respond back. Then there is guardrails I was talking about. Guardrails are nothing but safety features that people, if they are asking self-harming questions, questions that they should not be asking, questions about hacking, unethical stuff, we should block it. These are all configured by default for you. So let's save this and let's ask a few questions here real quick. I think I had a list of questions that I had prepared. Here we go. So now the agent is practically ready. I have pointed it to a GPT 4.1 model. I can change the model any time I want. Look at this. I can change to any model I want. And then when I ask a question here, let's see what it comes back with. Hopefully not an error, because I've seen a lot of errors in my demos. Okay, here we go.

35:05 IMDB Agent Demo and Results Slide: 17

demo movie recommendations matrix inception accurate results

So it gave me all the top five movies in the exact format I had asked it to use. And also, this tool: always approve this tool. Recommend 5 movies similar to The Matrix, right? So The Matrix was a sci-fi movie. Inception, I think these are accurate. Ghost in the Shell, Equilibrium, Minority Report, The Thirteenth Floor. Perfect. So this is bang on. This is exactly what we wanted. Let's ask about this one; it's my favorite movie. Why is Shawshank Redemption rated so high? 9.2 or something, I guess. So let's see. Now if you choose a 5.4 model, it performs better in terms of responses, accuracy and all that stuff. So now it also tells you how many tokens it has used here.

36:00 Token Usage and Cost Tracking Slide: 17

input tokens output tokens token usage cost tracking caching

So do you guys know what an input token and an output token are? Are you guys aware of what is an input token and an output token? Input tokens are what you get when you ask a question: the text of your question, along with the instructions and context sent with it, gets converted into tokens. So in this case, our input was 2,884 tokens. The output is what it comes back with. So 343 tokens for the output. Sometimes it also caches the previous output and shows some response. So then you save cost that way. So your cost for using the LLM is monitored this way. Now, emotional resonance, strong performance, clean writing, et cetera. So it gives you some valid responses back there. Let's do another one. You see. Which movie has the highest gross collection, right? Or made the highest money?
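As a rough illustration of how token counts turn into cost, the sketch below multiplies the counts quoted in the demo (2,884 input tokens and 343 output tokens) by per-million-token prices. The prices and the daily request volume are hypothetical placeholders, not actual Azure or model pricing; the current rates for a given deployment come from the provider's price sheet.

```python
from dataclasses import dataclass

@dataclass
class TokenPrice:
    """Price in dollars per one million tokens (hypothetical values below)."""
    input_per_million: float
    output_per_million: float

def request_cost(input_tokens, output_tokens, price):
    """Cost of one request: input and output tokens are billed at separate rates."""
    total = (input_tokens * price.input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000

# Assumed prices for illustration only.
HYPOTHETICAL_PRICE = TokenPrice(input_per_million=2.00, output_per_million=8.00)

# Token counts as shown in the demo response.
cost = request_cost(2884, 343, HYPOTHETICAL_PRICE)

# Assumed volume, to show how per-request cost scales with usage.
requests_per_day = 10_000
daily_cost = cost * requests_per_day

print(f"per request: ${cost:.6f}, per day: ${daily_cost:.2f}")
```

Input tokens dominate here because the system instructions and retrieved chunks are sent with every question, which is one reason prompt size and caching matter for cost.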

37:06 Grammatically Incorrect Query Handling Slide: 17

grammar handling incorrect queries avatar box office web search

I can ask it, grammatically incorrect questions and it'll still come back. That's what I want to show you, that even if my grammar is incorrect, it will still come back with the right responses. Yeah, Avatar, which is accurate. So 2.9 billion at the box office. Now, I can tell him that go to Rotten Tomatoes and show me the critic reviews for Avatar. So that's a public site. If you asked me to go to a private site, what I did, you were to use a critic. So that's a web search. I'm doing a Google search or Bing search, right? But if I have to query my own internal site for a tool, tool-based search, first of all, you'll have to configure an endpoint, like an API. And my identity that I will use is a custom, so I'm not going to, so this is a playground for testing, but I will have a custom web app or an interface.

38:12 Custom Tool-Based Search and API Integration Slide: 17

tool-based search api integration private sites endpoint configuration key authentication

And the app and interface needs to have access to the API. Yeah, so it gives me back the results. Now I'm going to go back quickly and show you some other agents that I've built. So this was the agent that we built from scratch in like 10 minutes. And I have just five more minutes, so I'm just going to do this quick. I have another agent where it's basically doing something different. It basically has a data set of diseases and predictions. It's a prediction data set where every disease has certain symptoms. And we can now predict the disease based on the symptoms that you have. Like, for example, if I have a cough and cold, what could I be suffering from, right? If I have high fever, do I have malaria? We can ask these kinds of questions of the data. And then there's also a web search where it goes to WebMD to validate that information as another check. So let's take this prompt, for example. Which diseases match these symptoms most closely? Fatigue, cough, and chest pain?

39:13 Disease Prediction Agent Dataset and Inference Slide: 17

disease prediction csv dataset symptom severity inference webmd

Let's ask this question. And again, as you see, I have these instructions clearly defined on how the agent needs to behave. If you don't give it the instructions, it's going to behave very randomly. It will just behave like a Google search, not exactly the way you want. Yes, sir. So for this scenario, what is the data set? Is it an Excel spreadsheet or is it a markup file? In this case, it's a CSV file, or an Excel or a CSV, which has columns and rows. And every disease is mapped based on symptom severity on a scale of 1 to 10. You don't have to do anything in between the data set and...? We don't have to, that's the beauty. You don't have to create a very specific machine learning model for this. The AI will look and infer on its own. That's why the need for machine learning is eliminated. So I have done a project where we had to estimate costs based on the past spend. There was an Excel sheet with 10 years' worth of data for all the projects that we have spent money on and the scale and the size of the projects.

40:20 Cost Estimation Agent Example Slide: 17

cost estimation project data excel dataset historical analysis prediction

So without any machine learning technology, I created an agent and gave it that data set, and I prompted the agent to behave in a certain way: go and look back at all the projects which are similar to the one I'm asking a question on, and compare and see what would be my estimated cost in the future if I were to do a similar project. So it gave me a prediction based on my past data. In the past, for something similar you had to write a machine learning model; not with AI, because it has that capability. Let me ask another question. So the agent, which has some safeguards or guardrails in it, will block certain prompts if they contain PHI or if it thinks that there is health information that I want to retrieve back. I think this should work. Which symptoms most strongly categorize diabetes? So yeah, based on the data set, it'll first look at the data set and then it'll tell you.

41:24 Disease Symptom Overlap Query Slide: 17

symptom overlap typhoid malaria phi protection safeguards

I mean, look at the third question here: which symptoms overlap between typhoid and malaria? Here we go. It has, you know, vomiting, high fever, etcetera, etcetera, and then severe context, right? So, basically, this is a quick demo on the disease prediction agent. The last and important demo was for the HR. Now, every enterprise has HR, right? So now imagine HR gets hundreds and hundreds of resumes every single day, where there is a job opening, they have a database of all the resumes of current employees and newer prospective employees.

42:22 HR Resume Analysis Agent Use Case Slide: 17

hr agent resume analysis vectorization sharepoint linkedin verification

Imagine there is a new job requirement and HR is asked to look for candidates. Will the HR person go and look at each and every resume, download it from a SharePoint site or from an external custom site, and review each and every resume? If she has to review hundreds of resumes, she's going to take a week or so to do that. What if we build an agent that can do that? Which means the agent will go and vectorize each and every resume that's being added to a specific SharePoint site. And then you ask a question like, give me the top five data engineering profiles, or the top five data engineering candidates that match the job description. And then you can also ask the agent to go to LinkedIn and verify whether that candidate exists or does not exist. You can also go to a third-party API to do a background check based on his first name, last name, and address.

43:22 HR Agent Demo and Results Slide: 17

hr demo data engineer databricks role candidate search agent results

See if he has a criminal record or things like that. So all those things can be enabled through Foundry. So an example here is: find me the qualified data engineers for a senior Databricks role. For example, this is my requirement, job description or requirement. Yeah, it's telling me I am the top candidate here. So that's great. Since I built the agent, it knows that. So again, I think this is the last part of the demo. And I just want to show how powerful AI can be if you use it the right way, with the right safeguards and the right guardrails. I hope you enjoyed this and I hope you learned a lot from this. Thank you. Yeah. Do different models have different efficiencies as far as token use and that sort of thing? Yeah, so the larger models have a higher efficiency for larger token use compared to small LLM models. Yes, the newer models are more efficient. Take Claude Opus, from what I've tested.

44:31 Model Efficiency and Q&A Wrap-Up Slide: 17

model efficiency token usage claude opus cost comparison gpt models

It's by far the best model out there for all kinds of tasks, not just chat, but also creating charts and graphs or writing code or doing some extraordinary tasks for you. But they are more costly, 7.5 times more costly than the low cost model, like GPT ones. Any other questions? You have time for Q&A? Gail, do we have time for Q&A? OK, yeah, you can talk separately outside this. It's fine. Thank you.

So welcome. That's me, I get to introduce today, coming off of lunch. Hopefully you all had a great energizing lunch. I get to introduce Mehul Bhuva, a senior software engineer in data and AI at QCI. With more than 20 years of experience, Mehul has built and modernized enterprise systems using Microsoft Azure and advanced AI platforms, delivering scalable solutions across industries.

He is also a recognized contributor to the developer community, author of SharePointFix.com (I've been there), a widely read technical blog, and a Microsoft-nominated Azure Developer Influencer. His work spans production-grade AI systems, including large-scale implementations. In today's session, Mehul will walk us through how to design and deploy enterprise-grade, to read it all out, retrieval augmented generation, or RAG, systems on Azure, sharing practical architecture insights, real-world examples, and best practices you can apply in your own organization.

So please join me in welcoming Mehul. Thank you. Hello, Can you guys hear me out back there? Good?

Okay. Let me know if I'm being too loud. I have that habit of being loud. So thank you, Gail, for introducing me.

And this is Mehul Bhuva. I have 22 years of software development experience. I started with the Microsoft technology stack as a full-stack app developer working on several Microsoft technologies, including VB, ASP, customizing SharePoint, C#, .NET, the classic world, the modern world, .NET Core, Angular, React, you name it. I've done all kinds of development.

But recently, I've stepped into the data and AI world. It's been three or four years now where I stepped in from an app world to a data world. and I can bring in a lot of experience and realize that I was able to contribute a lot there. So that's a background of my experience. Moving on to the next slides.

Again, Gail has already covered this part, so I'm not going to cover it. So the agenda for today: I don't want to bore you with the slide decks and slide shows. I know you guys are here for the demo, and I have 3 interactive demos. In fact, I've built two agents where I can show you the capabilities of the agent.

And then the third demo is going to be one where we build the agent together. So the agenda is that we cover what is RAG and why RAG, RAG architecture, Azure AI Foundry, what Azure AI Foundry is all about, how to build on Azure AI Foundry, what are the building blocks, production design, how to design a production-grade platform or a framework for your RAG-based scenarios. And a case study which I recently implemented at my client location, which is Corteva. I've been working with them for the last six years. We implemented an enterprise-wide chatbot for them that scales up to 22,000 users across the globe.

And so what are some of the best practices that I have learned from my experience? So moving on. So what is RAG? So what is RAG?

Are you guys aware of RAG? First of all, let me ask you this. Okay. So pretty informed audience, retrieval, R stands for retrieval, A for augmented, and G for generation.

So coming back here to RAG, right? So let's assume you have, and every company has this problem, right? You have an employee who joins on day one, right? He needs information right away to get started.

It could be any kind of information. It could be an onboarding information, right? I need my, I need to sign up for a 401k plan. I need to sign up for a dental, medical, vision plans.

How does the employee know all this? Secondly, if you have a plant operator at a site, he needs a checklist on day one, how he should learn the whole process. There are SOPs. There are PDFs, there are hundreds and hundreds of pages of documents that he has to go through.

A majority of the companies, small, big, medium, doesn't matter, have large-scale, unstructured and semi-structured documents. They have fewer databases, but more documents sitting out there in a repository somewhere. It could be a OneDrive location. It could be a SharePoint site, or it could be a person's own physical machine.

There are documents everywhere, and there are several versions of these documents, making it very difficult to find relevant information when you need it the most. That's the key here. So the employee who's starting on day one needs a lot of information on day one to get started on his job. Of course, there'll be onboarding, there'll be training, things like that, but if he's on the shop floor and he does not know what to do, then we have a bigger problem. because then the enterprise or the organization is not allowing him or enabling him to be productive from day one.

That's why the whole concept of a RAG-based chatbot. So when I say this, it is very serious because we have seen the benefits of implementing a RAG-based chatbot at Corteva. So we have these hundreds of plant sites and operators across the globe. And they don't like to read documents, especially the operators in Europe.

They don't like to read documents that are in English. They like to go and translate them in their local language and then they can make sense out of it. So when you put the LLM, which is a large language model, in front of your own enterprise documents, it all of a sudden has more context because it is not looking at the world wide web. It is just looking at your own documents that you care about.

Right? So coming back to this situation where this employee doesn't know what to do, and if an LLM chatbot can basically look at a 200-page document and give him a checklist of things that he needs to do on day one, like dry the seeds in this way, go to this plant, follow this process, follow this conditioning process, follow this quality process, those kind of things will help him get started. Now, what are the gaps that we have tried to address? I mean, talk about the traditional LLM gaps.

It's a large language model, like a ChatGPT model that you've used, or a Claude model that you've used. It does not know anything about your enterprise data. It knows a lot about the rest of the world's data. When you ask questions to the LLM in general, it can give you generic responses, not tied to your own process.

But with the RAG in the enterprise, that fixes that problem because then now you have your LLM pointing to your data and not the rest of the world. And with Azure AI Foundry, the contract with Microsoft is such that your data will not leave or the LLM will not be trained on your data, right? Or will not be retrained on your data outside your enterprise. So we solved that problem.

There is 80% less hallucination because it is looking at a specific context. It is 10 to 100 times cheaper because you don't have to retrain the model each time. It takes 2 years to retrain a large language model. But when you give it enough context, it's just going to give you a meaningful response. And you can also fine-tune your LLM in Azure AI Foundry, which I'll show you in a bit.

Very, very cheap, in a very cheap way. You don't have to spend millions of dollars for that. Days' time to production. Basically, you can go to production in a matter of days. So at Corteva, if someone comes up with a new use case, like training, onboarding, or knowledge assistance, we can onboard them on their data with a RAG-based profile in four hours.

Full audit trail of who's asking what questions. There are safeguards and guardrails around it. There are evaluations that we do. If someone is asking a harmful question, we flag that, right?

Someone is asking an explicit question, we flag them. So we have all the full audit trails with the Azure AI Foundry. Now, this is the architecture that I want to go through real quick. You have your source documents.

They could be living anywhere in your enterprise. It could be a SharePoint, but the major part of it, it's all going to be on SharePoint. I believe everyone knows about SharePoint or a knowledge repository somewhere, right? OneDrive, your physical folders, BLOB containers.

I've seen a lot of documents or unstructured or semi-structured documents living on a BLOB container. So the way the pipeline works is you extract or vectorize the data that you have. It could be hundreds of, you know, PDFs or thousands of PDFs. You vectorize it.

Foundry will automatically chunk it into several chunks with chunk IDs, and then you embed it using a large language embedding model. Then you vectorize and index it through the Azure AI Search service. And I'm going to take you through that. These are just the terms that you have to remember for now.

I'm going to take you and show you what I mean. The retrieve, generate, and grounded answers are the last and final pieces, where essentially your question also gets converted to basically a vector. Who knows, I mean, do you know what a vector index is here? Any idea what a vector index is?

Now your grounding information, which is your source data, is also vectorized. So your documents that you have uploaded, your enterprise documents, your SOPs, plans, procedures, information manuals, your codes, everything is vectorized in the vector database. Those vectorizations are nothing but floating point numbers. So when you ask a question to the chat bot, where can I find a plant operating manual?

Or what is the process of drying a seed? It basically gets converted into a numerical representation, and then it looks at, so we have a big floating point number, and it goes back and looks at your vector index of your documents and tries to do a similarity search, which means it tries to go to the number which is closest to your question. Everything gets converted into a floating point number. So if you have a number like 999, it's going to look at a range of the closest numbers to 999.
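The "closest number" idea above can be sketched in a few lines. In practice an embedding is a list of floating point numbers rather than a single number, and closeness is often measured with cosine similarity. This is a minimal sketch, not how Azure AI Search is implemented: the three-dimensional vectors, the chunk IDs, and the `top_k` helper are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k chunk IDs whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in index.items()]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Hypothetical 3-dimensional embeddings; real embedding models emit
# hundreds or thousands of dimensions.
index = {
    "plant-manual-01": [0.9, 0.1, 0.0],
    "seed-drying-sop": [0.1, 0.9, 0.2],
    "hr-leave-policy": [0.0, 0.2, 0.9],
}

# Pretend this is the embedding of "what is the process of drying a seed?"
query = [0.2, 0.8, 0.1]

print(top_k(query, index, k=2))
```

The question never has to share a keyword with the document; it only has to land near it in the vector space.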

And then it gives you more information and context. What that means is that it's basically not doing a keyword search, but it's going to go and search for the meaning inside the document, right? And then when we combine them, that's known as hybrid retrieval, where you combine the keyword search and the vector search so that it gives you more context and meaning. And then we also have another concept of re-ranking on the Foundry, which gives you even better quality of search outcome because it's a combination of both hybrid plus re-ranking.

A vector search will not be good for specific information. It'll only be good for a context-based or meaning-based search. That's where the keyword search comes into the picture as well. And when you combine both of these, it becomes even more powerful.
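One common way to combine a keyword ranking with a vector ranking is Reciprocal Rank Fusion (RRF), which Azure AI Search documents as its method for merging hybrid query results. The sketch below shows the general technique only; the document IDs and the two result lists are hypothetical, and `k=60` is the constant conventionally used with RRF, not a value taken from the talk.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both searches rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["sop-114", "manual-7", "faq-2"]    # hypothetical keyword results
vector_hits = ["manual-7", "policy-9", "sop-114"]  # hypothetical vector results

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

A re-ranker would then take the top of this fused list and reorder it with a more expensive relevance model.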

So this is an example of a document intelligence pipeline. As I said, for the source documents, we have something known as Azure Document Intelligence, which basically looks at all your documents, thousands of documents. There's no limit to how many documents you can vectorize. It then automatically decides what documents need to be chunked with what token size. Now, tokens are the number of words that it will basically combine into a chunk.

So, depending on whether you have short policies, the token size is chosen accordingly. By default, I think the Foundry interface picks a default chunk size of 512 tokens, which is approximately 500 words. It does some metadata enrichment: the security, timestamp, ACL, things like that. And then the OpenAI embeddings convert that into a vector index and store it in the search service, the Azure AI Search service.
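To make the chunking step concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration under stated assumptions, not Foundry's actual chunker: tokens are approximated by whitespace-separated words (a real tokenizer counts differently), the 64-token overlap is an arbitrary choice, and the `chunk_id` format and `source` field are hypothetical.

```python
def chunk_text(text, doc_id, chunk_size=512, overlap=64):
    """Split text into overlapping chunks of roughly chunk_size 'tokens'.

    Tokens are approximated by whitespace-separated words here.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for n, start in enumerate(range(0, len(words), step)):
        piece = words[start:start + chunk_size]
        chunks.append({
            "chunk_id": f"{doc_id}-{n:04d}",  # hypothetical ID format
            "source": doc_id,                 # metadata kept for citations
            "text": " ".join(piece),
        })
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the document
    return chunks


# A fake 1,200-word document.
sample = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_text(sample, "seed-drying-sop")
print(len(chunks), chunks[0]["chunk_id"], len(chunks[-1]["text"].split()))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk.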

There are several components. It's a whole enterprise architecture framework. Within the Foundry, you are also connecting to an Azure AI search service, which actually does the vectorization process for you. Foundry hosts all the models and LLMs, which I'll show you in a little bit.

I just wanted the concepts to be clear before. Now, what is the benefit of using an Azure AI Foundry? There's a model catalog. There are different kinds of playgrounds.

There's an agent playground. There are tools that you can use. And there is also a chat playground. There are hubs and projects in Foundry.

I'm not going to go into the details. These are all high-level information. There is a concept of MCP connectors. Now, for example, how many of you here know about lake houses?

Do you have lake houses in your enterprise? So a lake house is a concept where you bring in data for analytics and machine learning and AI purposes into a central location. So from all your transactional systems, you bring the data into that main reporting engine. They used to call them warehouses, but now the term has changed to a lake house, where, you know, you can get billions and billions of records in your lake house.

Those are mostly structured data, not unstructured data. So with MCP connectors in Foundry, you can tap into those like a Fabric lake house. Have you heard about Fabric? It's an Azure concept.

Have you heard about Databricks? Who has heard about Databricks? Databricks has something known as Genie spaces, which basically connect to structured data. So you can bring that data in.

So you can marry the structured data, which is the relational data, with unstructured data and semi-structured data using AI Foundry. So it's a platform and a tool to build on top of. So again, the best approach is where a central IT team manages the infrastructure for Foundry and enables the different teams to use the Foundry infrastructure. So it's like provisioning the infrastructure first, and then your project teams can pick and choose the projects they want to create, the agents they want to create, right, the resources they want to create.

So what powers your RAG system is the AI hub, the AI projects that we have in the Foundry, the MCP connectors, playgrounds, vector stores, and then of course you have the observability through App Insights and OpenTelemetry. Now this is how you are going to build a RAG-based chatbot. I'll show you in a demo. We create the resources for a project, you select a model, you connect your knowledge.

This is all through zero code. Okay, you can do everything through Foundry. So basically, you create a project for Foundry, then you select the model from a catalog of 11,000 models. When I say model, it is your open source GPT models that we have, LLMs, right?

Like ChatGPT or Claude or, you know, Gemini, right? Meta models, DeepSeek. Then you connect your knowledge. It can be a SharePoint site or it can be a BLOB storage.

You configure the index. You vectorize and configure your index through the Azure AI Search service. You can tune and evaluate your models. You can fine-tune your models with your data by giving it a training data set.

And then publish and integrate, which means you can publish your agents. You would need a custom integration either through a Teams channel. Have you used Teams? Yes.

So we use the inbuilt SSO for a single sign-on experience and multi-factor authentication from the user's perspective when they sign in. On the back end, we have Cosmos DB. You can store your chat responses and history in any database. We chose Cosmos DB because it's a big database. You know, it can store terabytes of data.

So whatever interactions you have with your chat bot, we wanted to capture all the interactions that user had, what kind of questions they were asking, what kind of ratings they were giving to the responses. Whether it was a thumbs up or a thumbs down, we wanted to capture all that. So we use Cosmos DB. The knowledge part of it, the other part that you see in purple here, it's all based on Foundry.
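The interaction capture described above can be pictured as one JSON document per question and answer. The sketch below only shapes and serializes such a record; it does not call Cosmos DB, and every field name except `id` (which Cosmos DB items carry) is a hypothetical choice, not Sprout's actual schema.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ChatInteraction:
    """One question/answer exchange plus the user's feedback."""
    user_id: str
    profile: str
    question: str
    answer: str
    rating: str = "none"  # "up", "down", or "none"
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self):
        if self.rating not in {"up", "down", "none"}:
            raise ValueError(f"unknown rating: {self.rating}")


def to_document(interaction):
    """Serialize to a JSON string ready to hand to a document store."""
    return json.dumps(asdict(interaction))


record = ChatInteraction(
    user_id="u-123",
    profile="onboarding",
    question="Where can I find a plant operating manual?",
    answer="See the Operations site [1].",
    rating="up",
)
doc = json.loads(to_document(record))
print(sorted(doc))
```

Keeping the rating on the same record as the question makes it straightforward to query later for thumbs-down answers per profile.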

It's all a part of the Foundry framework. Now, this is just at a glance: one of the implementations I built at Corteva was Sprout. We call it Sprout. It's an enterprise-grade chatbot meant for 22,000 global users for our seeds business.

So we have a seeds business and a crop protection business. This is for the seeds business. What we found is that there were a couple of use cases that we were able to enable on Foundry. One of them was an employee who was 25 years with the company, decided to leave the company.

And he had a bunch of documents that he had created. Those were all semi-structured or unstructured documents. Now we had to train a new employee in his position in 15 days. They asked us to create a profile on Sprout.

That's how the whole evolution started. We were able to create that profile and bring the new person on board with the chatbot. That was our first validation that yes, chatbots can do a lot more than we think. So he got trained in 15 days and he is right now performing really well.

So the feedback that we are getting is that yes, the chatbot was giving accurate answers. We were also increasing the confidence of the end user by citing each and every source. So citations are very important when you implement chatbots in your companies: you cite the responses so that the users can trace them back to the original document. That way they have high trust in what they are seeing, and it also is a good indicator of hallucinations.
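A minimal sketch of how citations can be wired in: number each retrieved chunk in the prompt, ask the model to cite by number, then map the markers in the answer back to source documents. The prompt wording, file names, and helper names here are assumptions for illustration; Foundry's built-in citation mechanism may work differently.

```python
import re


def build_grounded_prompt(question, chunks):
    """Assemble a prompt that numbers each retrieved chunk so it can be cited."""
    sources = [
        f"[{n}] ({chunk['source']}) {chunk['text']}"
        for n, chunk in enumerate(chunks, start=1)
    ]
    return (
        "Answer using only the sources below. "
        "Cite sources as [n] after each claim. "
        "If the sources do not contain the answer, say so.\n\n"
        "Sources:\n" + "\n".join(sources) + f"\n\nQuestion: {question}"
    )


def cited_sources(answer, chunks):
    """Map [n] markers in an answer back to the source document names."""
    numbers = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return [chunks[n - 1]["source"]
            for n in sorted(numbers) if 1 <= n <= len(chunks)]


# Hypothetical retrieved chunks.
chunks = [
    {"source": "seed-drying-sop.pdf", "text": "Dry seed at low heat."},
    {"source": "plant-manual.pdf", "text": "Start the dryer from panel B."},
]
prompt = build_grounded_prompt("How do I dry seed?", chunks)
answer = "Dry the seed at low heat [1]."  # pretend model output
print(cited_sources(answer, chunks))
```

An answer that comes back with no valid citation markers is a useful signal to flag for review, since it may not be grounded in the retrieved documents.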

So we implemented that, and we quickly created a framework so that we can scale to multiple different use cases, like onboarding assistance. We have at least 10 different kinds of profiles now serving hundreds of customers. So another profile which was very interesting was for leaders who had to present something every Monday morning. They had these Excel sheets, and sometimes they spent hours making diagrams and charts, you know, things like that you do in Excel: you create a bar chart, you create a pie chart, you create a histogram, all those regular sales reports, marketing reports.

We ask questions in natural language, it comes back with a SQL query. So these are some of the use cases we have enabled at Corteva. We've seen 95% retrieval accuracy, because we log and monitor all of our responses. Less than two-second response time and 40 to 60% cost reduction, especially for places like Brazil, South America or Europe, where the documents that we have are in English. They don't need to translate because LLMs are multilingual. So they respond back in your own language, not the English language.

So that was very helpful because users are now interacting in their own language there. In the past, they had to translate. So we saved a lot of cost there. Now, RAG can be implemented not just in the seed business, but also across the board.

And we have seen, if you have gone to several websites, Amazon, for example, uses RAG all the time. There are all these, you know, chatbots that you see on your credit card and bank apps; they use RAG all the time because they are looking at FAQs and responding back to you. So, healthcare policy retrieval, manufacturing, maintenance, SOP assistance, troubleshooting guides for your workers. If you have a shop floor and you're in the manufacturing business, you have so many processes that the employee needs to follow to create a product, right, or to basically build a product on your shop floor.

Those things can be automated through RAG-based chatbots. Logistics, route planning, customs documentation, things like that. Now, we have one of the profiles where... And I'll also show you in a demo where it can basically write up something for you.

If you give it enough context that, hey, you know, I have this requirement, can you write up a draft e-mail for me based on the context I'm giving you? Financial services use compliance bots all the time. Public sector industries are using it. Citizen service assistance, guidance, grounded regulations, et cetera.

So benchmarks that we have seen, as I said: 95% retrieval efficiency, 80% hallucination reduction, and 40 to 60% cost cutting, you know, versus fine-tuning or retraining agents. These are some high-level numbers, industry-wide. Scalability and security at enterprise: as I said, we can support 10,000-plus concurrent users, with less than 100 millisecond vector search.

Bring-your-own-key encryption and zero public exposure, because it's all self-contained within the enterprise. Best practices: it is garbage in, garbage out. Your chatbot is as good as your data. If your data is crappy, the chatbot is also going to be crappy.

So that's pretty, it's like a no-brainer. So you have to understand that the data has to be accurate for your chatbots to work accurately, especially in the RAG world where we are telling the LLM that don't use your intelligence, use our intelligence. Don't retrieve information from your memory, use our data. So retrieval design, hybrid search, re-ranking are the best practices.

Using a large embedding model, text embedding large, is the best practice currently. For observability, you have to track what people are asking questions on. Security is very important. We have basically restricted the custom chatbot interface to only people within the enterprise.

So that is also very important, that you don't want exposure. Rotating those keys is also very important, because this is all key-based authentication from your endpoints' perspective. Cost optimization: we have a way to monitor cost. Now how do you know what your input and output token cost is?

And based on how many users are hitting your chatbot every day, there is visibility for that too on the Foundry interface, which I'll show you. Automation. It's very important now. This might not be important for business users, but for IT or technical users, you should be automating your deployments.

You should not be creating Foundry infrastructure manually every time, like standing up a search service or standing up a model, creating a model, adding a model. Those can all be automated through something known as DevOps, IaC, infrastructure as code. Have you heard about infrastructure as code? Yes.

So these can all be automated for you. So this is just a slide on trends in Azure. So the current trends are: Foundry IQ is very important because it connects to several different kinds of systems. Foundry IQ can connect to structured data, unstructured data, semi-structured data, lying anywhere in your enterprise. It doesn't have to be just Microsoft-enabled.

I'll show you an example of that as well, how we can do that easily through a workflow in Foundry. Efficient models, again, this is just for information purposes. There are some smaller, large language models that you can use for smaller tasks. You don't have to use a large language model all the time.

There's also a small LLM that you can use. Like streaming or even indexes can be updated at a time or at a frequency. Now, let's go to the most important section, which is the demo, which is what I think you guys are waiting for. So, let me show you how a RAG.

So, first starting with what's a vector, what does a vector index look like? So, if you see my screen here, this is what a vector index looks like. This is an IMDB movie data set, and this is a set of the 5,000 best or top movies rated on IMDB. And if you see, these are the vectors for that.

And if you see, these are the chunk IDs, chunk title. So it's a CSV file. Out of that CSV file, we have so many vectors. So I was telling you that vector is a floating point numerical value.

And every word that's in that CSV file gets converted into a vector or a numerical floating point. And when you ask a question that, hey, give me the list of top 10 highly rated movies, it's going to go and convert that to a numerical representation and find the closest match in this vector database. This is the IMDB movie data set that's hosted on a BLOB container. And I'll show you, we have just vectorized that here as a knowledge source.

So, this is the index I was talking about. The first step is the grounding data, which is actually your enterprise documents. Now, you can automate this by creating some pipelines where you can extract data from your SharePoint at a frequency, because there could be hundreds of new documents every day. You don't have to do this manually.

I'm showing it for demo purposes, but in enterprise scenarios you should create a document ingestion pipeline if you have a lot of unstructured data, and then automate the vector indexing part of it. So the way I created this index was pretty simple. I imported the data from the BLOB container.
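One piece of an automated ingestion pipeline is deciding which documents actually need re-indexing on each scheduled run. The sketch below uses content hashes for that decision. It is an assumption about how such a pipeline could be built, not the pipeline used in the demo: the file names are made up, and the `plan_ingestion` helper stands in for whatever would read SharePoint or a BLOB container in practice.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Stable fingerprint of a document's bytes."""
    return hashlib.sha256(data).hexdigest()


def plan_ingestion(documents, indexed_hashes):
    """Decide what to (re)index and what to remove from the index.

    documents:      mapping of document name -> current bytes
    indexed_hashes: mapping of document name -> hash stored at last indexing
    """
    to_index = [name for name, data in documents.items()
                if indexed_hashes.get(name) != content_hash(data)]
    to_delete = [name for name in indexed_hashes if name not in documents]
    return sorted(to_index), sorted(to_delete)


# Hypothetical state: one changed file, one unchanged, one new, one removed.
documents = {"a.pdf": b"version 2", "b.pdf": b"same", "c.pdf": b"brand new"}
indexed_hashes = {
    "a.pdf": content_hash(b"version 1"),
    "b.pdf": content_hash(b"same"),
    "old.pdf": content_hash(b"gone"),
}

to_index, to_delete = plan_ingestion(documents, indexed_hashes)
print(to_index, to_delete)
```

Only the changed and new documents are chunked and embedded again, which keeps embedding cost proportional to what changed rather than to the size of the whole repository.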

Yeah, I'm out of the quota. So I have 3 indexes. That's why it doesn't allow me, but that's OK. So the way I do it is basically I've created these indexes and this tier of search service, which is a basic tier, has only three indexes that I can create.

This is the Foundry interface. Now coming back to Foundry, I think I showed you the Azure AI Search service. This is what I was talking about. And then the BLOB container and the vector store.

We go to the Foundry interface, which is where the playground is for creating agents and fine-tuning models and creating evaluations and looking at the tools and things like that. So if you see, there are several tools available. Let me see, let me discover. These are the models that you can pick and choose from.

Let me show you some of these models here. There are more than 11,000 models, and every day there are at least 2 or 3,000 models that are being added. Here we go. So Claude is there, and then you have GPT 5.4.

So these are all available to you. And then the data, as I said, these models are available exclusively for your subscription or tenant in your organization. They are not going to train these models with your data and expose it to the outside world. That's why we have Foundry, right?

Because everything is self-contained in the framework. So let's go back and build an agent. Let's create an IMDB agent, movie advisor agent. Now what's the first step?

The first step is to connect your data source, right? Because how does the agent know what to do if you don't give it the data, the grounding data? So there is this knowledge section right here. So it's the Foundry IQ I was talking about.

You can connect any kind of services. I want to connect my search service to the knowledge base. If you see, I have my index that I was showing you listed here. I connect my index to this Foundry IQ.

I automatically have a web search here because I also want to go and look at Rotten Tomatoes. And whatever movie recommendations I get from the IMDB data set, I want to validate that with Rotten Tomatoes. You know Rotten Tomatoes, right? It's just public crowdsourcing for movies.

So now I have instructions here that I have predefined. I don't want to create them by hand. So now these instructions are very important because they tell the agent what to do. So if you read this in a nutshell, what it tells you is: you are an intelligent movie discovery assistant powered by a vectorized knowledge base of 5,000 movies from the TMDB.

Now, the data set, they call it the movies database because of copyright infringement, right? So the data set I have calls it TMDB, but it's actually IMDB data. And then we tell it to basically go and first look at our knowledge source and then do a live web search on Rotten Tomatoes for real-time scores, audience scores, critics, et cetera, movie reviews. These are the instructions.

These are very important. The model will exactly, and this is an exercise on its own. When we created these profiles for employees at, I mean, for our user base at Corteva, we had to go through and talk to domain experts for each and every profile that we were creating that, hey, how do you want this profile to behave? Every profile is going to behave in a different way.

And it's all configured through this prompt template. This is known as system prompt engineering. This is a very important aspect, in the way that you tell the agent in natural language how it needs to behave or what persona it needs to take when someone asks a specific question.

So we also have examples here and the outputs, how it should generate the output. Like when you generate the output, show the title, genre, director, cast, et cetera, et cetera, right? So special capabilities, semantic discovery, and I give it more specific information, right? So then there's also voice mode, which, I mean, it's cool to play with, but I've not played around with it much.

This is a new feature where you can basically speak to it and it'll respond back and also speak back to you. So this is all out-of-the-box, guys. You don't have to code for it. You don't have to do like graphs and lang chains and write anything new.

And then the other part which is interesting is you can directly publish the endpoint, or you can publish to Teams and Microsoft 365 as an agent. So this is out-of-the-box. And it requires very little technical insight. We just need to know how to set it up the first time.

That took us quite a while to figure out. So then there is also memory. Now some people like to also keep the past chat history as memory for the chatbot. Based on the questions that people have been asking, it goes back and looks at the memory. That's why, if you go to Claude or ChatGPT, the longer the conversations you have with it, the slower it becomes.

Because it has to go back to your past questions and respond back. Then there are the guardrails I was talking about. Guardrails are nothing but safety features: if people are asking self-harming questions, questions that they should not be asking, questions about hacking, unethical stuff, we should block it. These are all configured by default for you.

So let's save this and let's ask a few questions here real quick. I think I had a list of questions that I had prepared. Here we go. So now the agent is practically ready.

I have pointed it to a GPT 4.1 model. I can change the model any time I want. Look at this. I can change to any model I want.

And then when I ask a question here, let's see what it comes back with. Hopefully not an error, because I've seen a lot of errors in my demos. Okay, here we go. So it gave me all the top five movies in the exact format I had asked him to do.

And also, this tool always approved this tool. Recommend 5 movies similar to The Matrix, right? So Matrix was a sci-fi movie. Inception, I think these are accurate.

Ghost in the Shell, Equilibrium, Minority Report, The Thirteenth Floor. Perfect. So this is bang on. This is exactly what we wanted.

Let's talk about this; this is my favorite movie. Why is Shawshank Redemption rated so high? 9.2 or something, I guess. So let's see.

Now if you choose a 5.4 model, it performs better in terms of responses, accuracy and all that stuff. So now it also tells you how many tokens it has used here. So you know what an input token and an output token are? Are you guys aware of what an input token and an output token are?

An input token is, when you ask a question, the characters in your question get converted to tokens. So in this case, our input was 2,884 tokens. The output is what it comes back with. So 343 tokens for the output.
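Those two token counts are what drive cost. As a rough sketch, cost per request is input tokens times the input price plus output tokens times the output price. The token counts below are the ones from the demo; the per-million-token prices and the 5,000 requests per day are placeholders, not actual Azure rates or usage figures, so substitute the numbers for the model you deploy.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Dollar cost of one request, given per-million-token prices."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


# Token counts from the demo; prices are placeholders, not real rates.
cost = estimate_cost(2884, 343,
                     input_price_per_million=2.00,
                     output_price_per_million=8.00)

requests_per_day = 5000  # hypothetical traffic
print(f"per request: ${cost:.6f}")
print(f"per day:     ${cost * requests_per_day:.2f}")
```

Note that the input count is much larger than the typed question, because the system instructions and the retrieved chunks are sent to the model as input too.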

Sometimes it also caches the previous output and shows some response. So then you save cost that way. So your cost for using the LLM is monitored this way. Now, emotional resonance, strong performance, clean writing, et cetera.

So it gives you some valid responses back there. Let's do another one. You see. Which movie has the highest gross collection, right?

Or made the highest money? I can ask it grammatically incorrect questions and it'll still come back. That's what I want to show you, that even if my grammar is incorrect, it will still come back with the right responses. Yeah, Avatar, which is accurate.

So 2.9 billion at the box office. Now, I can tell him that go to Rotten Tomatoes and show me the critic reviews for Avatar. So that's a public site. If you asked me to go to a private site, what I did, you were to use a critic.

So that's a web search. I'm doing a Google search or Bing search, right? But if I have to query my own internal site for a tool, tool-based search, first of all, you'll have to configure an endpoint, like an API. And my identity that I will use is a custom, so I'm not going to, so this is a playground for testing, but I will have a custom web app or an interface.

And I have just five more minutes, so I'm just going to do this quickly. I have another agent where it's basically doing something different. It basically has a data set of diseases and predictions. It's a prediction data set where every disease has certain symptoms.

And we can now predict the disease based on the symptoms that you have. Or like, for example, if I have a cough and cold, what could I be suffering from, right? If I have high fever, do I have malaria? We can ask these kind of questions to the data.

And then there's also a web search that it goes to WebMD to validate that information as another check. So let's take this prompt, for example. Which diseases match these symptoms most closely? Fatigue, cough, and chest pain?

Yes, sir. So for this scenario, what does the data set do? Is it an Excel spreadsheet or is it a markup file? In this case, it's a CSV file, or an Excel or a CSV, which has columns and rows. And every disease is mapped based on symptom severity on a scale of 1 to 10.

You don't have to do anything in between the data set and. We don't have, that's the beauty. You don't have to create a very specific machine learning model for this AI to look, it will infer on its own. That's why the need of machine learning is eliminated.

So I have done a project where we had to estimate costs based on the past spend. There was an Excel sheet with 10 years' worth of data for all the projects that we have spent money on, and the scale and the size of the projects. So without any machine learning technology, I created an agent and gave it that data set. And I prompted the agent to behave in a certain way: go and look back at all the projects which are similar to the one I'm asking a question on, and compare and see what my estimated cost would be in the future if I were to do a similar project. So it gave me a prediction based on my past data.

In the past, you had to write a machine learning model for something similar. Not with AI, because it has that capability. Let me ask another question. So the agent, which has some safeguards or guardrails in it, will block certain prompts, for example if a prompt has PHI or if it thinks that there is health information that I want to retrieve back.

I think this should work. Which symptoms most strongly categorize diabetes? So yeah, so based on the data set, we'll first look at the data set and then it'll tell you. I mean, look at the third question here: which symptoms overlap between typhoid and malaria?

Here we go. It has, you know, vomiting, high fever, etcetera, etcetera, and then severe context, right? So, basically, this is a quick demo on the disease prediction agent. The last and important demo was for the HR.

Now, every enterprise has HR, right? So now imagine HR gets hundreds and hundreds of resumes every single day. Whenever there is a job opening, they have a database of all the resumes of current employees and newer prospective employees. Imagine there is a new job requirement that HR is asked to look for candidates for. Will HR go and look at each and every resume, download it from a SharePoint site or from an external custom site, and review each and every resume?

If she has to review hundreds of resumes, she's going to take a week or so to do that. What if we build an agent that can do that? Which means the agent will go and vectorize each and every resume that's being added to a specific SharePoint site. And then you ask a question like, give me the top five data engineering profiles, or the top five data engineering candidates that match the job description.

And then you can also ask the agent to go to LinkedIn and verify whether that candidate exists or does not exist. You can also go and do a background check, go to a third-party API to do a background check based on his first, last name and the address. See if he has a criminal record or things like that. So all those things can be enabled through Foundry.

So here an example is, find me the qualified data engineers for a senior Databricks role. For example, this is my requirement, job description or requirement. Yeah, it's telling me I am the top candidate here. So that's great.

It's actually, yeah, since I built the agent, so it knows that. So again, I think this is the last part of the demo. And then I just want to show how powerful AI can be if you use it the right way with the right safeguards and with the right guard rails. I hope you enjoyed this and I hope you learned a lot from this.

Thank you. Yeah. Do different models have different efficiencies as far as token use and that sort of thing? Yeah, so the models have a higher efficiency for larger token use compared to small language models.

Yes, the newer models are more efficient. Claude, and what I've tested is Claude Opus, is by far the best model out there for all kinds of tasks, not just chat, but also creating charts and graphs or writing code or doing some extraordinary tasks for you. But they are more costly, 7.5 times more costly than the low-cost models, like the GPT ones. Any other questions?

You have time for Q&A? Gail, do we have time for Q&A? OK, yeah, you can talk separately outside this. It's fine.

Thank you.