Building Enterprise-Scale RAG Chatbots Using Azure AI Foundry
AI Demo
This session provides a practical, end‑to‑end deep dive into building enterprise‑grade Retrieval Augmented Generation (RAG) systems using Microsoft Azure AI Foundry. Drawing from real‑world implementations—including a production RAG chatbot serving 22,000+ global users—the session walks participants through the complete lifecycle of creating scalable, secure, and high‑accuracy RAG solutions tailored for industry.
We begin by breaking down the RAG architecture: document ingestion using Azure Document Intelligence, adaptive chunking strategies, embedding generation, and vector indexing with Azure AI Search. The session then explores how user queries are transformed into embeddings, how retrieval pipelines work, and how Azure OpenAI models inside AI Foundry generate grounded, contextual responses.
Using visuals from the included architecture diagram, attendees learn best practices for chunk sizes, metadata, hybrid search, reranking, prompt design, and governance (RBAC, encryption, audit logging).
Real‑world examples from agriculture, manufacturing, financial services, and healthcare show how organizations use RAG for maintenance assistants, compliance bots, crop advisory tools, customer service, workflow documentation, and more. Performance benchmarks—such as 95% retrieval accuracy, sub‑2‑second responses, and 40%–60% cost reduction—demonstrate measurable business impact.
The session concludes with an interactive discussion on emerging trends—multi‑modal RAG, knowledge graphs, and real‑time streaming—and how Azure AI Foundry’s roadmap supports the next generation of enterprise AI.
This is a highly actionable, architecture‑focused session designed for leaders, engineers, and practitioners looking to implement RAG at scale with Microsoft technologies.
Key Takeaways
- Learn the complete RAG architecture using Azure AI Foundry—from document ingestion and chunking to embeddings, vector search, and grounded response generation.
- Apply practical best practices for building accurate, scalable, and secure enterprise RAG systems, including hybrid retrieval, metadata strategy, and prompt design.
- See real‑world impact in action through a live, end‑to‑end walkthrough of a production RAG chatbot serving 22,000+ users, with patterns you can immediately implement in your organization.
Transcript from Summit:
Session Transcript
Thank you. Hello. Hello. Can you guys hear me out back there? Good. Okay. Let me know if I'm being too loud. I have that habit of being loud. So thank you, Gail, for introducing me. And this is Mehul Buehler. I have 22 years of software development experience. I started with Microsoft Technology Stack as a full stack app developer, working on several Microsoft technologies, including VD ASP, customizing SharePoint, C#.NET, the classic world, the modern world,.NET Core, Angular, React, you name it. I've done all kinds of developments. But recently, I've stepped into the data and AI world. It's been three or four years now where I stepped in from an app world to a data world, and I can bring in a lot of experience. And I realized that I was able to contribute a lot there. So that's a background of my experience. Moving on to the next slides, again, this Gail has already covered this part, so I'm not going to cover it. So the agenda for today, I don't want to bore you with the slide decks and slide shows. I know you guys are here for the demo, and I have 3 interactive demos. In fact, I build these two agents where I can show you the capabilities of the agent. And then the third demo is going to be we build the agent together. So the agenda is that we covered what is RAG and why RAG. RAG architecture, Azure AI Foundry, what Azure AI Foundry is all about, how to build on Azure AI Foundry, what are the building blocks. Production design, how to design A production grade platform. For your or a framework for your rag-based scenarios. And a case study which I recently implemented at my client location, which is Corteva. I've been working with them for the last six years. We implemented an enterprise-wide chat bot for them that scales for up to 22,000 users across the globe. And so, what are some of the best practices that I have learned from my experience? So, moving on, so what is RAG? So, what is RAG? Are you guys aware of RAG? First of all, let me ask you this. OK, so pretty informed audience. Retrieval R stands for retrieval, A for augmented, and G for generation, so... Coming back here to RAG, right? So let's assume you have, and every company has this problem, right? You have an employee who joins on day one, right? He needs information right away to get started. It could be any kind of information. It could be an onboarding information, right? I need my, I need to sign up. for a 401k plan, I need to sign up for a dental, medical, vision plan. How does the employee know all this? Secondly, if you have a plant operator at a site, he needs a checklist on day one, how he should learn the whole process. There are SOPs, there are PDFs, there are hundreds and hundreds of pages of documents that he has to go through. A majority of the companies, small, big, medium, doesn't matter, have large-scale, unstructured and semi-structured documents. They have lesser databases, but more documents sitting out there in a repository somewhere. It could be a OneDrive location, it could be a SharePoint site, or it could be a person's own physical machine. There are documents everywhere, and there are several versions of these documents, making it very difficult to find relevant information when you need it the most. That's the key here. So the employee who's starting on day one needs a lot of information on day one to get started on his job. Of course, there'll be onboarding, there'll be training, things like that, but But if he's on the shop floor and he does not know what to do, then we have a bigger problem. Because then the enterprise or the organization is not allowing him or enabling him to be productive from day one. That's why the whole concept of rack-based chatbot. So when I say this, it is very serious because we have seen the benefits of implementing A rack-based chatbot at Cortiva. So we have these hundreds of planned sites and operators across the globe. And when we provision, and they don't like to read documents, especially the documents, especially from Europe, they don't like to read documents that are in English. They like to go and translate them in their local language, and then they can make sense out of it. So, when you put the LLM, which is a large language model, in front of your own enterprise documents, it all of a sudden has more context because it is not looking at the World Wide Web; it is just looking at your own documents that you care about. So, coming back to this situation where... This employee doesn't know what to do. And if an LLM chatbot can basically look at a 200-page document and give him a checklist of things that he needs to do on day one, like dry the seeds in this way, go to this plant, follow this process, follow this conditioning process, follow this quality process, those kind of things will help him get started. Now, what are the gaps? I've tried to address, I mean, talk about the traditional LLM gaps. It's a large language model, like a ChatGPT model that you've used, cloud model that you've used. It does not know anything about your enterprise data. It knows a lot about the rest of the world's data. When you ask questions to the LLM in general, it can give you generic responses, not tied to your own process. But with the rag in the enterprise, that fixes that problem, because then now you have your LLM pointing to your data and not the rest of the world. And with Azure AI Foundry, the contract with Microsoft is such that your data will not leave, or the LLM will not be trained on your data, right, or will not be retrained on your data. Outside your enterprise. So we solved that problem. There is 80% less hallucination because it is looking at a specific context. 10 to 100 times more cheaper because you don't have to retrain the model each time. It takes 2 years to retrain a large language model. But when you give it enough context, it's just going to give you a meaningful response. And you can also fine tune your LLM in Azure AI Foundry, which I'll show you in a bit. Very, very cheap, in a very cheap way. You don't have to spend millions of dollars for that. Days time to do production. Basically, you can go to production in a matter of, so at Cortiva, if someone comes up with a new use case, like a training, onboarding, knowledge assistance, we can onboard them on their data with a rack-based profile in four hours. full audit trail of who's asking what questions. There are safeguards and guardrails around it. There are evaluations that we do. If someone is asking a harmful question, we flag that, right? Someone is asking an explicit question, we flag them. So we have all the full audit trails with the Azure AI Foundry. Now, this is the architecture that I want to go through real quick. You have your source documents. They could be living anywhere in your enterprise. It could be a SharePoint, but the major part of it, it's all going to be on SharePoint. I believe everyone knows about SharePoint or a knowledge repository somewhere, right? OneDrive, your physical folders, BLOB containers. I've seen a lot of documents or unstructured or semi-structured documents living on a BLOB container. So the way the pipeline works is you extract or vectorize the data that you have. It could be hundreds of PDFs or thousands of PDFs. you vectorize it, the Foundry will automatically chunk it into several chunks with the chunk IDs, and then you embed it using a large language embedding model. Then you vectorize and index it through Azure AI Search Service, and I'm going to take you through that. These are just the terms that you just have to remember for now. I'm going to take you and show you what I mean. The retrieve, generate, and grounded answers are the last and final pieces, where essentially your question also gets converted to basically a vector. Who knows? I mean, do you know what a vector index is here? Any idea what a vector index is? OK, so I can explain. So A vector index is basically, let's assume that you have a set of floating point numbers. OK, so if you have, if you have a, if you ask a question, what is the weather outside look like today? That question gets converted into floating point numbers or coordinates like latitudes and longitudes, which does not make sense to a human but makes a lot of sense to the computer. Now, your grounding information, which is your source data, is also vectorized. So your documents that you have uploaded, your enterprise documents, your, you know, SOPs, plans, procedures, right, information manuals, your codes, everything is vectorized in the vector database. Those vectorizations are nothing but floating point numbers. So, when you ask a question to the chat bot, where can I find a plant operating manual, or what is the process of drying this? See, it basically gets converted into a numerical representation, and then it looks at, so we have a big floating point number, and it goes back and looks at your vector index of your documents and tries to do a similarity search, which means it tries to go to the number. which is closest to your question. Everything gets converted into a floating point number. So if you have a number like 999, it's going to look at a range of the closest numbers to 999, and then it gives you more information and context. What that means is that you don't, it basically is not doing a keyword search, but it's going to go and search for the mean. Meaning inside the document, right? And then when we combine the hybrid retrieval, that's known as the hybrid retrieval, where you combine keyword search and the vector search, so that it gives you more context and the meaning. And then we also have another concept of reranking on the Foundry, which gives you even better quality of... The search outcome, because it's a combination of both hybrid plus re-ranking. So semantic or vector search is incredibly powerful for natural language questions. It can paraphrase queries and conceptual lookups. This is what makes the chatbot feel like it actually understands you rather than doing a control F, like finding your information. And the BM25, which is another concept, is a keyword-based search. where it's like a keyword specialist that is looking at blind spot information which are more specific like a product code. A vector search will not be good for specific information. It will be only good for a contextual based search or a meaningful based search. That's where the keyword search comes into picture as well. And when you combine both these, it becomes even more powerful. So this is an example of a document intelligence pipeline. As I said, the source documents, we have something known as Azure document intelligence, which basically looks at all your documents, thousands of documents. There's no limit to how much documents you can vectorize. You can then it automatically decides what documents need to be chunked with what tokens. Now, tokens are a number of words. that it will basically combine into a chunk. So depending on if you have short policies or FAQs, the token size is chosen accordingly. By default, I think the Foundry interface picks the default token of 512 tokens, which is approximately 500 nodes. It does some metadata enrichment, security, timestamp, ACL, things like that. And then the open AI embeddings. convert that into a vector index and store it in the search service, Azure AI search service. There are several components. It's a whole enterprise architecture framework. Within the Foundry, you are also connecting to an Azure AI search service, which actually does the vectorization process for you. Foundry hosts all the models and LLMs, which I'll show you a little bit. I just wanted the concepts to be clear for a demo. Now, what is the benefit of using an Azure AI Foundry? There's a model catalog. There are different kinds of playgrounds. There's an agent playground. There are tools that you can use. And there is also a playground. There are hubs and projects found. I'm not going to go into the details. These are all high-level information. There is MCP concept of MCP connectors. Now, for example, How many of you here know about lake houses? Do you have lake houses in your enterprise? So a lake house is a concept where you bring in data for analytics and machine learning and AI purposes into a central location. So all your transactional systems, you bring in the data into that main reporting engine. They used to call it the warehouses, but now the term has changed to a lakehouse, which is, you know, you can get billions and billions of records in your lakehouse. Those are mostly structured data, not unstructured data. So with MCP connectors in Foundry, you can tap into those like a fabric lakehouse. Have you heard about Fabric? Azure, it's Azure concept. Have you heard about Databricks? Who has heard about Databricks? Databricks has something known as the Genie spaces, which basically connect to structured data. So you can bring that in data. So you can marry the structured data, which is the relational data, and unstructured data and semi-structured data. Using AI Foundry, so it's a platform and a tool to build on top of. So again, you know, this is central IT now, this is the best approach where a central IT team manages the infrastructure for Foundry and enables the different teams to use the Foundry infrastructure. So it's like provisioning the infrastructure first, and then your project teams can pick and choose the projects they want to create, agents they want to create. Right, resources they want to create. So what powers your rack system is the AI hub, AI projects that we have in the Foundry, the SCP connectors, playgrounds, vector stores, and then, of course, you have the observability through app insights and open telemetry. Now, this is what how we are going to build a rack-based chatbot. I'll show you in a demo. We create the resources for a project, you select a model, you connect your knowledge. This is all through 0 code. Okay, you can do everything through Foundry. So basically, you create a project for Foundry, then you select the model from a catalog of 11,000 models. When I say model, it is your open source GPT models that we have LLMs, right, like ChatGPT or Cloud or, you know, Gemini, right, meta models, deep seek, then you connect your knowledge. It can be a SharePoint site or it can be a BLOB storage. You configure the index, you vectorize and configure your index through Azure AER service. You can tune and evaluate your models. You can fine tune your models with your data by giving it a training data set. And then publish and integrate, which means you can publish your agents. You will need a custom integration either through a Teams channel. Yes, so you can create a Teams channel directly through Foundry. You don't have to write a single line of code for that. And then you can also call the APIs that you have created on the Foundry. Through a custom interface, just like a regular API call with a key-based authentication. So, again, this is a production-grade implementation that I've done at Cortiva, and this is just an example of what we have done, so... We use the SSO inbuilt SSO sort of for a single sign-on experience and multi-factor authentication from a user perspective when it signs in. On the back end, we have Cosmos DB, which is another you can store your chat responses and history in any database. We chose Cosmos DB because it's a big database. It's you know it can store you know terabytes of data, so whatever. What interactions you have with your chat bot, we wanted to capture all the interactions that user had, what kind of questions they were asking, what kind of ratings they were giving to the responses, whether it was a thumbs up or a thumbs down, we wanted to capture all that. So we use Cosmos DB. The knowledge part of it, the other part that you see in purple here, it's all based on Foundry. It's all a part of the Foundry framework. Now, this is just at a glance that one of the implementations that we did at Cortiva was a Sprout. We call it the Sprout. It's an enterprise-grade chatbot meant for 22,000 global users for our seeds business only. So we have a seeds business and a crop protection business. This is for the seeds business. What we found is that There were a couple of use cases that we were able to enable on Foundry. One of them was an employee who was 25 years with the company, decided to leave the company. And he had a bunch of documents that he had created. Those were all semi-structured or unstructured documents. Now we had to train a new employee in his position. in 15 days. They asked us to create a profile on Sprout. That's how the whole evolution started. We were able to create that profile and train the new person coming in on the chatbot. That was our first validation that yes, chatbots can do a lot than we think about otherwise. So he got trained in 15 days. And he is right now performing really well. So the feedback that we are getting that, yes, the chatbot was giving accurate answer. We also were increasing the confidence of the end user by citing each and every source. So citations are very important when you implement chatbots in your companies that you cite the responses and trace it back so that the users can trace it back to the original document. That way, they have high trust in what they are seeing, and it also is a good indicator if there are hallucinations. So... We implemented that quickly. We created a framework so that we can... And scale to multiple different use cases. Knowledge onboarding, knowledge assistance - we have 100, we have at least 10 different kinds of provides now, serving hundreds of customers. So, another profile which was very interesting was there were leaders who had to present something every Monday morning. And they had these Excel sheets that they could sometimes if they spent hours making diagrams and charts, you know, things like that, like you do on Excel. You create a bar chart, you create a pie chart, you create a histogram, all those regular sales reports, marketing reports. So we enabled that profile where they upload the Excel on the chat bot and the chat bot can info the metrics that they want and create the charts and graphs for them. So you could also use it for scenarios other than the regular chat bot experience for more of an agent. So then we also had, so we used tools like code interpreter to. enable those things. We also have other kind of agentic profiles where we are asking them to go to Databricks Genie spaces and give back structured data or structured information like a SQL query, natural language. We ask questions in natural language, it comes back with a SQL query. So these are some of the use cases we have enabled at Cortiva. 95%, we've seen a 95% retrieval accuracy because we log and monitor all of our responses. Less than two second response time and 40 to 60% cost reduction, especially for places like Brazil, South America or Europe, where the documents that we have are in English. They don't need to translate because AI is a multilingual, LSS are multilingual, so they respond back in your own language. Not the English language, so that was very helpful because users are now interacting in their own language there. In the past, they had to translate, so we saved a lot of cost there. Now, what are the situations? Like, you know, what are the rags can be implemented not just in the seed business, but also across the board? And we have seen, if you have gone to several websites, like Amazon uses rag all the time, there are all these chatbots that you see on your credit card, on your banks, bank apps, They use RAG all the time, because they are looking at and responding back to you. So, simple healthcare policy about retrieval, manufacturing, maintenance, troubleshooting guides for your workers. If you are, if you have a shop floor and you're in the manufacturing business, you have so many processes. that the person needs to follow to create a product, right, or to basically build a product in your shop floor. Those things can be automated to rag-based chatbots. Logistics, route planning, customs documentation, things like that. Now, we have one of the profiles where, and I'll try to show you in a demo, where it can basically write. write up something for you. If you give it enough context that, hey, you know, I have this requirement, can you write up a draft email for me based on the context I'm giving you? Because that financial services use compliance boards all the time, public sector industries are using it, citizen service assistance, guidance, grounded regulations, etcetera. So benchmarks that we have seen, as I said, 95% retrieval efficiency, 80% hallucination reduction, 40 to 60% cost cutting, and you know, was this fine-tuning or retraining the agents? These are some high-level numbers, industry-wide. Scalability and security at enterprise, we have, as I said, we can support 10,000 plus concurrent users, less than 100 millisecond vector search, bring your own key encryption, and zero public exposure, because it's all self-contained within the enterprise. Best practices are you have to, so it is a garbage in and garbage out. Your chatbot is as good as your data. If your data is crappy, the chatbot is also going to be crappy. So that's pretty, it's like a no-brainer. So you have to understand that the data has to be accurate for your chatbot to work accurately. especially in the rank world, where we are telling the LLM that don't use your intelligence, use our intelligence. Don't retrieve information from your memory, use our data. So retrieval design, hybrid search, re-ranking are the best practices. Using a large language model, text embedding large is the best practice currently. Observability, you have to track what you know what people are asking questions on. Security is very important. Have we have basically? the custom chatbot interface to only people who provide the enterprise. So that is also important that you don't want exposure. Rotating those keys are also very important because these are all key-based authentication from your endpoints perspective. Cost optimization, we have a way to monitor cost. Now, how do you know what's your input and output token cost? And based on how many number of users are hitting your chatbot every day? There is a visibility for that too on the Foundry interface, which I'll show you. Automation. It's very important. Now, this might not be important for business users, but from an IT or technical users, you should be automating your deployments. You should not be creating Foundry infrastructure manually every time, like staging up a search service or staging up a model, creating a model. Getting a model, those can all be automated through something as a DevOps IASTC infrastructure as a code. Have you heard about infrastructure as a code? Yes, so this can all be ordered for you. Yeah, so this is just a slide on trends in Azure. So what is the current trends are, Foundry IQ is very important because it connects to several different kinds of systems. Foundry IQ can connect to structured data, unstructured data, semi-structured data, laying anywhere in your enterprise. It just doesn't have to be Microsoft enabled. It can be also all the databases, it can be other types of databases or other types of disparate systems that you can connect through through MCP servers. Agent AI is very, basically, Microsoft is now focused on doing agent-based systems, where you have an agent which basically calls another agent, and then they basically synchronize with each other and come back with the information. That's where they're going to, the multi-supervisory agent that can delegate the tasks and come back with the accurate response. I'll show you an example of that as well, how we can do that easily through a workflow in Foundry. Efficient models, again, this is just for information purposes. There are some smaller, large language models that you can use for smaller tasks. You don't have to use a large language model all the time. There's also a small LLM that you can use. For E. Can be updated or at a frequency. Now go to the most important section, which is the demo, which is what I think you guys are waiting for. So let me show you how a rack. So first starting with what's a vector, how does a vector index look like? So if you see my screen here, this is how a vector index looks like. This is an IMDB movie data set, and this is a set of 5000. best or top movies later on IMDB. And if you see, these are the vectors for that. And if you see, these are the chunk IDs, chunk title. So it's a CSV file, and out of that CSV file, we have so many vectors. So I was telling you, the vector is a floating point numerical value, and every word that's in that CSV file gets converted into a vector. or a numerical floating point. And when you ask a question that, hey, give me the list of top 10 highly rated movies, it's going to go and convert that numerical representation and find the closest match in this vector database. This is the IMDB movie data set that's hosted on a BLOB container, and I'll show you, we have just vectorized that. AER. as a knowledge source. So this is the index that I was talking about. The first step is to do the grounding data, which is the grounding data is actually your enterprise documents. Now, you can automate this by creating something known as pipelines, where you can extract data from your SharePoint at a frequency, because there could be hundreds of new documents every day. You don't have to do this manually. I'm showing it for the demo purposes, but in enterprise scenarios, you should create a document ingestion pipeline if you have a lot of unstructured data, and then automate the vector indexing part of it. So, the way I created this index was pretty simple: I imported the data from BLOB container. Yeah, I'm out of the quota. So I have 3 indexes, that's why it doesn't allow me, but that's okay. So the way I do it is basically I've created these three indexes and this tier of search service, which is a basic tier, has only three indexes that I can create. So I couldn't show it to you, but there are these three indexes that I've created. created through an import process, which is a manual process, and you can automate that easily with very little code. So now what I'm going to do is I'm going to create an agent where we tie that agent to this particular IMDB index. Right, let's call it. So, this is an OK, this is the Foundry interface. Now, coming back to Foundry, I showed you the Azure AI search service. This is what I was talking about, and then the BLOB container and the vector store. We go to the Foundry interface, which is where the playground is for creating. agents and fine-tuning models and creating evaluations and looking at the tools and things like that. So if you see, there are several tools available. Let me see, let me discover. Oh These are the models that you can choose and pick and choose from. Go ahead and show you some of these models here. There are more than 11,000 models, and every day there are at least two or 3000 models that are being added. Yeah, we go. So cloud is there, and then you have GPT 5.4. So these are all available to you. And then the data, as I said, these models are available exclusively for your subscription or tenant in your organization. They are not going to train these models with your data and expose it to the outside world. That's why we have foundry, right, because everything is self-contained in the framework. So, let's go back and build an agent. Let's create an IMDB agent, Movie Advisor agent. ITS. Now, what's the first step? The first step is to connect your data source, right? Because how does the agent know what to do if you don't give it the data, the grounding data? So there is this knowledge section here right here. So this is a foundry IQ I was talking about. You can connect any kind of services. I want to connect my search service. to the knowledge base. If you see, I have my index that I was showing you listed here. I connect my index to this. I automatically have a web search here because I also want to go and look at Rotten Tomatoes. And whatever movie recommendations I get from IMDB data set, I want to validate that with Rotten Tomatoes. You know Rotten Tomatoes, right? It's just a It's just a public crowdsourcing for movies. So now I have instructions here that I have predefined. I don't want to create them by hand. So now these instructions are very important because this tells the agent on what to do. So if you read this, in a nutshell, what it tells you is... So you are an intelligent movie discovery assistant powered by a vectorized knowledge base of 5,000 movies from the TMDB. Now, the data set, they call it the Movies Database because of copyright infringement, right? So the data set I have calls it TMDB, but it's actually... MDB data, so, and then we tell it to basically go and 1st look at our knowledge source and then go to do a live web search to Rotten Tomatoes for real-time scores, audience scores, critics, etc., movie reviews. These are the instructions. These are very important. The model will exactly, and this is an exercise on its own. When we created these profiles for employees at, I mean, for our user bases at Corteva, we had to go through and talk to the domain experts for each and every profile that we were creating that, hey, how do you want this profile to behave? Every profile is going to behave in a different way. And it's all configured through this prompt template. This is a very important, this is known as a system prompt engineering. This is a very important aspect in the way that you tell the agent in natural language of how it needs to behave or what persona it needs to take. When someone asks a specific question. So we also have examples. and the outputs, how it should generate the output. Like when you generate the output, show the title, generate, director, cast, etc, etc, etc, etc. Right? So special capabilities, semantic discovery, and I give it more specific information, right? So then there's also voice mode, which I have not, I mean, it's cool to play with, but I have not played around with it. This is a new feature. where you can basically speak to it and it'll respond back and also feedback to you. So this is all out of the box, guys. You don't have to code for it. You don't have to do like graphs and lang change and write anything new. You can, and then the other part which is interesting is you can directly publish the endpoint. or you can publish to Teams and Microsoft 365 as an agent. So this is out of the box. And it requires very little technical insight. You just need to know how to set it up the first time. That took us quite a while to figure out. Then there is also memory. Now some people like to also keep like the past chat history memory of the chat bot based on the questions that people have been asking. It goes back and looks at the memory. That's why if you have seen, if you go to Claude or ChatGPT, the longer the conversations you have with it, the slower it becomes. because it has to go back to your past questions and respond back. Then there is a guardrails I was talking about. Guardrails are nothing but safety features that people, if they are asking self-harming questions, questions that they should not be asking, questions about hacking, unethical stuff, we should block it. These are all configured by default for you. So let's save this and let's ask a few questions here real quick. I think I had a list of questions that I prepared you ago. So now the agent is practically ready. I have pointed it to a GPT 4.1 model. I can change the model anytime I want. Look at these. I can change to any model I want. And then I'm going to ask a question here. Let's see what it comes back with. Hopefully not an error. Because I've seen a lot of errors in my demos. OK, here you go. So, it gave me all the top five movies in the exact format I had asked him to do, and also... Tool always approve this tool. They come in five movies similar to The Matrix, right? So, Matrix was a sci-fi movie, Inception. I think these are accurate. Ghost in the shell, equilibrium, minority report, 13th floor. Perfect. So this is bang on. This is exactly what we wanted. Let's talk about this. This is my favorite movie. Why is Shawshank Redemption rated so high? 9.2 or something? I guess. So you see. Now, if you choose a 5.4 model, it performs better in terms of responses, no accuracy, and all that stuff. Now, it also tells you how much tokens it is used here. So, you know how the, what is an input token and an output token? Do you guys, are you guys aware of what is an input token and an output token? Input token is when you ask a question, the length of your characters in your question is, you know, get converted to a token. So in this case, our input token was 2,884 characters. The output is what it comes back with. So 343 tokens for the output. Sometimes it also caches the previous output and shows some response. So then you save cost that way. So, your cost for using the LLM is monitored this way. Now, emotional resonance, strong performance, screenwriting, etc. So give you some valid responses back there. Let's do another one. You see. Julie. Has the highest gross collection, right, or made the high? Money. I can ask it, you know, grammatically incorrect questions and it'll still come back. That's what I want to show you, that even if my grammar is incorrect, it will still come back with the right responses. Yeah, Avatar, which is accurate. So 2.9 billion in the box office. Right now, I can tell him that go to Rotten Tomatoes and show me the critic reviews. For Avatar. So, that's a public site. If you asked to go to a private site, what I did, you were to use... A good. So, that's a web search. I'm doing a Google search or web search or Bing search, right? But if I have to... query my own internal site for a tool, tool-based search. First of all, you'll have to configure an endpoint, like an API. And my identity that I will use is a custom, so I'm not going to, so this is a playground for testing, but I will have a custom web app or an interface. And the app and interface. This needs to have access to the API, yeah, yeah. So, yeah, it gets me back there. Now, I'm going to go back quickly and show you some other agents that I built. So, this was the agent that we built for scratch in like 10 minutes, and I have just five more minutes, so I'm just going to do this quick. I have another agent where it's basically doing something different. It's basically had a data set of diseases and predictions. It's a prediction data set where every diseases have certain symptoms. And we can now predict the disease based on the symptoms that you have. Or like, for example, if I have a cough and cold, what could I be suffering from? Right? If I have high fever, do I have malaria? We can ask these kind of questions to the data. And then there's also a web search that goes to web. MKT to validate that information as another check. So, let's take this prompt: for example, which diseases match these symptoms most closely: fatigue, cough, and chest pain? Let's ask this question. And again, as you see, I have this instructions clearly defined on how the user needs to behave. If you don't give it the instructions, it's going to behave very randomly. It just will behave like a Google search, not exactly the way you want. Yes, sir. So, for this scenario, what does the data set look like? Is it an Excel spreadsheet or is it a markdown file? This is in this case, it's a CSV file which has which has a or an Excel or a CSV which has columns and rows, and every disease is mapped based on symptom severity. In the scale of 1 to 10, you don't have to do anything in between the data set and. We don't have to. That's the beauty. You don't have to create a very specific machine learning model for this AI to look. It will infer on its own. That's why the need of machine learning is eliminated. So I have done a project where we had to estimate costs based on the past spend. There was an Excel sheet with 10 years worth of data. for all the projects that we have spent money on and the scale and the size of the projects. So without any machine learning technology, I created an agent and gave it that data set. And I asked and I prompted the agent to behave in a certain way that go and look back all the projects, which is similar to my prompt that I'm asking a question on. and compare and see what would be my estimated cost in the future if I were to do a similar project. So it gave me a prediction based on my past. Past data. Similar to in the past, you had to write a machine learning machine, not with AI, because it has... Let's see, ask another question. So, with this on the because I have some cigards or guardrails in it, it will block certain prompts if it has PHI or if it thinks that there is like a health information that he is, you know, I want to retrieve that. I think this should work. Which symptoms most strongly categorized diabetes? So, yeah, so based on the data set, it'll first look at the data set, and then it'll tell you. Let me look at the third question here. Which symptoms overlap between typhoid and malaria? Here we go. Data set derived chills, you know, vomiting, high fever, etc., etc., and then severe context, right? So, basically, this is a quick demo on the disease prediction agent. The last and important demo is for the HR. Now, every enterprise has HR, right? So, now imagine HR. gets hundreds and hundreds of resumes every single day, where there is a job opening, they have a database of all the resumes of current employees and new prospective employees. Imagine, there is a new job requirement that the HR is asked to look for candidates. Will the HR go and look at each and every resume, download it from a SharePoint site or from an external, you know, custom site, and review each and every resume? If she has to review 100 of resumes, she's going to take a week or so to do that. What if we build an agent that can do that? which means the agent will go and vectorize each and every document that's being added to a specific SharePoint site. And when you ask a question like, give me the top five data engineering profile. Or do you need a top five data engineering candidates that my job is better solution, and then if you can also ask the agent to go to LinkedIn and verify whether that candidate exists or does not exist, you can also go and do a background check, go to a third-party API to do a background check based on. his first, last name, and the address. See if he has a criminal record or things like that. So all those things can be enabled through Foundry. So here an example is find me the qualified data engineers for a senior databricks role, for example. This is my requirement, job description or requirement. Yeah, it's selling. I am the top candidate here. That's great. Actually, yeah. Since I built the agent, so it knows that. So again, I think this is the last part of the demo, and then I just want to show how powerful AI can be if you use it the right way with the right safeguards. And with the right guardrails. I hope you enjoyed this and I hope you learned a lot from this. Thank you. Yeah. Do different models have different efficiencies as far as token use and that sort of thing? Yeah, so the large language models have a higher efficiency for larger token use compared to small LM models. So, are newer models more efficient? Yes, the newer models are more efficient, like Claude is by far what I've tested for. Claude Opus 4.7 is by far the best model out there for all kinds of tests, not just chat, but also creating charts and graphs or writing code. Or doing some extraordinary tasks for you. But they are more costly, 7.5 times more costly than the low cost model, like GPT ones. Any other questions? Do you have time for Q&A? Gail, do we have time for Q&A? OK, OK, yeah, you can talk separately outside this. Sorry, thank you.