Success Story #2 – Natural Language Search for Member Benefits
Production and Operations
This session showcases a hands‑on, end‑to‑end exploration of how natural language search can transform the member benefits experience within the myWellmark ecosystem. The presentation begins by introducing the core problem statement: members struggle to locate and understand their PQF benefits due to unintuitive, jargon‑heavy search tools. Using real usability testing findings, the session grounds the problem in real‑world user experience challenges.
From there, the session shifts into a practical, interactive walkthrough of the proposed AI‑powered solution. Attendees are guided through the architecture in an accessible format, visualizing how AWS S3, Bedrock, Lambda, and API Gateway work together to deliver deterministic responses to human‑language queries. The session includes three live, scenario‑based benefit searches—diagnostic colonoscopy, maternity benefits, and shoe inserts—demonstrating how natural language inputs return precise and category‑aware benefit results.
Throughout, the format blends technical explanation, real interface screenshots, and storytelling to make complex AI and NLP concepts relatable. The session concludes with measurable value insights, team learnings, production considerations, and projected cost savings, creating a clear connection between innovation, user experience, and operational impact.
Key Takeaways
- Natural language transforms benefit search.
- A practical, scalable architecture attendees can model.
- Clear business impact and measurable value.
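To make the architecture takeaway concrete, here is a minimal Python sketch of retrieval-only benefit search: documents are split into chunks, the query is matched against those chunks, and the stored text is returned verbatim with its category, or nothing at all when no chunk matches well enough. It is an illustration under stated assumptions, not the team's implementation. The sample benefit text, the `SYNONYMS` table, the `min_score` threshold, and every function name are hypothetical, and the word-count "embedding" is a toy stand-in for the semantic vector matching a hosted embedding model would provide.

```python
import math
import re
from collections import Counter

# Hypothetical benefit snippets keyed by category. Real content would come
# from the benefit document repository described in the session.
DOCUMENTS = {
    "Maternity": "Prenatal care visits are covered. Postnatal support is covered. "
                 "Breast pumps are covered.",
    "Foot care": "Orthotics and shoe inserts are covered when prescribed.",
    "Diagnostic services": "A diagnostic colonoscopy is covered subject to deductible.",
}

# Toy stand-in for semantic matching: maps everyday words to benefit terms.
# A real system would rely on an embedding model instead of a lookup table.
SYNONYMS = {"baby": "prenatal", "pregnant": "prenatal", "feet": "orthotics"}

# Words that appear in nearly every chunk or query and carry no signal.
STOPWORDS = {"a", "an", "the", "is", "are", "and", "to", "when", "i", "my",
             "covered", "what", "do", "does"}


def tokenize(text):
    """Lowercase, keep alphabetic words, drop stopwords, apply synonyms."""
    words = re.findall(r"[a-z]+", text.lower())
    return [SYNONYMS.get(w, w) for w in words if w not in STOPWORDS]


def chunk(documents):
    """Split each document into sentence-level chunks tagged with its category."""
    chunks = []
    for category, text in documents.items():
        for sentence in re.split(r"(?<=\.)\s+", text.strip()):
            if sentence:
                chunks.append({"category": category, "text": sentence})
    return chunks


def embed(text):
    """Toy embedding: a sparse word-count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def search(query, chunks, min_score=0.2, top_k=3):
    """Return the best-matching chunks verbatim, or [] if nothing clears min_score."""
    q = embed(query)
    scored = [(cosine(q, embed(c["text"])), c) for c in chunks]
    hits = [(score, c) for score, c in scored if score >= min_score]
    # Sort by score, then text, so the same query always yields the same order.
    hits.sort(key=lambda pair: (-pair[0], pair[1]["text"]))
    return [c for _, c in hits[:top_k]]


if __name__ == "__main__":
    index = chunk(DOCUMENTS)
    for question in ("are shoe inserts covered", "baby", "parking garage hours"):
        print(question, "->", search(question, index))
```

Because the function only returns text that already exists in the repository, and returns an empty list instead of guessing, the same query always yields the same result. That is one plausible reading of the "deterministic responses" mentioned in the abstract; how the production system achieves it is not described in the session.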
Transcript from Summit:
Good afternoon. I'm Nick Nystrom from Wellmark Blue Cross Blue Shield of Iowa and South Dakota. I'm going to walk you guys through a story today about how to apply AI in a place where information is very complex, regulated, and really incredibly human. Member benefits and customer service is where we're going to focus today. This isn't a product launch or a live demo. It's a real exploration of what worked, what didn't work, and really what we learned as a team through that lens of an AI-driven experiment.
So what I want to start with first today is just a quick overview about myself and a little bit about Wellmark. Wellmark is in Iowa and South Dakota. And we have about 2,000 employees. And the headquarters is in Des Moines, where I work. I'm on a member experience team that's embedded within technology at our company. And there's about 60 of us on our team. We have everybody from disciplines ranging from design to analytics to user testing to research to experience delivery. Just a lot of different disciplines: we have content writers, we have designers, obviously. So a lot of disciplines that really help me as an experience lead deliver epic experiences to our members.
So just by a raise of hands, how many have Wellmark insurance? Anybody here? Okay, a lot of people. We have a lot of members, about two million members in Iowa and South Dakota. My team's focused on really the first part of the member's journey. So that's the invite, the shop, the enroll, and the welcome moments. So how do we show up for those members in those service moments? That's what my team's focused on, right? So a lot of it's driving adoption to our self-serve platform, which is our mobile app. But there's also initiatives I'm leading across our enterprise, and I'll talk a little bit about that as well. We are all trained in human-centered design. We use Luma, which is a methodology for practitioners of human-centered design.
I'm a Luma-certified instructor as well, so we're also rolling out Luma practitioner trainings to the rest of the organization on different product teams. So human-centered design, if you're not familiar with it, there's a lot of different methodologies. Luma is the one that we use at Wellmark. It's really just about putting the user at the center of how to solve your problem, which yields really good results for us.
We do a lot of data-driven decisions. So we measure everything we build. There's been a lot of themes throughout today's talks around measurement. We measure in the form of interaction metrics using Google Analytics on our digital properties. We also have voice of the customer. We have a really extensive voice of the customer program. So if you interact with something, whether it's physical, a customer service phone call, or on our mobile property, you're going to get an email, right? And we're going to ask questions about that. And we track that on a scorecard so we can start to see sentiment on an experience that we deliver. And that really helps us iterate and refine that once we launch. And then the final thing is outcomes, right? So when I drive an epic experience, I'm looking for outcomes. Some of them are business, some of them are member, but ultimately, outcomes, interactions, and VOC is how we really make decisions at Wellmark on the Experience Team.
I've been at Wellmark for four years. Before that, I was at a company called Kingland, right down the road here in Ames, Iowa. I spent five and a half years there as a product manager and product analyst. Got a lot of good experience there. I was also part of the Technology Association of Iowa's ITLI, which is their leadership program, last year. I was a graduate of that. And I'm a DJ, 20 years in the wedding and corporate event space. And we've had a lot of talks about AI today. I dove in last year, actually, into creating my own music, and I leveraged Suno AI to do the vocals.
I produce a lot of my own beats and all that. It's EDM house music, but I didn't have a singer. And during JC's keynote, I was loving that he was bringing a little bit of AI music flavor into that because that hit home with me. So I have 16 songs out on all streaming platforms. So if you feel like a little workout music later on, you can search me up and find my music out there. That's potentially, potentially.
So let's talk about the problem statement, right? This is not a product demo, as I mentioned. This is really a story about how we got here. As AI adoption accelerates, a lot of the conversation focuses on complexity and capabilities. What technology can do, what we're going to focus on, where it fits in within our organization, how it can help our people. All of those things, if applied carelessly, can be not good for your organization and culture. So we're going to talk about that experience, not just the technology.
We base everything in personas, as I mentioned. We have our great research team that will do foundational research that yields journey maps, personas, all of the things that I need as an experience lead to make a decision on strategy of how we approach something, which is so great to have that ability on our team. We did some testing around just searching benefits, which is a large problem. You guys may have run into this before. Finding out what's covered, how to get service, can I get this surgery, is this preventative covered? All of those things, right? It's complex, and it depends on your plan and your familiarity with healthcare. So what we looked at here was trying to figure out how participants in that space want to search for benefits. And surprise, they want to use regular language. Like they just want to ask it questions. And I think we have probably ChatGPT on the commercial side to blame for that because everybody's using it off their mobile phone on the side of their desk.
So querying and asking standard questions in human language is kind of the norm now. And so we thought, well, how are we going to solve that for our complex benefits when it comes to medical jargon and all of that stuff? How are we going to do that? So that's what I want to talk about today.
And I want you guys to meet Sara, right? So I want to start with a little bit of a story. Imagine you guys are a member, which I think a lot of you in this room are. You just got a bill in the mail, and it's higher than expected, but you don't get why. So what do you do, right? You do what all of us do. You go online, you search, you type something like, why wasn't my visit covered? Why was this denied? And what you get back isn't an answer. You usually get PDFs. Sometimes you'll get legal language. Sometimes you get benefit summaries written for compliance, not comprehension from a member point of view. So now you're frustrated, not only because the information doesn't exist, but because it feels impossible to find your answer. So eventually what you do is call customer service, right, which is our highest cost channel to serve our members. So you get the information from the customer service rep. A human has to translate that complex information in real time under pressure and deliver that for you, right? That moment where that search failed and support takes over is really where our story begins.
So I want to show a quick video that I think will really hit home with this audience. And I'm going to switch over. Sorry, I'm not sure what that is. Oh. Okay.
Jamie is about to have her first baby, so she goes to mywellmark.com to understand her medical benefits. She scrolls and scrolls and scrolls. Hundreds of options. The answers are there, but she can't find them. Frustrated, Jamie gives up and calls for help. That happened far too often. So Wellmark fixed it. Using AI, we created a new way to search, one that understands the way people actually talk.
Now Jamie types one word, and natural language search instantly finds the right coverage. Suddenly, it's all there. Prenatal care, postnatal support. Even breast pumps she didn't know were covered. One search, clear answers, and Jamie can get back to what matters most. Fewer calls, faster help, a more efficient system for everyone.
So as you guys can imagine, pretty awesome, right? For our members that are calling and trying to find this stuff, they could self-serve. Our customer service agents can leverage this as well to help serve members. And that's super important, right? Because that's going to give them the value of being a Wellmark member, which maybe another health insurance carrier might not do.
So we talk about the core problem. Members don't search for benefits, right? They ask questions. That's what they do. Is it covered? What do I do next? Why do I owe this? That's the fundamental issue here. Members don't search, right? They ask those questions. So search is assuming people know what to ask for. Benefits assume people know how the coverage works. Neither of those assumptions is true, however. So there's a constant mismatch between how the systems are built and how those systems behave. And that's really what we're going to focus on today with that core problem I mentioned.
Why does the search fail? Traditional search assumes two things, right? Actually, three things when I think about it. The first is that users know the right words, which they don't always know the right words to search. Second, that the content's readable. And third, that answers all live in one place. Sadly, in healthcare, they don't live all in one place. This really breaks down. Terminology varies between plans, content's fragmented, and answers depend on context. The plan, the claim, the timing, and all the expectations that the member has: all of those vary depending on the situation for the member.
So that can be a big struggle on why they can't get answers and why they can't search on what they're looking to find.
So when search fails, customer service absorbs the cost for us. And I mentioned that that's the highest channel cost that we have. So calls increase, handle times go up, and our customer service agents are forced to act as, like, translators between what the member is asking in that complex benefit question and really giving them that answer that will help them in the real human situation. So this is not really a digital experience problem. It's kind of an operational one if you look at it. So this just shows kind of generically how someone would call: the confusion, a member calls, a customer service representative has to translate that, which leads to longer handle time and obviously less customer satisfaction. We all want to make sure that we can handle those member requests as soon as they come into customer service and get the member what they need when they need it.
I'm also going to talk here about why this is a hard AI problem. If it were easy to apply here, I mean, it would be everywhere already. It's slowly starting to get to a place where AI is embedding everywhere. But in terms of healthcare and searching, it's not there yet, right? These healthcare benefits involve regulated content, I mentioned fragmented systems, and really zero tolerance for hallucinations. You couldn't imagine someone wanting to do a preventative service, a heart surgery, transplant, or something very serious and getting information that it's covered. And then they go and have the surgery and then they're stuck with a $100,000 bill. I mean, that happens. We hear stories of that happening. I'm sure you guys maybe know people that have had issues with getting the wrong information. So hallucinations, obviously, as you know, in AI can happen, and there's zero tolerance for that in the healthcare space.
So getting something almost right can actually be worse than getting it wrong in our profession, in healthcare. So that reality heavily shaped how we approach this work. But you can see here, there's a lot to be considered in this domain specifically.
So let's begin with internal testing. We used a hackathon actually to do this work. Wellmark decided last year to do our first official hackathon ever. It was three days. You get to partner with anyone you wanted. You can submit ideas for a period of two weeks, and then you can actually request to be on a team, and then the teams were assembled for those three days on site. It was actually an incredible experience. This was the idea that my team had: to solve search for members using AI. And out of 19 teams, we actually won. We placed first last year. And because of that, Wellmark funded the work, which I just thought was super cool for a company to not only sponsor a hackathon for three days, but then fund the winning project. So we funded that last year and we're getting ready to implement it next month for our members, which is just incredible. A year, yeah, it took a long time, but you can imagine the legal conversations and compliance conversations we've had to have and go back and forth with what we're actually saying on the screen. I think we've landed on AI Assist, with a bunch of legal language, really small. So there's that too.
But we leveraged, like I said, time-boxed, low-risk, cross-functional. So we had developers, we had analysts from different parts, we had operations folks. We had about 12 people on my team for those three days. And then we presented to leadership and everybody else, which was really fun, right? So that rapid failure and that controlled structure, I took that from JC this morning from his keynote, that was key for the hackathon. So they're getting ready to do that again this year.
I'm looking at some potential teams to join, but this is a very cool way to not only get everybody together from a culture perspective, but actually deliver working stuff that we've now implemented, which I think is super awesome. So let's keep talking.
The hypothesis, right? Our hypothesis was simple. What if people could ask questions in their own words and the system met them halfway? Not a chatbot replacing humans, not a magic answer engine, but a bridge between human language and that complex benefit logic, right? We thought about Amazon Alexa as, like, vibes, and we were thinking, like, how do we want this to feel? We said, why couldn't you just ask Alexa, like, if it's covered, right? So that jokingly became how we thought about this. How easy would it be just to be like, are the things I walk around on covered, right? Which would be orthotics, right? But how does search know that you're talking about feet, right? And that's where AI comes in, right? That's where that language model comes in. So it was pretty cool to see. We had a working demo for our hackathon debut, and we had a bunch of executives coming up trying to, like, stump it, trying to, like, get it to not bring back benefits. But surprisingly, it worked very well. And they were like, okay, we can see the benefit in this.
So conceptually, the architecture. Legal sadly wouldn't let me put anything in here that we used, but you guys can use your imagination. We have a repository here, so that's all of our documents that we have, benefit documents, think all of those historic documents. We're using AI retrieval, so semantics, vector matching. We use chunking methods, which some of the big service providers offer in their language models. The chunking is how it takes that segment of information and displays it to the member. And there's various different methods of chunking, so we tried and tested a bunch of them until we kind of got the result that we felt was going to give the member the best result, which is cool. And then ultimately, we had some Lambdas and some service layers we built to connect it to our member portal and our customer service CRM, things like that, which is great. So this is a little bit of an overview of the conceptual architecture. If you want to get involved with that after, I'm more than happy to dive into that.
Let's talk experience, right? So what we deliberately did not do was surface raw policy text. We didn't pretend that AI was certain. We didn't optimize for cleverness. We optimized for clarity, restraint, and trust, right? We just heard all about trust. How are our members going to trust? If we get a wrong answer and they go, the doctor's not covered, and they used AI, they're not going to trust Wellmark. They're not going to trust anything that we tell them. So it's super important for us to be clear and really restrain ourselves because in healthcare, that confidence without that accuracy is super dangerous for our members. We talked about the current climate. We all know kind of the story around United Healthcare and that whole sad thing that happened. We are not using AI at Wellmark for any healthcare outcomes, any determinations of claims. We are not doing any of that. We're using it internally as a workforce. We have Copilot and all of that stuff, but we're not leveraging it for member-facing things by any means. This would be the first thing that we're leveraging AI for, but I feel like it's a very controlled application of AI. It's not generative. It's very specific. So that's kind of where we're at with the experience lens on this.
Two audiences, two jobs, right? So I mentioned we have our members and we have our customer service agents. They don't need the same answers, right? Members need the clarity, they need the confidence, and they need that empathy. And what are their next steps, right? That's what they're looking for.
The CXAs, however, they need the speed and the traceability to get that information quickly to deliver to the members. So it's two audiences with two completely separate jobs, but we designed for both of them at the same time. And that forced us to think more carefully about the experience outcomes that we're trying to drive, right? So that's the self-service side of things versus the customer service side of things. But either way, they both can use the same solution, which I thought was really awesome. And it was really impactful, I think, to the leadership group to hear that, oh, we can actually not only solve members' issues, but we can solve speed and clarity around what our CXAs are doing.
So what broke first, right? AI struggled where humans also struggle, which is not a surprise. That's ambiguity. So conflicting sources, vague benefit terms, those edge cases. We saw early tendencies toward overconfidence, which reinforced the need for those guardrails.
We talk about enterprise-wide initiatives a lot in this space as well. This is an enterprise-wide initiative. We had to get the buy-in from the stakeholders. A lot of different departments are siloed, but we're working to come together and say, like, how can we all leverage the same tool so we can get confidence in deploying this to our members and have a better experience when it comes to finding their benefits?
So what surprised us? It was how much less AI actually needed to do. Smaller, well-scoped answers actually built more trust with our testers. Retrieval beat generation, and transparency built confidence, even when the answer was "it depends," right? So it was really about making sure that we keep that level of trust when we're having interactions with our members, and really restraining the scope around it versus just letting it go was really important as well.
So I have a couple of minutes left here, and I want to make sure I have a lot of time for questions.
From an experience lesson, this isn't an experience on its own. It's part of one. It can reduce cognitive load, but not remove complexity. It can assist, but it still needs humans in the loop, right? So that's what we're looking at here. We just were awarded some corporate insights awards, which is an industry kind of recognition for best mobile app and desktop app. We finished second on that. That's 24 health insurance carriers that were rated, and we were number two on that, right behind Anthem, which has a massive budget compared to what Wellmark has in terms of experience. So I think we're doing some things right in this space, and this just kind of shows that some of the recognition is coming our way.
From a technical perspective: bounded domains, clear governance, strong content. Responsible AI is an experience decision, not just a platform decision.
A few organizational lessons we learned. This changed kind of how we collaborate as teams. Experience, content, operations, technology can't operate independently anymore, right? AI forces alignment or exposes the lack thereof. So our hackathon part 2, like I said, is coming up next. But this could scale beyond healthcare, right? We're talking other domains: insurance, government, higher ed. These challenges show up everywhere. It's just about allowing people to ask the right questions in the way that they want to and providing the answer to them.
What this is not, though: an AI reality check. It's not a replacement for humans, not a knowledge oracle, and of course, not without guardrails.
I'll leave this last quote up here before I end. This was from one of our team leaders that was part of our testing group. And she, I'm not going to read the quote, but basically she handles our customer service team, and she just really saw the value of having something like this for her CXAs to use to help members. She thought this could be magnified by 200x agents if we implemented this at Wellmark.
So it's a very promising piece of technology, not only for internal operations, but for our customers that use Wellmark insurance. And with that, I know I ran through that a little fast, but we are a little short on time. But I wanted to open it up for a question or two if you guys have any. Yeah?
Do you use metrics that measure failing queries?
Yes, we do. So we're compiling that right now. I mentioned we're in UAT with all of our stuff right now, and we're finalizing all the testing, and we still have to get through, like, the rest of the legal compliance stuff, but they are documenting that. I haven't been as close to the implementation team, sadly. I was part of the hackathon team, but we have another AI-focused team that's doing the implementation at Wellmark, but more than happy to find out.
Is there a bar, like a percentage of success or fail?
No, I'm not even sure, to be honest with you, but I'd be more than happy to find out for you.
Yeah, one of the things I mentioned is, like, you gotta go in and tweak your data, so you find out that in your documentation there were a lot of contradictions, and...
Yeah, yep, yep. We spent a lot of time and effort over the last year cleaning up our benefits document catalog and really optimizing that. So like everyone says, you got to have good data to get good output. So we did some of that legwork. It was in a pretty good spot before. It just was a lot, very complex and hard to understand for most people.
So if it has a problem that it can't answer, does it have kind of an escape mechanism to talk to a human? I'm using an example from last year. It took me six hours with AT&T to fix a problem once.
Wow.
Because every time I went on, I had to menu AI, get to a person, get to the next person. It was highly frustrating.
Yeah, so that's a great call out. Our design team has put in some things. So if you're searching and you're just not getting it, I think it's like two searches.
If it doesn't come back, there's of course some chatbot help, or send a secure message, or call customer service. So we're not going to eliminate that, but we hope that self-serve, you know, that's always the way. I think members want to self-serve if they can. It's just making it easy for them to do that.
Does swearing? It always gets you to a person, but that works on other ones.
What was that?
Yeah. If you swear at them, it gets you to a person.
That's true.
That works for CVS.
Oh, good to know. They're our pharmacy partners.