Success Story #1 – Vision AI Efforts in Attribute Detections and Measurements
Production and Operations
Vision AI Efforts in Attribute Detections and Measurements explores how computer vision and machine learning can be applied in real-world manufacturing environments to improve quality assurance and dimensional verification without disrupting existing workflows.
This session is divided into two applied case studies. The first focuses on attribute detection, where Vision AI is used to automatically identify image quality issues as well as wrong or missing visual attributes in production images. Attendees will see how these models helped standardize inspections across multiple facilities, reduced manual review effort, and increased confidence in downstream decision-making by ensuring only usable, high‑quality images were processed further. Real production examples will be shared to illustrate how attribute‑level visibility directly impacted quality outcomes.
The second case study covers vision‑based measurements, highlighting work done to measure screen door dimensions directly from images. By combining machine learning predictions with computer vision techniques, the system was able to estimate key dimensions within tight tolerances and flag out‑of‑spec components before packaging. Results from production testing, including accuracy ranges and practical limitations, will be discussed.
The talk emphasizes results, lessons learned, and business impact, rather than implementation details. The format is presentation‑driven with visual examples, measurement outcomes, and discussion prompts designed to encourage audience engagement around where Vision AI delivers value – and where it still struggles – in industrial settings.
Key Takeaways
- How Vision AI can reliably catch image quality issues and wrong or missing attributes in real production environments
- What level of measurement accuracy is realistically achievable with vision‑based systems on the factory floor
- Why results‑driven Vision AI deployments deliver value
Transcript from Summit:
So I kept wondering if I should have given the entire text of my presentation to Jake and had him read it from a piece of paper. All right. Good afternoon, everybody. Welcome to my session, and thank you, Jake, for the introduction. So I'll talk a little bit about our Vision AI efforts in attribute detection and measurements. None of the work we did is groundbreaking cold fusion, but the effects of implementing it have been really good, and we are heading in a direction to implement more of those technologies in our company.
Just a very quick overview. I've been with Iowa State long enough to love experiments. I've been in the industry just long enough to learn that I cannot take as much time as I want for experiments. I do have a background in 3D visualization, VR, and AR simulations. I also teach design optimization at Iowa State, so that's a part-time gig I have. I am really excited that my semester is going to end in 10 days. There are many faculty among you, so you would know the woes.
So, just a little overview of my talk. I'm going to talk about who we are, what we do, and some of the general challenges we face at Pella Corporation, and then I will go into the case studies. I'm going to talk about two of them: attribute detection and screen door size validation. There are a couple of things I will cover for each of those use cases. What is the objective? What is the existing or traditional practice? Then I will talk about how we implemented the Vision AI tech, and then some of the challenges and outcomes.
So first of all, Pella is a family-owned company. Last year, we celebrated our 100th year, and one of the big pushes last year was to use modern tech to improve our manufacturing processes, and a big part of it is Vision AI.
We have 11,000 team members from across the country working for Pella, and under the Pella umbrella we have five different companies, with Pella as the flagship brand. So just a quick overview of where we are within the US. Iowa has one, two, three, four locations. All of them are wood plants, but across the country we have plants that do aluminum, vinyl, and fiberglass as well. Most of the products that we make are designed for residential homes and commercial applications.
So I want to talk a little bit about the general challenges. As opposed to a typical manufacturing company, say an automotive company, ours is unique in that every part that comes to a workstation is different, because our units are custom built. That by itself produces various challenges. The second one is that in wood plants we also have wood defects to deal with, right? Cracks, knots, or pitch pockets. Those are not homogeneous, and they don't appear at the same place in a piece of lumber that goes through the factory floor. So how do we identify these defects? And can we use Vision AI to identify them?
Another issue, and I mentioned this in our discussion earlier in the main room: how do we make the users of this tech trust in AI? It's not very easy; it's generally an uphill task. We as technology developers develop this tech, but if it fails at some point, the end users are immediately going to throw it away, throw in the towel: "I'm not going to use that." So it's a big task for us to develop this tech and also make sure the end users are okay with using it.
The other thing is Vision AI opportunities. Where do we use Vision AI? It goes right from picking the material that builds a window frame all the way down to whether the product is installed correctly or not. Each of these places has a spot for Vision AI to be implemented. So I'm going to talk about two of those case studies.
The first one is attribute detection. The objective, I just put it here for the window units. I'm going to talk about window units, but it applies to our doors as well. So does a window unit that went through production have all the attributes on it that are defined in the spec? That's the general objective. But to get there, what is the traditional or existing process? It is basically checked manually. People who work from station to station have to identify if there are any issues, if there are any missing parts. And before sending the unit to the next station, they have to make sure that everything is there. But that is error prone. So can we use Vision AI to ease their burden?
In order to do that, we designed it as a two-step process. The first step is an image quality check. We use cameras to capture images at various stations, and we want to make sure that the images we capture are good enough to determine whether all the parts that are supposed to be on the window unit are there or not. So it's an image quality check followed by an attribute check.
So if you look at this picture, right? We want to filter out those images that don't have the properties we want. The first one is unit out of view. We want to have a window unit in there when an image is captured. The way we have it set up at a station, when the operator thinks the product is ready to move to the next station, he clicks a button for the camera to capture an image. But sometimes he captures an image without anything in there, and that is an issue. We have to filter those out.
The second one is tilt table up. The table where they perform their work can be tilted up so that the heavy window unit can be transferred from one station to another. But if the tilt table is up, we don't see all the attributes on the window unit, so we have to somehow let the user know that, hey, you need to take the picture again.
The third one is material obstruction. Right, if there is a pair of gloves or any tools on a window unit, that is obscuring some of the attributes that we want to identify. So these all fall under simple classification models. We can say there is a pair of gloves, there are tools, or the tilt table is up. But it becomes complicated really quickly. Take a look at this one. In the left-hand image, we have cardboard, a person, and tools. All three of those are obscuring the window unit, which keeps the image from being captured properly. Now, the issue is that if those three obstructions are not on the window unit, it is okay. It is only because they are obscuring the window unit that they cause an issue. So classification models would not work for us anymore. We ended up using object detection, so that we can have our own logic to say that if a window unit is fully exposed, it doesn't matter where the other things are. That helped with filtering the images that can go to the next step, which is our attribute detection.
So, assuming that we got a good image, we run it through segmentation models to identify different aspects of a window unit, right? We position our cameras in such a way that we get a good vantage point of a window. It's not exactly top-down, because we want some attributes detected around the edges of the window unit. With that, we train some models, identify the different attributes within the image, and then check them against the spec of that window unit to tell whether there are parts missing, or whether the operator has installed incorrect attributes on the window unit. Those are the kinds of things that we were able to implement using Vision AI. And this has worked out really well. With the models that we developed, we got to about 90 to 95 percent consistency, but we are shooting for more.
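The two checks described above can be illustrated with a minimal sketch. It assumes the object detector returns labeled bounding boxes and that attribute detection yields a set of attribute names. The label names, the `Box` type, and the rule "any overlap with the unit counts as an obstruction" are assumptions made for illustration; the talk does not describe the team's actual logic in this detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixels, as an object detector might return."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


# Hypothetical label names; a real model would define its own classes.
OBSTRUCTION_LABELS = {"cardboard", "person", "tools", "gloves"}


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (0.0 if they do not overlap)."""
    width = min(a.x2, b.x2) - max(a.x1, b.x1)
    height = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, width) * max(0.0, height)


def image_quality_issues(detections, unit_label="window_unit"):
    """Return reasons to reject an image; an empty list means it is usable."""
    units = [d for d in detections if d.label == unit_label]
    if not units:
        return ["unit out of view"]
    unit = units[0]
    issues = []
    if any(d.label == "tilt_table_up" for d in detections):
        issues.append("tilt table up")
    for d in detections:
        # An obstruction only matters when it actually covers part of the unit.
        if d.label in OBSTRUCTION_LABELS and overlap_area(d, unit) > 0:
            issues.append(f"{d.label} obstructing unit")
    return issues


def attribute_mismatches(detected, spec):
    """Compare detected attribute names with the spec; return (missing, unexpected)."""
    return sorted(spec - detected), sorted(detected - spec)


if __name__ == "__main__":
    detections = [
        Box("window_unit", 100, 100, 500, 400),
        Box("tools", 450, 350, 600, 450),   # overlaps the unit
        Box("person", 700, 100, 800, 400),  # in frame, but not on the unit
    ]
    print(image_quality_issues(detections))
    print(attribute_mismatches({"logo_sticker", "handle"},
                               {"logo_sticker", "hardware_pack"}))
```

The point of the sketch is the one made in the talk: a whole-image classifier can only say that a person or tools are present, while box-level detections let the rejection rule depend on where those objects are relative to the window unit.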
And the other thing is we want these to be done in near real time, because once the window unit is ready to go to the next station, there's not much time. The operator will not wait for 30 seconds for the inference to finish and say we are missing a part or something. So we started using GPU-based inference, and it worked out really well. So far, we have implemented these in Pella and one or two other facilities across Iowa, but we are trying to roll it out across the entire country as well.
And the last thing is training shop floor and quality techs on the usage. Every once in a while, there are changes to our process. Like if we make the Pella logo sticker slightly different, that means we are introducing error into our model's inferences, right? Or if we change the way our hardware pack is installed on the window unit, that is going to add another source of error. So over time, there is model drift. So what we do is we not only develop this tech, but we also write documentation and procedures to teach quality techs to retrain models every couple of months and then reintroduce those models back into our overall workflow. So far, it has been working really well. We are implementing it on a larger scale now.
So I'm going to switch over to the next case study, screen door size validation. The objective is not so much whether a screen door has been assembled right or not; it's more whether they are putting the right product into the right box. Because we've seen a lot of times that a screen door is supposed to be 53 inches, but by the time it goes into the box, it is 64. So there are manual issues, or human errors, before screen doors are packed into a shipping box. So the question was, can we do a last check to get rough dimensions of a screen door that is being put into a box? So we applied Vision AI, and it's a three-step process this time.
The first one is an image quality check, so we filter out people that appear in the image. The next one is to overcome camera lens distortions. What does this mean? Typically, with every camera that we use, the lenses that come with them have certain optical defects. They cannot be avoided, but we can use certain pieces of information to correct them. The main ones that these cameras come with are barrel distortion, where images appear sort of rounded, or sometimes a pincushion effect. If you take a picture of, say, a pillar from a distance using your phone, you would probably notice that there is some curviness in the picture, and that's because of the lens distortion. We have to overcome this distortion in order to do any kind of measurement. And then we apply computer vision methods to tell what the length of a certain part is.
So overall, we filter out people using object detection. And this is, say, an input image, an image captured from a camera. We then do a calibration process. The cameras that we use are pretty dumb cameras. They just take pictures. They stream images or videos, but they don't do anything else. That also gives us more opportunity to take control, because we can program things the way we wish. So what we do is calibrate the cameras so that we know the focal length in the X direction and the Y direction, the optical center of those lenses, and the distortion parameters. These are very standard computer vision methods or approaches for calibrating a camera. Once we calibrate the camera, we can do something called undistortion. Now we can see that in the image the edges of the table are straightened out, which means that I have something to work with, right? So I can do some measurements.
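To make the calibration parameters mentioned above concrete, here is a minimal sketch of the commonly used radial distortion model with two coefficients, and of undistorting a single pixel by fixed-point iteration. The focal lengths `fx`, `fy`, the optical center `cx`, `cy`, and the coefficients `k1`, `k2` are made-up values, and the talk does not say which model or library the team uses. In practice these parameters are usually estimated from checkerboard images with a library such as OpenCV rather than written by hand.

```python
def distort_pixel(u, v, fx, fy, cx, cy, k1, k2):
    """Map an ideal (undistorted) pixel to where a lens with radial
    distortion would actually image it."""
    # Normalize: move the origin to the optical center, divide by focal length.
    x = (u - cx) / fx
    y = (v - cy) / fy
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor * fx + cx, y * factor * fy + cy


def undistort_pixel(u, v, fx, fy, cx, cy, k1, k2, iterations=20):
    """Invert the radial model for one pixel by fixed-point iteration."""
    xd = (u - cx) / fx
    yd = (v - cy) / fy
    x, y = xd, yd  # first guess: the distorted position itself
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / factor, yd / factor
    return x * fx + cx, y * fy + cy


if __name__ == "__main__":
    # Hypothetical intrinsics for a 1280x720 camera. In this convention a
    # negative k1 pulls points toward the center (barrel-like distortion).
    params = dict(fx=1000.0, fy=1000.0, cx=640.0, cy=360.0, k1=-0.2, k2=0.05)
    seen = distort_pixel(900.0, 500.0, **params)
    print("distorted:", seen)
    print("recovered:", undistort_pixel(*seen, **params))
```

Undistorting a whole image applies the same mapping to every pixel, which is why straight table edges come out straight afterwards.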
But in addition to undistortion, I can also do perspective correction, where I transform the image so that I get an output image very similar to this one, meaning that I can transform the image as if the camera were facing top-down on the unit itself. And because we know the dimensions of the tilt table, I can use that to map to what the size of a screen door is. Using that, we were able to get to a tolerance of less than 1/4 of an inch. So within 1/4 of an inch, we're able to identify the dimensions of a screen door, and it has helped us tremendously.
So those are the two cases I talked about. Vision AI is not something that is really hard to implement; you just have to put those pieces together, see what works in building a machine learning model, use that information to perform computer vision operations, and make the tech help us. So that's where we are, and we've been working on various other initiatives as well. That brings us to the end of my talk. If you have any questions, I'll be glad to answer.
I have a question, real quick. Thanks for sharing your case studies. Can you give us some insight into how long this took to pull together? Did you do it internally with your team? What's your team size? Did you have outside help? Can you give a little insight, just in case any of us want to do something like this?
I will try to give as much info as I can. Chris, please correct me if I get anything wrong. So the Vision AI team is small. We have four to five people on our team. But we use an external tool to label our images and build models. The biggest challenge, or not challenge, the biggest time taker is labeling these images. We first have to identify what the classes are that we want to figure out in an image. So on a screen door, what I wanted to know is, within the image, where is the screen door? So we have a collection of different screen doors with different colors, different sizes.
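As an illustration of the perspective-correction and scaling step from the second case study, the sketch below fits a homography that maps the four tilt-table corners in the (already undistorted) image onto a top-down plane measured in inches, then measures the distance between two points on that plane. The corner pixel coordinates, the 96 by 48 inch table size, and the door-edge points are invented for the example, and this is not the team's implementation; libraries such as OpenCV offer equivalent functions.

```python
import math


def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_homography(src, dst):
    """Fit the 3x3 homography (flattened, last entry fixed to 1) that maps
    four image points `src` to four plane points `dst`."""
    a, b = [], []
    for (x, y), (px, py) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -px * x, -px * y])
        b.append(px)
        a.append([0, 0, 0, x, y, 1, -py * x, -py * y])
        b.append(py)
    return solve_linear(a, b) + [1.0]


def to_table_plane(h, x, y):
    """Map an image pixel to top-down table coordinates."""
    w = h[6] * x + h[7] * y + h[8]
    return (h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w


def length_on_table(h, p1, p2):
    """Distance between two image points after mapping both to the table plane."""
    x1, y1 = to_table_plane(h, *p1)
    x2, y2 = to_table_plane(h, *p2)
    return math.hypot(x2 - x1, y2 - y1)


if __name__ == "__main__":
    # Hypothetical table corners in pixels (clockwise from top-left) and a
    # hypothetical 96 x 48 inch table.
    corners_px = [(210, 120), (1070, 135), (1180, 640), (95, 655)]
    corners_in = [(0, 0), (96, 0), (96, 48), (0, 48)]
    h = fit_homography(corners_px, corners_in)
    # Hypothetical end points of one screen door edge, e.g. taken from the
    # door region found by the detection model.
    print(length_on_table(h, (300, 400), (900, 410)), "inches")
```

Note that a mapping of this kind is only exact for points lying in the table plane, so the thickness of the part above the table would introduce some error; whether and how the team accounts for that is not covered in the talk.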
So it's just like how a kid learns to differentiate between an apple and an orange. He sees apples and oranges in different environments, so that at the end, if I take an apple, put it in the corner of a room, and ask him what it is, he is able to tell that it is an apple and not an orange. So basically we train the model with a whole bunch of images of a screen door. Same thing with our other attributes as well. How we label depends on whether we want to use the model for image quality, which means object detection, or for attribute detection. If it is object detection, just to tell me roughly where a tilt table is, a rectangle would be enough. But if I want a specific attribute of a specific size, then we do polygonal labeling. That means we can have a customized polygon that represents a specific item. These are the steps that take the longest. But once the model is built, we can run a quick script that takes in an image and spits out what attributes are identified or detected within that image.
So is this like a few-month project, a few weeks? What's the timetable to have this finished product?
If we have the images labeled correctly, the model training itself takes anywhere from a couple of hours to a couple of days. But in order to get there, there are a lot of learnings, though, and that's what took us the most time. We started on this approach a year and a half to two years ago. I wasn't with Pella at the time; I only started last year. But by the time I started, we had made some progress and we had learned some things. Whether to use object detection or segmentation models, that is something that we learned through the process. And what resolution of images do we need to punch into our model? Do we pick a 256 image, or do we want a 1024 by 1024, or do we want to go all the way to a 2K or a 4K image?
So all of that took a little bit of learning, and it's a subjective answer. If the attributes are well defined and stay the same, then you don't need as much time. But when objects do change, yes, it does take time.
We have time for one more brief question. And I think I saw a hand back here first. Do you want me to go to him? Okay. And then we'll transition speakers.
Yeah, two very short questions. One, I was curious what the average data set size is for a single model, like for a screen door. Are you talking like 100 images or a few thousand images for a single model? And the other is, is the data just logged for later reference, or, if there's a mistake, does a light come on for the operator to say, "Oh, I put this together wrong"?
Yes. So your first question was the data set size. For screen doors, we had a few hundred images. They're pretty much homogeneous, pretty much consistent. But for attribute detection, we are talking about thousands, like four or five thousand, or sometimes even more. The other question you had was whether it's logged for later reference.
Or is there something to help an operator, the person that built the screen door, know that they did something wrong?
Yes, so we have a heads-up display, a TV monitor, basically. After an inference is done, if it detects that some of the attributes are missing, then we send the inference picture to the heads-up display to tell them that, hey, something is wrong. It can go green, orange, or red. Red means you have to stop and you have to address it. Orange, it is okay.