Success Story #1 - Vision AI Efforts in Attribute Detections and Measurements - CIRAS AI Summit

PRODUCTION AND OPERATIONS

3:10 PM – 3:55 PM

Room 275

SPEAKER

Vijay Kalivarapu

Senior AI Engineer, Pella Corporation


## Session: Success Story #1 – Vision AI Efforts in Attribute Detections and Measurements
**Track:** Production and Operations | **Time:** 3:10 PM–3:55 PM | **Room:** 275 | **Type:** Success Story
**Conference:** CIRAS AI Summit for Iowa — May 6, 2026, Scheman Building, Iowa State University, Ames IA

### Speaker(s)

**Vijay Kalivarapu** — Senior AI Engineer, Pella Corporation (Pella, IA)
Vijay Kalivarapu has a Ph.D. majoring in Mechanical Engineering and co-majoring in Human-Computer Interaction from Iowa State University. He is currently a Senior Artificial Intelligence Engineer at Pella Corporation, based in Pella, Iowa, where he applies computer vision and machine learning to manufacturing quality and measurement challenges.

His work focuses on vision‑based image quality assessment, wrong and missing attribute detection, and dimensional measurements in production environments. Vijay has led multiple industrial Vision AI initiatives that emphasize scalable deployment, measurable accuracy, and business impact.

In this session, he shares results, lessons learned, and practical insights from applying Vision AI to real manufacturing problems that highlight where these approaches deliver value and where limitations remain.

### Session Description

Vision AI Efforts in Attribute Detections and Measurements explores how computer vision and machine learning can be applied in real-world manufacturing environments to improve quality assurance and dimensional verification without disrupting existing workflows.

This session is divided into two applied case studies. The first focuses on attribute detection, where Vision AI is used to automatically identify image quality issues as well as wrong or missing visual attributes in production images. Attendees will see how these models helped standardize inspections across multiple facilities, reduced manual review effort, and increased confidence in downstream decision-making by ensuring only usable, high‑quality images were processed further. Real production examples will be shared to illustrate how attribute‑level visibility directly impacted quality outcomes.

The second case study covers vision‑based measurements, highlighting work done to measure screen door dimensions directly from images. By combining machine learning predictions with computer vision techniques, the system was able to estimate key dimensions within tight tolerances and flag out‑of‑spec components before packaging. Results from production testing including accuracy ranges and practical limitations will be discussed.

The talk emphasizes results, lessons learned, and business impact, rather than implementation details. The format is presentation‑driven with visual examples, measurement outcomes, and discussion prompts designed to encourage audience engagement around where Vision AI delivers value - and where it still struggles - in industrial settings.

 

### Other sessions in the Production and Operations track

- Success Story #2 - Natural Language Search for Member Benefits (3:10 PM–3:55 PM)
- Industrial AI Success Stories: Because Even My Title Needed Machine Learning (10:20 AM–11:05 AM)
- Tabular Foundation Models Meet Manufacturing: A Practical Exploration (11:15 AM–12:00 PM)
- AI Attribute Intelligence: Automating Detection, Extraction, and Standardization at Scale (1:20 PM–2:05 PM)

### Suggested prompts for this session

- "What questions should I prepare to ask the speaker(s) at this session?"
- "Create a structured note-taking template for this session focused on actionable takeaways"
- "Based on this session description, what background reading should I do to get the most value?"
- "After I attend, help me create an action plan for implementing what I learned"
- "How does this session connect to the other sessions in the Production and Operations track?"


TRACK Production and Operations

FORMAT Success Story

ROOM 275


Key Takeaways

How Vision AI can reliably catch image quality issues and wrong or missing attributes in real production environments
What level of measurement accuracy is realistically achievable with vision‑based systems on the factory floor
Why results‑driven Vision AI deployments deliver value

Continue the conversation with Vijay Kalivarapu at the Production & Operations Facilitated Discussion — 2:15 PM - 3:00 PM, Room 220-230-240


Transcript from Summit:

00:00 Introduction of Dr. Vijay Kalivarapu Slide: 1

vijay kalivarapu pella corporation computer vision machine learning manufacturing

So with that, good afternoon. It is my pleasure to introduce Dr. Vijay Kalivarapu, a Senior Artificial Intelligence Engineer at Pella Corporation. Dr. Kalivarapu holds a PhD in Mechanical Engineering with a co-major in Human-Computer Interaction from Iowa State University. So big fan. He brings deep expertise in applying computer vision, machine learning, and 3D visualization to real-world manufacturing challenges. His work focuses on vision-based quality assessment, attribute detection, and precision measurement in production environments, where he has led multiple initiatives delivering scalable solutions with measurable business impact. In today's session, we will share practical insights from deploying vision AI in manufacturing, highlighting how these technologies improve quality assurance and measurement accuracy, as well as where they face limitations in real-world settings. So please join me in welcoming Dr. Vijay Kalivarapu. So I kept wondering if I got the entire text of my presentation to Jake and have him read through a piece of paper.

01:11 Speaker Background and Vision AI Overview Slide: 2

iowa state 3d visualization vr ar simulations

All right. Good afternoon, everybody. Welcome to my session. And thank you, Jake, for the introduction. So I'll go a little bit about our vision AI efforts in attribute detection and measurements. None of the work we did is groundbreaking cold fusion, but the effects of implementing them has been really good, and we are in a direction to implement more of those technologies in our company. Just a very quick overview. I've been with Iowa State for long enough to love experiments. I've been in the industry just enough to learn that I cannot take as much time as I want for experiments. So I do have background in 3D visualization, VR, AR simulations.

02:12 Talk Structure and Pella Corporation Overview Slides: 3, 2

window attribute detection screen door size validation pella corporation manufacturing

I also teach design optimization at Iowa State. So it's a part-time gig. I have it at Iowa State. I am really excited to see that my semester is going to end in 10 days. So any faculty among you would know the votes. So it's a little overview of my talk. I'm going to talk about who we are, what we do, and some of the general challenges we face at Pella Corporation. And I will go into the case studies. I'm going to talk about two of them, window attribute detection and screen door size violation. Well, not violation, validation. So a couple of things I will talk about each of those use cases. What is the objective? What is the existing or traditional practice? With those two case studies, I will talk about how we implemented the Vision AI tech and then some of the challenges and outcomes. So first of all, Pella is a family-owned corporation company. Last year, we celebrated our 100th year.

03:13 Pella's Scale and Geographic Distribution Slides: 9, 3

pella 11000 employees wood plants aluminum vinyl

And one of the big pushes last year was to use modern tech to improve our manufacturing processes. And a big part of it is Vision AI. We have 11,000 team members from across the country working for Pella. And then under the Pella umbrella, we have five different companies, and Pella is a flagship brand. So just a quick overview of where we are within the US. So Iowa has one, two, three, four, four locations. All of them are wood plants, but then across the country we have plants that do aluminum, vinyl, and fiberglass as well. And most of the products that we make are designed for residential homes and commercial applications. So I want to talk a little bit about the general challenges. As opposed to a typical manufacturing company, like say an automotive company, ours is unique in that every part that comes to a station, workstation, is different.

04:23 Manufacturing Challenges in Custom Production Slide: 3

custom manufacturing wood defects cracks knots pitch pockets

Because our units are custom built. So that by itself produces various challenges. The second one is in wood plants, we also have other things to deal with. Wood defects, right? Cracks, knots, or pitch pockets. So those are not homogeneous and they don't appear at the same place in a lumber piece that goes through our factory floor. So how do we identify these defects? And can we use Vision AI to identify these defects? Some of the other issues, I mentioned this in the talk, well, in our discussion earlier in the main room. So how do we make the users of this tech trust in AI? It's not very easy. It's generally an uphill task.

05:22 Building User Trust in AI Systems Slide: 3

user trust ai adoption end users technology acceptance reliability

So we as technology developers, we develop this tech. But if it fails at some point, then the end users are immediately going to throw it away, throw in the towel. "I'm not going to use that." So it's a big task for us to develop this tech and also make sure the end users are okay with using the tech. The other things, vision AI opportunities, where do we use vision AI? This is right from picking the material that builds a window frame, all the way down to whether the product is installed correctly or not. So each of these places has a spot for Vision AI being implemented. So I'm going to talk about two of those case studies. The first one is attribute detection. The objective, I just put it here, to the window units.

06:22 Attribute Detection Case Study Objective Slide: 3

attribute detection window units quality inspection manual inspection error reduction

I'm going to talk about window units, but it applies to doors as well. So do the window units that went through the production, do they have all the attributes on them that are defined in the spec? So that's the general objective. But to get there, what is the traditional or existing process? It is basically checked manually. So people that work from station to station, they have to identify if there are any issues, if there are any missing parts. And before sending it to the next station, they would have to make sure that everything is in there. But it is error prone. So can we use Vision AI to ease their burden? So in order to do that, we designed it as a two-step process. The first one is an image quality check. So we use cameras to capture images of various stations.

07:24 Image Quality Check Implementation Slide: 9

image quality camera system image capture tilt table obstructions

And we want to make sure that the images we capture are good enough to determine whether all the parts that are supposed to be on the window unit are there or not. So image quality check followed by attribute check. So if you look at this picture, right, we want to filter out those images that don't have attributes that we prefer. So the first one, unit out of view. We want to have a window unit in there when an image is captured. So the way we have it is at a station, an operator, when he thinks that the product is ready to move to the next station, he clicks a button for the camera to capture an image. But sometimes he captures an image without anything in there. And that is an issue. And we would have to filter those out. And the second one, tilt table up, so that the table where they perform their work, it will be tilted up so that the heavy window unit can be transferred over from one station to another.

08:29 Object Detection for Complex Obstructions Slide: 9

object detection classification models obstructions gloves tools

But if we have the tilt table up, we don't see all the attributes in the window unit, so we'd have to somehow let the user know that, hey, you need to take a picture again. The third one, material obstruction, right? If there's a pair of gloves or any tools on a window unit that is obscuring some of the attributes that we want to identify in a window unit. So these are all under simple classification models. We can say there is a pair of gloves, there are tools, or there's a tilt table up. But it becomes complicated really quick. Take a look at this one. The left-hand side image, we have cardboard, a person, and tools. All three of those are obscuring the window unit for an image to be captured properly. Now, the issue is if those three pieces of obstruction are not on a window unit, it is okay.

09:32 Segmentation Models for Attribute Verification Slide: 16

segmentation models attribute verification camera positioning spec comparison missing parts

But because it is obscuring the window unit, that would cause an issue. So classification models will not work for us anymore. So we ended up using object detection, so that we can have our own logic to say if a window unit is fully exposed, it doesn't matter where the other things are. So that helped out with filtering out items that can go to the next step, which is our attribute detection. So assuming that we got a good image, we run it through segmentation models to identify different aspects of a window unit. We position our cameras in such a way that we get a good vantage point of a window. So it's not exactly top-down because we want some attributes detected around the edges of a window unit.
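The "own logic" on top of object detection could be sketched roughly as follows. This is only an illustration under stated assumptions, not Pella's implementation: the `Box` type, the label strings, and the rule "any overlap between an obstruction box and the unit box counts as obscuring" are all hypothetical, and bounding-box overlap is a coarse stand-in for true occlusion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned detection box in pixels (hypothetical detector output)."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


# Hypothetical class names; the talk mentions gloves, tools, cardboard, a person.
OBSTRUCTION_LABELS = {"gloves", "tools", "cardboard", "person"}


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (0.0 if they do not intersect)."""
    width = min(a.x2, b.x2) - max(a.x1, b.x1)
    height = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, width) * max(0.0, height)


def image_is_usable(detections: list[Box], unit_label: str = "window_unit"):
    """Return (usable, reason). Obstructions only matter if they touch the unit."""
    units = [d for d in detections if d.label == unit_label]
    if not units:
        return False, "unit out of view"
    unit = units[0]
    for d in detections:
        if d.label in OBSTRUCTION_LABELS and overlap_area(unit, d) > 0:
            return False, f"{d.label} obstructing unit"
    return True, "ok"
```

With this rule, tools lying next to the unit leave the image usable, while the same tools overlapping the unit cause the image to be rejected so the operator can retake it.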

10:32 Model Performance and Real-Time Inference Slide: 16

model accuracy 95 percent consistency gpu inference real-time processing inference speed

So with that, we train some models, identify different attributes within the model, and then check them against the spec of the window unit to tell if there are parts missing in the window unit, or if there is an issue with the operator himself, if he has incorrect attributes installed on the window unit. So those are the kind of things that we were able to implement using Vision AI. And this has worked out really well. With the models that we developed, we got to about 90 to 95% consistency, but we are shooting for more. And the other thing is we want these to be done at near real time. Because once the window unit is ready to go to the next station, there's not as much time, because the operator will not wait for 30 seconds before the inference is done and say we are missing a part or something.
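Checking detected attributes against the spec amounts to comparing two multisets. A minimal sketch, assuming the segmentation step yields a flat list of attribute names and the spec is a list of required names (both formats, and every attribute name other than the logo sticker and hardware pack mentioned in the talk, are hypothetical):

```python
from collections import Counter


def compare_to_spec(detected: list[str], spec: list[str]) -> dict[str, dict[str, int]]:
    """Report attributes that are missing from, or unexpected on, a unit.

    Counter subtraction keeps only positive counts, so `required - found`
    is exactly what the spec asks for that was not detected, and
    `found - required` is what was detected beyond the spec.
    """
    found = Counter(detected)
    required = Counter(spec)
    return {
        "missing": dict(required - found),
        "unexpected": dict(found - required),
    }


# Example: the spec calls for two locks; one lock and an unspecified handle were seen.
report = compare_to_spec(
    detected=["logo_sticker", "lock", "handle"],
    spec=["logo_sticker", "hardware_pack", "lock", "lock"],
)
```

An empty `missing` and an empty `unexpected` would mean the unit matches its spec; anything else points to a missing part or an incorrectly installed attribute.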

11:37 Model Maintenance and Retraining Process Slide: 16

model drift retraining quality techs documentation process changes

So we started using GPU-based inferences, and it worked out really well. So far, we have implemented them in Pella and one or two other facilities across Iowa, but we're trying to roll it out across the entire country as well. And the last thing is train the usage to shop floor and quality techs. So every once in a while, there are changes to our process. Like if we make the Pella logo sticker slightly different, that means that we are introducing error into inferencing our model, right? Or if we change the way a hardware pack is installed on the window unit, if that changes, it is going to add another line of error. So over time, there is a model drift. So what we do is we not only develop this tech, but we also write documentation procedures to teach quality techs to train models every couple months and then reintroduce those models back into our total workflow.

12:46 Screen Door Size Validation Objective Slide: 3

screen door size validation packing errors dimensional check quality control

So far, it has been working really well. We are implementing it on a larger scale now. So I'm going to switch over to the next case study, screen door size validation. The objective, it's not so much whether a screen door has been assembled right or not, but it's more of whether they are putting the right product into the right box. Because we've seen a lot of times when a screen door is supposed to be 53 inches, but by the time it goes into the box, it is 64. So there are issues, manual issues, or human errors before screen doors are packed into a shipping box. So the question was, can we do a last check to get rough dimensions of a screen door that is being put into a box?

13:48 Three-Step Process for Size Measurement Slide: 3

three-step process lens distortion barrel distortion optical defects computer vision

So we applied Vision AI, and it's a three-step process this time. The first one is an image quality check. So we filter out people that appear in the image. The next one is to overcome camera lens distortions. What does this mean? So typically, every camera that we use, the lenses that come with them, they have certain optical defects, and it cannot be avoided, but we can use certain pieces of information to correct it. So the main ones that these cameras come with are barrel distortion, where images sort of appear rounded, or sometimes they have a pincushion effect. So if you take a picture of, say, a pillar from a distance using your phone, you would probably realize that there is some curviness to the pictures you've taken.

14:49 Camera Calibration and Distortion Correction Slide: 11

camera calibration focal length optical center distortion parameters object detection

And that's because of the lens distortion. So we have to overcome this distortion in order to do any kind of measurements. And then apply computer vision methods to tell what is the length of a certain part. So overall, we filter out people using object detection. And this is, say, an input image or an image capture from a camera. So we do a calibration process. The cameras that we use, they are pretty dumb cameras. They just take pictures, they stream images or videos, but they don't do anything else. And it also gives us more opportunity to take control because we can program it in a way that we wish. So, what we do is we calibrate the cameras so that we know the focal length in X direction, Y direction, what is the optical center of those lenses, and what are the distortion parameters. So these are very standard computer vision methods or approaches to calibrate a camera.
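To make the calibrated quantities concrete, here is a small sketch of how focal lengths, optical center, and distortion parameters relate a distorted pixel to its corrected position. It assumes a two-coefficient radial model (`k1`, `k2`), which is a common simplification; the talk does not say which model or tooling was used, and in practice libraries such as OpenCV provide calibration and undistortion routines. All names and numbers below are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    fx: float  # focal length in x, pixels
    fy: float  # focal length in y, pixels
    cx: float  # optical center x, pixels
    cy: float  # optical center y, pixels
    k1: float  # radial distortion coefficients (negative k1 gives barrel,
    k2: float  # positive k1 gives pincushion, for mild distortion)


def distort_pixel(cam: Intrinsics, u: float, v: float) -> tuple[float, float]:
    """Map an ideal (undistorted) pixel to where the lens actually images it."""
    x, y = (u - cam.cx) / cam.fx, (v - cam.cy) / cam.fy
    r2 = x * x + y * y
    scale = 1.0 + cam.k1 * r2 + cam.k2 * r2 * r2
    return cam.fx * x * scale + cam.cx, cam.fy * y * scale + cam.cy


def undistort_pixel(cam: Intrinsics, u: float, v: float, iterations: int = 20):
    """Invert the radial model by fixed-point iteration (fine for mild distortion)."""
    xd, yd = (u - cam.cx) / cam.fx, (v - cam.cy) / cam.fy
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        scale = 1.0 + cam.k1 * r2 + cam.k2 * r2 * r2
        x, y = xd / scale, yd / scale
    return cam.fx * x + cam.cx, cam.fy * y + cam.cy


# Illustrative parameters only; real values come out of the calibration step.
cam = Intrinsics(fx=1000.0, fy=1000.0, cx=640.0, cy=360.0, k1=-0.2, k2=0.05)
```

Applying `undistort_pixel` to every pixel (or to detected corner points) is what straightens lines that the lens had curved.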

15:58 Perspective Correction and Measurement Accuracy Slide: 11

undistortion perspective correction top-down view dimensional accuracy quarter inch tolerance

So once we calibrate the camera, we can do something called undistortion. Now we can see that the image, the edges of the table are straightened out, which means that I have something to work with, right? So I can do some measurements. But in addition to undistortion, I can also do perspective correction where I transform the image so that I get an output image, something very similar to this. Meaning that I can transform the image as if the camera is facing top down on the window unit itself. And because we know the dimensions of the tilt table, I can use that to map on to what the size of a screen door is. So using that, we were able to get to about a tolerance of less than 1/4 of an inch. So within 1/4 of an inch, we were able to identify the dimensions of a screen door, and it has helped us tremendously.
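Perspective correction with a known reference can be sketched as fitting a homography from the four tilt-table corners in the (already undistorted) image to the table's real dimensions, then measuring in that plane. The code below is a hedged illustration, not the production method: it assumes the screen door lies flat in the table plane, that corner pixels are already located, and it reuses the quarter-inch figure from the talk as a pass/fail threshold, which is an assumption. Function names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_homography(src, dst):
    """3x3 homography mapping four src points (pixels) to four dst points (inches)."""
    a, b = [], []
    for (x, y), (X, Y) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -X * x, -X * y])
        b.append(X)
        a.append([0, 0, 0, x, y, 1, -Y * x, -Y * y])
        b.append(Y)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, p):
    """Apply a 3x3 homography to a 2D point."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def table_plane_distance(h, p_px, q_px):
    """Distance in inches between two image points, measured in the table plane."""
    return math.dist(apply_homography(h, p_px), apply_homography(h, q_px))


def size_matches(measured_in, expected_in, tol_in=0.25):
    """True if the measured dimension is within tol_in inches of the expected one."""
    return abs(measured_in - expected_in) <= tol_in
```

For the packing error described earlier, a door measured at about 64 inches against an order for 53 inches would fail `size_matches` and could be flagged before it goes into the box.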

17:05 Implementation Feasibility and Q&A Introduction Slide: 11

implementation feasibility machine learning computer vision operations ongoing initiatives

So two cases talked about. Vision AI is not something that is really hard to implement. You just have to put those pieces together, see what works with building a machine learning model, and use that information to perform computer vision operations and make the tech help us. So that's where we are at, and we've been working with various other initiatives as well. That brings us to the end of my talk. If you have any questions, I'll be glad to answer. I have a question. Real quick. Thanks for sharing your case studies. Can you give us some insight into how long this took to maybe pull together? Did you do it internal with your team? What's your team size?

18:04 Team Size and External Labeling Tools Slide: 11

team size external tools image labeling model building training data

Did you have outside help? Can you give a little insight just in case any of us want to do something like this? I will try to give as much info as I can. Chris, please correct me if I did anything wrong. So the Vision AI team is small. We have four to five people on our team. But we use an external tool to label our images and build models. The biggest challenge with, not challenges, the biggest time taker is labeling these images. So we first have to identify what are the classes that we want to figure out in an image. So on a screen door, what I wanted to know is within the image, where is the screen door? So we have a collection of different screen doors with different colors, different sizes. So just like how a kid learns how to differentiate between an apple and an orange.

19:07 Labeling Approaches for Different Detection Types Slide: 11

object detection polygonal labeling bounding boxes attribute detection labeling methods

He sees apples and oranges in different environments so that even at the end, if I take an apple, put it in the corner of a room, and ask him to tell what it is, he is able to tell that it is an apple and not an orange. So basically train the model with a whole bunch of images of a screen door. Same thing with our other attributes as well. So we train the model depending on whether we want to use it for image quality, that means object detection, or attribute detection. So if it is object detection, just to tell me where roughly a tilt table is, a rectangle would be enough. But if I wanted a specific attribute of a specific size, then we would do a polygonal labeling. So that means we can have a customized polygon that represents a specific item.
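The difference between a rectangle label and a polygonal label can be shown with a toy shape. In this sketch (coordinates and helper names are made up for illustration), an L-shaped item is labeled once as a polygon and once by its enclosing rectangle; the rectangle covers twice the area of the item itself, which is why a box is enough for "roughly where" but not for an attribute's exact outline or size.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bounding_box(points):
    """Smallest axis-aligned rectangle (x_min, y_min, x_max, y_max) around the polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical L-shaped attribute outline, in pixels.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)]

x_min, y_min, x_max, y_max = bounding_box(l_shape)
box_area = (x_max - x_min) * (y_max - y_min)
fill_ratio = polygon_area(l_shape) / box_area  # fraction of the box the item occupies
```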

20:08 Project Timeline and Learning Process Slide: 11

project timeline model training time learning curve image resolution object detection

So these are the ones that take the longest. But once the model is built, we can run a quick script that takes in an image, spits out and tells us what attributes are identified or detected within that image. So is this like a few-month project, a few weeks? Or what's your time table to have this finished product? So if we have the images labeled correctly, the model training time itself takes anywhere from a couple of hours to a couple of days. But in order to get there, there are a lot of learnings, though. And that's the one that took us most time. So we started on this approach a year and a half to two years ago. I wasn't with Pella at the time. I only started last year. But by the time we started, we made some progress, and we did learn some things.

21:14 Dataset Size and Operator Feedback Systems Slide: 11

dataset size training images hundreds vs thousands operator feedback error detection

Whether it's object detection or segmentation models, that is something that we learned through the process, and what is the resolution of images that we need to punch into our model? Do we pick a 256 model image, or do we want to have a 1024 by 1024, or do we want to go all the way to a 2K or a 4K image? So all of that took a little bit of learning. So it's a subjective answer. If attributes are well-defined and they're the same ones, then you don't need as much time. But when objects do change, yes, it does take time. We have time for one more brief question. And I think I saw a hand back here first. Do you want me to go to him? Okay. And then we'll transition speakers. Yeah, two very short questions.

22:17 Training Data Volumes for Different Models Slide: 11

training data volume few hundred images thousands of images homogeneous data attribute detection

One, I was just curious what the average data set size is for like a single model of like a screen door. Are you talking like 100 images or a few thousand images for a single model? And the other is, is the data just logged for later reference, or if there's a mistake, does a light come on? Yeah, for the operator to say, oh, I put this together wrong. Yes. So the first one, your first question was... Size of the model. Size of the model. For screen doors, we had a few hundreds of images, hundreds of images. They're pretty much homogeneous. They're pretty much consistent. But for attribute detection, we're talking about thousands, like four or five thousand, or sometimes even more. So the other question you had was... Is the data just logged for later reference?

23:19 Heads-Up Display Feedback System Slide: 11

heads-up display tv monitor color-coded alerts red orange green operator feedback

Or is there something to help an operator, the person that built the screen door that they did something wrong? Yes. So we have a heads-up display, a TV monitor basically. So after an inference is done, if it detects that some of the attributes are missing, then it does, we send the inference picture to the heads-up display to tell that, hey, something is wrong. So it can go green, orange, or red. Red means you have to stop and you have to address it. Orange, it is okay. All right, folks, that's our time for this one, but I'm sure Vijay will be around afterwards if we have more questions. So thank you. Everybody join me in thanking Vijay for his wonderful presentation. And

So with that, good afternoon. It is my pleasure to introduce Dr. Vijay Kalivarapu, a Senior Artificial Intelligence Engineer at Pella Corporation. Dr.

Kalivarapu holds a PhD in Mechanical Engineering with a co-major in Human Computer Interaction from Iowa State University. So big fan. He brings deep expertise in applying computer vision, machine learning, and 3D visualization to real-world manufacturing challenges. His work focuses on vision-based quality assessment, attribute detection, and precision measurement in production environments, where he has led multiple initiatives delivering scalable solutions with measurable business impact.

In today's session, we will share practical insights from deploying vision AI in manufacturing, highlighting how these technologies improve quality assurance and measurement accuracy, as well as where they face limitations in real-world sessions. So please join me in welcoming Dr. Vijay Kalivarappu. So I kept wondering if I got the entire text of my presentation to Jake and have him read through a piece of paper.

All right. Good afternoon, everybody. Welcome to my session. And thank you, Jake, for the introduction.

So I'll talk a little bit about our vision AI efforts in attribute detection and measurements. None of the work we did is groundbreaking cold fusion, but the effects of implementing it have been really good, and we are headed toward implementing more of these technologies in our company. Just a very quick overview. I've been with Iowa State long enough to love experiments. I've been in the industry just long enough to learn that I cannot take as much time as I want for experiments. So I do have a background in 3D visualization, VR, and AR simulations. I also teach design optimization at Iowa State. So it's a part-time gig I have at Iowa State. I am really excited to see that my semester is going to end in 10 days. So any faculty among you would know the woes.

So here's a little overview of my talk.

I'm going to talk about who we are, what we do, and some of the general challenges we face at Pella Corporation. And I will go into the case studies. I'm going to talk about two of them, window attribute detection and screen door size violation. Well, not violation, validation.

So there are a couple of things I will talk about for each of those use cases. What is the objective? What is the existing or traditional practice? With those two case studies, I will talk about how we implemented the Vision AI tech and then some of the challenges and outcomes. So first of all, Pella is a family-owned company.

Last year, we celebrated our 100th year. And one of the big pushes last year was to use modern tech to improve our manufacturing processes. And a big part of it is Vision AI. We have 11,000 team members from across the country working for Pella.

And then under the Pella umbrella, we have five different companies, and Pella is the flagship brand. So just a quick overview of where we are within the US. So Iowa has one, two, three, four locations. All of them are wood plants, but then across the country we have plants that do aluminum, vinyl, and fiberglass as well.

And most of the products that we make are designed for residential homes and commercial applications. So I want to talk a little bit about the general challenges. As opposed to a typical manufacturing company, like say an automotive company, ours is unique in that every part that comes to a workstation is different, because our units are custom built. So that by itself produces various challenges.

The second one is in wood plants, we also have other things to deal with. Wood defects, right? Cracks, knots, or pitch pockets. So those are not homogeneous and they don't appear at the same place in a lumber piece that goes through our factory floor.

So how do we identify these defects? And can we use Vision AI to identify these defects? Some of the other issues, I mentioned this in our discussion earlier in the main room. So how do we make the users of this tech trust AI? It's not very easy. It's generally an uphill task. So we as technology developers, we develop this tech. But if it fails at some point, then the end users are immediately going to throw in the towel: I'm not going to use that. So it's a big task for us to develop this tech and also make sure the end users are okay with using the tech.

The other thing is vision AI opportunities. Where do we use vision AI? It's right from picking the material that builds a window frame all the way down to whether the product is installed correctly or not.

So each of these places has a spot for Vision AI to be implemented. So I'm going to talk about two of those case studies. The first one is attribute detection. The objective, I just put it here. The window units are basically checked manually. So people that work from station to station have to identify if there are any issues, if there are any missing parts. And before sending a unit to the next station, they have to make sure that everything is in there. But it is error prone.

So can we use Vision AI to ease their burden? So in order to do that, we designed it as a two-step process. The first one is an image quality check. So we use cameras to capture images of various stations.

We want to have a window unit in there when an image is captured. So the way we have it is at a station, an operator, when he thinks that the product is ready to move to the next station, he clicks a button for the camera to capture an image. But sometimes he captures an image without anything in there. And that is an issue.

And we would have to filter those out. The second one is tilt table up. The table where they perform their work gets tilted up so that the heavy window unit can be transferred over from one station to another. But if the tilt table is up, we don't see all the attributes in the window unit, so we have to somehow let the user know that, hey, you need to take a picture again. The third one is material obstruction, right? There may be a pair of gloves or tools on a window unit obscuring some of the attributes that we want to identify. So these all fall under simple classification models. We can say there is a pair of gloves, there are tools, or the tilt table is up. But it becomes complicated really quick.

Take a look at this one. In the left-hand side image, we have cardboard, a person, and tools. All three of those are obscuring the window unit, so the image is not captured properly. Now, the catch is that if those three pieces of obstruction are not on a window unit, it is okay.
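The rule described here (an obstruction only matters when it sits on the window unit) can be sketched as a small gating function. This is a minimal illustration, not Pella's implementation: it assumes an upstream detector already returns bounding boxes, and the names `Box`, `image_is_usable`, and `max_overlap_px` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical detector output)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def intersection_area(self, other: "Box") -> float:
        # Overlap width and height are clamped at zero when the boxes are disjoint.
        w = min(self.x_max, other.x_max) - max(self.x_min, other.x_min)
        h = min(self.y_max, other.y_max) - max(self.y_min, other.y_min)
        return max(0.0, w) * max(0.0, h)


def image_is_usable(window, obstructions, tilt_table_up=False, max_overlap_px=0.0):
    """Return (usable, reason) for one captured image.

    window:         Box of the window unit, or None if no unit was detected.
    obstructions:   list of (label, Box) pairs, e.g. ("gloves", Box(...)).
    max_overlap_px: assumed threshold; overlap area above it rejects the image.
    """
    if window is None:
        return False, "no window unit in image"
    if tilt_table_up:
        return False, "tilt table is up"
    for label, box in obstructions:
        # Gloves, tools, or a person elsewhere in the frame are fine;
        # they only count when they overlap the window unit itself.
        if window.intersection_area(box) > max_overlap_px:
            return False, f"{label} is covering the window unit"
    return True, "ok"
```

Only images that pass this gate would move on to the attribute detection step; rejected ones would prompt the operator to take the picture again.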

So that helped with filtering the images that can go on to the next step, which is our attribute detection. So assuming that we got a good image, we run it through segmentation models to identify different aspects of a window unit. We position our cameras in such a way that we get a good vantage point of a window. So it's not exactly top-down, because we want some attributes detected around the edges of a window unit.

And the other thing is we want these to be done in near real time. Because once the window unit is ready to go to the next station, there's not much time: the operator will not wait 30 seconds for the inference to finish and say we are missing a part or something. So we started using GPU-based inference, and it worked out really well. So far, we have implemented it in Pella and one or two other facilities across Iowa, but we're trying to roll it out across the entire country as well.

And the last thing is training shop floor and quality techs on its usage. So every once in a while, there are changes to our process. Like if we make the Pella logo sticker slightly different, that means that we are introducing error into our model's inferences, right? Or if we change the way a hardware pack is installed on the window unit, that is going to add another source of error.

So over time, there is a model drift. So what we do is we not only develop this tech, but we also write documentation procedures to teach quality techs to train models every couple months and then reintroduce those models back into our total workflow. So far, it has been working really well. We are implementing it on a larger scale now.

So I'm going to switch over to the next case study, screen door size validation. The objective, it's not so much whether a screen door has been assembled right or not, but it's more of whether they are putting the right product into the right box. Because we've seen a lot of times when a screen door is supposed to be 53 inches, but by the time it goes into the box, it is 64. So there are manual issues, or human errors, before screen doors are packed into a shipping box.

So the question was, can we do a last check to get rough dimensions of a screen door that is being put into a box? So we applied Vision AI, and it's a three-step process this time. The first one is an image quality check. So we filter out people that appear in the image.

The next one is to overcome camera lens distortions. What does this mean? So typically, every camera that we use, the lenses that come with them have certain optical defects, and that cannot be avoided, but we can use certain pieces of information to correct it. So the three main ones that these cameras come with are barrel distortion, where images sort of appear rounded, or sometimes they have a pincushion effect.

So if you take a picture of, say, a pillar from a distance using your phone, you would probably notice that there is some curviness in the picture you've taken. And that's because of the lens distortion. So we have to overcome this distortion in order to do any kind of measurement. And then we apply computer vision methods to tell what the length of a certain part is.
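To make the curviness concrete, here is a minimal sketch of the polynomial radial distortion model commonly used in computer vision. The talk does not say which model was used, so the model choice, the coefficient values, and the helper names `distort_radial` and `edge_bow` are assumptions for illustration. With this convention, a negative `k1` gives barrel distortion and a positive `k1` gives a pincushion effect.

```python
def distort_radial(x, y, k1, k2=0.0):
    """Apply polynomial radial distortion to normalized image coordinates.

    (x, y) are measured from the optical center. Points farther from the
    center are scaled more strongly, which is what bends straight lines.
    """
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale


def edge_bow(k1, x=0.5, half_height=0.5):
    """Sideways shift of the ends of a straight vertical edge relative to its middle.

    Negative: the ends are pulled toward the center (barrel).
    Positive: the ends are pushed outward (pincushion).
    """
    mid_x, _ = distort_radial(x, 0.0, k1)
    end_x, _ = distort_radial(x, half_height, k1)
    return end_x - mid_x


if __name__ == "__main__":
    # Illustrative coefficients only; real values come from calibration.
    for name, k1 in [("barrel", -0.2), ("none", 0.0), ("pincushion", 0.2)]:
        print(f"{name:>10}: edge bow = {edge_bow(k1):+.3f}")
```

A straight pillar edge would come out with a nonzero bow in the raw image, which is why measurements taken before correction would be off near the edges of the frame.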

So overall, we filter out people using object detection. And this is, say, an input image, or an image captured from a camera. So we do a calibration process. The cameras that we use are pretty dumb cameras. They just take pictures, they stream images or videos, but they don't do anything else. And that also gives us more opportunity to take control, because we can program them the way we wish. So what we do is calibrate the cameras so that we know the focal length in the X direction and Y direction, the optical center of those lenses, and the distortion parameters. So these are very standard computer vision methods or approaches to calibrate a camera.
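The calibration outputs listed here (focal lengths in X and Y, optical center, distortion parameters) are what a pinhole camera model needs to turn a distorted pixel back into an ideal one. The sketch below is an assumption-laden illustration, not the team's code: the `Intrinsics` class, the two-coefficient radial model, and all numeric values are hypothetical, and in practice a library such as OpenCV would typically estimate and apply these parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Calibration result for one camera (values here are hypothetical)."""
    fx: float  # focal length in the X direction, in pixels
    fy: float  # focal length in the Y direction, in pixels
    cx: float  # optical center, X pixel coordinate
    cy: float  # optical center, Y pixel coordinate
    k1: float = 0.0  # radial distortion coefficients
    k2: float = 0.0


def project(cam, x, y):
    """Map ideal normalized coordinates to the distorted pixel the camera records."""
    r2 = x * x + y * y
    scale = 1.0 + cam.k1 * r2 + cam.k2 * r2 * r2
    return cam.fx * x * scale + cam.cx, cam.fy * y * scale + cam.cy


def undistort_pixel(cam, u, v, iterations=20):
    """Map a recorded (distorted) pixel to where it would be with an ideal lens.

    The distortion model has no closed-form inverse, so this uses a simple
    fixed-point iteration, which converges for mild distortion.
    """
    xd = (u - cam.cx) / cam.fx
    yd = (v - cam.cy) / cam.fy
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        scale = 1.0 + cam.k1 * r2 + cam.k2 * r2 * r2
        x, y = xd / scale, yd / scale
    return cam.fx * x + cam.cx, cam.fy * y + cam.cy


if __name__ == "__main__":
    cam = Intrinsics(fx=800.0, fy=800.0, cx=640.0, cy=360.0, k1=-0.1)
    u, v = project(cam, 0.3, 0.2)        # what the camera records
    print(undistort_pixel(cam, u, v))    # close to the ideal pixel
```

Because the cameras themselves do no processing, a correction like this would run in software on every captured frame before any measurement is taken.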

Meaning that I can transform the image as if the camera is facing top down on the window unit itself. And because we know the dimensions of the tilt table, I can use that to map on to what the size of a screen door is. So using that, we were able to get to about a tolerance of less than 1/4 of an inch. So within 1/4 of an inch, we were able to identify the dimensions of a screen door, and it has helped us tremendously.
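The top-down transform and the mapping from known tilt table dimensions to door size can be sketched with a planar homography. This is one common way to do it, not necessarily the method used at Pella: the table size (96 by 48 inches), the pixel coordinates, and the use of 1/4 inch as a pass/fail tolerance are all assumptions, and every function name is hypothetical. It also assumes lens distortion has already been corrected.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(src, dst):
    """3x3 homography (last entry fixed to 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_h(h, point):
    """Map one 2D point through the homography."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def measured_length_in(h, p1_px, p2_px):
    """Distance in inches between two pixel locations lying on the table plane."""
    (x1, y1), (x2, y2) = apply_h(h, p1_px), apply_h(h, p2_px)
    return math.hypot(x2 - x1, y2 - y1)


def size_matches(measured_in, ordered_in, tolerance_in=0.25):
    """True when the measured size is within the assumed tolerance of the ordered size."""
    return abs(measured_in - ordered_in) <= tolerance_in


if __name__ == "__main__":
    # Hypothetical pixel corners of the tilt table and its assumed size in inches.
    table_px = [(200, 100), (1000, 120), (1100, 650), (150, 600)]
    table_in = [(0, 0), (96, 0), (96, 48), (0, 48)]
    h = homography(table_px, table_in)
    # Hypothetical door end points found by an upstream detection step.
    length = measured_length_in(h, (300, 300), (850, 310))
    print(f"measured {length:.2f} in, matches 53 in order: {size_matches(length, 53.0)}")
```

A check like `size_matches` is what would catch a door that should be 53 inches but measures 64 before it goes into the box.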

That brings us to the end of my talk. If you have any questions, I'll be glad to answer.

I have a question. Real quick. Thanks for sharing your case studies. Can you give us some insight into how long this took to maybe pull together? Did you do it internally with your team? What's your team size?

So the Vision AI team is small. We have four to five people on our team. But we use an external tool to label our images and build models. The biggest challenge, well, not challenge, the biggest time taker is labeling these images.

So we first have to identify what are the classes that we want to figure out in an image. So on a screen door, what I wanted to know is within the image, where is the screen door? So we have a collection of different screen doors with different colors, different sizes. It's just like how a kid learns to differentiate between an apple and an orange.

So if it is object detection, just to tell me where roughly a tilt table is, a rectangle would be enough. But if I wanted a specific attribute of a specific size, then we would do a polygonal labeling. So that means we can have a customized polygon that represents a specific item. So these are the ones that take the longest.
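The difference between the two labeling styles can be illustrated with a small sketch. The polygon below is made up, and the helper names are hypothetical; the point is only that a rectangle around an irregular part can include a lot of area that is not the part, which is why a polygon label carries more information and takes longer to draw.

```python
def polygon_area(points):
    """Area of a simple polygon from its vertices, via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bounding_box(points):
    """Smallest axis-aligned rectangle (x_min, y_min, x_max, y_max) around the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def box_fill_ratio(points):
    """Fraction of the bounding rectangle actually covered by the polygon."""
    x_min, y_min, x_max, y_max = bounding_box(points)
    return polygon_area(points) / ((x_max - x_min) * (y_max - y_min))


if __name__ == "__main__":
    # Hypothetical L-shaped attribute outlined with a polygon label.
    l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)]
    print("rectangle label:", bounding_box(l_shape))
    print("share of rectangle that is the part:", box_fill_ratio(l_shape))
```

For a roughly rectangular object such as a tilt table the ratio is close to 1, so a rectangle is enough; for irregular attributes the ratio drops and polygon labeling becomes worth the extra effort.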

But once the model is built, we can run a quick script that takes in an image and tells us what attributes are identified or detected within that image.

So is this like a few-month project, a few weeks? Or what's your timetable to have this finished product?

So if we have the images labeled correctly, the model training itself takes anywhere from a couple of hours to a couple of days.

But in order to get there, there are a lot of learnings, though. And that's the part that took us the most time. So we started on this approach a year and a half to two years ago. I wasn't with Pella at the time. I only started last year. But by the time I started, we had made some progress, and we did learn some things. Whether to use object detection or segmentation models, that is something that we learned through the process. And what is the resolution of images that we need to feed into our model? Do we pick a 256 image, or do we want to have a 1024 by 1024, or do we want to go all the way to a 2K or a 4K image?

So all of that took a little bit of learning. So it's a subjective answer. If attributes are well-defined and they're the same ones, then you don't need as much time. But when objects do change, yes, it does take time.

We have time for one more brief question. And I think I saw a hand back here first. Do you want me to go to him? Okay. And then we'll transition speakers.

Yeah, two very short questions. One, I was just curious what the average data set size is for a single model, like a screen door. Are you talking like 100 images or a few thousand images for a single model? And the other is, is the data just logged for later reference, or if there's a mistake, does a light come on for the operator to say, oh, I put this together wrong?

Yes. So the first one, your first question was...

Size of the model.

Size of the model. For screen doors, we had a few hundred images. They're pretty much homogeneous. They're pretty much consistent. But for attribute detection, we're talking about thousands, like four or five thousand, or sometimes even more. So the other question you had was...

Is the data just logged for later reference? Or is there something to tell an operator, the person that built the screen door, that they did something wrong?

Yes. So we have a heads-up display, a TV monitor basically. So after an inference is done, if it detects that some of the attributes are missing, then we send the inference picture to the heads-up display to tell the operator that, hey, something is wrong. So it can go green, orange, or red. Red means you have to stop and you have to address it. Orange, it is okay.

All right, folks, that's our time for this one, but I'm sure Vijay will be around afterwards if we have more questions. So thank you. Everybody join me in thanking Vijay for his wonderful presentation.