Tabular Foundation Models Meet Manufacturing: A Practical Exploration
Production and Operations
Manufacturing AI problems share a common profile: small labeled datasets, heterogeneous process and sensor variables, missing values, and the need for reliable predictions with minimal tuning. For years, gradient-boosted trees like XGBoost and CatBoost have been the default choice for these tabular prediction tasks — from predicting tool wear in milling to estimating creep rupture life of turbine components to detecting process anomalies.
A new class of pretrained models — tabular foundation models (TFMs) — is challenging this status quo. Models such as TabPFN, TabICL, and Mitra can ingest raw tabular data and deliver competitive predictions in seconds without task-specific training, hyperparameter tuning, or elaborate feature engineering. Their strengths — robustness to missing data, handling of mixed feature types, and strong performance in small-sample regimes — align remarkably well with the realities of manufacturing data.
This talk introduces tabular foundation models to the manufacturing and applied AI community. We begin with an accessible overview of how TFMs work and what distinguishes them from conventional ML pipelines. Through select case studies in machining and materials performance prediction, we explore what changes when a traditional ML workflow is replaced with a tabular foundation model on real manufacturing problems. We examine where these models deliver genuine advantages, where they encounter limitations, and what practical considerations arise when thinking about deployment. The talk concludes with a forward look at open opportunities at this intersection — including few-shot anomaly detection, integration with physics-informed modeling, cross-process transfer learning, and real-time shop floor deployment.
Key Takeaways
- Tabular foundation models are a natural fit for manufacturing AI problems.
- TFMs don't replace domain expertise — they lower the barrier to entry.
- The intersection of tabular foundation models and manufacturing is wide open.
Transcript from Summit:
Thanks a lot, Jake. I hope I'm audible. Good morning, everyone, and thanks to Jake for the nice introduction. Today I'll be talking about tabular foundation models for manufacturing. Just to give you the context: I'm sure you have all seen some kind of tabular data, a bunch of inputs and probably an output. You want to predict what manufacturing rate you will have, or when your order from Amazon is going to arrive, whether it's coming tomorrow or the day after. All of these are tabular data in some form or other. Have you ever wondered, can I just dump all this information into ChatGPT, ask it to give me the predictions, and be done with it? Has anyone tried something like that so far? Okay, at least one. Do you have any experience you want to share about what you saw when you did that?
I would say some of the more basic ones would be like, maybe I'm working on a problem and there's just a lot of data or parts to it. I try to upload it all at once and sometimes it gets lost or it doesn't have the right calculation. Sometimes I might break it up, and that sometimes helps, but other times it's just too much.
Yeah. So, as you mentioned, too much data is one thing, but fundamentally there is one problem with dumping all the tabular data into ChatGPT or any of your favorite LLMs that you're using today: they are not meant for tabular data. They are meant for language. They are meant to have a conversation with you. They're usually called large language models, or foundation models for language. But at the same time, in the last few years there has been a big shift towards building foundation models for tabular data.
And that's the term "tabular foundation models" that you're seeing here. To explain what a tabular foundation model is compared with traditional machine learning: in traditional machine learning, you have a new data set, and you essentially train a fresh model, tune the hyperparameters, validate, and then deploy. And you do this for almost every task you work on. More often than not, you have to train the models from scratch and do a lot of feature engineering. And anyone who has worked with real-world data knows that the most important thing is cleaning up the data. There are a lot of cases where you have missing data, or the data is noisy, and you have to do a lot of data imputation and things like that before you even get started training the model. What we saw as an opportunity is the foundation model approach, right? Think about ChatGPT today: even if you write sentences that are not perfectly grammatical, perhaps with a lot of typos, it still understands what you're saying, tries to figure out what you are trying to do, and gives you an output that is probably relevant. Unless you give it complete gibberish and still expect some output, as long as the input is reasonably okay it gives you a somewhat reasonable output, right? Similarly, with tabular foundation models, if there is some noise, if there are cases with missing data and a lot of these issues, you don't have to clean it up yourself.
That is one relief you'll see. The next part is that, just like with ChatGPT, you don't train a separate ChatGPT model for, say, a medical application or a finance application. You can fine-tune later on, but more often than not, just using ChatGPT alone gets you quite far. It is the same idea with tabular foundation models: there is one foundation model that is pre-trained on a lot of synthetic data covering many different possibilities. Think about it: by pre-training on almost millions of synthetic tabular data sets, it has learned a lot about how the numbers behave, how trends shift, and what kinds of trends are even possible in data. It gets some kind of insight from that, right? In the same way, once you have this pre-trained model, just as with GPT for medical, finance, or agriculture applications and whatnot, we can use the foundation model to make predictions zero-shot, or few-shot in some sense. Just to explain what zero-shot and few-shot mean: in ChatGPT, when you say, "here are some examples of how I want the response to be," that is few-shot, because you're giving some examples. Zero-shot is where you just say, "this is what it is, give me an output." You can work with it in both cases. So, to summarize the unique benefits of using a tabular foundation model. First, having one foundation model reduces training time. You don't have to train your own models.
It makes it robust, and things like that. Second, it works especially well in the low-data regime. Earlier, if you had to train your own model, the question was: how much data do I need? How many data points should I collect? And that usually runs into the thousands, probably more, even millions. But with these tabular foundation models, perhaps you just need 20 or 30 examples, and you may get much better results than you would imagine. That's one benefit I see particularly in manufacturing, where collecting data is very difficult, and it is one reason why tabular foundation models are so impactful there. The third is data imputation. As I mentioned, you don't have to clean up the data or do any pre-processing per se before you feed it in. The model can cope even if the data is noisy or has missing values. The fourth is that it is domain agnostic, as I already mentioned. And the fifth part is that tabular intelligence in itself is something that's being developed in recent times, and a lot of industries are investing in it. In fact, Amazon has its own tabular foundation model called Mitra, Prior Labs is a startup working on tabular foundation models, and there are other startups doing a lot of work on zero-shot and few-shot models that help with tabular intelligence. I do have some demos I can show you if time permits.
To give you an idea: of course, there is a whole story about how the model is trained and its architecture, but the idea is that what you feed the model is X_train, the inputs of your training data, or a small set of data that you are providing as context; y_train, the corresponding labels; and X_test, the samples for which you want to make predictions. It's like saying: here are a few examples, these are the X, these are the Y, and now for a few X where you know the inputs, what is the output? Just as GPT is a transformer-based model, this is also a transformer-based model. TabPFN v2 is one of the tabular foundation models that exist right now. Apart from that, there are many others, like TabICL and TabDPT. These are all tabular foundation models, but the most famous one is TabPFN, where PFN stands for prior-data fitted network. It's created by a startup called Prior Labs, based in Germany. I think they also have an office in New York now. These TabPFN models use the same kind of self-attention, applied across the rows, to create a context. When we started working with TabPFN, the context window was around 10,000 rows and 500 features, or 500 columns, of tabular data. Now these models can work with around 100,000 rows and 2,000 or 5,000 columns, one of those numbers. Again, that's not a stopping point. If you are hitting any of these hurdles, there are ways to get around them.
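As a rough illustration of the X_train / y_train / X_test interface described above, here is a minimal sketch. It is not TabPFN: a simple nearest-neighbor averager stands in for the pre-trained transformer, purely to show that "fitting" only stores the context rows and that all the work happens at prediction time. The class name, the `k` parameter, and the data rows are hypothetical and made up for illustration. The open-source `tabpfn` package exposes a similar scikit-learn-style fit/predict pattern.

```python
import math


class NearestNeighborContextRegressor:
    """Stand-in for a tabular foundation model's interface (NOT TabPFN itself).

    fit() only stores the labelled rows as context; predict() uses that
    context to answer queries, mirroring the (X_train, y_train, X_test) flow.
    """

    def __init__(self, k=3):
        self.k = k

    def fit(self, X_train, y_train):
        # No gradient updates happen here: the rows are simply kept as context.
        self.X_context = [list(row) for row in X_train]
        self.y_context = list(y_train)
        return self

    def predict(self, X_test):
        predictions = []
        for query in X_test:
            # Rank context rows by Euclidean distance to the query row.
            ranked = sorted(
                zip(self.X_context, self.y_context),
                key=lambda pair: math.dist(pair[0], query),
            )
            nearest = ranked[: self.k]
            predictions.append(sum(y for _, y in nearest) / len(nearest))
        return predictions


# Made-up rows: [cutting speed, feed rate, depth of cut, nose radius]
X_train = [
    [100.0, 0.10, 0.5, 0.4],
    [100.0, 0.20, 0.5, 0.4],
    [150.0, 0.10, 1.0, 0.8],
    [150.0, 0.20, 1.0, 0.8],
]
y_train = [1.0, 2.0, 0.8, 1.6]  # made-up surface roughness labels
X_test = [[100.0, 0.15, 0.5, 0.4]]

model = NearestNeighborContextRegressor(k=2).fit(X_train, y_train)
predictions = model.predict(X_test)
print(predictions)
```

The point of the sketch is the calling pattern: a handful of labelled rows go in as context, unlabelled rows go in as queries, and no per-task training loop is written.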
But this is where the current limits are for these kinds of models. Just like when we started out with ChatGPT 3.5 or 4, your context window was 256,000 tokens or something like that, and now we are working with 1 million tokens; in the same way, the context window is at this stage right now, but we are hoping it expands even further.
Yes? TabPFN, which one? Yes, Hollmann et al. is a paper published in Nature on using TabPFN for scientific applications. Yeah, thank you.
I have a few examples I wanted to show. The first is machining data from manufacturing. This is an extreme example, but I thought, let's start with the extreme one. It is from a paper that came out around 2010. The inputs are the cutting speed, the feed rate for a turning operation, the depth of cut, and the nose radius of the tool. The output is the surface roughness of whatever workpiece you are working on. It's a simple manufacturing process: four input parameters and one output. Zero-shot, using just 22 training rows and the rest of the samples for testing, you get an R-squared value that is quite good for zero-shot performance. By comparison, when I started working on this very small data set and trained my own model, the best I got was around 0.91 to 0.92 with the same data split. So this was a very good performance when we started out. This is one example where, as I say here, most regressors collapse below 50 samples.
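For readers less familiar with the R-squared metric quoted for the turning example, the following sketch shows how it is computed on held-out rows. The function name and the four value pairs are hypothetical placeholders, not the data from the talk.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: how far predictions are from the measurements.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: how far measurements are from their own mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up held-out measurements and predictions, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
score = r_squared(y_true, y_pred)
print(round(score, 3))
```

A score of 1.0 means the predictions match the held-out values exactly, while a score near 0 means the model does no better than always predicting the mean.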
How do you even work with 50 samples for any machine learning model? That is a question people often ask, but this is one example where we have been able to make good predictions with very few samples. This is another example problem. We are in Iowa, so we should talk about agriculture to some extent. This is agricultural yield prediction. We published this at a AAAI workshop earlier this year. We worked with three data sets. One is for soybean in the US, with about 86,000 samples. Then we have a global one, from multiple regions, with around 28,000 samples, and one specific data set for the European Union, with around 8,600 samples. As you can imagine, for yield prediction you need to know the weather in that area, so the inputs are aggregated weather features, some crop information, and things like that. This one doesn't have any missing values, but as you can see, this one has about 5 to 13% missing values. And this one is categorical-heavy and very heterogeneous, because you're working with very diverse samples. So you can see the large and complete case, the diverse and complete case, and the small-but-missing case. These are the three varieties of cases. As you can see here, TabPFN v2, almost zero-shot, performs much better than the machine learning models we have known all along, like CatBoost, XGBoost, and Random Forest. If you have worked with machine learning on tabular data for a while, you will have heard of these.
And you can see that, zero-shot, it performs much better than those. Then there's something called AutoGluon, which essentially fine-tunes on top of what you get from TabPFN to make it even better. That's essentially the whole story we have here. Another example is this global case, where you can see that even zero-shot we get performance close to this. Random Forest is obviously doing better in this case, but not by much, 0.9716 versus 0.9794. It's not a major difference, but still something to note. The key part is the compute. You can get this result in less than a second, rather than preparing and training a model and doing everything that training involves. In the same way, you can see this one is the case with missing samples. Random Forest and the others don't do as well, but TabPFN does, because of the automatic imputation and things like that. One thing I should probably mention is how data imputation affects the overall trend. In terms of performance, it gives you 0.991 instead of the 0.93s and 0.97s you have seen all along, so this is certainly an example of how imputation has an impact. So this shows how TabPFN v2, or tabular foundation models in general, can perform much better than what you have seen so far. If you want to see this in terms of when to use what: when you have a large and complete data set, you can always argue that I can
always train or fine-tune my own model, in which case you can go with AutoGluon or AutoML-type architectures, where you create a model using TabPFN and then fine-tune it with AutoGluon-type architectures, and you do much better. If you have diverse and complete data, you can go either with tabular foundation models or with AutoML architectures. But if you have small data with missing values, going with tabular foundation models helps a lot. So one bottom line is that, especially in a low-data regime, tabular foundation models certainly win. The second thing you'll notice is that they can work with large data as well, but because you have more data you can always improve on them. And the bigger picture you need to understand is that tabular foundation models are not going to replace traditional machine learning models any day soon. They can be further fine-tuned using AutoML and so on. But the general idea is: use them to get an initial guess, right? It's very quick, you get responses very quickly, and you can use that to work on the bigger picture rather than just the machine learning model. Think about when we talk about physical AI, simulations, and so on. We always say, "I don't care about the accuracy as long as I'm able to quickly iterate and go forward," right? That's how the idea of the digital twin works. In the same way, maybe your goal is not a perfectly accurate machine learning model but a close-to-accurate one.
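The "when to use what" guidance above can be written down as a small rule of thumb. This is only a sketch of the speaker's heuristic: the function name and both row-count cutoffs are hypothetical placeholders, not numbers from the talk, and the case of a large data set with missing values is not covered in the talk (it defaults to a tabular foundation model here as an assumption).

```python
TFM = "tabular foundation model"
AUTOML = "AutoML ensemble (AutoGluon-style)"
EITHER = "tabular foundation model or AutoML"


def suggest_starting_point(n_rows, has_missing_values,
                           small_data_cutoff=1_000, large_data_cutoff=50_000):
    """Encode the talk's rule of thumb; both cutoffs are assumed values."""
    if has_missing_values:
        # Small-and-missing case from the talk; large-and-missing is an assumption.
        return TFM
    if n_rows < small_data_cutoff:
        # Low-data regime, where the speaker says foundation models win.
        return TFM
    if n_rows >= large_data_cutoff:
        # Large and complete: more data leaves room for fine-tuning and ensembling.
        return AUTOML
    # Mid-sized, diverse, complete data: the speaker says either route works.
    return EITHER


print(suggest_starting_point(22, False))
print(suggest_starting_point(86_000, False))
print(suggest_starting_point(28_000, False))
print(suggest_starting_point(8_600, True))
```

The four calls correspond to the row counts mentioned in the talk: the 22-row turning split and the three yield-prediction data sets.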
And then you want to quickly iterate and see what else you can do. Do I need to add more data? Do I need to bring in other features? I don't want to keep training a model when I don't even know whether that model is really what I want to train, or whether the data is the problem, or what the problem is, right? More often than not, what I've seen working with different industries is that there is data you need to improve, and you also need to improve the model. But this at least helps me focus on the data, because I know the model is doing about as well as I could want. There's another example here: vehicle sensor data, like what you get from a CAN bus. In this case it's CAN bus sensor data from large combines, used to detect things like soil moisture.
Sorry? Correct, yes, it is from that. Again, for the sake of anonymity, I'm not saying which combine or which data, but the idea is that we had about eight features, vehicle or CAN bus signals, aggregated by the unit IDs of the experiments we were doing. It's again a tabular regression model, on real-life data. This is one of the noisiest data sets I have ever seen. It has all kinds of heterogeneity in the inputs, going all the way to NaNs, with very high numbers and, at the same time, very low numbers on the order of 10 to the power of minus 8. It also had a lot of missing values, cases where the sensor was not really robust and we don't even know whether the sensor is reliable or whether I should even use that data. And there's no real linear structure you can work with.
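The kind of messiness described for the sensor data (NaNs, missing readings, values spanning many orders of magnitude) can be quantified per column before any modeling. The following is a minimal sketch, assuming each column is available as a plain list of floats with `None` or NaN marking missing readings. The function name and the sample signal are hypothetical.

```python
import math


def profile_column(values):
    """Return the missing fraction and the magnitude span (in powers of ten)."""
    # Treat None, NaN, and infinities as missing readings.
    present = [v for v in values if v is not None and math.isfinite(v)]
    missing_fraction = 1.0 - len(present) / len(values)
    # Zeros are skipped because they have no order of magnitude.
    magnitudes = [abs(v) for v in present if v != 0]
    if magnitudes:
        decades = math.log10(max(magnitudes) / min(magnitudes))
    else:
        decades = 0.0
    return {"missing_fraction": missing_fraction, "decades": decades}


# Made-up signal mixing a tiny value, a huge value, a NaN, and a dropped reading.
signal = [1e-8, 2.5, float("nan"), 4.0e5, None]
report = profile_column(signal)
print(report)
```

A report like this does not replace the model's own handling of missing values, but it helps decide which sensors deserve a closer look.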
So this is as real as it gets in terms of data sets. Here again, you can see that the other models kind of give up, while this did much better than the rest. Of course, you can say it's not that different: 0.87, 0.82, 0.88. But the key part is that we were able to do this in less than a day, so we could at least understand what the data issues were and go ahead with the other things we wanted to do, because this model was not the only thing stopping us. We wanted to use this model to go build something else: to improve the sensors, understand which sensors we need to replace, and things like that. One thing you'll see concerns the number of samples you're using. If you use 10% of the samples, you get a correlation of around 0.84, but if you go all the way to 50%, you get 0.88. You can keep increasing and see what happens, but in most cases it doesn't do much better after that. I think it's 0.882, and it's more or less stuck there; it doesn't go further.
With this, one thing I wanted to talk about is how the industry is moving. So far, what we have seen is: given some data, make a prediction. The question I think most of you may have is, okay, what do I do with it? How does it matter to me? And that's where the other models I was talking about come in. Kumo AI is one. It's a relational foundation model built on top of a tabular foundation model. It works with multiple tables, understands the relations between them, and uses that to essentially have a conversation with you.
You can ask questions such as, what are the insights here? Then it will essentially identify the relations among all the tables. You don't need to flatten multiple tables into one big table and then work with that. This kind of relational foundation model is something people are using now, especially at DoorDash, Snowflake, and others, to understand what the relations are, how to get insights from them, and go from there. As for other models, AWS has Mitra, and on top of that, has anyone here heard of Amazon Quick? Amazon Quick, or AWS Quick, is another dashboard-type platform with these kinds of features for having a conversation based on a data set. You can have conversations based on a tabular data set. You can connect S3 buckets, work with the data directly, and have conversations about it. That's something I've seen people do quite a lot. And there are other major players in this space doing something similar. If you ask me where the future is: obviously, tabular intelligence is something we have seen quite a bit of, in terms of how we can use it in different spaces. I've covered agriculture, manufacturing, and autonomous vehicles. But you can use it for other applications as well: medical applications, FinOps, and many others. The other thing is missing values. You usually try to impute them yourself. But here you are using auto-imputation, and that helps a lot, and perhaps it can help us understand that missing values are not really a bug. Perhaps there is a deeper insight you can get from them.
And more often than not, we realize that LLMs or AI models can understand data differently from how we do, so perhaps we get a different insight from what we have seen so far. And one thing that is a relief for us is that perhaps you don't need a lot of data. So far, we thought we needed to collect a lot of data to train our models. But perhaps we don't. We just need a few samples, a few hundred, even thousands, or probably a million at most, but you don't need a lot of data to start training or using your own models for tabular intelligence in particular. So, with this, since I have some time, I can quickly show you a demo, but before I go there, are there any questions I can answer for you?
Edge AI would be a great example of where to use this stuff, right?
Yes. Edge AI is somewhere it will be useful, but there's one caveat: these are all foundation models. Just as you can't put a big Llama model on an edge device, you'll have similar considerations. But I think this is relatively easy. You can run it on your own laptop, so it's not that bad in terms of memory and compute. Any other questions?
Thanks. My background is in metal cutting, so I appreciate that you had the example in turning. Sometimes tabulated data has a different purpose behind why it was constructed. Your turning example is essentially a set of experimental test results that map out some of the parameter space, but there are also guidelines in handbooks that are essentially an encoding of knowledge: lookup tables of ranges of parameters to use, or lookup tables for the roughness achieved under certain conditions. And sometimes it's a different thing, like a lookup table of material properties. Different metals have different stiffnesses, yield strengths, and ultimate tensile strengths.
And I'm wondering about the role of tabular intelligence in the context of the combination of the purpose for which the tabulated data was created and the purpose for which the user is trying to use it.
Yeah, this is an excellent question. I think it's very close to what I was saying about Kumo AI: you may have a large database, right? As you mentioned, you may have different material properties, different manufacturing conditions, and even a database of multiple manufacturing processes like turning, cutting, and milling. All of these can be part of the same database, and instead of writing a particular lookup or a SQL query to say "this is the data I want," you can get some kind of insight from it directly. You could say in natural language what you want to find out, just like you might go to your bank account now and say, "I want to know the trends in my purchases over the last year." Then it will essentially filter out the data that is relevant and provide you some insights from it. So think about it from that perspective. It can handle the relational database, understand the relations among multiple tables, or even filter the data using a SQL query, and give you something more relevant to what you want. But again, the key thing is to know what data sets you have, so that you can have that kind of relational graph built and actually do something like that. Does that make sense?
Any other questions? Jay?
Thank you, sir. I've got a question about the low data usage for training the model. I understand that you don't need as much data to train the model, but if there is inherent bias in
the amount of existing data that you're using for training, how does it help with extrapolating to something that's not in the data? For example, say we have temperature data. All my temperature data is around 100 degrees Fahrenheit, but there are only a few points at, say, 300 Fahrenheit. I understand it works on low data, but I don't have enough data for 300 Fahrenheit. Would it still be able to predict correctly with less data, or do we have to massage the data at the beginning so that there is a good spread?
Right, it's a great question. Yes, there will be some bias from the data you start with. If you start with, say, everything at 100 degrees and then one or two samples at 300 Fahrenheit, expecting good results at 300 Fahrenheit may be expecting too much. The bias is built into the model per se, because what it does is see some kind of relation or trend within the data. It doesn't understand that this is a temperature. It doesn't even understand that something changes from one temperature regime to another. Just as with ChatGPT, if you give it a bunch of things and ask for some output, it may not manage, because it doesn't understand the connection between the multiple files you have provided and what you are asking for. In the same way, the extrapolation capability is certainly going to depend on the bias in the data you provide. If you provide very clean, fully balanced data, it may do much better. I think the question we should probably look at is...
The way to rephrase it is: given the data, the best model performance you can get in very little time is going to be what you get from tabular foundation models. You could perhaps invest more energy to move it a little. But data is king, ultimately. You probably need to work on the data. If you are starting out and you see that there are a handful of 300s and the rest of the data is at 100, and you see that this is the best performance you can get, is it sufficient? Maybe it is, because you are probably doing anomaly detection. It doesn't matter whether the model thinks it is 300 or 150. It's just for detecting an anomaly. It's anomalous, so it's good, and perhaps you don't need more data. But if you are making a more precise prediction of some specific trend, say a metal manufacturing process going from 100 degrees to 300 degrees, where the formability, the material properties, and everything else change quite drastically between these two regimes, then you may need to collect more data in the rest of the regime. So it depends on the data, but at the same time it gives you a good, quick start to go from there.
Thank you.
Any other questions? All right, then. If you have questions, I can answer them later on as well, but I still want to see if I can show you a quick demo. I am using two examples for the demo. One is from Prior Labs (priorlabs.ai). That's the startup that built the TabPFN v2 model I was talking about.
And you can see that you can actually upload your own data set and play around with it, either on this model or previous models and things like that. In this case, I just chose one of the sample data sets. As you can see, there are a lot of samples there already. It can be sales or it can be industrial, and you can see that there's a concrete compressive strength data set. That's what I had loaded earlier. As you can see, it has a bunch of features, and then finally the target and what the prediction is. As you can see, for this particular data set TabPFN, and I can provide the exact metrics, gets better performance than even random forest, XGBoost, and all. Linear regression has the maximum error. TabPFN v2.5 plus has the minimum error in MSE. And it provides a simple output. You can easily upload your own data set. It allows you to directly upload either a CSV file or an Excel file with a header row, of around 20 to 40,000 rows, including a column for what to predict. So this is a simple, user-interface-based way of doing it. Or, if you are more like me and like to code, then you can always get the code and run it on your own local machine. If you don't want to use that server, you can just download the model and then run it on your own local machine. And that is equally easy. You can just access it from here and then run it. This is just how to run the tabular foundation model alone. And there is another example, which is the KumoRFM that I was talking about. You can see that in KumoRFM, they already have a few data sets here.
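For readers who prefer the code route mentioned above, the following is a minimal sketch of the kind of comparison the demo displays: fit several regressors on the same split and rank them by MSE. The helper names (`mse`, `rank_by_mse`, `MeanBaseline`, `make_tabpfn`) are hypothetical. The sketch assumes the open-source `tabpfn` package, which exposes a scikit-learn-style `TabPFNRegressor`; that import is deferred so the rest runs without it, and the toy numbers are not the demo's data.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

class MeanBaseline:
    """Trivial stand-in model: always predicts the training mean."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)

def make_tabpfn():
    """Assumption: `pip install tabpfn` provides TabPFNRegressor with
    scikit-learn-style fit/predict; weights are fetched on first use."""
    from tabpfn import TabPFNRegressor
    return TabPFNRegressor()

def rank_by_mse(models, X_train, y_train, X_test, y_test):
    """Fit each named model and return (name, mse) pairs, best first."""
    scores = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores.append((name, mse(y_test, list(model.predict(X_test)))))
    return sorted(scores, key=lambda pair: pair[1])

# Toy data for illustration only (not the concrete strength data set).
X_train, y_train = [[1.0], [2.0], [3.0]], [10.0, 20.0, 30.0]
X_test, y_test = [[4.0]], [40.0]
models = {"mean_baseline": MeanBaseline()}
# models["tabpfn"] = make_tabpfn()  # uncomment once tabpfn is installed
print(rank_by_mse(models, X_train, y_train, X_test, y_test))
```

Any other estimator with `fit` and `predict`, such as a random forest or a gradient-boosted model, can be added to the `models` dictionary in the same way.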
One of them is e-commerce, where they have data on returns, views, items, orders, and users, or there are other data sets like insurance, F1 racing, and all, or you can even upload your own data set or link it to your Amazon S3 buckets, or Snowflake for that matter. Then either you can infer the schema or you can actually write down the schema as well. That's an option. Once you provide the schema and all that information, it will create a graph, something like this, to come up with the whole idea of how each table is related to the others. And once you have all of this ready, once you have the data, you can always go here. All you have to do is select which table you are working with. So say you choose e-commerce, then it will say, "How many orders will each user have in the next 30 days?" That's the question that you're asking, and then it will analyze your question. If you can see here, it is making a query, like a SQL query, in what they call a predictive query language, where you are saying that we are predicting the orders for each user, and then it essentially finds out what SQL query it needs to run and creates a table. And once it has the table, it will make a prediction based on that. You can ask further questions and have a conversation, in some sense. OK, so I just wanted to show you these two examples of how you can use tabular data with a tabular foundation model for inference and all. Yes, Vijay.
What was the first one? What was the first tool that you showed?
It's called Prior Labs. TabPFN is the model. Prior Labs is the startup which actually trained that model.
Thank you.
Yeah. Yes, it's free of cost. I have not paid a single cent so far to them. Yes.
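To make the earlier KumoRFM question concrete ("How many orders will each user have in the next 30 days?"), here is a minimal sketch of the per-user target table that such a question implies when computed on historical data. It is not Kumo's query language or implementation; the function name `orders_in_next_days`, the anchor date, and the rows are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

def orders_in_next_days(orders, users, anchor, days=30):
    """Count each user's orders in the window (anchor, anchor + days].

    orders: iterable of (user_id, order_date) pairs.
    users: iterable of user ids; users with no orders get a count of 0.
    Returns a dict mapping user_id -> order count in the window.
    """
    end = anchor + timedelta(days=days)
    counts = Counter(
        user_id for user_id, order_date in orders
        if anchor < order_date <= end
    )
    return {user_id: counts.get(user_id, 0) for user_id in users}

# Hypothetical rows, not the demo's e-commerce data.
orders = [
    ("u1", date(2024, 1, 5)),
    ("u1", date(2024, 1, 20)),
    ("u2", date(2024, 3, 1)),  # falls outside the 30-day window
]
table = orders_in_next_days(orders, ["u1", "u2", "u3"], anchor=date(2024, 1, 1))
print(table)
```

Computed at past anchor dates, a table like this supplies known outcomes; the prediction step then estimates the same quantity for a window that has not happened yet.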
And my slides will be available, and they will have the link. So you should be able to access it from there as well.
I was going to ask a question in that regard too, about favorite tools. Obviously, this is one of them. Any other favorite tools, based on benefits that they might have over this?
So the thing is, this area is like any AI model space: ChatGPT, like 5.5, and then Opus, Claude Opus, they're all fighting with each other. It's the same here. When we started working on it, TabPFN was at v1, and then we had TabICL and then TabDPT. So these are the three major ones: TabPFN; TabICL, where ICL is in-context learning; and TabDPT, where DPT is, I think, some predictive transformer. I don't know what the D is off the top of my head. So these three models have been really good. And then Amazon's Mitra is the other one, which came out very recently. So these three or four are the ones that are doing really well right now. If you ask me which one is best so far, I think Prior Labs' TabPFN is well tested in so many broad areas. Hitachi and so many companies have already used it. I have myself worked with several industries too, to help them use tabular models for their problems, so TabPFN is the first go-to. If not, you can go to TabICL, and Mitra is the third one, and that's the ranking if you want. That's if you ask me today; tomorrow, I don't know. Yes.
Right now, are they training models off the data you give them?
So, your question is, is it safe to use it on company code and all? On company data? Yes, absolutely, because especially when I work with industries, I do not use the user interface for this. I literally download the trained model. It is so small that I can even run inference on my Mac. I think we have time for one more question. Yes.
You say you download the code and you use it. The thing is, I've been asking for Macs because, right now, with NVIDIA you're fighting gamers for the cards too, and the price is crazy, so what we say is, get the Macs, right? Do you feel that's a policy to go with?
Well, not really. I mean, it so happened that I'm using a Mac and it's working very well. I did my PhD using GPU computing and all, so yes, I understand why you would go with that. Perhaps the other alternative is that you can train these kinds of models on something easily accessible, like Google Colab or things like that, so that's another way you can quickly train the model. It's going to be very easy to do it in Google Colab as well. I agree, but Google Colab is a temporary instance; you will probably train the model and download it to your local machine, yes.
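On the hardware question, here is a small sketch of how a script might pick whatever accelerator is present, whether that is an NVIDIA GPU on a Colab runtime, Apple silicon on a Mac, or plain CPU. It assumes PyTorch as the backend and falls back to CPU when PyTorch is not installed; `pick_device` is a hypothetical helper, not part of any tool shown in the demo.

```python
def pick_device():
    """Return 'cuda', 'mps', or 'cpu' depending on what is available."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed, so stay on CPU
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU, e.g. a Colab GPU runtime
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple silicon GPU on a Mac
    return "cpu"

print(pick_device())
```

The returned string could then be passed to a model's device setting where the library supports one, so the same script runs unchanged on a laptop and on a hosted notebook.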