DockerCon
An MLOps Platform for all data personas – Lessons learned from IKEA
Karan Honavar, Engineering Manager, IKEA
Harish Sankar, Lead Engineer, IKEA
Fernando Dorado Rueda, MLOps Engineer, IKEA
Transcript
Good morning everybody. I’m Karan from IKEA, Engineering Manager. If you heard our keynote, you know what IKEA does. And this is a bit of a deep dive session about what we spoke on for 10 minutes in the keynote. So, what we will do today is discuss why we need a platform for MLOps at IKEA. What data personas exist? I mean, who are we building the platform for, who are our consumers? Then we’ll deep-dive a little bit more on the components and then start the demo. My colleagues will come in to show you the demo, and hopefully we have some time left for Q&A.
Why do we need MLOps?
IKEA has a vision to create a better everyday life for the many people, which means we strive every day to build products that make your home furnishing needs disappear. And coming to why we need a platform for MLOps, right, we need to accelerate the lifecycle. We discussed in the keynote as well that we need to be fast from idea to production. And that’s really, really required to keep up with not just the competition, but the technology that’s emerging all the time. And standardization and collaboration: I mean, many of us in the company are doing the same things, but differently. And in data science, if you want to do scaling, you need to standardize as much as possible.
Collaboration becomes essential. You need to be able to reuse components that you’ve built for a certain model in another model that could probably do the same, right? And scaling ML essentially means that we’re able to deploy on the edge, we’re able to deploy LLMs, we’re able to deploy just a simple classification algorithm: the entire scope, from something like a time series model to, for example, GenAI.
Then observability and explainable AI — this becomes very critical to us. We have a customer promise that we will always use our customers’ data to do the right things for them. And that we do not sell or misuse data. So we need to be able to develop explainable AI and also look at, for example, security and compliance. In Europe, we have the AI Act coming up, and that is likely to have a big impact on how we develop and deploy AI. And these things come together for us to say, why do we need a platform?
Consumer landscape
Next we’ll look at the consumer landscape, or who are we delivering a platform to? So we did this assessment in the company saying, okay, now who’s working with data and algorithms? And then we see something of a research data scientist persona, and this persona is working with the models that are coming out of universities, the latest in the market, right? Trying to see how we can bring something from an experiment to a real-life product.
Then we have the business data scientist. This persona is essentially looking at converting a business problem into a data problem, running a lot of proofs of concept with the business, and doing a lot of ideation.
Then we have the citizen data scientist. This is a person who’s a data analyst working within the countries, like a market manager or store manager, that also has access to data, but has, for example, use cases like how effective was my marketing campaign, or how can I increase footfall, and stuff like that.
Then you have what we call an engineering data scientist, and this persona is someone that is working with the code, has both DevOps and data science experience, and these are the ones that have a lot of models in production.
ML lifecycle
Now, moving further, the concept with which we built our platform is headless by design, and you can use it as an optional component in any part of the ML lifecycle. So you could just come ask for a development environment and take it away. You can develop on your machine and come to us just to deploy. Or you can deploy it yourself in your software but then observe with us, which means the monitoring and observability is a component that we take care of.
If you look at the construct overall, it’s looking at how the ML lifecycle works basically, and then on the left we always have data, right? I mean, data is, for example, coming from the marketplace, coming from the catalog, and it could be both business data and operational data. Then you have basically the code repositories and artifact registries where you’re able to import and work with the code. There is an experimentation component, which is essentially allowing you to try different models so that you can choose the right model for the use case. To do that, of course, you build models; models need features, and features come from a feature store. So we need to build a feature store.
Then you have the whole deployment journey with how you want to deploy: blue-green, canary, and all of those, and then the inference where the data is hitting the model. And then we come to the observability journey, where we’re speaking of model observability, performance and accuracy as well as drift, for example, to see how the model is changing with regard to the input data or the output data and so on. And then this whole thing has a managed layer, which means I, as a data scientist, am able to govern, control, and own my model.
Moving ahead to develop: nothing much here; we are reusing, for example, GitHub and the artifact registries that are provided in the company. So we are not inventing anything here. The experimentation platform essentially looks at ingestion and provides an integrated development environment which can do experiment tracking and experiment registration, as well as connect to high-capacity compute, like GPUs. Then moving ahead, we have the feature store, which is essentially a repository for all the features developed for a certain model that you can contribute to or consume from; basically features that have been built with different models.
So you have the entire feature lifecycle management, which involves when we retire a feature, when a feature is new, and what state it is in. Then a catalog, which actually states the different features and where they’ve been used. And then pipelines, which means that you are able to connect your data pipelines to your experimentation platform to reuse or contribute features.
Coming to deploy, in the keynote we showed a lot of containerization and separation techniques, because we need to be able to separate the model from the software layer and then deploy it in different styles, for example, shadow, blue-green, canary. And then, of course, the whole deployment pipeline with the model CI/CD, versioning, and so on. We also have the model testing piece, where we verify that the model is giving the results that the data scientists want.
Then the whole governance and auditability piece. This is extremely new, which means that we have an ethics and responsibility organization that would like to audit models to see how they’re behaving: what feature importances went into developing and deploying the model, and so on.
Moving ahead, then, we look at the inference pipeline, which is pretty standard, with logging everything. I mean log, log, log — we believe in that, and it’s required for the future. A task orchestrator that is basically putting the data across to the different models, and then of course the inferencing could be batch (models in, for example, fulfillment, supply, or stock forecasting work with batch), and then you have the real-time inferencing, which is models that sit on IKEA.com, for example.
And then you have the whole observability layer with two components. One is operational monitoring, which is essentially focusing on model performance and the platform availability, model availability. Then the observability layer, which is focusing on drift, on outliers, and then looking at retraining pipelines.
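As a rough illustration of how such a drift check could be wired up (Alibi, listed later in the platform stack, is one option), here is a minimal sketch with toy data; the reference window, threshold, and retraining trigger are all assumptions, not the platform’s actual implementation.

```python
import numpy as np
from alibi_detect.cd import KSDrift

# Reference window: feature values seen at training time (toy data here).
x_ref = np.random.normal(size=(1000, 5))

# Kolmogorov-Smirnov drift detector over the feature columns.
detector = KSDrift(x_ref, p_val=0.05)

# New window of production inputs: flag drift and, if it fires,
# hand off to a retraining pipeline.
x_new = np.random.normal(loc=0.5, size=(500, 5))
result = detector.predict(x_new)
if result["data"]["is_drift"]:
    print("Drift detected - trigger retraining pipeline")
```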
So that’s the observability layer, and then we come to the managed layer, which we built ourselves. It’s the internal design library, and there you have the model registry, which is the centralized repository for all the models that we have in the company and the versions of those models, so that you’re able to promote model versions. A metadata repository, which is for the models’ metadata. And then the whole lifecycle management of the model and access management, which means the model can be private, public, as well as shareable, so I can actually share it with individuals instead of making it public, for example.
Within the whole managed layer is the experience layer, which is the frontend that I spoke about, and it also has role-based access, which means it’s where I get to see, for example, what the cost of my model is, how much data is hitting my model, and so on. And then, of course, the different integrated development environment flavors that you will see in the demo as well, where I can actually have some sort of a buffet concept: what would I like to develop, or what requirements do I need in my development environment?
Then our data marketplace interaction; you will see a data catalog interaction as well. But essentially I can, sitting in one place, look at all the various data sources in the company that I have access to and actually pull the data in. Then we have a whole one-click deployment journey. If you can’t do it where you code, you can also do it from the frontend, and the same one-click observability journey, because we have all these personas that we need to work with.
These are the platform components that we have. The infrastructure is on Kubernetes; it’s the heart of the platform. And then the code is essentially Python and React, and we use a lot of cloud-native tools. So we use Docker a lot for containerization; we use CI/CD pipelines with Argo CD, Knative, Kafka, Prometheus, and then a lot of machine learning specific frameworks like MLflow, Seldon Core, Alibi, ZenML, and the Feast feature store, for instance, and the entire UX and engineering from IKEA that ties all of this together.
I would like to welcome Harish and Fernando, engineers in the team. Harish is a lead engineer that is the architect for this, and Fernando is the machine learning engineer that will run the demo for us. Welcome.
Thank you, Karan. Good morning, everyone. I’m sure Karan has given you the background about what the MLOps platform is, what the different constructs of it are, and what components the platform holds. I will quickly switch to a simple UI demo, not getting into details, but this is our platform, MLOps.ikea.com.
Platform demo
So this is our MLOps platform, and the purpose of the platform is very simple. Just like how I walked in today, a data scientist doesn’t need to carry anything with him. He just walks in with an idea to create a model, with an algorithm which he probably has in mind.
He comes here, goes to develop, gets a development environment, what we call a training lab: CPUs, GPUs, and whatnot, for the model development, plus PyTorch, Python, Anaconda, all the snakes, the software components for the model development kit. He develops the model, registers the model, and comes for a deployment. The purpose here is that you walk in empty-handed with just an idea, you get the environment, put your ideas into play, create a better model, deploy it, and walk back. And you can hand it off to anyone else as well within the teams.
So you don’t get an affinity back to the infrastructure: this blade, this bare metal machine, or this VM is the machine where I developed this model; that is the machine where I developed another model. We move away from those kinds of legacy mindsets and just have the infrastructure abstraction completely taken care of by the platform, and use it like a SaaS solution. Of course, containers are being given to them on top of the GPUs and CPUs.
But I’ll quickly show what is there in develop. So, develop is a place where you get your development environment. You can deploy it in your own cloud or in the cloud of the platform as well. You select the project which you are part of, select the GitHub actions which are needed, and give it a name for the demo, say, dockercon. We’ll select our favorite zone, and you can get a GPU environment directly from here, or, for maybe a lightweight model development, you get a CPU environment and a machine of any type. And here you usually get the different IDEs needed for the model development. It could be PyTorch, it could be Python, or others.
Basically, you select the different platforms and software packages you need, and you click on deploy. Immediately on clicking deploy, you will have the machines coming up right away, along with access to the Jupyter notebooks installed on them and the means to connect to the IDE and do remote development. That’s the develop piece. Similarly, whatever I create there gets listed back in the services area. Let’s say this is an instance which I created before: I can actually see the different steps of the machines being created, as well as the metadata of this development environment.
And moving further, this is a space that we call the services space. Every model gets into a container, and they get listed under the services. Let’s say I have developed 20 and deployed 20; all of them will be listed here. This is also the place where I register my models, what the current model is, and all of the model governance also gets registered back into the services area.
So now I showed you how we can get the environment and how you can actually list the models. What do I need next? What I need for model development is, of course, the data. That’s where the data catalog comes into play. This has always been a problem with any model development: finding the right data, even quality data, for my model development. It could be online payments, it could be an analytics platform, it could be people data. I don’t have to, as a data scientist, go somewhere else to find the data, take the data, do an extract and transform, and then load it back for the model development. I can query the data right away from here, and I am showing sample data here. I can see the different data schemas available, I can select all of them, and it will quickly let me run the query, which can be used directly in the model development. And that’s the data catalog part.
Now let’s say someone has developed a model. It could be a Llama model, it could be a public model which has been downloaded and maybe fine-tuned a little bit, but not using the platform. Let’s say they want to come and deploy it here. That’s where the deploy piece comes in. I don’t have to bring the code, I don’t have to bring the artifacts, I don’t have to bring the files or all of the dependency packages with the code to do a deployment of a model. All I need to bring is a Docker image, which is prebuilt and set up with all the dependencies needed for the model packages. All of that, being inside the Docker image, can be brought into the platform, and then, right from here, it can be publicly inferenced. The Docker image basically goes into the deployment stage and gets an endpoint for production. You bring the Docker image here, you get the endpoint, and then it serves any end users based on the ingress and egress policies.
So this is the deployment space. It’s pretty much similar to what you were seeing in develop. I give a model name, and let’s say I use XGBoost, and I’ll go for a custom deployment. It asks me for a Docker image; I can provide an image link. And I give my model name and the creator and go forward with the deployment. The end result of this is going to be an endpoint which is deployed as public in the frontend.
And that’s the deploy. And last, I wanted to touch on the feature store. What is the purpose of it? The feature store is basically a centralized repository of features which are used by a model. I mean, it’s a common problem where, let’s say, I developed a model A, and out of 100GB of data, I probably used 100MB of it. Let’s say I’m developing a model which needs inputs like age, location, and the type of interest, what they have purchased. I have used these three features, done the standard extraction, and developed a model. But then, all those features should not die after the model development is done. It is just like democratizing data: it’s about sharing the features of the model with other people as well. So, in that way, someone else who needs to develop a model will use this, look at the features of my models, and then do their own development.
It’s a centralized space where I can say what all the features are which are associated with a model. I mean, we don’t have to carry around the data; it’s metadata about the data, which says these are the ABCD features which are used in the model. This can be used during audits, and this can be used during business discussions, meetings, and any of those, whenever the model is getting into the review stage. But all of this will be shown in detail on the backend by my colleague, Fernando.
Thank you so much, Harish. Hi, everyone. Fernando here. I’m an MLOps engineer, the technical guy who writes the code. I will show a detailed demo of how we at IKEA use this service. First, say I’m a data scientist, and I want to train a model. We can use this service that we provide. So what is the first step?
Data wrangling
The first step is to decide what data we need. Where do we take the data from? We take the data from the catalog, from the query that my colleague has shown. So I will import the data that we need for this problem, and I’ll create a pandas DataFrame. I, as a data scientist, want to understand what is inside the data: what the data looks like, what the numerical columns and categorical columns are, and how we can create features, or how we can decide which features I want to feed into the model. These are the columns that I use, and this is some statistical distribution of the columns.
Now I will create some correlations on the data. For example, this is the Titanic dataset, where I want to evaluate the probability of survival for a person based on the class, based on age, based on family, etc. For example, looking at the correlations, we can see that first class on the Titanic has a higher probability of survival. Or, based on sex, females have a higher probability. Now, it’s common to start creating some graphs in order to visualize how our data is distributed. For example, based on age, there is less probability of survival if you are an older person, which is expected. We start making our assumptions and create some hypotheses that we need to validate. This is typical work for the data scientist.
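As a rough sketch of that exploration step (assuming a local CSV export of the public Titanic dataset with the usual Kaggle-style column names, which is an assumption, not the catalog’s real schema), it could look like this:

```python
import pandas as pd

# Hypothetical local export of the Titanic data pulled from the catalog query.
df = pd.read_csv("titanic.csv")

# Understand what is inside the data: column types, missing values,
# numerical vs. categorical columns, basic statistics.
df.info()
print(df.describe(include="all"))

# Correlation-style checks against the target: first-class and female
# passengers show a higher survival rate.
print(df.groupby("Pclass")["Survived"].mean())
print(df.groupby("Sex")["Survived"].mean())

# A quick view of survival rate by age bucket supports the observation
# that older passengers survived less often.
print(df.groupby(pd.cut(df["Age"], bins=10))["Survived"].mean())
```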
Based on those assumptions, it’s time to start with feature engineering. What is feature engineering? Feature engineering is creating, combining, or deleting features based on the importance they could have for training a good model. So I discovered that, for example, from the name I can extract the title: mister, miss, etc. This could affect the model’s final quality. For example, we can see how this title affects survival. Then, you know, AI models cannot process text, so we need to transform the text or categorical columns into integers with ordinal or one-hot encoders, which is quite a normal process.
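A hedged sketch of that step, continuing on the same assumed columns as above:

```python
# Extract the title (Mr, Mrs, Miss, ...) from the Name column.
df["Title"] = df["Name"].str.extract(r" ([A-Za-z]+)\.", expand=False)

# Check how the new feature relates to survival.
print(df.groupby("Title")["Survived"].mean())

# Models can't consume raw text, so map the categorical columns to
# integer codes (a one-hot encoding would work equally well).
for col in ["Sex", "Embarked", "Title"]:
    df[col] = df[col].astype("category").cat.codes
```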
After that, I have created new columns, so this will be my final dataset, based on the selection that I did in the feature engineering process. Now, I spent so much time creating these features and researching that I would like to share them with the AI community. So what should I do? I could push this DataFrame, but I will not do that, because the data is already there in the feature store. When the data is in the feature store, you can see that we have the data as features, and you can see the description of the features that I already created. This is an enriched dataset, and I can share this dataset with any data scientist who wants the data.
Data sharing
So imagine the case that I am another data scientist, and I have a subset of data where I only have the passenger ID. How can I fill this subset of data with the features that the other data scientist discovered? This is done using Feast. Feast is an open source library that we use as the feature store. So we set up the credentials in order to connect with Feast, we indicate the feature and the version that we want, and we select the DataFrame that we want to fill: the subset of data that we already created.
So I execute this block, and you can see how, based on the subset of data, I am able to fill the training dataset. In Feast, you have the offline feature store, which is useful for model training, and the online feature store, which can be used for inference, because at inference time you want to use the most recent data. So this will take a while. We have loaded data from BigQuery; as you can see here, we have the details: what the project is, and what the original dataset is that we loaded in the previous block. And we can use this information together. As you can see, now it is executed. Based on the timestamp and the passenger ID, we are able to fill the dataset with the other features that the other data scientist had discovered. This helps us a lot in reducing the time that a model takes to reach production, which sometimes is weeks to months.
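In plain Feast terms, the enrichment step described above could look roughly like this; the feature view name `titanic_features` and the repo path are assumptions for illustration:

```python
import pandas as pd
from feast import FeatureStore

# Connect to the feature store repository (path is illustrative).
store = FeatureStore(repo_path=".")

# Subset of data: only passenger IDs, plus the event timestamp Feast
# needs for its point-in-time join.
entity_df = pd.DataFrame({
    "passenger_id": [1, 2, 3],
    "event_timestamp": [pd.Timestamp.utcnow()] * 3,
})

# Offline store: enrich the entity dataframe with features another
# data scientist already registered.
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "titanic_features:age",
        "titanic_features:pclass",
        "titanic_features:title",
    ],
).to_df()

# Online store: fetch the freshest values at inference time.
online_features = store.get_online_features(
    features=["titanic_features:age", "titanic_features:pclass"],
    entity_rows=[{"passenger_id": 1}],
).to_dict()
```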
Now I take the data from the feature store, and I would like to create a model. In this example, I will not search for the best model; I will just take a random forest classifier, since we are in a binary classification problem. I train the model, and I would like to validate it with the most popular metrics in binary classification, which are accuracy, F1 score, precision, and recall.
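A minimal scikit-learn sketch of that training and validation step, assuming the enriched `training_df` from above also carries a binary `survived` target column:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Features come from the feature store; "survived" is the binary target.
X = training_df.drop(columns=["survived", "event_timestamp", "passenger_id"])
y = training_df["survived"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("accuracy :", accuracy_score(y_test, y_pred))
print("f1 score :", f1_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
```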
Supposing that I’m happy with this model, now I want to deploy. What should I do? The next step is to save the artifacts. Artifacts are the model files, etc. Now, at IKEA, we provide a standard way to create a model class. You can use a cookiecutter template, whatever you want, but in the end a model class is quite simple: you need the init, you need the load, and you need the predict. In the load, you will load the artifacts; you can download them from Google, you can do whatever you want. In the predict function, in this example, I will show how, using only the passenger ID, I can generate a prediction using the features from the feature store. In that way I reduce a lot of the data that I need to send to the model, reducing the latency, which is a problem, especially on websites, etc. This is the code: based on the passenger ID, I will take the features from the feature store, and I will generate some predictions.
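A minimal sketch of such a model class, in the init/load/predict shape described here; the artifact path, the `titanic_features` feature view, and the use of joblib are assumptions, and the predict signature follows the Seldon Core Python wrapper convention mentioned next:

```python
import joblib
from feast import FeatureStore


class TitanicSurvivalModel:
    """Init / load / predict model class, as described in the talk."""

    def __init__(self):
        self.model = None
        self.store = None
        self.load()

    def load(self):
        # Load the saved artifacts; in practice they might be downloaded
        # from a bucket first (paths here are hypothetical).
        self.model = joblib.load("artifacts/model.joblib")
        self.store = FeatureStore(repo_path=".")

    def predict(self, X, features_names=None):
        # X carries only passenger IDs; everything else is filled from
        # the online feature store, so callers send very little data.
        entity_rows = [{"passenger_id": int(pid)} for pid in X.flatten()]
        features = self.store.get_online_features(
            features=[
                "titanic_features:age",
                "titanic_features:pclass",
                "titanic_features:title",
            ],
            entity_rows=entity_rows,
        ).to_df()
        return self.model.predict_proba(features[["age", "pclass", "title"]])
```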
This model class should be encapsulated in a Docker image, and the Dockerfile is quite simple. We will be using Seldon Core. Seldon Core is a tool that enables you to transform a model from artifacts into a production-ready microservice. You can deploy it in Docker, you can deploy it in Kubernetes, you can deploy it wherever you want. The good thing is that you get observability by design. That means you get metrics with the model, for example using Prometheus, and can visualize graphs in real time. I can test locally: for example, I can go to the folder, load this class, and test it. Here, using only the passenger ID, I was able to generate the probability of survival based only on the passenger ID; the model side fills in the features that we used for training.
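That local smoke test could be as simple as the following, assuming the sketch class above:

```python
import numpy as np

# Instantiate the class and ask for survival probabilities using only
# passenger IDs; the other features are pulled from the feature store.
model = TitanicSurvivalModel()
print(model.predict(np.array([[1], [2]])))
```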
Now we start to build the Docker image, and for that we are going to use Docker. In this case the images are already built, but here are the commands: we need to build and push, and we can even run locally to test using Docker. In our case, we want to show the capability of MLOps at IKEA. So we provide the backend API if you are the technical guy, or you can deploy a model through the frontend.
Deploying the model
In this case, I will be deploying the model through the backend, where I indicate what the Docker image is, the user who deploys it, what the prepackaged server is, what hardware you need, and the number of replicas, because remember that we are deploying on Kubernetes, so we can scale however we want. If you receive a high number of queries at any time, we can automatically scale and keep the latency stable. I also indicate some metadata; for example, in this case, the input data is the passenger ID, and I can even show some input examples.
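For illustration only, such a deployment request could look something like this; the endpoint, field names, and image reference are hypothetical, not the real IKEA API contract:

```python
import requests

# Hypothetical deployment request against the platform's backend API.
payload = {
    "model_name": "titanic-survival",
    "docker_image": "registry.example.com/ml/titanic-survival:1.0.0",
    "prepackaged_server": "custom",
    "hardware": {"cpu": "1", "memory": "2Gi"},
    "replicas": 2,  # Kubernetes can scale these up or down
    "metadata": {
        "inputs": [{"name": "passenger_id", "type": "int"}],
        "example": [[1]],
    },
}

resp = requests.post(
    "https://mlops.example.internal/api/v1/deployments", json=payload, timeout=30
)
resp.raise_for_status()
print(resp.json())
```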
Now I want to deploy. After I query the API, this model will be deployed, and I can check that it has been deployed. Now, you remember my colleague Harish showed that we have the model space. In the model service, you can see all the inputs that we introduced when we were deploying the model: what the passenger ID is, what the output format is. We can even query the model in real time through the frontend, and you will get all the predictions in real time. Also, at IKEA, observability is very important, so we need to know what is happening with the model: who is using the model, what input the model is receiving, and what the output for that input is. And we have a plan to incorporate algorithms for data drift in real time, concept drift, and detection of problems.
So, each time you query the model, you will see here an entry about who is using the model, what input they are sending, and what output the model returns. And then you will get some graphs about the number of requests this model is receiving, what the success rate is for the predictions, and what the latency is. We have this for all the models that we have. If, for example, you want to share this model with your team, you can click on settings and add a user, and when you add this user, your colleague will be able to visualize this model in the model space. You can also change the model.
Also, if I want to use this model in an application, how can I do it? Well, you can copy the prediction URL and then go to your code. You need to think, of course, about security: you have to send your email, the person who is using the model, and the request data. And as you can see, the model is able to generate predictions in real time, and you will automatically get the observability that we showed before. So, that’s all from my side. Thank you very much. If you have any questions, we have time to discuss. Thank you.
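Calling such an endpoint from application code could look roughly like this; the URL and header are illustrative, and the payload follows the Seldon Core v1 ndarray convention, sending only passenger IDs:

```python
import requests

# Illustrative prediction URL copied from the model page, plus an assumed
# identification header for the person calling the model.
prediction_url = (
    "https://mlops.example.internal/seldon/titanic-survival/api/v1.0/predictions"
)
headers = {"X-User-Email": "data.scientist@example.com"}

# Seldon v1 payload: only passenger IDs are sent; the model fills in the
# rest of the features from the online feature store.
body = {"data": {"ndarray": [[1], [2]]}}

resp = requests.post(prediction_url, json=body, headers=headers, timeout=10)
print(resp.json())
```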
Q&A
So, this is the total stack. Seldon is the package that is primarily doing the deployments, but then there are a lot of cloud-native applications that we bring in to make sure that the deployment is seamless.
MLflow is the experimentation platform; that’s a separate cluster. Argo CD is essentially doing the CI/CD pieces across. Do we have a scheduler here? Actually, not all of it; we are not there yet with the scheduler for the GPU part. In fact, we just got the GPUs for the training pieces. That means, let’s say, “this is 100 GB of data, go ahead and run the batch training on Friday at the end of the day”: that scheduler part is still handled at the code level, not on the platform yet, but that’s a piece that is on the backlog for us to develop.
I think it starts with the model size. I mean, with LLMOps, the observability is a bit more difficult, and the model sizes are far higher than a regular model. I think that’s where the challenge begins: how can we use the same standardized process to package and deploy the model efficiently enough? And I can say that it’s not a problem that we’ve completely solved. When you look at the GenAI models, the LLMs, if you want to deploy a 15GB, 16GB model, compared to a standard model it’s much, much bigger. So when the size becomes bigger, you need to deploy on a GPU, and then that also becomes expensive, and it also becomes challenging to manage.
What was the other challenging aspect of the LLMs? Yeah, so LLMs are the challenge; I think that was the question, and it’s in relation to the deployment. Deploying a normal model, classification, binary, deep learning models, we were using Seldon for doing the deployment, and it is working absolutely fine. The challenge arises when it is an LLM, which is very large in size, even for a normal run, to take and deploy into Seldon. That’s one of the primary challenges we face with LLM models. Again, it’s just infrastructure challenges, not at the platform level.
The further challenge is on his question regarding LLMOps, right? Right now, like Karan mentioned, we have not solved the problem of LLMOps. The MLOps platform is now scaling to adhere and adapt to the LLMOps way of operating models. Right now, we are just deploying one LLM at a time, just like treating a normal model. But the multi-model orchestration between the LLMs, right? That’s something which is still undiscovered or unresolved, and it’s an expectation for all MLOps platforms to start supporting LLMOps as well. So that’s the answer. Yeah.
Are there any other questions? How did your processes change since the introduction of these highly useful pre-trained models?
In a way, it has kind of made us data scientists take a detour; that’s the actual observation from our side. When ChatGPT and all of that came in, everybody took a detour there and started using it, then started getting problems with the latency, right? Which has become one of the biggest hurdles as well. Now we even started getting enterprise ChatGPT, OpenAI as an enterprise cloud as well. That solves part of the problem, but still, what we think is that we get only the top-level results from the model, not the detailed results which we wanted to get from the model. So that’s where we started to detour. I mean, we said, okay, that’s very good research, and we did a real detour by using the pre-trained models, but one size doesn’t fit all. So we started getting into our own LLM and started doing training internally to give very customized answers, streamlined for our consumers. In that way, we also don’t go into a black-box view, so we will always be able to explain why we gave this answer to a consumer, right? And also the auditability as well. So that’s the detour we have taken now.
Open source models
We chose open source models. I think that’s more relevant to say. I mean, OpenAI and ChatGPT is a closed model; you can fine-tune it and do a lot of things with it, but it’s not open source. With an open source model, there is a lot more information available on how the model was developed and what it was trained on. So we need to look at how we essentially bring explainability into the picture, right? And that is required for the future with the AI Act, where models will be classified as very high risk, high risk, medium risk, or low risk. And we see that a lot of our models that are customer or marketing oriented will fall into the high-risk category. So if we’re using, let’s say, a multi-model LLM setup to do an assisted shopping journey, and then something has gone wrong with the predictions, let’s say the model starts hallucinating on stuff, that’s when we need to have control. That’s when we need to be able to justify what we’re using and why. And that’s why the approach right now is to look at open source models, to have more control over what we send out to our customers.
So which models do we use a lot? Essentially, if you look at it, with the multi-model setup we use Llama, and even one closed source model like, for example, Bard. Again, I want to clarify, we are the platform, so we let different data science teams use their own models. And we have seen capability in Llama and other models, but they have also been used right out of the box by the data scientists. They have their own autonomy.
Exactly, and apart from that, no matter what our users use, we should have a platform that delivers to their purpose. I think that is the motto, right? I mean, you saw the persona piece in the beginning as well. There could be someone who’s a data analyst that wants to be able to deploy a model, or there is an experienced data scientist that’s going to work with, you know, the latest model out of research. It’s more for us as a platform to provide for all those personas in the best way possible to be able to deliver value.
What do we use to scale it? So we are a platform, and we use platforms that the company provides, so we have a team that provides a managed GKE cluster. We use them, and they’re responsible for making sure that our scaling requirements are met. So we work as teams, right? I mean, I don’t own the entire stack; I own one part of the stack, and I reuse the stack from many people.
And, just to add, the scaling from this platform layer also goes in three ways. I mean, the scale-out happens back in Google; managed GKE is a resource as well. And we are also trying to give what I would call unbiased autonomy to the consumers, and the customer gets to choose, right? Let me not put it in Google and spend something; let me scale it back into OpenShift, back into the clusters, or let me get a dedicated cluster. So the consumer of the platform chooses whether to scale further back in Google or back on premises, and they make that call based on the size of the model and a number of influencing factors: it depends on the traffic and, of course, the cost in the end, right? So, based on all of those different parameters, they make the call of where they want to scale. But again, the straight answer to your point is that we use managed GKE to do the scaling of the cluster pods.
All right, thank you so much for listening. Thank you.
Learn more
- How IKEA Retail Standardizes Docker Images for Efficient Machine Learning Model Deployment