DockerCon

Streamlining Distributed Graph-based LLM and AI Infrastructure with Docker Extension

Siwei Gu, Chief Evangelist, NebulaGraph

Recorded on November 6th, 2023
This DockerCon talk explores the fundamental concepts of graph, graph databases, and NebulaGraph. Learn about the integration of graph + Large Language Models (LLM) and how this combination enhances both existing LLM stacks and knowledge graph processes. Find out how leveraging Docker optimizes infrastructure, accelerates dev/production deployment, and enhances AI development efficiency.

Transcript

It’s a pleasure to have this chance to share our work on graphs and large language models with you, and how we boosted our development environment with a Docker extension.

I’m a contributor to the open source graph database NebulaGraph and the author of graph + AI projects such as NebulaGraph DGL and NebulaGraph AI Suite. I’m also a top 10 contributor to LlamaIndex, an LLM orchestration project. I enjoy building in public and sharing things about large language models and graphs, so feel free to reach out to me.

Today, I’ll start with some background: what a graph is, and why we need graphs. I’ll give a brief introduction to our project, NebulaGraph, and then share some interesting explorations and research on how graphs can help the retrieval-augmented generation (RAG) paradigm for large language models. Finally, I’ll introduce our Docker extension.

So, what is a graph, and why should we care about graphs? The term “graph”, in the sense of graph theory, traces back to an old puzzle called the Seven Bridges of Königsberg. In that old European city, a river divided the land into several pieces, and seven bridges connected them.

The question was: is there a way to walk across all seven bridges, crossing each one exactly once? Spoiler: the answer is no. But people wanted the conclusion proven, so in his paper on the problem, Leonhard Euler abstracted it into a mathematical problem. Each land mass becomes a dot, which we call a vertex, and each bridge becomes a line connecting two of them, which we call an edge. So in graph theory, a graph is just a set of vertices and the edges connecting them.
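Euler’s argument can be checked in a few lines. This is a minimal Python sketch and is not from the talk. The land-mass labels A to D are our own. The sketch applies Euler’s criterion: a connected graph has a walk that crosses every edge exactly once if and only if zero or two vertices have an odd number of edges. Connectivity is assumed, not checked.

```python
from collections import Counter

# Land masses (vertices): "A" is the central island, "B" and "C" are the
# two river banks, "D" is the eastern land mass. Each tuple is one bridge.
# The labels are our own.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def degrees(edges):
    """Count how many edges (bridges) touch each vertex."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def has_euler_path(edges):
    """Euler's criterion: a connected multigraph has a walk that uses every
    edge exactly once if and only if 0 or 2 vertices have odd degree.
    Connectivity is assumed here, not checked."""
    odd = [v for v, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)


print(dict(degrees(BRIDGES)))   # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(has_euler_path(BRIDGES))  # False: all four land masses have odd degree
```

All four land masses have an odd number of bridges, so no such walk exists.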

    Knowledge graph

    Graphs already underpin a lot of things in the real world. The first one I want to share is the knowledge graph. The term was popularized by Google when it started to handle a certain type of search. Keyword search works fine the traditional way, with an inverted index like the one in Elasticsearch. But when people search for something like the age of a celebrity, an inverted index can’t answer that directly. Google’s solution was to set up a system called the Knowledge Graph.

    The idea comes from an older concept called the semantic network: the entities of the knowledge become nodes in the graph, and the edges are the relationships between them.

    Graph use cases

    Another use case gives a better sense of how graphs work: a typical recommendation system. Think of a toy version of Netflix that recommends movies to a new user. Say this user has already watched a couple of movies and rated some of them highly. From those movie nodes, we traverse backward along the same kind of rating edge to reach other users who share an interest in those movies. From those users, we go on to other movies they also rated highly but that are not yet connected to our user. Those become the recommendations. That’s a simple recommendation system.
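The three-hop traversal described above can be sketched in plain Python. The ratings data and the `recommend` function are hypothetical illustrations, not NebulaGraph code. Each candidate is scored by how many user-to-movie-to-user-to-movie paths reach it.

```python
from collections import Counter, defaultdict

# Hypothetical "rated highly" edges: user -> movies that user rated highly.
HIGH_RATINGS = {
    "new_user": {"Alien", "Heat"},
    "u1": {"Alien", "Blade Runner"},
    "u2": {"Heat", "Blade Runner", "Ronin"},
    "u3": {"Amelie"},
}


def recommend(user, ratings):
    """Score movies reachable via user -> movie <- peer -> movie paths."""
    # Build reverse edges once: movie -> users who rated it highly.
    fans = defaultdict(set)
    for u, movies in ratings.items():
        for m in movies:
            fans[m].add(u)

    seen = ratings[user]
    scores = Counter()
    for movie in seen:                              # hop 1: user -> liked movie
        for peer in fans[movie] - {user}:           # hop 2: reverse edge to peers
            for candidate in ratings[peer] - seen:  # hop 3: peer -> other liked movie
                scores[candidate] += 1              # one more path reaches it
    return [movie for movie, _ in scores.most_common()]


print(recommend("new_user", HIGH_RATINGS))  # ['Blade Runner', 'Ronin']
```

"Blade Runner" ranks first because two different peers lead to it.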

    In the real world, a recommendation system is much more complex and is often a black box. A graph can help here too. We can put the user and the recommended candidates in a graph and run a find-path query between them. This is a typical graph pattern-matching query, and the path it returns is the reasoning behind the recommendation. So a graph can also enable interpretable recommendations.
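A find-path query like the one described above boils down to a shortest-path search. This is a minimal breadth-first-search sketch over hypothetical labeled edges. The `<-` prefix marking a reverse traversal is our own notation. A graph database exposes this as a query (for example, nGQL’s `FIND SHORTEST PATH`), so treat this only as an illustration of what such a query computes.

```python
from collections import deque

# Hypothetical directed, labeled edges: (source, edge_type, target).
EDGES = [
    ("new_user", "rated_high", "Alien"),
    ("u1", "rated_high", "Alien"),
    ("u1", "rated_high", "Blade Runner"),
]


def find_path(edges, start, goal):
    """Breadth-first search that may follow edges in either direction.
    Returns the shortest path as (node, edge_type, node) hops, or None."""
    adj = {}
    for s, label, t in edges:
        adj.setdefault(s, []).append((label, t))
        adj.setdefault(t, []).append(("<-" + label, s))  # reverse traversal
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            hops = []
            while parents[node] is not None:  # walk back to the start
                prev, label = parents[node]
                hops.append((prev, label, node))
                node = prev
            return hops[::-1]
        for label, nxt in adj.get(node, []):
            if nxt not in parents:
                parents[nxt] = (node, label)
                queue.append(nxt)
    return None


for hop in find_path(EDGES, "new_user", "Blade Runner"):
    print(hop)
# ('new_user', 'rated_high', 'Alien')
# ('Alien', '<-rated_high', 'u1')
# ('u1', 'rated_high', 'Blade Runner')
```

The returned path reads as an explanation: the user liked "Alien", another fan of "Alien" liked "Blade Runner".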

    We also see graphs in social networks. On LinkedIn, for example, you have second-degree connections and suggested new contacts. These are people who share many mutual friends with you but are not connected to you yet. That kind of insight comes from the social network graph.

    In the cloud-native, virtualization, or SRE domain, we can put all of the infrastructure into one graph, which brings a lot of interesting benefits. For example, you can propagate state. If one hypervisor has a security issue, a disk issue, or a high load, you can propagate that alarm, state, or security concern across the whole graph immediately. You can also run graph algorithms on it.
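The LinkedIn-style suggestion mentioned above can be sketched as counting common neighbors. It surfaces people who share many mutual friends with you but are not yet connected to you. The friendship data below is made up for illustration.

```python
from collections import Counter

# Hypothetical undirected friendship graph: person -> set of friends.
FRIENDS = {
    "you": {"ann", "bob", "cat"},
    "ann": {"you", "bob", "dan"},
    "bob": {"you", "ann", "dan", "eve"},
    "cat": {"you"},
    "dan": {"ann", "bob"},
    "eve": {"bob"},
}


def suggest_friends(graph, me):
    """Rank second-degree contacts by the number of mutual friends with `me`."""
    mutual = Counter()
    for friend in graph[me]:
        for candidate in graph[friend]:
            # Skip yourself and people you are already connected to.
            if candidate != me and candidate not in graph[me]:
                mutual[candidate] += 1
    return mutual.most_common()


print(suggest_friends(FRIENDS, "you"))  # [('dan', 2), ('eve', 1)]
```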

    Some graph algorithms detect clusters in your data, showing which nodes sit close together. Others rank certain nodes in the whole graph as more important than the rest. If one of those important nodes raises an alarm, you can treat it with a higher severity. That’s only one way graphs help in this domain. We can also build data lineage: all your data assets, columns, and data workflows are connected and can be inspected in a graph view in real time.
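As one concrete example of ranking nodes by importance, here is a small power-iteration PageRank sketch. PageRank is our choice of algorithm, not necessarily the one the speaker had in mind. The infrastructure graph is hypothetical; its edges point from a component to what it runs on. Nodes that many others depend on, like the hypervisor hosting two VMs, end up ranked highest.

```python
# Hypothetical infrastructure graph: each edge points from a component to the
# thing it runs on, so rank flows toward what many components depend on.
INFRA = {
    "app-a": ["vm-1"],
    "app-b": ["vm-2"],
    "app-c": ["vm-3"],
    "vm-1": ["hypervisor-1"],
    "vm-2": ["hypervisor-1"],
    "vm-3": ["hypervisor-2"],
    "hypervisor-1": [],
    "hypervisor-2": [],
}


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on {node: [successors]}. Nodes with no
    out-edges spread their rank evenly over all nodes."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in graph[v]:
                new[w] += damping * rank[v] / len(graph[v])
        rank = new
    return rank


ranks = pagerank(INFRA)
print(max(ranks, key=ranks.get))  # hypervisor-1
```

An alarm on `hypervisor-1` could then be escalated ahead of one on `app-a`.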

    Another use case is fraud detection and risk management. Whether you run an e-commerce site, a bank, or a content website, there are fraud patterns, and if you don’t control them, things can get messy. Fraud patterns can usually be expressed as graph patterns. For example, a single device or IP address might be connected to a huge number of events, or used to create posts for many different users. That pattern can be recognized as high risk. You can also mark known fraudulent nodes as blacklisted. Suppose a new post or e-commerce transaction turns out to be connected to a blacklisted node within three or four hops. Then you have a chance to detect it and block the transaction in real time. This is a very common graph use case.
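The "connected within three or four hops" check can be sketched as a bounded breadth-first search. The transaction graph and blacklist below are hypothetical. A production system would run this as a graph query rather than in application code.

```python
from collections import deque

# Hypothetical undirected graph of transactions, accounts, devices, and IPs.
GRAPH = {
    "txn-42": ["account-1"],
    "account-1": ["txn-42", "device-7"],
    "device-7": ["account-1", "account-9"],
    "account-9": ["device-7", "ip-3"],
    "ip-3": ["account-9"],
    "txn-43": ["account-2"],
    "account-2": ["txn-43"],
}
BLACKLIST = {"ip-3"}


def hops_to_blacklist(graph, start, blacklist, max_hops=4):
    """Breadth-first search from `start`. Return the hop distance to the
    nearest blacklisted node within `max_hops`, or None if there is none."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node in blacklist:
            return dist
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None


print(hops_to_blacklist(GRAPH, "txn-42", BLACKLIST))  # 4
print(hops_to_blacklist(GRAPH, "txn-43", BLACKLIST))  # None
```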

    Then there’s manufacturing: the non-digital, real-world cases. This example is a car manufacturing supply chain. You can put the features, modules, components, suppliers, everything into one graph. That yields insights, and new ways to shape features and service offerings, that would be hard to imagine without the graph. There are many other use cases, but the final one I want to show is how graphs can help when building applications based on large language models. I’ll dive into the details later, but first, some more background on graph databases.

    Many of us may already be familiar with graph databases, but deciding whether to bring yet another database into our system can be tricky. Here are some of the most important reasons. First, a graph database lets you query data with graph semantics. What does that mean? Take the recommendation pattern I mentioned: you want to find the path that explains why you recommended something to the user. That is an easy query in the graph world, but it is relatively hard in a tabular database. You can store graph data in tables in an RDBMS, but querying it that way is really hard. The query here just finds a path between two nodes over arbitrary edge types, and that is only an entry-level graph query pattern.

    Second, an RDBMS doesn’t perform well at graph-style relationship traversal. Let me show why. Imagine a typical graph traversal from one point to another in an RDBMS. We are like the Snow Bros characters jumping from one platform to another. Each hop is a join, and when we join to another table, we have to run across it to find the next connected node.

    This doesn’t scale, because the cost of each hop grows with the size of the data. The rows are organized by key, not by their connections, so a multi-hop traversal becomes extremely expensive. We can apply hacks to mitigate it, like using some magic to run faster or throwing the snowballs farther, but that’s a mitigation, not a solution. As your data grows, it can be the difference between a query pattern taking half a second and taking half an hour, which is not acceptable in many graph use cases.
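To make the per-hop cost argument concrete, here is a simplified Python model that counts the work done in a 3-hop traversal. It assumes the worst case for the relational side: a join that scans the whole edge table on every hop, with no usable index. Real databases use indexes, which reduce but do not eliminate per-hop lookup cost. The adjacency version follows stored neighbor lists only, which is the idea behind a graph database’s cheap hops.

```python
from collections import defaultdict


def khop_via_scan(edge_rows, start, hops):
    """Each hop 'joins' the frontier against the whole edge table.
    Returns (reachable nodes, rows examined)."""
    frontier, examined = {start}, 0
    for _ in range(hops):
        nxt = set()
        for src, dst in edge_rows:  # full pass over the table per hop
            examined += 1
            if src in frontier:
                nxt.add(dst)
        frontier = nxt
    return frontier, examined


def khop_via_adjacency(adj, start, hops):
    """Each hop reads only the neighbor lists of frontier nodes.
    Returns (reachable nodes, edges examined)."""
    frontier, examined = {start}, 0
    for _ in range(hops):
        nxt = set()
        for node in frontier:
            for dst in adj.get(node, ()):
                examined += 1
                nxt.add(dst)
        frontier = nxt
    return frontier, examined


N = 10_000
rows = [(i, i + 1) for i in range(N)]  # a simple chain of N edges
adj = defaultdict(list)
for src, dst in rows:
    adj[src].append(dst)

print(khop_via_scan(rows, 0, 3))      # ({3}, 30000)
print(khop_via_adjacency(adj, 0, 3))  # ({3}, 3)
```

Both reach the same node. The scan-based cost grows with the table size, while the adjacency cost depends only on the edges actually followed.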

    Think of a graph database as the green magic potion that lets you literally fly from one point to another. Each hop of a graph traversal costs close to O(1), basically regardless of how large your data is, so a single hop is relatively cheap. Real-world graph queries can involve multiple hops, which makes a big difference. That’s why we need a graph database. I hope that makes sense. Now, quick marketing time: why do we need yet another project? Why do we need NebulaGraph?

    NebulaGraph

    NebulaGraph was designed from day zero with a distributed architecture, so it scales out. This picture shows a small ship in a river: when we want to move it back, we just get one or two people to push it. But if you recall, two years ago a container ship got stuck in the Suez Canal and blocked world shipping for almost a week. Some problems can look similar, but the solution can be totally different. NebulaGraph was built for hyperscale, to handle hundreds of billions of nodes and trillions of edges. It’s distributed, and it was designed to be collaborative and open source from day zero.

    Retrieval augmented generation

    Now back to our topic: how graphs can help large language models. The most common pattern for building an LLM-based application is called in-context learning, or the RAG paradigm. RAG stands for retrieval-augmented generation. The RAG process simply runs retrieval over your private data before calling generation. Generation here means calling the LLM to synthesize your solution, your answer, or the next hop of your ReAct pipeline. The idea is that LLMs have forever changed how we build smart applications.

    Previously, we had to train a model to enable relatively smart automation tasks, but with an LLM we can just use prompts. In the real world, though, prompts alone are often not enough. We also have to provide context: private data and domain knowledge. In practice, RAG works like this. In the indexing phase, we prepare and index our private data. At query time, we compare the task against the indexed data and retrieve the data needed for that query or question. Then the LLM uses its in-context learning capability to make sense of our question. The most common indexing method, which is also the narrow definition of RAG, is called split and embedding.

    Embedding is a machine learning way of mapping real-world things into vectors. In the vector space, every vector represents one entity, and how close two vectors are represents how semantically similar the entities are. With this concept, we split our private data into small chunks and create an embedding vector for each of them. At query time, we create a vector representation of the task with the same embedding model, so we can semantically search for the related chunks of data. That is how the RAG paradigm works. Getting started is really easy: with four or five lines of code, an LLM, and LlamaIndex, you can set up a Q&A bot over your own data. It’s perfect for a fancy demo. But when we push further toward production-ready requirements, it gets really hard, and there is a lot to do.
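The split-and-embed pipeline can be sketched end to end in plain Python. To keep it self-contained, a bag-of-words count vector with cosine similarity stands in for a real embedding model. The fixed word-count chunking is also a hypothetical choice; real pipelines call an embedding model and a vector store.

```python
import math
import re
from collections import Counter


def split(text, chunk_words=8):
    """Split text into chunks of `chunk_words` words (a hypothetical chunk size)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]


def embed(text):
    """Stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, top_k=1):
    """Index the chunks, embed the question the same way, return the top_k closest chunks."""
    index = [(chunk, embed(chunk)) for chunk in chunks]  # indexing phase
    q = embed(question)                                   # query time
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


doc = ("Steve Jobs co-founded Apple with Steve Wozniak in 1976. "
       "Pixar made Toy Story, the first computer-animated feature film.")
chunks = split(doc)
print(chunks[0])                               # Steve Jobs co-founded Apple with Steve Wozniak in
print(retrieve("Who founded Apple?", chunks))  # ['Steve Jobs co-founded Apple with Steve Wozniak in']
```

The 8-word chunk boundary cuts the first sentence in the middle and separates "1976" from the rest of it. This is a small version of the chunking problem discussed next.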

    One reason comes from the nature of the retrieval itself. To retrieve the data we want to refer to, we first split it into small chunks, and this split makes a strong assumption about chunk size. Imagine a Q&A system based on a book about Steve Jobs, where each page is one chunk. Underneath, each page has an embedding. When we ask about Steve Jobs and Apple, we create an embedding of the question as well and search the vector space for Steve and Apple.

    Imagine the information about Apple is on pages one and two, but other pieces of information are spread across other pages. For example, one sentence on another page is about Steve and Apple and is very important, but it’s only one line on that page. In this case, splitting doesn’t work: you can fail to retrieve an important, fine-grained piece of knowledge that is spread across other pages. The underlying challenge is that splitting breaks the interrelationships between pieces of information as a whole. Our knowledge is more like a tree or a graph, with connections that are not linear, yet we split and structure it linearly. By nature, this loses the global context, and that’s one of the sources of hallucination.

    Another challenge is the embedding itself. Normally, we create vector representations with a general-purpose embedding model. It works in most cases, but it isn’t aware of our domain knowledge. Sometimes you recognize every word in a term, but unless you’re a domain expert, you don’t know what it actually means. The problem is that the embedding places information in the vector space based on literal, common-sense semantics.

    On the left is an example we encountered in the real world: a Q&A chatbot for an e-commerce use case. People ask about insulated cups, but in the embedding space, the system considers “insulated greenhouse” related, which is surprising at first. Technically speaking, the two keywords have a lot in common and even look close, right? But as humans, we know that when we ask about one of them, we don’t care about the other. That’s a typical situation where embedding causes hallucination in the retrieval phase. One way out is to fine-tune the embedding model to make it more aware of our real-world problem. But fine-tuning embeddings can be complex and costly, so we can partly mitigate the problem with a knowledge graph.

    When we build LLM-based applications, we are really dealing with knowledge. A knowledge graph helps because, by nature, it is a refined, fine-grained segmentation of your knowledge sources. It also preserves the interactions between entities. That helps mitigate the problems introduced by splitting. And if the knowledge graph is set up properly, its interconnections can also preserve domain knowledge. I’ll show an example demo later as well.

    Here is how we mitigate those problems while staying within the RAG, in-context learning, paradigm. During indexing, we not only create embeddings and a vector store of our data, we also extract knowledge from the raw unstructured data into a knowledge graph. At query time, in the retrieval phase, we not only find relevant chunks with vector search, we also extract the key entities and relationships from the task or question. We then look them up in the knowledge graph to get a subgraph of the whole graph. That subgraph serves as additional context for generating the final result.
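The graph side of this retrieval can be sketched in two steps: find the question’s entities, then collect every triple within a few hops of them. The triples are hand-written for illustration. The naive substring match stands in for the LLM-based entity extraction a real pipeline would use. This is not LlamaIndex’s implementation.

```python
from collections import defaultdict

# Hand-written toy triples: (subject, predicate, object).
TRIPLES = [
    ("Rocket", "is a member of", "Guardians of the Galaxy"),
    ("Rocket", "was friends with", "Lylla"),
    ("Lylla", "was experimented on by", "High Evolutionary"),
    ("Peter Quill", "leads", "Guardians of the Galaxy"),
]


def build_kg(triples):
    """Index triples by subject and object so we can expand from any entity."""
    kg = defaultdict(list)
    for s, p, o in triples:
        kg[s].append((s, p, o))
        kg[o].append((s, p, o))
    return kg


def extract_entities(question, kg):
    """Naive keyword match against known entities (stand-in for an LLM call)."""
    q = question.lower()
    return [e for e in kg if e.lower() in q]


def graph_retrieve(question, kg, depth=2):
    """Collect the subgraph (set of triples) within `depth` hops of the question's entities."""
    frontier = set(extract_entities(question, kg))
    seen_entities = set(frontier)
    subgraph = set()
    for _ in range(depth):
        nxt = set()
        for entity in frontier:
            for s, p, o in kg[entity]:
                subgraph.add((s, p, o))
                for e in (s, o):
                    if e not in seen_entities:
                        seen_entities.add(e)
                        nxt.add(e)
        frontier = nxt
    return sorted(subgraph)


KG = build_kg(TRIPLES)
print(graph_retrieve("Who is Lylla?", KG, depth=1))
# [('Lylla', 'was experimented on by', 'High Evolutionary'), ('Rocket', 'was friends with', 'Lylla')]
```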

    In this demo, the data source is a Wikipedia page about Guardians of the Galaxy Vol. 3, my favorite movie this year. We build a knowledge graph from it and ask about, say, Rocket or Lylla. This is part of the retrieved knowledge; underneath, it’s a subgraph. We’ll see a visual version later to give you a better idea. Then we combine all that information to decide how to answer the question.

    This is the simple version of Graph RAG. But in the real world, we find that pure Graph RAG doesn’t work well, because a knowledge graph is not the only way to persist knowledge. Knowledge density is not always that high, and the large chunks of unstructured data contain details we care about as well. So the best way is to combine the two, and I’ll demonstrate how that performs shortly. Here is Graph RAG in a visualized way. When we retrieve related knowledge from the graph, we get a subgraph like this. We then send it to the LLM, which bases its answer on that subgraph.

    Another demo shows how we evaluated Graph RAG together with vector retrieval. The data source is a Wikipedia page related to NASA science, and we compare three types of retrieval followed by generation. The first column uses pure vector search, the embedding approach, and its answer is rich and correct. The third column uses the pure graph. As I mentioned, its information is not as rich, but it is also correct. When we combine the two, we find something interesting. Recall the question about Steve and Apple: knowledge about Apple can be spread out in a fine-grained way. In this example, the line in bold is a piece of knowledge that vector retrieval did not return but that came from the graph. So combining the two gives better answer quality.
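Combining the two retrievers can be as simple as putting both kinds of context into one prompt. This sketch assumes a hypothetical prompt layout. In practice, the chunks would come from vector search and the triples from the knowledge-graph subgraph.

```python
def build_prompt(question, vector_chunks, kg_triples):
    """Merge vector-search chunks and knowledge-graph triples into one prompt.
    The layout and wording are a hypothetical example."""
    lines = ["Context from documents:"]
    lines += [f"- {chunk}" for chunk in dict.fromkeys(vector_chunks)]  # dedupe, keep order
    lines.append("Context from the knowledge graph:")
    lines += [f"- {s} {p} {o}" for s, p, o in dict.fromkeys(kg_triples)]
    lines.append(f"Question: {question}")
    lines.append("Answer using only the context above.")
    return "\n".join(lines)


print(build_prompt(
    "Who is Lylla?",
    ["Lylla is an otter who appears in the film."],  # hypothetical vector hit
    [("Rocket", "was friends with", "Lylla")],         # hypothetical graph hit
))
```

Keeping the two sources in separate sections lets the graph facts fill gaps that the chunks miss, as in the bold line of the demo.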

    The final example is about hallucination that comes from the embedding by nature. On a similar dataset about Guardians of the Galaxy, we mocked up a question about something that does not exist in the movie but looks as if it did. When we answer this long question with retrieval from the vector database, the result is a complete hallucination. But retrieval from the knowledge graph strictly tells us that nothing related to this question exists. So we set up a cross-check mechanism that runs both retrievals in parallel. When only one of them returns results and the other is empty, we trigger a double-check process. That way, we can mitigate this kind of hallucination.
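The cross-check mechanism can be sketched as running both retrievers in parallel and comparing whether each returned anything. The decision labels are our own assumptions, and so is the policy for the case where both sides are empty. The talk only describes triggering a double-check when exactly one side is empty.

```python
from concurrent.futures import ThreadPoolExecutor


def cross_check(vector_hits, graph_hits):
    """Hypothetical policy for comparing the two retrievers' results."""
    if vector_hits and graph_hits:
        return "answer"        # both found context: answer normally
    if vector_hits or graph_hits:
        return "double_check"  # only one found context: verify before answering
    return "no_context"        # neither found anything: say so instead of guessing


def retrieve_both(question, vector_retriever, graph_retriever):
    """Run both retrievers in parallel, then cross-check their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        vector_future = pool.submit(vector_retriever, question)
        graph_future = pool.submit(graph_retriever, question)
        vector_hits, graph_hits = vector_future.result(), graph_future.result()
    return vector_hits, graph_hits, cross_check(vector_hits, graph_hits)


# A made-up question: vector search still returns loosely similar chunks,
# while the knowledge graph finds no matching entities.
result = retrieve_both(
    "A made-up question about the movie",
    vector_retriever=lambda q: ["a loosely similar chunk"],
    graph_retriever=lambda q: [],
)
print(result[2])  # double_check
```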

    Everything I mentioned above can be reproduced locally, and I’ve upstreamed this approach to the open source community. With LlamaIndex, you can do Graph RAG in just three lines of code. The first line is for indexing: it creates a knowledge graph and vector embeddings from your documents, and many data formats are supported. The second line creates a query engine on top of that to do Graph RAG. The third line is where you actually ask a question.

    Docker extension

    The final topic is Docker. The database itself is a distributed system with multiple components, and that’s only for graph queries. We have other projects, such as one based on Spark GraphX, which is yet another distributed system. We want to have everything up and ready for our data scientists. And in the graph case, many users of graph databases aren’t even computer scientists. They could be risk experts or supply chain experts who still want to use a graph database. It’s really hard for them to use the command line or Docker Compose on their Windows or macOS desktops.

    That’s where the Docker extension helps. I see Docker Extensions as the next step in what the Docker team has been doing all along. They wrapped cgroups, namespaces, and everything else in a really good abstraction, so we can enjoy those technologies in an elegant way. With a Docker extension, you can also run server-side, Linux-based, distributed software through a graphical interface, without the command line.

    We can do that on any desktop operating system with Docker Desktop; that’s all you need. For non-technical users, it’s just another piece of software. In the Extensions Marketplace, you can search for Nebula and install the NebulaGraph extension. With just one click, you get many components, like the graphical UI and the graph query engine, all set up already. So in just five minutes, our community users, whatever their background, can easily leverage all the benefits and all the magic.

    I’m also the author of this extension, and I made an interesting hack in it to let optional workloads be installed. If you’re interested, check out the extension’s code repository as well.

    That’s everything I want to share today. To sum up: a graph is made of vertices and edges, and graph queries express graph-style pattern matching. Solving real-world problems with them calls for a dedicated graph database. NebulaGraph excels at graph data at scale, and graphs can help LLM applications when doing RAG. With the Docker extension, you can try everything in just five minutes. I’ve put some more information on this slide, so if you’re interested, feel free to talk to me about graphs and how they can help large language models.

    Do we have questions? Please.

    One doubt I have: it was able to answer simple queries, like telling me about Sebastian, some person. But for multi-hop queries, it was not that optimal. How do you plan to deal with that in the future?

    For now, we’re really just at the beginning. Are you using Graph RAG or Text2Cypher? (Graph RAG.) The current implementation is just the low-hanging fruit, and we still have a lot to improve, because it makes strong assumptions about the graph schema. I’m happy to talk with you in detail later. Thank you.

    Learn more

    This article contains the YouTube transcript of a presentation from DockerCon 2023. “Streamlining Distributed Graph-based LLM and AI Infrastructure with Docker Extension” was presented by Siwei Gu, Chief Evangelist, NebulaGraph.
