DockerCon

How to Quickly Build LangChain-based, Database-Backed GenAI Applications within Docker

Harrison Chase, Founder and CEO, LangChain

Michael Hunger, Head of Product Innovation & Developer Strategy, Neo4j

Recorded on October 20th, 2023
This presentation describes challenges and benefits of using large language models and describes new technology to help developers quickly set up and build LangChain-based, database-backed GenAI applications within Docker.

Transcript

Thank you for coming to this workshop where we’re going to be talking about how to build GenAI applications with the new GenAI stack that we put together. My name’s Harrison Chase, I’m the CEO and co-founder of LangChain. I’m Michael Hunger, head of product innovation at Neo4j. You might have seen us this morning in the keynote and what we want to do now is go a little bit deeper into the whys and hows, what’s behind the scenes, what the code actually looks like, and how you can get started.

This whole GenAI stack initiative has four partners: LangChain and Neo4j, orchestrated by Docker, with Ollama for the local LLMs. To give you some background, let’s talk about large language models (LLMs) and GenAI in general. Who has not yet used an LLM? One? Okay, so probably most of you have used ChatGPT in some form or another.

    Foundation models

    LLMs are part of a new type of foundation model that has been trained on a large amount of data to basically predict information. Traditionally, machine learning models were in the realm of very smart data scientists who trained models for very specific purposes. So, for instance, you had a legal model to analyze legal text, or you had an image recognition model to recognize certain parts or certain types of objects in an image. This meant you always had to fine-tune or train models for very specific purposes, and a lot of specific effort by data scientists was needed to make this happen.

    With foundation models, all this shifted, because the idea is basically: let’s just train models on a massive amount of data. And then they use a prediction mechanism to say, okay, based on this data, which is almost like world knowledge that these models learned, how can the inputs of the users, whether text or images or videos, be used, for instance, to generate information, detect things, extract information, and all the other things? And the really cool thing about this is that these models are now available to developers. You don’t need to be a data scientist anymore. We’ll talk a little bit about this later.

    And then in 2017, the transformer architecture from Google allowed us to train these kinds of models, and the training volume has gone up quite a bit. So these large models have now been trained on trillions of tokens of information. And you have models from every kind of provider: from Google, from OpenAI, from Amazon, Anthropic, and Meta, of course, with Llama 2. So there’s a lot of models out there, right? And what’s kind of interesting is that from a certain training size, these models suddenly started to show emergent behavior. Up to a certain level, you kind of could expect what they would produce, but beyond a certain training size there was a threshold where suddenly it felt like more, right? There was almost some level of understanding or some level of pseudo-human interaction possible, which is what we now see today in GPT-3 and GPT-4, for instance.

    What’s really interesting as well is that these models now can hold conversations. They can keep previous contexts. They can work with larger context windows and things like that, which allows you to build much more natural working applications on top of them. So why are they so hot right now?

    You can do a lot of stuff with LLMs. They’re more like a general-purpose tool, like a calculator or a computer in some way, right? So that doesn’t mean that an LLM will replace you, but it makes you faster, more productive, or smarter in some way as well. You can automate a lot of data-retrieval tasks with LLMs. You don’t need to go and train a specific model to extract information from things. You can improve end-user experiences both inside your organization and outside. For instance, if you have databases, in the past people had to get apps written or write SQL or whatever query language themselves to access the data. Now you can open this up to more people asking just natural language questions of your data, which is really, really nice.

    And, of course, we are all kind of drowning in information, and LLMs, or general foundation models, can help us make more sense of it or quickly summarize it. Of course, there’s a trap, right? On one side, LLMs are used to generate information, and on the other side they’re used to summarize it again, so you could also have just sent the condensed version of the information in the first place. So there are also a lot of traps in this general space. And especially for us as developers, code generation is really cool. Many of you might have used Copilot, but there are also many other tools, like Duet AI. There’s SQL generation from natural language or information generation from natural language. There’s a lot of stuff that we as developers can do better. I saw a really good article that said it basically enables me to do things that usually would have taken five days in two to three hours, and to build new things that have not been done before.

    Challenges

    There are some challenges with GenAI, right? One is hallucination, or parroting: the model just makes stuff up instead of saying, I don’t know. Reinforcement learning with human feedback basically made these models more like, “Oh, I always have to please the human,” right? So instead of saying, I don’t know, they very convincingly generate information, links, and other things, even if it’s all fake. That’s a big problem, because you don’t want to put such a system into your application or into your enterprise. You want to be able to trust these models.

    And then the training data cut-off is a big problem, because all these models are trained up to a certain date, and then what happens with all the information that comes later? Or how do you access your own private information that sits in your own databases with these models? That’s currently not really possible; you have some plugins for accessing the web and other things, but what happens with your databases and such, right? And then you have security, compliance, PII, and so on.

    They’re great at language understanding, and that’s what we mostly use them for. We basically just use their language skills, and we completely ignore all the training data they have been trained on. So, hallucinations again: very confidently making stuff up. It’s sometimes like me talking about stuff, right? Like imposter syndrome. I already mentioned most of these. There are also all the ethical aspects. Where did that data come from? Have people been compensated, for instance authors and artists, for using their data for training? How do people or large companies label their data for these models? There’s the whole labor and exploitation aspect. So there’s a lot of stuff. Prompt injection is a big topic that’s totally underserved.

    There are lots of challenges that need to be addressed one by one, but in general, I think in the end it will be more beneficial than negative. So I’m not on the negative side.

    With all these challenges with LLMs, how can we make things better? We have some options. You can take an existing model and fine-tune it, but that’s a lot of effort, and oftentimes the outputs and results are, at least today, not yet where we want them to be. You can provide a few examples when you talk to the LLM, but then you basically have to hard-code these examples, which is not really helpful. And then there’s grounding, or retrieval augmentation, which we’ll go into: you provide the LLM with the information that it should use to answer the user’s question, and you just use its language and summarization skills to get there. So there are really good opportunities for developers now.

    Interactions

    One of the most exciting things about this world that we live in now is that you don’t have to be a data scientist to interact with these really powerful machine learning models. A lot of the main ones that people are using are just behind APIs. So OpenAI is behind an API, Anthropic is behind an API, and so you can interact with them super easily. And then there’s also really exciting packages like Ollama, which we’ll be using, which are locally hosted models, but they’re bundled up super nicely. You no longer have to train the model; you just have to run it and serve it. And Ollama makes that really easy. So we’re seeing this kind of democratization of who can use these large language models.
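
    For example, here is a minimal sketch of calling a locally served model through LangChain’s Ollama integration. It assumes Ollama is already running on its default port and that the llama2 model has been pulled; the model name and prompt are just placeholders.

    ```python
    # Minimal sketch: query a locally served model via Ollama's LangChain
    # integration (assumes Ollama is running locally and "llama2" is pulled).
    from langchain.llms import Ollama

    llm = Ollama(model="llama2")
    print(llm("In one sentence, what is a container?"))
    ```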

    There’s work to be done as well. It’s not like it’s a finished project, where there’s nothing left to be done, and we’re all just going to sail off into the sunset. There’s a lot of work that needs to be done to use these language models. So prompting is an entirely new kind of discipline that’s emerged. These language models generally take text as input and output text. And the output text is often either a response to a user or something that says how to do something downstream. And so it’s really, really important to be able to get the output text that you want, that yields the best result. And the way to do that is through prompting, where you carefully construct the string that you’re going to pass into the language model.

    Other things besides the prompt also affect the language model. These models have settings like temperature, which affects how randomly they respond. If you want a really deterministic output, or pretty close to a deterministic output, you set the temperature really low. If you want a more creative and variable response, you set it really high. And there’s no one setting that’s right all the time; they’re useful for different applications. Some of the creative chat bots that we see out there probably have a pretty high temperature. Some of the more personal assistants that really want to do particular things really well have a low temperature. So there’s a lot of techniques and tricks that developers can learn and become experts in to best use these models.
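
    To make that concrete, here is a small sketch of the temperature trade-off using LangChain’s OpenAI chat model; the model choice and prompts are illustrative, and an OPENAI_API_KEY is assumed to be set in the environment.

    ```python
    # Low temperature: near-deterministic answers, good for assistants.
    # High temperature: more varied output, good for creative chat.
    from langchain.chat_models import ChatOpenAI

    precise_llm = ChatOpenAI(temperature=0)
    creative_llm = ChatOpenAI(temperature=0.9)

    print(precise_llm.predict("Name the capital of France."))
    print(creative_llm.predict("Suggest three playful names for a chatbot."))
    ```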

    There’s also the whole matter of using the output of the language model, and that’s where LangChain comes into play a little bit. The idea is the output may be text, but it often has more structure than just text. So you want to make sure that it’s structured in the right way so that you can parse it, process it, and maybe use it as input to an API or to run a SQL command, or things like that. So there’s a lot of work to be done in hooking these language models up to other sources of data and computation.
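
    As a rough illustration of that parsing step, this sketch asks the model for a comma-separated list and turns it into a Python list with one of LangChain’s built-in output parsers; the prompt and model here are assumptions for the example.

    ```python
    # Ask for structured (comma-separated) output and parse it into a list
    # that downstream code can consume.
    from langchain.chat_models import ChatOpenAI
    from langchain.output_parsers import CommaSeparatedListOutputParser
    from langchain.prompts import PromptTemplate

    parser = CommaSeparatedListOutputParser()
    prompt = PromptTemplate(
        template="List three {subject}.\n{format_instructions}",
        input_variables=["subject"],
        partial_variables={"format_instructions": parser.get_format_instructions()},
    )

    llm = ChatOpenAI(temperature=0)
    raw = llm.predict(prompt.format(subject="container orchestration tools"))
    print(parser.parse(raw))  # a Python list, ready to feed into an API or query
    ```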

    I think as we talked about before, there’s a wide variety of foundation models available. You’ve got the major cloud providers all jumping in. You’ve got local models via Ollama, which anyone can run on their own computer. And then there’s also AI-native startups like OpenAI and Anthropic, which are, like, at the base layer of a lot of the applications. One level up, you’ve got orchestration frameworks like LangChain, LlamaIndex, and HumanLoop, which help orchestrate a lot of the language models and connect them to other sources of data and computation. And we’ll show more of what that looks like in the RAG app.

    You’ve then got infrastructure, which kind of encompasses all of this. There’s deployment, monitoring, eval, tooling. A lot of this is similar to software engineering, but there are also some differences in the LLM-specific tooling that’s starting to come out. LangSmith, which we showed, is one example of that, but there are a lot of others that are really focused on monitoring the API responses. Some are more focused on the ML side of things, helping you drill down into what people are actually using your chatbot for. There’s a lot of new tooling emerging in this infrastructure category.

    And the last thing that’s proven really important, as Michael said, in order to get really good responses, you generally want to have some sort of database that you’re connected to and is powering the actual facts of the application. So vector search and vector databases have emerged as this new form that’s proven really powerful to connect to language models, but there’s also all the existing databases like SQL and graph databases, which you can still use to provide this context.

    So looking at this in a little bit more broken-down picture, you can see this is from a16z; it’s from a few months ago. That’s why it says partial next to it, because the space changes really, really fast. You can see that there’s this orchestration layer right in the middle, and that’s kind of where LangChain sits. But then there are all these other pieces around it. You’ve got the playground, you’ve got APIs and plugins. At the top level, you have the contextual data, so data pipelines, embedding models, vector databases. And if we were to extend that today, we’d also put other databases, like graph databases and SQL, there. Then you’ve got all the infrastructure at the bottom with the LLM cache, logging, LLM Ops, validation. And you can even zoom into that more. And then you finally get to the app hosting.

    A lot of what we’re showing today looks like a pretty simple app at the surface, but we’ll show the code, and we’ll show a lot of these different components behind the scenes. Part of the value proposition of the GenAI stack is to simplify this a bit and give you a really easy way to get started with it.

    LangChain

    We’ve talked about LangChain a little bit so far. Just to dive into that a little bit more, LangChain is a GenAI orchestration framework. It connects to all these different modules that you see up here: tools, example selectors, prompts, vector stores, document loaders, output parsers, text splitters, models, and probably other ones as well that we’re not mentioning here. All of these are different modules in LangChain. And applications generally consist of four or five or six different modules constructed in various ways. So we provide kind of a standard interface for all those modules.

    We have a standard interface for, I think, over 50 different LLM providers. And then we also have implementations of some of those modules that we do ourselves. Text splitters, for example, are one implementation that’s in LangChain, kind of native to LangChain. The other value prop of LangChain is prebuilt chains and agents. As I said, a lot of these applications are these modules constructed in particular ways. We have a lot of off-the-shelf chains and agents that are basically prebuilt templates to accomplish a variety of tasks. And we’ll be using one of them today to do retrieval, the QA-with-sources chain, and we can dive into what exactly is going on behind the scenes there. I’m going to hand it back to Michael to talk more about RAG.

    RAG

    RAG is all the rage today. As I mentioned, it’s about backing, or feeding, information from databases or data sources to language models to help them. This is currently the best way of approaching the hallucination and knowledge cut-off problems. RAG is an acronym that stands for retrieval augmented generation. There was a paper a year or two ago where they explored this, and it really stuck because people saw the value of it. Basically, you ignore the LLM’s training data, and you just use its language skills. You take the user question. You turn the user question either into a vector embedding or into a database query, then send it to your database or data source. You extract the relevant information for this question, not all the information in your database, because that’s too much, of course, right? But the relevant information for this question. And then you send the question together with this information to the LLM to produce the answer for the person. So that’s the approach that you also see here in this pattern.
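
    The flow described above can be sketched in a few lines; the helper names (embed, vector_search, expand_context, chat) are hypothetical stand-ins for the embedding model, the database query, and the chat model.

    ```python
    # Hypothetical sketch of the RAG flow described above.
    def answer_with_rag(question: str) -> str:
        query_vector = embed(question)            # 1. embed the user question
        hits = vector_search(query_vector, k=4)   # 2. find the most relevant entries
        context = expand_context(hits)            # 3. keep only what is relevant to this question
        prompt = (
            "Answer the question using only the context below.\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )
        return chat(prompt)                       # 4. let the LLM phrase the grounded answer
    ```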

    Sometimes you can even decide, actually do I need to get to the database to get some information? For instance, if you just have a creative task, like, you know, generate me some ideas for my next project or something like that. For creative tasks, you can also go directly to the LLM, but for everything where you actually want to access your internal data, then you would go to the database and do this.

    So, the GenAI stack: As we showed this morning as well in the keynote, there are not just the components that we talked about, but also a number of example applications already part of the stack. We’re going to add more to it, and we, of course, are interested in your feedback. So if you have ideas or questions, feel free to open GitHub issues or send pull requests to the repository as well.

    The first app that we have in there is the knowledge graph importer, which basically fetches data from an API, creates the database, generates the embeddings, and stores them in the vector index, and so makes the data available for the other apps to use.

    The second one that we showed this morning was the chatbot that allows you to either do direct LLM interactions or use RAG with the database. We also have an interactive app that says: hey, if we didn’t have the information in our knowledge base to answer this question, can you take the best-rated questions in our knowledge base and generate a new ticket for an internal support team to answer it? It takes the input from the user question, plus the tone, style, and type of the highest-rated questions in our knowledge base, and then generates a new question in the same tone and style.

    Then, the fourth app that we have in there is basically chat with your PDF. You upload a PDF, it’s turned into chunks and then into vectors, and then you can use our chain and the Streamlit UI to chat with the PDF or ask it questions about this PDF. This is also a common example of how you make documents accessible that are not in databases as such. And, again, these are building blocks for the GenAI stack, as we mentioned several times already. And it’s all pulled together by Docker Compose.
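
    A rough sketch of that PDF ingestion step might look like the following; PyPDF2, the chunk sizes, and the file name are assumptions for illustration, not necessarily what the stack’s PDF bot uses.

    ```python
    # Read a PDF and split it into overlapping chunks, ready to be embedded
    # and stored in the vector index.
    from PyPDF2 import PdfReader
    from langchain.text_splitter import RecursiveCharacterTextSplitter

    reader = PdfReader("einstein_patents.pdf")
    text = "\n".join(page.extract_text() for page in reader.pages)

    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_text(text)
    # Each chunk would then be embedded and written to the vector index,
    # e.g. with a vector store's from_texts(...) helper.
    ```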

    There are brand-new base images for Ollama, which notably also work on Linux out of the box. The LangChain-based images are official images as well, and then there’s the Neo4j official image, which has been around for quite some time. They are all pulled together by the Docker Compose setup, which basically orchestrates and configures all the environment variables, such as which model to use, the LLM connection information, or what kind of internal information you also want to put in.

    So if you look at the flow in the app, this is what we’ve talked about in the containers, right? You basically take a user question in the Streamlit app — we’re using Streamlit as the UI framework today because, for Python developers, that’s the best or quickest way to get to prototypical UIs. You don’t have to have an API and a JavaScript front end; you can basically just use Python directly to run your application. So you take the user question, you go to the embedding model to generate a vector embedding for the user question. For instance, you can use sentence transformers, but also Llama 2, or others. You get the embedding, you go to the database, you do the vector search, and then, because it’s a graph database, you take the top-k elements from the vector search and expand the context.

    So what else is related to this question, right? What are the top rated answers, which other questions are related to this through tags and other things as well? Then you take this information and send it to the chat model — which is a different model than the embedding model — through LangChain and get the answer back, which you then render in the app as such. Let’s go into that and show the demo, I’ll start and then Harrison will look at the second part.
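
    A hedged sketch of that retrieval side, using LangChain’s Neo4j vector store with a graph-expansion query, could look like this; the connection details, index name, embedding model, and the Cypher in retrieval_query are illustrative assumptions rather than the stack’s actual schema.

    ```python
    # Vector search in Neo4j plus graph expansion to the top-rated answers.
    from langchain.embeddings import SentenceTransformerEmbeddings
    from langchain.vectorstores import Neo4jVector

    embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

    store = Neo4jVector.from_existing_index(
        embedding=embeddings,
        url="bolt://localhost:7687",          # assumed connection details
        username="neo4j",
        password="password",
        index_name="stackoverflow",           # assumed index name
        retrieval_query="""
        // illustrative Cypher: expand from the matched question to its answers
        MATCH (node)<-[:ANSWERS]-(answer:Answer)
        WITH node, score, answer ORDER BY answer.score DESC LIMIT 2
        RETURN node.body AS text, score,
               {source: node.link, answer: answer.body} AS metadata
        """,
    )

    docs = store.similarity_search("How do I summarize PDFs using LangChain?", k=2)
    ```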

    Demo

    As we showed you this morning, you can basically run a “docker compose up”. You can also look at the compose file while it’s doing things here. The compose file has a bunch of components in it. We have the LLM image for the local Ollama, managing the local LLMs, and a separate pull-model image, which pulls the model from the Ollama registry. So, depending on which model you want to use, you would pull this model. These models can be pretty big: the small Llama 2 model, 7 billion parameters, is about 4 gigabytes, the 13-billion one is 10 gigabytes, and the 70-billion one is 30 gigabytes or something like that. That’s a lot of data, but Ollama also caches it locally, so you don’t have to re-download it. If you pull it once, then it’s on your machine, and you can run it. Then the Neo4j database is in here, and then we have our different apps: the loader, the bot, and the PDF bot; those are our images, exposing different ports on this machine. And the configuration then passes in either information for the local models, or you can also put in your OpenAI key and then use GPT-3 or GPT-4 as models as well.
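
    Inside the apps, the model choice can then be read from those environment variables. This is a hedged sketch, and the variable names (LLM, OPENAI_API_KEY, OLLAMA_BASE_URL) are assumptions for illustration.

    ```python
    # Pick the chat model based on environment variables set by Docker Compose.
    import os
    from langchain.chat_models import ChatOllama, ChatOpenAI

    llm_name = os.environ.get("LLM", "llama2")  # assumed variable name

    if llm_name.startswith("gpt"):
        chat_model = ChatOpenAI(model_name=llm_name, temperature=0,
                                openai_api_key=os.environ["OPENAI_API_KEY"])
    else:
        chat_model = ChatOllama(model=llm_name, temperature=0,
                                base_url=os.environ.get("OLLAMA_BASE_URL",
                                                        "http://llm:11434"))
    ```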

    Meanwhile our “docker compose down” was successful, and if I start again, this basically pulls up the LLM, the database, and the apps as well. While it’s doing this, I can show you the UI. We have our loader here, which should be running right now, which basically means you can pick any Stack Overflow tag that you’re interested in.

    For instance, imagine we want to add Docker questions to our database. It fetches the last n pages of questions — all the questions, all the answers, all the users, all the tags — computes the vector embeddings, and stores everything. So this should take roughly 10 to 20 seconds to fetch the last 100 questions on Stack Overflow about Docker. So if you asked or answered a question in the last few days, then your question will probably be in here. It adds it all to the graph database. You see the import was successful. You can jump directly from the link here to the database, and you see this data model with questions, answers, users, and tags. And if you go into the database, we can explore this data and navigate through it. You will see in purple the questions, the blue ones are the answers, orange the tags, and then I can say I want to expand, for instance, this tag, and then I see all these other questions in here as well, like “I create a database” and so on. And all of them also have vector embeddings, so when I look at one of these questions, you see here you have the question body and here you have the embedding data directly attached to your entity, which is then also stored in the vector index for the search.
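
    For a sense of what that loader does, here is a hedged sketch: fetch recent questions for a tag from the Stack Exchange API and compute embeddings for them. The request parameters and embedding model are assumptions, and writing the nodes plus embeddings to Neo4j is only indicated in a comment.

    ```python
    # Fetch the latest Docker-tagged questions and embed their bodies.
    import requests
    from langchain.embeddings import SentenceTransformerEmbeddings

    resp = requests.get(
        "https://api.stackexchange.com/2.3/questions",
        params={"tagged": "docker", "site": "stackoverflow",
                "pagesize": 100, "filter": "withbody"},
    )
    questions = resp.json()["items"]

    embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
    vectors = embeddings.embed_documents([q["body"] for q in questions])
    # Each question (with its answers, users, and tags) would then be written
    # to Neo4j, with the embedding stored as a property covered by the vector index.
    ```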

    And so that’s the database and how you can populate it. It uses just a Stack Overflow API call to fetch the data. But you can imagine this data can come from anywhere: from any API, from any data source, you name it. And then there’s the bot. In the first mode, when RAG is disabled, it talks to the LLM — whichever LLM you have configured — directly, and then you can ask, for instance, how do I use Docker Scout, or something like that. And then it basically gives you something, but I don’t know; I think it’s Llama 2, it might actually have information about this. But let’s use the question that we had this morning: how do I summarize PDFs using LangChain? Then it basically says it doesn’t know anything about LangChain, because the knowledge cut-off was before LangChain was released last November, right? Yeah, it thinks it’s blockchain. Oh yeah. So, then you can compare this with the same question going to the RAG store. It basically does what I explained before: it takes your question, turns it into a vector, goes to the vector index, searches the top-k documents or elements in the graph database, then goes from these questions to the most highly rated or accepted answers, and passes those questions plus these answers to the LLM to answer our question here.

    There’s a lot of stuff happening behind the scenes, and we’ll look into the code as well. This answer is actually taken from the Stack Overflow articles that are in the database, and one really nice aspect of the RAG pattern is that you get referenceable, verifiable sources back with your answers. Right. So it’s not just that it generates an answer: if you pass into the LLM the links or URLs or other kinds of sources that back your text chunks or your information, then you can prompt the LLM to use those as the source links when it outputs or summarizes the answer. So you can then use those to go into the details. Then I can just click on one of these links and get my information here.

    And the last application is, as I mentioned, a PDF chatbot. But first: if I’m not happy with the answer from the LLM, then I can say generate a draft ticket for me — that’s the other app that I mentioned before, which takes the top-rated questions from the database and generates a new ticket in the style of those questions.

    Here we basically uploaded Einstein’s patents and inventions, and we can say “list all inventions” in the file. It takes the text chunks that were extracted out of the PDF, fetches the most relevant ones, summarizes them, and then returns the answer. So that’s the outside view; all these apps are Streamlit apps. We’re going to add more languages and apps that have APIs and a JavaScript front end as well, so this is only the starting point. And you see from the PDF that Einstein decided that a blouse was a practical invention rather than a scientific one, which is also in the PDF as well.

    If you want to look into the code for all of these, that is probably also something that Harrison can talk about a little bit more. You have functions that, for instance, only use the LLM. This one has a prompt that says you’re a helpful assistant to answer these questions, and then uses prompt templates to fill in the human question and the system message. And then it uses streaming output: like ChatGPT, it generates the answer as a stream, which is why you have this callback here. Then it invokes the chain with the user input and the callbacks, and starts generating the answer.
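
    That LLM-only path might look roughly like this; ChatOpenAI is used here for brevity, and the prompt text and question are placeholders. In the stack, the configured model (for example, one served by Ollama) is swapped in through the same interface.

    ```python
    # System + human prompt template, streaming callback, chain invoked with input.
    from langchain.chains import LLMChain
    from langchain.chat_models import ChatOpenAI
    from langchain.prompts import ChatPromptTemplate
    from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant that answers programming questions."),
        ("human", "{question}"),
    ])

    llm = ChatOpenAI(temperature=0, streaming=True)
    chain = LLMChain(llm=llm, prompt=prompt)

    # The callback prints tokens as they are generated, ChatGPT-style.
    chain.run(question="How do I use Docker Scout?",
              callbacks=[StreamingStdOutCallbackHandler()])
    ```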

    That’s basically all the code you need to do an LLM-only chat. If you want to do the RAG version, then we have a more involved piece of code here; the prompts are a little bit more involved. We explain what you’re seeing, that you have sources and links, and then here in this section will be the summaries from the database query. The user question gets passed into the user template. It also says each answer should have a link as well. And we also say: if you don’t know, then don’t try to answer; just say that you don’t know.

    This is then combined into a chat prompt from the system and human templates, and then we get to the vector and knowledge graph integration: from LangChain’s vector store package you can pull in the Neo4j vector store. You can also pull in other vector stores if you want to. You say, basically, this is my vector index, this is the field to look at, and then you can provide this graph query, which then goes from the question and fetches related information to pass back to the LLM. Then you combine all this in the retrieval QA chain that you configure with the vector store as retriever. I want to have the top two elements. You can increase this, but then the amount of text that gets passed to the LLM gets bigger, and there are some effects of that. One is that it becomes more probabilistic, because it starts, for instance, to ignore stuff in the middle of the prompt, and things like that. So, you know, two to five, depending on the volume of text that you have, is usually good, and then this is your QA chain that is used to answer the question.
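
    Putting that together, a minimal sketch of the retrieval QA chain with sources, limited to the top two hits, could look like this; it assumes `store` is the Neo4j vector store and `llm` is the chat model from the sketches above.

    ```python
    # Wire the vector-backed retriever into the prebuilt QA-with-sources chain.
    from langchain.chains import RetrievalQAWithSourcesChain

    qa_chain = RetrievalQAWithSourcesChain.from_chain_type(
        llm=llm,
        chain_type="stuff",  # stuff the retrieved context into a single prompt
        retriever=store.as_retriever(search_kwargs={"k": 2}),
    )

    result = qa_chain({"question": "How do I summarize PDFs using LangChain?"})
    print(result["answer"])
    print(result["sources"])
    ```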

    So that’s kind of the code behind the scenes, and Harrison will look a little bit into how we can figure out what’s actually happening while this code is running.

    Yeah, absolutely. So the last thing that Michael showed was this retrieval QA with sources chain, which is a class in LangChain that contains a lot of logic for doing this exact RAG-based application. That’s great because it means it’s easy to get started with just that class, but it’s also a little bit unideal because that class actually contains a decent amount of logic inside it. And that’s actually one of the simpler classes in LangChain; some of the more complex ones have even more logic inside them.

    LangSmith

    As you’re building your application, it’s great to get started with a few lines of code, but it’s really important to understand what’s going on under the hood so that you can debug it and improve it and really bring it from a prototype to production. To help with that, we’ve created LangSmith, which is a debugger. It does a lot of things, but the main thing that people use it for is debugging and observability to see exactly what’s going on under the hood.
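
    Enabling it is mostly a matter of setting a few environment variables before the application makes its LangChain calls; the project name below is an arbitrary label, and the API key placeholder is yours to fill in.

    ```python
    # Turn on LangSmith tracing for everything this process runs through LangChain.
    import os

    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
    os.environ["LANGCHAIN_API_KEY"] = "<your LangSmith API key>"
    os.environ["LANGCHAIN_PROJECT"] = "genai-stack-demo"  # arbitrary project label

    # Chains and LLM calls made after this point show up in LangSmith with
    # their inputs, outputs, and timings.
    ```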

    If we look at LangSmith here, we can see there are actually three different types of sequences. This is the first one: just the first LLM call without any RAG, a really simple prompt plus LLM, so it’s pretty straightforward. Just to use this as an example to show off some of the cool things that LangSmith enables, you can see exactly what the inputs and outputs of each step are. You can see any metadata associated with it. And then when you go into the LLM call in particular, you can actually open up a little playground and mess around with it. You can see the settings over there, change the inputs, rerun it, get an output, and basically debug it. This is really helpful when you have a sequence of multiple LLM calls. If you want to try to debug, say, the third one in that sequence, you’d need to recreate the exact state that it was in, the exact input variables, and the exact prompt template. So, rather than trying to do all of that independently in your app, you can just easily open up the playground here.

    Going back, we see that there are two more types of chains. These are actually very similar chains, just with different prompts. If we look at this one, the retrieval QA with sources chain, we can see the sequence of steps. There’s first this retriever step, and then there’s this StuffDocumentsChain. If we look at the retriever step, we can see that what it’s doing is taking in a query and returning a list of documents. Here we can inspect exactly which documents we’re getting back, and we can see them all here. The StuffDocumentsChain then takes these inputs: it’s got the question, it’s got the chat history, and now it has these input documents, which are coming from the previous retriever step.

    And then under the hood, what it’s doing is passing all of those into a final call to the language model. So if we open this up in the playground again, we can now see the system prompt, which is what Michael was pointing out earlier: use the following pieces of context to answer the question at the end; the context contains questions, blah, blah, blah. We get down here, and we now see the content pasted in. And then all the way at the end, we see the user question: how do I summarize PDFs using LangChain? So we can see exactly what this prompt looks like by the time the retrieval step has been done.

    If we go back here, we can see that this chain looks basically the same. So the main difference is the prompt that’s being passed in, and this is why the prompts were so important earlier, because they tell the language model how to perform. If you remember, the difference between this result, where we get the references, and this result, where we don’t, is only the prompts. One prompt is telling it to generate a list of sources with its answer; the other one is not asking for that. So that shows a lot of the flexibility that you can have with this type of prompting strategy.

    Application details

    I believe that’s what I wanted to show in here, so we’ll go back to the slides for now. I think the next few slides all walk through the details of this application. We’ll share these so you can see more detail. We’re using Docker Compose for a lot of the orchestration; we’ve got the various containers that all get pulled up. Ollama — we’re using their local LLM. And these are the other LLMs that we could sub in should we want to. All of these have integrations with LangChain with the same exact interface, so it’s very easy to swap them in and out and experiment.

    I believe there was someone during the keynote who was talking about the importance of basically this rapid experimentation phase to figure out what you or your company is going to build for GenAI and actually have product market fit with that. So that’s part of the value prop of LangChain as well as this rapid experimentation. Neo4j obviously as the knowledge graph here. And I think one really interesting thing, one big benefit of Neo4j in my mind is that visualization of all the different concepts that you’ve got going on. Again, connecting it to your data. A lot of it’s throwing the data into a vector store. It’s really, really nice to be able to visualize what exactly is in your vector store the same way that it’s really nice to be able to visualize what’s going on in the chain.

    And I think one theme of all these LLM applications is that as you add this like stochastic nature into it, observability becomes really, really important. That’s a big bonus of this GenAI stack that we’ve put together and so is the Neo4j stuff used to ground LLMs in the knowledge graph. We can see this import step. The indexing step is really complicated as well. So that’s probably a separate talk by itself. There’s a lot of different options that go on behind the scenes in terms of how you pre-process this text, split it into chunks, put it exactly into the vector databases with the different tags. So that you can get this nice visualization right here.

    The LangChain part, again, is these different components and the ways of constructing the app. And then you see a few of the applications that were created. First was this support agent bot, which answers based on the question. It’s written in Python; the UI is in Streamlit. It uses the LLM chain, prompts, the embedding model and LLM, and then Neo4j underneath. Specifically, you’ve got this user question that comes into a QA chain, which configures the prompts to provide an output, lists out the sources, and then renders it in the UI.

    More details here on what specifically is going on with the Neo4j vector store: you embed the question as a vector, you pass it in, you find documents relevant to that vector, and then pass them back. This is the workflow that Michael showed off as well, where you generate a ticket if there’s no answer in the knowledge base; you can turn the user question into a ticket. And this is a really interesting point, because I think one of the big things that people are still trying to figure out, and why it’s a really good time to be a developer right now, is what the right UX is for all these GenAI applications. They can be really good, and they can be really powerful, but they’re not perfect. So how do you both communicate that and also make up for it with things that go into the UX? This is a bit application-specific, which again is why it’s a great time to be building applications in GenAI right now.

    This is a screenshot from the LangChain documentation that shows off some of the use cases that we have, so if any of these spark your interest, be sure to go check them out. There are a lot of different use cases, and there are also a lot of different integrations, which are on the other side. You can see that we have all the main cloud providers, but then also all those different components; there are 50 to 100 under each of those for the different types of tools, retrievers, and LLMs that you can pull in. LangSmith, as we showed, is kind of the observability step for your LLM applications: what is the sequence of steps, and what are the inputs and outputs of each step? Right now, it’s in a private beta, but we’ve made a code available for everyone here today, so we’d love for you to try it out, and that QR code should get you into LangSmith right there. If you missed it, come see me afterwards; I’m happy to get you set up. That enables views like this, where you can see the exact trace and the exact inputs. Michael is going to talk a little bit now about how you can access all this stuff that we’ve been talking about.

    Get up and running

    So where can you get it? The easiest way is just to open the Learning Center in Docker Desktop, and in the middle you should see it under AI and ML guides — the GenAI stack. You click on the link, and it gets you to the Git repository that you can clone and download. Run “docker compose up” and you should be up and running. If you want to add other things, or if you want to change the types of models that it uses, it’s all in the configuration. Stay tuned; there’s more to come there.

    Please feel free to give us feedback. Our own developer conference is going on on the 26th of October, so if you want to learn more about this graph way of seeing things and connecting information, come by. We have 24 hours of content around all time zones with more than a hundred talks. ML/AI is a big aspect of that, including powerful visualizations and so on. Thank you so much.

    Q&A

    We can also do questions right now, but if you don’t have a microphone or something, you have to probably yell at us, so feel free to.

    So, yes, a graph database is first of all a general-purpose transactional database that stores information as entities and relationships. So, unlike in relational databases, where you just have tables, you have entities as one type of object and relationships as another one.

    A vector database, or more generally a vector index, is an index over multi-dimensional floating-point encodings of the essence of information. A vector encoding, or vector embedding, captures something like the essence of a sentence, or the essence of a picture, word, or video, in a multi-dimensional vector space. And these vector indexes or vector databases basically allow you to find the closest matches for a given vector.

    Like, you ask a question: what are the most similar or closest other vectors in my database that already exist, by angular (cosine) distance or Euclidean distance? Vector databases are basically just very fast algorithms that take large volumes of these preexisting vectors and find the closest ones to your input. There are dedicated vector databases out there, for instance Pinecone, but many other databases like Neo4j, MongoDB, Postgres, and others have added vector indexes to their regular databases, so you get the benefit of having a regular database together with the vector search. So, if you have a very narrow use case where you just want to do vector search and nothing else, a dedicated vector database is good, but if you have all your existing data in your database, then you can use the vector search capabilities of that database.
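
    As a toy illustration of that closeness idea, here is cosine similarity between a question vector and two candidate vectors; real embeddings have hundreds or thousands of dimensions, and the numbers below are made up.

    ```python
    # Cosine similarity: close to 1.0 means similar direction, lower means less related.
    import numpy as np

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    question = np.array([0.9, 0.1, 0.3])
    doc_about_docker = np.array([0.8, 0.2, 0.4])
    doc_about_cooking = np.array([0.1, 0.9, 0.2])

    print(cosine_similarity(question, doc_about_docker))   # high -> similar meaning
    print(cosine_similarity(question, doc_about_cooking))  # lower -> less related
    ```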

    So, can you use this to generate prompts for what you want to do with a vector database? I read about this concept of asking the AI to create prompts for you. In that case, you have an idea or something like that, and it can assist you in writing the prompt.

    There’s been a few research papers in that vein of taking a prompt and asking the LLM to improve it. I think we have implementations of one. There’s one called APE (automatic prompt engineer) that does that. I’m actually a little bit bearish on that use case, because I think a lot of the times when LLMs get things wrong, it’s not that there’s like extra space where there shouldn’t be a space or things. Like you look at some MidJourney prompts, and they have these like random characters, right? And there’s a kind of art that goes into figuring out what sequence of words to put next to each other.

    I don’t think LLMs are quite like that, and I think a lot of the time when they mess up, they’re just lacking the correct context to answer — like they don’t have the data that they need to be grounded, or the instructions just aren’t clear. And if the instructions aren’t clear, tweaking the wording doesn’t matter much. There’s one thing that I am more bullish on, and that’s the idea of using an LLM to suggest ways that a human could bring more to the prompt — not necessarily doing it itself, because I think it’s hard for an LLM to automatically bring that information in, but having it say what things aren’t clear.

    Thank you.

    Learn more

    This article contains the YouTube transcript of a presentation from DockerCon 2023. “How to Quickly Build LangChain-based, Database-Backed GenAI Applications within Docker” was presented by Harrison Chase, Founder and CEO of LangChain, and Michael Hunger, Head of Product Innovation & Developer Strategy at Neo4j.
