How to Run Hugging Face Models Programmatically Using Ollama and Testcontainers

Hugging Face now hosts more than 700,000 models, with the number continuously rising. It has become the premier repository for AI/ML models, catering to both general and highly specialized needs.

As the adoption of AI/ML models accelerates, more application developers are eager to integrate them into their projects. However, the entry barrier remains high due to the complexity of setup and lack of developer-friendly tools. Imagine if deploying an AI/ML model could be as straightforward as spinning up a database. Intrigued? Keep reading to find out how.

2400x1260 how to run hugging face models programmatically using ollama and testcontainers

Introduction to Ollama and Testcontainers

Recently, Ollama announced support for running models from Hugging Face. This development is exciting because it brings the rich ecosystem of AI/ML components from Hugging Face to Ollama end users, who are often developers. 

Testcontainers libraries already provide an Ollama module, making it straightforward to spin up a container with Ollama without needing to know the details of how to run Ollama using Docker:

import org.testcontainers.ollama.OllamaContainer; 

var ollama = new OllamaContainer("ollama/ollama:0.1.44"); 
ollama.start();

These lines of code are all that is needed to have Ollama running inside a Docker container effortlessly.

Running models in Ollama

By default, Ollama does not include any models, so you need to download the one you want to use. With Testcontainers, this step is straightforward by leveraging the execInContainer API provided by Testcontainers:

ollama.execInContainer("ollama", "pull", "moondream");

At this point, you have the moondream model ready to be used via the Ollama API. 

Excited to try it out? Hold on for a bit. This model is running in a container, so what happens if the container dies? Will you need to spin up a new container and pull the model again? Ideally not, as these models can be quite large.

Thankfully, Testcontainers makes it easy to handle this scenario, by providing an easy-to-use API to commit a container image programmatically:

public void createImage(String imageName) {
var ollama = new OllamaContainer("ollama/ollama:0.1.44");
ollama.start();
ollama.execInContainer("ollama", "pull", "moondream");
ollama.commitToImage(imageName);
}

This code creates an image from the container with the model included. In subsequent runs, you can create a container from that image, and the model will already be present. Here’s the pattern:

var imageName = "tc-ollama-moondream";
var ollama = new OllamaContainer(DockerImageName.parse(imageName)
.asCompatibleSubstituteFor("ollama/ollama:0.1.44"));
try {
ollama.start();
} catch (ContainerFetchException ex) {
// If image doesn't exist, create it. Subsequent runs will reuse the image.
createImage(imageName);
ollama.start();
}

Now, you have a model ready to be used, and because it is running in Ollama, you can interact with its API:

var image = getImageInBase64("/whale.jpeg");
String response = given()
.baseUri(ollama.getEndpoint())
.header(new Header("Content-Type", "application/json"))
.body(new CompletionRequest("moondream:latest", "Describe the image.", Collections.singletonList(image), false))
.post("/api/generate")
.getBody().as(CompletionResponse.class).response();

System.out.println("Response from LLM " + response);

Using Hugging Face models

The previous example demonstrated using a model already provided by Ollama. However, with the ability to use Hugging Face models in Ollama, your available model options have now expanded by thousands. 

To use a model from Hugging Face in Ollama, you need a GGUF file for the model. Currently, there are 20,647 models available in GGUF format. How cool is that?

The steps to run a Hugging Face model in Ollama are straightforward, but we’ve simplified the process further by scripting it into a custom OllamaHuggingFaceContainer. Note that this custom container is not part of the default library, so you can copy and paste the implementation of OllamaHuggingFaceContainer and customize it to suit your needs.

To run a Hugging Face model, do the following:

public void createImage(String imageName, String repository, String model) {
var model = new OllamaHuggingFaceContainer.HuggingFaceModel(repository, model);
var huggingFaceContainer = new OllamaHuggingFaceContainer(hfModel);
huggingFaceContainer.start();
huggingFaceContainer.commitToImage(imageName);
}

By providing the repository name and the model file as shown, you can run Hugging Face models in Ollama via Testcontainers. 

You can find an example using an embedding model and an example using a chat model on GitHub.

Customize your container

One key strength of using Testcontainers is its flexibility in customizing container setups to fit specific project needs by encapsulating complex setups into manageable containers. 

For example, you can create a custom container tailored to your requirements. Here’s an example of TinyLlama, a specialized container for spinning up the DavidAU/DistiLabelOrca-TinyLLama-1.1B-Q8_0-GGUF model from Hugging Face:

public class TinyLlama extends OllamaContainer {

    private final String imageName;

    public TinyLlama(String imageName) {
        super(DockerImageName.parse(imageName)
.asCompatibleSubstituteFor("ollama/ollama:0.1.44"));
        this.imageName = imageName;
    }

    public void createImage(String imageName) {
        var ollama = new OllamaContainer("ollama/ollama:0.1.44");
        ollama.start();
        try {
            ollama.execInContainer("apt-get", "update");
            ollama.execInContainer("apt-get", "upgrade", "-y");
            ollama.execInContainer("apt-get", "install", "-y", "python3-pip");
            ollama.execInContainer("pip", "install", "huggingface-hub");
            ollama.execInContainer(
                    "huggingface-cli",
                    "download",
                    "DavidAU/DistiLabelOrca-TinyLLama-1.1B-Q8_0-GGUF",
                    "distilabelorca-tinyllama-1.1b.Q8_0.gguf",
                    "--local-dir",
                    "."
            );
            ollama.execInContainer(
                    "sh",
                    "-c",
                    String.format("echo '%s' > Modelfile", "FROM distilabelorca-tinyllama-1.1b.Q8_0.gguf")
            );
            ollama.execInContainer("ollama", "create", "distilabelorca-tinyllama-1.1b.Q8_0.gguf", "-f", "Modelfile");
            ollama.execInContainer("rm", "distilabelorca-tinyllama-1.1b.Q8_0.gguf");
            ollama.commitToImage(imageName);
        } catch (IOException | InterruptedException e) {
            throw new ContainerFetchException(e.getMessage());
        }
    }

    public String getModelName() {
        return "distilabelorca-tinyllama-1.1b.Q8_0.gguf";
    }

    @Override
    public void start() {
        try {
            super.start();
        } catch (ContainerFetchException ex) {
            // If image doesn't exist, create it. Subsequent runs will reuse the image.
            createImage(imageName);
            super.start();
        }
    }
}

Once defined, you can easily instantiate and utilize your custom container in your application:

var tinyLlama = new TinyLlama("example");
tinyLlama.start();
String response = given()
.baseUri(tinyLlama.getEndpoint())
.header(new Header("Content-Type", "application/json"))
.body(new CompletionRequest(tinyLlama.getModelName() + ":latest", List.of(new Message("user", "What is the capital of France?")), false))
.post("/api/chat")
.getBody().as(ChatResponse.class).message.content;
System.out.println("Response from LLM " + response);

Note how all the implementation details are under the cover of the TinyLlama class, and the end user doesn’t need to know how to actually install the model into Ollama, what GGUF is, or that to get huggingface-cli you need to pip install huggingface-hub.

Advantages of this approach

  • Programmatic access: Developers gain seamless programmatic access to the Hugging Face ecosystem.
  • Reproducible configuration: All configuration, from setup to lifecycle management is codified, ensuring reproducibility across team members and CI environments.
  • Familiar workflows: By using containers, developers familiar with containerization can easily integrate AI/ML models, making the process more accessible.
  • Automated setups: Provides a straightforward clone-and-run experience for developers.

This approach leverages the strengths of both Hugging Face and Ollama, supported by the automation and encapsulation provided by the Testcontainers module, making powerful AI tools more accessible and manageable for developers across different ecosystems.

Conclusion

Integrating AI models into applications need not be a daunting task. By leveraging Ollama and Testcontainers, developers can seamlessly incorporate Hugging Face models into their projects with minimal effort. This approach not only simplifies the setup of the development environment process but also ensures reproducibility and ease of use. With the ability to programmatically manage models and containerize them for consistent environments, developers can focus on building innovative solutions without getting bogged down by complex setup procedures.

The combination of Ollama’s support for Hugging Face models and Testcontainers’ robust container management capabilities provides a powerful toolkit for modern AI development. As AI continues to evolve and expand, these tools will play a crucial role in making advanced models accessible and manageable for developers across various fields. So, dive in, experiment with different models, and unlock the potential of AI in your applications today.

Stay current on the latest Docker news. Subscribe to the Docker Newsletter.

Learn more