Using an AI Assistant to Script Tools

This ongoing Docker Labs GenAI series explores the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing software as open source so you can play, explore, and hack with us, too.

LLMs are now quite good at transforming data. For example, we were recently working with some data generated by the Pylint tool. This tool generates big arrays of code violations. 

2400x1260 docker labs genai

Here’s an example showing the kind of data that gets returned.

[
    {
        "type": "convention",
        "module": "app",
        "line": 1,
        "column": 0,
        "endLine": 1,
        "endColumn": 13,
        "path": "src/app.py",
        "symbol": "missing-module-docstring",
        "message": "Missing module docstring",
        "message-id": "C0114"
    },
    {
       ...
    },
    ...
]

During this session with our AI assistant, we decided that it would be helpful to create a database and insert the data to make it easier for the AI to analyze (LLMs are very good at writing SQL). As is now our habit, we wrote a quick prompt to see if the assistant could generate the SQL:

1. Read the json data from /thread/violations.json
2. For each element in the array, transform each element into two SQL INSERT statements.

* The first should insert columns PATH, START_LINE, END_LINE, START_COLUMN, END_COLUMN into a table named RANGES.
* The second should insert columns MESSAGE, TYPE, RANGE, and VIOLATION_ID into a table named VIOLATIONS.

LLMs are obviously good at tasks of this kind, and this was no exception. Our prompt engine had been augmented with a function to read local files but, besides that, this was a pretty straightforward prompt (we used GPT-3.5). The LLM responded with the correct INSERT statements.

INSERT INTO RANGES (PATH, START_LINE, END_LINE, START_COLUMN, END_COLUMN)
VALUES ('src/app.py', 1, 1, 0, 13);
INSERT INTO VIOLATIONS (MESSAGE, TYPE, RANGE, VIOLATION_ID)
VALUES ('Missing module docstring', 'convention', (SELECT ID FROM RANGES WHERE PATH='src/app.py' AND START_LINE=1 AND START_COLUMN=0), 'C0114');
...
...

We’re starting to get accustomed to this kind of capability, so the result wasn’t too surprising. However, what about the context window here? It really doesn’t make a lot of sense to pass all of this data to the LLM, especially if this is a task that we’ll need to do continuously. Plus, this is really not how a programmer would have solved the problem.

Programmers would write programs.

So, instead of asking the LLM to do a thing, we should try asking it to write a program to do that same thing. Starting with the same prompt, let’s prefix it with “Please write a JavaScript program to …” In other words, let’s ask the LLM to describe how it would accomplish the task in JavaScript, and hopefully automate itself out of the loop. Here’s the adjusted prompt:

Write a javascript function that reads json from a file named `/thread/violations.json`.

For each element in the array, transform each element into two SQL INSERT statements.

* The first should insert columns PATH, START_LINE, END_LINE, START_COLUMN, END_COLUMN into a table named RANGES.
* The second should insert columns MESSAGE, TYPE, RANGE, and VIOLATION_ID into a table named VIOLATIONS.

and write them to the file /thread/insert.sql

Now execute the javascript code in a container.

Tool code

Current LLMs can do tasks of this kind. Here’s a sample of the output generated by GPT-3.5. A quick scan will convince many of you that this is probably going to do the trick. However, if our prompt engine is already running in Docker, we might as well just run it:

const fs = require('fs');
const crypto = require('crypto');

function getRandomId() {
    return crypto.randomBytes(16).toString('hex');
}

function escapeString(str) {
    return str.replace(/'/g, \\\"''\\\");
}

let data = JSON.parse(fs.readFileSync('/thread/violations.json', 'utf8'));
let sqlStatements = '';

data.forEach((violation) => {
    let id = getRandomId();
    sqlStatements += `INSERT INTO RANGES(ID, PATH, START_LINE, END_LINE, START_COLUMN, END_COLUMN) VALUES('${id}', '${escapeString(violation.path)}', ${violation.line}, ${violation.endLine}, ${violation.column}, ${violation.endColumn});\\\\n`;
    sqlStatements += `INSERT INTO VIOLATIONS(MESSAGE, TYPE, RANGE) VALUES('${escapeString(violation.message)}', '${escapeString(violation.type)}', '${id}');\\\\n`;
    });

fs.writeFileSync('/thread/insert.sql', sqlStatements);

Running the tool

You may not have noticed but at the end of our prompt, we added a final instruction: “Now execute the JavaScript code in a container”. This is a nice addition to our session, because it means we get to see the results.

This is also where tool calling comes back into the picture. To give our AI the capacity to try running the program that it has just written, we have defined a new function to create an isolated runtime sandbox for trying out our new tool.

Here’s the agent’s new tool definition:

tools:
  - name: run-javascript-sandbox
    description: execute javascript code in a container
    parameters:
      type: object
      properties:
        javascript:
          type: string
          description: the javascript code to run
    container:
      image: vonwig/javascript-runner
      command:
        - "{{javascript|safe}}"

We’ve asked the AI assistant to generate a tool from a description of that tool. As long as the description of the tools doesn’t change, the workflow won’t have to go back to the AI to ask it to build a new tool version.

The role of Docker in this pattern is to create the sandbox for this code to run. This function really doesn’t need much of a runtime, so we give it a pretty small sandbox.

  • No access to a network.
  • No access to the host file system (does have access to isolated volumes for sharing data between tools).
  • No access to GPU.
  • Almost no access to software besides the Node.js runtime (no shell for example).

The ability for one tool to create another tool is not just a trick. It has very practical implications for the kinds of workflows that we can build up because it gives us a way for us to control the volume of data sent to LLMs, and it gives the assistant a way to “automate” itself out of the loop.

Next steps

This example was a bit abstract but in our next post, we will describe the practical scenarios that have driven us to look at this idea of prompts generating new tools. Most of the workflows we’re exploring are still just off-the-shelf tools like Pylint, SQLite, and tree_sitter (which we embed using Docker, of course!). For example:

  1. Use pylint to extract violations from my codebase.
  2. Transform the violations into SQL and then send that to a new SQLite.
  3. Find the most common violations of type error and show me the top level code blocks containing them.

However, you’ll also see that part of being able to author workflows of this kind is being able to recognize when you just need to add a custom tool to the mix.

Read the Docker Labs GenAI series to see more of what we’ve been working on.

Learn more