Developing Resilient Applications with Toxiproxy and Testcontainers

How can we ensure the reliability of applications nowadays? One way is to use Testcontainers and Toxiproxy to seamlessly add chaos engineering practices to your test suite.

Testcontainers is a well-known library that offers disposable containers to use during tests and development. It also provides ready-to-use definitions for your favorite database, message broker, or anything running in a container.

Toxiproxy is an open source project by Shopify. It allows testing and simulation of network failure scenarios. Examples of network problems it can simulate include latency, bandwidth restrictions, and complete failures.


Combining Testcontainers and Toxiproxy enables developers to test how their applications behave under different network conditions and failure scenarios. This helps catch issues before they reach production, which in turn supports the development of more reliable and resilient applications.

Let’s build a Spring Boot application. You can start by downloading the initial project. It’s a Maven project using Spring Boot 3 with dependencies that are required for this example. For more information on how to add integration tests to a Spring Boot application, check out this previous article on the topic.

Next, let’s add another Testcontainers module dependency to include the Toxiproxy container abstractions.

<dependency>
   <groupId>org.testcontainers</groupId>
   <artifactId>toxiproxy</artifactId>
   <scope>test</scope>
</dependency>

In this example, we will be using a Spring Boot application along with R2DBC to access a traditional relational database. Why R2DBC? Because it provides a user-friendly reactive API for dealing with errors and exceptions that can occur during database interactions.

Note: For brevity and demonstration purposes, we added the code for handling the retries to our test code. In practice, such code should normally be part of the application code.

First of all, let’s define our database container and the proxy container by using PostgreSQLContainer and ToxiproxyContainer. We will also put both containers into the same network, so that Toxiproxy can reach PostgreSQL through its network alias:

private static final Network network = Network.newNetwork();
@Container
private static final PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>("postgres:15-alpine")
   .withCopyFileToContainer(MountableFile.forClasspathResource("db.sql"), "/docker-entrypoint-initdb.d/")
   .withNetwork(network)
   .withNetworkAliases("postgres");
@Container
private static final ToxiproxyContainer toxiproxy = new ToxiproxyContainer("ghcr.io/shopify/toxiproxy:2.5.0")
   .withNetwork(network);

The content of db.sql is:

CREATE TABLE IF NOT EXISTS pokemon(id serial primary key, name varchar(255) not null);
INSERT INTO pokemon (name) VALUES ('Bulbasaur'), ('Squirtle'), ('Charmander');

Now, let’s create a ToxiproxyClient and the proxy for accessing the PostgreSQL database, which will be used to inject network failures:

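// Assigned below; declared as a static field so the tests can add toxics to it.
private static Proxy postgresqlProxy;

@DynamicPropertySource
static void postgresProperties(DynamicPropertyRegistry registry) throws IOException {
   // Connect to the Toxiproxy control API exposed by the container.
   var toxiproxyClient = new ToxiproxyClient(toxiproxy.getHost(), toxiproxy.getControlPort());
   // Listen on port 8666 inside the Toxiproxy container and forward to PostgreSQL via its network alias.
   postgresqlProxy = toxiproxyClient.createProxy("postgresql", "0.0.0.0:8666", "postgres:5432");
   // Point the application at the proxy instead of at the database itself.
   var r2dbcUrl = "r2dbc:postgresql://%s:%d/%s".formatted(toxiproxy.getHost(), toxiproxy.getMappedPort(8666), postgres.getDatabaseName());
   registry.add("spring.r2dbc.url", () -> r2dbcUrl);
   registry.add("spring.r2dbc.username", postgres::getUsername);
   registry.add("spring.r2dbc.password", postgres::getPassword);
}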
  1. toxiproxyClient is built using toxiproxy.getHost() and toxiproxy.getControlPort() from the container instance.
  2. postgresqlProxy is the proxy to be used to inject network failures. Port 8666 is already exposed by ToxiproxyContainer, and it is used as an entrypoint for the PostgreSQL connection.
  3. The R2DBC URL is constructed using the proxy’s host and port instead of PostgreSQL ones.
  4. R2DBC configuration properties are set.
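
To make step 3 concrete, here is a minimal, JDK-only sketch of how the URL ends up pointing at the proxy. The class name, host, port, and database name are hypothetical values chosen for illustration; in the real test they come from toxiproxy.getHost(), toxiproxy.getMappedPort(8666), and postgres.getDatabaseName().

```java
final class ProxyUrlSketch {

    // Builds an R2DBC URL that targets the proxy endpoint rather than PostgreSQL directly.
    static String r2dbcUrl(String proxyHost, int proxyMappedPort, String databaseName) {
        return "r2dbc:postgresql://%s:%d/%s".formatted(proxyHost, proxyMappedPort, databaseName);
    }

    public static void main(String[] args) {
        // Hypothetical values: Docker assigns the real mapped port at runtime.
        System.out.println(r2dbcUrl("localhost", 49153, "test"));
        // prints r2dbc:postgresql://localhost:49153/test
    }
}
```

Because the application only ever sees this URL, every database call passes through Toxiproxy, which is what lets the tests degrade the connection without touching application code.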

Additionally, we need data model classes. For the current sample, these are the Pokemon entity and PokemonRepository. These are going to be used in our tests:

public record Pokemon(Long id, String name) {
}

// The repository must be typed with the Pokemon entity defined above.
public interface PokemonRepository extends R2dbcRepository<Pokemon, Long> {
}

Our first test will make sure that everything is working as expected under normal conditions: Query the database and verify the expected count of the found records.

@Test
void normal() {
   StepVerifier.create(this.pokemonRepository.findAll()).expectNextCount(3).verifyComplete();
}

Now, let’s add some latency to the connection, which will become visible when executing the query:

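@Test
void withLatency() throws IOException {
   // Delay server-to-client traffic by 1600 ms +/- 100 ms.
   var latency = postgresqlProxy.toxics().latency("postgresql-latency", ToxicDirection.DOWNSTREAM, 1600).setJitter(100);
   try {
       StepVerifier.create(this.pokemonRepository.findAll()).expectNextCount(3).verifyComplete();
   } finally {
       // Remove the toxic so it does not leak into other tests.
       latency.remove();
   }
}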
  1. Using postgresqlProxy, a latency of 1600 ms +/- 100 ms will be created on the way from server to the client.
  2. The query is performed and, after that time, the records will be returned as usual.

This time, let’s write a test that has a timeout configured. We don’t want our database operation to hang for too long.

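@Test
void withLatencyWithTimeout() throws IOException {
   // Delay server-to-client traffic by 1600 ms +/- 100 ms.
   var latency = postgresqlProxy.toxics().latency("postgresql-latency", ToxicDirection.DOWNSTREAM, 1600).setJitter(100);
   try {
       StepVerifier.create(this.pokemonRepository.findAll().timeout(Duration.ofMillis(50)))
           .expectError(TimeoutException.class)
           .verify();
   } finally {
       // Remove the toxic so it does not leak into other tests.
       latency.remove();
   }
}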
  1. Using postgresqlProxy, a latency of 1600 ms +/- 100 ms will be created on the way from server to the client.
  2. A timeout of 50 ms has been configured in our reactive code. Because the latency is much higher than that, a TimeoutException is produced and captured by the test.
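
The same idea can be tried without any containers. The sketch below is not the Reactor code used in the test; it swaps in the JDK's CompletableFuture to show a reply that arrives too late being turned into a timeout. The class name, method name, and the "3 rows" payload are made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeoutSketch {

    // Simulates a reply that only arrives after delayMillis, guarded by a timeout.
    static String fetchWithTimeout(long delayMillis, long timeoutMillis) {
        CompletableFuture<String> slowReply = CompletableFuture.supplyAsync(
                () -> "3 rows",
                CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS));
        try {
            // orTimeout completes the future exceptionally if no value arrives in time.
            return slowReply.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS).join();
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                return "timeout";
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchWithTimeout(400, 50));  // the reply is slower than the timeout
        System.out.println(fetchWithTimeout(10, 2000)); // the reply arrives well within the timeout
    }
}
```

As in the test above, the caller decides how long it is willing to wait, and a slow network path surfaces as a TimeoutException rather than a hanging call.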

The third Toxiproxy test covers retries in our database operations.

@Test
void withLatencyWithRetries() throws IOException {
   Latency latency = postgresqlProxy.toxics().latency("postgresql-latency", ToxicDirection.DOWNSTREAM, 1600).setJitter(100);
   StepVerifier.create(this.pokemonRepository.findAll()
                   .timeout(Duration.ofSeconds(1))
                   .retryWhen(Retry.fixedDelay(2, Duration.ofSeconds(1))
                           .filter(throwable -> throwable instanceof TimeoutException)
                           .doBeforeRetry(retrySignal -> logger.info(retrySignal.copy().toString()))))
           .expectSubscription()
           .expectNoEvent(Duration.ofSeconds(4))
           .then(() -> {
               try {
                   latency.remove();
               } catch (IOException e) {
                   throw new RuntimeException(e);
               }
           })
           .expectNextCount(3)
           .expectComplete()
           .verify();
}
  1. Using postgresqlProxy, a latency of 1600 ms +/- 100 ms will be created on the way from server to the client.
  2. In our reactive code, a timeout of 1s is configured. The retry configuration allows at most 2 retries, each after a delay of 1s, and only for TimeoutException.
  3. No events are expected for about 4 seconds.
  4. Finally, latency is removed, and therefore records are retrieved.
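
Why about 4 seconds? A rough timeline helps. The following JDK-only sketch (a hypothetical helper, not part of the test) computes when each attempt starts, assuming every attempt runs into the timeout and ignoring scheduling overhead.

```java
import java.util.ArrayList;
import java.util.List;

final class RetryTimelineSketch {

    // Returns the start time in ms of each attempt while every attempt keeps timing out.
    static List<Long> attemptStartTimes(int maxRetries, long timeoutMillis, long delayMillis) {
        List<Long> starts = new ArrayList<>();
        long now = 0;
        // One initial attempt plus maxRetries retries.
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            starts.add(now);
            // The attempt times out, then the fixed delay elapses before the next one.
            now += timeoutMillis + delayMillis;
        }
        return starts;
    }

    public static void main(String[] args) {
        // Values from the test: 2 retries, 1 s timeout, 1 s fixed delay.
        System.out.println(attemptStartTimes(2, 1000, 1000)); // prints [0, 2000, 4000]
    }
}
```

Under these assumptions the last retry begins at roughly the 4-second mark, which is about when the test removes the latency, so that final attempt can complete within its 1s timeout.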

Thanks to the Toxiproxy Java client, we can inject a latency of 1600 ms with a jitter of 100 ms, which means the actual delay falls between 1500 and 1700 milliseconds. The direction is set to DOWNSTREAM, so the latency only affects traffic flowing from the server to the client.
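
The arithmetic behind that range can be sketched in a few lines. This is an illustration only: the class and method names are hypothetical, and drawing the delay uniformly from the range is an assumption made for the example, not a statement about Toxiproxy's internals.

```java
import java.util.concurrent.ThreadLocalRandom;

final class JitterSketch {

    // Lower bound of the delay for a given latency and jitter, in ms.
    static long minDelay(long latencyMillis, long jitterMillis) {
        return latencyMillis - jitterMillis;
    }

    // Upper bound of the delay for a given latency and jitter, in ms.
    static long maxDelay(long latencyMillis, long jitterMillis) {
        return latencyMillis + jitterMillis;
    }

    // Draws one delay from [latency - jitter, latency + jitter] (uniform sampling is an assumption).
    static long sampleDelay(long latencyMillis, long jitterMillis) {
        return ThreadLocalRandom.current().nextLong(
                minDelay(latencyMillis, jitterMillis),
                maxDelay(latencyMillis, jitterMillis) + 1);
    }

    public static void main(String[] args) {
        System.out.println(minDelay(1600, 100)); // prints 1500
        System.out.println(maxDelay(1600, 100)); // prints 1700
        System.out.println(sampleDelay(1600, 100)); // some value between 1500 and 1700
    }
}
```

Since even the smallest possible delay (1500 ms) exceeds the 50 ms and 1s timeouts used in the tests, the timeouts fire regardless of where in the range a given delay lands.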

Finally, let’s use a different toxic which will cut the bandwidth between server and client.

@Test
void withConnectionDown() throws IOException {
   postgresqlProxy.toxics().bandwidth("postgres-cut-connection-downstream", ToxicDirection.DOWNSTREAM, 0);
   postgresqlProxy.toxics().bandwidth("postgres-cut-connection-upstream", ToxicDirection.UPSTREAM, 0);
   StepVerifier.create(this.pokemonRepository.findAll().timeout(Duration.ofSeconds(5)))
       .verifyErrorSatisfies(throwable -> assertThat(throwable).isInstanceOf(TimeoutException.class));
   postgresqlProxy.toxics().get("postgres-cut-connection-downstream").remove();
   postgresqlProxy.toxics().get("postgres-cut-connection-upstream").remove();
   StepVerifier.create(this.pokemonRepository.findAll()).expectNextCount(3).verifyComplete();
}
  1. The downstream bandwidth is completely cut from server to client.
  2. The upstream bandwidth is completely cut from client to server.
  3. The database operation in the test is configured with a timeout of 5s. Because there is effectively no network connection, a TimeoutException will be produced.
  4. Both toxics created at the beginning are removed, therefore restoring the connection.
  5. Records can be retrieved from the database successfully.

As we can see, the test passed but it took a little bit more time, around 5s.

This shows how easily you can use Toxiproxy to manipulate network latency, limit throughput, or cut the connection completely. Combining this flexibility with a Testcontainers-based setup simplifies configuration and gives you precise control over when and what type of network failures you want to introduce in the tests.

Conclusion

In this article, we looked at how you can add chaos engineering practices and programmatically inject network failures into your tests. The combination of Testcontainers and Toxiproxy has a lot of synergy: Testcontainers allows you to run unit tests with real dependencies, like the PostgreSQL database we used in the sample application. Toxiproxy implements true network failure effects.

In modern software development, testing these network-affected scenarios is essential to make sure that the application behaves properly even when something fails. And, with Testcontainers-based tests, the process doesn’t require separate complex setups or manually managed test environments!
