Generative AI with Spring Boot and Spring AI

by Horatiu Dan

Context

It’s been more than 20 years since Spring Framework appeared in the software development landscape and 10 since Spring Boot version 1.0 was released. By now nobody should have any doubt that Spring has created a unique style through which developers are freed from repetitive tasks and left to focus on the business value delivery. As years passed, Spring’s technical depth has continually increased, covering a wide variety of development areas and technologies. On the other hand, its technical breadth has been continually expanded as more focused solutions have been experimented, proof of concepts created and ultimately promoted under the projects’ umbrella (towards the technical depth).

One such example is the new Spring AI project which, according to its reference documentation, aims to ease the development when a generative artificial intelligence layer is aimed to be incorporated into applications. Once again, developers are freed from repetitive tasks and offered with simple interfaces for a direct interaction with the pre-trained models that incorporate the actual processing algorithms.

By interacting with generative pre-trained transformers (GPTs) directly or via Spring AI programmatically, users (developers) do not need to (although it would be useful) posses extensive machine learning knowledge. As an engineer, I strongly believe that even if such (developer) tools can be rather easily and rapidly used to produce results, it is advisable to temper ourselves, to switch to a watchful mode and try to gain a decent understanding of the base concepts first. Moreover, by following this path, the outcome might be even more useful.

Purpose

This article shows how Spring AI can be integrated into a Spring Boot application and fulfill a programmatic interaction with Open AI. It is assumed that prompt design in general (prompt engineering) is a state-of-the-art activity. Consequently, the prompts used during experimentation are quite didactic, without much applicability. The focus here is on the communication interface, that is, Spring AI API.

Before the Implementation

First and foremost, one shall clarify the self rationale for incorporating and utilizing a GPT solution, in addition to the desire to deliver with greater quality, in less time and with lower costs.

Generative AI is said to be good at doing a great deal of time-consuming tasks, quicker, more efficiently and output the results. Moreover, if these results are further validated by experienced and wise humans, the chances to obtain something useful increase. Fortunately, people are still part of the scenery.

Next, one shall resist the temptation to jump right into the implementation and at least dedicate some time to get a bit familiar with the general concepts. An in-depth exploration of generative AI concepts is way beyond the scope of this article. Nevertheless, the “main actors” that appear in the interaction are briefly outlined below.

The Stage – Generative AI is part of Machine Learning that is part of Artificial Intelligence

Input – the provided data (incoming)

Output – the computed results (outgoing)

Large Language Model (LLM) – the fine-tuned algorithm that based on the interpreted input, produces the output

Prompt – a the state-of-the-art interface through which the input is passed to the model

Prompt Template – a component that allows constructing structured parameterized prompts

Tokens – the components the algorithm internally translates the input into, then uses to compile the results and ultimately constructs the output from.

Model’s Context Window – the threshold the model limits the number of tokens count per call (usually, the more tokens are used, the more expensive the operation is)

Finally, an implementation may be started, but as it progresses, it is advisable to revisit and refine the first two steps.

Prompts

In this exercise, we ask for the following:

Write {count = three} reasons why people in {location = Romania} should consider a {job = software architect} job. 
These reasons need to be short, so they fit on a poster. 
For instance, "{job} jobs are rewarding."

This basically represents the prompt. As advised, a clear topic, a clear meaning of the task and additional helpful pieces of information should be provided as part of the prompts, in order to increase the results’ accuracy.

The prompt contains three parameters, which allow coverage for a wide range of jobs in various locations.

  • count – the number of reasons aimed as part of the output
  • job – the domain, the job interested in
  • location – the country, town, region, etc. the job applicants reside

Proof of Concept

In this post, the simple proof of concept aims the following:

  1. integrate Spring AI in a Spring Boot application and use it
  2. allow a client to communicate with Open AI via the application
    • client issues a parametrized HTTP request to the application
    • the application uses a prompt to create the input, sends it to Open AI retrieves the output
    • the application sends the response to the client

Set-up

  • Java 21
  • Maven 3.9.2
  • Spring Boot – v. 3.2.2
  • Spring AI – v. 0.8.0-SNAPSHOT (still developed, experimental)

Implementation

Spring AI Integration

Normally, this is a basic step not necessarily worth mentioning. Nevertheless, since Spring AI is currently released as a snapshot, in order to be able to integrate the Open AI auto-configuration dependency, one shall add a reference to Spring Milestone / Snapshot repositories.

<repositories>
	<repository>
		<id>spring-milestones</id>
		<name>Spring Milestones</name>
		<url>https://repo.spring.io/milestone</url>
		<snapshots>
			<enabled>false</enabled>
		</snapshots>
	</repository>
	<repository>
		<id>spring-snapshots</id>
		<name>Spring Snapshots</name>
		<url>https://repo.spring.io/snapshot</url>
		<releases>
			<enabled>false</enabled>
		</releases>
	</repository>
</repositories>

The next step is to add the spring-ai-openai-spring-boot-starter Maven dependency.

<dependency>
	<groupId>org.springframework.ai</groupId>
	<artifactId>spring-ai-openai-spring-boot-starter</artifactId>
	<version>0.8.0-SNAPSHOT</version>
</dependency>

Open AI ChatClient is now part of the application classpath. It is the component used to send the input to Open AI and retrieve the output back.

In order to be able to connect to the AI Model, the spring.ai.openai.api-key property needs to be set-up in the application.properties file.

spring.ai.openai.api-key = api-key-value

Its value represents a valid API Key of the user on behalf which the communication is made. By accessing [Resource 2] one can either sign-up or sign-in and generate one.

Client – Spring Boot Application Communication

The first part of the proof of concept is the communication between a client application (e.g. browser, curl etc.) and the application developed. This is done via a REST controller, accessible via a HTTP GET request.

The URL is /job-reasons together with the three parameters previously outlined when the prompt was defined, which conducts to the following form:

/job-reasons?count={count}&job={job}&location={location}

and the corresponding controller:

@RestController
public class OpenAiController {

	@GetMapping("/job-reasons")
    public ResponseEntity<String> jobReasons(@RequestParam(value = "count", required = false, defaultValue = "3") int count,
                                             @RequestParam("job") String job,
                                             @RequestParam("location") String location) {
        return ResponseEntity.ok().build();
    }
}

Since the response from Open AI is going to be a String, the controller returns a ResponseEntity that encapsulates a String. If we run the application and issue a request, currently nothing is returned as part of the response body.

Client – Open AI Communication

Spring AI currently focuses on AI Models that process language and produce language or numbers. Examples of Open AI models in the former category are GPT4-openai or GPT3.5-openai.

For fulfilling an interaction with these AI Models, which actually designate Open AI algorithms, Spring AI provides a uniform interface.

ChatClient interface currently supports text input and output and has a simple contract.

@FunctionalInterface
public interface ChatClient extends ModelClient<Prompt, ChatResponse> {
	default String call(String message) {
		Prompt prompt = new Prompt(new UserMessage(message));
		return call(prompt).getResult().getOutput().getContent();
	}

	ChatResponse call(Prompt prompt);
}

The actual method of the functional interface is the one usually used.

In the case of our proof of concept, this is exactly what it is needed, a way of calling Open AI and sending the aimed parametrized Prompt as parameter. The following OpenAiService is defined, where an instance of ChatClient is injected.

@Service
public class OpenAiService {

    private final ChatClient client;

    public OpenAiService(OpenAiChatClient aiClient) {
        this.client = aiClient;
    }
	
	public String jobReasons(int count, String domain, String location) {
        final String promptText = """
                Write {count} reasons why people in {location} should consider a {job} job.
                These reasons need to be short, so they fit on a poster.
                For instance, "{job} jobs are rewarding."
                """;

        final PromptTemplate promptTemplate = new PromptTemplate(promptText);
        promptTemplate.add("count", count);
        promptTemplate.add("job", domain);
        promptTemplate.add("location", location);

        ChatResponse response = client.call(promptTemplate.create());
        return response.getResult().getOutput().getContent();
    }
}

With the application running, if the following request is performed, from the browser:

http://localhost:8080/gen-ai/job-reasons?count=3&job=software%20architect&location=Romania

the below result is retrieved:

  1. Lucrative career: Software architect jobs offer competitive salaries and excellent growth opportunities, ensuring financial stability and success in Romania.
  2. In-demand profession: As the demand for technology continues to grow, software architects are highly sought after in Romania and worldwide, providing abundant job prospects and job security.
  3. Creative problem-solving: Software architects play a crucial role in designing and developing innovative software solutions, allowing them to unleash their creativity and make a significant impact on various industries.

which is exactly what it was intended – an easy interface through which the Open AI GPT model can be asked to write a couple of reasons about why a certain job in a certain location is appealing.

Adjustments and Observations

The simple proof of concept developed so far mainly uses the default configurations available.

The ChatClient instance may be configured according to the desired needs via various properties. As this is beyond the scope of this writing, only two are exemplified here.

spring.ai.openai.chat.options.model designates the AI Model to use. By default, it is ‘gpt-35-turbo’, but ‘gpt-4’ and ‘gpt-4-32k’ designate the latest versions. Although available, one may not be able to access these using a pay-as-you-go plan, but there are additional pieces of information available on the Open AI website for accommodating it.

Another property worth mentioning is spring.ai.openai.chat.options.temperature. According to the reference documentation, the sampling temperature controls the “creativity of the responses”. It is said that higher values make the output “more random”, while lower ones “more focused and deterministic”. The default value is 0.8, if we decrease it to 0.3, restart the application and ask again with the same request parameters, the below result is retrieved.

  1. Lucrative career opportunities: Software architect jobs in Romania offer competitive salaries and excellent growth prospects, making it an attractive career choice for individuals seeking financial stability and professional advancement.
  2. Challenging and intellectually stimulating work: As a software architect, you will be responsible for designing and implementing complex software systems, solving intricate technical problems, and collaborating with talented teams. This role offers continuous learning opportunities and the chance to work on cutting-edge technologies.
  3. High demand and job security: With the increasing reliance on technology and digital transformation across industries, the demand for skilled software architects is on the rise. Choosing a software architect job in Romania ensures job security and a wide range of employment options, both locally and internationally.

It is visible that the output is way more descriptive in this case.

One last consideration is related to the structure of the output obtained. It would be convenient to have the ability to map the actual payload received to a Java object (class or record, for instance). As of now, the representation is textual and so was the implementation. Output parsers may achieve this, similarly to Spring JDBC’s mapping structures.

In this proof of concept, a BeanOutputParser is used, which allows deserializing the result directly in a Java record as below:

public record JobReasons(String job,
                         String location,
                         List<String> reasons) {
}

by taking the {format} as part of the prompt text and providing it as an instruction to the AI Model.

The OpenAiService method becomes:

public JobReasons formattedJobReasons(int count, String job, String location) {
	final String promptText = """
			Write {count} reasons why people in {location} should consider a {job} job.
			These reasons need to be short, so they fit on a poster.
			For instance, "{job} jobs are rewarding."
			{format}
			""";

	BeanOutputParser<JobReasons> outputParser = new BeanOutputParser<>(JobReasons.class);

	final PromptTemplate promptTemplate = new PromptTemplate(promptText);
	promptTemplate.add("count", count);
	promptTemplate.add("job", job);
	promptTemplate.add("location", location);

	promptTemplate.add("format", outputParser.getFormat());
	promptTemplate.setOutputParser(outputParser);

	final Prompt prompt = promptTemplate.create();

	ChatResponse response = client.call(prompt);
	return outputParser.parse(response.getResult().getOutput().getContent());
}

When invoking again, the output is as below:

{
	"job":"software architect",
	"location":"Romania",
	"reasons":[
		"High demand",
		"Competitive salary",
		"Opportunities for growth"
	]
}

The format is the expected one, but the reasons appear less explanatory, which means additional adjustments are required in order to achieve better usability. From a proof of concept point of view though, this is acceptable, as the focus was on the form.

Conclusions

Prompt design is an important part of the task – the better articulated prompts are, the better the input and the higher the output quality is.

Using Spring AI to integrate with various chat models is quite straightforward – this post showed-cased an Open AI integration.

Nevertheless, in the case of Gen AI in general, just as in the case of almost any technology, it is very important to get familiar at least with the general concepts first. Then, to try to understand the magic behind the way the communication is carried out and only afterwards to start writing “production” code.

Last but not least, it is advisable to further explore the Spring AI API to understand the implementations and remain up-to-date as it evolves and improves.

The code is available here.

References

[1] – Spring AI Reference

[2] – Open AI Platform

[3] – The picture is from San Siro Stadium, taken (NOT AI generated) on the summer of 2023, in Milan, Italy while Coldplay was performing.

PostgreSQL Views with Runtime Parameters

by Horatiu Dan

There aren’t few the situations when applications are requested to be agile and versatile enough so that they can run dynamic reports for which the input comes at runtime.

This post aims to present a way of achieving it by leveraging the temporary configuration parameters supported by PostgreSQL databases.

According to the PostgreSQL documentation, starting with version 7.3, it is possible to set a configuration parameter using the set_config(name, value, is_local) function. Later, the value of the previously set parameter may be read using the current_setting(name) function, converted if needed and used. If the third parameter of the former function is true, the changed setting will only apply to the current transaction.

This is exactly what is needed here, a way of providing a runtime parameter value that can be used as part of an atomic operation.

Set-Up

The sample application is build with:

  • Java 21
  • Spring Boot version 3.1.15
  • PostgreSQL Driver version 42.6.0.
  • Liquibase 4.20.0
  • Maven 3.6.3

At application level, the Maven project is configured to use Spring Data JPA and Liquibase dependencies.

The domain is represented by products, whose prices are in various currencies. For converting between currencies, a currency exchange rate exists. The goal is to be able to read all products with their prices represented in a certain currency, at the rate of a certain day.

Proof of Concept

In order to start modelling, one shall first create a new schema, once connected to the database.

create schema pgsetting;

There are three entities – Product, Currency, CurrencyExchange.

@Entity
@Table(name = "product")
public class Product {

    @Id
    @Column(name = "id")
    private Long id;

    @Column(name = "name", nullable = false)
    private String name;

    @Column(name = "price", nullable = false)
    private Double price;

    @ManyToOne
    @JoinColumn(name = "currency_id")
    private Currency currency;
	
	...
}

@Entity
@Table(name = "currency")
public class Currency {

    @Id
    @Column(name = "id", nullable = false)
    private Long id;

    @Column(name = "name", nullable = false)
    private String name;

	...
}


@Entity
@Table(name = "currency_exchange")
public class CurrencyExchange {

    @Id
    @Column(name = "id", nullable = false)
    private Long id;

    @Column(name = "date", nullable = false)
    private LocalDate date;

    @ManyToOne
    @JoinColumn(name = "from_currency_id", nullable = false)
    private Currency from;

    @ManyToOne
    @JoinColumn(name = "to_currency_id", nullable = false)
    private Currency to;

    @Column(name = "value", nullable = false)
    private Double value;

	...
}

Each one has a corresponding CrudRepository.

@Repository
public interface ProductRepository extends CrudRepository<Product, Long> { }

@Repository
public interface CurrencyRepository extends CrudRepository<Currency, Long> { }

@Repository
public interface CurrencyExchangeRepository extends CrudRepository<CurrencyExchange, Long> { }

The data source is configured as usual in the application.properties file, together with the path to the Liquibase changelog file that records a few simple change sets for initializing the schema with the three tables and the relations among them.

For details, the application properties and db/changelog/schema-init.xml files may be explored.

The root changelog file is:

<?xml version="1.1" encoding="UTF-8" standalone="no"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
                      https://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.8.xsd">

    <include file="/db/changelog/schema-init.xml"/>
    
</databaseChangeLog>

When the application is started, the change sets are executed in the order they are declared. So far, everything is straight forward, nothing out of the ordinary – a simple Spring Boot application whose database changes are managed with Liquibase.

Creating the Dynamic Report

Let’s assume that currently the application has two currencies defined – RON and EUR and two products with their prices recorded in different currencies.

Currency

+--+----+
|id|name|
+--+----+
|1 |RON |
|2 |EUR |
+--+----+

Product

+--+-------------------+-----+-----------+
|id|name               |price|currency_id|
+--+-------------------+-----+-----------+
|1 |Swatch Moonlight v1|100  |2          |
|2 |Winter Sky         |1000 |1          |
+--+-------------------+-----+-----------+

CurrencyExchange rates for the 15th of November

+--+----------+----------------+--------------+-----+
|id|date      |from_currency_id|to_currency_id|value|
+--+----------+----------------+--------------+-----+
|1 |2023-11-15|2               |1             |5    |
|2 |2023-11-15|2               |2             |1    |
|3 |2023-11-15|1               |2             |0.2  |
|4 |2023-11-15|1               |1             |1    |
+--+----------+----------------+--------------+-----+

The aimed result is a product report with all prices in EUR, using the exchange rate from the 15th of November 2023. This means the price of the second product needs to be converted.

To ease the design, the previously set goal is divided into smaller parts, then conquered. Conceptually, products shall be fetched and their prices converted (if needed).

  1. Fetch the products
  2. Convert the prices in the requested currency, using the exchange rate of the requested day

The former is trivial, a Spring Data Repository method would easily allow getting the products – List<Product> findAll().

The latter is achievable through a query that makes the conversions.

SELECT p.id,
       p.name,
       p.price * e.value price,       
       e.to_currency_id currency_id,
       e.date
FROM product p
LEFT JOIN currency_exchange e on p.currency_id = e.from_currency_id and 
		e.to_currency_id = 2 and
		e.date = '2023-11-15'

In order to unite the two, the following are accomplished:

  • a view is defined, for the above query – product_view

It is defined in the product-view.sql file and added as an idempotent operation in a repeatable Liquibase change set that is run whenever changed.

<?xml version="1.1" encoding="UTF-8" standalone="no"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
                      https://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.8.xsd">

    <include file="/db/changelog/schema-init.xml"/>

    <changeSet id="repeatable" author="horatiucd" runOnChange="true">
        <sqlFile dbms="postgresql" path="db/changelog/product-view.sql"/>
    </changeSet>

</databaseChangeLog>
  • a new entity – ProductView – is defined as part of the domain, together with the corresponding repository
@Entity
@Immutable
public class ProductView {

    @Id
    private Long id;

    private String name;

    private Double price;

    private LocalDate date;

    @ManyToOne
    @JoinColumn(name = "currency_id")
    private Currency currency;
	
	...
}
@Repository
public interface ProductViewRepository extends org.springframework.data.repository.Repository<ProductView, Long> {

    List<ProductView> findAll();
}

The application is now able to construct the desired report, but only for a hardcoded currency and exchange rate.

In order to pass the two at runtime, the following are performed in the same transaction:

  • the two parameter values are set as configuration parameters – SELECT set_config(:name, :value, true)
  • the ProductView entities are fetched using the repository method

Also, the product_view is modified to read the configuration parameters set as part of the current transaction and select the data accordingly.

SELECT p.id,
       p.name,
       p.price * e.value price,
       e.date,
       e.to_currency_id currency_id
FROM product p
LEFT JOIN currency_exchange e on p.currency_id = e.from_currency_id and
		e.to_currency_id = current_setting('pgsetting.CurrencyId')::int and
		e.date = current_setting('pgsetting.CurrencyDate')::date;

current_setting('pgsetting.CurrencyId') and current_setting('pgsetting.CurrencyDate') calls read the previously set parameters, which are further converted and used.

The implementation needs some additional adjustments.

ProductViewRepository is enhanced with a method that allows setting the configuration parameters.

@Repository
public interface ProductViewRepository extends org.springframework.data.repository.Repository<ProductView, Long> {

    List<ProductView> findAll();

    @Query(value = "SELECT set_config(:name, :value, true)")
    void setConfigParam(String name, String value);
}

The last parameter is always set to true, thus the value is kept only during the current transaction.

Also, a ProductService is defined to clearly mark all operations involved in the transaction.

@Service
public class ProductService {

    private final ProductViewRepository productViewRepository;

    public ProductService(ProductViewRepository productViewRepository) {
        this.productViewRepository = productViewRepository;
    }

    @Transactional
    public List<ProductView> getProducts(Currency currency, LocalDate date) {
        productViewRepository.setConfigParam("pgsetting.CurrencyId",
                String.valueOf(currency.getId()));

        productViewRepository.setConfigParam("pgsetting.CurrencyDate",
                DateTimeFormatter.ofPattern("yyyy-MM-dd").format(date));

        return productViewRepository.findAll();
    }
}

The name of the parameters are the ones used in the product_view definition.

To certify the implementation, two tests are set-up.

@SpringBootTest
class Product1Test {

    @Autowired
    private CurrencyRepository currencyRepository;

    @Autowired
    private ProductRepository productRepository;

    @Autowired
    private CurrencyExchangeRepository rateRepository;

    @Autowired
    private ProductService productService;

    private Currency ron, eur;
    private Product watch, painting;
    private CurrencyExchange eurToRon, ronToEur;
    private LocalDate date;

    @BeforeEach
    public void setup() {
        ron = new Currency(1L, "RON");
        eur = new Currency(2L, "EUR");
        currencyRepository.saveAll(List.of(ron, eur));

        watch = new Product(1L, "Swatch Moonlight v1", 100.0d, eur);
        painting = new Product(2L, "Winter Sky", 1000.0d, ron);
        productRepository.saveAll(List.of(watch, painting));

        date = LocalDate.now();
        eurToRon = new CurrencyExchange(1L, date, eur, ron, 5.0d);
        CurrencyExchange eurToEur = new CurrencyExchange(2L, date, eur, eur, 1.0d);
        ronToEur = new CurrencyExchange(3L, date, ron, eur, .2d);
        CurrencyExchange ronToRon = new CurrencyExchange(4L, date, ron, ron, 1.0d);
        rateRepository.saveAll(List.of(eurToRon, eurToEur, ronToEur, ronToRon));
    }
}

The former fetches the products with prices in EUR, using the recorded exchange rates.

@Test
void prices_in_eur() {
	List<ProductView> products = productService.getProducts(eur, date);
	Assertions.assertEquals(2, products.size());

	Assertions.assertTrue(products.stream()
			.allMatch(product -> product.getCurrency().getId().equals(eur.getId())));

	Assertions.assertTrue(products.stream()
			.allMatch(product -> product.getDate().equals(date)));

	Assertions.assertEquals(watch.getPrice(),
			products.get(0).getPrice());
	Assertions.assertEquals(painting.getPrice() * ronToEur.getValue(),
			products.get(1).getPrice());
}

When called, product_view is:

+--+-------------------+-----+-----------+----------+
|id|name               |price|currency_id|date      |
+--+-------------------+-----+-----------+----------+
|1 |Swatch Moonlight v1|100  |2          |2023-11-15|
|2 |Winter Sky         |200  |2          |2023-11-15|
+--+-------------------+-----+-----------+----------+

The latter fetches the products with prices in RON, using the same exchange rates.

@Test
void prices_in_ron() {
	List<ProductView> products = productService.getProducts(ron, date);
	Assertions.assertEquals(2, products.size());

	Assertions.assertTrue(products.stream()
			.allMatch(product -> product.getCurrency().getId().equals(ron.getId())));

	Assertions.assertTrue(products.stream()
			.allMatch(product -> product.getDate().equals(date)));

	Assertions.assertEquals(watch.getPrice() * eurToRon.getValue(),
			products.get(0).getPrice());
	Assertions.assertEquals(painting.getPrice(),
			products.get(1).getPrice());
}

When called, product_view is:

+--+-------------------+-----+-----------+----------+
|id|name               |price|currency_id|date      |
+--+-------------------+-----+-----------+----------+
|1 |Swatch Moonlight v1|500  |1          |2023-11-15|
|2 |Winter Sky         |1000 |1          |2023-11-15|
+--+-------------------+-----+-----------+----------+

Sample Code

Available here.

Resources

  1. PostgreSQL System Admin Functions
  2. The picture was taken at Legoland, Germany

Idempotent Liquibase Change Sets

by Horatiu Dan

Abstract

“Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application” [Resource 3].

The purpose of this article is to outline a few ways of creating idempotent changes when the database modifications are managed with Liquibase. Throughout the life time of a software product that has such tier, various database modifications are being applied as it evolves. The more robust the modifications are, the more maintainable the solution is. In order to accomplish such a way of working, it is usually a good practice to design the executed change sets to have zero side effects, that is to be able to be run as many times as needed with the same end result.

The simple proof of concept built here aims to show case how Liquibase change sets may be written to be idempotent. Moreover, the article explains in more depth what exactly happens when the application starts.

Set-up

  • Java 17
  • Spring Boot v.3.1.0
  • Liquibase 4.20.0
  • PostgreSQL Driver 42.6.0
  • Maven 3.6.3

Proof of Concept

As PostgreSQL is the database used here, first and foremost one shall create a new schema – liquidempo. This operation is easy to accomplish by issuing the following SQL command, once connected to the database.

create schema liquidempo;

At application level:

  • The Maven Spring Boot project is created and configured to use the PostgreSQL Driver, Spring Data JPA and Liquibase dependencies.
  • A simple entity is created – Human – with only one attribute, a unique identifier which is also the primary key at database level.
@Entity
@Table(name = "human")
@SequenceGenerator(sequenceName = "human_seq", name = "CUSTOM_SEQ_GENERATOR", allocationSize = 1)
public class Human {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO, generator = "CUSTOM_SEQ_GENERATOR")
    @Column(name = "id")
    private Long id;

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }
}

For convenience, when entities are stored, their unique identifiers are generated using a database sequence, called human_seq.

  • The data source is configured as usual in the application.properties file
spring.datasource.type=com.zaxxer.hikari.HikariDataSource
spring.datasource.url=jdbc:postgresql://localhost:5432/postgres?currentSchema=liquidempo&useUnicode=true&characterEncoding=utf8&useSSL=false&allowPublicKeyRetrieval=true
spring.datasource.username=postgres
spring.datasource.password=123456

spring.jpa.database-platform=org.hibernate.dialect.PostgreSQLDialect
spring.jpa.hibernate.ddl-auto=none

The previously created schema is referred in the connection URL. DDL handling is disabled, as the infrastructure and the data are intended to be persistent when the application is restarted.

  • As Liquibase is the database migration manager, the changelog path is configured in the application.properties file as well.
spring.liquibase.change-log=classpath:/db/changelog/db.changelog-root.xml

For now, the db.changelog-root.xml file is empty.

The current state of the project requires a few simple change sets, in order to create the database elements depicted around the Human entity – the table, the sequence and the primary key constraint.

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
        xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
                      https://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.17.xsd">

    <changeSet author="horatiucd" id="100">
        <createSequence sequenceName="human_seq" startValue="1" incrementBy="1"/>
    </changeSet>

    <changeSet author="horatiucd" id="200">
        <createTable tableName="human">
            <column name="id" type="BIGINT">
                <constraints nullable="false"/>
            </column>
        </createTable>
    </changeSet>

    <changeSet author="horatiucd" id="300">
        <addPrimaryKey columnNames="id" constraintName="human_pk" tableName="human"/>
    </changeSet>

</databaseChangeLog>

In order for these to be applied, they need to be recorded as part of db.changelog-root.xml file, as indicated below.

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
        xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
                      https://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.17.xsd">

    <include file="db/changelog/human_init.xml"/>

</databaseChangeLog>

When the application is restarted, the three change sets are executed in the order they are declared.

INFO 9092 --- [main] liquibase.database      : Set default schema name to liquidempo
INFO 9092 --- [main] liquibase.lockservice   : Successfully acquired change log lock
INFO 9092 --- [main] liquibase.changelog     : Creating database history table with name: liquidempo.databasechangelog
INFO 9092 --- [main] liquibase.changelog     : Reading from liquidempo.databasechangelog
Running Changeset: db/changelog/human_init.xml::100::horatiucd
INFO 9092 --- [main] liquibase.changelog     : Sequence human_seq created
INFO 9092 --- [main] liquibase.changelog     : ChangeSet db/changelog/human_init.xml::100::horatiucd ran successfully in 6ms
Running Changeset: db/changelog/human_init.xml::200::horatiucd
INFO 9092 --- [main] liquibase.changelog     : Table human created
INFO 9092 --- [main] liquibase.changelog     : ChangeSet db/changelog/human_init.xml::200::horatiucd ran successfully in 4ms
Running Changeset: db/changelog/human_init.xml::300::horatiucd
INFO 9092 --- [main] liquibase.changelog     : Primary key added to human (id)
INFO 9092 --- [main] liquibase.changelog     : ChangeSet db/changelog/human_init.xml::300::horatiucd ran successfully in 8ms
INFO 9092 --- [main] liquibase               : Update command completed successfully.
INFO 9092 --- [main] liquibase.lockservice   : Successfully released change log lock

Moreover, they are recorded as separate rows in the databasechangelog database table.

+---+---------+---------------------------+--------------------------+-------------+--------+----------------------------------+------------------------------------------------------+
|id |author   |filename                   |dateexecuted              |orderexecuted|exectype|md5sum                            |description                                           |
+---+---------+---------------------------+--------------------------+-------------+--------+----------------------------------+------------------------------------------------------+
|100|horatiucd|db/changelog/human_init.xml|2023-05-26 16:23:17.184239|1            |EXECUTED|8:db8c5fb392dc96efa322da2c326b5eba|createSequence sequenceName=human_seq                 |
|200|horatiucd|db/changelog/human_init.xml|2023-05-26 16:23:17.193031|2            |EXECUTED|8:ed8e5e7df5edb17ed9a0682b9b640d7f|createTable tableName=human                           |
|300|horatiucd|db/changelog/human_init.xml|2023-05-26 16:23:17.204184|3            |EXECUTED|8:a2d6eff5a1e7513e5ab7981763ae532b|addPrimaryKey constraintName=human_pk, tableName=human|
+---+---------+---------------------------+--------------------------+-------------+--------+----------------------------------+------------------------------------------------------+

So far, everything is straight forward, nothing out of the ordinary – a simple Spring Boot application whose database changes are managed with Liquibase.

When examining the above human_init.xml file, one can easily depict the three scripts that result from the three changesets. None is idempotent. It means that if they are executed again (although there is no reason for doing it here) errors will occur because the human_seq sequence, the human table and the human_pk primary key already exist.

Idempotent Change Sets

If the SQL code that results from the XML change sets had been written directly and aimed to be idempotent, it would have read as follows:

CREATE SEQUENCE IF NOT EXISTS human_seq INCREMENT 1 MINVALUE 1 MAXVALUE 99999999999;

CREATE TABLE IF NOT EXISTS human (
	id SERIAL CONSTRAINT human_pk PRIMARY KEY
);

If the two commands are executed several times, no errors occur and the outcome remains the same. After the first run, the sequence, the table and the constraint are created, then every new execution leaves them in the same usable state.

The aim is to accomplish the same in the written Liquibase change sets (change log).

According to the Liquibase documentation [Resource 1] – “Preconditions are tags you add to your changelog or individual changesets to control the execution of an update based on the state of the database. Preconditions let you specify security and standardization requirements for your changesets. If a precondition on a changeset fails, Liquibase does not deploy that changeset.”

These constructs may be configured in various ways, either at change log or change set level. For simplicity, the three change sets of this proof of concept will be made idempotent.

Basically, whenever a change set fails to execute because the entity (sequence, table or primary key) already exists, it would be convenient to continue and not halt the execution of the entire change log and not be able to start the application.

In this direction, Liquibase preconditions provides at least two options:

  • either skip over the changeset and continue with the change log, or
  • skip over the change set but mark it as executed and continue with the change log.

Either of the two can be configured by adding a preConditions tag in the change set of interest and setting the onFail attribute as CONTINUE (the former case) or MARK_RAN (the latter case).

In pseudo-code, this looks as below:

<changeSet author="horatiucd" id="100">
	<preConditions onFail="CONTINUE or MARK_RAN">
		...
	</preConditions>
	...
</changeSet>

This seems in line to the initial desire – execute the change set only if the preconditions are met. Next, each of the two situations is analyzed.

onFail=”CONTINUE”

The change log file – human_init_idempo_continue.xml – becomes as below:

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
        xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
                      https://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.17.xsd">

    <changeSet author="horatiucd" id="101">
        <preConditions onFail="CONTINUE">
            <not>
                <sequenceExists sequenceName="human_seq"/>
            </not>
        </preConditions>
        <createSequence sequenceName="human_seq" startValue="1" incrementBy="1"/>
    </changeSet>

    <changeSet author="horatiucd" id="201">
        <preConditions onFail="CONTINUE">
            <not>
                <tableExists tableName="human"/>
            </not>
        </preConditions>
        <createTable tableName="human">
            <column name="id" type="BIGINT">
                <constraints nullable="false"/>
            </column>
        </createTable>
    </changeSet>

    <changeSet author="horatiucd" id="301">
        <preConditions onFail="CONTINUE">
            <not>
                <primaryKeyExists primaryKeyName="human_pk" tableName="human"/>
            </not>
        </preConditions>
        <addPrimaryKey columnNames="id" constraintName="human_pk" tableName="human"/>
    </changeSet>

</databaseChangeLog>

For each item, the precondition checks if it does not exist.

When running the application, the log shows what is executed:

INFO 49016 --- [main] liquibase.database     : Set default schema name to liquidempo
INFO 49016 --- [main] liquibase.changelog    : Reading from liquidempo.databasechangelog
INFO 49016 --- [main] liquibase.lockservice  : Successfully acquired change log lock
Running Changeset: db/changelog/human_init_idempo_continue.xml::101::horatiucd
INFO 49016 --- [main] liquibase.changelog    : Continuing past: db/changelog/human_init_idempo_continue.xml::101::horatiucd despite precondition failure due to onFail='CONTINUE': 
          db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_continue.xml::201::horatiucd
INFO 49016 --- [main] liquibase.changelog    : Continuing past: db/changelog/human_init_idempo_continue.xml::201::horatiucd despite precondition failure due to onFail='CONTINUE': 
          db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_continue.xml::301::horatiucd
INFO 49016 --- [main] liquibase.changelog    : Continuing past: db/changelog/human_init_idempo_continue.xml::301::horatiucd despite precondition failure due to onFail='CONTINUE': 
          db/changelog/db.changelog-root.xml : Not precondition failed
INFO 49016 --- [main] liquibase              : Update command completed successfully.
INFO 49016 --- [main] liquibase.lockservice  : Successfully released change log lock

As expected, all three preconditions failed and the execution of the change log continued.

The databasechangelog database table does not have any records in addition to the previous three, which means the change sets will be attempted to be executed again at the next start-up of the application.

onFail=”MARK_RAN”

The change log file – human_init_idempo_mark_ran.xml – is similar to the one in human_init_idempo_continue.xml, the only difference is the onFail attribute, which is set as onFail="MARK_RAN".

The db.changelog-root.xml root change log now looks as below:

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
        xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
                      https://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.17.xsd">

    <include file="db/changelog/human_init.xml"/>
    <include file="db/changelog/human_init_idempo_continue.xml"/>
    <include file="db/changelog/human_init_idempo_mark_ran.xml"/>
    
</databaseChangeLog>

For this proof of concept, all three files were kept on purpose, in order to be able to observe the behavior in detail.

If the application is restarted, no errors are encountered and the log depicts the following:

INFO 38788 --- [main] liquibase.database      : Set default schema name to liquidempo
INFO 38788 --- [main] liquibase.changelog     : Reading from liquidempo.databasechangelog
INFO 38788 --- [main] liquibase.lockservice   : Successfully acquired change log lock
INFO 38788 --- [main] liquibase.changelog     : Reading from liquidempo.databasechangelog
Running Changeset: db/changelog/human_init_idempo_continue.xml::101::horatiucd
INFO 38788 --- [main] liquibase.changelog     : Continuing past: db/changelog/human_init_idempo_continue.xml::101::horatiucd despite precondition failure due to onFail='CONTINUE': 
          db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_continue.xml::201::horatiucd
INFO 38788 --- [main] liquibase.changelog     : Continuing past: db/changelog/human_init_idempo_continue.xml::201::horatiucd despite precondition failure due to onFail='CONTINUE': 
          db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_continue.xml::301::horatiucd
INFO 38788 --- [main] liquibase.changelog     : Continuing past: db/changelog/human_init_idempo_continue.xml::301::horatiucd despite precondition failure due to onFail='CONTINUE': 
          db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_mark_ran.xml::101::horatiucd
INFO 38788 --- [main] liquibase.changelog     : Marking ChangeSet: "db/changelog/human_init_idempo_mark_ran.xml::101::horatiucd" as ran despite precondition failure due to onFail='MARK_RAN': 
          db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_mark_ran.xml::201::horatiucd
INFO 38788 --- [main] liquibase.changelog     : Marking ChangeSet: "db/changelog/human_init_idempo_mark_ran.xml::201::horatiucd" as ran despite precondition failure due to onFail='MARK_RAN': 
          db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_mark_ran.xml::301::horatiucd
INFO 38788 --- [main] liquibase.changelog     : Marking ChangeSet: "db/changelog/human_init_idempo_mark_ran.xml::301::horatiucd" as ran despite precondition failure due to onFail='MARK_RAN': 
          db/changelog/db.changelog-root.xml : Not precondition failed
INFO 38788 --- [main] liquibase               : Update command completed successfully.
INFO 38788 --- [main] liquibase.lockservice   : Successfully released change log lock

The change sets with onFail="CONTINUE" were tried to be re-executed, as this is a new attempt, while the ones with onFail="MARK_RAN" were marked in the databasechangelog and will be passed over at the next star-up.

+---+---------+-------------------------------------------+--------------------------+-------------+--------+----------------------------------+------------------------------------------------------+
|id |author   |filename                                   |dateexecuted              |orderexecuted|exectype|md5sum                            |description                                           |
+---+---------+-------------------------------------------+--------------------------+-------------+--------+----------------------------------+------------------------------------------------------+
|100|horatiucd|db/changelog/human_init.xml                |2023-05-26 16:23:17.184239|1            |EXECUTED|8:db8c5fb392dc96efa322da2c326b5eba|createSequence sequenceName=human_seq                 |
|200|horatiucd|db/changelog/human_init.xml                |2023-05-26 16:23:17.193031|2            |EXECUTED|8:ed8e5e7df5edb17ed9a0682b9b640d7f|createTable tableName=human                           |
|300|horatiucd|db/changelog/human_init.xml                |2023-05-26 16:23:17.204184|3            |EXECUTED|8:a2d6eff5a1e7513e5ab7981763ae532b|addPrimaryKey constraintName=human_pk, tableName=human|
|101|horatiucd|db/changelog/human_init_idempo_mark_ran.xml|2023-05-29 16:40:26.453305|4            |MARK_RAN|8:db8c5fb392dc96efa322da2c326b5eba|createSequence sequenceName=human_seq                 |
|201|horatiucd|db/changelog/human_init_idempo_mark_ran.xml|2023-05-29 16:40:26.463021|5            |MARK_RAN|8:ed8e5e7df5edb17ed9a0682b9b640d7f|createTable tableName=human                           |
|301|horatiucd|db/changelog/human_init_idempo_mark_ran.xml|2023-05-29 16:40:26.475153|6            |MARK_RAN|8:a2d6eff5a1e7513e5ab7981763ae532b|addPrimaryKey constraintName=human_pk, tableName=human|
+---+---------+-------------------------------------------+--------------------------+-------------+--------+----------------------------------+------------------------------------------------------+

At the next run of the application, the log will be similar to the one where the onFail was set on "CONTINUE".

One more observation is worth making at this point. In case a change set whose preconditions do not fail, they are executed normally and recorded with exectype = EXECUTED in the databasechangelog table.

Conclusions

This article presented two ways of writing idempotent Liquibase change sets, practice that allows having more robust and easy to maintain applications. This was accomplished by leveraging the change set preConditions tag inside the change log files. While both onFail attribute values – CONTINUE and MARK_RAN – may be used depending on the actual performed operation, the latter seems more appropriate for this proof of concept as it does not attempt to re-run the change sets at every start-up of the application.

Resources

  1. Liquibase Documentation
  2. Source code for the sample application
  3. Idempotence