PostgreSQL and typeorm - Practical transactions

Photo by Taylor Vick on Unsplash

Lesson goal

To understand what a transaction does for us and how to choose when one is required in our code.

I will also show how to add transactions using typeorm.

Database course index

This is part of a full course on persistence in postgres with typeorm and sql!

There is a github repo to go with this course. See part 2 for instructions.

You can see examples of typeorm in a real app on Use Miller on GitHub

The problem with multi-user apps

We need databases to be multi-user. That is, we need to allow more than one customer to read from it and write to it at one time. Otherwise they wouldn't be very suitable for web applications!

This leads us to having to deal with "concurrent" access.

1. Concurrency control in databases

When multiple users are accessing and modifying the same data in a database at the same time, there is the potential for conflicts to arise.

For example, In a banking application we might have an automatic bill payment being paid AND the account owner withdrawing money from the ATM at the same time. Both processes would read the current balance to check if there are enough funds before proceeding.

Concurrency issue banking

The bill payment would write the new balance before the ATM transaction did because it's a computerised transaction while the customer is taking time trying to remember the amount of cash that they really need for lunch.

So the ATM transaction would later attempt to write the old balance - the ATM withdrawal amount. Now we have inconsistent data.

Note: This is a simplified example. In reality, the ATM transaction would be rejected because the software would use some sort of atomic update mechanism and it would detect that the balance had changed. But the point is that the data is inconsistent at the point of attempting the persistence of the ATM balance transaction.

This situation is acute in most programs using ORMs because we often select a bunch of data out to our program, make a decision based on the data and then write back modified data using the domain models.

const myStateRecord = repo.find({ id: 1 });
if ((myStateRecord.state = TIME_TO_DO_SOMETHING)) {
  myStateRecord.state = DOING_THE_THING;
  repo.save(myStateRecord);
  //fire off an event to do the thing
}

Transactions are what we use to ensure that the steps in our program logic are all executed as one "unit". If any part of the execution fails we can revert all the changes and do something to fix the issue.

A transaction is a sequence of database operations that are treated as a single unit. This means that either all of the operations in the transaction are completed successfully, or none of them are.

In order to use transactions we "lock" the data while we're using it. A "lock" means that other users can't access the data until we're done with it.

Unfortunately this leads to another issue - deadlocks.

A deadlock is a situation in which two or more transactions are waiting for each other to release locks they have acquired on data in the database.

This can lead to a situation in which none of the transactions can make progress, and the system can become "deadlocked" until the deadlock is resolved.

Most database systems kill a transaction that has been waiting for a lock for too long. This is called a "timeout".

const transaction = repo.createTransaction();
transaction.start();
const myStateRecord = repo.find({ id: 1 });
if ((myStateRecord.state = TIME_TO_DO_SOMETHING)) {
  myStateRecord.state = DOING_THE_THING;
  repo.save(myStateRecord);
  transaction.commitTransaction();
  //fire off an event to do the thing
}

Glossary: Isolation levels

We can control what the database does when there are updates to data that affect other transactions that are in progress.

We can have one transaction pay close attention to how another transaction has modified data it is also using, or we can have a transaction completely ignore updates made by another transaction.

Another option would be to just error if any any data used by a transaction was modified by another transaction.

These are referred to as "isolation levels".

It's important to note that different databases on the same server can have different default isolation levels set. You need to confirm the default isolation level before making any assumptions about it.

It's also important to note that different database clients - e.g. typeorm - can have a different default isolation level to the database! For example .NET data access libraries will often use serializable as the default isolation level even though SQL Server's default is READ_COMMITTED.

In general it's better to always specify the isolation level in your code so you're protected from changes later.

isolation leveldescription
READ UNCOMMITTEDIn postgres this is the same as read committed
READ COMMITTEDA concurrent transaction can modify data you have previously read, your transaction will only see the updated values if the other transaction commits. This is the default isolation level in Postgres, oracle and ms sql server.
REPEATABLE READA concurrent transaction can modify data you have previously read, the transaction will fail if the transactions could have resulted in different states if committed in a different order
SERIALIZABLEIf a concurrent transaction modifies data your transaction is trying to commit, your transaction will fail. Front end devs: This is the default isolation level in SQLite

There's two important things to note here. The default isolation level in postgres is READ COMMITTED. This means that, your transaction won't read other transaction's changes until they are committed. Once they are committed they can be read. This means other transactions can modify data while your transaction is running


-- the select * is just illustrative. don't do this.

select * from my_table where ...
--do a few other commands
update my_table...
-- right here another transaction commits and modifies the data you were working with
select * from my_table -- this will contain the new data unless you changed it!

You can hint to postgres that you will be modifying the rows with FOR UPDATE.

select * from my_table FOR UPDATE where ...
--do a few other commands
-- right here another transaction commits - IT WILL ERROR because you have locked those rows

Note that you could consider using a serializable isolation level which would "lock" all the rows anyway, if using serializable isolation level you don't need the FOR UPDATE, but you will always lock things. Just be aware of the concurrency technique you need for a given scenario.

Using transactions in typeorm

There are multiple ways to use transactions with typeorm but if you use the queryRunner method you can control the transaction rollback yourself.

const connection = await AppDataSource.connection();
const queryRunner = connection.createQueryRunner();
const petOwner = new PetOwner();
petOwner.name = "tx_owner";

const pet1 = new Pet();
pet1.name = "tx pet 1";
pet1.owner = petOwner;
pet1.type = "cat";

const pet2 = new Pet();
pet2.name = "tx pet 2";
pet2.owner = petOwner;
pet2.type = "dog";

await queryRunner.connect();
await queryRunner.startTransaction();
try {
  async (transactionalEntityManager: EntityManager) => {
    await transactionalEntityManager.save(petOwner);
    await transactionalEntityManager.save(pet1);
    await transactionalEntityManager.save(pet2);

    await transactionalEntityManager.commitTransaction();
  };
} catch {
  // since we have errors lets rollback the changes we made
  await queryRunner.rollbackTransaction();
} finally {
  // you need to release a queryRunner which was manually instantiated
  await queryRunner.release();
}

Summary

If you're selecting some data to make a decision, and then writing back new information based on that decision, you likely need a transaction.

If you're only selecting data you almost certainly don't need a transaction.

You should understand what READ_COMMITTED does for you in Postrgres (https://www.postgresql.org/docs/current/transaction-iso.html#XACT-READ-COMMITTED).

Lesson index