PostgreSQL and typeorm - Caching

Photo by Markus Spiske on Unsplash

With most web applications you can drastically increase performance by using caching for data that's frequently read across network boundaries. This lesson will explore some common caching techniques, you'll learn how some common tools and libraries provide caching for us.

While caching helps with performance it can also cause some surprises and bugs in applications and i'll discuss some of those too.

Database course index

This is part of a full course on persistence in postgres with typeorm and sql!

There is a github repo to go with this course. See part 2 for instructions.

You can see examples of typeorm in a real app on Use Miller on GitHub

Caching benefits

If you look at data flow in an app you can see how latency is reduced when a cache is used. The example below shows how a read might be cached.

"Caching overview"

Caching is used outside of networked applications also. Computer processors make heavy use of caching for example. Its orders of magnitude faster to read from RAM Vs a hard drive, same improvement reading from a processor cache Vs computer RAM.

Caching will usually make the cost of your application infrastructure cheaper. You can keep most data on a cheap storage system and only put current data on a faster storage system.

Caching helps to smooth out unpredictable access patterns. The effect of even very short caching can be significant.

If you have something that is read heavy but changes often and you cache it for just one second, you can guarantee that any downstream infrastructure can only get 60 requests/minute. Any hardware these days could easily handle 60rpm. Most people that use the system wont notice a delay less than 1 second for new data.

Cache smoothing

In the diagram above there are 500k r/pm to some infrastructure without a cache. This would be pretty tough for a decent RDBMS to handle but Redis could handle that cheaply. In the second diagram you can see the effect of putting a short cache in front of the RDBMS - RDBMS requests are flat and predictable.

Common caching scenarios

Here are some tools you probably use every day that use caching

DNS

The domain name system that runs the internet is based on local caches. Records propagate around the internet with Time to Live set. Your browser first asks the cache in your computer how to find google.com, if that record is stale then the browser goes to your ISP for the DNS record. If your ISP doesn't have a record then the browser goes to one of the root nameservers for the internet.

Having a local cache for common DNS records like google.com results in much faster browser experience.

React Query

The react query library is an excellent data retrieval library for client applications. React query has built in read-through caching. Each request your application makes is stored in a local cache. React query will reuse the data in the local cache to quickly display data to the user. Then it will refresh the data in the background to reduce the appearance of spinners.

Apps with data stores

Most applications store data somewhere, often a database. Most applications are read heavy so we can increase performance and reduce the cost of the data store by utilising a cache between the application and the data store.

Common caching patterns

There are a bunch of very common caching patterns. These three are the main ones I see when building web applications.

Cache aside

Cache aside is very common. The application is aware of the cache and acts as the coordinator.

Cache aside caching

Assuming a cache miss scenario:

  1. The application checks the cache for data.
  2. Cache returns empty response
  3. Application gets the data from the database
  4. Application stores the data for the next cache retrieval

There are issues with cache aside caching. The application has to be aware of the cache and it has to be aware of the data store. The developer has to be careful to keep the cache and data store in sync.

Read through caching and Write behind caching

In read through and write behind caching the application only interacts with the cache. The cache layer then communicates with the database layer directly. The application is never aware of the database layer.

Read through write behind

Assuming a cache read miss:

  1. The application queries the cache
  2. The cache has no data so it queries the datastore

The cache stores the result and returns it to the client application

Assuming a cache write

  1. The application writes data to the cache
  2. The cache writes the data to the database

These two patterns are not always used together. e.g. you might always read through the cache but write directly to the database. It's just easier to talk about them in one section because the principal is so similar.

Slonik Postgres client can provide an example of read through caching. It allows you to configure interceptors. The interceptor are middleware that run on your query pipeline.

You can then apply hints like @cache-ttl 10 to the interceptors in your sql query. See the documentation at https://github.com/stockholmux/slonik-redis-cache

const
import { sql, createPool } from 'slonik'
import redis from 'redis'
import redisCache from 'slonik-redis-cache'

const client = redis.createClient({
  /* connection options */
})

const pool = createPool('postgres://localhost:5432/applicationDb', {
  interceptors: [redisCache(client)],
})

pool.connect(async (connection) => {
  let results = await connection.query(
    sql`SELECT * FROM aTable -- @cache-ttl 10`
  )
  console.log(results)
})

An example of write through pattern can be described using Redis Enterprise with Redis Gears (https://docs.redis.com/latest/modules/redisgears/).

Redis Gears allows you to set up Redis to run Python and Java with hooks into the Redis runtime. The hook for Write Through Caching will propagate Redis writes to your datastore.

Some GraphQL implementations provide write-behind and read-through caching. In particular Apollo provides these caching strategies. Read more about how they work on the Apollo site.

Cache Invalidation

If your datastore has data that isn't in the cache yet then you have a stale cache. Whether this is an issue is highly dependent on the data and your application. At some stage you will want to invalidate this data so that the cached record gets replaced with the new record.

There are excellent articles online describing how to solve this issue so i'll just give a brief overview here. Check out https://redis.com/blog/three-ways-to-maintain-cache-consistency/ for a deep dive into cache consistency and cache invalidation.

The following are the common methods used to invalidate cached data. One important thing to note is that if you use write-through then your cache should always be up to date.

Time to live

Time to live invalidation is extremely common. You ask the cache to automatically expire an item after a set time period, meaning your data layer will have to go to the source to refresh the item after some set time period.

DNS is a great example of using TTL in each layer. E.g. if you lookup Google.com it is cached for the TTL in your computer, then the TTL in your ISP, then the TTL in the root nameservers. This is a great way to reduce the number of requests to the source of truth.

Once the TTL is reached your DNS client, typically your browser, will go to the next layer in the chain to get the latest record.

Cache on write

For data with a high read rate and that changes infrequently, you can simply update the cache each time you write to your datastore. This invalidation strategy becomes expensive as write frequency increases.

Cache Eviction

Least recently used is an eviction method. This focuses on freeing resources, rather than keeping data fresh, by deleting from the cache data that isn’t getting read very often.

There are other common eviction mechanisms

  • Most Recently Used
  • Least Recently User
  • First in, First out
  • Last in, first out

Their usage will be highly dependent on your application.

Issues to watch our for with caching

Caching is notorious in software development for causing difficult to debug issues. For example if you're debugging an issue and the data doesn't make sense, check for caching somewhere. Caching is a form of state in these scenarios.

These are some of the things I look for when caching is present in a system.

Inappropriate load for caching

If your database is not under significant load adding a database can have a negative effect because the latency between app -> Cache -> DB can be more than a more traditional app -> DB not under load

Caching too eagerly

If you cache many things that are not read very often then you will fill up your cache. This will make it more expensive to run and having so many records will slow down the cache.

Incorrect configuration

If you cache data for too long or you don't clear the cached data at the right time then you might return stale data to a customer.

Conclusion

You'll come across caching on most apps. Tweaking the caching strategies and invalidation strategies that are used will help you to improve user experience and cost of your infrastructure.

Caching introduces complexity that has to be managed carefully.

Database course index

This is part of a full course on persistence in postgres with typeorm and sql!

There is a github repo to go with this course. See part 2 for instructions.