Distributed Locks and How to Implement Them with Redis

Hi everyone 👋, I'm Hung Anh.

In distributed systems, ensuring data consistency and preventing race conditions is a major challenge, especially when many processes or services access shared resources concurrently. One of the key solutions to this problem is the use of a distributed lock.

In this article, I will help you understand what a distributed lock is, why it is needed, the different ways to implement one, and how to build it with Redis.

NOTE

This is part of my notes on distributed systems. If you only remember one thing: a lock is a safety mechanism, not a performance feature — reach for it only when correctness depends on it.

Let's get started.

1. The Problem

Imagine a bank payment processing system where multiple processes need to update an account balance whenever a transaction occurs. Here is the scenario I'll use:

User A has $1000 in their bank account.
A makes two requests at the same time:
- Request 1: Withdraw $200.
- Request 2: Transfer $300 to B.

From the sequence diagram, you can see that:

Both requests read A's account balance at the same time:
- Request 1 reads the balance: $1000.
- Request 2 reads the balance: $1000.
Request 1 withdraws money:
- Calculates the new balance: 1000 - 200 = 800.
- Writes the account balance of $800 back to the system.
Request 2 transfers money (processed concurrently with request 1):
- Calculates the new balance: 1000 - 300 = 700.
- Writes the account balance of $700 back to the system ⇒ overwriting the result of request 1.

In reality, A withdrew and transferred 200 + 300 = $500 in total, so the correct balance is $500, but the system records a balance of $700.

Let's look at the sample code below to make this easier to picture.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import lombok.SneakyThrows;

class DistributedLock {

   static String USER_ID = "A";

   static Map<String, Integer> USERS = new HashMap<>() {{
       put(USER_ID, 1000);
   }};

   public static void main(String[] args) {
       // Run the withdrawal and the transfer concurrently and wait for both
       CompletableFuture.allOf(
           CompletableFuture.runAsync(DistributedLock::fundOut),
           CompletableFuture.runAsync(DistributedLock::fundTransfer)
       ).join();

       // Typically prints {A=700} or {A=800} (whichever write lands last), not the correct {A=500}
       System.out.println(USERS);
   }

   static void fundOut() {
       Integer balance = getBalance(USER_ID);
       balance  = balance - 200;
       updateBalance(USER_ID, balance);
   }

   static void fundTransfer() {
       Integer balance = getBalance(USER_ID);
       balance  = balance - 300;
       updateBalance(USER_ID, balance);
   }

   @SneakyThrows
   static Integer getBalance(String userId) {
       Integer balance = USERS.get(userId);
       Thread.sleep(1000L);
       return balance;
   }

   static void updateBalance(String userId, Integer balance) {
       USERS.put(userId, balance);
   }
}

With the current approach, the system above suffers from two problems:

Race condition: Both requests access and operate on the shared resource (the account balance) without any coordination.
Data inconsistency: The system does not record the correct balance after the transactions complete.

The solution to this problem is to process the two requests sequentially, ensuring that only one process is allowed to access the resource at any given time. This is precisely the purpose of a distributed lock.
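Within a single JVM, this kind of sequential processing can be sketched with a plain ReentrantLock. The class LocalLockDemo and its method names are illustrative and not part of the original code. The sketch only works because both tasks run in the same process, which is exactly the limitation a distributed lock removes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantLock;

class LocalLockDemo {

    // Shared state, guarded by LOCK
    static int balance = 1000;
    static final ReentrantLock LOCK = new ReentrantLock();

    // Read-modify-write executed as one critical section
    static void withdraw(int amount) {
        LOCK.lock();
        try {
            int current = balance;      // read
            sleepQuietly(100);          // simulate slow processing
            balance = current - amount; // write
        } finally {
            LOCK.unlock();              // always release, even on failure
        }
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Resets the balance, runs both operations concurrently, returns the result
    static int run() {
        balance = 1000;
        CompletableFuture.allOf(
            CompletableFuture.runAsync(() -> withdraw(200)),
            CompletableFuture.runAsync(() -> withdraw(300))
        ).join();
        return balance;
    }

    public static void main(String[] args) {
        System.out.println(run()); // 500
    }
}
```

If the two operations ran on two different service instances, each instance would have its own LOCK object and the lost update would come back.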

2. Distributed Lock

2.1. What is a Distributed Lock?

A distributed lock is a mechanism for controlling access to shared resources in a distributed system. Unlike traditional locking mechanisms that only work within a single instance, a distributed lock ensures that at any given moment only one process can access a resource, regardless of whether those processes are running on one or many different servers. This mechanism helps prevent conflicts, protect data integrity, and maintain consistency in complex distributed environments.

To work effectively, a distributed lock must guarantee the following three properties:

Safety: At any point in time, only one process is allowed to lock a resource. This ensures that no two processes can access the same resource simultaneously, avoiding conflicts or data corruption.
Liveness: The lock must be released as soon as the process holding it finishes its work or fails. This prevents a resource from staying locked for too long.
Availability: Other processes in the system must be able to acquire the lock after it has been released.
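To make the three properties concrete, here is a minimal in-memory model of a lock with an expiration time. It is only a sketch: LeaseLockModel, its methods, and the explicit nowMillis parameter (a fake clock that keeps the example deterministic) are assumptions made for illustration, not a real distributed lock.

```java
import java.util.HashMap;
import java.util.Map;

class LeaseLockModel {

    // One lock entry: who holds it and when the lease runs out
    private static final class Lease {
        final String owner;
        final long expiresAt;

        Lease(String owner, long expiresAt) {
            this.owner = owner;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Lease> leases = new HashMap<>();

    // Safety: succeeds only if nobody holds an unexpired lease on the key
    synchronized boolean tryAcquire(String key, String owner, long ttlMillis, long nowMillis) {
        Lease current = leases.get(key);
        if (current != null && current.expiresAt > nowMillis) {
            return false;
        }
        leases.put(key, new Lease(owner, nowMillis + ttlMillis));
        return true;
    }

    // Only the current owner may release; anyone else is ignored
    synchronized boolean release(String key, String owner) {
        Lease current = leases.get(key);
        if (current != null && current.owner.equals(owner)) {
            leases.remove(key);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        LeaseLockModel lock = new LeaseLockModel();
        System.out.println(lock.tryAcquire("A", "req-1", 3000, 0));    // true
        System.out.println(lock.tryAcquire("A", "req-2", 3000, 1000)); // false: still held (safety)
        System.out.println(lock.release("A", "req-1"));                // true
        System.out.println(lock.tryAcquire("A", "req-2", 3000, 1500)); // true: free again (availability)
        System.out.println(lock.tryAcquire("A", "req-3", 3000, 5000)); // true: req-2's lease expired (liveness)
    }
}
```

The last call shows why a TTL matters: even if req-2 crashed without releasing, the lock does not stay held forever.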

2.2. Why are Distributed Locks Important?

In distributed systems, many processes may run on one or more different nodes and access a shared resource at the same time. This can easily lead to issues such as race conditions, data inconsistency, or other unexpected behavior. A distributed lock is the solution that prevents these situations, keeping the system stable and reliable.

Some common use cases for distributed locks include:

Job Scheduling: Distributed systems often include many cron jobs, and these cron jobs should not be executed on multiple instances of a service.
Preventing concurrent database updates: When multiple processes update the same database record at the same time, a distributed lock ensures that only one write operation happens at a time.
Preventing duplicate requests: For various reasons such as programming bugs or network errors, a request may be sent more than once. If two identical requests reach the system, a distributed lock helps reject one and process only the remaining request.

2.3. Approaches to Implementing Distributed Locks

There are many technologies and solutions used to implement distributed locks, each with its own advantages and disadvantages. Below are some of the most common ones:

Database-based Locking: A common approach is to use a database to store lock information. For example, you can create a table to mark whether a particular resource is currently locked. This is a simple solution that integrates easily into an existing system, but it can suffer from reduced performance when there are too many concurrent operations.
ZooKeeper-based Locking: Apache ZooKeeper is a popular coordination service that helps the servers (nodes) in a distributed system work together in a synchronized and consistent manner. ZooKeeper supports distributed locks through ephemeral nodes. While ZooKeeper provides strong, reliable consistency, setting it up and managing it can be complex and labor-intensive.
Redis-based Locking: Redis is a fast and lightweight in-memory data store that is very popular for implementing distributed locks. Redis provides a locking mechanism that ensures only one process can set a value for a given key if that key does not already exist. Redis is well suited for systems that demand high speed, low latency, and easy configuration. However, you need to carefully handle edge cases such as deadlocks or expired locks.

Among these solutions, Redis is a very popular choice thanks to its simplicity, speed, and scalability.

3. Implementing a Distributed Lock with Redis

Let's return to our original problem and look at how to implement a distributed lock with Redis through the diagram below.

Now, the processing flow for the two requests proceeds as follows:

Requests 1 and 2 both request to create a lock in Redis to lock A's user_id:
- Request 1 creates the lock first ⇒ Request 1 is processed (step 1.1).
- Request 2 must wait until request 1 releases the lock or until the lock expires (Request 2 repeats step 1 to request the lock again).
Request 1 finishes processing:
- Calculates the new balance: 1000 - 200 = $800.
- Writes the account balance of $800 back to the system.
- Releases the lock in Redis.
Request 2 acquires the lock and processes its transaction:
- Calculates the new balance: 800 - 300 = $500
- Writes the account balance of $500 back to the system
- Releases the lock in Redis.

We can generalize the request processing flow combined with a distributed lock mechanism through the code below.

func update() {
   // Try to lock "key" with TTL = X seconds, retrying for up to Y seconds
   if (!tryLock(key, value, X, Y)) {
       throw Exception("Try lock timeout")
   }
   try {
       // Lock acquired: get, calculate and update the resource
   } finally {
       // Release only the lock we own ("value" identifies the owner)
       unlock(key, value)
   }
}

In the code above, we have two functions, tryLock() and unlock(), to implement the distributed lock mechanism. Before running the main logic, tryLock has to request the creation of a lock in Redis with a lock TTL of X seconds. If the lock cannot be created, tryLock keeps retrying to create it for up to Y seconds; if it succeeds, the subsequent logic is executed, otherwise it stops and ends processing.
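The waiting behaviour described above (retry for up to Y seconds, then give up) could be factored into a small helper like the one below. This is a sketch under assumptions: LockRetry, retryUntil, and the pause between attempts are illustrative names and choices, not part of the original code. The pause keeps the loop from sending commands to Redis as fast as the CPU allows.

```java
import java.util.function.BooleanSupplier;

class LockRetry {

    // Calls attempt until it returns true or timeoutMillis has elapsed.
    // Sleeps pauseMillis between attempts so the loop does not spin on the server.
    static boolean retryUntil(BooleanSupplier attempt, long timeoutMillis, long pauseMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical attempt that succeeds on the third call
        boolean acquired = retryUntil(() -> ++calls[0] >= 3, 1000, 10);
        System.out.println(acquired + " after " + calls[0] + " attempts"); // true after 3 attempts
    }
}
```

In a real tryLock(), the attempt would be the single Redis call that tries to create the lock key.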

There are three common approaches to implementing a distributed lock with Redis: using the SET command with the NX parameter, using a Lua Script, and using the Redisson library. Each approach has its own advantages and disadvantages, depending on your system's specific requirements and how familiar you are with each option. Understanding these will help you make an informed decision that best fits your project's needs.

3.1. Using the SET Command with the NX Parameter

The simplest way to implement a distributed lock with Redis is to use the SET command with the NX parameter. This command only sets a key to a given value if that key does not already exist. This way, we set a unique key in Redis representing the locked resource. If the key is set successfully, the lock is acquired. Otherwise, the lock has already been acquired by another process, and the current process must wait for the lock to be released.

The Redis commands I'll use are:

Lock: SET lock_key value NX PX ttl
- ttl: The expiration time of the lock. The lock will be released after ttl milliseconds.
- If the SET succeeds, the command returns OK; otherwise it returns (nil).
Unlock: DEL (UNLINK) lock_key: Deletes the lock, ignoring it if the lock does not exist.
Get: GET lock_key: Returns the value of lock_key, or (nil) if it does not exist.

To implement this approach, we'll use the Jedis library. It is a simple, lightweight Redis client with strong community support; it is recommended by the official Redis site and supported by Spring Data Redis.

import lombok.SneakyThrows;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;
import redis.clients.jedis.params.SetParams;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

class DistributedLock {

   static String USER_ID = "A";

   static Map<String, Integer> USERS = new HashMap<>() {{
       put(USER_ID, 1000);
   }};

   static JedisPool jedisPool = new JedisPool(new JedisPoolConfig(), "localhost", 6379);

   public static void main(String[] args) {
       // Run both operations concurrently, each under the lock with its own request ID
       CompletableFuture.allOf(
           CompletableFuture.runAsync(() -> lock(DistributedLock::fundOut, UUID.randomUUID().toString())),
           CompletableFuture.runAsync(() -> lock(DistributedLock::fundTransfer, UUID.randomUUID().toString()))
       ).join();

       // {A=500}
       System.out.println(USERS);
   }

   static void fundOut() {
       Integer balance = getBalance(USER_ID);
       balance = balance - 200;
       updateBalance(USER_ID, balance);
   }

   static void fundTransfer() {
       Integer balance = getBalance(USER_ID);
       balance = balance - 300;
       updateBalance(USER_ID, balance);
   }

   @SneakyThrows
   static Integer getBalance(String userId) {
       Integer balance = USERS.get(userId);
       Thread.sleep(1000L);
       return balance;
   }

   static void updateBalance(String userId, Integer balance) {
       USERS.put(userId, balance);
   }

   static void lock(Runnable runnable, String requestId) {
       try {
           if (!tryLock(USER_ID, requestId, 3000, 4000)) {
               return;
           }

           // Lock acquired: run the protected logic
           runnable.run();
       } finally {
           unlock(USER_ID, requestId);
       }
   }

   static boolean tryLock(String lockKey, String identifier, int lockExpire, int tryLockTimeOut) {
       try (Jedis jedis = jedisPool.getResource()) {
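           // Remember when we started so we can give up after tryLockTimeOut ms
           long start = System.currentTimeMillis();

           // Keep retrying until the lock is acquired or the wait time runs out
           while (true) {
               // SET lockKey identifier NX PX lockExpire: succeeds only if the key does not exist
               String lockRes = jedis.set(lockKey, identifier, new SetParams().nx().px(lockExpire));

               // "OK" means the lock was acquired; null means someone else holds it
               if (lockRes != null) {
                   return true;
               }

               // Give up once the allowed wait time has elapsed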
               if (System.currentTimeMillis() - start > tryLockTimeOut) {
                   throw new RuntimeException("Try lock timeout");
               }
           }
       }
   }

   static void unlock(String lockKey, String identifier) {
       try (Jedis jedis = jedisPool.getResource()) {
           if (identifier.equals(jedis.get(lockKey))) {
               jedis.del(lockKey);
           }
       }
   }
}

Comparing this with the original code in The Problem section, I've added the following:

An additional lock() method to apply the distributed lock to the two functions fundOut() and fundTransfer(). The lock() method wraps the logic of the transfer and withdrawal functions. Before these two functions run their main logic, they must request a lock in Redis with a lock TTL of 3s. Whichever function acquires the lock first runs first. The other one keeps retrying to acquire the lock for up to 4s; if it acquires the lock, the main logic is executed, otherwise tryLock() throws and the main logic is skipped.
In the tryLock() method, I use a while loop to create the lock by repeatedly calling the SET command with the NX parameter. If the lock is acquired within the allowed wait time, it returns true; otherwise it throws an error.
In the unlock() method, I call the DEL command to delete the lock once the update logic completes. Because unlock() runs in a finally block, the lock is also deleted if the logic throws, so the resource is not left locked. In addition, tryLock() stores an identifier as the lock's value: the requestId generated for each of the transfer and withdrawal requests. On unlock, I use the GET command to read the value stored in the lock and compare it with the caller's requestId; the lock is deleted only if they match. This check is necessary to avoid releasing a lock that now belongs to another request, for example after our own lock has expired.

After implementing this and rerunning the code, the result is now recorded as {A=500} instead of {A=700} as before. This shows that, with the distributed lock in place, user A's balance is updated correctly.

Using the SET command with the NX parameter together with DEL is simple, effective, and fast. However, you'll need to understand the underlying mechanics and write the code yourself to implement and maintain it over time.
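To see why the identifier check on unlock matters, the timeline below replays the case where a lock expires before its holder finishes. It is a simplified sketch: a HashMap stands in for the Redis keyspace, TTL expiry is simulated by removing the key by hand, and the class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

class UnlockOwnerCheck {

    // Stand-in for the Redis keyspace: lock key -> identifier of the holder
    static Map<String, String> store = new HashMap<>();

    // Plain DEL: removes the lock no matter who holds it
    static void unsafeUnlock(String key) {
        store.remove(key);
    }

    // Compare-then-delete: removes the lock only if we still own it
    static void safeUnlock(String key, String identifier) {
        if (identifier.equals(store.get(key))) {
            store.remove(key);
        }
    }

    // Replays the timeline and reports who holds the lock at the end
    static String replay(boolean checkOwner) {
        store.clear();
        store.put("A", "req-1");   // request 1 acquires the lock
        store.remove("A");         // TTL elapses while request 1 is still working
        store.put("A", "req-2");   // request 2 acquires the now-free lock
        if (checkOwner) {
            safeUnlock("A", "req-1");   // request 1 finishes late and unlocks
        } else {
            unsafeUnlock("A");
        }
        return store.get("A");
    }

    public static void main(String[] args) {
        System.out.println(replay(false)); // null: request 2's lock was deleted
        System.out.println(replay(true));  // req-2: request 2 keeps its lock
    }
}
```

The check protects the lock itself, not the work: in this timeline both requests were inside the critical section at the same time, which is why the TTL should comfortably exceed the processing time.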

3.2. Using a Lua Script

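We can also use a Lua script to implement a distributed lock with Redis. This approach is similar to the previous one, except that we group the commands into a script that runs on the Redis server. Redis executes a script atomically: no other command runs while the script is executing, so a sequence such as GET followed by DEL behaves like a single command. This closes a gap in the previous unlock(), where the lock could expire and be acquired by another request between the GET and the DEL. Note that atomic here means isolated, not transactional: if a script fails partway through, the writes it has already made are not rolled back.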

I'll modify tryLock() and unlock() to run a Lua script. The result after rerunning the code is still {A=500} as expected.

static boolean tryLock(String lockKey, String identifier, int lockExpire, int tryLockTimeOut) {
   try (Jedis jedis = jedisPool.getResource()) {
       // Lua script to acquire the lock
       String luaScript =
           "if redis.call('set', KEYS[1], ARGV[1], 'NX', 'PX', ARGV[2]) then " +
           "return 1; " +
           "else " +
           "return 0; " +
           "end";

       long start = System.currentTimeMillis();

       // Try acquiring the lock for a certain period
       while (System.currentTimeMillis() - start < tryLockTimeOut) {
           Object result = jedis.eval(luaScript, 1, lockKey, identifier, String.valueOf(lockExpire));

           // If the lock is acquired
           if ("1".equals(result.toString())) {
               return true;
           }
       }

       // If lock acquisition failed
       return false;
   }
}

static void unlock(String lockKey, String identifier) {
   try (Jedis jedis = jedisPool.getResource()) {
       // Lua script for releasing the lock
       String luaScript = "if redis.call('get', KEYS[1]) == ARGV[1] then " +
           "return redis.call('del', KEYS[1]); " +
           "else " +
           "return 0; " +
           "end";

       jedis.eval(luaScript, 1, lockKey, identifier);
   }
}

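While this approach is more complex than the previous one and requires some knowledge of Lua scripting, in return it guarantees atomicity when running a group of commands. For unlock(), it also saves a network round trip, since the GET and the DEL reach the Redis server in a single script call. The acquisition script wraps a single SET command with NX and PX, which is already atomic on its own, so the script form matters mainly for releasing the lock.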

3.3. Using the Redisson Library

Redisson is a powerful Java library designed for working with Redis. It provides simple APIs and abstracts away the complex logic underneath, so developers can use it without worrying too much about the implementation details. One of its standout features is a distributed locking mechanism, which helps synchronize tasks in distributed environments and protect data integrity in large-scale systems.

Below is sample code that uses the Redisson library to implement a distributed lock.

import lombok.SneakyThrows;
import org.redisson.Redisson;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class DistributedLock {
   // ...

   static RedissonClient redissonClient;

   static {
       Config config = new Config();
       config.useSingleServer().setAddress("redis://localhost:6379");
       redissonClient = Redisson.create(config);
   }

   public static void main(String[] args) {
       // Run both operations concurrently, each guarded by the Redisson lock
       CompletableFuture.allOf(
           CompletableFuture.runAsync(() -> lock(DistributedLock::fundOut)),
           CompletableFuture.runAsync(() -> lock(DistributedLock::fundTransfer))
       ).join();

       // {A=500}
       System.out.println(USERS);
   }

   static void lock(Runnable runnable) {
       RLock lock = redissonClient.getLock(USER_ID);

       try {
           // Try to acquire the lock with a timeout of 3 seconds and lease time of 4 seconds
           if (lock.tryLock(3, 4, TimeUnit.SECONDS)) {
               runnable.run();
           }
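       } catch (InterruptedException e) {
           // Restore the interrupt flag so callers can still observe the interruption
           Thread.currentThread().interrupt();
           throw new RuntimeException("Failed to acquire lock", e);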
       } finally {
           if (lock.isHeldByCurrentThread()) {
               lock.unlock();
           }
       }
   }
   // ...
}

In the code above, I removed the two functions tryLock() and unlock() that I created earlier and instead used the tryLock() and unlock() methods of Redisson's RLock. Note the argument order of tryLock(3, 4, TimeUnit.SECONDS): the wait time comes first, then the lease time (the lock's TTL). The code has become significantly shorter and easier to read. We also no longer need to manually set a requestId on the lock, because Redisson records which client and thread acquired it and exposes that through isHeldByCurrentThread(). This ensures that only the thread holding the lock releases it.

Redisson offers simplicity and convenience, but it may consume more resources than implementations that use raw Redis commands, though this overhead is not significant. On top of that, because of its high level of abstraction, it can be harder to control the locks and the logic executed under the hood.

4. Best Practices

In this section, I'll share some best practices for implementing distributed locks with Redis. These tips will help you optimize your system's performance, ensure data consistency, and improve fault tolerance in a distributed environment.

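Set a reasonable expiration time (TTL) for the lock: comfortably longer than the protected logic normally takes, but not so long that a crashed holder blocks everyone else.
Always release the lock when the processing logic finishes or when any error occurs.
Use distributed locks in the right place at the right time; avoid overusing them, as they can increase processing latency.
Start the database transaction inside the distributed lock: acquire the lock first, then open and commit the transaction, and only then release the lock.
Deploy a Redis Cluster to ensure high availability and improve reliability, keeping in mind that Redis replication is asynchronous, so a lock can be lost during a failover.
You can combine a distributed lock with optimistic locking to ensure data consistency.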
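As an illustration of the last point, optimistic locking can be sketched as a version check on write. The example below is an in-memory stand-in for a versioned database row (the analogue of an UPDATE that only applies WHERE version = the version you read); OptimisticAccount and Row are hypothetical names, and a real system would perform this check in the database.

```java
import java.util.concurrent.atomic.AtomicReference;

class OptimisticAccount {

    // Immutable snapshot of a row: balance plus a version counter
    static final class Row {
        final int balance;
        final long version;

        Row(int balance, long version) {
            this.balance = balance;
            this.version = version;
        }
    }

    private final AtomicReference<Row> row;

    OptimisticAccount(int initialBalance) {
        this.row = new AtomicReference<>(new Row(initialBalance, 0));
    }

    Row read() {
        return row.get();
    }

    // Applies the write only if nobody changed the row since `seen` was read
    boolean update(Row seen, int newBalance) {
        return row.compareAndSet(seen, new Row(newBalance, seen.version + 1));
    }

    public static void main(String[] args) {
        OptimisticAccount account = new OptimisticAccount(1000);
        Row seenByWithdraw = account.read();
        Row seenByTransfer = account.read();

        System.out.println(account.update(seenByWithdraw, 800)); // true
        System.out.println(account.update(seenByTransfer, 700)); // false: stale read is rejected

        Row fresh = account.read();
        System.out.println(account.update(fresh, fresh.balance - 300)); // true
        System.out.println(account.read().balance); // 500
    }
}
```

Used together with a distributed lock, the version check acts as a second line of defense: if the lock expires early and two writers overlap, the stale write is rejected instead of silently overwriting the other.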

5. Conclusion

A distributed lock is an essential solution for ensuring data integrity and avoiding race conditions, and it is commonly used in problems such as job scheduling, preventing duplicate requests, and sequentially processing data update logic.
Redis-based locking is a popular choice thanks to its high speed and powerful data operations. There are three main approaches: using the SET command with the NX parameter, using a Lua script, and using the Redisson library. Each method has its own trade-offs, and which one to choose depends on your system's specific requirements.
Use distributed locks in the right place and always release the lock as soon as the logic finishes or an error occurs.

That brings us to the end of the article. Although the content is fairly long, I hope what I've shared brings you fresh and valuable knowledge that you can apply effectively in real-world projects. See you in the next articles.

Happy reading! 🍻
