I was recently working on an application backed by a sharded MySQL database, and I needed to generate unique IDs that could be used as the primary keys in its tables.

When you’re working with a single MySQL database, you can simply use an auto-increment ID as the primary key. But this won’t work in a sharded MySQL database, because each shard keeps its own counter and the same ID can be handed out on more than one shard.

So I looked at various existing solutions for this, and finally wrote a simple 64-bit unique ID generator inspired by Twitter’s Snowflake service.

In this article, I’ll share a simplified version of the unique ID generator that will work for any use-case of generating unique IDs in a distributed environment, not just sharded databases.

I’ll also outline other existing solutions and discuss their pros and cons.

Existing Solutions

UUID

UUIDs are 128-bit numbers, usually written in hexadecimal, that are globally unique for practical purposes. The chance of the same UUID being generated twice is negligible.

The problem with UUIDs is that they are large and don’t index well. As your dataset grows, the index grows with it and query performance takes a hit.
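
To make the size difference concrete, here is a minimal sketch using Java’s standard java.util.UUID class. The class name UuidSizeDemo is only an illustration; the point is that one UUID needs two 64-bit longs to hold, while a bigint key needs one.

```java
import java.util.UUID;

class UuidSizeDemo {
    // A UUID is stored as two 64-bit halves, so it occupies 128 bits.
    static int uuidBits() {
        return 2 * Long.SIZE;
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID(); // random (version 4) UUID
        System.out.println("uuid:        " + id);
        System.out.println("high 64:     " + id.getMostSignificantBits());
        System.out.println("low 64:      " + id.getLeastSignificantBits());
        System.out.println("total bits:  " + uuidBits());
        // The canonical text form is 32 hex digits plus 4 hyphens.
        System.out.println("text length: " + id.toString().length());
    }
}
```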

MongoDB’s ObjectId

MongoDB’s ObjectIds are 12-byte (96-bit) values, usually written in hexadecimal, that are made up of -

  • a 4-byte epoch timestamp in seconds,
  • a 3-byte machine identifier,
  • a 2-byte process id, and
  • a 3-byte counter, starting with a random value.

This is smaller than the 128-bit UUID. But it is still larger than what we normally have in a single MySQL auto-increment field (a 64-bit bigint value).
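
The 4 + 3 + 2 + 3 byte layout described above can be sketched as follows. This is not MongoDB’s driver code; the class ObjectIdLayoutDemo and the sample field values are made up purely to show how the four fields add up to 12 bytes.

```java
import java.nio.ByteBuffer;

class ObjectIdLayoutDemo {
    // Packs the four fields into 12 bytes, most significant byte first.
    static byte[] pack(int epochSeconds, int machineId, int processId, int counter) {
        ByteBuffer buf = ByteBuffer.allocate(12);
        buf.putInt(epochSeconds);        // 4-byte timestamp in seconds
        putThreeBytes(buf, machineId);   // 3-byte machine identifier
        buf.putShort((short) processId); // 2-byte process id
        putThreeBytes(buf, counter);     // 3-byte counter
        return buf.array();
    }

    // Writes the lower 24 bits of value as three bytes.
    private static void putThreeBytes(ByteBuffer buf, int value) {
        buf.put((byte) (value >> 16));
        buf.put((byte) (value >> 8));
        buf.put((byte) value);
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample values only: a timestamp, machine id, process id and counter.
        byte[] id = pack(1528538400, 0xABCDEF, 0x1234, 0x000001);
        System.out.println(toHex(id) + " (" + id.length + " bytes)");
    }
}
```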

Database Ticket Servers

This approach uses a centralized database server to generate unique incrementing IDs. It’s like a centralized auto-increment. This approach is used by Flickr.

The problem with this approach is that the ticket server can become a write bottleneck. Moreover, you introduce one more component in your infrastructure that you need to manage and scale.

Twitter Snowflake

Twitter Snowflake is a dedicated network service for generating 64-bit unique IDs at high scale. The IDs generated by this service are roughly time sortable.

The IDs are made up of the following components:

  • Epoch timestamp in millisecond precision - 41 bits (gives us 69 years with a custom epoch)
  • Configured machine id - 10 bits (gives us up to 1024 machines)
  • Sequence number - 12 bits (A local counter per machine that rolls over every 4096)

One bit is reserved for future use. Since the timestamp is the first component, the IDs are sortable by time.
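
The figures in parentheses follow directly from the bit widths. A small sketch of that arithmetic is below; the class and method names are illustrative, and a year is approximated as 365.25 days.

```java
class SnowflakeBudgetDemo {
    static final int RESERVED_BITS = 1;
    static final int TIMESTAMP_BITS = 41;
    static final int MACHINE_ID_BITS = 10;
    static final int SEQUENCE_BITS = 12;

    // Number of distinct values that fit in the given number of bits.
    static long distinctValues(int bits) {
        return 1L << bits;
    }

    // How many years a millisecond timestamp of the given width can cover.
    static double yearsCovered(int timestampBits) {
        double msPerYear = 1000.0 * 60 * 60 * 24 * 365.25;
        return distinctValues(timestampBits) / msPerYear;
    }

    public static void main(String[] args) {
        int total = RESERVED_BITS + TIMESTAMP_BITS + MACHINE_ID_BITS + SEQUENCE_BITS;
        System.out.println("total bits:           " + total);
        System.out.printf("years of timestamps:  %.1f%n", yearsCovered(TIMESTAMP_BITS));
        System.out.println("machines:             " + distinctValues(MACHINE_ID_BITS));
        System.out.println("ids per ms, per host: " + distinctValues(SEQUENCE_BITS));
    }
}
```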

The IDs generated by Twitter Snowflake fit in 64 bits and are time sortable, which is great. That’s what we want.

But if we use Twitter Snowflake, we’ll again be introducing another component in our infrastructure that we need to maintain.

Distributed 64-bit unique ID generator inspired by Twitter Snowflake

Finally, I wrote a simple sequence generator that generates 64-bit IDs based on the concepts outlined in the Twitter snowflake service.

The IDs generated by this sequence generator are composed of -

  • Epoch timestamp in millisecond precision - 42 bits. The maximum timestamp that can be represented using 42 bits is 2^42 - 1, or 4398046511103, which comes out to be Wednesday, May 15, 2109 7:35:11.103 AM UTC.
  • Machine ID - 10 bits. This gives us 1024 machines.
  • Local counter per machine - 12 bits. The counter’s max value would be 4095. After that, it rolls over and starts from 0 again.
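
One way to see this layout is to take an ID apart again. The sketch below is not part of the generator; IdLayoutDemo and its method names are hypothetical helpers that pull the three fields back out of a 64-bit value and compute the latest timestamp that 42 bits can hold.

```java
import java.time.Instant;

class IdLayoutDemo {
    static final int EPOCH_BITS = 42;
    static final int MACHINE_ID_BITS = 10;
    static final int COUNTER_BITS = 12;

    // The timestamp sits in the top 42 bits; an unsigned shift drops the rest.
    static long timestampOf(long id) {
        return id >>> (MACHINE_ID_BITS + COUNTER_BITS);
    }

    // The machine ID sits in the middle 10 bits.
    static int machineIdOf(long id) {
        return (int) ((id >>> COUNTER_BITS) & ((1 << MACHINE_ID_BITS) - 1));
    }

    // The counter sits in the lowest 12 bits.
    static int counterOf(long id) {
        return (int) (id & ((1 << COUNTER_BITS) - 1));
    }

    // The largest timestamp that fits in 42 bits, as a point in time.
    static Instant latestTimestamp() {
        return Instant.ofEpochMilli((1L << EPOCH_BITS) - 1);
    }

    public static void main(String[] args) {
        // A hand-built sample ID: timestamp 5, machine 7, counter 9.
        long sample = (5L << 22) | (7L << 12) | 9L;
        System.out.println("timestamp: " + timestampOf(sample));
        System.out.println("machine:   " + machineIdOf(sample));
        System.out.println("counter:   " + counterOf(sample));
        System.out.println("latest:    " + latestTimestamp());
    }
}
```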

Your microservices can use this Sequence Generator to generate IDs independently. This is efficient and fits in the size of a bigint.

Here is the complete program -

import java.net.NetworkInterface;
import java.security.SecureRandom;
import java.time.Instant;
import java.util.Enumeration;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Distributed Sequence Generator.
 * Inspired by Twitter snowflake: https://github.com/twitter/snowflake/tree/snowflake-2010
 */
public final class SequenceGenerator {
    private static final AtomicInteger counter = new AtomicInteger(new SecureRandom().nextInt());

    private static final int TOTAL_BITS = 64;
    private static final int EPOCH_BITS = 42;
    private static final int MACHINE_ID_BITS = 10;

    private static final int MACHINE_ID;
    private static final int LOWER_ORDER_TEN_BITS = 0x3FF;
    private static final int LOWER_ORDER_TWELVE_BITS = 0xFFF;

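    public static long nextId() {
        // The timestamp in milliseconds since the Unix epoch fills the top 42 bits.
        long curMs = Instant.now().toEpochMilli();
        long id = curMs << (TOTAL_BITS - EPOCH_BITS);
        // The machine ID fills the next 10 bits.
        id |= (MACHINE_ID << (TOTAL_BITS - EPOCH_BITS - MACHINE_ID_BITS));
        // The lower-order 12 bits of the counter fill the rest. If more than 4096 IDs
        // are requested within one millisecond, the counter wraps and IDs can repeat.
        id |= (getNextCounter() & LOWER_ORDER_TWELVE_BITS);
        return id;
    }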

    private static int getNextCounter() {
        return counter.getAndIncrement();
    }

    static {
        MACHINE_ID = createMachineId();
    }

    private static int createMachineId() {
        int machineId;
        try {
            StringBuilder sb = new StringBuilder();
            Enumeration<NetworkInterface> networkInterfaces = NetworkInterface.getNetworkInterfaces();
            while (networkInterfaces.hasMoreElements()) {
                NetworkInterface networkInterface = networkInterfaces.nextElement();
                byte[] mac = networkInterface.getHardwareAddress();
                if (mac != null) {
                    for (byte b : mac) {
                        sb.append(String.format("%02X", b));
                    }
                }
            }
            machineId = sb.toString().hashCode();
        } catch (Exception ex) {
            machineId = (new SecureRandom().nextInt());
        }
        machineId = machineId & LOWER_ORDER_TEN_BITS;
        return machineId;
    }
}

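The above generator hashes the machine’s MAC addresses into a 10-bit identifier for that machine. Because the hash is cut down to 10 bits, two machines can end up with the same identifier. You can instead configure a machine ID in an environment variable and use that directly. As long as every machine is assigned a distinct value, that will guarantee uniqueness.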
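
Reading the machine ID from the environment could look roughly like this. It is only a sketch: the variable name MACHINE_ID and the class MachineIdConfigDemo are assumptions, not something the generator above defines, and the value is rejected unless it fits in 10 bits.

```java
class MachineIdConfigDemo {
    // Largest value that fits in the 10-bit machine ID field.
    private static final int MAX_MACHINE_ID = 1023;

    // Parses and validates a configured machine ID.
    static int parseMachineId(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            throw new IllegalArgumentException("machine ID is not set");
        }
        // Integer.parseInt throws NumberFormatException for non-numeric input.
        int id = Integer.parseInt(raw.trim());
        if (id < 0 || id > MAX_MACHINE_ID) {
            throw new IllegalArgumentException("machine ID must be between 0 and " + MAX_MACHINE_ID);
        }
        return id;
    }

    public static void main(String[] args) {
        // MACHINE_ID is an assumed variable name; use whatever your deployment defines.
        String raw = System.getenv("MACHINE_ID");
        if (raw == null) {
            System.out.println("MACHINE_ID is not set");
        } else {
            System.out.println("machine ID: " + parseMachineId(raw));
        }
    }
}
```

In the generator, the static initializer could call such a method first and fall back to the MAC-based hash only when the variable is missing.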

Let’s now understand how it works. Let’s say it’s June 9, 2018 10:00:00 AM GMT. The epoch timestamp for this particular time, in milliseconds, is 1528538400000.

To generate the ID, we’ll first fill the leftmost 42 bits with this value using a left shift -

id = 1528538400000 << (64 - 42)

Next, we take the configured machine ID and fill the next 10 bits with the machine ID. Let’s say that the machine ID is 786 -

id |= 786 << (64 - 42 - 10)

Finally, we fill the last 12 bits with the local counter. Assuming the counter’s next value is 3450 (after masking it down to its lower-order 12 bits, as the above program does), the final ID is obtained like so -

id |= 3450

That gives us our final ID.
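
The three steps can be put together in a small runnable sketch. It uses the millisecond form of the example timestamp (1528538400000 for June 9, 2018 10:00:00 AM GMT), since the generator works in milliseconds. The class WorkedExampleDemo and its compose method are illustrative names, not part of the program above.

```java
class WorkedExampleDemo {
    // Builds an ID from its three parts using the 42/10/12 bit layout.
    static long compose(long epochMillis, int machineId, int counter) {
        long id = epochMillis << (64 - 42);              // top 42 bits
        id |= ((long) machineId) << (64 - 42 - 10);      // next 10 bits
        id |= counter & 0xFFF;                           // last 12 bits
        return id;
    }

    public static void main(String[] args) {
        long id = compose(1528538400000L, 786, 3450);
        System.out.println("id:     " + id);
        System.out.println("binary: " + Long.toBinaryString(id));
        // Shifting and masking recovers the original parts.
        System.out.println("timestamp: " + (id >>> 22));
        System.out.println("machine:   " + ((id >>> 12) & 0x3FF));
        System.out.println("counter:   " + (id & 0xFFF));
    }
}
```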
