Comparing ID Generation Techniques (Incremental, UUID, and Snowflake)
An ID, or identifier, is a common way to differentiate between multiple entities or objects. But what options are available, and which one is best for your use case? Let's find out together. Before we start comparing ID generation techniques, we must first understand what an ID (identifier) is.
What is an Identifier?
Let's ask Wikipedia about the definition of an ID.
Based on Wikipedia's definition, an ID is meant to identify a unique object. The key word here is unique, meaning that only one such object exists. You may be wondering why it has to be unique in the first place. Why shouldn't two objects share the same ID? Let's answer that with a counterexample. Say we use a name as an ID, and I order a package for Mr. Radzig. When the delivery man arrives, he has to look for a person named Mr. Radzig. This is fine when there is only one person named Mr. Radzig, but what if there are two? To which Radzig should the delivery man give the package? One way to solve this problem is to add a number to the name, so that there is a Mr. Radzig 1 and a Mr. Radzig 2. Now the problem is solved, and you have just learned one way to generate an ID: the incremental approach. Now that we understand what an ID is, it's time to learn how IDs can be generated.
Identifier generation methods
There are multiple ways to generate an ID, including incremental, Snowflake, and UUID, each with its own pros and cons. There are other ways too, such as hashing and composite IDs, but we will leave them for another post since they have some major drawbacks that might become apparent later.
Incremental / Auto Increment
Let's start with the incremental approach. You might already be familiar with it if you work with databases: it commonly appears in DDL as `AUTO_INCREMENT` (MySQL) or `BIGSERIAL` (PostgreSQL). The basic idea is simple: a counter keeps increasing with each call. This is typically done as an atomic operation on the server, where each request increments the number and returns it to the caller.
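The counter idea can be sketched in a few lines of Python. This is only an in-process illustration, not how any particular database implements it: the class name `IncrementalIdGenerator` and the helper `generate_concurrently` are hypothetical, and a lock stands in for the server's atomic increment.

```python
import threading


class IncrementalIdGenerator:
    """Hypothetical in-process stand-in for a database auto-increment counter."""

    def __init__(self, start: int = 0) -> None:
        self._last_id = start
        self._lock = threading.Lock()  # makes "increment and read" one atomic step

    def next_id(self) -> int:
        # Only one caller at a time may run this block, which is why
        # every writer ends up queuing behind the same counter.
        with self._lock:
            self._last_id += 1
            return self._last_id


def generate_concurrently(workers: int, ids_per_worker: int) -> list[int]:
    """Request IDs from one shared generator on several threads and collect them."""
    generator = IncrementalIdGenerator()
    collected: list[int] = []
    collected_lock = threading.Lock()

    def worker() -> None:
        local = [generator.next_id() for _ in range(ids_per_worker)]
        with collected_lock:
            collected.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return collected


if __name__ == "__main__":
    ids = generate_concurrently(workers=4, ids_per_worker=250)
    print(len(ids), min(ids), max(ids))
```

Every ID is handed out exactly once, but all four workers are serialized through the same lock, which mirrors the bottleneck of a single central counter.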
This method is very simple but has one major drawback: ID generation speed doesn't scale horizontally. The number of IDs generated per second is limited by how powerful the single system holding the counter is. The atomic operation also means only one increment can happen at a time, so an increment requested by one server blocks the others until it completes. The method is also prone to a single point of failure, since every writer depends on that one system to generate IDs.
This ID generation method is best used for simple applications where inserts are infrequent.
UUID
UUID, or universally unique identifier, is an ID generation method that, in its most common form (version 4), relies on randomness. It basically tries to make the ID as random as possible so that collisions are extremely unlikely. UUID comes in multiple versions, each with its own pros and cons. Here is an example of a UUID: 2f1195d0-9a28-4c2c-b49d-d262d8f95682. UUID is another simple way to generate an ID, and it doesn't rely on a central machine. It can also be used without much setup, making it an ideal choice for most cases.
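As a minimal sketch of how little setup this takes, Python's standard `uuid` module can generate a random (version 4) UUID locally, with no central service involved. The wrapper name `new_random_id` is just an illustration.

```python
import uuid


def new_random_id() -> uuid.UUID:
    """Return a random (version 4) UUID generated entirely on this machine."""
    return uuid.uuid4()


if __name__ == "__main__":
    generated = new_random_id()
    print(generated)             # text form, e.g. 2f1195d0-9a28-4c2c-b49d-d262d8f95682
    print(generated.version)     # 4
    print(len(generated.bytes))  # 16 bytes, i.e. 128 bits
```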
Since a random UUID is generated without coordinating with any other generator, there is still a chance that it collides with another ID (I have never seen it happen myself, though I have seen some posts mentioning a collision in their case). A UUID also takes more space than the other methods: 128 bits, versus the 64 bits of a typical auto-increment or Snowflake ID.
This ID generation method is best used for highly concurrent inserts where failure is tolerable.
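To put the collision chance in perspective, here is a rough estimate using the standard birthday approximation. It assumes random version 4 UUIDs, which carry 122 random bits, and a perfectly uniform random source; the function name `collision_probability` is hypothetical.

```python
import math

RANDOM_BITS_V4 = 122  # a version 4 UUID fixes 6 of its 128 bits


def collision_probability(id_count: int, random_bits: int = RANDOM_BITS_V4) -> float:
    """Approximate chance that at least two of `id_count` random IDs collide.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / (2N)),
    where N = 2**random_bits is the number of possible values.
    """
    exponent = id_count * (id_count - 1) / (2 * 2**random_bits)
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for count in (10**6, 10**9, 10**12):
        print(f"{count:>15,d} IDs -> p ~= {collision_probability(count):.3e}")
```

Under these assumptions, even a billion IDs give a probability far below one in a quadrillion, so collisions seen in practice are more likely caused by a weak random source or a bug than by bad luck.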
Snowflake
A Snowflake ID is not a single value but a combination of several fields that together make it unique. It usually combines a Machine ID, the Current Time, and a Collision ID. The Collision ID is an auto-incrementing counter, just like in the incremental approach; as the name suggests, it avoids collisions when the generator is called repeatedly within the same time unit. The Current Time further increases uniqueness. The Machine ID guarantees uniqueness across machines, so each machine can generate IDs on its own. In other words, Snowflake works like the incremental version, but to avoid a single point of failure it lets each machine generate its own IDs.
Snowflake requires each machine to be configured with its own ID. Each time an ID is requested, the generator takes its machine ID, reads the current time, and reads the current collision ID. It then merges the three values with bit shifts and bitwise OR. For example, a time of 123456789, a machine ID of 2, and a collision ID of 1 are combined as `123456789 << 8 | 2 << 4 | 1`, which yields the ID 31604938017. When the collision ID reaches its limit, the generator has to wait until the next time window.
An example of a Snowflake ID can be found on this blog, for instance in this post: https://rendoru.com/blog/post/comparing-postgresql-batch-insertion-multi-value-insert-prepared-statement-copy-4476081907302658 . The 4476081907302658 at the end of the URL is the Snowflake ID of that post.
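The generation steps described above can be sketched as follows. This is a simplified illustration, not the generator used by this blog or by any particular library: the 4-bit field widths are taken from the shifts in the example, the names `compose` and `SnowflakeGenerator` are hypothetical, and the sketch is neither thread-safe nor tolerant of a clock that moves backwards.

```python
import time

MACHINE_BITS = 4    # assumption: matches the "<< 4" / "<< 8" shifts in the example
SEQUENCE_BITS = 4   # width of the collision ID
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


def compose(timestamp: int, machine_id: int, sequence: int) -> int:
    """Merge the three fields with bit shifts and bitwise OR."""
    return (
        (timestamp << (MACHINE_BITS + SEQUENCE_BITS))
        | (machine_id << SEQUENCE_BITS)
        | sequence
    )


class SnowflakeGenerator:
    """Single-threaded sketch of a Snowflake-style generator for one machine."""

    def __init__(self, machine_id: int, clock=None) -> None:
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id does not fit in MACHINE_BITS")
        self.machine_id = machine_id
        # Default clock: current time in milliseconds
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._last_timestamp = -1
        self._sequence = 0

    def next_id(self) -> int:
        now = self._clock()
        if now < self._last_timestamp:
            raise RuntimeError("clock moved backwards; not handled in this sketch")
        if now == self._last_timestamp:
            self._sequence = (self._sequence + 1) & MAX_SEQUENCE
            if self._sequence == 0:
                # Collision ID exhausted: wait for the next time window
                while now <= self._last_timestamp:
                    now = self._clock()
        else:
            self._sequence = 0
        self._last_timestamp = now
        return compose(now, self.machine_id, self._sequence)


if __name__ == "__main__":
    print(compose(123456789, 2, 1))  # 31604938017, the value from the example
    generator = SnowflakeGenerator(machine_id=2)
    print([generator.next_id() for _ in range(3)])
```

Because the timestamp occupies the highest bits, IDs from one machine sort in the order they were generated, which is a useful side effect of this layout.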
Snowflake has some drawbacks that you have to consider. First, it is important to pick the right number of bits for the Machine ID, the timestamp, and the Collision ID. Too few bits for the Machine ID limits the number of machines that can join the cluster. Too few bits for the timestamp causes the ID space to overflow early. Too few bits for the Collision ID causes generation to block frequently until the next time unit, which basically defeats the purpose of using Snowflake IDs. Another problem is that the IDs are large, so they take more space to store and display. Look at the post I mentioned earlier: instead of having an ID of 12, it now has an ID of 4476081907302658.
This ID generation method is best used for highly concurrent inserts where failure is not an option.
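To make the bit-count trade-off concrete, the sketch below computes what a given layout allows. The function name `layout_capacity` is hypothetical, and the 41/10/12 split with millisecond ticks is used only as a commonly cited example layout (the one usually attributed to Twitter's original Snowflake), not as the layout of this blog.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def layout_capacity(
    timestamp_bits: int,
    machine_bits: int,
    sequence_bits: int,
    ticks_per_second: int = 1000,  # assumption: one time unit is a millisecond
) -> dict:
    """Summarize the limits implied by a Snowflake-style bit layout."""
    lifetime_seconds = (1 << timestamp_bits) / ticks_per_second
    return {
        "total_bits": timestamp_bits + machine_bits + sequence_bits,
        "max_machines": 1 << machine_bits,
        # IDs one machine can issue per time unit before it has to block
        "ids_per_tick_per_machine": 1 << sequence_bits,
        # Time until the timestamp field overflows, counted from the epoch used
        "years_before_overflow": lifetime_seconds / SECONDS_PER_YEAR,
    }


if __name__ == "__main__":
    for name, layout in {
        "41/10/12": (41, 10, 12),
        "32/4/4": (32, 4, 4),
    }.items():
        print(name, layout_capacity(*layout))
```

Shrinking any one field frees bits for the others, so the right split depends on how many machines, how much burst traffic, and how many years of operation you need to cover.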
Remarks
To wrap up, the table below summarizes this post.
| | Auto Increment | UUID | Snowflake |
|---|---|---|---|
| Fault tolerant | Not supported | Supported | Supported |
| Ease of setup | Easy | Easy | Hard |
| Concurrent generation | Not supported | Supported | Supported |
| Uniqueness | Unique | Depends on entropy | Unique |