ULID (Universally Unique Lexicographically Sortable Identifier)


This page creates a ULID (Universally Unique Lexicographically Sortable Identifier). UUIDs (Universally Unique IDentifiers) lack any real structure, and do not support the linking of objects and unique names. ULID aims to address this by creating a lexicographically sortable identifier. It is a 26-character string that uses Base32 for its characters (five bits per character). Standard Base32 has a character set of "A-Z" and "2-7", where '0' and '1' are not used as they can be confused with 'O' and 'I'. ULID uses Crockford's variant of Base32, which keeps the digits "0-9" and instead drops the letters I, L, O and U.

UUID (Universally Unique IDentifiers)

One of the great innovations for Apollo computers was the use of UUIDs (Universally Unique IDentifiers), aka GUIDs (Globally Unique Identifiers). These were used to uniquely identify each computer, and could thus support their licensing. Basically, the administrator would generate the UUIDs for the computers on their network, and the software company could create unique licences for them. For the first versions of UUIDs, the identifier used the MAC address of the network card. As I remember, at the time, there was always a panic point in the year when we had to renew our licences, and there was often downtime as we scrambled to get the computers licensed (especially if we mistyped the UUID).

Overall, a UUID is a 128-bit label that we can use to digitally identify something. A typical form uses 32 hexadecimal digits, arranged in an 8-4-4-4-12 format:

8f6ec199-f33b-4c1a-a9cc-65cedbbadf0b

UUID Version 1 used the MAC address of a device to generate the identifier, but with Version 4, we have an almost completely random value. A simple Python program for this is:

import uuid

myuuid = uuid.uuid4()

print('UUID: ' + str(myuuid))

This uses the Version 4 format, which is almost completely random. While a 128-bit value allows around 2¹²⁸ (3.4x10³⁸) different UUIDs, Version 4 fixes six bits for the version and variant, leaving 122 random bits and thus around 5.3x10³⁶ values. Overall, there is virtually no chance of ever creating the same UUID twice, and so they are fairly safe from collisions. The standard for UUIDs is defined in RFC 4122. This specification defines that the output from a generator should be in lowercase, but that a reader should accept both upper and lower case hexadecimal letters. While Apollo initially created UUIDs, they were quickly adopted by Microsoft for Microsoft Windows. For Version 1, the format basically contains three fields for the current time, then a clock sequence, and then a node identifier:

8f6ec199-f33b-4c1a-a9cc-65cedbbadf0b
[time-low]-[time-mid]-[time-high-and-version]-[clock-seq]-[node]

The [time-low], [time-mid] and [time-high-and-version] fields are all parts of the current timestamp (with four bits of the last field giving the version), the [clock-seq] field guards against duplicates if the clock is set backwards, and the [node] part is a unique identifier for a node. In the past, this has been the 48-bit MAC address of the local Ethernet adaptor. The timestamp is a 60-bit field based on Coordinated Universal Time (UTC), and contains a count of 100 ns intervals since 0:00 on 15 October 1582. UUID Version 4 uses truly random or pseudo-random numbers and can be used for object identifiers in code, while Versions 3 and 5 focus on creating unique name-based UUIDs.
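To make the Version 1 layout more concrete, the following is a minimal sketch that pulls the fields out of a time-based UUID with Python's standard uuid module, and converts the 60-bit timestamp into a date. The helper name uuid1_datetime is only illustrative and is not part of any library:

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 timestamps count 100 ns intervals from the start of the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_datetime(u: uuid.UUID) -> datetime:
    """Convert the 60-bit timestamp of a Version 1 UUID into a UTC datetime."""
    if u.version != 1:
        raise ValueError("only Version 1 UUIDs carry a timestamp")
    # u.time is in 100 ns units, so integer-divide by 10 to get microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


u = uuid.uuid1()
print("UUID v1:  ", u)
print("time_low: ", hex(u.time_low))
print("time_mid: ", hex(u.time_mid))
print("time_hi:  ", hex(u.time_hi_version))  # includes the 4-bit version
print("clock_seq:", hex(u.clock_seq))
print("node:     ", hex(u.node))
print("created:  ", uuid1_datetime(u))
```

The printed date should be close to the current time, as the timestamp is read straight back out of the identifier.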

But UUIDs now struggle in a world where we support not only computers but also IoT devices and sensors. For Version 4, we basically have a random number, and so have very little information on how data objects could be linked, while Versions 3 and 5 again struggle to link entities within the same domain. This causes fragmentation in the data structures. And so, meet ULID (Universally Unique Lexicographically Sortable Identifier).

ULID (Universally Unique Lexicographically Sortable Identifier)

UUIDs thus lack any real structure and do not support the linking of objects and unique names. ULID aims to address this by creating a lexicographically sortable identifier. It is a 26-character string that uses Base32 for its characters (five bits per character). With standard Base32 (RFC 4648), we have a character set of "A-Z" and "2-7". Notice that '0' and '1' are not used, as they can be confused with 'O' and 'I'. ULID itself uses Crockford's variant of Base32, which keeps the digits "0-9" and instead drops the letters I, L, O and U. To see how standard Base32 works, if we have "help", this is encoded in binary as:

01101000 01100101 01101100 01110000

We can then rearrange in groups of 5 bits with:

01101  00001  10010  10110  11000  11100  00[000]
(13-N) (1-B)  (18-S) (22-W) (24-Y) (28-4) (0-A)

And thus we get:

Message:  help
Type:   base32
Encoding:  NBSWY4A=

Here the value is padded with zero bits to make a multiple of five bits, and then padded with "=" to make a multiple of eight Base32 characters. With a ULID, we have 128 bits and no padding is used: the value is written as 26 five-bit characters (130 bits, where the top two bits are always zero). As ULID uses a 128-bit format, it keeps compatibility with UUIDs, but its core strength is its ability to link IDs through their shared, sortable timestamp.
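The "help" example can be reproduced with a short sketch that groups the bit string into 5-bit values by hand, and then checks the result against Python's standard base64 module. The function name base32_by_hand is just for illustration:

```python
import base64

# Standard Base32 alphabet from RFC 4648: A-Z followed by 2-7.
RFC4648_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def base32_by_hand(data: bytes) -> str:
    """Encode bytes as Base32 by grouping the bit string into 5-bit values."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zero bits so the length is a multiple of five.
    bits += "0" * (-len(bits) % 5)
    chars = [RFC4648_ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5)]
    encoded = "".join(chars)
    # Pad with "=" so the output length is a multiple of eight characters.
    return encoded + "=" * (-len(encoded) % 8)


print(base32_by_hand(b"help"))             # NBSWY4A=
print(base64.b32encode(b"help").decode())  # NBSWY4A=
```

Both lines give the same result, which matches the encoding shown above.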

ULID libraries are available for most of the common programming languages. In Python, we can integrate with:

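# Assumes the ulid-py package (pip install ulid-py), which provides the ulid module
import datetime
import uuid

import ulid

# A normal (random) Version 4 UUID
value = uuid.uuid4()
print(f"UUID Version 4:\t\t\t\t{value}")

# The same 128 bits, re-encoded as a ULID
res = ulid.from_uuid(value)
print(f"ULID (from UUID):\t\t\t{res}")

# A ULID with a fixed timestamp and a fresh random part
res = ulid.from_timestamp(datetime.datetime(2023, 1, 1))
print(f"ULID (1/1/2023):\t\t\t{res}")

# A ULID for the current time
now = datetime.datetime.now()
res = ulid.from_timestamp(now)
print(f"ULID ({now}):\t{res}")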

In this case, we create a normal UUID Version 4 ID, and get the result of:

UUID Version 4:    37090347-328b-449c-8305-d4c02900aa53

If we run again we get:

UUID Version 4:    5d794cc5-4c08-4f48-8f6a-2af4fb02d817

Notice that there is no real linkage between the two. Now let's generate a ULID based on 1/1/2023:

ULID (1/1/2023):   01GNNA1J00R8V3KBGGN1F1G7WV

and again:

ULID (1/1/2023):   01GNNA1J00T8E8119ETYKAHZNY

Notice that they share the same first part: "01GNNA1J00". This relates to the date defined. In this way, we can now store the ULIDs with a structure based on the date they were created. Now, if we try the current time, we get:

ULID (2022-12-31 08:41:27.789520): 01GNKNFNFDH8NT0HJHJT6Z2PDC

And then a few seconds later:

ULID (2022-12-31 08:41:58.399964): 01GNKNGKBZX83GHAWS8KBND4D0

Again, we can see a shared part for the date ("01GNKN"), but a different timestamp overall, and a random part at the end. The format is then:

ttttttttttrrrrrrrrrrrrrrrr

where
t is Timestamp (10 characters, 48 bits)
r is Randomness (16 characters, 80 bits)

The timestamp is a 48-bit count of milliseconds since the Unix epoch, and is stored as 32 bits for the high part of the time followed by 16 bits for the low part. The remaining 80 bits are random:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      32_bit_uint_time_high                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     16_bit_uint_time_low      |       16_bit_uint_random      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       32_bit_uint_random                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       32_bit_uint_random                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
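The binary layout can be checked by decoding one of the ULIDs generated for 1/1/2023. The following is a minimal sketch that assumes the Crockford Base32 alphabet used by ULID, and where split_ulid is an illustrative helper rather than part of any library:

```python
from datetime import datetime, timezone

# Crockford's Base32 alphabet as used by ULID: no I, L, O or U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def split_ulid(ulid_str: str) -> tuple[int, int]:
    """Return (timestamp_ms, randomness) for a 26-character ULID string."""
    if len(ulid_str) != 26:
        raise ValueError("a ULID has exactly 26 characters")
    value = 0
    for ch in ulid_str.upper():
        value = value * 32 + CROCKFORD.index(ch)
    timestamp_ms = value >> 80            # top 48 bits
    randomness = value & ((1 << 80) - 1)  # bottom 80 bits
    return timestamp_ms, randomness


ts, rnd = split_ulid("01GNNA1J00R8V3KBGGN1F1G7WV")
print(ts)                                                  # 1672531200000
print(datetime.fromtimestamp(ts / 1000, tz=timezone.utc))  # 2023-01-01 00:00:00+00:00
```

The prefix "01GNNA1J00" thus decodes to midnight (UTC) on 1 January 2023, and any ULID created in that same millisecond carries the same ten characters.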

Notice the major change is to move the high part of the clock to the start of the ID. This allows us to store the date at the start, and makes it easier to structure IDs based on the date and time. If ULIDs are generated within the same millisecond, a monotonic generator keeps the same timestamp and increments the random part by one, so that the IDs still sort in order:

01BX5ZZKBKACTAV9WEVGEMMVRY
01BX5ZZKBKACTAV9WEVGEMMVRZ
01BX5ZZKBKACTAV9WEVGEMMVS0
01BX5ZZKBKACTAV9WEVGEMMVS1
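The sequence above can be reproduced with a small sketch of the same-millisecond step: decode the previous ULID into a 128-bit integer, add one to it, and encode it again. The function names here are illustrative only, and a real generator would also draw a fresh random value whenever the millisecond changes:

```python
# Crockford's Base32 alphabet as used by ULID: no I, L, O or U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
RANDOM_MASK = (1 << 80) - 1  # the low 80 bits hold the randomness


def decode_ulid(text: str) -> int:
    """Turn a 26-character ULID string into its 128-bit integer value."""
    value = 0
    for ch in text.upper():
        value = value * 32 + CROCKFORD.index(ch)
    return value


def encode_ulid(value: int) -> str:
    """Turn a 128-bit integer into a 26-character ULID string."""
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 31])
        value >>= 5
    return "".join(reversed(chars))


def next_in_same_ms(previous: str) -> str:
    """Return the next ULID for the same millisecond (random part plus one)."""
    value = decode_ulid(previous)
    if (value & RANDOM_MASK) == RANDOM_MASK:
        # Adding one would spill into the timestamp, so refuse instead.
        raise OverflowError("random part exhausted for this millisecond")
    return encode_ulid(value + 1)


current = "01BX5ZZKBKACTAV9WEVGEMMVRY"
for _ in range(4):
    print(current)
    current = next_in_same_ms(current)
```

As the timestamp comes first and the random part only ever increases within a millisecond, a plain string sort of these IDs gives the order in which they were created.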
