Skip to content

Latest commit

 

History

History
150 lines (116 loc) · 4.93 KB

0064_how_to_use_uuid.md

File metadata and controls

150 lines (116 loc) · 4.93 KB

Originally from: tweet, LinkedIn post.


How to use UUID

I post a new PostgreSQL "howto" article every day. Join me in this journey – subscribe, provide feedback, share!

As of now (PG16, 2023), Postgres implements UUID versions from 1 to 5, based on RFC 4122.

A UUID value can be generated using get_random_uuid(), it generates UUID version 4 (source code for PG16):

nik=# select gen_random_uuid();
           gen_random_uuid
--------------------------------------
 c027497b-c510-413b-9092-8e6c99cf9596
(1 row)

nik=# select gen_random_uuid();
           gen_random_uuid
--------------------------------------
 08e63fed-f883-45d8-9896-8f087074bff5
(1 row)

In standard UUIDs, the version can be understood looking at the first character after the 2nd hyphen:

08e63fed-f883-4 ...  👈 this means v4

The values are coming in a "pseudorandom" order. This has certain negative impact on performance: in a B-tree index, inserts happen in various locations, which, in general, affects write performance, as well as performance of Top-N reads (selecting N latest rows).

There is a proposal to implement newer versions of UUID both in RFC and Postgres – v7 provides a time-based UUID that includes a millisecond-precision timestamp, sequence number, and additional entropy in the form of random or fixed bits. This kind of UUID not only ensures global uniqueness but also preserves the temporal aspect, which can be very beneficial for performance.

UUID values are 16-byte – the same as timestamptz or timestamp values.

Good materials explaining performance aspects:

Since Postgres doesn't support UUID v7 natively yet, there are two options to use them

  • generate on client side
  • implement a helper function in Postgres.

For the latter approach, here is SQL function (thanks @DanielVerite):

create or replace function uuid_generate_v7() returns uuid
as $$
  -- use random v4 uuid as starting point (which has the same variant we need)
  -- then overlay timestamp
  -- then set version 7 by flipping the 2 and 1 bit in the version 4 string
select encode(
  set_bit(
    set_bit(
      overlay(
        uuid_send(gen_random_uuid())
        placing substring(int8send(floor(extract(epoch from clock_timestamp()) * 1000)::bigint) from 3)
        from 1 for 6
      ),
      52, 1
    ),
    53, 1
  ),
  'hex')::uuid;
$$ language SQL volatile;

Examples:

nik=# select uuid_generate_v7();
           uuid_generate_v7
--------------------------------------
 018c1be3-e485-7252-b80f-76a71843466a
(1 row)

nik=# select uuid_generate_v7();
           uuid_generate_v7
--------------------------------------
 018c1be3-e767-76b9-93dc-23c0c48be6c7
(1 row)

nik=# select uuid_generate_v7();
           uuid_generate_v7
--------------------------------------
 018c1be3-e973-7704-82ad-5967b79cf5c4
(1 row)

After a few minutes:

nik=# select uuid_generate_v7();
           uuid_generate_v7
--------------------------------------
 018c1be8-5002-70ab-96c0-c96ad5afa151
(1 row)

A few notes:

  1. If you use these value in the ORDER BY clause, the chronological order will persist.

  2. For the first 3 values (that we generated during a few seconds) there is a common prefix, 018c1be3-e, and with the last value that was generated slightly later, there is common prefix 018c1be.

  3. Note 7 after the second hyphen in all values:

    018c1be3-e973-7... 👈 this means v7
    
  4. The function returns a value of the UUID type, so it's still 16-byte (while text representation of it would take 36 characters including hyphens, meaning 40 bytes total with VARLENA header):

    nik=# select pg_column_size(gen_random_uuid());
     pg_column_size
    ----------------
                 16
    (1 row)
    
    nik=# select pg_column_size(uuid_generate_v7());
     pg_column_size
    ----------------
                 16
    (1 row)