Skip to content

Latest commit

 

History

History
66 lines (54 loc) · 1.71 KB

0078_estimate_yoy_table_growth.md

File metadata and controls

66 lines (54 loc) · 1.71 KB

Originally from: tweet, LinkedIn post.


How to estimate the YoY growth of a very large table using row creation timestamps and the planner statistics

I post a new PostgreSQL "howto" article every day. Join me in this journey – subscribe, provide feedback, share!

Assume we have a 10 TiB table created many years ago, partitioned or not, like this one:

create table t (
  id int8 primary key,   -- of course, not int4
  created_at timestamptz default now(),
  ...
);

and we need to quickly understand how the table grew year over year, assuming that no rows were deleted (or only a negligible amount). So, we just need to count rows for every year.

A straightforward approach would be:

select
  date_trunc('year', created_at) as year,
  count(*)
from t
group by year
order by year;

However, for a 10 TiB table, we'd need to wait many hours, if not days, for this analysis to complete.

Here is fast, but not a precise way to get the row counts for each year (assuming the table has up-to-date stats; if in doubt, run ANALYZE on it first):

do $$
declare
  table_fqn text := 'public.t';
  year_start int := 2000;
  year_end int := extract(year from now())::int;
  year int;
  explain_json json;
begin
  for year in year_start..year_end loop
    execute format(
      $e$
        explain (format json) select *
        from %s
        where created_at
        between '%s-01-01' and '%s-12-31'
      $e$,
      table_fqn,
      year,
      year
    ) into explain_json;

    raise info 'Year: %, Estimated rows: %',
      year,
      explain_json->0->'Plan'->>'Plan Rows';
  end loop;
end $$;