Originally from: tweet, LinkedIn post.
I post a new PostgreSQL "howto" article every day. Join me in this journey – subscribe, provide feedback, share!
There are various approaches to formatting SQL. There two major ones are:
-
Traditional: all keywords are uppercase, formatting is organized with a central whitespace pipe, advertised in Joe Celko’s "SQL Programming Style" book. One version is provided at SQL Style Guide. Example:
SELECT a.title, a.release_date, a.recording_date FROM albums AS a WHERE a.title = 'Charcoal Lane' OR a.title = 'The New Danger';
I personally find this style very old-fashioned, inconvenient, and cumbersome, and I prefer another one:
-
Modern: often (not always) lowercase keywords and left-align hierarchical indentation similar to what is used in regular programming languages. Example:
select a.title, a.release_date, a.recording_date from albums as a where a.title = 'Charcoal Lane' or a.title = 'The New Danger';
Here, we'll discuss a version of the latter ("modern") approach, derived from Mozilla's SQL Style Guide.
The key difference from Mozilla's guide is that here we use lowercase SQL. The reason behind it is simple: an uppercase SQL makes more sense when it's embedded into a different language, to help distinguish it from the surrounding code.
But if you work with large query texts provided separately in various SQL editors, perform optimization tasks, and so on, the use of uppercase keywords becomes less convenient. Lowercase keywords are easier to type. Finally, constantly shouting at your database is not a good idea.
At the same time, it might still make sense to use uppercase SQL keywords when using them in, for example, natural language texts, for example:
.. using multiple JOINs ..
Always use lowercase for reserved keywords such as select
, where
, or as
.
- Use consistent and descriptive identifiers and names.
- Use lower case names with underscores, such as
first_name
. Do not use CamelCase. - Functions such as
cardinality()
,array_agg()
, andsubstr()
are identifiers and should be treated like variable names. - Names must begin with a letter and may not end in an underscore.
- Only use letters, numbers, and underscores in variable names.
When choosing between explicit or implicit syntax, prefer explicit.
Always include the as
keyword when aliasing a variable or table name, it's easier to read when explicit.
✅ Good:
select date(t.created_at) as day
from telemetry as t
limit 10;
❌ Bad:
select date(t.created_at) day
from telemetry t
limit 10;
Always include the join
type rather than relying on the default join.
✅ Good:
select
submission_date,
experiment.key as experiment_id,
experiment.value as experiment_branch,
count(*) as count
from
telemetry.clients_daily
cross join
unnest(experiments.key_value) as experiment
where
submission_date > '2019-07-01'
and sample_id = '10'
group by
submission_date,
experiment_id,
experiment_branch;
❌ Bad:
select
submission_date,
experiment.key as experiment_id,
experiment.value as experiment_branch,
count(*) as count
from
telemetry.clients_daily,
unnest(key_value) as experiment -- implicit JOIN
where
submission_date > '2019-07-01'
and sample_id = '10'
group by
1, 2, 3; -- implicit grouping column names
Avoid using implicit grouping column names.
✅ Good:
select state, backend_type, count(*)
from pg_stat_activity
group by state, backend_type
order by state, backend_type;
🆗 Acceptable:
select
date_trunc('minute', xact_start) as xs_minute,
count(*)
from pg_stat_activity
group by 1
order by 1;
❌ Bad:
select state, backend_type, count(*)
from pg_stat_activity
group by 1, 2
order by 1, 2;
All root keywords should start on the same character boundary.
✅ Good:
select
client_id,
submission_date
from main_summary
where
sample_id = '42'
and submission_date > '20180101'
limit 10;
❌ Bad:
select client_id,
submission_date
from main_summary
where sample_id = '42'
and submission_date > '20180101';
Root keywords should be on their own line in all cases except when followed by only one dependent word. If there are more than one dependent words, they should form a column that is left-aligned and indented to the left of the root keyword.
✅ Good:
select
client_id,
submission_date
from main_summary
where
submission_date > '20180101'
and sample_id = '42'
limit 10;
It's acceptable to include an argument on the same line as the root keyword if there is exactly one argument.
🆗 Acceptable:
select
client_id,
submission_date
from main_summary
where
submission_date > '20180101'
and sample_id = '42'
limit 10;
❌ Bad:
select client_id, submission_date
from main_summary
where submission_date > '20180101' and (sample_id = '42' or sample_id is null)
limit 10;
❌ Bad:
select
client_id,
submission_date
from main_summary
where submission_date > '20180101'
and sample_id = '42'
limit 10;
If parentheses span multiple lines:
- The opening parenthesis should end the line.
- The closing parenthesis should line up under the first character of the line that starts the multi-line construct.
- Indent the contents of the parentheses one level.
✅ Good:
with sample as (
select
client_id,
submission_date
from main_summary
where
sample_id = '42'
)
...
❌ Bad (terminating parenthesis on shared line):
with sample as (
select
client_id,
submission_date
from main_summary
where
sample_id = '42')
...
❌ Bad (no indent):
with sample as (
select
client_id,
submission_date
from main_summary
where
sample_id = '42'
)
...
"and" and "or" should always be at the beginning of the line.
✅ Good:
...
where
submission_date > '2018-01-01'
and sample_id = '42'
❌ Bad:
...
where
submission_date > '2018-01-01' and
sample_id = '42'
Do not use nested queries. Instead, use common table expressions (CTEs, keyword WITH
) to improve readability.
✅ Good:
with sample as (
select
client_id,
submission_date
from main_summary
where
sample_id = '42'
)
select *
from sample
limit 10;
❌ Bad:
select *
from (
select
client_id,
submission_date
from main_summary
where
sample_id = '42'
)
limit 10;
This document is heavily influenced by the SQL Style Guide and Mozilla SQL Style Guide. Proposals, extensions, and fixes are welcome.