Note: These are tentative, and may be revised based on community feedback.
To support precise, transparent definitions of all data summarized in substrate-etl report summaries (and used in polkaholic.io) (see "All metrics are imperfect, but many are useful. Let's make them more available"), we attempt to define the terms used in this repo both in English and through BigQuery queries on the substrate-etl datasets.
The open-source approach taken here is one of transparency and reproducibility: with exact BigQuery computations, anyone can reproduce any datapoint and improve any definition by adjusting the query form.
The Polkaholic Indexing process results in BigQuery tables representing:
- blocks, extrinsics, events, transfers (e.g. `substrate-etl.kusama.blocks0`, `substrate-etl.kusama.extrinsics0`, `substrate-etl.kusama.events0`, `substrate-etl.kusama.transfers0`) -- derived by decoding every finalized block
- active accounts (e.g. `substrate-etl.kusama.accountsactive0`) and passive accounts (`substrate-etl.kusama.accountspassive0`) -- derived by combining the above tables, as described in Account Metrics
- balances (e.g. `substrate-etl.kusama.balances0`) -- derived from each chain's native balances (`system.accounts`) and non-native balances (assets, tokens or some other pallet) at the end of the day
- new accounts (e.g. `substrate-etl.kusama.accountsnew0`) and reaped accounts (`substrate-etl.kusama.accountsreaped0`) -- derived from daily differences in the `balances` tables
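As a quick orientation to these tables, the following is a minimal sketch that counts blocks per UTC day for the Kusama relay chain over one week; it relies only on the `block_time` column that also appears in the queries later in this document.

```sql
-- Sketch: blocks per UTC day for the Kusama relay chain over one week,
-- using only the block_time column referenced elsewhere in this document.
SELECT DATE(block_time) AS block_date,
       COUNT(*) AS num_blocks
FROM `substrate-etl.kusama.blocks0`
WHERE DATE(block_time) BETWEEN "2023-01-26" AND "2023-02-01"
GROUP BY block_date
ORDER BY block_date;
```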
All data is based on UTC: the indexing process loads data at the conclusion of each UTC day (starting at 4pm PT, completed by 7pm PT). In practice, some data is backfilled and a secondary load is done within 24-72 hours, with automated GitHub Actions generating the summary reports.
Both the `polkadot` and `kusama` datasets are held in a public BigQuery `substrate-etl` project accessible through Google Cloud Platform, which further contains:
- XCM Transfers (e.g. `substrate-etl.polkadot.xcmtransfers`, `substrate-etl.kusama.xcmtransfers`)

In addition, the combination of both relay chains is aggregated in a third `substrate-etl.dotsama` dataset, summarized here and described in Dotsama Metrics.
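For orientation, the sketch below simply counts rows in the two public XCM transfer tables listed above; it makes no assumptions about their column names.

```sql
-- Sketch: row counts of the public XCM transfer tables (no column names assumed).
SELECT "polkadot" AS relay_chain, COUNT(*) AS xcm_transfers
FROM `substrate-etl.polkadot.xcmtransfers`
UNION ALL
SELECT "kusama" AS relay_chain, COUNT(*) AS xcm_transfers
FROM `substrate-etl.kusama.xcmtransfers`;
```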
- Active Accounts (Substrate): Accounts that have signed an extrinsic on a Substrate chain
- System Accounts (Substrate): Accounts that have participated in consensus and produced a block
- Passive Accounts (Substrate): Accounts that aren't active but have any balance changes for any asset (native or non-native asset, within chain or cross-chain). This can include proxy and multisig accounts, remote execution triggered by other accounts, transfer recipients (including crowdloan and staking rewards distribution), and balance deductions (e.g., pre-authorized transferFrom/proxy called by other active accounts).
The above definitions are mechanized in `substrate-etl` BigQuery below. The following computes Active Accounts, System Accounts and Passive Accounts for the Kusama relay chain for February 1, 2023 using the following public data:
- `substrate-etl.kusama.extrinsics0`
- `substrate-etl.kusama.blocks0`
- `substrate-etl.kusama.transfers0`

The counts of each are shown in the report summaries. To compose the same computation for any other chain or date, simply replace `0` with a paraID (e.g. `2023` for Moonriver), adjust the relay chain from `kusama` to `polkadot` if appropriate, and replace the date `2023-02-01` with another date in the queries below.
```sql
SELECT address_ss58,
       address_pubkey,
       MAX(accountType) AS accountType,
       MAX(blockTime) AS last_active_ts
FROM (
  WITH activeUserAccount AS (
    SELECT signer_ss58 AS address_ss58,
           signer_pub_key AS address_pubkey,
           "User" AS accountType,
           MAX(block_time) AS block_time
    FROM `substrate-etl.kusama.extrinsics0`
    WHERE DATE(block_time) = "2023-02-01"
      AND signed = TRUE
    GROUP BY address_ss58, address_pubkey
  ),
  activeSystemAccount AS (
    SELECT author_ss58 AS address_ss58,
           author_pub_key AS address_pubkey,
           "System" AS accountType,
           MAX(block_time) AS block_time
    FROM `substrate-etl.kusama.blocks0`
    WHERE DATE(block_time) = "2023-02-01"
    GROUP BY address_ss58, address_pubkey
  )
  SELECT address_ss58,
         address_pubkey,
         MAX(accountType) AS accountType,
         MAX(block_time) AS blockTime
  FROM activeSystemAccount
  GROUP BY address_ss58, address_pubkey
  UNION ALL
  (SELECT address_ss58,
          address_pubkey,
          MAX(accountType) AS accountType,
          MAX(block_time) AS blockTime
   FROM activeUserAccount
   GROUP BY address_ss58, address_pubkey)
)
GROUP BY address_ss58, address_pubkey
```
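As a concrete instance of the substitution described above, pointing the `activeUserAccount` building block at Moonriver (paraID `2023` on Kusama) only changes the table suffix and, if desired, the date; the fragment below shows that CTE body in isolation (the illustrative date is arbitrary).

```sql
-- Fragment: the activeUserAccount CTE body adapted to Moonriver (paraID 2023);
-- the rest of the query above is unchanged apart from the same substitutions.
SELECT signer_ss58 AS address_ss58,
       signer_pub_key AS address_pubkey,
       "User" AS accountType,
       MAX(block_time) AS block_time
FROM `substrate-etl.kusama.extrinsics2023`
WHERE DATE(block_time) = "2023-03-01"
  AND signed = TRUE
GROUP BY address_ss58, address_pubkey
```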
The list of active accounts is kept in the following date-partitioned table, built using the query form above:

`substrate-etl.{relayChain}.accountsactive{paraID}`

for all chains, e.g. for Polkadot and Karura:
- `substrate-etl.polkadot.accountsactive0`
- `substrate-etl.kusama.accountsactive2000`

The # of Active Accounts metric is computed using the above table in `update_blocklog` for all chains with archive nodes.
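As a hedged sketch of how this count can be reproduced directly from the maintained table, the query below assumes `accountsactive0` carries the columns produced by the query form above (`address_ss58`, `accountType`, `last_active_ts`); the actual partitioning column may differ and should be checked in the BigQuery console.

```sql
-- Sketch: # of Active Accounts by accountType for the Kusama relay chain on one day.
-- Column names are assumed from the query form above; verify against the real schema.
SELECT accountType,
       COUNT(DISTINCT address_ss58) AS num_active_accounts
FROM `substrate-etl.kusama.accountsactive0`
WHERE DATE(last_active_ts) = "2023-02-01"
GROUP BY accountType;
```

The next query derives Passive Accounts for the same day by excluding these active accounts from the set of transfer senders and recipients.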
```sql
WITH ActiveAccount AS (
  SELECT address_ss58,
         MAX(accountType) AS accountType
  FROM (
    WITH activeUserAccount AS (
      SELECT signer_ss58 AS address_ss58,
             signer_pub_key AS address_pubkey,
             "User" AS accountType,
             MAX(block_time) AS block_time
      FROM `substrate-etl.kusama.extrinsics0`
      WHERE DATE(block_time) = "2023-02-01"
        AND signed = TRUE
      GROUP BY address_ss58, address_pubkey
    ),
    activeSystemAccount AS (
      SELECT author_ss58 AS address_ss58,
             author_pub_key AS address_pubkey,
             "System" AS accountType,
             MAX(block_time) AS block_time
      FROM `substrate-etl.kusama.blocks0`
      WHERE DATE(block_time) = "2023-02-01"
      GROUP BY address_ss58, address_pubkey
    )
    SELECT address_ss58,
           address_pubkey,
           MAX(accountType) AS accountType,
           MAX(block_time) AS blockTime
    FROM activeSystemAccount
    GROUP BY address_ss58, address_pubkey
    UNION ALL
    (SELECT address_ss58,
            address_pubkey,
            MAX(accountType) AS accountType,
            MAX(block_time) AS blockTime
     FROM activeUserAccount
     GROUP BY address_ss58, address_pubkey)
  )
  GROUP BY address_ss58, address_pubkey
),
TransferAccount AS (
  SELECT address_ss58,
         address_pubkey,
         MAX(block_time) AS block_time
  FROM (
    SELECT to_ss58 AS address_ss58,
           to_pub_key AS address_pubkey,
           MAX(block_time) AS block_time
    FROM `substrate-etl.kusama.transfers0`
    WHERE DATE(block_time) = "2023-02-01"
    GROUP BY address_ss58, address_pubkey
    UNION ALL
    SELECT from_ss58 AS address_ss58,
           from_pub_key AS address_pubkey,
           MAX(block_time) AS block_time
    FROM `substrate-etl.kusama.transfers0`
    WHERE DATE(block_time) = "2023-02-01"
    GROUP BY address_ss58, address_pubkey
  )
  GROUP BY address_ss58, address_pubkey
)
SELECT address_ss58,
       address_pubkey,
       MAX(block_time) AS last_passive_ts
FROM TransferAccount
WHERE address_ss58 NOT IN (SELECT address_ss58 FROM ActiveAccount)
  AND address_ss58 IS NOT NULL
GROUP BY address_ss58, address_pubkey
ORDER BY address_pubkey;
```
The list of passive accounts is kept in the following date-partitioned tables, built using the query form above:

`substrate-etl.{relayChain}.accountspassive{paraID}`

for all chains, e.g. for Polkadot and Karura:
- `substrate-etl.polkadot.accountspassive0`
- `substrate-etl.kusama.accountspassive2000`

The # of Passive Accounts metric is computed in `update_blocklog` for all chains with archive nodes.
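A minimal sketch of reading this table back, assuming it carries the columns produced by the query above (`address_ss58`, `last_passive_ts`), shown here for Karura (`accountspassive2000` under the `kusama` dataset):

```sql
-- Sketch: # of Passive Accounts for Karura on one day.
-- Column names are assumed from the query form above; verify against the real schema.
SELECT COUNT(DISTINCT address_ss58) AS num_passive_accounts
FROM `substrate-etl.kusama.accountspassive2000`
WHERE DATE(last_passive_ts) = "2023-02-01";
```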
The list of addresses with balances is kept in the following date-partitioned table:

`substrate-etl.{relayChain}.balances{paraID}`

for all chains, e.g. for Polkadot and Karura:
- `substrate-etl.polkadot.balances0`
- `substrate-etl.kusama.balances2000`

The # of Addresses with Balances metric is computed in `load_bqlogfn` after loading the balances table for all chains.
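A hedged sketch of the Addresses with Balances count follows; the `address_ss58` and `ts` column names are assumptions for illustration only, and the real `balances0` schema should be checked before relying on this query.

```sql
-- Sketch: # of Addresses with Balances for the Polkadot relay chain on one day.
-- address_ss58 and ts are assumed column names; verify against the real schema.
SELECT COUNT(DISTINCT address_ss58) AS num_addresses_with_balances
FROM `substrate-etl.polkadot.balances0`
WHERE DATE(ts) = "2023-02-01";
```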
The list of accounts that are newly present on one day and were not present on the previous day is kept in the following date-partitioned tables:

`substrate-etl.{relayChain}.accountsnew{paraID}`

for all chains, e.g. for Polkadot and Karura:
- `substrate-etl.polkadot.accountsnew0`
- `substrate-etl.kusama.accountsnew2000`

It is computed using the `balances` tables. Note that an account can be "new" multiple times in a date range, e.g. new on day N, reaped on day N+1, and new again on day N+2.

The # of New Accounts metric is computed using the above tables in `update_blocklog` for all chains with archive nodes.
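The following sketch illustrates the daily-difference idea behind `accountsnew`: addresses present in the balances snapshot on day N but absent on day N-1. The `address_ss58` and `ts` column names are assumptions for illustration; the maintained `accountsnew` tables remain the authoritative source.

```sql
-- Sketch: addresses present on 2023-02-01 but not on 2023-01-31 ("new" accounts).
-- address_ss58 and ts are assumed column names; verify against the real schema.
SELECT cur.address_ss58
FROM (SELECT DISTINCT address_ss58
      FROM `substrate-etl.kusama.balances0`
      WHERE DATE(ts) = "2023-02-01") cur
LEFT JOIN (SELECT DISTINCT address_ss58
           FROM `substrate-etl.kusama.balances0`
           WHERE DATE(ts) = "2023-01-31") prev
  ON cur.address_ss58 = prev.address_ss58
WHERE prev.address_ss58 IS NULL;
```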
The list of accounts that were present on the previous day but are no longer present (i.e. reaped) is kept in the following date-partitioned tables:

`substrate-etl.{relayChain}.accountsreaped{paraID}`

for all chains, e.g. for Polkadot and Karura:
- `substrate-etl.polkadot.accountsreaped0`
- `substrate-etl.kusama.accountsreaped2000`

It is computed using the `balances` tables. Note that an account can be "reaped" multiple times in a date range, e.g. reaped on day N, new on day N+1, and reaped again on day N+2.

The # of Reaped Accounts metric is computed using the above tables in `update_blocklog` for all chains.
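The reaped-account derivation is the mirror image of the sketch above, with the anti-join direction reversed (same column-name assumptions):

```sql
-- Sketch: addresses present on 2023-01-31 but not on 2023-02-01 ("reaped" accounts).
-- address_ss58 and ts are assumed column names; verify against the real schema.
SELECT prev.address_ss58
FROM (SELECT DISTINCT address_ss58
      FROM `substrate-etl.kusama.balances0`
      WHERE DATE(ts) = "2023-01-31") prev
LEFT JOIN (SELECT DISTINCT address_ss58
           FROM `substrate-etl.kusama.balances0`
           WHERE DATE(ts) = "2023-02-01") cur
  ON prev.address_ss58 = cur.address_ss58
WHERE cur.address_ss58 IS NULL;
```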
The # of Signed Extrinsics metric is computed from the following table using the `signed` column, in `update_blocklog` for all chains:

`substrate-etl.{relayChain}.extrinsics{paraID}`

The # of Events metric is computed from the following table, and includes events from both signed and unsigned extrinsics, in `update_blocklog`:

`substrate-etl.{relayChain}.events{paraID}`

The # of Transfers metric is computed from the following table, and includes transfers from both signed and unsigned extrinsics, in `update_blocklog`:

`substrate-etl.{relayChain}.transfers{paraID}`
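A hedged sketch of these three daily counts for the Kusama relay chain follows; the `signed` and `block_time` columns on `extrinsics0` and `transfers0` appear in the queries above, while the `block_time` column on `events0` is assumed to match.

```sql
-- Sketch: daily counts behind the Signed Extrinsics, Events and Transfers metrics.
-- The block_time column on events0 is assumed to mirror the other tables.
SELECT
  (SELECT COUNT(*)
   FROM `substrate-etl.kusama.extrinsics0`
   WHERE DATE(block_time) = "2023-02-01"
     AND signed = TRUE) AS num_signed_extrinsics,
  (SELECT COUNT(*)
   FROM `substrate-etl.kusama.events0`
   WHERE DATE(block_time) = "2023-02-01") AS num_events,
  (SELECT COUNT(*)
   FROM `substrate-etl.kusama.transfers0`
   WHERE DATE(block_time) = "2023-02-01") AS num_transfers;
```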
Important: Because some parachains include duplicate events for the same underlying transfer, a best effort was made to deduplicate events so that only one event per transfer is included. Further details concerning XCM Transfers will be provided.
The process of matching XCM Transfers between an origination chain and a destination chain with topics will be detailed further in Q3 2023.
Important: Because some parachains do not provide archive nodes (see "Issues"), only tables from chains that provide archive nodes can be included. If new archive nodes become available for these chains, the above datasets may be adjusted.
The above is computed with `dump_networkmetrics` once all chains have completed, except those that can be ignored (marked with an "Ignore" flag). Some chains without archive nodes must be ignored (generally these are new chains) as a condition of the above aggregation; otherwise, the computation is initiated once all chains are marked as Audited or AuditRequired.