
Discussion of TAP 16: Snapshot Merkle trees #134

Open
mnm678 opened this issue Feb 3, 2021 · 9 comments
@mnm678
Contributor

mnm678 commented Feb 3, 2021

This is a thread to discuss snapshot Merkle trees, introduced in #125.

Pull requests relating to TAP 16: #133

Outstanding issues and questions relating to this TAP (from @joshuagl in #125 (comment))

  • A reader has to understand Merkle trees to implement this TAP, for example to understand how to compute the root hash from a leaf node. Is that a concern? Should we link to a good overview of Merkle trees? Should we specify how this is expected to work?
    • are the metadata extensions for the Merkle tree truly abstract enough to support arbitrary (non-Merkle) tree algorithms? Should we PoC some other algorithms? perhaps in a different implementation?
  • Further PoC development of auditor integrations?
  • Recommendations on specifying the algorithms for a POUF
  • The Specification section suggests the snapshot Merkle tree replaces the single snapshot metadata file – is there any reason we shouldn't generate both? If we generate a Merkle tree only, should integrations still have a snapshot role with associated key? Should we explore this a bit more?
  • In places the TAP seems to suggest an auditor is required, whereas in others it indicates auditors are optional. Let's be sure to clarify.
@joshuagl
Member

Merkle tree implementations are susceptible to a second pre-image attack (see, for example, here or here). Fortunately, there is a well-known fix (as implemented in Certificate Transparency): differentiating between leaf nodes and internal nodes in the tree by prepending different byte values for each node type to the hash data of the node.
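The Certificate Transparency-style fix can be sketched in a few lines of Python. This is a minimal illustration only, assuming SHA-256 and a simple promote-the-odd-node pairing rule; TAP 16 itself does not prescribe any of this:

```python
import hashlib

# Distinct prefix bytes for leaf vs. interior nodes (as in RFC 6962),
# so an interior hash value can never be replayed as a leaf.
LEAF_PREFIX = b"\x00"
NODE_PREFIX = b"\x01"

def hash_leaf(data: bytes) -> bytes:
    return hashlib.sha256(LEAF_PREFIX + data).digest()

def hash_node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(NODE_PREFIX + left + right).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute the root hash, pairing nodes left to right and
    promoting an unpaired odd node to the next level unchanged."""
    level = [hash_leaf(leaf) for leaf in leaves]
    while len(level) > 1:
        nxt = [hash_node(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

Without the prefixes, `hash_node(a, b)` and `hash_leaf(a || b)` could collide, which is exactly the second preimage problem.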

With TAP 16, how can we help implementors avoid opening themselves up to this kind of attack, especially given that we are trying to avoid specifying the Merkle tree algorithm as part of the TUF specification? Is this an argument for including the algorithm in the specification (where we could specify the need for different byte prefixes for internal and leaf nodes), rather than leaving it to an implementation's POUF?

I recognise that auditors can help detect this second preimage attack against the Merkle tree. However, as noted above, the TAP doesn't appear to have decided whether auditors are required or not.

@mnm678
Contributor Author

mnm678 commented Feb 10, 2021

I hesitate to specify the exact algorithm, as there was some interest in using Verkle trees, or other variations. It also fits with the overall 'no judgement' philosophy of TUF.

That being said, I think the TAP should mention this attack, and perhaps other details about Merkle trees. This may also be a good fit for the secondary literature discussed in theupdateframework/specification#91.

@joshuagl
Member

Some additional issues/questions for consideration based on my final review before hitting approve:

  • is there a more compact format for the metadata, rather than having a merkle_path and path_directions as separate fields in the object with indexed values?
  • Does removing the snapshot key decrease our compromise resilience by increasing the value of compromising the timestamp key?
  • the obvious auditor implementation would require us to continue to generate the snapshot metadata file, as well as snapshot Merkle trees. Are there less obvious auditor implementations that may allow a repository not to generate a (potentially very large) snapshot metadata file?
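On the first point, one hypothetical, more compact encoding: since the two indexed dicts share the same index keys, they could be zipped into a single ordered list where position implies the index. Field names follow the TAP draft; the hash and direction values here are made up:

```python
# The TAP's current two-field indexed encoding (illustrative values).
indexed = {
    "merkle_path": {"0": "ab12...", "1": "cd34..."},
    "path_directions": {"0": "left", "1": "right"},
}

def to_compact(meta: dict) -> list:
    """Zip the two indexed dicts into one ordered [hash, direction]
    list; the list position replaces the explicit index keys."""
    keys = sorted(meta["merkle_path"], key=int)
    return [[meta["merkle_path"][k], meta["path_directions"][k]] for k in keys]

compact = to_compact(indexed)
# compact == [["ab12...", "left"], ["cd34...", "right"]]
```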

@mnm678 mnm678 added this to the TUF 1.4.0 milestone Apr 29, 2021
@jku
Member

jku commented Nov 19, 2024

(just leaving notes for future now that I'm reading this again)

the TAP's "security analysis" section should talk about the increased attack surface and increased client complexity -- I'm still not totally convinced that adding this into TUF as-is is the right move. If the repository can use (and switch between) all the different delegation models as it wishes, then clients have the classic TLS handshake issue: a compromised/malicious repository can simply choose the delegation mechanism in which it has found a client bug.

We might want to consider calling this a different protocol (that only supports merkle delegation). Alternatively -- and I'm hand waving now -- we could consider making it possible for the TUF root role to explicitly limit the features that are allowed in this repository (making it impossible to modify this selection with just lower-level keys): in practice this could be a string array in the root payload that limits which features are allowed, e.g. features: ["merkle-delegation"] or features: ["path-delegation", "bin-delegation"]

@trishankatdatadog
Member

We might want to consider calling this a different protocol (that only supports merkle delegation). Alternatively -- and I'm hand waving now -- we could consider making it possible for the TUF root role to explicitly limit the features that are allowed in this repository (making it impossible to modify this selection with just lower-level keys): in practice this could be a string array in the root payload that limits which features are allowed, e.g. features: ["merkle-delegation"] or features: ["path-delegation", "bin-delegation"]

I agree. Right now, it's not obvious to a client which TAP(s), if any, a repo supports at all.

@jku
Member

jku commented Nov 19, 2024

TAP says

This information will be included in the following metadata format:

{
  "leaf_contents": {METAFILES},
  "merkle_path": {INDEX:HASH},
  "path_directions": {INDEX:DIR}
}

Where METAFILES is the version information as defined for snapshot metadata

I'm not sure if this is trying to say that the leaf_contents value matches the METAFILES defined in the spec itself... but it is definitely not the same type of value (in the spec, METAFILES is a dict of dicts, with the outer dict keyed by role filename)
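To illustrate the type mismatch (the role names and versions here are made up; the shape of the outer dict follows the spec's snapshot METAFILES definition):

```python
# In the specification, METAFILES is a dict of dicts, keyed by role
# filename (illustrative values only):
spec_metafiles = {
    "role1.json": {"version": 3},
    "role2.json": {"version": 7},
}

# A per-role Merkle leaf presumably carries only that role's entry,
# so its leaf_contents would be the inner version-info dict, not a
# METAFILES-shaped dict of dicts:
leaf_contents = {"version": 3}
```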

@jku
Member

jku commented Nov 19, 2024

On Client interaction with auditors:

Clients must ensure that snapshot Merkle trees have been verified by an auditor

I understand we would like this to happen but this cannot be a "must" for clients:

  • A client cannot enforce the existence of trusted third party auditors
  • The verification mechanisms described are either impractical (this covers the idea of including auditor signature in the timestamp) or not well defined (this covers the other two mechanisms)

@jku
Member

jku commented Nov 20, 2024

Things I believe are missing from TAP:

  • client workflow changes. My impression is that changes are roughly these
    • snapshot is not used
    • for each targets metadata that needs to be downloaded
      • first download the merkle node for that targets role
      • verify that node is part of the merkle root (downloading the required other merkle nodes)
      • use the merkle node to find the targets metadata version (and hash), then download the actual targets metadata
      • continue normal client workflow (handle artifacts, delegations) based on contents of this targets metadata
  • merkle node API definition (meaning where clients download the nodes from): this is not covered in the TAP, but based on the example code the suggestion seems to be to use <METADATA_URL>/<ROLENAME>-snapshot.json. This approach seems to make collisions possible (possibly only in the non-consistent-snapshot case); I would like that not to be the case. Maybe something like <METADATA_URL>/merkle/<ROLENAME>.json would work -- although see the next point WRT snapshot stability
  • discussion on effects on snapshot consistency, aka the clients practical ability to fetch a working snapshot of the repository:
    • assumptions:
      • client needs to download 100 targets metadata files to find all the artifacts it needs
      • client wants to use a single repository snapshot for all those metadata
      • client has to work synchronously: it has to download and parse the first artifact to find out what is the second artifact it needs (this is how pypi still works right now)
      • repository uploads new artifacts once a minute
    • the assumptions lead to, let's say, a 5 minute window when the client is making merkle node fetches. This leads to an issue:
      • client wants to fetch nodes for a specific merkle root during the 5 min window
      • repository only makes the nodes available for the current merkle root, which changes every minute: in other words, the merkle node API returns different content for the same request at different times
    • possibly something like <METADATA_URL>/merkle/<MERKLE_ROOT>.<ROLENAME>.json would work for snapshot consistency? it does mean the server side is a bit more complicated as it now needs to link nodes to roots
  • discussion on performance: this does likely lower the bandwidth requirements compared to a (PyPI size) traditional TUF repo but my guess is that number of HTTP requests is going to multiply several times. It's hard to guesstimate though, I would like to see some actual calculations (preferably for both ends of the use case spectrum: downloading a single artifact or downloading multiple hundreds of artifacts at a time).
  • discussion on attack surface and client complexity increase as mentioned in an earlier comment: How do we manage the spec becoming more and more complex to implement and how do we prevent the handshake problem where even an online key compromise can lead to attacker selecting the protocol features needed to compromise a specific client
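
The verification step in the workflow above ("verify that node is part of the merkle root") can be sketched as follows. This is an assumption-laden illustration: the hash_leaf/hash_node domain separation and the (sibling, direction) path shape are choices of this sketch, not something the TAP specifies.

```python
import hashlib

# Domain-separated hashing (RFC 6962 style), as discussed earlier
# in this thread.
def hash_leaf(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()

def hash_node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()

def verify_inclusion(leaf_data: bytes, path: list, root: bytes) -> bool:
    """Walk an authentication path up to the trusted root hash.

    `path` is a list of (sibling_hash, direction) pairs, where
    direction says which side the sibling sits on ("left"/"right").
    """
    h = hash_leaf(leaf_data)
    for sibling, direction in path:
        h = hash_node(sibling, h) if direction == "left" else hash_node(h, sibling)
    return h == root
```

A client would run this once per targets role it needs, after fetching the role's leaf node and the sibling nodes on the path, and only then download the targets metadata version the leaf points at.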

The main issue on that list seems to be that clients cannot download a consistent snapshot of the repository as the merkle node API is not consistent as far as I can tell. Maybe I've misunderstood something?

@jku
Member

jku commented Nov 22, 2024

discussion on performance

I don't claim to have really crunched the numbers but just to provide a reason why this should be looked at, a quick estimate:

Assumptions:

  • repository is "pypi scale": 600000 projects, each gets their own delegation
  • One level of "intermediate" delegations is enough to make targets metadata download sizes reasonable (average intermediate targets metadata will then have 774 delegations)
  • A package installer installs a single package without dependencies. The installer has not been used in a few days: caches are cold

Comparison

baseline

number of requests: Today installers like pip do ~3 requests (pip version check, index download, package download)

bandwidth: TODO

Traditional TUF

number of requests: A TUF-enabled installer would do 9 requests (baseline + 6 tuf metadata downloads)

bandwidth: TODO (snapshot is very large, the 3 levels of targets metadata are significant)

merkle-TUF

number of requests: A merkle-TUF enabled client would do ~66 requests (baseline + traditional TUF + 3 merkle verifications, each requiring ~19 downloads)

bandwidth: TODO (nothing very large, the 3 levels of targets metadata are significant)

merkle-TUF with optimizations

Here we assume repository makes complete "inclusion proofs" available to clients (this is not in current TAP)

number of requests: An optimized-merkle-TUF enabled client would do ~12 requests (baseline + traditional TUF + 3 merkle inclusion proofs)

bandwidth: TODO (nothing very large, the 3 levels of targets metadata are significant)
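For reference, the ~66 and ~12 figures above fall out of a quick back-of-the-envelope calculation. This is a sketch under the stated assumptions, not a measurement:

```python
import math

# 600,000 delegated projects, binary Merkle tree over all roles:
# an inclusion proof touches roughly log2(N) sibling nodes.
projects = 600_000
proof_len = round(math.log2(projects))   # ~19 nodes per verification

baseline = 3                     # pip version check, index, package download
traditional_tuf = baseline + 6   # + 6 TUF metadata downloads

# Without server-assembled proofs, each of the 3 merkle verifications
# fetches every node on its path to the root individually:
merkle_tuf = traditional_tuf + 3 * proof_len   # ~66 requests

# With complete inclusion proofs served as single files (not in the
# current TAP), each verification costs one extra request:
merkle_tuf_optimized = traditional_tuf + 3     # ~12 requests
```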
