PEP 740: tweak JSON simple API prescriptions #3768

Open: 7 commits to be merged into main
244 changes: 157 additions & 87 deletions peps/pep-0740.rst
@@ -24,8 +24,8 @@ These changes have two subcomponents:

* Changes to the currently unstandardized PyPI upload API, allowing clients
to upload digital attestations as :ref:`attestation objects <attestation-object>`;
* Changes to the :pep:`503` and :pep:`691` "simple" APIs, allowing clients
to retrieve both digital attestations and
* Changes to the :ref:`HTML and JSON "simple" APIs <packaging:simple-repository-api>`,
allowing clients to retrieve both digital attestations and
`Trusted Publishing <https://docs.pypi.org/trusted-publishers/>`_ metadata
for individual release files as :ref:`provenance objects <provenance-object>`.

Expand Down Expand Up @@ -75,7 +75,7 @@ Additionally, this proposal identifies the following motivations:
of the metadata needed by the index to verify an attestation's validity.

This PEP proposes a generic attestation format, containing an
:ref:`attestation payload for signature generation <payload-and-signature-generation>`,
:ref:`attestation statement for signature generation <payload-and-signature-generation>`,
with the expectation that index providers adopt the
format with a suitable source of identity for signature verification, such as
Trusted Publishing.
Expand Down Expand Up @@ -116,8 +116,8 @@ areas of Python packaging:
metadata within the cryptographic envelope.

For example, to prevent domain separation between a distribution's name and
its contents, this PEP proposes that digital attestations be performed over
``HASH(name || HASH(contents))`` rather than just ``HASH(contents)``.
its contents, this PEP uses in-toto Statements to bind the distribution's
contents (via SHA256 digest) to its filename.


Previous Work
Expand Down Expand Up @@ -196,6 +196,9 @@ Index changes
Simple Index
^^^^^^^^^^^^

The following changes are made to the
:ref:`simple repository API <packaging:simple-repository-api-base>`:

* When an uploaded file has one or more attestations, the index **MAY**
provide a ``.provenance`` file adjacent to the hosted distribution.
The format of the ``.provenance`` file **SHALL** be a JSON-encoded
Expand Down Expand Up @@ -223,17 +226,19 @@ Simple Index
JSON-based Simple API
^^^^^^^^^^^^^^^^^^^^^

The following changes are made to the
:ref:`JSON simple API <packaging:simple-repository-api-json>`:

* When an uploaded file has one or more attestations, the index **MAY**
include a ``provenance`` object in the ``file`` dictionary for that file.
The format of the ``provenance`` object **SHALL** be a JSON-encoded
:ref:`provenance object <provenance-object>`, which **SHALL** contain
the file's attestations.
include a ``provenance`` key in the ``file`` dictionary for that file.

* The index **MAY** choose to modify the ``provenance`` object, under the same
conditions as the ``.provenance`` file specified above.
The value of the ``provenance`` key **SHALL** be a JSON string, which
**SHALL** be the SHA256 digest of the associated ``.provenance`` file,
as in the Simple Index.

See :ref:`changes-to-provenance-objects` for an additional discussion of
reasons why a file's provenance may change.
See :ref:`appendix-3` for an explanation of the technical decision to
embed the SHA256 digest in the JSON API, rather than the full
:ref:`provenance object <provenance-object>`.
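
As a rough illustration of the new ``provenance`` key, the sketch below computes a SHA256 digest over the bytes of a ``.provenance`` file and places it in a ``file`` dictionary. The helper names, the example URL, and the use of lowercase hexadecimal encoding are assumptions made for illustration; the text above only requires a JSON string containing the SHA256 digest.

```python
import hashlib


def provenance_digest(provenance_bytes: bytes) -> str:
    """Hex SHA-256 digest of the exact bytes served as the .provenance file.

    Assumption: the digest is hex-encoded; the diff text does not name an encoding.
    """
    return hashlib.sha256(provenance_bytes).hexdigest()


def file_entry(filename: str, url: str, provenance_bytes: bytes) -> dict:
    """Build a (hypothetical, heavily trimmed) JSON simple API ``file`` dictionary."""
    return {
        "filename": filename,
        "url": url,
        # The value is a plain JSON string, not an embedded provenance object.
        "provenance": provenance_digest(provenance_bytes),
    }


entry = file_entry(
    "sampleproject-1.2.3.tar.gz",
    "https://example.com/sampleproject-1.2.3.tar.gz",
    b'{"version": 1, "attestation_bundles": []}',
)
```

A client following this sketch would fetch the adjacent ``.provenance`` file and compare its digest with the ``provenance`` value before trusting its contents.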

These changes require a version change to the JSON API:

@@ -260,13 +265,28 @@ object is provided as pseudocode below.

verification_material: VerificationMaterial
"""
Cryptographic materials used to verify `message_signature`.
Cryptographic materials used to verify `envelope`.
"""

envelope: Envelope
"""
The enveloped attestation statement and signature.
"""


@dataclass
class Envelope:
statement: bytes
"""
The attestation statement.

This is represented as opaque bytes on the wire (encoded as base64),
but it MUST be a JSON in-toto v1 Statement.
"""

message_signature: str
signature: bytes
"""
The attestation's signature, as `base64(raw-sig)`, where `raw-sig`
is the raw bytes of the signing operation over the attestation payload.
A signature for the above statement, encoded as base64.
"""

@dataclass
Expand Down Expand Up @@ -302,63 +322,47 @@ object) by selecting a new version number.

.. _payload-and-signature-generation:

Attestation payload and signature generation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The *attestation payload* is the actual claim that is cryptographically signed
over within the attestation object (as the ``message_signature``).

The attestation payload is encoded as an :rfc:`8785` canonicalized JSON object,
with the following pseudocode layout:

.. code-block:: python

@dataclass
class AttestationPayload:
distribution: str
"""
The file name of the Python package distribution.
"""

digest: str
"""
The SHA-256 digest of the distribution's contents, as a hexadecimal string.
"""

The value of ``distribution`` is the same distribution filename that appears
in the :pep:`503` and :pep:`691` APIs. For example, ``distribution`` would be
``sampleproject-1.2.0-py2.py3-none-any.whl`` for the following simple index
entry:
Attestation statement and signature generation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: html
The *attestation statement* is the actual claim that is cryptographically signed
over within the attestation object (i.e., the ``envelope.statement``).

<a href="https://example.com/...">sampleproject-1.2.0-py2.py3-none-any.whl</a><br/>
The attestation statement is encoded as a
`v1 in-toto Statement object <https://github.com/in-toto/attestation/blob/v1.0/spec/v1.0/statement.md>`__,
in JSON form. When serialized, the statement is treated as an opaque binary blob,
avoiding the need for canonicalization. An example JSON-encoded statement is
provided in :ref:`appendix-4`.

In practice, this means that ``distribution`` is defined by the PyPA's
living specifications for
:ref:`binary distributions <packaging:binary-distribution-format>` and
:ref:`source distributions <packaging:source-distribution-format>`, although
non-conforming distributions may be hosted by the index.
In addition to being a v1 in-toto Statement, the attestation statement is constrained
in the following ways:

The following pseudocode demonstrates the construction of an attestation
payload and its signature:
* The in-toto ``subject`` **MUST** contain only a single subject.
* ``subject[0].name`` **MUST** be the distribution's filename, appropriately
normalized. See below for notes on sdist and wheel filename normalization.
* ``subject[0].digest`` **MUST** contain a SHA-256 digest. Other digests
**MAY** be present. The digests **MUST** be represented as hexadecimal strings.
* The following ``predicateType`` values are supported:

.. code-block:: python
* `SLSA Provenance <https://slsa.dev/provenance/v1>`__: ``https://slsa.dev/provenance/v1``
* `PyPI Publish Attestation <https://docs.pypi.org/attestations/publish/v1>`__: ``https://docs.pypi.org/attestations/publish/v1``

def build_payload(dist: Path) -> AttestationPayload:
return AttestationPayload(
distribution=dist.name,
digest=sha256(dist.read_bytes()).hexdigest(),
)
For the distribution's filename, normalization should be performed according to
the :ref:`binary distribution format <packaging:binary-distribution-format>` or
:ref:`source distribution format <packaging:source-distribution-format>`, as
appropriate. In particular, the distribution name **MUST** be lowercased and
normalized, per :ref:`name normalization <packaging:name-normalization>`,
followed by replacing runs of ``-`` with ``_``. Similarly, the distribution version
**MUST** be normalized per the
:ref:`version specification <packaging:version-specifiers>`.
Review comment on lines +350 to +357 (Contributor, Author):

I thought about this some more, and even this may not be sufficient, and will be annoying to do in practice:

  1. parse_wheel_filename is a (somewhat) one-way operation, since it fully expands the tag set
  2. The tag set itself isn't subject to any canonicalization rules, e.g. around the order of tags in each group in compressed tag sets

Given the above, it may just be easier to say that the distribution's filename needs to be a valid name, and is considered equal if it parses equally, meaning that the comparison is:

# wheels
parse_wheel_filename(actual) == parse_wheel_filename(expected)

# sdists
parse_sdist_filename(actual) == parse_sdist_filename(expected)
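
To illustrate the second point of this comment, the sketch below expands a compressed tag set into its individual tag triples. The ``expand_tag_set`` helper is hypothetical and stdlib-only; it is a simplified stand-in for the tag expansion a wheel filename parser performs, not the parser itself.

```python
from itertools import product
from typing import FrozenSet, Tuple


def expand_tag_set(compressed: str) -> FrozenSet[Tuple[str, str, str]]:
    """Expand e.g. 'py2.py3-none-any' into a set of (interpreter, abi, platform)."""
    interpreters, abis, platforms = compressed.split("-")
    return frozenset(
        product(interpreters.split("."), abis.split("."), platforms.split("."))
    )


# Two spellings that differ as strings but describe the same set of tags.
first = "py2.py3-none-any"
second = "py3.py2-none-any"
same_tags = expand_tag_set(first) == expand_tag_set(second)
```

Under this sketch ``same_tags`` is true even though the two strings differ, which is why comparing parsed filenames is more forgiving than comparing raw filename strings.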


attestation_payload = build_payload("sampleproject-1.2.0-py2.py3-none-any.whl")
Examples of non-normalized filenames and their normalized equivalents are
provided in :ref:`appendix-5`.

# canonical_json is a fictitious module that performs RFC 8785 canonical
# JSON serialization.
encoded_payload = canonical_json.dumps(asdict(attestation_payload))

raw_signature = signing_key.sign(encoded_payload, ECDSA(SHA2_256()))
message_signature = b64encode(raw_signature)
The signature over this statement is constructed using the
`v1 DSSE signature protocol <https://github.com/secure-systems-lab/dsse/blob/v1.0.0/protocol.md>`__,
with a ``PAYLOAD_TYPE`` of ``application/vnd.in-toto+json`` and a ``PAYLOAD_BODY`` of the JSON-encoded
statement above. No other ``PAYLOAD_TYPE`` is permitted.
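
The bytes that are actually signed under DSSE are the "pre-authentication encoding" (PAE) of the payload type and body. A minimal sketch of that encoding, following the v1 DSSE protocol document linked above, is shown here; the function name ``pae`` is illustrative, and the signing step itself (key handling, algorithm choice) is intentionally left out.

```python
PAYLOAD_TYPE = "application/vnd.in-toto+json"


def pae(payload_type: str, payload_body: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding.

    Layout: "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body,
    where LEN is the ASCII decimal byte length with no leading zeros.
    """
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (
        len(type_bytes),
        type_bytes,
        len(payload_body),
        payload_body,
    )


# The signer signs, and the verifier verifies, these bytes rather than the bare statement.
to_be_signed = pae(PAYLOAD_TYPE, b'{"_type": "https://in-toto.io/Statement/v1"}')
```

Because the statement is carried as opaque bytes, both sides feed the same byte string into ``pae`` and no JSON canonicalization is needed.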

.. _provenance-object:

@@ -368,9 +372,8 @@ Provenance objects
The index will serve uploaded attestations along with metadata that can assist
in verifying them in the form of JSON serialized objects.

These *provenance objects* will be available via both the :pep:`503` Simple Index
and :pep:`691` JSON-based Simple API as described above, and will have the
following layout:
These *provenance objects* will be available via both the Simple Index
and JSON-based Simple API as described above, and will have the following layout:

.. code-block:: json

Expand Down Expand Up @@ -488,7 +491,8 @@ for changes to the provenance object include but are not limited to:
Attestation verification
------------------------

Verifying an attestation object requires verification of each of the following:
Verifying an attestation object against a distribution file requires verification of each of the
following:

* ``version`` is ``1``. The verifier **MUST** reject any other version.
* ``verification_material.certificate`` is a valid signing certificate, as
@@ -497,9 +501,12 @@ Verifying an attestation object requires verification of each of the following:
* ``verification_material.certificate`` identifies an appropriate signing
subject, such as the machine identity of the Trusted Publisher that published
the package.
* ``message_signature`` can be verified by ``verification_material.certificate``,
using the reconstructed attestation payload as the cleartext input. The
verifier **MUST** reconstruct the attestation payload itself.
* ``envelope.statement`` is a well-formed in-toto v1 Statement, with a subject
and digest that match the distribution's filename and contents.
* ``envelope.signature`` is a valid signature for ``envelope.statement``
corresponding to ``verification_material.certificate``,
as reconstituted via the
`v1 DSSE signature protocol <https://github.com/secure-systems-lab/dsse/blob/v1.0.0/protocol.md>`__.
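
The statement check in the list above (well-formed Statement, subject and digest matching the distribution) might be sketched as follows. This covers only that one step; certificate and signature verification are not shown. The function name is hypothetical, and exact string equality on the filename is an assumption, since the review discussion in this change suggests parse-based comparison may be preferable.

```python
import hashlib
import json

STATEMENT_TYPE = "https://in-toto.io/Statement/v1"


def statement_matches(statement: bytes, filename: str, contents: bytes) -> bool:
    """Check that a decoded envelope statement binds `filename` to `contents`."""
    try:
        doc = json.loads(statement)
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return False
    if not isinstance(doc, dict) or doc.get("_type") != STATEMENT_TYPE:
        return False
    subjects = doc.get("subject")
    # The statement must contain exactly one subject.
    if not isinstance(subjects, list) or len(subjects) != 1:
        return False
    subject = subjects[0]
    if not isinstance(subject, dict) or not isinstance(subject.get("digest"), dict):
        return False
    expected = hashlib.sha256(contents).hexdigest()
    # Assumption: filenames are compared as exact strings.
    return subject.get("name") == filename and subject["digest"].get("sha256") == expected
```

A verifier would run a check of this shape only after the signature over the statement has been validated, so that the subject and digest it inspects are known to be authentic.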

In addition to the above required steps, a verifier **MAY** additionally verify
``verification_material.transparency_entries`` on a policy basis, e.g. requiring
Expand Down Expand Up @@ -543,19 +550,6 @@ unstated presumption with earlier mechanisms, like PGP and wheel signatures.
This PEP does not preclude or exclude future index trust mechanisms, such
as :pep:`458` and/or :pep:`480`.

Flexible attestations
---------------------

This PEP specifies a fixed attestation payload (defined in
:ref:`payload-and-signature-generation`), binding the contents of each uploaded
file to its public name on the index. This payload format is fixed and
inflexible to ease implementation, and to minimize additional mechanical
changes to the index itself (e.g., needing to store and service detached
attestation documents).

This PEP does not preclude or exclude future more flexible attestation payload
formats, such as ones built on `in-toto <https://in-toto.io/>`__.

Recommendations
===============

Expand Down Expand Up @@ -628,7 +622,7 @@ of signed inclusion time, and can be verified either online or offline.

inclusion_proof: InclusionProof
"""
The actual inclusion proof the the log entry.
The actual inclusion proof of the log entry.
"""


Expand Down Expand Up @@ -668,6 +662,82 @@ of signed inclusion time, and can be verified either online or offline.
Cosigned checkpoints from zero or more log witnesses.
"""

.. _appendix-3:

Appendix 3: Simple JSON API size considerations
===============================================

A previous draft of this PEP required embedding each
:ref:`provenance object <provenance-object>` directly into its appropriate part
of the JSON Simple API.

The current version of this PEP embeds the SHA256 digest of the provenance
object instead. This is done for size and network bandwidth reasons:

1. We estimate the typical size of an attestation object to be approximately
5.3KB of JSON.
2. We conservatively estimate that indices will eventually host around 3 attestations
per release file, or approximately 15.9KB of JSON per combined provenance
object.
3. As of May 2024, the average project on PyPI has approximately 21 release
files. We conservatively expect this average to increase over time.
4. Combined, these numbers imply that a typical project might expect to host
between 60 and 70 attestations, or approximately 339KB of additional JSON
in its "project detail" endpoint.

These numbers are significantly worse in "pathological" cases, where projects
have hundreds or thousands of releases and/or dozens of files per release.
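
The estimate can be reproduced with a few lines of arithmetic. The sketch below uses the inputs stated in this appendix and assumes 1KB = 1000 bytes; the constant names are illustrative.

```python
ATTESTATION_BYTES = 5_300     # ~5.3KB of JSON per attestation object
ATTESTATIONS_PER_FILE = 3     # conservative estimate per release file
FILES_PER_PROJECT = 21        # average release files per project (May 2024)

provenance_bytes = ATTESTATION_BYTES * ATTESTATIONS_PER_FILE          # per file
attestations_per_project = ATTESTATIONS_PER_FILE * FILES_PER_PROJECT  # per project
project_bytes = provenance_bytes * FILES_PER_PROJECT                  # per project

print(f"{provenance_bytes / 1000:.1f}KB per provenance object")
print(f"{attestations_per_project} attestations per project")
print(f"{project_bytes / 1000:.1f}KB per project detail response")
```

With exactly these inputs the product is 63 attestations and about 334KB; a total of 339KB corresponds to roughly 64 attestations at 5.3KB each, which still falls inside the 60 to 70 range.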

.. _appendix-4:

Appendix 4: Example attestation statement
=========================================

Given a source distribution ``sampleproject-1.2.3.tar.gz`` with a SHA-256
digest of ``e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855``,
the following is an appropriate in-toto Statement, as a JSON object:

.. code-block:: json

{
"_type": "https://in-toto.io/Statement/v1",
"subject": [
{
"name": "sampleproject-1.2.3.tar.gz",
"digest": {"sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"}
}
],
"predicateType": "https://some-arbitrary-predicate.example.com/v1",
"predicate": {
"something-else": "foo"
}
}
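
A statement of this shape could be produced roughly as follows. This is a sketch under stated assumptions: ``build_statement`` is a hypothetical helper, the predicate is the arbitrary placeholder from the example above, and empty file contents are used only because their SHA-256 digest is the ``e3b0c442...`` value shown.

```python
import base64
import hashlib
import json


def build_statement(
    filename: str, contents: bytes, predicate_type: str, predicate: dict
) -> bytes:
    """Serialize an in-toto v1 Statement binding `filename` to `contents`."""
    statement = {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [
            {
                "name": filename,
                "digest": {"sha256": hashlib.sha256(contents).hexdigest()},
            }
        ],
        "predicateType": predicate_type,
        "predicate": predicate,
    }
    # The serialized bytes are treated as opaque from here on; no canonicalization.
    return json.dumps(statement).encode("utf-8")


statement_bytes = build_statement(
    "sampleproject-1.2.3.tar.gz",
    b"",  # empty contents, whose SHA-256 matches the example digest
    "https://some-arbitrary-predicate.example.com/v1",
    {"something-else": "foo"},
)
# On the wire, the statement is carried as base64 inside the envelope.
wire_statement = base64.b64encode(statement_bytes).decode("ascii")
```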

.. _appendix-5:

Appendix 5: Examples of distribution filename normalization
===========================================================

The following lists some examples of **non-normalized** distribution filenames,
followed by their normalized counterparts.

Non-normalized:

.. code-block::

SampleProjectCapitalized-1.2.3.tar.gz
too-many-dashes-1.2.3.tar.gz
nonnormalized_version-1.2.3-PREVIEW1-py3-none-any.whl

Normalized:

.. code-block::

sampleprojectcapitalized-1.2.3.tar.gz
too_many_dashes-1.2.3.tar.gz
nonnormalized_version-1.2.3preview1-py3-none-any.whl
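
The name portion of the mapping above can be sketched with the standard library alone. The helper name is illustrative, and the sketch deliberately handles only the distribution name; version normalization and full filename parsing are assumed to be handled elsewhere.

```python
import re


def normalize_filename_name(name: str) -> str:
    """Normalize a distribution name for use in a distribution filename.

    Step 1: name normalization (lowercase, runs of '-', '_', '.' become '-').
    Step 2: replace '-' with '_', as filenames use '-' as a field separator.
    """
    canonical = re.sub(r"[-_.]+", "-", name).lower()
    return canonical.replace("-", "_")


for raw in ("SampleProjectCapitalized", "too-many-dashes"):
    print(raw, "->", normalize_filename_name(raw))
```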

Copyright
=========
