PEP 716: Normalization of Project Names in Metadata and Filenames #3171

Open
wants to merge 5 commits into
base: main
Changes from 4 commits
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -596,6 +596,7 @@ pep-0712.rst @ericvsmith
pep-0713.rst @ambv
pep-0714.rst @dstufft
pep-0715.rst @dstufft
pep-0716.rst @dstufft
# ...
# pep-0754.txt
# ...
319 changes: 319 additions & 0 deletions pep-0716.rst
@@ -0,0 +1,319 @@
PEP: 716
Title: Normalization of Project Names in Metadata and Filenames
Author: Donald Stufft <[email protected]>
PEP-Delegate: Paul Moore <[email protected]>
Discussions-To:
Member

Standard reminder to update Discussions-To and Post-History with the PEP discussion thread created just after merging this

Status: Draft
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 11-Jun-2023
Post-History:


Abstract
========

This PEP standardizes on where and when project names should and should not be
Member

Suggested change
This PEP standardizes on where and when project names should and should not be
This PEP standardizes where and when project names should and should not be

Elide unnecessary word

normalized in the packaging toolchain.


Motivation
==========

Historically there was effectively little to no requirements on the valid values
Member

Suggested change
Historically there was effectively little to no requirements on the valid values
Historically, there was effectively little to no restriction on the valid values

Fix grammar errors (agreement)

of names in the packaging ecosystem. Projects that wanted to interpret those
names had to cope with a wide range of values, and had to each implement their
own normalization schemes to try and detect names that were the same, but
"spelled" differently.

Over the intervening years, various PEPs have ratcheted down various pieces of
metadata such as version numbers (:pep:`440`), filenames in bdists (:pep:`427`),
names in the Simple API (:pep:`503`), and filenames in sdists (:pep:`625`).
Member

Suggested change
names in the Simple API (:pep:`503`), and filenames in sdists (:pep:`625`).
names in the Simple Repository API (:pep:`503`), and filenames in sdists (:pep:`625`).

Specify the actual name of the API, and be clear at a glance what API it actually is


Unfortunately, a complex interaction between these various standards *and*
changes made to the specifications without an associated PEP, have created a
Member

Suggested change
changes made to the specifications without an associated PEP, have created a
changes made to the specifications without an associated PEP have created a

Stray comma

situation where the ecosystem is in an inconsistent and broken state with
regards to normalization of names.

The brokenness is currently around ``.``, but the underlying issue actually
affects any unnormalized name that is being emitted.

The path to getting to where we are today was roughly:

1. :pep:`427` was accepted with two different requirements on what was a valid
filename. One requirement, specified in prose which if read strictly did not
actually make sense, and another requirement implemented in code that did.
Comment on lines +45 to +46
Member

Suggested change
filename. One requirement, specified in prose which if read strictly did not
actually make sense, and another requirement implemented in code that did.
filename: one specified in prose, which did not actually make sense
if read strictly; and another implemented in code, that did.

Fix grammar and punctuation errors and use clearer, easier to parse syntax and structure


Rather than normalization, :pep:`427` focused on what was *valid* and
provided a way to escape characters that were not valid in the filename but
were valid in the "source" metadata (Name and Version).

All tools at this time, implemented themselves using the second of those
Member

Suggested change
All tools at this time, implemented themselves using the second of those
All tools at this time implemented the second of those

Fix bad grammar and punctuation, and trim redundant wordiness that does not aid understanding

requirements, and escaped as expected.
2. :pep:`440` was accepted, which put strict requirements on what was a valid
version and defined a normalization procedure for valid but differently
specified versions.

This normalization used ``-``, which was an invalid value for :pep:`427` and
Member

Suggested change
This normalization used ``-``, which was an invalid value for :pep:`427` and
This normalization used ``-``, which was an invalid value for wheel filenames per :pep:`427` and

Clarify what this was actually invalid for (it took a bit of thought to confirm the connection)

required escaping to ``_``, so :pep:`440` was extended to allow that as an
optional spelling of ``_``, which would normalize to ``-``.
Comment on lines +59 to +60
Member

I'm confused here. You say that PEP 440 (versions) "was extended to allow that (?) as an optional spelling of _", but it is unclear what "that" is—PEP 440 already allowed -, and _ as mentioned in contrast to "that" later in the sentence, yet it seems _ should be the "optional spelling" instead, going by the above statement that - is the normal form in PEP 440. So it seemed you might have meant

Suggested change
required escaping to ``_``, so :pep:`440` was extended to allow that as an
optional spelling of ``_``, which would normalize to ``-``.
required escaping to ``_``, so :pep:`440` was extended to allow
optionally spelling it as ``_``, which would normalize to ``-``.

However, I can't find any mention of - being the normalized form of any part of the version in PEP 440 normalization (as opposed to both - and _ being normalized to . in various places). It seems I might still be missing something here?

3. :pep:`503` was accepted, which specified a normalization of the project name
Member

Suggested change
3. :pep:`503` was accepted, which specified a normalization of the project name
3. :pep:`503` was accepted, which specified
:ref:`the normalization of the project name <packaging:name-normalization>`

Link/cross reference the current canonical spec

*when* querying the Simple API for a project.
Member

Suggested change
*when* querying the Simple API for a project.
*when querying the Simple Repository API for a project*.
  • Italicize the correct text
  • Use the API's actual, much more descriptive name

Member

Or, alternatively

Suggested change
*when* querying the Simple API for a project.
*when querying the Simple Repository API* for a project.

4. The spec for ``.dist-info`` required normalization of the name (using the
Member

Suggested change
4. The spec for ``.dist-info`` required normalization of the name (using the
4. The :ref:`spec for .dist-info <packaging:recording-installed-packages>`
required normalization of the name (using the

Cross reference the actual spec in question

:pep:`503` rules) but did not specify a requirement on version. The :pep:`503`
normalization uses ``-``, but tools that locate ``.dist-info`` use the ``-``
character to split between name and version, so in practice nobody was
following this requirement.

Thus, the ``.dist-info`` spec was `updated <https://github.com/pypa/packaging.python.org/pull/781>`__,
without a PEP, to make the spec more closely align with common practice. As a
result, the spec states that the name must be normalized as
per :pep:`503` and versions must be normalized as per :pep:`440`, with
``-`` escaped to ``_``.

This change recognized that there are many existing ``.dist-info`` directories
that are not normalized, and thus instructs tools to expect ``.dist-info``
directories with unnormalized values, but that all tools must write normalized
values going forward.
5. It was `noted <https://discuss.python.org/t/5605>`__
Member

Suggested change
5. It was `noted <https://discuss.python.org/t/5605>`__
5. It was `noted <https://discuss.python.org/t/5605>`__

that :pep:`427` required that the segments of the filename contain only
Member

Suggested change
that :pep:`427` required that the segments of the filename contain only
that :pep:`427` required that the segments of wheel filenames contain only

Clarify what filenames we're talking about here

alphanumeric characters, ``_``, and ``.``, and that all other characters must
be escaped with ``_``. However, :pep:`440` allows the use of ``!`` and ``+``,
which meant that those characters got escaped to ``_``, which could then not
be parsed back into their original versions.
6. As a result of that discussion, the Wheel specs were `updated <https://discuss.python.org/t/5605>`__
Member

Lowercase wheel and extra space

Suggested change
6. As a result of that discussion, the Wheel specs were `updated <https://discuss.python.org/t/5605>`__
6. As a result of that discussion, the wheel specs were `updated <https://discuss.python.org/t/5605>`__

Member

Suggested change
6. As a result of that discussion, the Wheel specs were `updated <https://discuss.python.org/t/5605>`__
6. As a result of that discussion, the :ref:`wheel spec <packaging:binary-distribution-format>`
was `updated <https://github.com/pypa/packaging.python.org/pull/844>`__
  • Make the link actually point to the update as I presume you intended, instead of a repeat of the discussion linked above (I assume copy/paste mistake)
  • Actually link the wheel spec in question
  • Fix grammar
  • Incorporate the above case change

and then `updated again <https://github.com/pypa/packaging.python.org/pull/1032>`__,
without a PEP, to require that versions were normalized using :pep:`440` and
then ``-`` was escaped with ``_``.
Comment on lines +87 to +88
Member

I was rather confused by this, such that I had to look up the relevant PRs in question to figure out what this was intended to mean—in particular, I initially thought that the "and then" referred to the two separate changes, instead of two steps in the normalization process, and that it was talking about imposing additional requirements rather than effectively relaxing existing ones. This should hopefully clarify it a bit:

Suggested change
without a PEP, to require that versions were normalized using :pep:`440` and
then ``-`` was escaped with ``_``.
without a PEP, to instead require normalizing the version component of the filename
by :pep:`440` followed by escaping ``-`` to ``_``.


That change also required that runs of ``-_.`` should be replaced with ``_``
as well as lowercasing everything. It noted that it was equivalent to :pep:`503`
normalization followed by replacing ``-`` with ``_``.
Comment on lines +90 to +92
Member

Again, I was very confused here before I read the actual change: this doesn't make clear that it is now referring to the normalization of distribution names instead of versions (or other parts of the filename), and that it again relaxes some existing requirements rather than adding a new one.

Suggested change
That change also required that runs of ``-_.`` should be replaced with ``_``
as well as lowercasing everything. It noted that it was equivalent to :pep:`503`
normalization followed by replacing ``-`` with ``_``.
That update also changed the distribution name normalization in wheel filenames
to that of lowercasing everything and replacing runs of ``-_.`` with ``_``,
which is equivalent to :pep:`503` normalization followed by escaping ``-`` to ``_``.

Also, fix some textual issues.


It was `noted 9 months later <https://discuss.python.org/t/5605/21>`__, that
Member

Suggested change
It was `noted 9 months later <https://discuss.python.org/t/5605/21>`__, that
It was `noted 9 months later <https://discuss.python.org/t/5605/21>`__ that

Eliminate stray comma

there wasn't much discussion on the change to name normalization, but that it
landed anyways.
7. :pep:`621` was accepted, which provided a way to specify project metadata in
Member

Suggested change
7. :pep:`621` was accepted, which provided a way to specify project metadata in
7. :pep:`621` was accepted, which provided a way to specify project metadata in

``pyproject.toml``. This PEP was careful to make the distinction between
static metadata, where tools could trust the values in ``pyproject.toml`` and
Member

Suggested change
static metadata, where tools could trust the values in ``pyproject.toml`` and
static metadata where tools could trust the values in ``pyproject.toml`` and

Commas should be consistent here between the two parallel/contrasting clauses

dynamic metadata where they could not.

However, this PEP doesn't clarify whether static values must be identical
values or equivalent values. To make matters worse, it includes the statement
Comment on lines +102 to +103
Member

FWIW, while the PEP does not make this as explicit as it could be, the working definition that I and others seem to have settled on, based on the clear motivation and goals for requiring the explicit dynamic field, is "neither".

It isn't really possible to interpret it as "identical", as there is not a 1:1 mapping between project-table keys and core metadata fields; the PEP itself specifies a number of deterministic transformations between the former and the latter (e.g. with author/email and maintainer/email, with license.file, with format-specific escaping, etc.). And there is no relevant general notion of "equivalence", at least for the variety of keys and fields involved.

Rather, it appears intended to guarantee that any standards-conforming tool will deterministically produce the same METADATA output from a given pyproject.toml input, by only following the relevant accepted standards, and thus any tool following those same standards can statically and unambiguously determine the output, without having to involve the project's chosen build backend or execute any external dynamic code.

Member

I disagree. I think it simply wasn't considered. At the time, people were definitely talking about reading metadata from pyproject.toml if it was specified as static. That's only reasonable if there's a presumption of the data being copied exactly. In reality, the notion of reading from pyproject.toml was considered "possible, but not recommended" - which led directly to me writing PEP 643 (Metadata for Source Distributions). But I don't recall anyone at the time discussing the issue in enough detail to pick up on the question of transformations.

Member

Sorry for the confusion, I didn't mean to speak to what people thought at the time the PEP was written here (as I wasn't around for that, unlike yourself—thanks for the insight!). And I definitely agree that Name MUST NOT be normalized in the METADATA.

To clarify the above, what I meant was that ultimately, what seems to have been settled on is that the output METADATA, minus pyproject metadata keys marked dynamic, can be statically and deterministically extracted from a given pyproject.toml following only what is specified in the current "Declaring project metadata" spec. However, it's hard to see how people could have previously interpreted it as an identical 1:1 copy, as the [project] table keys do not map 1:1 to METADATA fields, and the original PEP itself already required certain non-trivial transformations, e.g. for author/author-email and maintainer/maintainer-email, among others. And of course, Brett's own PEP 685 followup requires normalization for extras names in [project].

By contrast, as you mention, AFAIK PEP 643 does entail that non-dynamic sdist PKG-INFO fields can be copied 1:1, since there is an exact 1:1 mapping there and any transformations from the user-entered data that will end up in the METADATA should have already been applied in the PKG-INFO.

that tools should normalize the name, using :pep:`503` rules, as soon as it
is read for internal consistency.
8. :pep:`625` was accepted, standardizing on a format for filenames for sdists.
Member

Suggested change
8. :pep:`625` was accepted, standardizing on a format for filenames for sdists.
8. :pep:`625` was accepted, standardizing on a format for filenames for sdists.

Member

Suggested change
8. :pep:`625` was accepted, standardizing on a format for filenames for sdists.
8. :pep:`625` was accepted, standardizing the format for filenames in sdists.

Fix, clarify and improve the grammar and phrasing here

This PEP requires that the project names are normalized "as described in the
Member

Suggested change
This PEP requires that the project names are normalized "as described in the
This PEP required that the project names are normalized "as described in the

Past tense is used for the other PEPs here when discussing what the PEPs require projects to do, so it is IMO worth being consistent here (particularly given the former are static, historical change proposals and the latter are the actual living, canonical specs).

wheel spec", which at the time meant full :pep:`503` normalization, and
versions normalized as per :pep:`440`.
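
The normalization and escaping rules referred to in steps 1, 3, 5, and 6 above can be sketched roughly as follows. This is an illustrative sketch only: the function names are hypothetical, the regular expressions follow the ones published in :pep:`503` and :pep:`427`, and real tools may differ in detail.

```python
import re


def normalize_pep503(name: str) -> str:
    """PEP 503: collapse runs of '-', '_' and '.' into '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def wheel_name_component(name: str) -> str:
    """Updated wheel spec (step 6): PEP 503 normalization, then '-' -> '_'."""
    return normalize_pep503(name).replace("-", "_")


def escape_pep427(component: str) -> str:
    """Original PEP 427 escaping (step 1): every run of characters other
    than word characters and '.' becomes a single '_'."""
    return re.sub(r"[^\w\d.]+", "_", component, flags=re.UNICODE)


if __name__ == "__main__":
    print(normalize_pep503("Foo.Bar_baz"))      # foo-bar-baz
    print(wheel_name_component("Foo.Bar_baz"))  # foo_bar_baz
    # Step 5: '!' and '+' both collapse to '_', so the original
    # version string cannot be recovered from the filename.
    print(escape_pep427("1!2.0+local"))         # 1_2.0_local
```

Note that ``escape_pep427`` maps both ``1!2.0`` and ``1+2.0`` to ``1_2.0``, which is the lossy behaviour described in step 5.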


Independently to all of the above, and prior to (4), PyPI had implemented a
check that ensured that the filename being uploaded matched the current project
name. This check did not correctly take into account normalization, but did take
into account filename escaping. It also implements renames by allowing projects
to rename themselves by changing their project name in their metadata.
Member

Suggested change
to rename themselves by changing their project name in their metadata.
to rename themselves by changing the project name in their metadata.

Avoid hard to read double "their"


The effect of all of the above, is that we're now in a situation where:
Member

Suggested change
The effect of all of the above, is that we're now in a situation where:
The effect of all of the above is that we're now in a situation where:

Eliminate spurious comma


* Some tools will normalize the filename before writing it, either to the
filesystem or to PyPI.
* Some tools will normalize the project name before emitting it to either
``METADATA`` or to PyPI.
* Some tools (PyPI) require that the filename and the project name match, without
taking normalization into account.
* Some tools (Artifactory) require that the filenames are not normalized.
* The above sets of tools do not perfectly overlap in any direction.

We've essentially created a mess where nobody is emitting filenames in quite the
same way and the normalization rules, first defined in :pep:`503` are being used
Member

Suggested change
same way and the normalization rules, first defined in :pep:`503` are being used
same way and the normalization rules first defined in :pep:`503` are being used

Eliminate another spurious comma

in contexts where it is not appropriate to do so.


Rationale
=========

This PEP follows two guiding principals:
Member

Suggested change
This PEP follows two guiding principals:
This PEP follows two guiding principles:


1. Names are provided by people and should be used as is where possible. The
name of the project, and how it appears, is a fundamental property of the
project.
2. When interpreting names, tooling should normalize values as much as
possible to reduce confusion.

This follows the original intent behind the normalization in :pep:`503`, which
was designed to be a normalization applied when two computers spoke to each
other, not as something that would "leak" out into the human-facing areas.


Specification
=============

The project name that is specified by an author ends up flowing through several
parts of the ecosystem, and each part needs to be considered on its own to determine what
kind of name (normalized or not) makes sense in that part.

In general, we follow the guiding principals, use the unnormalized name as
provided by the author wherever possible, and normalize strictly where not.
Comment on lines +157 to +158
Member

Suggested change
In general, we follow the guiding principals, use the unnormalized name as
provided by the author wherever possible, and normalize strictly where not.
In general, we follow the guiding principles of using the unnormalized name as
provided by the author wherever possible, and normalizing strictly where not.
  • Fix grammar errors
  • Wrong "principal" 😂


In some cases, we are simply repeating the status quo, this is done to provide
Member

Suggested change
In some cases, we are simply repeating the status quo, this is done to provide
In some cases, we are simply repeating the status quo; this is done to provide

Fix grammar error (comma splice)

clarification and to be explicit which uses were considered as part of this
PEP.


Core Metadata
-------------

The ``Name`` field **MUST NOT** be normalized when emitting into ``METADATA``
or ``PKG-INFO``.

The ``Name`` field **MUST NOT** be normalized when uploading to a repository.

The ``Name`` field **SHOULD NOT** be normalized when being presented for display
to a user.

The ``Name`` field **MUST** be normalized during comparison.

Tools that read the ``Name`` field from a core metadata file **MUST** be prepared
to accept unnormalized names.
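
As a rough illustration of the split between emitting and comparing described in this section, a tool might keep the author's spelling for output and normalize only when testing equality. The helper names below are hypothetical, and the normalization is assumed to be the :pep:`503` one.

```python
import re


def canonical(name: str) -> str:
    """PEP 503 normalization, used here only for comparison."""
    return re.sub(r"[-_.]+", "-", name).lower()


def same_project(left: str, right: str) -> bool:
    """Compare two project names after normalizing both sides."""
    return canonical(left) == canonical(right)


def metadata_name_line(name: str) -> str:
    """Emit the Name field exactly as the author wrote it (no normalization)."""
    return f"Name: {name}"


if __name__ == "__main__":
    print(same_project("Zope.Interface", "zope-interface"))  # True
    print(metadata_name_line("Zope.Interface"))              # Name: Zope.Interface
```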


pyproject.toml
--------------

The ``project.name`` key **MUST** be preserved exactly as the author chose to
represent it, and **MUST** be emitted in this way into ``METADATA`` or
``PKG-INFO``.

The ``project.name`` field **MUST** be normalized during comparison.


.dist-info directories
----------------------

The directory name follows the pattern of ``{name}-{version}.dist-info``.

The ``name`` field **MUST** be normalized, with any resulting ``-`` escaped to ``_``.

Tools that read an arbitrary ``.dist-info`` directory **MUST** be prepared to
accept unnormalized values, however tools that work only on *new* ``.dist-info``
Member

Suggested change
accept unnormalized values, however tools that work only on *new* ``.dist-info``
accept unnormalized values; however, tools that work only on *new* ``.dist-info``

Fix grammar error (comma splice)

directories **SHOULD** validate that all values are normalized.
Member

Not grammar, but rather a point of semantics (which may need to be raised on Discourse, but if I don't mention it now, I'll forget 🙂). I'd prefer this to be MAY. We should never insist (even mildly) on validation from consumers. Suggesting it is fine.
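
A minimal sketch of the ``.dist-info`` naming rule as drafted in this section, assuming the version string passed in is already normalized per :pep:`440`. The function names are hypothetical. The splitter only handles *new* directories whose name part contains no ``-``; it does not attempt the legacy, unnormalized case.

```python
import re
from typing import Tuple

SUFFIX = ".dist-info"


def normalize_pep503(name: str) -> str:
    """PEP 503: collapse runs of '-', '_' and '.' into '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def dist_info_dirname(name: str, version: str) -> str:
    """Build '{name}-{version}.dist-info' with '-' escaped to '_' in each part."""
    escaped_name = normalize_pep503(name).replace("-", "_")
    escaped_version = version.replace("-", "_")
    return f"{escaped_name}-{escaped_version}{SUFFIX}"


def split_dist_info(dirname: str) -> Tuple[str, str]:
    """Split a normalized directory name on its single separating '-'."""
    if not dirname.endswith(SUFFIX):
        raise ValueError(f"not a .dist-info directory: {dirname!r}")
    stem = dirname[: -len(SUFFIX)]
    name, sep, version = stem.partition("-")
    if not sep:
        raise ValueError(f"missing name/version separator: {dirname!r}")
    return name, version


if __name__ == "__main__":
    directory = dist_info_dirname("My.Package-Name", "1.0")
    print(directory)                   # my_package_name-1.0.dist-info
    print(split_dist_info(directory))  # ('my_package_name', '1.0')
```

Because the escaped name can no longer contain ``-``, the first ``-`` in the directory name unambiguously separates name from version, which is the property that step 4 of the Motivation says tools rely on.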



Source and Binary Distributions
-------------------------------

Both the sdist and bdist specifications incorporate the project name in their
Member

Suggested change
Both the sdist and bdist specifications incorporate the project name in their
Both the :ref:`sdist <packaging:source-distribution-format>` and
:ref:`wheel <packaging:binary-distribution-format>` specifications
incorporate the project name in their

Link/cross-reference the appropriate specifications for readers, and use their common names

filenames (``{name}-{version}.tar.gz`` and
``{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl``
Member

Since it may not be obvious this is the "project name" mentioned above nor the name field mentioned several times below, seems like it would be worth being clear and consistent and using name in the placeholder—it doesn't have to be an exact quote of the placeholder names from the respective specs.

Suggested change
``{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl``
``{name}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl``

respectively).

The ``name`` field **MUST** be non-normalized, with the exception that any ``-``
**MUST** be escaped to be ``_``.
Member

I'm going to object to this on Discourse... At a minimum, there should be an explanation here of why you propose this.

Member Author

Note: I have an update to this PEP that changes the proposal, this was my first cut, and I didn't get a chance to publish it before the direction of the conversation changed. So hold off till I get an update written for this :)


Tools that accept an arbitrary distribution **MUST** be prepared to accept both
non-normalized and normalized filenames. However, tools that only work on *new*
distributions **SHOULD** validate that the distribution filenames are not
Member

Again, MAY rather than SHOULD IMO.

normalizing ``name``.
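
To make the rule in this revision of the draft concrete (the author notes above that the proposal is expected to change), here is a hedged sketch for sdist filenames. The helper names are hypothetical, and the reader assumes the version part contains no ``-``, which holds for :pep:`440`-normalized versions but not for every legacy filename.

```python
import re

SDIST_SUFFIX = ".tar.gz"


def canonical(name: str) -> str:
    """PEP 503 normalization, used here only for comparison."""
    return re.sub(r"[-_.]+", "-", name).lower()


def sdist_filename(name: str, version: str) -> str:
    """Keep the author's spelling of the name, escaping only '-' to '_'."""
    return f"{name.replace('-', '_')}-{version}{SDIST_SUFFIX}"


def filename_matches_project(filename: str, project: str) -> bool:
    """Accept both normalized and non-normalized filenames for a project."""
    if not filename.endswith(SDIST_SUFFIX):
        return False
    stem = filename[: -len(SDIST_SUFFIX)]
    # Assumption: the version has no '-', so the last '-' is the separator.
    file_name, sep, _version = stem.rpartition("-")
    return bool(sep) and canonical(file_name) == canonical(project)


if __name__ == "__main__":
    print(sdist_filename("Zope.Interface", "6.0"))  # Zope.Interface-6.0.tar.gz
    print(sdist_filename("my-package", "1.0"))      # my_package-1.0.tar.gz
    print(filename_matches_project("zope_interface-6.0.tar.gz", "Zope.Interface"))  # True
```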


Simple Repository API
---------------------

The project name, when returned in the "index" URL (e.g. ``/simple/``)
Member

@pfmoore pfmoore Jun 16, 2023

You should clarify where exactly, as I had to check PEP 503 to understand it. Also "in the index URL" is confusing.

Suggested change
The project name, when returned in the "index" URL (e.g. ``/simple/``)
The project name, when used in the *text* of the anchor tag in the page served at the root URL (e.g. ``/simple/``)

**MUST** be non-normalized.

The project name when used in the URL (e.g. ``/simple/$project/``) **MUST** be
normalized.

The project name, when used on the Project detail page
Member

@pfmoore pfmoore Jun 16, 2023

In the HTML version of the index, the project name is never used (except in any additional content not covered by the spec). So this requirement can be omitted - unless you want to cover the JSON form as defined in PEP 691, in which case

Suggested change
The project name, when used on the Project detail page
The project name, when referenced in the ``"name"`` field of the JSON format project detail page

Member

(Fixed mistaken single backticks in the above comment)

(e.g. ``/simple/$project/``), **MUST** be non-normalized.

Tools that read values for filenames and names from the Simple Repository API
**MUST** be prepared to handle both normalized and non-normalized names.
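
Reading the first two requirements the way the review comment above suggests (non-normalized anchor *text* on the root page, normalized name in the project URL), a repository could render each root-page entry roughly like this. The function name and the ``/simple/`` prefix are illustrative assumptions, not part of the draft.

```python
import html
import re


def canonical(name: str) -> str:
    """PEP 503 normalization, used for the URL path segment."""
    return re.sub(r"[-_.]+", "-", name).lower()


def root_index_anchor(name: str) -> str:
    """Normalized name in the href, author's spelling in the anchor text."""
    href = f"/simple/{canonical(name)}/"
    return f'<a href="{href}">{html.escape(name)}</a>'


if __name__ == "__main__":
    print(root_index_anchor("Zope.Interface"))
    # <a href="/simple/zope-interface/">Zope.Interface</a>
```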


Backwards Compatibility
=======================

This PEP breaks compatibility in a few ways:

* Tools that are currently emitting filenames where ``name`` has been normalized
  in accordance with the current spec are immediately no longer compliant and
  must be updated to emit non-normalized names.

  * This is mitigated by the fact that all tools are required to continue to
    accept both normalized and non-normalized filenames unless they *know* that
    they only work on *new* distributions (PyPI uploads, ``pyproject-build``, etc).

* Tools that emit normalized names into ``METADATA``, ``PKG-INFO``, or when
  uploading to a repository are immediately no longer compliant and must be
  updated to emit non-normalized names.

  * It's unclear in the current spec whether names were intended to be normalized
    in this case or not, but the practice of normalization here has caused a
    number of people to be confused why their names are different from what
    they've entered.

* Tools that are currently emitting the names in the simple API (outside of the URL
itself) as normalized, which is either allowed or required by the spec
Comment on lines +259 to +260
Member

Suggested change
* Tools that are currently emitting the names in the simple API (outside of the URL
itself) as normalized, which is either allowed or required by the spec
* Tools that are currently emitting normalized project names in the Simple Repository API
(outside of the URL itself), which is either allowed or required by the spec
  • Clarify hard to parse and potentially-confusing awkward phrasing
  • Use the same full, title-cased form of the Simple Repo API name as used elsewhere, for clarity and consistency

currently are immediately not longer complaint and must be updated to emit
Member

Suggested change
currently are immediately not longer complaint and must be updated to emit
currently are immediately no longer complaint and must be updated to emit

Fix typo

non-normalized names.
Member

The PEP should probably list which tools this is known to affect. After reading the PEP to here, I'm sufficiently overwhelmed as to be unsure whether this is referring to PyPI or Artifactory, or some other index software.

Member

My impression here was that this specific bit referred to neither, and rather referred to installers using the normalized rather than the unnormalized name in user-facing UI text—but clearly, given there is evident confusion here it would be good to clarify appropriately.


  * Like for filenames, this is mitigated by the fact that all tools are required
    to continue to accept both normalized and non-normalized values.


Tools that validate *new* values should ideally start warning on now-invalid
options for some period of time, before starting to hard fail when encountering
them.
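
One way a validating tool might stage that transition is sketched below. It assumes the draft's rule as stated in this section (newly emitted names keep the spelling the project declared) and that the tool has the declared name available for comparison. The function ``check_emitted_name`` and its ``strict`` flag are hypothetical, not an existing API.

```python
import re
import warnings


def normalize(name: str) -> str:
    """PEP 503 normalization: collapse runs of -, _ and . into one -, then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def check_emitted_name(declared: str, emitted: str, *, strict: bool = False) -> bool:
    """Return True when `emitted` preserves the spelling of `declared` exactly.

    A value that matches only after normalization triggers a warning during the
    transition period, or an error once `strict` is switched on.
    """
    if emitted == declared:
        return True
    if normalize(emitted) != normalize(declared):
        # Not the same project at all: always an error.
        raise ValueError(f"{emitted!r} does not name the project {declared!r}")
    message = f"{emitted!r} is a normalized form of {declared!r}; emit the name as declared"
    if strict:
        raise ValueError(message)
    warnings.warn(message, FutureWarning, stacklevel=2)
    return False
```

Flipping ``strict`` to ``True`` after the warning period turns the same check into a hard failure without changing any call sites.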


Rejected Ideas
==============

Require Normalization Everywhere
--------------------------------

One other possible idea is to simply require normalization everywhere, however
Member

Suggested change
One other possible idea is to simply require normalization everywhere, however
One other possible idea is to simply require normalization everywhere; however,

Fix grammar error (comma splice)

this PEP rejects that.

The primary reason we reject it is that the name of a project is not an internal
identifier, but is central to that project's identity. Projects often have
strong opinions on the way that their project's name should look, and
normalization removes that from them.

There are situations where we need a normalized value, so this PEP does use
them, but attempts to use them sparingly, only when they're actually required.
It treats normalization as something that is done when software is talking to
software about a project, and not when humans are talking about it.


Require Normalization in Filenames
----------------------------------

Filenames sit in a weird place, in most cases they are produced by software
Member

Suggested change
Filenames sit in a weird place, in most cases they are produced by software
Filenames sit in a weird place; in most cases they are produced by software

Fix another comma splice grammar error

and are consumed by software, so in theory it should be fine to normalize them
which has some nice properties.

However, this PEP rejects doing that.

Although they are often a software-to-software identifier, they are also used by
humans when sharing and manually downloading the software. They appear in places
like the PyPI UI, GitHub Releases, downstream Linux repositories, etc. In some
Member

Downstream Linux repositories? Huh? Linux distros as a rule don't expose to users files straight off PyPI, they repack them into their own archive format following their own (nearly always normalized) naming convention. Furthermore, AFAIK most if not nearly all have naming and normalization policies similar or substantially stricter than PEP 503's for the canonical user-facing name of their packages, not just the file names. Therefore, I'm rather confused how this is relevant here.

Member

Also, the wheel filename was explicitly designed to be for machine consumption, so saying it's not a software-to-software identifier is misrepresenting the intention of the design. (Thanks to compatibility tags, no-one would say wheel filenames are for human consumption 😉)

IMO, this whole argument is the weakest point of the whole PEP, and the overall proposal would be changed very little if it was changed to require normalisation of the project name in wheel and sdist filenames, but was left otherwise unchanged. If that isn't actually the case, then this is the section that should explain what the actual problem with normalising is. That would be a far better rejection reason than the current (frankly, largely subjective) argument.

Member

Yup, entirely agreed—I'd briefly mentioned it in my top-level review, but particularly as a PEP editor decided to save my own detailed commentary on this for the PEP discussion thread. However, as it ended up getting discussed at some length on Barry's thread that kicked off this PEP, it sounds like Donald mentioned there that he'd be revising this accordingly (hopefully to require normalization, which seemed to be the general consensus, rather than just allow it).

cases the only incarnation of the project's name someone might see is the name
embedded into the filename.

Further, historically filenames were not normalized, and a change to the spec
that did not go through the PEP process is what required it. However, prior to
that change, people have created systems that rely on encoding information into
the project name, such as namespaces using the ``.`` character, which a
requirement to normalize would break.
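
To make the namespace concern concrete: under the PEP 503 rule, ``.``, ``-`` and ``_`` are interchangeable, so a dot used as a namespace separator cannot be recovered once a name has been normalized. The project names below are made up for illustration.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 normalization: collapse runs of -, _ and . into one -, then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Three spellings that a dot-as-namespace convention would treat differently.
names = ["acme.utils", "acme-utils", "Acme_Utils"]
normalized = {name: normalize(name) for name in names}

# All three collapse to the same value, so the "." separator is lost.
assert set(normalized.values()) == {"acme-utils"}
```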


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.