Document peps.json and move it to the root #2584

Open
AA-Turner opened this issue May 7, 2022 · 9 comments
Labels
enhancement infra Core infrastructure for building and rendering PEPs

Comments

@AA-Turner
Member

AA-Turner commented May 7, 2022

We currently have https://peps.python.org/api/peps.json, an undocumented file.
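
For readers who have not used the file, here is a minimal sketch of consuming it. Since the file is undocumented, the shape is an assumption: the sketch supposes the index maps PEP numbers (as strings) to objects with "title" and "status" keys. The helper names `fetch_peps_index` and `titles_by_status` and the sample data are hypothetical.

```python
import json
from urllib.request import urlopen

PEPS_JSON_URL = "https://peps.python.org/api/peps.json"  # URL quoted in this issue


def fetch_peps_index(url: str = PEPS_JSON_URL, timeout: float = 10.0) -> dict:
    """Download the PEP index and decode it as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def titles_by_status(index: dict, status: str) -> list[str]:
    """Return "PEP N: title" strings for entries with the given status.

    Assumes (nothing in this thread guarantees it) that the index maps
    PEP numbers to objects with "title" and "status" keys.
    """
    return [
        f"PEP {number}: {entry['title']}"
        for number, entry in sorted(index.items(), key=lambda item: int(item[0]))
        if entry.get("status") == status
    ]


# Hypothetical sample in the assumed shape, so the sketch runs offline.
SAMPLE_INDEX = json.loads("""
{
  "8": {"title": "Style Guide for Python Code", "status": "Active"},
  "20": {"title": "The Zen of Python", "status": "Active"},
  "3000": {"title": "Python 3000", "status": "Final"}
}
""")

print(titles_by_status(SAMPLE_INDEX, "Active"))
```

Against the live file one would call `titles_by_status(fetch_peps_index(), "Active")`; whether the keys used here stay stable is exactly what documenting the file would settle.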

It seems that people have use cases for the file, even without official support (see #2567, #2583). I suggest moving the file to the root [1] for consistency with peps.rss and easier discoverability, and at the same time documenting the file and the minimum guarantees we provide about it.

In the issue that created it, Cam noted:

Maybe put this under an /api/peps endpoint (at least once we're ready to more publicly expose it)? Then if there was need/desire in the future, we could have an authors endpoint, sub-endpoints api/peps/N to get a single PEP's metadata, etc.

I think the api/ is unneeded here -- for those use cases, we could instead generate peps.python.org/pep-NNNN.json and peps.python.org/authors.json.

I'd be interested in views.

A

Footnotes

  1. i.e. to https://peps.python.org/peps.json

@AA-Turner AA-Turner self-assigned this May 7, 2022
@AA-Turner
Member Author

cc: @hugovk @CAM-Gerlach

@Rosuav
Contributor

Rosuav commented May 7, 2022

👍 No need for the /api in the URL, the fact that it's a .json file implies that it's machine readable.

@hugovk
Member

hugovk commented May 7, 2022

I don't have a strong opinion, but slightly prefer including api in the base URL.

https://pypi.org/project/pepotron/ is already using the API, so a move would cause 404s. I can fix and release it immediately, and to be honest I doubt many people are using it, but it's still a break.

api.example.com and example.com/api are very common patterns for REST APIs. Looking up some best practices, I don't see anything saying whether api should be there, but it's often used in examples.

A benefit is we can also have the docs at https://peps.python.org/api/ (for example like https://pypistats.org/api/).

@CAM-Gerlach CAM-Gerlach added the infra Core infrastructure for building and rendering PEPs label May 7, 2022
@AA-Turner
Member Author

AA-Turner commented May 7, 2022

for REST APIs

I would argue that this is the wrong conceptual framing to use -- we are not in the business of providing a full programmatic API for PEPs, but simply a representation of the index as a JSON file for easier parsing and use.

In this spirit, the suggested authors.json and pep-NNNN.json files are technically unneeded, as one can parse all that data from the existing peps.json -- we would be providing them as a usability boon, rather than building out an API.
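
To illustrate the point that both files are derivable from the index, here is a rough sketch. It assumes each index entry carries an "authors" string of comma-separated names, which this thread does not confirm; the function names, the zero-padded file naming and the sample entries are all hypothetical.

```python
from collections import defaultdict


def split_per_pep(index: dict) -> dict:
    """Map hypothetical file names such as "pep-0008.json" to single-PEP records."""
    return {f"pep-{int(number):04d}.json": entry for number, entry in index.items()}


def group_by_author(index: dict) -> dict:
    """Build the content of a hypothetical authors.json: author -> sorted PEP numbers.

    Assumes each entry has an "authors" string of comma-separated names.
    """
    authors = defaultdict(list)
    for number, entry in index.items():
        for name in entry.get("authors", "").split(","):
            name = name.strip()
            if name:
                authors[name].append(int(number))
    return {name: sorted(numbers) for name, numbers in sorted(authors.items())}


# Placeholder entries, not real PEP metadata.
SAMPLE_INDEX = {
    "8": {"title": "Example A", "authors": "Ann Author, Bob Writer"},
    "20": {"title": "Example B", "authors": "Ann Author"},
}

print(sorted(split_per_pep(SAMPLE_INDEX)))
print(group_by_author(SAMPLE_INDEX))
```

Both derived views are plain dictionaries, so a static build could dump them with `json.dump` alongside the index without any server-side logic.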

I would very much encourage someone to run a "PEP API" service if wanted, but I really don't want to be in the business of anything beyond serving static files in this repo, and I think api has connotations more in line with a bigger, or more fully featured entity than what we're actually providing.

I'm aware some things would break if we moved the URL, but it was explicitly introduced as experimental and undocumented, so anybody making use of the file is doing so at their own risk -- I don't think breakage should be a big argument here. (It will be a different story when we document it, which is why I'd like to do so sooner rather than later).

A

@CAM-Gerlach
Member

My opinion isn't that strong either, but I agree with @hugovk in also preferring api in the URL. In addition to the reasons Hugo mentioned, it draws a clear boundary between the regular user-facing content, which may include files in various formats including JSON, and the machine-readable API that is expected to be relatively stable (once we document it). This allows other users, like PEP-o-tron and @pfmoore's tools, to know what they can rely on, while leaving us freer to change things elsewhere without worrying about breakage.

In any case, before or as part of formally documenting the API, we should provide the data in a more structured, easily consumable form that is abstracted from that in the source, instead of requiring all tools (including our own) to independently parse the authors, post history, dates, etc., each of which is in one of several different non-standard formats. This would allow us to make further changes to the user input format (simplifying it, being more flexible in what we allow, accepting URLs instead of emails for authors, etc.) without having to worry about other tools being able to read it easily. It would also help address @AA-Turner's concerns originally raised on #2358 regarding a lack of structure in the data, the difficulty tools have reading it, and the format/parsing being tied to reST/Sphinx.
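
As a rough sketch of what "structured" could mean for one header, the code below turns an author string into records. The `Author` class and `parse_authors` are hypothetical names, and only the "Name <email>" and bare "Name" forms are handled; the other historical formats mentioned above are deliberately out of scope.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Author:
    name: str
    email: Optional[str] = None


# Matches "Name <email>" with the address part optional. Names that
# themselves contain commas are not supported by this sketch.
_AUTHOR_RE = re.compile(r"^(?P<name>[^<>]+?)\s*(?:<(?P<email>[^<>]+)>)?$")


def parse_authors(header: str) -> list[Author]:
    """Split a comma-separated author header into Author records."""
    authors = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing comma
        match = _AUTHOR_RE.match(part)
        if match is None:
            raise ValueError(f"unrecognised author entry: {part!r}")
        authors.append(Author(match["name"].strip(), match["email"]))
    return authors


print(parse_authors("Ann Author <ann@example.org>, Bob Writer"))
```

Publishing the parsed records rather than the raw header string would let consumers skip this step entirely, which is the decoupling argued for above.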

In fact, as I've been thinking about lately and discussed with @JelleZijlstra and @warsaw at PyCon, right now we parse the headers in three different places with three different sets of logic. Instead, we should just use the structured format above (with the parsing presumably in the PEP class) for all of them, which would be a lot simpler and more DRY, reliable, maintainable and extensible overall. But as that's getting a bit ahead of ourselves, I've opened #2587 for that.

@CAM-Gerlach
Member

we are not in the business of providing a full programmatic API for PEPs

Using the royal "we", are we? 😂

I really don't want to be in the business of anything beyond serving static files in this repo

I don't think anyone here is suggesting otherwise. An API is just a machine-readable interface to access some data or functionality; it does not require server-side interactivity, and that is exactly what is being proposed here. For example, the FSF API (full disclosure, I'm one of the maintainers) operates in essentially the same way as ours does.

What calling something an "API" fundamentally conveys is not a particular mechanism (REST, SOAP, etc.) but rather that the data is machine-readable and stable enough to be used programmatically, which it seems there's already interest in even though we haven't documented or publicized this at all. The value is in enabling the wider ecosystem to easily and reliably consume and enrich PEP metadata for a variety of uses with minimal friction, so long as they use what's under api/, while conversely giving us more freedom with the internals and the user-facing GUI.

I'm aware some things would break if we moved the URL, but it was explicitly introduced as experimental and undocumented, so anybody making use of the file is doing so at their own risk -- I don't think breakage should be a big argument here.

Agreed there, but once we do document it, keeping anything we expect not to break under api (or some other subdir, if we want to bikeshed the name) makes it clearer where that guarantee holds and where it doesn't.

@pfmoore
Member

pfmoore commented May 8, 2022

I would argue that this is the wrong conceptual framing to use -- we are not in the business of providing a full programmatic API for PEPs, but simply a representation of the index as a JSON file for easier parsing and use.

As someone who's just found out about the JSON file and is now using it rather than scraping the HTML page, I can confirm that all I need is "a representation of the index as a JSON file for easier parsing and use". I really don't care whether you call it that, or an "API". I don't have any preconceptions about what an "API" might be beyond "a representation as a JSON file" - so call it what you like. But please don't remove it just because of terminology.

@onerandomusername

I'm in the same boat as pfmoore here. I'm using undocumented files and have to keep updating my code to keep up with the changes, and it would be great if there were a supported JSON representation of each PEP.

Right now I'm using the Sphinx-generated objects.inv file at the website root to get a list of all PEPs in the repo. Then, when a user requests a PEP number, I fetch the HTML file of the PEP and parse the headers and information out of it.

In the end, I use the majority of the headers and the generated HTML content of the PEP.

The end result for me ends up being something like this:
[image]

So while I can continue doing what I am currently doing, patching it when the inevitable updates to the website HTML occur, it would be beneficial to have an API (even one made of static JSON files!) that provides access to all of the content of each PEP.

@hugovk
Member

hugovk commented Oct 12, 2023

Revisiting this.

  • Let's document the API at /api/index.html.

  • If we move the file to the root, we must also transparently redirect from the old to the new, perhaps indefinitely.
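
One way to keep the old URL working on a purely static host, where a real HTTP redirect may not be available, is to publish the same payload at both paths during the build. The sketch below is an assumption-laden alternative to a redirect, not what the build actually does; `write_index` and its default locations are hypothetical.

```python
import json
from pathlib import Path


def write_index(
    build_dir: Path,
    index: dict,
    locations: tuple[str, ...] = ("peps.json", "api/peps.json"),
) -> list[Path]:
    """Write the same JSON payload to every location under build_dir."""
    payload = json.dumps(index, indent=1, sort_keys=True)
    written = []
    for relative in locations:
        target = build_dir / relative
        # Create intermediate directories such as "api/" on demand.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(payload, encoding="utf-8")
        written.append(target)
    return written
```

Existing consumers of the old path would keep working unchanged, at the cost of serving one duplicate file.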


7 participants
@Rosuav @pfmoore @hugovk @AA-Turner @CAM-Gerlach @onerandomusername and others