-
Notifications
You must be signed in to change notification settings - Fork 980
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft Release support in PyPI #8941
base: main
Are you sure you want to change the base?
Conversation
Ah, the tests are failing because of the db migration, I'm not super familiar with alembic so I'd appreciate a pointer as to how to fix it :) |
Congratulations for a non-trivial challenge of a PR :)
Can we send it at upload time ? You upload a draft release, receive the draft hash in some way, and then you can automate things based on that ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review round one: done!
I didn't read all the tests yet, because I imagine they will be changing quite a bit before they're ready. Also, I saw surprisingly little tests to make a 100% coverage, I'm curious if you managed to hit the 100% or if it's still missing some tests to write (CI will tell).
Congratulations on the PR. There are quite a few comments but it's really nice already 👏 . The 2 big things I'm looking forward to discuss are the DB columns and the --draft--
thing (cf my comments).
It's my first time doing such a big review on Warehouse, so I hope it was good (I'm completely open to feedback on the review if you think there are things I should improve :) )
warehouse/migrations/versions/f65c7127ca5d_draft_releases.py.bak
Outdated
Show resolved
Hide resolved
@@ -173,16 +174,21 @@ class Project(SitemapMixin, db.Model): | |||
def __getitem__(self, version): | |||
session = orm.object_session(self) | |||
canonical_version = packaging.utils.canonicalize_version(version) | |||
try: | |||
canonical_version, something, draft_hash = canonical_version.split("--") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with the dummy variable be named something
, but this might trigger flake8 saying it's unused. I believe flake8 will not complain if it's named _
, which also is a classical way to indicate a dummy variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we supply 1.2.3--draft--somehash--yay
, this code will ignore the hash, and treat the whole string as a version number, while if we limit split to 2 splits:
canonical_version, something, draft_hash = canonical_version.split("--") | |
canonical_version, something, draft_hash = canonical_version.split("--", 2) |
then it will give us canonical_version == 1.2.3
, draft_hash == somehash--yay
which I believe is slightly more correct.
Note that in both case, we're likely to not find the designed release, but I'd imagine the second case to be less surprising.
And what about 1.2.3--foobar--expected_hash
? Is it expected that it would work ?
I'm thinking splitting by --draft--
would be more explicit and resolve a number of those questions at once.
... Thinking about it again, if I understand correctly, the only reason why we need to have the whole --draft--
thing is for the release url in the pypi UI, right ? In this case, why not have the url be /projects/foo/1.2.3/draft/hash/
? I believe this could be implemented by implementing __getitem__
on release returning itself only if the hash is correct ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took your suggestion about splitting on --draft--
, that way we standarize more on the format of the URLS.
Regarding the change on the URL scheme, if I understood pyramid traversal correctly, there'd be no way to stop the resolver from returning the page for /projects/foo/1.2.3
even when it's a draft release. We'd need to return the object in both cases because we'd need the Release in order continue traversing through /draft/hash
.
The other way would be to use routes intead of traversal, but then we'd have separate logic for returning a release page (and managing pages) that we'd need to keep consistent.
Unless there's a strong benefit for changing it, I think we can keep it this way,
<p> | ||
{% trans project_name=project.name, version=release.version %} | ||
After publishing this release, users will be able to use pip to install <code>{{ project_name }}=={{ version }}</code>. | ||
{% endtrans %} | ||
</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be worth mentionning that this action cannot be undone ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be worth mentioning, but maybe also mention that once published, releases can be yanked? Which it's not really the same as unpublishing
but has the side effects that one might expect for such action.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mentioning both the irreversable nature and the only recourse will be to yank sounds like a good idea to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ewjoachim Thank you for taking the time to review this PR, I know it wasn't quick and easy!
I tried to answer each one of your questions as best as I could, I ceirtainly missed some things :)
There's still some decisions that are stil on the table, so I'd like for others to voice their opinions there.
EDIT: And you're right about the tests not being fully comprehensive, there's no 100% coverage yet :)
@@ -759,6 +759,34 @@ def _is_duplicate_file(db_session, filename, hashes): | |||
return None | |||
|
|||
|
|||
def _get_release_classifiers(db_session, classifiers_data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree that there's a lot of space for improvement 👍
I just moved the code out of the file_upload
method to make its own function in order to reduce the complexity of file_upload
, so I didn't want to change anything about the function in order to reduce the change surface of this PR.
for rc in release_classifiers: | ||
release._classifiers = release_classifiers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm not really sure what I was trying to do here either.
I'll get rid of the loop.
) | ||
for rc in release_classifiers: | ||
release._classifiers = release_classifiers | ||
new_dependencies = list( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key detail here is that apparently there's a difference between the metadata that wheels and sdists provide.
Since they're sequential file uploads, and I don't think there's a guarantee on precedence, we need to consider that we only want to overwrite the metadata when there's information. In other words, we might get a wheel with ONLY dependency information, and then a sdist with ONLY the description. These updates need to be handled differently because they can be cummulative.
Right now what happens is whatever file gets updloaded first sets the metadata of the release, and all subsequent uploads do not modify any fields (so the metadata of the release might be imcomplete at the end).
cc/ @di in case I'm misremembering the conversation we had about this particular issue.
@@ -1274,8 +1316,15 @@ def file_upload(request): | |||
"The digest supplied does not match a digest calculated " | |||
"from the uploaded file.", | |||
) | |||
# Skip duplicate check for files when it's a draft release | |||
if not release.is_draft: | |||
# Skip duplicate check for files when it's a draft release, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, you're right, we'd want to skip the re-upload when the filename and hashes are the same :)
@@ -371,6 +383,7 @@ def __table_args__(cls): # noqa | |||
created = Column( | |||
DateTime(timezone=False), nullable=False, server_default=sql.func.now() | |||
) | |||
published = Column(DateTime(timezone=False), nullable=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Well, I was following the pattern of having
created
andupdated
timestamps in models, in this case for when it was published. Then it occurred to me that instead of setting a boolean value to know if a release was published or not, we could just look at thepublished
column and check for null. So we get 2 use cases for the price of 1. Otherwise, we'd need another column (or table!) to keep track of what releases are drafts... -
This was another option that was considered, having the
draft_hash
be an attribute itself, or even a whole other model. However I think managing an attribute or another model unnecesary adds complexity to the architecture, and since I don't see most projects using this feature, I think using acolumn_property
is better than having an almost-empty column. An important detail is that the idea is to only compute this value when we know that the release is a draft (by looking at thepublished
column). However, I'm open to hear more in favor of having of this argument :) -
Yes, it's supposed to be part of the migration process, and it's automatically populated for all releases that are not explicitly a draft. So no worries there when deploying the new code.
-
You're correct. However, from what I gathered from the documentation you can't use information from other tables when using
column_property
. If you know of a way to do this, I'd be happy to avoid the denormalization :)
<p> | ||
{% trans project_name=project.name, version=release.version %} | ||
After publishing this release, users will be able to use pip to install <code>{{ project_name }}=={{ version }}</code>. | ||
{% endtrans %} | ||
</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be worth mentioning, but maybe also mention that once published, releases can be yanked? Which it's not really the same as unpublishing
but has the side effects that one might expect for such action.
<a class="status-badge status-badge--warn" href="{{ request.route_path('packaging.project', name=release.project.name) }}"> | ||
<span>{% trans %}Draft version{% endtrans %}</span> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't release the version, it takes you to the default (latest) version of the project 😅 (similar to when you look at an old release)
But adding such a button would be nice 🤔 the only problem is that this page can be viewed by any user with the right link, and not only maintainers (that's why the publish button is under Manage project
)
@alanbato if you rebase this so it doesn't conflict with |
Will do! |
026736f
to
f61709a
Compare
Alan, the only merge conflict right now is translations/messages - could you fix? |
f827a40
to
380accc
Compare
Fixed the conflicts in translations (I think theres room for improvement in our process there) Also took @ewjoachim's advice and provided a default value of Thanks @brainwane for requesting further feedback, I'd love for more people to take a look at this and see what else can be improved :) |
Also, I think tests are failing because of the migration history is a bit wrong. I would appreciate assistance on that 😅 |
Sat down with @alanbato at the PyCon sprints and thought through a plan for what we need to do to get this mergeable
|
One other thing: currently the |
@@ -344,9 +372,11 @@ def __table_args__(cls): # noqa | |||
ForeignKey("projects.id", onupdate="CASCADE", ondelete="CASCADE"), | |||
nullable=False, | |||
) | |||
project_name = Column(Text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to think about ways to avoid having to store the project name on the Release
model, as this somewhat complicates our models.
This is currently necessary because the project name is otherwise unavailable to the column property to compute the draft_hash
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like draft_hash
should not be a column property, and it should just be a string that is a derived value. You could have this triggered in the DB or in python.
If we change the mechanism by which the draft hash is computed, we probably don't want to change old hashes, so that their URLs remain stable, so that leads me to think that it should be derived at the point of release creation I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or stepping back, what is the value of hashing these values at all? If the inputs are well known it feels like the hash is just there to make the URL more obscure in general, but I don't think they provide much value on their own. Is there any reason the draft release url doesn't just have the project name and version in the url?
If I recall, the original proposal for draft releases used random URLs to prevent people from being able to "guess" the draft release url, and forcing it to be something that they were given by PyPI. This was an attempt to limit some abuses of draft releases as some sort of release channel or something. But if we're just using urls that anyone can compute knowing nothing but project name + version, I don't see a ton of value in jumping through hopes to embed that rather than just doing something like:
/draft/$project/$version/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The meain reason that we're calculatin a hash over these values is to avoid accidental usage by regular users. By requiring draft releases be installed with the use of this obfuscated "tokens" which are available and computable by maintainers, it sends the message that you shouldn't try to create them as a user. Avoiding abuse of this feature (as a private release channel for example) could be further mitigated by other means, like deleating the draft reelease after a certain time (think, a day). I do concede that the current implemention has more hoops to jump that I would like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't feel like there's going to be a huge risk of accidental usage if using it requires typing --index-url https://pypi.org/draft/foo/1.0/
and just happen to know that there exists a draft release of that url? Is there some other mechanism we expect users to enable using a draft release?
If we really do want to hash it like this, we could probably just do /draft/hash($project)/hash($version)/
yea? Could even do /draft/hash($project)hash($version)/
and just split based on length or use a - or something.
IOW, if we treat the project and version as distinct pieces of data to be hashed rather than trying to combine them and then hash them, we can sidestep all the need to denormalize since it wouldn't be any different than looking up by Project.name
+ Release.version
.
I'd also suggest not using md5, not because this is security sensitive, but because some Linux distros patch hashlib.md5
to require passing a usedforsecurity=False
attribute to make md5 work at all, and it's simpler to just side step that completely. It should be reasonable enough to use sha256, or if we're worried about the length of the URL and we can mandate 3.6+ (I don't recall what the tools involved support at the moment) you can use blake2 with a custom sized key.
Something like:
import hashlib
hashlib.blake2b(b"adf", digest_size=16).hexdigest()
Should produce digests of similar to MD5, you could probably go smaller too?
Could also use an alternative encoding if we're worried about the length of URLs:
import base64
import hashlib
hashlib.blake2b(b"adf", digest_size=16).hexdigest() # 32 characters
base64.urlsafe_b64encode(hashlib.blake2b(b"adf", digest_size=16).digest()).strip(b"=") .decode("ascii") # 22 characters
Anyways, just some food for thought
There is now PEP 694: Upload 2.0 API for Python Package Repositories, which has discussions on discuss.python.org which is relevant to this issue. |
Resolves all merge conflicts that have crept in from pypi#8941 since the last commit to that branch. This commit does *not* attempt to resolve any comments on the original PR, nor have I even run the tests on this branch yet. No promises that I haven't botched the merge conflict resolutions either.
Over in #16277 I've resolved all the merge conflicts that have crept in as main evolved. I have not attempted to resolve any comments on this original PR, nor have I even run the tests on this branch yet (let's see what the CI status is). No promises that I haven't botched the merge conflict resolutions either. I'm planning on seeing if I can help move this initiative forward, so I do plan on doing those things. I also based my branch on this one, so @alanbato gets all due credit for the original work. |
Oh, it's been quite a while! A thousand thank yous @warsaw, as I'm sure it was quite a lot of conlicts that came with time that you went through. This PR was abandoned by me due to what I felt was not reaching a concensus on some of the pending decisions to about how this funcionality was going to be exposed to users, pending work on documenting (and then proposing a better version) of the PyPI API, and the discussion thread for this PR eventually went quiet too... My free-time is not as abundant as it was 4 years ago, but I'd love to collaborate on this if we still think the community will benefit from such a feature :) Let me know how I can help! |
@alanbato I'm happy to help, and it's actually part of my job now to pitch in! I think now that we have PEP 694 to work from, maybe we can make good progress again. I do have quite a few questions about the current PEP draft, so it's probably worth resolving those in conjunction with working on a PR. I greatly appreciate the work you've done so far! |
This PR aims to implement the functionality required to support draft releases in PyPI.
Drafts releases are intended to enable package maintainers to better control their release processes and pipelines, and giving an extra step for testing things out before a new release is available to the public. This is separate from pre-releases in the sense that pre-releases are for users to use and test, whereas draft releases are invisible to them, and are only inteded to be used by maintainers.
Previous discussions as to why a feature like this could be useful, can be found in #726 and in this thread.
Implementation details and Rationale
This implementations key change is the addition of a
published
attribute in theRelease
model, all the new functionality operates on this value. Thepublished
timestamp is set when a release is published, and itsnull
value signals that the release iscreated
but notpublished
, so therefore it's a draft release. In other words:created
andpublished
set to the same timestamp. Everythig is as usual.Is-Draft
header, and it is set toTrue
.published
to the current timestamp.Aditionally, a Release now has a
draft_hash
calculated value, which will be explained later.The differentiation between draft releases and published releases is done in two separate places: pypi.org and the Simple Index.
Draft releases in PyPI.org
Regarding the PyPI website, the main focus was to 1) filter out draft releases from showing in a project's page for regular users and to 2) add additional UI elements for maintainers to manage draft releases.
Part 1) was done by modifying the traversal logic of a project's release when looking a specific version in the following way:
For example, if
my-package
's version0.23.0
is a draft,https://pypi.org/project/my-package/release/0.23.0
would not resolve (404 page)https://pypi.org/project/my-package/release/0.23.0--draft--<some-hash>/
would talke you to that release's draft preview, given thatsome-hash
is that release's computeddraft_hash
. If thedraft_hash
is incorrect, it would not resolve.As for Part 2), please see the Visual Demo with the annotated screenshots.
Draft releases and the Simple Index
In order to support the upload, download, and installation of the draft versions of packages for the project's release workflows, while still making those drafts invisible to regular users, we would've needed to make significant changes to the Simple Index API.
Since this was a non-goal for the project, an alternative solution was used, in the form of a newly created Draft Index.
The Draft Index implements the same API as the Simple Index, with the key difference that it's isolated for every single draft release. This means that when you create a draft release for a project, PyPI: creates a separate index listing only that project, with only that version available for it.
In order to be able to install the draft version of your package, you need to pass down this index URL as an extra index to
pip
, by means of the--extra-index-url
flag. This allows pip to fetch the wheels & sdist for your package's draft version from this special index, while fetching all its dependencies leveraging the Simple Index and its current infrastructure.Similar to what we did with PyPI, we use the
draft_hash
value of a release to construct its draft index URL, in the format of:https://pypi.org/simple/draft/<draft_hash>/
The full
pip
command using this URL is shown on the release's page on PyPI for easy accessIn order to make this
draft_hash
deterministic for the maintainer's automation needs, the current implementation computes an md5 hash on the concatenation of the project's name and version.On the other side of the equation, in order to create and upload a draft release, build clients need to pass down the
Is-Draft
special header, and could let the package maintainer do so by accepting a--draft
flag when uploading such a release.I made an example implementation of how this could work with Twine, and you can try it out yourself.
Other Details
Request for Feedback
All feedback is welcomed and appreciated, and trying out how everything works by pulling in these changes and spinning up a local server is encouraged to get a feel of how this all work. I'll try to answer as many things as I can.
Aditionally, I'm specially looking for feedback on these things:
Finally, here's the:
Visual Demo
This is a simple project I created to showcase how the new features and workflow look like in PyPI.
When looking at the project homepage, there's nothing different than what we have now.
The release history only shows one version.
If we go to the Simple Index for this project, we can see there is only a wheel and a sdist for that one version.
But if we are logged in as a project maintainer and go to the Manage Project section, we see there's a draft release listed!
The release table also has a new column, that list the "published at" date alongside the "created at" date, which may now be different.
Draft releases have a new option in the Option menu, Publish.
If we choose the Manage option on this draft release, we'll see that there's a "Publish release" section. This has the same functionality as the option in the previous menu.
If you click any of these two options, a popup appears to confirm the action. And if you do...
The release is published! And will now appear in the release history, the project's homepage, and the simple index/api.
If we wanted to create a new draft release, we can do so by passing the
--draft
option to twine. ( implementation)twine upload -r localpypi --draft ../simple-proyecto/dist/proyecto-0.0.3*
Navigating back to the release detail page, we can click the "view release" link to get to the preview page.
Looking at the preview, we can see some notable differences. The first one being the special
pip
command that will intall this draft version of the package by leveraging--extra-index-url
and a special hash that will prevent regular others from installing this version without access to this information. The second difference is the "draft" badge, which you can click to return to the project home page.The index url that we pass to pip is an actual simple-index compatible page that list only this project, and only these draft versions.
And now we can install said draft release! (this is an old screenshot where the draft release is version 0.0.2)
That's all folks! 🎶