Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP458: Update TUF repository metadata on project index change #15815

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

lukpueh
Copy link
Contributor

@lukpueh lukpueh commented Apr 18, 2024

EDIT 2024/06/21: update description; drop discussion points (see #15815 (comment), #15815 (comment))

Part 3 in a series of PRs to integrate Repository Service for TUF (RSTUF) with Warehouse for PEP 458 adoption. Previous PR was #15484

This PR adds an async task to the tuf subpackage, in order to capture project changes in TUF repository metadata via the RSTUF API. Relevant Warehouse views -- file upload and project management (remove, yank) -- are updated to trigger the task.

This PR also updates the so far unused render_simple_detail function for this purpose.

@lukpueh lukpueh force-pushed the rstuf-add-targets branch from ede9c1a to 33dabfe Compare April 22, 2024 14:05
dev/environment Outdated
@@ -64,6 +64,9 @@ TWOFACTORMANDATE_AVAILABLE=true
TWOFACTORMANDATE_ENABLED=true
OIDC_AUDIENCE=pypi

TUF_RSTUF_API_URL="http://rstuf-api"
TUF_ENABLED=true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw. in RubyGems.org we use RSTUF_API_URL and we enable it just by presence. Not sure if worth to keep it similar.

https://github.com/rubygems/rubygems.org/blob/ad1269b77e3ac83edac6a8f399125ba9fec0280d/config/initializers/rstuf.rb#L3C9-L7


Raises RuntimeError, if task fails.
"""
while True:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this in theory of state not being changed infinite loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is just a stopgap solution. We should definitely do better. See discussion item about async task handling in PR description.

@lukpueh
Copy link
Contributor Author

lukpueh commented May 2, 2024

Discussion

There are a few open question, for which I'd appreciate input from folks, who are more familiar with Warehouse.

...

Any chance to get some feedback from Warehouse maintainers? Also happy to arrange a call to discuss in person. cc @ewdurbin

On a side-note, we released v1.0.0 of the crypto interface used by RSTUF earlier today, and it includes a "VaultSigner", tailored for signing TUF metadata with a HashiCorp Vault key. :)

@ewdurbin
Copy link
Member

ewdurbin commented Jun 3, 2024

  • Completeness
    The PR should trigger TUF metadata updates in all relevant views, but someone needs to confirm. Basically, we need to update TUF metadata whenever a project change would result in a different project simple detail file, e.g. on distribution file upload, file removal, or yank. I wonder, if we could register a db hook for those changes, e.g. after_commit, which would also give us better certainty that the db change isn't rolled back.

It may be worth considering the machinery that handles our CDN cache purges (see warehouse/cache/origin) though it is admittedly opaque and a little convoluted to store "purge keys" on the request for handling after flush. This is woefully undocumented, but has been functionally reliable for a long time.

  • Async task handling
    Metadata update tasks in RSTUF are async. This PR busy polls the RSTUF task status API until the corresponding metadata update task has finished. I'm sure, Warehouse has better ways of dealing with async tasks.

Yes, look for usage of the @tasks.task decorator throughout the codebase.

lukpueh added a commit to lukpueh/warehouse that referenced this pull request Jun 13, 2024
Previously, `repository-service-tuf` (i.e. the RSTUF cli) was used to
bootstrap an RSTUF repo for development. This PR re-implements the
relevant parts of the cli locally in Warehouse and removes the
`repository-service-tuf` dependency, which conflicts with other
dependencies.

Change details
- Add lightweight RSTUF API client library (can be re-used for pypi#15815)
- Add local `warehouse tuf bootstrap` cli subcommand, to wraps lib calls
- Invoke local cli via `make inittuf`
- Remove dependency

supersedes pypi#15958 (cc @facutuesca @woodruffw)

Signed-off-by: Lukas Puehringer <[email protected]>
lukpueh added a commit to lukpueh/warehouse that referenced this pull request Jun 14, 2024
Previously, `repository-service-tuf` (i.e. the RSTUF cli) was used to
bootstrap an RSTUF repo for development. This PR re-implements the
relevant parts of the cli locally in Warehouse and removes the
`repository-service-tuf` dependency, which conflicts with other
dependencies.

Change details
- Add lightweight RSTUF API client library (can be re-used for pypi#15815)
- Add local `warehouse tuf bootstrap` cli subcommand, to wraps lib calls
- Invoke local cli via `make inittuf`
- Remove dependency

supersedes pypi#15958 (cc @facutuesca @woodruffw)

Signed-off-by: Lukas Puehringer <[email protected]>
miketheman pushed a commit that referenced this pull request Jun 14, 2024
* Replace conflicting repository-service-tuf dep

Previously, `repository-service-tuf` (i.e. the RSTUF cli) was used to
bootstrap an RSTUF repo for development. This PR re-implements the
relevant parts of the cli locally in Warehouse and removes the
`repository-service-tuf` dependency, which conflicts with other
dependencies.

Change details
- Add lightweight RSTUF API client library (can be re-used for #15815)
- Add local `warehouse tuf bootstrap` cli subcommand, to wraps lib calls
- Invoke local cli via `make inittuf`
- Remove dependency

supersedes #15958 (cc @facutuesca @woodruffw)

Signed-off-by: Lukas Puehringer <[email protected]>

* Make payload arg in tuf cli "lazy"

Other than the regular click File, the LazyFile also has the "name"
attribute, when passing stdin via "-".

We print the name on success.

Signed-off-by: Lukas Puehringer <[email protected]>

* Add minimal unittest for TUF bootstrap cli

Signed-off-by: Lukas Puehringer <[email protected]>

* Add unit tests for RSTUF API client lib

Signed-off-by: Lukas Puehringer <[email protected]>

---------

Signed-off-by: Lukas Puehringer <[email protected]>
@lukpueh lukpueh force-pushed the rstuf-add-targets branch from 33dabfe to 34a749f Compare June 20, 2024 17:15
@lukpueh lukpueh marked this pull request as ready for review June 20, 2024 17:15
@lukpueh lukpueh requested a review from a team as a code owner June 20, 2024 17:15
@lukpueh
Copy link
Contributor Author

lukpueh commented Jun 20, 2024

Just rebased on latest main, resolving conflicts and including some minor refactorings in the original commits for consistency with #16098. More significant changes/additions are added on top. The commit history is now a bit evolutionary. Happy to rewrite/squash, if that's a requirement.

cc @ofek, @trishankatdatadog, @dstufft

@lukpueh
Copy link
Contributor Author

lukpueh commented Jun 21, 2024

  • Completeness
    The PR should trigger TUF metadata updates in all relevant views, but someone needs to confirm. Basically, we need to update TUF metadata whenever a project change would result in a different project simple detail file, e.g. on distribution file upload, file removal, or yank. I wonder, if we could register a db hook for those changes, e.g. after_commit, which would also give us better certainty that the db change isn't rolled back.

It may be worth considering the machinery that handles our CDN cache purges (see warehouse/cache/origin) though it is admittedly opaque and a little convoluted to store "purge keys" on the request for handling after flush. This is woefully undocumented, but has been functionally reliable for a long time.

Thanks for the pointer, @ewdurbin! I decided to leave the explicit invocations for now. We can improve this in a follow-up PR.

  • Async task handling
    Metadata update tasks in RSTUF are async. This PR busy polls the RSTUF task status API until the corresponding metadata update task has finished. I'm sure, Warehouse has better ways of dealing with async tasks.

Yes, look for usage of the @tasks.task decorator throughout the codebase.

Thanks, this worked nicely.

@lukpueh
Copy link
Contributor Author

lukpueh commented Jun 21, 2024

  • Error handling
    This means setting up proper retry mechanism for temporary errors, and figuring out user feedback and recovery for permanent errors.

Now that the RSTUF API is called in an async task errors shouldn't affect the normal operation of Warehouse. I suggest to take the PR as is and improve error handling (ignore, retry or notify) in a next iteration.

@lukpueh lukpueh force-pushed the rstuf-add-targets branch from 6f89d17 to 106a536 Compare June 21, 2024 10:06
@lukpueh lukpueh closed this Jun 21, 2024
@lukpueh lukpueh reopened this Jun 21, 2024
@lukpueh
Copy link
Contributor Author

lukpueh commented Jun 21, 2024

(sorry for the noise, I accidentally hit the "close with comment" button)

@lukpueh lukpueh force-pushed the rstuf-add-targets branch from 106a536 to 6dfb07d Compare July 1, 2024 11:26
@ewdurbin ewdurbin self-requested a review July 1, 2024 16:42
@trishankatdatadog
Copy link
Contributor

Just rebased on latest main, resolving conflicts and including some minor refactorings in the original commits for consistency with #16098. More significant changes/additions are added on top. The commit history is now a bit evolutionary. Happy to rewrite/squash, if that's a requirement.

cc @ofek, @trishankatdatadog, @dstufft

Sorry, was on PTO. How can we best help here? For example, would you like us (esp. Donald) to review?

@lukpueh
Copy link
Contributor Author

lukpueh commented Jul 3, 2024

Thanks, @trishankatdatadog! A review by anyone would be amazing. Donald seems to be a great candidate because he knows Warehouse and TUF extremely well.

As said above, I think, this PR can be merged as is. However, I'd appreciate help with improvements in follow-up PRs (see my comments above about completeness and error handling).

Copy link
Contributor

@trishankatdatadog trishankatdatadog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM FWIW!

My high-level Q is: does this work with the JSON Simple API (particularly for PEP 740)?

@@ -1861,6 +1862,8 @@ def delete_project(project, request):

remove_project(project, request)

request.task(tuf.update_metadata).delay(project.id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see these calls a fair bit. Is there anything we could automatically do to reduce the burden of the user?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See discussion about completeness above: #15815 (comment)

TLDR: This should be improved, but could happen in a follow-up PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll echo this - we don't have many tasks directly called inside views today, rather we listen to the objects being changed via sqlalchemy and trigger tasks that way.

Here's a couple of examples:

@db.listens_for(db.Session, "after_flush")
def store_projects_for_project_reindex(config, session, flush_context):
# We'll (ab)use the session.info dictionary to store a list of pending
# Project updates to the session.
projects_to_update = session.info.setdefault(
"warehouse.search.project_updates", set()
)
projects_to_delete = session.info.setdefault(
"warehouse.search.project_deletes", set()
)
# Go through each new, changed, and deleted object and attempt to store
# a Project to reindex for when the session has been committed.
for obj in session.new | session.dirty:
if obj.__class__ == Project:
projects_to_update.add(obj)
if obj.__class__ == Release:
projects_to_update.add(obj.project)
for obj in session.deleted:
if obj.__class__ == Project:
projects_to_delete.add(obj)
if obj.__class__ == Release:
projects_to_update.add(obj.project)
@db.listens_for(db.Session, "after_commit")
def execute_project_reindex(config, session):
projects_to_update = session.info.pop("warehouse.search.project_updates", set())
projects_to_delete = session.info.pop("warehouse.search.project_deletes", set())
from warehouse.search.tasks import reindex_project, unindex_project
for project in projects_to_update:
config.task(reindex_project).delay(project.normalized_name)
for project in projects_to_delete:
config.task(unindex_project).delay(project.normalized_name)

@db.listens_for(db.Session, "after_flush")
def new_observation_created(_config, session: SA_Session, _flush_context):
# Go through each new, changed, and deleted object and attempt to store
# a cache key that we'll want to purge when the session has been committed.
for obj in session.new:
if isinstance(obj, Observation):
# Add to `session.info` so we can access it in the after_commit listener.
session.info.setdefault("warehouse.observations.new", set()).add(obj)
@db.listens_for(db.Session, "after_commit")
def execute_observation_report(config: Configurator, session: SA_Session):
# Fetch the observations from the session.
observations = session.info.pop("warehouse.observations.new", set())
for obj in observations:
# We pass the ID of the Observation, not the Observation itself,
# because the Observation object is not currently JSON-serializable.
config.task(report_observation_to_helpscout).delay(obj.id)
@tasks.task(bind=True, ignore_result=True, acks_late=True)
def report_observation_to_helpscout(task, request: Request, model_id: UUID) -> None:

@@ -90,7 +91,7 @@ def test_render_simple_detail_with_store(db_request, monkeypatch, jinja):

fake_named_temporary_file = pretend.stub(
name="/tmp/wutang",
write=pretend.call_recorder(lambda data: None),
write=pretend.call_recorder(lambda data: 42),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: could we specify and reuse a constant for the size (42)?

@lukpueh
Copy link
Contributor Author

lukpueh commented Jul 30, 2024

Thanks for the review, @trishankatdatadog!

My high-level Q is: does this work with the JSON Simple API (particularly for PEP 740)?

Great catch! This PR only covers the html simple index, but could be easily extended to also include the json simple index.

AFAICS we'd only have to make sure that

PEP 691 that TUF uses these target paths:

  • simple/PROJECT/vnd.pypi.simple.v1.html
  • simple/PROJECT/vnd.pypi.simple.v1.json

@trishankatdatadog
Copy link
Contributor

AFAICS we'd only have to make sure that

Got it, thanks. @woodruffw is this correct: did we miss anything?

miketheman
miketheman previously approved these changes Aug 23, 2024
Copy link
Member

@miketheman miketheman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey gang! I reviewed, and commented on how this is constructed, let me know if anything isn't clear!

dev/environment Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This init file is now starting to add more concepts other than the only pieces that are necessary to bootstrap.

I'd like to see a tuf/services.py and a tuf/interfaces.py, similar to other services, and split the components that way. We use pyramid-services style for dependency injection to make remote services either available or unavailable during dev/test - see examples in email or captcha modules.

Then, anything specific to the TUF CLI can get moved to the right module, and keep __init__.py for things that are only needed on import.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with commit chore: Implement TUF Interface and RSTUF Service

Comment on lines 106 to 107
@tasks.task(ignore_result=True, acks_late=True)
def update_metadata(request: Request, project_id: UUID):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beyond services and interfaces, split out another module for tasks.py to keep the modules scoped

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved with commit refactor: split tuf.__init__.tasks to tuf.tasks

warehouse/config.py Outdated Show resolved Hide resolved
@@ -1168,6 +1169,8 @@ def file_upload(request):

request.db.flush() # flush db now so server default values are populated for celery

request.task(tuf.update_metadata).delay(release.project.id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be wrapped in a conditional that does not create the task if the URL setting is not created so file uploads are not creating tasks that get disposed immediately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved with commit chore: wrapped metadata update if rstuf is enabled

Comment on lines 86 to 89
# Test fail with incomplete response json (i.e. no bootstrap error)
del resp_json["data"]
with pytest.raises(tuf.RSTUFNoBootstrapError):
tuf.post_artifacts(self.server, payload)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract failure test conditions to their own test cases

Copy link
Contributor

@kairoaraujo kairoaraujo Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @miketheman,

It has changed with my refactoring, but should I create two separate tests?

@pytest.mark.parametrize(
("states", "exception", "message"),
[
(
[
{"data": {"state": "PENDING"}},
{"data": {"state": "STARTED"}},
{"data": {"state": "RECEIVED"}},
{"data": {"state": "STARTED"}},
{"data": {"state": "SUCCESS"}},
],
None,
"",
),
(
[
{"data": {"state": "PENDING"}},
{"data": {"state": "STARTED"}},
{"data": {"state": "RECEIVED"}},
{"data": {"state": "STARTED"}},
{"data": {"state": "FAILURE"}},
],
services.RSTUFError,
"RSTUF job failed, please check payload and retry",
),
(
[
{"data": {"state": "PENDING"}},
{"data": {"state": "STARTED"}},
{"data": {"state": "RECEIVED"}},
{"data": {"state": "STARTED"}},
{"data": {"state": "ERRORED"}},
],
services.RSTUFError,

Comment on lines 132 to 136
# Test early return, if no RSTUF API URL configured
db_request.registry.settings = {"tuf.rstuf_api_url": None}
tuf.update_metadata(db_request, project_id)

assert not one.calls
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another candidate for splitting to another test case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved with commit refactor: split tuf.__init__.tasks to tuf.tasks

Comment on lines 120 to 123
# NOTE: We ignore the returned simple detail path with the content hash as
# infix. In TUF metadata the project name and hash are listed separately, so
# that there is only one entry per target file, even if the content changes.
digest, _, size = render_simple_detail(project, request, store=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we explicitly ignore the simple_detail_path, why should the function continue to return it at all? Does any other caller use it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in the commit refactor: utils.render_simple_detail

Comment on lines 110 to 128
with tempfile.NamedTemporaryFile() as f:
simple_detail_size = f.write(content.encode("utf-8"))
f.flush()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the only way for us to measure the size by writing out to disk? Could we keep the writing to disk still behind the if store conditional, and calculate the size in memory?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in refactor: utils.render_simple_detail

@@ -1861,6 +1862,8 @@ def delete_project(project, request):

remove_project(project, request)

request.task(tuf.update_metadata).delay(project.id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll echo this - we don't have many tasks directly called inside views today, rather we listen to the objects being changed via sqlalchemy and trigger tasks that way.

Here's a couple of examples:

@db.listens_for(db.Session, "after_flush")
def store_projects_for_project_reindex(config, session, flush_context):
# We'll (ab)use the session.info dictionary to store a list of pending
# Project updates to the session.
projects_to_update = session.info.setdefault(
"warehouse.search.project_updates", set()
)
projects_to_delete = session.info.setdefault(
"warehouse.search.project_deletes", set()
)
# Go through each new, changed, and deleted object and attempt to store
# a Project to reindex for when the session has been committed.
for obj in session.new | session.dirty:
if obj.__class__ == Project:
projects_to_update.add(obj)
if obj.__class__ == Release:
projects_to_update.add(obj.project)
for obj in session.deleted:
if obj.__class__ == Project:
projects_to_delete.add(obj)
if obj.__class__ == Release:
projects_to_update.add(obj.project)
@db.listens_for(db.Session, "after_commit")
def execute_project_reindex(config, session):
projects_to_update = session.info.pop("warehouse.search.project_updates", set())
projects_to_delete = session.info.pop("warehouse.search.project_deletes", set())
from warehouse.search.tasks import reindex_project, unindex_project
for project in projects_to_update:
config.task(reindex_project).delay(project.normalized_name)
for project in projects_to_delete:
config.task(unindex_project).delay(project.normalized_name)

@db.listens_for(db.Session, "after_flush")
def new_observation_created(_config, session: SA_Session, _flush_context):
# Go through each new, changed, and deleted object and attempt to store
# a cache key that we'll want to purge when the session has been committed.
for obj in session.new:
if isinstance(obj, Observation):
# Add to `session.info` so we can access it in the after_commit listener.
session.info.setdefault("warehouse.observations.new", set()).add(obj)
@db.listens_for(db.Session, "after_commit")
def execute_observation_report(config: Configurator, session: SA_Session):
# Fetch the observations from the session.
observations = session.info.pop("warehouse.observations.new", set())
for obj in observations:
# We pass the ID of the Observation, not the Observation itself,
# because the Observation object is not currently JSON-serializable.
config.task(report_observation_to_helpscout).delay(obj.id)
@tasks.task(bind=True, ignore_result=True, acks_late=True)
def report_observation_to_helpscout(task, request: Request, model_id: UUID) -> None:

miketheman

This comment was marked as outdated.

@miketheman miketheman self-requested a review August 23, 2024 22:23
Copy link
Member

@miketheman miketheman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't seem to use GitHub correctly, gonna stop for today 😆

@miketheman miketheman dismissed their stale review August 23, 2024 22:24

because I am making mistakes

@lukpueh
Copy link
Contributor Author

lukpueh commented Aug 28, 2024

Thanks for the feedback, @miketheman!

@kairoaraujo, @trishankatdatadog, and friends, please feel free to take over as discussed the other day!

@kairoaraujo kairoaraujo force-pushed the rstuf-add-targets branch 2 times, most recently from c39a363 to 28ab72d Compare October 4, 2024 09:58
@kairoaraujo
Copy link
Contributor

Note: I pushed some comment fixes, but I am still not finished (Work in Progress)

@kairoaraujo kairoaraujo force-pushed the rstuf-add-targets branch 2 times, most recently from bdc3309 to 0498c0a Compare October 7, 2024 10:18
lukpueh and others added 15 commits October 8, 2024 08:45
The simple detail file size is to be included in TUF metadata,
along with its path and hash.

Signed-off-by: Lukas Puehringer <[email protected]>
Add new TUF subpackage with functions to talk to the RSTUF API,
in order to update TUF repository metadata when a project changes.

The function requires RSTUF to be bootstrapped (see `make inittuf`), and
can additionally be toggled via setting:
- disabled by default
- enabled in dev environment

Signed-off-by: Lukas Puehringer <[email protected]>
Relevant views are "file upload", "remove" (file, release, project),
"yank".

Signed-off-by: Lukas Puehringer <[email protected]>
Make `tuf.update_metadata` an async task and call with `delay`.
The `project` argument becomes `project_id`, because it must be
serializable.

This change allows us to remove the `disable_tuf` test fixture, because
`delay` is mocked in the relevant tests anyway, so that
`update_metadata` isn't actually called in any existing tests. However,
call_recorder asserts need to be and are adapted.

Signed-off-by: Lukas Puehringer <[email protected]>
Instead, allow handling a custom RSTUFNoBootstrapError, which is
raised, if tuf.update_metadata is called before RSTUF is bootstrapped.

Signed-off-by: Lukas Puehringer <[email protected]>
Using TUF_RSTUF_API_URL as toggle instead should be good enough.
This is also what rubygems does. h/t @simi

Signed-off-by: Lukas Puehringer <[email protected]>
Signed-off-by: Lukas Puehringer <[email protected]>
Signed-off-by: Lukas Puehringer <[email protected]>
- `tuf.rstuf_*` to `rstuf.*`
- environment variable from `TUF_RSTUF*` to `RSTUF*`

Signed-off-by: Kairo de Araujo <[email protected]>
this refactor split the tasks from the tuf.__init__.py to tasks.py

Signed-off-by: Kairo de Araujo <[email protected]>
use rstuf upstream cli

update `make inittuf` to use upstream command

Signed-off-by: Kairo de Araujo <[email protected]>
- Implements the tuf/interfaces.py with ITUFService
- Implements the tuf/services.py with RSTUFService
- Refactor the tuf/tasks.py metadata_update to use the Service
- RSTUF tasks are triggered by changes in the Project db objects
- Add unit tests

Signed-off-by: Kairo de Araujo <[email protected]>
- remove the simple_detail_path as it is not used by any other caller
- calculate size without writting the file, only write when store is
  true

Signed-off-by: Kairo de Araujo <[email protected]>
- Update RSTUF services to 1.0.0b1
- Update payload to use `artifacts` instead `targets`
- Add RSTUF `PRE_RUN` state

Signed-off-by: Kairo de Araujo <[email protected]>
Signed-off-by: Kairo de Araujo <[email protected]>
@kairoaraujo
Copy link
Contributor

Hey folks,
Ready for another review! 🎉

Comment on lines +34 to +37
# NOTE: We ignore the returned simple detail path with the content hash as
# infix. In TUF metadata the project name and hash are listed separately, so
# that there is only one entry per target file, even if the content changes.
digest, size = render_simple_detail(project, request, store=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this interact with/handle the other artifacts/pieces of metadata produced on file upload? In particular, I'm thinking of:

  1. The JSON (PEP 691) simple index
  2. The detached metadata (PEP 658)

Comment on lines +60 to +85
def wait_for_success(self, task_id):
"""Poll RSTUF task state API until success or error."""
for _ in range(self.retries):
state = self.get_task_state(task_id)

match state:
case "SUCCESS":
break

case "PENDING" | "PRE_RUN" | "RUNNING" | "RECEIVED" | "STARTED":
time.sleep(self.delay)
continue

case "FAILURE":
raise RSTUFError("RSTUF job failed, please check payload and retry")

case "ERRORED" | "REVOKED" | "REJECTED":
raise RSTUFError(
"RSTUF internal problem, please check RSTUF health"
)

case _:
raise RSTUFError(f"RSTUF job returned unexpected state: {state}")

else:
raise RSTUFError("RSTUF job failed, please check payload and retry")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own elucidation: is there a consistency risk with TUF if a particular upload fails and the next sequenced one succeeds? IIRC there isn't (since TUF isn't trying to impose a canonical ordering of uploads), but I wanted to make sure I'm not missing something there 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Separate from the more basic consistency risk of not retrying, that is.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants