Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up Archive Storage Tests #351

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Conversation

Swatinem
Copy link
Contributor

@Swatinem Swatinem commented Sep 9, 2024

  • Converts from class-based to function-based tests.
  • Cleans up usage of bucket names and filepaths.
  • Tests the minio and AWS backends against a live minio instance. (as a reminder: minio is supposed to be AWS API compatible)
  • Run GCS storage tests against upstream GCS.

The GCS tests were locally run by using the existing symbolicator GCS test credentials, though they are lacking some permissions.

Also embeds the other change to avoid constantly fetching GCS bucket metadata on each read/write.

Avoiding fetching of blob metadata was reverted however, as that does not play nice with correctly downloading gzip compressed archives depending on their content-type.


This is a properly tested alternative to #347, though it still keeps the "messing around with gzip content type" parts, so still does one additional metadata request which would ideally not be needed.

I advise to review with "ignore whitespace" settings because of reindentation.

- Converts from class-based to function-based tests.
- Cleans up usage of bucket names and filepaths.
- Tests the minio and AWS backends against a live minio instance.
  (as a reminder: minio is supposed to be AWS API compatible)
- Run GCS storage tests against upstream GCS.

The GCS tests were locally run by using the existing symbolicator GCS test credentials, though they are lacking some permissions.

Also embeds the other change to avoid constantly fetching GCS bucket metadata on each read/write.

Avoiding fetching of blob metadata was reverted however, as that does not play nice with correctly downloading gzip compressed archives depending on their content-type.
@Swatinem Swatinem requested a review from a team September 9, 2024 15:05
@Swatinem Swatinem self-assigned this Sep 9, 2024
Copy link
Contributor

@michelletran-codecov michelletran-codecov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I'm not opposed to having some form of integration tests in CI. But I think the cost of that is that we'll make a live call to GCP, which, may incur extra $$ (i.e. cost of bucket access), more chance of flakes, and increased developer time rerunning CI, increased development complexity if the dev wants to test locally etc.. I think we probably want to consider the implications of this extra cost on the benefits of these additional tests (i.e. what is the user impact of broken code that happens here? can errors be detected quickly? and are mitigations, like rollbacks, easy to perform and quick to roll out?).

I think the tests here just ports the unit tests to do a live service call instead of talking to a mock. I'm not entirely sure about the value in this, as GCP API is unlikely to change (without lots of warning from Google). What we lose from this change are actual (non-GCP) calls for testing, which would be useful for costs (cheaper to run tests locally, than use CI run time to debug errors, not to mention bucket access costs), and ease of development (setting up envvars to run tests adds extra overhead and knowledge to the development setup). If we want to run these tests both with mocks AND on live environment (i.e. in CI), then maybe we should try to refactor the code so that we can inject the client that we want to use (i.e. use a mock client locally/most cases, and a live client in CI).

from .test_gcp import BUCKET_NAME, CREDENTIALS, IS_CI
from .test_gcp import make_storage as make_gcs_storage

pytestmark = pytest.mark.skipif(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to talk to a live version of GCS, can we move out of unit tests? Maybe a new directory for integration tests?

.parent # tests
/ "gcs-service-account.json"
).resolve() # fmt: skip
print(credentials_file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove print

IS_CI = os.environ.get("CI", "false") == "true"
CREDENTIALS = try_loading_credentials()

pytestmark = pytest.mark.skipif(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like an integration test. Can we move the file into it's own integration test directory and out of unit tests?

pass


@pytest.mark.skip(reason="we currently have no way of cleaning up upstream buckets")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're skipping these tests because they're not possible/working, should we just get rid of them altogether? I'm not sure what the value of keeping a test that won't ever run or pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants