Clean up Archive Storage Tests #351

Swatinem · 2024-09-09T15:05:50Z

- Converts from class-based to function-based tests.
- Cleans up usage of bucket names and filepaths.
- Tests the minio and AWS backends against a live minio instance. (As a reminder: minio is supposed to be AWS API compatible.)
- Runs GCS storage tests against upstream GCS.

The GCS tests were locally run by using the existing symbolicator GCS test credentials, though they are lacking some permissions.

Also embeds the other change to avoid constantly fetching GCS bucket metadata on each read/write.

However, the change to avoid fetching blob metadata was reverted, as it does not play well with correctly downloading gzip-compressed archives depending on their content-type.

This is a properly tested alternative to #347, though it still keeps the "messing around with gzip content type" parts, so it still makes one additional metadata request, which ideally would not be needed.

I recommend reviewing with the "ignore whitespace" setting enabled because of the reindentation.

michelletran-codecov

Generally, I'm not opposed to having some form of integration tests in CI. But the cost is that we'll make live calls to GCP, which may incur extra $$ (i.e. the cost of bucket access), a higher chance of flakes, more developer time spent rerunning CI, and increased development complexity if the dev wants to test locally. I think we should weigh this extra cost against the benefits of these additional tests (i.e. what is the user impact of broken code here? Can errors be detected quickly? Are mitigations, like rollbacks, easy to perform and quick to roll out?).

I think the tests here just port the unit tests to make live service calls instead of talking to a mock. I'm not entirely sure about the value of this, as the GCP API is unlikely to change (without lots of warning from Google). What we lose with this change are tests that make no actual GCP calls, which are useful for cost (it is cheaper to run tests locally than to use CI run time to debug errors, not to mention bucket access costs) and for ease of development (setting up envvars to run tests adds extra overhead and required knowledge to the development setup). If we want to run these tests both with mocks AND against a live environment (i.e. in CI), then maybe we should refactor the code so that we can inject the client we want to use (i.e. a mock client locally and in most cases, and a live client in CI).
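To make the client-injection idea concrete, here is a minimal sketch. All names in it (`BlobClient`, `InMemoryBlobClient`, `ArchiveStorage`, `make_storage`) are hypothetical and do not come from this repository; the point is only that the storage service receives its client from the caller rather than constructing a GCS client itself.

```python
from typing import Optional, Protocol


class BlobClient(Protocol):
    """Hypothetical minimal interface the storage service needs from a backend."""

    def upload(self, bucket: str, path: str, data: bytes) -> None: ...

    def download(self, bucket: str, path: str) -> bytes: ...


class InMemoryBlobClient:
    """Fake backend for local runs: keeps blobs in a dict, makes no network calls."""

    def __init__(self) -> None:
        self._blobs: dict = {}

    def upload(self, bucket: str, path: str, data: bytes) -> None:
        self._blobs[(bucket, path)] = data

    def download(self, bucket: str, path: str) -> bytes:
        try:
            return self._blobs[(bucket, path)]
        except KeyError:
            raise FileNotFoundError(f"{bucket}/{path}") from None


class ArchiveStorage:
    """Hypothetical storage service that is handed its client by the caller."""

    def __init__(self, client: BlobClient) -> None:
        self._client = client

    def write_file(self, bucket: str, path: str, data: bytes) -> None:
        self._client.upload(bucket, path, data)

    def read_file(self, bucket: str, path: str) -> bytes:
        return self._client.download(bucket, path)


def make_storage(live_client: Optional[BlobClient] = None) -> ArchiveStorage:
    """Use the live client when one is supplied (e.g. in CI), else the in-memory fake."""
    return ArchiveStorage(live_client if live_client is not None else InMemoryBlobClient())
```

With this shape, the same test body could run against `make_storage()` locally and against `make_storage(live_client)` in CI, where `live_client` would be a thin adapter around the real GCS client (not shown here).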

michelletran-codecov · 2024-09-12T14:12:12Z

tests/unit/storage/test_fallback.py

+from .test_gcp import BUCKET_NAME, CREDENTIALS, IS_CI
+from .test_gcp import make_storage as make_gcs_storage
+
+pytestmark = pytest.mark.skipif(


If we want to talk to a live version of GCS, can we move out of unit tests? Maybe a new directory for integration tests?

michelletran-codecov · 2024-09-12T14:15:11Z

tests/unit/storage/test_gcp.py

+                .parent # tests
+                / "gcs-service-account.json"
+            ).resolve()  # fmt: skip
+        print(credentials_file)


remove print

michelletran-codecov · 2024-09-12T14:16:47Z

tests/unit/storage/test_gcp.py

+IS_CI = os.environ.get("CI", "false") == "true"
+CREDENTIALS = try_loading_credentials()
+
+pytestmark = pytest.mark.skipif(


This looks like an integration test. Can we move the file into its own integration test directory and out of unit tests?
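One possible way to keep live-service tests opt-in after such a move is sketched below. The environment variable `RUN_LIVE_STORAGE_TESTS` and the helper `live_tests_enabled` are assumptions made for illustration; they are not part of this PR, and the actual skip condition used in the diff is not shown here.

```python
import os
from collections.abc import Mapping


def live_tests_enabled(environ: Mapping = os.environ) -> bool:
    """Hypothetical opt-in switch: live tests run only when explicitly requested."""
    return environ.get("RUN_LIVE_STORAGE_TESTS", "false").lower() == "true"


# In a module under a separate integration test directory, this could drive
# a module-wide skip (pytest is not imported here, to keep the sketch stdlib-only):
#
#   pytestmark = pytest.mark.skipif(
#       not live_tests_enabled(),
#       reason="live storage tests are opt-in",
#   )
```

Developers running the unit tests locally would then need no credentials or envvars, while CI could set the variable for a dedicated integration job.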

michelletran-codecov · 2024-09-12T17:22:50Z

tests/unit/storage/test_gcp.py

+    pass
+
+
+@pytest.mark.skip(reason="we currently have no way of cleaning up upstream buckets")


If we're skipping these tests because they're not possible/working, should we just get rid of them altogether? I'm not sure what the value is of keeping a test that won't ever run or pass.

Swatinem requested a review from a team September 9, 2024 15:05

Swatinem self-assigned this Sep 9, 2024

only patch blob metadata on checksum failure

196f95d

michelletran-codecov reviewed Sep 12, 2024

Swatinem mentioned this pull request Sep 13, 2024

Avoid metadata requests talking to GCS #347

Merged

Swatinem marked this pull request as draft September 13, 2024 14:07

Swatinem mentioned this pull request Nov 26, 2024

feat: support zstd compression in miniostorage #405

Merged
