Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Rewrite data cp implementation #506

Merged
merged 48 commits into from
May 27, 2022
Merged

Conversation

andriihomiak
Copy link
Contributor

@andriihomiak andriihomiak commented Apr 29, 2022

This PR addresses many issues with previous code for neuro-extras data cp command (#500,#301). Main idea of the proposed changes is to modularize key processes, related to data copy and add some useful abstractions to simplify further addition of supported sources/destinations (like neu.ro platform buckets or Google Drive #235).

Readiness checklist

  • Previous source/destination combinations are fully supported
    • Local-to-Local
      • Plain Copy
      • Extraction
      • Compression
      • Extraction + Compression
    • Local-to-Cloud
      • Plain Copy
      • Extraction
      • Compression
      • Extraction + Compression
    • Cloud-to-Local
      • Plain Copy
      • Extraction
      • Compression
      • Extraction + Compression
    • Platform-to-Cloud
      • Plain Copy
      • Extraction
      • Compression
      • Extraction + Compression
    • Cloud-to-Platform
      • Plain Copy
      • Extraction
      • Compression
      • Extraction + Compression
    • Platform-to-Platform (between storage: and disk: only)

      storage: to storage: and disk: to disk: are pretty much supported, but will be introduced in separate PR

      • Plain Copy
      • Extraction
      • Compression
      • Extraction + Compression
  • Tests
    • Conftests passing
    • E2E
      • Old passing
      • New (will be in a separate PR)
        • Exist
        • Passing
  • Docs
    • New code is documented
    • Changelog is added

@andriihomiak andriihomiak added data_ingestion Issue on `neuro-extras data cp` functionality python Pull requests that update Python code labels Apr 29, 2022
@andriihomiak andriihomiak self-assigned this Apr 29, 2022
@andriihomiak andriihomiak marked this pull request as ready for review May 10, 2022 11:03
Copy link
Collaborator

@YevheniiSemendiak YevheniiSemendiak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job, thank you a lot for refactoring the spaghetti code we have written :)
All looks good to me, only some suggestions came to my mind, apply them or not - up to you.

neuro_extras/data/operations.py Outdated Show resolved Hide resolved
neuro_extras/data/operations.py Outdated Show resolved Hide resolved
CHANGELOG.D/506.misc Show resolved Hide resolved
neuro_extras/data/gcs.py Outdated Show resolved Hide resolved
neuro_extras/data/operations.py Show resolved Hide resolved
neuro_extras/data/operations.py Outdated Show resolved Hide resolved
neuro_extras/data/common.py Outdated Show resolved Hide resolved
neuro_extras/data/local.py Outdated Show resolved Hide resolved
neuro_extras/data/remote.py Outdated Show resolved Hide resolved
neuro_extras/data/archive.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@YevheniiSemendiak YevheniiSemendiak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code LGTM, great job, man!
Please, let's get the tests working before merging.

@andriihomiak
Copy link
Contributor Author

e2e tests are addressed in #515

@andriihomiak
Copy link
Contributor Author

Test finally passed, but from now on when running e2e tests we need to check test logs, as some tests might xfail - if there are only a few, that do not reproduce consistently over the same os, chances are it is due to platform instability.

Resolving #512 might help alleviate this problem a little.

@andriihomiak andriihomiak linked an issue May 27, 2022 that may be closed by this pull request
@andriihomiak andriihomiak merged commit 47d3544 into master May 27, 2022
@andriihomiak andriihomiak deleted the refactor/data_cp branch May 27, 2022 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data_ingestion Issue on `neuro-extras data cp` functionality python Pull requests that update Python code
Projects
None yet
2 participants