ignore shapefiles if they are under a hidden directory in the zip file #10627

Merged (3 commits) on Jul 16, 2024

@stevenwinship (Contributor) commented Jun 12, 2024

What this PR does / why we need it: Zip files containing shapefiles under hidden directories should be labelled as "ZIP Archive". Labelling them as "Shapefile as ZIP Archive" might confuse anyone looking to download the data.
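The rule above can be illustrated with a short Java sketch. This is a hypothetical illustration, not the code merged in this PR. It assumes that a "hidden" directory is one whose name starts with a dot, and that a zip counts as a shapefile only if its visible entries include `.shp`, `.shx` and `.dbf` files sharing one base name. The class and method names are made up for this example.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration; not Dataverse's actual shapefile detection code.
final class HiddenShapefileCheck {

    // Assumption: a shapefile needs .shp, .shx and .dbf files sharing one base name.
    private static final Set<String> REQUIRED = Set.of(".shp", ".shx", ".dbf");

    // True if any directory in the entry path (not the file name itself) starts with '.'.
    static boolean isUnderHiddenDirectory(String entryName) {
        String[] parts = entryName.split("/");
        for (int i = 0; i < parts.length - 1; i++) {
            if (parts[i].startsWith(".")) {
                return true;
            }
        }
        return false;
    }

    // True if the visible (non-hidden) entries contain a complete .shp/.shx/.dbf set.
    static boolean looksLikeShapefileZip(List<String> entryNames) {
        Map<String, Set<String>> extensionsByBase = new HashMap<>();
        for (String name : entryNames) {
            // Skip directory entries and anything inside a hidden directory.
            if (name.endsWith("/") || isUnderHiddenDirectory(name)) {
                continue;
            }
            int dot = name.lastIndexOf('.');
            if (dot <= name.lastIndexOf('/')) {
                continue; // the file name has no extension
            }
            String base = name.substring(0, dot);
            String ext = name.substring(dot).toLowerCase(Locale.ROOT);
            extensionsByBase.computeIfAbsent(base, k -> new HashSet<>()).add(ext);
        }
        for (Set<String> exts : extensionsByBase.values()) {
            if (exts.containsAll(REQUIRED)) {
                return true;
            }
        }
        return false;
    }

    // Reads the entry names from a zip file on disk.
    static List<String> entryNames(Path zip) throws IOException {
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            return zf.stream().map(ZipEntry::getName).collect(Collectors.toList());
        }
    }
}
```

For example, `HiddenShapefileCheck.looksLikeShapefileZip(HiddenShapefileCheck.entryNames(Path.of("hiddenShapefiles.zip")))` would return false if all of the archive's shapefile components sit under a hidden directory. The same call returns true for an archive containing `roads.shp`, `roads.shx` and `roads.dbf` at the top level.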

Which issue(s) this PR closes: SPIKE: Improve how Dataverse labels shapefiles to prevent mislabelling of zip files that aren't shapefiles #8945

Closes #8945

Special notes for your reviewer:

Suggestions on how to test this:
Run: zip test.zip src/test/resources/hiddenShapefiles.zip
Upload test.zip, whose inner zip contains shapefile data under a hidden directory. It was showing as 'Shapefile as ZIP Archive'; it now shows as 'ZIP Archive'.
Upload a double-zipped file with shapefiles in a visible directory and check that it still shows as 'Shapefile as ZIP Archive'.

Does this PR introduce a user interface change? If mockups are available, please link/include them here: No

Is there a release notes update needed for this change?: Included

Additional documentation: None

@stevenwinship stevenwinship self-assigned this Jun 12, 2024
@coveralls
Coverage Status: coverage 20.594% (+0.02%) from 20.574% when pulling 7b9319e on 8945-prevent-mislabelling-non-shapefiles-in-zip into 5bf6b6d on develop.

@coveralls
Coverage Status: coverage 20.594% (+0.02%) from 20.574% when pulling 2eea8e6 on 8945-prevent-mislabelling-non-shapefiles-in-zip into 5bf6b6d on develop.

@pdurbin (Member) left a comment

Looks good. I'm including one suggestion.

@jggautier (Contributor)

Hi @stevenwinship. After this improvement makes its way to Harvard Dataverse, would I be able to change the label of a file that was labelled as "Shapefile as ZIP Archive", like the file in the dataset at https://doi.org/10.7910/DVN/HWVUER?

Maybe with the redetect file type API endpoint?

@coveralls
Coverage Status: coverage 20.594% (+0.02%) from 20.574% when pulling 1b3e312 on 8945-prevent-mislabelling-non-shapefiles-in-zip into 5bf6b6d on develop.

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:8945-prevent-mislabelling-non-shapefiles-in-zip
ghcr.io/gdcc/configbaker:8945-prevent-mislabelling-non-shapefiles-in-zip

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@stevenwinship (Contributor, Author)

> Hi @stevenwinship. After this improvement makes its way to Harvard Dataverse, would I be able to change the label of a file that was labelled as "Shapefile as ZIP Archive", like the file in the dataset at https://doi.org/10.7910/DVN/HWVUER?
>
> Maybe with the redetect file type API endpoint?

Yes. I just tested the redetect endpoint, and after exiting the UI and going back in, the label changed to 'ZIP Archive'. I'm not sure why I had to exit and come back in, but at least the label is correct now.
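The redetect call can be sketched with Java's built-in `java.net.http` client. This sketch assumes the endpoint as documented in the Dataverse API Guide: `POST /api/files/{id}/redetect` with a `dryRun` query parameter and the API token in an `X-Dataverse-key` header. The server URL, file id and token are placeholders, and the class name is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper; builds (but does not send) a redetect request.
final class RedetectRequest {

    // serverUrl, fileId and apiToken are placeholders supplied by the caller.
    // dryRun=true is assumed to report the detected type without saving it.
    static HttpRequest build(String serverUrl, long fileId, String apiToken, boolean dryRun) {
        URI uri = URI.create(serverUrl + "/api/files/" + fileId + "/redetect?dryRun=" + dryRun);
        return HttpRequest.newBuilder(uri)
                .header("X-Dataverse-key", apiToken)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

To send it, pass the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Running with `dryRun` set to true first is a cautious way to preview the new label before persisting it with `dryRun` set to false.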

@stevenwinship stevenwinship removed their assignment Jun 12, 2024
@stevenwinship added the Type: Bug (a defect) and Size: 10 (A percentage of a sprint. 7 hours.) labels Jun 17, 2024
@qqmyers (Member) left a comment

Looks good and should fix the specific issue, as shown by the test. FWIW: I suspect we could exclude more cases, i.e., if we don't detect shapefile component files at the top level or one directory down, the zip isn't a shapefile. But the problem is probably rare enough that such changes can wait until there's a reported problem.
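The additional heuristic suggested here can also be sketched. This is hypothetical and was not part of the merged change. It assumes `.shp`, `.shx` and `.dbf` are the component extensions to look for, and it measures depth as the number of `/` separators in a zip entry name (so `roads.shp` is depth 0 and `data/roads.shp` is depth 1).

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of Dataverse.
final class ShapefileDepthCheck {

    // Assumption: these extensions identify shapefile component files.
    private static final Set<String> COMPONENT_EXTS = Set.of(".shp", ".shx", ".dbf");

    // Number of directory levels above the entry: "a.shp" -> 0, "dir/a.shp" -> 1.
    static int depth(String entryName) {
        int d = 0;
        for (char c : entryName.toCharArray()) {
            if (c == '/') {
                d++;
            }
        }
        return d;
    }

    // True if any component file sits at the top level or one directory down.
    // If this returns false, the zip would not be treated as a shapefile.
    static boolean hasShallowComponents(List<String> entryNames) {
        for (String name : entryNames) {
            if (name.endsWith("/")) {
                continue; // directory entry
            }
            int dot = name.lastIndexOf('.');
            if (dot <= name.lastIndexOf('/')) {
                continue; // the file name has no extension
            }
            String ext = name.substring(dot).toLowerCase(Locale.ROOT);
            if (COMPONENT_EXTS.contains(ext) && depth(name) <= 1) {
                return true;
            }
        }
        return false;
    }
}
```

Note that this check is independent of the hidden-directory rule: an entry such as `.hidden/roads.shp` is only one level deep, so the two heuristics would complement rather than replace each other.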

@sekmiller sekmiller self-assigned this Jul 16, 2024
@sekmiller sekmiller merged commit 93e7197 into develop Jul 16, 2024
19 checks passed
@sekmiller sekmiller deleted the 8945-prevent-mislabelling-non-shapefiles-in-zip branch July 16, 2024 18:22
@pdurbin pdurbin added this to the 6.4 milestone Jul 17, 2024
Labels: Size: 10 (A percentage of a sprint. 7 hours.), Type: Bug (a defect)
Projects: Status: Done 🧹
Development: SPIKE: Improve how Dataverse labels shapefiles to prevent mislabelling of zip files that aren't shapefiles (#8945)
6 participants