Field Museum of Natural History (FMNH) scripts

This repository contains miscellaneous scripts for use by FMNH staff, interns, and volunteers. The scripts are written as Google Colaboratory ("Colab") notebooks. Colab offers in-browser Python code execution, instructions embedded alongside the code, and an interface that non-technologists can use. To run a Colab script, you must be logged in to a Google account.

To run any script, click its name in the file list above, then click the blue "Open in Colab" button that appears near the top of the page. This opens the script in Google Colab, ready to run.

Download_Image_Files_from_Pteridoportal.ipynb

About: Download images from public-facing web databases for digitized herbarium collections, based on your search query.
Preconditions: A stable internet connection, and local storage space for the files.
Inputs: A data download from the Pteridoportal or other related portal site (e.g. Bryophyte Portal).
Outputs: A ZIP file containing all of the image files.
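
The general idea can be sketched as follows. This is a minimal illustration and not the notebook's actual code: the function names, the `url_column` parameter, and the assumption that the data download is a CSV with one image URL per row are all hypothetical, so check the column names in your own download before adapting it.

```python
import csv
import io
import zipfile
from urllib.request import urlopen


def fetch_bytes(url, timeout=30):
    """Download one URL and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def zip_images_from_csv(csv_text, url_column, fetch=fetch_bytes):
    """Return ZIP bytes holding one file per non-empty URL in url_column.

    url_column is an assumed column name; it is not taken from any portal.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
            url = (row.get(url_column) or "").strip()
            if not url:
                continue  # skip records without an image link
            # Prefix with the row index so duplicate file names cannot collide.
            name = f"{index:05d}_{url.rsplit('/', 1)[-1] or 'image'}"
            archive.writestr(name, fetch(url))
    return buffer.getvalue()
```

Passing a different `fetch` function makes it possible to try the logic without a network connection, for example with a stub that returns placeholder bytes.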

Microplant_Mystery_Zooniverse_Processing.ipynb

About: Process a Zooniverse classification results file by flattening JSON strings into columns, making it suitable for spreadsheet analysis.
Preconditions: A Zooniverse account and an existing Zooniverse project that has gathered results from public participants. The classification results file has been downloaded from Zooniverse, and any rows that you don't wish to be processed (e.g. testing phase results) have been deleted.
Inputs: A CSV file, which has been uploaded to your Google Drive (in the root "My Drive" folder).
Outputs: A ZIP file containing one CSV file for each Workflow.
Known Limitations: Currently, only two types of Tasks are extracted:
- Question
- Drawing (rectangle tool)
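
As a rough sketch of what "flattening" means for these two task types, the example below assumes the export has `classification_id`, `workflow_id`, and `annotations` columns, and that `annotations` holds a JSON list of objects with `task` and `value` keys. The function name and the output column naming (`T1_box0_x` and so on) are hypothetical; the notebook's own output layout may differ.

```python
import csv
import io
import json
from collections import defaultdict


def flatten_annotations(csv_text):
    """Group rows by workflow_id, turning each task's answer into columns."""
    by_workflow = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        flat = {"classification_id": row["classification_id"]}
        for annotation in json.loads(row["annotations"]):
            task, value = annotation["task"], annotation["value"]
            if isinstance(value, list) and all(isinstance(v, dict) for v in value):
                # Drawing task (assumed shape): one set of columns per rectangle.
                for i, box in enumerate(value):
                    for key in ("x", "y", "width", "height"):
                        flat[f"{task}_box{i}_{key}"] = box.get(key)
            else:
                # Question task: the value is the chosen answer.
                flat[task] = value
        by_workflow[row["workflow_id"]].append(flat)
    return dict(by_workflow)
```

Each key of the returned dictionary is a workflow ID, and each value is a list of flat rows that could be written out with `csv.DictWriter` as one CSV file per Workflow.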

Test_train_split_an_image_set_with_metadata.ipynb

About: Given a folder of images in Google Drive, this script creates a duplicate folder with train/test splits.
Preconditions: A folder of images in Google Drive -- either a single folder with images and metadata intermingled, or multiple folders with only metadata in the root. (The latter supports only stratified sampling.) Authenticate your Google Drive in Colab (tutorial).
Inputs: A Google Drive folder containing one dataset.
Outputs: A Google Drive folder (with editor access).
Known Limitations:
- Sampling is not random -- since our specimen image files are often named specimen_id-image_count.tif and are retrieved alphabetically, we sample images evenly across the entire folder.
- In a flat folder
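
To illustrate what sampling "evenly across the entire folder" can look like, here is a minimal sketch. It is one possible approach, not necessarily the one the notebook uses, and the function name and `test_fraction` parameter are hypothetical.

```python
def even_split(filenames, test_fraction=0.2):
    """Split sorted names so test items are spread evenly over the list."""
    ordered = sorted(filenames)
    # Never ask for more test items than there are files.
    n_test = min(len(ordered), round(len(ordered) * test_fraction))
    if n_test == 0:
        return ordered, []
    step = len(ordered) / n_test
    # Pick the item at the middle of each of n_test equal-width slices.
    test_indexes = {int(step * i + step / 2) for i in range(n_test)}
    train = [name for i, name in enumerate(ordered) if i not in test_indexes]
    test = [ordered[i] for i in sorted(test_indexes)]
    return train, test
```

Because the names are sorted first, images of the same specimen sit next to each other, and taking one item from each slice avoids drawing the whole test set from a single alphabetical region of the folder.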

Acknowledgements

Thanks and credit to the Grainger Bioinformatics Center and FMNH botany collections.

Please consult the included license for information about use and redistribution.
