This repository contains educational content and tools for performing machine learning (ML) modeling on soil organic carbon data. The goal is to provide an interactive learning experience for users to explore and apply various ML techniques in environmental data modeling, specifically focused on soil health metrics.
Visit our Cyber-Infrastructure here.
Access our public API for the CONUS soil organic carbon model here.
The Soil Organic Carbon ML Lab is designed to help users learn how to:
- Analyze soil organic carbon data using various machine learning models.
- Experiment with data preprocessing, model training, and evaluation in an interactive environment.
The lab is equipped with various tools and educational resources that allow users to build predictive models and explore geospatial data related to soil properties.
- Python 3.8 or higher
- `pip` package manager
- Recommended: a virtual environment tool such as `venv` or `conda`.
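The Python requirement listed above can be checked before installing anything. The following is a minimal sketch, not part of the repository; the helper name `meets_minimum` is hypothetical and introduced here only for illustration.

```python
import sys

# Minimum version stated in the prerequisites.
MIN_VERSION = (3, 8)


def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`.

    `meets_minimum` is a hypothetical helper, not a function shipped with
    this repository.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} satisfies the 3.8+ requirement.")
    else:
        print(f"Python {current} is too old; 3.8 or higher is required.")
```

Running the script with the interpreter you plan to use for the lab (inside or outside a virtual environment) reports whether that interpreter is recent enough.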
To get started, clone this repository to your local machine:
git clone https://github.com/UW-Madison-DSI/soil_organic_carbon_ml_lab.git
cd soil_organic_carbon_ml_lab
Set up a virtual environment (recommended):
# For venv
python3 -m venv venv
source venv/bin/activate # On Windows, use 'venv\Scripts\activate'
# For conda
conda create --name soil_ml python=3.8
conda activate soil_ml
Install the required Python packages:
pip install -r requirements.txt
This repository includes several Jupyter notebooks that guide users through different stages of machine learning modeling for soil organic carbon data. To run the notebooks, start Jupyter from the repository root (for example, with `jupyter notebook`).
Then, open the notebook of your choice and follow the instructions within.
If the project includes a Streamlit-based web application, run the app with:
streamlit run app.py
This will launch a web interface for interacting with the soil organic carbon ML models.
- Data Preprocessing.ipynb: Steps to clean, prepare, and explore the soil organic carbon dataset.
- ML Modeling.ipynb: Build and evaluate machine learning models such as Random Forest and XGBoost to predict soil carbon levels.
- Visualization.ipynb: Visualize trends in soil organic carbon across different geographical regions.
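The notebooks listed above cover model evaluation, but this README does not state which metrics they report. As a minimal sketch, assuming root mean squared error (RMSE) and the coefficient of determination (R²) are of interest, the two can be computed in plain Python as follows. The function names and the sample values are illustrative assumptions, not taken from the notebooks or from real measurements.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


# Made-up values for illustration only; these are not real SOC measurements.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print(f"RMSE: {rmse(observed, predicted):.3f}")
print(f"R^2:  {r_squared(observed, predicted):.3f}")
```

RMSE is expressed in the same units as the target variable, which makes it easy to interpret, while R² gives a unitless measure of how much of the observed variance the predictions explain.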
The soil organic carbon data used in this project can be sourced from public datasets, as described here, or loaded from your local directory.
- states_shape/States_shapefile.shp: Shapefile used for geospatial analysis of states.
- data/sample_soc_observations/: Contains sample soil organic carbon observations. These data were compiled from various public sources through extensive extraction work.
Ensure the data is structured correctly in the required format before running the models.
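The required format is not spelled out in this README, so the check below is only a sketch of what such a validation could look like. It assumes a CSV input with the hypothetical columns `latitude`, `longitude`, and `soc`; substitute the column names the notebooks actually expect. The function name `validate_soc_rows` is likewise an illustration and not part of the repository.

```python
import csv
import io

# Hypothetical column names; replace with those used by the notebooks.
REQUIRED_COLUMNS = ("latitude", "longitude", "soc")


def validate_soc_rows(lines, required=REQUIRED_COLUMNS):
    """Return a list of problems found in CSV text lines (empty list = OK).

    `lines` is any iterable of text lines, such as an open file object.
    """
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [name for name in required if name not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    # Data rows start on line 2 because the header occupies line 1.
    for row_number, row in enumerate(reader, start=2):
        for name in required:
            try:
                float(row[name])
            except (TypeError, ValueError):
                problems.append(f"line {row_number}: '{name}' is not numeric")
    return problems


# Small in-memory example; the values are made up for illustration.
sample = io.StringIO(
    "latitude,longitude,soc\n"
    "43.07,-89.40,2.1\n"
    "44.50,-90.00,n/a\n"
)
print(validate_soc_rows(sample))
```

To check a file on disk, pass an open file handle instead of the in-memory sample, for example `with open(path, newline="") as handle: validate_soc_rows(handle)`, where `path` points to your own CSV file.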
Contributions are welcome! Please follow these steps to contribute:
- Fork the repository.
- Create a new branch (`git checkout -b feature-branch`).
- Commit your changes (`git commit -m "Add some feature"`).
- Push to the branch (`git push origin feature-branch`).
- Open a pull request.
If you encounter any bugs or have questions, please open an issue in the repository.
This project is licensed under the MIT License.
For questions or inquiries, please contact:
- Jingyi Huang - Associate Professor, Department of Soil and Environmental Sciences, UW-Madison
- Google Scholar: [https://scholar.google.com/citations?user=ZEkcUDUAAAAJ&hl]
- Lab website: [https://soilsensingmonitoring.soils.wisc.edu/]
- Email: [email protected]
- Maria Oros - Data Scientist, UW-Madison DSI
- Email: [email protected]