This repository contains a pipeline for operational execution of the Amundsen Sea Low calculations provided in the `asli` package. The functions in the `asli` package are described in detail in the package repository amundsen-sea-low-index (Hosking & Wilby 2024), and in Hosking et al. (2016).
This pipeline was built using `icenet-pipeline` as a template (Byrne et al. 2024).
Clone this repository into a directory on your computer or HPC:

```shell
git clone git@github.com:antarctica/boost-eds-pipeline.git asli-pipeline
```
```shell
# If you are working on JASMIN, you will need to load jaspy first
module load jaspy
python -m venv asli_env
source asli_env/bin/activate
```
To install all dependencies, including the `asli` package, run:

```shell
pip install -r requirements.txt
```
If you are working on JASMIN, it is worth familiarising yourself with managing software environments on JASMIN.
The `asli` package will not be able to download ERA5 data without access to the Copernicus Climate Data Store (CDS). Follow these instructions to set up CDS API access: How to Use The CDS API.
```shell
nano $HOME/.cdsapirc
# Paste in your {uid} and {api-key}
```
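If you prefer to create the file non-interactively, the same result can be sketched with a heredoc. The values below are placeholders; the exact URL and key format to use are shown on your CDS profile page after you register:

```shell
# Write a CDS API config file with placeholder credentials.
# Replace {uid} and {api-key} with the values from your CDS profile page;
# check the CDS documentation for the current url/key format.
cat > "$HOME/.cdsapirc" <<'EOF'
url: https://cds.climate.copernicus.eu/api/v2
key: {uid}:{api-key}
EOF
```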
This pipeline revolves around an `ENVS` file that provides the necessary configuration items. Derive a new file from `ENVS.example`, then symbolically link it to `ENVS`. Comments in `ENVS.example` will assist you with the editing process.
```shell
cp ENVS.example ENVS.myconfig
ln -sf ENVS.myconfig ENVS
# Edit ENVS.myconfig to customise parameters for the pipeline
```
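Assuming, as in the `icenet-pipeline` template, that `ENVS` is a shell fragment sourced by the run script, it is simply a set of variable assignments. The variable names below are hypothetical and for illustration only; the real names and their documentation live in `ENVS.example`:

```shell
# Hypothetical ENVS-style configuration; the actual variable names
# used by the pipeline are defined and documented in ENVS.example.
OUTPUT_DIR="$HOME/asli-output"   # local destination for pipeline output
VENV_PATH="$HOME/asli_env"       # virtual environment the pipeline activates
S3_BUCKET="s3://asli-bucket"     # JASMIN Object Store bucket, if used
```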
The pipeline allows data output to the JASMIN Object Store, a local file system, or both, depending on where you are running this pipeline and which output file formats you would like to use.
The pipeline uses `s3cmd` to interact with S3-compatible object storage. If you configure your data to be written to the JASMIN Object Store, you will need to configure `s3cmd` to access your object storage tenancy and bucket. You will need to generate an access key and store it in a `~/.s3cfg` file. Full instructions on how to generate an access key on JASMIN and an `s3cfg` file to use with `s3cmd` are in the JASMIN documentation.
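For orientation only, a minimal `~/.s3cfg` for an S3-compatible endpoint has the shape sketched below. The endpoint and keys here are placeholders, not real JASMIN values; use the values for your own tenancy from the JASMIN documentation:

```
[default]
access_key = YOUR_ACCESS_KEY
secret_key = YOUR_SECRET_KEY
host_base = my-tenancy.example-object-store.ac.uk
host_bucket = my-tenancy.example-object-store.ac.uk
use_https = True
```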
If you require data to be copied to a different location (e.g. the BAS SAN, for archival into the Polar Data Centre), you can configure this destination in `ENVS`. The pipeline will then `rsync` your output to that location.
Before running the pipeline, make sure you have followed the steps above:

- Cloned the pipeline.
- Set up your environment.
- Installed `asli`.
- Set up CDS API access with `.cdsapirc`.
- Set configurations in `ENVS.myconfig` and symbolically linked it to `ENVS`.
- Set configurations for the Object Store in `.s3cfg`.
You can now run the pipeline:

```shell
deactivate  # your environment is set in ENVS, so you do not need to activate it manually
bash run_asli_pipeline.sh
```
A cron example has been provided in the `cron.example` file.

```shell
crontab -e
# Then edit the file, for example to run once a month:
0 3 1 * * cd $HOME/boost-eds-pipeline && bash run_asli_pipeline.sh; deactivate
# OR on JASMIN, using crontamer:
0 3 1 * * crontamer -t 2h -e [email protected] 'cd gws/nopw/j04/dit/users/thozwa/boost-eds-pipeline && bash run_asli_pipeline.sh; deactivate'
```
For more information on using cron on JASMIN, see Using Cron in the JASMIN documentation, and the crontamer package. The purpose of `crontamer` is to prevent multiple instances of the same process from starting; it also times out after the period given by `-t` and emails on error.
If you need to submit this pipeline to SLURM (for example on JASMIN), you will need to provide `sbatch` options for the SLURM queue. We have not included `sbatch` headers in our script; however, you can pass the options when you call the executable script:

```shell
# Submitting a job to the short-serial partition on JASMIN
sbatch -p short-serial -t 03:00 -o job01.out -e job01.err run_asli_pipeline.sh
```
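Alternatively, the same options can be written as `#SBATCH` headers in a small wrapper script and submitted with `sbatch`. A minimal sketch, using the same partition and limits as the command-line example above:

```shell
#!/bin/bash
#SBATCH --partition=short-serial
#SBATCH --time=03:00
#SBATCH --output=job01.out
#SBATCH --error=job01.err
bash run_asli_pipeline.sh
```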
The following describes an example deployment setup for this pipeline, carried out under the BOOST-EDS project.
We use a JASMIN group workspace (GWS) to run the data processing pipeline. ERA5 data is read in via the Copernicus Climate Data Store API. Calculations are then performed on LOTUS using `asli` functions. Output data is stored on the JASMIN Object Store, where it is read in and displayed by an application hosted on Datalabs.
This means compute, data storage and application hosting are all separated.
Each component listed above could also be deployed on different suitable infrastructures, for example BAS HPCs or commercial cloud providers.
The results of this pipeline are displayed in an application hosted on Datalabs.
Follow this tutorial to see how Datalabs and the JASMIN Object Store interact.
If you use this pipeline in your work, please cite this repository using the 'Cite this repository' button at the top right of this repository.
This work used JASMIN, the UK’s collaborative data analysis environment (https://www.jasmin.ac.uk).
Brown, M. J., & Chevuturi, A. object_store_tutorial [Computer software]. https://github.com/NERC-CEH/object_store_tutorial
Byrne, J., Ubald, B. N., & Chan, R. icenet-pipeline (Version v0.2.9) [Computer software]. https://github.com/icenet-ai/icenet-pipeline
Hosking, J. S., A. Orr, T. J. Bracegirdle, and J. Turner (2016), Future circulation changes off West Antarctica: Sensitivity of the Amundsen Sea Low to projected anthropogenic forcing, Geophys. Res. Lett., 43, 367–376, doi:10.1002/2015GL067143.
Hosking, J. S., & Wilby, D. asli [Computer software]. https://github.com/scotthosking/amundsen-sea-low-index
Lawrence, B. N., Bennett, V. L., Churchill, J., Juckes, M., Kershaw, P., Pascoe, S., Pepler, S., Pritchard, M., & Stephens, A. (2013). Storing and manipulating environmental big data with JASMIN. In: IEEE Big Data, October 6-9, 2013, San Francisco.