Our final output is 06-ml-pipeline-in-htcondor-executor.ipynb. The other Jupyter notebooks are intermediate steps, one for each challenge we expected to face:
- 01-intro-to-htcondor-python.ipynb: Trial to use Python bindings for HTCondor
- 02-running-sklearn-in-condor-executor.ipynb: Trial to execute a Python script that uses libraries like `pandas` and `sklearn` in the HTCondor Executor
- 03-running-sklearn-with-docker-in-condor-executor.ipynb: Exploration of the Docker runtime for HTCondor jobs. We are not using this in the final pipeline because we are still stuck on accessing the NFS filesystem with this method. See more info at ./setup_docs/Extra/Docker Runtime Setup Guide on.md
- 04-create-dag-file-via-python.ipynb: Trial to create a DAG file via the Python library API `htcondor.dags`
- 05-loading-csv-from-nfs-in-htcondor-executor.ipynb: Trial to read and write CSV files in the HTCondor Executor from a shared NFS folder
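
To give a feel for what a DAG file contains, here is a minimal sketch that writes DAGMan input text by hand with plain Python. It does not use `htcondor.dags` and is not the code from the notebooks; the stage names (`preprocess`, `train`, `evaluate`) and the `.sub` file names are hypothetical placeholders.

```python
from typing import Dict, List


def render_dag(jobs: Dict[str, str], edges: Dict[str, List[str]]) -> str:
    """Render DAGMan input text.

    jobs maps a node name to its submit file; edges maps a parent node
    to the list of child nodes that must wait for it to finish.
    """
    # One "JOB <name> <submit file>" line per node.
    lines = [f"JOB {name} {submit_file}" for name, submit_file in jobs.items()]
    # One "PARENT <parent> CHILD <children...>" line per dependency group.
    for parent, children in edges.items():
        if children:
            lines.append(f"PARENT {parent} CHILD {' '.join(children)}")
    return "\n".join(lines) + "\n"


# Hypothetical three-stage pipeline; names and .sub files are placeholders.
jobs = {
    "preprocess": "preprocess.sub",
    "train": "train.sub",
    "evaluate": "evaluate.sub",
}
edges = {"preprocess": ["train"], "train": ["evaluate"]}

dag_text = render_dag(jobs, edges)
print(dag_text)
```

Saved to a file, text like this could be handed to `condor_submit_dag`, assuming the referenced submit files exist.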
If you're interested in our step-by-step setup guide, see:
- Part 1 - HTCondor's Installation
- Part 2 - Install NFS File System
- Part 3 - Install Jupyter notebook on HTCondor Submit
- Part 4 - Managing HTCondor Jobs
For the challenges we ran into during the setup, see: Part 5 - Challenges we have met
So, back to our topic:
- Below is the machine learning workflow we're running via the HTCondor ecosystem: