system: Introduce Kueue as an Optional Job Submission Approach #584

Open
xaviertintin opened this issue May 15, 2024 · 0 comments

This feature introduces Kueue as an optional alternative to the existing Kubernetes Job API for submitting user jobs in REANA. This enhances flexibility for REANA admins by allowing them to choose the most suitable job submission method for their specific requirements.

Goals:

  • Configurable Job Submission: Empower REANA administrators to configure Kueue as the job submission method during deployment by leveraging Helm values.
  • Automated Kueue Deployment: Deploy Kueue into the cluster automatically when an admin selects the Kueue option in their Helm values file.
  • Customizable Kueue: Provide basic configuration options for the Kueue deployment through Helm values and/or chart snippets, catering to specific admin preferences.
  • Resource Constraint Compatibility: Ensure all user-defined job resource constraints (e.g., memory limits) are fully supported when using Kueue. This includes thorough testing of job limits and requests to verify their proper transmission through Kueue.
  • Transparent Scheduling Feedback: Guarantee that scheduling outcomes, covering both successful scheduling and errors due to unavailable resources, are clearly communicated back to users in the workflow logs for both the standard and Kueue approaches.
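
To illustrate the scheduling feedback goal, here is a minimal sketch of how status conditions could be turned into a workflow log line. It assumes the conditions are already available as plain dictionaries: `PodScheduled` is the standard Kubernetes Pod condition, and `QuotaReserved` is a condition Kueue sets on its Workload objects. The function name `scheduling_message` and the message wording are hypothetical, not part of REANA.

```python
def scheduling_message(conditions):
    """Return a log line for the first condition that signals a
    scheduling problem, or None if nothing looks wrong.

    `conditions` is a list of dicts shaped like Kubernetes status
    conditions: {"type": ..., "status": ..., "message": ...}.
    """
    for cond in conditions:
        ctype = cond.get("type")
        status = cond.get("status")
        details = cond.get("message", "no details")
        # Standard path: the Pod could not be placed on any node.
        if ctype == "PodScheduled" and status == "False":
            return f"Job could not be scheduled: {details}"
        # Kueue path (assumption): the Workload has not been granted quota yet.
        if ctype == "QuotaReserved" and status == "False":
            return f"Job is waiting in the Kueue queue: {details}"
    return None
```

A caller could append the returned string to the workflow logs whenever it is not `None`.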

Implementation Overview (Conceptual):

  • The GitHub project with the implementation steps and discussions that led up to this integration can be found here.
  • The repository with integration commands can be found here.
  • Official documentation is expected to be released by June 2024 at the latest.

Component Updates:

1. reana:

  • Helm Value Configuration - (reana/helm/reana/values.yaml):
    • Introduce a new Helm value (e.g., kueueEnabled) to enable/disable Kueue usage.
  • Environment Variable Configuration - (reana/helm/reana/templates/reana-workflow-controller.yaml):
    • Define an environment variable (e.g., KUEUE_ENABLED) to reflect the Helm value setting.

2. reana-workflow-controller:

  • Retrieve the environment variable (KUEUE_ENABLED) in (reana-workflow-controller/reana_workflow_controller/config.py).
  • Select the appropriate job submission method based on the deployment type in (reana-workflow-controller/reana_workflow_controller/workflow_run_manager.py):
    • Standard: Utilize the existing Kubernetes Job API.
    • Kueue: Implement Kueue job submission logic.
  • Pass the chosen deployment type to downstream components (job_controller_env_vars).
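
The steps above could be sketched roughly as follows. This is only an illustration: the helper names `kueue_enabled` and `build_job_controller_env_vars`, the accepted truthy strings, and the default of `false` are all assumptions, and the real `config.py` and `workflow_run_manager.py` may structure this differently.

```python
import os


def kueue_enabled(environ=os.environ):
    """Read KUEUE_ENABLED and interpret it as a boolean.

    Assumption: the Helm template renders the value as a string such as
    "true" or "false"; anything unrecognised counts as disabled.
    """
    raw = environ.get("KUEUE_ENABLED", "false")
    return raw.strip().lower() in ("1", "true", "yes", "on")


def build_job_controller_env_vars(environ=os.environ):
    """Build the env var entries forwarded to the job controller container.

    Hypothetical helper: returns Kubernetes-style {"name", "value"} dicts
    so that reana-job-controller sees the same setting.
    """
    value = "true" if kueue_enabled(environ) else "false"
    return [{"name": "KUEUE_ENABLED", "value": value}]
```

Normalising the value to `"true"` or `"false"` before forwarding it means the downstream component only has to handle two spellings.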

3. reana-job-controller:

  • Access the environment variable (KUEUE_ENABLED) in (reana-job-controller/reana_job_controller/config.py).
  • Select the appropriate job submission method based on the deployment type in (reana-job-controller/reana_job_controller/kubernetes_job_manager.py):
    • Standard: Utilize the existing Kubernetes Job API.
    • Kueue: Implement Kueue job submission logic.
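
As a rough sketch of what the two submission paths might look like, the example below builds a `batch/v1` Job manifest as a plain dictionary. For the Kueue path it relies on Kueue's documented convention of labelling the Job with `kueue.x-k8s.io/queue-name` and creating it suspended, so that Kueue can unsuspend it on admission. The function name `build_job_manifest`, the container name `job`, and the LocalQueue name `user-queue` are placeholders; the issue does not say how `kubernetes_job_manager.py` will actually implement this.

```python
KUEUE_QUEUE_LABEL = "kueue.x-k8s.io/queue-name"


def build_job_manifest(name, image, command, memory_limit=None,
                       kueue_enabled=False, local_queue="user-queue"):
    """Return a batch/v1 Job manifest for either submission path.

    `local_queue` is a hypothetical LocalQueue name; a real deployment
    would take it from configuration.
    """
    resources = {}
    if memory_limit:
        # Set both requests and limits so the constraint is carried
        # through regardless of the submission path.
        resources = {
            "requests": {"memory": memory_limit},
            "limits": {"memory": memory_limit},
        }
    manifest = {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "labels": {}},
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": "job",
                        "image": image,
                        "command": command,
                        "resources": resources,
                    }],
                    "restartPolicy": "Never",
                }
            }
        },
    }
    if kueue_enabled:
        # Kueue path: point the Job at a LocalQueue and start it suspended.
        manifest["metadata"]["labels"][KUEUE_QUEUE_LABEL] = local_queue
        manifest["spec"]["suspend"] = True
    return manifest
```

Because Kueue accounts for quota using the containers' resource requests, setting requests explicitly is what lets user-defined memory constraints take part in queueing decisions.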