Workflow scheduling

Vladyslav Moisieienkov edited this page Aug 1, 2022 · 2 revisions

This page describes how workflow scheduling works in the REANA platform.

Contents

  1. Introduction
  2. workflow-submission queue
  3. Scheduling strategies
  4. Workflow scheduler

Introduction

In general, workflows are scheduled in the following way:

  1. The user asks reana-server to start a workflow;
  2. reana-server calculates the workflow priority and complexity based on the configured scheduling strategy;
  3. reana-server publishes a message to the workflow-submission queue with the workflow details, priority, and complexity;
  4. The workflow scheduler picks up the message from the queue and checks whether the workflow can be scheduled;
  5. If the workflow can be scheduled, the workflow scheduler sends a request to reana-workflow-controller to start the workflow;
  6. If the workflow cannot be scheduled, the workflow scheduler either:
    • publishes a message to the workflow-submission queue to try again; or
    • fails the workflow.
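
The decision taken in steps 4 to 6 can be sketched in Python as follows. This is a minimal illustration, not the scheduler's actual code: Submission, Decision, decide, and the max_retries limit are hypothetical names. The rule that a workflow fails once a retry limit is reached is also an assumption, since the steps above do not say when the scheduler requeues and when it fails.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Decision(Enum):
    START = "start"      # ask reana-workflow-controller to start the workflow
    REQUEUE = "requeue"  # publish the message to workflow-submission again
    FAIL = "fail"        # mark the workflow as failed


@dataclass(frozen=True)
class Submission:
    """Hypothetical in-memory view of a workflow-submission message."""
    workflow_id_or_name: str
    priority: int
    retry_count: int = 0


def decide(submission, can_be_scheduled, max_retries=3):
    """Return the scheduler action and the (possibly updated) message.

    max_retries is an assumed limit used only for this illustration.
    """
    if can_be_scheduled:
        return Decision.START, submission
    if submission.retry_count < max_retries:
        retried = replace(submission, retry_count=submission.retry_count + 1)
        return Decision.REQUEUE, retried
    return Decision.FAIL, submission
```

For example, decide(Submission("my-analysis", 50), False) returns Decision.REQUEUE together with a copy of the message whose retry_count is 1.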

In the following sections, we will go deeper into the details of each step.

Tip: The Architecture page provides an overview diagram of the REANA platform that can be helpful when reading this page.

workflow-submission queue

This queue is used to submit workflows to the scheduler.

  • publishes to the queue: reana-server
  • consumes from the queue: workflow scheduler

Message schema:

{
  "$id": "reana/workflow-submission-message.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "workflow-submission message",
  "description": "Describes workflow submission message for scheduler",
  "type": "object",
  "properties": {
    "user": {
      "description": "The unique UUID identifier for a user",
      "type": "string"
    },
    "workflow_id_or_name": {
      "description": "The unique UUID identifier or name for a workflow",
      "type": "string"
    },
    "priority": {
      "description": "Priority number of the workflow",
      "type": "integer"
    },
    "min_job_memory": {
      "description": "Minimum memory requested by a single job of the workflow",
      "type": "integer"
    },
    "parameters": {
      "type": "object"
    },
    "retry_count": {
      "description": "Number of times the workflow submission was retried",
      "type": "integer"
    }
  },
  "required": ["user", "workflow_id_or_name", "priority", "min_job_memory"]
}
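
As an illustration, a message satisfying this schema could look like the dictionary below. The values are made up, and expressing min_job_memory in bytes is an assumption. The validation_errors function is a hypothetical helper that uses only the standard library and checks just the required fields and the integer types; a full JSON Schema validator would check more.

```python
REQUIRED_FIELDS = ("user", "workflow_id_or_name", "priority", "min_job_memory")
INTEGER_FIELDS = ("priority", "min_job_memory", "retry_count")


def validation_errors(message):
    """Return a list of problems found in a workflow-submission message."""
    errors = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in message
    ]
    for name in INTEGER_FIELDS:
        if name not in message:
            continue
        value = message[name]
        # bool is a subclass of int in Python, but it is not a JSON integer.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"field {name} must be an integer")
    return errors


# Made-up example values; min_job_memory is assumed to be in bytes.
example_message = {
    "user": "00000000-0000-0000-0000-000000000000",
    "workflow_id_or_name": "my-analysis",
    "priority": 50,
    "min_job_memory": 4 * 1024 ** 3,
    "parameters": {},
    "retry_count": 0,
}
```

Calling validation_errors(example_message) returns an empty list, while a message without the priority field yields one "missing required field" entry.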

This is a priority queue: messages with a higher integer in the priority field should be consumed first.
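
The consumption order can be illustrated with a small in-memory stand-in for the queue. This sketch is not how REANA implements the queue: SubmissionQueue is a hypothetical class built on Python's heapq, and serving messages of equal priority in arrival order is an assumption.

```python
import heapq
import itertools


class SubmissionQueue:
    """Hypothetical in-memory queue: highest priority is consumed first."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def publish(self, message):
        # heapq is a min-heap, so the priority is negated to pop the
        # largest value first. The arrival counter breaks ties in FIFO
        # order and prevents Python from comparing the message dicts.
        entry = (-message["priority"], next(self._arrival), message)
        heapq.heappush(self._heap, entry)

    def consume(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = SubmissionQueue()
queue.publish({"workflow_id_or_name": "a", "priority": 10})
queue.publish({"workflow_id_or_name": "b", "priority": 90})
queue.publish({"workflow_id_or_name": "c", "priority": 10})
order = [queue.consume()["workflow_id_or_name"] for _ in range(3)]
```

Here order ends up as ["b", "a", "c"]: the message with priority 90 is consumed first, followed by the two priority-10 messages in arrival order.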

Scheduling strategies

Currently, REANA supports two scheduling strategies:

  • fifo, first-in first-out strategy, starting workflows as they come;

  • balanced, a weighted strategy taking into account existing multi-user workloads and the complexity of incoming workflows.

Workflow complexity

Workflow complexity is an internal REANA concept that helps decide which workflow to schedule when the balanced strategy is used. It expresses how many jobs the workflow would like to start and how much memory each individual job would consume.

Symbolically, a workflow complexity value looks like [(4, 4G), (3, 2G)], meaning that when the given workflow starts, it would like to launch 4 jobs of 4 GB RAM each and 3 jobs of 2 GB RAM each.
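
To make the arithmetic concrete, the value [(4, 4G), (3, 2G)] can be written as a list of (number of jobs, memory in bytes) tuples. In the sketch below, reading "G" as GiB is an assumption, and total_jobs, total_memory, and min_job_memory are hypothetical helper names rather than reana-server functions.

```python
GIB = 1024 ** 3  # assumption: "4G" is read as 4 GiB

# (number of jobs, memory per job in bytes)
complexity = [(4, 4 * GIB), (3, 2 * GIB)]


def total_jobs(complexity):
    """Number of jobs the workflow would like to launch."""
    return sum(jobs for jobs, _ in complexity)


def total_memory(complexity):
    """Memory in bytes needed if all those jobs run at the same time."""
    return sum(jobs * memory for jobs, memory in complexity)


def min_job_memory(complexity):
    """Memory in bytes of the smallest individual job."""
    return min(memory for _, memory in complexity)
```

For this value, the workflow would launch 7 jobs needing 4 × 4 + 3 × 2 = 22 GiB in total, and its smallest job needs 2 GiB.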

The workflow complexity numbers for a given workflow can be obtained by parsing the workflow DAG specification and studying how many jobs will be started in parallel upon launch and how much kubernetes_memory_limit each job asks for.
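
The last part of that computation can be sketched as follows, assuming the kubernetes_memory_limit values of the jobs that start in parallel have already been extracted from the DAG. The helpers memory_to_bytes and complexity_from_limits are hypothetical and are not the reana-server implementation; only a subset of the Kubernetes quantity suffixes is handled.

```python
from collections import Counter

# Binary (Ki, Mi, ...) and decimal (k, M, ...) Kubernetes memory suffixes.
_UNITS = {
    "Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3, "Ti": 1024 ** 4,
    "k": 1000, "M": 1000 ** 2, "G": 1000 ** 3, "T": 1000 ** 4,
}


def memory_to_bytes(limit):
    """Convert a memory limit string such as '4Gi' to a number of bytes."""
    # Try two-letter suffixes before one-letter ones.
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if limit.endswith(suffix):
            return int(float(limit[: -len(suffix)]) * _UNITS[suffix])
    return int(limit)  # a plain number is already in bytes


def complexity_from_limits(initial_job_limits):
    """Group the initially started jobs into (count, bytes) tuples."""
    counts = Counter(memory_to_bytes(limit) for limit in initial_job_limits)
    pairs = [(jobs, memory) for memory, jobs in counts.items()]
    # Largest jobs first, purely for readability.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


limits = ["4Gi", "4Gi", "4Gi", "4Gi", "2Gi", "2Gi", "2Gi"]
result = complexity_from_limits(limits)
```

With these seven limits, result contains two tuples: 4 jobs of 4294967296 bytes and 3 jobs of 2147483648 bytes.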

How to work with workflow complexity outside of a REANA cluster?

Although the workflow complexity logic belongs to reana-server, it can be tested without a running cluster by importing the appropriate functions in a Python shell:

$ mkvirtualenv foo
$ pip install ../reana-client ../reana-server ipython
$ cd ../reana-demo-root6-roofit
$ ipython

and then in the Python REPL:

In [1]: from reana_client.utils import load_reana_spec

In [2]: from reana_server.complexity import estimate_complexity

In [3]: reana_yaml = load_reana_spec('./reana.yaml')
==> Verifying REANA specification file... ./reana.yaml
  -> SUCCESS: Valid REANA specification file.
==> Verifying REANA specification parameters...
  -> SUCCESS: REANA specification parameters appear valid.
==> Verifying workflow parameters and commands...
  -> SUCCESS: Workflow parameters and commands appear valid.
==> Verifying dangerous workflow operations...
  -> SUCCESS: Workflow operations appear valid.

In [4]: from reana_server import complexity as reana_server_complexity

In [5]: reana_server_complexity.REANA_COMPLEXITY_JOBS_MEMORY_LIMIT = '4Gi'

In [6]: estimate_complexity('serial', reana_yaml)
Out[6]: [(1, 4294967296.0)]