
Database Schema

Shashank Huddedar edited this page Aug 16, 2023 · 2 revisions

Goscheduler uses Cassandra as its database. The schema is divided into two keyspaces:

schedule_management

This keyspace contains all the data related to one-time schedules, recurring schedules, status, etc.

schedules

Purpose: Stores one-time schedule information. Application pollers query this table every minute to check whether any schedules are due to be triggered.

Columns:

  • app_id: The application ID.
  • partition_id: The partition ID.
  • schedule_time_group: The scheduled time rounded to the minute.
  • schedule_id: The unique identifier for the schedule.
  • callback_type: The type of callback.
  • callback_details: The details of the callback.
  • payload: The payload data.
  • schedule_time: The scheduled time.
  • parent_schedule_id: The parent schedule ID. Applies to cron use cases.

Primary Key: ((app_id, partition_id, schedule_time_group), schedule_id)

Clustering Order: schedule_id DESC
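The per-minute poll described above can be sketched as follows. This is a minimal illustration, not Goscheduler's actual code: it assumes schedule times are handled as Unix epoch seconds, that "rounded to a minute level" means truncating to the start of the minute, and that a poller reads one full partition per minute. The function name scheduleTimeGroup and the constant pollQuery are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// scheduleTimeGroup returns the minute bucket for a schedule time.
// Assumption: the bucket is the schedule time truncated (rounded down)
// to the start of its minute, in UTC.
func scheduleTimeGroup(epochSeconds int64) time.Time {
	return time.Unix(epochSeconds, 0).UTC().Truncate(time.Minute)
}

// pollQuery is a hypothetical CQL statement. It restricts every column of
// the partition key (app_id, partition_id, schedule_time_group), so a
// poller reads exactly one partition per minute.
const pollQuery = `SELECT schedule_id, callback_type, callback_details, payload, schedule_time
FROM schedule_management.schedules
WHERE app_id = ? AND partition_id = ? AND schedule_time_group = ?`

func main() {
	// Any schedule falling within the same minute maps to the same bucket.
	group := scheduleTimeGroup(1692181845)
	fmt.Println(group.Format(time.RFC3339))
	fmt.Println(pollQuery)
}
```

Because the time bucket is part of the partition key, each poll touches a single, bounded partition rather than scanning the table.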

view_schedules (Materialized View)

Purpose: Provides an alternative view of the schedules. Mainly used to query schedules by schedule_id.

Primary Key: (schedule_id, app_id, partition_id, schedule_time_group)

Clustering Order: app_id ASC, partition_id ASC, schedule_time_group ASC

status

Purpose: Stores the status of the schedules.

Columns:

  • app_id: The application ID.
  • partition_id: The partition ID.
  • schedule_time_group: The scheduled time rounded to the minute.
  • schedule_id: The unique identifier for the schedule.
  • schedule_status: The status of the schedule.
  • error_msg: The error message, if any.
  • reconciliation_history: The reconciliation history.

Primary Key: ((app_id, partition_id), schedule_id)

Clustering Order: schedule_id DESC

recurring_schedules_by_partition

Purpose: Stores information about recurring schedules by partition. A special app with a fixed number of pollers is configured to query this table every minute. If the current time matches a schedule's cron_expression, a one-time schedule is created for that time.

Columns:

  • app_id: The application ID.
  • partition_id: The partition ID.
  • schedule_id: The unique identifier for the schedule.
  • callback_type: The type of callback.
  • callback_details: The details of the callback.
  • payload: The payload data.
  • cron_expression: The cron expression for recurring schedules.
  • status: The status of the recurring schedule.

Primary Key: (partition_id, schedule_id, app_id)
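The cron check performed on each poll can be illustrated with a deliberately simplified matcher. This sketch is an assumption-laden stand-in, not the cron parser Goscheduler uses: it assumes a five-field expression (minute, hour, day of month, month, day of week), supports only "*", single numbers, and comma-separated lists, and evaluates times in UTC. The names fieldMatches and cronDue are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// fieldMatches reports whether one cron field accepts value.
// Only "*", single numbers, and comma-separated lists are handled;
// ranges and step values are intentionally left out of this sketch.
func fieldMatches(field string, value int) bool {
	if field == "*" {
		return true
	}
	for _, part := range strings.Split(field, ",") {
		if n, err := strconv.Atoi(part); err == nil && n == value {
			return true
		}
	}
	return false
}

// cronDue reports whether expr matches the minute containing epochSeconds.
// Simplification: all five fields must match, whereas standard cron treats
// day-of-month and day-of-week as alternatives when both are restricted.
func cronDue(expr string, epochSeconds int64) bool {
	fields := strings.Fields(expr)
	if len(fields) != 5 {
		return false
	}
	t := time.Unix(epochSeconds, 0).UTC()
	values := []int{t.Minute(), t.Hour(), t.Day(), int(t.Month()), int(t.Weekday())}
	for i, field := range fields {
		if !fieldMatches(field, values[i]) {
			return false
		}
	}
	return true
}

func main() {
	// 1692181800 is 2023-08-16 10:30:00 UTC.
	fmt.Println(cronDue("30 10 * * *", 1692181800))
	fmt.Println(cronDue("31 10 * * *", 1692181800))
}
```

When the check succeeds, the poller would create a one-time schedule for that minute, with parent_schedule_id pointing back at the recurring schedule.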

recurring_schedules_by_id

Purpose: Stores information about recurring schedules by ID. Mainly used to get recurring schedules by schedule_id.

Columns:

  • app_id: The application ID.
  • partition_id: The partition ID.
  • schedule_id: The unique identifier for the schedule.
  • callback_type: The type of callback.
  • callback_details: The details of the callback.
  • payload: The payload data.
  • cron_expression: The cron expression for recurring schedules.
  • status: The status of the recurring schedule.

Primary Key: (schedule_id)

recurring_schedule_runs

Purpose: Stores information about the runs of recurring schedules. This table stores the parent-child mapping between each cron schedule and its individual runs.

Columns:

  • app_id: The application ID.
  • partition_id: The partition ID.
  • schedule_time_group: The scheduled time rounded to the minute.
  • schedule_id: The unique identifier for the schedule.
  • callback_type: The type of callback.
  • callback_details: The details of the callback.
  • payload: The payload data.
  • schedule_time: The scheduled time.
  • parent_schedule_id: The parent schedule ID.

Primary Key: (parent_schedule_id, schedule_time_group)

Clustering Order: schedule_time_group DESC

cluster

This keyspace contains all the metadata, i.e. onboarded apps, poller-to-node mapping, etc.

apps

Purpose: Stores information about applications within the cluster. This table stores all the onboarded client apps, their resource quota (number of pollers), and their status.

Columns:

  • id: The application ID.
  • partitions: The number of partitions.
  • active: Whether the application is active or not.

Primary Key: (id)

entity

Purpose: Stores information about cluster entities. This table is used during node bootstrap: each node reads the table and, with the help of the Ringpop library, decides whether it should start a given poller entity locally.

Columns:

  • id: The poller ID, formed by concatenating app_id and partition_id. For example, an app named test with 5 partitions has the IDs test.0, test.1, ..., test.4.
  • nodename: The ringpop node on which the poller entity is running.
  • status: The status of the entity.
  • history: The history of the entity.

Primary Key: (id)
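The poller ID format described for the id column can be sketched in a few lines. This is an illustration only; the helper name pollerIDs is hypothetical, and it assumes partitions are numbered from 0, as in the test.0 to test.4 example.

```go
package main

import "fmt"

// pollerIDs returns one entity id per partition, in the form
// "<app_id>.<partition_id>", with partitions numbered from 0.
func pollerIDs(appID string, partitions int) []string {
	if partitions <= 0 {
		return nil
	}
	ids := make([]string, 0, partitions)
	for p := 0; p < partitions; p++ {
		ids = append(ids, fmt.Sprintf("%s.%d", appID, p))
	}
	return ids
}

func main() {
	fmt.Println(pollerIDs("test", 5)) // prints [test.0 test.1 test.2 test.3 test.4]
}
```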

nodes (Materialized View)

Purpose: Provides an alternative view of the cluster nodes.

Columns:

  • nodename: The ringpop node on which the poller entity is running.
  • id: The poller ID.
  • status: The status of the entity.

Primary Key: (nodename, id)