Separating heavy / light jobs on cluster #204
Comments
An even better solution is to run the |
@pcarbo true. My proposed solution is essentially an extension of it: reserve multiple compute nodes, not just one, and run these jobs on them. The difference from submitting jobs is that a fixed number of compute nodes is reserved up front for light jobs throughout the entire DSC. Currently, each module has to reserve nodes, run its jobs, and give up the reservation before other modules come in and reserve new nodes, which has higher overhead.
In a benchmark there are heavy computations, where for example each module instance takes a few minutes, and light computations, where each instance takes a fraction of a second. Currently we have a mechanism to specify this, such that heavy computations are submitted as jobs on the cluster and lighter computations run directly on the node from which jobs are submitted.
However, the limitation is that the lighter jobs still have to run on a single node, e.g. the login node, and there is limited control over the resources they use, e.g. number of CPU threads, memory (there is at least some control over memory) and walltime. It is not good practice to run computations on a login node anyway. A possible way out would be to parse the benchmark and use a dedicated compute node for these light jobs, where resource usage is still under control but without a per-job queue submission, thus avoiding most of the interaction (overhead) with the queue system.
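To make the idea concrete, here is a minimal sketch of the split, not DSC's actual implementation. The `Job` class, the `est_seconds` runtime estimate, the 60-second threshold and the `max_workers` cap are all hypothetical names and values chosen for illustration. Heavy jobs would still go to the queue system; light jobs are run together inside one worker pool, which stands in for a single reserved compute node with a bounded number of threads.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Job:
    """Hypothetical stand-in for one module instance in a benchmark."""
    name: str
    est_seconds: float          # assumed runtime estimate, not a DSC field
    run: Callable[[], object]   # the computation itself


def split_jobs(jobs: List[Job],
               threshold_seconds: float = 60.0) -> Tuple[List[Job], List[Job]]:
    """Partition jobs into (heavy, light) by estimated runtime."""
    heavy = [j for j in jobs if j.est_seconds >= threshold_seconds]
    light = [j for j in jobs if j.est_seconds < threshold_seconds]
    return heavy, light


def run_light_jobs(light: List[Job], max_workers: int = 4) -> Dict[str, object]:
    """Run all light jobs in one bounded pool, with no per-job queue submission.

    A thread pool is used only to keep the sketch self-contained; real module
    instances would more likely be launched as separate processes.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda j: j.run(), light))
    return {j.name: r for j, r in zip(light, results)}


# Example: one heavy job and two light jobs.
jobs = [
    Job("simulate", 300.0, lambda: "sim"),
    Job("score", 0.2, lambda: 1 + 1),
    Job("summarize", 0.5, lambda: "ok"),
]
heavy, light = split_jobs(jobs)
light_results = run_light_jobs(light, max_workers=2)
```

Here `heavy` contains only `simulate`, which would be submitted to the cluster as usual, while `score` and `summarize` share a single pool. The `max_workers` cap plays the role of the CPU-thread limit; memory and walltime limits would come from the node reservation itself.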