
Separating heavy / light jobs on cluster #204

Open
gaow opened this issue Nov 21, 2019 · 2 comments

Comments

@gaow
Member

gaow commented Nov 21, 2019

In a benchmark there are heavy computations, where for example each module instance takes a few minutes, and light computations, where each instance takes a fraction of a second. Currently we have a mechanism to specify this, so that heavy computations are submitted as jobs on the cluster and lighter ones run directly on the node from which the jobs are submitted.

However, the limitation here is that the smaller jobs still have to run on a single node, e.g., the login node, and there is limited control over the resources they use, e.g., the number of CPU threads, memory (there is at least some control over memory) and walltime. It is not good practice to run computations on a login node anyway. A possible way out would be to parse the benchmark and use a dedicated compute node for these light jobs, where resource usage is still under control, but without a per-job queue, thus avoiding most of the interaction (and overhead) with the queue system.
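A minimal sketch of the light-job idea, assuming the light module instances can be expressed as standalone commands and that this script already runs on a compute node reserved once (e.g., through a single batch or interactive allocation). The names `run_light_job` and `run_light_jobs` and their parameters are hypothetical and not part of DSC. CPU threads are capped with the `OMP_NUM_THREADS` convention (honored by OpenMP and many BLAS builds), walltime with the `timeout` argument of `subprocess.run`; memory limits are not covered here.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_light_job(cmd, threads_per_job=1, walltime_s=60.0):
    """Run one light job as a subprocess with a capped thread count and walltime.

    Raises subprocess.TimeoutExpired if the job exceeds walltime_s.
    """
    env = dict(os.environ)
    # Convention for capping OpenMP / BLAS threads in the child process.
    env["OMP_NUM_THREADS"] = str(threads_per_job)
    result = subprocess.run(
        cmd, env=env, capture_output=True, text=True, timeout=walltime_s
    )
    return result.returncode, result.stdout.strip()


def run_light_jobs(cmds, max_parallel=4, threads_per_job=1, walltime_s=60.0):
    """Run many light jobs locally, at most max_parallel at a time.

    Nothing is submitted to the queue system per job: the node is assumed
    to be reserved already, and jobs are dispatched directly on it.
    Results are returned in the same order as cmds.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [
            pool.submit(run_light_job, cmd, threads_per_job, walltime_s)
            for cmd in cmds
        ]
        return [f.result() for f in futures]


if __name__ == "__main__":
    # Hypothetical light "module instances": tiny Python one-liners.
    jobs = [[sys.executable, "-c", f"print({i} * {i})"] for i in range(8)]
    for code, out in run_light_jobs(jobs, max_parallel=4):
        print(code, out)
```

Threads are used only to wait on subprocesses, so the actual computation happens in separate processes whose count is bounded by `max_parallel`.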

@pcarbo
Member

pcarbo commented Nov 21, 2019

An even better solution is to run the dsc command on a compute node.

@gaow
Member Author

gaow commented Nov 21, 2019

@pcarbo True. My proposed solution is essentially an extension of that: reserving multiple compute nodes, not just one, to run these jobs. The difference from submitting jobs is that a fixed number of compute nodes are reserved up front for light jobs throughout the entire DSC run, whereas currently each module has to reserve nodes, run its jobs and give up the reservation, after which other modules come in to reserve new nodes, which has higher overhead.
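The overhead argument can be made concrete with a toy model that counts only the time spent waiting on the queue system for reservations. The function `queue_overhead_s` and the numbers used (10 modules, about 2 minutes of queue wait per reservation) are hypothetical illustrations, not measurements from DSC.

```python
def queue_overhead_s(n_modules: int, wait_per_reservation_s: float, upfront: bool) -> float:
    """Total queue wait for light jobs under a toy model.

    Per-module: each module reserves nodes, runs, and releases them,
    so there is one reservation (one queue wait) per module.
    Up front: a fixed pool of nodes is reserved once for the whole run.
    """
    reservations = 1 if upfront else n_modules
    return reservations * wait_per_reservation_s


if __name__ == "__main__":
    # Hypothetical numbers: 10 modules, ~120 s queue wait per reservation.
    per_module = queue_overhead_s(10, 120.0, upfront=False)
    up_front = queue_overhead_s(10, 120.0, upfront=True)
    print(per_module, up_front)  # 1200.0 120.0
```

The model leaves out the opposite cost: nodes reserved up front may sit idle between modules, so the saving is largest when many modules each have short bursts of light jobs.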
