New execution logic: module level sequential execution and group all #185

Open
gaow opened this issue Apr 28, 2019 · 3 comments

Comments

@gaow
Member

gaow commented Apr 28, 2019

@willwerscheid was wondering about a scenario where, for a large data set, we can load the data once and subsample from it many times to simulate smaller data sets. This boils down to some sequential execution logic, e.g.:

simulate:
   seed: R{1:10}
   data: '/path/to/data'
   ...

and we execute this module sequentially rather than in parallel (or parallelize it within a single R session) so that the data are loaded only once.

The biggest challenge is that we would then have to move the for loop to the module-script level (which is language specific) rather than handling it at the DSC level. This is a fundamental change that the existing code cannot easily be adapted to. But I can see the appeal of the request, so we need to think about how best to do it.
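To make the idea concrete, here is a minimal sketch of what moving the loop to the module-script level could look like, written in Python purely for illustration. `load_data`, `subsample`, and `simulate_all` are hypothetical helpers, not part of DSC, and the loader returns a placeholder list instead of actually reading `/path/to/data`.

```python
import random

LOAD_COUNT = 0  # counts how many times the (expensive) load runs


def load_data(path):
    """Hypothetical stand-in for an expensive load; the path is not actually read."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return list(range(1000))


def subsample(data, n, seed):
    """Draw n items without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(data, n)


def simulate_all(path, seeds, n=50):
    """Load once, then loop over the seeds inside the module script."""
    data = load_data(path)
    return {seed: subsample(data, n, seed) for seed in seeds}


# Seeds 1..10, mirroring `seed: R{1:10}` in the module above.
results = simulate_all("/path/to/data", seeds=range(1, 11))
```

The point of the sketch is only that the ten replicates share a single call to `load_data`; how DSC would track each replicate as a separate module instance is exactly the open design question.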

@pcarbo
Member

pcarbo commented Apr 28, 2019

Why is it important to load the data only once?

@pcarbo
Member

pcarbo commented Apr 28, 2019

@willwerscheid I think if loading a data set multiple times is a big issue, you should consider: (1) finding a way to make the data loading run faster (e.g., by saving in an efficient format), or (2) having a single module that creates all the data subsets in one go, so that the subsets can then be loaded in a separate module that is replicated many times.

This seems to me more a question of how best to design your DSC, and I think it can be accomplished with the existing DSC features.
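Option (2) can be sketched roughly as follows, in Python for illustration. The function names, the JSON file format, and the temporary directory are assumptions made for this example, not DSC features: one step writes every subset in a single pass over the large data set, and a second, cheap step reads back one small subset per replicate.

```python
import json
import random
import tempfile
from pathlib import Path


def create_subsets(data, seeds, n, out_dir):
    """First module (run once): write one subset file per seed."""
    out_dir = Path(out_dir)
    paths = {}
    for seed in seeds:
        subset = random.Random(seed).sample(data, n)
        path = out_dir / f"subset_{seed}.json"
        path.write_text(json.dumps(subset))
        paths[seed] = path
    return paths


def load_subset(path):
    """Second module (replicated many times): read back one small subset."""
    return json.loads(Path(path).read_text())


full_data = list(range(1000))  # placeholder for the large data set, loaded once
work_dir = tempfile.mkdtemp()
subset_paths = create_subsets(full_data, seeds=range(1, 11), n=50, out_dir=work_dir)
first_subset = load_subset(subset_paths[1])
```

With this split, only the first step ever touches the large data set; each replicate of the second step pays only the cost of reading a small file.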

@gaow changed the title from "Sequentially executed modules" to "DSC execution logic" on May 2, 2019
@gaow
Member Author

gaow commented May 2, 2019

@pcarbo I agree with your assessment, although it is not completely impossible to address this at a higher DSC level. I'm thinking of addressing things like that in DSC 2.0, along with the map-reduce notion that in the end all results flow to one node. A third thing worth doing is to allow for multiple outcomes per module instance; that is the best way to address the issue of benchmarking with command-line tools in, say, bash.
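As a rough illustration of the map-reduce notion, here is a small Python sketch. `run_instance` and `collect` are hypothetical names and the simulated statistic is arbitrary; it only shows independent instances (the "map" step) whose results all flow into one collecting node (the "reduce" step), and says nothing about how DSC 2.0 would express this.

```python
import random
import statistics


def run_instance(seed):
    """'Map' step: one independent module instance with its own seed."""
    rng = random.Random(seed)
    draws = [rng.gauss(0, 1) for _ in range(100)]
    return {"seed": seed, "mean": statistics.mean(draws)}


def collect(outcomes):
    """'Reduce' step: results from every instance flow into one node."""
    means = [o["mean"] for o in outcomes]
    return {"n_instances": len(outcomes), "mean_of_means": statistics.mean(means)}


outcomes = [run_instance(seed) for seed in range(1, 11)]
summary = collect(outcomes)
```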

In any case, points 2 and 3 are not relevant to @willwerscheid's initial question, but they are related in that they are exceptions or extensions to the parallel execution paradigm. So I'd like to keep this ticket open as a reminder to myself when I re-evaluate and design some of the execution logic down the road.

@gaow changed the title from "DSC execution logic" to "New execution logic: module level sequential execution and group all" on Oct 2, 2020