DSC current technical summary and some engineering aims #3
The code in question for us to reimplement is:

More technical details are now at #6. This project overview ticket will no longer be updated.
The document below summarizes the main components of the current DSC software design, and where I'd like to see improvements first.
Introduction

- DSC: Dynamic statistical comparisons
  - Goal
  - Typical workflow
  - Challenge
- DSC implementation
  - Simple yaml-like syntax powered by a workflow system
  - `dscquery()`: an R function

In this document I'll show a very simple toy, focusing on discussions relevant to the use of SoS.
Example

A simple DSC script: `test.dsc`
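The script itself isn't reproduced here, but a minimal sketch of what `test.dsc` could look like, consistent with the module names (`normal`, `mean`, `sq_err`) and parameters (`mu`, `n`) referenced later in this document, is:

```yaml
# Illustrative sketch only, not necessarily the exact test.dsc:
# simulate normal data, estimate the mean, and score the squared error.
normal: R(x <- rnorm(n, mean = mu))
  mu: 0
  n: 100
  $data: x
  $true_mean: mu

mean: R(est <- mean(x))
  x: $data
  $est_mean: est

sq_err: R(err <- (est - truth)^2)
  est: $est_mean
  truth: $true_mean
  $error: err

DSC:
  run: normal * mean * sq_err
```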
A simple DSC script: `test.dsc` (cont'd)

Run the benchmark
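As a rough sketch (results go to an output directory typically named after the script):

```sh
# Run the benchmark defined in test.dsc;
# add --replicate 5 for the larger benchmark mentioned below.
dsc test.dsc
dsc test.dsc --replicate 5
```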
Get benchmark results
Explore the results
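A hedged sketch of pulling results into R with `dscrutils::dscquery()`; the target names below assume the sketch of `test.dsc` above and an output directory named `test`:

```r
# Load selected module parameters and outputs into a data frame, then peek at it.
library(dscrutils)
res <- dscquery(dsc.outdir = "test",
                targets = c("normal.n", "normal.mu", "sq_err.error"))
head(res)
```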
SoS under the hood

Run DSC via SoS

The `dsc` program will generate two SoS workflows:

- `prepare`: to generate benchmark meta-data files
- `run`: to run the benchmark; `for_each` loops correspond to ordering of parameters

DSC will run these workflows using the `execute_workflow()` function.
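As a toy illustration (not DSC's actual generated code), the sketch below shows how an SoS step can loop over parameter values with `for_each` and how a workflow can be launched programmatically, assuming SoS's `execute_workflow()` entry point accepts the script text and a workflow name; the step and variable names are made up:

```python
from sos import execute_workflow

script = r"""
# Parameter lists; for_each creates one substep per combination, so the
# ordering of parameters determines the ordering of substeps.
mu = [0]
n = [100]

[simulate]
input: for_each = ['mu', 'n']
output: f'normal_{_index + 1}.rds'
R: expand = True
    x <- rnorm({_n}, mean = {_mu})
    saveRDS(x, '{_output}')
"""

# Run the toy workflow, analogous to how DSC drives its prepare/run workflows.
execute_workflow(script, workflow="simulate")
```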
Obtain SoS scripts from DSC script

These scripts are uploaded to the GitHub repo gaow/random-nbs (click me). Let's focus on `test_run.sos` for now and discuss its design. What can be done to improve it? Or should we revamp? The use of dynamic targets and step dependencies is characteristic of the current implementation. They are related to the DSC meta-data design (see below).
DSC meta-data design

DSC outputs: meta-data

There are 3 files:

- `map.mpk`: mapping between a substep's hash and a filename
- `conf.mpk`: input and output file names for workflow substeps
- `db`: meta-data for the `dscquery()` function (irrelevant to SoS for now)

DSC outputs: benchmark data
These are the outputs of each workflow substep. Notice that here I use a smaller benchmark by not running `dsc` with `--replicate 5`.

Structure of `map.mpk`

The basic idea is to first represent each substep as `substep:HASH:upstream:HASH:upstream:HASH`, where `HASH` encodes everything for a "module" except the contents of the module script (see the examples below), then map them to nicer-looking filenames with numbers as indices.
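For concreteness, a small sketch of peeking at this mapping with the `msgpack` Python package (the `test/` output directory name is an assumption); the example entries in the comments are the ones quoted below:

```python
import msgpack

# Load the "substep hash -> filename" mapping.
with open("test/map.mpk", "rb") as f:
    name_map = msgpack.unpack(f, raw=False)

# Entries are expected to look roughly like:
#   "normal:3fce637f"               -> "normal/normal_1.rds"
#   "mean:e3f9ad83:normal:3fce637f" -> "mean/normal_1_mean_1.rds"
for key, filename in list(name_map.items())[:5]:
    print(key, "->", filename)
```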
Structure of `map.mpk` (cont'd)

Each HASH is a combination of all parameters (parameter names and values) of a "module instance", eg `normal:6702ef96` may refer to the `normal` module with `mu=0, n=100`.

Structure of `map.mpk` (cont'd)
File dependencies can be figured out from the HASH in `map.mpk`, eg the key for `sq_err/normal_1_mean_1_sq_err_1.rds` (of the form `sq_err:HASH:mean:e3f9ad83:normal:3fce637f`) tells us that this file depends on:

- `normal:3fce637f`, which is `normal/normal_1.rds`
- `mean:e3f9ad83:normal:3fce637f`, which is `mean/normal_1_mean_1.rds`
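A hypothetical helper along these lines could recover such dependencies, assuming the key format `module:HASH:upstream:HASH:...` described above (the `sq_err` hash is written as `HASH` because it is not shown here):

```python
def upstream_files(key, name_map):
    """List the files a substep's output depends on, given its map.mpk key."""
    parts = key.split(":")
    # Skip the leading "module:HASH" pair; every remaining suffix starting at
    # an "upstream:HASH" pair is itself a key in map.mpk.
    return [name_map[":".join(parts[i:])] for i in range(2, len(parts), 2)]

# Using the two entries quoted above:
name_map = {
    "normal:3fce637f": "normal/normal_1.rds",
    "mean:e3f9ad83:normal:3fce637f": "mean/normal_1_mean_1.rds",
}
print(upstream_files("sq_err:HASH:mean:e3f9ad83:normal:3fce637f", name_map))
# ['mean/normal_1_mean_1.rds', 'normal/normal_1.rds']
```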
Structure of `conf.mpk`

`conf.mpk` saves step dependency, input and output for every subworkflow.
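Its exact layout isn't spelled out here, but it can be inspected the same way as `map.mpk` (path assumed):

```python
import msgpack

with open("test/conf.mpk", "rb") as f:
    conf = msgpack.unpack(f, raw=False)
# Inspect the top-level structure to see the per-subworkflow records.
print(list(conf)[:5])
```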
Pros of current design

- `dscquery()` function
  - `dscquery()` function example: see earlier in this document
  - `*.db` file, which uses the `*.mpk` files to build a SQLite database

Cons of current design
Future work on meta-data

- Stop using the "HASH to numbered filename" mapping
- Find some way to save output as `{step_name}_{substep_hash}`
- Figure out how to rebuild the SQLite database to work with `dscquery()`
- The only issue left is that substep output filenames are no longer meaningful
- `dscquery()` should be able to load a particular example

DSC next steps
Some technical aims

Related to SoS:

- `RDS` and `pickle` files

Not related to SoS:

- `A -> B -> A -> B ...`
- `group_by` logic in DSC. We need this logic designed for the interface, implemented for the SoS underneath it, and supported in queries.
This (possibly obsolete) slide
https://github.com/gaow/random-nbs/tree/master/slides/20191201_DSC_SoS
To generate the PDF file, first download this repo (click me) then run: