[WIP] config as PR for discussion #10
base: master

@@ -0,0 +1,86 @@
#### General Settings ####
# system-level settings
settings:
  title: My Very Cool Project
  author: Bob

  # path-like settings
  data: /data/bob/original_data

  # conda environment names
  python2: py2.7

  # names to access specific envs by
  env: HOME

Review comment: Would …
Reply: For sure I would leave this out initially; I was trying to think of all possible settings. We may come across some misc Perl program or something that needs these envs, but that would also be at the rule level.
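
A minimal sketch of how these general settings might be read by the controlling snake file; saving the draft above as config.yaml is an assumption:

```python
# Load the draft config and pull out the general settings.
import yaml

with open("config.yaml") as fh:
    config = yaml.safe_load(fh)

settings = config["settings"]
print(settings["title"])    # "My Very Cool Project"
print(settings["data"])     # base path for raw input data
print(settings["python2"])  # conda environment name for python 2 rules
```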

#### Experiment Level Settings ####
# experiment-level settings: settings that apply to all samples
exp.settings:

  # sample information relating sample-specific settings to sample ids
  sampleinfo: sample_metadata.csv

  # It would be nice to be able to define a setting here that applies to all
  # samples, or to define it per sample in the sampleinfo table in case they
  # differ.
  fastq_suffix: '.fastq.gz'
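
One way the per-sample override could work, sketched below; the sample_metadata.csv column names used here (sample_id, fastq_suffix) are assumptions, since the table layout is still up for discussion:

```python
# Per-sample overrides with an experiment-level fallback.
# Column names (sample_id, fastq_suffix) are assumed for illustration.
import pandas as pd
import yaml

config = yaml.safe_load(open("config.yaml"))  # the draft above
exp = config["exp.settings"]
sampleinfo = pd.read_csv(exp["sampleinfo"]).set_index("sample_id")

def fastq_suffix(sample):
    # Prefer a per-sample value from the table, otherwise the global default.
    if "fastq_suffix" in sampleinfo.columns:
        value = sampleinfo.loc[sample, "fastq_suffix"]
        if pd.notnull(value):
            return value
    return exp["fastq_suffix"]
```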

  # Need some way to specify which annotation to use; maybe here is not the
  # best place.
  annotation:

Review comment: How do we handle cases where we want to try multiple annotations? For example, to compare a truncated gene-model annotation with the full one. Maybe the answer there is just to symlink the upstream files over to a new analysis dir rather than increase complexity here.
Reply: This is the big problem, and I have not come up with any creative solutions; the iterative junctions for splicing, for example. I guess the easier question is: how do we want to define an annotation that is used throughout all analysis, and how do we quickly switch between FlyBase releases?
Reply: Yeah, the iterative junctions are going to get tricky. I have some ideas about that, but they need more testing. Currently, swapping FlyBase versions should be as easy as swapping annotation filenames. I think the bigger question is whether we should support multiple "flavors" of annotation in a single workflow. For now I'm leaning towards supporting just a single one to keep things simple, and having downstream rules concatenate de novo junctions onto that single annotation.

    genic: /data/...
    transcript: /data/...
    intergenic: /data/...
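
As a small illustration of the "swap filenames to switch releases" point above, the annotation paths could be resolved through one helper so a FlyBase release change only touches the three paths in this block (a sketch, reusing the assumed config.yaml filename):

```python
# Resolve annotation files from the config; switching releases means editing
# only the paths in the annotation block above.
import yaml

config = yaml.safe_load(open("config.yaml"))

def annotation(kind):
    """Return the path for 'genic', 'transcript', or 'intergenic'."""
    return config["exp.settings"]["annotation"][kind]

# rule inputs could then use annotation("transcript"), annotation("genic"), ...
```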

  # add modeling information here
  models:
    formula: '~ sex + tissue + time'
    # which columns in the sample table should be treated as factors
    factors:
      - sex
      - tissue
      - time

Review comment: I think the annotations and models should go in workflow.rnaseq. Also, I find myself using different models; this could be implemented by having multiple models and using jinja-templated R scripts.
Reply: I don't really care what we do here; mostly I just use this step to build an Rdata file for use in my analysis notebooks.
Reply: Hmm, how about just having an Rdata file as the designated endpoint, rather than trying to get fancy with script-building? Though I've found it useful to have a first pass of MA plots and DEG counts as another kind of QC to put in reports ("did the experiment work?") before doing custom work. So maybe just supporting a single model is fine. Anyway, I think this is an easy thing to work out later; no need to worry about it now.
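
If the jinja-templated R script idea is pursued, it might look roughly like the sketch below; the template body, the output filename, and the use of DESeq2 are illustrative assumptions, not decisions:

```python
# Render a small R script from the models block using jinja2.
# Template text, output filename, and DESeq2 usage are assumptions.
import yaml
from jinja2 import Template

config = yaml.safe_load(open("config.yaml"))
exp = config["exp.settings"]
model = exp["models"]

template = Template("""\
# 'counts' is assumed to be built upstream of this generated script
coldata <- read.csv("{{ sampleinfo }}", row.names = 1)
{% for f in factors %}coldata${{ f }} <- factor(coldata${{ f }})
{% endfor %}dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = coldata,
                              design = {{ formula }})
save(dds, file = "model.Rdata")
""")

with open("deseq_model.R", "w") as out:
    out.write(template.render(
        sampleinfo=exp["sampleinfo"],
        factors=model["factors"],
        formula=model["formula"],
    ))
```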

#### Workflow Settings ####
# I think using a naming scheme that follows the folder structure would be
# useful. For example: if there is a workflows folder then we could define
# workflow-specific settings.
workflows.qc:
  # List the pieces of the pipeline to run (or perhaps the ones not to run;
  # that may be better)
  steps_to_run:
    - fastqc
    - rseqc
  # or could have boolean switches to change workflow behavior
  trim: True

Review comment: Maybe a default config could be created that has all the parts that can be run, which can then be commented out on an experiment-by-experiment basis. That way it's easy for the user to pick and choose which parts to run, and it increases discoverability of the rules.
Reply: Thinking. I think I would like a 'run' parameter for each rule, but it may not matter. What about mutually exclusive rules?
Reply: Good point about mutually exclusive rules. Maybe we should see what the set of likely-to-be-optional rules looks like, and decide then what the config should be. I'd imagine some rules shouldn't be optional (aligning, counting reads in features, sorting BAMs), so whatever config method we use should probably not allow them to be disabled. I don't have this sort of thing in my existing pipelines, so I don't have any use cases to think about; can you give some examples of what kinds of rules would be disable-able?
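
To make that trade-off concrete, a sketch of how steps_to_run could gate optional targets while a mandatory core stays un-disableable; the step names in MANDATORY and the target filename patterns are placeholders:

```python
# Build the target list from steps_to_run; mandatory steps cannot be
# switched off. Step names and target patterns are placeholders.
import yaml

config = yaml.safe_load(open("config.yaml"))
qc = config["workflows.qc"]

MANDATORY = {"align", "sort_bam"}
TARGETS = {
    "align": "align/{sample}.bam",
    "sort_bam": "align/{sample}.sorted.bam",
    "fastqc": "qc/{sample}_fastqc.html",
    "rseqc": "qc/{sample}.rseqc.txt",
}

enabled = MANDATORY | set(qc["steps_to_run"])
targets = [patt for step, patt in TARGETS.items() if step in enabled]
if qc.get("trim", False):
    targets.append("trimmed/{sample}.fastq.gz")
```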

workflows.align:
  # define which software to use and, optionally, which version
  aligner: 'tophat2=2.1.0'

Review comment: Wrappers can have versions, but I think currently only at the git commit level rather than at the tool level. I like having the version specified here with the rest of the config, so maybe we can find a way for this to work.
Reply: This would be icing on top, but should be low priority. Maybe we could create a pre-parser that pulls out the version and updates the conda envs. Or maybe there is some way to hijack the wrapper system with some sort of commit-id look-up table.
Reply: OK, agreed on low priority, but really good ideas on how to make it work. That could be really powerful.
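
The "pre-parser" idea could be as simple as the sketch below: pull the pinned version out of the aligner string and regenerate a conda environment file before running the workflow (the output path and channel list are assumptions):

```python
# Pre-parser sketch: turn "tophat2=2.1.0" from the config into a conda
# environment file. Output path and channels are assumptions.
import os
import yaml

config = yaml.safe_load(open("config.yaml"))
tool, _, version = config["workflows.align"]["aligner"].partition("=")

env = {
    "channels": ["bioconda", "conda-forge"],
    "dependencies": [f"{tool}={version}" if version else tool],
}
os.makedirs("envs", exist_ok=True)
with open("envs/align.yaml", "w") as out:
    yaml.safe_dump(env, out, default_flow_style=False)
```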

  aggregated_output_dir: /data/...
  report_output_dir: /data/...

workflow.rnaseq: ...

workflows.references: ...

#### Rule Specific Settings ####
# rule-level settings, again with naming based on the folder structure if we
# need folder structure
rules.align.bowtie2:
  # It would be nice to keep cluster settings with the rule settings; I can't
  # think of a way to get this to work, so we probably just need a separate
  # cluster config.

Review comment: If we had a wrapper for the actual call to snakemake, we could extract this information and build a cluster config file on the fly, so this info could remain here. That would be really convenient for configuration, but at the cost of extra complexity in the wrapper.
Reply: I think it is worth it.

  cluster:
    threads: 16
    mem: 60g
    walltime: '8:00:00'  # quoted so YAML keeps it as a string
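
A rough sketch of the wrapper idea from the thread above: collect the per-rule cluster blocks from this config into a Snakemake cluster config and pass it on the command line (the rule-name mapping, output filename, and sbatch template are assumptions):

```python
# Wrapper sketch: build cluster_config.yaml from the rules.* cluster blocks,
# then call snakemake with it. Rule-name mapping, filenames, and the sbatch
# template are assumptions.
import subprocess
import yaml

config = yaml.safe_load(open("config.yaml"))

cluster_config = {}
for key, value in config.items():
    if key.startswith("rules.") and isinstance(value, dict) and "cluster" in value:
        rule_name = key.split(".")[-1]  # e.g. "rules.align.bowtie2" -> "bowtie2"
        cluster_config[rule_name] = value["cluster"]

with open("cluster_config.yaml", "w") as out:
    yaml.safe_dump(cluster_config, out, default_flow_style=False)

subprocess.run(
    ["snakemake", "--cluster-config", "cluster_config.yaml",
     "--cluster", "sbatch --cpus-per-task={cluster.threads} "
                  "--mem={cluster.mem} --time={cluster.walltime}"],
    check=True,
)
```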

  # bowtie index prefix
  index: /data/...
  # Access to any parameters that need to be set
  params:
    # place to change the options
    options: -p 16 -k 8

Review comment: Have to play around with how to add this to the rules. I'm pretty sure I've done this before; I'll have to dig up how.
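
For reference, one way these settings could plug into a rule, sketched as a Snakefile fragment; the wildcard name, output paths, and samtools post-processing step are placeholders:

```python
# Snakefile fragment: wire the bowtie2 settings from the config into a rule.
# Wildcard names, output paths, and the samtools step are placeholders.
configfile: "config.yaml"

bt2 = config["rules.align.bowtie2"]

rule bowtie2:
    input:
        fastq="fastq/{sample}" + config["exp.settings"]["fastq_suffix"]
    output:
        bam="align/{sample}" + bt2["aln_suffix"]
    log:
        "align/{sample}" + bt2["log_suffix"]
    params:
        options=bt2["params"]["options"],
        index=bt2["index"]
    threads: bt2["cluster"]["threads"]
    shell:
        "bowtie2 {params.options} -x {params.index} -U {input.fastq} 2> {log} "
        "| samtools view -Sb - > {output.bam}"
```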

  # place to change how files are named
  aln_suffix: '.bt2.bam'
  log_suffix: '.bt2.log'

# vim: sw=2 ts=2

Review comment (on the data setting above): Is this the path to the fastqs, or do indexes go here as well? I feel like this is too general a setting, and that indexes, annotations, etc. can just specify the full path. And maybe we can have a regexp for fastqs.
Reply: Agreed, the fastq regex can go into the controlling snake file.
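
That "regex in the controlling snake file" could be as simple as the sketch below, which maps sample ids to fastq paths under the data directory; the flat directory layout is an assumption, and a real regex could replace the glob if the naming is more involved:

```python
# Discover fastqs under the configured data path and map sample id -> file.
# Assumes fastqs sit directly under data: with names like <sample><suffix>.
import glob
import os
import yaml

config = yaml.safe_load(open("config.yaml"))
data_dir = config["settings"]["data"]
suffix = config["exp.settings"]["fastq_suffix"]

fastqs = {}
for path in glob.glob(os.path.join(data_dir, "*" + suffix)):
    sample = os.path.basename(path)[: -len(suffix)]
    fastqs[sample] = path
```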