Ability to define and use Global Constants #532
I am not very fond of global variables/constants. They lead to a lot of "Where the **** does that variable come from?!?!" shouting at computer screens and endless … In your actual use case it seems much better to do this at the execution engine level. Just create a wrapper program, myshell.sh,
and configure cromwell/miniwdl to use myshell.sh as the default shell. |
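A rough sketch of such a wrapper follows. It assumes the engine launches each task as "<shell> <script>" and that the debug snippet is supplied through an environment variable; the name DEBUG_CMD is invented here and is not an existing cromwell or miniwdl setting.

```shell
#!/usr/bin/env bash
# myshell.sh: hypothetical wrapper an engine could be configured to use as
# its task shell. ASSUMPTIONS: the engine runs "<shell> <script>", and the
# debug snippet arrives in DEBUG_CMD (a name invented for this sketch).

myshell() {
  if [ -n "${DEBUG_CMD:-}" ]; then
    # Run the debug snippet first. Its stdout is sent to stderr so the task's
    # own stdout is untouched, and a failing snippet cannot fail the task.
    bash -c "$DEBUG_CMD" >&2 || true
  fi
  # Hand the original arguments (normally the task script) to bash and
  # return its exit status unchanged.
  bash "$@"
}

# Dispatch only when called with arguments, as an engine would call it.
if [ "$#" -gt 0 ]; then
  myshell "$@"
  exit $?
fi
```

With this approach, removing the debug command later is a configuration change rather than an edit to every task. The trade-off is that it requires control over the engine's shell setting.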
Adjusting the launch shell is possible, but not with a managed workflow engine (such as one behind a GA4GH endpoint), or where your system administrator doesn't want you changing a system that's used by many others. To address the "where does it come from?" question, I think using imports and namespaces should be required if the constant is declared in another file (this also avoids name conflicts). |
I agree with @rhpvorderman. A primary goal of WDL is readability. Readability is sometimes sacrificed when it is outweighed by the potential benefit, but I don't see that being the case with globals. |
One thing I might consider is some form of inheritance. For example:
version development
task Super {
  input {
    String cmd
  }
  command <<<
    ~{cmd}
  >>>
}
task Sub extends Super {
  input {
    File foo
  }
  command <<<
    ~{cmd}
    cp ~{foo} bar
  >>>
}
Inheritance would work via a simple merge: the declaration with a given name at the lowest level replaces any with the same name at a higher level (and overriding a variable cannot change its type). With inheritance it is at least clear where a task inherits from: another task that is in the same file or is explicitly imported from another file. I could see this replacing a lot of boilerplate in my own WDL tasks, but it also carries some risk of unintended behavior, so I'm not sure whether it would be worth the tradeoff. |
Wouldn't you still need to call |
Yes, you'd still need to explicitly call each task with … Another place where things could be simplified is in the …
workflow foo {
  input {
    String bar
  }
  # these are the same
  call mytask { input: bar = bar }
  call mytask { input: bar }
}
We could further introduce syntax (e.g. …):
task SuperTask {
  input {
    String cmd
  }
  command <<<
    ~{cmd}
  >>>
}
task SubTask extends SuperTask {
  input {
    Int baz
  }
}
workflow SuperWf {
  input {
    String cmd
  }
}
workflow SubWf extends SuperWf {
  # the '...' means that the `cmd` input to this workflow will be passed to the
  # `cmd` input of mytask
  call mytask {
    input: baz = 1, ...
  }
}
Now you only have to pass … I'm guessing this would get shot down for introducing more complexity than it's worth. |
Another approach that would work for my case (and maybe yours) would be to add a |
Where would those sections go? Would you put them in every task? Or would you put them in the workflow and expect every task to inherit them? |
I'll propose yet another option that may be more palatable to everyone. In …
workflow A {
  call B
  call C
  hints {
    global: {
      cmd: "echo hi"
    }
  }
}
task B {
  # echoes "hi"
  command <<<
    ~{wdl.hints.global.cmd}
  >>>
}
task C {
  # echoes "bye"
  command <<<
    ~{wdl.hints.global.cmd}
  >>>
  hints {
    global: {
      cmd: "echo bye"
    }
  }
}
Keep in mind that in WDL 1.1 we introduced the ability to specify … The only pre-requisite would be to make the … |
Hints are optional and can be ignored if the engine chooses (I think). I don't know if a global constant should be optional. |
For hints, "optional" means the engine doesn't have to do anything with them. If we add runtime access to |
I am struggling to see the problem that is solved here. Is this a big deal that warrants extra language constructs if a simple configuration in the execution engine can solve it?
Yes, but would you be debugging your workflow on a system that is used by many others? Wouldn't you do that on your own system? |
Lots of scenarios -
Would I debug (or profile or whatever) on my own system? Unlikely. Most workflows run in distributed cloud environments; replicating this on my own system is only possible at very small scale and doesn't reveal what happens at larger scale, when failures can occur due to the vagaries of large distributed systems under stress. Many workflow developers will not be in a position to deploy and configure their own test environments: many of those I interact with are not given sufficient admin privileges to deploy these kinds of environments, and really shouldn't need them. |
So you want to inject code … Is that a correct problem definition? In that case, the least invasive option I see is adding two runtime options:
These can be arbitrary commands that are inserted before or after the task. By default they are empty strings. You can set runtime options globally by using defaults. It is not as flexible as what you propose, but it is easy to implement, easy to understand, and does not require tremendous changes to the language. For more flexibility you still have the option to have a |
I think these would be valuable additions. Another valuable option would be … We would need a few rules about what to do if any of these fail and what return code should eventually be reported to the engine.
There may be (probably are) more rules needed that I haven't thought of yet. |
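To make the failure question concrete, here is one hypothetical way an engine could wrap a task command with the proposed before/after options (referred to in this thread as run_before). Neither option exists in WDL. The function name run_task and the return-code rules in the comments are assumptions chosen for illustration, not rules anyone in the thread agreed on.

```shell
#!/usr/bin/env bash
# Hypothetical wrapping of a task command with the proposed run_before /
# run_after runtime options. The rules below are ONE possible answer to
# "what if these fail?", not a specification:
#   1. An empty string means "no hook" (the proposed default).
#   2. If run_before fails, the task and run_after are skipped and
#      run_before's exit code is reported.
#   3. run_after runs whenever the task ran, but its exit code is ignored.
#   4. Otherwise the task's own exit code is reported to the engine.
run_task() {
  local before="$1" task="$2" after="$3" rc=0
  if [ -n "$before" ]; then
    bash -c "$before" || return "$?"
  fi
  bash -c "$task" || rc=$?
  if [ -n "$after" ]; then
    bash -c "$after" || true
  fi
  return "$rc"
}
```

With both hooks left empty, run_task reduces to running the task itself. Rules 2 to 4 are exactly the points that would need agreement before such options could be specified.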
Oh dear, this run_before thing is getting too complex for my tastes already.
That seems preferable if the alternative is adding complex debugging code to the language that is going to be rarely used, or worse, regularly abused. If these things are added to the language, people will start relying on them and a lot of WDL's simplicity will be lost. The problem can already be solved right now with existing WDL. I think the use case is rare: it only arises when the bug cannot be reproduced on one's own system without access to the production system. This should not occur often when using containers, since much of the environment is fixed in that case. I do not have numbers on this, only my own experience on the cluster. Even on the HPC cluster, the cluster gives me back enough information that I do not need this sort of functionality to solve the problem. Adding new language functionality seems to me quite a heavy-handed solution to such a rare problem. Not that I want to diminish any problems you have. How many failing tasks that need to be debugged are we talking about, exactly? |
A mechanism to provide a global constant that can be used in a workflow block, task block, or task command block, even if the constant isn't made part of the task input. Example use case:
A debug bash command snippet that I want to add to my tasks, declared at the top level like
String cmd = "foo baa -v"
and then in each task have something like: …
I can do it if I define String cmd in each task, or have String cmd as an input of each task, but that makes removing the debug command later more cumbersome.
Pat Magee has suggested the following pattern for declaring them: …
There are use cases for global variables (that are not constant), although those could be added later.