Module conflicts in workflows with a diamond-shape file structure #2894

Open
mramospe opened this issue May 23, 2024 · 0 comments
Labels
bug Something isn't working

Snakemake version

8.11.6
7.32.4
main (74627d3)

Describe the bug

When a workflow imports files as modules in a diamond shape, the rules loaded through the intermediate files seem to interfere with each other. Consider a file of general rules, common.smk; two intermediate files for two different processes/studies, first.smk and second.smk, each of which imports common.smk; and a final file, all.smk, which collects the main results of the previous two and renames their rules to avoid clashes. If we run Snakemake with all.smk as the Snakefile, the rules imported from common.smk in first.smk and second.smk interfere, even though all.smk renames them.

If the intermediate files use the syntax use rule * from common, Snakemake appears to keep only the rules of common.smk as imported by the last file that loads them, although the rules can still be accessed within each file through the rules object. If we instead write use rule * from common as *, the execution works fine. This looks more like a bug than a feature, or at least a design flaw.
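The suspicion above, that the last file to import common.smk wins, can be pictured with a toy registry. This is only a model of the hypothesis, not Snakemake's implementation: `ToyWorkflow`, `use_rules`, `producers` and `common_rules` are hypothetical names made up for illustration.

```python
class ToyWorkflow:
    """Hypothetical flat registry: rule name -> output pattern (NOT Snakemake code)."""

    def __init__(self):
        self.rules = {}

    def use_rules(self, module_rules):
        # A later import with the same rule name silently replaces the earlier one.
        self.rules.update(module_rules)

    def producers(self, path_prefix):
        # Names of rules whose output pattern starts with the given directory prefix.
        return [name for name, out in self.rules.items() if out.startswith(path_prefix)]


def common_rules(analysis):
    # Stand-in for instantiating common.smk with a given "analysis" config value.
    return {"write": "data/" + analysis + "/input_value_{value}.txt"}


wf = ToyWorkflow()
wf.use_rules(common_rules("first"))   # as imported by first.smk
wf.use_rules(common_rules("second"))  # as imported by second.smk

first_producers = wf.producers("data/first/")
second_producers = wf.producers("data/second/")
print(first_producers, second_producers)
```

In this toy model no rule is left that can produce files under data/first/, which would be consistent with a missing-input error for the first study. Whether Snakemake behaves this way internally is an assumption, not something verified here.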

Probably related to #1872, #2729, #2838

Minimal example

Define the general file containing a rule that creates a file, where part of the path depends on the provided configuration:

common.smk

import os

rule write:
    output: os.path.join('data', config['analysis'], 'input_value_{value}.txt')
    params: config_value=config['value']
    shell: 'echo {params.config_value} >> {output}'

Then make two separate files corresponding to two different studies, which simply make an alias to the file created with the rule in common.smk:

first.smk

import os

module common:
    snakefile: './common.smk'
    config: {"analysis": "first", "value": 1}

#use rule * from common as * # <--- works
use rule * from common # <--- fails if using the file all.smk

rule result:
    input: expand(rules.write.output, value=1)
    output: os.path.join('data', 'first', 'result.txt')
    shell: 'ln -srf {input} {output}'

second.smk

import os

module common:
    snakefile: './common.smk'
    config: {"analysis": "second", "value": 2}

#use rule * from common as * # <--- works
use rule * from common # <--- fails if using the file all.smk

rule result:
    input: expand(rules.write.output, value=2)
    output: os.path.join('data', 'second', 'result.txt')
    shell: 'ln -srf {input} {output}'

Finally, declare the file that collects all the results:

all.smk

module first:
    snakefile: './first.smk'
    config: config

module second:
    snakefile: './second.smk'
    config: config

use rule * from first as first_*
use rule * from second as second_*

assert(rules.first_write is not rules.second_write) # they always exist and they are always different, as expected

# if using "use rule * from common" in "first.smk" and "second.smk", the input
# files of the first result cannot be resolved correctly
rule all:
    input: rules.first_result.output, rules.second_result.output

Here we also propagate some configuration values, to verify that they remain different when the rules are run from all.smk. The expectation is that the commands

snakemake -s first.smk result -j1
snakemake -s second.smk result -j1

provide the same output as

snakemake -s all.smk all -j1

but this is only true if first.smk and second.smk load the rules with use rule * from common as * instead of use rule * from common. Otherwise, the following error is raised:

MissingInputException in rule first_result in file [MASKED]/first.smk, line 10:
Missing input files for rule first_result:
    output: data/first/result.txt
    affected files:
        data/first/input_value_1.txt

which suggests that somehow the rules that were imported from common.smk inside first.smk are not being considered.
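To make the expectation concrete, the sketch below computes the file that the write rule should create for each study, mirroring the os.path.join expression from common.smk and the config dictionaries from first.smk and second.smk. The helper name `write_output` is hypothetical and only serves this illustration.

```python
import os


def write_output(config, value):
    """Mirror the output pattern of rule `write` in common.smk for one wildcard value."""
    pattern = os.path.join('data', config['analysis'], 'input_value_{value}.txt')
    return pattern.format(value=value)


# Config dictionaries as passed to the `common` module in first.smk and second.smk.
first_cfg = {"analysis": "first", "value": 1}
second_cfg = {"analysis": "second", "value": 2}

# Inputs that `first_result` and `second_result` should be able to resolve.
expected = [write_output(first_cfg, 1), write_output(second_cfg, 2)]
print(expected)
```

The first path is the file reported as missing in the error above, so a rule producing it is expected to exist when running from all.smk.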

@mramospe mramospe added the bug Something isn't working label May 23, 2024