Conda package for 1.99 not working properly #72

Open
heylf opened this issue May 12, 2020 · 13 comments

@heylf

heylf commented May 12, 2020

I tried to install shorah via bioconda (conda install -c bioconda shorah), and executing shorah gives the following error:

`pkg_resources.DistributionNotFound: The 'shorah' distribution was not found and is required by the application

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".../miniconda3/envs/shorah/bin/shorah", line 11, in <module>
    from shorah.cli import main
  File ".../miniconda3/envs/shorah/lib/python3.6/site-packages/shorah/cli.py", line 53, in <module>
    with open(os.path.join(base_dir, '.version'), 'r') as version_file:
FileNotFoundError: [Errno 2] No such file or directory: '..../miniconda3/envs/shorah/.version'
`

@DrYak
Member

DrYak commented May 12, 2020

Hello, and thank you for posting your concerns.

Well, that is really weird, as the bioconda package is what we are currently using in production in V-pipe with SARS-CoV-2 sequencing data.

Something must have changed in the latest miniconda version: I can't reproduce your error message on my production installation, but I do get the exact same message with a clean installation in an Ubuntu:stable Docker container.

I'll try to investigate it in more detail tomorrow.

@DrYak DrYak self-assigned this May 12, 2020
@DrYak
Member

DrYak commented May 19, 2020

Hello Heylf,

Sorry for the slow answer; we're currently having some major computation trouble here, so I had less time to devote to your issue.

Meanwhile, upstream conda has changed something again: the package now works, and I am unable to reproduce the problem in the Ubuntu:stable Docker container using the exact same sequence as last time. :-(

I'll try to investigate it as I get some free time aside from our other problems.

@DrYak
Member

DrYak commented May 19, 2020

Can you give it a try on your side and tell me if you're still affected?

And what platform are you using?
A Linux installation running on bare metal? VM? Docker? WSL1/2 in Windows 10?
And which distribution?

@heylf
Author

heylf commented May 20, 2020

Hey DrYak,
I tried it again, but it still fails. I am using Ubuntu 18.04.3, no VM or Docker, with a new miniconda3 environment for shorah.

I have now tried installing it directly in the base environment of miniconda3, and it works. It seems that it does not work if you create a separate env for shorah.

@bgruening

Same here:

Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate shorah
#
# To deactivate an active environment, use
#
#     $ conda deactivate

bag@bag:~$ . activate shorah
(shorah) bag@bag:~$ shorah --version
Traceback (most recent call last):
  File "/home/bag/miniconda3/envs/shorah/lib/python3.6/site-packages/shorah/cli.py", line 50, in <module>
    __version__ = get_distribution('shorah').version
  File "/home/bag/miniconda3/envs/shorah/lib/python3.6/site-packages/pkg_resources/__init__.py", line 482, in get_distribution
    dist = get_provider(dist)
  File "/home/bag/miniconda3/envs/shorah/lib/python3.6/site-packages/pkg_resources/__init__.py", line 358, in get_provider
    return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
  File "/home/bag/miniconda3/envs/shorah/lib/python3.6/site-packages/pkg_resources/__init__.py", line 901, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/bag/miniconda3/envs/shorah/lib/python3.6/site-packages/pkg_resources/__init__.py", line 787, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'shorah' distribution was not found and is required by the application

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bag/miniconda3/envs/shorah/bin/shorah", line 11, in <module>
    from shorah.cli import main
  File "/home/bag/miniconda3/envs/shorah/lib/python3.6/site-packages/shorah/cli.py", line 53, in <module>
    with open(os.path.join(base_dir, '.version'), 'r') as version_file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/bag/miniconda3/envs/shorah/.version'
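
The second traceback comes from the fallback path: once the 'shorah' distribution metadata is not found, cli.py tries to read a '.version' file, and that file is missing from the environment too. Below is a minimal sketch of a lookup that degrades gracefully instead of crashing. It assumes Python 3.8+ for `importlib.metadata` (the traces above show Python 3.6, where `pkg_resources` plays that role). The function name `resolve_version` and the "unknown" default are hypothetical; this is not ShoRAH's code and not necessarily how the problem gets fixed.

```python
from importlib import metadata  # Python 3.8+; stands in for pkg_resources here


def resolve_version(dist_name, fallback_file, default="unknown"):
    """Return the version of dist_name, then a version file, then a default."""
    try:
        # Preferred source: the metadata written when the package was installed.
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        pass
    try:
        # Fallback: a plain-text version file, as cli.py attempts in the traceback.
        with open(fallback_file, "r") as version_file:
            return version_file.read().strip()
    except OSError:
        # Neither source exists: report a placeholder instead of raising.
        return default
```

With this shape, `shorah --version` would print a placeholder in a broken environment rather than abort at import time, which may or may not be the behaviour the maintainers want.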

@bgruening

It seems to work in the conda root dir, but this is not how it should work :)

@DrYak
Member

DrYak commented May 27, 2020

Note: @pedrofale and @kpj are giving me a hand on this one.

@kpj
Contributor

kpj commented May 29, 2020

This behavior is potentially fixed in #73.

@bgruening

@bgruening cool, thanks. We can give it a new try as soon as there is a new release. Thanks.

@DrYak
Member

DrYak commented Jul 3, 2020

Update: the current test package is passing CircleCI tests on both Linux and Mac OS X.
I just need my colleagues to finish the code review, and then I can push the final package.

@DrYak
Member

DrYak commented Jul 22, 2020

@bgruening has merged the 1.99.1 bioconda package.
It should be appearing on bioconda soon.

@heylf: could you give it another try?

@bgruening

@DrYak you just need to bump https://github.com/galaxyproject/tools-iuc/blob/master/tools/shorah/shorah.xml#L5

Btw. do you have a list with new parameters or new inputs/outputs compared to the older version?

@DrYak
Member

DrYak commented Jul 23, 2020

For Galaxyproject/tools-iuc : I will check the documented procedure for testing and submitting changes.

WRT parameters: calling ShoRAH has indeed changed somewhat since the older 1.x.x series.
Among other things, there is now a single executable with multiple sub-commands instead of the older shotgun.py / amplian.py etc.

The most up-to-date list of parameters is ShoRAH's own help output:

# shorah -h
usage: shorah <subcommand> [options]

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit

sub-commands:
  {shotgun,amplicon,snv}
                        available sub-commands
    shotgun             run local analysis in shotgun mode
    amplicon            run local analysis in amplicon mode
    snv                 run single-nucleotide-variant calling

Run `shorah subcommand -h` for more help
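
For readers wrapping the tool, the single-executable layout shown in this help output can be sketched with argparse sub-parsers. This is only an illustration of the structure, assuming the sub-command names and the two required options from the help text; `build_parser` is a hypothetical name and this is not ShoRAH's actual cli.py.

```python
import argparse


def build_parser():
    """Build a top-level parser with shotgun/amplicon/snv sub-commands."""
    parser = argparse.ArgumentParser(prog="shorah")
    subparsers = parser.add_subparsers(dest="subcommand", help="available sub-commands")
    subparsers.required = True  # fail early when no sub-command is given

    # Options shared by every sub-command live on a parent parser.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("-b", "--bam", required=True,
                        help="sorted bam format alignment file")
    common.add_argument("-f", "--fasta", required=True,
                        help="reference genome in fasta format")

    for name, description in [
        ("shotgun", "run local analysis in shotgun mode"),
        ("amplicon", "run local analysis in amplicon mode"),
        ("snv", "run single-nucleotide-variant calling"),
    ]:
        subparsers.add_parser(name, parents=[common], help=description)
    return parser
```

Calling `build_parser().parse_args(["shotgun", "-b", "sample.bam", "-f", "ref.fasta"])` yields a namespace whose `subcommand` attribute selects the mode; the file names are placeholders.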

shotgun is the sub-command that you're most likely to want to implement
(SNV calls on the whole genome and local haplotypes in every window):

# shorah shotgun -h
usage: shorah <subcommand> [options] shotgun [-h] [-v] -b BAM -f REF
                                             [-a FLOAT] [-r chrm:start-stop]
                                             [-R INT] [-x INT] [-S FLOAT] [-I]
                                             [-p FLOAT]
                                             [-of {csv,vcf} [{csv,vcf} ...]]
                                             [-c INT] [-w INT] [-s INT] [-k]
                                             [-t INT]

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -a FLOAT, --alpha FLOAT
                        alpha in dpm sampling (controls the probability of
                        creating new classes)
  -r chrm:start-stop, --region chrm:start-stop
                        region in format 'chr:start-stop', e.g.
                        'chrm:1000-3000'
  -R INT, --seed INT    set seed for reproducible results
  -x INT, --maxcov INT  approximate max coverage allowed
  -S FLOAT, --sigma FLOAT
                        sigma value to use when calling SNVs
  -I, --ignore_indels   ignore SNVs adjacent to insertions/deletions (legacy
                        behaviour of 'fil', ignore this option if you don't
                        understand)
  -p FLOAT, --threshold FLOAT
                        pos threshold when calling variants from support files
  -of {csv,vcf} [{csv,vcf} ...], --out_format {csv,vcf} [{csv,vcf} ...]
                        output format of called SNVs
  -c INT, --win_coverage INT
                        coverage threshold. Omit windows with low coverage
  -w INT, --windowsize INT
                        window size
  -s INT, --winshifts INT
                        number of window shifts
  -k, --keep_files      keep all intermediate files
  -t INT, --threads INT
                        limit maximum number of parallel sampler threads (0:
                        CPUs count-1, n: limit to n)

required arguments:
  -b BAM, --bam BAM     sorted bam format alignment file
  -f REF, --fasta REF   reference genome in fasta format
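
A wrapper (for example a Galaxy tool or a pipeline rule) mostly has to assemble this command line. The sketch below builds a `shorah shotgun` invocation from the flags listed in the help output above (-b, -f, -r, -t, -of). The helper names `build_shotgun_cmd` and `run_shotgun` are hypothetical, and the sketch assumes `shorah` is on the PATH of the active environment.

```python
import subprocess


def build_shotgun_cmd(bam, ref, region=None, threads=None, out_formats=None):
    """Assemble the argument list for `shorah shotgun`."""
    cmd = ["shorah", "shotgun", "-b", bam, "-f", ref]
    if region is not None:
        cmd += ["-r", region]          # e.g. 'chrm:1000-3000'
    if threads is not None:
        cmd += ["-t", str(threads)]    # 0 means CPU count minus one per the help text
    if out_formats:
        cmd += ["-of", *out_formats]   # one or more of 'csv', 'vcf'
    return cmd


def run_shotgun(bam, ref, **options):
    """Run the command; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_shotgun_cmd(bam, ref, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file paths; `-of` is placed last because it accepts a variable number of values.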

The amplicon mode:

# shorah amplicon -h
usage: shorah <subcommand> [options] amplicon [-h] [-v] -b BAM -f REF
                                              [-a FLOAT] [-r chrm:start-stop]
                                              [-R INT] [-x INT] [-S FLOAT]
                                              [-I] [-p FLOAT]
                                              [-of {csv,vcf} [{csv,vcf} ...]]
                                              [-c INT] [-d] [-m FLOAT]

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -a FLOAT, --alpha FLOAT
                        alpha in dpm sampling (controls the probability of
                        creating new classes)
  -r chrm:start-stop, --region chrm:start-stop
                        region in format 'chr:start-stop', e.g.
                        'chrm:1000-3000'
  -R INT, --seed INT    set seed for reproducible results
  -x INT, --maxcov INT  approximate max coverage allowed
  -S FLOAT, --sigma FLOAT
                        sigma value to use when calling SNVs
  -I, --ignore_indels   ignore SNVs adjacent to insertions/deletions (legacy
                        behaviour of 'fil', ignore this option if you don't
                        understand)
  -p FLOAT, --threshold FLOAT
                        pos threshold when calling variants from support files
  -of {csv,vcf} [{csv,vcf} ...], --out_format {csv,vcf} [{csv,vcf} ...]
                        output format of called SNVs
  -c INT, --win_coverage INT
                        coverage threshold. Omit windows with low coverage
  -d, --diversity       detect the highest entropy region and run there
  -m FLOAT, --min_overlap FLOAT
                        fraction of read overlap to be included

required arguments:
  -b BAM, --bam BAM     sorted bam format alignment file
  -f REF, --fasta REF   reference genome in fasta format

To re-call SNVs from already computed local haplotypes
(this is called internally at the end of either shotgun or amplicon, though both of those are capable of skipping calls to dpm_sampler for windows for which they find already computed local haplotypes):

# shorah snv -h
usage: shorah <subcommand> [options] snv [-h] [-v] -b BAM -f REF [-a FLOAT]
                                         [-r chrm:start-stop] [-R INT]
                                         [-x INT] [-S FLOAT] [-I] [-p FLOAT]
                                         [-of {csv,vcf} [{csv,vcf} ...]]
                                         [-i INT]

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -a FLOAT, --alpha FLOAT
                        alpha in dpm sampling (controls the probability of
                        creating new classes)
  -r chrm:start-stop, --region chrm:start-stop
                        region in format 'chr:start-stop', e.g.
                        'chrm:1000-3000'
  -R INT, --seed INT    set seed for reproducible results
  -x INT, --maxcov INT  approximate max coverage allowed
  -S FLOAT, --sigma FLOAT
                        sigma value to use when calling SNVs
  -I, --ignore_indels   ignore SNVs adjacent to insertions/deletions (legacy
                        behaviour of 'fil', ignore this option if you don't
                        understand)
  -p FLOAT, --threshold FLOAT
                        pos threshold when calling variants from support files
  -of {csv,vcf} [{csv,vcf} ...], --out_format {csv,vcf} [{csv,vcf} ...]
                        output format of called SNVs
  -i INT, --increment INT
                        value of increment to use when calling SNVs (1 used in
                        amplicon mode)

required arguments:
  -b BAM, --bam BAM     sorted bam format alignment file
  -f REF, --fasta REF   reference genome in fasta format
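
All three sub-commands take -r in the form 'chr:start-stop'. A wrapper may want to validate that string before launching a long run. The following is a minimal sketch: `parse_region` is a hypothetical helper, and it assumes the chromosome name contains no colon or whitespace and that start must not exceed stop, neither of which is stated in the help text.

```python
import re

# chrom:start-stop, e.g. 'chrm:1000-3000' (format taken from the help text)
_REGION_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<stop>\d+)$")


def parse_region(region):
    """Split 'chr:start-stop' into (chrom, start, stop) or raise ValueError."""
    match = _REGION_RE.match(region)
    if match is None:
        raise ValueError("expected 'chr:start-stop', got %r" % region)
    start = int(match.group("start"))
    stop = int(match.group("stop"))
    if start > stop:
        # Assumption: an inverted interval is treated as a user error.
        raise ValueError("region start %d is greater than stop %d" % (start, stop))
    return match.group("chrom"), start, stop
```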
