Some reference data download links no longer work, pending investigation. #758

Open
hsun3163 opened this issue Dec 1, 2023 · 3 comments

@hsun3163
Collaborator

hsun3163 commented Dec 1, 2023

(py3.11) [sunh14@lc03e22 ~]$ cd /sc/arion/projects/CommonMind/roussp01a/snmulti_QTL/working
(py3.11) [sunh14@lc03e22 working]$ sos run pipeline/reference_data.ipynb download_hg_reference --cwd ../input/reference_data &
[1] 189132
(py3.11) [sunh14@lc03e22 working]$ sos run pipeline/reference_data.ipynb download_gene_annotation --cwd ../input/reference_data &
[2] 189133
(py3.11) [sunh14@lc03e22 working]$ sos run pipeline/reference_data.ipynb download_ercc_reference --cwd ../input/reference_data &
[3] 189134
(py3.11) [sunh14@lc03e22 working]$ sos run pipeline/reference_data.ipynb download_dbsnp --cwd ../input/reference_data &INFO: Running download_hg_reference:
INFO: Running download_ercc_reference:
GRCh38_ful...lus_decoy_hla.fa: <urlopen error [Errno 101] Network is unreachable>:
INFO: Running download_gene_annotation:
ERROR: download_hg_reference (id=88880766584b8229) returns an error.
Homo_sapie...8.103.chr.gtf.gz: 0%| | 0/49087092 [00:00<?, ?it/s]
ERCC92.zip: 0%| | 0/28717 [00:00<?, ?it/s]
INFO: download_ercc_reference is completed.
INFO: download_ercc_reference output: /sc/arion/projects/CommonMind/roussp01a/snmulti_QTL/input/reference_data/ERCC92.gtf /sc/arion/projects/CommonMind/roussp01a/snmulti_QTL/input/reference_data/ERCC92.fa
Homo_sapie...8.103.chr.gtf.gz: 0%|▏ | 49152/49087092 [00:00<03:33, 229662.93it/s]INFO: Workflow download_ercc_reference (ID=w297010867a7f15c9) is executed successfully with 1 completed step.

[4] 189193
Homo_sapie...8.103.chr.gtf.gz: 0%|▍ | 172032/49087092 [00:00<01:56, 421643.95it/s]
[3]- Done sos run pipeline/reference_data.ipynb download_ercc_reference --cwd ../input/reference_data
Homo_sapie...8.103.chr.gtf.gz: 1%|█ | 360448/49087092 [00:00<01:39, 490429.95it/s]ERROR: [download_hg_reference]: [0]:

RuntimeError Traceback (most recent call last)
script_8878139621259498696 in <module>
----> download('ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa\n\n', dest_dir = cwd)

RuntimeError: Failed to download {urls[0]}
Homo_sapie...8.103.chr.gtf.gz: 1%|█▏ | 425984/49087092 [00:00<01:35, 507455.58it/s]INFO: Running download_dbsnp:
00-All.vcf.gz: <urlopen error [Errno 101] Network is unreachable>:
00-All.vcf.gz.tbi: <urlopen error [Errno 101] Network is unreachable>:
ERROR: download_dbsnp (id=eb7f9a9839feca92) returns an error.
Homo_sapie...8.103.chr.gtf.gz: 2%|██▊ | 1007616/49087092 [00:02<01:30, 532953.87it/s]ERROR: [download_dbsnp]: [0]:

RuntimeError Traceback (most recent call last)
script_8177488568793545762 in <module>
----> download('ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz\nftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz.tbi\n\n', dest_dir = cwd)

RuntimeError: Failed to download ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz (2 out of 2)
Homo_sapie...8.103.chr.gtf.gz: 31%|█████████████████████████████████████████▉ | 15007744/49087092 [00:28<01:05, 518354.50it/s]
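The `[Errno 101] Network is unreachable` errors are raised by the operating system before any FTP exchange takes place: on Linux, errno 101 is `ENETUNREACH`, meaning the node has no route to the remote network. So these errors alone do not show whether the links themselves are broken. A minimal Python sketch for decoding such numbers, assuming a Linux node (errno values differ between platforms); `describe_errno` is a hypothetical helper:

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an OS error number on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# errno numbers are platform-specific; on a Linux node (as in the log above)
# errno 101 is ENETUNREACH.
print(describe_errno(101))
```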

@hsun3163
Collaborator Author

hsun3163 commented Dec 1, 2023

These two commands fail:
sos run pipeline/reference_data.ipynb download_hg_reference --cwd ../input/reference_data
sos run pipeline/reference_data.ipynb download_dbsnp --cwd ../input/reference_data

@hsun3163
Collaborator Author

hsun3163 commented Dec 1, 2023

Could be a firewall blocking FTP connections.
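If the firewall hypothesis is right, a plain TCP connection to the FTP control port (21) should fail on the affected nodes. A minimal Python sketch to test this, assuming only the standard library; `URLS`, `endpoint`, `can_connect`, and `report` are hypothetical names, not part of the pipeline:

```python
from __future__ import annotations

import socket
from urllib.parse import urlparse

# Hypothetical list: the FTP URLs used by the failing download_hg_reference
# and download_dbsnp steps, copied from the logs above.
URLS = [
    "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa",
    "ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz",
]

DEFAULT_PORTS = {"ftp": 21, "http": 80, "https": 443}


def endpoint(url: str) -> tuple[str, int]:
    """Return (host, port) for a URL, using the scheme's default port."""
    parsed = urlparse(url.strip())
    return parsed.hostname, parsed.port or DEFAULT_PORTS[parsed.scheme]


def can_connect(host: str, port: int, timeout: float = 5.0) -> tuple[bool, str]:
    """Open and close a plain TCP connection; return (ok, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "ok"
    except OSError as exc:  # covers DNS failures, timeouts, ENETUNREACH, refusals
        return False, f"{type(exc).__name__}: {exc}"


def report(urls: list[str]) -> None:
    """Print one reachability line per URL's control endpoint."""
    for url in urls:
        host, port = endpoint(url)
        ok, detail = can_connect(host, port)
        print(f"{host}:{port} -> {'reachable' if ok else 'blocked'} ({detail})")
```

Running `report(URLS)` on a compute node and on the data-transfer node (`dataxfer-10` in the log below) would show whether the difference is at the network level. A successful connection to port 21 is not conclusive on its own: passive FTP (the `PASV` step in the wget log) opens a second data connection on another port, which a firewall can also block.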

@hsun3163
Collaborator Author

hsun3163 commented Dec 5, 2023

The download_dbsnp failure is likely due to different firewall settings on different nodes. The download_hg_reference failure is stranger: the file can be fetched with wget, but not with download() via SoS.

ERROR: download_hg_reference (id=88880766584b8229) returns an error.
00-All.vcf.gz.tbi: downloaded                                                   :
00-All.vcf.gzERROR: [download_hg_reference]: [0]:                               :
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
script_3183852603783812485 in <module>
----> download('ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa\n\n', dest_dir = cwd)


RuntimeError: Failed to download {urls[0]}
00-All.vcf.gz(py3.11) [sunh14@dataxfer-10 working]$ ftp ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
-bash: ftp: command not found
(py3.11) [sunh14@dataxfer-10 working]$ wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
--2023-12-05 12:54:58--  ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
           => ‘GRCh38_full_analysis_set_plus_decoy_hla.fa’
Resolving ftp.1000genomes.ebi.ac.uk (ftp.1000genomes.ebi.ac.uk)... 193.62.193.167
Connecting to ftp.1000genomes.ebi.ac.uk (ftp.1000genomes.ebi.ac.uk)|193.62.193.167|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /vol1/ftp/technical/reference/GRCh38_reference_genome ... done.
==> SIZE GRCh38_full_analysis_set_plus_decoy_hla.fa ... 3263683042
==> PASV ... done.    ==> RETR GRCh38_full_analysis_set_plus_decoy_hla.fa ... done.
Length: 3263683042 (3.0G) (unauthoritative)

14% [==============================>
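The `RuntimeError: Failed to download {urls[0]}` message prints its placeholder literally and drops the underlying exception, so it does not say why download() failed where wget succeeded. One way to narrow this down is to fetch the same URL with Python's standard library on the same node and keep the real error. A minimal sketch, assuming only the standard library (`urllib.request` handles `ftp://` URLs); `fetch` is a hypothetical helper, not SoS's `download()`:

```python
import os
import shutil
import urllib.request
from typing import Optional


def fetch(url: str, dest_dir: str, expected_size: Optional[int] = None) -> str:
    """Download a single URL into dest_dir, surfacing the underlying error.

    Hypothetical helper, not the SoS download() action. If expected_size
    (in bytes) is given, the downloaded file size is checked against it.
    """
    url = url.strip()  # the URL strings passed to download() end in "\n\n"
    dest = os.path.join(dest_dir, os.path.basename(url))
    try:
        with urllib.request.urlopen(url, timeout=60) as resp, open(dest, "wb") as out:
            shutil.copyfileobj(resp, out)  # stream in chunks; the .fa is ~3 GB
    except OSError as exc:  # URLError, socket errors and local I/O errors
        raise RuntimeError(f"Failed to download {url}: {exc!r}") from exc
    size = os.path.getsize(dest)
    if expected_size is not None and size != expected_size:
        raise RuntimeError(f"{dest}: got {size} bytes, expected {expected_size}")
    return dest
```

For example, calling `fetch(url, "../input/reference_data", expected_size=3263683042)` on `dataxfer-10` would also check the result against the SIZE that wget reported for `GRCh38_full_analysis_set_plus_decoy_hla.fa`. If this fails while wget succeeds, the exception text should help show where the two clients differ.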
