I'm running ipyrad [v.0.9.90] with maximum memory allocation (184G), 48 threads, and with the following (relevant) params:
~/all_trimmed_reads/*.fq ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted/trimmed/unzipped fastq files
reference ## [5] [assembly_method]: Assembly method
~/reference1.1.fa ## [6] [reference_sequence]: Location of reference sequence file
pairddrad ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
AATTC, GCATG ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2) [EcoRI, SphI]
5 ## [9] [max_low_qual_bases]: Max low quality base calls (Q<20) in a read
33 ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)
5 ## [11] [mindepth_statistical]: Min depth for statistical base calling
5 ## [12] [mindepth_majrule]: Min depth for majority-rule base calling
10000 ## [13] [maxdepth]: Max cluster depth within samples [default = 10,000]
0.86 ## [14] [clust_threshold]: Clustering threshold for de novo assembly
2 ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
0.1 ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus
0.1 ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus
5 ## [21] [min_samples_locus]: GLOBAL Min # samples per locus
0.25 ## [22] [max_SNPs_locus]: Max % SNPs per locus
8 ## [23] [max_Indels_locus]: Max # of indels per locus
0.5 ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus
* ## [27] [output_formats]: Output formats (see docs) [* = all of them]
This runs fine through part 2 of step 7:
Step 7: Filtering and formatting output files
[####################] 100% 0:05:58 | applying filters
[####################] 100% 1:02:59 | building arrays
Encountered an Error.
Message: KeyError: 68
Parallel connection closed.
Here is the traceback info:
KeyError Traceback (most recent call last)
File <string>:1, in <module>
File ~/.conda/envs/ipyrad/lib/python3.10/site-packages/ipyrad/assemble/write_outputs.py:2158, in fill_snp_array(data, ntaxa, nsnps)
2156 # fill for each taxon
2157 for sidx in range(ntaxa):
-> 2158 resos = [DCONS[i] for i in snparr[sidx, :]]
2160 # pseudoref version
2161 io5['genos'][:, sidx, :] = get_genos(
2162 np.array([i[0] for i in resos]),
2163 np.array([i[1] for i in resos]),
2164 io5['pseudoref'][:]
2165 )
File ~/.conda/envs/ipyrad/lib/python3.10/site-packages/ipyrad/assemble/write_outputs.py:2158, in <listcomp>(.0)
2156 # fill for each taxon
2157 for sidx in range(ntaxa):
-> 2158 resos = [DCONS[i] for i in snparr[sidx, :]]
2160 # pseudoref version
2161 io5['genos'][:, sidx, :] = get_genos(
2162 np.array([i[0] for i in resos]),
2163 np.array([i[1] for i in resos]),
2164 io5['pseudoref'][:]
2165 )
KeyError: 68
I followed the suggestion from a previous issue about using a reference genome with the ambiguous bases masked (I just converted each one to one of its possible resolutions) and tried running step 7 again with that reference, but it failed as above. Do I need to run the entire pipeline again from the beginning using the unambiguated reference, or is something else causing this error in step 7? Any insights would be much appreciated!
Thanks, Inbar
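For readers wondering what "converted each to one of the possible resolution options" might look like in practice, here is a minimal sketch. It is not part of ipyrad and not necessarily what was done in this thread: `IUPAC_FIRST`, `resolve_ambiguous`, and the choice of always taking the alphabetically first candidate base are all assumptions made for illustration. `N` is left untouched, header lines are skipped, and soft-masked (lowercase) bases keep their case.

```python
# Hypothetical helper, not an ipyrad function: resolve IUPAC ambiguity
# codes in FASTA sequence lines to a single concrete base.

# ASSUMPTION: each code is resolved to the alphabetically first base it
# can represent (e.g. D = A/G/T becomes A). Any other fixed choice works too.
IUPAC_FIRST = {
    "R": "A", "Y": "C", "S": "C", "W": "A", "K": "G", "M": "A",
    "B": "C", "D": "A", "H": "A", "V": "A",
}

# Translation table covering both upper- and lowercase (soft-masked) bases.
TABLE = str.maketrans(
    {**IUPAC_FIRST, **{k.lower(): v.lower() for k, v in IUPAC_FIRST.items()}}
)


def resolve_ambiguous(lines):
    """Yield FASTA lines with ambiguity codes replaced; headers pass through."""
    for line in lines:
        yield line if line.startswith(">") else line.translate(TABLE)


def resolve_fasta(src_path, dst_path):
    """Stream src_path to dst_path so a large genome never sits in memory."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(resolve_ambiguous(src))
```

Calling `resolve_fasta("in.fa", "out.fa")` (placeholder file names) would write a cleaned copy and leave the original reference untouched.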
Yes, ambiguous bases in the reference will cause problems, so it's good you found that and fixed it. By the time you reach step 7 all of the formal assembly has been completed, so for the corrected reference to fix this error at step 7 you will need to roll back and re-run from at least step 3 (including the -f flag). Let me know how it goes.
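As a side note on the error message itself: the key 68 is the byte value of the character `D`, which is the IUPAC code for "A, G, or T", so the KeyError is at least consistent with an ambiguity code from the reference reaching the SNP array. The sketch below shows one way to check a reference for such characters before re-running. The helper name `unexpected_bases` and the `ALLOWED` set are assumptions for illustration and are not copied from ipyrad's own lookup table.

```python
from collections import Counter

# ASSUMPTION: only these characters are treated as safe in a reference.
# This set is illustrative and is not taken from ipyrad's DCONS table.
ALLOWED = set("ACGTN")


def unexpected_bases(lines, allowed=ALLOWED):
    """Count sequence characters (case-insensitive) outside `allowed`."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):  # skip FASTA header lines
            continue
        counts.update(ch for ch in line.strip().upper() if ch not in allowed)
    return counts


# The integer in "KeyError: 68" decoded as a character code.
print(chr(68))
```

Passing an open file handle for the reference to `unexpected_bases` returns a counter of offending characters; an empty counter suggests the reference contains nothing outside the allowed set.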
Cool, many thanks for the quick reply! I'll run it again from the start; I think that should fix it. I just wanted to make sure this was the issue before I submit this big job again.