*.ustr file and other files not generated when using a genome as a reference in the assembly method #542

aroavaron · 2024-01-28T17:14:02Z

Hello,

I ran ipyrad first with the denovo assembly method and then with a genome as a reference. The denovo run generated 18 output files, but the reference run generated only 14. The missing files are *.migrate, *.treemix, *.ugeno and, most importantly for me, *.ustr. Has anyone encountered the same issue? Any suggestions or insights would be appreciated.
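One quick way to compare two runs is to list which expected output extensions are absent from each run's output directory. The sketch below is an illustration, not part of ipyrad: the helper names are hypothetical, the extension list is just the four files reported missing here (extend it as needed), and the directory path is whatever your run produced (ipyrad typically names it after the assembly, but check your project dir).

```python
from pathlib import Path

# The four formats reported missing from the reference run in this thread.
EXPECTED = [".migrate", ".treemix", ".ugeno", ".ustr"]

def missing_from_names(filenames, extensions):
    """Return the extensions that no filename ends with."""
    return [ext for ext in extensions
            if not any(name.endswith(ext) for name in filenames)]

def missing_outputs(outdir, extensions=EXPECTED):
    """List expected extensions with no matching file in `outdir`."""
    names = [p.name for p in Path(outdir).iterdir() if p.is_file()]
    return missing_from_names(names, extensions)
```

For example, `missing_outputs("path/to/run_outfiles")` returns an empty list when every expected format was written.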

Cheers,
A

isaacovercast · 2024-01-30T20:12:02Z

Did you check the output_formats param in the params file? The default value will not generate all output files. Please set output_formats to * (which tells ipyrad to create all output formats) and run step 7 again for the reference assembly, with the -f flag to force new outputs. Please let me know if that works.
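If you prefer to script the change rather than hand-edit the file, here is a minimal sketch. It assumes the params-file layout ipyrad writes (the value first, then `## [index] [name]: description` on the same line); `set_param` is a hypothetical helper, not an ipyrad function. After saving the edited file, step 7 would be rerun with force, e.g. `ipyrad -p params-<name>.txt -s 7 -f`, where `<name>` is a placeholder for your assembly name.

```python
import re

def set_param(params_text, name, value):
    """Replace the value on the line tagged '[name]' in ipyrad-style params text.

    Assumes lines look like: '<value>   ## [<index>] [<name>]: <description>'.
    Raises KeyError if no such line exists.
    """
    pattern = re.compile(
        r"^(?P<value>.*?)(?P<sep>[ \t]+## \[\d+\] \[" + re.escape(name) + r"\]:)",
        re.MULTILINE,
    )
    # Keep everything from the '##' marker onward; swap only the value.
    new_text, count = pattern.subn(lambda m: value + m.group("sep"), params_text, count=1)
    if count == 0:
        raise KeyError(f"parameter {name!r} not found")
    return new_text

# Hypothetical usage on a params file:
# path = pathlib.Path("params-myassembly.txt")
# path.write_text(set_param(path.read_text(), "output_formats", "*"))
```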

aroavaron · 2024-01-30T20:57:49Z

Hi Isaac, I should have mentioned that I had already set output_formats to "*" in the params file, but the run generated only 8 files instead of 12.
Thanks for your quick reply!

isaacovercast · 2024-01-30T22:40:11Z

What version of ipyrad are you running? (ipyrad -v will print the version.) Using the most recent version (0.9.93), I ran a reference assembly from scratch on the simulated data, and setting output_formats to * gave me the full complement of output files.
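If you are checking versions across several machines or environments, the banner ipyrad prints (it looks like `ipyrad [v.0.9.93]`, as in the output pasted later in this thread) can be parsed and compared. A minimal sketch; the function names are hypothetical, and the default minimum of 0.9.93 simply reflects the version called most recent here.

```python
import re

def parse_ipyrad_version(banner):
    """Extract a version tuple from a banner such as 'ipyrad [v.0.9.93]'."""
    match = re.search(r"\[v\.(\d+(?:\.\d+)*)\]", banner)
    if match is None:
        raise ValueError(f"no version found in {banner!r}")
    return tuple(int(part) for part in match.group(1).split("."))

def is_at_least(banner, minimum=(0, 9, 93)):
    """True if the banner's version is >= `minimum` (tuple comparison)."""
    return parse_ipyrad_version(banner) >= minimum
```

Comparing tuples of integers avoids the string-comparison trap where "0.9.12" would sort after "0.9.100".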

If you are on the most recent version of ipyrad, please share the .json file from your project dir with me.

aroavaron · 2024-01-31T01:15:12Z

I ran ipyrad on the server, where it's listed as ipyrad/0.9. I also suspected a version issue, so I installed the latest version (ipyrad 0.9.93) in my miniconda environment.
However, I'm currently encountering a different issue and troubleshooting it. The program starts, but it gets stuck at the first step. No files are generated, so I can't share the .json file yet!

ipyrad [v.0.9.93]
Interactive assembly and analysis of RAD-seq data

Parallel connection | compute-65-17: 20 cores

Step 1: Loading sorted fastq data to Samples

aroavaron · 2024-02-02T22:27:23Z

Quick update. It was not running because of a lack of memory!

I'm running ipyrad for two species. For the first, ipyrad finished successfully and generated all the outputs. I used the denovo assembly method with the genome reference as a filter (parameter #29, reference_as_filter). Compared with my previous run (ipyrad 0.9.12, genome as reference), the number of retained loci dropped from 22K to 6K, and missing data in both the SNPs matrix and the sequence matrix decreased from 20.5% to 10.6%. Is it possible that the newer version uses different criteria?

Regarding the second species, ipyrad has not been able to get past step 6. The .json file is attached; I hope it helps identify the error. Fingers crossed for a quick and easy solution!

Thanks,
A

vsref_fil_200k_85p_denovo.json

isaacovercast · 2024-02-05T14:35:16Z

@aroavaron In general, the newest version of ipyrad should be trusted more than any previous version, because we are always fixing bugs. The difference in results between 0.9.93 and 0.9.12 (very old) is not so surprising. I would trust the newest version.

As for the second species, can you tell me what error you are getting during step 6? If you can show me all the command-line output and the full error message when it dies, that would be very helpful.

aroavaron · 2024-06-11T14:01:27Z

I ran the latest version of ipyrad (0.9.93) with two assembly approaches. The 'reference' approach retained 15,304 loci (26.3% missing sites in the SNPs matrix / 28.1% in the sequence matrix), while the 'denovo-reference' approach (denovo assembly using the reference, in this case a genome, as a filter via parameter #29) recovered 6,378 loci (14.6% missing sites in the SNPs matrix and 14.8% in the sequence matrix). For downstream analyses, it would generally be better to use the data with fewer missing values. However, I am curious about the reason(s) for the difference in the number of retained loci.
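To make the trade-off between loci count and missing data concrete: a missing-sites percentage is, roughly, the share of cells in the sample-by-site matrix that hold no base call. The sketch below assumes aligned sequences of equal length and treats `N` and `-` as missing; ipyrad's own stats may define missing sites somewhat differently, so treat this as an approximation.

```python
def missing_fraction(sequences, missing_chars="N-"):
    """Fraction of cells in an aligned matrix that are missing.

    Assumes all sequences have the same aligned length and that the
    characters in `missing_chars` (by default 'N' and '-') mark missing data.
    """
    lengths = {len(seq) for seq in sequences}
    if len(lengths) > 1:
        raise ValueError("sequences must all have the same (aligned) length")
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        return 0.0
    missing = sum(seq.upper().count(ch) for seq in sequences for ch in missing_chars)
    return missing / total
```

Multiplying the result by 100 gives a percentage comparable in spirit to the figures above. Loci shared by only a few samples contribute many missing cells, which is one reason a smaller set of loci can show a lower missing percentage.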

isaacovercast · 2024-06-11T14:27:49Z

I'm not sure I fully understand what the two assemblies were. In one case you did the 'reference' assembly using an 'on target' genome. In the 'denovo-reference' approach, did you use the same genome sequence as the 'reference_as_filter' parameter? In general, different assembly methods do quite different things, so they will normally produce different results.

aroavaron · 2024-06-11T14:39:15Z

Yes, exactly! I used the same genome (a chromosome-level assembly of the species I'm working on) for both approaches. I fully agree that different approaches generate different results, but I would like to understand a little better what is going on, as I was not expecting a ~58% drop in the number of retained loci with the second approach (6,378 vs. 15,304, i.e. only ~42% retained). Thank you for the quick reply!
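For the record, the size of the difference between the two 0.9.93 runs can be checked directly: 6,378 of 15,304 loci is about 41.7% retained, i.e. a drop of roughly 58%. A small sketch of the arithmetic, using only the counts reported in this thread:

```python
def percent_change(before, after):
    """Percentage change from `before` to `after`; negative values are drops."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return 100.0 * (after - before) / before

def percent_retained(before, after):
    """Share of the original count that survives, as a percentage."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return 100.0 * after / before

# Loci counts reported in this thread for ipyrad 0.9.93.
reference_loci = 15_304  # 'reference' assembly
filtered_loci = 6_378    # denovo assembly with reference_as_filter

print(f"retained: {percent_retained(reference_loci, filtered_loci):.1f}%")  # retained: 41.7%
print(f"change: {percent_change(reference_loci, filtered_loci):.1f}%")      # change: -58.3%
```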

isaacovercast · 2024-06-11T14:43:04Z

Well, reference_as_filter removes any reads that map to the reference sequence, so the 6,378 loci retained in this assembly are all loci that don't map well to the reference (for whatever reason): either they are off target, the reference is distant from the focal taxon, or the quality of the reference genome assembly is not perfect. Does that help?
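To make the filtering logic concrete, here is a toy sketch in which reads that match the reference are discarded and only the remainder would go on to denovo clustering. It is not ipyrad's implementation: exact substring matching (including the reverse complement) stands in for a real read aligner, which tolerates mismatches and would therefore remove even more reads, and `filter_by_reference` is a hypothetical name. It illustrates why, with an on-target reference, the surviving loci are the off-target or poorly mapping remainder.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def filter_by_reference(reads, reference):
    """Drop reads that 'map' to the reference; return the unmapped ones.

    Toy stand-in for mapping: a read counts as mapped if it, or its reverse
    complement, occurs verbatim in the reference string.
    """
    return {read_id: seq for read_id, seq in reads.items()
            if seq not in reference and revcomp(seq) not in reference}
```

For example, with the reference `"AAAACCCCGGGGTTTTACGTACGT"`, a read `"CCCCGGGG"` (forward match) and a read `"ACGTAAAA"` (reverse-complement match) are both removed, while an unrelated read such as `"GATTACAG"` survives.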

aroavaron changed the title *.ustr file not generated when using genome as a reference *.ustr file and other files not generated when using a genome as a reference in the assembly method Jan 28, 2024
