*.ustr file and other files not generated when using a genome as a reference in the assembly method #542

Open
aroavaron opened this issue Jan 28, 2024 · 10 comments

Comments

@aroavaron

Hello,

I ran the program first with the denovo assembly strategy and then with a genome as a reference. For the first run, ipyrad generated 18 output files; with the genome, it generated only 14. The missing files are: *.migrate, *.treemix, *.ugeno, and, most importantly for me, *.ustr. Has anyone encountered the same issue? Any suggestions or insights would be appreciated.

Cheers,
A
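To see exactly which output types differ between two runs, one option is to compare the file suffixes in the two outfiles directories. The sketch below assumes each file name starts with the assembly name; the names sp1_denovo and sp1_ref and the file lists are hypothetical. In practice you would pass os.listdir() of each run's outfiles directory instead of hand-written lists.

```python
from pathlib import PurePath
from typing import Iterable, List, Set


def output_suffixes(filenames: Iterable[str], prefix: str) -> Set[str]:
    """Strip the assembly-name prefix, leaving e.g. '.ustr' or '.treemix.gz'."""
    suffixes = set()
    for name in filenames:
        base = PurePath(name).name
        if base.startswith(prefix):
            suffixes.add(base[len(prefix):])
    return suffixes


def missing_outputs(full_files: Iterable[str], full_prefix: str,
                    other_files: Iterable[str], other_prefix: str) -> List[str]:
    """Output types present in the first run but absent from the second."""
    return sorted(output_suffixes(full_files, full_prefix)
                  - output_suffixes(other_files, other_prefix))


# Hypothetical listings of two outfiles directories.
denovo = ["sp1_denovo.loci", "sp1_denovo.ustr",
          "sp1_denovo.migrate", "sp1_denovo.treemix.gz"]
ref = ["sp1_ref.loci"]
print(missing_outputs(denovo, "sp1_denovo", ref, "sp1_ref"))
# prints ['.migrate', '.treemix.gz', '.ustr']
```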

@aroavaron aroavaron changed the title *.ustr file not generated when using genome as a reference *.ustr file and other files not generated when using a genome as a reference in the assembly method Jan 28, 2024
@isaacovercast
Collaborator

Did you check the output_formats param in the params file? The default value will not generate all output files. Can you please set the output_formats equal to * (which indicates to create all output formats), and run step 7 again for the reference assembly (with the -f flag to generate new outputs). Please let me know if that works.
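A minimal sketch of making that change programmatically. It assumes the ipyrad params-file layout in which each line holds the value first, followed by a comment such as `## [27] [output_formats]: ...`. The file name params-sp1_ref.txt and the helper names are hypothetical. After editing, step 7 would be re-run with ipyrad -p params-sp1_ref.txt -s 7 -f, as suggested above.

```python
from pathlib import Path


def set_param(params_text: str, name: str, value: str) -> str:
    """Replace the value on the params-file line tagged [name].

    Assumes the layout: value first, then '## [index] [name]: description'.
    """
    tag = f"[{name}]"
    lines, found = [], False
    for line in params_text.splitlines():
        if "##" in line and tag in line:
            comment = line[line.index("##"):]   # keep '## [27] [output_formats]: ...'
            line = f"{value:<30} {comment}"      # new value, padded for readability
            found = True
        lines.append(line)
    if not found:
        raise KeyError(f"parameter {name!r} not found in params file")
    return "\n".join(lines) + "\n"


def enable_all_outputs(params_path: str) -> None:
    """Set output_formats to '*' (all formats) in place (hypothetical helper)."""
    path = Path(params_path)
    path.write_text(set_param(path.read_text(), "output_formats", "*"))
```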

@aroavaron
Author

Hi Isaac, I should have mentioned that I used "*" in the params file to create all the output formats, but the run only generated 8 instead of 12 files.
Thanks for your quick reply!

@isaacovercast
Collaborator

What version of ipyrad are you running? (ipyrad -v will print the version) Using the most recent version (0.9.93) I ran a reference assembly from scratch on the simulated data and using * as the output_formats value gave me the full complement of output files.

If it is the most recent version of ipyrad please share your .json file with me that is in the project dir.
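Version strings are best compared numerically rather than as text. The sketch below assumes plain numeric, dot-separated versions such as the "0.9.93" that ipyrad -v prints; pre-release tags would need a fuller parser. The helper names are hypothetical. importlib.metadata only sees the Python environment it runs in, so on a cluster it reports the version in the active conda environment, not a module label.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """'0.9.93' -> (0, 9, 93). Assumes purely numeric components."""
    return tuple(int(part) for part in v.strip().split("."))


def is_at_least(installed: str, required: str = "0.9.93") -> bool:
    """True if installed >= required, compared component by component."""
    return parse_version(installed) >= parse_version(required)


def installed_ipyrad_version() -> Optional[str]:
    """Version of the ipyrad package in the current Python environment, if any."""
    try:
        return version("ipyrad")
    except PackageNotFoundError:
        return None
```

With this sketch, is_at_least("0.9.12") is False. A bare label like "0.9" (as in a server module name) also compares as older than 0.9.93, because it does not pin the patch release. Checking with ipyrad -v is the reliable route.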

@aroavaron
Author

I ran ipyrad on the server, where it's listed as ipyrad/0.9. I also suspected a version issue, so I installed the latest version (ipyrad 0.9.93) in my miniconda environment.
However, I'm currently encountering a different issue and troubleshooting it. The program starts, but it gets stuck at the first step. No files are generated, so I can't share the .json file yet!


ipyrad [v.0.9.93]
Interactive assembly and analysis of RAD-seq data

Parallel connection | compute-65-17: 20 cores

Step 1: Loading sorted fastq data to Samples

@aroavaron
Author

Quick update. It was not running because of a lack of memory!

I'm running ipyrad for two species. For the first one, ipyrad finished successfully and generated all the outputs. I used denovo assembly with the genome reference as a filter (parameter #29). Compared with my previous run using ipyrad 0.9.12 and the genome as a reference, the number of retained loci dropped from 22K to 6K, and the amount of missing data for both the SNPs matrix and the sequence matrix decreased from 20.5% to 10.6%. Could it be that the newer version uses different criteria?

Regarding the second species, ipyrad has not been able to get past step #6. Attached is the .json file; I hope it helps to find out what the error could be. Fingers crossed for a quick and easy solution!

Thanks,
A

vsref_fil_200k_85p_denovo.json

@isaacovercast
Collaborator

@aroavaron In general the newest version of ipyrad should be trusted more than any previous version, for the fact that we are always fixing bugs. The difference in results between 0.9.93 and 0.9.12 (very old) is not so surprising. I would trust the newest version.

As for the second species, can you tell me what is the error you are getting during step 6? If you can show me all the command line output and the full error message when it dies that would be very helpful.

@aroavaron
Author

I ran the latest version of ipyrad (0.9.93) with two assembly approaches. The reference approach retained 15,304 loci (26.3% missing sites in the SNPs matrix and 28.1% in the sequence matrix), while the denovo-reference approach, i.e. denovo assembly using the reference (in this case a genome) as a filter (parameter #29), recovered 6,378 loci (14.6% missing sites in the SNPs matrix and 14.8% in the sequence matrix). For downstream analyses, it would generally be better to use the data with fewer missing values. However, I am curious about the reason(s) for the difference in the number of retained loci.

@isaacovercast
Collaborator

I'm not sure I fully understand what the two different assemblies were. In one case you did the 'reference' assembly using an 'on target' genome. In the 'denovo-reference' approach, did you use these same genome sequences as the 'reference_as_filter' parameter? In general, different assembly methods do quite different things, so they will normally produce different results.

@aroavaron
Author

Yes, exactly! I used the same genome (a chromosome-level assembly of the species I'm working on) for both approaches. I fully agree that different approaches will generate different results, but I would like to understand a little better what is going on, as I was not expecting the second approach to retain only about 42% as many loci (a drop of roughly 58%). Thank you for the quick reply!
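The ratio between the two loci counts reported in this thread can be checked quickly. Note that 6,378 is about 42% *of* 15,304, which corresponds to a reduction of about 58%. The two counts are taken directly from the comments above. The function name is just for illustration.

```python
def relative_change(before: int, after: int) -> float:
    """Fractional change from before to after (negative means a decrease)."""
    return (after - before) / before


reference_loci = 15_304  # 'reference' assembly, as reported above
filtered_loci = 6_378    # denovo + reference_as_filter, as reported above

retained = filtered_loci / reference_loci
change = relative_change(reference_loci, filtered_loci)
print(f"retained {retained:.1%}, change {change:+.1%}")
# prints: retained 41.7%, change -58.3%
```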

@isaacovercast
Collaborator

Well, the reference_as_filter removes any reads that map to the reference sequence, so the 6,378 loci you retained in this assembly are all the loci that don't map well to the reference (for whatever reason). Either they are off target, or the reference is distant from the focal taxon, or the assembly quality is not perfect. Does that help?
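A conceptual sketch of the filtering logic described above: reads that map to the reference are removed, and everything else is kept for denovo assembly. This is not ipyrad's implementation. The "mapper" here is a deliberately crude exact-substring stand-in, whereas a real aligner tolerates mismatches and indels. The sequences and helper names are hypothetical. The toy only illustrates why reads that differ from the reference, or come from regions absent from it, end up among the retained loci.

```python
from typing import Dict, Set, Tuple


def toy_maps(read: str, reference: str) -> bool:
    """Crude stand-in for a read mapper: exact substring match only."""
    return read in reference


def reference_as_filter(reads: Dict[str, str],
                        reference: str) -> Tuple[Dict[str, str], Set[str]]:
    """Drop reads that 'map' to the reference; keep the rest (conceptual sketch)."""
    kept: Dict[str, str] = {}
    removed: Set[str] = set()
    for name, seq in reads.items():
        if toy_maps(seq, reference):
            removed.add(name)   # maps to the reference -> filtered out
        else:
            kept[name] = seq    # does not map -> retained for denovo clustering
    return kept, removed


reference = "ACGTACGTTTGACCA"
reads = {
    "r1": "ACGTACG",  # identical to part of the reference
    "r2": "TTGACCA",  # identical to part of the reference
    "r3": "TTGTCCA",  # one mismatch vs. the reference: the toy mapper misses it
    "r4": "GGGGCCC",  # absent from the reference (e.g. off target)
}
kept, removed = reference_as_filter(reads, reference)
print(sorted(kept), sorted(removed))
# prints ['r3', 'r4'] ['r1', 'r2']
```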
