Empty output fixed_wiped_paired_interleaving.fastq.gz #27

Open
moniquevdor opened this issue Sep 9, 2024 · 7 comments

@moniquevdor

Hi,
I am running fastqwiper using conda to fix one of my files (using the "slow" way). I have paired files (R1 and R2), and only one of them is corrupted (R2). Everything is installed and runs like I expect it to, but the final "fixed_wiped_paired_interleaving.fastq.gz" files are empty. Both summaries seem fine:

FASTQWIPER SUMMARY (of R1):

Clean lines: 110381572/116890740 (94.43%)
Not printable or uncompliant header lines: 6417448/116890740
Fixed header lines: 29797/116890740
BAD SEQ lines: 45661/116890740
BAD '+' lines: 130/116890740
Fixed + lines: 0/116890740
BAD QUAL lines: 2/116890740
QUAL out of range lines: 2/116890740
Len(SEQ) neq Len(QUAL): 0/116890740
Blank lines: 0/116890740

FASTQWIPER SUMMARY (of R2):

Clean lines: 283256/306651 (92.37%)
Not printable or uncompliant header lines: 23130/306651
Fixed header lines: 100/306651
BAD SEQ lines: 130/306651
BAD '+' lines: 2/306651
Fixed + lines: 0/306651
BAD QUAL lines: 0/306651
QUAL out of range lines: 0/306651
Len(SEQ) neq Len(QUAL): 0/306651
Blank lines: 0/306651

My commands are the following:

snakemake --config sample_name=GLFa03-LGE7166_L1 qin=33 alphabet=ACGTN log_freq=1000 -s pipeline/fix_wipe_pairs_reads_sequential.smk --use-conda --cores 4 -np
snakemake --config sample_name=GLFa03-LGE7166_L1 qin=33 alphabet=ACGTN log_freq=1000 -s pipeline/fix_wipe_pairs_reads_sequential.smk --dag | dot -Tpdf > dag_GLFa03-LGE7166_L1.pdf
snakemake --config sample_name=GLFa03-LGE7166_L1 alphabet=ACGTN qin=33 log_freq=1000 -s pipeline/fix_wipe_pairs_reads_sequential.smk --use-conda --cores 2

I also don't see anything out of the ordinary in my snakemake log.
Would you have any recommendations on how to fix this issue? Thank you!
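
The two summaries above can be compared directly. The following is a minimal sketch, not part of fastqwiper, that parses summary lines of the form `label: count/total` and relates the R2 line total to the R1 line total. The function name `parse_summary` and the regular expression are assumptions made for illustration; the numbers are copied from the summaries in this post.

```python
import re

# Matches summary lines such as "BAD SEQ lines: 130/306651".
SUMMARY_LINE = re.compile(r"^(.+?):\s*(\d+)/(\d+)")


def parse_summary(text):
    """Map each summary label to a (count, total) tuple."""
    stats = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            label, count, total = match.groups()
            stats[label] = (int(count), int(total))
    return stats


# Values copied from the two summaries in this issue.
r1 = parse_summary("Clean lines: 110381572/116890740 (94.43%)")
r2 = parse_summary("Clean lines: 283256/306651 (92.37%)")

r1_clean, r1_total = r1["Clean lines"]
r2_clean, r2_total = r2["Clean lines"]

# Share of R1's line count that the R2 summary reaches.
r2_vs_r1 = r2_total / r1_total
print(f"R2 has {100 * r2_vs_r1:.2f}% as many lines as R1")
```

By this arithmetic the R2 summary covers roughly 0.26% as many lines as the R1 summary. That may limit how many reads can be paired, but it is a reading of the numbers, not a confirmed diagnosis.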

@mazzalab
Owner

mazzalab commented Sep 9, 2024

Hi. Since you are following the slow path, could you edit the snakemake pipeline to remove the temp keyword from the rules and verify which fastq file starts to be empty? This would be useful to understand which rule does not work.

Alternatively, you could share the fastq files with us and we will debug the code.

Let us know
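
Once the intermediate files are kept, the check for the first empty stage could be scripted. This is a minimal sketch under stated assumptions: the helper names `count_fastq_records` and `first_empty_stage` are hypothetical, files are assumed to be intact, records are assumed to span exactly four lines, and gzip compression is inferred from the `.gz` extension only.

```python
import gzip
from pathlib import Path


def count_fastq_records(path):
    """Count FASTQ records (assumed 4 lines each) in a plain or .gz file."""
    path = Path(path)
    # Assumption: the extension tells the truth about compression.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as fh:
        n_lines = sum(1 for _ in fh)
    return n_lines // 4


def first_empty_stage(paths):
    """Return the first path, in pipeline order, with zero records, or None."""
    for path in paths:
        if count_fastq_records(path) == 0:
            return path
    return None
```

Passing the intermediate files in the order the rules produce them would return the first one that holds no records. A truncated gzip file raises an error when read this way, which is itself a useful signal.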

@moniquevdor
Author

moniquevdor commented Sep 10, 2024

Thank you! These are the file sizes for the non-corrupted file (R1):

GLFa03-LGE7166_L1_R1.fastq.gz ~4.8GB
GLFa03-LGE7166_L1_R1_fixed.fastq.gz ~9.3GB
GLFa03-LGE7166_L1_R1_fixed_wiped.fastq.gz ~2.3GB
GLFa03-LGE7166_L1_R1_fixed_wiped_paired.fastq.gz ~6.5MB
GLFa03-LGE7166_L1_R1_fixed_wiped_unpaired.fastq.gz 1KB
GLFa03-LGE7166_L1_R1_fixed_wiped_paired_interleaving.fastq.gz 1KB

As for the corrupted file (R2):

GLFa03-LGE7166_L1_R2.fastq.gz ~5.2GB
GLFa03-LGE7166_L1_R2_fixed.fastq.gz ~24MB
GLFa03-LGE7166_L1_R2_fixed_wiped.fastq.gz ~6.5MB
GLFa03-LGE7166_L1_R2_fixed_wiped_paired.fastq.gz ~6.7MB
GLFa03-LGE7166_L1_R2_fixed_wiped_unpaired.fastq.gz 1KB
GLFa03-LGE7166_L1_R2_fixed_wiped_paired_interleaving.fastq.gz 1KB

It seems like file sizes go down by a lot after wiping, in the corrupted, but also in the non-corrupted file. Is it possible that the program overfilters somehow? Or would this indicate that there is something wrong with my data?
This corrupted file is the worst of three; the other two give me the following:

GLFa04-LGE7167_L1_R1.fastq.gz ~4.7GB
GLFa04-LGE7167_L1_R1_fixed_wiped.fastq.gz ~138MB
GLFa05-LGE7168_L1_R1.fastq.gz ~5.1GB
GLFa05-LGE7168_L1_R1_fixed_wiped.fastq.gz ~2GB

If the program does over-filter, I might still be able to use the last one (as it's roughly the same size as a non-corrupted file after wiping).
When I run fastqwiper in single mode on just my corrupted files, and then run trimmomatic as I do with the rest of my data, the pairing does work, so I am considering just going ahead with that. If you have any recommendations on making the wiping step less conservative, please let me know.

Thank you for your help!
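
One way to see how many reads could be paired at all is to count the read IDs that the wiped R1 and R2 files have in common. The sketch below is an illustration only and is not how the pipeline pairs reads: the ID convention in `normalize_id` (first whitespace-delimited token, leading `@` and trailing `/1` or `/2` removed) is an assumption, as are all function names.

```python
import gzip


def open_text(path):
    """Open a plain or gzipped FASTQ file as text, chosen by file extension."""
    opener = gzip.open if str(path).endswith(".gz") else open
    return opener(path, "rt", encoding="utf-8", errors="replace")


def normalize_id(header):
    """Reduce a header line to a mate-independent read ID (assumed convention)."""
    fields = header.split()
    token = fields[0] if fields else ""
    if token.startswith("@"):
        token = token[1:]
    if token.endswith(("/1", "/2")):
        token = token[:-2]
    return token


def count_shared_ids(small_path, large_path):
    """Count reads in large_path whose ID also occurs in small_path."""
    # Only the smaller file's IDs are held in memory; the larger is streamed.
    with open_text(small_path) as fh:
        small_ids = {normalize_id(line) for i, line in enumerate(fh) if i % 4 == 0}
    small_ids.discard("")  # ignore blank or malformed header lines
    shared = 0
    with open_text(large_path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0 and normalize_id(line) in small_ids:
                shared += 1
    return shared
```

Here the small R2 wiped file would be passed as `small_path` and the larger R1 wiped file as `large_path`, so that memory use stays bounded by the smaller file.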

@mazzalab
Owner

Wiping seems to be aggressive on your data. The wiping step checks a few rules to guarantee that the fastq file is generally well-formed.

I'd like to debug fastqwiper on your data. Would you mind sharing just the GLFa03-LGE7166_L1_R2_fixed.fastq.gz file, which should be small in size and should be your corrupted file just recovered by gzrt before wiping?

That would be very useful

@moniquevdor
Author

Of course, thank you!
GLFa03-LGE7166_L1_R2_fixed.fastq.gz

@mazzalab
Owner

Great! I'll leave the issue open and give you feedback in this thread as soon as it's debugged.

@mazzalab
Owner

Just a note: the output of the first tool (fix_gzrt) should be GLFa03-LGE7166_L1_R2_fixed.fastq (not gzipped).
Did you gzip this file? The point is that "GLFa03-LGE7166_L1_R2_fixed.fastq" should be readable, while the "GLFa03-LGE7166_L1_R2_fixed.fastq.gz" that you sent here is not, which is strange.

image

@mazzalab mazzalab self-assigned this Sep 10, 2024
@moniquevdor
Author

moniquevdor commented Sep 10, 2024

Whoops sorry about that, here you go.
GLFa03-LGE7166_L1_R2_fixed.zip
GitHub doesn't let me upload the fastq file on its own; that file type isn't supported.

EDIT: just to clarify, the first output is indeed GLFa03-LGE7166_L1_R2_fixed.fastq, but when I run the command

head GLFa03-LGE7166_L1_R2_fixed.fastq

it looks like a zipped file. I thought that fastqwiper just named the file incorrectly, so in my original upload I added .gz manually before sending it to you. For this second upload I actually zipped the file.
Thank you!
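
Whether a file is really gzip-compressed, whatever its extension says, can be checked from its first bytes. Gzip streams start with the two bytes `0x1f 0x8b`, while a FASTQ text file starts with `@`. The sketch below is a hedged illustration; the function name `sniff_format` is hypothetical and the check looks only at the start of the file.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def sniff_format(path):
    """Guess the file type from its leading bytes, ignoring the extension."""
    with open(path, "rb") as fh:
        head = fh.read(2)
    if head == GZIP_MAGIC:
        return "gzip"
    if head[:1] == b"@":
        return "fastq"
    return "unknown"
```

If this returned "gzip" for GLFa03-LGE7166_L1_R2_fixed.fastq, it would match what `head` showed: compressed data under an uncompressed file name.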
