Issue with the SortmeRNA process #141

Open
phuongdoand opened this issue Aug 16, 2021 · 9 comments
Labels
bug Something isn't working

Comments

@phuongdoand

phuongdoand commented Aug 16, 2021

Hi, I just found out about your pipeline a few days ago and decided to give it a test run.
I tried it on two datasets that I am currently analyzing and in one of the datasets, there is an error that occurred in the preprocess:hisat2 process.

image

The error said that the length of the sequence and the length of the quality value are not the same, so I traced back to the fastq file and found the sequence that triggered the error. In the end, I found out that there are multiple sequences with the same behavior:

image

Just for reference, I also went back to check the original FastQ file provided by the sequencing machine and their sequence length is the same as the quality length:

image

Because preprocess:hisat2 uses the *.other.fastq.gz file (the output of sortmeRNA), I believe the error actually occurs in the sortmeRNA process, when it merges R1 and R2 to map them against the rRNA reference and then unmerges them to return the other.R1 and other.R2 files.
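
The length check described in this comment can be scripted. The following is a minimal sketch, not part of the pipeline: the function name `find_length_mismatches` is hypothetical, and it assumes a plain four-line-per-record FASTQ file (optionally gzip-compressed).

```python
import gzip
from itertools import islice


def find_length_mismatches(path):
    """Yield (header, seq_len, qual_len) for records whose lengths differ.

    Assumes four lines per record; gzip is detected from the ".gz" suffix.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            # Read the next four lines: header, sequence, "+", quality.
            record = [line.rstrip("\r\n") for line in islice(handle, 4)]
            if not record:
                break  # clean end of file
            if len(record) < 4:
                raise ValueError(f"truncated record near {record[0]!r}")
            header, seq, _plus, qual = record
            if len(seq) != len(qual):
                yield header, len(seq), len(qual)
```

Running it once on the sortmeRNA output and once on the original file from the sequencer (for example `list(find_length_mismatches("sample.other.R1.fastq.gz"))`, where the file name is only a placeholder) would show whether the mismatches appear only after the sortmeRNA step.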

Since it is an old version of sortmeRNA and now they have fixed it, can you try to update sortmeRNA to a newer version?

@hoelzer
Contributor

hoelzer commented Aug 17, 2021

@phuongdoand thanks for your interest in our pipeline and for reporting this issue.

I think we (and others) have run into this a couple of times now.

May I ask which nextflow run ... command you are running, and with which profile? Lastly, are you running Nextflow in a screen session or something similar? We had some issues related to that in the past.

We already considered replacing the current SortMeRNA version with the newer one but ran into other issues. Actually, we are currently even considering replacing SortMeRNA w/ www.github.com/hoelzer/clean (while still using the same database SortMeRNA provides).

@MarieLataretu in the meantime it might be worth giving the newest SortMeRNA version another try?

@phuongdoand sorry for the inconvenience. If your RNA-Seq samples only have a low rRNA amount anyway (see the logs in the SortMeRNA working dirs), you might just want to skip the process via the respective flag (see --help).

(ps: Marie and I are on holiday so we might be a bit slower in responding/looking into this)

@hoelzer hoelzer added the bug Something isn't working label Aug 17, 2021
@hoelzer
Contributor

hoelzer commented Aug 17, 2021

Here we had some long discussion about this (and other) issues: #116

The user was running the pipeline w/ tmux and apparently this was causing strange issues with SortMeRNA in Nextflow that we were not able to fully unravel. So I bet that you also started the pipeline in a screen/tmux/... ? :)

If so, can you please give it a try without? You may be able to use Nextflow's built-in -bg option to put the run into the background.

@phuongdoand
Author

@hoelzer Thank you very much for the prompt response, and sorry to disturb you on your holiday.

Forgot to attach my command in the issue but here it is:

nextflow run hoelzer-lab/rnaflow --autodownload hsa --reads cleanData/input.csv --deg comparison.csv -profile docker -resume --mode paired -with-trace

I was indeed running the command inside a screen session. So is that what keeps sortmeRNA from working properly?
That is interesting to know; in the meantime, I will try Nextflow's -bg flag or the Linux bg command to see if it resolves the problem.

@hoelzer
Contributor

hoelzer commented Aug 18, 2021

@phuongdoand no problem ;)

Thanks for the command. btw it's always good to use the -r flag to point to a specific release of the pipeline you want to run. That way, your results are fully reproducible. You can see available release versions via

nextflow pull hoelzer-lab/rnaflow
nextflow info hoelzer-lab/rnaflow

And yes, we don't have an explanation yet, but we also experienced this behavior of SortMeRNA in a screen session. It could also be that this only happens with the specific combination of SortMeRNA+Nextflow+screen/tmux. It would be cool if you could test the same command without a screen session, or with Nextflow's -bg option.

Another hint, w/ -bg you will not see the nice nextflow progress output but you can redirect this via

nextflow run ... -bg > some.log

and then just

tail some.log

to check what's going on.

@phuongdoand
Author

@hoelzer Thank you for the tip and advice.

Nevertheless, I just finished running the command again with Nextflow's -bg flag, but the same behavior occurred. So I guess the problem is not caused by screen/tmux but by sortmeRNA itself.

In the end, I decided to use --skip_sortmerna to avoid the error and also to speed up the analysis pipeline.

@hoelzer
Contributor

hoelzer commented Aug 19, 2021 via email

@phuongdoand
Author

@hoelzer Sure. sortmeRNA will take quite some time to run, so I will let you know the test result when it is available.

@phuongdoand
Author

phuongdoand commented Aug 20, 2021

@hoelzer Got the result today: the problem still occurs with the FASTQ file that I have been using.
I guess that this is a problem in sortmeRNA's merging and splitting steps, since I have noticed that the quality value assigned to the same nucleotide differs between other.fastq.gz and original.fastq.gz, which is weird.
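
A comparison like the one described above could be sketched as follows. This is an illustration only, under several assumptions: the helper names `read_fastq` and `changed_records` are hypothetical, both files use four lines per record, and read IDs (the header up to the first whitespace) are identical in both files.

```python
import gzip
from itertools import islice


def read_fastq(path):
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ.

    A trailing incomplete record is silently ignored.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            record = [line.rstrip("\r\n") for line in islice(handle, 4)]
            if len(record) < 4:
                break
            header, seq, _plus, qual = record
            # Read ID: header without the leading "@", up to the first whitespace.
            parts = header[1:].split()
            yield (parts[0] if parts else ""), seq, qual


def changed_records(original_path, filtered_path):
    """Yield IDs of reads whose sequence or quality differs between the files.

    Only reads present in the filtered file are compared. The filtered file
    is held in memory, so this suits subsets rather than full-size runs.
    """
    filtered = {rid: (seq, qual) for rid, seq, qual in read_fastq(filtered_path)}
    for rid, seq, qual in read_fastq(original_path):
        if rid in filtered and filtered[rid] != (seq, qual):
            yield rid
```

An empty result would mean every read kept in the filtered file is byte-identical to its original; any ID that is returned points at a record altered between input and output.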

Btw, since you mentioned the README, it would be better if you could document the header line of the comparison.csv file for the --deg flag. It actually took me quite some time to figure out that Condition1,Condition2 is required as the header of the file.
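
Until the README covers this, a quick pre-flight check can catch a missing header. The sketch below is not part of the pipeline: `has_deg_header` is a hypothetical helper, and the expected header is taken only from the report in this thread, so exact spelling and case sensitivity are assumptions.

```python
import csv

# Header reported in this thread as required for the --deg file (assumption:
# the match is exact and case-sensitive).
EXPECTED_HEADER = ["Condition1", "Condition2"]


def has_deg_header(path):
    """Return True if the first row of the CSV file is the expected header."""
    with open(path, newline="") as handle:
        first_row = next(csv.reader(handle), None)  # None for an empty file
    return first_row == EXPECTED_HEADER
```

For example, `has_deg_header("comparison.csv")` should return True for a file whose first line is `Condition1,Condition2` and False for one that starts directly with the condition pairs.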

@hoelzer
Copy link
Contributor

hoelzer commented Aug 23, 2021

@phuongdoand okay thanks for reporting! And weird - I thought that running the pipeline and thus SMR outside of a screen/bg might solve the issue. But then we really need to replace SMR w/ the newer version or another tool. I will make a separate issue for that.

Regarding the README: thanks for the hint, this we can do!
