remove gaps and extract 8bp #3

Open
JingGuo1997 opened this issue Sep 12, 2023 · 2 comments

JingGuo1997 commented Sep 12, 2023

Hi scan-seq2 developer,
The single-cell third-generation transcriptome sequencing method that you have developed is extremely exciting. While trying to reproduce your results, I ran into two questions.
1. In the code below, my understanding is that reads shorter than 108 bp are removed, but I don't see where gap removal happens in the code, or why gaps need to be removed.

########read length < 100 and remove gaps

seqkit seq -m 108 -g ${cell}_full_length.fastq > ${cell}_full_length_filtered.fastq
rm -f ${cell}_full_length.fastq
2. In the code below, removing the extra 8 bp truncates the poly-A tail by 8 bp in reads from which the UMI has already been removed. Why do you do this?

####remove extra 8 bp

cutadapt -u -8 -o ${cell}_full_length_filtered.fastq ${cell}_full_length_filtered.extract.fastq
rm -f ${cell}_full_length_filtered.extract.fastq

Sincerely look forward to your reply!

@liuzhenyu-yyy
Owner

Hi Jing,

Glad to see that you're interested in our work. The gaps in fastq sequences are removed with seqkit:

seqkit seq -m 108 -g ${cell}_full_length.fastq > ${cell}_full_length_filtered.fastq

where -m specifies the minimum read length and -g removes gap letters. See the Usage and Examples page of seqkit for more details. In principle, removing gaps improves the mapping quality of ONT reads. In our pipeline it is just a habit, and we have not tested how much it improves the pipeline's performance.
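To see whether -g changes anything for a given dataset, one option is to count the reads that contain gap letters before filtering. The sketch below is not part of the pipeline: the helper names `count_gapped_reads` and `count_reads_min_len` are hypothetical, the gap set (dash, dot, space) is an assumption and may not match seqkit's default `--gap-letters` in every version, and the input is assumed to be a standard 4-line-per-record FASTQ. Basecalled ONT reads normally contain no gap letters, in which case the first count would be 0.

```shell
# Count reads whose sequence line contains a gap letter.
# Assumes 4-line FASTQ records, so the sequence is line 2 of each record.
# The gap set "-", "." and " " is an assumption for illustration.
count_gapped_reads() {
    awk 'NR % 4 == 2 && /[-. ]/ { n++ } END { print n + 0 }' "$1"
}

# Count reads that still have at least $2 bases once gap letters are stripped.
# This checks length after gap removal, which is a choice made for this sketch.
count_reads_min_len() {
    awk -v min="$2" '
        NR % 4 == 2 {
            s = $0
            gsub(/[-. ]/, "", s)        # drop gap letters from the sequence
            if (length(s) >= min) n++
        }
        END { print n + 0 }
    ' "$1"
}

# Example usage (file name taken from the pipeline command above):
#   count_gapped_reads "${cell}_full_length.fastq"
#   count_reads_min_len "${cell}_full_length.fastq" 108
```

If the first count is 0, the -g flag should have no practical effect on that file and only the -m 108 length filter matters.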

An extra 8 bp is removed to deal with unexpected insertions or base errors introduced by Nanopore sequencing and to make sure the poly-A trimming step works properly. Since poly-A length was not included in our analysis, we figured it would be fine to remove a few bases at the end of the poly-A tail. This is not mandatory and should be adjusted according to your data.
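For readers unsure what `cutadapt -u -8` does to a read: a negative value for `-u` removes that many bases from the 3' end, and the quality string is shortened to match. The sketch below imitates this with awk on a 4-line-per-record FASTQ, purely to illustrate the effect and to make it easy to try other trim lengths. The function name `trim_3prime` is hypothetical and is not part of cutadapt or of the pipeline.

```shell
# Remove the last N characters from the sequence and quality lines of each read.
# $1 = number of bases to trim from the 3' end, $2 = input FASTQ (4 lines per record).
# Output goes to stdout; reads shorter than N end up with empty sequence lines.
trim_3prime() {
    awk -v n="$1" '
        NR % 4 == 2 || NR % 4 == 0 {            # sequence and quality lines
            print substr($0, 1, length($0) - n)
            next
        }
        { print }                                # header and "+" lines unchanged
    ' "$2"
}

# Example usage (hypothetical file names):
#   trim_3prime 8 reads.fastq > reads.trimmed.fastq
```

For a read ending in a poly-A stretch, such as ACGT followed by ten A bases, trimming 8 leaves ACGTAA, so only the tail is shortened, not the transcript body.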

Please let me know if you have any further questions.

@JingGuo1997
Author

Hi Zhenyu,
Thank you for your prompt response; your reply has been very helpful to me!
