
ENH: Add functionality to filter sequence data by Kraken hits #222

Open
johnchase opened this issue Dec 11, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@johnchase

johnchase commented Dec 11, 2024

This is a request for an additional feature, not a report of a problem with the existing q2-moshpit.

Describe the solution you'd like
We would like a function that takes as input sequence data and the Kraken 2 hits (SampleData[Kraken2Output]) and returns two sets of sequence data: one with the classified reads and one with the unclassified reads.
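A minimal pure-Python sketch of the requested split, assuming single-end FASTQ input (4-line records, no wrapped sequences) and Kraken 2's standard per-read output, which is tab-separated with `C`/`U` in the first column and the read ID in the second. The names `classified_ids`, `read_fastq`, `read_id` and `split_by_kraken` are hypothetical and do not correspond to any existing q2-moshpit API; a real plugin action would operate on QIIME 2 artifacts rather than on plain lists of lines.

```python
from typing import Iterable, Iterator, List, Set, Tuple

# (header, sequence, '+' line, qualities) -- a hypothetical in-memory record type
FastqRecord = Tuple[str, str, str, str]


def classified_ids(kraken_lines: Iterable[str]) -> Set[str]:
    """Collect the read IDs that Kraken 2 marked as classified ('C')."""
    ids = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] == "C":
            ids.add(fields[1])
    return ids


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records, skipping blank lines between records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue
        seq, plus, qual = next(it), next(it), next(it)
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               plus.rstrip("\n"), qual.rstrip("\n"))


def read_id(header: str) -> str:
    """'@read1 description' -> 'read1' (assumed to match Kraken 2's ID column)."""
    return header[1:].split()[0]


def split_by_kraken(fastq_lines: Iterable[str], kraken_lines: Iterable[str]
                    ) -> Tuple[List[FastqRecord], List[FastqRecord]]:
    """Partition FASTQ records into (classified, unclassified)."""
    hits = classified_ids(kraken_lines)
    classified: List[FastqRecord] = []
    unclassified: List[FastqRecord] = []
    for rec in read_fastq(fastq_lines):
        (classified if read_id(rec[0]) in hits else unclassified).append(rec)
    return classified, unclassified


# Toy usage example
fastq = ["@read1 extra", "ACGT", "+", "IIII",
         "@read2", "TTTT", "+", "IIII"]
kraken = ["C\tread1\t9606\t4\t9606:1", "U\tread2\t0\t4\t0:1"]
c, u = split_by_kraken(fastq, kraken)
```

In this sketch, reads missing from the Kraken 2 output end up in the unclassified set; raising an error instead would be an equally reasonable design choice.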

Describe alternatives you've considered
We are currently doing this in pure Python.

Additional context
We have already started developing this internally and will use the functionality in our workflows; however, q2-moshpit seems like a logical home for it.

johnchase added the enhancement (New feature or request) label on Dec 11, 2024
@nbokulich
Contributor

Thanks for opening the issue, @johnchase! What type of sequence data do you want to filter?

@johnchase
Author

For the time being it would be SampleData[SequencesWithQuality], SampleData[JoinedSequencesWithQuality] and SampleData[PairedEndSequencesWithQuality], though we could likely support any SampleData sequence type.

We're using this to filter contaminants out of metagenomic data. In our case, the sequences passed to classify_kraken2 are the same sequences that we then filter.
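For SampleData[PairedEndSequencesWithQuality], the two mates of a pair presumably need to stay together. Because the same reads are passed to classify_kraken2 and then filtered, each pair should have one matching line in the Kraken 2 output. The sketch below assumes that, and also assumes mate IDs may end in `/1` or `/2` while the Kraken 2 ID does not; `pair_id` and `split_pairs` are hypothetical helpers, and records are simplified to `(header, sequence, qualities)` tuples.

```python
from typing import Iterable, List, Set, Tuple

Record = Tuple[str, str, str]  # (header, sequence, qualities)
Split = Tuple[List[Record], List[Record]]  # (forward mates, reverse mates)


def classified_ids(kraken_lines: Iterable[str]) -> Set[str]:
    """Read IDs whose Kraken 2 status column is 'C'."""
    rows = (line.rstrip("\n").split("\t") for line in kraken_lines)
    return {f[1] for f in rows if len(f) >= 2 and f[0] == "C"}


def pair_id(header: str) -> str:
    """'@frag7/1 desc' -> 'frag7' (assumes an optional /1 or /2 mate suffix)."""
    rid = (header[1:] if header.startswith("@") else header).split()[0]
    return rid[:-2] if rid.endswith(("/1", "/2")) else rid


def split_pairs(fwd: List[Record], rev: List[Record],
                kraken_lines: Iterable[str]) -> Tuple[Split, Split]:
    """Return ((classified_fwd, classified_rev), (unclassified_fwd, unclassified_rev))."""
    if len(fwd) != len(rev):
        raise ValueError("forward and reverse reads differ in count")
    hits = classified_ids(kraken_lines)
    classified: Split = ([], [])
    unclassified: Split = ([], [])
    for r1, r2 in zip(fwd, rev):
        if pair_id(r1[0]) != pair_id(r2[0]):
            raise ValueError(f"mates out of sync: {r1[0]} vs {r2[0]}")
        # Both mates go to the same bin, decided by the pair's Kraken 2 call.
        target = classified if pair_id(r1[0]) in hits else unclassified
        target[0].append(r1)
        target[1].append(r2)
    return classified, unclassified


# Toy usage example
fwd = [("@p1/1", "ACGT", "IIII"), ("@p2/1", "GGGG", "IIII")]
rev = [("@p1/2", "TTTT", "IIII"), ("@p2/2", "CCCC", "IIII")]
kraken = ["U\tp1\t0\t4|4\t|:|", "C\tp2\t9606\t4|4\t9606:1 |:| 9606:1"]
(c_f, c_r), (u_f, u_r) = split_pairs(fwd, rev, kraken)
```

Keeping mates in lockstep means both output sets remain valid paired-end data, at the cost of discarding a mate whose partner alone would have classified differently.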

@johnchase
Author

@nbokulich I noticed that Kraken 2, when run from the command line, can also write out the classified and unclassified sequences. Would it make sense to include these as default outputs of the plugin? I can't speak for other users, but the classified and unclassified sequences are the main reason we run Kraken 2.
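For comparison, the standalone Kraken 2 CLI writes the split reads itself through its `--classified-out` and `--unclassified-out` options; in `--paired` mode those file names must contain `#`, which Kraken 2 replaces with `_1` and `_2`. The sketch below only builds the argument list and does not run anything. `kraken2_split_command` and the output file names are hypothetical, and how a plugin would expose these files is not decided in this thread.

```python
import shlex
from typing import List


def kraken2_split_command(db: str, reads: List[str], out_dir: str,
                          paired: bool = False) -> List[str]:
    """Build (but do not execute) a kraken2 argv that also writes
    classified and unclassified reads. All paths are placeholders."""
    if paired and len(reads) != 2:
        raise ValueError("paired mode expects exactly two read files")
    # In paired mode Kraken 2 expects a '#' in these file names and
    # substitutes _1 / _2 for the two mates.
    suffix = "#.fq" if paired else ".fq"
    cmd = [
        "kraken2", "--db", db,
        "--output", f"{out_dir}/output.kraken2",
        "--report", f"{out_dir}/report.txt",
        "--classified-out", f"{out_dir}/classified{suffix}",
        "--unclassified-out", f"{out_dir}/unclassified{suffix}",
    ]
    if paired:
        cmd.append("--paired")
    return cmd + list(reads)


# Toy usage example: build the command and a shell-quoted rendering of it
cmd = kraken2_split_command("db", ["s_R1.fq", "s_R2.fq"], "out", paired=True)
command_line = shlex.join(cmd)
```

On a machine with Kraken 2 and a database installed, the list could be passed to `subprocess.run(cmd, check=True)`.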
