Assets for human germline pipeline #21
Comments
@vsbuffalo Pinging you in case you have some time to look at it. Thanks :)
Hey Alexis, I am definitely open to hosting this. I would suggest a more specific name, however (as `human_genome_assets` is fairly general). I am still thinking how best to handle #13 too. The tricky thing is the separation of a remote manifest and the information about what the user wants locally. Either the remote manifest could have an additional line like `local` that is a boolean or there could be another file that stores this information (which is a bit messy).
Hi Vince,
This asset has the human reference genome and all the databases needed for annotation, so it's "pipeline-ready".
What about human-genome-annotation? The first name was taken from the scidataflow README :)
Regarding #13, adding a local flag to the manifest seems reasonable (but I am not familiar with the scidataflow codebase).
…On Tuesday, August 20th, 2024 at 5:59 PM, Vince Buffalo ***@***.***> wrote:
Hey Alexis,
I am definitely open to hosting this. I would suggest a more specific name, however (as `human_genome_assets` is fairly general).
I am still thinking how best to handle #13 too. The tricky thing is the separation of a remote manifest and the information about what the user wants locally. Either the remote manifest could have an additional line like `local` that is a boolean or there could be another file that stores this information (which is a bit messy).
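The two options for #13 mentioned above can be contrasted in a minimal sketch. It is not SciDataFlow's actual manifest format or API: the `RemoteEntry` class, its fields, the helper functions, and the example paths and URLs are all hypothetical and only illustrate where the "what the user wants locally" information would live in each case.

```python
from dataclasses import dataclass


@dataclass
class RemoteEntry:
    """Hypothetical manifest entry; not SciDataFlow's real schema."""
    path: str
    url: str
    local: bool = True  # Option A: the flag lives in the shared manifest itself


def wanted_from_flag(entries):
    """Option A: select files using the boolean stored on each manifest entry."""
    return [e.path for e in entries if e.local]


def wanted_from_side_file(entries, local_paths):
    """Option B: the shared manifest is untouched; a separate per-user
    set of paths (e.g. loaded from another file) says what to fetch."""
    return [e.path for e in entries if e.path in local_paths]


# Hypothetical entries, for illustration only.
entries = [
    RemoteEntry("GRCh38/genome.fa.gz", "https://example.org/genome.fa.gz", local=True),
    RemoteEntry("cadd/scores.tsv.gz", "https://example.org/cadd.tsv.gz", local=False),
]

flag_selection = wanted_from_flag(entries)
side_selection = wanted_from_side_file(entries, {"cadd/scores.tsv.gz"})
```

With option A, every user who shares the manifest also shares the `local` choices; with option B, the choices stay per-user at the cost of a second file to keep in sync.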
Hey Alexis, it still needs to be a more specific name: "human-genome-annotation" is too general, since your asset has specific annotation such as CADD. How about
Hi, do you think assets should be formatted according to a set of rules? With that format, I propose the asset for this PR to be renamed What do you think?
Hi @vsbuffalo, to focus on this PR, are you okay with the new asset name?
Hi,
I've put the data needed to run germline analysis for Homo sapiens into a separate assets repository here:
https://github.com/apraga/human_genome_assets
https://github.com/apraga/germline-analysis-vep
Do you think it could be added to the scidataflow assets?
At the moment, there's only the latest version of a reference genome (pipeline-ready GRCh38) and databases for annotation (dbSNP, CADD scores and VEP cache).
Each dataset is in a separate directory, with subdirectories specifying the genome version and the database version. This info is still in the filename for reference. I've taken the liberty of renaming files to make it more user-friendly.
I plan to update this repository frequently but am open to discussion about its structure.
Note: if #13 can be solved, that would make it easier to work with several versions.
Thanks!
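The directory layout described above (one directory per dataset, then subdirectories for the genome version and the database version) could be expressed as a small path helper. This is only a sketch: the function name, the nesting order of the two version levels, and the example dataset, version, and file names are assumptions, not the repository's actual contents.

```python
from pathlib import PurePosixPath


def asset_path(dataset, genome_version, db_version, filename):
    """Build <dataset>/<genome_version>/<db_version>/<filename>.

    The nesting order is assumed; the thread only states that both
    version levels exist as subdirectories of the dataset directory.
    """
    return PurePosixPath(dataset) / genome_version / db_version / filename


# Hypothetical names, for illustration only.
example = asset_path("cadd", "GRCh38", "v1.7", "whole_genome_SNVs.tsv.gz")
```

Keeping the versions in the path as well as in the filename means a file stays identifiable even after it is copied out of the tree.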