-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace input CSV with YAML and parse sample ID from BAM #106
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of good stuff Faizal!
Made some comments/questions!
tumor_id_bam_bai = input_ch_samples_with_index | ||
.filter{ it.sample_type == 'tumor' } | ||
.map{ it -> [it.id, it.path, it.index] } | ||
normal_bam_bai = input_ch_samples_with_index | ||
.filter{ it.sample_type == 'normal' } | ||
.map{ it -> [it.path, it.index] } | ||
|
||
input_paired_bams_ch = tumor_id_bam_bai.combine(normal_bam_bai) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm this main input_paired_bams_ch has 4 items correct? I thought the first input used by the SV caller modules intakes a tuple with 5 arguments? From Delly:
input:
tuple(val(tumor_id), path(tumor_bam), path(tumor_bai), path(normal_bam), path(normal_bai))
Did you run into any issues there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No issues here. The input_paired_bams_ch
channel has 5 items, 3 from tumor_id_bam_bai
and 2 from normal_bam_bai
which are combined in L99.
Oh! Noticed this only parrtially addresses #103 . The output directory automatically created is named after the tumor "SM" tag I believe as opposed to the sample-id. In the case where these two are the same, this issue is hidden, but that's not the case for many datasets. Here for example, the sample ID is For context, the provided config file has the output directory path:
But the final output directory structure is detached from the provided sample-id in the yaml:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments:
This is intended as the SM tag removes any chance of user caused mislabelling of sample_id. |
Testing the updated PR on a real sample ILHNLNEV000009-T002-L01-F from one of the registered projects. /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-replace-input-csv-with-yaml/yaml_input_F32_ILHNLNEV000009-T002-L01-F.log |
While the test run for a real sample is still in progress, we can see the sample directory is set up as expected.
any other comments here? @tyamaguchi-ucla @yashpatel6 @Alfredo-Enrique |
Test runs completed successfully. /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-replace-input-csv-with-yaml/TWGSAMIN000001-T003-S03-F/call-sSV-5.0.0/S2_v1.1.5/ /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-replace-input-csv-with-yaml/ILHNLNEV000009-T002-L01-F/call-sSV-5.0.0/ILHNLNEV000009-T002-L01-F/ (Updated paths) @tyamaguchi-ucla @yashpatel6 |
@tyamaguchi-ucla @yashpatel6 I think this PR is ready to be merged and we can proceed with the next PR to release |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
For the development inputs, /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/input/yaml/call-sSV-input.yaml
- I would suggest naming the file to indicate the sample rather than the generic name currently given; we'll likely have more YAMLs there in the future and it's clearer to name base on sample.
Anything else to add @tyamaguchi-ucla ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Description
Closes
The following related issues,
#103
#104
#105
Testing Results
This is an edge case where the sample ID in the BAM is different from the internal sample ID. However in production, the sample ID in the BAM would be the same as the internal sample ID.
Checklist
I have read the code review guidelines and the code review best practice on GitHub check-list.
I have reviewed the Nextflow pipeline standards.
The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
I have set up or verified the branch protection rule following the [github standards](https://confluence.mednet.ucla.edu/pages/viewpage.action?spaceKey=BOUTROSLAB&title=GitHub+Standards#GitHubStanda
rds-Branchprotectionrule) before opening this pull request.
I have added my name to the contributors listings in the
manifest
block in thenextflow.config
as part of this pull request, am listedalready, or do not wish to be listed. (This acknowledgement is optional.)
I have added the changes included in this pull request to the
CHANGELOG.md
under the next release version or unreleased, and updated the date.I have updated the version number in the
metadata.yaml
andmanifest
block of thenextflow.config
file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)I have tested the pipeline on at least one A-mini sample with
algorithm = ['delly', 'manta']
. The paths to the test config files and output directories are attached above.