no intro on how to use the group meta yaml file and the tool does not validate yaml properly #16

byb121 · 2019-01-18T11:47:22Z

When the input sequencing files are in fastq format, working out how to use the yaml file is tricky. For example, paired fastq file names have to end with _1.fq.gz and _2.fq.gz, but such conventions are not mentioned anywhere.

Our code does not validate the file properly either. A few cases:

If the SM tag is missing from the yaml file, cgpmap runs without complaint, but ignores the rg id in the yaml file and generates a random one.
Identical fastq file names in the yaml file do not trigger any error or warning.
When an input file is not listed in the yaml file, cgpmap does not complain.
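The three cases above could be caught by a small pre-flight check. The following is only a sketch of the idea in Python, not cgpmap's actual code: the function name `check_groupinfo` and its arguments are hypothetical, and it assumes the yaml file has already been parsed into a sample name plus a plain list of the file names it mentions (a list rather than a mapping, so that duplicates are still visible).

```python
from collections import Counter


def check_groupinfo(sample, listed_files, input_files):
    """Return a list of problems found; an empty list means none were found.

    sample       -- value of the SM tag, or None if it is absent (hypothetical input)
    listed_files -- file names mentioned in the yaml file, duplicates included
    input_files  -- file names actually passed to the tool
    """
    problems = []

    # Case 1: the SM tag is missing or empty.
    if not sample:
        problems.append("SM tag is missing")

    # Case 2: the same file name is listed more than once.
    for name, count in Counter(listed_files).items():
        if count > 1:
            problems.append(f"{name} is listed {count} times")

    # Case 3: an input file is not listed in the yaml file at all.
    known = set(listed_files)
    for name in input_files:
        if name not in known:
            problems.append(f"{name} is not listed in the yaml file")

    return problems
```

A caller would stop with an error if the returned list is not empty, instead of continuing silently.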

I think it would be better to have a flag specifically for single-ended fastqs. By default cgpmap would assume inputs are paired-ended and complain if they are neither interleaved nor paired; if an input really is single-ended, the user would need to label it explicitly. Currently it just carries on with its own assumptions silently.
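To illustrate the behaviour I am asking for, here is a rough Python sketch. It is not taken from cgpmap: the function `group_fastqs` and the `single_end` argument are made up for the example. It only covers pairing by the _1.fq.gz / _2.fq.gz suffix; detecting interleaved input needs a look at the file contents and is left out.

```python
import re

# Assumed naming convention: <stem>_1.fq.gz and <stem>_2.fq.gz form a pair.
PAIR_SUFFIX = re.compile(r"^(?P<stem>.+)_(?P<mate>[12])\.fq\.gz$")


def group_fastqs(file_names, single_end=False):
    """Group fastq names into read-pair lists, failing loudly on ambiguity."""
    if single_end:
        # The user explicitly labelled the inputs as single-ended.
        return {name: [name] for name in file_names}

    mates = {}
    unpaired = []
    for name in file_names:
        match = PAIR_SUFFIX.match(name)
        if match is None:
            unpaired.append(name)
        else:
            mates.setdefault(match["stem"], {})[match["mate"]] = name

    # A stem with only one of the two mates is also an error.
    for found in mates.values():
        if set(found) != {"1", "2"}:
            unpaired.extend(found.values())

    if unpaired:
        raise ValueError(
            "inputs are not paired; label them as single-ended if intended: "
            + ", ".join(sorted(unpaired))
        )
    return {stem: [found["1"], found["2"]] for stem, found in mates.items()}
```

The point of the design is that single-ended processing only happens when the user asks for it, and anything else that cannot be paired stops the run.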


keiranmraine · 2019-01-30T09:10:50Z

The docs exist in the underlying tool documentation but need to be linked:

https://github.com/cancerit/PCAP-core/wiki/File-Formats-groupinfo.yaml

> For example the paired fastq file names have to end with _1.fq.gz and _2.fq.gz

I think this could be handled better by modifying the yaml format slightly. The yaml file was added very late; previously you had no ability to include any header information. Pushing the files into the readgroup records would allow explicit pairing, but work on the underlying calls to the aligner will be needed.

SM: sample
# the actual readgroups
READGRPS:
  ID: 9
    files:
     - fq_1_00001.fq.gz
     - fq_2_00001.fq.gz
    CN: centre
    DS: Please don't use multiline
    LB: Library_id
    PI: 500
    PL: FORCED TO UPPER
    PM: HiSeq-XTen
    PU: 1234_1
  ID: 10
    files:
     - fq_1.fq.gz
     - fq_2.fq.gz

The 3 validation issues listed are definite bugs.

keiranmraine mentioned this issue Jan 30, 2019

Improve yaml to make paired read processing more robust cancerit/PCAP-core#29

Open

keiranmraine added enhancement documentation labels Jan 30, 2019
