A tandem repeat (TR) catalog generated from high-quality long-read human genome assemblies

This repository contains the analysis scripts used to generate the TR catalog from public diploid long-read human genome assemblies from the following data sources:

Human Pangenome Reference Consortium (HPRC)
Human Genome Structural Variation Consortium (HGSVC2)
1000G ONT Sequencing Consortium

Workflow

Mapping of TRs from assemblies to the reference genome

Catalog

v1

Haplotype names, separated by semi-colons, are listed in the first header line, which begins with '#'.
Column descriptions:

Column	Description
chrom	chromosome
start	start coordinate
end	end coordinate
motif	consensus repeat motif
copy_numbers	copy numbers in haplotypes separated by semi-colons ('-' for missing genotypes)
sizes	sizes (bp) in haplotypes separated by semi-colons ('-' for missing genotypes)
motifs	motifs in haplotypes separated by semi-colons ('-' for missing genotypes)
max_change	maximum change in size (bp) across all haplotypes, relative to the size in the reference genome
num_samples	number of samples with genotype
num_calls	number of haplotypes with genotype
motif_frequency	number of haplotypes associated with each motif observed e.g. CAG(10);CAA(2)
feature	gene element overlapped. Format: gene\|transcript\|element, where element = exon#\|intron#\|utr5\|utr3\|cds\|promoter\|exon_bound (exon boundary)
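
The column layout above can be read with a few lines of Python. The following is a minimal sketch, not part of the repository's scripts. It assumes the catalog is a tab-separated text file whose columns appear in the order listed in the table, and that any line whose first field is `chrom` is a column-name header. The names `read_catalog`, `split_per_haplotype` and `parse_motif_frequency` are hypothetical helpers introduced here for illustration.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Assumption: columns appear in the same order as in the table above.
COLUMNS = [
    "chrom", "start", "end", "motif", "copy_numbers", "sizes", "motifs",
    "max_change", "num_samples", "num_calls", "motif_frequency", "feature",
]
# Columns that hold one semi-colon-separated value per haplotype.
PER_HAPLOTYPE = ("copy_numbers", "sizes", "motifs")


def split_per_haplotype(field: str) -> List[Optional[str]]:
    """Split a per-haplotype field; '-' (missing genotype) becomes None."""
    return [None if value == "-" else value for value in field.split(";")]


def parse_motif_frequency(field: str) -> Dict[str, int]:
    """Turn e.g. 'CAG(10);CAA(2)' into {'CAG': 10, 'CAA': 2}."""
    return {motif: int(n) for motif, n in re.findall(r"([^;()]+)\((\d+)\)", field)}


def read_catalog(lines: Iterable[str]) -> Tuple[List[str], List[dict]]:
    """Return (haplotype names, records) from an iterable of catalog lines."""
    haplotypes: List[str] = []
    records: List[dict] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # The first '#' line carries the haplotype names.
            if not haplotypes:
                haplotypes = line.lstrip("#").strip().split(";")
            continue
        fields = line.split("\t")
        if fields[0] == "chrom":
            # Assumed column-name header line; skip it.
            continue
        row: dict = dict(zip(COLUMNS, fields))
        for name in PER_HAPLOTYPE:
            if name in row:
                # Key each per-haplotype value by its haplotype name.
                row[name] = dict(zip(haplotypes, split_per_haplotype(row[name])))
        if "motif_frequency" in row:
            row["motif_frequency"] = parse_motif_frequency(row["motif_frequency"])
        records.append(row)
    return haplotypes, records
```

To use it, pass an open file handle, for example `read_catalog(open(path))` where `path` points to a catalog file. Values are kept as strings so that no numeric format is assumed for copy numbers or sizes; convert them as needed.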
