Merge branch 'feat/haplotypes' into main
aryarm committed May 14, 2022
2 parents 68b3047 + 34a839d commit 9b4393a
Showing 30 changed files with 2,080 additions and 269 deletions.
15 changes: 15 additions & 0 deletions docs/api/haptools.rst
@@ -36,3 +36,18 @@ haptools.data.phenotypes module
:undoc-members:
:show-inheritance:

haptools.data.covariates module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: haptools.data.covariates
:members:
:undoc-members:
:show-inheritance:

haptools.data.haplotypes module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: haptools.data.haplotypes
:members:
:undoc-members:
:show-inheritance:
2 changes: 1 addition & 1 deletion docs/commands/simgenotype.md
@@ -1,4 +1,4 @@
-# Haptools simgenotype
+# simgenotype

`haptools simgenotype` takes as input a reference set of haplotypes in VCF format and a user-specified admixture model. It outputs a VCF file with simulated genotype information for admixed genotypes, as well as a breakpoints file that can be used for visualization.

2 changes: 1 addition & 1 deletion docs/commands/simgenotype.rst
@@ -1,4 +1,4 @@
-.. _subcommands-simgenotype:
+.. _commands-simgenotype:

.. include:: simgenotype.md
:parser: myst_parser.sphinx_
6 changes: 3 additions & 3 deletions docs/commands/simphenotype.md
@@ -1,4 +1,4 @@
-# Haptools simphenotype
+# simphenotype

Haptools simphenotype simulates a complex trait, taking into account haplotype- or local-ancestry-specific effects as well as traditional variant-level effects. It takes causal effects and genotypes as input and outputs simulated phenotypes.

@@ -19,9 +19,9 @@ haptools simphenotype \

Required parameters:

-* `--vcf <string>`: A bgzipped, tabix-indexed, phased VCF file. If you are simulating local-ancestry effects, the VCF file must contain the `FORMAT/LA` tag included in output of `haptools simgenotype`. See [haptools file formats](../../docs/project_info/haptools_file_formats.rst) for more details.
+* `--vcf <string>`: A bgzipped, tabix-indexed, phased VCF file. If you are simulating local-ancestry effects, the VCF file must contain the `FORMAT/LA` tag included in output of `haptools simgenotype`. See [haptools file formats](../../docs/formats/inputs.rst) for more details.

-* `--hap <string>`: A bgzipped, tabix-indexed HAP file, which specifies causal effects. This is a custom format described in more detail in [haptools file formats](../../docs/project_info/haptools_file_formats.rst). The HAP format enables flexible specification of a range of effect types including traditional variant-level effects, haplotype-level effects, associations with repeat lengths at short tandem repeats, and interaction of these effects with local ancestry labels. See [Examples](#examples) below for detailed examples of how to specify effects.
+* `--hap <string>`: A bgzipped, tabix-indexed HAP file, which specifies causal effects. This is a custom format described in more detail in [haptools file formats](../../docs/formats/haplotypes.rst). The HAP format enables flexible specification of a range of effect types including traditional variant-level effects, haplotype-level effects, associations with repeat lengths at short tandem repeats, and interaction of these effects with local ancestry labels. See [Examples](#examples) below for detailed examples of how to specify effects.

* `--out <string>`: Prefix to name output files.

2 changes: 1 addition & 1 deletion docs/commands/simphenotype.rst
@@ -1,4 +1,4 @@
-.. _subcommands-simphenotype:
+.. _commands-simphenotype:

.. include:: simphenotype.md
:parser: myst_parser.sphinx_
35 changes: 35 additions & 0 deletions docs/commands/transform.rst
@@ -0,0 +1,35 @@
.. _commands-transform:


transform
=========

Transform a set of genotypes via a list of haplotypes. Create a new VCF containing haplotypes instead of variants.

The ``transform`` command takes as input a set of genotypes in VCF and a list of haplotypes (specified as a :doc:`.hap file </formats/haplotypes>`) and outputs a set of haplotype "genotypes" in VCF.
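
As a rough illustration of what a haplotype "genotype" could look like, the sketch below marks a haplotype as present on a chromosome copy only when every allele listed for it matches on that copy. The matching rule, the function name ``transform_sketch``, and the dictionary-based inputs are assumptions made for illustration; none of them is taken from the haptools source.

```python
def transform_sketch(genotypes, haplotype):
    """Return one (copy1, copy2) pair of 0/1 calls per sample.

    genotypes: {variant_id: [(allele_copy1, allele_copy2), ...]}, one tuple per sample
    haplotype: {variant_id: required_allele}
    Both structures are hypothetical stand-ins for a phased VCF and a .hap file.
    """
    n_samples = len(next(iter(genotypes.values())))
    calls = []
    for sample in range(n_samples):
        # Assumed rule: a copy carries the haplotype if all of its alleles match.
        calls.append(tuple(
            int(all(genotypes[var][sample][copy] == allele
                    for var, allele in haplotype.items()))
            for copy in (0, 1)
        ))
    return calls


# Made-up data: two variants, two samples.
example_genotypes = {
    "rs1": [("A", "G"), ("A", "A")],
    "rs2": [("T", "T"), ("C", "T")],
}
example_haplotype = {"rs1": "A", "rs2": "T"}
example_calls = transform_sketch(example_genotypes, example_haplotype)
```

With this made-up input, the first sample carries the haplotype on its first copy only and the second sample on its second copy only.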

Usage
~~~~~
.. code-block:: bash

    haptools transform \
    --region TEXT \
    --sample SAMPLE \
    --samples-file FILENAME \
    --output PATH \
    --verbosity [CRITICAL|ERROR|WARNING|INFO|DEBUG|NOTSET] \
    GENOTYPES HAPLOTYPES

Examples
~~~~~~~~
.. code-block:: bash

    haptools transform tests/data/example.vcf.gz tests/data/example.hap.gz | less

Detailed Usage
~~~~~~~~~~~~~~

.. click:: haptools.__main__:main
:prog: haptools
:show-nested:
:commands: transform
168 changes: 168 additions & 0 deletions docs/formats/haplotypes.rst
@@ -0,0 +1,168 @@
.. _formats-haplotypes:


.hap
====

This document describes our custom file format specification for haplotypes: the ``.hap`` file.

This is a tab-separated file composed of different types of lines. The first field of each line is a single, uppercase character denoting the type of line. The following line types are supported.

.. list-table::
:widths: 25 25
:header-rows: 1

* - Type
- Description
* - #
- Comment
* - H
- Haplotype
* - V
- Variant

Each line type (besides #) has a set of mandatory fields described below. Additional "extra" fields can be appended to these to customize the file.

``#`` Comment line
~~~~~~~~~~~~~~~~~~
Comment lines begin with ``#`` and are ignored. Consecutive comment lines that appear at the beginning of the file are treated as part of the header.

Extra fields must be declared in the header. The declaration must be a tab-separated line containing the following fields:

1. Line type (ex: ``H`` or ``V``)
2. Name
3. Python format string (ex: 'd' for int, 's' for string, or '.3f' for a float with 3 decimals)
4. Description

Note that the first field must follow the ``#`` symbol immediately (ex: ``#H`` or ``#V``).
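
The declaration rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ``ExtraField`` class and the ``parse_extra_declaration`` function are hypothetical names, not part of the haptools API.

```python
from dataclasses import dataclass


@dataclass
class ExtraField:
    line_type: str    # "H" or "V"
    name: str
    fmt: str          # Python format string such as "d", "s", or ".3f"
    description: str


def parse_extra_declaration(line):
    """Return an ExtraField for a declaration line, or None for a plain comment."""
    fields = line.rstrip("\n").split("\t")
    # A declaration has four tab-separated fields, and its line type must
    # follow the '#' symbol immediately (e.g. "#H" or "#V").
    if len(fields) != 4 or fields[0] not in ("#H", "#V"):
        return None
    return ExtraField(fields[0][1:], fields[1], fields[2], fields[3])


decl = parse_extra_declaration("#H\tbeta\t.2f\tEffect size in linear model")
formatted = format(0.273, decl.fmt)  # the format string controls how values are written
```

Here ``formatted`` is the string ``0.27``, since the declared format string keeps two decimals.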


``H`` Haplotype
~~~~~~~~~~~~~~~
Haplotypes contain the following attributes:

.. list-table::
:widths: 25 25 25 50
:header-rows: 1

* - Column
- Field
- Type
- Description
* - 1
- Chromosome
- string
- The contig that this haplotype belongs on
* - 2
- Start Position
- int
- The start position of this haplotype on this contig
* - 3
- End Position
- int
- The end position of this haplotype on this contig
* - 4
- Haplotype ID
- string
- Uniquely identifies a haplotype

``V`` Variant
~~~~~~~~~~~~~
Each variant line belongs to a particular haplotype. These lines contain the following attributes:

.. list-table::
:widths: 25 25 25 50
:header-rows: 1

* - Column
- Field
- Type
- Description
* - 1
- Haplotype ID
- string
- Identifies the haplotype to which this variant belongs
* - 2
- Start Position
- int
- The start position of this variant on its contig
* - 3
- End Position
- int
- The end position of this variant on its contig

Usually the same as the Start Position
* - 4
- Variant ID
- string
- The unique ID for this variant, as defined in the genotypes file
* - 5
- Allele
- string
- The allele of this variant within the haplotype
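
To make the two line types concrete, here is a minimal reader that groups ``V`` lines under the ``H`` line they reference. It is a sketch, not the parser shipped in ``haptools.data.haplotypes``: the class and function names are hypothetical, the example lines are made up, and the field positions assume that the line type symbol occupies the first tab-separated field, with the columns from the tables above following it.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    start: int
    end: int
    id: str
    allele: str


@dataclass
class Haplotype:
    chrom: str
    start: int
    end: int
    id: str
    variants: list = field(default_factory=list)


def parse_hap(lines):
    """Return {haplotype ID: Haplotype}, ignoring comments and extra fields."""
    haps = {}
    pending = []  # collected first so that line order does not matter
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if f[0] == "H":
            haps[f[4]] = Haplotype(f[1], int(f[2]), int(f[3]), f[4])
        elif f[0] == "V":
            pending.append((f[1], Variant(int(f[2]), int(f[3]), f[4], f[5])))
    for hap_id, variant in pending:
        haps[hap_id].variants.append(variant)
    return haps


# Made-up lines for illustration only.
example_lines = [
    "# an ordinary comment",
    "H\tchr1\t100\t200\thap1",
    "V\thap1\t100\t100\trs1\tA",
    "V\thap1\t200\t200\trs2\tG",
]
parsed = parse_hap(example_lines)
```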

Examples
~~~~~~~~
You can find an example of a ``.hap`` file without any extra fields in `tests/data/basic.hap <https://github.com/gymrek-lab/haptools/blob/main/tests/data/basic.hap>`_:

.. include:: ../../tests/data/basic.hap
:literal:

You can find an example with extra fields added within `tests/data/simphenotype.hap <https://github.com/gymrek-lab/haptools/blob/main/tests/data/simphenotype.hap>`_:

.. include:: ../../tests/data/simphenotype.hap
:literal:


Compressing and indexing
~~~~~~~~~~~~~~~~~~~~~~~~
We encourage you to bgzip compress and/or index your ``.hap`` file whenever possible. This will reduce both disk usage and the time required to parse the file.

.. code-block:: bash

    sort -k1,4 -o file.hap file.hap
    bgzip file.hap
    tabix -s 2 -b 3 -e 4 file.hap.gz

To index the file properly, the IDs in the haplotype lines must differ from the chromosome names. You must also sort on the first field (i.e. the line type symbol) in addition to the three fields that follow it.
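
The requirement that haplotype IDs differ from chromosome names can be checked before indexing. The likely reason is that ``tabix -s 2`` reads the sequence name from the second field, which holds the chromosome on ``H`` lines but the haplotype ID on ``V`` lines. The helper below is a hypothetical sketch of such a check, and the example lines are made up.

```python
def find_id_chrom_clashes(lines):
    """Return the sorted names used both as a chromosome and as a haplotype ID."""
    chroms, hap_ids = set(), set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "H" and len(fields) >= 5:
            chroms.add(fields[1])
            hap_ids.add(fields[4])
    return sorted(chroms & hap_ids)


safe_lines = ["H\tchr1\t100\t200\thap1"]
clashing_lines = ["H\tchr1\t100\t200\tchr1"]
no_clash = find_id_chrom_clashes(safe_lines)
clash = find_id_chrom_clashes(clashing_lines)
```

An empty result means the file passes this particular check; a non-empty result lists the names that would need renaming.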

Extra fields
~~~~~~~~~~~~
Additional fields can be appended to the ends of the haplotype and variant lines as long as they are declared in the header.

haptools extras
---------------
The following extra fields should be declared for your ``.hap`` file to be compatible with ``simphenotype``.

.. code-block::

    #H ancestry s Local ancestry
    #H beta .2f Effect size in linear model

..
    _TODO: figure out how to tab this code block so that the tabs get copied when someone copies from it

``H`` Haplotype
+++++++++++++++

.. list-table::
:widths: 25 25 25 50
:header-rows: 1

* - Column
- Field
- Type
- Description
* - 5
- Local Ancestry
- string
- A population code denoting this haplotype's ancestral origins
* - 6
- Effect Size
- float
- The effect size of this haplotype; for use in ``simphenotype``
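
As a sketch of how the two declared format strings (``s`` and ``.2f``) could be applied when writing an ``H`` line with these extras, consider the following. The function name ``format_h_line`` and the example values, including the population code, are made up for illustration and are not part of haptools.

```python
# (name, Python format string) pairs mirroring the header declarations above
EXTRAS = (("ancestry", "s"), ("beta", ".2f"))


def format_h_line(chrom, start, end, hap_id, **extras):
    """Build a tab-separated H line with the extras appended in declared order."""
    fields = ["H", chrom, str(start), str(end), hap_id]
    fields += [format(extras[name], fmt) for name, fmt in EXTRAS]
    return "\t".join(fields)


h_line = format_h_line("chr1", 100, 200, "hap1", ancestry="YRI", beta=0.2734)
```

The resulting line has seven tab-separated fields when the line type symbol is counted, and the effect size is written with two decimals, as the ``.2f`` declaration requests.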

``V`` Variant
+++++++++++++
No extra fields are required here.
2 changes: 1 addition & 1 deletion docs/executing/inputs.rst → docs/formats/inputs.rst
@@ -1,4 +1,4 @@
-.. _executing-inputs:
+.. _formats-inputs:


Inputs
9 changes: 6 additions & 3 deletions docs/index.rst
@@ -4,12 +4,13 @@
:parser: myst_parser.sphinx_

.. toctree::
-:caption: Execution
-:name: executing
+:caption: File Formats
+:name: formats
:hidden:
:maxdepth: 1

-executing/inputs.rst
+formats/inputs.rst
+formats/haplotypes.rst

.. toctree::
:caption: Commands
@@ -18,6 +19,8 @@
:maxdepth: 1

commands/simgenotype.rst
+commands/simphenotype.rst
+commands/transform.rst

.. toctree::
:caption: API
3 changes: 0 additions & 3 deletions docs/project_info/haptools_file_formats.rst

This file was deleted.
