update docs

daler committed Apr 10, 2023
1 parent 90d219d commit a487dff
Showing 7 changed files with 122 additions and 110 deletions.
5 changes: 4 additions & 1 deletion docs/conda.rst
dependency tree and come up with a solution that works to satisfy the entire
set of specified requirements.

We chose to split the conda environments in two: the **main** environment and the **R**
environment (see :ref:`conda-design-decisions`). These environments are
described by both "strict" and "loose" files. By default we use the "strict"
version, which pins all versions of all packages exactly. This is preferred
wherever possible. However, we also provide a "loose" version that is not
specific about versions. The following table describes these files:

+----------------+--------------------------------+----------------------------------+
| strict version | loose version | used for |
+================+================================+==================================+
| ``env.yml`` | ``include/requirements.txt`` | Main Snakefiles |
+----------------+--------------------------------+----------------------------------+
| ``env-r.yaml`` | ``include/requirements-r.txt`` | Downstream RNA-seq analysis in R |
+----------------+--------------------------------+----------------------------------+

When deploying new instances, use the ``--build-envs`` argument which will use
the strict version. Or use the following commands in a deployed directory:
4 changes: 2 additions & 2 deletions docs/config.rst
.. _config:


Configuration
=============

General configuration
~~~~~~~~~~~~~~~~~~~~~
69 changes: 44 additions & 25 deletions docs/getting-started.rst
Getting started
===============

The main prerequisite for `lcdb-wf` is `conda
<https://docs.conda.io/en/latest/>`_, with the `bioconda
<https://bioconda.github.io>`_ channel set up and the `mamba
<https://github.com/mamba-org/mamba>`_ drop-in replacement for conda
installed.

If this is new to you, please see :ref:`conda-envs`.

.. note::

   `lcdb-wf` is tested and heavily used on Linux. It is only supported on
   Linux.

.. _setup-proj:

Setting up a project
--------------------

The general steps to use lcdb-wf in a new project are:

1. **Deploy:** download and run ``deploy.py`` to copy files into a project directory
2. **Configure:** set up samples table for experiments and edit configuration file
3. **Run:** activate environment and run the Snakemake file either locally or on a cluster

.. _deploy:

1. Deploying lcdb-wf
--------------------
Using `lcdb-wf` starts with copying files to a project directory, or
"deploying".

Unlike other tools you may have used, `lcdb-wf` is not actually installed per
se. Rather, it is "deployed" by copying over relevant files from the `lcdb-wf`
repository to your project directory. This includes Snakefiles, config files,
and other infrastructure required to run, and excludes files like these docs
and testing files that are not necessary for an actual project. The reason to
use this script is so you end up with a cleaner project directory, compared to
cloning the repo directly.

This script also writes a file to the destination called
``.lcdb-wf-deployment.json``. It stores the timestamp and details about what
was deployed.

There are a few ways of doing this.
Option 1: Download and run the deployment script
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is the most convenient method, although it does not allow running tests
locally.

Run ``python deploy.py -h`` to see help. Be sure to use the ``--staging`` and
``--branch=$BRANCH`` arguments when using this method, which will clone the
repository to a location of your choosing. Once you deploy you can remove the
script. For example:

.. code-block:: bash

   # You can clean up the cloned copy if you want:
   # rm -rf /tmp/lcdb-wf-tmp
This will clone the full git repo to ``/tmp/lcdb-wf-tmp``, check out the master
branch (or whatever branch ``$BRANCH`` is set to), copy the files required for
an RNA-seq project over to ``analysis/project``, build the main conda
environment and the R environment, save the ``.lcdb-wf-deployment.json`` file
there, and then delete the temporary repo.
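The saved metadata can be inspected later to recall how a project was set up.
A minimal sketch, assuming hypothetical field names (the real schema of
``.lcdb-wf-deployment.json`` may differ):

```shell
# Create an example deployment-metadata file; the fields shown here are
# assumptions for illustration, not the actual schema.
cat > .lcdb-wf-deployment.json <<'EOF'
{"timestamp": "2023-04-10T12:00:00", "branch": "master", "commit": "a487dff"}
EOF

# Months later, recall which branch this project was deployed from:
python3 -c "import json; print(json.load(open('.lcdb-wf-deployment.json'))['branch'])"
# prints: master
```

Keeping this file under version control alongside the project makes the
provenance of the deployed workflow files easy to audit.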

Option 2: Clone repo manually
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Clone a repo using git and check out the branch. Use this method for running
tests.
Then run the following:

.. code-block:: bash

   snakemake -n
If all goes well, this should print a list of jobs to be run.

You can run locally, but this is NOT recommended for a typical RNA-seq
project. To run locally, choose the number of CPUs you want to use with the
``-j`` argument, as is standard for Snakemake.

.. code-block:: bash

   # run locally (not recommended)
   snakemake --use-conda -j 8
The recommended way is to run on a cluster.

To run on a cluster, you will need a `Snakemake profile
<https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles>`_ for
your cluster that translates generic resource requirements into arguments for
your cluster's batch system.
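For illustration, a minimal profile ``config.yaml`` might look like the
following. The keys mirror standard Snakemake command-line options, but the
values, including the ``sbatch`` arguments, are examples only and not the
actual Biowulf profile:

```yaml
# Hypothetical minimal Snakemake profile; values are illustrative examples.
cluster: "sbatch --cpus-per-task={threads} --mem={resources.mem_mb}"
jobs: 100            # cap on simultaneously submitted jobs
latency-wait: 60     # seconds to tolerate shared-filesystem lag
use-conda: true
```

Each key corresponds to a Snakemake argument (e.g., ``jobs`` is ``--jobs``),
so the profile simply bundles defaults you would otherwise type on the
command line.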

On NIH's Biowulf cluster, the profile can be found at
https://github.com/NIH-HPC/snakemake_profile. If you are not already using
this for other Snakemake workflows, you can set it up the first time like
this:

1. Clone the profile to a location of your choosing, such as
   ``~/snakemake_profile``.
2. Set the environment variable ``LCDBWF_SNAKEMAKE_PROFILE`` to that
   location, perhaps in your ``~/.bashrc`` file.

Then back in your deployed and configured project, submit the wrapper script as
a batch job:

.. code-block:: bash

   sbatch ../../include/WRAPPER_SLURM
This will submit Snakemake as a batch job, use the profile to translate
resources to cluster arguments and set default command-line arguments, and
submit the various jobs created by Snakemake to the cluster on your behalf. See
:ref:`cluster` for more details on this.

Other clusters will need different configuration, but everything in `lcdb-wf`
is standard Snakemake. The Snakemake documentation on `cluster execution
<https://snakemake.readthedocs.io/en/stable/executing/cluster.html>`_ and
`cloud execution
<https://snakemake.readthedocs.io/en/stable/executing/cloud.html>`_ can be
41 changes: 28 additions & 13 deletions docs/index.rst
Non-model organism? Custom gene annotations? Complicated regression models?
Unconventional command-line arguments to tools? New tools to add to the
workflow? No problem.

Extensive downstream RNA-seq
----------------------------
A comprehensive RMarkdown template, along with a custom R package, enables
sophisticated RNA-seq analysis that supports complex experimental designs and
many contrasts.

Extensive exploration of ChIP-seq peaks
---------------------------------------
The ChIP-seq configuration supports multiple peak-callers as well as calling
peaks with many different parameter sets for each caller. Combined with
visualization in track hubs (see below), this can identify the optimal
parameters for a given experiment.

Track hubs
----------
a site to get lots of genomes you can use for running `fastq_screen`, and
easily include arbitrary other genomes. They can then be automatically included
in RNA-seq and ChIP-seq workflows.

Arbitrary genomes can be used, whether local (e.g., customized with additional
genetic constructs) or on the web. The `references` workflow need only be run
once for all these genomes to be created, with the `references_dir` being used
as a centralized repository that can then be used with all other workflows.

Integration with external data and figure-making
------------------------------------------------
If an upstream file changes (e.g., gene annotation), all dependent downstream
jobs -- including figures -- will be updated so you can ensure that even
complex analyses stay correct and up-to-date.

Tested automatically
--------------------
Every change to the code on GitHub triggers an automated test, the results of
which you can find at https://circleci.com/gh/lcdb/lcdb-wf. Each test sets the
system up from scratch, including installing all software, downloading example
data, and running everything up through the final results. This guarantees that
you can set up and test the code yourself.


All the advantages of Snakemake
-------------------------------

Only run the required jobs
~~~~~~~~~~~~~~~~~~~~~~~~~~
New gene annotation? Snakemake tracks dependencies, so it will detect that the
annotations changed. Only jobs that depend on that file at some point in their
dependency chain will be re-run and the independent files are untouched. Adding
a new sample will leave untouched any output from samples that have already
run.

Parallelization
~~~~~~~~~~~~~~~
17 changes: 1 addition & 16 deletions docs/tests.rst
This assumes you have set up the `bioconda channel
We **highly recommend** using conda for isolating projects and for analysis
reproducibility. If you are unfamiliar with conda, we provide a more detailed look
at :ref:`conda-envs`.


Activate the main env
---------------------

Exhaustive tests
----------------
The file ``.circleci/config.yml`` configures all of the tests that are run on
CircleCI. There's a lot of configuration happening there, but look for the
entries that have ``./run_test.sh`` in them to see the commands that are run.

5 changes: 3 additions & 2 deletions docs/toc.rst
Table of Contents
=================

.. toctree::
   :maxdepth: 3

   index
   getting-started
   guide
   workflows
   config
   references
   rnaseq
   downstream-rnaseq
   chipseq
   integrative
   conda
   tests
   faqs
   changelog
   developers