Building

javild edited this page Sep 18, 2014 · 18 revisions

Testing the software

To check that the CellBase Java software has been built correctly, first follow the installation tutorial:

https://github.com/opencb/cellbase/wiki/installation

Then go to the CellBase folder and run:

cd cellbase/cellbase-build/installation-dir
cellbase-build/installation-dir

You should see output similar to this:

usage: cellbase-build.jar --build [--cosmic-file <arg>]
   [--description-file <arg>] [--fasta-file <arg>] [--gtf-file <arg>]
   [-i <arg>] [--log-level <arg>] [--mirna-file <arg>] -o <arg>
   [--psimi-tab-file <arg>] [-s <arg>] [--tfbs-file <arg>]
   [--xref-file <arg>]
Some options are mandatory for all possible 'builds', while others are
only mandatory for some specific 'builds':
--build                    Build values: core, genome_sequence,
                           variation, protein
--cosmic-file <arg>        Output directory to save the JSON result
--description-file <arg>   Output directory to save the JSON result
--fasta-file <arg>         Output directory to save the JSON result
--gtf-file <arg>           Output directory to save the JSON result
-i,--indir <arg>              Input directory with data files
--log-level <arg>          DEBUG -1, INFO -2, WARNING - 3, ERROR - 4,
                           FATAL - 5
--mirna-file <arg>         Output directory to save the JSON result
-o,--output <arg>             Output file or directory (depending on the
                           'build') to save the result
--psimi-tab-file <arg>     Output directory to save the JSON result
-s,--species <arg>            Sapecies...
--tfbs-file <arg>          Output directory to save the JSON result
--xref-file <arg>          Output directory to save the JSON result
For more information or reporting bugs contact me: [email protected]

Downloading the genome sequence and gene annotation raw data

Go to the genome-fetcher folder inside the CellBase folder:

cd cellbase/cellbase-build/installation-dir/bin/genome-fetcher

Before running the fetcher, point it at your Ensembl API installation by changing the value of $ENSEMBL_LIBS in this config file:

cellbase/cellbase-build/installation-dir/bin/genome-fetcher/DB_config.pm

Then run the fetcher:

./genome-fetcher.py -s "Homo sapiens" --sequence 1 --gene 1 -o /tmp

This downloads the FASTA and GTF data files into the /tmp folder.
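Before moving on to the build step, it can help to confirm that the download actually produced files. The sketch below is a hypothetical helper, not part of CellBase: `check_downloads` and its arguments are made-up names, and the `<output>/<species>/sequence` and `<output>/<species>/gene` layout is an assumption inferred from the paths used in the build commands on this page.

```shell
# Hypothetical helper: report which expected download folders hold no data.
# The directory layout is an assumption based on the paths used on this page.
check_downloads() {
    outdir="$1"     # output folder passed to the fetcher, e.g. /tmp
    species="$2"    # species folder name, e.g. homo_sapiens
    missing=0
    for sub in sequence gene; do
        dir="$outdir/$species/$sub"
        # A folder counts as populated only if it holds at least one .gz file.
        if ! ls "$dir"/*.gz >/dev/null 2>&1; then
            echo "missing: no .gz files in $dir"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_downloads /tmp homo_sapiens` returns a non-zero status and prints one line per empty folder if the fetch did not complete.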

Generating the JSON data

Go to the installation directory:

cd cellbase/cellbase-build/installation-dir

Then run the CellBase CLI. To build the genome sequence collection:

java -jar libs/cellbase-build-3.0.0.jar --build genome-sequence \
 --fasta-file /tmp/homo_sapiens/sequence/Homo_sapiens.GRCh37.p12.fa.gz \
 -o /tmp/

To build the gene collection:

java -jar libs/cellbase-build-3.0.0.jar --build gene \
 --gtf-file /tmp/homo_sapiens/gene/homo_sapiens.gtf.gz \
 --xref-file /tmp/homo_sapiens/gene/xrefs.txt \
 --description-file /tmp/homo_sapiens/gene/description.txt \
 --fasta-file /tmp/homo_sapiens/sequence/Homo_sapiens.GRCh37.p12.fa.gz \
 -o /tmp/
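
The gene build repeats the same data folder in four options, so a mistyped path is easy to miss. One possible way to reduce that risk is to assemble the command from a single base folder and print it for review before running it. This is only a sketch: `gene_build_cmd` and its two arguments are hypothetical names, while the jar path, options, and file names are the ones shown above.

```shell
# Sketch: print the gene-collection build command for a given data folder.
# gene_build_cmd is a hypothetical helper; options and file names come from this page.
gene_build_cmd() {
    data_dir="$1"   # species data folder, e.g. /tmp/homo_sapiens
    out_dir="$2"    # output folder, e.g. /tmp/
    echo java -jar libs/cellbase-build-3.0.0.jar --build gene \
        --gtf-file "$data_dir/gene/homo_sapiens.gtf.gz" \
        --xref-file "$data_dir/gene/xrefs.txt" \
        --description-file "$data_dir/gene/description.txt" \
        --fasta-file "$data_dir/sequence/Homo_sapiens.GRCh37.p12.fa.gz" \
        -o "$out_dir"
}
```

Running `gene_build_cmd /tmp/homo_sapiens /tmp/` prints the full command on one line; once it looks right, it can be pasted into the shell from the installation directory.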