Before this migration (i.e. for versions < 1.99.0), Rsamtools contained a copy of an old (9-10 year old?) version of the samtools and tabix C code.
The latest version of samtools (as of Feb 6, 2019) is 1.9: http://www.htslib.org/
In the recent years the samtools code has been split into samtools + htslib. Most of the code that used to be in samtools is now in htslib. The latest version of htslib is also 1.9.
Bioconductor package Rhtslib contains htslib 1.7, which is a decently recent version of htslib.
After this migration (i.e. for versions >= 1.99.0), Rsamtools no longer contains a copy of the samtools and tabix C code but compiles and links against Rhtslib instead (i.e. against htslib 1.7).
In Rsamtools 1.99.0 all the C/C++ code from the old Rsamtools was migrated to
Rhtslib, except for the code behind asBcf()
and razip()
:
-
asBcf()
(implemented insrc/bcffile.c
) is currently disabled but will need to be migrated if it turns out that some other package needs it (it doesn't seem to be the case though). Note thatasBcf()
isn't used anywhere in Rsamtools either (no example in the man page, no unit test, not used in the vignette). -
razip()
(implemented inzip_compression.c
in the old Rsamtools) was removed. This is because recent samtools and htslib no longer support RAZF. From the samtools NEWS file:RAZF and razip are superseded by BGZF/bgzip and have been removed from samtools.
This happened in samtools 1.0 (15 August, 2014).
-
Even though the C code behind
applyPileups()
was migrated (pileupbam.c
), the examples in its man page and most of the tests ininst/unitTests/test_applyPileups.R
are currently failing and have been disabled for now. -
Test
test_BcfFile_scanBcfHeader_remote
(located ininst/unitTests/test_BcfFile.R
) passes when run interactively (or viaBiocGenerics:::testPackage('Rsamtools')
) but, for some obscure reason, NOT in the context ofR CMD check
. It has been disabled for now.
As of Feb 6, 2019, 159 Bioconductor packages (154 software, 3 data-experiment, and 2 workflow packages) depend directly on Rsamtools (i.e. via their Depends, Imports, or LinkingTo field). So we've only checked manually a few of them. This was on a 64-bit Ubuntu 16.04.5 LTS laptop running R 3.6 (2018-12-04 r75758) and Bioconductor 3.9 (current devel).
-
VariantAnnotation: migrated (changes committed), passes
R CMD check
-
ArrayExpressHTS: migrated (changes not committed yet, Angela Goncalves contacted but didn't answer yet), passes
R CMD check
-
BitSeq: migrated (changes committed), passes
R CMD check
-
DiffBind: migrated (changes committed by Gord Brown), passes
R CMD check
-
h5vc: migrated (changes committed, including fixing pre-existing build ERROR currently visible on the build report), passes
R CMD check
-
podkat: migrated (changes committed, including fixing
inst/examples/example1.vcf.gz
to make it compatible with bcftools 1.7), passesR CMD check
-
qrqc: migrated (PR created at vsbuffalo/qrqc#6 per Vince Buffalo's request), passes
R CMD check
-
QuasR: migrated (PR created at fmicompbio/QuasR#9 per Michael Stadler's request), passes
R CMD check
-
seqbias: migrated (patch sent to, and applied by, Daniel Jones), passes
R CMD check
-
TransView: migrated (changes committed), passes
R CMD check
-
AllelicImbalance: migrated (contained old BCF file (
ERP000101.bcf
, inextdata/ERP000101_subset/
) that needed to be fixed and reindexed because it no longer worked with recent bcftools/Rhtslib) passesR CMD check
-
biovizBase: passes
R CMD check
as-is -
compEpiTools: passes
R CMD check
as-is -
SICtools:
test_snpDiff()
(fromtest_snpDiff.R
) FAILS! -
VariantTools: passes
R CMD check
as-is
-
AllelicImbalance: migrated (changes not committed yet) passes
R CMD check
-
BaalChIP: passes
R CMD check
as-is -
chimera: passes
R CMD check
as-is -
CoverageView: passes
R CMD check
as-is -
exomeCopy: passes
R CMD check
as-is -
exomePeak: passes
R CMD check
as-is -
GenomicAlignments: passes
R CMD check
as-is -
gmapR: does NOT compile (needs to be migrated)
-
MEDIPS: passes
R CMD check
as-is -
methylPipe: passes
R CMD check
as-is -
rnaSeqMap: passes
R CMD check
as-is -
ssviz: passes
R CMD check
as-is -
systemPipeR: passes
R CMD check
as-is
-
rnaseqGene: vignette builds as-is
-
sequencing: vignette does NOT build (because of AnnotationHub razip'ed FASTA file
AH18522
)
As of Feb 6, 2019, 8 CRAN packages depend directly on Rsamtools (i.e. via their Depends, Imports, or LinkingTo field).
-
BinQuasi: passes
R CMD check
as-is -
Brundle: passes
R CMD check
as-is -
ExomeDepth: passes
R CMD check
as-is -
hoardeR: fails to install because of unrelated pre-existing problem:
Error : object ‘importGFF3’ is not exported by 'namespace:GenomicTools'
(See https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-debian-gcc/hoardeR-00install.html)
-
NIPTeR: passes
R CMD check
as-is -
PlasmaMutationDetector: passes
R CMD check
as-is -
RAPIDR: passes
R CMD check
as-is -
spp: passes
R CMD check
as-is
RAZF is no longer supported in recent samtools, and old razip-compressed
FASTA files (.rz
) and their index files (rz.fai
) are now causing problems.
Here is an example from the sequencing workflow:
library(AnnotationHub)
ah <- AnnotationHub()
fa <- ah[["AH18522"]]
library(Rsamtools)
idx <- scanFaIndex(fa) # still works
long <- idx[width(idx) > 82000]
getSeq(fa, param=long) # ERROR! ('open' index failed)
The new way to go is to compress with bgzip and to recreate the index.
/path/to/samtools-1.7/htslib/bgzip myfile.fa # creates myfile.fa,gz
/path/to/samtools-1.7/samtools faidx myfile.fa,gz # generates index files
# myfile.fa.gz.fai
# and myfile.fa.gz.gzi
Note that the compression step is actually optional i.e. one can use
samtools faidx
directly on the uncompressed FASTA file:
gunzip myfile.fa,gz # uncompress to get myfile.fa back
/path/to/samtools-1.7/samtools faidx myfile.fa # generates index file
# myfile.fa.fai only
Note that in this case, only the .fai
index file is generated
(no .gzi
index file).
library(Rsamtools)
bgzip("myfile.fa")
indexFa("myfile.fa.bgz")
Then create a FaFile object that can be used with getSeq()
as usual:
fa <- FaFile("myfile.fa.bgz")
Here is an itemized summary of what still needs to happen (some details are provided above in this document):
-
Chase down old razip'ed FASTA files and update them. Places to look at:
- AnnotationHub
- ExperimentHub
- data annotation packages
- data experiment packages
- software packages
-
Troubleshoot
applyPileups()
unit test failures. -
Troubleshoot
test_BcfFile_scanBcfHeader_remote()
failure (test is located ininst/unitTests/test_BcfFile.R
). -
Migrate
asBcf()
. -
Update
NEWS
file.