Python bindings for tatami

Overview

The mattress package implements Python bindings to the tatami C++ library for matrix representations. Downstream packages can use mattress to develop C++ extensions that are interoperable with many different matrix classes, e.g., dense, sparse, delayed or file-backed. mattress is inspired by the beachmat Bioconductor package, which does the same thing for R packages.

Instructions

mattress is published to PyPI, so installation is simple:

pip install mattress

mattress is intended for Python package developers writing C++ extensions that operate on matrices. The aim is to allow a package's C++ code to accept all types of matrix representations without requiring re-compilation of that code. To achieve this:

Add mattress.includes() and assorthead.includes() to the compiler's include path. This can be done through include_dirs= of the Extension() definition in setup.py or by adding a target_include_directories() in CMake, depending on the build system.
Call mattress.initialize() on a Python matrix object to wrap it in a tatami-compatible C++ representation. This returns an InitializedMatrix with a ptr property that contains a pointer to the C++ matrix.
Pass ptr to C++ code as a uintptr_t referencing a tatami::Matrix, which can be interrogated as described in the tatami documentation.
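
For a setuptools build, the first step can be sketched as a small helper that collects the header directories. This is a sketch under assumptions: it relies only on the includes() functions named above, collect_include_dirs is a hypothetical helper name, and the extension name and source path in the trailing comment are placeholders.

```python
from importlib import import_module


def collect_include_dirs(package_names=("mattress", "assorthead")):
    """Return the header directory reported by each package's includes()."""
    dirs = []
    for name in package_names:
        module = import_module(name)  # raises ImportError if not installed
        dirs.append(str(module.includes()))
    return dirs


# In setup.py, the result would be passed to the extension definition, e.g.:
#
#   from setuptools import Extension
#   ext = Extension(
#       "downstream.lib_downstream",         # hypothetical module name
#       sources=["src/lib/downstream.cpp"],  # hypothetical source file
#       include_dirs=collect_include_dirs(),
#       language="c++",
#   )
```

If pybind11 is the binding framework, its own header directory would need to be added to the same list.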

For example, the C++ code in a downstream package might look like this:

#include "mattress.h"
#include "pybind11/pybind11.h"

#include <cstdint>

int do_something(uintptr_t ptr) {
    const auto& mat_ptr = mattress::cast(ptr)->ptr;
    // Do something with the tatami interface.
    return 1;
}

// Assuming we're using pybind11, but any framework that can accept a uintptr_t is fine.
PYBIND11_MODULE(lib_downstream, m) {
    m.def("do_something", &do_something);
}

This can then be called from Python:

from . import lib_downstream as lib
from mattress import initialize

def do_something(x):
    tmat = initialize(x)
    return lib.do_something(tmat.ptr)

Check out the included header for more definitions.
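
The hand-off works because a pointer is just an address, which Python can carry around as a plain integer. The sketch below imitates that round trip with ctypes from the standard library. It is only an analogy for passing a uintptr_t, not how mattress is implemented, and the c_double stand-in is an assumption made for illustration.

```python
import ctypes

# A C-level value owned by a Python object; stands in for the C++ matrix.
value = ctypes.c_double(3.5)

# The address is an ordinary Python int, analogous to the `ptr` property.
address = ctypes.addressof(value)

# The receiving side reinterprets the integer as a typed reference.
# This is only valid while `value` is still alive.
recovered = ctypes.c_double.from_address(address)
recovered.value = 7.0  # writes through to the same memory
```

The integer carries no ownership, so the object that owns the memory (here value) must outlive any code that uses the address. The same care presumably applies to keeping tmat alive while its ptr is in use.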

Supported matrices

Dense numpy matrices of varying numeric type:

import numpy as np
from mattress import initialize
x = np.random.rand(1000, 100)
init = initialize(x)

ix = (x * 100).astype(np.uint16)
init2 = initialize(ix)

Compressed sparse matrices from scipy with varying index/data types:

import numpy as np
from scipy import sparse as sp
from mattress import initialize

xc = sp.random(100, 20, format="csc")
init = initialize(xc)

xr = sp.random(100, 20, format="csr", dtype=np.uint8)
init2 = initialize(xr)

Delayed arrays from the delayedarray package:

from delayedarray import DelayedArray
from scipy import sparse as sp
from mattress import initialize
import numpy

xd = DelayedArray(sp.random(100, 20, format="csc"))
xd = numpy.log1p(xd * 5)

init = initialize(xd)

Sparse arrays from delayedarray are also supported:

import delayedarray
from numpy import float64, int32
from mattress import initialize
sa = delayedarray.SparseNdarray((50, 20), None, dtype=float64, index_dtype=int32)
init = initialize(sa)

See below to extend initialize() to custom matrix representations.

Utility methods

The InitializedMatrix instance returned by initialize() provides a few Python-visible methods for querying the C++ matrix.

init.nrow()     # number of rows
init.column(1)  # contents of column 1
init.sparse()   # whether the matrix is sparse

It also has a few methods for computing common statistics:

init.row_sums()
init.column_variances(num_threads=2)

grouping = [i % 3 for i in range(init.ncol())]
init.row_medians_by_group(grouping)

init.row_nan_counts()
init.column_ranges()

These are mostly intended for non-intensive work or testing/debugging. It is expected that any serious computation should be performed by iterating over the matrix in C++.
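
As an illustration of the testing use case, a helper along these lines could compare row_sums() against a reference computed in plain Python. It is a minimal sketch: check_row_sums and its arguments are hypothetical names, and it assumes only that row_sums() returns an iterable with one number per row.

```python
import math


def check_row_sums(init, rows, tol=1e-8):
    """Compare init.row_sums() with sums computed from a list of rows."""
    expected = [math.fsum(row) for row in rows]
    observed = [float(v) for v in init.row_sums()]
    if len(observed) != len(expected):
        return False
    return all(
        math.isclose(o, e, rel_tol=tol, abs_tol=tol)
        for o, e in zip(observed, expected)
    )
```

For a NumPy array x, this might be called as check_row_sums(initialize(x), x.tolist()).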

Operating on an existing pointer

If we already have an InitializedMatrix, we can easily apply additional operations by wrapping it in the relevant delayedarray layers and calling initialize() afterwards. For example, if we want to add a scalar, we might do:

from delayedarray import DelayedArray
from mattress import initialize
import numpy

x = numpy.random.rand(1000, 10)
init = initialize(x)

wrapped = DelayedArray(init) + 1
init2 = initialize(wrapped)

This is more efficient as it re-uses the InitializedMatrix already generated from x. It is also more convenient as we don't have to carry around x to generate init2.

Extending to custom matrices

Developers can extend mattress to custom matrix classes by registering new methods with the initialize() generic. Each method should return an InitializedMatrix object containing a uintptr_t cast from a pointer to a tatami::Matrix (see the included header). Once this is done, all calls to initialize() will be able to handle matrices of the newly registered types.

from . import lib_downstream as lib
import mattress

@mattress.initialize.register
def _initialize_my_custom_matrix(x: MyCustomMatrix):
    data = x.some_internal_data
    return mattress.InitializedMatrix(lib.initialize_custom(data))
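
The register decorator with a type annotation follows the same pattern as functools.singledispatch in the standard library. The stand-alone sketch below mimics that dispatch with stand-in names (initialize_like, MySubclass and the returned dictionaries are all hypothetical). It does not call mattress and only illustrates how a newly registered type is picked up, assuming initialize() dispatches in the same way.

```python
from functools import singledispatch


class MyCustomMatrix:
    """Hypothetical user-defined matrix class."""

    def __init__(self, data):
        self.some_internal_data = data


@singledispatch
def initialize_like(x):
    # Fallback for types with no registered method.
    raise NotImplementedError(f"no method registered for {type(x).__name__}")


@initialize_like.register
def _initialize_my_custom_matrix(x: MyCustomMatrix):
    # A real method would build a tatami::Matrix in C++ and wrap the pointer.
    return {"source": "custom", "data": x.some_internal_data}


class MySubclass(MyCustomMatrix):
    """Subclasses are dispatched to the nearest registered parent."""
```

Under this pattern, registration only needs to happen once, typically at import time of the downstream package.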

If the initialized tatami::Matrix contains references to Python-managed data, e.g., in NumPy arrays, we must ensure that the data is not garbage-collected during the lifetime of the tatami::Matrix. This is achieved by storing a reference to the data in the original member of the mattress::BoundMatrix.
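
The same idea can be illustrated in pure Python: an object that carries a raw handle also keeps a reference to whatever owns the underlying memory. This is an analogy only. Handle and Buffer are hypothetical classes, not part of mattress, and the weak reference is used purely to observe when the buffer is released (the behaviour shown relies on CPython's reference counting).

```python
import gc
import weakref


class Buffer:
    """Hypothetical stand-in for Python-managed data such as an array."""


class Handle:
    """Pairs a raw address with the object that owns the memory."""

    def __init__(self, ptr, original):
        self.ptr = ptr
        self.original = original  # keeps the owner alive


def demo():
    buffer = Buffer()
    watcher = weakref.ref(buffer)
    handle = Handle(id(buffer), buffer)  # id() stands in for an address

    del buffer
    gc.collect()
    held = watcher() is not None  # the handle still holds a reference

    del handle
    gc.collect()
    released = watcher() is None  # nothing refers to the buffer any more
    return held, released


held, released = demo()
```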
