GH-43352: [Docs][Python] Add all tensor classes documentation
### Rationale for this change

This change adds the missing documentation for PyArrow's sparse tensor classes: `SparseCOOTensor`, `SparseCSRMatrix`, `SparseCSCMatrix`, and `SparseCSFTensor`. Until now, neither the API reference nor the PyArrow user guide described these classes, so users had no reference information or examples showing how to use them.

### What changes are included in this PR?

1. Added docstrings to the `SparseCSRMatrix`, `SparseCSCMatrix`, and `SparseCSFTensor` classes and their methods in the `tensor.pxi` file.
2. Updated the `tables.rst` file to include documentation for `SparseCSFTensor` along with examples.

### Are these changes tested?

No new tests are included in this PR as the changes are purely documentation updates. The existing tests for the tensor classes should cover the functionality.

### Are there any user-facing changes?

Yes. This change adds documentation for the tensor classes, which should help users understand and use them. There are no breaking changes to public APIs.
ShaiviAgarwal2 committed Jan 3, 2025
1 parent 7e703aa commit 8eb2af8
Showing 2 changed files with 104 additions and 1 deletion.
85 changes: 85 additions & 0 deletions docs/source/python/api/tables.rst
@@ -59,7 +59,92 @@ Dataframe Interchange Protocol
Tensors
-------

PyArrow supports both dense and sparse tensors. Dense tensors store every value explicitly, while sparse tensors store only the non-zero elements and their locations, which saves memory and computation when most elements are zero.

Dense Tensors
-------------

.. autosummary::
   :toctree: ../generated/

   Tensor

Sparse Tensors
--------------

PyArrow supports the following sparse tensor formats:

.. autosummary::
   :toctree: ../generated/

   SparseCOOTensor
   SparseCSRMatrix
   SparseCSCMatrix
   SparseCSFTensor

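SparseCOOTensor
~~~~~~~~~~~~~~~

The ``SparseCOOTensor`` represents a sparse tensor in Coordinate (COO) format, where each non-zero element is stored as a value together with its coordinates, one index per dimension (row and column for a matrix).

Example:

.. code-block:: python

   import numpy as np
   import pyarrow as pa

   # Each row of coords holds the (row, column) position of one non-zero value.
   coords = np.array([[0, 0], [1, 2]])
   data = np.array([1, 2])
   shape = (2, 3)
   # Sparse tensors are built with the from_* factory methods, not the constructor.
   tensor = pa.SparseCOOTensor.from_numpy(data, coords, shape)
   print(tensor.to_tensor().to_numpy())
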
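To make the COO layout concrete, the following is a minimal plain-Python sketch that expands coordinate/value pairs into a dense nested list. The helper name `coo_to_dense` is hypothetical and is not part of PyArrow; it only illustrates how the format is read, assuming a 2-D tensor.

```python
def coo_to_dense(coords, values, shape):
    """Expand 2-D COO data into a nested list (illustrative helper, not a PyArrow API)."""
    n_rows, n_cols = shape
    # Start from an all-zero matrix; COO only lists the non-zero entries.
    dense = [[0] * n_cols for _ in range(n_rows)]
    for (row, col), value in zip(coords, values):
        dense[row][col] = value
    return dense


dense = coo_to_dense([(0, 0), (1, 2)], [1, 2], (2, 3))
print(dense)  # [[1, 0, 0], [0, 0, 2]]
```

Each non-zero value costs one stored value plus one index per dimension, so COO is simple to build but repeats the row index for every entry in a row.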
SparseCSRMatrix
~~~~~~~~~~~~~~~

The ``SparseCSRMatrix`` represents a sparse matrix in Compressed Sparse Row (CSR) format, which stores the non-zero values row by row together with their column indices. This layout is well suited to matrix-vector multiplication.

Example:

.. code-block:: python

   import numpy as np
   import pyarrow as pa

   data = np.array([1, 2, 3])
   # Row i owns data[indptr[i]:indptr[i + 1]].
   indptr = np.array([0, 2, 3])
   # Column index of each value in data.
   indices = np.array([0, 2, 1])
   shape = (2, 3)
   sparse_matrix = pa.SparseCSRMatrix.from_numpy(data, indptr, indices, shape)
   print(sparse_matrix)

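The reason CSR suits matrix-vector multiplication can be sketched in plain Python: each output element only touches the slice of `data` that belongs to its row. The function `csr_matvec` below is a hypothetical illustration, not a PyArrow function.

```python
def csr_matvec(data, indptr, indices, x):
    """Multiply a CSR matrix by a dense vector x (illustrative helper, not a PyArrow API)."""
    result = []
    for row in range(len(indptr) - 1):
        # The non-zero values of this row live in data[start:end].
        start, end = indptr[row], indptr[row + 1]
        result.append(sum(data[k] * x[indices[k]] for k in range(start, end)))
    return result


# Same matrix as [[1, 0, 2], [0, 3, 0]].
y = csr_matvec([1, 2, 3], [0, 2, 3], [0, 2, 1], [1, 10, 100])
print(y)  # [201, 30]
```

Zero entries are never visited, so the cost is proportional to the number of stored values rather than to the full matrix size.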
SparseCSCMatrix
~~~~~~~~~~~~~~~

The ``SparseCSCMatrix`` represents a sparse matrix in Compressed Sparse Column (CSC) format, which stores the non-zero values column by column together with their row indices.

Example:

.. code-block:: python

   import numpy as np
   import pyarrow as pa

   data = np.array([1, 2, 3])
   # Column j owns data[indptr[j]:indptr[j + 1]].
   indptr = np.array([0, 1, 3])
   # Row index of each value in data.
   indices = np.array([0, 1, 2])
   shape = (3, 2)
   sparse_matrix = pa.SparseCSCMatrix.from_numpy(data, indptr, indices, shape)
   print(sparse_matrix)

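As a rough illustration of how the CSC arrays are interpreted, this plain-Python sketch walks the columns and writes each stored value into a dense nested list. The helper `csc_to_dense` is hypothetical and not part of PyArrow.

```python
def csc_to_dense(data, indptr, indices, shape):
    """Expand CSC arrays into a nested list (illustrative helper, not a PyArrow API)."""
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        # The non-zero values of this column live in data[indptr[col]:indptr[col + 1]].
        for k in range(indptr[col], indptr[col + 1]):
            dense[indices[k]][col] = data[k]
    return dense


dense = csc_to_dense([1, 2, 3], [0, 1, 3], [0, 1, 2], (3, 2))
print(dense)  # [[1, 0], [0, 2], [0, 3]]
```

CSC mirrors CSR with the roles of rows and columns swapped, so it favors column-wise access.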
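SparseCSFTensor
~~~~~~~~~~~~~~~

The ``SparseCSFTensor`` represents a sparse tensor in Compressed Sparse Fiber (CSF) format, which generalizes the CSR format to tensors with more than two dimensions.

Example:

.. code-block:: python

   import numpy as np
   import pyarrow as pa

   # Non-zero values at (0, 0, 0), (1, 1, 0) and (1, 2, 1).
   data = np.array([1, 2, 3])
   # A tensor with N dimensions needs N index arrays and N - 1 indptr arrays.
   indptr = [np.array([0, 1, 3]), np.array([0, 1, 2, 3])]
   indices = [np.array([0, 1]), np.array([0, 1, 2]), np.array([0, 0, 1])]
   shape = (2, 3, 2)
   # axis_order states the dimension stored at each level of the index tree.
   sparse_tensor = pa.SparseCSFTensor.from_numpy(
       data, indptr, indices, shape, axis_order=[0, 1, 2])
   print(sparse_tensor)
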
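CSF stores the coordinates as a tree: each level holds the indices of one dimension, and the `indptr` array of a level says which nodes of the next level are its children. The plain-Python sketch below walks that tree and lists every coordinate with its value. The helper `csf_to_coo` and the arrays passed to it are illustrative choices for this sketch, not PyArrow APIs or data from the library, and the levels are assumed to follow the dimension order 0, 1, 2.

```python
def csf_to_coo(data, indptr, indices):
    """List (coordinate, value) pairs of a CSF tensor (illustrative helper, not a PyArrow API)."""
    ndim = len(indices)
    entries = []

    def walk(level, node, prefix):
        coord = prefix + (indices[level][node],)
        if level == ndim - 1:
            # Leaf nodes line up one-to-one with the stored values.
            entries.append((coord, data[node]))
            return
        # Children of this node occupy a contiguous range on the next level.
        for child in range(indptr[level][node], indptr[level][node + 1]):
            walk(level + 1, child, coord)

    for root in range(len(indices[0])):
        walk(0, root, ())
    return entries


entries = csf_to_coo(
    data=[1, 2, 3],
    indptr=[[0, 1, 3], [0, 1, 2, 3]],
    indices=[[0, 1], [0, 1, 2], [0, 0, 1]],
)
print(entries)  # [((0, 0, 0), 1), ((1, 1, 0), 2), ((1, 2, 1), 3)]
```

Coordinates that share a prefix share tree nodes, which is where the compression over COO comes from.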
20 changes: 19 additions & 1 deletion python/pyarrow/tensor.pxi
@@ -595,7 +595,25 @@ shape: {0.shape}""".format(self)

cdef class SparseCSRMatrix(_Weakrefable):
    """
    A sparse CSR matrix.

    SparseCSRMatrix represents a sparse matrix in Compressed Sparse Row (CSR) format.

    Attributes:
        indptr : array
            Index pointer array.
        indices : array
            Column indices of the corresponding non-zero values.
        shape : tuple
            Shape of the matrix.
        dim_names : list, optional
            Names of the dimensions.

    Example:
        >>> import numpy as np
        >>> import pyarrow as pa
        >>> data = np.array([1, 2, 3])
        >>> indptr = np.array([0, 2, 3])
        >>> indices = np.array([0, 2, 1])
        >>> shape = (2, 3)
        >>> # Use a from_* factory method; the constructor is not called directly.
        >>> sparse_matrix = pa.SparseCSRMatrix.from_numpy(data, indptr, indices, shape)
        >>> print(sparse_matrix)
    """

    def __init__(self):