
Be able to construct quadratic models from np.memmap array efficiently #1312

Open
juansebastianl opened this issue Feb 24, 2023 · 2 comments
Labels: feature-request/enhancement

Comments

@juansebastianl

Application
If I have a very large square matrix stored as an np.memmap array and I try to construct a BQM from it, I often run out of memory, even though the final BQM isn't too large for my system.
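A minimal reproduction looks something like this (the file path here is just a placeholder):

import numpy as np
import dimod

# a 25,000 x 25,000 float64 matrix is ~5 GB on disk
large_matrix = np.memmap("qubo.dat", dtype=np.float64, mode="r",
                         shape=(25000, 25000))

# this pulls the whole matrix into RAM, even though the
# resulting BQM would fit comfortably
bqm = dimod.BinaryQuadraticModel(large_matrix, vartype="BINARY")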

Proposed Solution
If I simply iterate through my memory-mapped matrix in chunks, I don't run out of memory even for matrices large enough that I normally would (tested up to 25,000×25,000 on 15 GB of RAM):

bqm = dimod.BQM(vartype="BINARY")
bqm.add_variables_from({i: large_matrix[i, i] for i in range(large_matrix.shape[0])})
chunksize = 100
for row_index in range(0, large_matrix.shape[0], chunksize):
    for col_index in range(0, large_matrix.shape[1], chunksize):
        row_index_max = min(row_index + chunksize, large_matrix.shape[0])
        col_index_max = min(col_index + chunksize, large_matrix.shape[1])
        chunk = large_matrix[row_index:row_index_max, col_index:col_index_max]
        quad_biases = [(i[0], i[1], j) for i, j in np.ndenumerate(chunk) if i[0] > i[1]]
        bqm.add_interactions_from(quad_biases)

This sort of points towards a solution, but given how the BQM constructor is written, I'm not sure there is an easy way of doing this: it is cythonized, and I think for that to keep working there is likely an implicit load of the whole memory-mapped matrix into RAM, which defeats this approach.

Alternatives Considered

There is the possibility of having models that are themselves serialized to disk, so that you never have to iterate through the matrix at all if you pass the underlying file, but that seems even more complex.

@arcondello
Member

Interesting. What you're currently doing is pretty reasonable, I think, though you might have a bit more luck using np.nditer to control things like buffer size.
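For example, something like this untested sketch (the helper name, file path, dtype, and buffer size are all placeholders) lets nditer's buffer replace the manual chunking:

import dimod
import numpy as np

def bqm_from_memmap(path, n, dtype=np.float64, buffersize=2**20):
    # hypothetical helper: stream a C-ordered n x n QUBO file into a BQM
    # without materializing the full matrix in RAM
    q = np.memmap(path, dtype=dtype, mode="r", shape=(n, n))

    bqm = dimod.BinaryQuadraticModel(vartype="BINARY")
    bqm.add_variables_from({i: q[i, i] for i in range(n)})  # diagonal -> linear

    # 'external_loop' + 'buffered' makes nditer yield 1d blocks of at most
    # `buffersize` elements, read from the memmap in C order
    offset = 0  # running flat index into q
    it = np.nditer(q, flags=["external_loop", "buffered"],
                   order="C", buffersize=buffersize)
    for block in it:
        for k in np.flatnonzero(block):
            u, v = divmod(int(offset + k), n)
            if u != v:
                # add_quadratic accumulates, so q[u, v] and q[v, u]
                # end up summed, matching the dense constructor
                bqm.add_quadratic(u, v, block[k])
        offset += block.size
    return bqm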

You're correct, though, that we could get some performance benefit from handling this natively. When we add quadratic models from dense NumPy arrays in memory, we can in some cases take advantage of the ordering of indices to construct the model more efficiently. See:

cpdef Py_ssize_t add_quadratic_from_dense(self, ConstNumeric[:, ::1] quadratic) except -1:
    if quadratic.shape[0] != quadratic.shape[1]:
        raise ValueError("quadratic must be a square matrix")

    cdef Py_ssize_t num_variables = quadratic.shape[0]

    cdef Py_ssize_t ui
    for ui in range(num_variables):
        if quadratic[ui, ui]:
            raise ValueError(f"{ui!r} cannot have an interaction with itself")

    if self.variables._is_range():
        if num_variables > self.num_variables():
            self.resize(num_variables)
        self.cppbqm.add_quadratic_from_dense(&quadratic[0, 0], num_variables)
    else:
        raise NotImplementedError

for (index_type u = 0; u < num_variables; ++u) {
    // diagonal
    add_quadratic_back(u, u, dense[u * (num_variables + 1)]);

    // off-diagonal
    for (index_type v = u + 1; v < num_variables; ++v) {
        bias_type qbias = dense[u * num_variables + v] + dense[v * num_variables + u];
        if (qbias) {
            add_quadratic_back(u, v, qbias);
        }
    }
}

However, actually getting memmap arrays down into the C++ would be quite a pain. Right now, you're correct that we rely on some implicit conversions in NumPy and Cython that pull the whole array into memory.

@arcondello added the feature-request/enhancement label on Feb 27, 2023
@tim-shea

@juansebastianl Thanks for posting this, I was facing the same issue and struggling to think of a workaround. Your code seems to have a bug, though: the constructed BQM does not match the original QUBO (np.ndenumerate yields indices local to each chunk, so the interactions are added against the wrong variables). And in my tests, chunking the matrix is not really any faster than just iterating through the entire thing. Here's a slightly simpler alternative:

import dimod

def bqm_from_huge_qubo(q):
    """Create a BinaryQuadraticModel from a QUBO matrix iteratively.

    Intended for very large QUBO matrices that would cause a kernel
    crash with `dimod.BinaryQuadraticModel.from_qubo(q)`.

    It is very slow (e.g. ~30 mins for 45k variables).
    """
    bqm = dimod.BinaryQuadraticModel(vartype='BINARY')
    # diagonal entries become the linear biases
    bqm.add_variables_from({i: q[i, i] for i in range(q.shape[0])})
    for x in range(q.shape[0]):
        if x % 1000 == 0:
            print('*', end='')  # coarse progress indicator
        for y in range(q.shape[0]):
            if x != y and q[x, y] != 0:
                bqm.add_interactions_from([(x, y, q[x, y])])
    return bqm
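For reference, usage is just this (the path, dtype, and shape here are placeholders):

import numpy as np

q = np.memmap("qubo.dat", dtype=np.float64, mode="r", shape=(45000, 45000))
bqm = bqm_from_huge_qubo(q)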
