
Be able to construct quadratic models from np.memmap array efficiently #1312

Open
juansebastianl opened this issue Feb 24, 2023 · 2 comments
Labels: feature-request/enhancement

Comments

@juansebastianl

Application
If I have a very large square matrix stored as an np.memmap array and I try to construct a BQM from it, I often run out of memory, even though the final BQM isn't too large for my system.
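A minimal reproduction looks something like this (the file path here is just a placeholder):

import numpy as np
import dimod

# a 25,000 x 25,000 float64 matrix is ~5 GB on disk
large_matrix = np.memmap("qubo.dat", dtype=np.float64, mode="r",
                         shape=(25000, 25000))

# this pulls the whole matrix into RAM, even though the
# resulting BQM would fit comfortably
bqm = dimod.BinaryQuadraticModel(large_matrix, vartype="BINARY")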

Proposed Solution
If I simply iterate through my memory-mapped matrix in chunks, I don't run out of memory even for matrices large enough that I normally would (tested up to 25,000×25,000 on 15 GB of RAM):

bqm = dimod.BQM(vartype="BINARY")
bqm.add_variables_from({i: large_matrix[i, i] for i in range(large_matrix.shape[0])})
chunksize = 100
for row_index in range(0, large_matrix.shape[0], chunksize):
    for col_index in range(0, large_matrix.shape[1], chunksize):
        row_index_max = min(row_index + chunksize, large_matrix.shape[0])
        col_index_max = min(col_index + chunksize, large_matrix.shape[1])
        chunk = large_matrix[row_index:row_index_max, col_index:col_index_max]
        quad_biases = [(i[0], i[1], j) for i, j in np.ndenumerate(chunk) if i[0] > i[1]]
        bqm.add_interactions_from(quad_biases)

This sort of points towards a solution, but given how the BQM constructor is written, I'm not sure there is an easy way of doing this: it is cythonized, and I think for that to keep working there is likely an implicit load of the whole memory-mapped matrix into RAM, which defeats this approach.

Alternatives Considered

There is the possibility of having models that are themselves serialized to disk, so that you never have to iterate through the matrix at all if you pass the underlying file, but that seems even more complex.

@arcondello
Member

Interesting. What you're currently doing is pretty reasonable, I think, though you might have a bit more luck using np.nditer to control things like buffer size.
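For example, something like this untested sketch (the helper name, file path, dtype, and buffer size are all placeholders) lets nditer's buffer replace the manual chunking:

import dimod
import numpy as np

def bqm_from_memmap(path, n, dtype=np.float64, buffersize=2**20):
    # hypothetical helper: stream a C-ordered n x n QUBO file into a BQM
    # without materializing the full matrix in RAM
    q = np.memmap(path, dtype=dtype, mode="r", shape=(n, n))

    bqm = dimod.BinaryQuadraticModel(vartype="BINARY")
    bqm.add_variables_from({i: q[i, i] for i in range(n)})  # diagonal -> linear

    # 'external_loop' + 'buffered' makes nditer yield 1d blocks of at most
    # `buffersize` elements, read from the memmap in C order
    offset = 0  # running flat index into q
    it = np.nditer(q, flags=["external_loop", "buffered"],
                   order="C", buffersize=buffersize)
    for block in it:
        for k in np.flatnonzero(block):
            u, v = divmod(int(offset + k), n)
            if u != v:
                # add_quadratic accumulates, so q[u, v] and q[v, u]
                # end up summed, matching the dense constructor
                bqm.add_quadratic(u, v, block[k])
        offset += block.size
    return bqm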

You're correct, though, that we could get some performance benefit from handling this natively. When we add quadratic models from dense NumPy arrays in memory, we can in some cases take advantage of the ordering of indices to construct the model more efficiently. See:

cpdef Py_ssize_t add_quadratic_from_dense(self, ConstNumeric[:, ::1] quadratic) except -1:
    if quadratic.shape[0] != quadratic.shape[1]:
        raise ValueError("quadratic must be a square matrix")

    cdef Py_ssize_t num_variables = quadratic.shape[0]

    cdef Py_ssize_t ui
    for ui in range(num_variables):
        if quadratic[ui, ui]:
            raise ValueError(f"{ui!r} cannot have an interaction with itself")

    if self.variables._is_range():
        if num_variables > self.num_variables():
            self.resize(num_variables)
        self.cppbqm.add_quadratic_from_dense(&quadratic[0, 0], num_variables)
    else:
        raise NotImplementedError

for (index_type u = 0; u < num_variables; ++u) {
    // diagonal
    add_quadratic_back(u, u, dense[u * (num_variables + 1)]);

    // off-diagonal
    for (index_type v = u + 1; v < num_variables; ++v) {
        bias_type qbias = dense[u * num_variables + v] + dense[v * num_variables + u];
        if (qbias) {
            add_quadratic_back(u, v, qbias);
        }
    }
}

However, actually getting memmap arrays down into the C++ would be quite a pain. Right now, you're correct that we rely on some implicit conversions in NumPy and Cython that pull the whole array into memory.

@arcondello added the feature-request/enhancement label on Feb 27, 2023
@tim-shea

@juansebastianl Thanks for posting this, I was facing the same issue and struggling to think of a workaround. Your code seems to have a bug, though: the constructed BQM does not match the original QUBO (np.ndenumerate yields indices local to each chunk, so the interactions are added against the wrong variables). And in my tests, chunking the matrix is not really any faster than just iterating through the entire thing. Here's a slightly simpler alternative:

import dimod

def bqm_from_huge_qubo(q):
    """Create a BinaryQuadraticModel from a QUBO matrix iteratively.

    Intended for very large QUBO matrices that would cause a kernel
    crash with `dimod.BinaryQuadraticModel.from_qubo(q)`.

    It is very slow (e.g. ~30 mins for 45k variables).
    """
    bqm = dimod.BinaryQuadraticModel(vartype='BINARY')
    # diagonal entries become the linear biases
    bqm.add_variables_from({i: q[i, i] for i in range(q.shape[0])})
    for x in range(q.shape[0]):
        if x % 1000 == 0:
            print('*', end='')  # coarse progress indicator
        for y in range(q.shape[0]):
            if x != y and q[x, y] != 0:
                bqm.add_interactions_from([(x, y, q[x, y])])
    return bqm
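For reference, usage is just this (the path, dtype, and shape here are placeholders):

import numpy as np

q = np.memmap("qubo.dat", dtype=np.float64, mode="r", shape=(45000, 45000))
bqm = bqm_from_huge_qubo(q)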
