-
I'm not confident enough in my own Python skills to tell whether this is a bug or just my inexperience with the language, so I decided to open this up as a discussion. I was trying to split a list of AtomicGroups into N equal parts by first converting the list into a numpy array and then calling numpy.split(the_list, N_equal_parts). Unfortunately, when I try to convert the list to a numpy array of AtomicGroups I see the error: Traceback (most recent call last): My Python code, along with comments, is below, and the PDB file to recreate the problem can be found here. I've included some basic list and type Python code in the example below just as a sanity check:
Many thanks for your continued assistance.
Replies: 2 comments 1 reply
-
For what it's worth, numpy arrays are designed to hold objects in contiguous memory, where each object is the same size: for example, an array of double precision floats (which is what most Python implementations use for the type 'float'). Atomic groups are more complicated, variable-size objects. In Python, each AG is itself a list of Atoms, and different AGs may have different lengths. So I don't think it would be possible to put AGs into a numpy array, and to be honest I'd wonder whether that's really what you'd want, since I have a feeling it'd imply allocating blocks of memory that were sized to the largest group and having lots of wasted space.

I'm a latecomer to the other conversation you had about MARTINI, so I apologize if this was covered, but I believe that you can probably accomplish whatever you want to do with the AG list using conventional Python list programming. Often you'd want to do something with each AG in the AG list, like call a method on it; this type of thing is nicely expressed as a normal Python for-loop or, if you want to record the results in a vector, list, or numpy array, as a list comprehension or by appending to a predefined empty list. For example, if I wanted to compute the distance between the centroids of each pair of AGs in the list I could write:
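A minimal sketch of that nested list comprehension, assuming plain NumPy coordinate arrays as stand-ins for the AGs. The `centroid` helper and the sample coordinates are hypothetical; with real groups you would call the group's own centroid method instead.

```python
import numpy as np

# Stand-ins for AGs: each "group" is an (n_atoms, 3) coordinate array,
# deliberately of different lengths (hypothetical data).
groups = [
    np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]),
    np.array([[0.0, 3.0, 0.0], [0.0, 5.0, 0.0], [0.0, 4.0, 0.0]]),
    np.array([[1.0, 0.0, 4.0]]),
]


def centroid(coords):
    """Mean position of a group's atoms (hypothetical stand-in helper)."""
    return coords.mean(axis=0)


# Compute each centroid once, with an ordinary list comprehension.
centroids = [centroid(g) for g in groups]

# Nested comprehension: one row per group, one entry per partner group.
distances = [
    [float(np.linalg.norm(ci - cj)) for cj in centroids]
    for ci in centroids
]
```

The result is a plain list of lists, so groups of different sizes cause no trouble: only the fixed-size centroids ever meet NumPy.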
If you wanted those distances as a 2D numpy array you could instead write

Also, obviously, you might need to be operating on the list of AGs, rather than on a list of centroid AGs. I just gave that as an example because it came into my head as I was typing this, and because I thought I saw you and Alan talking about centroid distances earlier.

Moreover, if you wanted to threshold these values you could do so after you have the numpy array, using numpy's facility for boolean arrays and thresholding (for example, if you used the line where you'd called np.array on both the rows and then the list of rows,

If you really want to work with numpy arrays from the get-go, you could use the atomic-group method

*The equivalent double for-loop would look like:
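A minimal sketch of the double for-loop form, followed by the conversion to a 2D array and a boolean threshold. The centroid coordinates and the `cutoff` value are hypothetical stand-ins, not values from the thread.

```python
import numpy as np

# Hypothetical centroid coordinates, one per group (stand-ins for what a
# centroid call on each AG would return).
centroids = [
    np.array([1.0, 0.0, 0.0]),
    np.array([0.0, 4.0, 0.0]),
    np.array([1.0, 0.0, 4.0]),
]

# Double for-loop: build each row by appending, then append the finished row.
rows = []
for ci in centroids:
    row = []
    for cj in centroids:
        row.append(float(np.linalg.norm(ci - cj)))
    rows.append(row)

# Every row has the same length, so this converts cleanly to a 2D array.
dist = np.array(rows)

# Elementwise comparison yields a boolean array of the same shape.
cutoff = 4.05  # arbitrary illustrative threshold
close = dist < cutoff
```

The loop produces the same values as a nested comprehension would, just with more bookkeeping. The boolean array can then be used as a mask, e.g. `dist[close]`.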
I suppose you can see why I like the list comprehension better.
-
I pretty much agree with what Louis (@lgsmith) wrote. I can provide a bit more background, though. Basically, if I recall correctly, we had to put special-case code into the SWIG file to allow conversion of what things like splitByMolecule() return (a C++ std::vector) into a Python list. My guess is that we didn't supply that intermediate class (AtomicGroupVector) with a next method, which is what the conversion to a numpy array needs. I'll take a look when I get a chance because this is something we should support, but my guess is that you'll see little to no performance improvement from using numpy arrays of AGs (which is also why we never noticed the problem). Usually, if there's a performance issue, our next step is to move the loop into a C++ method on AtomicGroup, which as a rule should be even faster and also makes it reusable. Is there a different reason you want numpy arrays?

cheers, Alan
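If the only goal is splitting the list into N parts, a sketch along the following lines may be enough. It uses plain Python lists of different lengths as hypothetical stand-ins for AGs, and whether the second option behaves the same way with real AtomicGroup objects is an assumption that has not been checked here.

```python
import numpy as np

# Stand-ins for AGs: plain lists of different lengths (hypothetical data).
groups = [[1, 2], [3], [4, 5, 6], [7], [8, 9], [10], [11, 12, 13]]
n_parts = 3

# Option 1: never put the groups in an array. Split the index range with
# np.array_split (which tolerates uneven division) and slice the list.
index_chunks = np.array_split(np.arange(len(groups)), n_parts)
parts = [[groups[i] for i in idx] for idx in index_chunks]

# Option 2: an object array filled one element at a time, so NumPy stores
# a reference to each group instead of trying to unpack its contents.
boxed = np.empty(len(groups), dtype=object)
for i, g in enumerate(groups):
    boxed[i] = g
boxed_parts = np.array_split(boxed, n_parts)
```

Note that `np.split` raises an error when the length is not evenly divisible by N, whereas `np.array_split` returns parts of nearly equal size.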