Skip to content

Basic Terms and Key Classes

Grossfield Lab edited this page May 7, 2021 · 10 revisions

Purpose

The purpose of this tutorial is to give you an idea how we intend LOOS to be used as a development tool, and the key classes and functions even the most basic tools will need to use.

How to think about analyzing molecular dynamics data

Most analyses of molecular dynamics data follow one of a couple of patterns. You read in a file that defines the system contents, identify the set of atoms you're interested in, and loop over one or more trajectory file, calculating some geometric property using that set of atoms for each frame. You either output a time series frame by frame or accumulate an average or histogram. Occasionally an additional transformation (e.g. a correlation function) is computed.

The goal of LOOS is to make each of these steps as easy to implement as possible. We will walk through each of these steps, and describe the key classes and functions you'll use to perform them. Most of this discussion will apply to both C++ and the python wrapper; when the two diverge, we'll explain both options. That said, we highly recommend that developers use the python bindings for code development. Development is generally quicker and easier in python (particularly for the non-LOOS parts of the code, and the performance difference is usually small enough to be irrelevant.

If you want to see toy code examples, we supply skeleton LOOS programs for different types of analysis patterns in the distribution. In Packages/User, you will see several simple programs that lay out how to perform a task. For example, simple_model_calc.cpp gives a skeleton for reading in a structure and doing some sort of calculation on the atoms, although it doesn't actually do anything. Some of the C++ examples use our Options framework to manage the command line and take care of some stuff behind the scenes; we will cover the options framework in a separate document. Similarly, Packages/PyLOOS contains equivalent examples of using the python bindings, in addition to a number of useful programs.

4 Key classes for developing with LOOS

Although LOOS has more than 100 classes, most of them work behind the scenes. As a developer, there are 4 classes you're likely to actually interact with on a regular basis.

Manipulating coordinates with Coord (GCoord)

The Coord class is our data structure for interacting with coordinates and vectors. The arithmetic operators are overloaded, so that one can add and subtract Coords to get vector addition or multiply to get dot products. Coord is implemented as a C++ template, meaning that the coordinate field could be one of several data types. Using the most common case, when you want the coordinates be doubles, is simplified by a typedef we provide, GCoord (this means you can simply declare a GCoord as if it were a fundamental data type, and not worry about the template part of it).

The Doxygen-generated documentation for Coord is here; it lists the various methods and members of the class. You have a choice when working with the components of a GCoord: you can treat them as components or as array indices.

    GCoord foo;
    foo.x() = 7.0;
    cout << foo[0] << endl;  // will print 7.0
    foo.y(18.0);  // another way of assigning to y
    foo[2] = 4.0;  // assign to z using the array notation
    cout << foo << endl; // will print 7.0, 18.0, 4.0 in an xml-ish format`

Note: PyLOOS does not support the foo.x() = 7.0 form of assignment, but the other two methods work fine.

Representing atoms with Atom (pAtom)

The next class you need is Atom , which as the name implies is used to represent an atom. The full class documentation is found here, but basically an Atom contains a GCoord plus some associated metadata (atom name, atom number, residue name, residue number, segment name, partial charge, mass, etc). Not all file formats contain all of this information, so when it is absent a sane guess is supplied (e.g. charge = 0 if none is supplied). Atom can also contain connectivity information in the form of a list of indices for atoms bound to it; this is mostly used for splitting a system into a set of molecules.

When writing C++ code, you never work with actual Atom instances. To make copying fast and lightweight, we instead use reference-counted shared pointers to Atom objects, implemented using the BOOST shared pointer class. We hide this complexity behind the pAtom typedef -- you declare and use a pAtom as if it were a normal pointer to an Atom, but you don't have to worry about declaring or freeing memory for it (that gets taken care of behind the scenes).

Reading and manipulating structures with AtomicGroup

AtomicGroup is the workhorse class within LOOS. It contains a list of pAtoms, plus a bunch of additional information (e.g. box dimensions if the system is periodic). AtomicGroup also contains a large number of methods for manipulating and computing properties of sets of atoms, listed in the Doxygen documentation. If you're wondering if we provide a way to do something, that page is the place to go. If you're thinking of implementing something new, that's probably the place to put it.

AtomicGroup supports set operations (e.g. merge or intersect to combine 2 AtomicGroup objects), geometric transformations (e.g. translate to move the atoms, alignOnto to perform superposition), and can compute a lot of properties (e.g. principalAxes, centroid, centerOfMass).

You can access the individual atoms within an AtomicGroup by treating it as a 0-based array, so group[4] would access the 5th atom of AtomicGroup group.

AtomicGroup can also check for which properties are present in the system. For example, group.hasBonds() returns True if the system contains connectivity information.

We support reading different file formats by subclassing AtomicGroup -- for example, the PSF class is a subclass of AtomicGroup. The most common way to create an AtomicGroup is via the factory function createSystem (see this). This function will analyze the supplied filename, instantiate the correct type, then convert it to an AtomicGroup. The result is that in actual LOOS you almost never use any of the subclasses, just the main AtomicGroup class. This makes it easy to write code that is agnostic to the package used to run the simulation.

Reading trajectories with Trajectory

Trajectory, as the name implies, is the class that is used to extract coordinates from a molecular dynamics trajectories. As with AtomicGroup, Trajectory supports all of the different file formats via subclassing. However, the main trajectory object needs to retain subclass-specific functionality, so in an application you generally work with a pointer to a trajectory; we simplify the declaration by providing the typedef pTraj.
The method you'll use most often is readFrame, which reads the next snapshot from the trajectory, returning False when it hits the end of the trajectory.

    while (traj->readFrame()) {
        // traj is a pTraj
        // system is the AtomicGroup associated with it
        traj->updateGroupCoords(system);

        // Now do something useful
        calculate_something(system);
    }

readFrame can also take an integer argument to jump to a specific frame.

The rest of the documentation for Trajectory can be found here.

When using PyLOOS, you have a choice about how to handle trajectories. You can use the Trajectory class directly, with code that looks just like the C++, or you can use the wrapper in loos.pyloos, which is considerably more pythonish.

For example, this is part of the example program found in Packages/PyLOOS/simple_traj_calc.py:

#!/usr/bin/env python3
import loos
import loos.pyloos
import sys

header = " ".join(sys.argv)
print("#", header)

# parse the command line arguments -- in a more complex example,
# you'd use the argparse module
model_filename = sys.argv[1]
trajectory_filename = sys.argv[2]
selection_string = sys.argv[3]

# Create the system 
model = loos.createSystem(model_filename)

# Select a subset of the system
subset = loos.selectAtoms(model, selection_string)

# Create the trajectory
# You can automatically set skip and stride options here,
# so that when you loop over traj using a standard iterator
# those options are automatically applied
traj = loos.pyloos.Trajectory(trajectory_filename, model)


# Iterate over 
for frame in traj:
    # Example: compute the centroid of the selection
    centroid = subset.centroid()

A loos.pyloos.Trajectory object can be used as if it were an iterator over a list of AtomicGroup objects, as in the example above. Moreover, they allow you to set skip and stride variables when the object is created, so that you can for example look at every 10th frame without the code to do that cluttering up your program. There are also variants of this class that will pre-align structures for you, or let you treat multiple trajectories as if they were one. See the documentation for details.