alchemical analysis features

alchemical-analysis announced in late 2017 that its features would be moved to alchemlyb. This is going to be a long and tedious process, but it will have the following advantages:

tested (all code coming into alchemlyb is tested to > 90% coverage, with 95% as a goal)
modular (functionality as library functions)
Python 3 (and Python 2)

User input

The alchemlyb team welcomes user input: please raise an issue in the Issue Tracker for any alchemical-analysis features that you would really like to have in alchemlyb.

Please also feel free to edit this wiki page and contribute to the discussion.

Desired alchemical-analysis features

Please describe the feature and make a case for why you want it included. Add your name/GitHub handle; feel free to add yourself to any existing entries, too. Popular features are more likely to be migrated. (See issue #54 to discuss the process.)

More estimators

@orbeckst

-m METHODS, --methods=METHODS
missing estimators
- TI_CUBIC #21
- DEXP #22
- IEXP #23
- GINS #24
- GDEL #25
- UBAR #26
- RBAR #27
- BAR #28
Are some of these estimators more important than others?

@mrshirts

BAR is high priority, because sometimes MBAR can't be done if we don't have energies at all i+1's. The BAR solution can be made very fast (significantly faster than MBAR), as it is in the pyMBAR package.
DEXP and IEXP are single-state perturbation estimators. Worth including for comparison. There are two because there are essentially two ways to do the calculation if you have a series of lambda points.
TI-CUBIC is essentially a higher-order integration of the <dH/dl> using cubic splines. Experience (non-exhaustive) has shown that it's not really much better than TI and has a larger chance of failing because of locally high curvature. I think this is lower priority, especially since it's a pain to handle the uncertainties correctly in the code. There could easily be better integration formulas. If the points are equally spaced, one could use Simpson's rule or Romberg integration, but there doesn't appear to be a general integration algorithm that works well for predefined spacing (as opposed to adaptive spacing). So this could be cut.
GINS and GDEL are the Gaussian approximations to insertion and deletion FEP. We included them because people kept saying that the Gaussian versions worked, but they really only work for linear problems (charging, etc.), and we needed a testbed to show this. Low overhead to put in.
UBAR is BAR without optimizing the constant. The only reason one would ever do this is that you don't want to maintain a history to adaptively update everything each iteration, which would only happen if you were running this adaptively, i.e. maintaining the accumulated averages (an O(1) operation) each step, so that you have a cheap estimate each step without running a nonlinear optimization. But it is not very accurate in most cases.
RBAR is interesting, since you calculate the UBAR for a series of 'trial' free energies and choose the one that best satisfies the equations. One can get a very accurate answer with no iteration each step if you know the range to start out with. Probably not worth supporting, since one is not going to be using alchemlyb adaptively, in the sense that you would need to keep K sets of averages around in between alchemlyb runs. If one were implementing a code in which it was tightly integrated, it could be very useful, but likely not in post-analysis code.
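
To make the comparison between these estimators concrete, here is a minimal pure-Python sketch of the two exponential-averaging directions, their Gaussian approximations, and BAR for a single pair of neighbouring lambda states. It is not the alchemical-analysis, pyMBAR, or alchemlyb implementation, and all function names are made up for illustration. The assumed conventions are: reduced units (beta = 1), forward work `w_f = u_{i+1} - u_i` sampled in state i, and reverse work `w_r = u_i - u_{i+1}` sampled in state i+1. Which direction corresponds to the DEXP/GDEL and IEXP/GINS labels is deliberately left open here.

```python
import math
import random
import statistics


def exp_forward(w_f):
    """Exponential averaging over forward work: -ln <exp(-w_f)>."""
    shift = min(w_f)  # subtract the minimum so exp() never overflows
    mean_exp = sum(math.exp(-(w - shift)) for w in w_f) / len(w_f)
    return shift - math.log(mean_exp)


def exp_reverse(w_r):
    """Exponential averaging over reverse work: +ln <exp(-w_r)>."""
    return -exp_forward(w_r)


def gauss_forward(w_f):
    """Gaussian (second-cumulant) approximation: <w_f> - var(w_f) / 2."""
    return statistics.mean(w_f) - 0.5 * statistics.variance(w_f)


def gauss_reverse(w_r):
    """Gaussian approximation applied to the reverse work values."""
    return -gauss_forward(w_r)


def _fermi(x):
    """1 / (1 + exp(x)), written so that large |x| does not overflow."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def bar(w_f, w_r, n_iter=100):
    """Solve the BAR self-consistency condition for one pair of states.

    The imbalance below increases monotonically with df, so plain
    bisection is enough for a sketch (production codes use faster solvers).
    """
    m = math.log(len(w_f) / len(w_r))

    def imbalance(df):
        fwd = sum(_fermi(m + w - df) for w in w_f)
        rev = sum(_fermi(-m + w + df) for w in w_r)
        return fwd - rev

    lo, hi = -1.0, 1.0
    while imbalance(lo) > 0:  # widen the bracket until it contains the root
        lo *= 2.0
    while imbalance(hi) < 0:
        hi *= 2.0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if imbalance(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def synthetic_work(true_df, sigma, n, seed=0):
    """Gaussian work values that are consistent with a known true_df."""
    rng = random.Random(seed)
    w_f = [rng.gauss(true_df + 0.5 * sigma**2, sigma) for _ in range(n)]
    w_r = [rng.gauss(-true_df + 0.5 * sigma**2, sigma) for _ in range(n)]
    return w_f, w_r


if __name__ == "__main__":
    w_f, w_r = synthetic_work(true_df=2.0, sigma=1.0, n=5000)
    print("exp forward  ", exp_forward(w_f))
    print("exp reverse  ", exp_reverse(w_r))
    print("gauss forward", gauss_forward(w_f))
    print("gauss reverse", gauss_reverse(w_r))
    print("BAR          ", bar(w_f, w_r))
```

On this synthetic data the work distributions are exactly Gaussian, so the Gaussian approximations look good; that is the "linear problem" situation described above and should not be expected in general.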

Overlap matrix

@orbeckst

-w, --overlap Print out and plot the overlap matrix.
unique functionality, quite useful in visual analysis of the data quality

@mrshirts: yes, very useful. Very easy to implement once MBAR has been called, but it requires MBAR to be called first. How would that dependency be enforced? Try a call to see if the object exists, generate if it doesn't?
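
One possible answer to the dependency question is "generate on first use and cache". The sketch below only illustrates that pattern. `OverlapReport` and `fit_mbar` are hypothetical names, not alchemlyb or pyMBAR API, and the overlap definition used here (O_ij = N_j * sum_n W_ni W_nj, whose rows sum to one for normalized MBAR weights) is an assumption that should be checked against the MBAR implementation actually used.

```python
from functools import cached_property


class OverlapReport:
    """Hypothetical wrapper that runs the MBAR fit only when it is needed."""

    def __init__(self, fit_mbar):
        # fit_mbar: zero-argument callable returning (weights, n_k), where
        # weights[n][k] is the MBAR weight of sample n in state k and
        # n_k[k] is the number of samples drawn from state k.
        self._fit_mbar = fit_mbar

    @cached_property
    def mbar_result(self):
        # Evaluated on first access, then stored on the instance.
        return self._fit_mbar()

    @cached_property
    def overlap_matrix(self):
        # Accessing mbar_result triggers the fit if it has not run yet.
        weights, n_k = self.mbar_result
        k = len(n_k)
        return [
            [n_k[j] * sum(row[i] * row[j] for row in weights) for j in range(k)]
            for i in range(k)
        ]


if __name__ == "__main__":
    def toy_fit():
        print("running the (expensive) fit")
        # Two states, four samples, uniform weights (a toy stand-in).
        return [[0.25, 0.25]] * 4, [2, 2]

    report = OverlapReport(toy_fit)
    print(report.overlap_matrix)  # the fit runs here
    print(report.overlap_matrix)  # cached, the fit does not run again
```

With this design the user never has to call MBAR explicitly before asking for the overlap matrix, and asking twice does not repeat the fit.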

Existing features

The following features already exist in alchemlyb:

MBAR and TI estimators
subsampling with preprocessing.subsampling.statistical_inefficiency() (Does this correspond to the -n UNCORR, --uncorr=UNCORR feature?)
discarding of initial time (-s EQUILTIME, --skiptime=EQUILTIME) and more flexible slicing with preprocessing.subsampling.slicing()
extracting the energy data from the backward direction (-e, --backward), which can be done with preprocessing.subsampling.slicing() (... I think ... check!)

Features in alchemlyb but not in alchemical-analysis

The following features only exist in alchemlyb:

equilibrium detection with preprocessing.subsampling.equilibrium_detection()

Features considered for alchemical-analysis that should go in alchemlyb

@mrshirts:

Estimation of uncertainties and covariances by bootstrapping. Very useful for diagnosing whether things go wrong in the error estimates, and it generally gives more reliable error estimates in the low-sampling regime.
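
As a rough illustration of the idea, the following sketch bootstraps the standard error of an arbitrary scalar estimator by resampling the data with replacement. `bootstrap_stderr` is a hypothetical helper, not an existing alchemlyb function. It assumes the samples are already uncorrelated (for example after subsampling) and it handles only a single scalar, not full covariances.

```python
import random
import statistics


def bootstrap_stderr(samples, estimator, n_boot=500, seed=None):
    """Standard deviation of the estimator over bootstrap replicas."""
    rng = random.Random(seed)
    n = len(samples)
    replicas = []
    for _ in range(n_boot):
        # Draw n samples with replacement and re-run the estimator on them.
        resampled = [samples[rng.randrange(n)] for _ in range(n)]
        replicas.append(estimator(resampled))
    return statistics.stdev(replicas)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(0.0, 1.0) for _ in range(200)]
    print("bootstrap:", bootstrap_stderr(data, statistics.mean, seed=2))
    print("analytic: ", statistics.stdev(data) / len(data) ** 0.5)
```

For the sample mean the bootstrap result can be checked against the analytic standard error, as in the usage above. For a free energy estimator one would pass a function that takes the resampled energies and returns the free energy difference.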
