python script to calculate the free energy #100

Closed
qasimpars opened this issue Mar 7, 2020 · 12 comments

@qasimpars commented Mar 7, 2020

Hi,

Is there an example Python script to calculate the free energy with MBAR, BAR, and TI from the dhdl.xvg files of GROMACS via alchemlyb?
Please note that all the dhdl.xvg files obtained from GROMACS are in a directory named dHdl_files.

I hope you can help me.

Best,

@dotsdl (Member) commented Mar 11, 2020

Hi @qasimpars, and welcome to alchemlyb!

You can write a script like the following to get out what you're looking for:

#!/usr/bin/env python

import pandas as pd

from alchemlyb.parsing.gmx import extract_dHdl, extract_u_nk
from alchemlyb.preprocessing import statistical_inefficiency
from alchemlyb.estimators import BAR, MBAR, TI


if __name__ == '__main__':

    import glob
    import os
    import argparse

    parser = argparse.ArgumentParser(description="Perform a simple alchemical analysis")

    parser.add_argument('datadir', type=str,
                        help="directory of XVG files")
    parser.add_argument('-T', type=float,
                        help="Temperature (K) simulations performed at")

    args = parser.parse_args()

    # get a list of files from the given directory
    files = glob.glob(os.path.join(args.datadir, '*.xvg*'))

    # extract and subsample dHdl using statistical inefficiency
    print("Gathering dHdl from {} files".format(len(files)))
    dHdl = [extract_dHdl(i, T=args.T) for i in files]
    print("Subsampling dHdl on statistical inefficiency")
    dHdl = pd.concat([statistical_inefficiency(i, i.iloc[:, 0]) for i in dHdl])

    # extract and subsample u_nk using statistical inefficiency
    print("Gathering u_nk from {} files".format(len(files)))
    u_nk = [extract_u_nk(i, T=args.T) for i in files]
    print("Subsampling u_nk on statistical inefficiency")
    u_nk = pd.concat([statistical_inefficiency(i, i.iloc[:, 0]) for i in u_nk])

    # fit dHdl estimators 
    print("Fitting TI on dHdl")
    ti = TI().fit(dHdl)

    # fit u_nk estimators
    print("Fitting BAR on u_nk")
    bar = BAR().fit(u_nk)
    print("Fitting MBAR on u_nk")
    mbar = MBAR().fit(u_nk)

    print("====== Results ======")
    
    # print free energy estimates between lowest lambda and highest lambda
    # these will all be in units of kT!
    # NOTE: for BAR we would need to perform bootstrap sampling to get
    # uncertainties of the endpoints
    print("TI: {} +/- {} kT".format(ti.delta_f_.iloc[0, -1], ti.d_delta_f_.iloc[0, -1]))
    print("BAR: {} +/- {} kT".format(bar.delta_f_.iloc[0, -1], "unknown"))
    print("MBAR: {} +/- {} kT".format(mbar.delta_f_.iloc[0, -1], mbar.d_delta_f_.iloc[0, -1]))

You can then call this script on your directory with, e.g.:

python alchemical-calc.py -T <simulation_temperature> <directory_path>

Please note that currently we don't give reliable estimates of the uncertainty for BAR, so I have left those out of the output. We are working to address this by adding an implementation of bootstrap sampling (#80).
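
In the meantime, if you want a rough number for BAR, a naive bootstrap sketch along the following lines can work (bar_bootstrap, n_boots, and seed are just illustrative names, not alchemlyb API, and this is not the implementation planned in #80):

import numpy as np
import pandas as pd

from alchemlyb.estimators import BAR


def bar_bootstrap(u_nk, n_boots=50, seed=42):
    # naive bootstrap: resample frames within each lambda state with replacement,
    # refit BAR, and report the spread of the end-to-end estimates (in kT)
    rng = np.random.RandomState(seed)

    # one group of samples per lambda state (all index levels except 'time')
    groups = [g for _, g in u_nk.groupby(level=list(range(1, u_nk.index.nlevels)))]

    estimates = []
    for _ in range(n_boots):
        resampled = pd.concat(
            [g.sample(n=len(g), replace=True, random_state=rng) for g in groups])
        estimates.append(BAR().fit(resampled).delta_f_.iloc[0, -1])

    return np.std(estimates, ddof=1)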

You'll want to make sure your files have the necessary data to support both dHdl and u_nk parsing. Check out the GROMACS page of the alchemlyb docs for some recommended MDP options that should ensure this.
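
For reference, the free-energy-related MDP options that matter for this are roughly the following (an illustrative fragment only; the values are placeholders, so check the docs for the full recommended set):

; free-energy output needed by alchemlyb (values are placeholders)
free-energy            = yes
init-lambda-state      = 0      ; set to the index of each lambda window
calc-lambda-neighbors  = -1     ; evaluate energies at all lambda states -> u_nk (BAR/MBAR)
dhdl-derivatives       = yes    ; write dH/dlambda -> dHdl (TI)
nstdhdl                = 100    ; output frequency for dhdl.xvg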

@qasimpars (Author) commented Mar 11, 2020

Hi @dotsdl,

Thanks for your reply and script.
First I installed alchemlyb and pandas with the following commands:

pip install alchemlyb
pip install pandas

They installed properly. Then I ran the script:

qasim@qasim:~/Desktop/dHdl_files$ python alchemical-calc.py -T 298.15 /home/qasim/Desktop/dHdl_files

But the script gives the below error:

Traceback (most recent call last):
File "alchemical-calc.py", line 3, in
import pandas as pd
ImportError: No module named panda

It gives the following error about alchemlyb when I delete the import pandas as pd line from the script:

Traceback (most recent call last):
File "alchemical-calc.py", line 4, in
from alchemlyb.parsing.gmx import extract_dHdl, extract_u_nk
File "/home/qasim/Desktop/dHdl_files/alchemical-calc.py", line 4, in
from alchemlyb.parsing.gmx import extract_dHdl, extract_u_nk
ImportError: No module named parsing.gmx

Do you know how I can fix those errors? My OS is Ubuntu 18.04. I reinstalled alchemlyb and pandas, but it didn't help.

Edit:
I opened the ~/.bashrc file and added a new alias to change the default from Python 2.x:
alias python='/usr/bin/python3.6'

Now the script still gives the below error:

Traceback (most recent call last):
File "alchemical-cal.py", line 5, in
from alchemlyb.parsing.gmx import extract_dHdl, extract_u_nk
ModuleNotFoundError: No module named 'alchemlyb'

@dotsdl (Member) commented Mar 11, 2020

Hi @qasimpars,

It may be that your pip installation and python installation are part of two different environments. Can you please run:

python --version
which python
pip --version
which pip

and paste the output of each command here?

@dotsdl (Member) commented Mar 11, 2020

If I recall correctly, on Ubuntu 18.04 pip by default corresponds to Python 2. You will likely need to install python3-pip and use pip3 to install everything.
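
In other words, something like this (assuming the standard Ubuntu package name):

sudo apt install python3-pip
pip3 install alchemlyb pandas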

@qasimpars (Author)

Hi @dotsdl,

I used pip3 to install everything, but the script gives another error. Do you think there is still something missing in my installation?

(base) qasim@qasim:~/Desktop/dHdl_files$ python alchemical-cal.py -T 298.15 /home/qasim/Desktop/dHdl_files

Gathering dHdl from 31 files
Subsampling dHdl on statistical inefficiency
Gathering u_nk from 31 files
Subsampling u_nk on statistical inefficiency
Traceback (most recent call last):
File "alchemical-cal.py", line 38, in
u_nk = pd.concat([statistical_inefficiency(i, i.iloc[:, 0]) for i in u_nk])
File "alchemical-cal.py", line 38, in
u_nk = pd.concat([statistical_inefficiency(i, i.iloc[:, 0]) for i in u_nk])
File "/home/qasim/.local/lib/python3.6/site-packages/alchemlyb/preprocessing/subsampling.py", line 128, in statistical_inefficiency
series = slicing(series, lower=lower, upper=upper, step=step)
File "/home/qasim/.local/lib/python3.6/site-packages/alchemlyb/preprocessing/subsampling.py", line 45, in slicing
if not force and _check_multiple_times(df):
File "/home/qasim/.local/lib/python3.6/site-packages/alchemlyb/preprocessing/subsampling.py", line 11, in _check_multiple_times
return df.sort_index(0).reset_index(0).duplicated('time').any()
File "/home/qasim/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 4885, in duplicated
raise KeyError(diff)
KeyError: Index(['time'], dtype='object')

@dotsdl (Member) commented Mar 12, 2020

Ah, I believe you are running into #95. We need to get #98 finished and merged soon to address this.

If you are feeling adventurous, you can clone the repository and switch to the refactor-subsampling branch with a git checkout refactor-subsampling. You can then do a pip install . inside of the repository to install the current state of that branch. This should address the issue you are seeing.
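
Roughly (with the repository URL written out):

git clone https://github.com/alchemistry/alchemlyb.git
cd alchemlyb
git checkout refactor-subsampling
pip3 install .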

@qasimpars (Author) commented Mar 12, 2020

Hi,

I did what you said. The script now runs, but it gives the error below. Do you know how I can fix the MBAR error?

Gathering dHdl from 31 files
Subsampling dHdl on statistical inefficiency
Gathering u_nk from 31 files
Subsampling u_nk on statistical inefficiency
Fitting TI on dHdl
Fitting BAR on u_nk
Fitting MBAR on u_nk
Traceback (most recent call last):
File "alchemical-cal.py", line 48, in
mbar = MBAR().fit(u_nk)
File "/home/qasim/.local/lib/python3.6/site-packages/alchemlyb/estimators/mbar_.py", line 90, in fit
out = self._mbar.getFreeEnergyDifferences(return_theta=True)
File "/home/qasim/.local/lib/python3.6/site-packages/pymbar/mbar.py", line 542, in getFreeEnergyDifferences
np.exp(self.Log_W_nk), self.N_k, method=uncertainty_method)
File "/home/qasim/.local/lib/python3.6/site-packages/pymbar/mbar.py", line 1679, in _computeAsymptoticCovarianceMatrix
check_w_normalized(W, N_k)
File "/home/qasim/.local/lib/python3.6/site-packages/pymbar/utils.py", line 360, in check_w_normalized
(firstbad, column_sums[firstbad], np.sum(badcolumns)))
pymbar.utils.ParameterError: Warning: Should have \sum_n W_nk = 1. Actual column sum for state 0 was 3.643136. 31 other columns have similar problems

Edit:
The script (alchemical-cal.py) calculates the free energy for BAR and TI properly when MBAR is disabled in the script. For the same dhdl.xvg files, the alchemical-analysis.py tool calculates the free energy with MBAR, so I think alchemical-cal.py should also be able to calculate the free energy with MBAR?

@qasimpars (Author) commented Mar 12, 2020

It would be really good if the following flags were also implemented in the script:

-f: the directory of input files
-o: output directory 
-u: unit to report energies (default: kcal/mol)
-m: method(s) to estimate the free energy (default: BAR, MBAR and TI)
-s: Subset of simulation time to analyze, e.g. [1000:5000] (default unit ps)
-b: number of iterations (bootstrap samples) for the bootstrap estimate of the standard error

@dotsdl (Member) commented Mar 12, 2020

Hmmm, I think it may be worth trying the contents of the quick script above in a Jupyter notebook. That way you can run the individual components and have a look at the intermediates, such as what u_nk looks like. pymbar.MBAR doesn't appear to like it, even though pymbar.BAR is fine with it. You will have to do some experimenting.
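
For example, a minimal inspection along those lines could look like this (the file name and temperature are placeholders):

from alchemlyb.parsing.gmx import extract_u_nk

# load a single window and inspect the intermediate DataFrame
u_nk = extract_u_nk("dhdl.0.xvg", T=298.15)   # placeholder file name / temperature

print(u_nk.index.names)   # expect 'time' plus the lambda column(s)
print(u_nk.columns)       # one column per lambda state
print(u_nk.head())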

As for adding more flags, our eventual goal is to create a replacement for alchemical-analysis.py with flamel, which internally uses alchemlyb library components. This should end up covering many use cases, but we generally emphasize using alchemlyb interactively via Jupyter or writing your own use-case-specific scripts with its components.
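
As a sketch of that last point, a couple of the requested flags are easy to bolt onto the script above yourself, e.g. a units flag and a start-time cut. None of this is alchemlyb API beyond slicing; the flag names and the kcal/mol conversion below are just illustrative:

from alchemlyb.preprocessing import slicing

# hypothetical extra flags for the argparse section of the script above
parser.add_argument('-u', '--units', default='kT', choices=['kT', 'kcal/mol'],
                    help="unit to report energies in")
parser.add_argument('-s', '--skiptime', type=float, default=0.0,
                    help="discard data before this time (ps) in each window")

# drop the equilibration part of each window before subsampling
dHdl = [slicing(extract_dHdl(i, T=args.T), lower=args.skiptime) for i in files]

# convert the final kT results on output (k_B = 0.0019872041 kcal/(mol K))
scale = 0.0019872041 * args.T if args.units == 'kcal/mol' else 1.0
print("TI: {} {}".format(scale * ti.delta_f_.iloc[0, -1], args.units))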

@Bernadette-Mohr commented Jul 20, 2020

Hello @dotsdl,
I would like to ask how far along you are with implementing the required changes and solving the issues above.
I run into exactly the same problems as @qasimpars:
KeyError: Index(['time'], dtype='object') with the official alchemlyb branch and
pymbar.utils.ParameterError: Warning: Should have \sum_n W_nk = 1. Actual column sum for state 0 was 3.643136. 31 other columns have similar problems with the refactor-subsampling branch.
The u_nk DataFrame looks the way it is supposed to according to your documentation, so I guess there are more issues in the alchemlyb/pymbar code right now.
By the way, I can perform the MBAR calculation with the old alchemical-analysis.py without problems, so I don't think it is an issue with my data.

Edit:
I discovered the verbose mode for MBAR(); this seems to point at another pandas indexing-change problem:

K (total states) = 79, total samples = 18531
sys:1: PerformanceWarning: indexing past lexsort depth may impact performance.
Traceback (most recent call last):
File "/home/bernadette/software/miniconda3/envs/py37/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 385, in pandas._libs.hashtable.Float64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 392, in pandas._libs.hashtable.Float64HashTable.get_item
KeyError: 1.0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/bernadette/PycharmProjects/AutoFE/analysis.py", line 74, in
analyze_FE(args.environment, args.skiptime)
File "/home/bernadette/PycharmProjects/AutoFE/analysis.py", line 61, in analyze_FE
print(perform_analysis(data=data))
File "/home/bernadette/PycharmProjects/AutoFE/analysis.py", line 51, in perform_analysis
print(mbar.fit(data))
File "/home/bernadette/software/miniconda3/envs/py37/lib/python3.7/site-packages/alchemlyb/estimators/mbar_.py", line 85, in fit
verbose=self.verbose)
File "/home/bernadette/software/miniconda3/envs/py37/lib/python3.7/site-packages/pymbar/mbar.py", line 237, in init
uzero = u_kn[k, :] - u_kn[l, :]
File "/home/bernadette/software/miniconda3/envs/py37/lib/python3.7/site-packages/pandas/core/frame.py", line 2799, in getitem
return self._getitem_multilevel(key)
File "/home/bernadette/software/miniconda3/envs/py37/lib/python3.7/site-packages/pandas/core/frame.py", line 2849, in _getitem_multilevel
loc = self.columns.get_loc(key)
File "/home/bernadette/software/miniconda3/envs/py37/lib/python3.7/site-packages/pandas/core/indexes/multi.py", line 2700, in get_loc
self.levels[i], k
File "/home/bernadette/software/miniconda3/envs/py37/lib/python3.7/site-packages/pandas/core/indexes/multi.py", line 2598, in _get_loc_single_level_index
return level_index.get_loc(key)
File "/home/bernadette/software/miniconda3/envs/py37/lib/python3.7/site-packages/pandas/core/indexes/numeric.py", line 508, in get_loc
return super().get_loc(key, method=method, tolerance=tolerance)
File "/home/bernadette/software/miniconda3/envs/py37/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 385, in pandas._libs.hashtable.Float64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 392, in pandas._libs.hashtable.Float64HashTable.get_item
KeyError: 1.0

@xiki-tempula (Collaborator)

I hope to close this issue with #114

@orbeckst (Member)

I am closing this issue (in light of discussion #159).
