Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

alchemical-analysis.py not computing BAR/MBAR results #97

Open
mqbpksi2 opened this issue Jan 26, 2017 · 40 comments
Open

alchemical-analysis.py not computing BAR/MBAR results #97

mqbpksi2 opened this issue Jan 26, 2017 · 40 comments

Comments

@mqbpksi2
Copy link

mqbpksi2 commented Jan 26, 2017

Dear Mobley lab,

I am trying to run the alchemical-analysis tool on a set of output files from Amber TI simulations. The tool successfully computes dG, but fails to compute BAR/MBAR results and output any plots.

My command line input:

alchemical_analysis -a AMBER -p l-prod-0.[1-9]-0 -q out -o . -r 5 -u kcal -s 0 -g -w

The output from alchemical_analysis:

Started on Thu Jan 26 11:12:41 2017
Command line was: /usr/local/bin/alchemical_analysis -a AMBER -p l-prod-0.[1-9]-0 -q out -o . -r 5 -u kcal -s 0 -g -w

Loading in data from ./l-prod-0.4-0.out... 1000 data points, 1 DV/DL averages
Loading in data from ./l-prod-0.6-0.out... 1000 data points, 1 DV/DL averages
Loading in data from ./l-prod-0.7-0.out... 1000 data points, 1 DV/DL averages
Loading in data from ./l-prod-0.5-0.out... 1000 data points, 1 DV/DL averages
Loading in data from ./l-prod-0.8-0.out... 1000 data points, 1 DV/DL averages
Loading in data from ./l-prod-0.9-0.out... 1000 data points, 1 DV/DL averages
Loading in data from ./l-prod-0.1-0.out... 1000 data points, 1 DV/DL averages
Loading in data from ./l-prod-0.3-0.out... 1000 data points, 1 DV/DL averages
Loading in data from ./l-prod-0.2-0.out... 1000 data points, 1 DV/DL averages

Sorting input data by starting time

Note: BAR/MBAR results are not computed.

Skipping first 0 steps (= 0.000000 ps)

Note: extrapolating missing gradient for lambda = 0.0: 58.481188
Note: extrapolating missing gradient for lambda = 1.0: -127.351642

The gradients (DV/DL) from the correlated samples (kcal/mol):

Lambda gradient

0.00000 58.481
0.10000 15.875
0.20000 -5.820
0.30000 -10.537
0.40000 -19.724
0.50000 -29.159
0.60000 -36.760
0.70000 -54.858
0.80000 -57.177
0.90000 -71.284
1.00000 -127.352

dG = -30.388

The correlated gradient (DV/DL) components from every_single step (kcal/mol):

Lambda BOND ANGLE DIHED 1-4 NB 1-4 EEL VDWAALS EELEC RESTRAINT

0.10000 2.512 0.854 -0.093 0.426 -16.137 3.537 24.232 0.000
0.20000 2.186 0.694 -0.095 0.420 -16.186 3.245 3.514 0.000
0.30000 1.932 0.580 -0.151 0.401 -16.272 5.021 -2.080 0.000
0.40000 1.835 0.351 -0.116 0.388 -16.119 3.152 -9.142 0.000
0.50000 1.758 0.577 -0.152 0.433 -16.059 -0.498 -15.740 0.000
0.60000 1.534 0.080 -0.400 0.313 -16.274 0.016 -21.656 0.000
0.70000 1.290 0.134 -0.126 0.372 -16.155 -9.769 -30.628 0.000
0.80000 1.085 -0.168 -0.227 0.340 -16.201 -0.396 -42.075 0.000
0.90000 0.898 -0.210 -0.134 0.353 -16.175 5.599 -61.267 0.000

dG = 1.606 0.283 -0.125 0.360 -16.266 -1.093 -14.374 0.000

WARNING: Found all 0 in input data.
Generating results.txt with all 0
Exiting...

The .out files can be found here: https://www.dropbox.com/sh/i4c44d0yrid1a9u/AABjyJjOMBykC9FWefZYCMNGa?dl=0

I have pymbar installed. Could it be that it's because these were generated by sander or because I'm using a lambda interval from 0.1 to 0.9? Any help and advice would be greatly appreciated.

Sincerely,

Stefan

@davidlmobley

@halx
Copy link
Collaborator

halx commented Jan 26, 2017 via email

@davidlmobley
Copy link
Member

Yes, I'm with @halx - a key question in this case is what the expected behavior would be. If you're running a free energy calculation and you're lacking data at the endpoints, what OUGHT BAR and MBAR to give you?

In the other thread, we discussed extrapolation, but one view was that this is a bad idea since it will be difficult to know when it will break.

Historically, this has been why AMBER analysis of these types of simulations has been only with TI.

@mqbpksi2
Copy link
Author

Many thanks for the feedback.

Best wishes,

Stef

@danmermelstein
Copy link

Hi,

I got the same error, but I was using the pmemd version of TI, which does compute endpoints. The analysis worked when I reran my simulations with MBAR on. @halx do you know if this code is only designed to work when MBAR is on? I'm just wondering because right now the GPU version of the code I have only supports TI, and for the time being it would be nice to be able to use this code for analysis/testing

Thanks!

Dan

P.S. Command line was: /net/home/dmermels/SOFTWARE/alchemical-analysis/alchemical_analysis/alchemical_analysis.py -a AMBER -d . -p [01].*/charge.gas_0-5ns -q out -m -mbar-bar -r 5 -o . -u kcal -s 0 -g

@davidlmobley
Copy link
Member

Could you provide inputs, @danmermelstein ?

@danmermelstein
Copy link

Sure! Let me know if anything else is missing

Thanks!

mermelstein_alchemical_analysis_inputs.tar.gz

@halx
Copy link
Collaborator

halx commented Apr 18, 2017

Hi Dan,

I'm not sure what exactly the problem is. When I run your command with your data I get the output just fine. Only "issue" is that the tool also computes zero results for both BAR and MBAR. If you use "-m ti+ti-cubic" you can avoid that.

You have explicitly switch off MBAR reporting with ifmbar=0. The message you see is just a warning and can safely be ignored.

Cheers,
Hannes.

@danmermelstein
Copy link

Hi @halx,

Thanks for looking into this! What version of python and pymbar are you using?

Thanks!

Dan

python 2.7.13

@halx
Copy link
Collaborator

halx commented Apr 21, 2017 via email

@davidlmobley
Copy link
Member

@danmermelstein - LMK if you think it will be helpful for me to try too.

@danmermelstein
Copy link

Hi Hannes and David,

I tried pymbar 3.0.0 and no luck, so I just disabled the system exit following the warning like Hannes said and I get the correct results. I'll post here if I figure out why that was happening, but may have just been an unnecessary exit.

Either way, problem solved, thanks for all of your help!

Dan

@mqbpksi2
Copy link
Author

mqbpksi2 commented Apr 23, 2017 via email

@jchodera
Copy link

MBAR should absolutely be able to extrapolate to the endpoints even if they aren't sampled. Let me know if you folks need any help on the MBAR interface end of things.

Yes, this is because sander cannot sample the end points.

I believe this has was fixed with the introduction of softcore Lennard-Jones, which should be absolutely required for modern alchemical free energy calculations. Neglecting endpoint sampling in non-softcore free energy calculations has been shown to be highly problematic and has not been part of recommended best practices for many years now (nearly a decade).

@halx
Copy link
Collaborator

halx commented Apr 23, 2017

This is really an annoying implementation issue with sander. It does not allow you to set lambda < 0.005 or > 0.995. @danmermelstein can probably add more information. It is not an issue with pmemd though.

In this context, being able to estimate the non-sampled end-points would obviously be very interesting. Except, that I often get wrong results with MBAR (tested with pmemd result) that is the free energy can be very different from TI or BAR.

@jchodera
Copy link

Except, that I often get wrong results with MBAR (tested with pmemd result) that is the free energy can be very different from TI or BAR.

Can you elaborate on this? We find this universally indicates an issue with either (1) a problem in the thermodynamic self-consistency of your data, or (2) actual bias in the calculation that is impossible to detect via TI or BAR. TI allows you to miss near-singularities that hide the primary contributions to your integrals at the endpoints beyond or between the lambda values you evaluate, while one-sided BAR reduces to EXP, which has an enormous problem with bias being as large as the statistical error.

@jchodera
Copy link

In other words, and not meaning to mince words, if MBAR is giving very different results from TI or BAR, something is totally fucked up with your calculation.

@halx
Copy link
Collaborator

halx commented Apr 23, 2017

Thanks for the comment. I can provide more info and also data this week.

@halx
Copy link
Collaborator

halx commented Apr 24, 2017

I have uploaded the data for a simulation mutating ethane to methanol in water (see link below). The transformation has been split into charge only contributions and vdW+bonded contributions. The charge transformations seems to agree in the total dG in all methods (TI, TI-cubic, BAR, MBAR) but MBAR differs in almost all windows. The vdW+bonded transformation is just a mess... We have other simulations with small molecules but results are very similar with MBAR disagreeing. MBAR only agrees when the dH/dl gradients are a constant function over lambda.

https://www.dropbox.com/s/vbuonmoo42e69ye/ethane~methanol_water.tar.xz?dl=0

@jchodera
Copy link

Thanks! I've just downloaded the files.

Just to be clear, when you say you've split the transformation into contributions, what do you mean precisely? Do you break this into stages where you first change only the charges from ethane to methanol in the charge branch, and then change the vdW+bonded from ethane to methanol parameters in the vdw branch? Setting ifsc=1 I believe means that both Lennard-Jones and electrostatics are softened at intermediate lambda values even for the charge branch if just the charges are changing, and presumably the dummy atoms must also have nonzero Lennard-Jones repulsion to ensure the solvent does not overlap with it.

@halx
Copy link
Collaborator

halx commented Apr 24, 2017

Yes, in stages as you say. So, charges only.

Yes, ifsc=1 is required to make use of softcore potentials but they also require scmask for thedefinition of the dummy atoms. In my particular case ifsc=1 is really an accident but this has no further consequences because there are no dummy atoms.

Many thanks, for looking into this.

@jchodera
Copy link

How is it that there are no dummy atoms? An ethane-to-methanol transformation (CH3-CH3 to CH3-OH) would necessarily turn two of the hydrogens into dummy atoms, would it not?

@halx
Copy link
Collaborator

halx commented Apr 24, 2017

No dummy atoms for the charge only transformation. So here we really transform between the same molecule (same vdW and same bonded parameters) but with the charges of the respective end states (zero charges for "will-be" dummy atoms).

For the vdw+bonded transformation dummy atoms are defined for the two "vanishing" hydrogen atoms, see scmask1 = ':1@H7,H8' and ifsc=1. As charges are zero on these hydrogens already, only the LJ softcore will be active.

@jchodera
Copy link

No dummy atoms for the charge only transformation. So here we really transform between the same molecule (same vdW and same bonded parameters) but with the charges of the respective end states (zero charges for "will-be" dummy atoms).

Ah, excellent, thanks!

For the vdw+bonded transformation dummy atoms are defined for the two "vanishing" hydrogen atoms, see scmask1 = ':1@H7,H8' and ifsc=1. As charges are zero on these hydrogens already, only the LJ softcore will be active.

Thanks for the clarification!

@davidlmobley
Copy link
Member

@jchodera :

On this:

MBAR should absolutely be able to extrapolate to the endpoints even if they aren't sampled. Let me know if you folks need any help on the MBAR interface end of things.

Yes, this is because sander cannot sample the end points.

I believe this has was fixed with the introduction of softcore Lennard-Jones, which should be absolutely required for modern alchemical free energy calculations. Neglecting endpoint sampling in non-softcore free energy calculations has been shown to be highly problematic and has not been part of recommended best practices for many years now (nearly a decade).

Part of the issue here is that people might be changing a variety of other things aside from Lennard-Jones and Coulomb (adding or removing restraints, adding or removing bonds, etc.), and doing it in a way which (is perhaps) very non-ideal. Since we aren't responsible for what data they are feeding in, and in fact have no idea whether they generated it by doing something sensible, our thought was that it did not make sense to do any type of extrapolation even though it certainly could be done.

We discussed this in a separate thread (#66 ) and Michael Shirts used this analogy:

I think that any sort of extrapolation is a bad idea. It's new functionality that allows users to cut off their own feet. The error analysis gets to be really horrific. There's really no good reason to extrapolate in a generally distributed code.

I really would rather not let people cut off their feet if it can be avoided.

Now, if the AMBER folks have a way of parsing their free energy files (which I haven't worked with) to allow us to identify cases where extrapolation should be relatively "well behaved" and turn it on only in those cases, then I'm totally fine with doing that (and we should get alchemlyb headed in that direction). But otherwise I just don't think we should be doing it because people will use it to cut off their feet.

@danmermelstein
Copy link

@davidlmobley yeah I don't know of any tools right now that can check for "well behaved" systems in AMBER.

The easier thing would probably be just to update sander to use the same bonded potential as is in pmemd?

@mrshirts
Copy link
Contributor

mrshirts commented Apr 24, 2017 via email

@danmermelstein
Copy link

is the pmemd bonded functional form a problem?

@davidlmobley
Copy link
Member

I think @mrshirts just means that we don't particularly care what functional form AMBER is using, at some level (it doesn't make life easier for us one way or the other), though it WOULD certainly simplify the life of people who have to write the parsers for AMBER data and worry about whether the data is coming from pmemd or sander. So it seems like a good thing to me. Otherwise you have to worry that people might be able to get one set of answers in sander and another in pmemd, etc.

@danmermelstein
Copy link

ah gotcha, that makes sense thanks!

@danmermelstein
Copy link

yeah that is probably something that should happen. I don't know much about sander so I can't say there isn't some other issue preventing us from changing the bonded functional form, but I doubt it. I also don't want to commit myself to making the changes just because I have no idea if/when I'll have some extra time, but certainly I'll look into getting it done, if not by me then someone else.

@mqbpksi2
Copy link
Author

Sorry if this is a little off topic, but did someone say that TI can now run on GPUs? If so, which version of Amber is this? 16?

@davidlmobley
Copy link
Member

You want @danmermelstein I think.

@danmermelstein
Copy link

Sorry for the lag, @mqbpksi2 I have a patch over AMBER 16 I can give you if you're interested?

@mqbpksi2
Copy link
Author

mqbpksi2 commented Jun 7, 2017

Hi @danmermelstein,

Sorry for the late reply. Thanks for the offer; I don't need Amber16 at the moment; I was just wondering if it's possible, in principle, to run TI on GPUs now. This issue has been pending for a while, I believe. Does anyone know if there are plans to implement per-residue decomposition in TI in pmemd as well?

@mqbpksi2
Copy link
Author

mqbpksi2 commented Jun 9, 2017

The -s flag skips entries before a predefined time limit. Analysis is then carried out past that time point till the end of the input file(s). My question is "Is there a corresponding flag that limits analysis to a predetermined time point before the end of the simulation(s)?" Of course, I had a look at the available options with -h and saw that there is none, but I guess there's no harm in asking to double check. I imagine deleting the file entries past a certain time is one way to do it.

@davidlmobley
Copy link
Member

@mqbpksi2 - no, there is not. What's the use case for this, out of curiosity?

Thanks.

@mqbpksi2
Copy link
Author

mqbpksi2 commented Jun 9, 2017

I wanted to see how much the ΔG values would change if I break up the trajectories, for example if I break up the trajectories into four equally long intervals and compare ΔG values among those; it's one more way of looking at convergence. It would probably be useful to add a flag like this in future releases, in my humble opinion, at least.

@mqbpksi2
Copy link
Author

mqbpksi2 commented Jun 9, 2017

If I may ask one more followup question, how are lambda values extrapolated to the end states, if we don't have sampling at the end states?

For example, I have 11 lambda values: 0.005, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.995. The output reads:

Note: extrapolating missing gradient for lambda = 0.0: -0.511236
Note: extrapolating missing gradient for lambda = 1.0: -396.616445

The gradients (DV/DL) from the correlated samples (kcal/mol):

Lambda gradient

0.00000 -0.511
0.00500 0.259
0.10000 -13.376
0.20000 55.118
0.30000 7.271
0.40000 -67.904
0.50000 -94.552
0.60000 -111.262
0.70000 -147.835
0.80000 -185.730
0.90000 -244.077
0.99500 -392.396
1.00000 -396.616

dG = -100.191

Does the tool fit to a 10th degree polynomial and then extrapolate to 0 and 1 or what exactly?

@davidlmobley, can you tell me how the extrapolation is performed?

@ash286
Copy link

ash286 commented Aug 15, 2017

I have uploaded the data for a simulation mutating ethane to methanol in water (see link below).

Hi @halx ,

Can you please help me creating parmtop & rst7 files with dummy atoms that you have created in ethane~methanol system for charge transformation. May be this is question is not exactly related to alchemical-analysis script, but I would like to dig into MD/TI.

Thank you.

@halx
Copy link
Collaborator

halx commented Aug 15, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants