Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding 6 new benchmark systems #3

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

brycestx
Copy link
Collaborator

@brycestx brycestx commented Aug 8, 2018

This PR adding protein and ligand systems for 6 systems that come from JMedChem papers published recently. Readme and documentation is provided. Most of the work came from @jeremy2401 from an internship this summer.

@davidlmobley
Copy link
Contributor

@brycestx @jeremy2401 - I see some issues that it would be nice to have sorted out here. Starting with the big picture:

  • There is nothing in the documentation here which actually says what these systems are and why they are good benchmarks. It would be ideal to instead (a) briefly mention them in the MAIN README.md, and (b) describe in detail in the README.md associated with the clean_dataset directory. Right now the only things listed there are things like "EZH2", "IMPDH", etc. I'd have to look those up to know what they are.
  • Why are these systems provided? There should be something that discusses that -- what's known about them, what are the challenges, the range of affinities, or why were they selected out of all posssibilities?
  • Right now VERY little is said about how the systems were prepared
  • How many compounds are provided? For what targets?
  • Have these been used in prior computational studies/are they being used?

Minor quibbles:

  • Current documentation is very odd, e.g. " SMILES of cmp_14 given by the csv was wrong and have been corrected". What CSV? And is, say, cmp_14 from a paper? Or from an internal datafile? There are many other comments that are hard to understand, like " carefull to inhibition mechanism (uncomp)" (???)
  • "cmp with stereo have been removed" Why? I think I can guess, but should be stated
  • Some text seems missing, e.g. this phrase appears: "Duplicates, wrong SMILES strings and compounds with no specified stereochemistry have been removed or corrected from The SMILES codes given by the supplementary data and conformers have been generated." Is there a clause missing after "corrected from"?
  • The markdown format headers aren't spaced properly so they appear odd, see https://github.com/brycestx/soft-benchmarks/tree/master/clean_dataset
  • It would be helpful to say HOW things were done otherwise it's impossible to reproduce, e.g. "The SMILES codes given by the supplementary data and conformers have been generated." With what tools?

Big picture: Make it so people can tell what you're adding, why you're adding it, and why it might be interesting to them without having to read all of the papers you're depositing.

(Also take note you probably don't have the rights to deposit the papers, only link to them...)

Copy link
Contributor

@davidlmobley davidlmobley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes requested, see comments.

@davidlmobley
Copy link
Contributor

Other things that would be worth knowing:

  • Are the PDBs simulation-ready (e.g. do they have protons? Missing residues? Gaps? Caps?)
  • How were they prepared/pre-processed? And with what tools?
  • Do the ligands have 3D geometries and are they posed? With all protons? How were protonation states/tautomers set/selected?

@davidlmobley
Copy link
Contributor

@brycestx any chance of follow-up on this?

@brycestx
Copy link
Collaborator Author

@davidlmobley I never received any response from @jeremy2401 in regards to your comments. Let me assess bandwidth internally and figure out if we can reproduce and redocument from scratch Jeremy's work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants