AI-Driven Drug Discovery & Design

Artificial Intelligence-assisted Drug Discovery and Design

Let me briefly introduce the field I work in: artificial intelligence-assisted drug discovery and design. To help everyone understand this complex, cutting-edge technology, the original material is explained in Chinese, in particular to support readers who may face language barriers. Although this content is usually presented in English, I chose Chinese to provide a more intuitive learning path, so that everyone can grasp the core concepts and get started quickly.

Artificial intelligence-assisted drug discovery and design is a method that utilizes artificial intelligence (AI) technology to accelerate and optimize the drug development process. Traditional drug research and development cycles are long, costly, and have low success rates, while AI can significantly enhance the efficiency and accuracy of drug design through big data processing, model prediction, and automated analysis.

Advantages of AI in Drug Design

Accelerate the R&D cycle: AI can quickly screen a large number of compounds, reducing the workload of experimental screening.
Enhance accuracy: AI can improve the predictive accuracy in drug design through large-scale data training of models.
Reduce costs: By cutting repeated experiments and lowering the failure rate, AI can reduce the cost of drug development.
Personalized Drug Design: AI can assist in designing personalized drugs, targeting specific patient or disease characteristics to optimize therapeutic effects.

The application of artificial intelligence in drug discovery and design makes the drug development process more intelligent and automated. Through key steps such as compound screening, target prediction, drug generation, and ADMET analysis, AI can effectively accelerate the process of drug research and development while improving the accuracy and efficiency of design.

Acknowledgments

First, I would like to thank Professor Lei of Sun Yat-sen University Medical School for introducing me to this field. I arrived on campus knowing nothing and unsure of my direction, so I am especially grateful to my mentor. I would also like to thank Professor Yu of Huaiyin Institute of Technology for his help in writing this document, and Zhang Shengyu for his strong support in completing my studies.
Second, thank you to Guo Weitao, a computational chemistry expert and student at Nanjing University of Technology, who has kept helping me and explained the pharmaceutical manufacturing process to me.
Third, thank you to my good friend Zhou Yiping, a student at Nanjing Medical University. She is a clever, beautiful, and kind girl who has helped me greatly.
Fourth, thank you to the senior members of my group for all their guidance.
Last but not least, I would like to thank the members of the public and my peers for their criticism and corrections, which make this document and this code more complete and help make our motherland stronger and better. Thank you, motherland; thank you, Party.
Thank you to everyone who helped me complete my studies!

Welcome to the Artificial Intelligence Drug Discovery and Design Platform

Abstract

Abstract: Drug discovery and development affects various aspects of human health and dramatically impacts the pharmaceutical market. However, investments in a new drug often go unrewarded due to the long and complex process of drug research and development (R&D). With the advancement of experimental technology and computer hardware, artificial intelligence (AI) has recently emerged as a leading tool in analyzing abundant and high-dimensional data. Explosive growth in the size of biomedical data provides advantages in applying AI in all stages of drug R&D. Driven by big data in biomedicine, AI has led to a revolution in drug R&D, due to its ability to discover new drugs more efficiently and at lower cost. This review begins with a brief overview of common AI models in the field of drug discovery; then, it summarizes and discusses in depth their specific applications in various stages of drug R&D, such as target discovery, drug discovery and design, preclinical research, automated drug synthesis, and influences in the pharmaceutical market. Finally, the major limitations of AI in drug R&D are fully discussed and possible solutions are proposed.

Keywords: Artificial intelligence; Machine learning; Deep learning; Target identification; Target discovery; Drug design; Drug discovery

👉🏻 The table of contents is currently divided into sections intended both for students without a foundation and for students who already have one.

Section 1: Basic Part

Application of Artificial Intelligence in Medicine

Python Basics
Numpy, Pandas
Matplotlib
Machine Learning and Scikit-Learn
Deep Learning
CADD
Graph Neural Network

Python and Basics of Machine Learning

Molecular text representation: SMILES
Molecular Vector Representation: Descriptors and Fingerprints
RDKit Introduction
Classical machine learning models: linear regression, random forest, support vector machine

Exploration of Publicly Available Small Molecule Datasets

Biologically active molecules ChEMBL database
ZINC Database
PubChem Database
Exploratory Data Analysis (EDA)
Quantitative Structure-Activity Relationship (QSAR) and Virtual Screening (VS)

Graph Neural Network

Neural Network Architecture and Training
Molecular graphs, atom featurization
Message Passing Neural Network
Graph Convolutional Neural Network
Explainability: Grad-CAM
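
To make the message-passing item above more concrete, here is a minimal sketch of one message-passing round on a molecular graph, using only the Python standard library. It is an illustration under simplifying assumptions, not the course's implementation: the atom features are a hypothetical two-element encoding (`[is_carbon, is_oxygen]`), the message is a plain sum over neighbours, and there are no learned weights, whereas a real message passing neural network learns its message and update functions.

```python
from typing import Dict, List

def message_passing_step(features: Dict[int, List[float]],
                         adjacency: Dict[int, List[int]]) -> Dict[int, List[float]]:
    """One round: each atom adds the sum of its neighbours' feature vectors to its own."""
    updated = {}
    for atom, own in features.items():
        message = [0.0] * len(own)
        for neighbour in adjacency[atom]:
            message = [m + x for m, x in zip(message, features[neighbour])]
        updated[atom] = [a + m for a, m in zip(own, message)]
    return updated

def readout(features: Dict[int, List[float]]) -> List[float]:
    """Sum all atom vectors into one molecule-level vector."""
    size = len(next(iter(features.values())))
    return [sum(vec[i] for vec in features.values()) for i in range(size)]

# Heavy-atom graph of ethanol (SMILES: CCO): atoms 0 and 1 are carbon, atom 2 is oxygen.
# Feature layout [is_carbon, is_oxygen] is an assumption made for this sketch.
atom_features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
bonds = {0: [1], 1: [0, 2], 2: [1]}

after_one_round = message_passing_step(atom_features, bonds)
molecule_vector = readout(after_one_round)
```

After one round the central carbon already "knows" that it is bonded to an oxygen, which is the intuition behind stacking several such rounds.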

Molecular Docking

Molecular data formats: SMI, SDF, MOL2, PDB
Force field
Protein folding
Using AutoDock Vina, Smina, QuickVina for molecular docking
Interaction fingerprint
Pharmacophore

Deep generative model

Autoencoder
Recurrent Neural Network
SMILES Generator: ReLeaSE and REINVENT
Graph-based Generative Model: JT-VAE
Molecular Property Optimization: Reinforcement Learning and Bayesian Optimization

Protein Deep Learning

Simplified protein representation
Voxel grid representation
Mesh representation for encoding protein surfaces
3D Convolutional Neural Network

Uncertainty Prediction

Aleatoric and epistemic uncertainty
Conformal Prediction
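
As an illustration of the conformal prediction item above, the following is a minimal sketch of split conformal prediction for a regression model, written with the standard library only. The calibration values are made up for the example, and the point predictions are assumed to come from some already trained model; none of this is taken from the repository's notebooks.

```python
import math
from typing import List, Tuple

def conformal_quantile(residuals: List[float], alpha: float) -> float:
    """Return the ceil((n + 1) * (1 - alpha))-th smallest absolute residual."""
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to support this confidence level.
        return math.inf
    return sorted(residuals)[rank - 1]

def predict_interval(y_hat: float, q: float) -> Tuple[float, float]:
    """Symmetric prediction interval around a point prediction."""
    return (y_hat - q, y_hat + q)

# Hypothetical calibration set: measured values and the model's predictions for them.
cal_true = [5.0, 6.2, 7.1, 4.8, 6.0, 5.5, 7.4, 6.8, 5.9]
cal_pred = [5.3, 6.0, 6.5, 5.0, 6.1, 5.1, 7.0, 7.5, 5.9]
residuals = [abs(t - p) for t, p in zip(cal_true, cal_pred)]

q = conformal_quantile(residuals, alpha=0.2)      # target: about 80% coverage
low, high = predict_interval(6.4, q)              # interval for a new prediction of 6.4
```

The interval width comes entirely from how wrong the model was on held-out calibration data, which is why the method works with any underlying model.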

Section 2: Cheminformatics

Basic Knowledge

Introduction to RDKit for Cheminformatics
Application of Pandas in Cheminformatics
SMILES Tutorial
SMARTS Tutorial
Basic Reaction Enumeration
Enumerating stereoisomers and diastereomers

Using Datamol and Molfeat to streamline cheminformatics workflows

Data processing, descriptors, and clustering

Clustering

K-Means Clustering
Taylor-Butina clustering
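
The Taylor-Butina item above can be sketched in a few lines of plain Python. This is a simplified illustration, not the tutorial's code: fingerprints are represented as hypothetical sets of on-bit indices, the distance is 1 minus the Tanimoto similarity, and the processing order is fixed once from the initial neighbour counts (no reordering after each cluster is formed).

```python
from typing import List, Set

def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """1 - Tanimoto similarity for fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 0.0)

def butina_cluster(fps: List[Set[int]], cutoff: float) -> List[List[int]]:
    """Cluster fingerprints; each cluster lists its centroid first, then its members."""
    n = len(fps)
    # Neighbour list: every other molecule within the distance cutoff.
    neighbours = {
        i: {j for j in range(n) if j != i and tanimoto_distance(fps[i], fps[j]) <= cutoff}
        for i in range(n)
    }
    # Molecules with the most neighbours are tried as centroids first.
    order = sorted(range(n), key=lambda i: (-len(neighbours[i]), i))
    assigned: Set[int] = set()
    clusters: List[List[int]] = []
    for i in order:
        if i in assigned:
            continue
        members = [i] + sorted(neighbours[i] - assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters

# Hypothetical fingerprints: three similar molecules and two other similar molecules.
fingerprints = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6}, {10, 11}, {10, 11, 12}]
clusters = butina_cluster(fingerprints, cutoff=0.5)
```

Molecules that are not within the cutoff of any centroid end up as singleton clusters, so every molecule is assigned exactly once.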

Complex chemical informatics analysis

Systematic analysis of ChEMBL
Drug data analysis based on the ChEMBL database
Analysis based on patent data from BindingDB

SAR Analysis

Scaffold identification
R-group analysis
Positional analogue scanning
Free-Wilson analysis
Matched molecular pairs
Matched molecular series

Machine Learning Model

Construct and test a QSAR model
Building and Comparing Classification Models
Building and Comparing Regression Models

Active Learning

Active Classification
Active Regression
Active Shape Search

Neural Network Potential

Predicting isomers' energy using Auto3D

Section 3: Practical Part

1. Compound data collection and preliminary processing

01_Compound data collection from ChEMBL
02_Compound data acquisition from PubChem
03_Data acquisition from KLIFS
- Complete Project: "Predicting Bioactivity Based on Machine Learning"
04_Querying online API web services

2. Molecular Filtering and Preprocessing

05_Molecular Filtering: ADMET and Lead-Likeness Criteria
- Complete Project: "Molecular ADMET Prediction Based on Machine Learning and Deep Learning"
- Complete Project: "Molecular Toxicity Prediction Based on GNN"
06_Molecular Filtering: Unwanted Substructures
- 06_1 Complete Project: "Ligand Screening Based on ADMET and RO5 for Molecular Screening and Compound Similarity"
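
The RO5 (Lipinski's rule of five) filter mentioned above can be sketched as follows. This is a minimal illustration that assumes the four properties have already been computed elsewhere (for example with a cheminformatics toolkit); the compound names and property values are hypothetical, and tolerating at most one violation is one common convention rather than the only one.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Props:
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors

def ro5_violations(p: Props) -> int:
    """Count how many of the four rule-of-five thresholds a compound exceeds."""
    return sum([
        p.mol_weight > 500,
        p.logp > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ])

def passes_ro5(p: Props, max_violations: int = 1) -> bool:
    """Keep compounds with no more than max_violations violations."""
    return ro5_violations(p) <= max_violations

# Hypothetical compounds with made-up property values.
compounds: Dict[str, Props] = {
    "cmpd_1": Props(mol_weight=350.0, logp=2.1, h_donors=2, h_acceptors=5),
    "cmpd_2": Props(mol_weight=720.0, logp=6.3, h_donors=3, h_acceptors=8),
    "cmpd_3": Props(mol_weight=510.0, logp=4.0, h_donors=1, h_acceptors=9),
}
kept: List[str] = [name for name, props in compounds.items() if passes_ro5(props)]
```

Setting `max_violations=0` turns this into the strict form of the filter.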

3. Molecular Representation and Similarity Analysis

07_Molecular Representation
08_Ligand-based screening: Compound similarity
09_Compound Clustering
10_Maximum Common Substructure
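
Compound similarity in ligand-based screening is commonly measured with the Tanimoto coefficient on molecular fingerprints. The sketch below shows the calculation and a simple ranking, assuming fingerprints are available as sets of on-bit indices; the bit sets and molecule names are hypothetical placeholders rather than real fingerprints.

```python
from typing import Dict, List, Set

def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Shared on-bits divided by the total number of distinct on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

# Hypothetical query fingerprint and screening library.
query: Set[int] = {1, 4, 7, 9}
library: Dict[str, Set[int]] = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 8},
    "mol_c": {2, 3},
}

# Rank library molecules from most to least similar to the query.
ranked: List[str] = sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)
```

A virtual screen then keeps the top of this ranking, or every molecule above a chosen similarity threshold.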

4. Pharmacophores and Off-Target Prediction

11_Ligand-Based Pharmacophore
12_Binding site similarity and off-target prediction

5. Protein data acquisition and binding site detection

13_Protein Data Acquisition: Protein Data Bank (PDB)
14_Binding Site Detection

6. Protein-ligand docking

15_Protein-ligand docking
- Predicting the retrosynthetic accessibility of bioactive molecules
16_Protein-ligand interactions
17_NGLview Advanced Usage

7. Molecular Dynamics Simulation

18_Molecular Dynamics Simulation
19_Analysis of Molecular Dynamics Simulation

8. Lead Compound Optimization and Automation Process

20_Lead Compound Optimization Automation Process

9. Machine Learning and Molecular Property Prediction

21_Ligand-based Screening: Machine Learning
22_Ligand-based screening: Neural network
23_Molecular Property Prediction Based on RNN
24_Molecular Property Prediction Based on GNN
25_Molecular Property Prediction Based on Transformer
26_ Uncertainty Estimation

10. RNA Aptamer Design and Analysis

1. Data Collection and Preparation

27_1 RNA Aptamer Data Source
- Data source: RNAapt3D (https://rnaapt3d.medals.jp/)
27_2 Data Cleaning and Preprocessing

2. RNA Structure Prediction

28_1 Primary Structure Prediction
28_2 Structural Visualization and Analysis
29_1 Secondary Structure Prediction
29_2 Energy minimization and folding stability analysis
- ΔG (Free Energy) and Folding Stability Diagram

30_RNA Aptamer Binding Site Analysis

30_1 RNA Aptamer Binding Site Prediction to Target
30_2 Calculation and Optimization of Binding Energy

3. Drug Design and Optimization

30_3 Drug Design Based on RNA Aptamers
30_4 Drug Chemistry and Virtual Screening

5. Molecular Dynamics Simulation and Experimental Verification

30_5 Molecular Dynamics Simulation
30_6 Experimental Verification

11. Kinase Similarity and Analysis

31_Kinase similarity: Sequence
32_Kinase Similarity: Kinase Pocket (KiSSim Fingerprint)
33_Kinase Similarity: Interaction Fingerprint
34_Kinase Similarity: Ligand Profile
35_Kinase Similarity: Comparison of Different Views
36_Kinase inhibitor design based on kinase fragment libraries

12. Protein-ligand interactions and prediction

37_Protein-ligand interaction prediction
- Complete Project: "Project Practice: Prediction of Chemical Reaction Yields Using Transformer-Based Deep Learning"
- Complete Project: "Project Practice: Mapping the Space of Chemical Reactions Using Attention-Based Neural Networks"
- Complete Project: "Project Practice: A Graph to Graphs Framework for Retrosynthesis Prediction of Small Molecule Compounds"
- Complete Project: "Project Practice: Generative Language Modeling for Antibody Design"

13. Advanced Modeling and Virtual Screening

38_Running 3D Dynamics Based on KLIFS Data
Consensus-based integrated structure-based virtual screening (protein preparation, docking, binding site selection, rescoring, and ranking)

14. Visualization and Coding Tools

40_One-Hot Encoding
41_Using Code to Draw Molecular Graphs
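
For the one-hot encoding item above, here is a minimal character-level sketch for SMILES strings using only the standard library. It is an assumption-laden simplification: splitting by character breaks two-letter atoms such as Cl and Br into two tokens, so a real pipeline would use a proper SMILES tokenizer; the function names are illustrative.

```python
from typing import Dict, List

def build_vocab(smiles_list: List[str]) -> Dict[str, int]:
    """Map every distinct character in the SMILES strings to a column index."""
    chars = sorted({ch for smi in smiles_list for ch in smi})
    return {ch: i for i, ch in enumerate(chars)}

def one_hot_encode(smiles: str, vocab: Dict[str, int], max_len: int) -> List[List[int]]:
    """Return a max_len x len(vocab) matrix; rows past the end of the string stay all zero."""
    if len(smiles) > max_len:
        raise ValueError("SMILES string is longer than max_len")
    matrix = [[0] * len(vocab) for _ in range(max_len)]
    for row, ch in enumerate(smiles):
        matrix[row][vocab[ch]] = 1
    return matrix

smiles = ["CCO", "C=O"]            # ethanol and formaldehyde
vocab = build_vocab(smiles)        # columns in sorted character order: '=', 'C', 'O'
encoded = one_hot_encode("CCO", vocab, max_len=4)
```

Padding every molecule to the same `max_len` gives fixed-size inputs, which is what most neural network layers expect.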

Section 4: Gene and Immune Infiltration Research

1. Introduction

Research objective: Explore the relationship between genes and immune infiltration
Background Introduction: The Importance of Immune Infiltration and Its Impact on Gene Expression

2. Software Environment Configuration

Software tools required for installation and configuration (such as R, Python, Bioconductor)

3. Data Download and Preprocessing

Download gene expression data from GEO, TCGA, and other databases
Data standardization and cleaning

4. Data Annotation (ANN)

Annotate gene expression data using annotation files (such as GTF or GFF files)
Tools: biomaRt, org.Hs.eg.db

5. SVA (Surrogate Variable Analysis)

Apply SVA to correct batch effects
Code Implementation: ComBat or svaseq from the sva package

6. Differential Gene Expression (Diff)

Using tools such as limma and DESeq2 for differential gene analysis
List of differentially expressed genes (DEGs)

7. Volcano Plot Drawing

Plot volcano plots using the results of the differential analysis
Code Implementation: ggplot2, EnhancedVolcano packages
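
The logic behind a volcano plot is independent of the plotting package: each gene is placed at (log2 fold change, -log10 p-value) and labelled by whether it crosses both thresholds. The sketch below shows that step in plain Python. The cutoffs (|log2FC| >= 1 and adjusted p < 0.05) are common conventions assumed here, and the gene names and values are hypothetical.

```python
import math
from typing import Dict, Tuple

def classify_gene(log2fc: float, padj: float,
                  fc_cut: float = 1.0, p_cut: float = 0.05) -> str:
    """Label a gene as up-regulated, down-regulated, or not significant."""
    if padj < p_cut and log2fc >= fc_cut:
        return "up"
    if padj < p_cut and log2fc <= -fc_cut:
        return "down"
    return "ns"

def volcano_xy(log2fc: float, padj: float) -> Tuple[float, float]:
    """Coordinates of a gene on the volcano plot."""
    return (log2fc, -math.log10(padj))

# Hypothetical differential-expression results: gene -> (log2FC, adjusted p-value).
results: Dict[str, Tuple[float, float]] = {
    "GENE_A": (2.3, 0.001),
    "GENE_B": (-1.5, 0.01),
    "GENE_C": (0.4, 0.001),
    "GENE_D": (3.0, 0.2),
}
labels = {gene: classify_gene(fc, p) for gene, (fc, p) in results.items()}
points = {gene: volcano_xy(fc, p) for gene, (fc, p) in results.items()}
```

The labels can then drive the point colours in whichever plotting library is used.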

8. Enrichment analysis

08. Metascape
- Conduct pathway and functional enrichment analysis on the Metascape website
09. Gene Ontology (GO)
- Using GO for gene function annotation and enrichment analysis
- Tools: clusterProfiler, topGO
10. KEGG Enrichment Analysis
- Using KEGG database for signaling pathway enrichment analysis
- Tools: clusterProfiler, KEGGREST

11. Protein Interaction Network

11. Protein-Protein Interaction (PPI)
- Constructing protein-protein interaction networks
- Tools: STRING Database, Cytoscape

12. Random Forest Model

Construct a random forest model and select the important genes

13. Heatmap Drawing

Draw an expression heatmap of the differentially expressed genes

14. Gene Score Calculation

Calculate gene scores from the differentially expressed genes
Code Implementation: a comprehensive scoring method based on gene expression values

15. Neural Network Model

Predicting gene expression using neural network models
Tool: neuralnet Package

16. Receiver Operating Characteristic (ROC) Curve Analysis

Plot the ROC curve and evaluate model accuracy
Tool: pROC Package
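
To show what an ROC analysis computes, independent of the pROC package, here is a minimal standard-library sketch that builds the ROC points by sweeping the score threshold and integrates the area under the curve with the trapezoid rule. The labels and scores are made-up example data.

```python
from typing import List, Tuple

def roc_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, from the strictest threshold down."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical test set: 1 = positive sample, 0 = negative sample.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_true, y_score)
area = auc(curve)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.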

17. Gene Score Test

Evaluate the accuracy and feasibility of the gene scoring model

18. Neural Network Prediction Test

Evaluate the predictive ability of neural network models

19. ROC Test

Validate ROC Curve Using Test Dataset

20. CIBERSORT analysis

Use CIBERSORT to estimate the proportions of immune cells from gene expression data

21. Barplot drawing

Create a bar plot of immune cell proportions
Tool: ggplot2

22. Violin Plot

Create a diagram showing the relationship between gene expression and immune cell infiltration

Project Structure

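|-- Al-drug-design-reference.Data   <- reference library folder
|-- README.md                       <- detailed introduction
|-- img                             <- images used in the Markdown files
|-- docs                            <- documentation
|-- Al-drug-design-reference.enl    <- reference library file
|-- list                            <- project structure listing
|-- Al-drug-design.yml                             <- environment configuration
|   |-- 00_ai in_medicine                          <- Python basics (❤️ if you already know Python, or Python and the basics of drug design, you can skip this chapter and start directly from 01)
|   |-- 03_03_Al Drug Discovery & Design              <- compound collection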

Configure code environment

⚠️ If you are an experienced user, the short steps below are all you need ------------------------

Install Anaconda and Git. Please refer to the Anaconda website and the Git website for download.

first

❤️ Please read this part carefully ⚠️:

Where is AIDrugDesign.yml?

First git clone this repository; it contains AIDrugDesign.yml.

conda env create -f AIDrugDesign.yml

If you are abroad or use a VPN (🪜), open AIDrugDesign.yml and delete the following lines:

  - pytorch
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free

second

conda env list

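Check whether the new environment appears in the list. This README refers to it both as AI-drug-design and as AIDrugDesign; use the name that conda env list prints.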

Then

conda activate AIDrugDesign

⚠️ End of the short steps for experienced users ------------------------

If you are a beginner, please follow the steps below:

Part 1: Install and configure Git on Windows and macOS, and pull the code to your local machine.

If you want to learn more about Git, please go to [if_operation_git_process.md]


Windows system

1. Download and install Git

Go to the official Git download page.
Select the Windows version for download and follow the prompts to complete the installation.
- During installation, the default options are fine. To customize, choose different settings as needed, such as the editor or environment variables.

2. Configure Git

After installation, open the terminal (or Git Bash) and configure your Git user information:

git config --global user.name "Your Name"
git config --global user.email "your_email@example.com"

Git will use this information to identify you each time you commit code.

3. Generate SSH Key

On Windows, open Git Bash and enter the following command to generate an SSH key:

ssh-keygen -t rsa -b 4096 -C "your_email@example.com"

After pressing Enter, you are prompted for where to store the key; press Enter to accept the default location.
You are then asked for a passphrase. It can be left empty, but setting one is recommended.

4. Add SSH Key to GitHub

Use the following command to display the public key you generated:

cat ~/.ssh/id_rsa.pub

Copy the public key and log in to GitHub.
Go to the SSH and GPG keys page on GitHub and click New SSH key.
Paste the public key you just copied into the text box and save.

5. Clone GitHub repository

After completing the SSH key configuration, you can use the following command to clone the code repository:

git clone git@github.com:itWangCode/AI-drug-design.git

6. Successfully pulled code

After executing the command, Git will pull the code into the local folder. If successful, you will see information similar to the following:

Cloning into 'AI-drug-design'...

At this point, the code has been successfully pulled.

macOS system

1. Install Git

macOS usually comes with Git pre-installed; if not, Git can be installed via Homebrew:

Open the terminal and enter the following command to install Homebrew (if not already installed):

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

After installation, enter the following command to install Git:

brew install git

2. Configure Git

Open the terminal and enter the following command to configure Git:

git config --global user.name "Your Name"
git config --global user.email "your_email@example.com"

3. Generate SSH Key

Enter the following command in the terminal to generate an SSH key:

ssh-keygen -t rsa -b 4096 -C "your_email@example.com"

As on Windows, press Enter to store the key at the default path, then set a passphrase (optional).

4. Add SSH Key to GitHub

At the terminal, use the following command to view the public key:

cat ~/.ssh/id_rsa.pub

Copy the public key from the output, log in to GitHub, and add the key on the GitHub SSH keys page.

5. Clone GitHub repository

Use the following command to clone your repository to your local machine:

git clone git@github.com:itWangCode/AI-drug-design.git

6. Successfully pulled code

Git downloads the code into the current directory. If the command finishes without errors, the code was pulled successfully.

You have now configured Git and cloned the code repository.

How to download and install Anaconda-Navigator on Windows and macOS systems and configure your environment

1. Download Anaconda-Navigator

Windows:

Go to the official Anaconda website.
Click the download button and select the installer for Windows (usually 64-bit).
When the download completes, run the installer and follow the prompts to finish the installation.

macOS:

Likewise, go to the official Anaconda website.
Select the macOS installer, then download and install it.

During the installation process, it is recommended to check the option "Add Anaconda to my PATH environment variable" so that the conda command can be used directly in the terminal.

2. Open Anaconda-Navigator

After installation:

On Windows, Anaconda-Navigator can be found in the Start menu and run.
On macOS, you can find Anaconda-Navigator in the "Applications" folder, and click to launch it.

3. Clone your repository

Before configuring in the Anaconda environment, you first need to clone your GitHub repository to your local machine.

Open terminal (macOS) or Git Bash (Windows).
Run the following command to clone your repository:

git clone git@github.com:itWangCode/AI-drug-design.git

This will download the repository containing the AIDrugDesign.yml file to the local machine.

4. Create Conda Environment

Navigate to the clone project folder and create a new Conda environment through the .yml file:

In the terminal or Git Bash, switch to the directory of the cloned repository:

cd AI-drug-design

Run the following command to create a Conda environment based on the AIDrugDesign.yml file:

conda env create -f AIDrugDesign.yml

❤️ This step can take more than 15 minutes. Please make sure you are on a stable network connection such as Wi-Fi.

Conda will automatically install the required dependency packages based on the .yml file and create the environment.

5. List Conda environments

After installation, use the following command to view all environments and check if the AI-drug-design environment has been created:

conda env list

You should see an output similar to the following, which includes AI-drug-design :

# conda environments:
#
base                  *  /path/to/anaconda3
AI-drug-design           /path/to/anaconda3/envs/AI-drug-design
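
Because the environment name has to match exactly when activating it, it can help to check the list programmatically. The following is a small sketch that parses text in the format shown above and reports the environment names. The helper name `parse_env_names` is hypothetical, and the sample text simply repeats the example output from this README rather than real output from your machine.

```python
from typing import List

def parse_env_names(output: str) -> List[str]:
    """Extract environment names from the text printed by `conda env list`."""
    names = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        first = line.split()[0]
        if "/" in first or "\\" in first:
            continue  # unnamed environments are listed by path only
        names.append(first)
    return names

# Sample text copied from the example output in this README.
sample_output = """\
# conda environments:
#
base                  *  /path/to/anaconda3
AI-drug-design           /path/to/anaconda3/envs/AI-drug-design
"""

env_names = parse_env_names(sample_output)
is_created = "AI-drug-design" in env_names
```

To run the check against a real installation, the same function can be fed the text captured from `subprocess.run(["conda", "env", "list"], capture_output=True, text=True).stdout`, assuming conda is on your PATH.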

6. Activate the Environment

Finally, use the following command to activate the AI-drug-design environment:

conda activate AI-drug-design

You have now successfully configured and activated the AI-drug-design environment, and you can start developing using this environment.

Install all configurations

Operation

Part 3: Software

PyCharm: https://www.jetbrains.com/pycharm/

I sometimes worry in particular about classmates who use macOS.

For more operations, please refer to: https://www.exception.site/article/1762

Q&A Area

1. Have you run into a situation like this?

To solve it, open the advanced system settings on Windows and configure the environment variables.
