Machine Learning-Based Hypothesis Generation to Select Optimum Additive for Perovskite Single-Crystal Synthesis (PvkAdditives)
This repository documents the workflow that accompanies the following study:
"Principled Exploration of Bipyridine and Terpyridine Additives to Promote Methylammonium Lead Iodide Perovskite Crystallization"
Cryst. Growth Des. 2022, 22, 9, 5424–5431 https://doi.org/10.1021/acs.cgd.2c00522
Noor Titan Putri Hartono (1, 5), Mansoor Ani Najeeb Nellikkal (2), Zhi Li (3), Philip W. Nega (3), Clare A. Fleming (2), Xiaohe Sun (2), Emory M. Chan (3), Antonio Abate (5), Alexander J. Norquist (2), Joshua Schrier (4), Tonio Buonassisi (1)
Affiliations:
1. Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
2. Department of Chemistry, Haverford College, 370 Lancaster Avenue, Haverford, Pennsylvania 19041, USA
3. Molecular Foundry, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California 94720, USA
4. Department of Chemistry, Fordham University, 441 E. Fordham Road, The Bronx, New York 10458, USA
5. Helmholtz-Zentrum Berlin für Materialien und Energie GmbH, Kekuléstraße 5, Berlin 12489, Germany
To install, clone this repository and the chemprop repository:
$ git clone https://github.com/PV-Lab/PvkAdditives.git
$ cd PvkAdditives
To install the required packages, create a virtual environment using Anaconda or Miniconda (https://docs.conda.io/en/latest/miniconda.html). This setup is optional but recommended:
$ conda env create -f environment.yml
$ conda activate pvkadditives
$ jupyter notebook
Also download the SMILES database from eMolecules (https://downloads.emolecules.com/free/) for mapping out the chemical space, especially in the planning step.
After measuring the size distribution of the black crystals (MAPbI3) with separate software (we recommend ImageJ, https://imagej.nih.gov/ij/download.html) and computing the metric of interest (in our case, the top 10th percentile of crystal sizes in each vial of perovskite single crystals), we can start analyzing the data.
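The per-vial metric described above can be sketched in a few lines. This is only an illustration, not the code used in the study: it assumes that "top 10th percentile" means the 90th percentile of the measured crystal sizes, uses linear interpolation between sorted values, and the vial names and sizes are hypothetical placeholders for values exported from ImageJ.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("need at least one measurement")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0  # fractional index into the sorted list
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


# Hypothetical crystal sizes per vial (arbitrary units), not real measurements.
vial_sizes = {
    "vial_01": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "vial_02": [2.0, 2.5, 3.0, 8.0],
}

# Assumption: the metric of interest is the 90th percentile of each vial.
vial_metric = {vial: percentile(sizes, 90) for vial, sizes in vial_sizes.items()}

for vial, metric in vial_metric.items():
    print(vial, round(metric, 3))
```

If a different percentile convention is wanted, only the `q` argument and the interpolation rule need to change.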
There is a separate Jupyter notebook for each step of the workflow:
additives_diversity.ipynb
contains the Tanimoto similarity-based analysis used to assess the diversity of the pre-screened additive molecules and to prioritize their synthesis and measurement accordingly. The list of experimental additive candidates is in dataset/20211013_smiles_experimental_candidates_sorted.csv.
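To make the idea of Tanimoto-based prioritization concrete, here is a minimal sketch in plain Python. It is not the notebook's implementation. It assumes each molecule is already represented as a set of "on" fingerprint bits (in practice these would come from a cheminformatics toolkit), and it uses a greedy max-min ordering as one possible way to rank candidates by diversity. The additive names and bit sets are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    bits_a, bits_b = set(bits_a), set(bits_b)
    if not bits_a and not bits_b:
        return 1.0  # convention: two empty fingerprints are identical
    return len(bits_a & bits_b) / len(bits_a | bits_b)


def maxmin_order(fingerprints, first):
    """Greedy max-min ordering: each step picks the candidate whose
    highest similarity to the already-picked set is the lowest."""
    order = [first]
    remaining = [name for name in fingerprints if name != first]
    while remaining:
        nxt = min(
            remaining,
            key=lambda n: max(tanimoto(fingerprints[n], fingerprints[p]) for p in order),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical fingerprints (sets of bit indices), not real additives.
toy_fingerprints = {
    "additive_A": {1, 2, 3, 4},
    "additive_B": {1, 2, 3, 5},
    "additive_C": {7, 8, 9},
    "additive_D": {3, 4, 7},
}

priority = maxmin_order(toy_fingerprints, first="additive_A")
print(priority)
```

Candidates early in `priority` are the most dissimilar from what has already been selected, so measuring them first covers the candidate set more evenly.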
regression_mordred.ipynb
combines (a) Mordred featurization, (b) recursive feature elimination (RFE), (c) random forest regression, and (d) Shapley feature importance ranking to drive hypothesis generation. Both the round 1 and round 2 data are analyzed with this notebook. The round 1 data is compiled in dataset/20210923_round1_compiledData.csv, and the round 2 data is compiled in dataset/20211006_round2_compiledData.csv.
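The Shapley ranking in step (d) can be illustrated without any of the notebook's dependencies. The sketch below is not the notebook's code: it computes exact Shapley values for a small toy model by enumerating feature subsets, replacing "absent" features with a baseline value. The toy model, the baseline, and the feature names `feat_B` and `feat_C` are hypothetical; only ATSC5Z is a descriptor name taken from this workflow.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample x, given as dicts of feature -> value.
    Features outside a coalition are set to their baseline value."""
    names = list(x)
    n = len(names)

    def value(coalition):
        point = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(point)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_model(p):
    """Hypothetical stand-in for a fitted regressor; feat_C has no effect."""
    return 3 * p["ATSC5Z"] + p["feat_B"] + 2 * p["ATSC5Z"] * p["feat_B"]


sample = {"ATSC5Z": 1.0, "feat_B": 2.0, "feat_C": 5.0}
baseline = {"ATSC5Z": 0.0, "feat_B": 0.0, "feat_C": 0.0}

phi = shapley_values(toy_model, sample, baseline)
ranking = sorted(phi, key=lambda k: abs(phi[k]), reverse=True)
print(phi, ranking)
```

Exact enumeration scales exponentially with the number of features, so it is only practical for toy cases like this one. The values sum to the difference between the model output at the sample and at the baseline, which is a handy sanity check.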
simulation_boxfunction.ipynb
consists of the following: (a) It randomly draws n SMILES from the eMolecules database, so we can locate our SMILES of interest in the chemical space using t-SNE. From the n randomly drawn SMILES, we construct a dataset with x as our feature of interest (in our case, ATSC5Z) and define f(x) = 1 when the ATSC5Z value of the molecule falls within the range of our 'box', and f(x) = 0 otherwise. (b) After constructing both x and f(x), we run the same sequence of algorithms (Mordred featurization, RFE, random forest regression, and Shapley feature importance ranking) to check whether ATSC5Z appears as the most important feature.
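The box function in part (a) is simple enough to sketch directly. The example below is illustrative only: the box bounds, the inclusive treatment of the edges, and the randomly generated ATSC5Z values are all assumptions. In the real workflow, x would be the Mordred ATSC5Z descriptor computed for each randomly drawn SMILES.

```python
import random


def box_function(x, low, high):
    """Return 1 if x lies inside the box [low, high] (inclusive), else 0."""
    return 1 if low <= x <= high else 0


def build_box_dataset(values, low, high):
    """Pair each feature value x with its label f(x)."""
    return [(x, box_function(x, low, high)) for x in values]


# Hypothetical stand-in for ATSC5Z values of n randomly drawn molecules.
rng = random.Random(0)
atsc5z_values = [rng.uniform(-5.0, 5.0) for _ in range(1000)]

# Hypothetical box bounds; the actual range is chosen in the notebook.
BOX_LOW, BOX_HIGH = -1.0, 1.0

dataset = build_box_dataset(atsc5z_values, BOX_LOW, BOX_HIGH)
fraction_inside = sum(label for _, label in dataset) / len(dataset)
print(f"{fraction_inside:.2%} of the samples fall inside the box")
```

Because f(x) depends on ATSC5Z alone, a feature-importance pipeline that works as intended should rank ATSC5Z first on this synthetic target, which is what part (b) tests.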
| Author(s) | Noor Titan Putri Hartono |
| Version | 1.0 / October 2021 |
| E-mail(s) | noortitan at alum dot mit dot edu |
This work is licensed under the Apache 2.0 License. Please acknowledge use of this work by citing both the repository and the research article.
@Misc{pvkadditives2021,
author = {The Perovskite Additives authors},
title = {Machine Learning-Based Hypothesis Generation to Select Optimum Additive for Perovskite Single-Crystal Synthesis},
howpublished = {\url{https://github.com/PV-Lab/PvkAdditives}},
year = {2021}
}