Code for 10.1021/acscentsci.7b00572, now running on Keras 2.0 and Tensorflow
This repository contains the framework and code for constructing a variational autoencoder (VAE) for use with molecular SMILES, as described in doi:10.1021/acscentsci.7b00572, with preprint at https://arxiv.org/pdf/1610.02415.pdf.
In short, molecular SMILES are encoded into a code vector representation and can be decoded from that representation back into molecular SMILES. The autoencoder may also be jointly trained with property prediction to help shape the latent space. The resulting latent space can then be searched to find molecules that optimize the properties of interest.
In our example, we perform encoding/decoding with the ZINC dataset, and shape the latent space by jointly predicting the logP, QED, and SAS properties.
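To make the encode/decode round trip more concrete, here is a minimal sketch of how a SMILES string can be turned into a character-level one-hot matrix and back. This only illustrates a possible input representation; the code vector itself comes from the trained encoder network, which is not shown. The function names, the character set, and the padding length are assumptions made for this example and are not taken from this repository.

```python
# Sketch only: the character set and padding length are illustrative
# assumptions, not values taken from this repository.

def build_charset(smiles_list, pad_char=" "):
    """Collect every character seen in the SMILES strings, plus a padding character."""
    return sorted(set("".join(smiles_list)) | {pad_char})


def one_hot_encode(smiles, charset, max_len, pad_char=" "):
    """Return a max_len x len(charset) matrix (list of lists) of 0/1 values."""
    if len(smiles) > max_len:
        raise ValueError("SMILES string is longer than max_len")
    index = {char: i for i, char in enumerate(charset)}
    matrix = []
    # Pad on the right so every molecule has the same number of rows.
    for char in smiles.ljust(max_len, pad_char):
        row = [0] * len(charset)
        row[index[char]] = 1
        matrix.append(row)
    return matrix


def one_hot_decode(matrix, charset, pad_char=" "):
    """Invert one_hot_encode by picking the largest entry in each row."""
    chars = [charset[max(range(len(row)), key=row.__getitem__)] for row in matrix]
    return "".join(chars).rstrip(pad_char)


smiles_examples = ["CCO", "c1ccccc1"]  # ethanol and benzene
charset = build_charset(smiles_examples)
encoded = one_hot_encode("CCO", charset, max_len=10)
decoded = one_hot_decode(encoded, charset)
```

Here `decoded` recovers the original string because each row contains exactly one nonzero entry. A decoder network would instead emit a probability distribution per position, and the same largest-entry rule would pick the most likely character.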
If you have questions or run into problems, open a GitHub issue :smile:. Please be as clear and descriptive as possible.
An Anaconda Python environment is recommended. Check the environment.yml file, but primarily:
- Python >= 3.5
- Keras >= 2.0.0 and <= 2.0.7
- Tensorflow == 1.1
- RDKit
- Numpy
Jupyter Notebook is required to run the .ipynb examples. Make sure that the Keras backend is set to use Tensorflow.
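One way to check the backend setting without importing Keras is sketched below. Keras 2.0.x reads the backend from the `backend` key of `~/.keras/keras.json`, and the `KERAS_BACKEND` environment variable takes precedence when it is set. The helper name `configured_keras_backend` is made up for this example and is not part of Keras or of this repository.

```python
import json
import os


def configured_keras_backend(config_path=None, environ=None):
    """Return the explicitly configured Keras backend name, or None if nothing is set.

    Hypothetical helper for illustration; it only inspects configuration
    and does not import Keras.
    """
    environ = os.environ if environ is None else environ
    # The environment variable overrides the config file.
    if environ.get("KERAS_BACKEND"):
        return environ["KERAS_BACKEND"]
    if config_path is None:
        config_path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
    if not os.path.exists(config_path):
        return None
    with open(config_path) as handle:
        return json.load(handle).get("backend")
```

Calling `configured_keras_backend()` with no arguments inspects the current user's setup; a result other than `"tensorflow"` (or `None`, in which case Keras falls back to its default) means the setting should be changed before training.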
Create a conda environment:
conda env create -f environment.yml
source activate chemvae
python setup.py install
Assuming you have all the requirements:
pip install git+https://github.com/aspuru-guzik-group/chemical_vae.git
This repository contains an example of how to run the autoencoder on the ZINC dataset.
First, take a look at the zinc directory. Parameters are set in the following JSON files:
- exp.json: sets parameters for the location of data, global experimental parameters, number of epochs to run, properties to predict, etc.
For a full description of all the parameters, see hyperparameters.py. Parameters set in exp.json override the defaults in hyperparameters.py, and parameters set in params.json override those in both exp.json and hyperparameters.py.
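The override order described above can be pictured as layered dictionary updates, where later layers win. The following is a minimal sketch under that assumption, not the loader used by this repository; the function `resolve_params` and the keys `epochs`, `batch_size`, and `latent_dim` are hypothetical and chosen only for illustration.

```python
import json


def resolve_params(defaults, *override_layers):
    """Merge parameter layers; each later layer overrides the earlier ones."""
    resolved = dict(defaults)  # copy so the defaults stay untouched
    for layer in override_layers:
        resolved.update(layer)
    return resolved


# Hypothetical values standing in for hyperparameters.py, exp.json, and params.json.
defaults = {"epochs": 1, "batch_size": 100, "latent_dim": 64}
exp_json = json.loads('{"epochs": 70, "batch_size": 126}')
params_json = json.loads('{"batch_size": 32}')

resolved = resolve_params(defaults, exp_json, params_json)
```

In this sketch `epochs` ends up with the exp.json value, `batch_size` with the params.json value, and `latent_dim` keeps its default because neither file sets it.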
Once you have set the parameters, run the autoencoder with the following command from the directory containing exp.json:
python -m chemvae.train_vae
(Make sure you copy the example directories first so that you do not overwrite the trained weights (*.h5).)
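A small helper along the following lines can make that copy step harder to get wrong, since it refuses to write into a directory that already exists. It is a sketch only; the function name `copy_example_dir` and the directory names in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def copy_example_dir(source_dir, target_dir):
    """Copy an example directory to a fresh location and return the new path.

    Raises FileExistsError instead of overwriting an existing target,
    so previously trained weights are never clobbered.
    """
    source, target = Path(source_dir), Path(target_dir)
    if target.exists():
        raise FileExistsError("Target already exists: {}".format(target))
    shutil.copytree(str(source), str(target))
    return target
```

For example, `copy_example_dir("zinc", "zinc_run1")` would create a working copy named `zinc_run1` (a made-up name), and training can then be started from inside that copy.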
train_vae.py: main script for training the variational autoencoder. Accepts arguments -d ... Example of how to run (with example directory here)
This software was written by Jennifer Wei, Benjamin Sanchez-Lengeling, Dennis Sheberla, Rafael Gomez-Bombarelli, and Alan Aspuru-Guzik ([email protected]). It is based on the work published in https://arxiv.org/pdf/1610.02415.pdf.
Feel free to reach out to us with any questions!
"This work was supported by the Computational Chemical Sciences Program funded by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, under Award #DE-FG02-17ER16362"