gnina
A deep learning framework for molecular docking


gnina (pronounced NEE-na) is a molecular docking program with integrated support for scoring and optimizing ligands using convolutional neural networks. It is a fork of smina, which is a fork of AutoDock Vina.

Help

Please subscribe to our Slack team.

Citation

If you find gnina useful, please cite our paper(s):

Protein–Ligand Scoring with Convolutional Neural Networks (primary citation). M Ragoza, J Hochuli, E Idrobo, J Sunseri, DR Koes. J. Chem. Inf. Model., 2017.

Ligand pose optimization with atomic grid-based convolutional neural networks. M Ragoza, L Turner, DR Koes. Machine Learning for Molecules and Materials NIPS 2017 Workshop, 2017.

Visualizing convolutional neural network protein-ligand scoring. J Hochuli, A Helbling, T Skaist, M Ragoza, DR Koes. Journal of Molecular Graphics and Modelling, 2018.

Convolutional neural network scoring and minimization in the D3R 2017 community challenge. J Sunseri, JE King, PG Francoeur, DR Koes. Journal of Computer-Aided Molecular Design, 2018.

Three-Dimensional Convolutional Neural Networks and a Cross-Docked Data Set for Structure-Based Drug Design. PG Francoeur, T Masuda, J Sunseri, A Jia, RB Iovanisci, I Snyder, DR Koes. J. Chem. Inf. Model., 2020.

Docker

A pre-built Docker image is available, along with the Dockerfiles used to build it.
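
For example, a containerized docking run might look like the following sketch, assuming the image is published as `gnina/gnina` on Docker Hub and that the NVIDIA container toolkit is installed for GPU access:

```
# mount the current directory (containing rec.pdb, lig.sdf, orig.sdf) into the container
docker run --rm --gpus all -v "$PWD":/work -w /work gnina/gnina \
    gnina -r rec.pdb -l lig.sdf --autobox_ligand orig.sdf -o docked.sdf.gz
```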

Installation

We strongly recommend that you build gnina from source to ensure you are using libraries optimized for your system. However, a compatibility-focused binary is provided with each release for evaluation purposes.

Ubuntu 20.04

```
apt-get install build-essential cmake git wget libboost-all-dev libeigen3-dev libgoogle-glog-dev libprotobuf-dev protobuf-compiler libhdf5-dev libatlas-base-dev python3-dev librdkit-dev python3-numpy python3-pip python3-pytest
```

Follow NVIDIA's instructions to install the latest version of CUDA (>= 10.0 is required). Make sure `nvcc` is in your PATH.
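
To confirm the CUDA toolchain is visible, for example:

```
nvcc --version   # should report the installed CUDA compiler version
```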

Optionally install cuDNN (a 7.x release; version 8.0 and later is not yet supported).

Install OpenBabel3

```
git clone https://github.com/openbabel/openbabel.git
cd openbabel
git checkout openbabel-3-1-1
mkdir build
cd build
cmake -DWITH_MAEPARSER=OFF -DWITH_COORDGEN=OFF ..
make
make install
```

Install gnina

```
git clone https://github.com/gnina/gnina.git
cd gnina
mkdir build
cd build
cmake ..
make
make install
```

If you are building for systems with different GPUs (e.g. in a cluster environment), configure with `-DCUDA_ARCH_NAME=All`. Note that the cmake build will automatically fetch and install libmolgrid if it is not already installed.
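
For example, a cluster-friendly configure-and-build step from the `build` directory might look like this sketch (the `-j"$(nproc)"` flag simply parallelizes compilation):

```
cmake -DCUDA_ARCH_NAME=All ..
make -j"$(nproc)"
make install
```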

The scripts provided in `gnina/scripts` have additional Python dependencies that must be installed.
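
The exact dependency list is documented with the scripts themselves; as a hypothetical sketch, assuming they ship a `requirements.txt`:

```
cd gnina/scripts
pip3 install -r requirements.txt   # hypothetical file name; install whatever the scripts you use require
```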

Usage

To dock ligand `lig.sdf` to a binding site on `rec.pdb` defined by another ligand `orig.sdf`:

```
gnina -r rec.pdb -l lig.sdf --autobox_ligand orig.sdf -o docked.sdf.gz
```

To perform docking with flexible sidechain residues within 3.5 Angstroms of `orig.sdf` (generally not recommended unless prior knowledge indicates the pocket is highly flexible):

```
gnina -r rec.pdb -l lig.sdf --autobox_ligand orig.sdf --flexdist_ligand orig.sdf --flexdist 3.5 -o flex_docked.sdf.gz
```

To perform whole protein docking:

```
gnina -r rec.pdb -l lig.sdf --autobox_ligand rec.pdb -o whole_docked.sdf.gz --exhaustiveness 64
```

To utilize the default ensemble CNN in the energy minimization during the refinement step of docking (10 times slower than the default rescore option):

```
gnina -r rec.pdb -l lig.sdf --autobox_ligand orig.sdf --cnn_scoring refinement -o cnn_refined.sdf.gz
```

To utilize the default ensemble CNN for every step of docking (1000 times slower than the default rescore option):

```
gnina -r rec.pdb -l lig.sdf --autobox_ligand orig.sdf --cnn_scoring all -o cnn_all.sdf.gz
```

To dock using only empirical scoring with the Vinardo scoring function (no CNN):

```
gnina -r rec.pdb -l lig.sdf --autobox_ligand orig.sdf --scoring vinardo --cnn_scoring none -o vinardo_docked.sdf.gz
```

To utilize a different CNN during docking (see `--help` for possible options):

```
gnina -r rec.pdb -l lig.sdf --autobox_ligand orig.sdf --cnn dense -o dense_docked.sdf.gz
```

To minimize and score ligands `ligs.sdf` already positioned in a binding site:

```
gnina -r rec.pdb -l ligs.sdf --minimize -o minimized.sdf.gz
```

All options:

```
Input:
  -r [ --receptor ] arg         rigid part of the receptor
  --flex arg                    flexible side chains, if any (PDBQT)
  -l [ --ligand ] arg           ligand(s)
  --flexres arg                 flexible side chains specified by comma separated list of chain:resid
  --flexdist_ligand arg         Ligand to use for flexdist
  --flexdist arg                set all side chains within specified distance to flexdist_ligand to flexible
  --flex_limit arg              Hard limit for the number of flexible residues
  --flex_max arg                Retain at most the closest flex_max flexible residues

Search space (required):
  --center_x arg                X coordinate of the center
  --center_y arg                Y coordinate of the center
  --center_z arg                Z coordinate of the center
  --size_x arg                  size in the X dimension (Angstroms)
  --size_y arg                  size in the Y dimension (Angstroms)
  --size_z arg                  size in the Z dimension (Angstroms)
  --autobox_ligand arg          Ligand to use for autobox
  --autobox_add arg             Amount of buffer space to add to auto-generated box (default +4 on all six sides)
  --autobox_extend arg (=1)     Expand the autobox if needed to ensure the input conformation of the ligand being docked can freely rotate within the box.
  --no_lig                      no ligand; for sampling/minimizing flexible residues

Scoring and minimization options:
  --scoring arg                 specify alternative built-in scoring function
  --custom_scoring arg          custom scoring function file
  --custom_atoms arg            custom atom type parameters file
  --score_only                  score provided ligand pose
  --local_only                  local search only using autobox (you probably want to use --minimize)
  --minimize                    energy minimization
  --randomize_only              generate random poses, attempting to avoid clashes
  --num_mc_steps arg            number of monte carlo steps to take in each chain
  --num_mc_saved arg            number of top poses saved in each monte carlo chain
  --minimize_iters arg (=0)     number iterations of steepest descent; default scales with rotors and usually isn't sufficient for convergence
  --accurate_line               use accurate line search
  --simple_ascent               use simple gradient ascent
  --minimize_early_term         Stop minimization before convergence conditions are fully met.
  --minimize_single_full        During docking perform a single full minimization instead of a truncated pre-evaluate followed by a full.
  --approximation arg           approximation (linear, spline, or exact) to use
  --factor arg                  approximation factor: higher results in a finer-grained approximation
  --force_cap arg               max allowed force; lower values more gently minimize clashing structures
  --user_grid arg               Autodock map file for user grid data based calculations
  --user_grid_lambda arg (=-1)  Scales user_grid and functional scoring
  --print_terms                 Print all available terms with default parameterizations
  --print_atom_types            Print all available atom types

Convolutional neural net (CNN) scoring:
  --cnn_scoring arg (=1)        Amount of CNN scoring: none, rescore (default), refinement, all
  --cnn arg                     built-in model to use, specify PREFIX_ensemble to evaluate an
                                ensemble of models starting with PREFIX:
                                crossdock_default2018 crossdock_default2018_1 crossdock_default2018_2
                                crossdock_default2018_3 crossdock_default2018_4 default2017
                                dense dense_1 dense_2 dense_3 dense_4
                                general_default2018 general_default2018_1 general_default2018_2
                                general_default2018_3 general_default2018_4
                                redock_default2018 redock_default2018_1 redock_default2018_2
                                redock_default2018_3 redock_default2018_4
  --cnn_model arg               caffe cnn model file; if not specified a default model will be used
  --cnn_weights arg             caffe cnn weights file (*.caffemodel); if not specified default weights (trained on the default model) will be used
  --cnn_resolution arg (=0.5)   resolution of grids, don't change unless you really know what you are doing
  --cnn_rotation arg (=0)       evaluate multiple rotations of pose (max 24)
  --cnn_update_min_frame        During minimization, recenter coordinate frame as ligand moves
  --cnn_freeze_receptor         Don't move the receptor with respect to a fixed coordinate system
  --cnn_mix_emp_force           Merge CNN and empirical minus forces
  --cnn_mix_emp_energy          Merge CNN and empirical energy
  --cnn_empirical_weight arg (=1)  Weight for scaling and merging empirical force and energy
  --cnn_outputdx                Dump .dx files of atom grid gradient.
  --cnn_outputxyz               Dump .xyz files of atom gradient.
  --cnn_xyzprefix arg (=gradient)  Prefix for atom gradient .xyz files
  --cnn_center_x arg            X coordinate of the CNN center
  --cnn_center_y arg            Y coordinate of the CNN center
  --cnn_center_z arg            Z coordinate of the CNN center
  --cnn_verbose                 Enable verbose output for CNN debugging

Output:
  -o [ --out ] arg              output file name, format taken from file extension
  --out_flex arg                output file for flexible receptor residues
  --log arg                     optionally, write log file
  --atom_terms arg              optionally write per-atom interaction term values
  --atom_term_data              embedded per-atom interaction terms in output sd data
  --pose_sort_order arg (=0)    How to sort docking results: CNNscore (default), CNNaffinity, Energy

Misc (optional):
  --cpu arg                     the number of CPUs to use (the default is to try to detect the number of CPUs or, failing that, use 1)
  --seed arg                    explicit random seed
  --exhaustiveness arg (=8)     exhaustiveness of the global search (roughly proportional to time)
  --num_modes arg (=9)          maximum number of binding modes to generate
  --min_rmsd_filter arg (=1)    rmsd value used to filter final poses to remove redundancy
  -q [ --quiet ]                Suppress output messages
  --addH arg                    automatically add hydrogens in ligands (on by default)
  --stripH arg                  remove hydrogens from molecule after performing atom typing for efficiency (on by default)
  --device arg (=0)             GPU device to use
  --no_gpu                      Disable GPU acceleration, even if available.

Configuration file (optional):
  --config arg                  the above options can be put here

Information (optional):
  --help                        display usage summary
  --help_hidden                 display usage summary with hidden options
  --version                     display program version
```
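
As an illustration of the configuration file option, here is a minimal sketch assuming the `option = value` file format used by AutoDock Vina and smina (the file name `dock.conf` is arbitrary):

```
# dock.conf -- hypothetical example configuration
receptor = rec.pdb
ligand = lig.sdf
autobox_ligand = orig.sdf
exhaustiveness = 16
out = docked.sdf.gz
```

The run then reduces to `gnina --config dock.conf`.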

CNN Scoring

`--cnn_scoring` determines at what points of the docking procedure the CNN scoring function is used.

* `none` - No CNNs are used for docking. The specified empirical scoring function is used throughout.
* `rescore` (default) - The CNN is used to rerank the final poses. This is the least computationally expensive CNN option.
* `refinement` - The CNN is used to refine poses after the Monte Carlo chains and for the final ranking of output poses. About 10x slower than `rescore` when using a GPU.
* `all` - The CNN is used as the scoring function throughout the whole procedure. This is extremely computationally intensive and not recommended.

The default CNN scoring function is an ensemble of 5 models selected to balance pose prediction performance and runtime: dense, general_default2018_3, dense_3, crossdock_default2018, and redock_default2018. More information on these models can be found in the papers listed above.
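
For example, following the `PREFIX_ensemble` convention from the options above, an ensemble of the crossdock_default2018 models could be requested as in this sketch (check the `--help` output for the exact model names supported by your build):

```
gnina -r rec.pdb -l lig.sdf --autobox_ligand orig.sdf --cnn crossdock_default2018_ensemble -o ensemble_docked.sdf.gz
```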

Training

Scripts to aid in training new CNN models can be found at https://github.com/gnina/scripts and sample models at https://github.com/gnina/models.

The DUD-E docked poses used in the original paper and the CrossDocked2020 set are both available for download.

License

gnina is dual licensed under GPL and Apache. The GPL license is necessitated by the use of OpenBabel (which is GPL licensed). In order to use gnina under the Apache license only, all references to OpenBabel must be removed from the source code.
