dgl

by dmlc

dmlc / dgl

Python package built to ease deep learning on graphs, on top of existing DL frameworks.

5.9K Stars 1.1K Forks Last release: about 2 months ago (0.5.2) Apache License 2.0 1.4K Commits 16 Releases

Deep Graph Library (DGL)

Documentation (Latest | Stable) | DGL at a glance | Model Tutorials | Discussion Forum | Slack Channel

DGL is an easy-to-use, high-performance, and scalable Python package for deep learning on graphs. DGL is framework agnostic: if a deep graph model is one component of an end-to-end application, the rest of the logic can be implemented in any major framework, such as PyTorch, Apache MXNet, or TensorFlow.

Figure: DGL overall architecture (v0.4)

DGL News

09/05/2020: We invite you to participate in the survey here to help DGL better fit your needs. Thanks!

08/21/2020: The new v0.5.0 release includes distributed GNN training, overhauled documentation and user guide, and several more features. We have also submitted some models to the OGB leaderboard. See our release note for more details.

06/11/2020: Amazon Shanghai AI Lab and AWS Deep Engine Science team working along with academic collaborators from the University of Minnesota, The Ohio State University, and Hunan University have created the Drug Repurposing Knowledge Graph (DRKG) and a set of machine learning tools, DGL-KE, that can be used to prioritize drugs for repurposing studies. DRKG is a comprehensive biological knowledge graph that relates human genes, compounds, biological processes, drug side effects, diseases and symptoms. DRKG includes, curates, and normalizes information from six publicly available databases and data that were collected from recent publications related to Covid-19. It has 97,238 entities belonging to 13 types of entities, and 5,874,261 triplets belonging to 107 types of relations. More about the dataset is in this blogpost.

Using DGL

Data scientists may want to apply a pretrained model to their data right away. For this, you can use DGL's application packages, formerly called Model Zoo. Application packages are developed for domain applications, as is the case for DGL-LifeScience. We will soon add model zoos for knowledge graph embedding learning and recommender systems. Here is how to use a pretrained model:

```python
from dgllife.data import Tox21
from dgllife.model import load_pretrained
from dgllife.utils import smiles_to_bigraph, CanonicalAtomFeaturizer

dataset = Tox21(smiles_to_bigraph, CanonicalAtomFeaturizer())
model = load_pretrained('GCN_Tox21')  # Pretrained model loaded
model.eval()

smiles, g, label, mask = dataset[0]
feats = g.ndata.pop('h')
label_pred = model(g, feats)
print(smiles)                    # CCOc1ccc2nc(S(N)(=O)=O)sc2c1
print(label_pred[:, mask != 0])  # Mask non-existing labels
# tensor([[ 1.4190, -0.1820,  1.2974,  1.4416,  0.6914,
#           2.0957,  0.5919,  0.7715,  1.7273,  0.2070]])
```
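The last `print` keeps only the tasks for which the molecule actually has a label. As a rough illustration of that masking step, here is a pure-Python sketch; the scores and mask are made-up values (not real Tox21 data), and `select_labelled` is a hypothetical helper, not part of DGL or DGL-LifeSci:

```python
def select_labelled(scores, mask):
    """Keep the scores whose mask entry is non-zero (i.e. a label exists)."""
    return [s for s, m in zip(scores, mask) if m != 0]


# Illustrative values only: four tasks, the third one has no label.
scores = [1.42, -0.18, 0.55, 1.30]
mask = [1, 1, 0, 1]

kept = select_labelled(scores, mask)
print(kept)  # [1.42, -0.18, 1.3]
```

This is why the tensor printed above can have fewer entries than the dataset has tasks.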

**Further reading**: DGL is released as a managed service on AWS SageMaker. See the Medium posts for an easy introduction to DGL on SageMaker ([part1](https://medium.com/@julsimon/a-primer-on-graph-neural-networks-with-amazon-neptune-and-the-deep-graph-library-5ce64984a276) and [part2](https://medium.com/@julsimon/deep-graph-library-part-2-training-on-amazon-sagemaker-54d318dfc814)).

Researchers can start from the growing list of models implemented in DGL. Developing new models does not mean that you have to start from scratch. Instead, you can reuse many pre-built modules. Here is how to get a standard two-layer graph convolutional model with a pre-built GraphConv module:

```python
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn.pytorch import GraphConv

# Build a two-layer GCN with ReLU as the activation in between
class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.gcn_layer1 = GraphConv(in_feats, h_feats)
        self.gcn_layer2 = GraphConv(h_feats, num_classes)

    def forward(self, graph, inputs):
        h = self.gcn_layer1(graph, inputs)
        h = F.relu(h)
        h = self.gcn_layer2(graph, h)
        return h
```
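To give a feel for what a graph convolution does with the graph structure, here is a minimal pure-Python sketch of the symmetrically normalized neighbour aggregation from the original GCN formulation. It assumes an undirected graph with self-loops already added and leaves out the learned linear transform and the ReLU. `gcn_propagate` is a hypothetical helper written for this illustration, and whether it matches `GraphConv`'s default settings exactly is an assumption.

```python
import math


def gcn_propagate(adj, feats):
    """One propagation step: h_i = sum_j feats[j] / sqrt(deg_i * deg_j).

    adj maps each node to its neighbours (self-loops included);
    feats maps each node to a list of floats.
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    out = {}
    for v, nbrs in adj.items():
        dim = len(feats[v])
        acc = [0.0] * dim
        for u in nbrs:
            w = 1.0 / math.sqrt(deg[v] * deg[u])  # symmetric normalization
            for k in range(dim):
                acc[k] += w * feats[u][k]
        out[v] = acc
    return out


# Path graph 0 - 1 - 2 with self-loops and one feature per node.
adj = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
feats = {0: [1.0], 1: [2.0], 2: [3.0]}
h = gcn_propagate(adj, feats)
print(h)
```

Stacking two such steps, each followed by a linear layer, gives every node a view of its two-hop neighbourhood, which is what the two-layer model above relies on.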

One level down, you may want to design your own module. DGL offers a succinct message-passing interface (see tutorial here). Here is how Graph Attention Network (GAT) is implemented (complete code). Of course, you can also find GAT as a pre-built module, GATConv:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Define a GAT layer
class GATLayer(nn.Module):
    def __init__(self, in_feats, out_feats):
        super(GATLayer, self).__init__()
        self.linear_func = nn.Linear(in_feats, out_feats, bias=False)
        self.attention_func = nn.Linear(2 * out_feats, 1, bias=False)

    def edge_attention(self, edges):
        # Unnormalized attention score from the source and destination features
        concat_z = torch.cat([edges.src['z'], edges.dst['z']], dim=1)
        src_e = self.attention_func(concat_z)
        src_e = F.leaky_relu(src_e)
        return {'e': src_e}

    def message_func(self, edges):
        return {'z': edges.src['z'], 'e': edges.data['e']}

    def reduce_func(self, nodes):
        # Softmax over incoming edges, then attention-weighted sum of messages
        a = F.softmax(nodes.mailbox['e'], dim=1)
        h = torch.sum(a * nodes.mailbox['z'], dim=1)
        return {'h': h}

    def forward(self, graph, h):
        z = self.linear_func(h)
        graph.ndata['z'] = z
        graph.apply_edges(self.edge_attention)
        graph.update_all(self.message_func, self.reduce_func)
        return graph.ndata.pop('h')
```
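The reduce step is the heart of the layer: for each destination node, the incoming attention scores are turned into weights with a softmax, and the messages are summed with those weights. A minimal pure-Python sketch of that computation for a single node follows; the numbers are illustrative and `attention_aggregate` is a hypothetical helper, not a DGL function.

```python
import math


def attention_aggregate(scores, messages):
    """Softmax the incoming-edge scores, then take the weighted sum of messages."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(messages[0])
    h = [sum(a * msg[k] for a, msg in zip(alphas, messages)) for k in range(dim)]
    return alphas, h


# One destination node with three incoming edges.
scores = [0.0, 0.0, 0.0]  # equal scores give uniform attention
messages = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
alphas, h = attention_aggregate(scores, messages)
print(alphas, h)
```

With equal scores the result is simply the mean of the messages; raising one score shifts weight toward that neighbour.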

## Performance and Scalability

Microbenchmark on speed and memory usage: While leaving tensor and autograd functions to backend frameworks (e.g. PyTorch, MXNet, and TensorFlow), DGL aggressively optimizes storage and computation with its own kernels. Here's a comparison to another popular package -- PyTorch Geometric (PyG). The short story is that raw speed is similar, but DGL has much better memory management.

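| Dataset | Model | Accuracy | Time (PyG) | Time (DGL) | Memory (PyG) | Memory (DGL) |
| -------- | ----- | ------------ | ----- | ----- | ---- | ---- |
| Cora | GCN | 81.31 ± 0.88 | 0.478 | 0.666 | 1.1 | 1.1 |
| Cora | GAT | 83.98 ± 0.52 | 1.608 | 1.399 | 1.2 | 1.1 |
| CiteSeer | GCN | 70.98 ± 0.68 | 0.490 | 0.674 | 1.1 | 1.1 |
| CiteSeer | GAT | 69.96 ± 0.53 | 1.606 | 1.399 | 1.3 | 1.1 |
| PubMed | GCN | 79.00 ± 0.41 | 0.491 | 0.690 | 1.1 | 1.1 |
| PubMed | GAT | 77.65 ± 0.32 | 1.946 | 1.393 | 1.6 | 1.1 |
| Reddit | GCN | 93.46 ± 0.06 | OOM | 28.6 | OOM | 11.7 |
| Reddit-S | GCN | N/A | 29.12 | 9.44 | 15.7 | 3.6 |

Table: Training time (in seconds) for 200 epochs and memory consumption (GB)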
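To make the comparison concrete, the following sketch computes PyG-to-DGL ratios for a few rows, with the numbers copied from the table above. The `ratios` helper is hypothetical and exists only for this illustration.

```python
# (dataset, model): (PyG time s, DGL time s, PyG memory GB, DGL memory GB)
rows = {
    ("PubMed", "GCN"): (0.491, 0.690, 1.1, 1.1),
    ("PubMed", "GAT"): (1.946, 1.393, 1.6, 1.1),
    ("Reddit-S", "GCN"): (29.12, 9.44, 15.7, 3.6),
}


def ratios(pyg_time, dgl_time, pyg_mem, dgl_mem):
    """Return (PyG time / DGL time, PyG memory / DGL memory); > 1 favours DGL."""
    return pyg_time / dgl_time, pyg_mem / dgl_mem


for key, row in rows.items():
    t, m = ratios(*row)
    print(key, f"time x{t:.2f}", f"memory x{m:.2f}")
```

On these rows, DGL is slower for GCN on PubMed (ratio about 0.71), while on Reddit-S it is about 3.1 times faster and uses about 4.4 times less memory.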

Here is another comparison of DGL on TensorFlow backend with other TF-based GNN tools (training time in seconds for one epoch):

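| Dataset | Model | DGL | GraphNet | tf_geometric |
| ------- | ----- | ------ | ------ | ------ |
| Cora | GCN | 0.0148 | 0.0152 | 0.0192 |
| Reddit | GCN | 0.1095 | OOM | OOM |
| PubMed | GCN | 0.0156 | 0.0553 | 0.0185 |
| PPI | GCN | 0.09 | 0.16 | 0.21 |
| Cora | GAT | 0.0442 | n/a | 0.058 |
| PPI | GAT | 0.398 | n/a | 0.752 |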

Efficient memory use allows DGL to push the limit of single-GPU performance, as seen in the images below.

Scalability: DGL makes full use of multiple GPUs, both on a single machine and across clusters, to speed up training, and it outperforms the alternatives, as seen in the images below.

Further reading: A detailed comparison of DGL and other graph alternatives can be found here.

DGL Models and Applications

DGL for research

Overall there are 30+ models implemented by using DGL:

DGL for domain applications

DGL for NLP/CV problems

We are currently in Beta stage. More features and improvements are coming.

Installation

DGL should work on

  • all Linux distributions no earlier than Ubuntu 16.04
  • macOS X
  • Windows 10

DGL requires Python 3.6 or later.

Right now, DGL works on PyTorch 1.5.0+, MXNet 1.6+, and TensorFlow 2.3+.
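As a rough pre-installation check, the sketch below compares a version string against the minimums quoted above. The helper names and the dictionary keys are assumptions made for this example; in practice the version string would come from the installed framework's `__version__` attribute.

```python
import sys

# Minimum versions quoted above; the keys are just labels for this sketch.
MINIMUMS = {"pytorch": (1, 5, 0), "mxnet": (1, 6), "tensorflow": (2, 3)}


def parse_version(text):
    """Turn '1.7.1+cu110' into (1, 7, 1), ignoring local or pre-release suffixes."""
    parts = []
    for piece in text.split("+")[0].split("."):
        lead = ""
        for ch in piece:
            if not ch.isdigit():
                break
            lead += ch
        if not lead:
            break
        parts.append(int(lead))
    return tuple(parts)


def pad(version, length=3):
    """Right-pad with zeros so that (1, 6) compares equal to (1, 6, 0)."""
    return tuple(version) + (0,) * (length - len(version))


def meets_minimum(framework, version_text):
    return pad(parse_version(version_text)) >= pad(MINIMUMS[framework])


python_ok = sys.version_info >= (3, 6)
print(python_ok, meets_minimum("pytorch", "1.7.1+cu110"))
```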

Using anaconda

```bash
conda install -c dglteam dgl              # cpu version
conda install -c dglteam dgl-cuda9.0      # CUDA 9.0
conda install -c dglteam dgl-cuda9.2      # CUDA 9.2
conda install -c dglteam dgl-cuda10.0     # CUDA 10.0
conda install -c dglteam dgl-cuda10.1     # CUDA 10.1
conda install -c dglteam dgl-cuda10.2     # CUDA 10.2
```

Using pip

| | Latest Nightly Build Version | Stable Version |
|-----------|-------------------------------|-------------------------|
| CPU | `pip install --pre dgl` | `pip install dgl` |
| CUDA 9.0 | `pip install --pre dgl-cu90` | `pip install dgl-cu90` |
| CUDA 9.2 | `pip install --pre dgl-cu92` | `pip install dgl-cu92` |
| CUDA 10.0 | `pip install --pre dgl-cu100` | `pip install dgl-cu100` |
| CUDA 10.1 | `pip install --pre dgl-cu101` | `pip install dgl-cu101` |
| CUDA 10.2 | `pip install --pre dgl-cu102` | `pip install dgl-cu102` |
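The package names follow a simple pattern: `dgl` for CPU, and `dgl-cu` plus the CUDA version without the dot for GPU builds. A small sketch of that mapping, limited to the versions listed above, could look like this; both helper functions are hypothetical and not part of DGL.

```python
SUPPORTED_CUDA = {"9.0", "9.2", "10.0", "10.1", "10.2"}


def dgl_pip_package(cuda_version=None):
    """Map a CUDA version string such as '10.1' to the pip package name."""
    if cuda_version is None:
        return "dgl"  # CPU build
    if cuda_version not in SUPPORTED_CUDA:
        raise ValueError(f"no DGL package listed for CUDA {cuda_version}")
    return "dgl-cu" + cuda_version.replace(".", "")


def pip_command(cuda_version=None, nightly=False):
    """Build the install command; nightly builds add the --pre flag."""
    flag = "--pre " if nightly else ""
    return f"pip install {flag}{dgl_pip_package(cuda_version)}"


print(pip_command("10.1"))                # pip install dgl-cu101
print(pip_command("9.2", nightly=True))   # pip install --pre dgl-cu92
```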

Built from source code

Refer to the guide here.

DGL Major Releases

| Releases | Date | Features |
|-----------|--------|-------------------------|
| v0.4.3 | 03/31/2020 | - TensorFlow support <br> - DGL-KE <br> - DGL-LifeSci <br> - Heterograph sampling APIs (experimental) |
| v0.4.2 | 01/24/2020 | - Heterograph support <br> - TensorFlow support (experimental) <br> - MXNet GNN modules |
| v0.3.1 | 08/23/2019 | - APIs for GNN modules <br> - Model zoo (DGL-Chem) <br> - New installation |
| v0.2 | 03/09/2019 | - Graph sampling APIs <br> - Speed improvement |
| v0.1 | 12/07/2018 | - Basic DGL APIs <br> - PyTorch and MXNet support <br> - GNN model examples and tutorials |

New to Deep Learning and Graph Deep Learning?

Check out the open source book Dive into Deep Learning.

For those who are new to graph neural networks, please see the basics of DGL.

For readers looking for more advanced, realistic, end-to-end examples, please see the model tutorials.

Contributing

Please let us know if you encounter a bug or have any suggestions by filing an issue.

We welcome all contributions from bug fixes to new features and extensions.

We expect all contributions to be discussed in the issue tracker and to go through PRs. Please refer to our contribution guide.

Cite

If you use DGL in a scientific publication, we would appreciate citations to the following paper:

@article{wang2019dgl,
    title={Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks},
    author={Minjie Wang and Da Zheng and Zihao Ye and Quan Gan and Mufei Li and Xiang Song and Jinjing Zhou and Chao Ma and Lingfan Yu and Yu Gai and Tianjun Xiao and Tong He and George Karypis and Jinyang Li and Zheng Zhang},
    year={2019},
    journal={arXiv preprint arXiv:1909.01315}
}

The Team

DGL is developed and maintained by NYU, NYU Shanghai, AWS Shanghai AI Lab, and AWS MXNet Science Team.

License

DGL uses Apache License 2.0.
