Alphafold2 - Pytorch (wip)

To eventually become an unofficial working Pytorch implementation of Alphafold2, the breathtaking attention network that solved CASP14. Will be gradually implemented as more details of the architecture are released.

Once this is replicated, I intend to fold all available amino acid sequences out there in-silico and release it as an academic torrent, to further science. If you are interested in replication efforts, please drop by #alphafold at this Discord channel


$ pip install alphafold2-pytorch


import torch
from alphafold2_pytorch import Alphafold2
from alphafold2_pytorch.utils import MDScaling, center_distogram_torch

model = Alphafold2(
    dim = 256,
    depth = 2,
    heads = 8,
    dim_head = 64,
    reversible = False  # set this to True for fully reversible self / cross attention for the trunk
).cuda()

seq = torch.randint(0, 21, (1, 128)).cuda()
msa = torch.randint(0, 21, (1, 5, 64)).cuda()
mask = torch.ones_like(seq).bool().cuda()
msa_mask = torch.ones_like(msa).bool().cuda()

distogram = model(
    seq,
    msa,
    mask = mask,
    msa_mask = msa_mask
)  # (1, 128, 128, 37)

distances, weights = center_distogram_torch(distogram)

coords_3d, _ = MDScaling(
    distances,
    weights = weights,
    iters = 200,
    fix_mirror = 0
)
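
For intuition, center_distogram_torch collapses the 37-bin distogram into one expected distance and one confidence weight per residue pair, which MDScaling then embeds into 3D coordinates. The snippet below is only a rough sketch of that idea, not the library's actual implementation; the uniform 2-20 angstrom bin range is an assumption for illustration.

import torch

def naive_expected_distances(distogram, min_dist = 2., max_dist = 20.):
    # distogram: (batch, seq, seq, bins) of unnormalized logits
    # assumed for illustration: bins uniformly cover [min_dist, max_dist] angstroms
    bins = distogram.shape[-1]
    centers = torch.linspace(min_dist, max_dist, bins, device = distogram.device)
    probs = distogram.softmax(dim = -1)
    distances = (probs * centers).sum(dim = -1)                        # expectation over bin centers
    entropy = -(probs * probs.clamp(min = 1e-8).log()).sum(dim = -1)
    weights = 1. / (1. + entropy)                                      # sharper distributions get more weight
    return distances, weights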

Sparse Attention

You can train with Microsoft Deepspeed's Sparse Attention, but you will have to endure the installation process. It is a two-step process.

First, you need to install Deepspeed with Sparse Attention

$ sh

Next, you need to install the pip package

$ pip install triton

If both of the above succeeded, now you can train with Sparse Attention!

Sadly, the sparse attention is only supported for self attention, and not cross attention. I will bring in a different solution for making cross attention performant.

model = Alphafold2(
    dim = 256,
    depth = 12,
    heads = 8,
    dim_head = 64,
    max_seq_len = 2048,                   # the maximum sequence length, required for sparse attention. the input cannot exceed what is set here
    sparse_self_attn = (True, False) * 6  # interleave sparse and full attention for all 12 layers
)

Memory Compressed Attention

To save on memory for cross attention, you can set a compression ratio for the key / values, following the scheme laid out in this paper. A compression ratio of 2-4 is usually acceptable.

model = Alphafold2(
    dim = 256,
    depth = 12,
    heads = 8,
    dim_head = 64,
    cross_attn_compress_ratio = 3
)
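
Concretely, a compression ratio of 3 means the keys and values being cross-attended to are downsampled roughly threefold along the sequence (via strided convolution in the referenced paper), so the memory used by cross attention shrinks by about the same factor.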

Equivariant Attention

There are two equivariant self attention libraries that I have prepared for the purposes of replication. One is the implementation by Fabian Fuchs as detailed in a speculative blogpost. The other is from a recent paper from Deepmind, claiming their approach is better than using irreducible representations.
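
How a backend is chosen is not shown here. Purely as an illustrative sketch, selection might be exposed as a constructor flag; the structure_module_type keyword and its values below are assumptions, not a confirmed part of this library's API.

model = Alphafold2(
    dim = 256,
    depth = 12,
    heads = 8,
    dim_head = 64,
    structure_module_type = 'se3'   # hypothetical flag: 'se3' (irreducible representations, Fuchs et al.) vs. 'en' (Deepmind-style)
)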

Miscellaneous Settings

Below are some miscellaneous settings for cutting down on attention computation

model = Alphafold2(
    dim = 256,
    depth = 12,
    heads = 8,
    dim_head = 64,
    inter_msa_self_attn = False   # turns off self-attention across MSAs. each MSA will only attend internally
)


$ python test


This library will use the awesome work by Jonathan King at this repository.

To install

$ pip install git+


$ git clone
$ cd sidechainnet && pip install -e .
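
Once installed, training data would typically be pulled in through sidechainnet's load helper. The snippet below is a rough sketch; the keyword arguments and batch contents are assumptions based on sidechainnet's documented interface and may differ between versions.

import sidechainnet as scn

# rough sketch (assumed interface): load a SidechainNet release as pytorch dataloaders
dataloaders = scn.load(casp_version = 12, with_pytorch = 'dataloaders')

for batch in dataloaders['train']:
    # each batch bundles sequences, angles and coordinates for supervision
    break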


Developments from competing labs

External packages

  • Final step - Fast Relax - Installation Instructions (a usage sketch follows this list):
    • Download the pyrosetta wheel from: (select the appropriate version) - beware, the file is heavy (approx. 1.2 GB)
      • Ask for the username and password in the Discord
    • Bash >
      cd downloads_folder
      pip install pyrosetta_wheel_filename.whl
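
For reference, a minimal Fast Relax run with pyrosetta might look like the sketch below; the PDB filenames are placeholders, and this is only an illustration of the final relaxation step rather than part of this library.

import pyrosetta
from pyrosetta.rosetta.protocols.relax import FastRelax

pyrosetta.init()

# load the predicted structure (placeholder filename) and relax it with the full-atom score function
pose = pyrosetta.pose_from_pdb('predicted_structure.pdb')
scorefxn = pyrosetta.get_fa_scorefxn()

relax = FastRelax()
relax.set_scorefxn(scorefxn)
relax.apply(pose)

pose.dump_pdb('relaxed_structure.pdb')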

OpenMM Amber


Citations

@misc{jumper2020alphafold2,
    title   = {Alphafold2},
    author  = {John Jumper},
    year    = {2020},
    archivePrefix = {arXiv},
    primaryClass = {q-bio.BM}
}

@misc{king2020sidechainnet,
    title   = {SidechainNet: An All-Atom Protein Structure Dataset for Machine Learning},
    author  = {Jonathan E. King and David Ryan Koes},
    year    = {2020},
    eprint  = {2010.08162},
    archivePrefix = {arXiv},
    primaryClass = {q-bio.BM}
}

@misc{alquraishi2019proteinnet,
    title   = {ProteinNet: a standardized data set for machine learning of protein structure},
    author  = {Mohammed AlQuraishi},
    year    = {2019},
    eprint  = {1902.00249},
    archivePrefix = {arXiv},
    primaryClass = {q-bio.BM}
}

@misc{gomez2017reversible,
    title   = {The Reversible Residual Network: Backpropagation Without Storing Activations},
    author  = {Aidan N. Gomez and Mengye Ren and Raquel Urtasun and Roger B. Grosse},
    year    = {2017},
    eprint  = {1707.04585},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
