BERT MULTI GPU ON ONE MACHINE WITHOUT HOROVOD

LICENSE

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

PRINCIPLE

More GPUs mean more data in a batch, and the gradients of a batch are averaged for back-propagation. So more GPUs effectively act like a lower learning rate, and a lower learning rate results in better pre-training performance.
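In other words, each GPU runs its own copy (tower) of the model on its own slice of the global batch, and the per-tower gradients are averaged before a single optimizer update. The sketch below is illustrative only, not this repo's code: the toy linear model and names like N_GPUS and PER_GPU_BATCH are placeholders, but the tower-and-average pattern is the standard TF 1.x way to train on several GPUs in one machine without Horovod.

```python
import tensorflow as tf  # TensorFlow 1.x graph-mode API (the repo targets TF 1.14)

N_GPUS = 2          # stands in for the n_gpus value in run_pretraining_gpu.py
PER_GPU_BATCH = 32  # per-GPU batch size; the global batch is N_GPUS * PER_GPU_BATCH


def average_gradients(tower_grads):
    """Average the (gradient, variable) pairs produced by each GPU tower."""
    averaged = []
    for grads_and_vars in zip(*tower_grads):   # one tuple of pairs per variable
        grads = [g for g, _ in grads_and_vars]
        mean_grad = tf.reduce_mean(tf.stack(grads, axis=0), axis=0)
        averaged.append((mean_grad, grads_and_vars[0][1]))
    return averaged


# Placeholders for one *global* batch; each GPU gets an equal slice of it.
x = tf.placeholder(tf.float32, [N_GPUS * PER_GPU_BATCH, 128])
y = tf.placeholder(tf.float32, [N_GPUS * PER_GPU_BATCH, 1])
x_slices = tf.split(x, N_GPUS)
y_slices = tf.split(y, N_GPUS)

optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
tower_grads = []
for i in range(N_GPUS):
    # AUTO_REUSE makes every tower share a single copy of the model weights.
    with tf.device("/gpu:%d" % i), tf.variable_scope("model", reuse=tf.AUTO_REUSE):
        logits = tf.layers.dense(x_slices[i], 1)                # toy model
        loss = tf.losses.mean_squared_error(y_slices[i], logits)
        tower_grads.append(optimizer.compute_gradients(loss))

# One optimizer step driven by the gradients averaged over all GPUs.
train_op = optimizer.apply_gradients(average_gradients(tower_grads))
```

Because the averaging is done with ordinary graph ops on one machine, no extra communication library such as Horovod is needed.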

REQUIREMENT

Python 3

TensorFlow 1.14

TRAINING

0. Edit the input and output file names in create_pretraining_data.py and run_pretraining_gpu.py (see the sketch after this list).

1. Run create_pretraining_data.py.

2. Run run_pretraining_gpu.py.
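In the upstream BERT scripts these file names are tf.flags definitions, so "editing the file names" presumably means changing the flag defaults inside the scripts. The following is a minimal sketch under that assumption; the flag names (input_file, output_file, vocab_file) are copied from the upstream google-research/bert create_pretraining_data.py and should be confirmed against this repo:

```python
import tensorflow as tf  # TensorFlow 1.x

flags = tf.flags

# Assumed flag names, following the upstream BERT create_pretraining_data.py;
# point them at your own corpus, output path, and WordPiece vocabulary.
flags.DEFINE_string("input_file", "./sample_text.txt",
                    "Raw text input (a comma-separated list of files is allowed).")
flags.DEFINE_string("output_file", "./tf_examples.tfrecord",
                    "Where the pre-training TFRecord examples are written.")
flags.DEFINE_string("vocab_file", "./vocab.txt",
                    "WordPiece vocabulary used to tokenize the input.")

# run_pretraining_gpu.py then needs its own input flag pointed at the
# TFRecords produced above (e.g. ./tf_examples.tfrecord) and an output
# directory for checkpoints.
```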

PARAMETERS

Edit n_gpus in run_pretraining_gpu.py.

The batch size set in the script is the per-GPU batch size, not the global batch size.
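For example, with n_gpus = 4 and a per-GPU batch size of 32, each training step effectively consumes a global batch of 4 × 32 = 128 examples.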

DATA

In sample_text.txt, each sentence ends with \n and paragraphs are separated by an empty line.
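An illustrative snippet in that format (not the actual contents of sample_text.txt):

```
This is the first sentence of the first paragraph.
This is the second sentence of the first paragraph.

This is the first sentence of the second paragraph.
```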

EXPERIMENT RESULT

On the Quora Question Pairs (English) dataset:

Official BERT: ACC 91.2, AUC 96.9

This BERT, pre-trained to a loss of 2.05: ACC 90.1, AUC 96.3
