About the developer

ischlag

Description

An example of data parallelism and asynchronous parameter updates in TensorFlow.

Distributed Tensorflow 1.2 Example (DEPRECATED)

Uses data parallelism with shared model parameters, updating the parameters asynchronously. See the comment in the script for the changes needed to make the parameter updates synchronous (the author is not sure that the synchronous part is implemented correctly, though).
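To make the difference between asynchronous and synchronous updates concrete, here is a minimal pure-Python sketch. It is not the TensorFlow code from this repository: the `ParameterServer` class, the helper functions and the toy problem (fitting a single weight to y = 3x) are all hypothetical and only illustrate the idea. Threads stand in for worker machines.

```python
import threading


def make_shards(num_workers):
    """Split a toy dataset for y = 3x into one shard per worker."""
    data = [(k / 10.0, 3.0 * k / 10.0) for k in range(1, 31)]
    return [data[i::num_workers] for i in range(num_workers)]


class ParameterServer:
    """Holds the single shared weight; a stand-in for a real parameter server."""

    def __init__(self, lr=0.01):
        self.w = 0.0
        self.lr = lr
        self.updates = 0
        self._lock = threading.Lock()

    def pull(self):
        with self._lock:
            return self.w

    def push(self, grad):
        # Applied immediately, without waiting for the other workers.
        with self._lock:
            self.w -= self.lr * grad
            self.updates += 1


def run_worker(ps, shard, epochs):
    for _ in range(epochs):
        for x, y in shard:
            w = ps.pull()                   # may already be stale when pushed
            ps.push(2.0 * (w * x - y) * x)  # gradient of (w*x - y)**2


def train_async(num_workers=3, epochs=20):
    """Asynchronous: every worker updates the shared weight on its own schedule."""
    ps = ParameterServer()
    threads = [threading.Thread(target=run_worker, args=(ps, shard, epochs))
               for shard in make_shards(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ps


def train_sync(num_workers=3, epochs=20, lr=0.01):
    """Synchronous: gradients from all workers are averaged into one update."""
    w = 0.0
    shards = make_shards(num_workers)
    for _ in range(epochs):
        for batch in zip(*shards):  # one sample from every worker per step
            grads = [2.0 * (w * x - y) * x for x, y in batch]
            w -= lr * sum(grads) / len(grads)
    return w
```

In the asynchronous version a worker may compute its gradient from a weight that another worker has changed in the meantime, which is the trade-off this example accepts in exchange for not waiting. In the synchronous version every step uses gradients computed from the same weight.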

Trains a simple sigmoid neural network on MNIST for 20 epochs on three worker machines and one parameter server. The goal was not to achieve high accuracy but to get to know TensorFlow.

Run it like this:

First, replace the hardcoded host names with your own, then run the following commands on the respective machines.

pc-01$ python example.py --job_name="ps" --task_index=0 
pc-02$ python example.py --job_name="worker" --task_index=0 
pc-03$ python example.py --job_name="worker" --task_index=1 
pc-04$ python example.py --job_name="worker" --task_index=2 
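The commands above pair a job name with a task index, and that pair selects one host from the hardcoded cluster definition. The sketch below shows that mapping in plain Python. The dictionary layout mirrors what TensorFlow 1.x accepts in `tf.train.ClusterSpec`, but the port `2222`, the `CLUSTER` name and the `task_address` helper are assumptions made for illustration; check the script for the actual values.

```python
# Hypothetical cluster layout; replace the host names and port with your own.
CLUSTER = {
    "ps": ["pc-01:2222"],
    "worker": ["pc-02:2222", "pc-03:2222", "pc-04:2222"],
}


def task_address(job_name, task_index, cluster=CLUSTER):
    """Return the host:port that a --job_name/--task_index pair refers to."""
    if job_name not in cluster:
        raise ValueError("unknown job_name: %r" % (job_name,))
    hosts = cluster[job_name]
    if not 0 <= task_index < len(hosts):
        raise ValueError("task_index %d out of range for job %r"
                         % (task_index, job_name))
    return hosts[task_index]
```

For example, `task_address("worker", 1)` returns the entry for the second worker, which should be the machine where you run the command with `--job_name="worker" --task_index=1`.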

Thanks to snowsquizy for updating the script to TensorFlow 1.2.
