A tutorial on how to train a hand detector with TensorFlow Object Detection API
This is a tutorial on how to train a 'hand detector' with the TensorFlow Object Detection API. This README outlines how to set up everything and train the object detection model locally. For a more detailed description of each step, refer to the accompanying blog post.
For reference, the code in this repository has been tested on a desktop PC with:
This tutorial uses python3 for training and testing the TensorFlow object detection models. Follow the steps below to set up the environment for training the models. Make sure the tensorflow python3 package is already installed on the system.
$ cd ~/project
$ git clone https://github.com/jkjung-avt/hand-detection-tutorial.git
$ cd hand-detection-tutorial
$ sudo pip3 install -r requirements.txt
If you have trouble with sudo, run pip3 install --user -r requirements.txt instead.
Make sure that model_builder_test.py runs and finishes without error before continuing.
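As an extra sanity check, you can ask Python whether it can find the packages the tutorial depends on. This is a minimal sketch that assumes the Object Detection API is importable as the top-level package object_detection, which is the usual layout when the API's research directory is on PYTHONPATH. The helper name missing_modules is hypothetical. This check complements model_builder_test.py but does not replace it.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that Python cannot find.

    Only top-level names (no dots) should be passed, so that a missing
    parent package cannot raise ModuleNotFoundError inside find_spec().
    """
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, missing_modules(["tensorflow", "object_detection"]) should return an empty list on a correctly prepared system.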
$ python3 prepare_egohands.py
The prepare_egohands.py script downloads the 'egohands' dataset and converts its annotations to KITTI format. When it finishes, the following files should be present in the folder. Note that the 'egohands' dataset contains 4,800 jpg images in total.
./egohands_data.zip
./egohands
├── (egohands dataset unzipped)
└── ......
./egohands_kitti_formatted
├── images
│   ├── CARDS_COURTYARD_B_T_frame_0011.jpg
│   ├── ......
│   └── PUZZLE_OFFICE_T_S_frame_2697.jpg
└── labels
    ├── CARDS_COURTYARD_B_T_frame_0011.txt
    ├── ......
    └── PUZZLE_OFFICE_T_S_frame_2697.txt
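A KITTI label file stores one object per line as 15 space-separated fields. These are the class name, truncation, occlusion, the observation angle alpha, the 2D bounding box (left, top, right, bottom in pixels), and then the 3D dimensions, 3D location and rotation_y. The 3D fields are not needed for 2D hand detection. The sketch below parses such a line and checks that every image under egohands_kitti_formatted/images has a matching label file. The helper names are hypothetical, and the sketch assumes the converted files follow the standard KITTI field order. prepare_egohands.py may fill the unused fields with placeholder values.

```python
from pathlib import Path


def parse_kitti_line(line):
    """Parse one KITTI label line into (class_name, (left, top, right, bottom)).

    Only the first 8 fields are needed for a 2D box; the trailing 3D fields
    are ignored.
    """
    fields = line.split()
    if len(fields) < 8:
        raise ValueError("expected at least 8 fields, got %d" % len(fields))
    class_name = fields[0]
    # Fields 4..7 hold the 2D bounding box in pixel coordinates.
    left, top, right, bottom = (float(v) for v in fields[4:8])
    return class_name, (left, top, right, bottom)


def images_without_labels(root):
    """Return sorted stems of root/images/*.jpg that have no root/labels/<stem>.txt."""
    root = Path(root)
    image_stems = {p.stem for p in (root / "images").glob("*.jpg")}
    label_stems = {p.stem for p in (root / "labels").glob("*.txt")}
    return sorted(image_stems - label_stems)
```

After prepare_egohands.py completes, images_without_labels("egohands_kitti_formatted") should return an empty list.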
The create_tfrecords.py script splits the jpg images into 'train' (4,300) and 'val' (500) sets and then generates one TFRecord file per set, with the validation set written to data/egohands_val.tfrecord. This process might take a few minutes. The resulting TFRecord files are roughly 1.1GB (train) and 132MB (val) in size.
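A train/val split like the one described above can be done with a seeded shuffle followed by slicing. This is only an illustration. The function name and the fixed seed are assumptions, and create_tfrecords.py may select its 500 validation images in a different way.

```python
import random


def split_train_val(filenames, num_val=500, seed=0):
    """Return (train, val) lists, with num_val files picked by a seeded shuffle."""
    if num_val > len(filenames):
        raise ValueError("num_val is larger than the number of files")
    # Sort first so the split does not depend on the order of the input list.
    shuffled = sorted(filenames)
    random.Random(seed).shuffle(shuffled)
    return shuffled[num_val:], shuffled[:num_val]
```

With the 4,800 egohands images as input, this returns 4,300 training and 500 validation filenames. The two sets are disjoint, and the fixed seed makes the split the same on every run.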
(Optional) Review the model config file and modify it if necessary. For example, open configs/ssd_mobilenet_v1_egohands.config in an editor and adjust the settings you want to change.
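The config files use the protobuf text format. If you only need to change one scalar field from a script, such as num_steps in the train_config block, a regular-expression substitution is often enough. The helper below is hypothetical and is not part of the repository. It assumes the field appears exactly once as 'num_steps: <integer>'. For larger changes, editing the file by hand is safer.

```python
import re

# A line such as "  num_steps: 20000" (leading whitespace kept in group 1).
_NUM_STEPS_RE = re.compile(r"^(\s*num_steps:\s*)\d+", re.MULTILINE)


def set_num_steps(config_text, num_steps):
    """Return config_text with its single 'num_steps: N' entry set to num_steps."""
    new_text, count = _NUM_STEPS_RE.subn(
        lambda m: m.group(1) + str(num_steps), config_text
    )
    if count != 1:
        raise ValueError("expected exactly one num_steps entry, found %d" % count)
    return new_text
```

You would read the .config file into a string, pass it through set_num_steps(), and write the result back. The count check prevents silently editing the wrong field, or no field at all.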
Start training the model by invoking ./train.sh. For example, to train the detector based on ssd_mobilenet_v1, run:
$ ./train.sh ssd_mobilenet_v1_egohands
The training is set to run for 20,000 iterations. It takes roughly 2 hours to finish on the desktop PC listed above.
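Those two figures give an average training throughput of about 20,000 / 7,200 s ≈ 2.8 iterations per second on that PC. The hypothetical helpers below do this arithmetic and estimate the wall-clock time for a different number of iterations. The estimate assumes constant throughput, which will not hold across other models (for example, faster_rcnn_inception_v2_egohands) or other GPUs.

```python
def iterations_per_second(iterations, seconds):
    """Average training throughput over a run."""
    return iterations / seconds


def estimated_hours(iterations, its_per_sec):
    """Wall-clock hours needed for `iterations` at a constant throughput."""
    return iterations / its_per_sec / 3600.0
```

For example, iterations_per_second(20000, 2 * 3600) is about 2.78, so doubling num_steps to 40,000 would take roughly 4 hours on the same machine.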
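If you have multiple GPUs, you can choose which one to train on with the CUDA_VISIBLE_DEVICES environment variable. For example, the following command trains the faster_rcnn_inception_v2_egohands model on the 2nd GPU (GPU #1).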
$ CUDA_VISIBLE_DEVICES=1 ./train.sh faster_rcnn_inception_v2_egohands
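You can also select the GPU from a Python driver script by passing a modified environment to subprocess. This is a sketch that assumes the script runs from the repository root, and the helper names are hypothetical. CUDA_VISIBLE_DEVICES renumbers the visible devices, so inside the training process the selected GPU #1 appears as device 0.

```python
import os
import subprocess


def env_for_gpu(gpu_id, base_env=None):
    """Return a copy of the environment that exposes only the given GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def train_on_gpu(model_name, gpu_id):
    """Run ./train.sh <model_name> with only GPU `gpu_id` visible and wait for it."""
    return subprocess.run(["./train.sh", model_name], env=env_for_gpu(gpu_id), check=True)
```

Calling train_on_gpu("faster_rcnn_inception_v2_egohands", 1) corresponds to the shell command above. Because env_for_gpu() copies the environment, the parent process's own settings stay unchanged.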
To monitor the training progress, run tensorboard in another terminal.
$ cd ~/project/hand-detection-tutorial
$ tensorboard --logdir=ssd_mobilenet_v1_egohands
Then open http://localhost:6006 in a browser on the training PC. (You can also replace localhost with the IP address of the training PC and monitor the training remotely.)
After training, evaluate the model with the ./eval.sh script. For example:
# similar to train.sh, use 'CUDA_VISIBLE_DEVICES' to specify GPU
$ ./eval.sh ssd_mobilenet_v1_egohands
Here is an example of the evaluation output. Of all these numbers, the author pays the most attention to the 'AP @ IoU=0.50' value (0.967).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.681
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.967
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.809
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.079
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.313
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.717
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.258
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.736
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.742
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.118
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.466
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.774
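Two things help when reading these numbers. First, 'IoU=0.50' means a detection counts as correct only when it matches a ground-truth box with an intersection-over-union of at least 0.5. Second, the summary lines are easy to extract in a script, for example to compare several trained models. The sketch below shows both. The function names are hypothetical, and the parser assumes the line layout shown above.

```python
import re

# Matches e.g. "Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.967"
_METRIC_RE = re.compile(
    r"Average (?:Precision|Recall)\s+\((AP|AR)\)\s+@\[\s*IoU=([\d.:]+)\s*\|"
    r"\s*area=\s*(\w+)\s*\|\s*maxDets=\s*(\d+)\s*\]\s*=\s*([-\d.]+)"
)


def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def parse_coco_summary(text):
    """Map (metric, iou_range, area, max_dets) -> value for every summary line in text."""
    results = {}
    for m in _METRIC_RE.finditer(text):
        metric, iou_range, area, max_dets, value = m.groups()
        results[(metric, iou_range, area, int(max_dets))] = float(value)
    return results
```

Applied to the output above, parse_coco_summary(text)[("AP", "0.50", "all", 100)] gives 0.967.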
In addition, you can run tensorboard to inspect the evaluation in detail. Note that logdir in the command below points to the 'eval' directory (ssd_mobilenet_v1_egohands_eval).
$ cd ~/project/hand-detection-tutorial
$ tensorboard --logdir=ssd_mobilenet_v1_egohands_eval
Then point a browser at port 6006 of the training PC (http://localhost:6006 when working locally) and click on the 'IMAGES' tab. You can then browse through all images in the validation set and check how well your trained model performs on them.
The following commands export the trained ssdlite_mobilenet_v2_egohands model into a frozen graph (saved under model_exported/) and then use the graph to detect hands in data/jk-son-hands.jpg. The output image is saved with the bounding boxes overlaid.
$ CUDA_VISIBLE_DEVICES=0 ./export.sh ssdlite_mobilenet_v2_egohands
$ CUDA_VISIBLE_DEVICES=0 ./detect_image.sh data/jk-son-hands.jpg
You can then view the output image, for example with:
$ display detection_output.jpg
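Models exported by the Object Detection API typically output detection_boxes as normalized [ymin, xmin, ymax, xmax] coordinates, together with detection_scores. If you post-process those outputs yourself, you can convert them to pixel boxes and drop low-confidence detections as in the sketch below. The function name and the 0.5 default score threshold are assumptions and do not necessarily match what detect_image.sh does.

```python
def to_pixel_boxes(boxes, scores, image_height, image_width, min_score=0.5):
    """Convert normalized [ymin, xmin, ymax, xmax] boxes into integer pixel
    (xmin, ymin, xmax, ymax) tuples, keeping detections with score >= min_score."""
    results = []
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score < min_score:
            continue  # skip low-confidence detections
        results.append((
            int(round(xmin * image_width)),
            int(round(ymin * image_height)),
            int(round(xmax * image_width)),
            int(round(ymax * image_height)),
        ))
    return results
```

The returned (xmin, ymin, xmax, ymax) tuples can be passed directly to drawing routines that take corner coordinates, such as OpenCV's cv2.rectangle.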
Please refer to the following GitHub repos and blog posts.