Train CRF-RNN for Semantic Image Segmentation

Martin Kersner

This repository contains Python scripts necessary for training CRF-RNN for Semantic Image Segmentation with 3 classes.

git clone --recursive


In order to train CRF-RNN, you will need to install the version of Caffe that ships with CRF-RNN.

Prepare dataset for training

First, you will need images with corresponding semantic labels. The easiest way is to use the PASCAL VOC dataset (~2 GB), which provides such image/label pairs. The dataset consists of 21 different classes (1), counting background, but in this example we will use only three of them in order to demonstrate training with a different number of classes than was used in the original CRF-RNN.

Download PASCAL VOC dataset

tar -xvf VOCtrainval_11-May-2012.tar

After executing the commands above, the extracted dataset contains 2913 labels and their corresponding original images (2). To make those directories easier to reach, we will create symlinks to them. From your cloned repository, run the following commands (replace $DATASETS with the actual path where you downloaded the PASCAL VOC dataset).
ln -s $DATASETS/VOCdevkit/VOC2012/SegmentationClass labels
ln -s $DATASETS/VOCdevkit/VOC2012/JPEGImages images

Split classes

In the next step we select only the images that contain the classes (in our case 3) for which we want to train our semantic segmentation algorithm. First, we create a list of all images that can be used for segmentation.

find labels/ -printf '%f\n' | sed 's/\.png//'  | tail -n +2 > train.txt
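The same list can be built in Python, which may help on systems whose `find` lacks `-printf` (BSD and macOS, for example). This is a minimal sketch and not part of the repository; the function names `list_label_ids` and `write_list` are hypothetical.

```python
import os


def list_label_ids(labels_dir):
    """Return the sorted base names (without .png) of label images in labels_dir."""
    ids = []
    for name in os.listdir(labels_dir):
        base, ext = os.path.splitext(name)
        if ext.lower() == ".png":
            ids.append(base)
    return sorted(ids)


def write_list(ids, path):
    """Write one image id per line, the same layout as train.txt."""
    with open(path, "w") as handle:
        for image_id in ids:
            handle.write(image_id + "\n")
```

Calling `write_list(list_label_ids("labels"), "train.txt")` should give the same entries as the shell pipeline, except that the entries are sorted rather than listed in directory order.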

Ground truth segmentations in the PASCAL VOC 2012 dataset are defined as RGB images. However, if you decide to use a different dataset or already preprocessed segmentations, you could be working with gray-level images whose values correspond exactly to the label indexes in the documentation. Because the workflow of creating the training dataset is split into several parts, we access some images twice. If we are working with unpreprocessed ground truth segmentations, we would have to perform the conversion twice. Unfortunately, this conversion is rather time-consuming (~2 s), so we suggest running the following command first. It is not mandatory, though.
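To illustrate what such a conversion involves, here is a minimal sketch that maps RGB label pixels to class indexes. It assumes the standard PASCAL VOC colour map and is not the repository's own conversion code; `voc_palette` and `rgb_to_indices` are hypothetical names, and loading the image into rows of (R, G, B) tuples is left out.

```python
def voc_palette(num_classes=21):
    """Build the PASCAL VOC colour map as a list: class index -> (R, G, B)."""
    palette = []
    for index in range(num_classes):
        r = g = b = 0
        c = index
        # Spread the bits of the index over the high bits of R, G and B.
        for shift in range(7, -1, -1):
            r |= ((c >> 0) & 1) << shift
            g |= ((c >> 1) & 1) << shift
            b |= ((c >> 2) & 1) << shift
            c >>= 3
        palette.append((r, g, b))
    return palette


def rgb_to_indices(rgb_rows, void_index=255):
    """Convert rows of (R, G, B) pixels to rows of class indexes.

    Colours outside the palette (such as the VOC boundary colour) are
    mapped to void_index; this choice is an assumption of the sketch.
    """
    lookup = {color: i for i, color in enumerate(voc_palette())}
    return [[lookup.get(tuple(px), void_index) for px in row] for row in rgb_rows]
```

The per-pixel dictionary lookup keeps the sketch dependency-free; a vectorised version would be considerably faster on full-size images.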

python labels/ train.txt converted_labels/ # OPTIONAL

Then we decide which classes we are interested in and specify them in the script run below (line 15 currently selects the bird, bottle and chair classes). This script creates several text files, each named after one of the selected classes and listing the images that contain that class. Each file has the same structure as train.txt. If you plan to experiment with different classes, it is wise to generate those image lists for all classes in the dataset.

Be aware that if an image label contains more than one of the classes we are interested in, that image is always assigned to the class with the lowest id. This behavior could potentially cause a problem if the dataset contains many images with the same pair of labels. The background class is not taken into account here.
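The assignment rule described above can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: `assign_class` is a hypothetical name, background is assumed to have id 0, and the ids 3, 5 and 9 in the usage note are the PASCAL VOC indexes of bird, bottle and chair.

```python
BACKGROUND = 0  # assumed id of the background class


def assign_class(label_values, wanted_ids):
    """Return the lowest wanted class id present in an image, or None.

    label_values: iterable of class indexes found in the label image.
    wanted_ids:   class ids we want to train on.
    """
    wanted = set(wanted_ids) - {BACKGROUND}
    present = sorted(set(label_values) & wanted)
    return present[0] if present else None
```

For example, an image containing both bird (3) and chair (9) is listed only under bird, so `assign_class([0, 3, 9], [3, 5, 9])` returns 3, while an image with none of the wanted classes returns `None`.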

python labels/ train.txt # in a case you DID NOT RUN script
#python converted_labels/ train.txt # you RUN script

Create LMDB database

The original CRF-RNN was trained on images of size 500x500 px and we will do the same. If, for whatever reason, you decide on different dimensions (3), they can be changed on line 20 of the script run below. Currently, we expect the larger side to be no more than 500 px. Because images/labels don't always match the required dimensions, we pad them with zeros in order to obtain the right image/label size.
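A minimal sketch of zero padding for a single-channel image or label is shown below. It assumes the original content stays in the top-left corner, which may differ from the repository's script; `pad_with_zeros` is a hypothetical name.

```python
def pad_with_zeros(rows, size=500):
    """Pad a 2D list of pixel values with zeros to size x size.

    The original content is kept in the top-left corner (an assumption
    of this sketch). Raises ValueError if the input exceeds the target.
    """
    height = len(rows)
    width = len(rows[0]) if rows else 0
    if height > size or width > size:
        raise ValueError("input is larger than the target size")
    # Extend every existing row to the right, then append empty rows below.
    padded = [list(row) + [0] * (size - width) for row in rows]
    padded += [[0] * size for _ in range(size - height)]
    return padded
```

Note that in this sketch the padded region of a label ends up with value 0, which coincides with the background index if background is class 0.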

On line 21 we can set labels which we want to include into dataset.

During training we will regularly test our network's performance. Thus, besides the training data, we will need testing data. On line 22 we can set a ratio (currently 0.1, i.e. 10 percent) that determines what fraction of the whole dataset is placed in the test data.
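The effect of the ratio can be sketched as follows. This is an illustration only: `split_train_test` is a hypothetical name, and shuffling with a fixed seed is an assumption, since the text does not say how the repository selects the test images.

```python
import random


def split_train_test(ids, test_ratio=0.1, seed=0):
    """Split image ids into (train, test) lists using the given ratio."""
    shuffled = list(ids)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    num_test = int(round(len(shuffled) * test_ratio))
    return shuffled[num_test:], shuffled[:num_test]
```

With 100 image ids and the default ratio of 0.1, this yields 90 training ids and 10 testing ids with no overlap.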

The following command will create four directories with training/testing data for images/labels.

python # in a case you DID NOT RUN script
#python converted_labels/ # you RUN script


Before we can start training, we need to download the precomputed weights for CRF-RNN.

wget -O TVG_CRFRNN_COCO_VOC.caffemodel
python 2>&1 | tee train.log


During training we can visualize the loss using the script run below. The script accepts more than one log file, which is useful when training had to be stopped and restarted from the last state, leaving us with two or more log files.
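To show how loss values might be collected from several logs, here is a minimal sketch. It assumes the solver writes lines containing `Iteration N, loss = X`, as Caffe's solver typically does; it is not the repository's plotting script, and `parse_losses` and `read_losses` are hypothetical names.

```python
import re

# Assumed log line shape: "... Iteration 20, loss = 1.2345"
LOSS_RE = re.compile(r"Iteration (\d+), loss = ([0-9.eE+-]+)")


def parse_losses(lines):
    """Extract (iteration, loss) pairs from an iterable of log lines."""
    points = []
    for line in lines:
        match = LOSS_RE.search(line)
        if match:
            points.append((int(match.group(1)), float(match.group(2))))
    return points


def read_losses(log_paths):
    """Concatenate the loss points of several log files, in the order given."""
    points = []
    for path in log_paths:
        with open(path) as handle:
            points.extend(parse_losses(handle))
    return points
```

The resulting list of pairs can be handed to any plotting library; passing the logs in chronological order keeps a restarted training run on one continuous curve.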

python train.log


I don't want to train with 3 classes. What should I do?

You have to generate lists of images for more or fewer classes, as described in the Split classes section above. Afterward, you will also have to change the prototxt description of the network, TVG_CRFRNN_COCO_VOC_TRAIN_3_CLASSES.prototxt. Each line in this file that contains the text CHANGED should be modified. Each of those lines sets num_output: 4, denoting 3 classes plus background.

If you want to use, for example, 6 different classes, you should change the num_output parameter on those lines to 7.
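The rule is that the output count equals the number of classes plus one for background. The edit can be sketched as below, assuming the CHANGED marker sits on the same line as the parameter (spelled `num_output` in Caffe); `set_num_output` is a hypothetical helper, not part of the repository.

```python
import re


def set_num_output(prototxt_text, num_classes):
    """Set num_output to num_classes + 1 on every line marked CHANGED."""
    num_output = num_classes + 1  # one extra output for background
    result = []
    for line in prototxt_text.splitlines():
        if "CHANGED" in line:
            line = re.sub(
                r"(num_output:\s*)\d+",
                lambda m: m.group(1) + str(num_output),
                line,
            )
        result.append(line)
    return "\n".join(result)
```

Lines without the marker are left untouched, so other layers keep their original sizes. Writing the result to a copy of the prototxt file rather than overwriting it keeps the 3-class configuration available.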

(1) aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, potted plant, sheep, sofa, train, tv/monitor

(2) You may have noticed that the images directory contains more than 2913 images. This is because the dataset is used not only for segmentation but also for detection.

(3) The larger the dimensions of the input images, the more memory is required for training.
