An implementation of the SegLink algorithm from the paper Detecting Oriented Text in Natural Images by Linking Segments
Tip: A more recent scene text detection algorithm, PixelLink, has been implemented here: https://github.com/ZJULearning/pixel_link
This is a re-implementation of the SegLink text detection algorithm described in the paper Detecting Oriented Text in Natural Images by Linking Segments by Baoguang Shi, Xiang Bai, and Serge Belongie.
cv2. I'm using 18.104.22.168, but other versions below 3 should work too. If not, try switching to the same version as mine.
Download the project pylib and add the src folder to your
If any other requirements are unmet, just install them by following the error message.
Convert them into tfrecords format using the scripts in datasets if you want to train your own model.
My SegLink converges much more slowly than described in the paper. For example, the authors of the SegLink paper report that a good result can be obtained by training on SynthText for fewer than 100k iterations and on IC15-train for fewer than 10k iterations. With my implementation, however, I have to train on SynthText for about 200k iterations and on IC15-train for more than another 100k iterations to get a competitive result.
Several reasons may contribute to the slow convergence of my model:
Two models trained on SynthText and IC15-train can be downloaded.
hust_orientedText is the result of the paper.
They have been trained:
on SynthText for about 200k iterations, and on IC15-train for 100k to 200k iterations.
learning_rate = 10e-4
384: GTX 1080, batchsize = 24; 512: Titan, batchsize = 20
Both models perform best at
link_conf_threshold=0.5. This is another difference from the paper, which uses 0.9 and 0.7 respectively.
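To make the effect of the threshold concrete, here is a minimal sketch of confidence filtering. It is not taken from this repository: the helper `keep_confident`, the sample scores, and the use of `>=` as the comparison are all assumptions made for illustration.

```python
def keep_confident(scores, threshold):
    """Return the indices of the scores that reach the threshold.

    Hypothetical helper: the real code may filter differently
    (for example with a strict '>' comparison).
    """
    return [i for i, score in enumerate(scores) if score >= threshold]


# Made-up link confidences, only for illustration.
link_scores = [0.30, 0.55, 0.72, 0.95]

# A link threshold of 0.7 (the paper's value) keeps fewer links
# than the 0.5 that works best for the models here.
kept_paper = keep_confident(link_scores, 0.7)
kept_here = keep_confident(link_scores, 0.5)
```

A lower link threshold keeps more links, so more segments end up joined into the same text line.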
Use the script test_seglink.py; a shortcut has been created in
Go to the seglink root directory and execute the command:
./scripts/test.sh GPU_ID CKPT_PATH DATASET_DIR
./scripts/test.sh 0 ~/models/seglink/model.ckpt-217867 ~/dataset/ICDAR2015/Challenge4/ch4_training_images
I have only tested my models on IC15-test, but any other images can be used for testing: just put your images into a directory, and configure the path in the command as
A bunch of txt files and a zip file are created after testing. If you are using IC15-test for testing, you can upload this zip file to the ICDAR evaluation server directly.
The text files are placed in a subdirectory of the checkpoint directory and contain the bounding boxes of the detection results. They can be visualized using the script
The command looks like:
python visualize_detection_result.py \
    --image=where your images are put \
    --det=the directory of the text files output by test_seglink.py \
    --output=the output directory of detection results drawn on images
python visualize_detection_result.py \
    --image=~/dataset/ICDAR2015/Challenge4/ch4_training_images/ \
    --det=~/models/seglink/seglink_icdar2015_without_ignored/eval/icdar2015_train/model.ckpt-72885/seg_link_conf_th_0.900000_0.700000/txt \
    --output=~/temp/no-use/seglink_result_512_train
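If you want to read the detection text files yourself instead of using the visualization script, something like the sketch below may help. It assumes each line holds one quadrilateral as eight comma-separated integers (x1,y1,...,x4,y4), as in the ICDAR 2015 submission format; check a real output file first, since the exact layout is not documented here. The function names are hypothetical.

```python
def parse_detection_line(line):
    """Parse one 'x1,y1,x2,y2,x3,y3,x4,y4' line into four (x, y) points.

    Assumed format; any fields after the first eight are ignored.
    """
    fields = line.strip().split(",")
    if len(fields) < 8:
        raise ValueError("expected at least 8 comma-separated values: %r" % line)
    values = [int(v) for v in fields[:8]]
    # Pair up even-indexed x values with odd-indexed y values.
    return list(zip(values[0::2], values[1::2]))


def parse_detection_text(text):
    """Parse the whole content of one detection file, skipping blank lines."""
    return [parse_detection_line(l) for l in text.splitlines() if l.strip()]


# Made-up file content, only for illustration.
boxes = parse_detection_text("10,20,110,20,110,60,10,60\n")
```

Each parsed box is a list of four corner points, which can be passed to a drawing routine of your choice.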
The training process requires data processing, i.e. converting data into tfrecords. The conversion scripts are in the
datasets directory. The scripts:
eval_seglink.py are the training and evaluation scripts respectively. In particular, I have implemented an offline evaluation function, which calculates Recall/Precision/Hmean as the ICDAR test server does, and can be used for cross-validation and grid search. The resulting scores may differ slightly from those of the test server, but the difference does not matter much. Sorry for the incomplete documentation here. Read and modify the scripts if you want to train your own model.
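For reference, the three scores relate to each other as in the sketch below. This is not the evaluation code from this repository: the function name and arguments are hypothetical, and it assumes that detections have already been matched against the groundtruth, which is the hard part and is not shown.

```python
def detection_scores(num_matched, num_detected, num_groundtruth):
    """Compute (precision, recall, hmean) from match counts.

    num_matched:     detections matched to a groundtruth box
    num_detected:    all detections
    num_groundtruth: all groundtruth boxes
    """
    precision = float(num_matched) / num_detected if num_detected else 0.0
    recall = float(num_matched) / num_groundtruth if num_groundtruth else 0.0
    total = precision + recall
    # Hmean is the harmonic mean of precision and recall (the F1 score).
    hmean = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, hmean


# Made-up counts, only for illustration.
precision, recall, hmean = detection_scores(80, 100, 160)
```

During a grid search over thresholds, you would compute these scores for each setting and keep the one with the highest Hmean.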
Thanks go to the authors of the SegLink paper: Baoguang Shi, Xiang Bai, and Serge Belongie.
EAST is another text detection paper accepted by CVPR 2017, and its reported result is better than that of SegLink. But if both use the same VGG16, their performance is quite similar.
Contact me through GitHub issues if you have any problems.
How the groundtruth is calculated, in Chinese: http://fromwiz.com/share/s/34GeEW1RFx7x2iIM0z1ZXVvc2yLl5t2fTkEg2ZVhJR2n50xg