Recognize tables from images and restore them into Word (.docx) documents.
The CRNN recognizer also provides single-character coordinates.
The final .docx file is written to the word directory.
~~1. python server.py~~
~~Load the unet model to extract table lines from the input image~~
~~2. python test.py~~
~~Feed the input image~~
(The open-source table line detection model does not generalize well enough, so it is shelved for now. I am keeping the earlier code and models for reference only and may update them later.)
1. Edit PSE/config/config.py and crnn/config/config.py to choose GPU or CPU and to set the model paths.
2. python image2word.py
Tables are restored with OpenCV and python-docx. Text detection uses PSENet; text recognition uses CRNN (ResNet18-BiLSTM-CTC) with edit-distance correction.
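To illustrate the edit-distance correction step, here is a minimal sketch, not the code used in this repo. It assumes the recognizer output is snapped to the closest entry in a known vocabulary when the Levenshtein distance is small enough. The names `edit_distance`, `correct`, `vocabulary`, and `max_distance` are hypothetical and chosen only for this example.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling row of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def correct(text: str, vocabulary: Iterable[str], max_distance: int = 1) -> str:
    """Return the closest vocabulary word if it is within max_distance,
    otherwise return the recognized text unchanged (assumed policy)."""
    best_word, best_dist = text, max_distance + 1
    for word in vocabulary:
        dist = edit_distance(text, word)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word


if __name__ == "__main__":
    # "Tota1" differs from "Total" by a single substitution.
    print(correct("Tota1", ["Total", "Amount"]))
```

A small `max_distance` keeps the correction conservative: text that is far from every vocabulary entry is left as recognized instead of being forced onto a wrong word.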
~~Step 1 & 2 are not necessary if you have quite neat PDF images, meanwhile this project can't deal with some complex samples like tortuous and colorful receipts, I am still working on it.~~
I am working on complex table recognition and struggling with the dataset. ~~Optimistically, there could be a radical change in weeks. If you are researching page layout and table recognition, please contact me.~~ [email protected]