Table-OCR

215 Stars 62 Forks GNU General Public License v3.0 11 Commits 17 Opened issues


Recognize tables in images and restore them as Word (.docx) documents.

Recognize tables in images and restore them as Word documents, with CRNN single-character coordinates


Example images: origin_image, single_char_position
The final .docx file is written to the word directory.

Model weights

Possibly among the best openly available weights for document text detection and recognition.
Google Drive link
PSE weight
CRNN weight
Unet weight
pkl for edit distance

How to run

~~1. python~~
~~Load the unet model to extract table lines from the input image~~
~~2. python~~
~~Feed the input image~~
(The open-source table line detection model does not generalize well, so it is shelved for now. I am keeping the earlier code and model for reference only and may update them later.)
1. Edit the configuration files PSE/config/ and crnn/config/config.py to set the device (GPU or CPU) and the model paths.
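As a rough illustration of step 1, a configuration of this kind might look like the sketch below. Every name here (`ModelConfig`, `model_path`, `use_gpu`, the weight file paths) is hypothetical and is not taken from the project's actual config files, so check the real files for the variable names they use.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Hypothetical settings holder; the real config files may differ."""
    model_path: str        # where the downloaded weight file lives
    use_gpu: bool = False  # request GPU inference

    def device(self, gpu_available: bool) -> str:
        # Fall back to CPU when a GPU is requested but not present.
        return "cuda" if self.use_gpu and gpu_available else "cpu"


# Placeholder paths, not the project's real file names.
pse_config = ModelConfig(model_path="weights/pse.pth", use_gpu=True)
crnn_config = ModelConfig(model_path="weights/crnn.pth", use_gpu=False)
```

Falling back to CPU keeps the same config usable on machines without a GPU.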
2. python
This restores the table with OpenCV and python-docx. Text detection uses PSENet; text recognition uses CRNN (resnet18-bilstm-ctc) with edit-distance correction.
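The edit-distance correction step can be pictured with the sketch below. It is a generic Levenshtein implementation, not the project's own code: how the project uses its pkl file is not documented here, and `correct`, `lexicon`, and `max_distance` are assumed names for illustration.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def correct(text: str, lexicon: Sequence[str], max_distance: int = 1) -> str:
    """Return the closest lexicon entry, or the text unchanged if none is close enough."""
    if not lexicon:
        return text
    best = min(lexicon, key=lambda word: edit_distance(text, word))
    return best if edit_distance(text, best) <= max_distance else text
```

For example, `correct("lnvoice", ["invoice", "total"])` snaps a one-character OCR slip to the known word, while a string far from every entry is left untouched.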


~~Steps 1 and 2 are not necessary if you have fairly clean PDF images. This project cannot yet handle complex samples such as distorted or colorful receipts; I am still working on it.~~

To do

I am working on complex table recognition and struggling to find a suitable dataset. ~~Optimistically, there could be a radical change in weeks. If you are researching page layout and table recognition, please contact me.~~

Reference and some useful projects

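  3. A brief overview of Tencent's table recognition solution (腾讯表格识别方案简述)
  4. OpenCV: detecting and extracting tables (OpenCV-检测并提取表格)
  5. Text detection: PSENet.pytorch
  6. Text recognition: CRNN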
