mogoz

OCR

tags
Machine Learning, Image Compression, Computer Vision, Deploying ML applications (applied ML)

Comparison

| Type | Name | Description |
|---|---|---|
| Service | Claude/OpenAI/AWS | They have APIs |
| LSTM-CNN | Tesseract | |
| PP-OCR (DB+CRNN) | PaddleOCR | Works with rotated text |
| | EasyOCR | |
| Toolbox, modular models | doctr | Some people mention it works better than PaddleOCR and Tesseract. |
| PyTorch+mmlabs | MMOCR | Might be nice if already using MMDetection |
| | surya | Only for documents; does not work on handwriting. Faster than Tesseract, multi-language support. Tries to guess proper reading order. |
| VLM | MGP-STR | new kid (2024) |
| VLM | GOT | new kid (2024) |
| VLM | olmOCR | olmOCR – Open-Source OCR for Accurate Document Conversion (has a comparison to GOT) |
| VLM | RolmOCR | A better and faster olmOCR |
| VLM | TrOCR | |
| VLM | DONUT | |
| VLM | InternVL | |
| VLM | Idefics2 | |
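The surya entry mentions guessing the reading order. As a rough illustration of what that problem looks like, here is a minimal sketch that orders detected text boxes top to bottom, then left to right. It is not how surya does it: the `Box` type, the `line_tol` parameter, and the sample coordinates are all made up for this note, and it assumes a single-column page.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical OCR detection: text plus its top-left corner and height."""
    text: str
    x: float  # left edge
    y: float  # top edge (y grows downward)
    h: float  # box height


def reading_order(boxes, line_tol=0.5):
    """Naive ordering: group boxes into lines by y, then sort each line by x."""
    lines = []  # each entry is a list of boxes sitting on roughly the same line
    for box in sorted(boxes, key=lambda b: b.y):
        # Same line if the top edge is within a fraction of the box height
        # of the first box already placed on the current line.
        if lines and abs(box.y - lines[-1][0].y) <= line_tol * box.h:
            lines[-1].append(box)
        else:
            lines.append([box])
    return [b for line in lines for b in sorted(line, key=lambda b: b.x)]


# Made-up detections, deliberately shuffled.
boxes = [
    Box("world", 60, 11, 10),
    Box("Second", 0, 30, 10),
    Box("Hello", 0, 10, 10),
    Box("line", 70, 29, 10),
]
ordered_text = " ".join(b.text for b in reading_order(boxes))
print(ordered_text)
```

This breaks down on multi-column layouts, tables, and sidebars, which is presumably why a tool that handles reading order properly is worth noting.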
  • olmOCR introduces a technique it calls “Document Anchoring”: any text and metadata already present in the PDF file is used to improve the quality of the extracted text.
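To make the Document Anchoring idea concrete, here is a minimal sketch of how text already embedded in a PDF page could be turned into a prompt hint for a VLM. This is an assumption-heavy illustration, not olmOCR's actual code: the `TextSpan` type, `build_anchor_text`, the prompt wording, the line format, and the character budget are all hypothetical, and the spans are hard-coded instead of being read from a real PDF.

```python
from dataclasses import dataclass


@dataclass
class TextSpan:
    """Hypothetical text fragment taken from a PDF's embedded text layer."""
    text: str
    x: float  # position on the page, in PDF points
    y: float


def build_anchor_text(page_width, page_height, spans, max_chars=200):
    """Serialize page size and positioned text spans, within a size budget."""
    lines = [f"Page dimensions: {page_width:.0f}x{page_height:.0f}"]
    used = len(lines[0])
    for span in spans:
        entry = f"[{span.x:.0f}x{span.y:.0f}] {span.text}"
        if used + len(entry) + 1 > max_chars:
            break  # stop once the budget is exhausted
        lines.append(entry)
        used += len(entry) + 1  # +1 for the newline separator
    return "\n".join(lines)


# Made-up spans standing in for what a PDF parser would return.
spans = [
    TextSpan("Invoice #1234", 72, 720),
    TextSpan("Total: $56.00", 72, 100),
]
anchor = build_anchor_text(612, 792, spans)

# The anchor text would be sent to the model together with the page image.
prompt = "Transcribe this page. Text found in the PDF:\n" + anchor
print(prompt)
```

The point is that the model does not have to read everything from pixels alone. Born-digital PDFs already carry text, so passing it along gives the model something to check its output against.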

Resources