Introducing Donut: The OCR-Free Document Understanding Transformer Revolutionising Visual Document Understanding

Research Paper Summary

Prakhar Mishra
5 min read · Jan 10, 2023

In this post, we take a deep dive into the paper OCR-free Document Understanding Transformer, covering the following topics:


Introduction: OCR-Free Document Understanding Transformer (Donut)

Synthetic Document Generator: Generating Data for Pre-training

Pre-training of Donut Model

Results and Performance


Introduction: OCR-Free Document Understanding Transformer (Donut)


Understanding document images such as invoices is a core but challenging problem. Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf OCR engines and focus on the understanding task using the OCR output. This can lead to high computational cost, lack of…

