TermaOCR BETA
Upload an image or PDF of Tibetan text, draw boxes around the areas you want recognized, and get editable Unicode text back instantly.
Supported formats: PNG, JPG, TIFF, BMP, WEBP, PDF
How It Works
Drop an image or PDF of Tibetan text. PDFs are OCR'd directly — no setup needed. For images, you'll draw boxes around the text you want recognized.
Draw tight boxes around the Tibetan text — the closer you crop to the text, the better the results. Avoid illustrations and margins.
Hit Process Regions and get editable Unicode text back. Copy, download, or save your region layout as a template for batch processing similar pages.
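To illustrate the template idea, here is a minimal sketch of how a saved region layout could be reused across similar pages. It assumes boxes are stored as fractions of the page size so they scale to scans of different resolutions. The names `Region` and `apply_template` are hypothetical and used only for illustration; this is not TermaOCR's actual template format or API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical box stored as fractions (0.0 to 1.0) of page width and height."""
    left: float
    top: float
    right: float
    bottom: float

    def to_pixels(self, page_width: int, page_height: int) -> tuple[int, int, int, int]:
        # Scale the fractional box to pixel coordinates for one specific page.
        return (
            round(self.left * page_width),
            round(self.top * page_height),
            round(self.right * page_width),
            round(self.bottom * page_height),
        )


def apply_template(template: list[Region], page_size: tuple[int, int]) -> list[tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) pixel boxes for every region in the template."""
    width, height = page_size
    return [region.to_pixels(width, height) for region in template]


# Assumed example layout: two text blocks, with page margins excluded.
template = [
    Region(0.10, 0.05, 0.90, 0.45),
    Region(0.10, 0.55, 0.90, 0.95),
]

# The same template applied to a 2000 x 1000 pixel scan.
boxes = apply_template(template, (2000, 1000))
print(boxes)
```

Storing fractions rather than raw pixels is one possible design choice: the same layout can then be applied to a batch of pages scanned at different resolutions, as long as the text sits in the same relative position on each page.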
TermaOCR is under active development. Tibetan is a low-resource language with limited training data available, which makes each improvement hard-won and meaningful.
Our system currently handles printed Uchen (དབུ་ཅན་) text as well as the handwritten Drutsa (བྲུ་ཚ་), Betsug (དཔེ་ཚུགས་), and Umey (དབུ་མེད་) scripts, with growing accuracy. We are working to expand support for:
- Khyug Yig (འཁྱུག་ཡིག་) — fast cursive handwriting
- Tsugthung (ཚུགས་ཐུང་) — shorthand script
- Complex vertical stacking — multi-layered consonant clusters
- Historical calligraphic styles — woodblock and manuscript traditions
- Advanced page segmentation — mixed layouts, illustrations, and marginalia
With continued support, TermaOCR can become the definitive open tool for digitizing Tibet's vast literary heritage. Your contribution directly advances the preservation of an endangered written tradition.