Tibetan Script Recognition

TermaOCR BETA

Upload an image or PDF of Tibetan text and get editable Unicode text back. For images, draw boxes around the areas you want recognized; PDFs are processed directly.

Drag & drop an image here, or click to browse

PNG, JPG, TIFF, BMP, WEBP, PDF

Save Template

Name this region layout for reuse on similar pages

1. Choose a Template

No templates saved yet. Upload an image in New mode and save your regions as a template first.

2. Upload Pages

Drop multiple images here, or click to browse

All pages will use the same template regions

Results

Recognized Text

How It Works

Upload

Drop an image or PDF of Tibetan text. PDFs are OCR'd directly — no setup needed. For images, you'll draw boxes around the text you want recognized.

Select regions

Draw tight boxes around the Tibetan text — the closer you crop to the text, the better the results. Avoid illustrations and margins.

Process

Hit Process Regions and get editable Unicode text back. Copy, download, or save your region layout as a template for batch processing similar pages.
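The idea of reusing one region layout across similar pages can be sketched in a few lines. This is only an illustration, not TermaOCR's actual implementation: the `Region` class, the `apply_template` function, and the choice to store boxes as fractions of the page size are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical region: a box stored as fractions (0.0 to 1.0) of the page."""
    left: float
    top: float
    right: float
    bottom: float

    def to_pixels(self, width: int, height: int) -> tuple[int, int, int, int]:
        """Scale the fractional box to a concrete page size in pixels."""
        return (
            round(self.left * width),
            round(self.top * height),
            round(self.right * width),
            round(self.bottom * height),
        )


def apply_template(
    template: list[Region], page_sizes: list[tuple[int, int]]
) -> list[list[tuple[int, int, int, int]]]:
    """Return, for every page, the pixel boxes the template produces."""
    return [[r.to_pixels(w, h) for r in template] for (w, h) in page_sizes]


# Assumed two-region layout: a title strip and a main text block.
template = [Region(0.10, 0.05, 0.90, 0.15), Region(0.10, 0.20, 0.90, 0.95)]

# Two pages scanned at different resolutions share the same layout.
boxes = apply_template(template, [(1000, 2000), (500, 1000)])
for page_number, page_boxes in enumerate(boxes, start=1):
    print(page_number, page_boxes)
```

Storing boxes as fractions rather than pixels is one way to let a single template fit pages scanned at different resolutions; it still assumes the pages share the same proportions and layout.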

TermaOCR is under active development. Tibetan is a low-resource language with limited training data available, which makes each improvement hard-won and meaningful.

Our system currently handles printed Uchen (དབུ་ཅན་) text and the handwritten Drutsa (བྲུ་ཚ་), Betsug (དཔེ་ཚུགས་), and Umey (དབུ་མེད་) scripts with growing accuracy, and we are working to expand support for:

Khyug Yig (འཁྱུག་ཡིག་) — fast cursive handwriting
Tsugthung (ཚུགས་ཐུང་) — shorthand script
Complex vertical stacking — multi-layered consonant clusters
Historical calligraphic styles — woodblock and manuscript traditions
Advanced page segmentation — mixed layouts, illustrations, and marginalia

With continued support, TermaOCR can become the definitive open tool for digitizing Tibet's vast literary heritage. Your contribution directly advances the preservation of an endangered written tradition.