Journal Papers

Analysis of Error Count Distributions for Improving

the Postprocessing Performance of OCCR

Yue-Shi Lee and Hsin-Hsi Chen

Department of Computer Science and Information Engineering

National Taiwan University

Taipei, Taiwan, R.O.C.


Contextual language processing plays an important role in the postprocessing of OCR, and its effects have been demonstrated by many proposed systems. In general, it performs well. However, its performance is not as good as expected when the test data contain more unseen contexts, e.g., proper nouns such as personal names and organization names. This paper addresses the importance of analyzing the error count distributions before applying the language models. According to the analysis, based on the Markov character bigram model, the number of errors can be reduced by more than 50% and more than 90% of the processing time can be saved on average.