(Conference Paper)
Development of Partially Bracketed Corpus with Part-of-Speech Information Only
Hsin-Hsi Chen and Yue-Shi Lee
Department of Computer Science and Information Engineering
National Taiwan University
Taipei, Taiwan, R.O.C.
Abstract
Treebank-based research is active in many natural language applications. However, building a large-scale
treebank is laborious and tedious. This paper proposes a probabilistic chunker to help develop a partially
bracketed corpus. The chunker partitions a part-of-speech sequence into segments called chunks. Rather than using a
treebank as the training corpus, we use a corpus that is tagged with part-of-speech information only. The experimental
results show that the probabilistic chunker achieves a correct rate of more than 92% in the outside test. A well-formed partially bracketed
corpus is a milestone in the development of a treebank. Moreover, the simple but effective chunker can also be applied to
many natural language applications.
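
To make the idea of partitioning a part-of-speech sequence into chunks concrete, the following is a minimal sketch in Python. It is not the model proposed in this paper, whose details the abstract does not give. It assumes a generic dynamic-programming segmenter that picks the partition with the highest total chunk log-probability. The score table TOY_CHUNK_LOGPROB, the penalty UNSEEN_LOGPROB, and the functions chunk_score and best_chunking are hypothetical names with invented values, used only for illustration.

```python
import math

# Hypothetical chunk log-probabilities, invented purely for illustration.
# A real chunker would estimate such scores from a POS-tagged corpus.
TOY_CHUNK_LOGPROB = {
    ("DT", "NN"): math.log(0.5),
    ("DT", "JJ", "NN"): math.log(0.3),
    ("VBD",): math.log(0.6),
    ("IN",): math.log(0.4),
}

# Assumed per-tag penalty for a tag span that is not in the table.
UNSEEN_LOGPROB = math.log(1e-4)


def chunk_score(tags):
    """Return the (toy) log-probability that `tags` forms one chunk."""
    key = tuple(tags)
    if key in TOY_CHUNK_LOGPROB:
        return TOY_CHUNK_LOGPROB[key]
    return UNSEEN_LOGPROB * len(key)


def best_chunking(tags, max_len=4):
    """Split `tags` into chunks maximizing the sum of chunk log-probabilities."""
    n = len(tags)
    best = [-math.inf] * (n + 1)  # best[i]: best score for tags[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last chunk
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + chunk_score(tags[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end to recover the chunks.
    chunks = []
    end = n
    while end > 0:
        start = back[end]
        chunks.append(tags[start:end])
        end = start
    chunks.reverse()
    return chunks


if __name__ == "__main__":
    sequence = ["DT", "JJ", "NN", "VBD", "IN", "DT", "NN"]
    for chunk in best_chunking(sequence):
        print("[" + " ".join(chunk) + "]")
```

The segmenter takes only part-of-speech tags as input, which mirrors the setting described above: no bracketed trees are needed at chunking time. How the chunk scores are estimated from a tagged corpus is left open here.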