Berkeley at NTCIR-2: Chinese, Japanese, and English IR Experiments
Aitao Chen, Fredric Gey, and Hailing Jiang

This paper reports on the work of the Berkeley group at the second NTCIR workshop on Japanese & English IR and Chinese IR. Runs were submitted for all subtasks in the two main tasks. In the Japanese monolingual subtask, our main focus was comparing the retrieval effectiveness of different segmentation methods. The experimental results show that bigram indexing outperformed word-based indexing in Japanese monolingual retrieval. Bigram indexing was also highly effective in Chinese monolingual retrieval. This paper presents an alternative segmentation method that breaks text into one-character and two-character terms that do not overlap with each other, which overcomes the main disadvantage of bigram indexing, namely the large index files it produces. The paper also describes a technique for building bilingual word lexicons from parallel text by sentence alignment and word association. Finally, a purely rank-based document polling strategy is presented for combining monolingual retrieval results in multilingual retrieval.
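To make two of the ideas above concrete, the following minimal Python sketch shows overlapping bigram segmentation and one possible reading of a purely rank-based merge. The function names `overlapping_bigrams` and `merge_by_rank` are hypothetical. The round-robin interleaving is an assumption made for illustration, since the abstract does not specify the details of the polling strategy. The non-overlapping segmentation method is not sketched, because its rules are not given here.

```python
from itertools import zip_longest


def overlapping_bigrams(text):
    """Return every pair of adjacent non-space characters in text."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        # A single character cannot form a bigram; keep it as its own term.
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def merge_by_rank(ranked_lists, limit=None):
    """Interleave ranked lists using rank position only (assumed round-robin).

    All rank-1 documents are taken first, then all rank-2 documents, and so
    on. Retrieval scores are never compared across collections.
    """
    merged, seen = [], set()
    for tier in zip_longest(*ranked_lists):
        for doc_id in tier:
            # None marks a list that has run out of documents at this rank.
            if doc_id is not None and doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged if limit is None else merged[:limit]


if __name__ == "__main__":
    print(overlapping_bigrams("情報検索"))
    print(merge_by_rank([["c1", "c2", "c3"], ["j1", "j2"], ["e1"]]))
```

A string of n characters yields n - 1 overlapping bigrams, so nearly every character is indexed twice. This is the source of the index growth that a non-overlapping segmentation avoids.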

