The PKU 863 Chinese-English Parallel Corpus
The PKU Chinese-English Parallel Corpus was developed under the 863 Project
by the Institute of Computational Linguistics at Peking University. The corpus
consists of over 200,000 aligned sentence pairs taken from quality bilingual
texts (3,066,435 English words and 2,874,462 Chinese words), covering a range
of genres and domains, including government white papers, official
documents, news texts, essays, speech scripts, literary texts and academic prose,
as well as texts on politics, law, tourism, the food industry, economics and
business. The majority of the texts are taken from established bilingual
websites, while some were digitised using OCR.
The PKU 863 corpus has since been converted to Unicode and tagged with
part-of-speech information as part of the project Contrasting
English and Chinese (ESRC Award Reference RES-000-23-0553), using CLAWS (C7 tagset) for English and ICTCLAS
for Chinese. It can be accessed via the online parallel concordancer
[Sorry! Service is no longer available]. A trial version of the software
package is also available here [Sorry! Service is no longer available]. The
trial version displays only the first 100 concordance lines, although the total
number of hits is given in the report at the bottom of the results page.
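
The behaviour described for the trial version (a capped display together with a full hit count) can be illustrated with a minimal sketch. This is not the original concordancer, whose implementation is not documented here; the function name `concordance`, its parameters, and the two sample sentence pairs are all hypothetical, and the search is a plain case-insensitive substring match rather than a query over the part-of-speech annotation.

```python
from typing import List, Tuple

# One aligned unit: (English sentence, Chinese sentence).
SentencePair = Tuple[str, str]


def concordance(
    pairs: List[SentencePair],
    query: str,
    side: str = "en",
    limit: int = 100,
) -> Tuple[List[SentencePair], int]:
    """Return (hits to display, total number of hits).

    Hypothetical helper: searches one side of each aligned pair for
    `query` and returns at most `limit` matching pairs, plus the total
    count of matches across the whole list.
    """
    if side not in ("en", "zh"):
        raise ValueError("side must be 'en' or 'zh'")
    index = 0 if side == "en" else 1
    needle = query.lower()
    # Keep every pair whose chosen side contains the query string.
    hits = [pair for pair in pairs if needle in pair[index].lower()]
    return hits[:limit], len(hits)


# Made-up sample data, repeated so that the display cap is exceeded.
sample_pairs = [
    ("The economy grew steadily.", "经济稳步增长。"),
    ("Tourism is important.", "旅游业很重要。"),
] * 150

shown, total = concordance(sample_pairs, "economy")
print(f"Showing {len(shown)} of {total} hits")
```

Searching the Chinese side works the same way, for example `concordance(sample_pairs, "旅游", side="zh")`; in both cases the aligned sentence in the other language is returned alongside each match.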