Japanese-English News Article Alignment Data (JENAAD)
Sample
License
To download JENAAD, you need to fill out a copy of the license (either the Japanese text or the English text) and send it to the following address:
Masao Utiyama
National Institute of Information and Communications Technology
3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto, Japan, 619-0289
In return, you will receive by e-mail a user account and password
that enable you to access JENAAD. Note that the account
expires after a certain period.
Download
- p11-1989-2001.txt.gz (150000 one-to-one sentence alignments, 23118450 bytes, euc-jp-unix)
- p11-1989-2001.zip (150000 one-to-one sentence alignments, 23335741 bytes, shift_jis-dos)
- pnm-1989-2001.txt.gz (30000 one-to-many sentence alignments, 7417253 bytes, euc-jp-unix)
- pnm-1989-2001.zip (30000 one-to-many sentence alignments, 7468055 bytes, shift_jis-dos)
Note: 78 sentence pairs (list) in pnm-1989-2001.* lack the
Japanese sentence, the English sentence, or both, due to a
bug in the sentence selection program.
- split.zip (22919410 bytes) contains the following files:
  - p11-1989-2001-j.txt (150000 Japanese lines, 19751012 bytes, shift_jis-dos)
  - p11-1989-2001-e.txt (150000 English lines, 22522594 bytes, ascii-dos)
  - pnm-1989-2001-j.txt (29922 Japanese lines, 6458527 bytes, shift_jis-dos)
  - pnm-1989-2001-e.txt (29922 English lines, 8137390 bytes, ascii-dos)
Line n of p*-1989-2001-j.txt and line n of p*-1989-2001-e.txt together form one alignment unit of p*-1989-2001.txt. The Japanese sentences were segmented by ChaSen version 2.2.9.
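Because the files in split.zip are line-aligned, they can be read in parallel. The following is a minimal Python sketch, not part of the JENAAD distribution; the function name read_pairs is hypothetical, and it assumes the files have been extracted from split.zip and that Python's "shift_jis" and "ascii" codecs match the encodings listed above (if "shift_jis" raises a decoding error, "cp932" may be worth trying).

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_pairs(ja_path: str, en_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (Japanese, English) pairs from two line-aligned text files.

    read_pairs is a hypothetical helper written for illustration only.
    """
    # Text mode with the default newline handling converts the DOS
    # line endings (CRLF) of the *-dos files to "\n" while reading.
    with open(ja_path, encoding="shift_jis") as ja, \
            open(en_path, encoding="ascii") as en:
        for ja_line, en_line in zip_longest(ja, en):
            # zip_longest pads the shorter file with None, which means
            # the two files are not line-aligned.
            if ja_line is None or en_line is None:
                raise ValueError("files have different numbers of lines")
            yield ja_line.rstrip("\n"), en_line.rstrip("\n")
```

For example, list(read_pairs("p11-1989-2001-j.txt", "p11-1989-2001-e.txt")) should produce 150000 pairs if the files match the line counts listed above.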
How to cite this data
The following article should be cited:
instead of the web page you are now reading.
Last updated: 2005/05/17