This project is not complete. We are currently looking for people interested in creating the first Chinese dictionary for NLP++.
Overview of the First Project
Lucas Wang chose this project because he was curious about how a computer can represent and understand Chinese, a language written with characters rather than an alphabet. Chinese also has two-character phrases, as well as idioms of four or more characters, whose meanings differ from those of the individual characters. This makes it a unique challenge for computer processing.
Lucas needs to filter the relevant knowledge base file for Chinese (downloaded from Wiktionary) and settle on a single standard for reading one Chinese dialect. Using the chosen standard, he needs to create analyzers that understand each Chinese character and parse the characters into the knowledge base. He also needs to write a part-of-speech tagger and create a knowledge base for the parts of speech as well.
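To make the challenge concrete, here is a minimal Python sketch (not NLP++, and not the project's actual analyzer) of greedy longest-match lookup against a dictionary. It shows why multi-character words and idioms need their own entries: matching character by character would split them into pieces with different meanings. The dictionary contents, the part-of-speech labels, and the names `TOY_DICT` and `segment` are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy dictionary mapping a word to an illustrative part of speech.
# These entries are assumptions for the example, not data from the project.
TOY_DICT = {
    "我": "pronoun",
    "马": "noun",
    "上": "verb",
    "马上": "adverb",      # "immediately", not "horse" + "up"
    "来": "verb",
    "画蛇添足": "idiom",   # four-character idiom stored as one entry
}


def segment(text, dictionary):
    """Split text into (word, part_of_speech) pairs by greedy longest match."""
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in dictionary:
                tokens.append((piece, dictionary[piece]))
                i += size
                break
        else:
            # No entry covers this character; emit it as unknown.
            tokens.append((text[i], "unknown"))
            i += 1
    return tokens


if __name__ == "__main__":
    print(segment("我马上来", TOY_DICT))
```

With this toy dictionary, 马上 is matched as a single adverb instead of two separate characters. Greedy longest match is only one simple strategy and can pick the wrong split on ambiguous text, which is part of what makes the problem interesting.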
Chinese Wiktionary Pages
It turns out that Wiktionary pages in Chinese have little or no linguistic information. With this in mind, Lucas found an online digital dictionary, originally used in a video game, that includes some part-of-speech information.
Find out more about this project by reading Lucas’s blog journal.