Portuguese Dictionary

The first steps toward creating a Portuguese dictionary have been taken, and the work can be found in the GitHub repository: http://github.com/VisualText/dict-pt-br. NLP++ co-author David de Hilster started the project because he is fluent in Portuguese and because no digital dictionary for Portuguese is available.

Video Sessions

This is a video meeting from January 11, 2024 covering the Portuguese version of the Wiktionary page analyzer.

Here is our first live meeting in Portuguese, in two videos. The first video is the last 30 minutes of the talk, which covers the Wiktionary page analyzer. The second is the entire session, whose first part covers the NLP++ analyzers in the dictkb directory that created the current Portuguese dictionaries and KBs.

Using NLP++ to Create Dictionaries

It is very common to use NLP++ to parse data in order to create dictionaries and knowledge for use in other NLP++ analyzers. Wiktionary was not the only source considered in creating the Portuguese dictionary.

David de Hilster also found a dictionary of the most common Portuguese verbs: the Reverso verbs website. The idea was to quickly create a dictionary of the most common verbs in Portuguese with their syntactic information as well as the root form of each verb. The NLP++ analyzers David started for this effort are in the following repository. NOTE: this work was not completed: https://github.com/VisualText/dict-pt-br/tree/main/reverso-verbs


It was determined that parsing the Portuguese Wiktionary pages would yield the first version of the Portuguese dictionary. Using NLP++’s knowledge base (KB) makes it possible to avoid the redundancy found in dictionaries, where nearly every conjugation of a verb appears as a separate page. Keeping the dictionary's knowledge in the KB means the same information does not have to be repeated across dictionary entries.

Part of this project is to include syntactic information with Portuguese words. This includes the singular and plural forms of nouns and, more complex, the conjugations of verbs.
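The idea of storing each word once under its root form, with inflected forms pointing back to it, can be illustrated with a small sketch. This is not the NLP++ KB or the project's actual data layout; it is a Python stand-in, and the names MiniKB, Lemma, add_form, and lookup, as well as the feature labels, are hypothetical choices made for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class Lemma:
    """One dictionary entry, stored once under its root form (hypothetical layout)."""
    root: str
    pos: str
    # Maps an inflected surface form to the list of feature sets it can express.
    forms: dict = field(default_factory=dict)


class MiniKB:
    """Toy stand-in for a knowledge base; not the NLP++ KB API."""

    def __init__(self):
        self.lemmas = {}  # (root, pos) -> Lemma
        self.index = {}   # surface form -> list of (root, pos) keys

    def add_form(self, root, pos, surface, **features):
        """Attach an inflected form to its root instead of creating a new entry."""
        key = (root, pos)
        lemma = self.lemmas.setdefault(key, Lemma(root, pos))
        lemma.forms.setdefault(surface, []).append(features)
        keys = self.index.setdefault(surface, [])
        if key not in keys:
            keys.append(key)

    def lookup(self, surface):
        """Return every (root, pos, features) reading of a surface form."""
        results = []
        for key in self.index.get(surface, []):
            lemma = self.lemmas[key]
            for features in lemma.forms[surface]:
                results.append({"root": lemma.root, "pos": lemma.pos, **features})
        return results


kb = MiniKB()
kb.add_form("falar", "verb", "falar", form="infinitive")
kb.add_form("falar", "verb", "falo", tense="present", person=1, number="singular")
kb.add_form("falar", "verb", "falamos", tense="present", person=1, number="plural")
kb.add_form("falar", "verb", "falamos", tense="preterite", person=1, number="plural")
kb.add_form("casa", "noun", "casa", number="singular")
kb.add_form("casa", "noun", "casas", number="plural")

for reading in kb.lookup("falamos"):
    print(reading)
```

In this sketch the six forms above are held under only two entries (falar and casa), and looking up an inflected form such as "falamos" returns its root together with each set of syntactic features it can carry.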

Productive Functions

In the NLP++ Wiktionary page parser for Portuguese, David de Hilster created functions that generate