
Category: Projects
These are projects proposed, ongoing, or completed by NLUGI.


Python Project
With the official launch of the NLPPlus Python package, we are now focusing on the best way to introduce it to the Python community. The major message? NLPPlus is NLP that is 100% customizable. Where other NLP packages and toolkits are turnkey and supposedly do not require any customization, they […]

Building an NLP++ Brazilian Address Cleaner for HPCC Systems
Live Talk, Open to the Public: Watch a video of the live presentation of this NLP++ analyzer for Brazilian addresses from the 2024 HPCC Systems Community Summit. The talk took place online on Wednesday, October 8, at 9:00 am EST (USA), and registration and attendance were free. Using the NLP++ […]

Online Dictionary Creation Tool
Coming 2024: This is an exciting project that is currently being implemented, with the first version expected to be released in 2024. Description: The dictionary tool is a web-based tool that will allow for quick creation of dictionaries in any language. Given there are few linguistic dictionaries available online for even major languages […]

Python Package for NLP++
The first version of our NLPPlus Python package is ready to use. The package is still awaiting approval on the Python Package Index (PyPI), but it is available as a download from our GitHub: https://github.com/VisualText/py-package-nlpengine The NLPPlus Python package for NLP++ allows Python programmers to call NLP++ analyzers […]

Portuguese Dictionary
The first steps in creating a Portuguese dictionary have been taken, and the work can be found in the GitHub repository: http://github.com/VisualText/dict-pt-br. This was started by NLP++ co-author David de Hilster, given that he is fluent in Portuguese and that no digital dictionary for Portuguese is available. Video Sessions: This is a video […]

English Dictionary
This project involves parsing the English Wiktionary pages into the most comprehensive digital dictionary ever created. The first two stages of this project have been completed through grants to RV College of Engineering from LexisNexis Risk’s HPCC Systems group. In Progress: This project is still in progress. Project Description: […]

NLP Course Using NLP++
One of the more important projects we are currently working on is the creation of high school and college-level courses on NLP using NLP++. NLP courses at universities almost exclusively concentrate on statistical methods like Machine Learning, Neural Networks, and Large Language Models. NLP courses that do not use statistical […]

Medical Text Processing
The medical text processing project is sponsored by LexisNexis Risk Solutions’ HPCC Systems group in conjunction with Clemson University. This project is the master’s thesis of Clemson graduate student Ashton Williamson and involves using NLP++ to assign ICD codes to the MIMIC dataset. The project is mentored by Dr. Amy Apon […]

Resume Analyzer in English
This project was an internship project by Kruthika Pinnada of RV College of Engineering, who developed an analyzer capable of parsing the different sections of a resume, such as education, professional experience, and demographic information, just as a human reader does. The parsed data supports the creation of a knowledge base […]

Sentiment Analysis of Soccer Games in Portuguese
This project came out of grant work funded by LexisNexis HPCC Systems at the University of Sao Paulo and was produced by Pedro Lima Rodrigues under the guidance of Professor Renato de Oliveira Moraes and NLP++ co-author David de Hilster. This analyzer was specifically created to analyze tweets about the Palmeiras Soccer […]

Global Dictionary Initiative
The Global Dictionary Initiative is part of the Natural Language Understanding Global Initiative. Its goal is to produce NLP++ dictionary files for all the major languages of the world. VisualText and NLP++ are being used to parse Wiktionary pages as well as other digital resources in order to create NLP++ […]

Tamil Dictionary
In Progress: This project is still in progress. Anyone who wants to contribute should email us at contact@nluglob.org. Project Description: The more languages that can be used in Natural Language Processing, the more effective it can be as a whole. Therefore, my goal was to expand […]

Nepali Dictionary
Ongoing Project: This project is ongoing. The original idea was to create a Nepali dictionary by parsing Nepali Wiktionary pages, but the Nepali Wiktionary turned out not to be complete enough to be useful for NLP and NLU. We are looking for help to create this dictionary. Please see […]

Chinese Dictionary
Unfinished Project: This project is not complete. We are currently looking for people interested in creating the first Chinese dictionary for NLP++. Overview of the First Project: Lucas Wang chose this project because of his curiosity about how the Chinese language can be represented and understood by a computer given […]