Category: Projects
These are projects that have been proposed, are ongoing, or have been completed by NLUGI.
Building an NLP++ Brazilian Address Cleaner for HPCC Systems
Live Talk Open to the Public: Watch a video of the live presentation of this NLP++ analyzer for Brazilian addresses from the 2024 HPCC Systems Community Summit. The talk took place online on Wednesday, October 8, at 9:00 am EST (USA), and it was free to register and attend. Using the NLP++ […]
Online Dictionary Creation Tool
Coming 2024: This is an exciting project that is currently being implemented, with the first version expected to be released in 2024. Description: The dictionary tool is a web-based tool that will allow for the quick creation of dictionaries in any language. Given there are few linguistic dictionaries available online for even major languages […]
Python Package for NLP++
The first version of our NLPPlus Python package is ready to use. We are still waiting for the package to be approved on the Python Package Index (PyPI), but it is available as a download from our GitHub repository: https://github.com/VisualText/py-package-nlpengine The NLPPlus Python package for NLP++ allows Python programmers to call NLP++ analyzers […]
Portuguese Dictionary
The first steps in creating a Portuguese dictionary have been taken, and the work can be found in the GitHub repository: http://github.com/VisualText/dict-pt-br. This was started by NLP++ co-author David de Hilster, since he is fluent in Portuguese and no digital dictionary for Portuguese is available. Video Sessions: This is a video […]
English Dictionary
This project involves parsing the English Wiktionary pages into the most comprehensive digital dictionary ever created. The first two stages of this project have been completed through grants to RV College of Engineering from LexisNexis Risk’s HPCC Systems group. In Progress: This project is still in progress. Project Description […]
NLP Course Using NLP++
One of the most important projects we are currently working on is the creation of high school and college-level courses on NLP using NLP++. NLP courses at universities concentrate almost exclusively on statistical methods such as Machine Learning, Neural Networks, and Large Language Models. NLP courses that do not use statistical […]
Medical Text Processing
The medical text processing project is sponsored by LexisNexis Risk Solutions’ HPCC Systems group in conjunction with Clemson University. This project is the master’s thesis of Clemson graduate student Ashton Williamson and involves using NLP++ to assign ICD codes to the MIMIC dataset. The project is mentored by Dr. Amy Apon […]
Resume Analyzer in English
This project was an internship project by Kruthika Pinnada from RV College of Engineering, who developed an analyzer capable of parsing the different sections of a resume, such as education, professional experience, and demographic information, just as a human reader does. The parsed data supports the creation of a knowledge base […]
Sentiment Analysis of Soccer Games in Portuguese
This project came from grant work provided by LexisNexis HPCC Systems to the University of São Paulo and was produced by Pedro Lima Rodrigues under the guidance of Professor Renato de Oliveira Moraes and NLP++ co-author David de Hilster. This analyzer was specifically created to analyze tweets about the Palmeiras Soccer […]
Global Dictionary Initiative
Part of the Natural Language Understanding Global Initiative is the Global Dictionary Initiative. The idea is to produce NLP++ dictionary files for all the major languages of the world. VisualText and NLP++ are being used to parse Wiktionary pages as well as other digital resources in order to create NLP++ […]
Tamil Dictionary
In Progress: This project is still in progress. Anyone wanting to contribute to it should email us at contact@nluglob.org. Project Description: The more languages that can be handled by Natural Language Processing, the more effective it can be as a whole. Therefore, my goal was to expand […]
Nepali Dictionary
Ongoing Project: This project is ongoing. The original idea was to create a Nepali dictionary by parsing Nepali Wiktionary pages. It turns out that the Nepali Wiktionary is not complete enough to be useful for NLP and NLU. We are looking for help to create this dictionary. Please see […]
Chinese Dictionary
Unfinished Project: This project is not complete. We are currently looking for people interested in creating the first Chinese dictionary for NLP++. Overview of the First Project: Lucas Wang chose this project because of his curiosity about how the Chinese language can be represented and understood by a computer given […]