
Author: David de Hilster


ACL 2024 in Bangkok, Thailand: Revelations of Old and New
I have been in computational linguistics for more than 40 years, and this is the first time I have been to the most important conference in our field: the annual Association for Computational Linguistics (ACL) Conference. As part of the registration process, I became a member for the first time […]

NLP++ and LLM
Trustworthy NLP systems must be rule- and knowledge-based, given that statistical systems such as large language models, machine learning, and neural networks are not trustworthy. With the advent of large language models that can be queried about common knowledge, it is natural to use them to generate linguistic and world knowledge […]

Guilherme Santos da Silva
Guilherme Santos da Silva has a degree in Computer Engineering from the Federal Technological University of Paraná, Brazil, and is currently an employee of LexisNexis Risk Brazil. He discovered HPCC Systems in 2021 when he joined LexisNexis as an intern and participated in the 2021 HPCC Systems Poster Contest with […]

Scalable Analysis of English Dictionary Files on HPCC Systems Big Data Platform
Congratulations to Jayanth C on presenting his paper on Scalable Analysis of English Dictionary Files on HPCC Systems Big Data Platform at a conference in Japan. Read more about it on LinkedIn. Here are links to the paper online:

Online Dictionary Creation Tool
Coming 2024: This is an exciting project that is currently being implemented. The first version is expected to be released in 2024. Description: The dictionary tool is a web-based tool that will allow for quick creation of dictionaries in any language. Given there are few linguistic dictionaries available online for even major languages […]

Python Package for NLP++
The first version of our NLPPlus Python package is ready to use. We are still waiting on approval of the package on the Python package website, but it is available as a download from our GitHub (https://github.com/VisualText/py-package-nlpengine). The NLPPlus Python package for NLP++ allows Python programmers to call NLP++ analyzers […]

Portuguese Dictionary
The first steps in creating a Portuguese dictionary have been taken, and the work can be found in the GitHub repository: http://github.com/VisualText/dict-pt-br. This was started by NLP++ co-author David de Hilster, given that he is fluent in Portuguese and that no digital dictionary for Portuguese is available. Video Sessions: This is a video […]

English Dictionary
This project involves parsing the Wiktionary pages for English into the most comprehensive digital dictionary ever created. The first two stages of this project have been done via grants to RV College of Engineering from LexisNexis Risk’s HPCC Systems group. In Progress: This project is still in progress. Project Description […]

Dr. Jyoti Shetty
Dr. Jyoti Shetty is an Assistant Professor in the Computer Science and Engineering Department at the RV College of Engineering. In collaboration with students, she has executed several projects on HPCC Systems, including implementing a distributed DBSCAN, providing evaluation metrics for a clustering algorithm, an IoT plugin for HPCC Systems, an OpenCV […]

Nathalia Ribas
Nathalia Ribas is a Computer Engineering student at the Federal University of Santa Catarina (UFSC), Brazil, and is currently an employee at LexisNexis Risk Brazil. She was introduced to HPCC Systems in 2021 when she had the opportunity to study the HPCC platform in the cloud along with Elastic Stack for her […]

NLP Course Using NLP++
Textbook Coming August 2025. Read all about it here: First NLP++ Textbook on Its Way – Natural Language Understanding Global Initiative. Teaching NLP++: One of the more important projects we are currently working on is the creation of high school and college-level courses on NLP using NLP++. NLP courses at […]

Medical Text Processing
The medical text processing project is sponsored by LexisNexis Risk Solutions HPCC Systems in conjunction with Clemson University. This project is the master’s thesis for Clemson graduate student Ashton Williamson and involves using NLP++ to assign ICD codes to the MIMIC dataset. The project is mentored by Dr. Amy Apon […]

Resume Analyzer in English
This was an internship project by Kruthika Pinnada from RV College of Engineering, who developed an analyzer capable of parsing the different sections of a resume, such as education, professional experience, and demographics, just as a human reader does. The parsed data supports the creation of a knowledge base […]

Sentiment Analysis of Soccer Games in Portuguese
This project came from grant work provided by LexisNexis HPCC Systems to the University of Sao Paulo and was produced by Pedro Lima Rodrigues under the guidance of Professor Renato de Oliveira Moraes and NLP++ co-author David de Hilster. This analyzer was specifically created to analyze tweets about the Palmeiras Soccer […]