Author: David de Hilster
Building an NLP++ Brazilian Address Cleaner for HPCC Systems
Live Talk Open to the Public. Watch a video of the live presentation of this NLP++ analyzer for Brazilian addresses from the 2024 HPCC Systems Community Summit. The talk took place online on Wednesday, October 8, at 9:00 am EST (USA), and registration and attendance were free. Using the NLP++ […]
Regex for NLP
Regex is ubiquitous in programming because it is so useful as a rule-based language for parsing text. Programmers are comfortable writing explicit, modifiable rules to parse text. This contrasts with black-box statistical models, which cannot be modified when things go wrong, […]
ACL 2024 in Bangkok, Thailand: Revelations of Old and New
I have worked in computational linguistics for more than 40 years, and this is the first time I have attended the most important conference in our field: the annual Association for Computational Linguistics (ACL) Conference. As part of the registration process, I became a member for the first time […]
NLP++ and LLM
Trustworthy NLP systems must be rule- and knowledge-based, given that statistical systems such as large language models, machine learning, and neural networks are not trustworthy. With the advent of large language models that can be queried about common knowledge, it is natural to use them to generate linguistic and world knowledge […]
Guilherme Santos da Silva
Guilherme Santos da Silva has a degree in Computer Engineering from the Federal Technological University of Paraná, Brazil, and currently works at LexisNexis Risk Brazil. He discovered HPCC Systems in 2021 when he joined LexisNexis as an intern, and he participated in the 2021 HPCC Systems Poster Contest with […]
Scalable Analysis of English Dictionary Files on HPCC Systems Big Data Platform
Congratulations to Jayanth C on presenting his paper, "Scalable Analysis of English Dictionary Files on HPCC Systems Big Data Platform," at a conference in Japan. Read more about it on LinkedIn. Here are links to the paper online:
Online Dictionary Creation Tool
Coming 2024. This exciting project is currently being implemented, and the first version is expected to be released in 2024. Description: The dictionary tool is a web-based tool that will allow dictionaries to be created quickly in any language. Given there are few linguistic dictionaries available online for even major languages […]
Python Package for NLP++
The first version of our NLPPlus Python package is ready to use. We are still waiting for the package to be approved on the Python Package Index (PyPI), but it is available as a download from our GitHub: https://github.com/VisualText/py-package-nlpengine. The NLPPlus Python package for NLP++ allows Python programmers to call NLP++ analyzers […]
Portuguese Dictionary
Work on a Portuguese dictionary has begun and can be found in the GitHub repository: http://github.com/VisualText/dict-pt-br. It was started by NLP++ co-author David de Hilster, who is fluent in Portuguese, because no digital dictionary for Portuguese is available. Video Sessions: This is a video […]
English Dictionary
This project involves parsing the English Wiktionary pages to build the most comprehensive digital dictionary ever created. The first two stages of the project were completed through grants to RV College of Engineering from the HPCC Systems group at LexisNexis Risk. In Progress: This project is still in progress. Project Description: […]
Dr. Jyoti Shetty
Dr. Jyoti Shetty is an Assistant Professor in the Computer Science and Engineering Department at RV College of Engineering. In collaboration with students, she has carried out several projects on HPCC Systems, including a distributed DBSCAN implementation, evaluation metrics for a clustering algorithm, an IoT plugin for HPCC Systems, an OpenCV […]
Nathalia Ribas
Nathalia Ribas is a Computer Engineering student at the Federal University of Santa Catarina (UFSC), Brazil, and currently works at LexisNexis Risk Brazil. She was introduced to HPCC Systems in 2021, when she had the opportunity to study the HPCC Systems platform in the cloud along with the Elastic Stack for her […]