Here are some testimonial stories about those who have used NLP++ over the years.
NLP “Must Be” Rule Based
When I was working at LexisNexis in their supercomputing group, I had many stimulating and fruitful discussions with Roger Dev, who has been an expert in machine learning and neural networks for many decades. He and I agree on many aspects of AI, given that we are both of the “older generation” that has seen the promise of the singularity appear numerous times over the decades, only to be brought back down to reality.
One of our favorite topics was “AI hype” and the often “stunning” misconceptions about the current state of the art in AI and statistical methods like machine learning, neural networks, and, more recently, large language models. For instance, we both agreed that the idea that text vectors find “meaning” in a language was simply untrue.
In one of my last conversations with Roger, we were talking about the role of statistical methods in NLP. During the course of our conversation, Roger simply blurted out this phrase:
“NLP has to be rule-based”
I was taken aback when he said that. I asked him to repeat the phrase, and he said that, after decades of work in statistical methods, he was convinced that NLP had to be rule-based. He explained that statistical systems will always be wrong because they rely on probability. I agreed and mentioned that probability rarely plays a role when humans read and understand text.
Couldn’t break NLP++ Conceptually or Algorithmically
NLP++ was developed by Text Analysis International, a startup company funded by friends and family of Amnon Meyers in the late 1990s.
In the early 2000s, the company was contacted by Keith Woods-Holder from an AI company located in the UK. He wanted to fly out to California to meet Amnon and me to find out, in his words (paraphrasing), “what in the heck did you guys invent?” His company had looked at 99 NLP technologies and happened across NLP++. They decided to test our technology, and they said that they couldn’t break it conceptually or algorithmically, so they had to fly out and talk to its authors.
They eventually did use NLP++ for sentiment analysis for NASDAQ.
Interestingly enough, they are looking at the technology again in 2024 for several of their clients in the UK.
Could Not Do Without NLP++
I had the privilege of mentoring more than a dozen college and high school students in various projects in AI during the last few years. Many of them involved using NLP++ directly for projects such as sentiment analysis, resume processing, or processing Wiktionary pages for building a bigger and better English Dictionary.
Mentoring up to four students at a time, I had many interactions with students, all of whom had never used NLP++ before. I always kept them aware of my first two rules of using NLP++, an idea I stole from the movie “Fight Club”. Here they are:
- Think like a human
- Think like a human
When preparing for my talk on NLP++ at Clemson University, I was in a conundrum: how to describe the difference between traditional programming and intelligent programming. I looked up the definition of computer programming and found nothing out there that I found satisfying. After weeks of busting my brain, I finally came up with the answer.
- Traditional programming: “Think like a computer”
- Programming with NLP++: “Think like a human”
In fact, it was the biggest obstacle for all my computer programming students. They were all extremely bright, and some were among the top students in their respective schools, but when they got bogged down in talking about while loops in NLP++, I would stop them and ask: “Is this what humans do when they are reading and understanding text?” Some students picked it up quickly. Others took a few weeks.
During one of my video conference sessions with some students from India, one of the students volunteered to pass along a very interesting conclusion that his fellow students and professors had come to when talking about NLP++. They said that during their last meeting, they stopped and tried to think of how to do the task they were attempting without using NLP++. I believe it was the group that was parsing English Wiktionary pages to create a more extensive digital dictionary of English.
The student, addressing me as “Mr. de Hilster”, recalled that they stopped and asked themselves this simple question: could they do the current programming task without using NLP++? He asked me, “Do you want to know what our conclusion was?” I said, “Of course!” He replied (and I am paraphrasing), “We concluded that such a task is impossible without NLP++.”
As Amnon has said many times: to program what NLP++ can do using another programming language, you would have to recreate the NLP Engine in that language.
Address Parsing
When people think of NLP, they think of conversing with computers. But NLP includes much more than conversation. It also includes reading formatted documents and things like addresses, emails, URLs, and the like.
Story One
One of the things Amnon and I know is that NLP++ is a regex killer. Regex is a rule-based "sub-language" that is used for matching patterns in text. It is a compilable language, meaning that the patterns written in Regex must be perfect. This is unlike natural languages, which are open languages. It is a sub-language given that the compilers for Regex patterns can be found as part of packages for many languages, including C++, Python and yes, even NLP++.
Regex is used heavily in industry to parse simple phrases such as dates, telephone numbers, and emails. Regex patterns are very efficient and do the job for simple phrases. But as one builds patterns using Regex and strings them together, their readability becomes a problem. In fact, almost every programmer knows that when they go back to code they have written in Regex, they often have no clue as to how a certain pattern works, since the human readability of Regex is notoriously bad.
One person at my last company, upon learning about NLP++, went through the NLP++ video tutorials and is now translating her Regex address parsers into NLP++. NLP++ is a Regex killer in that it is human readable and easy to trace and maintain. It is also 1000 times more powerful.
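To make the readability point concrete, here is a minimal Python sketch of what a Regex address parser can look like. The pattern, the `parse_address` helper, and the simplified US-style address format are all assumptions made up for illustration; they are not taken from the parsers mentioned in this story.

```python
import re

# Hypothetical pattern for a simplified US-style address such as
# "123 Main St, Springfield, IL 62704". Illustration only.
ADDRESS_RE = re.compile(
    r"^(\d+)\s+((?:[A-Za-z]+\s)*[A-Za-z]+)\s+(St|Ave|Rd|Blvd)\.?,"
    r"\s*([A-Za-z ]+),\s*([A-Z]{2})\s+(\d{5})(?:-(\d{4}))?$"
)


def parse_address(text):
    """Return the address parts as a dict, or None if the text does not match."""
    match = ADDRESS_RE.match(text)
    if match is None:
        return None
    number, street, suffix, city, state, zip5, zip4 = match.groups()
    return {
        "number": number,
        "street": street,
        "suffix": suffix,
        "city": city,
        "state": state,
        "zip5": zip5,
        "zip4": zip4,  # None when no ZIP+4 extension is present
    }


if __name__ == "__main__":
    print(parse_address("123 Main St, Springfield, IL 62704"))
    print(parse_address("not an address"))
```

Even at this small scale, the pattern takes some effort to read back, and when an address fails to match, the only signal is `None`, with no hint as to which part of the pattern was responsible.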
Story Two
While at LexisNexis, I gave a talk in Portuguese about NLP++ to a group of colleagues working in Brazil. From the talk, I got several contacts who were interested in the technology. One was interested in replacing a licensed software package that parsed Brazilian addresses. The existing system was generating errors that were difficult or impossible to correct, given that the address parser was a black box.
Guilherme Santos learned NLP++ and worked with me on creating a parser for addresses in Portuguese. This led me to create some dictionaries for Portuguese (using NLP++ to parse webpages), which are now available to everyone in the Portuguese library that comes with the NLP++ extension for VSCode. The advantage is that instead of a black box, NLP++ is a glass box that can be fixed when an address is broken down incorrectly.
Guilherme will be presenting his work live online during the HPCC 2024 Community Day Conference in October, which is public and open for anyone to attend.
Uses of NLP++ Through the Years
These companies licensed NLP++ analyzers before the technology became open source in December of 2018.
Licensed Commercial Use
- NASDAQ OMX: The customer used NLP++ in its deployed sentiment analysis solution.
- XIEO: TAI partners with Maya Information Technologies to deploy Official Records processing solutions, including OCR cleanup, information extraction, and property lookup from deeds, mortgages, foreclosure documents, and other court and local government documents.
- Michael Page International: Builds and deploys resume analyzers for a host of European languages.
- Patrice Mellot Consultant: PMC has deployed an information extraction system for resumes in French, serving Michael Page International, one of the world's foremost professional recruitment firms. PMC's description of the application:
- Educational Testing Service: A multi-year research project, building multiple Natural Language Generation (NLG) applications.
- IBM Global Services UK: A multi-year deployed text analytic application for the UK government.
- Polwire: Delivers up-to-the-minute personalized news for the political arena, including foreign policy, defense, budget & economy, health care, education, energy & oil, and much more.
- Milkhouse Software: Text analytic capabilities for the VizForce product. Applications focus on patents and medical research papers.
Open-Source Usage
- NLP++ Plugin: LexisNexis Risk developed an NLP++ plugin for their HPCC Systems Supercomputing Platform after NLP++ became open source.
- Portuguese Address Parser: prototype development of a Portuguese address parser to be presented in October 2024 during the virtual HPCC Community Day (open to the public)
- Low-Resource ICD Coding of Hospital Discharge Summaries: master's thesis by Ashton Williamson at Clemson University (read about it here)