NLP++ in One Page

Putting data into tables with columns and values, and creating relations between them, gave rise to SQL. Tokenizing text into trees, matching patterns, and building knowledge gave rise to NLP++. SQL evolved from the need to write programs that manipulate data in database tables. NLP++ evolved from the need to write programs that break down text and find meaning.
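The three steps named above (tokenize, match patterns, build knowledge) can be illustrated with a small sketch. It is written in Python rather than NLP++, and it is only an analogy under stated assumptions: the node layout, the labels `_token` and `_name`, and the helpers `tokenize` and `reduce_pattern` are hypothetical names chosen for this example, not part of NLP++ or any NLP++ tool.

```python
import re

def tokenize(text):
    """Split text into a flat list of leaf nodes (words, numbers, punctuation)."""
    return [{"label": "_token", "text": t, "children": []}
            for t in re.findall(r"[A-Za-z]+|\d+|[^\w\s]", text)]

def reduce_pattern(nodes, pattern, label):
    """Replace each run of nodes matching `pattern` with a single parent node.

    `pattern` is a list of predicates, one per node in the run (hypothetical design).
    """
    out, i = [], 0
    while i < len(nodes):
        window = nodes[i:i + len(pattern)]
        if len(window) == len(pattern) and all(p(n) for p, n in zip(pattern, window)):
            # Group the matched nodes under a new labeled node, growing the tree.
            out.append({"label": label,
                        "text": " ".join(n["text"] for n in window),
                        "children": window})
            i += len(pattern)
        else:
            out.append(nodes[i])
            i += 1
    return out

def is_capitalized(node):
    """True for an alphabetic token that starts with an uppercase letter."""
    return node["label"] == "_token" and node["text"].isalpha() and node["text"][0].isupper()

# Step 1: tokenize. Step 2: match a pattern (two capitalized words) and build a tree.
tree = reduce_pattern(tokenize("Ada Lovelace wrote programs."),
                      [is_capitalized, is_capitalized], "_name")

# Step 3: store what was found as simple "knowledge".
knowledge = {"names": [n["text"] for n in tree if n["label"] == "_name"]}
print(knowledge)
```

A real analyzer would chain many such passes, each one refining the tree produced by the pass before it; this sketch shows a single pass only.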

Both languages were logical progressions of the data structures they were required to manipulate. While SQL is ubiquitous and runs almost 90% of all databases in the world, NLP++ is hardly known. This will change when two things happen:

  1. When the world becomes aware that NLP has a universal programming language, just as databases have SQL.
  2. When rule-based NLP is recognized as the only way to write trustworthy, human-like NLP programs.

Although NLP++ has been used successfully in industry over the years, two problems currently stand in the way of its wide adoption:

  1. Human language is exponentially more complex than data in database tables, and there is no standard for constructing NLP++ analyzers.
  2. Data, rules, knowledge, and algorithms must be created manually by humans.

Building rule-based NLP systems is labor-intensive, does not yet have well-defined best practices, and offers no inherent incentive to build such systems (e.g. Wikipedia).

Unlike SQL, which was designed for a well-defined data space, NLP++ was designed for text, which does not have a well-defined and accepted data space.

The solution to these problems is the NLP Blockchain.

The NLP Blockchain will organize, incentivize, and decentralize NLP, sparking the “great digital migration” in which people around the world will build trustworthy NLP for all human languages. The progression from dictionaries, to simple phrases, to entity extraction, to story understanding will happen with tens of thousands of people building towards better and better rule-based NLP.

The incentive will be NLP coins, distributed during the “great digital migration”. Dictionaries, knowledge bases, and simpler patterns will be supervised, while analyzer code will be allowed to develop organically. Certain parsing techniques will bubble to the surface, eventually becoming “standard practice”.

This is a decades-long project in which NLP++ code will come to fully capture linguistic and world knowledge and provide the algorithms necessary for distributed, maintainable, controlled, trustworthy, and powerful NLP.

The “great digital migration” requires programmers to shift from thinking like computers, as in traditional programming, to thinking like humans.

That is why NLP++ was created: to allow for the encoding of how humans read and understand text.

NLP++ is the SQL for text.
