Inside Version 3 of the VisualText NLP++ Engine: Compiled, Cloud-Built, and Now on npm – Natural Language Understanding Global Initiative

For most of its life, the NLP++ engine has been a powerful tool for writing rule-based analyzers — explainable, glass-box, and deterministic natural-language processing you can actually read, debug, and reproduce exactly. It has run on Windows, Linux, and macOS for years. What Version 3 changes is how you build, deploy, and install it: compiling your analyzers to native code is now a one-click job rather than a C++ expedition, those binaries can be built in the cloud, and — for the first time — you can pull the engine into a Node.js project with a single npm install.

Here’s what shipped, and why it matters.

Compiling analyzers — now a one-click job, not a C++ expedition

Here’s a subtlety worth being precise about: compiling NLP++ analyzers and knowledge bases to native C++ was never impossible. The capability has existed for a long time. But it was hidden, fragile, and realistically reachable only by someone with serious C++ build expertise — the right toolchain, the right flags, the right link order. For everyone else, it might as well not have existed.

Version 3 takes that buried, expert-only capability and turns it into a one-click process any user can run, with no C++ knowledge required. You write rules; the engine generates C++ from your NLP++ rule files and knowledge bases and builds that code into native shared libraries. The payoff is faster execution and a deployable artifact, without you ever having to drive a compiler.

Making that one click trustworthy took real engineering under the hood. The generated code gained dozens of type-correct overloads for NLP++ builtins, plus provenance comments (/* nlp-source */) that trace each line of generated C++ back to the NLP++ rule that produced it. A series of correctness fixes followed: flushing knowledge-base writes that had been silently producing empty output, guarding against a null-pointer crash, and fixing a subtle evaluation-order bug. C++ leaves the order in which a function's arguments are evaluated unspecified, so the generator now hoists those arguments into temporaries, which are evaluated as separate statements in a fixed order.
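Here is a minimal Python sketch of a code generator that hoists arguments and stamps provenance comments. It is not the engine's generator. The `Call` tree, `CppEmitter`, `next_node`, `combine`, the temporary names, and the `/* nlp-source: file:line */` layout are all hypothetical. The point it shows is narrow: in C++, `combine(next_node(), next_node())` may evaluate either argument first, but separate statements always run in order.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Call:
    """A call in a toy expression tree; plain strings are already-rendered C++ operands."""
    name: str
    args: List[Union["Call", str]] = field(default_factory=list)


class CppEmitter:
    """Hypothetical generator: emits one C++ statement per hoisted call."""

    def __init__(self):
        self.lines: List[str] = []
        self._next_tmp = 0

    def _operand(self, expr, src):
        # Strings (variables, literals) are used as-is.
        if isinstance(expr, str):
            return expr
        # Hoist nested calls first, left to right, so each lands in its own
        # statement; separate C++ statements are sequenced one after another.
        rendered = [self._operand(a, src) for a in expr.args]
        tmp = f"tmp{self._next_tmp}"
        self._next_tmp += 1
        self.lines.append(
            f"auto {tmp} = {expr.name}({', '.join(rendered)});  /* nlp-source: {src} */"
        )
        return tmp

    def emit_statement(self, call, src):
        rendered = [self._operand(a, src) for a in call.args]
        self.lines.append(f"{call.name}({', '.join(rendered)});  /* nlp-source: {src} */")


em = CppEmitter()
# Emitted naively as combine(next_node(), next_node(), "sep"), the two reads could run in either order.
em.emit_statement(
    Call("combine", [Call("next_node"), Call("next_node"), '"sep"']), "rules/main.nlp:12"
)
print("\n".join(em.lines))
```

Hoisting every nested call, even a side-effect-free one, is the conservative choice. A real generator could restrict hoisting to calls whose relative order actually matters.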

Compile in the cloud — no toolchain required

Not everyone wants a C++ build environment on their machine. So Version 3 introduced a cloud-compile service (nlp-compile-service): a lightweight dispatcher that sends your analyzer to GitHub-hosted runners, which compile it for Windows, Linux, and macOS and return the native libraries.
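A dispatcher of this kind is essentially a fan-out over a build matrix. The sketch below uses a hypothetical `dispatch` helper and a caller-supplied `compile_on_runner` callback that stands in for whatever actually triggers the remote build. It does not reflect nlp-compile-service's real code. The runner labels are the standard GitHub-hosted ones, and the filenames follow each platform's usual shared-library naming.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical build matrix: platform -> (GitHub-hosted runner label, library filename pattern).
TARGETS = {
    "linux": ("ubuntu-latest", "lib{name}.so"),
    "macos": ("macos-latest", "lib{name}.dylib"),
    "windows": ("windows-latest", "{name}.dll"),
}


def dispatch(analyzer, compile_on_runner):
    """Fan one analyzer out to every target in parallel; return {platform: library filename}."""

    def job(platform):
        runner, pattern = TARGETS[platform]
        compile_on_runner(runner, analyzer)  # stand-in for triggering the remote build
        return platform, pattern.format(name=analyzer)

    with ThreadPoolExecutor(max_workers=len(TARGETS)) as pool:
        # pool.map yields results in input order, regardless of which build finishes first.
        return dict(pool.map(job, TARGETS))


# Usage with a fake build step that only records which runners were asked to compile.
requested = []
libs = dispatch("myanalyzer", lambda runner, name: requested.append(runner))
print(libs)
```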

The genuinely hard problem here was cross-platform native linking: getting ICU and the engine’s own static libraries to link cleanly on Linux and macOS. That is the territory of --whole-archive and --start-group on Linux and -force_load on macOS, the incantations that decide whether you get a working binary or an undefined-symbol error. With that solved, “compile my analyzer” became a button in the VisualText VS Code extension (now v3.1.24), complete with progress, an elapsed-time counter, and automatic staging of the result into your project.
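The linker flags involved are real but platform-specific, so a working Linux link line cannot simply be reused on macOS. This Python sketch assembles the arguments a build script might pass to the compiler driver. The archive names for the engine (`libnlp.a`, `libkb.a`) are hypothetical placeholders. The engine's actual build scripts may order things differently.

```python
def link_args(engine_archives, icu_archives, system):
    """Linker arguments for a shared library that must keep every engine symbol.

    `system` is the lowercased value of platform.system(), e.g. "linux" or "darwin".
    """
    if system == "linux":
        return [
            # GNU ld: include every object in the engine archives, not only the
            # ones that resolve symbols already referenced at this point.
            "-Wl,--whole-archive", *engine_archives, "-Wl,--no-whole-archive",
            # GNU ld: search the ICU archives repeatedly until no new undefined
            # references are created, so their order on the command line stops mattering.
            "-Wl,--start-group", *icu_archives, "-Wl,--end-group",
        ]
    if system == "darwin":
        # Apple ld: -force_load is the per-archive counterpart of --whole-archive,
        # and static libraries are searched repeatedly, so no grouping is needed.
        return [f"-Wl,-force_load,{a}" for a in engine_archives] + list(icu_archives)
    raise ValueError(f"no link recipe for {system!r}")


print(link_args(["libnlp.a"], ["libicui18n.a", "libicuuc.a", "libicudata.a"], "linux"))
```

Windows is omitted here. MSVC's linker has its own equivalent option, /WHOLEARCHIVE.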

New in v3: the engine on npm

The headline packaging change in Version 3 is a brand-new Node.js package. NLP++ has had a native Python package for a while (pip install NLPPlus), but the JavaScript world had nothing first-class. Now it does:

Node.js (new): npm install nlpplus (npm-package-nlpengine, v1.0.5) — a self-contained native addon that bundles its runtime dependencies. Built from scratch this cycle.
Python (enhanced): pip install NLPPlus (py-package-nlpengine, v2.0.9) — the existing package, now with the new compile() and cloud_compile() APIs, context-manager support, and Python 3.13 wheels.

Both embed the engine directly — no subprocess shell-out, no manual binary wrangling — just import and analyze.
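For readers unfamiliar with what context-manager support provides, here is the pattern in miniature. `EngineSession` is a hypothetical stand-in, not the NLPPlus API, and its `analyze` result is a placeholder. Check the package's own documentation for the real class and method names. The `with` block guarantees that `close()` runs even if analysis raises, which matters when the object owns native engine memory.

```python
class EngineSession:
    """Hypothetical stand-in for an embedded engine handle; NOT the NLPPlus API."""

    def __init__(self, analyzer):
        self.analyzer = analyzer
        self.closed = False

    def analyze(self, text):
        if self.closed:
            raise RuntimeError("session is closed")
        # Placeholder result; a real binding would call into the native engine here.
        return {"analyzer": self.analyzer, "tokens": text.split()}

    def close(self):
        # A real binding would release native engine resources here.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # never swallow exceptions raised inside the with block


with EngineSession("my-analyzer") as session:
    result = session.analyze("Call me at noon")
print(result, session.closed)
```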

Smarter analyzers and lighter dictionaries

The language runtime got upgrades too. Large lexicons now lazy-load one word at a time instead of reading an entire dictionary into memory up front, which speeds up startup and shrinks the memory footprint. New loadkbb and loaddict functions, an _xVAR("attribute") match-list special, and support for digits in underscore-prefixed token names round out the NLP++ improvements.
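Lazy loading can be sketched in a few lines. This version assumes a hypothetical lexicon file sorted by word, with one `word<TAB>attributes` entry per line. It binary-searches byte offsets, so each lookup reads only a handful of lines, and it caches entries as they are requested. The engine's actual dictionary format and loading strategy are not described here, so treat this as an illustration of the idea rather than how NLP++ implements it.

```python
import os
import tempfile


class LazyLexicon:
    """Look up entries in a sorted 'word<TAB>attributes' file without loading it all."""

    def __init__(self, path):
        self._f = open(path, "rb")
        self._size = os.fstat(self._f.fileno()).st_size
        self._cache = {}  # only words actually requested end up here

    def _first_line_at_or_after(self, pos):
        # Return the first full line that starts at or after byte `pos` (b"" at EOF).
        if pos == 0:
            self._f.seek(0)
        else:
            self._f.seek(pos - 1)
            self._f.readline()  # finish the line containing byte pos-1
        return self._f.readline()

    def lookup(self, word):
        if word in self._cache:
            return self._cache[word]
        key = word.encode("utf-8")
        lo, hi = 0, self._size
        while lo < hi:  # smallest offset whose next line has key >= word
            mid = (lo + hi) // 2
            line = self._first_line_at_or_after(mid)
            if not line or line.split(b"\t", 1)[0] >= key:
                hi = mid
            else:
                lo = mid + 1
        line = self._first_line_at_or_after(lo)
        value = None
        if line:
            k, _, attrs = line.rstrip(b"\r\n").partition(b"\t")
            if k == key:
                value = attrs.decode("utf-8")
        self._cache[word] = value  # remember misses too
        return value

    @property
    def loaded_count(self):
        return len(self._cache)

    def close(self):
        self._f.close()


# Demo: write a tiny sorted lexicon to a temporary directory.
entries = {"apple": "noun", "banana": "noun", "cherry": "noun", "run": "verb"}
path = os.path.join(tempfile.mkdtemp(), "lexicon.tsv")
with open(path, "w", encoding="utf-8", newline="\n") as out:
    for w in sorted(entries):
        out.write(f"{w}\t{entries[w]}\n")

lex = LazyLexicon(path)
print(lex.lookup("cherry"), lex.lookup("zebra"), lex.loaded_count)  # noun None 2
```

The cache stores misses as well as hits, so a repeated unknown word does not trigger a fresh search each time.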

On the content side, the shared package-analyzers collection (email, telephone, links, address) was re-architected from line-based to zone-based processing, so it works on real-world HTML and Markdown pages rather than tidy single lines — and the telephone analyzer gained international support.
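The difference between line-based and zone-based processing shows up as soon as text wraps. In this sketch, a Python regular expression stands in for the NLP++ telephone analyzer. The zoning rule, which treats blank lines and HTML tags as boundaries, is a deliberately crude assumption; the real analyzers define zones in NLP++. A London number wrapped across two lines of Markdown is invisible to the line-based pass but is found once the lines of its zone are rejoined.

```python
import re

# Toy stand-in for an international phone pattern: "+<country code>" followed by
# three digit groups, each preceded by an optional space, dot, or hyphen.
PHONE = re.compile(r"\+\d{1,3}(?:[ .-]?\d{2,4}){3}")
TAG = re.compile(r"<[^>]+>")

PAGE = """# Contact

Reach our London office at +44 20
7946 0958 during business hours.

<p>US support: +1 555 123 4567</p>
"""


def find_line_based(text):
    # Old approach: every physical line is analyzed on its own.
    return [m for line in text.splitlines() for m in PHONE.findall(line)]


def zones(text):
    # Crude zoning: blank lines separate zones, tags become spaces, and
    # wrapped lines inside a zone are rejoined with single spaces.
    for block in re.split(r"\n\s*\n", text):
        flat = " ".join(TAG.sub(" ", block).split())
        if flat:
            yield flat


def find_zone_based(text):
    return [m for zone in zones(text) for m in PHONE.findall(zone)]


print(find_line_based(PAGE))
print(find_zone_based(PAGE))
```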

Built with an AI collaborator

One more thread runs through all of this: much of Version 3 was developed in collaboration with Claude, working through Anthropic’s Claude Code. You can see it in the commit history: systematic, traceable changes that hardened compiled mode, fixed the platform-specific build problems it exposed, stood up the cloud-compile backend, scaffolded the new Node package, and wired up automatic cross-repo release propagation.

There’s a deeper point here than convenience. NLP++ is glass-box AI — rule-based, inspectable, explainable, and deterministic: identical input yields identical output, every time, with rules a human can audit. That’s exactly what statistical and probabilistic models can’t guarantee, and it’s why deterministic NLP is the right fit for critical-path systems — the places where “usually correct” isn’t good enough and you need to be able to prove why a decision was made.

So the symmetry is striking: a large language model (Claude) is helping build a deterministic NLP system — one that can go where statistical systems can’t. The LLM is a development partner, not the runtime. It helps create the tools; the tools themselves stay transparent and reproducible.

Try it

Python: pip install NLPPlus
Node.js: npm install nlpplus
VS Code: install the VisualText extension
Ready-to-run engine: grab a build from nlp-engine-linux, -mac, or -windows
Source: github.com/VisualText/nlp-engine

Version 3 is the release where NLP++ became a compilable, cloud-buildable, pip-and-npm-installable engine — without giving up the thing that makes it special: you can always read the rules.