Inside Version 3 of the VisualText NLP Engine: Compiled, Cloud-Built, on npm — and AI-Assisted

For most of its life, the NLP++ engine has been a powerful tool for writing rule-based analyzers — explainable, glass-box, and deterministic natural-language processing you can actually read, debug, and reproduce exactly. It has run on Windows, Linux, and macOS for years. What Version 3 changes is how you build, deploy, and install it — and, now, how you start. Compiling your analyzers to native code is a one-click job rather than a C++ expedition; those binaries can be built in the cloud; and — for the first time — you can pull the engine into a Node.js project with a single npm install. The VS Code extension goes one step further: a built-in help system and ready-to-paste prompts let Claude scaffold a working analyzer from scratch — and even gather real text off the web to test and harden it — so you no longer have to be fluent in NLP++ to begin. And what comes back is still plain, glass-box rules you own.

Here’s what shipped, and why it matters.

Compiling analyzers — now a one-click job, not a C++ expedition

Here’s a subtlety worth being precise about: compiling NLP++ analyzers and knowledge bases to native C++ was never impossible. The capability has existed for a long time. But it was hidden, fragile, and realistically reachable only by someone with serious C++ build expertise — the right toolchain, the right flags, the right link order. For everyone else, it might as well not have existed.

Version 3 takes that buried, expert-only capability and turns it into a one-click process any user can run, with no C++ knowledge required. You write rules, functions, dictionaries, and knowledge bases; the engine generates C++ from your NLP++ rule files and knowledge bases and builds it into native shared libraries on whatever platform you are on, whether that is Linux, macOS, or Windows. The payoff is faster execution and a deployable artifact, without ever opening a compiler.

Making that one click trustworthy took real engineering under the hood: dozens of type-correct overloads for NLP++ builtins in the generated code, provenance comments (/* nlp-source */) so a line of generated C++ can be traced back to the NLP++ rule that produced it, and a series of correctness fixes — flushing knowledge-base writes that were silently coming up empty, guarding against a null-pointer crash, and fixing a subtle C++ evaluation-order bug by hoisting function arguments into temporaries.
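To see why hoisting matters: C++ does not specify the order in which a function's arguments are evaluated, so generated code such as concat(next_token(ctx), next_token(ctx)) can behave differently from one compiler to the next. Giving each argument its own statement pins the order. The toy generator below is only a sketch of that idea in Python; emit_call, the argN temporaries, and the concat/next_token names are hypothetical, not the engine's actual code generator or builtins.

```python
def emit_call(func, arg_exprs, result="result"):
    """Emit C++ statements that call `func` with every argument hoisted
    into its own temporary, so arguments are evaluated left to right."""
    lines = []
    temps = []
    for i, expr in enumerate(arg_exprs):
        temp = f"arg{i}"
        # One statement per argument: each statement finishes before the
        # next begins, so side effects happen in source order.
        lines.append(f"auto {temp} = {expr};")
        temps.append(temp)
    lines.append(f"auto {result} = {func}({', '.join(temps)});")
    return lines


# Hypothetical call whose two arguments both advance the same cursor.
generated = emit_call("concat", ["next_token(ctx)", "next_token(ctx)"])
print("\n".join(generated))
```

The emitted C++ has three statements instead of one nested call, which is more verbose but no longer depends on how a particular compiler orders argument evaluation.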

Compile in the cloud — no toolchain required

Not everyone wants a C++ build environment on their machine. So Version 3 introduced a cloud-compile service (nlp-compile-service): a lightweight dispatcher that hands your analyzer to GitHub-hosted runners, compiles it for Windows, Linux, and macOS, and hands back native libraries.

The genuinely hard problem here was cross-platform native linking — getting ICU and the engine’s own static libraries to link cleanly on Linux and macOS (the kind of --whole-archive / --start-group / -force_load incantations that decide whether you get a working binary or an undefined-symbol error). With that solved, “compile my analyzer” became a button in the VisualText VS Code extension (now v3.1.24), complete with progress, an elapsed-time counter, and automatic staging of the result into your project.
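For readers who have not met those flags, here is a rough sketch of how they differ by platform, written as a small Python helper that assembles linker arguments for a compiler driver. The flags themselves are standard GNU ld and Apple ld options; the helper name, the library file names, and the choice of which archives go where are assumptions for illustration, not the service's real build script.

```python
def static_link_args(platform, whole_libs, grouped_libs):
    """Build linker arguments that keep every object from `whole_libs`
    and let `grouped_libs` resolve references to one another."""
    args = []
    if platform == "linux":
        # GNU ld: keep all objects from the archives listed in between.
        args.append("-Wl,--whole-archive")
        args.extend(whole_libs)
        args.append("-Wl,--no-whole-archive")
        if grouped_libs:
            # Rescan the grouped archives until no new symbols resolve,
            # which handles circular dependencies between them.
            args.append("-Wl,--start-group")
            args.extend(grouped_libs)
            args.append("-Wl,--end-group")
    elif platform == "macos":
        # Apple's ld has no --whole-archive; -force_load names one archive
        # at a time. It also keeps searching archives while linking, so no
        # grouping flags are needed.
        args.extend(f"-Wl,-force_load,{lib}" for lib in whole_libs)
        args.extend(grouped_libs)
    else:
        raise ValueError(f"unsupported platform: {platform}")
    return args


# Hypothetical archive names, for illustration only.
engine = ["libengine.a"]
icu = ["libicui18n.a", "libicuuc.a", "libicudata.a"]
linux_args = static_link_args("linux", engine, icu)
macos_args = static_link_args("macos", engine, icu)
print(" ".join(linux_args))
print(" ".join(macos_args))
```

Get one of these wrong and the usual symptom is exactly the one described above: the build succeeds up to the final link and then fails on undefined symbols.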

New in v3: the engine on npm

The headline packaging change in Version 3 is a brand-new Node.js package. NLP++ has had a native Python package for a while (pip install NLPPlus), but the JavaScript world had nothing first-class. Now it does:

  • Node.js (new): npm install nlpplus (npm-package-nlpengine, v1.0.5) — a self-contained native addon that bundles its runtime dependencies. Built from scratch this cycle.
  • Python (enhanced): pip install NLPPlus (py-package-nlpengine, v2.0.9) — the existing package, now with the new compile() and cloud_compile() APIs, context-manager support, and Python 3.13 wheels.

Both embed the engine directly — no subprocess shell-out, no manual binary wrangling — just import and analyze.

Smarter analyzers and lighter dictionaries

The language runtime got upgrades too. Large lexicons now lazy-load one word at a time instead of reading an entire dictionary into memory up front — a big win for startup time and footprint. New loadkbb and loaddict functions, an _xVAR("attribute") match-list special, and support for digits in underscore-prefixed token names round out the NLP++ improvements.
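The general shape of lazy loading can be sketched in a few lines of Python. This is not the engine's implementation, and the tab-separated "word, then attr=value pairs" file layout is an assumption made for the example. The idea is simply to remember where each entry lives and parse it only when a word is actually looked up.

```python
import os
import tempfile


class LazyLexicon:
    """Look up dictionary entries on demand instead of parsing them all."""

    def __init__(self, path):
        self.path = path
        self.offsets = {}  # word -> byte offset of its line in the file
        self.cache = {}    # word -> parsed attributes, filled on first use
        with open(path, "rb") as f:
            while True:
                pos = f.tell()
                line = f.readline()
                if not line:
                    break
                word = line.split(b"\t", 1)[0].decode("utf-8")
                self.offsets[word] = pos

    def lookup(self, word):
        if word in self.cache:
            return self.cache[word]
        pos = self.offsets.get(word)
        if pos is None:
            return None
        # Read and parse just this one entry.
        with open(self.path, "rb") as f:
            f.seek(pos)
            line = f.readline().decode("utf-8").strip()
        _, _, attrs = line.partition("\t")
        entry = dict(pair.split("=", 1) for pair in attrs.split() if "=" in pair)
        self.cache[word] = entry
        return entry


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.dict")
    with open(path, "w", encoding="utf-8") as f:
        f.write("paris\tpos=noun type=city\n")
        f.write("run\tpos=verb\n")
    lex = LazyLexicon(path)
    entry = lex.lookup("paris")
    loaded = sorted(lex.cache)

print(entry)
print(loaded)
```

Only the word that was asked for ends up parsed and cached. This sketch still scans the file once to build its offset index; a production design could avoid even that with a precomputed index or a sorted file and binary search.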

On the content side, the shared package-analyzers collection (email, telephone, links, address) was re-architected from line-based to zone-based processing, so it works on real-world HTML and Markdown pages rather than tidy single lines — and the telephone analyzer gained international support.

Built with an AI collaborator

One more thread runs through all of this: much of Version 3 was developed in collaboration with Claude (Anthropic’s Claude Code). You can see it in the commit history — the systematic, traceable changes that hardened compiled mode, ironed out the platform-specific build problems compiled mode exposed, stood up the cloud-compile backend, scaffolded the new Node package, and wired up automatic cross-repo release propagation.

There’s a deeper point here beyond convenience. NLP++ is glass-box AI: rule-based, inspectable, explainable, and deterministic. Identical input yields identical output, every time, with rules a human can audit. That’s exactly what statistical and probabilistic models can’t guarantee, and it’s why deterministic NLP is the right fit for critical-path systems, the places where “usually correct” isn’t good enough and you need to be able to prove why a decision was made.

So the symmetry is striking: a large language model (Claude) is helping build a deterministic NLP system, one that can go where statistical systems like LLMs can’t. The LLM is a development partner, not the runtime. It helps create tools for work it cannot itself perform reliably, and those tools stay transparent and reproducible.

Help is now built in — and Claude can write your first analyzer

The thing that kept most people out of NLP++ was never the engine. It was the blank editor. If you weren’t already fluent in the language, “write a rule-based analyzer” felt like a wall — and so people reached for an LLM and stopped there, settling for output they couldn’t inspect or reproduce.

Version 3’s tooling knocks that wall down. The VS Code extension now ships a built-in Help view — quick-start, compiling, regression testing, and a full NLP++ reference, a click away in the sidebar instead of buried on a website. But the part that changes who can use NLP++ is the new LLM Prompts.

These are ready-to-paste prompts for Claude, generated on the spot with your machine’s actual paths filled in — where the engine lives, where the example and template analyzers are, where the language and utility libraries sit. That context is the whole trick: Claude doesn’t have to guess how NLP++ works, because the prompt points it straight at working analyzers to learn from.
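Mechanically, "generated with your paths filled in" can be as simple as template substitution. The sketch below uses Python's standard string.Template; the prompt wording, the placeholder names, and the example paths are all made up for illustration and are not the extension's actual prompts.

```python
from string import Template

# Hypothetical prompt text; the real prompts ship with the extension.
PROMPT = Template(
    "The NLP++ engine is installed at $engine.\n"
    "Study the example analyzers in $examples and the templates in $templates.\n"
    "Then build a working analyzer for this task: $task\n"
)


def build_prompt(task, engine, examples, templates):
    # substitute() raises KeyError if any placeholder is left unfilled,
    # so a prompt can never go out with a dangling $name in it.
    return PROMPT.substitute(
        task=task, engine=engine, examples=examples, templates=templates
    )


prompt = build_prompt(
    task="extract invoice numbers from plain-text emails",
    engine="/opt/nlp-engine",
    examples="/opt/nlp-engine/examples",
    templates="/opt/nlp-engine/templates",
)
print(prompt)
```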

Two of them matter most:

  • Build an analyzer from scratch. One click drops a prompt that tells Claude to study the bundled example analyzers, then build a working prototype for whatever you describe — and to first gather real text files from the internet to run it against. You go from an idea to a runnable, inspectable analyzer without having written a line of NLP++ yourself.
  • Harden an existing analyzer. Another prompt has Claude pull in more real-world text, run it through your analyzer, and use the results to tighten the rules — feeding straight into the new golden-file regression tester so you can lock the behavior down. Real inputs, found automatically, turned into a test suite.
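If golden-file testing is new to you, the core loop is small: record an analyzer's output once, then diff every later run against that recording. Here is a minimal sketch in Python under assumed conventions (a ".golden" file suffix, plain-text output); it illustrates the technique and is not the extension's regression tester.

```python
import difflib
import os
import tempfile


def check_against_golden(name, actual, golden_dir, update=False):
    """Compare analyzer output with a stored golden file.

    Returns an empty list on a match (or when a baseline is recorded),
    otherwise the unified-diff lines describing the change.
    """
    golden_path = os.path.join(golden_dir, name + ".golden")
    if update or not os.path.exists(golden_path):
        # First run, or an intentional change: record the new baseline.
        with open(golden_path, "w", encoding="utf-8") as f:
            f.write(actual)
        return []
    with open(golden_path, encoding="utf-8") as f:
        expected = f.read()
    return list(difflib.unified_diff(
        expected.splitlines(), actual.splitlines(),
        fromfile="golden", tofile="actual", lineterm="",
    ))


with tempfile.TemporaryDirectory() as tmp:
    # Pretend these strings are a telephone analyzer's output.
    first = check_against_golden("phones", "555-0100\n555-0199\n", tmp)
    same = check_against_golden("phones", "555-0100\n555-0199\n", tmp)
    changed = check_against_golden("phones", "555-0100\n", tmp)

print(first, same)
print("\n".join(changed))
```

Because the analyzer is deterministic, any non-empty diff means the rules changed behavior, not that the run was unlucky. That is what makes locking the output down meaningful.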

Notice what didn’t change: the artifact Claude produces is still plain, deterministic, glass-box NLP++ — rules you can read, version, compile, and run the same way every time. The LLM gets you off the blank page and helps you stress-test against the messy real world, but it’s a development partner, not the runtime. You end up fluent in NLP++ faster, and with code you actually own.

Try it

Version 3 is the release where NLP++ became a compilable, cloud-buildable, pip-and-npm-installable engine — without giving up the thing that makes it special: you can always read the rules.