There’s a strange thing happening in software right now. Everyone will tell you the large language model is the future of language processing, and in the same breath admit they can’t tell you why their model did what it did, whether it will do the same thing tomorrow, or what it will do on the input they haven’t tried yet. That’s not a future. That’s a slot machine with good PR.
NLP++ is the opposite. It’s a rule-based, glass-box programming language for natural language. You write the rules; the engine runs them; the same input gives the same output every time; and when something goes wrong, you can open the code and see exactly which rule fired and why. No training data. No black box. No “we think it’s about 94% accurate.” It’s just code you can read.
So here’s a question I get a lot: if NLP++ is the honest, auditable alternative to the LLM, why would I ever use an LLM to write it?
Because a useful tool is a useful tool. I’m contrarian about the hype, not about usefulness. Claude — Anthropic’s coding assistant — turns out to be genuinely good at writing NLP++, and that opens the door to a lot of people who looked at the paradigm and found it too daunting to start. The important part is what you’re left holding at the end: not a dependency on a model in someone’s data center, but trustworthy, self-contained NLP++ code that runs on its own, forever, without Claude anywhere in the picture.
This is a how-to. I’ll take you from a blank machine to a working analyzer, using Claude to do the heavy lifting, and then I’ll show you the part that matters most — reading, running, and tweaking the code so it’s yours.
You don’t need to be a programmer. If you are one, and NLP is new to you, this is a fast on-ramp too.
Step 1 — Install VS Code
Everything happens inside Visual Studio Code, Microsoft’s free code editor. Go to code.visualstudio.com, download the build for your operating system — Windows, macOS, or Linux — and install it the way you’d install any app.
That’s the whole step. Open it once so it’s ready.
Step 2 — Install the NLP++ (VisualText) extension
Inside VS Code, open the Extensions view — the icon in the left sidebar that looks like four little squares, or press Ctrl+Shift+X (Cmd+Shift+X on a Mac).
Search for nlp. The NLP++ extension shows up at the top of the results. Click Install.
The first time it runs, the extension offers to download the NLP engine and its VisualText support files. Say yes. That download is the whole point — it’s the engine that actually runs your analyzers, bundled for Linux, macOS, and Windows so you don’t have to build anything yourself.
Once it’s in, you’ll see the NLP++ views appear: Analyzers, Text, Output Files, and a Help panel. To get a feel for it, click the ⚙️ cog icon in the Analyzers view to Load Example Analyzers — VS Code reloads, and you get a set of tutorial analyzers you can click, run against sample text, and poke at. (Copy any example out to another folder before you edit it — the originals get overwritten on the next update.)
When you’re ready to build your own, the New Analyzer icon creates a fresh analyzer from a template, laid out with the standard spec/, input/, and kb/ folders. Hold that thought — Claude is going to do this part for you in a minute.
Step 3 — Get Claude Code running in VS Code
Back in the Extensions view, search for and install Claude Code, Anthropic’s assistant. Sign in to connect it to your Claude subscription — the extension walks you through it.
That’s it. Claude now lives in a panel inside the same window as your analyzer. It can read the files on your machine, run the engine, and hand you back working code — all without you leaving VS Code.
Step 4 — Use the built-in prompts to generate an analyzer
Here’s where the VisualText extension does something I’m genuinely proud of. Writing a good prompt is half the battle, and getting Claude to write correct NLP++ means telling it where the engine is, where the example analyzers live, which conventions to follow, and how to run the thing. That’s a lot of fiddly, machine-specific detail.
So the extension writes the prompts for you.
Open the NLP++ Help view and you’ll find a short list of Claude prompts — one per task. Click the one you want and it opens in a new editor as ready-to-paste text, with the real paths on your machine already filled in: the engine executable, the example and template analyzers, and the language libraries. You copy that into Claude, and Claude starts from solid ground instead of guessing.
Each prompt is tuned for a different job:
- Build an analyzer — the generic starting point for something brand new. It hands Claude the engine and template paths and the guardrails that keep it honest, then leaves two blanks: describe the kind of text you want to process, and describe what you want extracted. Fill those in and go.
- From scratch: chemical formulas — a complete worked example, start to finish. It has Claude gather a chemistry corpus from Wikipedia and build an analyzer that finds chemical formulas in ordinary prose (H2O, CO2, C6H12O6), breaks each into its elements and atom counts, and emits clean JSON. Run this one if you want to watch an analyzer get built end to end before you try your own. It’s the best way to see the shape of the whole thing.
- Harden analyzer — for an analyzer you already have. It asks Claude to generate more varied test inputs, including the nasty edge cases that break rules, run them through the engine, and report back where the extraction looks wrong. This is how you find the holes.
- Create Dictionaries & KBs — builds the word lists (.dict) and knowledge bases (.kbb) an analyzer leans on, learning the exact format from the shared libraries and dropping the files where they belong.
- Add missing words to the English dictionary — reads the list of words your analyzer didn’t recognize and adds proper entries (part of speech, root, verb and noun features) to the full English dictionary, keeping everything alphabetized and in sync.
The pattern is always the same: start from the closest-fitting prompt and edit, rather than writing from a blank page. Claude reads a couple of the example analyzers first to learn the conventions, copies the right template, writes the passes, runs the engine over your text, and shows you the output so you can check it together.
Step 5 — Read, run, and tweak the code
This is the step people are tempted to skip, and it’s the one that matters.
When Claude finishes, you don’t have a mysterious model. You have an analyzer folder — plain files you can open:
- spec/ holds the .nlp passes (the actual rules) and analyzer.seq, the ordered list of which pass runs when.
- input/ holds the text files being analyzed.
- kb/ holds the knowledge base: the dictionaries and concepts.
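If you’d like to see that layout from a script rather than a file explorer, here is a small Python sketch that walks an analyzer folder and summarizes each part. It assumes only the layout described above: spec/ holding .nlp files and analyzer.seq, plus input/ and kb/. It also assumes that engine output sits in folders whose names end in "_log" under input/, so it skips those. The function names and report format are hypothetical, and the script deliberately doesn’t try to parse analyzer.seq.

```python
import sys
from pathlib import Path


def list_files(folder, skip_logs=False):
    """Return sorted relative paths of files under folder (empty if missing)."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    files = []
    for path in folder.rglob("*"):
        rel = path.relative_to(folder)
        # Assumption: engine output lives in folders named "..._log"; skip them if asked.
        if skip_logs and any(part.endswith("_log") for part in rel.parts):
            continue
        if path.is_file():
            files.append(rel.as_posix())
    return sorted(files)


def describe_analyzer(root):
    """Summarize an analyzer folder laid out as spec/, input/, and kb/."""
    root = Path(root)
    spec = root / "spec"
    return {
        "sequence_file": (spec / "analyzer.seq").is_file(),
        "passes": [f for f in list_files(spec) if f.endswith(".nlp")],
        "inputs": list_files(root / "input", skip_logs=True),
        "kb_files": list_files(root / "kb"),
    }


if __name__ == "__main__":
    summary = describe_analyzer(sys.argv[1] if len(sys.argv) > 1 else ".")
    for key, value in summary.items():
        print(f"{key}: {value}")
```

Point it at an analyzer folder, and it lists the passes, the input texts, and the KB files. Everything is plain files on disk.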
Open a pass in the Analyzer (sequence) view and read it. NLP++ is designed to be read: rules, wildcards, and knowledge-base operations laid out in a structure you can follow. You’ll see, in order, how the engine walks the text and builds up what it found. Nothing is hidden.
Run it yourself: pick a text file in the Text view and hit the ▶ run button. The output — the parse tree, any output.json — lands under that text file’s _log/ folder. Change a word in the input, run it again, watch what changes. That loop, input to output with nothing in between you can’t inspect, is the entire value proposition.
And when you want to change behavior, you have two ways to do it, and you’ll use both:
- Ask Claude — “make it also catch formulas written with subscripts,” “reject these false positives” — and let it edit the passes.
- Edit the rule yourself — because now you can. Once you’ve read a pass or two, tweaking a rule is not mysterious. Claude got you to the point where the code is legible; from there you’re in control.
Every time you touch it, run it again and look at the output. When it’s right, lock it in with a regression test so you’ll know the moment anything changes. That’s the discipline an LLM can’t give you: a fixed, checkable definition of correct.
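One simple way to build that regression test is a golden file. Once the output is right, save a copy of output.json. After every change, compare the fresh output against the saved copy. The Python sketch below assumes you pass it both paths, and the file names are placeholders. Check your text file’s _log folder for where output.json actually lands. The script compares parsed JSON rather than raw text, so formatting or key-order changes don’t count as failures.

```python
import json
import sys


def load_json(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def check_regression(current_path, golden_path):
    """Return human-readable differences; an empty list means the output matches."""
    current = load_json(current_path)
    golden = load_json(golden_path)
    if current == golden:
        return []
    if not (isinstance(current, dict) and isinstance(golden, dict)):
        return ["output differs from the golden copy"]
    problems = []
    for key in sorted(set(golden) | set(current)):
        if key not in current:
            problems.append(f"missing key {key!r}")
        elif key not in golden:
            problems.append(f"unexpected key {key!r}")
        elif golden[key] != current[key]:
            problems.append(f"{key!r}: expected {golden[key]!r}, got {current[key]!r}")
    return problems


if __name__ == "__main__":
    # Usage: python check_output.py <fresh output.json> <saved golden.json>
    diffs = check_regression(sys.argv[1], sys.argv[2])
    for line in diffs:
        print(line)
    sys.exit(1 if diffs else 0)
```

A nonzero exit code signals a change, so the check can run after every edit or in a build script. Because the analyzer is deterministic, any difference is a real change in behavior, not noise.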
The part that lasts
Here’s the durability angle, and it’s the whole reason I built this.
You used an LLM to get started. Fine. But walk through what you’re actually holding now: a folder of NLP++ code, on your disk, that you can read line by line. It runs on the NLP engine — no API key, no network call, no subscription, no model that gets deprecated or “updated” out from under you next quarter. Feed it the same input in ten years and it gives you the same answer. If a regulator, a customer, or your own future self asks why it decided something, you open the rule and point at it.
Claude was scaffolding. Useful scaffolding — it lowered the wall that kept a lot of smart people out of NLP++, and I’m not too proud to say so. But scaffolding comes down, and what’s left standing is yours: deterministic, auditable, self-contained code that owes nothing to the tool that helped write it.
That’s the trade I’ll defend all day. Use the clever tool to build the honest thing. Then let the honest thing run on its own.
— David De Hilster, co-creator of NLP++
