Why we built our own PII engine instead of shipping Presidio

Microsoft Presidio is one of the best open-source PII toolkits in existence. We read its source, we learned from it, and we still chose not to ship it. This is the honest case for why.

The short version: Presidio finds personal data inside a block of text. promptShield does something narrower and more physical — it takes a PDF on someone's laptop, with no internet connection, and lets the user click a name on the page and watch it get blacked out. Both tasks involve "finding personal data," so they sound like the same job. They aren't. And the differences between them are exactly the parts where we would have spent all of our time fighting Presidio to do something it was never built to do.

So this isn't a case of reinventing the wheel for the sake of it. We genuinely needed a different wheel. Here's why.

The thing we ship is not a text stream

Presidio thinks in plain text: you give it a string, and it hands back a list of positions — "characters 40 to 51 are a name." That's perfect if you're scanning chat messages or a database column, where text is just text. It's the wrong starting point for us, because our user isn't looking at a string. They're looking at a page.

A PDF page isn't a line of text — it's a layout. Words sit in boxes at specific coordinates. A single name can be split across two lines, spread across two columns, or even rotated. And the user doesn't redact "characters 40 to 51" — they black out a rectangle on the page they can see. So for us, finding the name and knowing where it sits on the page aren't two separate steps we could glue together. They're the same step.
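To make "the same step" concrete, here is a minimal sketch of turning a character span into page rectangles. It is not our production code: the `Word` record, its fields, and the `span_to_rects` helper are hypothetical names, and it assumes the PDF extractor already gives every word a text offset, a visual line index, and a bounding box.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Word:
    text: str
    start: int   # offset of the word's first character in the page text
    line: int    # index of the visual line the word sits on
    x0: float
    y0: float
    x1: float
    y1: float

def span_to_rects(words, start, end):
    """Return one bounding rectangle per visual line touched by [start, end)."""
    by_line = {}
    for w in words:
        w_end = w.start + len(w.text)
        if w.start < end and w_end > start:  # word overlaps the span
            by_line.setdefault(w.line, []).append(w)
    rects = []
    for line in sorted(by_line):
        ws = by_line[line]
        rects.append((min(w.x0 for w in ws), min(w.y0 for w in ws),
                      max(w.x1 for w in ws), max(w.y1 for w in ws)))
    return rects

# Page text: "Signed by Pierre Dubois today", with the name wrapped across two lines.
words = [
    Word("Signed", 0, 0, 50, 100, 95, 112),
    Word("by", 7, 0, 100, 100, 115, 112),
    Word("Pierre", 10, 0, 120, 100, 165, 112),
    Word("Dubois", 17, 1, 50, 116, 98, 128),
    Word("today", 24, 1, 103, 116, 140, 128),
]
rects = span_to_rects(words, 10, 23)  # the span a text-only detector would report
```

One span in, two rectangles out. A real page also needs handling for columns and rotated text, which this sketch ignores.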

Here's what that looks like in practice. When our pipeline finds "Pierre Dubois" wrapped across the end of one line and the start of the next, it has to turn that one name into two rectangles and black out both. When it finds a name on page 1, it remembers it and blacks it out automatically on page 4, so the reviewer doesn't have to flag the same person eleven times by hand. Presidio has no concept of any of this — it only knows about positions in a string. To make it work, we'd have had to wrap it in so much page-and-layout code that Presidio would have handled the small, easy part and our own code would have handled everything that's actually hard.
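The cross-page part can be sketched in a few lines. This is an illustration under simplifying assumptions, not our actual pipeline: `propagate` is a hypothetical helper, each page is treated as a plain string, and matching is a case-insensitive search that tolerates a line break between the parts of a name.

```python
import re
from collections import defaultdict

def propagate(confirmed_names, pages):
    """Find every occurrence of already-confirmed names on every page.

    pages: list of page texts, index 0 is page 1.
    Returns {page_number: [(start, end, name), ...]}.
    """
    hits = defaultdict(list)
    for name in confirmed_names:
        # Allow any whitespace, including a line break, between name parts.
        # No word-boundary check is done here; a real matcher would need one.
        pattern = re.compile(r"\s+".join(re.escape(p) for p in name.split()),
                             re.IGNORECASE)
        for page_no, text in enumerate(pages, start=1):
            for m in pattern.finditer(text):
                hits[page_no].append((m.start(), m.end(), name))
    return dict(hits)

pages = [
    "Agreement between Pierre Dubois and Acme SA.",
    "Payment terms.",
    "Schedule A.",
    "Signed: Pierre\nDubois, Director.",
]
found = propagate(["Pierre Dubois"], pages)
```

The spans found on page 4 would then go through the same span-to-rectangle step as any other hit.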

We don't run one model. We run five layers that vote.

In Presidio, each detector works on its own and you simply pool everything they find. That's flexible, but it has a catch: if any one detector is trigger-happy, all of its false alarms end up in your results. Your precision is dragged down toward that of your noisiest detector.

We took a different approach: our detectors have to agree with each other. Five of them run over the same page:

Regex — high-precision structured patterns (SSN, IBAN, email, phone, business IDs), with a context window so a bare 9-digit number isn't an SSN unless the surrounding words say it is.
spaCy NER — per-language statistical models, one each for 7 languages.
GLiNER — zero-shot entity recognition for types the trained models miss.
BERT NER — transformer-backed recognition where the smaller models under-recall.
LLM — optional, local GGUF or a remote API, for the long-tail context calls.
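The context-window idea from the regex layer can be sketched as follows. The pattern, the keyword list, and the 40-character window are illustrative assumptions, not the values we ship, and `find_ssn` is a hypothetical name.

```python
import re

# Nine digits, optionally grouped 3-2-4 with hyphens.
NINE_DIGITS = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")
# Words that suggest the number really is an SSN (illustrative list).
CONTEXT = re.compile(r"\b(ssn|social security)\b", re.IGNORECASE)
WINDOW = 40  # characters inspected on each side of the match (assumed value)

def find_ssn(text):
    """Return (start, end) for nine-digit numbers that have SSN context nearby."""
    hits = []
    for m in NINE_DIGITS.finditer(text):
        lo = max(0, m.start() - WINDOW)
        hi = min(len(text), m.end() + WINDOW)
        if CONTEXT.search(text[lo:hi]):
            hits.append((m.start(), m.end()))
    return hits

no_context = find_ssn("Invoice total 123456789 units shipped.")
with_context = find_ssn("Employee SSN: 123-45-6789")
```

The bare number in the invoice line is ignored; the same digits next to "SSN" are flagged.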

Instead of just pooling what they find, we have them vote. If two detectors flag the same thing, we trust it more; if three agree, more still. Something only one twitchy detector spotted has to clear a much higher bar before we show it to the user. This voting logic — how we weigh agreement, merge overlapping hits, and carry results across pages — is the product. It's where our accuracy actually comes from. Presidio gives you the detectors, but it deliberately leaves the judgment call about how to reconcile them up to you. And writing that judgment call is writing most of an engine — so we wrote the whole thing.
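Here is a rough sketch of agreement-based voting. The thresholds, the `Hit` record, and the grouping rule (overlapping spans with the same label count as the same finding) are assumptions made for illustration; the real weighting is more involved.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    detector: str
    start: int
    end: int
    label: str
    score: float

# Minimum best score required, keyed by how many distinct detectors agree.
# Illustrative numbers only.
THRESHOLDS = {1: 0.90, 2: 0.60}
DEFAULT_THRESHOLD = 0.40  # three or more detectors agree

def vote(hits):
    """Group overlapping same-label hits, then keep groups that clear the bar."""
    groups = []
    for h in sorted(hits, key=lambda h: (h.label, h.start, h.end)):
        last = groups[-1] if groups else None
        if last and last["label"] == h.label and h.start < last["end"]:
            # Overlaps the current group: merge the span and record the voter.
            last["end"] = max(last["end"], h.end)
            last["detectors"].add(h.detector)
            last["best"] = max(last["best"], h.score)
        else:
            groups.append({"label": h.label, "start": h.start, "end": h.end,
                           "detectors": {h.detector}, "best": h.score})
    kept = []
    for g in groups:
        needed = THRESHOLDS.get(len(g["detectors"]), DEFAULT_THRESHOLD)
        if g["best"] >= needed:
            kept.append((g["start"], g["end"], g["label"], len(g["detectors"])))
    return sorted(kept)

hits = [
    Hit("spacy", 10, 23, "PERSON", 0.70),
    Hit("gliner", 10, 16, "PERSON", 0.65),
    Hit("bert", 17, 23, "PERSON", 0.80),
    Hit("gliner", 40, 48, "ORG", 0.55),  # a lone, low-confidence hit
]
result = vote(hits)
```

Three detectors agreeing on the name merge into one span that passes easily, while the single 0.55 hit falls short of the bar set for a lone detector.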

It has to run with the network cable unplugged

promptShield is an offline desktop app. The whole pitch is that your documents never leave your machine — no cloud round-trip, no API key, no "we promise we don't log it." That's a hard constraint, and it reshapes the dependency question.

Presidio can run locally too, but that's not how most people actually use it. To get good accuracy out of it, the common path is to plug it into Microsoft's or Amazon's cloud language services. Those are exactly the cloud calls we've promised never to make. That leaves Presidio's fully local option, which relies on smaller, lighter models. That happens to be the setup we measured in the benchmark described further down, the one that flooded financial documents with false positives.

And it goes deeper than accuracy. We have to pack every AI model inside the app's installer and run it on the user's machine. Each model has to be bundled, locked to a version, checked for tampering, loaded into memory and, the part nobody warns you about, cleanly unloaded again. On Windows, if a user removes a language pack while any of its model files are still open or memory-mapped by the app, Windows simply refuses to delete them. So we keep a carefully maintained list of every place a model might still be loaded, across all five detectors, and release them all before deleting anything. That kind of housekeeping only works if we control exactly how and when each model is loaded. A third-party engine that loads its own models, on its own schedule, would constantly trip over this.
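The release-before-delete housekeeping might look roughly like this. `ModelRegistry` and its methods are hypothetical names for the sake of the sketch, and it assumes each loaded model either exposes a `close()` method or frees its file handles once the last reference is dropped. Real model libraries vary, so treat this as the shape of the idea rather than a recipe.

```python
import gc
import shutil
from pathlib import Path

class ModelRegistry:
    """Tracks every loaded model so its files can be released before deletion."""

    def __init__(self):
        self._loaded = {}  # (pack, detector) -> model object

    def load(self, pack, detector, loader):
        """Load a model once per (pack, detector) and remember that we hold it."""
        key = (pack, detector)
        if key not in self._loaded:
            self._loaded[key] = loader()
        return self._loaded[key]

    def release_pack(self, pack):
        """Drop every reference the registry holds to models from `pack`."""
        for key in [k for k in self._loaded if k[0] == pack]:
            model = self._loaded.pop(key)
            close = getattr(model, "close", None)
            if callable(close):
                close()  # for example, close an open weights file
        gc.collect()     # encourage prompt release of remaining handles

    def remove_pack(self, pack, pack_dir):
        """Release first, delete second; the order is the whole point."""
        self.release_pack(pack)
        shutil.rmtree(Path(pack_dir))
```

A call such as `registry.remove_pack("fr", "/path/to/packs/fr")` only works if every detector loads its models through the registry, which is why an engine that loads models on its own schedule is a poor fit.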

The noise filtering is the real moat — and it's ours

In our head-to-head benchmark, default Presidio emitted 666 spans on a 14-document corpus; we emitted 252 on the same documents. The gap is almost entirely noise we filter that a default install doesn't: jurisdictional country mentions ("governed by the laws of France"), URLs in footers, role titles mistaken for names, generic department nouns tagged as organizations, the same person fragmented into three overlapping spans.

Every one of those filters works across all seven languages, is tuned against real contracts and financial statements, and runs inside the voting step — where it can weigh both how confident the detectors are and where the text actually sits on the page. You could rebuild all of this as add-ons bolted onto Presidio. But by then Presidio is just running the detectors, while everything that makes our output clean — the filtering, the voting, the page-awareness — is still code we had to write ourselves. We'd be carrying a big dependency mostly to switch large parts of it off.
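Two of the filters named above, sketched in simplified form. The patterns, the 30-character lookback, the label names, and `filter_noise` itself are assumptions for illustration; the production filters are multilingual and also look at detector confidence and page position.

```python
import re

# Phrases that, right before a country name, signal a jurisdiction clause.
JURISDICTION = re.compile(r"\b(laws? of|courts? of|jurisdiction of)\s*$",
                          re.IGNORECASE)
URL = re.compile(r"^(https?://|www\.)", re.IGNORECASE)
LOOKBACK = 30  # characters of left context to inspect (assumed value)

def filter_noise(text, spans):
    """spans: list of (start, end, label). Returns the spans worth showing."""
    kept = []
    for start, end, label in spans:
        value = text[start:end]
        before = text[max(0, start - LOOKBACK):start]
        if label == "LOCATION" and JURISDICTION.search(before):
            continue  # "governed by the laws of France" is not personal data
        if URL.match(value):
            continue  # footer URLs
        kept.append((start, end, label))
    return kept

text = ("This contract is governed by the laws of France. "
        "Contact Marie Curie at www.example.com.")

def span(substring, label):
    """Helper for the example: locate a substring and build its span."""
    i = text.index(substring)
    return (i, i + len(substring), label)

raw = [span("France", "LOCATION"), span("Marie Curie", "PERSON"),
       span("www.example.com", "URL")]
clean = filter_noise(text, raw)
```

Of the three raw spans, only the person's name survives.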

What we gave up

This was not free, and it would be dishonest to pretend otherwise.

We don't get Presidio's recognizer breadth for free. Their community ships recognizers for entity types and locales we haven't built. Every one we want, we write and test ourselves.
We carry our own maintenance. When a transformer library ships a breaking change, that's our problem to absorb. There's no engine maintainer sitting between us and that library to file an issue with and wait on. We've eaten a few of those.
We can't lean on Presidio's name. "Built on Microsoft Presidio" is a credibility shortcut in a procurement conversation. We traded it for "here's our reproducible benchmark," which is more work and less brand-borrowing.

When you should reach for Presidio instead

If your problem looks like any of these, Presidio is very likely the right call and building your own would be a waste:

You're scanning text streams — chat logs, database columns, API payloads — not bounded, laid-out documents.
You want library-level control inside a larger Python DLP system, and you're happy to own the precision tuning.
You can use a cloud NER backend (Azure, AWS) and want one PII abstraction over several providers.
You need breadth of entity types and locales more than you need a clean reviewer experience on a fixed document class.

Presidio is the better tool for a large, varied slice of the PII-detection world. We're not in that slice.

The actual decision rule

Here's the test we applied, and the one we'd recommend to anyone weighing "adopt the library" against "build the engine":

If the dependency would do the easy part and you'd write the hard part on top of it, you're not adopting a library — you're adopting a constraint.

For us, the hard parts — knowing where text sits on the page, having five detectors vote, managing AI models offline, filtering noise across seven languages — make up most of the product, and they're all tightly tangled together. Presidio would have sat at the bottom running the detectors while every decision that actually mattered happened in our own code. So we wrote that bottom layer too — and ended up with an engine where we understand every piece, ship it entirely offline, and tune it against the exact kinds of documents our users actually redact.

We were inspired by Presidio. We just had a different document to redact.