Measured 2026-05-27 · promptShield 1.0.14 · Presidio 2.2.362

promptShield vs Microsoft Presidio

Benchmarked head-to-head on 14 PDF documents across 7 languages. Same text extraction, same documents, no asymmetric tuning. Published script and raw CSVs — verify it yourself.


TL;DR

Presidio (default install): 666 spans emitted · high noise level

promptShield 1.0.14: 252 spans emitted · same corpus · noise filtered

On contract documents (where the data is real PII), promptShield emits the same 209 high-signal spans as Presidio. The 414-span gap (666 − 252) falls on financial statements, where raw entity hits are 30–70% noise (audit firm names, registered offices, jurisdictional country mentions).

Feature comparison

Feature	promptShield	Presidio
Multi-language regex recognizer (60+ types) · Presidio: EN-primary by default		◦
Multilingual entity model (Davlan) · Presidio: requires custom HuggingFace entity recognizer		◦
Optional LLM layer for contextual PII
US NPI / ABA / DEA checksum validation · Presidio: NPI and MBI included, ABA/DEA partial		◦
AU TFN / ABN / ACN / Medicare checksum validation
BR CPF / CNPJ checksum validation
CH AHV checksum validation
MX CLABE checksum validation
NL BSN / RSIN checksum validation
French NIR (social security) validation
Jurisdiction-boilerplate filter · "governed by the laws of X" → X not flagged
Intra-page PERSON span coalescence · "Pierre Dubois" kept, bare "Pierre" dropped
Generic-department ORG filter
Role-title PERSON filter
Finished desktop application with review GUI
Reversible tokenization (encode / decode) · Presidio: separate anonymizer layer		◦
100% offline (no cloud dependency)
Custom recognizers via API
Python library for custom integration
Azure AI Language integration

✓ = included by default · ◦ = available via custom configuration · ✗ = not present
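
To illustrate what "checksum validation" means in the rows above, here is a minimal sketch of two of the listed schemes: the Brazilian CPF (two mod-11 check digits) and the Dutch BSN (the "11-test"). The function names are hypothetical and this is not promptShield's or Presidio's code; it only shows why a checksum lets a recognizer reject digit strings that merely look like an identifier.

```python
import re


def is_valid_cpf(value: str) -> bool:
    """Brazilian CPF: 9 base digits followed by 2 mod-11 check digits."""
    s = re.sub(r"[.\-\s]", "", value)  # accept the usual 000.000.000-00 formatting
    if not re.fullmatch(r"[0-9]{11}", s) or len(set(s)) == 1:
        return False  # wrong shape, or a repeated-digit placeholder
    digits = [int(c) for c in s]
    for n in (9, 10):  # index of each check digit
        weights = range(n + 1, 1, -1)  # 10..2 for the first, 11..2 for the second
        total = sum(d * w for d, w in zip(digits[:n], weights))
        if digits[n] != (total * 10) % 11 % 10:
            return False
    return True


def is_valid_bsn(value: str) -> bool:
    """Dutch BSN "11-test", assuming the 9-digit form of the number."""
    s = value.strip()
    if not re.fullmatch(r"[0-9]{9}", s) or s == "000000000":
        return False
    weights = (9, 8, 7, 6, 5, 4, 3, 2, -1)
    return sum(int(c) * w for c, w in zip(s, weights)) % 11 == 0


# In each pair the first value passes the checksum and the second does not.
print(is_valid_cpf("529.982.247-25"), is_valid_cpf("529.982.247-26"))
print(is_valid_bsn("111222333"), is_valid_bsn("111222334"))
```

A regex alone would accept all four strings; the checksum is what separates a plausible identifier from an arbitrary run of digits.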

Noise categories Presidio (default install) emits and promptShield suppresses

URLs in contract footers

Privacy policy link, support page URL, generic legal contact. ~8–10 spans per contract, virtually none of which is PII the reviewer wants to redact.

UrlRecognizer
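
A minimal sketch of label-based suppression, assuming detected spans are available as (start, end, label) records and that URL hits carry the label `URL`, which is Presidio's entity name for them. `Span` and `drop_labels` are hypothetical names used only for illustration, not part of either tool.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    label: str


def drop_labels(spans, unwanted=frozenset({"URL"})):
    """Keep every span whose label is not in the unwanted set."""
    return [s for s in spans if s.label not in unwanted]


spans = [Span(0, 11, "PERSON"), Span(40, 71, "URL"), Span(90, 118, "URL")]
kept = drop_labels(spans)
print(kept)
```

With Presidio itself, a comparable effect is available at query time by passing an explicit `entities` list to `AnalyzerEngine.analyze` that leaves `URL` out.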

Standalone country / city mentions in jurisdictional prose

"governed by the laws of France", "registered office in Paris", "company incorporated in Germany". In a French financial statement, Presidio emits 73 LOCATION spans; almost all are boilerplate mentions rather than personal data.

filter_jurisdiction_boilerplate + filter_standalone_country
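
One way such a jurisdiction filter could work is to look at the text immediately before each LOCATION span and drop the span when it follows a known boilerplate phrase. The sketch below assumes that approach; the trigger list, the `Span` record, and `drop_jurisdiction_locations` are all hypothetical, and promptShield's actual filters may work differently. It does not attempt the separate standalone-country check.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    label: str


# Hypothetical trigger phrases taken from the examples above; a real list
# would need more wordings and one set per language.
TRIGGER = re.compile(
    r"(?:governed by the laws of|registered office in|incorporated in)\s+$",
    re.IGNORECASE,
)


def drop_jurisdiction_locations(text, spans, window=40):
    """Drop LOCATION spans that directly follow a boilerplate trigger phrase."""
    kept = []
    for span in spans:
        prefix = text[max(0, span.start - window):span.start]
        if span.label == "LOCATION" and TRIGGER.search(prefix):
            continue
        kept.append(span)
    return kept


text = "This Agreement is governed by the laws of France. Deliver to 12 rue Cler, Paris."
spans = [
    Span(text.index("France"), text.index("France") + 6, "LOCATION"),
    Span(text.index("Paris"), text.index("Paris") + 5, "LOCATION"),
]
kept = drop_jurisdiction_locations(text, spans)
print([text[s.start:s.end] for s in kept])  # only the delivery address city remains
```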

PERSON fragments

The same name appears as three separate spans: "Pierre", "Dubois", "Pierre Dubois". On an Italian contract, Presidio emits 26 PERSON spans for 10 distinct named parties.

filter_span_coalescence
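
A minimal sketch of the coalescence idea, assuming `spans` holds the detections for a single page and that a PERSON span counts as a fragment when its words appear contiguously inside a longer PERSON span on that page. The names `Span` and `coalesce_person_spans` are hypothetical; this illustrates the behaviour described above, not promptShield's implementation.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    label: str


def _is_proper_sublist(small, big):
    """True if `small` occurs contiguously inside the strictly longer `big`."""
    n = len(small)
    if n == 0 or n >= len(big):
        return False
    return any(big[i:i + n] == small for i in range(len(big) - n + 1))


def coalesce_person_spans(text, spans):
    """Drop PERSON spans that are word-level fragments of a longer PERSON span."""
    names = [text[s.start:s.end].split() for s in spans if s.label == "PERSON"]
    kept = []
    for span in spans:
        words = text[span.start:span.end].split()
        if span.label == "PERSON" and any(_is_proper_sublist(words, n) for n in names):
            continue
        kept.append(span)
    return kept


text = "Pierre Dubois signed. Pierre left. Dubois and Anna Rossi stayed."
spans = [Span(0, 13, "PERSON"), Span(22, 28, "PERSON"),
         Span(35, 41, "PERSON"), Span(46, 56, "PERSON")]
kept = coalesce_person_spans(text, spans)
print([text[s.start:s.end] for s in kept])  # ['Pierre Dubois', 'Anna Rossi']
```

Dropping the fragment from the reported spans is a review-noise decision; a redaction step would still need to cover every occurrence of the bare first or last name in the text.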

Generic department names tagged as ORG

"Marketing", "Vorstand", "Direction Générale", "Board of Directors", "Comitato di Direzione". These aren't organisations the reviewer wants to redact.

filter_generic_org (90-entry stoplist × 7 languages)

Role titles tagged as PERSON

"CEO", "Directeur", "Geschäftsführer": when capitalised at line start, these are sometimes mis-tagged as PERSON by the entity model.

filter_role_titles
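
The two stoplist-style filters above (generic departments tagged as ORG, role titles tagged as PERSON) can be sketched as a single case-insensitive lookup keyed by label. The stoplists here contain only the examples quoted on this page, and `Span`, `STOPLISTS`, and `drop_stoplisted` are hypothetical names; the real lists and matching rules are assumptions.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    label: str


# Tiny illustrative stoplists built from the examples quoted above.
STOPLISTS = {
    "ORG": {"marketing", "vorstand", "direction générale",
            "board of directors", "comitato di direzione"},
    "PERSON": {"ceo", "directeur", "geschäftsführer"},
}


def drop_stoplisted(text, spans):
    """Drop spans whose whole text is a stoplisted term for their label."""
    kept = []
    for span in spans:
        term = text[span.start:span.end].strip().casefold()
        if term in STOPLISTS.get(span.label, ()):
            continue
        kept.append(span)
    return kept


text = "CEO\nMarie Curie reports to the Board of Directors of Acme GmbH."
terms = [("CEO", "PERSON"), ("Marie Curie", "PERSON"),
         ("Board of Directors", "ORG"), ("Acme GmbH", "ORG")]
spans = [Span(text.index(t), text.index(t) + len(t), label) for t, label in terms]
kept = drop_stoplisted(text, spans)
print([text[s.start:s.end] for s in kept])  # ['Marie Curie', 'Acme GmbH']
```

Matching on the whole span text, rather than on a substring, keeps a name such as "Acme Marketing GmbH" from being dropped just because it contains a stoplisted word.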

When to choose Presidio instead

Presidio is the right choice when:

You're building a custom Python-based DLP pipeline and want library-level control.
You need to integrate cloud services (Azure AI Language, AWS Comprehend) under one PII abstraction.
You want to fine-tune your own recognizers for proprietary entity types.
You're processing text streams (not bounded documents) and need a service-shaped library.

Both Presidio (by Microsoft) and promptShield are MIT-spirit projects. Use the right tool for the job.

When to choose promptShield

You're anonymizing bounded PDF/DOCX/XLSX documents on the desktop.
You need country-specific checksums out of the box (TFN, CPF, NPI, IBAN, etc.) without writing them yourself.
You want a finished GUI workflow (review, redact, tokenize, export) rather than a library to integrate.
Your customers can't send documents to a cloud service for compliance reasons.

Reproduce it yourself

Every number on this page comes from the script published in the public repo. No number is sourced from an internal, non-verifiable run.

git clone https://github.com/promptshield-Inc/pii-detection-benchmarks
cd pii-detection-benchmarks
pip install -r requirements.txt
python benchmark.py

Outputs: results/presidio_counts_<date>.csv + results/presidio_entities_<date>.csv. The Presidio script is fully standalone — you don't need promptShield installed to verify the Presidio side.


Honest caveats

Default install only. A tuned Presidio install (custom recognizers + transformer entity backend + tuned confidence thresholds) would close most of the precision gap. We measure what most Presidio users deploy in the first month.
Synthetic corpus. Real customer documents have richer noise (OCR errors, scanned originals, multi-column layouts) we don't measure here.
No ground-truth labels. The "both / ours-only / presidio-only" numbers are a precision proxy, not a strict F1 measurement. Hand-labelling 14 PDFs across 7 languages is ~40 hours of work; we haven't done that yet.
Two document classes. Contracts + financial statements. Medical records, HR forms, immigration paperwork would produce different gaps.
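
To make the "precision proxy" caveat concrete, the sketch below shows how "both / ours-only / presidio-only" buckets can be derived from two span sets by plain set operations. It assumes spans match only when document, offsets, and label are identical; the `Span` fields, the file name, and `overlap_buckets` are hypothetical, and the published script may compare spans differently.

```python
from typing import NamedTuple


class Span(NamedTuple):
    doc: str
    start: int
    end: int
    label: str


def overlap_buckets(ours, presidio):
    """Split two span collections into agreement and disagreement buckets."""
    ours, presidio = set(ours), set(presidio)
    return {
        "both": ours & presidio,
        "ours-only": ours - presidio,
        "presidio-only": presidio - ours,
    }


# Made-up spans for a made-up document, for illustration only.
ours = [Span("example.pdf", 10, 23, "PERSON")]
presidio = [Span("example.pdf", 10, 23, "PERSON"),
            Span("example.pdf", 80, 86, "LOCATION")]
counts = {name: len(bucket) for name, bucket in overlap_buckets(ours, presidio).items()}
print(counts)  # {'both': 1, 'ours-only': 0, 'presidio-only': 1}
```

The buckets show where the two tools agree and disagree, but not which spans are truly PII. A "presidio-only" span may be noise or a miss on our side, which is why these counts cannot stand in for precision, recall, or F1 without hand labels.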
