

10-min read · Published April 3, 2026

Compliance teams have been trained to accept false-positive rates as a cost of doing business. A 95%+ false-positive rate on transaction-monitoring alerts is the industry-quoted norm — and it costs roughly 60% of every analyst's working day. This study asks a simple question: what happens when detection is done by reasoning, not rules?

The benchmark

We ran WIDTH's detection engine in shadow mode against three production rule-based tools across twelve institutions: a mix of retail banks, digital banks, fintechs, and payment firms. Over a twelve-week window, the tools produced roughly 40 million alerts in aggregate. We pair-reviewed a stratified sample of 20,000 alerts to establish ground-truth labels.
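As an illustration of the ground-truth step (not the study's actual code), a stratified sample can be drawn per tool, institution, and typology so every cell of the alert population is represented proportionally. The column names here are hypothetical, not the study's schema:

```python
import pandas as pd

def stratified_sample(alerts: pd.DataFrame, n: int = 20_000) -> pd.DataFrame:
    """Draw a proportional stratified sample across tool/institution/typology
    cells, so rare strata still appear in the review set."""
    frac = n / len(alerts)
    return alerts.groupby(["tool", "institution", "typology"]).sample(
        frac=frac, random_state=42
    )

# sample = stratified_sample(all_alerts)  # ~20k alerts sent for pair review
```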

What we measured

Three outcomes, reported below: alert precision (the share of alerts that pair review confirmed as genuine cases), median analyst investigation time per genuine case, and narrative quality as rated by blinded reviewers.

The structural difference

Rule-based monitoring matches fixed conditions. Behaviour-aware scoring compares a customer's activity to their own baseline and to a peer group sharing their onboarding profile, transaction type, and corridor. The same $12,000 wire looks normal for one peer group and extraordinary for another — and that context never appears in a threshold.
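To make the mechanism concrete, here is a minimal sketch of peer-group normalisation: the same amount is scored against the customer's own history and against a peer baseline. The even weighting, the helper names, and every number are illustrative assumptions, not WIDTH's model:

```python
# Behaviour-aware scoring sketch. All figures and the 50/50 weighting
# are illustrative assumptions, not WIDTH's actual model.
import statistics

def z(amount: float, history: list[float]) -> float:
    """How many standard deviations `amount` sits from the baseline."""
    mu = statistics.mean(history)
    sd = statistics.stdev(history) or 1.0  # guard against zero variance
    return (amount - mu) / sd

def behaviour_score(amount: float, own_history: list[float],
                    peer_history: list[float]) -> float:
    # Weighting between self-baseline and peer-baseline is a design
    # choice; an even split is assumed here for illustration.
    return 0.5 * z(amount, own_history) + 0.5 * z(amount, peer_history)

# The same $12,000 wire: routine for a trade-finance peer group,
# extreme for a retail-remittance one.
trade_peers  = [9_000, 14_000, 11_500, 13_000, 10_000]
retail_peers = [300, 450, 280, 500, 350]
print(behaviour_score(12_000, [10_000, 13_000, 11_000], trade_peers))  # near 0
print(behaviour_score(12_000, [400, 350, 500], retail_peers))          # very large
```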

"We went from 40,000 alerts a month to 2,000 — and the 2,000 are the ones that actually matter."

Results

Precision

Aggregate precision rose from 3.1% under rule-based monitoring to 62.4% under WIDTH. The strongest gains were in peer-sensitive typologies — structuring, layering, and APP scam variants — where the peer-group normalisation had the largest effect.
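A quick back-of-the-envelope check ties the aggregate precision figures to the volumes quoted above. The 40,000 and 2,000 figures come from a single customer's quote while the precision numbers are study-wide aggregates, so this is illustrative arithmetic, not a reported result:

```python
# Back-of-the-envelope: roughly the same number of genuine cases
# survives a 20x cut in alert volume.
rule_alerts,  rule_precision  = 40_000, 0.031   # per month, rule-based
width_alerts, width_precision = 2_000,  0.624   # per month, WIDTH

print(rule_alerts * rule_precision)    # ~1,240 genuine cases
print(width_alerts * width_precision)  # ~1,248 genuine cases
```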

Analyst time

Median investigation time per genuine case dropped from 42 minutes to 9 — not because the investigator moved faster, but because the case arrived pre-drafted with entity graph, typology label, and citation chain.
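The article doesn't publish the case schema; as a sketch of what "pre-drafted" might look like in code, here is a hypothetical structure carrying the three artefacts it names. All field names are assumptions:

```python
# Hypothetical shape of a pre-drafted case file, based only on the
# artefacts the article lists; not WIDTH's actual schema.
from dataclasses import dataclass, field

@dataclass
class Citation:
    source: str    # e.g. a transaction id or KYC document reference
    excerpt: str   # the specific evidence the narrative relies on

@dataclass
class PreDraftedCase:
    typology: str                       # e.g. "structuring"
    entity_graph: dict[str, list[str]]  # entity -> linked counterparties
    narrative: str                      # draft SAR-style narrative
    citations: list[Citation] = field(default_factory=list)
```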

Narrative quality

Three independent MLROs, reviewing blind, rated WIDTH narratives 4.6/5 on average, against 2.8/5 for narratives analysts assembled from scratch; the gap was statistically significant at p < 0.001.
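The article reports the p-value without naming the test. For ordinal 1-to-5 ratings from two reviewer pools, a Mann-Whitney U test is one conventional choice; the sketch below uses synthetic placeholder scores, not the study's data:

```python
# The article does not name its significance test; Mann-Whitney U is
# one conventional choice for ordinal ratings. Scores below are
# synthetic placeholders, NOT the study's data.
from scipy.stats import mannwhitneyu

width_ratings   = [5, 5, 4, 5, 4, 5, 4, 5]   # illustrative only
analyst_ratings = [3, 2, 3, 3, 2, 3, 3, 2]   # illustrative only

stat, p = mannwhitneyu(width_ratings, analyst_ratings, alternative="greater")
print(f"U={stat}, p={p:.4f}")
```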

Where the AI still missed

The model under-flagged two classes of novel typology that no rule catches either, but that seasoned investigators spotted using context the model didn't yet have. Both gaps were closed by adding a peer-group feature the model wasn't originally ingesting.

The lesson: AI-native monitoring isn't magic, and it isn't static. The gap between WIDTH and rule-based tools widened over the study window because the model learned from feedback; the rules didn't.
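The article doesn't describe the model or its training loop; the sketch below only illustrates the structural point that analyst dispositions can flow back in as labels, a step static rules have no analogue for. SGDClassifier, partial_fit, and the synthetic arrays are stand-in assumptions:

```python
# Learn-from-feedback sketch. The model class, features, and labels are
# stand-ins; the article does not describe WIDTH's actual pipeline.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X_hist = rng.random((200, 4))              # stand-in: historic alert features
y_hist = (X_hist[:, 0] > 0.8).astype(int)  # stand-in: historic dispositions

model = SGDClassifier(loss="log_loss")
model.partial_fit(X_hist, y_hist, classes=[0, 1])  # warm start

def fold_in_feedback(model, week_features, week_dispositions):
    """Each review cycle, analyst decisions become new training labels."""
    model.partial_fit(week_features, week_dispositions)
    return model
```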

What this means for compliance teams

If your queue looks like the baseline above (tens of thousands of alerts a month, single-digit precision), the figures translate directly into analyst hours: most of the queue is noise, and most of the day goes to clearing it. Shadow mode makes the comparison cheap to run, since it sits alongside production tooling without touching live decisioning.

Reproducing the study

We're making the methodology, sample schema, and review rubric available to compliance teams under NDA. Email research@width.com.

See the engine run on your alerts.

30 minutes. We'll replay a slice of your historic alerts through WIDTH and walk you through the precision numbers.