
Case Study: National Enterprise Retailer

Fewer chargebacks, faster trust decisions

For this national retailer, a legacy rules engine had kept fraud tolerable but also untouchable. Changing anything risked breaking downstream workflows. But chargebacks were rising, review queues growing, and leadership needed a way to unlock savings without disrupting operations.

Elephant slotted in as a signal layer, not a replacement. With calibrated scoring thresholds, the team flagged high-risk users earlier, surfaced credible orders faster, and reduced fraud exposure without tuning a single rule. 


The challenge

This enterprise retailer processes millions of online orders across multiple regions, fulfillment channels, and risk tiers. But its fraud engine hadn’t changed in years. Rules were brittle, confidence was low, and the pressure to show savings was mounting.

The team couldn’t afford a risky overhaul. They needed a new signal: one they could trust to sharpen their system without destabilizing it. That meant measurable results, explainable decisions, and compatibility with existing queues and thresholds.

Their goals were clear: 

  • Reduce chargebacks without tightening global fraud rules
  • Decrease manual review volume without increasing fraud
  • Improve visibility into borderline decisions
  • Deliver measurable savings that earned internal trust

Where static systems broke down


Rules were frozen in place

The existing system hadn't been calibrated in years. Rules weren't evolving, and manual overrides were the only way to catch what the engine missed.


Analysts were drowning in ambiguity

With no trust-based scoring in place, every order with ambiguous data went to review, even when risk signals were low.


Fraud was spreading through gaps

Risky behaviors like mismatched names, freight forwarding, or identity stitching slipped through if they didn't trigger specific thresholds.


There was no visibility into trust

The system could say "this looks risky" but not "this looks good". Without a trust tier, the review queue only continued to grow.

How the system changed with Elephant

Instead of rewriting rules, the team added scoring logic that mapped cleanly to the decision paths they already trusted. Elephant calibrated risk and trust thresholds using historical orders, then validated them against known outcomes.

High-risk users were flagged earlier. High-trust users were approved faster. And review queues thinned out without sacrificing precision. It worked because it fit their system, not because it replaced it.
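
As a rough illustration of that calibration step, the sketch below sweeps candidate cutoffs over historically labeled orders and keeps the ones that meet target fraud rates. The function name, the score convention (higher = more trustworthy), and the target rates are assumptions for illustration, not Elephant's actual tooling.

```python
import numpy as np

def calibrate_thresholds(scores, is_fraud,
                         deny_fraud_rate=0.50,       # illustrative targets,
                         approve_fraud_rate=0.001):  # not Elephant's numbers
    """Pick deny/approve cutoffs from historically labeled orders.

    scores   -- trust scores in [0, 1], higher = more trustworthy
    is_fraud -- booleans from known chargeback outcomes
    """
    scores = np.asarray(scores, dtype=float)
    is_fraud = np.asarray(is_fraud, dtype=bool)
    deny_at, approve_at = 0.0, 1.0

    for t in np.linspace(0.0, 1.0, 101):
        denied = scores <= t
        # Raise the deny line while the denied slice is still mostly fraud,
        # so blocking there costs little good volume.
        if denied.any() and is_fraud[denied].mean() >= deny_fraud_rate:
            deny_at = t
        approved = scores >= t
        # Take the lowest approve line whose auto-approved slice stays
        # under the tolerated fraud rate.
        if approved.any() and is_fraud[approved].mean() <= approve_fraud_rate \
                and t < approve_at:
            approve_at = t
    return deny_at, approve_at
```

Cutoffs chosen this way can be validated against known outcomes before being frozen into the existing decision paths.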


What we implemented:

  • Risk tiers were mapped to thresholds like 0.1 (deny) and 0.9 (approve), as sketched in the code below
  • Scores were plugged directly into workflows with no custom tooling required
  • Analyst trust grew through clear lift metrics and transparent scoring explanations 
  • Operational friction dropped while outcome visibility increased
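
A minimal sketch of that routing, using the 0.1 / 0.9 thresholds from the list above; the function and label names are hypothetical, but the three-way split mirrors the deny / review / approve paths the team already ran.

```python
DENY_AT, APPROVE_AT = 0.1, 0.9  # calibrated cutoffs from the list above

def route_order(trust_score: float) -> str:
    """Map a trust score onto the retailer's existing decision paths."""
    if trust_score <= DENY_AT:
        return "deny"           # high risk: block before fulfillment
    if trust_score >= APPROVE_AT:
        return "approve"        # high trust: skip manual review entirely
    return "manual_review"      # the shrinking middle band

# Only the genuinely ambiguous order lands in the review queue.
for score in (0.04, 0.55, 0.97):
    print(f"{score:.2f} -> {route_order(score)}")
```

Because the output is just a label on an existing path, the scores plug into current workflows without custom tooling.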

Proof of signal intelligence:

  • Orders with freight forwarding addresses were 134% more likely to be fraud
  • Billing and shipping name mismatches increased fraud risk by 92%
  • Device-first sessions showed 80% less fraud than session-less flows
  • Email and phone mismatches increased risk by 115%
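
Lifts like these fall straight out of labeled order history. A rough sketch of the arithmetic, assuming a pandas frame with hypothetical boolean columns (is_fraud, freight_forwarding, and so on):

```python
import pandas as pd

def fraud_lift(orders: pd.DataFrame, flag: str) -> float:
    """Percent increase in fraud rate when `flag` is present vs. absent."""
    rate_with = orders.loc[orders[flag], "is_fraud"].mean()
    rate_without = orders.loc[~orders[flag], "is_fraud"].mean()
    return (rate_with / rate_without - 1.0) * 100.0

# e.g. fraud_lift(history, "freight_forwarding") -> ~134.0 for the
# "134% more likely to be fraud" figure above.
```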

The real win? Millions in revenue saved without changing a single rule

The goal wasn’t to reinvent the system. It was to make it sharper, smarter, and more efficient without introducing risk. With Elephant, the team saw measurable savings and workflow relief using only a scoring overlay.

Analysts were no longer stuck reviewing everything in the middle. Fraud was caught earlier, trust surfaced faster, and business impact was clear.

$2.3 million in estimated chargeback reduction

Fraud scoring identified high-risk orders that had previously gone unflagged, reducing loss without harming approvals.

15.3% reduction in manual reviews

Borderline orders were reclassified with higher confidence, allowing the team to approve or deny without intervention.

ROC-AUC reached 0.93

Trust scoring sharply separated good and bad orders, improving downstream confidence and actionability.
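
For context, ROC-AUC measures how reliably a score ranks fraud above good orders; 0.93 means a randomly drawn fraud order outscores a randomly drawn good order about 93% of the time. A minimal check using scikit-learn, with toy labels standing in for held-out outcomes:

```python
from sklearn.metrics import roc_auc_score

# Held-out labeled orders: 1 = confirmed fraud/chargeback, 0 = good order.
# These values are toy stand-ins, not the retailer's data.
y_true = [0, 0, 1, 0, 1, 0, 1, 0]
y_risk = [0.10, 0.75, 0.80, 0.20, 0.90, 0.40, 0.65, 0.15]

print(f"ROC-AUC: {roc_auc_score(y_true, y_risk):.2f}")
```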

Approval rate increased 3.6 percentage points at the same fraud threshold

With scoring based on trust, good users were surfaced earlier, enabling higher throughput without more exposure.

Interested in achieving similar results for your company?