Case Study: Travel Booking Platform

Fewer false positives, stronger bookings

For this travel booking platform, fraud was hiding in complexity. International bookings, third-party processors, and large-ticket transactions all made detection harder. While their model could flag some risk, it lacked the signal resolution needed to separate false positives from real threats.

Elephant brought clarity to the edge cases. The team improved fraud detection, reduced false positives, and gained the confidence to act on real trust rather than just red flags.

The challenge

This platform processes hundreds of thousands of flight bookings per month, many of them routed through third-party processors. High-value bookings, such as international or premium-class tickets, carried the most risk. However, the system in place was not calibrated to identify who could actually be trusted.

False positives drained revenue. Missed fraud created chargebacks and policy abuse. Analysts lacked the signal precision needed to tune the system without tradeoffs. The company needed a way to improve model performance at both ends of the spectrum. Their priority was high-impact bookings where decisioning was most opaque.

Their goals were clear:

Improve fraud detection across complex booking types

Reduce false positives that were blocking legitimate travelers

Strengthen identity signals for users masked by third-party noise

Build analyst confidence in trust scores for better rule precision

Where static systems broke down

Incomplete data looked like high risk

High-trust users were scored poorly, not because of actual red flags, but because key signals were missing or unavailable

Third parties disrupted signal lineage

External processors stripped core user signals, making it harder to link booking data to a coherent identity and limiting the accuracy of risk scoring

Late detection made fraud irreversible

Fraud was often flagged only after booking confirmation. By then the platform had already issued tickets, absorbed the cost, and could no longer intervene

Signals lacked connectivity

Attributes like email and address weren't just weak; they were incomplete or unverifiable. The system couldn't interpret that gap, leaving fraud undetected

How the system changed with Elephant

Elephant calibrated the trust score using 165,000 historical flight bookings. Instead of relying on perfect input data, the model was trained to recognize meaningful patterns in obscured or fragmented signals. It performed especially well on traffic routed through third-party processors, where identity clarity was lowest. The team tested a range of scoring thresholds to capture both high-risk and high-trust segments with precision.

What we implemented:

Incomplete or unverifiable attributes, like unconfirmed phone carriers or missing address validation, were treated as signals, not gaps
Scoring logic adapted to third-party data noise, enabling confidence even when visibility was limited
Thresholds like 250 and 850 were optimized to catch fraud earlier and approve good users faster
Model performance held strong across complexity, reaching a ROC-AUC of 0.87 even with limited ground truth
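
One way to picture the first point in the list above: instead of dropping or silently imputing an unavailable attribute, the model input can carry an explicit flag that says the attribute was missing or could not be verified. The sketch below is only an illustration under assumptions. The field names (`email_age_months`, `phone_carrier_confirmed`, `address_validated_to_street`) and the `encode_signals` helper are hypothetical and do not describe Elephant's actual schema or API.

```python
from typing import Any, Mapping

# Hypothetical attribute names, chosen for illustration only.
ATTRIBUTES = (
    "email_age_months",
    "phone_carrier_confirmed",
    "address_validated_to_street",
)


def encode_signals(booking: Mapping[str, Any]) -> dict[str, float]:
    """Encode each attribute as a value plus an explicit 'missing' flag.

    A value of None, or an absent key, means the attribute was not
    available (for example, stripped by a third-party processor).
    """
    features: dict[str, float] = {}
    for name in ATTRIBUTES:
        value = booking.get(name)
        # The flag is itself a feature, so the gap becomes a signal
        # the model can learn from rather than a hole in the data.
        features[f"{name}__missing"] = 1.0 if value is None else 0.0
        # 0.0 is a neutral placeholder; the flag marks it as not observed.
        features[name] = 0.0 if value is None else float(value)
    return features


# A booking where the processor passed along only the email age.
print(encode_signals({"email_age_months": 30}))
```

With this encoding, a booking that arrives without carrier or address data still produces a complete feature vector, and the missing flags can carry weight of their own.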

Proof of signal intelligence:

Emails older than 22 months were 37% more trusted
Users from low-risk ISPs were 55% more trusted
Phones previously seen in Elephant's identity graph were 25% more trusted
Billing addresses that could not be validated to the street level were 220% riskier
Phones without carrier validation showed 142% higher risk
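
Figures like the ones above are typically relative differences between a segment's rate and a baseline rate. The case study does not state the exact definition or baseline used, so the sketch below assumes a simple relative-lift formula. The `relative_lift` helper is hypothetical and the rates in the usage example are made up.

```python
def relative_lift(segment_rate: float, baseline_rate: float) -> float:
    """Relative difference of a segment's rate versus a baseline, in percent.

    A result of 50.0 means the segment's rate is 50% higher than the
    baseline; a negative result means it is lower.
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (segment_rate / baseline_rate - 1.0) * 100.0


# Made-up illustration: a segment with a 3% fraud rate versus a 2% baseline.
lift = relative_lift(0.03, 0.02)
print(f"{lift:.0f}% higher risk than baseline")
```

The same formula covers the "more trusted" figures if the rate being compared is the share of good bookings rather than the share of fraud.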

The real win? Trust at the point of entry

The breakthrough wasn't just in performance, but also in flexibility. Elephant gave the team a model that could make smart, confident decisions even when key attributes were missing or inconsistent.

Instead of defaulting to caution, analysts could act on trust scores that accounted for processor-level noise, international bookings, and limited visibility. This meant fraud was caught earlier, approvals moved faster, and the system adapted to complexity without overreacting to it.
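
Acting on trust scores in this way can be sketched as a simple banding rule. This is a minimal illustration, assuming that higher scores mean more trust and that the 250 and 850 thresholds mentioned earlier mark the edges of the low and high bands. The action names and the inclusive boundary handling are assumptions, not the platform's actual rules.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical actions, named for illustration only.
    FLAG_FOR_FRAUD_REVIEW = "flag_for_fraud_review"
    STANDARD_CHECKS = "standard_checks"
    FAST_APPROVE = "fast_approve"


def route_booking(trust_score: float, low: float = 250, high: float = 850) -> Action:
    """Map a trust score to an action using two thresholds.

    Scores at or below `low` are flagged before confirmation, scores at
    or above `high` are approved quickly, and everything in between
    follows the normal checks.
    """
    if low >= high:
        raise ValueError("low threshold must be below high threshold")
    if trust_score <= low:
        return Action.FLAG_FOR_FRAUD_REVIEW
    if trust_score >= high:
        return Action.FAST_APPROVE
    return Action.STANDARD_CHECKS


for score in (180, 600, 910):
    print(score, route_booking(score).value)
```

Keeping the thresholds as parameters lets analysts tune each end of the range independently, which matches the idea of improving performance at both ends of the spectrum.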

ROC-AUC reached 0.87

Trust scoring cleanly separated good and bad users, even in third-party obscured transactions

Good-user detection reached 53% at an 850 threshold

Trusted users were surfaced more confidently, enabling smarter decisions without sacrificing control

Fraud detection reached 37% at a 250 threshold

Threshold tuning helped the team capture more fraud earlier in the funnel, even with third-party obscured inputs

False negative rate dropped to 1.6% at a 950 threshold

The highest confidence zone excluded almost all remaining fraud, enabling near-automatic approvals
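
For readers who want to reproduce this kind of measurement on their own labeled data, the sketch below shows one plausible way to compute threshold-based rates and ROC-AUC. The case study does not define its metrics precisely, so the definitions here are assumptions. Fraud detection is read as the share of fraud bookings scoring at or below the low threshold, and good-user detection as the share of good bookings scoring at or above the high threshold. ROC-AUC is the probability that a randomly chosen good booking outscores a randomly chosen fraud booking. All scores in the example are made up.

```python
from typing import Sequence


def share_at_or_below(scores: Sequence[float], threshold: float) -> float:
    """Fraction of scores that fall at or below the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(s <= threshold for s in scores) / len(scores)


def share_at_or_above(scores: Sequence[float], threshold: float) -> float:
    """Fraction of scores that fall at or above the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(s >= threshold for s in scores) / len(scores)


def roc_auc(good_scores: Sequence[float], fraud_scores: Sequence[float]) -> float:
    """Pairwise ROC-AUC: chance a good booking outscores a fraud booking.

    Ties count as half a win. This is quadratic in the number of
    bookings, which is fine for a sketch but slow on large samples.
    """
    if not good_scores or not fraud_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for g in good_scores:
        for f in fraud_scores:
            if g > f:
                wins += 1.0
            elif g == f:
                wins += 0.5
    return wins / (len(good_scores) * len(fraud_scores))


# Made-up toy trust scores for labeled historical bookings.
fraud = [120, 240, 400, 900]
good = [300, 700, 860, 910, 980]

fraud_caught = share_at_or_below(fraud, 250)    # fraud flagged at the low threshold
good_surfaced = share_at_or_above(good, 850)    # good users surfaced at the high threshold
fraud_in_top = share_at_or_above(fraud, 950)    # fraud landing in the highest band
auc = roc_auc(good, fraud)
print(fraud_caught, good_surfaced, fraud_in_top, auc)
```

Sweeping the thresholds over a range of values and recomputing these rates is one simple way to explore the tradeoff between catching fraud and approving good users.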