CIPHERER

Case Study

TrustWallet (Binance): Web3 data lakehouse at wallet scale

Client: TrustWallet (Binance)
Mission: Make on-chain behaviour observable in real time so attackers cannot move through the system unseen.
Scale: 220M+ users

Stakes

TrustWallet is one of the world's largest non-custodial crypto wallets, serving over 220 million users across hundreds of blockchain networks. At that scale, fraud is not a quarterly review item; it is a real-time signal. Every minute that on-chain behaviour goes unanalysed is a minute in which attackers can move through the system unobserved. The brief was to make on-chain transaction data, user analytics and behavioural signals queryable in seconds rather than days, and to do so without a wholesale ("forklift") migration of a multi-petabyte data estate.

Constraints

  • Multi-source data: on-chain transactions across many networks, off-chain user telemetry, behavioural events, and third-party intelligence feeds
  • Compliance-grade audit trail required for every query, every transformation, every model output
  • Fraud and risk teams needed sub-minute insight on suspicious flows; product analytics needed historical depth
  • Existing data tooling was fragmented across regional teams with inconsistent governance
  • Operating inside Binance's broader security and compliance envelope

Approach

Lakehouse over warehouse

We designed a Databricks and Delta Lake lakehouse as the single source of truth for on-chain and off-chain data. A lakehouse (rather than a classic warehouse) was the right shape because the same data had to be accessible both to ML workloads for fraud and product intelligence and to analysts running SQL investigations, from one set of pipelines rather than forked copies.

Bronze, silver, gold layered ingestion

Raw on-chain events landed in a bronze layer with full lineage and an immutable audit history; the silver layer normalised entities, addresses and behavioural signals; the gold layer fed downstream consumers (fraud detection, product analytics and customer-facing intelligence). Each layer carried a clear governance contract.
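To make the layering concrete, here is a minimal, hypothetical sketch in plain Python (no Spark or Delta Lake) of the contract each layer carries: bronze is append-only and keeps raw payloads exactly as received, silver records normalise addresses and amounts while holding a pointer back to their bronze event, and gold exposes a consumer-facing aggregate. The class names, field names and the per-sender outflow aggregate are illustrative assumptions, not the client's schema or pipeline code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical record shapes; field names are illustrative only.

@dataclass(frozen=True)
class BronzeEvent:
    event_id: str   # unique id assigned at ingestion
    source: str     # hypothetical feed name, e.g. "node-a"
    payload: dict   # raw event exactly as received, never modified

@dataclass(frozen=True)
class SilverTransfer:
    bronze_id: str  # lineage pointer back to the originating bronze event
    chain: str
    from_addr: str
    to_addr: str
    amount: int     # integer smallest unit, avoiding float rounding

class BronzeLayer:
    """Append-only store: events can be added and read, never updated or deleted."""

    def __init__(self) -> None:
        self._events: Dict[str, BronzeEvent] = {}

    def append(self, event: BronzeEvent) -> None:
        if event.event_id in self._events:
            raise ValueError(f"duplicate event_id {event.event_id}")
        self._events[event.event_id] = event

    def get(self, event_id: str) -> BronzeEvent:
        return self._events[event_id]

    def all(self) -> List[BronzeEvent]:
        return list(self._events.values())

def to_silver(event: BronzeEvent) -> SilverTransfer:
    """Normalise one raw event: lower-case chain and addresses, integer amount."""
    p = event.payload
    return SilverTransfer(
        bronze_id=event.event_id,
        chain=p["chain"].lower(),
        from_addr=p["from"].lower(),
        to_addr=p["to"].lower(),
        amount=int(p["value"]),
    )

def to_gold(transfers: List[SilverTransfer]) -> Dict[Tuple[str, str], int]:
    """Consumer-facing aggregate: total outflow per (chain, sender address)."""
    totals: Dict[Tuple[str, str], int] = {}
    for t in transfers:
        key = (t.chain, t.from_addr)
        totals[key] = totals.get(key, 0) + t.amount
    return totals

bronze = BronzeLayer()
bronze.append(BronzeEvent("e1", "node-a", {"chain": "ETH", "from": "0xAbC", "to": "0x1", "value": "100"}))
bronze.append(BronzeEvent("e2", "node-a", {"chain": "eth", "from": "0xabc", "to": "0x2", "value": "50"}))

silver = [to_silver(e) for e in bronze.all()]
gold = to_gold(silver)
print(gold)  # {('eth', '0xabc'): 150}

# Lineage: trace a silver row back to its untouched raw payload.
print(bronze.get(silver[0].bronze_id).payload["from"])  # 0xAbC
```

Because silver rows carry the bronze id rather than a copy of the raw data, any normalised or aggregated value can be walked back to the exact event it came from.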

ML-ready infrastructure

Models for fraud detection and behavioural classification ran on SageMaker and Databricks ML, fed directly from the gold layer. Model lifecycle, evaluation and rollback were treated as first-class delivery artefacts, not afterthoughts.

Security and compliance baked in

Continuous compliance checks, encryption at rest and in transit, and least-privilege IAM were in place from the start, while the full Delta Lake transaction history, together with layer-level lineage, meant any auditor could trace any value back to its origin. This was not retrofitted; it was the foundation.
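Delta Lake records every commit to a table in a transaction log, which is what allows a table to be queried as of an earlier version (`VERSION AS OF`) and its changes listed (`DESCRIBE HISTORY`). The toy model below, in plain Python, illustrates only the underlying idea: table state is rebuilt by replaying an append-only commit log, so any value can be traced to the commit and actor that produced it. It is not Delta Lake's log format or API; the `VersionedTable` class, actor names and keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class Commit:
    version: int
    operation: str                      # e.g. "WRITE", "UPDATE"
    actor: str                          # hypothetical principal that made the change
    changes: Dict[str, Optional[int]]   # key -> new value (None means delete)

class VersionedTable:
    """Toy table whose state is derived entirely from an append-only commit log."""

    def __init__(self) -> None:
        self._log: List[Commit] = []

    def commit(self, operation: str, actor: str, changes: Dict[str, Optional[int]]) -> int:
        version = len(self._log)
        self._log.append(Commit(version, operation, actor, dict(changes)))
        return version

    def snapshot(self, version: Optional[int] = None) -> Dict[str, int]:
        """Replay commits 0..version to rebuild the table as it was at that version."""
        if version is None:
            version = len(self._log) - 1
        state: Dict[str, int] = {}
        for c in self._log[: version + 1]:
            for key, value in c.changes.items():
                if value is None:
                    state.pop(key, None)
                else:
                    state[key] = value
        return state

    def history(self, key: str) -> List[Commit]:
        """Every commit that touched `key`: the audit trail for a single value."""
        return [c for c in self._log if key in c.changes]

t = VersionedTable()
t.commit("WRITE", "ingest-job", {"risk:0xabc": 10})
t.commit("UPDATE", "fraud-model-v2", {"risk:0xabc": 90})

print(t.snapshot())           # {'risk:0xabc': 90}
print(t.snapshot(version=0))  # {'risk:0xabc': 10}
print([c.actor for c in t.history("risk:0xabc")])  # ['ingest-job', 'fraud-model-v2']
```

Since nothing in the log is ever rewritten, the current value and every earlier value remain explainable from the same record.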

Deliverables

  • Production data lakehouse on Databricks and Delta Lake unifying on-chain, off-chain and behavioural data
  • Real-time fraud detection pipelines with sub-minute insight on suspicious flows
  • ML platform on SageMaker and Databricks for model training, evaluation and deployment
  • Governance controls including encryption, IAM, audit logging and lineage
  • Operational runbooks, incident response patterns and on-call ownership model
  • Knowledge transfer to internal data and security teams

Outcome

TrustWallet now operates a real-time fraud detection capability and an ML-driven product intelligence layer on a single, governed, audit-ready data foundation. Decisions that previously took days of cross-team triage now happen inside the platform on minutes-old data, at the scale of one of the largest wallets in the world.

Stack

  • Databricks
  • Delta Lake
  • AWS
  • Amazon SageMaker
  • Terraform
  • GitHub Actions
  • Apache Spark

Compliance posture

  • SOC 2 alignment
  • Continuous audit trail
  • Encryption at rest and in transit
  • Least-privilege IAM