
GGPoker Cheating Detection: Architecture, Signals, Failure Modes


By Raul Moriarty · Poker Software Expert

A reverse-engineered map of what GGPoker's security stack actually looks like from the outside — behavioural fingerprinting, statistical play-pattern analysis, anti-collusion graph models, and the human review layer that signs everything off.

Summary

  • GGPoker runs a four-layer detection stack. No single layer is dispositive; signals accumulate over weeks against a per-account score with a tunable false-positive budget.
  • Behavioural fingerprinting (action-timing distributions, input curvature, dwell times) is the cheapest layer and the most punishing to naive implementations.
  • Statistical play-pattern analysis catches pure-GTO output paradoxically faster than noisy strong-human play, because population variance is the baseline.
  • The collusion graph model is the layer responsible for most high-impact bans. Botting is often a side-product of catching multi-account farms under one fingerprint.
  • Human review on flagged accounts is the decisive layer. Most bot bans are signed off by a reviewer, not an automated rule.
  • Anti-detection is an adversarial classification problem in the lineage of Dalvi et al. (2004) and Lowd & Meek (2005), not a feature checklist.

What counts as cheating in GGPoker's terms

Categorisation matters because each category has its own signal stack, false-positive budget and consequence path. The six categories that the public terms of service prohibit and that the security team actively works against:

Prohibited categories: operator priority and detection difficulty

Category | Operator priority | Detection difficulty | Typical signal
Collusion / chip dumping | Highest (regulatory exposure) | Medium | Account graph + suspicious hand sequences
Multi-accounting | High | Low–Medium | Device fingerprint + KYC join
Botting (automated play) | High | Medium | Behavioural fingerprint + play-pattern
Real-time assistance (RTA) | Medium–High | High | Statistical play-pattern over volume
External HUDs / overlays | Medium | Low (client telemetry) | Client-side process detection
Ghosting | Medium (event-driven spikes during major MTTs) | High | Win-rate vs known-skill baseline + IP joins

Collusion is the operator's first priority because it hurts customers most directly and creates the biggest regulatory exposure. Botting and RTA come next. External HUDs are a low-effort enforcement target because the client process can detect them locally. Ghosting spikes around major series and gets disproportionate attention during them.

The four-layer detection model

The stack visible from outside contains four layers. There are almost certainly more pieces we cannot see — internal heuristics, machine-learned scoring models, undisclosed signals — but these are the four whose effects on customer accounts are observable.

Layer 1: Behavioural fingerprinting
Client telemetry on input timing, mouse-path geometry, touch dwell on mobile, action-confirmation latency, idle behaviour between hands. Cheap to compute, runs continuously, feeds into a behavioural score per session. Bites naive implementations hardest.
Layer 2: Statistical play-pattern analysis
Per-account distributional analysis on VPIP, PFR, 3-bet by position, fold-to-cbet by board texture, bet-sizing histograms, river aggression, all-in equity at showdown. Heavy compute, runs nightly or weekly, produces a play-pattern outlier score.
Layer 3: Anti-collusion graph models
Account graph joined by IP, device fingerprint, deposit method, KYC document, table co-occurrence, action correlations within hands. Catches multi-accounting and chip dumping; botting falls out as a side-product when a farm runs under a single fingerprint.
Layer 4: Human review
The decisive layer. Mathematical models propose; humans decide. A reviewer reads hand history, checks chat behaviour, looks at session start/stop patterns relative to client-reported timezone, sees whether the account exhibits the small human imperfections (a fish-call mid-session, a sit-out for a phone call, a typo in chat). Most botting bans clear through here.
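Layer 3's account graph can be sketched as connected components over shared identifiers. This is a minimal sketch, assuming each observation is an (account, identifier) pair. The identifier naming (`device:…`, `ip:…`) and the function name are hypothetical. A real graph model would presumably weight edge types differently, since a shared IP (a household, a café) is much weaker evidence than a shared KYC document. This sketch treats every shared identifier as an equal link.

```python
from collections import defaultdict


def cluster_accounts(links):
    """Group accounts that share any identifier (union-find over accounts).

    links: iterable of (account_id, identifier) pairs, e.g. ("a1", "device:x").
    Returns a list of sets, one per connected component with 2+ accounts.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_owner = {}  # identifier -> first account seen with it
    for account, ident in links:
        find(account)  # register the account even if it shares nothing
        if ident in first_owner:
            union(first_owner[ident], account)
        else:
            first_owner[ident] = account

    groups = defaultdict(set)
    for account in list(parent):
        groups[find(account)].add(account)
    return [g for g in groups.values() if len(g) > 1]
```

For example, `a1` and `a2` share a device and `a2` and `a3` share an IP, so all three land in one cluster even though `a1` and `a3` share nothing directly. That transitive closure is what makes graph joins effective against multi-account farms.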

The four layers run on different cadences and feed the combined score asynchronously. Layer 1 produces a high-frequency score that mostly stays under threshold. Layer 2 runs offline and contributes to a per-account risk score that decays slowly. Layer 3 is event-driven by graph changes. Layer 4 is the bottleneck: reviewer capacity is limited, so a queue is maintained and prioritised by combined risk score, expected revenue impact, and recent withdrawal activity.
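The review queue described above can be sketched as a slowly decaying score feeding a priority heap. This is a minimal sketch with several assumptions. The 30-day half-life, the linear combination and all three weights are illustrative, not operator values. `risk_score` is assumed to have already been passed through `decayed()` before the account enters the queue.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Account:
    account_id: str
    risk_score: float        # combined (already decayed) score, arbitrary units
    revenue_impact: float    # expected revenue impact, normalised to [0, 1]
    recent_withdrawal: bool  # any withdrawal inside the recent window


def decayed(score, days_elapsed, half_life_days=30.0):
    """Exponential decay toward zero; the 30-day half-life is hypothetical."""
    return score * 0.5 ** (days_elapsed / half_life_days)


def priority(acct, w_risk=1.0, w_rev=0.5, w_withdraw=0.75):
    """Hypothetical linear priority; weights are illustrative only."""
    return (w_risk * acct.risk_score
            + w_rev * acct.revenue_impact
            + (w_withdraw if acct.recent_withdrawal else 0.0))


def review_order(accounts):
    """Yield account ids highest-priority first (heapq is a min-heap, so negate)."""
    heap = [(-priority(a), a.account_id) for a in accounts]
    heapq.heapify(heap)
    while heap:
        _, account_id = heapq.heappop(heap)
        yield account_id
```

With these weights, an account with a moderate score and a recent withdrawal jumps ahead of a higher-scoring account that has never cashed out. This matches the observation that withdrawals, not the raw score, often trigger review.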

Signal weights and observable failure modes

The exact weights are operator-confidential. The relative weights can be inferred from observed customer outcomes — which accounts get caught, in what order, after what triggering event. The pattern is consistent enough to be useful for engineers building or defending against these systems.

Detection signals × observable weight × failure mode

Signal | Layer | Relative weight | Naive failure mode
Action-timing variance < population | L1 | High | Constant-latency action emission
Touch coordinate clustering on buttons | L1 | Medium | Pixel-perfect tap on button centroid
Idle behaviour between hands too uniform | L1 | Medium | No micro-movement, no chat, no occasional tab-switch
VPIP/PFR ratio at population mass with low variance | L2 | High | Pure GTO baseline, no human-noise overlay
Bet sizing clustered on exact pot fractions | L2 | High | Solver output without sizing perturbation
Win rate persistently outside skill-pool envelope | L2 | Very High | Hot run, high stakes, no manual sessions interleaved
Shared device fingerprint across accounts | L3 | Very High (regulatory) | Bot farm on one IP / device
Withdrawal pattern → big-bang on first cashout | L3+L4 | High | Quiet grind for 30 days, then large withdrawal
Chat behaviour: zero outgoing messages over 5k+ hands | L4 | Medium | Bot never says "nh"
Sit-out behaviour: never sits out on bad table | L4 | Medium | Bot grinds whoever sits at it

The pattern is consistent: the layers that are cheapest to compute (L1, L3) catch the laziest implementations, while the layers that require either heavy compute (L2) or expensive human attention (L4) catch the more sophisticated cases. An implementation that defeats L1 but ignores L2 will run for weeks or months before the offline play-pattern score crosses a review threshold. This explains the long observed lag between starting to use a bot and getting caught — typically 2 weeks to 9 months, with the median around 8–14 weeks contingent on session volume and withdrawal activity.

Action-timing fingerprints

The single most-discussed and least-well-engineered signal. Naive implementations emit actions on a fixed latency or with a uniform random latency around a centroid. Both are catastrophically wrong.

Real human action-timing distributions are log-normal-ish with heavy right tails and game-state dependence. A snap-fold of a garbage hand takes 600–1200ms; a tough river decision takes 5–30 seconds; a routine flop continuation-bet on a clean board takes 1.5–4 seconds. The distribution is not just wider than a naive bot's — its shape is fundamentally different, and the shape is a fingerprint.
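On the detection side, "the shape is a fingerprint" can be made concrete with a distribution-free distance between an account's action delays and a population sample for the same action type. This is a minimal sketch using the two-sample Kolmogorov–Smirnov statistic. That test is our choice, and the operator's actual statistic is unknown. If SciPy is available, `scipy.stats.ks_2samp` computes the same statistic plus a p-value.

```python
import bisect


def ks_statistic(sample, reference):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs.

    The supremum is attained at an observed data point, so it suffices to
    evaluate both right-continuous empirical CDFs at every pooled value.
    """
    a, b = sorted(sample), sorted(reference)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    n, m = len(a), len(b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / n
        cdf_b = bisect.bisect_right(b, x) / m
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

A constant-latency emitter such as `ks_statistic([1.0] * 5, [0.5, 1.0, 1.5, 2.0, 4.0])` scores about 0.6 against even a tiny reference sample. A genuinely population-like sample drifts toward 0 as volume grows. Comparing per action type matters because the population baselines for a snap-fold and a river decision differ.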

# Schematic: behaviourally-shaped action timing
# Conceptual, not the production implementation

def sample_action_delay(decision_difficulty, action_type, hand_state):
    """Return seconds-to-act drawn from a state-conditional log-normal."""
    # Difficulty in [0,1]: 0 = trivial fold, 1 = boundary call
    mu_base = {
        'fold_trivial':   math.log(0.9),
        'cbet_routine':   math.log(2.4),
        'check_routine':  math.log(1.6),
        'river_boundary': math.log(8.5),
        'all_in_decision':math.log(12.0),
    }[action_type]

    # Difficulty stretches mu logarithmically
    mu = mu_base + 0.7 * decision_difficulty

    # Sigma rises with difficulty — humans deliberate variably on hard spots
    sigma = 0.35 + 0.55 * decision_difficulty

    delay = random.lognormvariate(mu, sigma)

    # ~3% chance of distraction tail: 8–25s independent of difficulty
    if random.random() < 0.03:
        delay += random.uniform(8, 25)

    # Floor at a non-zero minimum; humans cannot react in < 250ms
    return max(0.25, delay)

The example is schematic. Production systems condition on more variables — stack depth, opponent action sequence, position, multiway versus heads-up, and a per-session "alertness" parameter that drifts down over long sessions to mimic fatigue. The point is that the right behaviour is not "add noise" — it is "draw from a distribution whose shape matches the population, conditioned on state."

False-positive budget and review pipeline

The constraint that shapes the entire stack is false-positive cost. GGPoker cannot afford to ban significant numbers of legitimate winning players — every false positive is a regulatory complaint, a chargeback, a forum post, a churned customer. The detection system runs at a deliberately conservative false-positive rate, which means most automated signals do not trigger action on their own.

What they trigger is a review queue placement. The visible stages from outside, in order:

  1. Quiet flag. Account moves into a higher-scrutiny review bucket. No visible change to the player. Telemetry continues.
  2. Soft restriction. Withdrawal limits drop. KYC re-verification requested. Bonus eligibility quietly removed. Some players notice and modify behaviour; most don't.
  3. Structured interview. Support requests "clarifying information" about play style, schedule, and software use. The interview is logged and the answers are matched against the play-pattern model.
  4. Confiscation and closure. Winnings voided, balance held pending investigation, account closed. The investigation period extends from weeks to months depending on jurisdiction.
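The visible stages behave like a one-directional state machine. This is a minimal sketch, assuming escalation is gated on a reviewer confirming suspicion. The `NONE` state and the `next_stage` rule are hypothetical additions for illustration. The sketch never de-escalates, which is a simplification.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Escalation stages as observed from outside, in escalation order."""
    NONE = 0
    QUIET_FLAG = 1
    SOFT_RESTRICTION = 2
    STRUCTURED_INTERVIEW = 3
    CLOSURE = 4


def next_stage(current, review_confirms):
    """Advance one stage when a review confirms suspicion; otherwise stay put.

    Hypothetical rule: stages move forward one step at a time and CLOSURE
    is terminal. This sketch has no de-escalation path.
    """
    if current is Stage.CLOSURE or not review_confirms:
        return current
    return Stage(current + 1)
```
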

The cycle from first quiet flag to confiscation typically runs 14 days to 9 months, anchored on review-queue capacity and triggering events (especially withdrawal activity). The longest cycles we've seen are accounts that ran quietly for a year, withdrew their first significant amount, and were reviewed 15 days after the withdrawal. The mathematical signal was present from month two; the human review was queued only by the withdrawal event.
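The deliberately conservative false-positive rate at the start of this section translates into a threshold chosen on a reference set of accounts believed to be legitimate. This is a minimal sketch, assuming such a labelled reference set exists and that higher scores mean more suspicious. The quantile rule is a standard technique, not a confirmed detail of GGPoker's pipeline.

```python
import math


def threshold_for_fp_budget(legit_scores, fp_budget):
    """Return t such that at most fp_budget of legit_scores are strictly above t.

    Accounts are flagged when score > t. legit_scores is a reference set of
    accounts believed legitimate; fp_budget is a rate such as 0.001.
    """
    s = sorted(legit_scores)
    n = len(s)
    if n == 0:
        raise ValueError("reference set must be non-empty")
    allowed = math.floor(fp_budget * n)  # legit accounts we tolerate flagging
    if allowed >= n:
        return float("-inf")
    return s[n - 1 - allowed]  # ties at t are not flagged, so count <= allowed
```

Tightening `fp_budget` pushes the threshold toward the top of the legitimate distribution. That is why most automated signals land an account in the review queue rather than triggering action on their own.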

Anti-detection as adversarial classification

The standard mistake among bot builders is to treat detection as a feature checklist — add latency noise, vary touch coordinates, randomise schedule. This is the wrong frame. Detection is an adversarial classifier: the operator builds a model that distinguishes bot behaviour from human behaviour, and the bot's job is to produce a behaviour distribution the classifier cannot separate from the human distribution while preserving EV.

The formal literature on this dates to Dalvi et al. (2004), Adversarial Classification, and Lowd & Meek (2005), Adversarial Learning. The setting is identical in structure: an attacker (here, the bot) chooses an action that maximises expected utility under a classifier whose decision boundary the attacker can probe but not fully observe. The modern adversarial-ML literature (Goodfellow et al. 2014 onward) extends this with neural-network classifiers, gradient-based attacks, and the certified-robustness lineage.

Three operational consequences fall out of the formal frame:

The classifier's decision boundary is non-stationary
Operators retrain on detected bots. A behaviour that defeated detection in 2024 may fail in 2026. The half-life of a static implementation is empirically around 6–18 months.
Population baseline is the right reference, not "looking human"
The classifier separates your distribution from the population distribution, not from "what a human looks like" in the abstract. If the population at NL50 6-max has a specific bet-sizing histogram with a long tail on small overbets, your bot needs that tail. Not because it is "more human" but because the classifier is benchmarked against the population.
EV-detection tradeoff is the right optimisation target
Pure-GTO output maximises EV under fixed opponents. Behaviourally-shaped output gives up some EV in exchange for a lower detection score. The right optimum is not zero detection — it is the EV-maximising point under a budgeted detection probability over the account's expected lifetime.

This framing also explains a counter-intuitive observation: pure-GTO bots get caught faster than slightly-weaker bots that include human-noise overlays. The GTO bot is more profitable per hand but more separable from the population, so its expected lifetime is shorter. The product of (EV per hand) × (expected hands before ban) is what matters, not either factor alone.

References and related work

Selected references for the topics above, given by author, year, title and venue, with arXiv identifiers where available.

  • Brown & Sandholm, 2019. Superhuman AI for multiplayer poker. Science 365 (Pluribus). The reference result for 6-max NLH at superhuman level.
  • Moravčík et al., 2017. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science 356. arXiv:1701.01724.
  • Brown & Sandholm, 2017. Safe and nested subgame solving for imperfect-information games. NeurIPS (Libratus core technique).
  • Dalvi, Domingos, Mausam, Sanghai & Verma, 2004. Adversarial Classification. KDD. The foundational paper on the adversarial-classifier framing.
  • Lowd & Meek, 2005. Adversarial Learning. KDD. Probing the decision boundary of a deployed classifier.
  • Heinrich & Silver, 2016. Deep Reinforcement Learning from Self-Play in Imperfect-Information Games. NIPS DRL workshop. arXiv:1603.01121.
