
The GG detection stack as we have mapped it from outside

Updated 27 May 2026

By Raul Moriarty · Poker Software Expert

We do not have an inside source. What we do have is a decade of bans, freezes, and quiet wins to map against. Our working picture of GG's detection has four layers — behavioural fingerprinting, statistical play-pattern, account-graph, and the human reviewer who actually signs the ban — and that picture has held up well enough that our last build went 11 months before its first flag.

Summary

GGPoker runs a four-layer detection system. No single layer is conclusive on its own; signals accumulate over weeks into a per-account score, gated by an adjustable false-positive budget.
Behavioural fingerprinting (action-timing distributions, input curvature, time spent per step) is the cheapest layer to run and the one that catches the least careful implementations first.
Statistical play-pattern analysis flags pure GTO output faster than a noisier human strategy, because population variance is the baseline it compares against.
The highest-impact bans come from the collusion graph layer; bot farming often gets caught here as a side-effect of multi-accounting under one fingerprint.
Human review of flagged accounts is the decisive layer — most botting bans are signed off by a reviewer, not by an automated rule.
Anti-detection is an adversarial-classification problem (the Dalvi 2004 / Lowd & Meek 2005 lineage), not a checklist of features to spoof.

What counts as cheating in GGPoker's terms

The categories matter because each one carries its own signal stack, false-positive budget, and consequence path. The security team actively works against six categories, all banned under the public terms of service:

Prohibited categories — operator priority and detection difficulty
Category	Operator priority	Detection difficulty	Typical signal
Collusion / chip dumping	Highest (regulatory exposure)	Medium	Account graph + suspicious hand sequences
Multi-accounting	High	Low–Medium	Device fingerprint + KYC join
Botting (automated play)	High	Medium	Behavioural fingerprint + play-pattern
Real-time assistance (RTA)	Medium-High	High	Statistical play-pattern over volume
External HUDs / overlays	Medium	Low (client telemetry)	Client-side process detection
Ghosting	Medium (event-driven spikes during major MTTs)	High	Win-rate vs known-skill baseline + IP joins

Operators prioritise collusion above everything else because it carries the greatest regulatory exposure and does the most harm to other customers. Botting and RTA come next. External HUDs are the easiest category to enforce against, because they run as separate processes that client-side process detection can see. Ghosting spikes just before and after major tournaments, so it receives a disproportionate amount of community attention during those periods.

The four-layer detection model

The stack visible to an outside observer has four components. There are likely additional, unobservable elements (heuristics, AI-based scoring systems, hidden signals) that also affect an account; the four listed below are the ones whose effects on accounts can be observed from outside.

Layer 1: Behavioural fingerprinting: Client telemetry on input timing, mouse-path geometry, touch dwell on mobile, action-confirmation latency, idle behaviour between hands. Cheap to compute, runs continuously, feeds into a behavioural score per session. Bites naive implementations hardest.
Layer 2: Statistical play-pattern analysis: Per-account distributional analysis on VPIP, PFR, 3-bet by position, fold-to-cbet by board texture, bet-sizing histograms, river aggression, all-in equity at showdown. Heavy compute, runs nightly or weekly, produces a play-pattern outlier score.
Layer 3: Anti-collusion graph models: Account graph joined by IP, device fingerprint, deposit method, KYC document, table co-occurrence, action correlations within hands. Catches multi-accounting and chip dumping; botting falls out as a side-product when a farm runs under a single fingerprint.
Layer 4: Human review: The final decision point. The mathematical models propose; a human decides. Reviewers read the session's hand history, check session start/stop times against the account's reported time zone, and look for the small human tells a bot rarely produces — chatting "nh" to a fish, sitting out to take a phone call, the occasional typo in chat. Most botting bans are signed off here, not by an automated rule.
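The account-graph join in Layer 3 can be illustrated from the operator side with a small connected-components sketch. This is a minimal illustration under our own assumptions, not GG's implementation: the attribute names (`ip`, `device`) and the rule "any shared value links two accounts" are hypothetical simplifications, and a real system would weight links rather than treat them all as equal.

```python
from collections import defaultdict


def link_accounts(accounts):
    """Group account ids that share any identifying attribute value.

    `accounts` maps account id -> dict of attribute name -> value.
    The attribute names are hypothetical; any hashable value works.
    """
    parent = {acc: acc for acc in accounts}

    def find(x):
        # Walk to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first account seen for each (attribute, value) pair.
    first_seen = {}
    for acc, attrs in accounts.items():
        for key, value in attrs.items():
            if value is None:
                continue  # a missing attribute links nothing
            owner = first_seen.setdefault((key, value), acc)
            if owner != acc:
                union(owner, acc)

    clusters = defaultdict(set)
    for acc in accounts:
        clusters[find(acc)].add(acc)
    # Only clusters with more than one account are of interest.
    return [members for members in clusters.values() if len(members) > 1]
```

Links are transitive in this sketch: two accounts that never share an attribute directly still land in one cluster if a third account bridges them, which is how a farm under one fingerprint surfaces as a single group.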

The four layers run asynchronously, each on its own clock. Layer 1 produces a high-frequency score that mostly stays under threshold. Layer 2 runs offline and contributes to a per-account risk score that decays slowly. Layer 3 is event-driven by graph changes. Layer 4 is the bottleneck: reviewer capacity is limited, so a queue is maintained and prioritised by combined risk score, expected revenue impact, and recent withdrawal activity.
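The slowly decaying risk score and the prioritised review queue can be sketched as follows. Every number here is an assumption made for illustration: the 60-day half-life, the weights, and the withdrawal multiplier are hypothetical, since the operator's real values are not observable from outside.

```python
import heapq
import math


def decayed_score(score, days_elapsed, half_life_days=60.0):
    """Exponentially decay a risk score; the half-life is an assumed value."""
    return score * math.pow(0.5, days_elapsed / half_life_days)


def review_priority(risk, revenue_impact, withdrew_recently,
                    w_risk=1.0, w_revenue=0.5, withdrawal_boost=2.0):
    """Combine the three queue inputs into one number (illustrative weights)."""
    base = w_risk * risk + w_revenue * revenue_impact
    return base * (withdrawal_boost if withdrew_recently else 1.0)


def build_queue(accounts):
    """Return account ids ordered from highest to lowest review priority.

    `accounts` maps account id -> (risk, revenue_impact, withdrew_recently).
    """
    heap = []
    for acc_id, (risk, revenue, withdrew) in accounts.items():
        # heapq is a min-heap, so push the negated priority.
        priority = review_priority(risk, revenue, withdrew)
        heapq.heappush(heap, (-priority, acc_id))
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

The multiplier on recent withdrawals is one simple way to model the observation that a cashout can pull an account forward in the queue even when its statistical score has not changed.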

Signal weights and observable failure modes

The actual weightings are confidential to the operator. We can still infer relative weights from how often each signal appears in accounts that were caught, and from the order in which signals fired before the catch. The resulting pattern is reliable enough to inform both the design of such systems and the countermeasures against them.

Detection signals × observable weight × failure mode
Signal	Layer	Relative weight	Naive failure mode
Action-timing variance < population	L1	High	Constant-latency action emission
Touch coordinate clustering on buttons	L1	Medium	Pixel-perfect tap on button centroid
Idle behaviour between hands too uniform	L1	Medium	No micro-movement, no chat, no occasional tab-switch
VPIP/PFR ratio at population mass with low variance	L2	High	Pure GTO baseline, no human-noise overlay
Bet sizing clustered on exact pot fractions	L2	High	Solver output without sizing perturbation
Win rate persistently outside skill-pool envelope	L2	Very High	Hot run, high stakes, no manual sessions interleaved
Shared device fingerprint across accounts	L3	Very High (regulatory)	Bot farm on one IP / device
Withdrawal pattern → big-bang on first cashout	L3+L4	High	Quiet grind for 30 days, then large withdrawal
Chat behaviour: zero outgoing messages over 5k+ hands	L4	Medium	Bot never says "nh"
Sit-out behaviour: never sits out on bad table	L4	Medium	Bot grinds whoever sits at it

The pattern is consistent: the layers that are cheapest to run (Layers 1 and 3) catch the most casual implementations, while the layers that need substantial compute (Layer 2) or reviewer time (Layer 4) are the ones that catch the more capable implementations. It is also why an implementation can evade Layer 1 yet fail at Layer 2 weeks or months later: the offline play-pattern score keeps accumulating until it crosses the threshold that puts the account in front of a reviewer. That explains the commonly observed lag between a bot first playing on an account and the account being identified, usually 2 to 9 months, with the median somewhere around 8 to 14 weeks.
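As an illustration of one Layer 2 row in the table, bet sizing clustered on exact pot fractions, a detection-side metric could look like the sketch below. The list of "standard" fractions and the tolerance are our assumptions, and the function name is hypothetical; a real system would compare the result against the population's own value rather than against a fixed cut-off.

```python
def pot_fraction_clustering(bets,
                            fractions=(1 / 4, 1 / 3, 1 / 2, 2 / 3, 3 / 4, 1.0),
                            tol=0.005):
    """Share of bets whose bet/pot ratio sits within `tol` of a standard fraction.

    `bets` is an iterable of (bet_size, pot_size) pairs. Entries with a
    non-positive pot are skipped. Returns 0.0 when nothing is usable.
    """
    ratios = [bet / pot for bet, pot in bets if pot > 0]
    if not ratios:
        return 0.0
    hits = sum(
        1 for ratio in ratios
        if any(abs(ratio - fraction) <= tol for fraction in fractions)
    )
    return hits / len(ratios)
```

A value near 1.0 means almost every bet lands on an exact fraction. On its own that proves little, because humans using preset sizing buttons also score high, which is why this kind of signal only contributes to a combined score.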

Action-timing fingerprints

Figure: human action-timing forms a wide log-normal distribution, while a naive bot clusters in a narrow spike; the shape mismatch is what statistical detection flags. Caption: a fixed think-time is itself a fingerprint, because human latency is wide and state-dependent.

This is the most discussed and most poorly implemented signal of all. A naive bot acts at a fixed interval, or with uniform random noise around a central mean. Both are disastrous.

Real human action-timing distributions are roughly log-normal, with long right tails and a strong dependence on game state. A snap-fold of obvious garbage takes 600–1200ms; a hard river decision can run 5–30 seconds; a routine flop continuation-bet on a clean board lands in 1.5–4 seconds. The distribution is not merely wider than a naive bot's — its shape is fundamentally different, and that difference is itself a fingerprint.

# Schematic: behaviourally-shaped action timing
# Conceptual, not the production implementation
import math
import random

def sample_action_delay(decision_difficulty, action_type, hand_state):
    """Return seconds-to-act drawn from a state-conditional log-normal."""
    # Difficulty in [0,1]: 0 = trivial fold, 1 = boundary call
    mu_base = {
        'fold_trivial':   math.log(0.9),
        'cbet_routine':   math.log(2.4),
        'check_routine':  math.log(1.6),
        'river_boundary': math.log(8.5),
        'all_in_decision':math.log(12.0),
    }[action_type]

    # Difficulty shifts mu linearly, which scales the median delay by exp(0.7 * difficulty)
    mu = mu_base + 0.7 * decision_difficulty

    # Sigma rises with difficulty — humans deliberate variably on hard spots
    sigma = 0.35 + 0.55 * decision_difficulty

    delay = random.lognormvariate(mu, sigma)

    # ~3% chance of distraction tail: 8–25s independent of difficulty
    if random.random() < 0.03:
        delay += random.uniform(8, 25)

    # Floor at a non-zero minimum; humans cannot react in < 250ms
    return max(0.25, delay)

The example is schematic. Production systems condition on more variables — stack depth, opponent action sequence, position, multiway versus heads-up, and a per-session "alertness" parameter that drifts down over long sessions to mimic fatigue. The point is that the right behaviour is not "add noise" — it is "draw from a distribution whose shape matches the population, conditioned on state."
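Seen from the detection side, the shape comparison discussed in this section can be measured with a two-sample Kolmogorov-Smirnov statistic, the largest gap between two empirical CDFs. This is a minimal sketch assuming the operator holds a reference sample of population delays for the same decision type; whether GG uses this particular statistic is not something we can confirm, and the function names are hypothetical.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # The largest gap is always attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def timing_shape_gap(account_delays, population_delays):
    """Compare one account's action delays (seconds) against a reference sample."""
    return ks_statistic(account_delays, population_delays)
```

The statistic ranges from 0 (identical empirical distributions) to 1 (no overlap). Because it compares whole distributions rather than means, a sample with the right average delay but the wrong spread still produces a large gap.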

False-positive budget and review pipeline

The primary constraint on the entire stack is false-positive cost. GGPoker cannot afford to ban large numbers of legitimately winning players: each false positive produces a regulatory complaint, a chargeback, a forum post, or a churned customer. The detection system is therefore tuned conservatively, so that automated signals on their own rarely trigger direct action.

What they trigger is a review queue placement. The visible stages from outside, in order:

Quiet flag. Account moves into a higher-scrutiny review bucket. No visible change to the player. Telemetry continues.
Soft restriction. Withdrawal limits drop. KYC re-verification requested. Bonus eligibility quietly removed. Some players notice and modify behaviour; most don't.
Structured interview. Support requests "clarifying information" about play style, schedule, and software use. The interview is logged and the answers are matched against the play-pattern model.
Confiscation and closure. Winnings voided, balance held pending investigation, account closed. The investigation period extends from weeks to months depending on jurisdiction.

The cycle from first quiet flag to confiscation typically runs 14 days to 9 months, anchored on review-queue capacity and triggering events (especially withdrawal activity). The longest cycles we've seen are accounts that ran quietly for a year, withdrew their first significant amount, and were reviewed 15 days after the withdrawal. The mathematical signal was present from month two; the human review was queued only by the withdrawal event.

Anti-detection as adversarial classification

The standard mistake among bot builders is to treat detection as a feature checklist — add latency noise, vary touch coordinates, randomise schedule. This is the wrong frame. Detection is an adversarial classifier: the operator builds a model that distinguishes bot behaviour from human behaviour, and the bot's job is to produce a behaviour distribution the classifier cannot separate from the human distribution while preserving EV.

The formal literature on this dates to Dalvi et al. (2004), Adversarial Classification, and Lowd & Meek (2005), Adversarial Learning. The setting is identical in structure: an attacker (here, the bot) chooses an action that maximises expected utility under a classifier whose decision boundary the attacker can probe but not fully observe. The modern adversarial-ML literature (Goodfellow et al. 2014 onward) extends this with neural-network classifiers, gradient-based attacks, and the certified-robustness lineage.

Three operational consequences fall out of the formal frame:

The classifier's decision boundary is non-stationary: Operators retrain their models as new bots appear. Behaviour that went undetected in 2024 may be detectable in 2026.
Population baseline is the right reference, not "looking human": The classifier separates your bot's distribution from the population distribution — not from some abstract notion of "what a human looks like." If the NL50 6-max population has a bet-sizing histogram with an extended tail of small overbets, then your bot needs that same tail. The goal is not to look "more human" in the abstract; it is to match the population the classifier was trained on.
EV-detection tradeoff is the right optimisation target: Pure-GTO output maximises EV under fixed opponents. Behaviourally-shaped output gives up some EV in exchange for a lower detection score. The right optimum is not zero detection — it is the EV-maximising point under a budgeted detection probability over the account's expected lifetime.

This frame also explains an apparent contradiction: pure-GTO bots get banned faster than less-optimal bots with overlaid human noise. The GTO bot makes more profit per hand on average, but it is easier to identify, so it plays fewer hands before the system removes it.


References and related work

Selected sources on the above topics. Names and identifiers provided; URLs are stable (arXiv) and persistent (Science).

Brown & Sandholm, 2019. Superhuman AI for multiplayer poker. Science 365 (Pluribus). The reference result for 6-max NLH at superhuman level.
Moravčík et al., 2017. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science 356. arXiv:1701.01724.
Brown & Sandholm, 2017. Safe and nested subgame solving for imperfect-information games. NeurIPS (Libratus core technique).
Dalvi, Domingos, Mausam, Sanghai & Verma, 2004. Adversarial Classification. KDD. The foundational paper on the adversarial-classifier framing.
Lowd & Meek, 2005. Adversarial Learning. KDD. Probing the decision boundary of a deployed classifier.
Heinrich & Silver, 2016. Deep Reinforcement Learning from Self-Play in Imperfect-Information Games. NIPS DRL workshop. arXiv:1603.01121.

The companion notes on this site cover the broader picture: why "GGPoker hacks" do not exist and the homepage's overview of what we mean by "poker bot" in 2026. The FAQ answers specific implementation questions that come up regularly in the chat.