API security is stateless.
Attacks are not.

APIStrike introduces API Behavioural Penetration Testing (ABPT) — a system that evaluates how APIs behave under real-world attack conditions across users, time, and concurrency.

What stateless analysis cannot see

Single-request analysis cannot produce evidence of exploitability. It tests inputs. Exploits require sequences.
Race conditions do not exist at the request level. They exist in execution overlap. Stateless scanners cannot model overlap.
Access control failures emerge across actors, not inputs. If you test one user at a time, you will never find IDOR.
DAST: a single request → response pair. ABPT: Actor A and Actor B acting on shared API state across t₁, t₂, t₃, concurrently — multi-actor × time × concurrency.

Same endpoint. Different conclusions.

DAST
POST /api/account/withdraw
{"amount": 100}
Balance: £100
→ 200 OK. Withdrawal succeeds.
→ Balance: £0
→ No issue detected.
✓ Request valid. Response valid. Test passed.
ABPT
2× concurrent POST /api/account/withdraw
{"amount": 100}
Balance: £100
→ Both return 200 OK.
→ Balance: -£100
→ Race condition confirmed.
Tier A │ CWE-362 │ Double-spend via concurrent state mutation
DAST
GET /api/orders/1042
Authorization: Bearer token_user_A
→ 200 OK. Returns order data.
→ No issue detected.
✓ Authenticated request. Valid response. Test passed.
ABPT
GET /api/orders/1042
Authorization: Bearer token_user_B
→ 200 OK. Returns User A's order data.
→ Cross-actor data exposure confirmed.
Tier A │ CWE-639 │ IDOR — Actor B retrieves Actor A's records

Every request above is valid in isolation. The vulnerability only exists in the relationship between them. DAST tests requests. ABPT tests behaviour.

ABPT formalisation

ABPT evaluates API security through stateful, multi-actor, and concurrent interaction modelling.

Actor Symmetry

Whether different actors receive consistent security enforcement on the same resource.

e.g. User A's session token retrieves User B's order history via IDOR.

Temporal Logic

Whether sequential operations respect time-dependent constraints and ordering requirements.

e.g. Checkout accepts payment confirmation before inventory reservation completes.

Concurrency Integrity

Whether simultaneous operations maintain transactional correctness under parallel execution.

e.g. Two concurrent withdrawal requests both succeed against insufficient balance.

State Violation

Whether stateful workflows can be manipulated by replaying, skipping, or reordering steps.

e.g. Password reset token accepted after it was already consumed in a prior request.
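The State Violation property above can be probed with a simple replay: issue a one-time token, consume it, then attempt to consume it again. A minimal self-contained sketch — `ResetTokenStore` is an illustrative stand-in for the target API, not APIStrike code:

```python
import secrets

class ResetTokenStore:
    """Toy one-time password-reset token store."""
    def __init__(self):
        self._valid = set()

    def issue(self) -> str:
        token = secrets.token_hex(8)
        self._valid.add(token)
        return token

    def consume(self, token: str) -> bool:
        """Correct behaviour: a token is valid exactly once."""
        if token in self._valid:
            self._valid.remove(token)
            return True
        return False

store = ResetTokenStore()
token = store.issue()

first  = store.consume(token)   # legitimate reset succeeds
replay = store.consume(token)   # ABPT-style replay of the consumed token

print(first, replay)            # a vulnerable API would accept both
```

The probe logic is the second call: if the replay also succeeds, the workflow's state machine is violated.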

Proof of depth

Race Condition Engine

Tests: Concurrent state mutation across shared resources — double-spend, inventory oversell, TOCTOU violations.
Method: Fires parallel requests against state-changing endpoints with varying timing offsets and jitter patterns, then validates final state consistency across responses.
Detects: Duplicate record creation, balance inconsistency, state desynchronisation between concurrent actors.
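The check-then-act window this engine targets can be reproduced deterministically with a thread barrier that forces both checks to complete before either write — a simulation of a vulnerable withdrawal handler, not APIStrike internals:

```python
import threading

balance = 100
barrier = threading.Barrier(2)   # hold both threads between check and act
responses = []

def withdraw(amount: int) -> None:
    global balance
    snapshot = balance           # check: both threads observe 100
    barrier.wait()               # widen the TOCTOU window deterministically
    if snapshot >= amount:       # both checks pass against the stale snapshot
        balance -= amount        # act: the mutation is applied twice
        responses.append("200 OK")

threads = [threading.Thread(target=withdraw, args=(100,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(responses, balance)        # both succeed; balance goes negative
```

In production the overlap is probabilistic, which is why the real engine varies timing offsets and jitter rather than relying on a barrier.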

Differential Access Engine

Tests: Cross-actor authorisation boundary integrity — whether resource access is correctly scoped to the requesting identity.
Method: Replays identical requests across simulated actors with different privilege levels. Compares response codes, body content, and data exposure across role boundaries.
Detects: IDOR, horizontal privilege escalation, role boundary failures, data leakage through differential response analysis.
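The differential comparison can be sketched as follows — `fetch_order` is a deliberately vulnerable stub standing in for the target API, and all names are illustrative:

```python
ORDERS = {1042: {"owner": "user_A", "items": ["widget"], "card_last4": "4242"}}

def fetch_order(order_id: int, actor: str) -> dict:
    """Vulnerable stub: authenticates the actor but never authorises
    the actor against the resource owner (CWE-639)."""
    return ORDERS[order_id]          # missing: owner == actor check

def differential_check(order_id: int, owner: str, other: str) -> bool:
    """Replay the identical request as two actors and diff the responses.
    Returns True when a cross-actor exposure (IDOR) is detected."""
    as_owner = fetch_order(order_id, owner)
    as_other = fetch_order(order_id, other)
    return as_other == as_owner      # the other actor sees the owner's data

print(differential_check(1042, "user_A", "user_B"))  # True → IDOR
```

Each request is valid on its own; only the comparison across actors exposes the failure.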

Timing Analysis Engine

Tests: Information leakage through response time variance — whether server processing time reveals internal branching logic.
Method: Measures response latency distributions across valid and invalid inputs with statistical significance testing. Compares percentile spreads to isolate timing-dependent branches.
Detects: Username enumeration, password oracle via timing side channel, branch-based information disclosure.
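A minimal version of the measurement: sample latencies for valid and invalid usernames and compare medians. The `login` stub simulates a common leak (password hashing only runs for existing accounts); the endpoint and timings are illustrative:

```python
import time
import statistics

VALID_USERS = {"alice"}

def login(username: str) -> str:
    """Simulated timing side channel."""
    if username in VALID_USERS:
        time.sleep(0.010)        # bcrypt-style work for real accounts
    else:
        time.sleep(0.001)        # early reject for unknown accounts
    return "401 Unauthorized"    # identical response body either way

def median_latency(username: str, samples: int = 15) -> float:
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        login(username)
        times.append(time.perf_counter() - start)
    return statistics.median(times)

valid, invalid = median_latency("alice"), median_latency("mallory")
print(valid > invalid * 2)       # True → usernames are enumerable by timing
```

Both branches return the same body and status; only the latency distribution distinguishes them, which is why the engine works on percentile spreads rather than response content.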

Session Replay Engine

Tests: Session lifecycle integrity and token validation — whether authentication artefacts can be reused, forged, or manipulated.
Method: Replays modified tokens with algorithm substitution (alg:none), signature stripping, claim manipulation, and expired credential reuse. Tests post-logout token validity.
Detects: JWT algorithm confusion, session fixation, token reuse after invalidation, weak signing secrets.
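The alg:none substitution replayed by this engine fits in a few lines: hand-roll an unsigned token and feed it to a naive verifier that trusts the attacker-controlled header. The verifier is a deliberately broken sketch, not a real library:

```python
import base64
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def forge_none_token(claims: dict) -> str:
    """Algorithm-substitution attack: alg set to 'none', signature empty."""
    header = b64url(json.dumps({"alg": "none", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    return f"{header}.{payload}."          # trailing dot: empty signature

def naive_verify(token: str):
    """Vulnerable verifier: honours the alg claim from the token itself."""
    header_b64, payload_b64, _sig = token.split(".")
    pad = lambda s: s + "=" * (-len(s) % 4)
    header = json.loads(base64.urlsafe_b64decode(pad(header_b64)))
    if header["alg"] == "none":            # the bug: no signature check at all
        return json.loads(base64.urlsafe_b64decode(pad(payload_b64)))
    return None

forged = forge_none_token({"sub": "user_B", "role": "admin"})
print(naive_verify(forged))    # unsigned claims accepted as authentic
```

A correct verifier pins the expected algorithm server-side and rejects `none` outright.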

Mutation Engine

Tests: Input boundary resilience across 600+ payloads spanning 14 injection categories.
Method: Systematically injects payloads across parameter positions with baseline response hashing for false positive elimination. Covers SQLi, XSS, SSTI, command injection, XXE, LDAP injection, and polyglot vectors.
Detects: Injection acceptance confirmed through behavioural response analysis — not pattern matching against known signatures.
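Baseline response hashing can be sketched as: fingerprint a benign response, then flag a payload only when the response diverges from that baseline. The endpoint stub and payload list are illustrative, not APIStrike's actual corpus:

```python
import hashlib

def handle(query: str) -> str:
    """Stub endpoint: fixed response unless the input changes behaviour."""
    if "'" in query:                      # simulated SQL error path
        return "500 Internal Server Error: unterminated quoted string"
    return "200 OK: 0 results"

def fingerprint(body: str) -> str:
    return hashlib.sha256(body.encode()).hexdigest()

baseline = fingerprint(handle("benign"))

payloads = ["benign2", "' OR 1=1 --", "<script>alert(1)</script>"]
findings = [p for p in payloads
            if fingerprint(handle(p)) != baseline]   # behavioural divergence

print(findings)   # only the payload that changed behaviour is reported
```

The XSS probe produces the baseline response and is discarded — that is the false-positive elimination: divergence from observed behaviour, not a match against a signature list.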

Signal Fusion Layer

Tests: Cross-engine signal correlation — whether findings from independent engines compound into higher-severity exploits.
Method: Aggregates results from all engines, applies a five-tier evidence classification hierarchy, eliminates duplicate signals, and computes confidence-weighted severity with behavioural multipliers.
Detects: Compound vulnerabilities invisible to individual test suites. A timing oracle combined with an auth bypass indicates credential stuffing viability — neither finding alone communicates that risk.

Tier classification

Tier   │ Classification     │ Criteria
Tier A │ Confirmed Exploit  │ Behavioural evidence demonstrates real-world exploitability. Immediate action required.
Tier B │ Risky Behaviour    │ Strong signals with consistent reproduction. Exploitable under specific conditions.
Tier C │ Weak Hardening     │ Defence-in-depth gap. Not directly exploitable but reduces attack surface resilience.
Tier D │ Observed Behaviour │ Noted behaviour. Not evidence of a vulnerability in current context.
Tier A  │  CWE-639  │  IDOR via path traversal

Evidence: Actor 1 session token retrieves Actor 2 order data
Confidence: confirmed_behavior
Validation: Response body contains different user's PII
Fix: Implement object-level authorisation checks bound to session identity

Findings are not pattern matches. They are behaviourally validated outcomes.
If a finding reaches Tier A, it has been reproduced, cross-validated, and confirmed exploitable.

Signal-to-noise performance

22,748 → 72
Raw probe signals reduced to actionable findings. One endpoint. Full mode. Zero false positives in Tier A.
600+
Concurrent probes per endpoint across 12 independent test suites. 14 injection categories. 50+ reclassification rules.
5-tier
Evidence classification hierarchy. confirmed_behavior → target_responded → inconclusive → transport_failed → scanner_rejected.
Methodology: The signal fusion pipeline processes 600+ probes per endpoint across 12 concurrent suites. Raw results pass through a five-level evidence classifier (confirmed_behavior → target_responded → inconclusive → transport_failed → scanner_rejected), then through 50+ context-aware reclassification rules that account for endpoint properties (public vs. protected, read vs. write, data sensitivity). The grouping engine consolidates related findings — four missing security headers become one hardening item, not four. Behavioural multipliers weight confirmed exploit classes (auth bypass ×1.5, privilege escalation ×1.4, race condition ×1.3) to ensure scoring reflects real-world impact, not probe count.
Validated against OWASP API Top 10 attack patterns across 38+ vulnerable endpoint configurations.
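The evidence levels and behavioural multipliers quoted above can be combined into a confidence-weighted score. The figures below are taken from the methodology; the scoring formula itself and the confidence weights are assumptions for illustration:

```python
# Evidence levels from the five-level classifier; weights are assumed.
CONFIDENCE = {
    "confirmed_behavior": 1.0,
    "target_responded":   0.6,
    "inconclusive":       0.3,
    "transport_failed":   0.0,
    "scanner_rejected":   0.0,
}

# Behavioural multipliers as quoted in the methodology.
MULTIPLIER = {
    "auth_bypass":          1.5,
    "privilege_escalation": 1.4,
    "race_condition":       1.3,
}

def severity(base: float, evidence: str, vuln_class: str) -> float:
    """Confidence-weighted severity with a behavioural multiplier."""
    return base * CONFIDENCE[evidence] * MULTIPLIER.get(vuln_class, 1.0)

print(round(severity(7.0, "confirmed_behavior", "race_condition"), 2))  # 9.1
print(round(severity(7.0, "target_responded", "race_condition"), 2))    # 5.46
```

The same base finding scores very differently depending on how strongly its behaviour was confirmed — which is the point: scoring reflects exploitability, not probe count.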

Developer entry point

$ pip install apistrike
$ apistrike scan --behavioural ./spec.yaml

Supported Formats

  • OpenAPI 3.x / Swagger 2.0
  • Single endpoint URL
  • JSON export

Execution Modes

  • --mode quick  9 suites, ~150s
  • --mode full  12 suites, ~300s
  • --output json  machine-readable

Security testing that ignores behaviour will miss behavioural exploits.

ABPT is not an extension of DAST. It replaces it.