API security is stateless.
Attacks are not.

APIStrike introduces API Behavioural Penetration Testing (ABPT) — a system that evaluates how APIs behave under real-world attack conditions across users, time, and concurrency.

What stateless analysis cannot see

Single-request analysis cannot produce evidence of exploitability. It tests inputs. Exploits require sequences.
Race conditions do not exist at the request level. They exist in execution overlap. Stateless scanners cannot model overlap.
Access control failures emerge across actors, not inputs. If you test one user at a time, you will never find IDOR.
DAST: a single request → response pair. ABPT: Actor A and Actor B acting on shared API state across t₁, t₂, t₃, concurrently — multi-actor × time × concurrency.

Same endpoint. Different conclusions.

DAST
POST /api/account/withdraw
{"amount": 100}
Balance: £100
→ 200 OK. Withdrawal succeeds.
→ Balance: £0
→ No issue detected.
✓ Request valid. Response valid. Test passed.
ABPT
2× concurrent POST /api/account/withdraw
{"amount": 100}
Balance: £100
→ Both return 200 OK.
→ Balance: -£100
→ Race condition confirmed.
Tier A │ CWE-362 │ Double-spend via concurrent state mutation
DAST
GET /api/orders/1042
Authorization: Bearer token_user_A
→ 200 OK. Returns order data.
→ No issue detected.
✓ Authenticated request. Valid response. Test passed.
ABPT
GET /api/orders/1042
Authorization: Bearer token_user_B
→ 200 OK. Returns User A's order data.
→ Cross-actor data exposure confirmed.
Tier A │ CWE-639 │ IDOR — Actor B retrieves Actor A's records

Every request above is valid in isolation. The vulnerability only exists in the relationship between them. DAST tests requests. ABPT tests behaviour.

ABPT formalisation

ABPT evaluates API security through stateful, multi-actor, and concurrent interaction modelling.

Actor Symmetry

Whether different actors receive consistent security enforcement on the same resource.

e.g. User A's session token retrieves User B's order history via IDOR.

Temporal Logic

Whether sequential operations respect time-dependent constraints and ordering requirements.

e.g. Checkout accepts payment confirmation before inventory reservation completes.

Concurrency Integrity

Whether simultaneous operations maintain transactional correctness under parallel execution.

e.g. Two concurrent withdrawal requests both succeed against insufficient balance.

State Violation

Whether stateful workflows can be manipulated by replaying, skipping, or reordering steps.

e.g. Password reset token accepted after it was already consumed in a prior request.
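The State Violation property above can be probed with a simple replay: issue a one-time token, consume it, then attempt to consume it again. A minimal self-contained sketch — `ResetTokenStore` is an illustrative stand-in for the target API, not APIStrike code:

```python
import secrets

class ResetTokenStore:
    """Toy one-time password-reset token store."""
    def __init__(self):
        self._valid = set()

    def issue(self) -> str:
        token = secrets.token_hex(8)
        self._valid.add(token)
        return token

    def consume(self, token: str) -> bool:
        """Correct behaviour: a token is valid exactly once."""
        if token in self._valid:
            self._valid.remove(token)
            return True
        return False

store = ResetTokenStore()
token = store.issue()

first  = store.consume(token)   # legitimate reset succeeds
replay = store.consume(token)   # ABPT-style replay of the consumed token

print(first, replay)            # a vulnerable API would accept both
```

The probe logic is the second call: if the replay also succeeds, the workflow's state machine is violated.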

Proof of depth

Race Condition Engine

Tests: Concurrent state mutation across shared resources — double-spend, inventory oversell, TOCTOU violations.
Method: Fires parallel requests against state-changing endpoints with varying timing offsets and jitter patterns, then validates final state consistency across responses.
Detects: Duplicate record creation, balance inconsistency, state desynchronisation between concurrent actors.
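The check-then-act window this engine targets can be reproduced deterministically with a thread barrier that forces both checks to complete before either write — a simulation of a vulnerable withdrawal handler, not APIStrike internals:

```python
import threading

balance = 100
barrier = threading.Barrier(2)   # hold both threads between check and act
responses = []

def withdraw(amount: int) -> None:
    global balance
    snapshot = balance           # check: both threads observe 100
    barrier.wait()               # widen the TOCTOU window deterministically
    if snapshot >= amount:       # both checks pass against the stale snapshot
        balance -= amount        # act: the mutation is applied twice
        responses.append("200 OK")

threads = [threading.Thread(target=withdraw, args=(100,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(responses, balance)        # both succeed; balance goes negative
```

In production the overlap is probabilistic, which is why the real engine varies timing offsets and jitter rather than relying on a barrier.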

Differential Access Engine

Tests: Cross-actor authorisation boundary integrity — whether resource access is correctly scoped to the requesting identity.
Method: Replays identical requests across simulated actors with different privilege levels. Compares response codes, body content, and data exposure across role boundaries.
Detects: IDOR, horizontal privilege escalation, role boundary failures, data leakage through differential response analysis.
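The differential comparison can be sketched as follows — `fetch_order` is a deliberately vulnerable stub standing in for the target API, and all names are illustrative:

```python
ORDERS = {1042: {"owner": "user_A", "items": ["widget"], "card_last4": "4242"}}

def fetch_order(order_id: int, actor: str) -> dict:
    """Vulnerable stub: authenticates the actor but never authorises
    the actor against the resource owner (CWE-639)."""
    return ORDERS[order_id]          # missing: owner == actor check

def differential_check(order_id: int, owner: str, other: str) -> bool:
    """Replay the identical request as two actors and diff the responses.
    Returns True when a cross-actor exposure (IDOR) is detected."""
    as_owner = fetch_order(order_id, owner)
    as_other = fetch_order(order_id, other)
    return as_other == as_owner      # the other actor sees the owner's data

print(differential_check(1042, "user_A", "user_B"))  # True → IDOR
```

Each request is valid on its own; only the comparison across actors exposes the failure.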

Timing Analysis Engine

Tests: Information leakage through response time variance — whether server processing time reveals internal branching logic.
Method: Measures response latency distributions across valid and invalid inputs with statistical significance testing. Compares percentile spreads to isolate timing-dependent branches.
Detects: Username enumeration, password oracle via timing side channel, branch-based information disclosure.
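A minimal version of the measurement: sample latencies for valid and invalid usernames and compare medians. The `login` stub simulates a common leak (password hashing only runs for existing accounts); the endpoint and timings are illustrative:

```python
import time
import statistics

VALID_USERS = {"alice"}

def login(username: str) -> str:
    """Simulated timing side channel."""
    if username in VALID_USERS:
        time.sleep(0.010)        # bcrypt-style work for real accounts
    else:
        time.sleep(0.001)        # early reject for unknown accounts
    return "401 Unauthorized"    # identical response body either way

def median_latency(username: str, samples: int = 15) -> float:
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        login(username)
        times.append(time.perf_counter() - start)
    return statistics.median(times)

valid, invalid = median_latency("alice"), median_latency("mallory")
print(valid > invalid * 2)       # True → usernames are enumerable by timing
```

Both branches return the same body and status; only the latency distribution distinguishes them, which is why the engine works on percentile spreads rather than response content.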

Session Replay Engine

Tests: Session lifecycle integrity and token validation — whether authentication artefacts can be reused, forged, or manipulated.
Method: Replays modified tokens with algorithm substitution (alg:none), signature stripping, claim manipulation, and expired credential reuse. Tests post-logout token validity.
Detects: JWT algorithm confusion, session fixation, token reuse after invalidation, weak signing secrets.
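The alg:none substitution replayed by this engine fits in a few lines: hand-roll an unsigned token and feed it to a naive verifier that trusts the attacker-controlled header. The verifier is a deliberately broken sketch, not a real library:

```python
import base64
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def forge_none_token(claims: dict) -> str:
    """Algorithm-substitution attack: alg set to 'none', signature empty."""
    header = b64url(json.dumps({"alg": "none", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    return f"{header}.{payload}."          # trailing dot: empty signature

def naive_verify(token: str):
    """Vulnerable verifier: honours the alg claim from the token itself."""
    header_b64, payload_b64, _sig = token.split(".")
    pad = lambda s: s + "=" * (-len(s) % 4)
    header = json.loads(base64.urlsafe_b64decode(pad(header_b64)))
    if header["alg"] == "none":            # the bug: no signature check at all
        return json.loads(base64.urlsafe_b64decode(pad(payload_b64)))
    return None

forged = forge_none_token({"sub": "user_B", "role": "admin"})
print(naive_verify(forged))    # unsigned claims accepted as authentic
```

A correct verifier pins the expected algorithm server-side and rejects `none` outright.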

Mutation Engine

Tests: Input boundary resilience across 600+ payloads spanning 14 injection categories.
Method: Systematically injects payloads across parameter positions with baseline response hashing for false positive elimination. Covers SQLi, XSS, SSTI, command injection, XXE, LDAP injection, and polyglot vectors.
Detects: Injection acceptance confirmed through behavioural response analysis — not pattern matching against known signatures.
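Baseline response hashing can be sketched as: fingerprint a benign response, then flag a payload only when the response diverges from that baseline. The endpoint stub and payload list are illustrative, not APIStrike's actual corpus:

```python
import hashlib

def handle(query: str) -> str:
    """Stub endpoint: fixed response unless the input changes behaviour."""
    if "'" in query:                      # simulated SQL error path
        return "500 Internal Server Error: unterminated quoted string"
    return "200 OK: 0 results"

def fingerprint(body: str) -> str:
    return hashlib.sha256(body.encode()).hexdigest()

baseline = fingerprint(handle("benign"))

payloads = ["benign2", "' OR 1=1 --", "<script>alert(1)</script>"]
findings = [p for p in payloads
            if fingerprint(handle(p)) != baseline]   # behavioural divergence

print(findings)   # only the payload that changed behaviour is reported
```

The XSS probe produces the baseline response and is discarded — that is the false-positive elimination: divergence from observed behaviour, not a match against a signature list.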

Signal Fusion Layer

Tests: Cross-engine signal correlation — whether findings from independent engines compound into higher-severity exploits.
Method: Aggregates results from all engines, applies a five-tier evidence classification hierarchy, eliminates duplicate signals, and computes confidence-weighted severity with behavioural multipliers.
Detects: Compound vulnerabilities invisible to individual test suites. A timing oracle combined with an auth bypass indicates credential stuffing viability — neither finding alone communicates that risk.

Tier classification

Tier   │ Classification     │ Criteria
Tier A │ Confirmed Exploit  │ Behavioural evidence demonstrates real-world exploitability. Immediate action required.
Tier B │ Risky Behaviour    │ Strong signals with consistent reproduction. Exploitable under specific conditions.
Tier C │ Weak Hardening     │ Defence-in-depth gap. Not directly exploitable but reduces attack surface resilience.
Tier D │ Observed Behaviour │ Noted behaviour. Not evidence of a vulnerability in current context.
Tier A  │  CWE-639  │  IDOR via path traversal

Evidence: Actor 1 session token retrieves Actor 2 order data
Confidence: confirmed_behavior
Validation: Response body contains different user's PII
Fix: Implement object-level authorisation checks bound to session identity

Findings are not pattern matches. They are behaviourally validated outcomes.
If a finding reaches Tier A, it has been reproduced, cross-validated, and confirmed exploitable.

Signal-to-noise performance

22,748 → 72
Raw probe signals reduced to actionable findings. One endpoint. Full mode. Zero false positives in Tier A.
600+
Concurrent probes per endpoint across 12 independent test suites. 14 injection categories. 50+ reclassification rules.
5-tier
Evidence classification hierarchy. confirmed_behavior → target_responded → inconclusive → transport_failed → scanner_rejected.
Methodology: The signal fusion pipeline processes 600+ probes per endpoint across 12 concurrent suites. Raw results pass through a five-level evidence classifier (confirmed_behavior → target_responded → inconclusive → transport_failed → scanner_rejected), then through 50+ context-aware reclassification rules that account for endpoint properties (public vs. protected, read vs. write, data sensitivity). The grouping engine consolidates related findings — four missing security headers become one hardening item, not four. Behavioural multipliers weight confirmed exploit classes (auth bypass ×1.5, privilege escalation ×1.4, race condition ×1.3) to ensure scoring reflects real-world impact, not probe count.
Validated against OWASP API Top 10 attack patterns across 38+ vulnerable endpoint configurations.
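The evidence levels and behavioural multipliers quoted above can be combined into a confidence-weighted score. The figures below are taken from the methodology; the scoring formula itself and the confidence weights are assumptions for illustration:

```python
# Evidence levels from the five-level classifier; weights are assumed.
CONFIDENCE = {
    "confirmed_behavior": 1.0,
    "target_responded":   0.6,
    "inconclusive":       0.3,
    "transport_failed":   0.0,
    "scanner_rejected":   0.0,
}

# Behavioural multipliers as quoted in the methodology.
MULTIPLIER = {
    "auth_bypass":          1.5,
    "privilege_escalation": 1.4,
    "race_condition":       1.3,
}

def severity(base: float, evidence: str, vuln_class: str) -> float:
    """Confidence-weighted severity with a behavioural multiplier."""
    return base * CONFIDENCE[evidence] * MULTIPLIER.get(vuln_class, 1.0)

print(round(severity(7.0, "confirmed_behavior", "race_condition"), 2))  # 9.1
print(round(severity(7.0, "target_responded", "race_condition"), 2))    # 5.46
```

The same base finding scores very differently depending on how strongly its behaviour was confirmed — which is the point: scoring reflects exploitability, not probe count.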

Developer entry point

$ pip install apistrike
$ apistrike scan --behavioural ./spec.yaml

Supported Formats

  • OpenAPI 3.x / Swagger 2.0
  • Single endpoint URL
  • JSON export

Execution Modes

  • --mode quick  9 suites, ~150s
  • --mode full  12 suites, ~300s
  • --output json  machine-readable

Security testing that ignores behaviour will miss behavioural exploits.

ABPT is not an extension of DAST. It replaces it.