How Traditional Deep Packet Analysis Works

Traditional DPI is a deterministic pipeline built around protocol decoding and content signatures. Its strengths are accuracy, explainability, and tunability; its costs are strict attention to performance and to evasion resistance.

Steps in traditional deep packet analysis

1) Packet Ingest & Normalization
  • Capture path: NIC → driver → capture framework (pcap/AF_PACKET/DPDK) → user/kernel space.
  • Timestamping & header parsing: Packets timestamped at capture; L2/L3 headers parsed; VLAN/QinQ tags handled; MTU anomalies noted.
  • Normalization: Normalize IP/TCP ambiguities (IP options ordering, ECN/DF flags, TCP options); align fragment/overlap policy with target OS semantics. Do not alter hop count (TTL).
  • IP defragmentation: Defrag IP before TCP reassembly; log/drop pathological fragment patterns (excessive counts, tiny overlaps).
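
The defragmentation step above can be sketched as follows. This is a minimal illustration, not any engine's implementation: the class name, the two ceilings, and the "first"/"last" policy labels are assumptions, and offsets are plain byte offsets rather than the 8-byte units carried in the IP header.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_FRAGMENTS = 64       # hypothetical ceiling on fragments per datagram
MIN_FRAGMENT_SIZE = 8    # hypothetical floor for non-final fragments


@dataclass
class FragmentBuffer:
    policy: str = "first"                      # "first" or "last" wins on overlap
    data: dict = field(default_factory=dict)   # byte offset -> byte value
    total_len: Optional[int] = None            # known once the final fragment arrives
    count: int = 0

    def add(self, offset: int, payload: bytes, more_fragments: bool) -> None:
        self.count += 1
        if self.count > MAX_FRAGMENTS:
            raise ValueError("too many fragments")
        if more_fragments and len(payload) < MIN_FRAGMENT_SIZE:
            raise ValueError("tiny non-final fragment")
        for i, b in enumerate(payload):
            pos = offset + i
            if pos in self.data and self.policy == "first":
                continue                       # keep the byte that arrived first
            self.data[pos] = b                 # new byte, or "last" policy overwrite
        if not more_fragments:
            self.total_len = offset + len(payload)

    def reassemble(self) -> Optional[bytes]:
        """Return the datagram payload, or None while holes remain."""
        if self.total_len is None:
            return None
        if any(i not in self.data for i in range(self.total_len)):
            return None
        return bytes(self.data[i] for i in range(self.total_len))
```

With two overlapping fragments, the "first" policy keeps the earlier bytes and the "last" policy keeps the later ones, which is exactly the ambiguity an attacker exploits when the sensor and the endpoint disagree.
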
2) Flow Tracking & Reassembly
  • State table: Flows keyed by 5-tuple (src/dst IP, ports, proto) with TCP state (SYN/ACK/FIN/RST).
  • TCP stream reassembly: Handle out-of-order/overlap, SACK, retransmits; enforce explicit overlap policy (e.g., “first-wins” or “last-wins”) chosen to mirror the behavior of the protected endpoints (Windows vs. Linux/BSD).
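  • Message delimiting: For UDP/message-oriented protocols, delineate records (DNS, SIP). For QUIC: only the handshake metadata in Initial packets (ClientHello fields such as SNI/ALPN) is recoverable, because Initial protection keys are derived from values visible on the wire; application frames are opaque post-handshake without decryption.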
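
A minimal sketch of the flow key and in-order delivery described in this step. The names `flow_key` and `StreamReassembler` are hypothetical; the sketch ignores sequence-number wraparound and TCP state, and it resolves overlaps against already delivered bytes in a first-wins manner.

```python
from collections import namedtuple

FlowKey = namedtuple("FlowKey", "proto a_ip a_port b_ip b_port")


def flow_key(proto, src_ip, src_port, dst_ip, dst_port):
    """Direction-independent key: endpoints are ordered so both directions match."""
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return FlowKey(proto, a[0], a[1], b[0], b[1])


class StreamReassembler:
    """One direction of a TCP stream; returns bytes in sequence order."""

    def __init__(self, initial_seq):
        self.next_seq = initial_seq
        self.pending = {}                      # seq -> payload held until gaps fill

    def add(self, seq, payload):
        self.pending.setdefault(seq, payload)  # an exact duplicate seq keeps the first copy
        out = bytearray()
        progressed = True
        while progressed:
            progressed = False
            for s in sorted(self.pending):
                if s > self.next_seq:
                    break                      # gap: wait for the missing segment
                data = self.pending.pop(s)
                skip = self.next_seq - s       # bytes already delivered (retransmit/overlap)
                if skip < len(data):
                    out += data[skip:]
                    self.next_seq += len(data) - skip
                progressed = True
                break
        return bytes(out)
```

A segment that arrives early is held, and delivery resumes as soon as the gap is filled; a pure retransmission of delivered bytes yields nothing new.
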
3) Protocol Identification & Decoding
  • L7 classification: Static ports + dynamic heuristics (banner bytes, magic values, ALPN/SNI where visible). Where headers are absent/obfuscated, some stacks fall back to statistical/ML classifiers on flow features.
  • Decoders: Each protocol module extracts fields (HTTP method/URI/Host/MIME; DNS qname; SMB opcodes; etc.).
  • Encrypted ClientHello (ECH): Be aware SNI visibility can disappear as ECH adoption grows; expect more flows to be metadata-only.
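
The "dynamic heuristics first, static ports as fallback" idea can be sketched like this. The function name, the method list, and the port table are illustrative assumptions; production classifiers check far more protocols and validate more bytes before committing.

```python
HTTP_METHODS = (b"GET ", b"POST ", b"HEAD ", b"PUT ", b"DELETE ", b"OPTIONS ", b"PATCH ")
PORT_HINTS = {53: "dns", 80: "http", 443: "tls", 445: "smb"}   # hypothetical fallback table


def classify(payload: bytes, dst_port: int) -> str:
    """Guess the L7 protocol from the first payload bytes, then from the port."""
    if payload.startswith(HTTP_METHODS):
        return "http"
    # TLS record header: content type 0x16 (handshake), major version 0x03
    if len(payload) >= 3 and payload[0] == 0x16 and payload[1] == 0x03:
        return "tls"
    if payload.startswith(b"SSH-"):
        return "ssh"
    # SMB over TCP: 4-byte session header, then the \xffSMB or \xfeSMB magic
    if payload[4:8] in (b"\xffSMB", b"\xfeSMB"):
        return "smb"
    return PORT_HINTS.get(dst_port, "unknown")
```

Checking content before ports means HTTP on port 8080 or TLS on port 8443 is still decoded by the right module.
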
4) Content Transforms (Canonicalization)
  • HTTP: De-chunk, decompress gzip/deflate (if enabled), normalize encodings.
  • Mail: Decode base64, handle MIME boundaries.
  • Safety: Bound transforms with limits (max output size, decompressed:compressed ratio, wall-clock) to prevent decompression bombs, regex blow-ups on inflated output, and memory pressure.
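
A minimal sketch of a bounded transform, using Python's standard `zlib` module. The function name and the default limits are assumptions; a wall-clock budget is omitted for brevity.

```python
import zlib


def bounded_inflate(data: bytes, max_output: int = 1_000_000, max_ratio: int = 100) -> bytes:
    """Decompress gzip/zlib data, refusing output past a size or ratio ceiling."""
    # wbits = MAX_WBITS | 32 asks zlib to auto-detect a gzip or zlib header
    d = zlib.decompressobj(zlib.MAX_WBITS | 32)
    limit = min(max_output, max_ratio * max(len(data), 1))
    # max_length caps how many bytes are produced, so a bomb never fully expands
    out = d.decompress(data, limit + 1)
    if len(out) > limit:
        raise ValueError("decompression limit exceeded")
    return out
```

Asking for `limit + 1` bytes is the design choice that matters: the decoder stops early, and the single extra byte is enough to tell "exactly at the limit" from "over it".
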
5) Content Inspection (Signature Engines)
  • Multi-pattern prefilter: Aho–Corasick or Hyperscan for scanning many fixed strings in one pass; Boyer–Moore-family search for fast single-pattern matching.
  • Regex/JIT verification: Only shortlisted candidates enter PCRE/JIT exact matching.
  • File type detection: Magic bytes (not extensions) for PE/ELF/Office/PDF; optional file carving for DLP/AV hand-off (subject to size/type policies).
  • Rule languages: Snort/Suricata-style rules combine where (header/payload buffers & offsets), what (content/pcre), and context (flow:established, to_server, http_uri). Emphasize anchored fast patterns over rule order for throughput.

Example (Suricata, request-side heuristic):

alert http any any -> $EXTERNAL_NET 80 ( \
  msg:"HTTP suspicious exe request (curl UA)"; \
  flow:to_server,established; \
  http.uri; content:".exe"; endswith; nocase; fast_pattern; \
  http.user_agent; content:"curl"; nocase; \
  classtype:policy-violation; sid:100001; rev:2; \
)

For response-side PE delivery, pivot to to_client and file.magic/fileext buffers instead of URI/UA.
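
The prefilter-then-verify flow from step 5 can be sketched in a few lines. This is an illustration only: the substring test stands in for a real multi-pattern matcher such as Aho–Corasick or Hyperscan, and the `Rule` class, rule IDs, and patterns are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    sid: int
    literal: bytes        # lowercase fixed string used as the fast pattern
    regex: re.Pattern     # full (expensive) verification


RULES = [
    Rule(1, b".exe", re.compile(rb"GET\s+\S+\.exe(\?\S*)?\s+HTTP/1\.[01]", re.I)),
    Rule(2, b"cmd=", re.compile(rb"[?&]cmd=(?:whoami|id)\b", re.I)),
]


def inspect(buf: bytes) -> list:
    """Return the sids of rules whose literal and regex both match."""
    lowered = buf.lower()
    # Stage 1: cheap fixed-string prefilter shortlists candidate rules
    candidates = [r for r in RULES if r.literal in lowered]
    # Stage 2: only shortlisted rules pay for regex evaluation
    return [r.sid for r in candidates if r.regex.search(buf)]
```

A buffer such as `POST /notes.exe.txt HTTP/1.1` passes the prefilter for rule 1 but fails its regex, which is the intended division of labor: the prefilter may over-approximate, the verifier decides.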

6) Policy, Actions, and Response
  • Verdicting: allow / alert / drop / rate-limit / mirror / tag.
  • Inline vs. out-of-band: IPS inline enforces in real time; IDS out-of-band generates alerts to SIEM/SOAR.
  • DLP hooks: When PII/PCI patterns match, a pure IPS path can only block the flow or allow it with evidence logging; inline redaction or transformation requires a proxy/TLS-inspection architecture that terminates the session.
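
A minimal sketch of a PCI-style DLP hook feeding a verdict. The regex is deliberately simplified, and the function names and the alert/drop mapping are assumptions; the Luhn checksum is the standard way to cut false positives from arbitrary digit runs.

```python
import re

# Simplified candidate pattern: 13 to 19 digits, optionally separated by spaces or hyphens
PAN_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pans(text: str) -> list:
    hits = []
    for m in PAN_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if 13 <= len(digits) <= 19 and luhn_ok(digits):
            hits.append(digits)
    return hits


def verdict(text: str, inline: bool) -> str:
    """Inline (IPS) sensors can drop; out-of-band (IDS) sensors can only alert."""
    if not find_pans(text):
        return "allow"
    return "drop" if inline else "alert"
```

For evidence logging, storing a hash or a masked form of each hit avoids writing the sensitive value itself into the alert store.
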
7) Performance Engineering
  • Zero-copy & batching: Avoid extra copies; batch to amortize syscalls.
  • Parallelism: RSS with flow-hashing, per-queue/core pinning, lock-free rings, NUMA-aware memory.
  • Offloads: Disable LRO/GRO for IDS accuracy unless your capture path compensates (e.g., DPDK with its own reassembly). Leverage safe NIC offloads judiciously.
  • Hot paths: C/ASM/FPGA/ASIC acceleration for multi-pattern search when available.
  • Rule hygiene: Prune overlaps; anchor patterns; prefer fixed strings before regex; keep hot buffers small.
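
Flow-hashing only helps reassembly if both directions of a flow reach the same worker. The sketch below shows that symmetry property in software; it is not the Toeplitz hash NICs use for RSS, and the function name and CRC32 choice are assumptions made for illustration.

```python
import ipaddress
import zlib


def worker_for(src_ip, src_port, dst_ip, dst_port, proto, n_workers):
    """Map a packet to a worker so that both flow directions land together."""
    a = ipaddress.ip_address(src_ip).packed + src_port.to_bytes(2, "big")
    b = ipaddress.ip_address(dst_ip).packed + dst_port.to_bytes(2, "big")
    lo, hi = sorted((a, b))            # ordering the endpoints makes the hash symmetric
    return zlib.crc32(bytes([proto]) + lo + hi) % n_workers
```

Because each flow is owned by exactly one worker, per-flow state needs no locking, which is what makes per-core pinning pay off.
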
8) Evasion & Robustness Controls
  • Normalizer parity: Match endpoint OS semantics for fragment/TCP overlap handling.
  • HTTP quirks: Tolerate mixed casing, header folding/obs-fold, odd whitespace; canonicalize encodings.
  • Fragmentation tactics: Enforce sane fragment counts/sizes; drop pathological streams.
  • Timeout hygiene: Per-flow idle/total timeouts to prevent state-table exhaustion.
  • Resource guards: Per-decoder ceilings (bytes/objects/recursion depth), decompression limits, and back-pressure; define fail-open vs. fail-closed under load.
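
Timeout hygiene and resource guards can be sketched as a bounded flow table. The class name and default values are assumptions; real engines use per-protocol and per-state timeouts and smarter eviction than a full scan.

```python
class FlowTable:
    """Flow state with idle/total timeouts and a hard cap on entries."""

    def __init__(self, idle_timeout=60.0, total_timeout=3600.0, max_flows=100_000):
        self.idle_timeout = idle_timeout
        self.total_timeout = total_timeout
        self.max_flows = max_flows
        self.flows = {}                       # key -> [first_seen, last_seen]

    def touch(self, key, now):
        """Record activity; return False if the table is full and the flow is new."""
        entry = self.flows.get(key)
        if entry is None:
            if len(self.flows) >= self.max_flows:
                return False                  # caller applies fail-open or fail-closed policy
            self.flows[key] = [now, now]
        else:
            entry[1] = now
        return True

    def expire(self, now):
        """Evict flows that were idle too long or have lived too long overall."""
        dead = [k for k, (first, last) in self.flows.items()
                if now - last > self.idle_timeout or now - first > self.total_timeout]
        for k in dead:
            del self.flows[k]
        return len(dead)
```

Returning `False` from `touch` rather than silently evicting keeps the fail-open versus fail-closed decision explicit and in one place.
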
9) Telemetry & Forensics
  • Alert records: Rule ID, flow tuple, timestamps, matched buffers/offsets, protocol fields.
  • Artifacts: Optional file extracts (hash-first; bounded sizes), plus TLS fingerprints: JA3/JA3S and JA4 variants where supported. Treat fingerprints as signals, not verdicts (collisions/library churn).
  • Metrics: Packets per second (PPS), connections per second (CPS), rule hit rates, and per-packet latency against budget; feed these into capacity planning.
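
For reference, a JA3 fingerprint is an MD5 over five comma-separated ClientHello fields, each a dash-joined list of decimal values with GREASE values removed. The sketch below assumes the ClientHello has already been parsed into integer lists; the function name is hypothetical.

```python
import hashlib

# GREASE placeholder values: 0x0a0a, 0x1a1a, ..., 0xfafa
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3(version, ciphers, extensions, curves, point_formats):
    """Return (ja3_string, md5_hex) for already-parsed ClientHello fields."""
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    s = ",".join([str(version), join(ciphers), join(extensions),
                  join(curves), join(point_formats)])
    return s, hashlib.md5(s.encode("ascii")).hexdigest()
```

Logging the pre-hash string alongside the digest makes it easier to see why two clients collide or why a library update changed the fingerprint.
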
10) Limitations (Why AI Helps Later)
  • Encryption opacity: Without TLS inspection, visibility is mostly metadata (QUIC and TLS 1.3 are opaque after the handshake; ECH may hide SNI).
  • Signature drift: Novel malware/protocol tweaks can bypass static rules.
  • Cost of completeness: Full decoding + decompression + regex at line rate is expensive, so prioritize hot paths and bound the work done per flow.
11) Operational Add-Ons (Often Overlooked)
  • Key management / TLS inspection: If using lawful intercept or enterprise TLS decryption, define key custody, rotation, scope limits, and retention (hash artifacts; avoid plaintext persistence).
  • Audit posture: Version rulepacks, record engine/config hashes, and capture sampling policies for repeatability.
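
One way to make the audit posture concrete is a deterministic manifest of the engine version and per-file hashes. This is a sketch under assumptions: the function name, the field names, and the choice of SHA-256 over a sorted JSON encoding are illustrative, not a standard format.

```python
import hashlib
import json


def audit_manifest(engine_version, files):
    """Build a reproducible manifest from {filename: bytes} of rules and config."""
    entries = {name: hashlib.sha256(data).hexdigest()
               for name, data in sorted(files.items())}
    # sort_keys gives a stable encoding, so the same inputs always hash the same
    blob = json.dumps({"engine": engine_version, "files": entries},
                      sort_keys=True).encode("utf-8")
    return {
        "engine": engine_version,
        "files": entries,
        "manifest_sha256": hashlib.sha256(blob).hexdigest(),
    }
```

Stamping each alert with the manifest hash lets an investigator tie a detection back to the exact rulepack and configuration that produced it.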

Quick Deployment Checklist

  • Place sensors at egress/ingress and critical east-west chokepoints.
  • Start in IDS/alert-only mode, baseline false positives, then graduate select rules to drop.
  • Enable HTTP normalization; ensure TCP/fragment overlap policy matches your endpoints.
  • Disable LRO/GRO (or compensate in DPDK); verify per-core RSS pinning.
  • Bound transforms (gzip/MIME) and set per-decoder ceilings.
  • Tune top-talker allowlists and disable dead rules; measure latency under load.