docs(01-format-specification): research phase domain

2026-02-24 23:05:52 +03:00
parent e0627c8653
commit 041a00913b
1 changed files with 400 additions and 0 deletions
--- a/.planning/phases/01-format-specification/01-RESEARCH.md
+++ b/.planning/phases/01-format-specification/01-RESEARCH.md
@@ -0,0 +1,400 @@
+# Phase 1: Format Specification - Research
+
+**Researched:** 2026-02-24
+**Domain:** Binary format design, cryptographic data structures, cross-platform compatibility
+**Confidence:** HIGH
+
+## Summary
+
+Phase 1 has a single deliverable: a complete binary format specification document that three independent implementations (Rust CLI, Kotlin Android decoder, busybox shell script) will build against. The spec must define every byte offset, field size, and encoding so that implementers face no ambiguity. This is a documentation phase with no code.
+
+The critical design tensions are: (1) the format must be simple enough for a shell script using `dd`/`xxd`/`openssl` to parse, yet structured enough to hold per-file encryption metadata; (2) the format must accommodate Phase 6 obfuscation features (XOR headers, encrypted TOC, decoy padding) even though they are not implemented until later; (3) all three decoders share a single hardcoded 32-byte key, and the spec must resolve the open question of whether HMAC uses the same key or a derived subkey.
+
+**Primary recommendation:** Write the spec as a single Markdown document with ASCII diagrams showing byte layouts, a field-reference table for every structure (header, file table entry, data block), and a complete worked example showing the hex dump of a 2-file archive. Address obfuscation as reserved fields and documented future behavior (version flag controls whether obfuscation is active).
+
+<phase_requirements>
+## Phase Requirements
+
+| ID | Description | Research Support |
+|----|-------------|-----------------|
+| FMT-05 | Specification of the format as a document (before any implementation begins) | All findings below directly support creating this spec: field layouts, crypto parameter placement, worked example patterns, shell compatibility constraints, and obfuscation placeholders |
+</phase_requirements>
+
+## Standard Stack
+
+This phase produces a document, not code. There are no library dependencies. The "stack" here is the set of standards and reference materials the spec must conform to.
+
+### Core Standards
+
+| Standard | Reference | Purpose | Why It Governs the Spec |
+|----------|-----------|---------|-------------------------|
+| AES-256-CBC | NIST SP 800-38A | Block cipher mode | Defines 16-byte block size, IV requirements, PKCS7 padding behavior |
+| HMAC-SHA-256 | RFC 2104, FIPS 198-1 | Message authentication | Defines 32-byte output, key requirements |
+| PKCS7 padding | RFC 5652 Section 6.3 | Block alignment | Encrypted size = ceil((input_len + 1) / 16) * 16; always adds at least 1 byte |
+| SHA-256 | FIPS 180-4 | File integrity checksum | 32-byte digest for post-decompression verification |
+| gzip (DEFLATE) | RFC 1952 (gzip), RFC 1951 (DEFLATE) | Compression | Standard gzip stream, decompressed by GZIPInputStream (Kotlin) and gunzip (shell) |
+| Encrypt-then-MAC | Bellare & Namprempre 2000; IETF draft-mcgrew-aead-aes-cbc-hmac-sha2 | Authenticated encryption construction | HMAC computed over IV + ciphertext, verified before decryption |
+
+### Key Size Constants
+
+| Parameter | Size | Notes |
+|-----------|------|-------|
+| AES key | 32 bytes (256 bits) | Hardcoded, shared across all decoders |
+| AES block size | 16 bytes | Governs IV size and PKCS7 padding |
+| IV | 16 bytes | Random per file, stored in cleartext |
+| HMAC-SHA-256 output | 32 bytes | Appended after or stored alongside ciphertext |
+| SHA-256 checksum | 32 bytes | Stored in file table, verified after decompression |
+
+### Alternatives Considered
+
+| Instead of | Could Use | Tradeoff |
+|------------|-----------|----------|
+| AES-256-CBC + HMAC | AES-256-GCM (AEAD) | GCM is simpler (single operation), but `openssl enc` does not support AEAD modes such as GCM, so the shell decoder cannot use it; CBC + HMAC is the only option compatible with all three decoders |
+| HMAC-SHA-256 | HMAC-SHA-512 truncated | IETF AEAD spec uses SHA-512 for AES-256, but SHA-256 is simpler, sufficient for integrity, and natively available in all three target environments |
+| Little-endian | Big-endian (network order) | Big-endian is traditional for network protocols, but the x86 and ARM (Android) targets are natively little-endian and Rust can decode LE explicitly with `from_le_bytes`; shell `od -t u4` reads host-endian, which is LE on ARM |
+| PKCS7 | Zero-padding | PKCS7 is standard for AES-CBC, directly supported by `openssl enc` and javax.crypto PKCS5Padding (PKCS5 = PKCS7 for 16-byte blocks) |
+
+## Architecture Patterns
+
+### Recommended Spec Document Structure
+
+```
+docs/
+  FORMAT.md           # The format specification (THE deliverable)
+    - Overview & design goals
+    - Notation conventions
+    - Archive structure diagram (ASCII art)
+    - Header definition (byte-level table)
+    - File table entry definition (byte-level table)
+    - Data block layout
+    - Encryption & authentication details
+    - Compression details
+    - Obfuscation features (Phase 6 preview)
+    - Worked example with hex dump
+    - Version compatibility rules
+```
+
+### Pattern 1: Field Definition Table
+
+**What:** Every binary structure is specified as a table with offset, size, type, endianness, and description.
+**When to use:** For every fixed-layout structure in the format (header, file table entries).
+
+Example:
+```
+### Archive Header (40 bytes)
+
+| Offset | Size | Type    | Endian | Field            | Description                          |
+|--------|------|---------|--------|------------------|--------------------------------------|
+| 0x00   | 4    | bytes   | -      | magic            | Custom magic bytes: 0xCA 0xFE 0x?? 0x?? (placeholder, final bytes TBD) |
+| 0x04   | 1    | u8      | -      | version          | Format version (1 for v1)            |
+| 0x05   | 1    | u8      | -      | flags            | Bit 0: compression, Bits 1-3: obfuscation features, Bits 4-7: reserved |
+| 0x06   | 2    | u16     | LE     | file_count       | Number of files in archive           |
+| 0x08   | 4    | u32     | LE     | toc_offset       | Offset to file table from file start |
+| 0x0C   | 4    | u32     | LE     | toc_size         | Size of file table in bytes          |
+| 0x10   | 16   | bytes   | -      | toc_iv           | IV for encrypted file table (Phase 6)|
+| 0x20   | 8    | bytes   | -      | reserved         | Reserved for future use (zero-filled)|
+```
+
+### Pattern 2: Encrypt-then-MAC Construction (per file)
+
+**What:** The exact order of operations and data layout for each file's encrypted block.
+**When to use:** Defines how every file is stored in the archive.
+
+```
+Pipeline per file:
+  1. Read original file -> compute SHA-256 checksum -> store in file table
+  2. Compress with gzip (if compression flag set) -> compressed_data
+  3. Pad compressed_data with PKCS7 to AES block boundary
+  4. Generate random 16-byte IV
+  5. Encrypt padded data with AES-256-CBC using IV -> ciphertext
+  6. Compute HMAC-SHA-256 over (IV || ciphertext) -> mac
+  7. Store: IV (16) || ciphertext (variable) || HMAC (32)
+
+Data block layout:
+  [IV: 16 bytes][ciphertext: N bytes][HMAC: 32 bytes]
+  Where N = (floor(compressed_size / 16) + 1) * 16   (PKCS7 always adds 1 to 16 bytes)
+```
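The layout above implies some simple offset arithmetic for each data block. A minimal sketch, assuming `data_offset` and `encrypted_size` have already been read from the file table entry; the helper name `block_ranges` is illustrative and not part of the spec.

```shell
#!/bin/sh
# Sketch: derive byte ranges inside one data block.
# Assumes the layout [IV:16][ciphertext:encrypted_size][HMAC:32].
# block_ranges is a hypothetical helper name.
block_ranges() {
  data_offset=$1
  encrypted_size=$2
  iv_off=$data_offset
  ct_off=$((data_offset + 16))
  hmac_off=$((ct_off + encrypted_size))
  hmac_input_len=$((16 + encrypted_size))    # HMAC covers IV || ciphertext
  block_len=$((16 + encrypted_size + 32))    # whole block, excluding decoy padding
  echo "iv=$iv_off ct=$ct_off hmac=$hmac_off hmac_input_len=$hmac_input_len block_len=$block_len"
}

block_ranges 259 16
```

The HMAC input starts at `iv_off` and runs for `hmac_input_len` bytes, which is the range a `dd`-based decoder would pipe into the MAC computation.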
+
+### Pattern 3: Version-Gated Features
+
+**What:** Use the version byte and flags field to control which features are active, allowing the same format to work with and without obfuscation.
+**When to use:** Phase 6 obfuscation features are defined in the spec now but activated by flag bits.
+
+```
+Flags byte (offset 0x05):
+  Bit 0: Per-file compression enabled (0 = raw, 1 = gzip)
+  Bit 1: TOC encryption enabled (0 = plaintext TOC, 1 = AES-encrypted TOC)
+  Bit 2: XOR header obfuscation (0 = off, 1 = on)
+  Bit 3: Decoy padding between blocks (0 = off, 1 = on)
+  Bits 4-7: Reserved (must be 0)
+
+Decoders MUST check flags and skip unsupported features gracefully.
+```
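As an illustration of the flag bits listed above, the following sketch decodes a flags value with shell arithmetic. It assumes the flags byte has already been read as a decimal integer; `decode_flags` is a hypothetical helper name, and rejecting archives with reserved bits set is one possible policy rather than a decided rule.

```shell
#!/bin/sh
# Sketch: decode the flags byte (offset 0x05) into feature bits.
# decode_flags is a hypothetical helper; bit meanings follow the list above.
decode_flags() {
  flags=$1
  # Bits 4-7 (mask 0xF0 = 240) are reserved and must be zero.
  if [ $(( $flags & 240 )) -ne 0 ]; then
    echo "error: reserved flag bits set" >&2
    return 1
  fi
  echo "compression=$(( $flags & 1 )) toc_encrypted=$(( ($flags >> 1) & 1 )) xor_header=$(( ($flags >> 2) & 1 )) decoy_padding=$(( ($flags >> 3) & 1 ))"
}

decode_flags 1     # compression only
decode_flags 11    # 0x0B: compression + TOC encryption + decoy padding
```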
+
+### Pattern 4: Worked Example with Hex Dump
+
+**What:** A concrete archive with known inputs showing every byte.
+**When to use:** Mandatory per success criteria -- the spec must include at least one complete worked example.
+
+```
+Example archive: 2 files
+  File 1: "hello.txt" (5 bytes: "Hello")
+  File 2: "test.apk" (simulated 32 bytes)
+
+Key: 0x00112233...EEFF (32 bytes, shown in full)
+IV for file 1: 0xAABBCCDD... (16 bytes)
+IV for file 2: 0x11223344... (16 bytes)
+
+Complete hex dump:
+  0000: CA FE xx xx 01 01 02 00  ...  <- header (magic, version, flags, count)
+  ...
+  [every byte annotated with field name]
+```
+
+### Anti-Patterns to Avoid
+
+- **Variable-length header without explicit size field:** Shell decoders need to know exactly where to `dd skip=` to. Every variable-length region must have its size recorded in a preceding fixed-offset field.
+- **Implicit padding assumptions:** Never assume "the decoder will figure out padding." Explicitly state PKCS7 rules and encrypted size formula in the spec.
+- **Mixing concerns in field table:** Don't combine "offset within archive" with "offset within data block." Use absolute offsets from archive start everywhere.
+- **Underspecifying endianness:** Every multi-byte integer must state "LE" (little-endian) explicitly. The shell decoder reads bytes with `dd` and must know byte order.
+- **Ambiguous HMAC scope:** The spec must state EXACTLY which bytes are fed to HMAC. "HMAC of the ciphertext" is ambiguous (does it include IV? padding? length?). State: "HMAC-SHA-256(key, IV || ciphertext)" with byte ranges.
+
+## Don't Hand-Roll
+
+| Problem | Don't Build | Use Instead | Why |
+|---------|-------------|-------------|-----|
+| Authenticated encryption | Custom MAC scheme | Standard encrypt-then-MAC (HMAC-SHA-256 over IV+ciphertext) | Subtle errors (MAC-then-encrypt, encrypt-and-MAC) lead to padding oracle attacks |
+| Key for HMAC vs encryption | Ad-hoc key splitting | Either: (a) use same 32-byte key for both (acceptable per v1 scope), or (b) HKDF with distinct labels | IETF AEAD spec splits key; but for hardcoded key with no key reuse across protocols, same key is cryptographically safe for AES-CBC + HMAC-SHA-256 specifically |
+| Block padding | Manual zero-padding | PKCS7 (built into openssl enc, javax.crypto) | Zero-padding is ambiguous for binary files; PKCS7 is unambiguous and universally supported |
+| Compression framing | Custom compression headers | Standard gzip stream (RFC 1952) | GZIPInputStream and gunzip handle framing automatically |
+
+**Key insight:** The format spec should use standard cryptographic constructions (encrypt-then-MAC, PKCS7, gzip) composed together, rather than inventing novel schemes. The "custom" part is the container format (header, TOC, block layout), not the cryptographic primitives inside it.
+
+## Common Pitfalls
+
+### Pitfall 1: Shell Decoder Byte Extraction Fragility
+
+**What goes wrong:** The shell decoder uses `dd bs=1 skip=N count=M` to extract fields. If any offset or size in the spec is wrong by even 1 byte, the entire decode chain fails silently (produces garbage, not an error).
+**Why it happens:** Off-by-one errors in offset calculations, or forgetting that PKCS7 adds a full block when input is already block-aligned.
+**How to avoid:** The spec's worked example must include a step-by-step shell decode walkthrough, e.g. "To extract file 1 IV: `dd if=archive.bin bs=1 skip=<data_offset of file 1> count=16`", with the concrete numeric offset filled in from the example archive. Test the worked example's offsets manually.
+**Warning signs:** The worked example's offsets don't add up when you manually sum field sizes.
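One way to check that offsets add up is to recompute them from the field sizes. A minimal sketch, assuming the example 40-byte header from the field definition table in this document; the `name:size` list and the helper name `header_offsets` are illustrative.

```shell
#!/bin/sh
# Sketch: recompute header field offsets from field sizes so the spec's
# offset column can be checked mechanically.
# The name:size pairs mirror the example header table in this document.
header_offsets() {
  off=0
  for field in magic:4 version:1 flags:1 file_count:2 toc_offset:4 toc_size:4 toc_iv:16 reserved:8; do
    name=${field%%:*}
    size=${field##*:}
    printf '0x%02X %s\n' "$off" "$name"
    off=$((off + size))
  done
  printf 'total %d\n' "$off"
}

header_offsets
```

If a field size changes in the spec, rerunning the sketch shows every downstream offset that must change with it.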
+
+### Pitfall 2: HMAC Input Ambiguity
+
+**What goes wrong:** Rust computes HMAC over `IV || ciphertext`, Kotlin computes it over just `ciphertext`, shell computes it over `IV || ciphertext || padding_length`. All three produce different MACs for the same data.
+**Why it happens:** The spec says "HMAC of the encrypted data" without defining the exact byte range.
+**How to avoid:** Specify HMAC input as: "The 16-byte IV followed by the ciphertext bytes (including PKCS7 padding). Total HMAC input length = 16 + encrypted_size." Include the expected HMAC value in the worked example.
+**Warning signs:** Any phrase like "HMAC of the data" without byte-range specification.
+
+### Pitfall 3: Encrypted Size Calculation Error
+
+**What goes wrong:** The file table stores `encrypted_size` but the value is wrong because the spec doesn't account for PKCS7 padding correctly.
+**Why it happens:** AES-CBC with PKCS7 always pads: if input is N bytes, output is `(floor(N/16) + 1) * 16` bytes. A 16-byte input produces 32 bytes of ciphertext, not 16.
+**How to avoid:** State the formula explicitly, using integer (floor) division: `encrypted_size = (floor(compressed_size / 16) + 1) * 16`. Include examples: 0 bytes -> 16, 1 byte -> 16, 15 bytes -> 16, 16 bytes -> 32, 17 bytes -> 32.
+**Warning signs:** File table `encrypted_size` equals `compressed_size` rounded up (misses the always-add-block rule).
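The always-add-a-block rule can be checked with plain shell integer arithmetic. A minimal sketch; `pkcs7_encrypted_size` is a hypothetical helper name, and shell `/` is integer division, which matches the floor in the formula.

```shell
#!/bin/sh
# Sketch: ciphertext length for AES-CBC with PKCS7 padding.
# pkcs7_encrypted_size is a hypothetical helper name.
pkcs7_encrypted_size() {
  echo $(( ($1 / 16 + 1) * 16 ))
}

# Boundary cases: below, at, and just above a block boundary.
for n in 0 1 15 16 17; do
  echo "$n -> $(pkcs7_encrypted_size "$n")"
done
```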
+
+### Pitfall 4: Little-Endian Parsing in Shell
+
+**What goes wrong:** Shell script reads a 4-byte LE integer as big-endian, getting the wrong value.
+**Why it happens:** `xxd -p` prints bytes in file order, while `od` with multi-byte types (e.g. `-t u4`) uses host byte order, so the two can disagree. Busybox `xxd` may not support the `-e` flag.
+**How to avoid:** The spec should include a reference shell function for reading LE integers: extract 4 bytes with dd, reverse byte order with a shell snippet, convert hex to decimal with printf. Document this in the spec appendix.
+**Warning signs:** Testing only with values < 256 (where endianness doesn't matter).
+
+### Pitfall 5: XOR Obfuscation Key in Spec vs. Implementation
+
+**What goes wrong:** Phase 6 implements XOR obfuscation but the key or XOR range wasn't specified, so each decoder uses different parameters.
+**Why it happens:** Phase 1 defers obfuscation to Phase 6 and doesn't fully specify it.
+**How to avoid:** The spec MUST define: XOR key bytes, which byte range is XORed (e.g., "bytes 0x00-0x27 of the header"), and what the XORed header looks like in the worked example (even if the v1 example shows flags=0 with obfuscation off).
+**Warning signs:** Obfuscation section says "TBD" or "see Phase 6."
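To show the level of detail the obfuscation section needs, here is a sketch of a repeating-key XOR over a byte stream using only `od` and `printf`. The 4-byte key is a made-up placeholder, not a value from the spec, and `xor_stream` is a hypothetical helper name; the real key and XOR range remain design choices for the spec.

```shell
#!/bin/sh
# Sketch: XOR a byte stream with a repeating key (the operation is self-inverse).
# XOR_KEY is a made-up placeholder (A5 5A 3C C3 in decimal), NOT a spec value.
XOR_KEY="165 90 60 195"

xor_stream() {
  set -- $XOR_KEY                      # key bytes become $1..$n
  for b in $(od -An -v -tu1); do       # stdin as decimal byte values
    k=$1; shift; set -- "$@" "$k"      # rotate the key by one position
    printf '%b' "\\0$(printf '%o' "$(( b ^ k ))")"
  done
}

# Obfuscate four sample header bytes and dump the result as hex.
printf '\312\376\000\001' | xor_stream | od -An -v -tx1
```

Applying `xor_stream` twice returns the original bytes, so the same routine serves for both obfuscation and de-obfuscation.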
+
+### Pitfall 6: Filename Encoding
+
+**What goes wrong:** Cyrillic filenames (requirement SHL-03) are garbled because the spec doesn't specify encoding.
+**Why it happens:** UTF-8 vs. Latin-1 assumption mismatch between encoders.
+**How to avoid:** Spec must state: "All filenames are UTF-8 encoded. The file table stores filename as a length-prefixed byte string: u16 length (in bytes) followed by that many UTF-8 bytes."
+**Warning signs:** Spec shows filename field as "fixed N bytes, null-terminated."
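The length prefix counts bytes, not characters, which matters for Cyrillic names. A minimal sketch that emits a filename as a u16 LE byte length followed by the raw UTF-8 bytes; `put_name` is a hypothetical helper name, and the script file itself is assumed to be saved as UTF-8.

```shell
#!/bin/sh
# Sketch: emit a filename as u16-LE byte length + raw UTF-8 bytes.
# put_name is a hypothetical helper name.
put_name() {
  name=$1
  len=$(printf '%s' "$name" | wc -c)   # wc -c counts bytes, not characters
  len=$((len + 0))                     # normalize any padding wc may add
  # Low byte first, then high byte (little-endian).
  printf '%b' "\\0$(printf '%o' $((len & 255)))\\0$(printf '%o' $((len >> 8)))"
  printf '%s' "$name"
}

# Six Cyrillic letters (2 bytes each) plus ".txt" give a 16-byte name.
put_name 'привет.txt' | od -An -v -tx1
```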
+
+## Code Examples
+
+This phase produces no code. However, the following reference patterns should appear IN the spec document itself:
+
+### Shell LE Integer Reading Function (Spec Appendix)
+```sh
+# Read a little-endian u32 from binary file at offset
+# Usage: read_le_u32 <file> <offset>
+read_le_u32() {
+  local file="$1" offset="$2"
+  local hex=$(dd if="$file" bs=1 skip="$offset" count=4 2>/dev/null | xxd -p)
+  # Reverse bytes: abcdef01 -> 01efcdab
+  local b0=${hex:0:2} b1=${hex:2:2} b2=${hex:4:2} b3=${hex:6:2}
+  printf '%d' "0x${b3}${b2}${b1}${b0}"
+}
+
+# Read a little-endian u16 from binary file at offset
+read_le_u16() {
+  local file="$1" offset="$2"
+  local hex=$(dd if="$file" bs=1 skip="$offset" count=2 2>/dev/null | xxd -p)
+  local b0=${hex:0:2} b1=${hex:2:2}
+  printf '%d' "0x${b1}${b0}"
+}
+```
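Where `xxd` is unavailable, the same reads can be done with `dd` and `od` alone. A sketch under that assumption; `read_le` is a hypothetical helper that handles any width the shell's integer arithmetic can hold, and it avoids substring expansion so it also runs in stricter POSIX shells.

```shell
#!/bin/sh
# Sketch: read a little-endian unsigned integer using only dd and od.
# Usage: read_le <file> <offset> <nbytes>   (read_le is a hypothetical name)
read_le() {
  value=0
  shift_bits=0
  for b in $(dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | od -An -v -tu1); do
    value=$(( value + (b << shift_bits) ))   # first byte is least significant
    shift_bits=$(( shift_bits + 8 ))
  done
  echo "$value"
}

# Demo: bytes 28 01 00 00 are 296 as LE; a big-endian misread gives 671154176.
demo=$(mktemp)
printf '\050\001\000\000' > "$demo"
read_le "$demo" 0 4
rm -f "$demo"
```

Testing with a value of 256 or more, as here, exposes a byte-order mistake that small values would hide.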
+
+### Shell HMAC Verification (Spec Appendix)
+```sh
+# Verify HMAC-SHA256 of a data block
+# Usage: verify_hmac <file> <data_offset> <data_length> <expected_hmac_hex> <key_hex>
+verify_hmac() {
+  local file="$1" offset="$2" length="$3" expected="$4" key="$5"
+  local actual=$(dd if="$file" bs=1 skip="$offset" count="$length" 2>/dev/null \
+    | openssl dgst -sha256 -mac HMAC -macopt "hexkey:${key}" -hex 2>/dev/null \
+    | awk '{print $NF}')
+  [ "$actual" = "$expected" ]
+}
+```
+
+### Kotlin Decrypt Pattern (Spec Appendix)
+```kotlin
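+import javax.crypto.Cipher
+import javax.crypto.spec.IvParameterSpec
+import javax.crypto.spec.SecretKeySpec
+
+// Reference decrypt for a single file entry.
+// Callers must verify the HMAC over (IV || ciphertext) before calling this.
+fun decryptFileEntry(data: ByteArray, iv: ByteArray, key: ByteArray): ByteArray {
+    // Java names the padding "PKCS5Padding"; with AES's 16-byte blocks it behaves as PKCS7.
+    val cipher = Cipher.getInstance("AES/CBC/PKCS5Padding")
+    val secretKey = SecretKeySpec(key, "AES")
+    val ivSpec = IvParameterSpec(iv)
+    cipher.init(Cipher.DECRYPT_MODE, secretKey, ivSpec)
+    return cipher.doFinal(data)  // PKCS7 unpadding is automatic
+}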
+```
+
+## State of the Art
+
+| Old Approach | Current Approach | When Changed | Impact |
+|--------------|------------------|--------------|--------|
+| MAC-then-encrypt | Encrypt-then-MAC | 2000 (Bellare & Namprempre analysis) | Prevents padding oracle attacks; HMAC verification can reject before decryption |
+| Fixed filenames (8.3) | Length-prefixed UTF-8 | Standard practice | Supports Cyrillic/Unicode filenames (SHL-03) |
+| Single IV for entire archive | Per-file random IV | Standard practice | Prevents cross-file pattern analysis |
+| AEAD modes (GCM) | Still CBC+HMAC for shell compat | Ongoing | GCM is preferred when all consumers support it; `openssl enc` does not support GCM, so the shell decoder cannot use it |
+
+**Deprecated/outdated:**
+- `openssl enc` with `-salt` and password-based key derivation: Not applicable here (we use raw key with `-K`/`-iv`/`-nosalt`)
+- PKCS5Padding vs PKCS7Padding naming confusion: In Java/Android, `PKCS5Padding` actually implements PKCS7 for 16-byte blocks. The spec should note this equivalence.
+
+## Open Questions
+
+1. **HMAC Key: Same as encryption key or derived subkey?**
+   - What we know: IETF AEAD spec uses split keys (first 32 bytes for MAC, last 32 bytes for encryption from a 64-byte master key). Best practice recommends separate keys. However, for a hardcoded key scenario with no key reuse across protocols, using the same 32-byte key for both AES-CBC and HMAC-SHA-256 is cryptographically safe (AES and HMAC have different internal structures, no known attack exploits key reuse between them).
+   - What's unclear: The project STATE.md explicitly flags this as an open question.
+   - **Recommendation:** Use the SAME 32-byte hardcoded key for both AES-256-CBC encryption and HMAC-SHA-256. Rationale: (a) simplifies all three decoders, (b) the shell decoder would need an HKDF implementation if keys differ (busybox has no HKDF), (c) cryptographically safe for this specific combination, (d) v2 requirement SEC-01 already plans HKDF-derived per-file keys which will supersede this. Document in the spec that v1 uses a single key and v2 will derive subkeys.
+
+2. **Busybox `xxd` availability on target device**
+   - What we know: BusyBox source includes `xxd` as a configurable applet (hexdump_xxd.c). The `-p` (plain hex dump) flag is widely supported. The `-e` (little-endian) flag may NOT be available in busybox xxd.
+   - What's unclear: The exact busybox build on the target Android 13 Qualcomm device.
+   - **Recommendation:** The spec should define shell operations using `xxd -p` (plain hex) only, and implement LE byte reversal manually in shell (as shown in code examples above). Fallback: `od -A n -t x1` can replace `xxd -p`. Document both options in the spec appendix.
+
+3. **Busybox `openssl dgst -sha256 -mac HMAC` availability**
+   - What we know: Standard OpenSSL supports `-mac HMAC -macopt hexkey:...`. Busybox builds vary; some include a stripped-down openssl.
+   - What's unclear: Whether the target device's busybox-openssl supports the `-mac`/`-macopt` flags.
+   - **Recommendation:** The spec should document the HMAC verification command and note that if busybox openssl lacks `-mac` support, the shell decoder may skip HMAC verification (degrade gracefully). This is acceptable for a fallback decoder. Document the exact command and the degraded path.
+
+4. **Decoy padding size and placement**
+   - What we know: Phase 6 requires random data between blocks. The spec must define this now.
+   - What's unclear: How much padding, whether it's fixed or variable per gap, how the decoder knows where real data starts.
+   - **Recommendation:** Define in the file table entry: a `padding_after` field (u16, LE) indicating how many random bytes follow this file's data block. Starting from `data_offset`, the decoder skips `16 (IV) + encrypted_size + 32 (HMAC) + padding_after` bytes to reach the next file's data. When flags bit 3 is 0 (no decoy padding), `padding_after` is always 0.
+
+5. **TOC (file table) encryption IV storage**
+   - What we know: FMT-07 requires encrypted file table with its own IV.
+   - What's unclear: Where the TOC IV is stored if the header itself is XOR-obfuscated.
+   - **Recommendation:** Store the TOC IV in the header at a fixed offset (e.g., bytes 0x10-0x1F). XOR obfuscation (FMT-06) is applied AFTER the header is fully constructed, including the TOC IV. The decoder de-XORs the header first, then reads the TOC IV, then decrypts the TOC. Order of operations: de-XOR header -> read TOC IV -> decrypt TOC -> read file entries -> for each file: verify HMAC -> decrypt -> decompress -> verify SHA-256.
+
+## Recommended Format Layout
+
+Based on research, the following layout balances all constraints:
+
+```
+==========================+
+|     ARCHIVE HEADER       |  Fixed size (e.g., 40 bytes)
+|  magic(4) | ver(1) |     |
+|  flags(1) | count(2) |   |
+|  toc_offset(4) |         |
+|  toc_size(4) |           |
+|  toc_iv(16) |            |
+|  reserved(8)             |
+==========================+
+|     FILE TABLE (TOC)     |  Variable size, optionally encrypted
+|  Entry 1: name, sizes,   |
+|    offset, iv, hmac,     |
+|    sha256, flags         |
+|  Entry 2: ...            |
+|  ...                     |
+==========================+
+|     DATA BLOCK 1         |  IV(16) + ciphertext(N) + HMAC(32)
+--------------------------+
+|     [DECOY PADDING 1]    |  Optional random bytes (Phase 6)
+--------------------------+
+|     DATA BLOCK 2         |  IV(16) + ciphertext(N) + HMAC(32)
+--------------------------+
+|     [DECOY PADDING 2]    |  Optional random bytes (Phase 6)
+--------------------------+
+|     ...                  |
+==========================+
+```
+
+### File Table Entry Fields (per file)
+
+| Field | Size | Type | Description |
+|-------|------|------|-------------|
+| name_length | 2 | u16 LE | Filename length in bytes |
+| name | variable | UTF-8 bytes | Filename (not null-terminated) |
+| original_size | 4 | u32 LE | Original file size before compression |
+| compressed_size | 4 | u32 LE | Size after gzip compression |
+| encrypted_size | 4 | u32 LE | Size after AES-CBC encryption (with PKCS7) |
+| data_offset | 4 | u32 LE | Absolute offset of this file's data block |
+| iv | 16 | bytes | AES-CBC IV for this file |
+| hmac | 32 | bytes | HMAC-SHA-256 of (IV + ciphertext) |
+| sha256 | 32 | bytes | SHA-256 of original (uncompressed) file |
+| compression_flag | 1 | u8 | 0 = raw (no compression), 1 = gzip |
+| padding_after | 2 | u16 LE | Bytes of decoy padding after this data block |
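Summing the fixed-size fields in the table gives 101 bytes per entry plus the filename bytes. The sketch below works through the arithmetic for the 2-file example, assuming a plaintext TOC, no compression, no decoy padding, and the 40-byte example header; `entry_size` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: size of one file table entry and the resulting data offsets.
# Fixed part = 2 (name_length) + 4*4 (three sizes + data_offset) + 16 (iv)
#            + 32 (hmac) + 32 (sha256) + 1 (compression_flag)
#            + 2 (padding_after) = 101 bytes.
entry_size() {
  echo $((101 + $1))          # $1 = name_length in bytes
}

# Assumptions: plaintext TOC, compression off, no decoy padding, 40-byte header.
toc_size=$(( $(entry_size 9) + $(entry_size 8) ))   # "hello.txt", "test.apk"
data1=$((40 + toc_size))                            # first data block
block1=$((16 + 16 + 32))                            # IV + 16-byte ciphertext ("Hello") + HMAC
data2=$((data1 + block1))                           # second data block
echo "toc_size=$toc_size data1=$data1 data2=$data2"
```

With compression enabled the ciphertext length depends on the gzip output size, so the concrete offsets in the spec's worked example will differ from these numbers.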
+
+### Critical Design Decisions for the Spec
+
+1. **TOC placement:** After header, before data blocks. This allows the decoder to read the entire TOC first, then seek to individual data blocks. Shell decoder reads TOC with a single `dd` call.
+
+2. **Absolute offsets:** `data_offset` in each file table entry is absolute from archive byte 0. This avoids cumulative offset calculation errors in the shell decoder.
+
+3. **IV stored in BOTH TOC and data block:** The IV appears in the file table entry AND as the first 16 bytes of the data block. This is redundant but allows two decode strategies: (a) read IV from TOC (fast), or (b) read IV from data block (streaming). The spec should mandate both are identical.
+
+4. **HMAC covers IV + ciphertext:** HMAC input is exactly: the 16-byte IV followed by the encrypted data (ciphertext including PKCS7 padding). The HMAC does NOT cover the HMAC field itself or any TOC metadata.
+
+5. **Magic bytes:** Must NOT match any known file signature. Consult the [Wikipedia list of file signatures](https://en.wikipedia.org/wiki/List_of_file_signatures) and the [Gary Kessler file signatures table](https://www.garykessler.net/library/magic.html). Use 4+ bytes that do not appear in any standard signature database. Starting with a null byte (0x00) is a good practice to signal "this is binary, not text."
+
+## Sources
+
+### Primary (HIGH confidence)
+- [IETF draft-mcgrew-aead-aes-cbc-hmac-sha2-01](https://datatracker.ietf.org/doc/html/draft-mcgrew-aead-aes-cbc-hmac-sha2-01) - AEAD construction with AES-CBC + HMAC, key splitting, MAC input specification
+- [OpenSSL enc documentation (3.3)](https://docs.openssl.org/3.3/man1/openssl-enc/) - `-K`, `-iv`, `-nosalt` flags for raw key mode, PKCS padding behavior
+- [OpenSSL dgst documentation](https://www.openssl.org/docs/manmaster/man1/openssl-dgst.html) - HMAC computation with `-mac HMAC -macopt hexkey:...`
+- [Android Cryptography reference](https://developer.android.com/privacy-and-security/cryptography) - Supported ciphers on Android 13
+- [Wikipedia: List of file signatures](https://en.wikipedia.org/wiki/List_of_file_signatures) - Magic bytes collision avoidance
+- [PKWARE ZIP specification](https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT) - Reference for archive format structure patterns (header + central directory + entries)
+
+### Secondary (MEDIUM confidence)
+- [Encrypt-then-MAC article (Erik Ringsmuth)](https://medium.com/@ErikRingsmuth/encrypt-then-mac-fc5db94794a4) - Practical encrypt-then-MAC construction walkthrough, verified against IETF draft
+- [AES-CBC + HMAC best practices (ProAndroidDev)](https://proandroiddev.com/security-best-practices-symmetric-encryption-with-aes-in-java-and-android-part-2-b3b80e99ad36) - Android-specific AES-CBC + HMAC patterns
+- [BusyBox documentation](https://www.busybox.net/BusyBox.html) - Applet availability (dd, xxd, openssl)
+- [Designing File Formats (fadden.com)](https://fadden.com/tech/file-formats.html) - General binary format design principles
+- [GameDev binary format design](https://www.gamedev.net/tutorials/programming/general-and-gameplay-programming/design-considerations-for-custom-binary-formats-and-data-compression-part-1-r3353/) - Custom binary format design patterns
+- [XOR obfuscation analysis (SANS)](https://www.sans.org/blog/tools-for-examining-xor-obfuscation-for-malware-analysis) - XOR detection methods (informs how to design XOR obfuscation that is less trivially breakable)
+
+### Tertiary (LOW confidence)
+- BusyBox `xxd -e` flag availability on Android 13 Qualcomm - Not verified for specific target device
+- BusyBox `openssl dgst -mac HMAC` support - Varies by build, not verified for target
+
+## Metadata
+
+**Confidence breakdown:**
+- Standard stack: HIGH - AES-256-CBC, HMAC-SHA-256, PKCS7, gzip are well-established standards with extensive documentation
+- Architecture: HIGH - Archive format design patterns (header + TOC + data blocks) are well-understood; ZIP, TAR, and similar formats provide proven structural patterns
+- Pitfalls: HIGH - Known issues (HMAC ambiguity, PKCS7 off-by-one, LE parsing, filename encoding) are well-documented in cryptographic engineering literature
+- Shell compatibility: MEDIUM - Busybox applet availability varies by build configuration; the spec must accommodate fallbacks
+- Obfuscation details: MEDIUM - XOR obfuscation is well-understood but the specific parameters (key, byte range) are design choices without external standards to reference
+
+**Research date:** 2026-02-24
+**Valid until:** 2026-03-24 (stable domain, 30-day validity)