From 361f9bfb6bbd93d0fd06e6a2dbd7b6bef7cdfe5b Mon Sep 17 00:00:00 2001
From: NikitolProject
Date: Wed, 25 Feb 2026 02:08:11 +0300
Subject: [PATCH] docs(06): research phase domain

---
 .../06-obfuscation-hardening/06-RESEARCH.md | 540 ++++++++++++++++++
 1 file changed, 540 insertions(+)
 create mode 100644 .planning/phases/06-obfuscation-hardening/06-RESEARCH.md

diff --git a/.planning/phases/06-obfuscation-hardening/06-RESEARCH.md b/.planning/phases/06-obfuscation-hardening/06-RESEARCH.md
new file mode 100644
index 0000000..6e1bfec
--- /dev/null
+++ b/.planning/phases/06-obfuscation-hardening/06-RESEARCH.md
+# Phase 6: Obfuscation Hardening - Research
+
+**Researched:** 2026-02-25
+**Domain:** Binary format obfuscation (XOR headers, encrypted TOC, decoy padding)
+**Confidence:** HIGH
+
+## Summary
+
+Phase 6 adds three obfuscation layers to the existing archive format: XOR-obfuscated headers, encrypted file table (TOC), and random decoy padding between data blocks. The specification for all three features is already fully defined in FORMAT.md Sections 9.1-9.3, including the XOR key, flag bits, and decode order. The implementation is straightforward because the format spec was designed from the start to support these features -- the header already has `toc_iv` (16 bytes), flag bits 1-3, and `padding_after` fields in every TOC entry.
+
+The critical complexity is that all changes must be applied atomically across four codebases (Rust archiver, Rust unpacker, Kotlin decoder, Shell decoder) while maintaining byte-identical output. The Rust archiver is the only encoder; the three decoders must all handle the new obfuscation features. The shell decoder is the most constrained: it must decrypt the TOC using `openssl enc` with raw key mode, which requires extracting the encrypted TOC to a temp file first (matching the existing pattern for per-file ciphertext extraction).
+
+**Primary recommendation:** Implement in two plans: (1) Rust archiver + Rust unpacker with all three obfuscation features + updated unit/integration tests, (2) Kotlin decoder + Shell decoder updates + cross-validation tests confirming byte-identical output across all three decoders.
+
+
+## Phase Requirements
+
+| ID | Description | Research Support |
+|----|-------------|-----------------|
+| FMT-06 | XOR-obfuscation headers with fixed key | FORMAT.md Section 9.1 fully defines the 8-byte XOR key (`0xA5 0x3C 0x96 0x0F 0xE1 0x7B 0x4D 0xC8`), cyclic application across 40-byte header, and bootstrapping detection via magic byte check. Implementation is a simple byte-level XOR loop. |
+| FMT-07 | Encrypted file table with separate IV | FORMAT.md Section 9.2 defines AES-256-CBC encryption of the serialized TOC using `toc_iv` from the header. The `toc_size` field stores encrypted size (including PKCS7 padding). Same key as file encryption. All three decoders already have AES-CBC decrypt capability. |
+| FMT-08 | Decoy padding (random data between blocks) | FORMAT.md Section 9.3 defines `padding_after` (u16 LE) in each TOC entry. Random bytes inserted after each data block. Decoders skip `padding_after` bytes. Max padding per file: 65535 bytes. The `data_offset` field in TOC entries already points to the correct location, so decoders that use absolute offsets (all three) naturally handle this. |
+
+
+## Standard Stack
+
+### Core
+
+No new libraries are needed. All three obfuscation features use primitives already present in the codebase.
+
+| Library/Tool | Version | Purpose | Already Present |
+|-------------|---------|---------|-----------------|
+| `aes` + `cbc` | 0.8 / 0.1 | AES-256-CBC for TOC encryption | Yes (Cargo.toml) |
+| `rand` | 0.9 | Random IV generation for TOC, random decoy padding bytes | Yes (Cargo.toml) |
+| `openssl enc` | any | Shell decoder AES-CBC decryption (for TOC) | Yes (shell/decode.sh) |
+| `javax.crypto.Cipher` | Android SDK | Kotlin decoder AES-CBC decryption (for TOC) | Yes (ArchiveDecoder.kt) |
+
+### Supporting
+
+| Library/Tool | Version | Purpose | When to Use |
+|-------------|---------|---------|-------------|
+| `hex-literal` | 1.1 | XOR key constant in tests | Tests only (already in dev-dependencies) |
+| `binwalk` | system | Manual verification that obfuscated archives are undetectable | Testing only |
+
+### Alternatives Considered
+
+No alternatives -- the spec is locked. XOR key, AES-CBC for TOC, and random padding are all specified in FORMAT.md Section 9.
+
+## Architecture Patterns
+
+### Current Codebase Architecture
+
+```
+src/
+├── format.rs        # Header/TOC structs, read/write serialization
+├── crypto.rs        # AES-CBC encrypt/decrypt, HMAC, SHA-256, IV generation
+├── archive.rs       # pack(), unpack(), inspect() orchestration
+├── compression.rs   # gzip compress/decompress
+├── key.rs           # 32-byte hardcoded key constant
+├── cli.rs           # clap CLI definition
+├── lib.rs           # pub mod re-exports
+└── main.rs          # entry point
+
+kotlin/
+└── ArchiveDecoder.kt   # Single-file decoder (parse + decrypt + decompress)
+
+shell/
+└── decode.sh           # Busybox-compatible POSIX shell decoder
+```
+
+### Pattern 1: XOR Header Obfuscation
+
+**What:** Apply cyclic 8-byte XOR to all 40 header bytes after construction (encoding) and before parsing (decoding).
+
+**Implementation in Rust archiver (`format.rs` or `archive.rs`):**
+```rust
+/// Fixed 8-byte XOR obfuscation key (FORMAT.md Section 9.1).
+const XOR_KEY: [u8; 8] = [0xA5, 0x3C, 0x96, 0x0F, 0xE1, 0x7B, 0x4D, 0xC8];
+
+/// XOR-obfuscate or de-obfuscate a 40-byte header buffer in-place.
+/// XOR is its own inverse, so the same function encodes and decodes.
+fn xor_header(buf: &mut [u8; 40]) {
+    for (i, byte) in buf.iter_mut().enumerate() {
+        *byte ^= XOR_KEY[i % 8];
+    }
+}
+```
+
+**Decode bootstrapping (FORMAT.md Section 10, step 2):**
+1. Read first 40 bytes raw.
+2. Check if bytes 0-3 match MAGIC (`0x00 0xEA 0x72 0x63`).
+3. If YES: header is plain, parse normally.
+4. If NO: apply XOR to all 40 bytes, re-check magic. If still wrong, reject.
+
+**In Kotlin:**
+```kotlin
+val XOR_KEY = byteArrayOf(
+    0xA5.toByte(), 0x3C, 0x96.toByte(), 0x0F,
+    0xE1.toByte(), 0x7B, 0x4D, 0xC8.toByte()
+)
+
+fun xorHeader(buf: ByteArray) {
+    for (i in 0 until 40) {
+        // Mask both operands to 0..255 before XOR (see Pitfall 4).
+        buf[i] = ((buf[i].toInt() and 0xFF) xor (XOR_KEY[i % 8].toInt() and 0xFF)).toByte()
+    }
+}
+```
+
+**In shell:**
+```sh
+# XOR key as hex pairs
+XOR_KEY="a53c960fe17b4dc8"
+
+# De-XOR 40 header bytes: read raw, XOR each byte, write back
+# This requires per-byte hex manipulation in shell
+```
+
+### Pattern 2: TOC Encryption
+
+**What:** Serialize all TOC entries to a buffer, then encrypt the entire buffer with AES-256-CBC using a random `toc_iv`, and write the encrypted TOC. Store the encrypted size in `toc_size`.
+
+**Encoding (Rust archiver):**
+```rust
+// 1. Serialize TOC entries to a Vec
+let mut toc_buf = Vec::new();
+for entry in &entries {
+    format::write_toc_entry(&mut toc_buf, entry)?;
+}
+
+// 2. Generate random toc_iv
+let toc_iv = crypto::generate_iv();
+
+// 3. Encrypt the serialized TOC
+let encrypted_toc = crypto::encrypt_data(&toc_buf, &KEY, &toc_iv);
+let toc_size = encrypted_toc.len() as u32; // encrypted size
+
+// 4. Write header with toc_iv and encrypted toc_size
+// 5. Write encrypted_toc bytes at toc_offset
+```
+
+**Decoding (all decoders):**
+1. Read `toc_offset`, `toc_size`, `toc_iv` from (de-XORed) header.
+2. Check flags bit 1 (`toc_encrypted`).
+3. If set: read `toc_size` bytes at `toc_offset`, decrypt with AES-256-CBC using `toc_iv` and KEY, remove PKCS7 padding.
+4. Parse TOC entries from decrypted buffer.
+
+**Shell decoder TOC decryption:**
+```sh
+# Extract encrypted TOC to temp file
+dd if="$ARCHIVE" bs=1 skip="$toc_offset" count="$toc_size" of="$TMPDIR/toc_enc.bin" 2>/dev/null
+
+# Decrypt TOC
+openssl enc -d -aes-256-cbc -nosalt \
+    -K "$KEY_HEX" -iv "$toc_iv_hex" \
+    -in "$TMPDIR/toc_enc.bin" -out "$TMPDIR/toc_dec.bin"
+
+# Now parse TOC entries from the decrypted file
+# (requires switching from reading TOC fields directly from $ARCHIVE
+#  to reading from $TMPDIR/toc_dec.bin with offset 0)
+```
+
+### Pattern 3: Decoy Padding
+
+**What:** After writing each file's ciphertext, write random bytes of random length (0-65535).
+
+**Encoding (Rust archiver):**
+```rust
+use rand::Rng;
+
+// One RNG for the whole pack() call (rand 0.9 API)
+let mut rng = rand::rng();
+
+// For each file, generate random padding length
+let padding_after: u16 = rng.random_range(64..=4096); // sensible range
+// Write ciphertext, then write padding_after random bytes
+let mut padding = vec![0u8; padding_after as usize];
+rand::Fill::fill(&mut padding[..], &mut rng);
+out_file.write_all(&padding)?;
+```
+
+**Decoding:** All three decoders already use absolute `data_offset` from the TOC to seek to each file's data block, so they naturally skip over padding. The `padding_after` field in TOC entries is already parsed by all decoders (currently always 0). No decoder changes needed for the actual extraction -- the decoders just need to not break when `padding_after > 0`.
+
+### Pattern 4: Flag Bits Management
+
+**Current state:** The archiver sets flags bit 0 (compression) when any file is compressed. Bits 1-3 are always 0.
+
+**Phase 6 changes:** When obfuscation is active, set:
+- Bit 1 (`0x02`): TOC encrypted
+- Bit 2 (`0x04`): XOR header
+- Bit 3 (`0x08`): Decoy padding
+
+All three features should be enabled together (flags = `0x0F` when compression + all obfuscation). The archiver should always enable all three obfuscation features. No user-facing toggle is needed (FORMAT.md says the features "can be activated independently", but the v1 goal is full obfuscation).
+
+### Recommended Modification Order
+
+The correct order of operations for the encoder is:
+
+```
+1. Compute data offsets accounting for decoy padding
+2. Serialize TOC entries (with padding_after values)
+3. Encrypt serialized TOC → encrypted_toc
+4. Build header (with toc_iv, encrypted toc_size, flags with bits 1-3 set)
+5. Serialize header to 40-byte buffer
+6. XOR the 40-byte header buffer
+7. Write: XOR'd header || encrypted TOC || (data blocks with interleaved padding)
+```
+
+The correct order of operations for the decoder is (FORMAT.md Section 10):
+
+```
+1. Read 40 raw bytes
+2. Check magic → if mismatch, XOR and re-check
+3. Parse header fields (including toc_iv, flags)
+4. If flags bit 1: decrypt TOC with toc_iv
+5. Parse TOC entries from (decrypted) buffer
+6. For each file: seek to data_offset, read encrypted_size, verify HMAC, decrypt, decompress, verify SHA-256
+   (padding_after is naturally skipped because next file uses its own data_offset)
+```
+
+### Anti-Patterns to Avoid
+
+- **XORing the header before it is final:** XOR must be the last step applied to the header during encoding, because the header carries the `toc_iv` and the encrypted `toc_size` needed for TOC decryption. If you XOR first and then modify a header field, the written bytes no longer de-obfuscate to a valid header.
+- **Using piped input for openssl TOC decryption in shell:** The existing shell decoder already extracts ciphertext to a temp file before decryption to avoid pipe buffering issues. The same pattern MUST be used for TOC decryption.
+- **Modifying data_offset calculation without accounting for padding:** When computing `data_offset` for each file, the offset must include all preceding files' `encrypted_size + padding_after` values. The current code only sums `encrypted_size`.
+- **Forgetting the TOC size change:** When TOC encryption is on, `toc_size` in the header is the encrypted size (with PKCS7 padding), not the plaintext size. The data block start offset is `toc_offset + toc_size` (encrypted).
+- **Shell decoder: parsing TOC from archive file vs decrypted buffer:** Currently, the shell decoder reads TOC fields directly from `$ARCHIVE` using absolute offsets. With TOC encryption, it must read from the decrypted TOC temp file with relative offsets (starting at 0). This is a significant refactor of the shell decoder's TOC parsing loop.
+
+## Don't Hand-Roll
+
+| Problem | Don't Build | Use Instead | Why |
+|---------|-------------|-------------|-----|
+| XOR obfuscation | Custom bit manipulation tricks | Simple `byte ^= key[i % 8]` loop | XOR is trivially simple; any "optimization" adds complexity without benefit |
+| TOC encryption | Custom encryption scheme | Existing `crypto::encrypt_data` / `crypto::decrypt_data` | Same AES-256-CBC already used for file encryption |
+| Random byte generation | Pseudo-random with manual seeding | `rand::Fill` (Rust), `/dev/urandom` (shell), `SecureRandom` (Kotlin) | CSPRNG is already in use for IV generation |
+| PKCS7 padding for TOC | Manual padding logic | `cbc` crate handles PKCS7 automatically | The encrypt/decrypt functions already handle padding |
+
+**Key insight:** Every cryptographic primitive needed is already in the codebase. Phase 6 is purely about wiring existing functions into the encode/decode pipeline in the correct order.
+
+## Common Pitfalls
+
+### Pitfall 1: Shell Decoder TOC Parsing Refactor
+
+**What goes wrong:** The current shell decoder reads TOC fields directly from `$ARCHIVE` at absolute offsets (`pos=$toc_offset`, then `read_le_u16 "$ARCHIVE" "$pos"`). After TOC encryption, the TOC must be decrypted to a temp file first, and all TOC reads must come from that temp file with offsets starting at 0 instead of `$toc_offset`.
+
+**Why it happens:** The entire TOC parsing loop in `decode.sh` (lines 139-244) uses `$ARCHIVE` as the file argument to `read_hex`, `read_le_u16`, `read_le_u32`, and `dd`. All of these calls need to be changed to read from the decrypted TOC file with a reset position counter.
+
+**How to avoid:** Extract the TOC parsing into a section that operates on a "TOC file" variable. When TOC encryption is off, the TOC file is the archive itself (with pos starting at toc_offset). When TOC encryption is on, the TOC file is the decrypted temp file (with pos starting at 0).
+
+**Warning signs:** Tests pass with TOC encryption off but fail with TOC encryption on; the shell decoder reads garbage field values.
+
+### Pitfall 2: XOR Header Bootstrapping in Shell
+
+**What goes wrong:** The shell decoder currently reads magic bytes and immediately validates them. With XOR obfuscation, the first 4 bytes will NOT be the magic bytes -- they'll be XOR'd. The decoder must attempt XOR de-obfuscation before parsing.
+
+**Why it happens:** The current shell code at lines 108-113 reads magic and exits immediately on mismatch. This must become a conditional: try raw first, then try XOR.
+
+**How to avoid:** Implement the bootstrapping algorithm from FORMAT.md Section 10 step 2: read 40 bytes, check magic, if mismatch XOR all 40 bytes and re-check.
+
+**Warning signs:** Shell decoder rejects all obfuscated archives with "bad magic bytes".
+
+### Pitfall 3: XOR in Shell Requires Per-Byte Hex Manipulation
+
+**What goes wrong:** Shell/POSIX sh has no native XOR operator for bytes. Implementing XOR in shell requires reading each byte as hex, converting to decimal, XORing with the key byte (also as decimal), and converting back to hex. This is significantly more complex than in Rust or Kotlin.
+
+**Why it happens:** POSIX sh arithmetic supports XOR (`$(( ))` with `^` operator), but converting between hex bytes and shell arithmetic requires careful hex string slicing.
+
+**How to avoid:** Use shell arithmetic: `result=$(( 0x${byte_hex} ^ 0x${key_hex} ))` and then `printf '%02x' "$result"`. Process all 40 header bytes in a loop, building the de-XORed header either in a hex string or as a temp binary file.
+
+**Practical approach:** Read the 40-byte header as a hex string, XOR each byte pair in a loop, write the result to a temp file, then use the existing `read_le_u16`/`read_le_u32` functions on the temp file.
+
+```sh
+# Read 40-byte header as hex
+header_hex=$(read_hex "$ARCHIVE" 0 40)
+xor_key="a53c960fe17b4dc8"
+
+# XOR each byte
+i=0
+result=""
+while [ "$i" -lt 80 ]; do  # 80 hex chars = 40 bytes
+    # Strip the first i hex chars, then keep the next two (byte i/2 of the header)
+    byte=$(printf '%.2s' "${header_hex#$(printf "%${i}s" | tr ' ' '?')}")
+    # Same slicing for the cyclic key byte
+    key_byte_idx=$(( (i / 2) % 8 ))
+    key_byte=$(printf '%.2s' "${xor_key#$(printf "%$((key_byte_idx * 2))s" | tr ' ' '?')}")
+    xored=$(printf '%02x' "$(( 0x$byte ^ 0x$key_byte ))")
+    result="${result}${xored}"
+    i=$((i + 2))
+done
+# Write result to temp file using printf or xxd -r -p
+```
+
+**Warning signs:** Hex string indexing errors, off-by-one in the XOR loop, wrong byte order.
+
+### Pitfall 4: Kotlin Signed Byte XOR
+
+**What goes wrong:** Kotlin bytes are signed (-128 to 127). XOR operations on bytes require `.toInt() and 0xFF` masking to avoid sign extension. The XOR key contains bytes > 0x7F (e.g., `0xA5`, `0xC8`) which are negative in Kotlin's signed byte representation.
+
+**Why it happens:** `0xA5.toByte()` in Kotlin is `-91`, so `.toInt()` sign-extends it to `0xFFFFFFA5`. The low 8 bits of an unmasked XOR are still correct, but the intermediate `Int` is wrong for any comparison, shift, or hex formatting unless it is masked or truncated back with `.toByte()`.
+
+**How to avoid:** Always use `(buf[i].toInt() and 0xFF) xor (XOR_KEY[i % 8].toInt() and 0xFF)` and then `.toByte()` the result. This is the same pattern already used in `ArchiveDecoder.kt` for other byte operations.
+
+**Warning signs:** XOR produces wrong values for bytes > 0x7F; magic byte check fails after de-XOR.
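A known-answer vector makes the warning signs in Pitfalls 3 and 4 easy to check: XORing the plain magic `00 ea 72 63` with the first four key bytes gives `a5 d6 e4 6c`, so every obfuscated archive should start with those bytes, and each decoder's de-XOR step should turn them back into the magic. The Rust sketch below derives that vector; `xor_cyclic` and `obfuscated_magic` are hypothetical helper names for illustration, not functions from the codebase.

```rust
/// Fixed 8-byte XOR key (FORMAT.md Section 9.1).
const XOR_KEY: [u8; 8] = [0xA5, 0x3C, 0x96, 0x0F, 0xE1, 0x7B, 0x4D, 0xC8];

/// Plain magic bytes (FORMAT.md Section 10, step 2).
const MAGIC: [u8; 4] = [0x00, 0xEA, 0x72, 0x63];

/// XOR a buffer in place with the cyclic key. XOR is self-inverse, so
/// calling this twice restores the original bytes.
fn xor_cyclic(buf: &mut [u8]) {
    for (i, byte) in buf.iter_mut().enumerate() {
        *byte ^= XOR_KEY[i % XOR_KEY.len()];
    }
}

/// Hypothetical helper: the four bytes an XOR-obfuscated archive starts with.
fn obfuscated_magic() -> [u8; 4] {
    let mut magic = MAGIC;
    xor_cyclic(&mut magic);
    magic
}
```

The same four bytes can serve as a fixture in the shell and Kotlin cross-validation tests: a decoder that produces anything other than `00 ea 72 63` from `a5 d6 e4 6c` has a slicing or sign-handling bug.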
+
+### Pitfall 5: Data Offset Computation with Padding
+
+**What goes wrong:** The archiver computes `data_offset` for each file by summing `toc_offset + toc_size + sum(encrypted_sizes_before)`. With decoy padding, it must also add `sum(padding_after_before)`.
+
+**Why it happens:** The current pack() function computes offsets in a simple loop without padding.
+
+**How to avoid:** Generate all `padding_after` values first, then compute offsets as `current_offset += encrypted_size + padding_after` for each file.
+
+**Warning signs:** Data offsets in TOC entries point to wrong locations; decoders read garbage ciphertext.
+
+### Pitfall 6: TOC Size for Encrypted TOC
+
+**What goes wrong:** The `toc_size` header field must store the **encrypted** TOC size (which includes PKCS7 padding), not the plaintext serialized size. The encrypted size is `((plaintext_size / 16) + 1) * 16`.
+
+**Why it happens:** The current code sets `toc_size` to the plaintext size. After encryption, the size grows due to PKCS7 padding.
+
+**How to avoid:** Serialize TOC to buffer first, encrypt, then use `encrypted_toc.len()` as `toc_size`.
+
+**Warning signs:** Decoder reads wrong number of bytes for encrypted TOC; AES decryption fails with "invalid padding".
+
+### Pitfall 7: Inspect Command with Obfuscation
+
+**What goes wrong:** The `inspect` command currently reads the header and TOC in plaintext. After obfuscation, it must de-XOR the header and decrypt the TOC before printing metadata.
+
+**Why it happens:** The inspect path shares code with unpack but the developer might forget to update it.
+
+**How to avoid:** Factor out header de-obfuscation and TOC decryption into reusable functions called by both `unpack()` and `inspect()`.
+
+**Warning signs:** `inspect` command crashes or shows garbage on obfuscated archives.
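The arithmetic behind Pitfalls 5 and 6 can be sketched in a few lines of Rust. This is a minimal illustration, not the archiver's actual code: the `Block` struct and both function names are hypothetical, offsets are assumed to fit in `u32`, and the TOC is assumed to start directly after the 40-byte header (`toc_offset = 40`).

```rust
/// Header size in bytes (the header is 40 bytes per the format spec).
const HEADER_SIZE: u32 = 40;
/// AES block size in bytes.
const AES_BLOCK: u32 = 16;

/// AES-CBC ciphertext size with PKCS7 padding: always rounds up to the next
/// block boundary, adding a full block when the input is already aligned.
fn pkcs7_encrypted_size(plaintext_len: u32) -> u32 {
    (plaintext_len / AES_BLOCK + 1) * AES_BLOCK
}

/// Hypothetical per-file sizes, known before the TOC is serialized.
struct Block {
    encrypted_size: u32,
    padding_after: u16,
}

/// Absolute data_offset of each block. The first block starts after the
/// ENCRYPTED TOC; each later block starts after the previous block's
/// ciphertext plus its decoy padding.
fn data_offsets(toc_plaintext_len: u32, blocks: &[Block]) -> Vec<u32> {
    let toc_offset = HEADER_SIZE; // assumption: TOC directly follows the header
    let toc_size = pkcs7_encrypted_size(toc_plaintext_len);
    let mut current = toc_offset + toc_size;
    let mut offsets = Vec::with_capacity(blocks.len());
    for block in blocks {
        offsets.push(current);
        current += block.encrypted_size + u32::from(block.padding_after);
    }
    offsets
}
```

For example, a 100-byte plaintext TOC encrypts to 112 bytes, so the first data block starts at offset 152 rather than 140. Using the plaintext size here is exactly the off-by-padding error described in Pitfall 6.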
+
+## Code Examples
+
+### XOR Header Round-Trip (Rust)
+
+```rust
+// Source: FORMAT.md Section 9.1
+
+const XOR_KEY: [u8; 8] = [0xA5, 0x3C, 0x96, 0x0F, 0xE1, 0x7B, 0x4D, 0xC8];
+
+fn xor_header_buf(buf: &mut [u8]) {
+    assert!(buf.len() >= 40);
+    for i in 0..40 {
+        buf[i] ^= XOR_KEY[i % 8];
+    }
+}
+
+// Encoding: write header normally, then XOR
+let mut header_buf = Vec::new();
+write_header(&mut header_buf, &header)?;
+xor_header_buf(&mut header_buf);
+out_file.write_all(&header_buf)?;
+
+// Decoding: read 40 bytes, check magic, if no match XOR and re-check
+let mut buf = [0u8; 40];
+reader.read_exact(&mut buf)?;
+if buf[0..4] != MAGIC {
+    xor_header_buf(&mut buf);
+    anyhow::ensure!(buf[0..4] == MAGIC, "Invalid magic bytes after XOR attempt");
+}
+// Parse header from buf...
+```
+
+### TOC Encryption (Rust)
+
+```rust
+// Source: FORMAT.md Section 9.2
+
+// Encoding
+let mut toc_plaintext = Vec::new();
+for entry in &toc_entries {
+    write_toc_entry(&mut toc_plaintext, entry)?;
+}
+let toc_iv = crypto::generate_iv();
+let encrypted_toc = crypto::encrypt_data(&toc_plaintext, &KEY, &toc_iv);
+// encrypted_toc.len() is the toc_size to store in header
+
+// Decoding
+let encrypted_toc_buf = /* read toc_size bytes from toc_offset */;
+let toc_plaintext = crypto::decrypt_data(&encrypted_toc_buf, &KEY, &header.toc_iv)?;
+let mut cursor = Cursor::new(&toc_plaintext);
+let entries = read_toc(&mut cursor, header.file_count)?;
+```
+
+### Decoy Padding (Rust)
+
+```rust
+// Source: FORMAT.md Section 9.3
+
+use rand::Rng;
+
+let mut rng = rand::rng();
+
+// For each file, during pack:
+let padding_after: u16 = rng.random_range(64..=4096);
+let mut padding_bytes = vec![0u8; padding_after as usize];
+rand::Fill::fill(&mut padding_bytes[..], &mut rng);
+
+// After writing ciphertext for this file:
+out_file.write_all(&pf.ciphertext)?;
+out_file.write_all(&padding_bytes)?;
+```
+
+### Shell Decoder XOR De-obfuscation
+
+```sh
+# Source: FORMAT.md Section 9.1 + Section 10 step 2
+
+XOR_KEY_HEX="a53c960fe17b4dc8"
+
+# Read 40-byte header as hex
+raw_header_hex=$(read_hex "$ARCHIVE" 0 40)
+magic_hex=$(printf '%.8s' "$raw_header_hex")
+
+if [ "$magic_hex" = "00ea7263" ]; then
+    header_hex="$raw_header_hex"
+else
+    # Apply XOR de-obfuscation
+    header_hex=""
+    byte_idx=0
+    while [ "$byte_idx" -lt 40 ]; do
+        hex_pos=$((byte_idx * 2))
+        # Extract byte from raw header
+        raw_byte_hex=$(printf '%s' "$raw_header_hex" | cut -c$((hex_pos + 1))-$((hex_pos + 2)))
+        # Extract key byte (cyclic)
+        key_pos=$(( (byte_idx % 8) * 2 ))
+        key_byte_hex=$(printf '%s' "$XOR_KEY_HEX" | cut -c$((key_pos + 1))-$((key_pos + 2)))
+        # XOR
+        result=$(printf '%02x' "$(( 0x$raw_byte_hex ^ 0x$key_byte_hex ))")
+        header_hex="${header_hex}${result}"
+        byte_idx=$((byte_idx + 1))
+    done
+
+    # Verify magic after XOR
+    magic_hex=$(printf '%.8s' "$header_hex")
+    if [ "$magic_hex" != "00ea7263" ]; then
+        printf 'Invalid archive: bad magic bytes\n' >&2
+        exit 1
+    fi
+fi
+
+# Write de-XORed header to temp file for field parsing
+printf '%s' "$header_hex" | xxd -r -p > "$TMPDIR/header.bin"
+# Now use read_le_u16/read_le_u32 on "$TMPDIR/header.bin"
+```
+
+### Kotlin XOR De-obfuscation
+
+```kotlin
+// Source: FORMAT.md Section 9.1
+
+val XOR_KEY = byteArrayOf(
+    0xA5.toByte(), 0x3C, 0x96.toByte(), 0x0F,
+    0xE1.toByte(), 0x7B, 0x4D, 0xC8.toByte()
+)
+
+fun xorHeader(buf: ByteArray) {
+    for (i in 0 until minOf(buf.size, 40)) {
+        buf[i] = ((buf[i].toInt() and 0xFF) xor (XOR_KEY[i % 8].toInt() and 0xFF)).toByte()
+    }
+}
+
+// In decode():
+val headerBytes = ByteArray(HEADER_SIZE)
+raf.readFully(headerBytes)
+
+// Check magic before XOR
+if (!(headerBytes[0] == MAGIC[0] && headerBytes[1] == MAGIC[1] &&
+      headerBytes[2] == MAGIC[2] && headerBytes[3] == MAGIC[3])) {
+    // Attempt XOR de-obfuscation
+    xorHeader(headerBytes)
+    // NOTE: the magic must be re-checked here and the archive rejected if it
+    // still mismatches (FORMAT.md Section 10, step 2)
+}
+
+val header = parseHeader(headerBytes)
+
+// If TOC encrypted:
+if (header.flags and 0x02 != 0) {
+    raf.seek(header.tocOffset)
+    val encryptedToc = ByteArray(header.tocSize.toInt())
+    raf.readFully(encryptedToc)
+    val decryptedToc = decryptAesCbc(encryptedToc, header.tocIv, KEY)
+    val entries = parseToc(decryptedToc, header.fileCount)
+    // ... proceed with entries
+}
+```
+
+## State of the Art
+
+| Old Approach (current) | New Approach (Phase 6) | Impact |
+|------------------------|------------------------|--------|
+| Plaintext header with MAGIC visible | XOR-obfuscated header -- no recognizable bytes | `file` and `binwalk` cannot identify format |
+| Plaintext TOC with filenames visible | AES-encrypted TOC -- `strings` reveals nothing | Hex editors see no metadata |
+| Contiguous data blocks | Data blocks with random padding gaps | Size analysis of individual files is defeated |
+| `flags = 0x01` (compression only) | `flags = 0x0F` (compression + all obfuscation) | All obfuscation active by default |
+
+**Nothing is deprecated:** The old approach still works (flags bits 1-3 = 0). The decoder always checks whether obfuscation is active and handles both cases.
+
+## Open Questions
+
+1. **Padding size range**
+   - What we know: `padding_after` is u16 (0-65535). FORMAT.md doesn't specify a recommended range.
+   - What's unclear: Should padding be uniformly random in a fixed range, or proportional to file size?
+   - Recommendation: Use a fixed range of 64-4096 bytes per file. This adds meaningful noise without significantly inflating archive size. The exact range is not spec-mandated, so the planner can decide.
+
+2. **Should obfuscation be the default or opt-in?**
+   - What we know: The spec says features "can be activated independently." Phase 6 success criteria say "all three decoders still produce byte-identical output after obfuscation is applied."
+   - What's unclear: Should `pack` always enable obfuscation, or should there be a `--no-obfuscate` flag?
+   - Recommendation: Always enable all three obfuscation features. The whole point of Phase 6 is hardening. Add a `--no-obfuscate` flag for backward compatibility testing only. This simplifies the implementation.
+
+3. **Existing test archives**
+   - What we know: Current tests create archives without obfuscation.
+   - What's unclear: Should existing tests still pass with obfuscation enabled by default?
+   - Recommendation: Existing round-trip tests should still pass because they test pack→unpack, and both sides will now use obfuscation. Golden test vectors for crypto primitives are unaffected. Cross-validation tests (Kotlin, Shell) need to be re-run against obfuscated archives.
+
+4. **Shell `cut` vs substring approach for hex processing**
+   - What we know: The substring syntax `${var:offset:length}` is a bashism that is not available in strict POSIX sh. The current shell decoder uses `printf '%.2s'` and `${var#??}` patterns for string slicing.
+   - What's unclear: Is `cut -c` POSIX-compliant for hex byte extraction in the XOR loop?
+   - Recommendation: `cut -c` is POSIX-compliant and available in busybox. Use `printf '%s' "$hex" | cut -c$start-$end` for byte extraction. Alternatively, use the existing `${var#??}` pattern in a loop. Test with busybox sh.
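For open question 1, a rough overhead estimate may help the planner judge the 64-4096 range. The sketch below assumes padding lengths are drawn uniformly from the inclusive range (as `random_range(64..=4096)` does); the constant and function names are hypothetical.

```rust
/// Inclusive padding range recommended in open question 1.
const PAD_MIN: u64 = 64;
const PAD_MAX: u64 = 4096;

/// Expected total padding in bytes, assuming a uniform draw per file.
/// The mean of a uniform integer on [64, 4096] is (64 + 4096) / 2 = 2080.
fn expected_padding_bytes(file_count: u64) -> u64 {
    file_count * (PAD_MIN + PAD_MAX) / 2
}

/// Upper bound on total padding in bytes: every file draws the maximum.
fn worst_case_padding_bytes(file_count: u64) -> u64 {
    file_count * PAD_MAX
}
```

Under these assumptions a 100-file archive gains about 208,000 bytes on average and at most 409,600 bytes, which is small for large payloads but can dominate an archive of many tiny files.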
+
+## Sources
+
+### Primary (HIGH confidence)
+- FORMAT.md Sections 9.1-9.3 and Section 10 -- complete specification of all three obfuscation features, including XOR key, flag bits, decode order, and bootstrapping algorithm
+- Existing codebase (src/format.rs, src/crypto.rs, src/archive.rs, kotlin/ArchiveDecoder.kt, shell/decode.sh) -- verified current implementation patterns
+
+### Secondary (MEDIUM confidence)
+- [OpenSSL enc documentation](https://docs.openssl.org/3.3/man1/openssl-enc/) -- confirms `-K`/`-iv`/`-nosalt` raw key mode works with piped/file input for TOC decryption
+- [Malwarebytes XOR obfuscation](https://www.threatdown.com/blog/nowhere-to-hide-three-methods-of-xor-obfuscation/) -- confirms XOR obfuscation is standard practice for hiding binary structure
+- [Security Lab entropy analysis](https://securitylab.servicenow.com/research/2025-04-07-Binary-Data-Analysis-The-Role-of-Entropy/) -- confirms random padding disrupts entropy-based analysis tools
+
+### Tertiary (LOW confidence)
+- None -- all findings verified against primary spec and codebase
+
+## Metadata
+
+**Confidence breakdown:**
+- Standard stack: HIGH -- no new dependencies, all primitives already in codebase
+- Architecture: HIGH -- FORMAT.md fully specifies all three features with byte-level precision
+- Pitfalls: HIGH -- identified by analyzing actual code structure and known shell/Kotlin quirks
+
+**Research date:** 2026-02-25
+**Valid until:** 2026-03-25 (stable -- format spec is frozen for v1)