--- phase: 01-format-specification plan: 01 subsystem: docs tags: [binary-format, aes-256-cbc, hmac-sha256, pkcs7, gzip, specification] # Dependency graph requires: - phase: none provides: "First phase, no dependencies" provides: - "Complete binary format specification at docs/FORMAT.md" - "Byte-level field definitions for header (40 bytes), file table entries (variable), and data blocks" - "Encryption pipeline: SHA-256 -> gzip -> PKCS7 -> AES-256-CBC -> HMAC-SHA-256" - "Worked example: 2-file archive with annotated hex dump (323 bytes)" - "Shell decoder reference functions (read_le_u16, read_le_u32, verify_hmac, decrypt_file)" - "Kotlin decoder reference (decrypt, HMAC verify, gzip decompress, SHA-256 verify)" affects: [02-core-archiver, 03-round-trip-verification, 04-kotlin-decoder, 05-shell-decoder, 06-obfuscation-hardening] # Tech tracking tech-stack: added: [] patterns: [encrypt-then-mac, pkcs7-padding, little-endian-integers, length-prefixed-utf8-strings] key-files: created: [docs/FORMAT.md] modified: [] key-decisions: - "IV stored only in TOC, not duplicated in data blocks (simplifies dd extraction)" - "Same 32-byte key for AES-256-CBC and HMAC-SHA-256 in v1 (v2 will use HKDF)" - "Magic bytes 0x00 0xEA 0x72 0x63 (leading null signals binary, not in any known signature DB)" - "XOR obfuscation key: 0xA5 0x3C 0x96 0x0F 0xE1 0x7B 0x4D 0xC8 (8-byte repeating)" - "HMAC input is IV (16 bytes) || ciphertext (encrypted_size bytes), no ambiguity" patterns-established: - "Field table pattern: offset, size, type, endian, field name, description for every binary structure" - "Worked example pattern: concrete inputs -> pipeline walkthrough -> hex dump -> shell decode commands" - "All offsets absolute from archive byte 0" requirements-completed: [FMT-05] # Metrics duration: 7min completed: 2026-02-24 --- # Phase 1 Plan 1: Format Specification Summary **Complete binary format spec with 40-byte header, variable-length TOC entries, AES-256-CBC encrypt-then-MAC pipeline, PKCS7 padding formula, obfuscation features (XOR/encrypted TOC/decoy padding), and a 323-byte worked example with annotated hex dump** ## Performance - **Duration:** 7 min - **Started:** 2026-02-24T20:15:59Z - **Completed:** 2026-02-24T20:22:38Z - **Tasks:** 2 - **Files modified:** 1 ## Accomplishments - Complete binary format specification (1031 lines) covering all 13 sections from overview through shell/Kotlin reference appendix - Every binary structure (header, file table entry, data block) has a field table with exact offsets, sizes, types, and endianness - Worked example with 2 files (hello.txt, data.bin) showing every byte of a 323-byte archive with real SHA-256 hashes and internally consistent offsets verified by automated script - Shell decode walkthrough with exact dd/xxd commands for every extraction step - Obfuscation features (XOR header, encrypted TOC, decoy padding) fully specified with byte ranges and activation flags ## Task Commits Each task was committed atomically: 1. **Task 1: Write format specification with byte-level field definitions** - `fcd37f5` (feat) 2. **Task 2: Write worked example with annotated hex dump and shell reference appendix** - `1c698fe` (feat) ## Files Created/Modified - `docs/FORMAT.md` - Complete binary format specification (1031 lines, 13 sections) ## Decisions Made - **IV storage location:** IV stored only in TOC entry, not duplicated at data block start. Differs from research suggestion of dual storage. Rationale: simpler dd extraction, TOC is always read before data blocks. - **HMAC key:** Same 32-byte key for both AES-256-CBC and HMAC-SHA-256 in v1. Cryptographically safe for this specific combination. v2 will use HKDF-derived subkeys. - **Magic bytes:** `0x00 0xEA 0x72 0x63`. Leading null byte signals binary. Verified not in known file signature databases. - **XOR obfuscation key:** `0xA5 0x3C 0x96 0x0F 0xE1 0x7B 0x4D 0xC8` (8-byte repeating over 40-byte header). - **HMAC scope:** Explicitly defined as `IV (16 bytes) || ciphertext (encrypted_size bytes)`. No ambiguity. ## Deviations from Plan None - plan executed exactly as written. ## Issues Encountered None. ## User Setup Required None - no external service configuration required. ## Next Phase Readiness - Format specification is complete and ready to serve as the build target for Phase 2 (Core Archiver) - All three decoder implementations (Rust, Kotlin, Shell) have sufficient detail to implement against - Worked example can serve as golden reference for Phase 3 test vectors - Open questions from research resolved: HMAC uses same key as AES (v1), shell functions use xxd -p with manual byte reversal ## Self-Check: PASSED - FOUND: `docs/FORMAT.md` (1031 lines) - FOUND: `.planning/phases/01-format-specification/01-01-SUMMARY.md` - FOUND: commit `fcd37f5` (Task 1) - FOUND: commit `1c698fe` (Task 2) --- *Phase: 01-format-specification* *Completed: 2026-02-24*