docs(02): research phase domain — Rust crypto stack, binary format, CLI patterns

2026-02-24 23:44:54 +03:00
parent 9125b388da
commit cc7ff6db10
1 changed files with 468 additions and 0 deletions
--- a/.planning/phases/02-core-archiver/02-RESEARCH.md
+++ b/.planning/phases/02-core-archiver/02-RESEARCH.md
@@ -0,0 +1,468 @@
+# Phase 2: Core Archiver - Research
+
+**Researched:** 2026-02-24
+**Domain:** Rust CLI binary with custom binary format, AES-256-CBC encryption, gzip compression, HMAC-SHA-256 authentication
+**Confidence:** HIGH
+
+## Summary
+
+Phase 2 implements the core Rust CLI archiver from scratch (greenfield -- no existing source code). The tool must produce archives matching the FORMAT.md specification (v1) exactly: 40-byte fixed header, variable-length TOC with per-file metadata, and encrypted data blocks. The pipeline for each file is: SHA-256 hash -> gzip compress (optional) -> PKCS7 pad -> AES-256-CBC encrypt -> HMAC-SHA-256 authenticate.
+
+The Rust ecosystem has mature, well-tested crates for every component: `aes` + `cbc` for encryption, `hmac` + `sha2` for authentication and hashing, `flate2` for gzip, `clap` for CLI, `rand` for IV generation. All stable versions are compatible and compile together (verified). The full crypto pipeline (compress -> encrypt -> HMAC -> verify -> decrypt -> decompress -> verify SHA-256) was validated as a working Rust program during this research.
+
+**Primary recommendation:** Use the stable RustCrypto crates (aes 0.8, cbc 0.1, hmac 0.12, sha2 0.10) rather than the 0.9/0.2/0.13/0.11 release candidates. The stable versions are battle-tested, have extensive documentation, and all compile together with Rust 1.93. Separate the code into clear modules: `cli`, `format`, `crypto`, `compression`, `archive` (pack/unpack/inspect logic), and `key` (the hardcoded key).
+
+<phase_requirements>
+## Phase Requirements
+
+| ID | Description | Research Support |
+|----|-------------|-----------------|
+| FMT-01 | Custom binary format with non-standard magic bytes (not recognized by binwalk/file/7z) | Magic bytes `0x00 0xEA 0x72 0x63` defined in FORMAT.md; leading null byte prevents `file` recognition. Binary serialization uses Rust std `to_le_bytes()`/`from_le_bytes()` -- no external crate needed. |
+| FMT-02 | Version field (1 byte) for forward compatibility | Simple u8 at offset 0x04; reject version != 1. Trivial to implement. |
+| FMT-03 | File table with metadata: name, sizes, offset, IV, HMAC, SHA-256 | Variable-length TOC entries (101 + name_length bytes each). UTF-8 filenames, length-prefixed. All field types are standard Rust primitives. |
+| FMT-04 | Little-endian for all multi-byte fields | Rust std: `u16::to_le_bytes()`, `u32::to_le_bytes()`, `u16::from_le_bytes()`, `u32::from_le_bytes()`. No external crate needed. |
+| ENC-01 | AES-256-CBC encryption per file | `aes 0.8.4` + `cbc 0.1.2` crates. Type alias: `type Aes256CbcEnc = cbc::Encryptor<aes::Aes256>`. Verified working. |
+| ENC-02 | HMAC-SHA-256 authentication (encrypt-then-MAC) per file | `hmac 0.12.1` + `sha2 0.10.9`. HMAC input = IV (16 bytes) \|\| ciphertext. Verified working. |
+| ENC-03 | Random 16-byte IV per file, stored in cleartext TOC | `rand 0.9.2`: `rand::rng().fill(&mut iv)`, with `use rand::Rng;` in scope (`fill` is an `Rng` trait method). ThreadRng is cryptographically secure (ChaCha-based with OS seeding). |
+| ENC-04 | Hardcoded 32-byte key | Const array `const KEY: [u8; 32] = [...]` in source. Same key for AES and HMAC in v1. |
+| ENC-05 | PKCS7 padding for AES-CBC | `cbc` crate handles PKCS7 via `encrypt_padded_mut::<Pkcs7>()`. Formula: `encrypted_size = ((compressed_size / 16) + 1) * 16`. Verified. |
+| CMP-01 | Gzip compression per file before encryption | `flate2 1.1.9`: `GzEncoder::new(Vec::new(), Compression::default())`. Use `GzBuilder::new().mtime(0)` for reproducible output in tests. |
+| CMP-02 | Per-file compression flag (skip for already-compressed files) | Repeatable `--no-compress <NAME>` CLI option plus extension-based auto-detection for already-compressed formats such as `.apk`, `.zip`, `.png`, `.jpg`, `.jpeg`, `.gz`, `.bz2`, `.xz`, `.mp4`, `.mp3`. |
+| INT-01 | SHA-256 checksum per file (verify after decompression) | `sha2 0.10.9`: `Sha256::digest(&original_data)`. Computed BEFORE compression. Stored in TOC entry. |
+| CLI-01 | Rust CLI utility for archive creation (Linux/macOS) | `clap 4.5.60` with derive API. Binary target in `src/main.rs`. Standard cargo build. |
+| CLI-02 | Pack multiple files (text + APK) into one archive | `pack` subcommand accepts `Vec<PathBuf>` input files + `-o` output path. Reads files into memory (per Out of Scope: no streaming). |
+| CLI-03 | Subcommands: pack, unpack, inspect | Three subcommands via clap `#[derive(Subcommand)]`. `inspect` reads header + TOC only, displays metadata without decrypting data blocks. |
+</phase_requirements>
+
+## Standard Stack
+
+### Core
+| Library | Version | Purpose | Why Standard |
+|---------|---------|---------|--------------|
+| `aes` | 0.8.4 | AES-256 block cipher | RustCrypto official. 96M+ downloads. Pure Rust with hardware acceleration (AES-NI). |
+| `cbc` | 0.1.2 | CBC mode of operation | RustCrypto official. Handles PKCS7 padding natively via `block_padding::Pkcs7`. |
+| `hmac` | 0.12.1 | HMAC-SHA-256 computation | RustCrypto official. Constant-time comparison via `verify_slice()`. |
+| `sha2` | 0.10.9 | SHA-256 hashing | RustCrypto official. Both one-shot (`Sha256::digest()`) and streaming APIs. |
+| `flate2` | 1.1.9 | Gzip compression/decompression | De facto standard. Uses miniz_oxide (pure Rust) by default. |
+| `clap` | 4.5.60 | CLI argument parsing | Industry standard. Derive API for subcommands. |
+| `rand` | 0.9.2 | Cryptographic random IV generation | `rand::rng()` returns ChaCha-based CSPRNG with OS seeding. |
+| `anyhow` | 1.0.102 | Error handling | Ergonomic `Result<T>` with context. Standard for CLI apps. |
+
+### Supporting
+| Library | Version | Purpose | When to Use |
+|---------|---------|---------|-------------|
+| (none -- std lib) | - | Little-endian serialization | `u16::to_le_bytes()`, `u32::from_le_bytes()` etc. Built into Rust std. |
+
+### Alternatives Considered
+| Instead of | Could Use | Tradeoff |
+|------------|-----------|----------|
+| `aes` 0.8 + `cbc` 0.1 (stable) | `aes` 0.9-rc + `cbc` 0.2-rc (RC) | RC versions have newer API but are pre-release. Stable versions are battle-tested and fully compatible. Use stable. |
+| `byteorder` crate | Rust std `to_le_bytes()`/`from_le_bytes()` | std is sufficient since Rust 1.32. No external crate needed. |
+| `ring` (Google) | RustCrypto stack | `ring` does not expose AES-CBC. It focuses on AEAD modes (AES-GCM). Not suitable for this format. |
+| `openssl` crate | RustCrypto stack | Links to C library. RustCrypto is pure Rust, no system dependencies. Simpler cross-compilation. |
+| `serde` + `bincode` | Manual binary serialization | Format spec requires exact byte layout. Manual serialization gives precise control over every byte. Serde/bincode add unnecessary abstraction for a fixed binary format. |
+
+**Installation:**
+```bash
+cargo init --name encrypted_archive
+cargo add aes@0.8 cbc@0.1 hmac@0.12 sha2@0.10 flate2@1.1 clap@4.5 --features clap/derive rand@0.9 anyhow@1.0
+```
+
+## Architecture Patterns
+
+### Recommended Project Structure
+```
+encrypted_archive/
+├── Cargo.toml
+├── src/
+│   ├── main.rs              # Entry point: clap CLI parsing, dispatch to commands
+│   ├── cli.rs               # Clap derive structs (Cli, Commands enum)
+│   ├── format.rs            # Binary format constants, header/TOC structs, serialization/deserialization
+│   ├── crypto.rs            # encrypt_file(), decrypt_file(), compute_hmac(), verify_hmac()
+│   ├── compression.rs       # compress(), decompress(), should_compress()
+│   ├── archive.rs           # pack(), unpack(), inspect() -- orchestration logic
+│   └── key.rs               # Hardcoded 32-byte key constant
+├── docs/
+│   └── FORMAT.md            # Binary format specification (already exists)
+└── tests/                   # Integration tests (Phase 3)
+```
+
+### Pattern 1: Pipeline Processing per File
+**What:** Each file goes through a sequential pipeline: hash -> compress -> pad+encrypt -> HMAC
+**When to use:** Always during `pack` operation
+**Example:**
+```rust
+// Source: Verified working pipeline from research validation
+use aes::cipher::{block_padding::Pkcs7, BlockEncryptMut, KeyIvInit};
+use hmac::{Hmac, Mac};
+use sha2::{Sha256, Digest};
+use flate2::write::GzEncoder;
+use flate2::Compression;
+use rand::Rng; // brings `fill()` into scope for `rand::rng()`
+use std::io::Write;
+
+type Aes256CbcEnc = cbc::Encryptor<aes::Aes256>;
+type HmacSha256 = Hmac<Sha256>;
+
+struct ProcessedFile {
+    name: String,
+    original_size: u32,
+    compressed_size: u32,
+    encrypted_size: u32,
+    iv: [u8; 16],
+    hmac: [u8; 32],
+    sha256: [u8; 32],
+    compression_flag: u8,
+    ciphertext: Vec<u8>,
+}
+
+fn process_file(name: &str, data: &[u8], key: &[u8; 32], compress: bool) -> ProcessedFile {
+    // Step 1: SHA-256 of original
+    let sha256: [u8; 32] = Sha256::digest(data).into();
+
+    // Step 2: Compress (optional)
+    let compressed = if compress {
+        let mut encoder = GzEncoder::new(Vec::new(), Compression::default());
+        encoder.write_all(data).unwrap();
+        encoder.finish().unwrap()
+    } else {
+        data.to_vec()
+    };
+
+    // Step 3: Generate random IV
+    let mut iv = [0u8; 16];
+    rand::rng().fill(&mut iv);
+
+    // Step 4: Encrypt with PKCS7 padding
+    let encrypted_size = ((compressed.len() / 16) + 1) * 16;
+    let mut buf = vec![0u8; encrypted_size];
+    buf[..compressed.len()].copy_from_slice(&compressed);
+    let ciphertext = Aes256CbcEnc::new(key.into(), &iv.into())
+        .encrypt_padded_mut::<Pkcs7>(&mut buf, compressed.len())
+        .unwrap()
+        .to_vec();
+
+    // Step 5: HMAC-SHA-256 over IV || ciphertext
+    let mut mac = HmacSha256::new_from_slice(key).unwrap();
+    mac.update(&iv);
+    mac.update(&ciphertext);
+    let hmac: [u8; 32] = mac.finalize().into_bytes().into();
+
+    ProcessedFile {
+        name: name.to_string(),
+        original_size: data.len() as u32,
+        compressed_size: compressed.len() as u32,
+        encrypted_size: encrypted_size as u32,
+        iv,
+        hmac,
+        sha256,
+        compression_flag: if compress { 1 } else { 0 },
+        ciphertext,
+    }
+}
+```
+
+### Pattern 2: Two-Pass Archive Writing
+**What:** First pass processes all files to compute sizes and offsets; second pass writes the archive sequentially.
+**When to use:** Always during `pack`. The TOC must contain `data_offset` for each file, but data blocks come after the TOC. You must know TOC size before writing data blocks.
+**Example:**
+```rust
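+/// Returns (toc_size, data_offset for each file). `ProcessedFile` has no
+/// offset field; each offset is passed separately to `write_toc_entry`.
+fn compute_offsets(files: &[ProcessedFile]) -> (u32, Vec<u32>) {
+    let header_size: u32 = 40;
+
+    // TOC size: each entry is 101 fixed bytes + UTF-8 filename bytes
+    let toc_size: u32 = files.iter()
+        .map(|f| 101 + f.name.len() as u32)
+        .sum();
+
+    let toc_offset = header_size;
+    let mut data_offset = toc_offset + toc_size;
+
+    // Assign data offsets sequentially, in TOC order
+    let mut offsets = Vec::with_capacity(files.len());
+    for file in files {
+        offsets.push(data_offset);
+        data_offset += file.encrypted_size;
+        // padding_after = 0 in Phase 2 (no decoy padding)
+    }
+    (toc_size, offsets)
+}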
+```
+
+### Pattern 3: CLI Subcommand Dispatch
+**What:** Use clap derive API with an enum of subcommands
+**When to use:** Always for the CLI entry point
+**Example:**
+```rust
+// Source: Verified working clap derive pattern from research validation
+use clap::{Parser, Subcommand};
+use std::path::PathBuf;
+
+#[derive(Parser)]
+#[command(name = "encrypted_archive")]
+#[command(about = "Custom encrypted archive tool")]
+struct Cli {
+    #[command(subcommand)]
+    command: Commands,
+}
+
+#[derive(Subcommand)]
+enum Commands {
+    /// Pack files into an encrypted archive
+    Pack {
+        /// Input files to archive
+        #[arg(required = true)]
+        files: Vec<PathBuf>,
+        /// Output archive file
+        #[arg(short, long)]
+        output: PathBuf,
+        /// Disable compression for specified files
+        #[arg(long)]
+        no_compress: Vec<String>,
+    },
+    /// Unpack an encrypted archive (for testing)
+    Unpack {
+        /// Archive file to unpack
+        archive: PathBuf,
+        /// Output directory
+        #[arg(short, long, default_value = ".")]
+        output_dir: PathBuf,
+    },
+    /// Inspect archive metadata without decrypting
+    Inspect {
+        /// Archive file to inspect
+        archive: PathBuf,
+    },
+}
+```
+
+### Anti-Patterns to Avoid
+- **Streaming writes without knowing offsets:** The TOC contains `data_offset` for each file. You MUST compute all offsets before writing the TOC. Process all files first, then serialize.
+- **Using serde/bincode for binary format:** The format spec requires exact byte-level control. Manual serialization with `to_le_bytes()` is correct and simpler.
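+- **Concatenating the whole archive into one buffer:** Keep each file's ciphertext separate and write the header, TOC, and data blocks to the output in order. Each file is processed independently; only the per-file ciphertexts are held in memory until the TOC has been written.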
+- **Reusing IVs:** Each file MUST have a unique random IV. Never reuse IVs across files or archive creations.
+- **MAC-then-encrypt:** The spec mandates encrypt-then-MAC. HMAC MUST be computed over `IV || ciphertext`, NOT over plaintext.
+
+## Don't Hand-Roll
+
+| Problem | Don't Build | Use Instead | Why |
+|---------|-------------|-------------|-----|
+| AES-256-CBC encryption | Custom AES implementation | `aes 0.8` + `cbc 0.1` crates | Side-channel resistance, hardware acceleration, audited |
+| PKCS7 padding | Manual padding logic | `cbc` crate's `Pkcs7` padding (via `block_padding`) | Off-by-one errors in padding are security-critical |
+| HMAC-SHA-256 | Manual HMAC construction | `hmac 0.12` crate | Constant-time comparison, correct key scheduling |
+| SHA-256 hashing | Custom hash | `sha2 0.10` crate | Correctness, performance, hardware acceleration |
+| Gzip compression | Custom deflate | `flate2 1.1` crate | RFC 1952 compliance, performance, battle-tested |
+| CLI argument parsing | Manual arg parsing | `clap 4.5` with derive | Validation, help text, error messages, subcommands |
+| Random IV generation | Custom RNG | `rand 0.9` with `rand::rng()` | CSPRNG with OS seeding, no bias |
+| Little-endian serialization | Manual byte shifting | Rust std `to_le_bytes()`/`from_le_bytes()` | Built-in, zero-cost, correct |
+
+**Key insight:** Every component in the encryption pipeline is security-sensitive. Using audited, well-tested crates for crypto operations is not optional -- hand-rolled crypto is the single highest-risk anti-pattern in this domain.
+
+## Common Pitfalls
+
+### Pitfall 1: Buffer Sizing for `encrypt_padded_mut`
+**What goes wrong:** `PadError` at runtime because the buffer is too small for PKCS7-padded output.
+**Why it happens:** PKCS7 ALWAYS adds at least 1 byte. When input is a multiple of 16, a full 16-byte padding block is added. Formula: `((input_len / 16) + 1) * 16`.
+**How to avoid:** Always allocate `encrypted_size = ((compressed_size / 16) + 1) * 16` bytes for the encryption buffer. Copy compressed data to the start, then call `encrypt_padded_mut` with `compressed_size` as the plaintext length.
+**Warning signs:** `PadError` or `unwrap()` panic during encryption.
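+
+A minimal std-only sketch of the buffer-size arithmetic. The helper name `pkcs7_padded_len` is hypothetical; it only sizes the buffer handed to `encrypt_padded_mut`, while the padding itself stays in the `cbc` crate.
+
+```rust
+/// AES block size in bytes.
+const BLOCK: usize = 16;
+
+/// Hypothetical helper: ciphertext length after PKCS7 padding `len` bytes.
+/// PKCS7 always appends 1..=16 bytes, so an exact multiple of 16 gains a full block.
+fn pkcs7_padded_len(len: usize) -> usize {
+    (len / BLOCK + 1) * BLOCK
+}
+
+fn main() {
+    for len in [0usize, 1, 15, 16, 17, 31, 32] {
+        let padded = pkcs7_padded_len(len);
+        // Number of padding bytes PKCS7 would append (each byte holds this value).
+        let pad = padded - len;
+        assert!((1..=BLOCK).contains(&pad));
+        println!("{len:>2} bytes -> {padded:>2} bytes (pad {pad})");
+    }
+}
+```
+
+For example, a 16-byte compressed payload needs a 32-byte buffer, not 16.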
+
+### Pitfall 2: Gzip Non-Determinism in Tests
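+**What goes wrong:** Golden tests that compare `compressed_size` or raw archive bytes break between runs or machines for identical input.
+**Why it happens:** The gzip header carries a timestamp (`mtime`) and an OS byte. The deflate stream also depends on the compression level and on which flate2 backend is compiled in (miniz_oxide by default, zlib variants via Cargo features). Separately, every `pack` draws a fresh random IV, so ciphertext and HMAC differ on every run even when compression is deterministic.
+**How to avoid:** Build the encoder with `GzBuilder::new().mtime(0).write(Vec::new(), Compression::default())` so the timestamp is pinned explicitly, and keep the level and backend fixed. Prefer round-trip assertions (unpack, then compare SHA-256) over byte-exact comparisons; byte-exact archive fixtures additionally require injecting a fixed IV in test builds.
+**Warning signs:** `compressed_size` or archive bytes change between test runs for identical input.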
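+
+When a golden test does break, it helps to see which gzip header field changed. A minimal std-only sketch that decodes the fixed 10-byte member header from RFC 1952 (ID1, ID2, CM, FLG, MTIME as little-endian u32, XFL, OS). `GzipHeader` and `parse_gzip_header` are hypothetical names, and the sample bytes are hand-built rather than produced by flate2.
+
+```rust
+/// Fixed part of a gzip member header (RFC 1952, section 2.3).
+#[derive(Debug, PartialEq)]
+struct GzipHeader {
+    flags: u8,  // FLG: FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT bits
+    mtime: u32, // MTIME: Unix timestamp, little-endian; 0 means "not available"
+    xfl: u8,    // XFL: extra flags (compression-level hint)
+    os: u8,     // OS: operating system identifier (255 = unknown)
+}
+
+/// Hypothetical helper: decode the first 10 bytes of a gzip stream.
+fn parse_gzip_header(data: &[u8]) -> Option<GzipHeader> {
+    let h = data.get(..10)?;
+    // ID1/ID2 magic and CM = 8 (deflate)
+    if h[0] != 0x1f || h[1] != 0x8b || h[2] != 8 {
+        return None;
+    }
+    Some(GzipHeader {
+        flags: h[3],
+        mtime: u32::from_le_bytes([h[4], h[5], h[6], h[7]]),
+        xfl: h[8],
+        os: h[9],
+    })
+}
+
+fn main() {
+    // Hand-built header: no flags, mtime = 0, XFL = 0, OS = 255.
+    let sample = [0x1f, 0x8b, 8, 0, 0, 0, 0, 0, 0, 255];
+    let header = parse_gzip_header(&sample).expect("valid gzip header");
+    println!("{header:?}");
+    assert_eq!(header.mtime, 0);
+}
+```
+
+Comparing the parsed headers of two runs separates header drift from changes in the deflate stream itself.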
+
+### Pitfall 3: Incorrect HMAC Scope
+**What goes wrong:** HMAC computed over wrong data (just ciphertext, or including TOC metadata).
+**Why it happens:** Ambiguity about what "encrypt-then-MAC" covers.
+**How to avoid:** FORMAT.md is explicit: `HMAC_input = IV (16 bytes) || ciphertext (encrypted_size bytes)`, nothing else. That is the IV stored in the TOC entry followed by the ciphertext from the data block; neither TOC metadata nor plaintext is included.
+**Warning signs:** HMAC verification failures in other decoders (Kotlin, shell).
+
+### Pitfall 4: TOC Offset Calculation Errors
+**What goes wrong:** Data blocks written at wrong offsets; decoders read garbage.
+**Why it happens:** Variable-length filename fields make TOC entry sizes differ. Off-by-one in offset arithmetic.
+**How to avoid:** Use the formula from FORMAT.md: `entry_size = 101 + name_length`. Total TOC size = sum of all entry sizes. First data block offset = `toc_offset + toc_size`. Each subsequent data block offset = previous offset + previous `encrypted_size`.
+**Warning signs:** `inspect` command shows corrupted filenames or impossible sizes.
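+
+A worked version of the offset formula, as a std-only sketch. `layout` is a hypothetical helper that takes `(filename, encrypted_size)` pairs instead of full `ProcessedFile` values; the 101-byte constant is derived from the TOC entry fields in FORMAT.md.
+
+```rust
+/// Fixed bytes in every TOC entry:
+/// name_length(2) + original_size(4) + compressed_size(4) + encrypted_size(4)
+/// + data_offset(4) + iv(16) + hmac(32) + sha256(32) + compression_flag(1) + padding_after(2)
+const TOC_ENTRY_FIXED: u32 = 2 + 4 + 4 + 4 + 4 + 16 + 32 + 32 + 1 + 2;
+const HEADER_SIZE: u32 = 40;
+
+/// Hypothetical helper: returns (toc_size, data_offset per file), no gaps between blocks.
+fn layout(files: &[(&str, u32)]) -> (u32, Vec<u32>) {
+    let toc_size: u32 = files
+        .iter()
+        .map(|(name, _)| TOC_ENTRY_FIXED + name.len() as u32) // len() = UTF-8 bytes
+        .sum();
+    let mut next = HEADER_SIZE + toc_size; // toc_offset + toc_size
+    let offsets = files
+        .iter()
+        .map(|&(_, encrypted_size)| {
+            let offset = next;
+            next += encrypted_size;
+            offset
+        })
+        .collect();
+    (toc_size, offsets)
+}
+
+fn main() {
+    assert_eq!(TOC_ENTRY_FIXED, 101);
+    let (toc_size, offsets) = layout(&[("hello.txt", 32), ("app.apk", 4096)]);
+    println!("toc_size = {toc_size}, offsets = {offsets:?}");
+}
+```
+
+For `hello.txt` (9-byte name, 32-byte ciphertext) and `app.apk` (7-byte name), the TOC is 110 + 108 = 218 bytes, so the data blocks start at 40 + 218 = 258 and 258 + 32 = 290.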
+
+### Pitfall 5: Endianness Errors
+**What goes wrong:** Multi-byte fields written in big-endian or native-endian instead of little-endian.
+**Why it happens:** Forgetting to convert, or using wrong conversion function.
+**How to avoid:** Always use `value.to_le_bytes()` when writing and `u32::from_le_bytes([b0, b1, b2, b3])` when reading. Never use `to_ne_bytes()` or `to_be_bytes()`.
+**Warning signs:** Values look "swapped" when inspecting hex dump. Shell decoder reads wrong numbers.
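+
+A short std-only illustration of what "swapped" looks like, using the value 40 (a typical `toc_offset`). `read_u32_le` is a hypothetical helper equivalent to the inline `from_le_bytes` calls in `read_header`.
+
+```rust
+/// Hypothetical helper: read a little-endian u32 starting at `offset`.
+fn read_u32_le(buf: &[u8], offset: usize) -> u32 {
+    u32::from_le_bytes([buf[offset], buf[offset + 1], buf[offset + 2], buf[offset + 3]])
+}
+
+fn main() {
+    let toc_offset: u32 = 40; // 0x0000_0028
+    // Little-endian (required by FORMAT.md): least significant byte first.
+    assert_eq!(toc_offset.to_le_bytes(), [0x28, 0x00, 0x00, 0x00]);
+    // Big-endian: the same bytes reversed, the "swapped" look in a hex dump.
+    assert_eq!(toc_offset.to_be_bytes(), [0x00, 0x00, 0x00, 0x28]);
+
+    // Header bytes 8..12 holding toc_offset = 40.
+    let header = [0u8, 0, 0, 0, 0, 0, 0, 0, 0x28, 0, 0, 0];
+    assert_eq!(read_u32_le(&header, 8), 40);
+    // A decoder that wrongly reads big-endian gets 0x2800_0000 (671,088,640).
+    assert_eq!(u32::from_be_bytes([0x28, 0, 0, 0]), 0x2800_0000);
+}
+```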
+
+### Pitfall 6: UTF-8 Filename Length vs. Character Count
+**What goes wrong:** `name_length` field stores character count instead of byte count.
+**Why it happens:** Confusion between `str.len()` (byte count, correct) and `str.chars().count()` (character count, wrong).
+**How to avoid:** FORMAT.md specifies `name_length` as "Filename length in bytes (UTF-8 encoded byte count)". In Rust, `String::len()` returns byte count, which is correct.
+**Warning signs:** Non-ASCII filenames (Cyrillic) cause parsing errors in decoders.
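+
+A std-only sketch contrasting byte and character counts for a Cyrillic filename. `name_length` is a hypothetical helper that also refuses names too long for the u16 field instead of truncating them.
+
+```rust
+/// Fixed bytes per TOC entry, so entry_size = 101 + name_length.
+const TOC_ENTRY_FIXED: usize = 101;
+
+/// Hypothetical helper: `name_length` as FORMAT.md defines it (UTF-8 byte
+/// count stored as u16), or None if the name cannot be represented.
+fn name_length(name: &str) -> Option<u16> {
+    let bytes = name.len(); // str::len() counts bytes, not characters
+    if bytes > u16::MAX as usize { None } else { Some(bytes as u16) }
+}
+
+fn main() {
+    let name = "данные.txt"; // Cyrillic "data" + ASCII extension
+    assert_eq!(name.chars().count(), 10); // characters: the wrong value to store
+    assert_eq!(name_length(name), Some(16)); // 6 Cyrillic x 2 bytes + 4 ASCII bytes
+    assert_eq!(TOC_ENTRY_FIXED + name.len(), 117); // bytes this entry occupies
+}
+```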
+
+### Pitfall 7: Incorrect Flags Byte
+**What goes wrong:** Archive header has wrong flags, decoders misinterpret format features.
+**Why it happens:** Phase 2 uses only bit 0 (compression). Bits 1-7 must be zero.
+**How to avoid:** Set `flags = 0x01` when any file uses compression (global flag), `flags = 0x00` when no files use compression. Bits 1-3 are for Phase 6 obfuscation features. Bits 4-7 MUST be zero.
+**Warning signs:** Decoders reject archive due to unknown flags.
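+
+The writer and reader rules can be sketched with std only. `header_flags` and `flags_acceptable` are hypothetical names; the reader check mirrors the `flags & 0xF0` test in `read_header`, which accepts bits 1-3 so that Phase 6 archives still parse.
+
+```rust
+/// Bit 0: at least one file in the archive is gzip-compressed.
+const FLAG_COMPRESSION: u8 = 0x01;
+/// Bits 4-7: must be zero in every v1 archive (bits 1-3 are reserved for Phase 6).
+const FLAGS_MUST_BE_ZERO: u8 = 0xF0;
+
+/// Hypothetical helper: Phase 2 global flags derived from per-file compression flags.
+fn header_flags(per_file_compressed: &[bool]) -> u8 {
+    if per_file_compressed.iter().any(|&c| c) {
+        FLAG_COMPRESSION
+    } else {
+        0x00
+    }
+}
+
+/// Hypothetical reader-side check: reject any of bits 4-7.
+fn flags_acceptable(flags: u8) -> bool {
+    (flags & FLAGS_MUST_BE_ZERO) == 0
+}
+
+fn main() {
+    // Text compressed, APK stored raw: the global bit is still set.
+    assert_eq!(header_flags(&[true, false]), 0x01);
+    assert_eq!(header_flags(&[false, false]), 0x00);
+    assert!(flags_acceptable(0x01));
+    assert!(!flags_acceptable(0x10)); // bit 4 set
+}
+```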
+
+## Code Examples
+
+Verified patterns from official sources and research validation:
+
+### Binary Format Serialization (Header)
+```rust
+// Source: FORMAT.md Section 4 + Rust std library
+fn write_header(
+    writer: &mut impl std::io::Write,
+    file_count: u16,
+    toc_offset: u32,
+    toc_size: u32,
+    flags: u8,
+) -> std::io::Result<()> {
+    // Magic bytes
+    writer.write_all(&[0x00, 0xEA, 0x72, 0x63])?;
+    // Version
+    writer.write_all(&[0x01])?;
+    // Flags
+    writer.write_all(&[flags])?;
+    // File count (LE)
+    writer.write_all(&file_count.to_le_bytes())?;
+    // TOC offset (LE)
+    writer.write_all(&toc_offset.to_le_bytes())?;
+    // TOC size (LE)
+    writer.write_all(&toc_size.to_le_bytes())?;
+    // TOC IV (zero-filled, TOC not encrypted in Phase 2)
+    writer.write_all(&[0u8; 16])?;
+    // Reserved
+    writer.write_all(&[0u8; 8])?;
+    Ok(())
+}
+```
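+
+The offsets implied by `write_header` can be pinned as constants and checked against the 40-byte total. This std-only sketch uses hypothetical constant names and repeats `write_header` in compact form so it compiles on its own, then reads each field back at its offset.
+
+```rust
+use std::io::Write;
+
+// Header field offsets in write order (hypothetical constant names).
+const OFF_MAGIC: usize = 0x00; // [u8; 4]
+const OFF_VERSION: usize = 0x04; // u8
+const OFF_FLAGS: usize = 0x05; // u8
+const OFF_FILE_COUNT: usize = 0x06; // u16 LE
+const OFF_TOC_OFFSET: usize = 0x08; // u32 LE
+const OFF_TOC_SIZE: usize = 0x0C; // u32 LE
+const OFF_TOC_IV: usize = 0x10; // [u8; 16]
+const OFF_RESERVED: usize = 0x20; // [u8; 8]
+const HEADER_SIZE: usize = 0x28; // 40 bytes total
+
+/// Compact copy of `write_header` so this block is self-contained.
+fn write_header(w: &mut impl Write, file_count: u16, toc_offset: u32, toc_size: u32, flags: u8) -> std::io::Result<()> {
+    w.write_all(&[0x00, 0xEA, 0x72, 0x63])?; // magic
+    w.write_all(&[0x01, flags])?; // version, flags
+    w.write_all(&file_count.to_le_bytes())?;
+    w.write_all(&toc_offset.to_le_bytes())?;
+    w.write_all(&toc_size.to_le_bytes())?;
+    w.write_all(&[0u8; 16])?; // TOC IV (zero in Phase 2)
+    w.write_all(&[0u8; 8]) // reserved
+}
+
+fn main() {
+    let mut buf: Vec<u8> = Vec::new();
+    write_header(&mut buf, 2, 40, 218, 0x01).expect("writing to a Vec cannot fail");
+    assert_eq!(buf.len(), HEADER_SIZE);
+    assert_eq!(OFF_RESERVED + 8, HEADER_SIZE);
+    assert_eq!(buf[OFF_MAGIC..OFF_VERSION], [0x00, 0xEA, 0x72, 0x63]);
+    assert_eq!(buf[OFF_VERSION], 1);
+    assert_eq!(buf[OFF_FLAGS], 0x01);
+    assert_eq!(u16::from_le_bytes([buf[OFF_FILE_COUNT], buf[OFF_FILE_COUNT + 1]]), 2);
+    let read_u32 = |o: usize| u32::from_le_bytes([buf[o], buf[o + 1], buf[o + 2], buf[o + 3]]);
+    assert_eq!(read_u32(OFF_TOC_OFFSET), 40);
+    assert_eq!(read_u32(OFF_TOC_SIZE), 218);
+    // TOC IV and reserved bytes are all zero in Phase 2.
+    assert!(buf[OFF_TOC_IV..HEADER_SIZE].iter().all(|&b| b == 0));
+}
+```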
+
+### TOC Entry Serialization
+```rust
+// Source: FORMAT.md Section 5
+fn write_toc_entry(
+    writer: &mut impl std::io::Write,
+    file: &ProcessedFile,
+    data_offset: u32,
+) -> std::io::Result<()> {
+    let name_bytes = file.name.as_bytes();
+    writer.write_all(&(name_bytes.len() as u16).to_le_bytes())?;
+    writer.write_all(name_bytes)?;
+    writer.write_all(&file.original_size.to_le_bytes())?;
+    writer.write_all(&file.compressed_size.to_le_bytes())?;
+    writer.write_all(&file.encrypted_size.to_le_bytes())?;
+    writer.write_all(&data_offset.to_le_bytes())?;
+    writer.write_all(&file.iv)?;
+    writer.write_all(&file.hmac)?;
+    writer.write_all(&file.sha256)?;
+    writer.write_all(&[file.compression_flag])?;
+    writer.write_all(&0u16.to_le_bytes())?; // padding_after = 0
+    Ok(())
+}
+```
+
+### Inspect Command (Read Header + TOC Only)
+```rust
+// Source: FORMAT.md Section 10, steps 1-4
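+use std::io::Read;
+
+/// Fixed-header fields needed by `inspect` (TOC IV and reserved bytes are skipped).
+#[derive(Debug)]
+struct Header {
+    version: u8,
+    flags: u8,
+    file_count: u16,
+    toc_offset: u32,
+    toc_size: u32,
+}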
+
+fn read_header(reader: &mut impl Read) -> anyhow::Result<Header> {
+    let mut buf = [0u8; 40];
+    reader.read_exact(&mut buf)?;
+
+    // Verify magic
+    anyhow::ensure!(
+        buf[0..4] == [0x00, 0xEA, 0x72, 0x63],
+        "Invalid magic bytes"
+    );
+
+    let version = buf[4];
+    anyhow::ensure!(version == 1, "Unsupported version: {}", version);
+
+    let flags = buf[5];
+    anyhow::ensure!(flags & 0xF0 == 0, "Unknown flags set: 0x{:02X}", flags);
+
+    let file_count = u16::from_le_bytes([buf[6], buf[7]]);
+    let toc_offset = u32::from_le_bytes([buf[8], buf[9], buf[10], buf[11]]);
+    let toc_size = u32::from_le_bytes([buf[12], buf[13], buf[14], buf[15]]);
+
+    Ok(Header { version, flags, file_count, toc_offset, toc_size })
+}
+```
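+
+After the header, `inspect` walks `file_count` TOC entries starting at `toc_offset`. A minimal std-only sketch of parsing one entry in the field order used by `write_toc_entry`. `TocEntry`, `Cursor`, and `read_toc_entry` are hypothetical names, and errors are plain `String`s rather than `anyhow` so the block runs without dependencies.
+
+```rust
+/// One parsed TOC entry (field order as written by `write_toc_entry`).
+#[derive(Debug, PartialEq)]
+struct TocEntry {
+    name: String,
+    original_size: u32,
+    compressed_size: u32,
+    encrypted_size: u32,
+    data_offset: u32,
+    iv: [u8; 16],
+    hmac: [u8; 32],
+    sha256: [u8; 32],
+    compression_flag: u8,
+    padding_after: u16,
+}
+
+/// Bounds-checked cursor over the TOC bytes.
+struct Cursor<'a> {
+    buf: &'a [u8],
+    pos: usize,
+}
+
+impl<'a> Cursor<'a> {
+    /// Take the next `n` bytes, or fail if the TOC is truncated.
+    fn take(&mut self, n: usize) -> Result<&'a [u8], String> {
+        let buf: &'a [u8] = self.buf;
+        let end = match self.pos.checked_add(n) {
+            Some(end) if end <= buf.len() => end,
+            _ => return Err(format!("TOC truncated at byte {}", self.pos)),
+        };
+        let bytes = &buf[self.pos..end];
+        self.pos = end;
+        Ok(bytes)
+    }
+    fn u16_le(&mut self) -> Result<u16, String> {
+        let b = self.take(2)?;
+        Ok(u16::from_le_bytes([b[0], b[1]]))
+    }
+    fn u32_le(&mut self) -> Result<u32, String> {
+        let b = self.take(4)?;
+        Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
+    }
+    fn array<const N: usize>(&mut self) -> Result<[u8; N], String> {
+        let mut out = [0u8; N];
+        out.copy_from_slice(self.take(N)?);
+        Ok(out)
+    }
+}
+
+/// Parse one entry and advance the cursor past it (fields evaluate in order).
+fn read_toc_entry(c: &mut Cursor<'_>) -> Result<TocEntry, String> {
+    let name_len = c.u16_le()? as usize;
+    let name = std::str::from_utf8(c.take(name_len)?)
+        .map_err(|e| format!("filename is not valid UTF-8: {e}"))?
+        .to_string();
+    Ok(TocEntry {
+        name,
+        original_size: c.u32_le()?,
+        compressed_size: c.u32_le()?,
+        encrypted_size: c.u32_le()?,
+        data_offset: c.u32_le()?,
+        iv: c.array::<16>()?,
+        hmac: c.array::<32>()?,
+        sha256: c.array::<32>()?,
+        compression_flag: c.take(1)?[0],
+        padding_after: c.u16_le()?,
+    })
+}
+
+fn main() {
+    // Build one entry by hand for a single-file archive containing "a.txt".
+    let name = "a.txt";
+    let mut bytes = Vec::new();
+    bytes.extend_from_slice(&(name.len() as u16).to_le_bytes());
+    bytes.extend_from_slice(name.as_bytes());
+    for v in [5u32, 25, 32, 146] {
+        bytes.extend_from_slice(&v.to_le_bytes()); // sizes, then data_offset = 40 + 106
+    }
+    bytes.extend_from_slice(&[0x11; 16]); // iv
+    bytes.extend_from_slice(&[0x22; 32]); // hmac
+    bytes.extend_from_slice(&[0x33; 32]); // sha256
+    bytes.push(1); // compression_flag
+    bytes.extend_from_slice(&0u16.to_le_bytes()); // padding_after
+    assert_eq!(bytes.len(), 101 + name.len());
+
+    let mut cursor = Cursor { buf: &bytes, pos: 0 };
+    let entry = read_toc_entry(&mut cursor).expect("well-formed entry");
+    assert_eq!(entry.name, "a.txt");
+    assert_eq!(entry.data_offset, 146);
+    assert_eq!(cursor.pos, bytes.len());
+}
+```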
+
+### Compression Decision Heuristic
+```rust
+// Source: FORMAT.md Section 8 recommendation
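+fn should_compress(filename: &str, no_compress_list: &[String]) -> bool {
+    // Explicit exclusion from CLI (suffix match on the name)
+    if no_compress_list.iter().any(|nc| filename.ends_with(nc.as_str())) {
+        return false;
+    }
+    // Auto-detect already-compressed formats by extension (case-insensitive).
+    // Path::extension() returns None for names without a dot, e.g. "zip".
+    let ext = std::path::Path::new(filename)
+        .extension()
+        .and_then(|e| e.to_str())
+        .map(|e| e.to_ascii_lowercase())
+        .unwrap_or_default();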
+    !matches!(
+        ext.as_str(),
+        "apk" | "zip" | "gz" | "bz2" | "xz" | "zst"
+        | "png" | "jpg" | "jpeg" | "gif" | "webp"
+        | "mp4" | "mp3" | "aac" | "ogg" | "flac"
+        | "7z" | "rar" | "jar"
+    )
+}
+```
+
+## State of the Art
+
+| Old Approach | Current Approach | When Changed | Impact |
+|--------------|------------------|--------------|--------|
+| `block-modes 0.8` crate | `cbc 0.1` crate (separate crate per mode) | 2022 | `block-modes` is deprecated. Use `cbc` directly. |
+| `rand::thread_rng()` | `rand::rng()` | rand 0.9 (2025) | Function renamed. Same underlying ChaCha CSPRNG. |
+| `GenericArray` for keys/IVs | `.into()` conversion from `[u8; N]` | aes/cbc 0.8/0.1 | Can pass `&key.into()` directly from fixed arrays. |
+| `byteorder` crate | Rust std `to_le_bytes()`/`from_le_bytes()` | Rust 1.32 (2019) | No external crate needed for endian conversion. |
+
+**Deprecated/outdated:**
+- `block-modes` crate: Replaced by individual mode crates (`cbc`, `ecb`, `cfb`, `ofb`). Do NOT use `block-modes`.
+- `rand::thread_rng()`: Renamed to `rand::rng()` in 0.9. The old name survives only as a deprecated alias; use `rand::rng()` in new code.
+- `crypto-mac` crate: Merged into `digest` 0.10. Use `hmac 0.12` which uses `digest 0.10` internally.
+
+## Open Questions
+
+1. **Hardcoded key value**
+   - What we know: The key is 32 bytes, hardcoded, shared across all decoders.
+   - What's unclear: The specific key bytes are not defined in FORMAT.md (only the worked example uses `00 01 02 ... 1F`).
+   - Recommendation: Define a non-trivial key constant in `src/key.rs`. The planner should decide the actual key bytes or generate them randomly once. The worked example key is fine for testing but should be replaced for production.
+
+2. **Error handling strategy for `unpack`**
+   - What we know: FORMAT.md says "MUST reject" on HMAC failure, "MUST fail" on bad version.
+   - What's unclear: Should `unpack` abort on first file error, or continue extracting other files?
+   - Recommendation: Abort on header/TOC errors. For per-file errors (HMAC mismatch, SHA-256 mismatch), report the error but continue extracting remaining files (with a non-zero exit code at the end).
+
+3. **Maximum file size constraint (u32)**
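+   - What we know: `original_size`, `compressed_size`, `encrypted_size` are all u32 (max 4,294,967,295 bytes, about 4 GiB). `data_offset` is also u32, so every data block must start below that limit, which caps the whole archive, not just each file.
+   - What's unclear: Should the archiver check and reject oversized inputs?
+   - Recommendation: Yes. Validate sizes and offsets during `pack` and produce a clear error if any value would exceed `u32::MAX`, rather than truncating with `as u32`. Remember that `encrypted_size` is up to 16 bytes larger than `compressed_size`. This is acceptable given the Out of Scope note ("files fit in memory").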
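+
+A minimal std-only sketch of the recommended validation. `to_u32` and `check_data_offsets` are hypothetical helpers; lengths are taken as `u64` so the arithmetic cannot overflow on any target, and the 40-byte header size comes from FORMAT.md.
+
+```rust
+/// Hypothetical helper: narrow a byte count to a FORMAT.md v1 u32 field,
+/// failing loudly instead of silently truncating with `as u32`.
+fn to_u32(len: u64, what: &str) -> Result<u32, String> {
+    if len > u64::from(u32::MAX) {
+        Err(format!("{what} is {len} bytes; FORMAT.md v1 fields are u32 (max {})", u32::MAX))
+    } else {
+        Ok(len as u32)
+    }
+}
+
+/// Hypothetical pre-flight check: each block's start offset and size must fit u32.
+fn check_data_offsets(toc_size: u64, encrypted_sizes: &[u64]) -> Result<(), String> {
+    let mut offset = 40 + toc_size; // header, then TOC
+    for (i, &size) in encrypted_sizes.iter().enumerate() {
+        to_u32(offset, &format!("data_offset of file {i}"))?;
+        to_u32(size, &format!("encrypted_size of file {i}"))?;
+        offset += size;
+    }
+    Ok(())
+}
+
+fn main() {
+    // A 5 GiB input is rejected before any work is done.
+    assert!(to_u32(5u64 << 30, "app.apk").is_err());
+    assert_eq!(to_u32(4096, "app.apk"), Ok(4096));
+    // Two small files starting at offsets 258 and 290 fit easily.
+    assert!(check_data_offsets(218, &[32, 4096]).is_ok());
+}
+```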
+
+## Sources
+
+### Primary (HIGH confidence)
+- `docs/FORMAT.md` v1.0 -- The normative specification for the binary format. All byte offsets, field sizes, and pipeline steps are from this document.
+- `docs.rs/aes/0.8.4` -- AES crate API documentation
+- `docs.rs/cbc/0.1.2` -- CBC mode crate API documentation and usage examples
+- `docs.rs/hmac/0.12.1` -- HMAC crate API documentation and usage examples
+- `docs.rs/sha2/0.10.9` -- SHA-2 crate API documentation
+- `docs.rs/flate2/1.1.9` -- flate2 crate API documentation (GzEncoder, GzDecoder, GzBuilder)
+- `docs.rs/clap/4.5.60` -- Clap CLI crate documentation
+- `docs.rs/rand/0.9.2` -- Rand crate documentation
+- **Research validation:** Full pipeline (compress -> encrypt -> HMAC -> verify -> decrypt -> decompress -> verify) was compiled and executed successfully as a Rust program during this research.
+
+### Secondary (MEDIUM confidence)
+- `crates.io` version listings -- Latest stable versions verified via `cargo search` and crates.io API
+- `rust-random.github.io/book` -- Rand book confirming ThreadRng is ChaCha-based CSPRNG
+
+### Tertiary (LOW confidence)
+- None. All findings are verified against official documentation and compilation tests.
+
+## Metadata
+
+**Confidence breakdown:**
+- Standard stack: HIGH -- All crates verified via `cargo check`, full pipeline compiled and executed
+- Architecture: HIGH -- Follows standard Rust CLI patterns; FORMAT.md provides exact byte-level specification
+- Pitfalls: HIGH -- Common issues identified from official docs, GitHub issues, and practical validation
+
+**Research date:** 2026-02-24
+**Valid until:** 2026-04-24 (stable crates, slow-moving ecosystem)