docs(02): research phase domain — Rust crypto stack, binary format, CLI patterns

NikitolProject
2026-02-24 23:44:54 +03:00
parent 9125b388da
commit cc7ff6db10

# Phase 2: Core Archiver - Research
**Researched:** 2026-02-24
**Domain:** Rust CLI binary with custom binary format, AES-256-CBC encryption, gzip compression, HMAC-SHA-256 authentication
**Confidence:** HIGH
## Summary
Phase 2 implements the core Rust CLI archiver from scratch (greenfield -- no existing source code). The tool must produce archives matching the FORMAT.md specification (v1) exactly: 40-byte fixed header, variable-length TOC with per-file metadata, and encrypted data blocks. The pipeline for each file is: SHA-256 hash -> gzip compress (optional) -> PKCS7 pad -> AES-256-CBC encrypt -> HMAC-SHA-256 authenticate.
The Rust ecosystem has mature, well-tested crates for every component: `aes` + `cbc` for encryption, `hmac` + `sha2` for authentication and hashing, `flate2` for gzip, `clap` for CLI, `rand` for IV generation. All stable versions are compatible and compile together (verified). The full crypto pipeline (compress -> encrypt -> HMAC -> verify -> decrypt -> decompress -> verify SHA-256) was validated as a working Rust program during this research.
**Primary recommendation:** Use stable RustCrypto crates (aes 0.8, cbc 0.1, hmac 0.12, sha2 0.10) rather than the 0.9/0.2/0.13/0.11 release candidates. The stable versions are battle-tested, have extensive documentation, and all compile together with Rust 1.93. Structure the project with clear module separation: `cli`, `format`, `crypto`, `compression`, `archive` (pack/unpack/inspect logic).
<phase_requirements>
## Phase Requirements
| ID | Description | Research Support |
|----|-------------|-----------------|
| FMT-01 | Custom binary format with non-standard magic bytes (not recognized by binwalk/file/7z) | Magic bytes `0x00 0xEA 0x72 0x63` defined in FORMAT.md; leading null byte prevents `file` recognition. Binary serialization uses Rust std `to_le_bytes()`/`from_le_bytes()` -- no external crate needed. |
| FMT-02 | Version field (1 byte) for forward compatibility | Simple u8 at offset 0x04; reject version != 1. Trivial to implement. |
| FMT-03 | File table with metadata: name, sizes, offset, IV, HMAC, SHA-256 | Variable-length TOC entries (101 + name_length bytes each). UTF-8 filenames, length-prefixed. All field types are standard Rust primitives. |
| FMT-04 | Little-endian for all multi-byte fields | Rust std: `u16::to_le_bytes()`, `u32::to_le_bytes()`, `u16::from_le_bytes()`, `u32::from_le_bytes()`. No external crate needed. |
| ENC-01 | AES-256-CBC encryption per file | `aes 0.8.4` + `cbc 0.1.2` crates. Type alias: `type Aes256CbcEnc = cbc::Encryptor<aes::Aes256>`. Verified working. |
| ENC-02 | HMAC-SHA-256 authentication (encrypt-then-MAC) per file | `hmac 0.12.1` + `sha2 0.10.9`. HMAC input = IV (16 bytes) \|\| ciphertext. Verified working. |
| ENC-03 | Random 16-byte IV per file, stored in cleartext TOC | `rand 0.9.2`: `rand::rng().fill(&mut iv)`. ThreadRng is cryptographically secure (ChaCha-based with OS seeding). |
| ENC-04 | Hardcoded 32-byte key | Const array `const KEY: [u8; 32] = [...]` in source. Same key for AES and HMAC in v1. |
| ENC-05 | PKCS7 padding for AES-CBC | `cbc` crate handles PKCS7 via `encrypt_padded_mut::<Pkcs7>()`. Formula: `encrypted_size = ((compressed_size / 16) + 1) * 16`. Verified. |
| CMP-01 | Gzip compression per file before encryption | `flate2 1.1.9`: `GzEncoder::new(Vec::new(), Compression::default())`. Use `GzBuilder::new().mtime(0)` for reproducible output in tests. |
| CMP-02 | Per-file compression flag (skip for already-compressed files) | CLI `--no-compress` flag + extension-based auto-detection for `.apk`, `.zip`, `.png`, `.jpg`, `.jpeg`, `.gz`, `.bz2`, `.xz`, `.mp4`, `.mp3`. |
| INT-01 | SHA-256 checksum per file (verify after decompression) | `sha2 0.10.9`: `Sha256::digest(&original_data)`. Computed BEFORE compression. Stored in TOC entry. |
| CLI-01 | Rust CLI utility for archive creation (Linux/macOS) | `clap 4.5.60` with derive API. Binary target in `src/main.rs`. Standard cargo build. |
| CLI-02 | Pack multiple files (text + APK) into one archive | `pack` subcommand accepts `Vec<PathBuf>` input files + `-o` output path. Reads files into memory (per Out of Scope: no streaming). |
| CLI-03 | Subcommands: pack, unpack, inspect | Three subcommands via clap `#[derive(Subcommand)]`. `inspect` reads header + TOC only, displays metadata without decrypting data blocks. |
</phase_requirements>
## Standard Stack
### Core
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| `aes` | 0.8.4 | AES-256 block cipher | RustCrypto official. 96M+ downloads. Pure Rust with hardware acceleration (AES-NI). |
| `cbc` | 0.1.2 | CBC mode of operation | RustCrypto official. Handles PKCS7 padding natively via `block_padding::Pkcs7`. |
| `hmac` | 0.12.1 | HMAC-SHA-256 computation | RustCrypto official. Constant-time comparison via `verify_slice()`. |
| `sha2` | 0.10.9 | SHA-256 hashing | RustCrypto official. Both one-shot (`Sha256::digest()`) and streaming APIs. |
| `flate2` | 1.1.9 | Gzip compression/decompression | De facto standard. Uses miniz_oxide (pure Rust) by default. |
| `clap` | 4.5.60 | CLI argument parsing | Industry standard. Derive API for subcommands. |
| `rand` | 0.9.2 | Cryptographic random IV generation | `rand::rng()` returns ChaCha-based CSPRNG with OS seeding. |
| `anyhow` | 1.0.102 | Error handling | Ergonomic `Result<T>` with context. Standard for CLI apps. |
### Supporting
| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| (none -- std lib) | - | Little-endian serialization | `u16::to_le_bytes()`, `u32::from_le_bytes()` etc. Built into Rust std. |
### Alternatives Considered
| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| `aes` 0.8 + `cbc` 0.1 (stable) | `aes` 0.9-rc + `cbc` 0.2-rc (RC) | RC versions have newer API but are pre-release. Stable versions are battle-tested and fully compatible. Use stable. |
| `byteorder` crate | Rust std `to_le_bytes()`/`from_le_bytes()` | std is sufficient since Rust 1.32. No external crate needed. |
| `ring` (Google) | RustCrypto stack | `ring` does not expose AES-CBC. It focuses on AEAD modes (AES-GCM). Not suitable for this format. |
| `openssl` crate | RustCrypto stack | Links to C library. RustCrypto is pure Rust, no system dependencies. Simpler cross-compilation. |
| `serde` + `bincode` | Manual binary serialization | Format spec requires exact byte layout. Manual serialization gives precise control over every byte. Serde/bincode add unnecessary abstraction for a fixed binary format. |
**Installation:**
```bash
cargo init --name encrypted_archive
cargo add aes@0.8 cbc@0.1 hmac@0.12 sha2@0.10 flate2@1.1 clap@4.5 --features clap/derive rand@0.9 anyhow@1.0
```
## Architecture Patterns
### Recommended Project Structure
```
encrypted_archive/
├── Cargo.toml
├── src/
│   ├── main.rs          # Entry point: clap CLI parsing, dispatch to commands
│   ├── cli.rs           # Clap derive structs (Cli, Commands enum)
│   ├── format.rs        # Binary format constants, header/TOC structs, serialization/deserialization
│   ├── crypto.rs        # encrypt_file(), decrypt_file(), compute_hmac(), verify_hmac()
│   ├── compression.rs   # compress(), decompress(), should_compress()
│   ├── archive.rs       # pack(), unpack(), inspect() -- orchestration logic
│   └── key.rs           # Hardcoded 32-byte key constant
├── docs/
│   └── FORMAT.md        # Binary format specification (already exists)
└── tests/               # Integration tests (Phase 3)
```
### Pattern 1: Pipeline Processing per File
**What:** Each file goes through a sequential pipeline: hash -> compress -> pad+encrypt -> HMAC
**When to use:** Always during `pack` operation
**Example:**
```rust
// Source: Verified working pipeline from research validation
use aes::cipher::{block_padding::Pkcs7, BlockEncryptMut, KeyIvInit};
use hmac::{Hmac, Mac};
use sha2::{Sha256, Digest};
use flate2::write::GzEncoder;
use flate2::Compression;
use rand::Rng; // brings `fill` into scope for `rand::rng()`
use std::io::Write;

type Aes256CbcEnc = cbc::Encryptor<aes::Aes256>;
type HmacSha256 = Hmac<Sha256>;

struct ProcessedFile {
    name: String,
    original_size: u32,
    compressed_size: u32,
    encrypted_size: u32,
    iv: [u8; 16],
    hmac: [u8; 32],
    sha256: [u8; 32],
    compression_flag: u8,
    ciphertext: Vec<u8>,
}

fn process_file(name: &str, data: &[u8], key: &[u8; 32], compress: bool) -> ProcessedFile {
    // Step 1: SHA-256 of original (before compression)
    let sha256: [u8; 32] = Sha256::digest(data).into();

    // Step 2: Compress (optional)
    let compressed = if compress {
        let mut encoder = GzEncoder::new(Vec::new(), Compression::default());
        encoder.write_all(data).unwrap();
        encoder.finish().unwrap()
    } else {
        data.to_vec()
    };

    // Step 3: Generate random IV
    let mut iv = [0u8; 16];
    rand::rng().fill(&mut iv);

    // Step 4: Encrypt with PKCS7 padding (buffer must hold the padded output)
    let encrypted_size = ((compressed.len() / 16) + 1) * 16;
    let mut buf = vec![0u8; encrypted_size];
    buf[..compressed.len()].copy_from_slice(&compressed);
    let ciphertext = Aes256CbcEnc::new(key.into(), &iv.into())
        .encrypt_padded_mut::<Pkcs7>(&mut buf, compressed.len())
        .unwrap()
        .to_vec();

    // Step 5: HMAC-SHA-256 over IV || ciphertext
    let mut mac = HmacSha256::new_from_slice(key).unwrap();
    mac.update(&iv);
    mac.update(&ciphertext);
    let hmac: [u8; 32] = mac.finalize().into_bytes().into();

    ProcessedFile {
        name: name.to_string(),
        original_size: data.len() as u32,
        compressed_size: compressed.len() as u32,
        encrypted_size: encrypted_size as u32,
        iv,
        hmac,
        sha256,
        compression_flag: if compress { 1 } else { 0 },
        ciphertext,
    }
}
```
### Pattern 2: Two-Pass Archive Writing
**What:** First pass processes all files to compute sizes and offsets; second pass writes the archive sequentially.
**When to use:** Always during `pack`. The TOC must contain `data_offset` for each file, but data blocks come after the TOC. You must know TOC size before writing data blocks.
**Example:**
```rust
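// Returns (toc_size, data_offsets); data_offsets[i] belongs to files[i].
// ProcessedFile has no data_offset field, so offsets are returned separately
// and passed to the TOC writer alongside each file.
fn compute_offsets(files: &[ProcessedFile]) -> (u32, Vec<u32>) {
    let header_size: u32 = 40;
    // Compute TOC size: 101 fixed bytes per entry plus the UTF-8 filename bytes
    let toc_size: u32 = files.iter()
        .map(|f| 101 + f.name.len() as u32)
        .sum();
    let toc_offset = header_size;
    let mut data_offset = toc_offset + toc_size;
    // Assign data offsets in archive order
    let mut data_offsets = Vec::with_capacity(files.len());
    for file in files {
        data_offsets.push(data_offset);
        data_offset += file.encrypted_size;
        // padding_after = 0 in Phase 2 (no decoy padding)
    }
    (toc_size, data_offsets)
}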
```
### Pattern 3: CLI Subcommand Dispatch
**What:** Use clap derive API with an enum of subcommands
**When to use:** Always for the CLI entry point
**Example:**
```rust
// Source: Verified working clap derive pattern from research validation
use clap::{Parser, Subcommand};
use std::path::PathBuf;

#[derive(Parser)]
#[command(name = "encrypted_archive")]
#[command(about = "Custom encrypted archive tool")]
struct Cli {
    #[command(subcommand)]
    command: Commands,
}

#[derive(Subcommand)]
enum Commands {
    /// Pack files into an encrypted archive
    Pack {
        /// Input files to archive
        #[arg(required = true)]
        files: Vec<PathBuf>,
        /// Output archive file
        #[arg(short, long)]
        output: PathBuf,
        /// Disable compression for specified files
        #[arg(long)]
        no_compress: Vec<String>,
    },
    /// Unpack an encrypted archive (for testing)
    Unpack {
        /// Archive file to unpack
        archive: PathBuf,
        /// Output directory
        #[arg(short, long, default_value = ".")]
        output_dir: PathBuf,
    },
    /// Inspect archive metadata without decrypting
    Inspect {
        /// Archive file to inspect
        archive: PathBuf,
    },
}
```
### Anti-Patterns to Avoid
- **Streaming writes without knowing offsets:** The TOC contains `data_offset` for each file. You MUST compute all offsets before writing the TOC. Process all files first, then serialize.
- **Using serde/bincode for binary format:** The format spec requires exact byte-level control. Manual serialization with `to_le_bytes()` is correct and simpler.
- **Single large buffer for entire archive:** Process and encrypt files individually, write them sequentially. Each file should be processed independently.
- **Reusing IVs:** Each file MUST have a unique random IV. Never reuse IVs across files or archive creations.
- **MAC-then-encrypt:** The spec mandates encrypt-then-MAC. HMAC MUST be computed over `IV || ciphertext`, NOT over plaintext.
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| AES-256-CBC encryption | Custom AES implementation | `aes 0.8` + `cbc 0.1` crates | Side-channel resistance, hardware acceleration, audited |
| PKCS7 padding | Manual padding logic | `cbc` crate's `Pkcs7` padding (via `block_padding`) | Off-by-one errors in padding are security-critical |
| HMAC-SHA-256 | Manual HMAC construction | `hmac 0.12` crate | Constant-time comparison, correct key scheduling |
| SHA-256 hashing | Custom hash | `sha2 0.10` crate | Correctness, performance, hardware acceleration |
| Gzip compression | Custom deflate | `flate2 1.1` crate | RFC 1952 compliance, performance, battle-tested |
| CLI argument parsing | Manual arg parsing | `clap 4.5` with derive | Validation, help text, error messages, subcommands |
| Random IV generation | Custom RNG | `rand 0.9` with `rand::rng()` | CSPRNG with OS seeding, no bias |
| Little-endian serialization | Manual byte shifting | Rust std `to_le_bytes()`/`from_le_bytes()` | Built-in, zero-cost, correct |
**Key insight:** Every component in the encryption pipeline is security-sensitive. Using audited, well-tested crates for crypto operations is not optional -- hand-rolled crypto is the single highest-risk anti-pattern in this domain.
## Common Pitfalls
### Pitfall 1: Buffer Sizing for `encrypt_padded_mut`
**What goes wrong:** `PadError` at runtime because the buffer is too small for PKCS7-padded output.
**Why it happens:** PKCS7 ALWAYS adds at least 1 byte. When input is a multiple of 16, a full 16-byte padding block is added. Formula: `((input_len / 16) + 1) * 16`.
**How to avoid:** Always allocate `encrypted_size = ((compressed_size / 16) + 1) * 16` bytes for the encryption buffer. Copy compressed data to the start, then call `encrypt_padded_mut` with `compressed_size` as the plaintext length.
**Warning signs:** `PadError` or `unwrap()` panic during encryption.
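The sizing rule above is easy to check with a few boundary values. The following is a minimal std-only sketch; the helper name `padded_len` is hypothetical and does not come from FORMAT.md or the `cbc` crate.

```rust
/// AES block size in bytes.
const BLOCK: usize = 16;

/// Length of the PKCS7-padded output for `plain_len` bytes of input.
/// PKCS7 always appends between 1 and 16 bytes, so an input that is already
/// block-aligned grows by one full block.
fn padded_len(plain_len: usize) -> usize {
    (plain_len / BLOCK + 1) * BLOCK
}
```

For example, `padded_len(15)` is 16 while `padded_len(16)` is 32, which is the case that triggers `PadError` when the buffer is sized by rounding up instead.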
### Pitfall 2: Gzip Non-Determinism in Tests
**What goes wrong:** Gzip output varies between runs (different `compressed_size`), making golden tests impossible.
**Why it happens:** Gzip headers contain a timestamp (`mtime`) and OS byte that vary.
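**How to avoid:** Use `GzBuilder::new().mtime(0).write(Vec::new(), Compression::default())` to pin the timestamp to zero explicitly. The OS byte does not appear to follow the build platform: as far as this research can tell, flate2 writes a fixed default (255, "unknown") unless it is overridden on the builder. Compressed bytes can still differ across flate2 versions or backends, so pin the crate version when writing golden tests.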
**Warning signs:** `compressed_size` changes between test runs for identical input.
### Pitfall 3: Incorrect HMAC Scope
**What goes wrong:** HMAC computed over wrong data (just ciphertext, or including TOC metadata).
**Why it happens:** Ambiguity about what "encrypt-then-MAC" covers.
**How to avoid:** FORMAT.md is explicit: `HMAC_input = IV (16 bytes) || ciphertext (encrypted_size bytes)`, and nothing else. That is the IV from the TOC entry concatenated with the ciphertext from the data block.
**Warning signs:** HMAC verification failures in other decoders (Kotlin, shell).
### Pitfall 4: TOC Offset Calculation Errors
**What goes wrong:** Data blocks written at wrong offsets; decoders read garbage.
**Why it happens:** Variable-length filename fields make TOC entry sizes differ. Off-by-one in offset arithmetic.
**How to avoid:** Use the formula from FORMAT.md: `entry_size = 101 + name_length`. Total TOC size = sum of all entry sizes. First data block offset = `toc_offset + toc_size`. Each subsequent data block offset = previous offset + previous `encrypted_size`.
**Warning signs:** `inspect` command shows corrupted filenames or impossible sizes.
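A worked example makes the arithmetic concrete. The sketch below is std-only and assumes, as in Phase 2, that the TOC starts right after the 40-byte header and that `padding_after` is 0; the function name `layout` and the constant names are hypothetical.

```rust
/// Fixed header size from FORMAT.md.
const HEADER_SIZE: u32 = 40;
/// Fixed part of every TOC entry; the filename bytes are added on top.
const TOC_ENTRY_FIXED: u32 = 101;

/// Returns (toc_size, data_offsets) for files given as (name, encrypted_size).
fn layout(files: &[(&str, u32)]) -> (u32, Vec<u32>) {
    // str::len() is the UTF-8 byte count, which is what name_length stores.
    let toc_size: u32 = files
        .iter()
        .map(|(name, _)| TOC_ENTRY_FIXED + name.len() as u32)
        .sum();

    // Data blocks follow the TOC back to back.
    let mut next = HEADER_SIZE + toc_size;
    let mut offsets = Vec::with_capacity(files.len());
    for (_, encrypted_size) in files {
        offsets.push(next);
        next += *encrypted_size;
    }
    (toc_size, offsets)
}
```

With `hello.txt` (9 name bytes, 32 encrypted bytes) and `app.apk` (7 name bytes, 1024 encrypted bytes), the TOC is 110 + 108 = 218 bytes, so the data blocks start at offsets 258 and 290.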
### Pitfall 5: Endianness Errors
**What goes wrong:** Multi-byte fields written in big-endian or native-endian instead of little-endian.
**Why it happens:** Forgetting to convert, or using wrong conversion function.
**How to avoid:** Always use `value.to_le_bytes()` when writing and `u32::from_le_bytes([b0, b1, b2, b3])` when reading. Never use `to_ne_bytes()` or `to_be_bytes()`.
**Warning signs:** Values look "swapped" when inspecting hex dump. Shell decoder reads wrong numbers.
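One way to keep endianness mistakes out of the codebase is to route every multi-byte field through a pair of small helpers. This is a sketch only; `put_u32_le` and `get_u32_le` are hypothetical names, not part of FORMAT.md.

```rust
/// Append a u32 in the byte order the format requires (least significant byte first).
fn put_u32_le(out: &mut Vec<u8>, value: u32) {
    out.extend_from_slice(&value.to_le_bytes());
}

/// Read a little-endian u32 starting at `pos`, or None if the slice is too short.
fn get_u32_le(buf: &[u8], pos: usize) -> Option<u32> {
    let end = pos.checked_add(4)?;
    let b = buf.get(pos..end)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}
```

Writing `0x12345678` this way produces the bytes `78 56 34 12`, which is the order a hex dump of a correct archive should show.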
### Pitfall 6: UTF-8 Filename Length vs. Character Count
**What goes wrong:** `name_length` field stores character count instead of byte count.
**Why it happens:** Confusion between `str.len()` (byte count, correct) and `str.chars().count()` (character count, wrong).
**How to avoid:** FORMAT.md specifies `name_length` as "Filename length in bytes (UTF-8 encoded byte count)". In Rust, `String::len()` returns byte count, which is correct.
**Warning signs:** Non-ASCII filenames (Cyrillic) cause parsing errors in decoders.
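The difference shows up as soon as a filename contains non-ASCII characters. The sketch below encodes the length-prefixed name field; `encode_name` is a hypothetical helper, and rejecting names longer than `u16::MAX` bytes is an assumption about how the archiver should behave.

```rust
/// Encode the filename field: u16 little-endian byte count, then the UTF-8 bytes.
/// Returns None when the name does not fit in the u16 length prefix.
fn encode_name(name: &str) -> Option<Vec<u8>> {
    let bytes = name.as_bytes();
    if bytes.len() > u16::MAX as usize {
        return None;
    }
    // bytes.len() is the UTF-8 byte count, not the number of characters.
    let len = bytes.len() as u16;
    let mut out = Vec::with_capacity(2 + bytes.len());
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(bytes);
    Some(out)
}
```

For `файл.txt`, `chars().count()` is 8 but the name occupies 12 bytes (four 2-byte Cyrillic letters plus `.txt`), so the prefix must be `0C 00`.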
### Pitfall 7: Forgetting Flags Byte
**What goes wrong:** Archive header has wrong flags, decoders misinterpret format features.
**Why it happens:** The flags byte is easy to leave at a stale or default value. Phase 2 uses only bit 0 (compression), so every other bit must be written as zero.
**How to avoid:** Set `flags = 0x01` when any file uses compression (global flag), `flags = 0x00` when no files use compression. Bits 1-3 are for Phase 6 obfuscation features. Bits 4-7 MUST be zero.
**Warning signs:** Decoders reject archive due to unknown flags.
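The flag rules can be captured in two small functions. This is a sketch under the assumption that a Phase 2 writer should check its own output and never set bits 1-3; the constant and function names are hypothetical.

```rust
/// Bit 0: at least one file in the archive is compressed.
const FLAG_COMPRESSION: u8 = 0x01;
/// Bits 1-3: reserved for Phase 6 obfuscation features, unused in Phase 2.
const FLAGS_PHASE6: u8 = 0x0E;
/// Bits 4-7: must always be zero.
const FLAGS_RESERVED: u8 = 0xF0;

/// Derive the header flags from the per-file compression flags.
fn header_flags(compression_flags: &[u8]) -> u8 {
    if compression_flags.iter().any(|&c| c != 0) {
        FLAG_COMPRESSION
    } else {
        0
    }
}

/// Writer-side check for Phase 2: only bit 0 may be set.
fn is_phase2_flags(flags: u8) -> bool {
    flags & (FLAGS_PHASE6 | FLAGS_RESERVED) == 0
}
```

A decoder that wants to stay forward compatible with Phase 6 would only reject `FLAGS_RESERVED`, which matches the `0xF0` mask used when reading the header.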
## Code Examples
Verified patterns from official sources and research validation:
### Binary Format Serialization (Header)
```rust
// Source: FORMAT.md Section 4 + Rust std library
fn write_header(
    writer: &mut impl std::io::Write,
    file_count: u16,
    toc_offset: u32,
    toc_size: u32,
    flags: u8,
) -> std::io::Result<()> {
    // Magic bytes
    writer.write_all(&[0x00, 0xEA, 0x72, 0x63])?;
    // Version
    writer.write_all(&[0x01])?;
    // Flags
    writer.write_all(&[flags])?;
    // File count (LE)
    writer.write_all(&file_count.to_le_bytes())?;
    // TOC offset (LE)
    writer.write_all(&toc_offset.to_le_bytes())?;
    // TOC size (LE)
    writer.write_all(&toc_size.to_le_bytes())?;
    // TOC IV (zero-filled, TOC not encrypted in Phase 2)
    writer.write_all(&[0u8; 16])?;
    // Reserved
    writer.write_all(&[0u8; 8])?;
    Ok(())
}
```
### TOC Entry Serialization
```rust
// Source: FORMAT.md Section 5
fn write_toc_entry(
    writer: &mut impl std::io::Write,
    file: &ProcessedFile,
    data_offset: u32,
) -> std::io::Result<()> {
    let name_bytes = file.name.as_bytes();
    writer.write_all(&(name_bytes.len() as u16).to_le_bytes())?;
    writer.write_all(name_bytes)?;
    writer.write_all(&file.original_size.to_le_bytes())?;
    writer.write_all(&file.compressed_size.to_le_bytes())?;
    writer.write_all(&file.encrypted_size.to_le_bytes())?;
    writer.write_all(&data_offset.to_le_bytes())?;
    writer.write_all(&file.iv)?;
    writer.write_all(&file.hmac)?;
    writer.write_all(&file.sha256)?;
    writer.write_all(&[file.compression_flag])?;
    writer.write_all(&0u16.to_le_bytes())?; // padding_after = 0
    Ok(())
}
```
### Inspect Command (Read Header + TOC Only)
```rust
// Source: FORMAT.md Section 10, steps 1-4
use std::io::Read;

// Header fields needed by `inspect`; the TOC IV and reserved bytes are skipped.
struct Header {
    version: u8,
    flags: u8,
    file_count: u16,
    toc_offset: u32,
    toc_size: u32,
}

fn read_header(reader: &mut impl Read) -> anyhow::Result<Header> {
    let mut buf = [0u8; 40];
    reader.read_exact(&mut buf)?;
    // Verify magic
    anyhow::ensure!(
        buf[0..4] == [0x00, 0xEA, 0x72, 0x63],
        "Invalid magic bytes"
    );
    let version = buf[4];
    anyhow::ensure!(version == 1, "Unsupported version: {}", version);
    let flags = buf[5];
    // Bits 4-7 must be zero
    anyhow::ensure!(flags & 0xF0 == 0, "Unknown flags set: 0x{:02X}", flags);
    let file_count = u16::from_le_bytes([buf[6], buf[7]]);
    let toc_offset = u32::from_le_bytes([buf[8], buf[9], buf[10], buf[11]]);
    let toc_size = u32::from_le_bytes([buf[12], buf[13], buf[14], buf[15]]);
    Ok(Header { version, flags, file_count, toc_offset, toc_size })
}
```
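`inspect` also has to parse the TOC entries that follow the header. The sketch below reads one entry in the same field order that `write_toc_entry` writes it. It is std-only and returns `io::Result` to stay self-contained; the `TocEntry` struct and the `read_u16`/`read_u32` helpers are hypothetical names, not taken from FORMAT.md.

```rust
use std::io::{self, Read};

/// One parsed TOC entry; field order mirrors `write_toc_entry`.
#[derive(Debug)]
struct TocEntry {
    name: String,
    original_size: u32,
    compressed_size: u32,
    encrypted_size: u32,
    data_offset: u32,
    iv: [u8; 16],
    hmac: [u8; 32],
    sha256: [u8; 32],
    compression_flag: u8,
    padding_after: u16,
}

fn read_u16(r: &mut impl Read) -> io::Result<u16> {
    let mut b = [0u8; 2];
    r.read_exact(&mut b)?;
    Ok(u16::from_le_bytes(b))
}

fn read_u32(r: &mut impl Read) -> io::Result<u32> {
    let mut b = [0u8; 4];
    r.read_exact(&mut b)?;
    Ok(u32::from_le_bytes(b))
}

fn read_toc_entry(r: &mut impl Read) -> io::Result<TocEntry> {
    // name_length counts UTF-8 bytes, not characters.
    let name_len = read_u16(r)? as usize;
    let mut name_bytes = vec![0u8; name_len];
    r.read_exact(&mut name_bytes)?;
    let name = String::from_utf8(name_bytes)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;

    let original_size = read_u32(r)?;
    let compressed_size = read_u32(r)?;
    let encrypted_size = read_u32(r)?;
    let data_offset = read_u32(r)?;

    let (mut iv, mut hmac, mut sha256) = ([0u8; 16], [0u8; 32], [0u8; 32]);
    r.read_exact(&mut iv)?;
    r.read_exact(&mut hmac)?;
    r.read_exact(&mut sha256)?;

    let mut flag = [0u8; 1];
    r.read_exact(&mut flag)?;
    let padding_after = read_u16(r)?;

    Ok(TocEntry {
        name,
        original_size,
        compressed_size,
        encrypted_size,
        data_offset,
        iv,
        hmac,
        sha256,
        compression_flag: flag[0],
        padding_after,
    })
}
```

Calling `read_toc_entry` `file_count` times after seeking to `toc_offset` yields everything `inspect` needs to print, without touching any data block.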
### Compression Decision Heuristic
```rust
// Source: FORMAT.md Section 8 recommendation
use std::path::Path;

fn should_compress(filename: &str, no_compress_list: &[String]) -> bool {
    // Explicit exclusion from CLI: matches a full name or a suffix such as ".bin"
    if no_compress_list.iter().any(|nc| filename.ends_with(nc.as_str())) {
        return false;
    }
    // Auto-detect already-compressed formats by extension (case-insensitive).
    // Path::extension() is None for names without a dot, so a file literally
    // named "zip" is still compressed.
    let ext = Path::new(filename)
        .extension()
        .and_then(|e| e.to_str())
        .unwrap_or("")
        .to_lowercase();
    !matches!(
        ext.as_str(),
        "apk" | "zip" | "gz" | "bz2" | "xz" | "zst"
            | "png" | "jpg" | "jpeg" | "gif" | "webp"
            | "mp4" | "mp3" | "aac" | "ogg" | "flac"
            | "7z" | "rar" | "jar"
    )
}
```
## State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| `block-modes 0.8` crate | `cbc 0.1` crate (separate crate per mode) | 2022 | `block-modes` is deprecated. Use `cbc` directly. |
| `rand::thread_rng()` | `rand::rng()` | rand 0.9 (2025) | Function renamed. Same underlying ChaCha CSPRNG. |
| `GenericArray` for keys/IVs | `.into()` conversion from `[u8; N]` | aes/cbc 0.8/0.1 | Can pass `&key.into()` directly from fixed arrays. |
| `byteorder` crate | Rust std `to_le_bytes()`/`from_le_bytes()` | Rust 1.32 (2018) | No external crate needed for endian conversion. |
**Deprecated/outdated:**
- `block-modes` crate: Replaced by individual mode crates (`cbc`, `ecb`, `cfb`, `ofb`). Do NOT use `block-modes`.
- `rand::thread_rng()`: Renamed to `rand::rng()` in 0.9. The old name still compiles in 0.9 but is marked deprecated, so new code should use `rand::rng()`.
- `crypto-mac` crate: Merged into `digest` 0.10. Use `hmac 0.12` which uses `digest 0.10` internally.
## Open Questions
1. **Hardcoded key value**
- What we know: The key is 32 bytes, hardcoded, shared across all decoders.
- What's unclear: The specific key bytes are not defined in FORMAT.md (only the worked example uses `00 01 02 ... 1F`).
- Recommendation: Define a non-trivial key constant in `src/key.rs`. The planner should decide the actual key bytes or generate them randomly once. The worked example key is fine for testing but should be replaced for production.
2. **Error handling strategy for `unpack`**
- What we know: FORMAT.md says "MUST reject" on HMAC failure, "MUST fail" on bad version.
- What's unclear: Should `unpack` abort on first file error, or continue extracting other files?
- Recommendation: Abort on header/TOC errors. For per-file errors (HMAC mismatch, SHA-256 mismatch), report the error but continue extracting remaining files (with a non-zero exit code at the end).
3. **Maximum file size constraint (u32)**
- What we know: `original_size`, `compressed_size`, `encrypted_size` are all u32 (max ~4 GB).
- What's unclear: Should the archiver check and reject files > 4 GB?
- Recommendation: Yes, validate file sizes during `pack` and produce a clear error if any file exceeds `u32::MAX`. This is acceptable given the Out of Scope note ("files fit in memory").
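The size check from the third question can be sketched as follows. It is std-only and uses `String` errors to stay self-contained (the real code would likely return `anyhow::Result`); both function names are hypothetical. Note that `encrypted_size` can overflow even when the compressed data fits in a u32, because PKCS7 padding adds up to 16 bytes.

```rust
/// Largest value the u32 size fields in the TOC can hold.
const MAX_FIELD: u64 = u32::MAX as u64;

/// Narrow a length to u32, or report which file and field overflowed.
fn to_u32_field(file: &str, field: &str, len: u64) -> Result<u32, String> {
    if len > MAX_FIELD {
        return Err(format!(
            "{}: {} is {} bytes, which exceeds u32::MAX",
            file, field, len
        ));
    }
    Ok(len as u32)
}

/// PKCS7-padded size as a u32 field; fails when padding pushes it past u32::MAX.
fn encrypted_size_field(file: &str, compressed_len: u64) -> Result<u32, String> {
    let padded = (compressed_len / 16 + 1).saturating_mul(16);
    to_u32_field(file, "encrypted_size", padded)
}
```

Running these checks during `pack`, before any `as u32` cast, turns a silent truncation into a clear error message.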
## Sources
### Primary (HIGH confidence)
- `docs/FORMAT.md` v1.0 -- The normative specification for the binary format. All byte offsets, field sizes, and pipeline steps are from this document.
- `docs.rs/aes/0.8.4` -- AES crate API documentation
- `docs.rs/cbc/0.1.2` -- CBC mode crate API documentation and usage examples
- `docs.rs/hmac/0.12.1` -- HMAC crate API documentation and usage examples
- `docs.rs/sha2/0.10.9` -- SHA-2 crate API documentation
- `docs.rs/flate2/1.1.9` -- flate2 crate API documentation (GzEncoder, GzDecoder, GzBuilder)
- `docs.rs/clap/4.5.60` -- Clap CLI crate documentation
- `docs.rs/rand/0.9.2` -- Rand crate documentation
- **Research validation:** Full pipeline (compress -> encrypt -> HMAC -> verify -> decrypt -> decompress -> verify) was compiled and executed successfully as a Rust program during this research.
### Secondary (MEDIUM confidence)
- `crates.io` version listings -- Latest stable versions verified via `cargo search` and crates.io API
- `rust-random.github.io/book` -- Rand book confirming ThreadRng is ChaCha-based CSPRNG
### Tertiary (LOW confidence)
- None. All findings are verified against official documentation and compilation tests.
## Metadata
**Confidence breakdown:**
- Standard stack: HIGH -- All crates verified via `cargo check`, full pipeline compiled and executed
- Architecture: HIGH -- Follows standard Rust CLI patterns; FORMAT.md provides exact byte-level specification
- Pitfalls: HIGH -- Common issues identified from official docs, GitHub issues, and practical validation
**Research date:** 2026-02-24
**Valid until:** 2026-04-24 (stable crates, slow-moving ecosystem)