From 40dcfd4ac016b3a7a129d7c7e7605474eb6fa9c6 Mon Sep 17 00:00:00 2001 From: NikitolProject Date: Tue, 24 Feb 2026 22:51:05 +0300 Subject: [PATCH] docs: add project research --- .planning/research/ARCHITECTURE.md | 378 +++++++++++++++++++++++++++++ .planning/research/FEATURES.md | 124 ++++++++++ .planning/research/PITFALLS.md | 89 +++++++ .planning/research/STACK.md | 174 +++++++++++++ .planning/research/SUMMARY.md | 76 ++++++ 5 files changed, 841 insertions(+) create mode 100644 .planning/research/ARCHITECTURE.md create mode 100644 .planning/research/FEATURES.md create mode 100644 .planning/research/PITFALLS.md create mode 100644 .planning/research/STACK.md create mode 100644 .planning/research/SUMMARY.md diff --git a/.planning/research/ARCHITECTURE.md b/.planning/research/ARCHITECTURE.md new file mode 100644 index 0000000..71310c8 --- /dev/null +++ b/.planning/research/ARCHITECTURE.md @@ -0,0 +1,378 @@ +# Architecture Patterns + +**Domain:** Custom encrypted archiver with obfuscated binary format +**Researched:** 2026-02-24 + +## Recommended Architecture + +The system decomposes into three independent deliverables (archiver, Kotlin decompressor, shell decompressor) that share a single specification: the binary format. The format is the contract. Everything else is implementation detail. 
+ +### High-Level Overview + +``` + +-----------------+ + | FORMAT SPEC | + | (shared doc) | + +--------+--------+ + | + +------------------+------------------+ + | | | + +---------v---------+ +----v------+ +--------v--------+ + | RUST ARCHIVER | | KOTLIN | | BUSYBOX SHELL | + | (CLI, Linux/Mac) | | DECODER | | DECODER | + | | | (Android)| | (fallback) | + +-------------------+ +-----------+ +-----------------+ +``` + +### Component Boundaries + +| Component | Responsibility | Communicates With | Language | +|-----------|---------------|-------------------|----------| +| **Format Spec** | Defines binary layout, magic bytes strategy, block structure, obfuscation scheme | All three implementations reference this | Documentation | +| **Rust Archiver CLI** | Reads input files, compresses, encrypts, obfuscates, writes archive | Filesystem (input files, output archive) | Rust | +| **Kotlin Decoder** | Reads archive, de-obfuscates, decrypts, decompresses, writes output files | Android filesystem, embedded key | Kotlin | +| **Shell Decoder** | Same as Kotlin but via busybox commands | busybox (dd, xxd, openssl), filesystem | Shell (sh) | +| **Test Harness** | Round-trip validation: archive -> decode -> compare | All three components | Rust + shell scripts | + +### Internal Component Structure (Rust Archiver) + +The archiver itself has a clear pipeline architecture with five layers: + +``` +Input Files + | + v ++-------------------+ +| FILE COLLECTOR | Walks paths, reads files, captures metadata ++-------------------+ + | + v ++-------------------+ +| COMPRESSOR | gzip (DEFLATE) per-file compression ++-------------------+ + | + v ++-------------------+ +| ENCRYPTOR | AES-256-CBC + HMAC-SHA256 per-file ++-------------------+ + | + v ++-------------------+ +| FORMAT BUILDER | Assembles binary structure: header, TOC, data blocks ++-------------------+ + | + v ++-------------------+ +| OBFUSCATOR | Shuffles blocks, inserts decoys, transforms magic bytes ++-------------------+ + 
+ | + v +Output Archive File +``` + +## Data Flow: Archival (Packing) + +### Step 1: File Collection + +``` +for each input_path: + read file bytes + record: filename, original_size, file_type_hint + -> Vec<FileEntry> +``` + +### Step 2: Compression (per-file) + +Each file is compressed independently. This is critical -- per-file compression means the shell decoder can decompress one file at a time without holding the entire archive in memory. + +``` +for each FileEntry: + compressed_data = gzip_compress(data) + record: compressed_size + -> Vec<CompressedEntry> +``` + +**Why compress before encrypt:** Ciphertext is indistinguishable from random data and does not compress. Compress-then-encrypt is the only useful order. This is a fundamental constraint, not a design choice. + +### Step 3: Encryption (per-file) + +Each compressed file is encrypted independently with a unique IV. + +``` +for each CompressedEntry: + iv = random_16_bytes() // unique per file, AES block size + ciphertext = aes_256_cbc_encrypt(key, iv, pkcs7_pad(compressed_data)) + hmac = hmac_sha256(key, iv || ciphertext) // encrypt-then-MAC + -> Vec<EncryptedEntry> +``` + +**Key decision: AES-256-GCM vs AES-256-CBC vs ChaCha20-Poly1305.** + +Use **AES-256-CBC + HMAC-SHA256** because: +- busybox `openssl` supports `aes-256-cbc` natively (GCM is NOT available via `openssl enc`) +- Android/Kotlin `javax.crypto` supports AES-256-CBC natively +- Rust RustCrypto crates (`aes`, `cbc`, `hmac`) support it fully +- Qualcomm SoC has AES hardware acceleration (ARMv8 Cryptography Extensions) +- ChaCha20 would require custom implementation for shell fallback +- GCM would require custom implementation for shell fallback + +**Encrypt-then-MAC pattern:** HMAC is computed over (IV || ciphertext) to provide authenticated encryption. The decoder verifies the HMAC before attempting decryption, which prevents padding oracle attacks.
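Step 3's `pkcs7_pad` is where the three decoders most often disagree on the last block. Below is a minimal std-only sketch of PKCS#7 for a 16-byte block. It assumes the archiver pads explicitly rather than relying on the RustCrypto `cbc` crate's built-in padding, and the function names are illustrative:

```rust
/// PKCS#7 padding: always appends 1..=block bytes, each equal to the pad
/// length, so block-aligned input gains one full extra block.
pub fn pkcs7_pad(data: &[u8], block: usize) -> Vec<u8> {
    assert!(block > 0 && block <= 255, "block size must fit in one byte");
    let pad = block - (data.len() % block);
    let mut out = Vec::with_capacity(data.len() + pad);
    out.extend_from_slice(data);
    out.extend(std::iter::repeat(pad as u8).take(pad));
    out
}

/// Strips and validates PKCS#7 padding; returns None on malformed input.
/// Call this only after the HMAC over IV || ciphertext has been verified.
pub fn pkcs7_unpad(data: &[u8], block: usize) -> Option<&[u8]> {
    if data.is_empty() || data.len() % block != 0 {
        return None;
    }
    let pad = *data.last()? as usize;
    if pad == 0 || pad > block {
        return None;
    }
    let (body, tail) = data.split_at(data.len() - pad);
    if tail.iter().all(|&b| b as usize == pad) {
        Some(body)
    } else {
        None
    }
}
```

Because block-aligned input still gains a full block, `encrypted_size` is always the next multiple of 16 strictly greater than `compressed_size`. Decoders can sanity-check this before decrypting.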
+ +### Step 4: Format Assembly + +The format builder creates the binary layout: + +``` ++----------------------------------------------------------+ +| OBFUSCATED HEADER (variable, see Step 5) | ++----------------------------------------------------------+ +| FILE TABLE (encrypted) | +| - number_of_files: u32 | +| - for each file: | +| filename_len: u16 | +| filename: [u8; filename_len] | +| original_size: u64 | +| compressed_size: u64 | +| encrypted_size: u64 | +| data_offset: u64 | +| iv: [u8; 16] | +| hmac: [u8; 32] | ++----------------------------------------------------------+ +| DATA BLOCKS | +| [encrypted_file_1_data] | +| [encrypted_file_2_data] | +| ... | ++----------------------------------------------------------+ +``` + +**The file table itself is encrypted** with the same key but a dedicated IV. This prevents casual inspection of filenames and sizes. + +### Step 5: Obfuscation + +The obfuscation layer transforms the assembled binary to resist pattern analysis: + +1. **No standard magic bytes** -- use random-looking bytes that are actually a known XOR pattern the decoder recognizes +2. **Decoy padding** -- insert random-length garbage blocks between real data blocks +3. **Header scatter** -- split the file table into chunks interleaved with data blocks, with a small "index block" whose offset is stored at a fixed position in the header and which tells where the chunks are +4. **Byte-level transforms** -- simple XOR on the header region (not on encrypted data, which is already indistinguishable from random) + +``` +FINAL BINARY LAYOUT: + +[fake_magic: 8 bytes] <- XOR'd known pattern +[index_locator: 4 bytes at offset 8-11] <- little-endian offset of index_block +[decoy_block: random 32-512 bytes] +[data_block_1] +[file_table_chunk_1] +[decoy_block] +[data_block_2] +[file_table_chunk_2] +[data_block_3] +...
+[index_block] <- lists offsets of file_table_chunks and data_blocks +[trailing_garbage: random 0-256 bytes] +``` + +**Important:** The obfuscation MUST be simple enough to implement in a shell script with `dd` and `xxd`. Anything requiring bit manipulation beyond XOR is too complex. Keep it to: +- Fixed XOR key for header regions (hardcoded in all three decoders) +- Fixed offset calculations (e.g., "index block starts at byte offset stored in bytes 8-11 of file") +- Sequential reads with `dd bs=1 skip=N count=M` + +## Data Flow: Extraction (Unpacking) + +### Kotlin Path (Primary) + +```kotlin +// 1. Read archive bytes +val archive = File(path).readBytes() + +// 2. De-obfuscate: recover index block location +val indexOffset = deobfuscateHeader(archive) + +// 3. Read index block -> get file table chunk offsets +val index = parseIndex(archive, indexOffset) + +// 4. Reassemble and decrypt file table +val fileTable = decryptFileTable(index.fileTableChunks, KEY, IV) + +// 5. For each file entry in table: +for (entry in fileTable.entries) { + val ciphertext = readDataBlock(archive, entry.offset, entry.encryptedSize) + verifyHmac(ciphertext, entry.iv, entry.hmac, KEY) + val compressed = decryptAesCbc(ciphertext, KEY, entry.iv) + val original = GZIPInputStream(ByteArrayInputStream(compressed)).readBytes() + writeFile(outputDir, entry.filename, original) +} +``` + +**Kotlin decompression:** Uses gzip (`java.util.zip.GZIPInputStream`), which is built into the Android SDK. No native libraries needed. + +### Shell Path (Fallback) + +```sh +#!/bin/sh +# Hardcoded values +KEY_HEX="abcdef0123456789..." # 64 hex chars = 32 bytes +XOR_KEY_HEX="deadbeef" + +ARCHIVE="$1" +OUTDIR="$2" + +# 1. De-obfuscate header: read first 8 bytes, XOR to get real magic +MAGIC=$(dd if="$ARCHIVE" bs=1 count=8 2>/dev/null | xxd -p) +# ... validate XOR pattern ... + +# 2.
Find index block offset (bytes 8-11, little-endian) +INDEX_OFF_HEX=$(dd if="$ARCHIVE" bs=1 skip=8 count=4 2>/dev/null | xxd -p) +# Convert LE hex to decimal +INDEX_OFF=$(printf "%d" "0x$(echo $INDEX_OFF_HEX | \ + sed 's/\(..\)\(..\)\(..\)\(..\)/\4\3\2\1/')") + +# 3. Read index block, parse file table chunk offsets +# ... dd + xxd to extract offsets ... + +# 4. For each file: verify HMAC over IV || ciphertext BEFORE decrypting +# (-macopt hexkey: uses the raw 32-byte key; -hmac "$KEY_HEX" would use the hex text itself as the key) +COMPUTED_HMAC=$( { printf '%s' "$IV_HEX" | xxd -r -p; \ + dd if="$ARCHIVE" bs=1 skip=$DATA_OFFSET count=$ENC_SIZE 2>/dev/null; } | \ + openssl dgst -sha256 -mac HMAC -macopt hexkey:"$KEY_HEX" | awk '{print $NF}') +[ "$COMPUTED_HMAC" = "$HMAC_HEX" ] || { echo "invalid archive" >&2; exit 1; } + +# 5. Decrypt and decompress only after the HMAC check passes +dd if="$ARCHIVE" bs=1 skip=$DATA_OFFSET count=$ENC_SIZE 2>/dev/null | \ + openssl aes-256-cbc -d -K "$KEY_HEX" -iv "$IV_HEX" -nosalt | \ + gunzip > "$OUTDIR/$FILENAME" +``` + +**Shell limitations that constrain the entire format design:** +- `dd` reads are byte-precise but slow for large files with bs=1 +- `xxd` handles hex conversion but no binary arithmetic +- `openssl` in busybox supports limited ciphers (aes-256-cbc YES, GCM/CCM NO) +- HMAC verification via `openssl dgst -sha256 -mac HMAC -macopt hexkey:...` (confirm support on the target device) +- Integer arithmetic limited to shell `$(( ))` -- 64-bit on most builds, but possibly 32-bit on some +- **Endianness:** all multi-byte integers in format MUST be little-endian (ARM native, simpler shell parsing) + +## Patterns to Follow + +### Pattern 1: Pipeline Architecture (Archiver) + +**What:** Each transformation (collect, compress, encrypt, format, obfuscate) is a separate module with a clear input/output type. No module knows about the others. + +**When:** Always. This is the core design pattern. + +**Why:** Testability (test each stage in isolation), flexibility (swap compression algorithm without touching encryption), clarity (each module has one job).
+ +```rust +use std::path::PathBuf; +use anyhow::Result; + +// Each stage is a module with typed input/output (entry type names are illustrative) +mod collect; // Vec<PathBuf> -> Vec<FileEntry> +mod compress; // Vec<FileEntry> -> Vec<CompressedEntry> +mod encrypt; // Vec<CompressedEntry> -> Vec<EncryptedEntry> +mod format; // Vec<EncryptedEntry> -> RawArchive (unobfuscated bytes) +mod obfuscate; // RawArchive -> Vec<u8> (final obfuscated bytes) + +// Main pipeline: each stage consumes the previous stage's output +pub fn create_archive(paths: Vec<PathBuf>, key: &[u8; 32]) -> Result<Vec<u8>> { + let files = collect::gather(paths)?; + let compressed = compress::compress_all(files)?; + let encrypted = encrypt::encrypt_all(compressed, key)?; + let raw = format::build(encrypted)?; + let obfuscated = obfuscate::apply(raw)?; + Ok(obfuscated) +} +``` + +### Pattern 2: Format Version Field + +**What:** Include a format version byte in the archive header (post-deobfuscation). Start at version 1. + +**When:** Always. Format will evolve. + +**Why:** Forward compatibility. Decoders can check the version and refuse to decode unknown versions with a clear error, rather than silently producing corrupt output. + +### Pattern 3: Per-File Independence + +**What:** Each file in the archive is compressed and encrypted independently with its own IV and HMAC. + +**When:** Always. + +**Why:** +- Shell decoder can extract a single file without processing the entire archive +- A corruption in one file does not cascade to others +- Memory usage is bounded by the largest single file, not the archive total + +### Pattern 4: Shared Format Specification as Source of Truth + +**What:** A single document defines every byte of the format. All three implementations are derived from this spec. + +**When:** Before writing any code. + +**Why:** With three independent implementations (Rust, Kotlin, shell), byte-level compatibility is critical. Off-by-one errors in offset calculations will produce silent data corruption. + +### Pattern 5: Encrypt-then-MAC + +**What:** Apply HMAC after encryption, computed over (IV || ciphertext). + +**When:** Always. Non-negotiable for CBC mode.
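The ordering rule can be made explicit in the decoder's control flow. The sketch below assumes the real HMAC-SHA256 and AES-256-CBC calls are passed in as closures (`compute_hmac` and `decrypt` are hypothetical stand-ins). Decryption is unreachable unless the tag over IV || ciphertext matches, and the tag comparison does not exit early on the first differing byte:

```rust
/// Compares two tags without an early exit on the first differing byte.
pub fn tags_equal(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Hypothetical decoder step for one file entry.
/// `compute_hmac` stands in for HMAC-SHA256 with the archive key,
/// `decrypt` for AES-256-CBC decryption plus PKCS#7 unpadding.
pub fn open_entry<F, G>(
    iv: &[u8; 16],
    ciphertext: &[u8],
    stored_tag: &[u8; 32],
    compute_hmac: F,
    decrypt: G,
) -> Result<Vec<u8>, &'static str>
where
    F: Fn(&[u8]) -> [u8; 32],
    G: Fn(&[u8; 16], &[u8]) -> Vec<u8>,
{
    // Encrypt-then-MAC: the tag covers IV || ciphertext.
    let mut mac_input = Vec::with_capacity(iv.len() + ciphertext.len());
    mac_input.extend_from_slice(iv);
    mac_input.extend_from_slice(ciphertext);

    // Verify first; a bad tag never reaches the CBC decryption step.
    if !tags_equal(&compute_hmac(&mac_input), stored_tag) {
        return Err("invalid archive");
    }
    Ok(decrypt(iv, ciphertext))
}
```

In the real archiver, the RustCrypto `hmac` crate's `Mac::verify_slice` should already provide a constant-time comparison. The hand-written `tags_equal` only makes the property visible.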
+ +**Why:** CBC without authentication is vulnerable to padding oracle attacks. Encrypt-then-MAC is the proven pattern. Verify HMAC before decryption on all platforms. + +## Anti-Patterns to Avoid + +| Anti-Pattern | Why Bad | Instead | +|-------------|---------|---------| +| Streaming/Chunked Encryption | Shell decoder can't seek into the middle of one continuous ciphertext stream | Encrypt each file independently | +| Complex Obfuscation | Can't implement in busybox shell | XOR + fixed offsets + decoy padding | +| Obfuscation as Security | Trivially reversible from source code | Encryption = security, obfuscation = anti-detection | +| GCM Mode | busybox openssl doesn't support it | AES-256-CBC + HMAC-SHA256 | +| zstd/lz4 Compression | No busybox/Android SDK support | gzip (DEFLATE) | +| MAC-then-Encrypt | Padding oracle attacks possible | Encrypt-then-MAC | + +## Suggested Build Order + +``` +Phase 1: FORMAT SPEC + SHELL FEASIBILITY PROOF + | + v +Phase 2: RUST ARCHIVER (core pipeline) + | + v +Phase 3: RUST ROUND-TRIP TEST DECODER + | + v +Phase 4: KOTLIN DECODER + | + v +Phase 5: SHELL DECODER + | + v +Phase 6: OBFUSCATION HARDENING + INTEGRATION TESTING +``` + +**Why this order:** + +1. **Format spec first** -- shared contract, constrained by busybox. Validate shell feasibility before investing in Rust/Kotlin code. +2. **Rust archiver before decoders** -- need archives to test decoders against. +3. **Rust test decoder before Kotlin/shell** -- catches format bugs in same language, avoids cross-language debugging. +4. **Kotlin before shell** -- primary path first; if Kotlin works, format is validated. +5. **Obfuscation hardening last** -- core pipeline must work first. Obfuscation is a layer on top.
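Phase 3's Rust round-trip decoder can start from the smallest unit of Step 4's layout: one file-table entry. Below is a std-only sketch that encodes and decodes an entry using the field order and widths listed there, all little-endian. The `TableEntry` name and the choice to reject filenames longer than 65535 bytes are assumptions:

```rust
use std::convert::{TryFrom, TryInto};

/// One file-table record in the field order of Step 4 (integers little-endian).
#[derive(Debug, Clone, PartialEq)]
pub struct TableEntry {
    pub filename: String,
    pub original_size: u64,
    pub compressed_size: u64,
    pub encrypted_size: u64,
    pub data_offset: u64,
    pub iv: [u8; 16],
    pub hmac: [u8; 32],
}

fn read_u64(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let v = u64::from_le_bytes(buf.get(*pos..*pos + 8)?.try_into().ok()?);
    *pos += 8;
    Some(v)
}

impl TableEntry {
    pub fn to_bytes(&self) -> Vec<u8> {
        let name = self.filename.as_bytes();
        let name_len = u16::try_from(name.len()).expect("filename longer than 65535 bytes");
        let mut out = Vec::with_capacity(2 + name.len() + 4 * 8 + 16 + 32);
        out.extend_from_slice(&name_len.to_le_bytes());
        out.extend_from_slice(name);
        for v in [self.original_size, self.compressed_size, self.encrypted_size, self.data_offset] {
            out.extend_from_slice(&v.to_le_bytes());
        }
        out.extend_from_slice(&self.iv);
        out.extend_from_slice(&self.hmac);
        out
    }

    /// Parses one entry from the start of `buf`; returns it and the bytes consumed.
    /// Returns None on truncated input or a non-UTF-8 filename.
    pub fn from_bytes(buf: &[u8]) -> Option<(TableEntry, usize)> {
        let name_len = u16::from_le_bytes(buf.get(0..2)?.try_into().ok()?) as usize;
        let mut pos = 2;
        let filename = String::from_utf8(buf.get(pos..pos + name_len)?.to_vec()).ok()?;
        pos += name_len;
        let original_size = read_u64(buf, &mut pos)?;
        let compressed_size = read_u64(buf, &mut pos)?;
        let encrypted_size = read_u64(buf, &mut pos)?;
        let data_offset = read_u64(buf, &mut pos)?;
        let iv: [u8; 16] = buf.get(pos..pos + 16)?.try_into().ok()?;
        pos += 16;
        let hmac: [u8; 32] = buf.get(pos..pos + 32)?.try_into().ok()?;
        pos += 32;
        let entry = TableEntry { filename, original_size, compressed_size, encrypted_size, data_offset, iv, hmac };
        Some((entry, pos))
    }
}
```

A Cyrillic filename makes a useful fixture here because `filename_len` counts UTF-8 bytes, not characters. The same byte offsets are what the Kotlin `ByteBuffer` and shell `dd skip=` code must agree on.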
+ +## Key Architectural Decisions Summary + +| Decision | Choice | Rationale | +|----------|--------|-----------| +| Compression | gzip (DEFLATE) via `flate2` | Native on all three platforms | +| Encryption | AES-256-CBC | busybox openssl supports CBC; GCM not available | +| Authentication | HMAC-SHA256 (encrypt-then-MAC) | Authenticated encryption for CBC; verifiable everywhere | +| Byte order | Little-endian | ARM native order; simpler shell parsing | +| File processing | Per-file independent | Shell needs random access; bounded memory; fault isolation | +| Obfuscation | XOR headers + scattered blocks + decoy padding | Simple enough for shell; defeats binwalk/file | +| Format contract | Standalone spec document written first | Three implementations need byte-exact agreement | +| Key storage | Hardcoded 32-byte key in all decoders | Per requirements; sufficient for casual user threat model | +| PKCS7 padding | Standard PKCS7 for CBC mode | openssl uses PKCS7 by default; Kotlin supports natively | + +## Sources + +- Architecture patterns from encrypted archive design (ZIP encryption, age, tar+gpg) +- busybox openssl capabilities: aes-256-cbc supported, GCM/CCM not supported +- Android SDK javax.crypto and java.util.zip documentation +- Rust RustCrypto ecosystem: `flate2`, `aes`, `cbc`, `hmac`, `sha2` +- Encrypt-then-MAC: Hugo Krawczyk (2001), industry standard + +**Verification needed:** Run `busybox openssl enc -ciphers` on target device to confirm aes-256-cbc availability. 
diff --git a/.planning/research/FEATURES.md b/.planning/research/FEATURES.md new file mode 100644 index 0000000..0264a77 --- /dev/null +++ b/.planning/research/FEATURES.md @@ -0,0 +1,124 @@ +# Feature Landscape + +**Domain:** Custom encrypted archiver with proprietary binary format +**Researched:** 2026-02-24 +**Confidence:** MEDIUM (based on domain knowledge of archive formats, encryption patterns, and Android constraints) + +## Table Stakes + +Features that are mandatory for this product to function correctly. Missing any of these means the archive is either non-functional, insecure against casual inspection, or unreliable. + +| Feature | Why Expected | Complexity | Notes | +|---------|--------------|------------|-------| +| Multi-file packing/unpacking | Core purpose: bundle texts + APKs into one archive | Medium | Need file table/index structure; handle varied sizes from few KB to tens of MB | +| AES-256-CBC encryption | Without real encryption, any hex editor reveals content | Medium | busybox openssl supports `aes-256-cbc`; Android javax.crypto supports AES natively | +| HMAC-SHA256 integrity (encrypt-then-MAC) | Detect corruption and tampering | Medium | busybox `openssl dgst -sha256 -hmac`; Kotlin `Mac("HmacSHA256")` | +| Compression before encryption | Reduce archive size; compression after encryption is ineffective (encrypted data has max entropy) | Low | Use deflate/gzip; must compress BEFORE encrypt | +| Hardcoded key embedding | Project requirement: no user-entered passwords, key baked into dearchiver code | Low | Key in Kotlin code and shell script; rotate key means new build | +| Custom magic bytes | Standard magic bytes (PK, 7z, etc.) 
let file/binwalk identify format; custom bytes prevent this | Low | Use random-looking bytes, not human-readable strings; avoid patterns that match known formats | +| Round-trip fidelity | Unpacked files must be byte-identical to originals | Low | Verified via checksums; critical for APKs (signature breaks if single byte changes) | +| CLI interface for packing | Archiver runs on Linux/macOS developer machine | Low | Standard CLI: `encrypted_archive pack -o output.bin file1.txt file2.apk` | +| Kotlin unpacker (Android 13) | Primary dearchiver path on target device | High | Pure JVM, no native libs; must handle javax.crypto for AES | +| Busybox shell unpacker (fallback) | Backup when Kotlin app unavailable | High | Only dd, xxd, openssl, sh; format must be simple enough for positional extraction | +| File metadata preservation (name, size) | Unpacker must know which bytes belong to which file and what to name them | Low | Stored in file table; at minimum: filename, original size, compressed size, offset | + +## Differentiators + +Features that exceed baseline expectations and provide meaningful protection or usability improvements. Not all are needed for MVP, but they strengthen the product. 
+ +| Feature | Value Proposition | Complexity | Notes | +|---------|-------------------|------------|-------| +| Format obfuscation: fake headers | Misleads automated analysis tools (binwalk, foremost) that scan for patterns | Medium | Insert decoy headers resembling JPEG, PNG, or random formats at predictable offsets; casual user sees "corrupted image" not "encrypted archive" | +| Format obfuscation: shuffled blocks | File data blocks stored out-of-order with a scramble map | Medium | Prevents sequential extraction even if encryption is somehow bypassed; adds complexity to busybox unpacker | +| Format obfuscation: randomized padding | Variable-length random padding between blocks | Low | Makes block boundaries unpredictable to static analysis; minimal implementation cost | +| Version field in header | Forward-compatible format evolution | Low | Single byte version; unpackers check version and reject incompatible archives gracefully | +| Per-file encryption with derived keys | Each file encrypted with unique key derived from master key + file index/salt | Medium | Limits damage if one file's plaintext is known (known-plaintext attack on specific block) | +| Progress reporting during pack/unpack | UX for large archives (tens of MB of APKs) | Low | CLI progress bar; Kotlin callback for UI integration | +| Dry-run / validation mode | Check archive integrity without full extraction | Low | Verify checksums and structure without writing files to disk; useful for debugging on device | +| Configurable compression level | Trade speed vs size for different content types (APKs are already compressed, texts compress well) | Low | APKs benefit little from compression; allow per-file or global setting | +| Salt / IV per archive | Each archive uses random IV/nonce even with same key; prevents identical plaintext producing identical ciphertext | Low | Standard crypto practice; 16-byte IV for AES-CBC; must be stored in archive header (unencrypted) | +| Error messages that do not 
leak format info | Unpacker errors say "invalid archive" not "checksum mismatch at block 3 offset 0x4A2" | Low | Defense in depth: even error messages should not help reverse engineering | + +## Anti-Features + +Features to explicitly NOT build. Each would add complexity without matching the project's threat model or constraints. + +| Anti-Feature | Why Avoid | What to Do Instead | +|--------------|-----------|-------------------| +| Password-based key derivation (PBKDF2/Argon2) | Project explicitly uses hardcoded key; password entry UX is unwanted on car head unit | Embed key directly in Kotlin/shell code; accept that key extraction from APK is possible for determined attackers | +| GUI for archiver | Scope creep; CLI is sufficient for developer workflow (pack on laptop, deploy to device) | Well-designed CLI with clear flags and help text | +| Windows archiver support | Out of scope per project constraints; Rust cross-compiles easily IF needed later | Linux/macOS only; document that WSL works if Windows user needs it | +| Streaming/pipe support | Files are small enough (KB to tens of MB) to fit in memory; streaming adds format complexity that breaks busybox compatibility | Load entire file into memory for pack/unpack; document max file size assumption | +| Nested/recursive archives | No use case: archive contains flat list of texts and APKs | Single-level file list only | +| File permissions / ownership metadata | Android target manages its own permissions; Unix permissions from build machine are irrelevant | Store only filename and size; ignore mode/owner/timestamps | +| Compression algorithm selection at runtime | Over-engineering; one good default is sufficient | Use deflate/gzip -- available everywhere: Rust, Kotlin, busybox; hardcode the choice | +| Public-key / asymmetric encryption | Massive complexity increase for no benefit given hardcoded key model | Symmetric encryption only (AES-256) | +| Self-extracting archives | Target is Android, not desktop; shell 
script IS the extractor | Separate archive file + separate unpacker (Kotlin app or shell script) | +| DRM or license enforcement | Not the purpose; this is content bundling protection, not DRM | Simple encryption is sufficient for the threat model | +| File deduplication within archive | Archive contains distinct files (texts and different APKs); dedup adds complexity with near-zero benefit | Pack files as-is | +| Encryption of filenames in file table | Nice in theory but busybox shell unpacker needs to know filenames to extract; encrypting the file table massively complicates the shell path | Store filenames inside the encrypted payload (entire payload is encrypted, so filenames are protected by archive-level encryption) | + +## Feature Dependencies + +``` +Compression --> Encryption --> Format Assembly (compression MUST happen before encryption) + | + v + Integrity Checks (HMAC over encrypted blocks) + +Custom Magic Bytes --> Format Header Design +Version Field --> Format Header Design +Salt/IV Storage --> Format Header Design + +File Metadata (name, size) --> File Table Structure --> Format Assembly + +Format Assembly --> CLI Packer (Rust) + +Format Specification --> Kotlin Unpacker +Format Specification --> Busybox Shell Unpacker + +Per-file Key Derivation --> Requires Format Specification to include file index/salt +Fake Headers --> Requires Format Assembly to insert decoys at correct positions +Shuffled Blocks --> Requires File Table to store block ordering map +``` + +**Critical dependency chain:** +``` +Format Spec (on paper) + --> Rust Packer (implements spec) + --> Kotlin Unpacker (reads spec) + --> Shell Unpacker (reads spec) + --> Round-trip tests (validates all three agree) +``` + +The format specification must be finalized BEFORE any implementation begins, because three independent implementations (Rust, Kotlin, shell) must produce identical results. + +## MVP Recommendation + +**Prioritize (Phase 1 -- must ship):** + +1. 
**Format specification document** -- Define header, file table, block layout, magic bytes, version field, IV/salt placement +2. **Compression + Encryption pipeline** -- Compress with gzip, encrypt with AES-256-CBC, authenticate with HMAC-SHA256 +3. **Rust CLI packer** -- Pack multiple files into the custom format +4. **Integrity verification via HMAC-SHA256** -- Encrypt-then-MAC for both integrity and authenticity +5. **Kotlin unpacker** -- Primary extraction path on Android 13. Pure JVM using javax.crypto +6. **Busybox shell unpacker** -- Fallback extraction. This constrains the format to be simple +7. **Round-trip tests** -- Verify Rust-pack, Kotlin-unpack, shell-unpack all produce identical output + +**Defer (Phase 2 -- after MVP works):** + +- **Fake headers / decoy data** -- Obfuscation layer; adds no functional value, purely anti-analysis +- **Shuffled blocks** -- Significant complexity, especially for busybox +- **Progress reporting** -- Nice UX but not blocking +- **Configurable compression** -- Start with one setting that works; optimize later +- **Dry-run / validation mode** -- Useful for debugging but not for initial delivery +- **Per-file derived keys** -- Defense-in-depth for later + +**Key MVP constraint:** The busybox shell unpacker is the most constraining component. Every format decision must be validated against "can busybox dd/xxd/openssl do this?" If the answer is no, the feature must be deferred or redesigned. 
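One rule that passes the busybox test is the header scheme in ARCHITECTURE.md: an XOR'd magic in bytes 0-7 and a little-endian index-block offset in bytes 8-11. On the device, this needs only `dd`, `xxd -p`, and shell arithmetic. Below is a Rust sketch of both sides. It uses hypothetical `MAGIC` and `XOR_KEY` values and a single detail-free error, in line with the "error messages that do not leak format info" differentiator:

```rust
use std::convert::TryInto;

// Hypothetical values; the real ones belong in the format spec.
const MAGIC: [u8; 8] = [0x5A, 0x1F, 0x93, 0x04, 0xE7, 0x6B, 0x28, 0xD1];
const XOR_KEY: [u8; 8] = [0xDE, 0xAD, 0xBE, 0xEF, 0x13, 0x37, 0xC0, 0xDE];

/// Deliberately carries no detail, so errors reveal nothing about the layout.
#[derive(Debug, PartialEq)]
pub struct InvalidArchive;

/// Archiver side: bytes 0..8 = MAGIC XOR XOR_KEY, bytes 8..12 = LE u32 offset.
pub fn write_header(index_offset: u32) -> [u8; 12] {
    let mut h = [0u8; 12];
    for (dst, (m, k)) in h.iter_mut().zip(MAGIC.iter().zip(XOR_KEY.iter())) {
        *dst = m ^ k;
    }
    h[8..12].copy_from_slice(&index_offset.to_le_bytes());
    h
}

/// Decoder side: un-XOR bytes 0..8, compare with MAGIC, then read the
/// little-endian index-block offset from bytes 8..12 and bounds-check it.
pub fn read_index_offset(archive: &[u8]) -> Result<u32, InvalidArchive> {
    let head = archive.get(0..12).ok_or(InvalidArchive)?;
    let magic_ok = head[..8]
        .iter()
        .zip(XOR_KEY.iter())
        .zip(MAGIC.iter())
        .all(|((b, k), m)| (b ^ k) == *m);
    if !magic_ok {
        return Err(InvalidArchive);
    }
    let offset = u32::from_le_bytes(head[8..12].try_into().map_err(|_| InvalidArchive)?);
    if offset as usize >= archive.len() {
        return Err(InvalidArchive);
    }
    Ok(offset)
}
```

In the shell decoder, the same check can compare the `xxd -p` output of the first 8 bytes against a precomputed hex string of `MAGIC XOR XOR_KEY`. A fixed XOR key keeps this step free of bit manipulation in `sh`.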
+ +## Sources + +- Domain knowledge of archive format design (ZIP, tar, 7z format specifications) +- Domain knowledge of cryptographic best practices (NIST, libsodium documentation patterns) +- Domain knowledge of Android crypto APIs (javax.crypto, OpenSSL CLI) +- Domain knowledge of busybox utility capabilities diff --git a/.planning/research/PITFALLS.md b/.planning/research/PITFALLS.md new file mode 100644 index 0000000..adf6f37 --- /dev/null +++ b/.planning/research/PITFALLS.md @@ -0,0 +1,89 @@ +# Common Pitfalls + +**Domain:** Custom encrypted archiver with busybox/Kotlin decompression +**Researched:** 2026-02-24 +**Confidence:** HIGH + +## Critical Pitfalls + +### Pitfall 1: Busybox OpenSSL Cipher Availability +**What:** Target busybox may not have the chosen cipher. GCM, ChaCha20 are likely unavailable. +**Prevention:** Test `busybox openssl enc -ciphers` on actual device FIRST. Use AES-256-CBC (the most widely available option). +**Phase:** Phase 1 (format design) — blocking decision. + +### Pitfall 2: Endianness Mismatch Across Platforms +**What:** Byte order silently diverges between Rust (x86_64 host), Kotlin (ARM64 device), and shell (xxd parsing) unless it is fixed explicitly: both CPUs are little-endian, but Java's `ByteBuffer` defaults to big-endian. +**Prevention:** Use little-endian everywhere. Rust: `to_le_bytes()`. Kotlin: `ByteBuffer.order(ByteOrder.LITTLE_ENDIAN)`. Document in format spec. +**Phase:** Phase 1 (format design). + +### Pitfall 3: PKCS7 Padding Incompatibility +**What:** Different padding handling between Rust crates, javax.crypto, and busybox openssl causes last-block corruption. +**Prevention:** Store exact compressed-data length in header. Either use `-nopad` in openssl and strip padding manually, or let openssl apply its default PKCS#7 padding, which matches Rust's `Pkcs7` and Java's `AES/CBC/PKCS5Padding` (`-K`/`-iv` only set the raw key and IV, not the padding). Test with non-16-byte-aligned data. +**Phase:** Phase 2 (encryption implementation). + +### Pitfall 4: OpenSSL Key Derivation (EVP_BytesToKey vs Raw Key) +**What:** busybox `openssl enc` derives keys via EVP_BytesToKey by default. Rust/Kotlin use raw keys. Decryption produces garbage.
+**Prevention:** Use `-K HEX -iv HEX -nosalt` flags for raw key mode. Test on target device FIRST. This is the #1 failure mode. +**Phase:** Phase 1 (format design) — blocking. + +### Pitfall 5: Shell Arithmetic Overflow with Large Files +**What:** busybox `sh` arithmetic may be 32-bit signed, overflowing at 2GB offsets. +**Prevention:** Use `dd bs=4096` with block-count math. Limit archive size in spec. Test with >50MB archives. +**Phase:** Phase 1 (format) + Phase 5 (shell decoder). + +### Pitfall 6: IV/Nonce Reuse with Hardcoded Key +**What:** Same key + same IV = identical ciphertext for identical files. Information leak. +**Prevention:** Random 16-byte IV per file, stored in cleartext alongside ciphertext. Never deterministic IV. +**Phase:** Phase 2 (encryption). + +### Pitfall 7: Busybox xxd Behavioral Differences +**What:** busybox `xxd` may not support all flags of vim's `xxd`. Key/IV hex conversion fails silently. +**Prevention:** Use only `xxd -p` and `xxd -r -p`. Test on actual device. Fallback: `od -A n -t x1`. +**Phase:** Phase 5 (shell decoder). + +## Moderate Pitfalls + +### Pitfall 8: APK Files Don't Compress +**What:** APKs are ZIP archives whose contents are mostly already compressed. Gzip gains little and can make them slightly larger. +**Prevention:** Per-file compression flag. Skip compression if output >= input. +**Phase:** Phase 2 (compression). + +### Pitfall 9: Missing Integrity Verification +**What:** No checksums = silent data corruption from bit flips during transfer. +**Prevention:** SHA-256 checksum per file. Verify AFTER decompression in all three decoders. +**Phase:** Phase 1 (format) + all decoder phases. + +### Pitfall 10: Over-Engineering Obfuscation +**What:** Complex obfuscation (block shuffling, fake headers) triples implementation complexity with minimal security gain against casual users. +**Prevention:** AES encryption IS obfuscation. Custom magic bytes + encrypted payload is sufficient. Keep it simple. +**Phase:** Phase 1 (format design).
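Pitfall 8's prevention rule reduces to a small decision at pack time. The sketch below assumes a hypothetical one-byte `Codec` flag in each file-table entry. The real `flate2` encoder is passed in as a closure so the example stays dependency-free:

```rust
/// Hypothetical one-byte per-file flag in the file table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    Stored = 0,
    Gzip = 1,
}

/// Keeps the gzip output only if it is strictly smaller than the input;
/// otherwise stores the bytes unchanged. `gzip` stands in for a real
/// encoder such as flate2's `GzEncoder`.
pub fn choose_codec(data: &[u8], gzip: impl Fn(&[u8]) -> Vec<u8>) -> (Codec, Vec<u8>) {
    let packed = gzip(data);
    if packed.len() < data.len() {
        (Codec::Gzip, packed)
    } else {
        (Codec::Stored, data.to_vec())
    }
}
```

The Kotlin and shell decoders would then branch on the flag and skip `GZIPInputStream` or `gunzip` for stored entries. The flag itself would need a slot in the format spec.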
+ +### Pitfall 11: Rust/Kotlin Crypto Output Incompatibility +**What:** "Same algorithm" produces different bytes due to framing differences (IV placement, padding). +**Prevention:** Define wire format explicitly: `[16-byte IV][PKCS7-padded ciphertext]`. Golden test vectors mandatory. +**Phase:** Phase 2 (encryption). + +### Pitfall 12: Android Filesystem Permissions +**What:** Extracted files land in wrong location or have wrong permissions on Android. +**Prevention:** Store only relative paths. Don't store Unix permissions. Let each decoder handle permissions. +**Phase:** Phase 4 (Kotlin decoder). + +## Minor Pitfalls + +- **#13:** Flush/sync writes in shell script — use `sync` after critical files +- **#14:** Missing version field — add 1-2 byte version after magic bytes (non-negotiable) +- **#15:** Testing only ASCII filenames — include Cyrillic test files +- **#16:** Hardcoded key visible in `strings` — store as byte array, split hex fragments in shell + +## Key Insight + +**The busybox shell constraint drives everything.** Every format decision must be validated against "can busybox sh + dd + xxd + openssl actually implement this?" Build shell decompressor prototype EARLY, not last. + +## Phase Mapping + +| Phase | Critical Pitfalls | Action | +|-------|------------------|--------| +| Format design | #1, #2, #4, #5, #10, #14 | Test busybox on device. Write format spec. Keep it simple. | +| Encryption | #3, #6, #8, #11 | Golden test vectors. Random IV. Per-file compression flag. | +| Kotlin decoder | #11, #12 | Explicit wire format. Test on device. | +| Shell decoder | #1, #4, #5, #7, #15 | Busybox compatibility suite. Large file tests. 
| diff --git a/.planning/research/STACK.md b/.planning/research/STACK.md new file mode 100644 index 0000000..a5be580 --- /dev/null +++ b/.planning/research/STACK.md @@ -0,0 +1,174 @@ +# Technology Stack + +**Project:** encrypted_archive (Custom Encrypted Archiver) +**Researched:** 2026-02-24 +**Overall confidence:** MEDIUM (versions from training data — verify before use) + +--- + +## Critical Constraint: Three-Platform Compatibility + +Every technology choice is constrained by the weakest link: **busybox shell**. The Rust archiver can use any library, but the format it produces must be decodable by: +1. Kotlin on Android 13 (javax.crypto / java.security) +2. busybox shell (openssl, dd, xxd) + +This constraint eliminates many otherwise-superior choices. + +--- + +## Recommended Stack + +### Encryption + +| Technology | Version | Purpose | Why | Confidence | +|------------|---------|---------|-----|------------| +| `aes` crate | ^0.8 | AES-256-CBC block cipher | busybox openssl supports `aes-256-cbc` — the critical three-way constraint. Android javax.crypto supports AES natively. | HIGH | +| `cbc` crate | ^0.1 | CBC mode of operation | Standard mode: openssl CLI `aes-256-cbc`, javax.crypto `AES/CBC/PKCS5Padding` | HIGH | +| `hmac` + `sha2` | ^0.12 / ^0.10 | HMAC-SHA256 integrity | Encrypt-then-MAC. busybox `openssl dgst -sha256 -mac HMAC -macopt hexkey:...`. Kotlin `Mac.getInstance("HmacSHA256")` | HIGH | + +**Why AES-256-CBC over AES-GCM:** `openssl enc` (including busybox builds) does NOT support AEAD modes (GCM/CCM). AES-CBC + HMAC-SHA256 (encrypt-then-MAC) provides comparable authenticated encryption, provided the HMAC is verified before decryption. CBC is the most reliably available AES mode across all three platforms. + +**Why NOT ChaCha20-Poly1305:** busybox openssl does not support ChaCha20. Android 13 does expose `ChaCha20/Poly1305/NoPadding` through `javax.crypto` (API 28+), but the shell fallback could not decode it, and the format must stay decodable by both paths. + +**Why NOT aes-gcm crate:** Not decodable via `openssl enc` in busybox.
+
+### Compression
+
+| Technology | Version | Purpose | Why | Confidence |
+|------------|---------|---------|-----|------------|
+| `flate2` crate | ^1.0 | gzip compression | busybox `gunzip` works natively. Android `GZIPInputStream` works natively. Simplest cross-platform path. | HIGH |
+
+**Why gzip-wrapped DEFLATE:** busybox `gunzip` handles it. Android `GZIPInputStream` handles it. flate2 `GzEncoder`/`GzDecoder` produce standard gzip.
+
+**Why NOT zstd/lz4/brotli:** busybox has no decompressors for any of these.
+
+### CLI Framework
+
+| Technology | Version | Purpose | Why | Confidence |
+|------------|---------|---------|-----|------------|
+| `clap` | ^4 | CLI argument parsing | De facto Rust standard. Derive macros. Subcommands (`pack`/`unpack`/`inspect`). | HIGH |
+
+### Binary Format
+
+| Technology | Version | Purpose | Why | Confidence |
+|------------|---------|---------|-----|------------|
+| Manual byte-level I/O | N/A | Custom binary format | Using bincode/serde would make the format recognizable to forensic tools. Manual bytes give full control for obfuscation. | HIGH |
+
+### Hashing / Integrity
+
+| Technology | Version | Purpose | Why | Confidence |
+|------------|---------|---------|-----|------------|
+| `sha2` crate | ^0.10 | SHA-256 checksums | busybox `sha256sum`, Android `MessageDigest("SHA-256")` | HIGH |
+
+### Random / IV Generation
+
+| Technology | Version | Purpose | Why | Confidence |
+|------------|---------|---------|-----|------------|
+| `rand` | ^0.8 | Random IV generation | CSPRNG for AES-CBC initialization vectors. Each file gets a unique random IV. | HIGH |
+
+### Error Handling
+
+| Technology | Version | Purpose | Why | Confidence |
+|------------|---------|---------|-----|------------|
+| `anyhow` | ^1 | Application errors | CLI app, not library. Ergonomic error chains. | HIGH |
+| `thiserror` | ^2 | Typed format errors | Specific errors for format validation, decryption, integrity. | MEDIUM |
+
+### Testing
+
+| Technology | Version | Purpose | Why | Confidence |
+|------------|---------|---------|-----|------------|
+| Built-in `#[test]` | N/A | Unit tests | Round-trip pack/unpack/compare | HIGH |
+| `assert_cmd` | ^2 | CLI integration tests | Test actual binary | MEDIUM |
+| `tempfile` | ^3 | Temp dirs | Clean test isolation | HIGH |
+
+---
+
+## Kotlin/Android Decompressor Stack (Zero External Dependencies)
+
+| Technology | Source | Purpose |
+|------------|--------|---------|
+| `javax.crypto.Cipher` | Android SDK | AES-256-CBC: `Cipher.getInstance("AES/CBC/PKCS5Padding")` |
+| `javax.crypto.spec.SecretKeySpec` | Android SDK | 32-byte hardcoded key |
+| `javax.crypto.spec.IvParameterSpec` | Android SDK | Per-file IV, read from the archive (stored in cleartext) |
+| `javax.crypto.Mac` | Android SDK | HMAC-SHA256: `Mac.getInstance("HmacSHA256")` |
+| `java.util.zip.GZIPInputStream` | Android SDK | Gzip decompression |
+| `java.security.MessageDigest` | Android SDK | SHA-256 integrity |
+| `java.nio.ByteBuffer` | Android SDK | Little-endian parsing (call `order(ByteOrder.LITTLE_ENDIAN)`; the default is big-endian) |
+
+---
+
+## Busybox Shell Decompressor Stack
+
+| Tool | Purpose | Key Flags |
+|------|---------|-----------|
+| `dd` | Extract byte ranges | `bs=1 skip=N count=M` |
+| `xxd` | Hex encode/decode keys | `xxd -p` (encode), `xxd -r -p` (decode) |
+| `openssl enc` | AES-256-CBC decrypt | `-d -aes-256-cbc -K HEX -iv HEX -nosalt` |
+| `openssl dgst` | HMAC-SHA256 verify | `-sha256 -mac HMAC -macopt hexkey:HEX -binary` (`-hmac KEY` would treat KEY as a literal string) |
+| `gunzip` | Gzip decompress | Standard input/output |
+| `sha256sum` | Integrity check | `-c checksums` |
+
+**Critical:** busybox `openssl enc` uses EVP_BytesToKey by default. MUST pass `-K` (hex key) + `-iv` (hex IV) + `-nosalt` for raw key mode. Each file's IV must therefore be stored in cleartext in the archive.
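The shell decoder receives the key and IV as hex arguments. A Rust test harness can therefore generate the exact command lines and compare their output with the Rust decoder's. The following is a minimal sketch under stated assumptions:
- `shell_extract_cmd` and `shell_hmac_cmd` are hypothetical helpers.
- The entry's ciphertext offset and length are assumed to come from the archive metadata.
- The entry is assumed to be gzip-compressed.

For HMAC, the raw key goes through `openssl dgst -sha256 -mac HMAC -macopt hexkey:HEX`. The `-hmac KEY` form uses the literal string bytes and cannot carry a binary key that contains NUL bytes.

```rust
use std::fmt::Write;

/// Lowercase hex without separators: the form `openssl enc -K/-iv` and
/// `-macopt hexkey:` expect (what `xxd -p` prints, minus line wrapping).
fn hex(bytes: &[u8]) -> String {
    let mut s = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // Writing into a String cannot fail, so unwrap is safe here.
        write!(s, "{:02x}", b).unwrap();
    }
    s
}

/// Hypothetical helper: cut one entry's ciphertext out of the archive,
/// decrypt it with a raw key/IV, and gunzip it. Paths are single-quoted;
/// names containing `'` are not handled in this sketch.
fn shell_extract_cmd(
    archive: &str,
    out: &str,
    key: &[u8; 32],
    iv: &[u8; 16],
    ct_offset: u64,
    ct_len: u64,
) -> String {
    format!(
        "dd if='{archive}' bs=1 skip={ct_offset} count={ct_len} 2>/dev/null \
         | openssl enc -d -aes-256-cbc -K {k} -iv {i} -nosalt \
         | gunzip > '{out}'",
        k = hex(key),
        i = hex(iv),
    )
}

/// Hypothetical helper: HMAC-SHA256 of a file holding IV || ciphertext,
/// with the raw 32-byte key passed as hex rather than as a string.
fn shell_hmac_cmd(blob_path: &str, key: &[u8; 32]) -> String {
    format!(
        "openssl dgst -sha256 -mac HMAC -macopt hexkey:{} -binary '{}'",
        hex(key),
        blob_path
    )
}
```

`bs=1` keeps offsets byte-exact, but it costs one read and one write per byte. This slowness on large APKs is one reason the shell decoder needs large-file tests.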
+
+---
+
+## Cross-Platform Compatibility Matrix
+
+| Rust | Android SDK | busybox | Notes |
+|------|-------------|---------|-------|
+| `aes`+`cbc` (PKCS7) | `Cipher("AES/CBC/PKCS5Padding")` | `openssl enc -aes-256-cbc` | PKCS5=PKCS7 for 16-byte blocks |
+| `hmac`+`sha2` | `Mac("HmacSHA256")` | `openssl dgst -sha256 -mac HMAC -macopt hexkey:HEX` | Raw key passed as hex, not a password string |
+| `flate2` (GzEncoder) | `GZIPInputStream` | `gunzip` | Standard gzip |
+| `sha2` | `MessageDigest("SHA-256")` | `sha256sum` | Hex comparison |
+
+---
+
+## Alternatives Considered
+
+| Category | Recommended | Alternative | Why Not |
+|----------|-------------|-------------|---------|
+| Encryption | AES-256-CBC + HMAC | AES-256-GCM | busybox `openssl enc` lacks AEAD modes such as GCM |
+| Encryption | AES-256-CBC + HMAC | ChaCha20-Poly1305 | No AEAD support in busybox `openssl enc` |
+| Compression | flate2 (gzip) | zstd | No busybox decompressor |
+| Compression | flate2 (gzip) | lz4 | No busybox decompressor |
+| Format | Manual bytes | bincode/serde | Recognizable patterns |
+| Crypto ecosystem | RustCrypto (aes+cbc) | ring | No AES-CBC (AEAD only); bundles C/assembly |
+| Crypto ecosystem | RustCrypto (aes+cbc) | `openssl` crate | Unnecessary system OpenSSL dependency |
+
+---
+
+## Cargo.toml
+
+```toml
+[package]
+name = "encrypted_archive"
+version = "0.1.0"
+edition = "2021"
+
+[dependencies]
+aes = "0.8"
+cbc = "0.1"
+hmac = "0.12"
+sha2 = "0.10"
+flate2 = "1.0"
+clap = { version = "4", features = ["derive"] }
+rand = "0.8"
+anyhow = "1"
+thiserror = "2"
+
+[dev-dependencies]
+assert_cmd = "2"
+tempfile = "3"
+```
+
+**WARNING: Versions from training data (cutoff May 2025). Verify with `cargo search CRATE --limit 1` before use.**
+
+---
+
+## Gaps Requiring Verification
+
+1. Exact latest crate versions (could not verify via crates.io)
+2. Confirm target busybox build includes `openssl` applet
+3. Confirm `xxd` availability in target busybox (fallback: `od`)
+4. Test PKCS7 padding round-trip across all three platforms
+5. Test flate2 GzEncoder output with busybox `gunzip` and Android `GZIPInputStream`
diff --git a/.planning/research/SUMMARY.md b/.planning/research/SUMMARY.md
new file mode 100644
index 0000000..e3737bd
--- /dev/null
+++ b/.planning/research/SUMMARY.md
@@ -0,0 +1,76 @@
+# Research Summary
+
+**Project:** encrypted_archive
+**Researched:** 2026-02-24
+
+## Stack Decision
+
+| Layer | Choice | Rationale |
+|-------|--------|-----------|
+| Language (archiver) | Rust | Memory safety, crypto ecosystem, cross-compilation |
+| Encryption | AES-256-CBC + HMAC-SHA256 | Most portable AES construction across busybox openssl + javax.crypto + RustCrypto (`openssl enc` has no AEAD modes) |
+| Compression | gzip (DEFLATE) via `flate2` | Native everywhere: Rust, Android GZIPInputStream, busybox gunzip |
+| CLI | `clap` v4 | De facto Rust standard |
+| Binary format | Manual byte I/O | Full control for obfuscation; no recognizable patterns |
+| Kotlin decoder | javax.crypto + java.util.zip | Zero external dependencies, Android SDK built-in |
+| Shell decoder | dd + xxd + openssl + gunzip | Standard busybox applets (`openssl` applet still to be confirmed) |
+
+**Critical constraint:** busybox shell is the weakest link. Every technology choice is validated against "can busybox do this?"
+
+## Table Stakes Features
+
+1. Multi-file packing/unpacking (texts + APKs)
+2. AES-256-CBC encryption with HMAC-SHA256 (encrypt-then-MAC)
+3. Gzip compression before encryption
+4. Custom magic bytes (not recognizable by binwalk/file/7z)
+5. Hardcoded 32-byte key in all decoders
+6. Per-file IV (random 16 bytes, stored in cleartext)
+7. Round-trip fidelity (byte-identical decompression)
+8. Kotlin decoder (primary, Android 13)
+9. Shell decoder (fallback, busybox)
+10. File metadata (name, sizes, offsets)
+11. Data integrity (SHA-256 checksums per file)
+
+## Architecture
+
+Three independent implementations unified by a shared format specification:
+
+```
+FORMAT SPEC (shared document)
+    |
+    +-- Rust Archiver (CLI, Linux/macOS)
+    +-- Kotlin Decoder (Android 13, primary)
+    +-- Shell Decoder (busybox, fallback)
+```
+
+**Rust archiver pipeline:** collect → compress → encrypt → format → obfuscate
+
+**Key patterns:**
+- Per-file independence (each file compressed/encrypted separately)
+- Encrypt-then-MAC (HMAC over IV || ciphertext)
+- Little-endian everywhere
+- Format version field for forward compatibility
+
+## Top Pitfalls to Prevent
+
+1. **busybox openssl cipher availability** — test on actual device before format design
+2. **OpenSSL key derivation mismatch** — use `-K HEX -iv HEX -nosalt` for raw keys
+3. **Cross-platform crypto incompatibility** — golden test vectors mandatory
+4. **Over-engineering obfuscation** — AES encryption IS obfuscation for casual users
+5. **APKs don't compress** — per-file compression flag needed
+
+## Recommended Build Order
+
+1. **Format spec + busybox feasibility proof** — validate constraints first
+2. **Rust archiver** — core pipeline (compress → encrypt → format)
+3. **Rust test decoder** — catch format bugs in same language
+4. **Kotlin decoder** — primary extraction path
+5. **Shell decoder** — busybox fallback
+6. **Obfuscation hardening + integration testing** — binwalk/file/strings testing
+
+## Open Questions
+
+- Does the target busybox have `openssl enc -aes-256-cbc` with `-K`/`-iv` flags?
+- Is `xxd` available in the target busybox? (fallback: `od`)
+- Is `gunzip` available in the target busybox?
+- Should HMAC use the same key as AES or a derived subkey?
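A fixed entry layout that all three decoders parse identically would make three of the key points concrete: "little-endian everywhere", "format version field", and the per-file compression flag. The following is a minimal sketch. It assumes a hypothetical layout; the real one belongs in the format spec.
- Header: `[4-byte magic][u8 version][u16 entry count]`.
- Each entry: `[u16 name_len][UTF-8 name][u8 flags][u64 offset][u32 original size][u32 stored size][16-byte IV]`.

`MAGIC`, `FLAG_GZIP` and every function name here are illustrative.

```rust
/// Hypothetical 4-byte magic; the real value belongs in the format spec.
const MAGIC: [u8; 4] = [0x7a, 0x13, 0xc4, 0x5e];
/// Format version, written right after the magic bytes.
const VERSION: u8 = 1;
/// Flags bit 0: payload was gzip-compressed (left clear for APKs).
const FLAG_GZIP: u8 = 0x01;

#[derive(Debug, PartialEq)]
struct Entry {
    name: String,    // relative path, UTF-8 (Cyrillic-safe)
    flags: u8,
    offset: u64,     // start of this file's IV || ciphertext in the archive
    orig_size: u32,
    stored_size: u32,
    iv: [u8; 16],
}

fn write_header(out: &mut Vec<u8>, entry_count: u16) {
    out.extend_from_slice(&MAGIC);
    out.push(VERSION);
    out.extend_from_slice(&entry_count.to_le_bytes());
}

fn encode_entry(e: &Entry, out: &mut Vec<u8>) {
    let name = e.name.as_bytes();
    // Sketch only: names longer than u16::MAX bytes are not handled.
    out.extend_from_slice(&(name.len() as u16).to_le_bytes());
    out.extend_from_slice(name);
    out.push(e.flags);
    out.extend_from_slice(&e.offset.to_le_bytes());
    out.extend_from_slice(&e.orig_size.to_le_bytes());
    out.extend_from_slice(&e.stored_size.to_le_bytes());
    out.extend_from_slice(&e.iv);
}

/// Splits `n` bytes off the front of `buf`, or returns None if it is too short.
fn take<'a>(buf: &mut &'a [u8], n: usize) -> Option<&'a [u8]> {
    let s: &'a [u8] = *buf;
    if s.len() < n {
        return None;
    }
    let (head, tail) = s.split_at(n);
    *buf = tail;
    Some(head)
}

/// Reads exactly N bytes into a fixed-size array.
fn take_arr<const N: usize>(buf: &mut &[u8]) -> Option<[u8; N]> {
    let mut arr = [0u8; N];
    arr.copy_from_slice(take(buf, N)?);
    Some(arr)
}

/// Checks the magic bytes and returns (version, entry count).
fn read_header(buf: &mut &[u8]) -> Option<(u8, u16)> {
    if take_arr::<4>(buf)? != MAGIC {
        return None;
    }
    let version = take_arr::<1>(buf)?[0];
    let count = u16::from_le_bytes(take_arr::<2>(buf)?);
    Some((version, count))
}

fn decode_entry(buf: &mut &[u8]) -> Option<Entry> {
    let name_len = usize::from(u16::from_le_bytes(take_arr::<2>(buf)?));
    let name = String::from_utf8(take(buf, name_len)?.to_vec()).ok()?;
    let flags = take_arr::<1>(buf)?[0];
    let offset = u64::from_le_bytes(take_arr::<8>(buf)?);
    let orig_size = u32::from_le_bytes(take_arr::<4>(buf)?);
    let stored_size = u32::from_le_bytes(take_arr::<4>(buf)?);
    let iv = take_arr::<16>(buf)?;
    Some(Entry { name, flags, offset, orig_size, stored_size, iv })
}
```

On Kotlin, the same fields would be read with a `ByteBuffer` set to `ByteOrder.LITTLE_ENDIAN`. In the shell decoder, every multi-byte field has to be reassembled from `xxd -p` output, which argues for keeping the layout this flat.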