# Architecture Patterns

**Domain:** Custom encrypted archiver with obfuscated binary format

**Researched:** 2026-02-24

## Recommended Architecture

The system decomposes into three independent deliverables (archiver, Kotlin decompressor, shell decompressor) that share a single specification: the binary format. The format is the contract. Everything else is implementation detail.

### High-Level Overview

```
                    +-----------------+
                    |   FORMAT SPEC   |
                    |  (shared doc)   |
                    +--------+--------+
                             |
          +------------------+------------------+
          |                  |                  |
+---------v---------+   +----v------+  +--------v--------+
|   RUST ARCHIVER   |   |  KOTLIN   |  |  BUSYBOX SHELL  |
| (CLI, Linux/Mac)  |   |  DECODER  |  |     DECODER     |
|                   |   | (Android) |  |   (fallback)    |
+-------------------+   +-----------+  +-----------------+
```

### Component Boundaries

| Component | Responsibility | Communicates With | Language |
|-----------|---------------|-------------------|----------|
| **Format Spec** | Defines binary layout, magic bytes strategy, block structure, obfuscation scheme | All three implementations reference this | Documentation |
| **Rust Archiver CLI** | Reads input files, compresses, encrypts, obfuscates, writes archive | Filesystem (input files, output archive) | Rust |
| **Kotlin Decoder** | Reads archive, de-obfuscates, decrypts, decompresses, writes output files | Android filesystem, embedded key | Kotlin |
| **Shell Decoder** | Same as Kotlin but via busybox commands | busybox (dd, xxd, openssl), filesystem | Shell (sh) |
| **Test Harness** | Round-trip validation: archive -> decode -> compare | All three components | Rust + shell scripts |

### Internal Component Structure (Rust Archiver)

The archiver itself has a clear pipeline architecture with five layers:

```
Input Files
    |
    v
+-------------------+
|  FILE COLLECTOR   |  Walks paths, reads files, captures metadata
+-------------------+
    |
    v
+-------------------+
|    COMPRESSOR     |  gzip (DEFLATE) per-file compression
+-------------------+
    |
    v
+-------------------+
|     ENCRYPTOR     |  AES-256-CBC + HMAC-SHA256 per-file
+-------------------+
    |
    v
+-------------------+
|  FORMAT BUILDER   |  Assembles binary structure: header, TOC, data blocks
+-------------------+
    |
    v
+-------------------+
|    OBFUSCATOR     |  Shuffles blocks, inserts decoys, transforms magic bytes
+-------------------+
    |
    v
Output Archive File
```

## Data Flow: Archival (Packing)

### Step 1: File Collection

```
for each input_path:
    read file bytes
    record: filename, original_size, file_type_hint
-> Vec<FileEntry>
```

### Step 2: Compression (per-file)

Each file is compressed independently. This is critical -- per-file compression means the shell decoder can decompress one file at a time without holding the entire archive in memory.

```
for each FileEntry:
    compressed_data = gzip_compress(data)
    record: compressed_size
-> Vec<CompressedEntry>
```

**Why compress before encrypt:** Encrypted data has maximum entropy and cannot be compressed. Compress-then-encrypt is the only valid order. This is a fundamental constraint, not a design choice.

### Step 3: Encryption (per-file)

Each compressed file is encrypted independently with a unique IV.

```
for each CompressedEntry:
    iv = random_16_bytes()                    // unique per file, AES block size
    ciphertext = aes_256_cbc_encrypt(key, iv, pkcs7_pad(compressed_data))
    hmac = hmac_sha256(key, iv || ciphertext) // encrypt-then-MAC
-> Vec<EncryptedEntry>
```

**Key decision: AES-256-GCM vs AES-256-CBC vs ChaCha20-Poly1305.**

Use **AES-256-CBC + HMAC-SHA256** because:

- busybox `openssl` supports `aes-256-cbc` natively (GCM is NOT available in busybox openssl)
- Android/Kotlin `javax.crypto` supports AES-256-CBC natively
- Rust RustCrypto crates (`aes`, `cbc`, `hmac`) support it fully
- Qualcomm SoC has AES hardware acceleration (ARMv8 Cryptography Extensions)
- ChaCha20 would require custom implementation for shell fallback
- GCM would require custom implementation for shell fallback

**Encrypt-then-MAC pattern:** HMAC is computed over (IV || ciphertext) to provide authenticated encryption.
The decoder verifies HMAC before attempting decryption, preventing padding oracle attacks.

### Step 4: Format Assembly

The format builder creates the binary layout:

```
+----------------------------------------------------------+
| OBFUSCATED HEADER (variable, see Step 5)                 |
+----------------------------------------------------------+
| FILE TABLE (encrypted)                                   |
|   - number_of_files: u32                                 |
|   - for each file:                                       |
|       filename_len: u16                                  |
|       filename: [u8; filename_len]                       |
|       original_size: u64                                 |
|       compressed_size: u64                               |
|       encrypted_size: u64                                |
|       data_offset: u64                                   |
|       iv: [u8; 16]                                       |
|       hmac: [u8; 32]                                     |
+----------------------------------------------------------+
| DATA BLOCKS                                              |
|   [encrypted_file_1_data]                                |
|   [encrypted_file_2_data]                                |
|   ...                                                    |
+----------------------------------------------------------+
```

**The file table itself is encrypted** with the same key but a dedicated IV. This prevents casual inspection of filenames and sizes.

### Step 5: Obfuscation

The obfuscation layer transforms the assembled binary to resist pattern analysis:

1. **No standard magic bytes** -- use random-looking bytes that are actually a known XOR pattern the decoder recognizes
2. **Decoy padding** -- insert random-length garbage blocks between real data blocks
3. **Header scatter** -- split the file table into chunks interleaved with data blocks, with a small "index block" at a known offset that tells where the chunks are
4. **Byte-level transforms** -- simple XOR on the header region (not on encrypted data, which is already indistinguishable from random)

```
FINAL BINARY LAYOUT:

[fake_magic: 8 bytes]            <- XOR'd known pattern
[decoy_block: random 32-512 bytes]
[index_locator: 4 bytes at offset derived from fake_magic]
[data_block_1]
[file_table_chunk_1]
[decoy_block]
[data_block_2]
[file_table_chunk_2]
[data_block_3]
...
[index_block]                    <- lists offsets of file_table_chunks and data_blocks
[trailing_garbage: random 0-256 bytes]
```

**Important:** The obfuscation MUST be simple enough to implement in a shell script with `dd` and `xxd`. Anything requiring bit manipulation beyond XOR is too complex. Keep it to:

- Fixed XOR key for header regions (hardcoded in all three decoders)
- Fixed offset calculations (e.g., "index block starts at byte offset stored in bytes 8-11 of file")
- Sequential reads with `dd bs=1 skip=N count=M`

## Data Flow: Extraction (Unpacking)

### Kotlin Path (Primary)

```kotlin
import java.io.ByteArrayInputStream
import java.io.File
import java.util.zip.GZIPInputStream

// 1. Read archive bytes
val archive = File(path).readBytes()

// 2. De-obfuscate: recover index block location
val indexOffset = deobfuscateHeader(archive)

// 3. Read index block -> get file table chunk offsets
val index = parseIndex(archive, indexOffset)

// 4. Reassemble and decrypt file table (IV here is the table's dedicated IV)
val fileTable = decryptFileTable(index.fileTableChunks, KEY, IV)

// 5. For each file entry in table:
for (entry in fileTable.entries) {
    val ciphertext = readDataBlock(archive, entry.offset, entry.encryptedSize)
    // Check the MAC over (IV || ciphertext) before touching the ciphertext
    verifyHmac(ciphertext, entry.iv, entry.hmac, KEY)
    val compressed = decryptAesCbc(ciphertext, KEY, entry.iv)
    val original = GZIPInputStream(ByteArrayInputStream(compressed)).readBytes()
    writeFile(outputDir, entry.filename, original)
}
```

**Kotlin compression:** Using gzip (`java.util.zip.GZIPInputStream`) which is built into Android SDK. No native libraries needed.

### Shell Path (Fallback)

```sh
#!/bin/sh
# Hardcoded values
KEY_HEX="abcdef0123456789..."  # 64 hex chars = 32 bytes
XOR_KEY_HEX="deadbeef"

ARCHIVE="$1"
OUTDIR="$2"

# 1. De-obfuscate header: read first 8 bytes, XOR to get real magic
MAGIC=$(dd if="$ARCHIVE" bs=1 count=8 2>/dev/null | xxd -p)
# ... validate XOR pattern ...

# 2. Find index block offset (bytes 8-11, little-endian)
INDEX_OFF_HEX=$(dd if="$ARCHIVE" bs=1 skip=8 count=4 2>/dev/null | xxd -p)
# Convert LE hex to decimal by reversing the byte order
INDEX_OFF=$(printf "%d" "0x$(echo "$INDEX_OFF_HEX" | \
    sed 's/\(..\)\(..\)\(..\)\(..\)/\4\3\2\1/')")

# 3. Read index block, parse file table chunk offsets
# ... dd + xxd to extract offsets ...
# (assumed to yield DATA_OFFSET, ENC_SIZE, IV_HEX, HMAC_HEX, FILENAME per file)

# 4. For each file: verify HMAC over (IV || ciphertext) BEFORE decrypting.
#    hexkey: passes the key as raw bytes; plain -hmac would use the hex text itself.
COMPUTED_HMAC=$( { printf '%s' "$IV_HEX" | xxd -r -p; \
    dd if="$ARCHIVE" bs=1 skip="$DATA_OFFSET" count="$ENC_SIZE" 2>/dev/null; } | \
    openssl dgst -sha256 -mac HMAC -macopt "hexkey:$KEY_HEX" -hex | awk '{print $NF}')
[ "$COMPUTED_HMAC" = "$HMAC_HEX" ] || { echo "HMAC mismatch: $FILENAME" >&2; exit 1; }

# 5. Extract ciphertext, decrypt, decompress
dd if="$ARCHIVE" bs=1 skip="$DATA_OFFSET" count="$ENC_SIZE" 2>/dev/null | \
    openssl aes-256-cbc -d -K "$KEY_HEX" -iv "$IV_HEX" -nosalt | \
    gunzip > "$OUTDIR/$FILENAME"
```

**Shell limitations that constrain the entire format design:**

- `dd` reads are byte-precise but slow for large files with bs=1
- `xxd` handles hex conversion but no binary arithmetic
- `openssl` in busybox supports limited ciphers (aes-256-cbc YES, GCM/CCM NO)
- HMAC verification via `openssl dgst -sha256` with `-mac HMAC -macopt hexkey:<hex>`; the shorter `-hmac <key>` form treats its argument as a literal string, so a hex-encoded key would not match the raw 32-byte key used by the other implementations (confirm availability on the target build)
- Integer arithmetic limited to shell `$(( ))` -- handles 64-bit on most platforms
- **Endianness:** all multi-byte integers in format MUST be little-endian (ARM native, simpler shell parsing)

## Patterns to Follow

### Pattern 1: Pipeline Architecture (Archiver)

**What:** Each transformation (collect, compress, encrypt, format, obfuscate) is a separate module with a clear input/output type. No module knows about the others.

**When:** Always. This is the core design pattern.

**Why:** Testability (test each stage in isolation), flexibility (swap compression algorithm without touching encryption), clarity (each module has one job).
```rust
// Each stage is a function or module with typed input/output
// (type names in these comments are illustrative)
mod collect;   // Vec<PathBuf> -> Vec<FileEntry>
mod compress;  // Vec<FileEntry> -> Vec<CompressedEntry>
mod encrypt;   // Vec<CompressedEntry> -> Vec<EncryptedEntry>
mod format;    // Vec<EncryptedEntry> -> RawArchive (unobfuscated bytes)
mod obfuscate; // RawArchive -> Vec<u8> (final obfuscated bytes)

use std::path::PathBuf;

// Main pipeline. `Result` is assumed to be a crate-level alias
// with a single type parameter (for example `anyhow::Result`).
pub fn create_archive(paths: Vec<PathBuf>, key: &[u8; 32]) -> Result<Vec<u8>> {
    let files = collect::gather(paths)?;
    let compressed = compress::compress_all(files)?;
    let encrypted = encrypt::encrypt_all(compressed, key)?;
    let raw = format::build(encrypted)?;
    let obfuscated = obfuscate::apply(raw)?;
    Ok(obfuscated)
}
```

### Pattern 2: Format Version Field

**What:** Include a format version byte in the archive header (post-deobfuscation). Start at version 1.

**When:** Always. Format will evolve.

**Why:** Forward compatibility. Decoders can check the version and refuse to decode unknown versions with a clear error, rather than silently producing corrupt output.

### Pattern 3: Per-File Independence

**What:** Each file in the archive is compressed and encrypted independently with its own IV and HMAC.

**When:** Always.

**Why:**

- Shell decoder can extract a single file without processing the entire archive
- A corruption in one file does not cascade to others
- Memory usage is bounded by the largest single file, not the archive total

### Pattern 4: Shared Format Specification as Source of Truth

**What:** A single document defines every byte of the format. All three implementations are derived from this spec.

**When:** Before writing any code.

**Why:** With three independent implementations (Rust, Kotlin, shell), byte-level compatibility is critical. Off-by-one errors in offset calculations will produce silent data corruption.

### Pattern 5: Encrypt-then-MAC

**What:** Apply HMAC after encryption, computed over (IV || ciphertext).

**When:** Always. Non-negotiable for CBC mode.

**Why:** CBC without authentication is vulnerable to padding oracle attacks. Encrypt-then-MAC is the proven pattern.
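
The verify-before-decrypt rule can be sketched in Rust as a small gate, assuming the 32-byte tag has already been recomputed over `IV || ciphertext`. The names `tags_match`, `open_entry`, and `DecodeError` are hypothetical, and the decrypt step is passed in as a closure so that the sketch stays standard-library only. The hand-written comparison loop is for illustration; a vetted constant-time primitive (such as the `subtle` crate, or `verify_slice` from the `hmac` crate) is likely the safer choice in real code, since a compiler may optimize a manual loop.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum DecodeError {
    HmacMismatch,
}

/// Compare two tags without an early exit, so the loop does the same
/// amount of work wherever the first differing byte is.
fn tags_match(stored: &[u8; 32], computed: &[u8; 32]) -> bool {
    let mut diff = 0u8;
    for (a, b) in stored.iter().zip(computed.iter()) {
        diff |= a ^ b;
    }
    diff == 0
}

/// Run `decrypt` only when the tags agree. On a mismatch the closure is
/// never called, so no padding check is performed on unauthenticated data.
fn open_entry<F>(stored: &[u8; 32], computed: &[u8; 32], decrypt: F) -> Result<Vec<u8>, DecodeError>
where
    F: FnOnce() -> Vec<u8>,
{
    if !tags_match(stored, computed) {
        return Err(DecodeError::HmacMismatch);
    }
    Ok(decrypt())
}

fn main() {
    let stored = [0xABu8; 32];
    let mut tampered = stored;
    tampered[0] ^= 0x01; // flip one bit to simulate a corrupted archive

    // Matching tags: the (stand-in) decrypt step runs.
    let ok = open_entry(&stored, &stored, || b"plaintext".to_vec());
    assert_eq!(ok, Ok(b"plaintext".to_vec()));

    // Mismatching tags: decryption is skipped entirely.
    let mut decrypt_called = false;
    let bad = open_entry(&stored, &tampered, || {
        decrypt_called = true;
        Vec::new()
    });
    assert_eq!(bad, Err(DecodeError::HmacMismatch));
    assert!(!decrypt_called);
}
```

Passing the decrypt step as a closure makes the ordering testable: a unit test can assert that the closure is never invoked when the tag check fails.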
Verify HMAC before decryption on all platforms.

## Anti-Patterns to Avoid

| Anti-Pattern | Why Bad | Instead |
|-------------|---------|---------|
| Streaming/Chunked Encryption | Shell can't seek into stream cipher | Encrypt each file independently |
| Complex Obfuscation | Can't implement in busybox shell | XOR + fixed offsets + decoy padding |
| Obfuscation as Security | Trivially reversible from source code | Encryption = security, obfuscation = anti-detection |
| GCM Mode | busybox openssl doesn't support it | AES-256-CBC + HMAC-SHA256 |
| zstd/lz4 Compression | No busybox/Android SDK support | gzip (DEFLATE) |
| MAC-then-Encrypt | Padding oracle attacks possible | Encrypt-then-MAC |

## Suggested Build Order

```
Phase 1: FORMAT SPEC + SHELL FEASIBILITY PROOF
    |
    v
Phase 2: RUST ARCHIVER (core pipeline)
    |
    v
Phase 3: RUST ROUND-TRIP TEST DECODER
    |
    v
Phase 4: KOTLIN DECODER
    |
    v
Phase 5: SHELL DECODER
    |
    v
Phase 6: OBFUSCATION HARDENING + INTEGRATION TESTING
```

**Why this order:**

1. **Format spec first** -- shared contract, constrained by busybox. Validate shell feasibility before investing in Rust/Kotlin code.
2. **Rust archiver before decoders** -- need archives to test decoders against.
3. **Rust test decoder before Kotlin/shell** -- catches format bugs in same language, avoids cross-language debugging.
4. **Kotlin before shell** -- primary path first; if Kotlin works, format is validated.
5. **Obfuscation hardening last** -- core pipeline must work first. Obfuscation is a layer on top.
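
To make the same-language round-trip idea from Phase 3 concrete, here is a minimal sketch of encoding and decoding one file table record with little-endian integers, in the field order listed under Step 4. `FileTableEntry`, `encode`, `decode`, and the `take` helpers are hypothetical names; the authoritative layout is whatever the format spec fixes, and the sketch assumes filenames are UTF-8 and shorter than 64 KiB.

```rust
use std::convert::TryInto;

/// One file table record, mirroring the Step 4 field list.
#[derive(Debug, Clone, PartialEq)]
struct FileTableEntry {
    filename: String,
    original_size: u64,
    compressed_size: u64,
    encrypted_size: u64,
    data_offset: u64,
    iv: [u8; 16],
    hmac: [u8; 32],
}

/// Return the next `n` bytes and advance `pos`, or None if the buffer is too short.
fn take<'a>(buf: &'a [u8], pos: &mut usize, n: usize) -> Option<&'a [u8]> {
    let end = pos.checked_add(n)?;
    let slice = buf.get(*pos..end)?;
    *pos = end;
    Some(slice)
}

/// Read one little-endian u64 and advance `pos`.
fn take_u64(buf: &[u8], pos: &mut usize) -> Option<u64> {
    Some(u64::from_le_bytes(take(buf, pos, 8)?.try_into().ok()?))
}

impl FileTableEntry {
    /// Serialize the record; every integer is written little-endian.
    fn encode(&self) -> Vec<u8> {
        let name = self.filename.as_bytes();
        let mut out = Vec::new();
        out.extend_from_slice(&(name.len() as u16).to_le_bytes()); // assumes < 64 KiB
        out.extend_from_slice(name);
        for v in [self.original_size, self.compressed_size, self.encrypted_size, self.data_offset] {
            out.extend_from_slice(&v.to_le_bytes());
        }
        out.extend_from_slice(&self.iv);
        out.extend_from_slice(&self.hmac);
        out
    }

    /// Parse one record. Returns the entry and the number of bytes consumed,
    /// or None when the input is truncated or the filename is not UTF-8.
    fn decode(buf: &[u8]) -> Option<(Self, usize)> {
        let mut pos = 0usize;
        let name_len = u16::from_le_bytes(take(buf, &mut pos, 2)?.try_into().ok()?) as usize;
        let filename = String::from_utf8(take(buf, &mut pos, name_len)?.to_vec()).ok()?;
        let original_size = take_u64(buf, &mut pos)?;
        let compressed_size = take_u64(buf, &mut pos)?;
        let encrypted_size = take_u64(buf, &mut pos)?;
        let data_offset = take_u64(buf, &mut pos)?;
        let iv: [u8; 16] = take(buf, &mut pos, 16)?.try_into().ok()?;
        let hmac: [u8; 32] = take(buf, &mut pos, 32)?.try_into().ok()?;
        let entry = Self { filename, original_size, compressed_size, encrypted_size, data_offset, iv, hmac };
        Some((entry, pos))
    }
}

fn main() {
    let entry = FileTableEntry {
        filename: "config.json".to_string(),
        original_size: 1000,
        compressed_size: 400,
        encrypted_size: 416, // 400 bytes plus one full PKCS7 padding block
        data_offset: 4096,
        iv: [0x11; 16],
        hmac: [0x22; 32],
    };
    let bytes = entry.encode();
    let (decoded, used) = FileTableEntry::decode(&bytes).expect("well-formed record");
    assert_eq!(decoded, entry);
    assert_eq!(used, bytes.len());
    // A truncated record is rejected instead of being read past the end.
    assert!(FileTableEntry::decode(&bytes[..bytes.len() - 1]).is_none());
}
```

Returning the consumed byte count lets a caller walk a table of variable-length records, and the bounds-checked `take` helper turns an off-by-one offset into an explicit failure rather than silent corruption.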
## Key Architectural Decisions Summary

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Compression | gzip (DEFLATE) via `flate2` | Native on all three platforms |
| Encryption | AES-256-CBC | busybox openssl supports CBC; GCM not available |
| Authentication | HMAC-SHA256 (encrypt-then-MAC) | Authenticated encryption for CBC; verifiable everywhere |
| Byte order | Little-endian | ARM native order; simpler shell parsing |
| File processing | Per-file independent | Shell needs random access; bounded memory; fault isolation |
| Obfuscation | XOR headers + scattered blocks + decoy padding | Simple enough for shell; defeats binwalk/file |
| Format contract | Standalone spec document written first | Three implementations need byte-exact agreement |
| Key storage | Hardcoded 32-byte key in all decoders | Per requirements; sufficient for casual user threat model |
| PKCS7 padding | Standard PKCS7 for CBC mode | openssl uses PKCS7 by default; Kotlin supports natively |

## Sources

- Architecture patterns from encrypted archive design (ZIP encryption, age, tar+gpg)
- busybox openssl capabilities: aes-256-cbc supported, GCM/CCM not supported
- Android SDK javax.crypto and java.util.zip documentation
- Rust RustCrypto ecosystem: `flate2`, `aes`, `cbc`, `hmac`, `sha2`
- Encrypt-then-MAC: Hugo Krawczyk (2001), industry standard

**Verification needed:** Run `busybox openssl enc -ciphers` on target device to confirm aes-256-cbc availability.