Architecture Patterns
Domain: Custom encrypted archiver with obfuscated binary format
Researched: 2026-02-24
Recommended Architecture
The system decomposes into three independent deliverables (archiver, Kotlin decompressor, shell decompressor) that share a single specification: the binary format. The format is the contract. Everything else is implementation detail.
High-Level Overview
                    +-----------------+
                    |   FORMAT SPEC   |
                    |  (shared doc)   |
                    +--------+--------+
                             |
          +------------------+------------------+
          |                  |                  |
+---------v---------+   +----v------+  +--------v--------+
|  RUST ARCHIVER    |   |  KOTLIN   |  |  BUSYBOX SHELL  |
| (CLI, Linux/Mac)  |   |  DECODER  |  |     DECODER     |
|                   |   | (Android) |  |   (fallback)    |
+-------------------+   +-----------+  +-----------------+
Component Boundaries
| Component | Responsibility | Communicates With | Language |
|---|---|---|---|
| Format Spec | Defines binary layout, magic bytes strategy, block structure, obfuscation scheme | All three implementations reference this | Documentation |
| Rust Archiver CLI | Reads input files, compresses, encrypts, obfuscates, writes archive | Filesystem (input files, output archive) | Rust |
| Kotlin Decoder | Reads archive, de-obfuscates, decrypts, decompresses, writes output files | Android filesystem, embedded key | Kotlin |
| Shell Decoder | Same as Kotlin but via busybox commands | busybox (dd, xxd, openssl), filesystem | Shell (sh) |
| Test Harness | Round-trip validation: archive -> decode -> compare | All three components | Rust + shell scripts |
Internal Component Structure (Rust Archiver)
The archiver itself has a clear pipeline architecture with five layers:
Input Files
|
v
+-------------------+
| FILE COLLECTOR | Walks paths, reads files, captures metadata
+-------------------+
|
v
+-------------------+
| COMPRESSOR | gzip (DEFLATE) per-file compression
+-------------------+
|
v
+-------------------+
| ENCRYPTOR | AES-256-CBC + HMAC-SHA256 per-file
+-------------------+
|
v
+-------------------+
| FORMAT BUILDER | Assembles binary structure: header, TOC, data blocks
+-------------------+
|
v
+-------------------+
| OBFUSCATOR | Shuffles blocks, inserts decoys, transforms magic bytes
+-------------------+
|
v
Output Archive File
Data Flow: Archival (Packing)
Step 1: File Collection
for each input_path:
    read file bytes
    record: filename, original_size, file_type_hint
-> Vec<FileEntry { name, data, metadata }>
Step 2: Compression (per-file)
Each file is compressed independently. This is critical -- per-file compression means the shell decoder can decompress one file at a time without holding the entire archive in memory.
for each FileEntry:
    compressed_data = gzip_compress(data)
    record: compressed_size
-> Vec<CompressedEntry { name, compressed_data, original_size, compressed_size }>
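Why compress before encrypt: Well-encrypted data is statistically indistinguishable from random bytes, so a compressor run after encryption finds no redundancy to remove. Compress-then-encrypt is therefore the only order in which compression has any effect. This is a fundamental constraint, not a design choice.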
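The entropy argument can be checked with a small measurement. The sketch below estimates byte-level Shannon entropy; the xorshift generator is only a stand-in for ciphertext (an assumption for this demo, it is not a cipher), and both function names are hypothetical.

```rust
/// Shannon entropy of a byte slice, in bits per byte (0.0 to 8.0).
pub fn shannon_entropy(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // Histogram of byte values.
    let mut counts = [0u64; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let n = data.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Stand-in for ciphertext: bytes from a xorshift64 generator.
/// NOT cryptographic; used only because its output is roughly uniform.
pub fn pseudo_random_bytes(len: usize, seed: u64) -> Vec<u8> {
    let mut state = seed.max(1); // xorshift must not start at zero
    let mut out = Vec::with_capacity(len + 8);
    while out.len() < len {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        out.extend_from_slice(&state.to_le_bytes());
    }
    out.truncate(len);
    out
}
```

On repetitive text the estimate stays well below 8 bits per byte, while the uniform stand-in sits close to 8, which is the regime where DEFLATE has nothing left to remove.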
Step 3: Encryption (per-file)
Each compressed file is encrypted independently with a unique IV.
for each CompressedEntry:
    iv = random_16_bytes()                    // unique per file, AES block size
    ciphertext = aes_256_cbc_encrypt(key, iv, pkcs7_pad(compressed_data))
    hmac = hmac_sha256(key, iv || ciphertext) // encrypt-then-MAC
-> Vec<EncryptedEntry { name, iv, ciphertext, hmac, sizes... }>
Key decision: AES-256-GCM vs AES-256-CBC vs ChaCha20-Poly1305.
Use AES-256-CBC + HMAC-SHA256 because:
- busybox openssl supports aes-256-cbc natively (GCM is NOT available in busybox openssl)
- Android/Kotlin javax.crypto supports AES-256-CBC natively
- Rust RustCrypto crates (aes, cbc, hmac) support it fully
- Qualcomm SoC has AES hardware acceleration (ARMv8 Cryptography Extensions)
- ChaCha20 would require custom implementation for shell fallback
- GCM would require custom implementation for shell fallback
Encrypt-then-MAC pattern: HMAC is computed over (IV || ciphertext) to provide authenticated encryption. The decoder verifies HMAC before attempting decryption, preventing padding oracle attacks.
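Because the pseudocode pads with PKCS7 before CBC encryption, the encrypted_size stored in the file table is always larger than compressed_size. A minimal sketch of the padding rule follows, using only the standard library; in practice the padding would come from the cipher crate, and the function names here are illustrative.

```rust
const AES_BLOCK: usize = 16;

/// Appends PKCS#7 padding: always 1..=16 bytes, each equal to the pad length.
pub fn pkcs7_pad(data: &[u8]) -> Vec<u8> {
    let pad = AES_BLOCK - (data.len() % AES_BLOCK);
    let mut out = Vec::with_capacity(data.len() + pad);
    out.extend_from_slice(data);
    out.extend(std::iter::repeat(pad as u8).take(pad));
    out
}

/// Removes PKCS#7 padding, returning None if it is malformed.
/// Intended to run only after the HMAC has been verified.
pub fn pkcs7_unpad(data: &[u8]) -> Option<&[u8]> {
    let &last = data.last()?;
    let pad = last as usize;
    if pad == 0 || pad > AES_BLOCK || pad > data.len() || data.len() % AES_BLOCK != 0 {
        return None;
    }
    let (body, tail) = data.split_at(data.len() - pad);
    if tail.iter().all(|&b| b == last) {
        Some(body)
    } else {
        None
    }
}

/// Ciphertext length for a given compressed length under CBC + PKCS#7.
pub fn encrypted_size(compressed_size: u64) -> u64 {
    (compressed_size / AES_BLOCK as u64 + 1) * AES_BLOCK as u64
}
```

An input that is already a multiple of 16 bytes still gains a full padding block, so a 16-byte payload produces 32 bytes of ciphertext. All three decoders need to agree on this when computing offsets.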
Step 4: Format Assembly
The format builder creates the binary layout:
+----------------------------------------------------------+
| OBFUSCATED HEADER (variable, see Step 5) |
+----------------------------------------------------------+
| FILE TABLE (encrypted) |
| - number_of_files: u32 |
| - for each file: |
| filename_len: u16 |
| filename: [u8; filename_len] |
| original_size: u64 |
| compressed_size: u64 |
| encrypted_size: u64 |
| data_offset: u64 |
| iv: [u8; 16] |
| hmac: [u8; 32] |
+----------------------------------------------------------+
| DATA BLOCKS |
| [encrypted_file_1_data] |
| [encrypted_file_2_data] |
| ... |
+----------------------------------------------------------+
The file table itself is encrypted with the same key but a dedicated IV. This prevents casual inspection of filenames and sizes.
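The per-file record above could be serialized as in the following sketch. It covers the plaintext record only (encryption of the table is out of scope here), uses little-endian integers as the format requires, and the type and method names are hypothetical rather than taken from an existing codebase.

```rust
use std::convert::TryInto;

/// One file-table record, mirroring the field list in the layout.
#[derive(Debug, Clone, PartialEq)]
pub struct TableEntry {
    pub filename: String,
    pub original_size: u64,
    pub compressed_size: u64,
    pub encrypted_size: u64,
    pub data_offset: u64,
    pub iv: [u8; 16],
    pub hmac: [u8; 32],
}

impl TableEntry {
    /// Serializes the record. Assumes the filename is shorter than 65536 bytes.
    pub fn to_bytes(&self) -> Vec<u8> {
        let name = self.filename.as_bytes();
        let mut out = Vec::with_capacity(2 + name.len() + 32 + 16 + 32);
        out.extend_from_slice(&(name.len() as u16).to_le_bytes());
        out.extend_from_slice(name);
        for v in [self.original_size, self.compressed_size, self.encrypted_size, self.data_offset] {
            out.extend_from_slice(&v.to_le_bytes());
        }
        out.extend_from_slice(&self.iv);
        out.extend_from_slice(&self.hmac);
        out
    }

    /// Parses one record from the front of `buf`; returns it with the bytes consumed.
    pub fn from_bytes(buf: &[u8]) -> Option<(TableEntry, usize)> {
        let name_len = u16::from_le_bytes(buf.get(0..2)?.try_into().ok()?) as usize;
        let mut pos = 2;
        let filename = String::from_utf8(buf.get(pos..pos + name_len)?.to_vec()).ok()?;
        pos += name_len;
        let mut nums = [0u64; 4];
        for n in nums.iter_mut() {
            *n = u64::from_le_bytes(buf.get(pos..pos + 8)?.try_into().ok()?);
            pos += 8;
        }
        let iv: [u8; 16] = buf.get(pos..pos + 16)?.try_into().ok()?;
        pos += 16;
        let hmac: [u8; 32] = buf.get(pos..pos + 32)?.try_into().ok()?;
        pos += 32;
        let [original_size, compressed_size, encrypted_size, data_offset] = nums;
        let entry = TableEntry {
            filename,
            original_size,
            compressed_size,
            encrypted_size,
            data_offset,
            iv,
            hmac,
        };
        Some((entry, pos))
    }
}
```

Each record occupies 2 + filename_len + 80 bytes, so a decoder can walk the table sequentially without a separate length field per entry.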
Step 5: Obfuscation
The obfuscation layer transforms the assembled binary to resist pattern analysis:
- No standard magic bytes -- use random-looking bytes that are actually a known XOR pattern the decoder recognizes
- Decoy padding -- insert random-length garbage blocks between real data blocks
- Header scatter -- split the file table into chunks interleaved with data blocks, with a small "index block" at a known offset that records where the chunks are
- Byte-level transforms -- simple XOR on the header region (not on encrypted data, which is already indistinguishable from random)
FINAL BINARY LAYOUT:
[fake_magic: 8 bytes] <- XOR'd known pattern
[decoy_block: random 32-512 bytes]
[index_locator: 4 bytes at offset derived from fake_magic]
[data_block_1]
[file_table_chunk_1]
[decoy_block]
[data_block_2]
[file_table_chunk_2]
[data_block_3]
...
[index_block] <- lists offsets of file_table_chunks and data_blocks
[trailing_garbage: random 0-256 bytes]
Important: The obfuscation MUST be simple enough to implement in a shell script with dd and xxd. Anything requiring bit manipulation beyond XOR is too complex. Keep it to:
- Fixed XOR key for header regions (hardcoded in all three decoders)
- Fixed offset calculations (e.g., "index block starts at byte offset stored in bytes 8-11 of file")
- Sequential reads with dd bs=1 skip=N count=M
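The two primitives this list allows, a fixed XOR key and a fixed-offset integer read, are small enough to sketch directly. The key bytes and the "bytes 8-11" position below follow the examples in this document; the function names are hypothetical.

```rust
/// XORs `buf` in place with a repeating key.
/// Applying the same call twice restores the original bytes.
pub fn xor_in_place(buf: &mut [u8], key: &[u8]) {
    assert!(!key.is_empty(), "XOR key must not be empty");
    for (i, b) in buf.iter_mut().enumerate() {
        *b ^= key[i % key.len()];
    }
}

/// Reads the little-endian u32 index-block offset stored at bytes 8..12.
/// Returns None if the archive is too short to contain it.
pub fn read_index_offset(archive: &[u8]) -> Option<u32> {
    let b = archive.get(8..12)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}
```

Because XOR is its own inverse, the archiver and all decoders can share one routine for both obfuscating and de-obfuscating the header region.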
Data Flow: Extraction (Unpacking)
Kotlin Path (Primary)
// 1. Read archive bytes
val archive = File(path).readBytes()
// 2. De-obfuscate: recover index block location
val indexOffset = deobfuscateHeader(archive)
// 3. Read index block -> get file table chunk offsets
val index = parseIndex(archive, indexOffset)
// 4. Reassemble and decrypt file table
val fileTable = decryptFileTable(index.fileTableChunks, KEY, IV)
// 5. For each file entry in table:
for (entry in fileTable.entries) {
    val ciphertext = readDataBlock(archive, entry.offset, entry.encryptedSize)
    verifyHmac(ciphertext, entry.iv, entry.hmac, KEY)  // must pass before decrypting
    val compressed = decryptAesCbc(ciphertext, KEY, entry.iv)
    val original = GZIPInputStream(ByteArrayInputStream(compressed)).readBytes()
    writeFile(outputDir, entry.filename, original)
}
Kotlin decompression: gzip via java.util.zip.GZIPInputStream, which is built into the Android SDK. No native libraries are needed.
Shell Path (Fallback)
#!/bin/sh
# Hardcoded values
KEY_HEX="abcdef0123456789..."  # 64 hex chars = 32 bytes
XOR_KEY_HEX="deadbeef"
ARCHIVE="$1"
OUTDIR="$2"

# 1. De-obfuscate header: read first 8 bytes, XOR to get real magic
MAGIC=$(dd if="$ARCHIVE" bs=1 count=8 2>/dev/null | xxd -p)
# ... validate XOR pattern ...

# 2. Find index block offset (bytes 8-11, little-endian)
INDEX_OFF_HEX=$(dd if="$ARCHIVE" bs=1 skip=8 count=4 2>/dev/null | xxd -p)
# Convert LE hex to decimal by reversing the four byte pairs
INDEX_OFF=$(printf "%d" "0x$(echo "$INDEX_OFF_HEX" | \
    sed 's/\(..\)\(..\)\(..\)\(..\)/\4\3\2\1/')")

# 3. Read index block, parse file table chunk offsets
# ... dd + xxd to extract offsets ...

# 4. For each file: verify HMAC over (IV || ciphertext) BEFORE decrypting.
#    -macopt hexkey: passes the key as raw bytes; -hmac "$KEY_HEX" would key
#    the MAC with the ASCII hex string and never match the archiver's tag.
#    EXPECTED_HMAC is the hex tag parsed from the file table (illustrative name).
#    Assumes the device's xxd supports -r -p.
COMPUTED_HMAC=$( { printf '%s' "$IV_HEX" | xxd -r -p; \
    dd if="$ARCHIVE" bs=1 skip="$DATA_OFFSET" count="$ENC_SIZE" 2>/dev/null; } | \
    openssl dgst -sha256 -mac HMAC -macopt hexkey:"$KEY_HEX" -hex | awk '{print $2}')
[ "$COMPUTED_HMAC" = "$EXPECTED_HMAC" ] || { echo "HMAC mismatch: $FILENAME" >&2; exit 1; }

# 5. Extract ciphertext, decrypt, decompress
dd if="$ARCHIVE" bs=1 skip="$DATA_OFFSET" count="$ENC_SIZE" 2>/dev/null | \
    openssl aes-256-cbc -d -K "$KEY_HEX" -iv "$IV_HEX" -nosalt | \
    gunzip > "$OUTDIR/$FILENAME"
Shell limitations that constrain the entire format design:
- dd reads are byte-precise but slow for large files with bs=1
- xxd handles hex conversion but no binary arithmetic
- openssl in busybox supports limited ciphers (aes-256-cbc YES, GCM/CCM NO)
- HMAC verification via openssl dgst -sha256 -hmac (available in most busybox builds); -hmac takes the key as a literal string, so a hex-encoded key needs -mac HMAC -macopt hexkey:<hex> instead
- Integer arithmetic limited to shell $(( )) -- handles 64-bit on most platforms
- Endianness: all multi-byte integers in format MUST be little-endian (ARM native, simpler shell parsing)
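Since the shell decoder converts little-endian hex by reversing byte pairs, a reference implementation of the same conversion is handy for cross-checking offsets in tests. This is a sketch under that assumption; `le_hex_to_u64` is a hypothetical helper, not part of any existing tool.

```rust
/// Converts little-endian hex (as printed by `xxd -p`) to an integer by
/// reversing the byte pairs, the same idea as the sed expression in the
/// shell decoder. Accepts 1 to 8 bytes (2 to 16 hex digits).
pub fn le_hex_to_u64(hex: &str) -> Option<u64> {
    let hex = hex.trim(); // xxd output ends with a newline
    let valid_len = !hex.is_empty() && hex.len() % 2 == 0 && hex.len() <= 16;
    if !valid_len || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut value: u64 = 0;
    // Walk the pairs from last to first so the final pair becomes most significant.
    for i in (0..hex.len()).step_by(2).rev() {
        let byte = u8::from_str_radix(&hex[i..i + 2], 16).ok()?;
        value = (value << 8) | u64::from(byte);
    }
    Some(value)
}
```

For example, the four bytes 78 56 34 12 read from an archive decode to 0x12345678, which is the value the shell arithmetic should also produce for the same input.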
Patterns to Follow
Pattern 1: Pipeline Architecture (Archiver)
What: Each transformation (collect, compress, encrypt, format, obfuscate) is a separate module with a clear input/output type. No module knows about the others.
When: Always. This is the core design pattern.
Why: Testability (test each stage in isolation), flexibility (swap compression algorithm without touching encryption), clarity (each module has one job).
// Each stage is a function or module with typed input/output
mod collect; // Vec<PathBuf> -> Vec<FileEntry>
mod compress; // Vec<FileEntry> -> Vec<CompressedEntry>
mod encrypt; // Vec<CompressedEntry> -> Vec<EncryptedEntry>
mod format; // Vec<EncryptedEntry> -> RawArchive (unobfuscated bytes)
mod obfuscate; // RawArchive -> Vec<u8> (final obfuscated bytes)
// Main pipeline
pub fn create_archive(paths: Vec<PathBuf>, key: &[u8; 32]) -> Result<Vec<u8>> {
    let files = collect::gather(paths)?;
    let compressed = compress::compress_all(files)?;
    let encrypted = encrypt::encrypt_all(compressed, key)?;
    let raw = format::build(encrypted)?;
    let obfuscated = obfuscate::apply(raw)?;
    Ok(obfuscated)
}
Pattern 2: Format Version Field
What: Include a format version byte in the archive header (post-deobfuscation). Start at version 1.
When: Always. Format will evolve.
Why: Forward compatibility. Decoders can check the version and refuse to decode unknown versions with a clear error, rather than silently producing corrupt output.
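A version gate of this kind might look like the sketch below. The position of the version byte is left as a parameter because the spec has not fixed it yet; `FormatError`, `SUPPORTED_VERSION`, and `check_version` are hypothetical names.

```rust
use std::fmt;

pub const SUPPORTED_VERSION: u8 = 1;

#[derive(Debug, PartialEq)]
pub enum FormatError {
    Truncated,
    UnsupportedVersion(u8),
}

impl fmt::Display for FormatError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            FormatError::Truncated => write!(f, "header too short to contain a version byte"),
            FormatError::UnsupportedVersion(v) => write!(
                f,
                "unsupported archive format version {} (this decoder supports {})",
                v, SUPPORTED_VERSION
            ),
        }
    }
}

/// Checks the version byte in an already de-obfuscated header.
/// `version_offset` is wherever the spec ends up placing that byte.
pub fn check_version(header: &[u8], version_offset: usize) -> Result<(), FormatError> {
    match header.get(version_offset) {
        None => Err(FormatError::Truncated),
        Some(&v) if v == SUPPORTED_VERSION => Ok(()),
        Some(&v) => Err(FormatError::UnsupportedVersion(v)),
    }
}
```

Running this check before any table parsing means a version 2 archive fails with a readable message instead of being misparsed as version 1.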
Pattern 3: Per-File Independence
What: Each file in the archive is compressed and encrypted independently with its own IV and HMAC.
When: Always.
Why:
- Shell decoder can extract a single file without processing the entire archive
- A corruption in one file does not cascade to others
- Memory usage is bounded by the largest single file, not the archive total
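Per-file independence means a decoder can slice out one file's ciphertext from its table entry alone. A minimal sketch of that lookup, with bounds checks so a corrupt entry fails cleanly rather than panicking; `read_data_block` mirrors the helper name used in the Kotlin outline, but its signature here is an assumption.

```rust
use std::convert::TryFrom;

/// Returns the ciphertext slice for one file, or None if the table entry
/// points outside the archive (corrupt or hostile input).
pub fn read_data_block(archive: &[u8], data_offset: u64, encrypted_size: u64) -> Option<&[u8]> {
    let start = usize::try_from(data_offset).ok()?;
    let len = usize::try_from(encrypted_size).ok()?;
    // checked_add guards against offset + size overflowing.
    let end = start.checked_add(len)?;
    archive.get(start..end)
}
```

A failed lookup affects only the entry being read, so the decoder can report that file and continue with the rest of the table.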
Pattern 4: Shared Format Specification as Source of Truth
What: A single document defines every byte of the format. All three implementations are derived from this spec.
When: Before writing any code.
Why: With three independent implementations (Rust, Kotlin, shell), byte-level compatibility is critical. Off-by-one errors in offset calculations will produce silent data corruption.
Pattern 5: Encrypt-then-MAC
What: Apply HMAC after encryption, computed over (IV || ciphertext).
When: Always. Non-negotiable for CBC mode.
Why: CBC without authentication is vulnerable to padding oracle attacks. Encrypt-then-MAC is the proven pattern. Verify HMAC before decryption on all platforms.
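When the decoder compares the computed HMAC with the stored one, the comparison itself should not leak how many leading bytes matched. The sketch below shows the idea with a hypothetical `tags_equal` helper; production code would more likely rely on a vetted constant-time routine (for example from the `subtle` crate), since a compiler is free to optimize hand-written loops.

```rust
/// Compares two MAC tags without returning early on the first mismatch.
/// The length check may exit early; tag lengths are not secret.
pub fn tags_equal(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // Accumulate all byte differences, then test once at the end.
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```

Decryption and unpadding proceed only when this returns true, so an attacker who tampers with the ciphertext never reaches the padding check.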
Anti-Patterns to Avoid
| Anti-Pattern | Why Bad | Instead |
|---|---|---|
| Streaming/Chunked Encryption | Shell can't seek into stream cipher | Encrypt each file independently |
| Complex Obfuscation | Can't implement in busybox shell | XOR + fixed offsets + decoy padding |
| Obfuscation as Security | Trivially reversible from source code | Encryption = security, obfuscation = anti-detection |
| GCM Mode | busybox openssl doesn't support it | AES-256-CBC + HMAC-SHA256 |
| zstd/lz4 Compression | No busybox/Android SDK support | gzip (DEFLATE) |
| MAC-then-Encrypt | Padding oracle attacks possible | Encrypt-then-MAC |
Suggested Build Order
Phase 1: FORMAT SPEC + SHELL FEASIBILITY PROOF
|
v
Phase 2: RUST ARCHIVER (core pipeline)
|
v
Phase 3: RUST ROUND-TRIP TEST DECODER
|
v
Phase 4: KOTLIN DECODER
|
v
Phase 5: SHELL DECODER
|
v
Phase 6: OBFUSCATION HARDENING + INTEGRATION TESTING
Why this order:
- Format spec first -- shared contract, constrained by busybox. Validate shell feasibility before investing in Rust/Kotlin code.
- Rust archiver before decoders -- need archives to test decoders against.
- Rust test decoder before Kotlin/shell -- catches format bugs in same language, avoids cross-language debugging.
- Kotlin before shell -- primary path first; if Kotlin works, format is validated.
- Obfuscation hardening last -- core pipeline must work first. Obfuscation is a layer on top.
Key Architectural Decisions Summary
| Decision | Choice | Rationale |
|---|---|---|
| Compression | gzip (DEFLATE) via flate2 | Native on all three platforms |
| Encryption | AES-256-CBC | busybox openssl supports CBC; GCM not available |
| Authentication | HMAC-SHA256 (encrypt-then-MAC) | Authenticated encryption for CBC; verifiable everywhere |
| Byte order | Little-endian | ARM native order; simpler shell parsing |
| File processing | Per-file independent | Shell needs random access; bounded memory; fault isolation |
| Obfuscation | XOR headers + scattered blocks + decoy padding | Simple enough for shell; defeats binwalk/file |
| Format contract | Standalone spec document written first | Three implementations need byte-exact agreement |
| Key storage | Hardcoded 32-byte key in all decoders | Per requirements; sufficient for casual user threat model |
| PKCS7 padding | Standard PKCS7 for CBC mode | openssl uses PKCS7 by default; Kotlin supports natively |
Sources
- Architecture patterns from encrypted archive design (ZIP encryption, age, tar+gpg)
- busybox openssl capabilities: aes-256-cbc supported, GCM/CCM not supported
- Android SDK javax.crypto and java.util.zip documentation
- Rust crates: flate2 for gzip, plus the RustCrypto ecosystem (aes, cbc, hmac, sha2)
- Encrypt-then-MAC: Hugo Krawczyk (2001), industry standard
Verification needed: Run busybox openssl enc -ciphers on target device to confirm aes-256-cbc availability.