# Architecture Patterns

**Domain:** Custom encrypted archiver with obfuscated binary format
**Researched:** 2026-02-24

## Recommended Architecture

The system decomposes into three independent deliverables (Rust archiver, Kotlin decoder, shell decoder) that share a single specification: the binary format. The format is the contract. Everything else is implementation detail.

### High-Level Overview

```
                    +-----------------+
                    |   FORMAT SPEC   |
                    |  (shared doc)   |
                    +--------+--------+
                             |
          +------------------+------------------+
          |                  |                  |
+---------v---------+   +----v------+  +--------v--------+
|   RUST ARCHIVER   |   |  KOTLIN   |  |  BUSYBOX SHELL  |
| (CLI, Linux/Mac)  |   |  DECODER  |  |     DECODER     |
|                   |   | (Android) |  |   (fallback)    |
+-------------------+   +-----------+  +-----------------+
```

### Component Boundaries

| Component | Responsibility | Communicates With | Language |
|-----------|---------------|-------------------|----------|
| **Format Spec** | Defines binary layout, magic bytes strategy, block structure, obfuscation scheme | All three implementations reference this | Documentation |
| **Rust Archiver CLI** | Reads input files, compresses, encrypts, obfuscates, writes archive | Filesystem (input files, output archive) | Rust |
| **Kotlin Decoder** | Reads archive, de-obfuscates, decrypts, decompresses, writes output files | Android filesystem, embedded key | Kotlin |
| **Shell Decoder** | Same as Kotlin but via busybox commands | busybox (dd, xxd, openssl), filesystem | Shell (sh) |
| **Test Harness** | Round-trip validation: archive -> decode -> compare | All three components | Rust + shell scripts |

### Internal Component Structure (Rust Archiver)

The archiver is a pipeline of five stages, each feeding the next:

```
     Input Files
          |
          v
+-------------------+
|  FILE COLLECTOR   |  Walks paths, reads files, captures metadata
+-------------------+
          |
          v
+-------------------+
|    COMPRESSOR     |  gzip (DEFLATE) per-file compression
+-------------------+
          |
          v
+-------------------+
|     ENCRYPTOR     |  AES-256-CBC + HMAC-SHA256 per-file
+-------------------+
          |
          v
+-------------------+
|  FORMAT BUILDER   |  Assembles binary structure: header, TOC, data blocks
+-------------------+
          |
          v
+-------------------+
|    OBFUSCATOR     |  Shuffles blocks, inserts decoys, transforms magic bytes
+-------------------+
          |
          v
 Output Archive File
```

## Data Flow: Archival (Packing)

### Step 1: File Collection

```
for each input_path:
    read file bytes
    record: filename, original_size, file_type_hint
-> Vec<FileEntry { name, data, metadata }>
```

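
The collection step can be sketched with the standard library alone. This is a minimal illustration, not the project's actual collector: the `gather` signature, the `file_name_of` helper, and the reduced `FileEntry` field set (no `file_type_hint`, no directory walking) are assumptions made to keep the example short.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// One collected input file (field names follow the pseudocode above).
#[derive(Debug)]
pub struct FileEntry {
    pub name: String,
    pub data: Vec<u8>,
    pub original_size: u64,
}

/// Reads each path into memory and records its name and size.
/// Directories are not walked here; that is left out of this sketch.
pub fn gather(paths: &[PathBuf]) -> io::Result<Vec<FileEntry>> {
    let mut entries = Vec::with_capacity(paths.len());
    for path in paths {
        let data = fs::read(path)?;
        let name = file_name_of(path)?;
        entries.push(FileEntry {
            name,
            original_size: data.len() as u64,
            data,
        });
    }
    Ok(entries)
}

/// Hypothetical helper: returns the final path component as UTF-8,
/// rejecting paths that have none.
fn file_name_of(path: &Path) -> io::Result<String> {
    path.file_name()
        .and_then(|n| n.to_str())
        .map(str::to_owned)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no UTF-8 file name"))
}
```
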
### Step 2: Compression (per-file)

Each file is compressed independently. This is critical: per-file compression means the shell decoder can decompress one file at a time without holding the entire archive in memory.

```
for each FileEntry:
    compressed_data = gzip_compress(data)
    record: compressed_size
-> Vec<CompressedEntry { name, compressed_data, original_size, compressed_size }>
```

**Why compress before encrypt:** Properly encrypted data is statistically indistinguishable from random bytes, so compressing it afterwards gains nothing. Compress-then-encrypt is the only order in which compression is effective. This is a fundamental constraint, not a design choice.

### Step 3: Encryption (per-file)

Each compressed file is encrypted independently with a unique IV.

```
for each CompressedEntry:
    iv = random_16_bytes()  // unique per file, AES block size
    ciphertext = aes_256_cbc_encrypt(key, iv, pkcs7_pad(compressed_data))
    hmac = hmac_sha256(key, iv || ciphertext)  // encrypt-then-MAC
-> Vec<EncryptedEntry { name, iv, ciphertext, hmac, sizes... }>
```

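
The `pkcs7_pad` call in the pseudocode determines how large each ciphertext is. The sketch below shows the padding rule and the resulting size calculation using only the standard library. The function names are illustrative assumptions; a real implementation would more likely rely on the padding support of its cipher library.

```rust
/// AES block size in bytes.
const BLOCK: usize = 16;

/// Appends PKCS#7 padding: N bytes of value N, where N is between 1 and 16.
/// A full extra block is added when the input is already block-aligned.
pub fn pkcs7_pad(data: &[u8]) -> Vec<u8> {
    let pad = BLOCK - (data.len() % BLOCK);
    let mut out = Vec::with_capacity(data.len() + pad);
    out.extend_from_slice(data);
    out.extend(std::iter::repeat(pad as u8).take(pad));
    out
}

/// Strips PKCS#7 padding, returning None if the padding is malformed.
pub fn pkcs7_unpad(data: &[u8]) -> Option<&[u8]> {
    let pad = *data.last()? as usize;
    if pad == 0 || pad > BLOCK || pad > data.len() || data.len() % BLOCK != 0 {
        return None;
    }
    let (body, tail) = data.split_at(data.len() - pad);
    if tail.iter().all(|&b| b as usize == pad) {
        Some(body)
    } else {
        None
    }
}

/// Ciphertext length for a given compressed length, i.e. the value an
/// `encrypted_size` field would hold under this padding scheme.
pub fn encrypted_size(compressed_size: u64) -> u64 {
    (compressed_size / BLOCK as u64 + 1) * BLOCK as u64
}
```
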
**Key decision: AES-256-GCM vs AES-256-CBC vs ChaCha20-Poly1305.**

Use **AES-256-CBC + HMAC-SHA256** because:
- busybox `openssl` supports `aes-256-cbc` natively (GCM is NOT available in busybox openssl)
- Android/Kotlin `javax.crypto` supports AES-256-CBC natively
- Rust RustCrypto crates (`aes`, `cbc`, `hmac`) support it fully
- The target Qualcomm SoC has AES hardware acceleration (ARMv8 Cryptography Extensions)
- ChaCha20 would require a custom implementation for the shell fallback
- GCM would require a custom implementation for the shell fallback

**Encrypt-then-MAC pattern:** HMAC is computed over (IV || ciphertext) to provide authenticated encryption. The decoder verifies the HMAC before attempting decryption, which prevents padding oracle attacks.

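
The verify-before-decrypt ordering can be sketched as follows. This is an illustration under stated assumptions: `compute_tag` and `decrypt` are hypothetical callbacks standing in for HMAC-SHA256 and AES-256-CBC, and `tags_equal` is a hand-written comparison shown only to make the idea concrete. Production code would normally use the constant-time verification offered by its crypto library.

```rust
/// Compares two byte strings without an early exit, so timing does not
/// reveal the position of the first mismatch.
pub fn tags_equal(expected: &[u8], actual: &[u8]) -> bool {
    if expected.len() != actual.len() {
        return false;
    }
    let mut diff = 0u8;
    for (a, b) in expected.iter().zip(actual) {
        diff |= a ^ b;
    }
    diff == 0
}

/// Verify-then-decrypt: the tag is checked first, and decryption only
/// runs when it matches. Both callbacks are hypothetical placeholders.
pub fn open_entry<M, D>(
    iv: &[u8; 16],
    ciphertext: &[u8],
    stored_tag: &[u8; 32],
    compute_tag: M,
    decrypt: D,
) -> Result<Vec<u8>, &'static str>
where
    M: Fn(&[u8]) -> [u8; 32],
    D: Fn(&[u8; 16], &[u8]) -> Vec<u8>,
{
    // The MAC input is IV || ciphertext, matching the archiver side.
    let mut mac_input = Vec::with_capacity(iv.len() + ciphertext.len());
    mac_input.extend_from_slice(iv);
    mac_input.extend_from_slice(ciphertext);

    if !tags_equal(stored_tag, &compute_tag(&mac_input)) {
        return Err("HMAC mismatch: refusing to decrypt");
    }
    Ok(decrypt(iv, ciphertext))
}
```
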
### Step 4: Format Assembly

The format builder creates the binary layout:

```
+----------------------------------------------------------+
| OBFUSCATED HEADER (variable, see Step 5)                 |
+----------------------------------------------------------+
| FILE TABLE (encrypted)                                   |
|   - number_of_files: u32                                 |
|   - for each file:                                       |
|       filename_len: u16                                  |
|       filename: [u8; filename_len]                       |
|       original_size: u64                                 |
|       compressed_size: u64                               |
|       encrypted_size: u64                                |
|       data_offset: u64                                   |
|       iv: [u8; 16]                                       |
|       hmac: [u8; 32]                                     |
+----------------------------------------------------------+
| DATA BLOCKS                                              |
|   [encrypted_file_1_data]                                |
|   [encrypted_file_2_data]                                |
|   ...                                                    |
+----------------------------------------------------------+
```

**The file table itself is encrypted** with the same key but a dedicated IV. This prevents casual inspection of filenames and sizes.

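
One file-table record from the layout above could be serialized and parsed as in the following sketch. The `TableEntry` type and its method names are assumptions for illustration. The field order and little-endian encoding follow the layout and byte-order decision in this document, and filenames are assumed to be shorter than 65,536 bytes so the length fits in a `u16`.

```rust
use std::convert::TryInto;

/// One file-table record, mirroring the field list in the layout above.
#[derive(Debug, Clone, PartialEq)]
pub struct TableEntry {
    pub filename: String,
    pub original_size: u64,
    pub compressed_size: u64,
    pub encrypted_size: u64,
    pub data_offset: u64,
    pub iv: [u8; 16],
    pub hmac: [u8; 32],
}

impl TableEntry {
    /// Serializes the record with every integer in little-endian order.
    pub fn to_bytes(&self) -> Vec<u8> {
        let name = self.filename.as_bytes();
        let mut out = Vec::with_capacity(2 + name.len() + 4 * 8 + 16 + 32);
        out.extend_from_slice(&(name.len() as u16).to_le_bytes());
        out.extend_from_slice(name);
        for v in [self.original_size, self.compressed_size, self.encrypted_size, self.data_offset] {
            out.extend_from_slice(&v.to_le_bytes());
        }
        out.extend_from_slice(&self.iv);
        out.extend_from_slice(&self.hmac);
        out
    }

    /// Parses one record from the front of `buf` and returns it together
    /// with the number of bytes consumed. Returns None on truncated input.
    pub fn from_bytes(buf: &[u8]) -> Option<(TableEntry, usize)> {
        let name_len = u16::from_le_bytes(buf.get(0..2)?.try_into().ok()?) as usize;
        let mut pos = 2;
        let filename = String::from_utf8(buf.get(pos..pos + name_len)?.to_vec()).ok()?;
        pos += name_len;
        let mut sizes = [0u64; 4];
        for slot in sizes.iter_mut() {
            *slot = u64::from_le_bytes(buf.get(pos..pos + 8)?.try_into().ok()?);
            pos += 8;
        }
        let iv: [u8; 16] = buf.get(pos..pos + 16)?.try_into().ok()?;
        pos += 16;
        let hmac: [u8; 32] = buf.get(pos..pos + 32)?.try_into().ok()?;
        pos += 32;
        let [original_size, compressed_size, encrypted_size, data_offset] = sizes;
        Some((
            TableEntry { filename, original_size, compressed_size, encrypted_size, data_offset, iv, hmac },
            pos,
        ))
    }
}
```
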
### Step 5: Obfuscation

The obfuscation layer transforms the assembled binary to resist pattern analysis:

1. **No standard magic bytes** -- use random-looking bytes that are actually a known XOR pattern the decoder recognizes
2. **Decoy padding** -- insert random-length garbage blocks between real data blocks
3. **Header scatter** -- split the file table into chunks interleaved with data blocks, with a small "index block" at a known offset that records where the chunks are
4. **Byte-level transforms** -- simple XOR on the header region (not on encrypted data, which is already indistinguishable from random)

```
FINAL BINARY LAYOUT:

[fake_magic: 8 bytes]           <- XOR'd known pattern
[decoy_block: random 32-512 bytes]
[index_locator: 4 bytes at offset derived from fake_magic]
[data_block_1]
[file_table_chunk_1]
[decoy_block]
[data_block_2]
[file_table_chunk_2]
[data_block_3]
...
[index_block]                   <- lists offsets of file_table_chunks and data_blocks
[trailing_garbage: random 0-256 bytes]
```

**Important:** The obfuscation MUST be simple enough to implement in a shell script with `dd` and `xxd`. Anything requiring bit manipulation beyond XOR is too complex. Keep it to:
- Fixed XOR key for header regions (hardcoded in all three decoders)
- Fixed offset calculations (e.g., "index block starts at byte offset stored in bytes 8-11 of file")
- Sequential reads with `dd bs=1 skip=N count=M`

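
The two primitives in the list above, a fixed-key XOR and a fixed-offset read, might look like this in Rust. The function names are assumptions for illustration, and the byte range 8..12 simply follows the "bytes 8-11" example rather than a finalized format.

```rust
use std::convert::TryInto;

/// XORs `region` in place with a repeating key. Applying it twice restores
/// the original bytes, so one routine both obfuscates and de-obfuscates.
pub fn xor_in_place(region: &mut [u8], key: &[u8]) {
    assert!(!key.is_empty(), "XOR key must not be empty");
    for (i, byte) in region.iter_mut().enumerate() {
        *byte ^= key[i % key.len()];
    }
}

/// Reads the little-endian u32 stored in bytes 8..12 of the archive.
/// Returns None if the archive is too short to contain it.
pub fn read_index_offset(archive: &[u8]) -> Option<u32> {
    let raw: [u8; 4] = archive.get(8..12)?.try_into().ok()?;
    Some(u32::from_le_bytes(raw))
}
```
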
## Data Flow: Extraction (Unpacking)

### Kotlin Path (Primary)

```kotlin
import java.io.ByteArrayInputStream
import java.io.File
import java.util.zip.GZIPInputStream

// Helpers such as deobfuscateHeader, parseIndex and verifyHmac are defined elsewhere in the decoder.

// 1. Read archive bytes
val archive = File(path).readBytes()

// 2. De-obfuscate: recover index block location
val indexOffset = deobfuscateHeader(archive)

// 3. Read index block -> get file table chunk offsets
val index = parseIndex(archive, indexOffset)

// 4. Reassemble and decrypt file table
val fileTable = decryptFileTable(index.fileTableChunks, KEY, IV)

// 5. For each file entry in table:
for (entry in fileTable.entries) {
    val ciphertext = readDataBlock(archive, entry.offset, entry.encryptedSize)
    // Verify the HMAC over IV || ciphertext before decrypting
    verifyHmac(ciphertext, entry.iv, entry.hmac, KEY)
    val compressed = decryptAesCbc(ciphertext, KEY, entry.iv)
    val original = GZIPInputStream(ByteArrayInputStream(compressed)).readBytes()
    writeFile(outputDir, entry.filename, original)
}
```

**Kotlin compression:** The decoder uses gzip via `java.util.zip.GZIPInputStream`, which is built into the Android SDK. No native libraries are needed.

### Shell Path (Fallback)

```sh
#!/bin/sh
# Hardcoded values
KEY_HEX="abcdef0123456789..."  # 64 hex chars = 32 bytes
XOR_KEY_HEX="deadbeef"

ARCHIVE="$1"
OUTDIR="$2"

# 1. De-obfuscate header: read first 8 bytes, XOR to get real magic
MAGIC=$(dd if="$ARCHIVE" bs=1 count=8 2>/dev/null | xxd -p)
# ... validate XOR pattern ...

# 2. Find index block offset (bytes 8-11, little-endian)
INDEX_OFF_HEX=$(dd if="$ARCHIVE" bs=1 skip=8 count=4 2>/dev/null | xxd -p)
# Convert LE hex to decimal by reversing the four byte pairs
INDEX_OFF=$(printf "%d" "0x$(echo "$INDEX_OFF_HEX" | \
    sed 's/\(..\)\(..\)\(..\)\(..\)/\4\3\2\1/')")

# 3. Read index block, parse file table chunk offsets
# ... dd + xxd to extract offsets; this step is assumed to set
#     DATA_OFFSET, ENC_SIZE, IV_HEX, HMAC_HEX (stored tag) and FILENAME ...

# 4. For each file: verify HMAC over IV || ciphertext BEFORE decrypting.
# hexkey: makes openssl use the raw key bytes rather than the hex text;
# confirm that the target openssl build supports -macopt.
COMPUTED_HMAC=$({ printf '%s' "$IV_HEX" | xxd -r -p; \
    dd if="$ARCHIVE" bs=1 skip="$DATA_OFFSET" count="$ENC_SIZE" 2>/dev/null; } | \
    openssl dgst -sha256 -mac HMAC -macopt "hexkey:$KEY_HEX" -hex | awk '{print $NF}')
[ "$COMPUTED_HMAC" = "$HMAC_HEX" ] || { echo "HMAC mismatch: $FILENAME" >&2; exit 1; }

# 5. Extract ciphertext, decrypt, decompress
dd if="$ARCHIVE" bs=1 skip="$DATA_OFFSET" count="$ENC_SIZE" 2>/dev/null | \
    openssl aes-256-cbc -d -K "$KEY_HEX" -iv "$IV_HEX" -nosalt | \
    gunzip > "$OUTDIR/$FILENAME"
```

**Shell limitations that constrain the entire format design:**
- `dd` reads are byte-precise but slow for large files with `bs=1`
- `xxd` handles hex conversion but cannot do binary arithmetic
- `openssl` in busybox supports limited ciphers (aes-256-cbc YES, GCM/CCM NO)
- HMAC verification uses `openssl dgst -sha256` (available in most busybox builds)
- Integer arithmetic is limited to shell `$(( ))`, which handles 64-bit values on most platforms
- **Endianness:** all multi-byte integers in the format MUST be little-endian (ARM native, simpler shell parsing)

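
For cross-checking the shell decoder's little-endian parsing against the other implementations, the same conversion can be expressed in Rust. This is a sketch under assumptions: `le_hex_to_u64` is a hypothetical test-harness helper that takes hex text as printed by `xxd -p` and reverses the byte pairs before interpreting them.

```rust
/// Converts little-endian hex text (as printed by `xxd -p`) into an integer.
/// For example, the four bytes 78 56 34 12 decode to 0x12345678.
pub fn le_hex_to_u64(hex: &str) -> Option<u64> {
    let hex = hex.trim();
    let well_formed = !hex.is_empty()
        && hex.len() % 2 == 0
        && hex.len() <= 16
        && hex.bytes().all(|b| b.is_ascii_hexdigit());
    if !well_formed {
        return None;
    }
    let mut value = 0u64;
    // Walk the byte pairs from last to first: the same reordering that
    // a pair-reversing `sed` expression performs in a shell script.
    for pair in hex.as_bytes().chunks(2).rev() {
        let text = std::str::from_utf8(pair).ok()?;
        let byte = u8::from_str_radix(text, 16).ok()?;
        value = (value << 8) | u64::from(byte);
    }
    Some(value)
}
```
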
## Patterns to Follow

### Pattern 1: Pipeline Architecture (Archiver)

**What:** Each transformation (collect, compress, encrypt, format, obfuscate) is a separate module with a clear input/output type. No module knows about the others.

**When:** Always. This is the core design pattern.

**Why:** Testability (test each stage in isolation), flexibility (swap compression algorithm without touching encryption), clarity (each module has one job).

```rust
use std::path::PathBuf;

// Assumed error alias; the real project may define its own error type.
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

// Each stage is a function or module with typed input/output
mod collect;    // Vec<PathBuf> -> Vec<FileEntry>
mod compress;   // Vec<FileEntry> -> Vec<CompressedEntry>
mod encrypt;    // Vec<CompressedEntry> -> Vec<EncryptedEntry>
mod format;     // Vec<EncryptedEntry> -> RawArchive (unobfuscated bytes)
mod obfuscate;  // RawArchive -> Vec<u8> (final obfuscated bytes)

// Main pipeline
pub fn create_archive(paths: Vec<PathBuf>, key: &[u8; 32]) -> Result<Vec<u8>> {
    let files = collect::gather(paths)?;
    let compressed = compress::compress_all(files)?;
    let encrypted = encrypt::encrypt_all(compressed, key)?;
    let raw = format::build(encrypted)?;
    let obfuscated = obfuscate::apply(raw)?;
    Ok(obfuscated)
}
```

### Pattern 2: Format Version Field

**What:** Include a format version byte in the archive header (post-deobfuscation). Start at version 1.

**When:** Always. The format will evolve.

**Why:** Forward compatibility. Decoders can check the version and refuse to decode unknown versions with a clear error, rather than silently producing corrupt output.

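
A version check along these lines could be sketched as below. The position of the version byte (first byte of the de-obfuscated header) and the `VersionError` type are assumptions for illustration; the actual position is for the format spec to define.

```rust
/// Highest format version this decoder understands (the format starts at 1).
pub const SUPPORTED_VERSION: u8 = 1;

#[derive(Debug, PartialEq)]
pub enum VersionError {
    /// The de-obfuscated header was empty.
    MissingHeader,
    /// The archive uses a version this decoder does not know.
    Unsupported { found: u8, supported: u8 },
}

/// Checks the version byte, assumed here to be the first byte of the
/// de-obfuscated header, and refuses anything outside 1..=SUPPORTED_VERSION.
pub fn check_version(header: &[u8]) -> Result<u8, VersionError> {
    match header.first() {
        None => Err(VersionError::MissingHeader),
        Some(&v) if v >= 1 && v <= SUPPORTED_VERSION => Ok(v),
        Some(&v) => Err(VersionError::Unsupported { found: v, supported: SUPPORTED_VERSION }),
    }
}
```
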
### Pattern 3: Per-File Independence

**What:** Each file in the archive is compressed and encrypted independently with its own IV and HMAC.

**When:** Always.

**Why:**
- Shell decoder can extract a single file without processing the entire archive
- A corruption in one file does not cascade to others
- Memory usage is bounded by the largest single file, not the archive total

### Pattern 4: Shared Format Specification as Source of Truth

**What:** A single document defines every byte of the format. All three implementations are derived from this spec.

**When:** Before writing any code.

**Why:** With three independent implementations (Rust, Kotlin, shell), byte-level compatibility is critical. Off-by-one errors in offset calculations will produce silent data corruption.

### Pattern 5: Encrypt-then-MAC

**What:** Apply HMAC after encryption, computed over (IV || ciphertext).

**When:** Always. Non-negotiable for CBC mode.

**Why:** CBC without authentication is vulnerable to padding oracle attacks. Encrypt-then-MAC is the proven pattern. Verify HMAC before decryption on all platforms.

## Anti-Patterns to Avoid

| Anti-Pattern | Why Bad | Instead |
|-------------|---------|---------|
| Streaming/Chunked Encryption | Shell can't seek into stream cipher | Encrypt each file independently |
| Complex Obfuscation | Can't implement in busybox shell | XOR + fixed offsets + decoy padding |
| Obfuscation as Security | Trivially reversible from source code | Encryption = security, obfuscation = anti-detection |
| GCM Mode | busybox openssl doesn't support it | AES-256-CBC + HMAC-SHA256 |
| zstd/lz4 Compression | No busybox/Android SDK support | gzip (DEFLATE) |
| MAC-then-Encrypt | Padding oracle attacks possible | Encrypt-then-MAC |

## Suggested Build Order

```
Phase 1: FORMAT SPEC + SHELL FEASIBILITY PROOF
    |
    v
Phase 2: RUST ARCHIVER (core pipeline)
    |
    v
Phase 3: RUST ROUND-TRIP TEST DECODER
    |
    v
Phase 4: KOTLIN DECODER
    |
    v
Phase 5: SHELL DECODER
    |
    v
Phase 6: OBFUSCATION HARDENING + INTEGRATION TESTING
```

**Why this order:**

1. **Format spec first** -- shared contract, constrained by busybox. Validate shell feasibility before investing in Rust/Kotlin code.
2. **Rust archiver before decoders** -- need archives to test decoders against.
3. **Rust test decoder before Kotlin/shell** -- catches format bugs in same language, avoids cross-language debugging.
4. **Kotlin before shell** -- primary path first; if Kotlin works, format is validated.
5. **Obfuscation hardening last** -- core pipeline must work first. Obfuscation is a layer on top.

## Key Architectural Decisions Summary

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Compression | gzip (DEFLATE) via `flate2` | Native on all three platforms |
| Encryption | AES-256-CBC | busybox openssl supports CBC; GCM not available |
| Authentication | HMAC-SHA256 (encrypt-then-MAC) | Authenticated encryption for CBC; verifiable everywhere |
| Byte order | Little-endian | ARM native order; simpler shell parsing |
| File processing | Per-file independent | Shell needs random access; bounded memory; fault isolation |
| Obfuscation | XOR headers + scattered blocks + decoy padding | Simple enough for shell; defeats binwalk/file |
| Format contract | Standalone spec document written first | Three implementations need byte-exact agreement |
| Key storage | Hardcoded 32-byte key in all decoders | Per requirements; sufficient for casual user threat model |
| PKCS7 padding | Standard PKCS7 for CBC mode | openssl uses PKCS7 by default; Kotlin supports natively |

## Sources

- Architecture patterns from encrypted archive design (ZIP encryption, age, tar+gpg)
- busybox openssl capabilities: aes-256-cbc supported, GCM/CCM not supported
- Android SDK javax.crypto and java.util.zip documentation
- Rust crates: `flate2` for compression, plus `aes`, `cbc`, `hmac`, `sha2` from the RustCrypto ecosystem
- Encrypt-then-MAC: Hugo Krawczyk (2001), industry standard

**Verification needed:** Run `busybox openssl enc -ciphers` on the target device to confirm aes-256-cbc availability.