docs: add project research
.planning/research/ARCHITECTURE.md
@@ -0,0 +1,378 @@
# Architecture Patterns

**Domain:** Custom encrypted archiver with obfuscated binary format
**Researched:** 2026-02-24

## Recommended Architecture

The system decomposes into three independent deliverables (the Rust archiver, the Kotlin decoder, and the shell decoder) that share a single specification: the binary format. The format is the contract. Everything else is implementation detail.

### High-Level Overview

```
                    +-----------------+
                    |   FORMAT SPEC   |
                    |  (shared doc)   |
                    +--------+--------+
                             |
          +------------------+------------------+
          |                  |                  |
+---------v---------+   +----v------+  +--------v--------+
| RUST ARCHIVER     |   | KOTLIN    |  | BUSYBOX SHELL   |
| (CLI, Linux/Mac)  |   | DECODER   |  | DECODER         |
|                   |   | (Android) |  | (fallback)      |
+-------------------+   +-----------+  +-----------------+
```

### Component Boundaries

| Component | Responsibility | Communicates With | Language |
|-----------|---------------|-------------------|----------|
| **Format Spec** | Defines binary layout, magic bytes strategy, block structure, obfuscation scheme | All three implementations reference this | Documentation |
| **Rust Archiver CLI** | Reads input files, compresses, encrypts, obfuscates, writes archive | Filesystem (input files, output archive) | Rust |
| **Kotlin Decoder** | Reads archive, de-obfuscates, decrypts, decompresses, writes output files | Android filesystem, embedded key | Kotlin |
| **Shell Decoder** | Same as Kotlin but via busybox commands | busybox (dd, xxd, openssl), filesystem | Shell (sh) |
| **Test Harness** | Round-trip validation: archive -> decode -> compare | All three components | Rust + shell scripts |

### Internal Component Structure (Rust Archiver)

The archiver is a pipeline of five stages:

```
Input Files
     |
     v
+-------------------+
|  FILE COLLECTOR   |  Walks paths, reads files, captures metadata
+-------------------+
     |
     v
+-------------------+
|  COMPRESSOR       |  gzip (DEFLATE) per-file compression
+-------------------+
     |
     v
+-------------------+
|  ENCRYPTOR        |  AES-256-CBC + HMAC-SHA256 per-file
+-------------------+
     |
     v
+-------------------+
|  FORMAT BUILDER   |  Assembles binary structure: header, TOC, data blocks
+-------------------+
     |
     v
+-------------------+
|  OBFUSCATOR       |  Shuffles blocks, inserts decoys, transforms magic bytes
+-------------------+
     |
     v
Output Archive File
```

## Data Flow: Archival (Packing)

### Step 1: File Collection

```
for each input_path:
    read file bytes
    record: filename, original_size, file_type_hint
-> Vec<FileEntry { name, data, metadata }>
```

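The collection step can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: the `FileEntry` fields and the `collect_file` and `gather` names are assumptions, and `file_type_hint` is left out because the text does not say how it is derived.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// One collected input file (hypothetical shape; the pseudocode above only
/// names `name`, `data`, and `metadata`).
#[derive(Debug, Clone, PartialEq)]
pub struct FileEntry {
    pub name: String,
    pub data: Vec<u8>,
    pub original_size: u64,
}

/// Reads a single file into memory and records its name and size.
pub fn collect_file(path: &Path) -> io::Result<FileEntry> {
    let data = fs::read(path)?;
    let name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?
        .to_string_lossy()
        .into_owned();
    let original_size = data.len() as u64;
    Ok(FileEntry { name, data, original_size })
}

/// Collects every path in order, failing on the first unreadable file.
pub fn gather(paths: &[PathBuf]) -> io::Result<Vec<FileEntry>> {
    paths.iter().map(|p| collect_file(p)).collect()
}
```

Failing on the first unreadable file keeps the sketch short; a real archiver might prefer to report every failure at once.
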
### Step 2: Compression (per-file)

Each file is compressed independently. This is critical -- per-file compression means the shell decoder can decompress one file at a time without holding the entire archive in memory.

```
for each FileEntry:
    compressed_data = gzip_compress(data)
    record: compressed_size
-> Vec<CompressedEntry { name, compressed_data, original_size, compressed_size }>
```

**Why compress before encrypt:** Well-encrypted data is statistically indistinguishable from random bytes, so it is effectively incompressible. Compressing after encryption gains nothing, which makes compress-then-encrypt the only order that works. This is a fundamental constraint, not a design choice.

### Step 3: Encryption (per-file)

Each compressed file is encrypted independently with a unique IV.

```
for each CompressedEntry:
    iv = random_16_bytes()  // unique per file, AES block size
    ciphertext = aes_256_cbc_encrypt(key, iv, pkcs7_pad(compressed_data))
    hmac = hmac_sha256(key, iv || ciphertext)  // encrypt-then-MAC
-> Vec<EncryptedEntry { name, iv, ciphertext, hmac, sizes... }>
```

**Key decision: AES-256-GCM vs AES-256-CBC vs ChaCha20-Poly1305.**

Use **AES-256-CBC + HMAC-SHA256** because:
- busybox `openssl` supports `aes-256-cbc` natively (GCM is NOT available in busybox openssl)
- Android/Kotlin `javax.crypto` supports AES-256-CBC natively
- The RustCrypto crates (`aes`, `cbc`, `hmac`) support it fully
- Qualcomm SoCs have AES hardware acceleration (ARMv8 Cryptography Extensions)
- ChaCha20 would require a custom implementation for the shell fallback
- GCM would require a custom implementation for the shell fallback

**Encrypt-then-MAC pattern:** The HMAC is computed over (IV || ciphertext) to provide authenticated encryption. The decoder verifies the HMAC before attempting decryption, so tampered ciphertext is rejected before any padding check runs, which prevents padding oracle attacks.

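The `pkcs7_pad` call in the Step 3 pseudocode, and the ciphertext length that follows from it, can be illustrated in plain Rust. This is a sketch for reasoning about sizes; in practice the cipher crate would normally apply the padding itself, and the function names here are assumptions.

```rust
const AES_BLOCK: usize = 16;

/// Appends PKCS#7 padding: N bytes of value N, where N is 1..=16.
/// A full extra block is added when the input is already block-aligned.
pub fn pkcs7_pad(data: &[u8]) -> Vec<u8> {
    let pad_len = AES_BLOCK - (data.len() % AES_BLOCK);
    let mut out = Vec::with_capacity(data.len() + pad_len);
    out.extend_from_slice(data);
    out.extend(std::iter::repeat(pad_len as u8).take(pad_len));
    out
}

/// Strips PKCS#7 padding, returning None if the padding is malformed.
pub fn pkcs7_unpad(data: &[u8]) -> Option<&[u8]> {
    let &last = data.last()?;
    let pad_len = last as usize;
    if pad_len == 0 || pad_len > AES_BLOCK || data.len() % AES_BLOCK != 0 {
        return None;
    }
    let (body, pad) = data.split_at(data.len() - pad_len);
    if pad.iter().all(|&b| b == last) {
        Some(body)
    } else {
        None
    }
}

/// Ciphertext length for a given compressed length under CBC + PKCS#7.
pub fn encrypted_size(compressed_size: u64) -> u64 {
    (compressed_size / AES_BLOCK as u64 + 1) * AES_BLOCK as u64
}
```

Because padding always adds at least one byte, `encrypted_size` is never equal to `compressed_size`, which is why the file table stores both values.
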
### Step 4: Format Assembly

The format builder creates the binary layout:

```
+----------------------------------------------------------+
| OBFUSCATED HEADER (variable, see Step 5)                 |
+----------------------------------------------------------+
| FILE TABLE (encrypted)                                   |
|   - number_of_files: u32                                 |
|   - for each file:                                       |
|       filename_len: u16                                  |
|       filename: [u8; filename_len]                       |
|       original_size: u64                                 |
|       compressed_size: u64                               |
|       encrypted_size: u64                                |
|       data_offset: u64                                   |
|       iv: [u8; 16]                                       |
|       hmac: [u8; 32]                                     |
+----------------------------------------------------------+
| DATA BLOCKS                                              |
|   [encrypted_file_1_data]                                |
|   [encrypted_file_2_data]                                |
|   ...                                                    |
+----------------------------------------------------------+
```

**The file table itself is encrypted** with the same key but a dedicated IV. This prevents casual inspection of filenames and sizes.

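To make the field order concrete, the plaintext form of the file table (before it is encrypted) could be serialized roughly as follows. This is a sketch under stated assumptions: the `TableEntry` struct and both function names are hypothetical, filenames are assumed to be UTF-8, and all integers are written little-endian as the format requires.

```rust
use std::convert::TryInto;

/// One file-table record, mirroring the field list in the layout above.
#[derive(Debug, Clone, PartialEq)]
pub struct TableEntry {
    pub filename: String,
    pub original_size: u64,
    pub compressed_size: u64,
    pub encrypted_size: u64,
    pub data_offset: u64,
    pub iv: [u8; 16],
    pub hmac: [u8; 32],
}

/// Serializes the table: u32 count, then each record, all little-endian.
pub fn encode_table(entries: &[TableEntry]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(entries.len() as u32).to_le_bytes());
    for e in entries {
        let name = e.filename.as_bytes();
        out.extend_from_slice(&(name.len() as u16).to_le_bytes());
        out.extend_from_slice(name);
        for v in [e.original_size, e.compressed_size, e.encrypted_size, e.data_offset] {
            out.extend_from_slice(&v.to_le_bytes());
        }
        out.extend_from_slice(&e.iv);
        out.extend_from_slice(&e.hmac);
    }
    out
}

/// Splits `n` bytes off the front of the slice, or fails if it is too short.
fn take<'a>(buf: &mut &'a [u8], n: usize) -> Option<&'a [u8]> {
    let data: &'a [u8] = *buf;
    if data.len() < n {
        return None;
    }
    let (head, tail) = data.split_at(n);
    *buf = tail;
    Some(head)
}

/// Reads one little-endian u64 from the front of the slice.
fn take_u64(buf: &mut &[u8]) -> Option<u64> {
    Some(u64::from_le_bytes(take(buf, 8)?.try_into().ok()?))
}

/// Parses the plaintext table; returns None on truncated or invalid input.
pub fn decode_table(mut buf: &[u8]) -> Option<Vec<TableEntry>> {
    let count = u32::from_le_bytes(take(&mut buf, 4)?.try_into().ok()?);
    let mut entries = Vec::new();
    for _ in 0..count {
        let name_len = u16::from_le_bytes(take(&mut buf, 2)?.try_into().ok()?) as usize;
        let filename = String::from_utf8(take(&mut buf, name_len)?.to_vec()).ok()?;
        let original_size = take_u64(&mut buf)?;
        let compressed_size = take_u64(&mut buf)?;
        let encrypted_size = take_u64(&mut buf)?;
        let data_offset = take_u64(&mut buf)?;
        let iv: [u8; 16] = take(&mut buf, 16)?.try_into().ok()?;
        let hmac: [u8; 32] = take(&mut buf, 32)?.try_into().ok()?;
        entries.push(TableEntry {
            filename, original_size, compressed_size,
            encrypted_size, data_offset, iv, hmac,
        });
    }
    Some(entries)
}
```

Each record occupies 82 bytes plus the filename length, so a decoder can walk the table with nothing more than sequential reads.
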
### Step 5: Obfuscation

The obfuscation layer transforms the assembled binary to resist pattern analysis:

1. **No standard magic bytes** -- use random-looking bytes that are actually a known XOR pattern the decoder recognizes
2. **Decoy padding** -- insert random-length garbage blocks between real data blocks
3. **Header scatter** -- split the file table into chunks interleaved with data blocks, with a small "index block" at a known offset that records where the chunks are
4. **Byte-level transforms** -- simple XOR on the header region (not on encrypted data, which is already indistinguishable from random)

```
FINAL BINARY LAYOUT:

[fake_magic: 8 bytes]              <- XOR'd known pattern
[decoy_block: random 32-512 bytes]
[index_locator: 4 bytes at offset derived from fake_magic]
[data_block_1]
[file_table_chunk_1]
[decoy_block]
[data_block_2]
[file_table_chunk_2]
[data_block_3]
...
[index_block]                      <- lists offsets of file_table_chunks and data_blocks
[trailing_garbage: random 0-256 bytes]
```

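The offset bookkeeping behind this layout might look like the sketch below. It assumes a hypothetical `Block` enum and `Index` struct, and it takes decoy bytes as input rather than generating them, since random generation would need a crate such as `rand` that the document does not name.

```rust
/// A block placed in the archive body. Decoys carry filler bytes that the
/// decoder skips; only real blocks are recorded in the index.
pub enum Block {
    Data(Vec<u8>),
    TableChunk(Vec<u8>),
    Decoy(Vec<u8>),
}

/// Offsets recorded in the index block, as (offset, length) pairs.
/// The shape is an assumption made for illustration.
#[derive(Debug, Default, PartialEq)]
pub struct Index {
    pub data: Vec<(u64, u64)>,
    pub table_chunks: Vec<(u64, u64)>,
}

/// Concatenates blocks that follow `start_offset` bytes of header and
/// records where every non-decoy block landed in the final file.
pub fn lay_out(start_offset: u64, blocks: &[Block]) -> (Vec<u8>, Index) {
    let mut body = Vec::new();
    let mut index = Index::default();
    for block in blocks {
        let offset = start_offset + body.len() as u64;
        match block {
            Block::Data(bytes) => {
                index.data.push((offset, bytes.len() as u64));
                body.extend_from_slice(bytes);
            }
            Block::TableChunk(bytes) => {
                index.table_chunks.push((offset, bytes.len() as u64));
                body.extend_from_slice(bytes);
            }
            // Decoys advance the offset but leave no trace in the index.
            Block::Decoy(bytes) => body.extend_from_slice(bytes),
        }
    }
    (body, index)
}
```

Since decoys never appear in the index, a decoder needs no logic to recognize them; it simply never reads those byte ranges.
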
**Important:** The obfuscation MUST be simple enough to implement in a shell script with `dd` and `xxd`. Anything requiring bit manipulation beyond XOR is too complex. Keep it to:
- Fixed XOR key for header regions (hardcoded in all three decoders)
- Fixed offset calculations (e.g., "index block starts at byte offset stored in bytes 8-11 of file")
- Sequential reads with `dd bs=1 skip=N count=M`

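The first two constraints translate into very little code. The sketch below assumes a placeholder four-byte XOR key and uses the example location from the list above (bytes 8-11) for the index offset; both are illustrative values, not part of any agreed spec.

```rust
/// Fixed XOR key shared by all decoders (placeholder value).
pub const XOR_KEY: [u8; 4] = [0xDE, 0xAD, 0xBE, 0xEF];

/// XORs `region` in place with the repeating key. Applying it twice
/// restores the original bytes, so encoding and decoding are the same call.
pub fn xor_in_place(region: &mut [u8], key: &[u8]) {
    if key.is_empty() {
        return;
    }
    for (i, byte) in region.iter_mut().enumerate() {
        *byte ^= key[i % key.len()];
    }
}

/// Reads the little-endian u32 stored at bytes 8..12 of the archive,
/// or returns None if the archive is too short.
pub fn read_index_offset(archive: &[u8]) -> Option<u32> {
    let raw = archive.get(8..12)?;
    Some(u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]))
}
```

Because XOR is its own inverse, the archiver and every decoder can share one routine, which keeps the three implementations easy to compare byte for byte.
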
## Data Flow: Extraction (Unpacking)

### Kotlin Path (Primary)

```kotlin
import java.io.ByteArrayInputStream
import java.io.File
import java.util.zip.GZIPInputStream

// Helper functions (deobfuscateHeader, parseIndex, ...) are not shown here.

// 1. Read archive bytes
val archive = File(path).readBytes()

// 2. De-obfuscate: recover index block location
val indexOffset = deobfuscateHeader(archive)

// 3. Read index block -> get file table chunk offsets
val index = parseIndex(archive, indexOffset)

// 4. Reassemble and decrypt file table
val fileTable = decryptFileTable(index.fileTableChunks, KEY, IV)

// 5. For each file entry in table:
for (entry in fileTable.entries) {
    val ciphertext = readDataBlock(archive, entry.offset, entry.encryptedSize)
    // Verify the HMAC first; decrypt only if it matches
    verifyHmac(ciphertext, entry.iv, entry.hmac, KEY)
    val compressed = decryptAesCbc(ciphertext, KEY, entry.iv)
    val original = GZIPInputStream(ByteArrayInputStream(compressed)).readBytes()
    writeFile(outputDir, entry.filename, original)
}
```

**Kotlin decompression:** gzip is handled by `java.util.zip.GZIPInputStream`, which is built into the Android SDK. No native libraries are needed.

### Shell Path (Fallback)

```sh
#!/bin/sh
# Hardcoded values
KEY_HEX="abcdef0123456789..."  # 64 hex chars = 32 bytes
XOR_KEY_HEX="deadbeef"

ARCHIVE="$1"
OUTDIR="$2"

# 1. De-obfuscate header: read first 8 bytes, XOR to get real magic
MAGIC=$(dd if="$ARCHIVE" bs=1 count=8 2>/dev/null | xxd -p)
# ... validate XOR pattern ...

# 2. Find index block offset (bytes 8-11, little-endian)
INDEX_OFF_HEX=$(dd if="$ARCHIVE" bs=1 skip=8 count=4 2>/dev/null | xxd -p)
# Convert LE hex to decimal by reversing the four bytes
INDEX_OFF=$(printf "%d" "0x$(echo "$INDEX_OFF_HEX" | \
    sed 's/\(..\)\(..\)\(..\)\(..\)/\4\3\2\1/')")

# 3. Read index block, parse file table chunk offsets
# ... dd + xxd to extract offsets ...
# (assumed to yield DATA_OFFSET, ENC_SIZE, IV_HEX, HMAC_HEX, FILENAME per file)

# 4. For each file: verify HMAC over (IV || ciphertext) BEFORE decrypting.
#    -macopt hexkey: passes the key as raw bytes; plain -hmac would treat
#    KEY_HEX as a literal ASCII string. Requires an openssl build with -macopt.
COMPUTED_HMAC=$( { echo "$IV_HEX" | xxd -r -p; \
    dd if="$ARCHIVE" bs=1 skip="$DATA_OFFSET" count="$ENC_SIZE" 2>/dev/null; } | \
    openssl dgst -sha256 -mac HMAC -macopt "hexkey:$KEY_HEX" | awk '{print $NF}')
[ "$COMPUTED_HMAC" = "$HMAC_HEX" ] || { echo "HMAC mismatch: $FILENAME" >&2; exit 1; }

# 5. Decrypt and decompress only after the HMAC check passes
dd if="$ARCHIVE" bs=1 skip="$DATA_OFFSET" count="$ENC_SIZE" 2>/dev/null | \
    openssl aes-256-cbc -d -K "$KEY_HEX" -iv "$IV_HEX" -nosalt | \
    gunzip > "$OUTDIR/$FILENAME"
```

**Shell limitations that constrain the entire format design:**
- `dd` reads are byte-precise but slow for large files with bs=1
- `xxd` handles hex conversion but no binary arithmetic
- `openssl` in busybox supports limited ciphers (aes-256-cbc YES, GCM/CCM NO)
- HMAC verification via `openssl dgst -sha256` (available in most busybox builds); note that `-hmac` takes the key as a literal string, so a raw 32-byte key has to be passed as `-mac HMAC -macopt hexkey:<hex>`
- Integer arithmetic limited to shell `$(( ))` -- handles 64-bit on most platforms
- **Endianness:** all multi-byte integers in format MUST be little-endian (ARM native, simpler shell parsing)

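For cross-checking the shell decoder's arithmetic, the same little-endian hex conversion can be written in Rust and used in a test harness. The helper below is a hypothetical sketch: it takes the text that `xxd -p` would print for a 1 to 8 byte field and returns the integer value.

```rust
/// Parses a hex string holding a little-endian integer (as printed by
/// `xxd -p`) into a u64. Accepts 1 to 8 bytes; returns None otherwise.
pub fn le_hex_to_u64(hex: &str) -> Option<u64> {
    let hex = hex.trim();
    let valid_len = !hex.is_empty() && hex.len() % 2 == 0 && hex.len() <= 16;
    if !valid_len || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut value: u64 = 0;
    // Walk byte pairs from last to first: the last pair is most significant.
    for i in (0..hex.len() / 2).rev() {
        let byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
        value = (value << 8) | u64::from(byte);
    }
    Some(value)
}
```

Feeding the harness the same hex strings the shell script sees makes off-by-one and byte-order mistakes show up as simple integer mismatches.
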
## Patterns to Follow

### Pattern 1: Pipeline Architecture (Archiver)

**What:** Each transformation (collect, compress, encrypt, format, obfuscate) is a separate module with a clear input/output type. No module knows about the others.

**When:** Always. This is the core design pattern.

**Why:** Testability (test each stage in isolation), flexibility (swap compression algorithm without touching encryption), clarity (each module has one job).

```rust
use std::path::PathBuf;

// Error type is a placeholder; any crate-wide Result alias works here
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

// Each stage is a function or module with typed input/output
mod collect;    // Vec<PathBuf> -> Vec<FileEntry>
mod compress;   // Vec<FileEntry> -> Vec<CompressedEntry>
mod encrypt;    // Vec<CompressedEntry> -> Vec<EncryptedEntry>
mod format;     // Vec<EncryptedEntry> -> RawArchive (unobfuscated bytes)
mod obfuscate;  // RawArchive -> Vec<u8> (final obfuscated bytes)

// Main pipeline
pub fn create_archive(paths: Vec<PathBuf>, key: &[u8; 32]) -> Result<Vec<u8>> {
    let files = collect::gather(paths)?;
    let compressed = compress::compress_all(files)?;
    let encrypted = encrypt::encrypt_all(compressed, key)?;
    let raw = format::build(encrypted)?;
    let obfuscated = obfuscate::apply(raw)?;
    Ok(obfuscated)
}
```

### Pattern 2: Format Version Field

**What:** Include a format version byte in the archive header (post-deobfuscation). Start at version 1.

**When:** Always. Format will evolve.

**Why:** Forward compatibility. Decoders can check the version and refuse to decode unknown versions with a clear error, rather than silently producing corrupt output.

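A version gate of this kind is small enough to sketch in full. The position of the version byte is not fixed by the text, so the `version_offset` parameter below is an assumption, as are the `FormatError` type and the function name.

```rust
use std::fmt;

/// The only format version this decoder understands.
pub const SUPPORTED_VERSION: u8 = 1;

#[derive(Debug, PartialEq)]
pub enum FormatError {
    MissingVersion,
    UnsupportedVersion(u8),
}

impl fmt::Display for FormatError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FormatError::MissingVersion => {
                write!(f, "archive too short to contain a version byte")
            }
            FormatError::UnsupportedVersion(v) => write!(
                f,
                "unsupported format version {} (expected {})",
                v, SUPPORTED_VERSION
            ),
        }
    }
}

/// Checks the version byte of an already de-obfuscated header.
/// `version_offset` is a hypothetical position inside that header.
pub fn check_version(header: &[u8], version_offset: usize) -> Result<(), FormatError> {
    match header.get(version_offset) {
        None => Err(FormatError::MissingVersion),
        Some(&v) if v == SUPPORTED_VERSION => Ok(()),
        Some(&v) => Err(FormatError::UnsupportedVersion(v)),
    }
}
```

Running this check before any offsets are parsed means an archive from a newer format fails with a readable message instead of being misread as version 1 data.
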
### Pattern 3: Per-File Independence

**What:** Each file in the archive is compressed and encrypted independently with its own IV and HMAC.

**When:** Always.

**Why:**
- Shell decoder can extract a single file without processing the entire archive
- A corruption in one file does not cascade to others
- Memory usage is bounded by the largest single file, not the archive total

### Pattern 4: Shared Format Specification as Source of Truth

**What:** A single document defines every byte of the format. All three implementations are derived from this spec.

**When:** Before writing any code.

**Why:** With three independent implementations (Rust, Kotlin, shell), byte-level compatibility is critical. Off-by-one errors in offset calculations will produce silent data corruption.

### Pattern 5: Encrypt-then-MAC

**What:** Apply HMAC after encryption, computed over (IV || ciphertext).

**When:** Always. Non-negotiable for CBC mode.

**Why:** CBC without authentication is vulnerable to padding oracle attacks. Encrypt-then-MAC is the proven pattern. Verify HMAC before decryption on all platforms.

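The "verify before decryption" rule can be expressed as a small gate around the decrypt step. The sketch below is illustrative only: both function names are assumptions, the HMAC itself is taken as already computed, and production code would more likely rely on a vetted constant-time comparison (for example from the `subtle` crate) than on a hand-written loop.

```rust
/// Compares two MAC tags without returning early on the first mismatch,
/// so the loop does the same work wherever the tags differ.
pub fn tags_equal(expected: &[u8; 32], actual: &[u8; 32]) -> bool {
    let mut diff: u8 = 0;
    for (a, b) in expected.iter().zip(actual.iter()) {
        diff |= a ^ b;
    }
    diff == 0
}

/// Runs `decrypt` only if the stored and recomputed tags match.
/// Returns None, without touching the ciphertext, when they differ.
pub fn verify_then_decrypt<T>(
    stored_tag: &[u8; 32],
    computed_tag: &[u8; 32],
    decrypt: impl FnOnce() -> T,
) -> Option<T> {
    if tags_equal(stored_tag, computed_tag) {
        Some(decrypt())
    } else {
        None
    }
}
```

Passing the decrypt step as a closure makes the ordering explicit: there is no code path that reaches decryption without first passing the tag check.
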
## Anti-Patterns to Avoid

| Anti-Pattern | Why Bad | Instead |
|-------------|---------|---------|
| Streaming/Chunked Encryption | Shell can't seek into stream cipher | Encrypt each file independently |
| Complex Obfuscation | Can't implement in busybox shell | XOR + fixed offsets + decoy padding |
| Obfuscation as Security | Trivially reversible from source code | Encryption = security, obfuscation = anti-detection |
| GCM Mode | busybox openssl doesn't support it | AES-256-CBC + HMAC-SHA256 |
| zstd/lz4 Compression | No busybox/Android SDK support | gzip (DEFLATE) |
| MAC-then-Encrypt | Padding oracle attacks possible | Encrypt-then-MAC |

## Suggested Build Order

```
Phase 1: FORMAT SPEC + SHELL FEASIBILITY PROOF
    |
    v
Phase 2: RUST ARCHIVER (core pipeline)
    |
    v
Phase 3: RUST ROUND-TRIP TEST DECODER
    |
    v
Phase 4: KOTLIN DECODER
    |
    v
Phase 5: SHELL DECODER
    |
    v
Phase 6: OBFUSCATION HARDENING + INTEGRATION TESTING
```

**Why this order:**

1. **Format spec first** -- shared contract, constrained by busybox. Validate shell feasibility before investing in Rust/Kotlin code.
2. **Rust archiver before decoders** -- need archives to test decoders against.
3. **Rust test decoder before Kotlin/shell** -- catches format bugs in same language, avoids cross-language debugging.
4. **Kotlin before shell** -- primary path first; if Kotlin works, format is validated.
5. **Obfuscation hardening last** -- core pipeline must work first. Obfuscation is a layer on top.

## Key Architectural Decisions Summary

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Compression | gzip (DEFLATE) via `flate2` | Native on all three platforms |
| Encryption | AES-256-CBC | busybox openssl supports CBC; GCM not available |
| Authentication | HMAC-SHA256 (encrypt-then-MAC) | Authenticated encryption for CBC; verifiable everywhere |
| Byte order | Little-endian | ARM native order; simpler shell parsing |
| File processing | Per-file independent | Shell needs random access; bounded memory; fault isolation |
| Obfuscation | XOR headers + scattered blocks + decoy padding | Simple enough for shell; defeats binwalk/file |
| Format contract | Standalone spec document written first | Three implementations need byte-exact agreement |
| Key storage | Hardcoded 32-byte key in all decoders | Per requirements; sufficient for casual user threat model |
| PKCS7 padding | Standard PKCS7 for CBC mode | openssl uses PKCS7 by default; Kotlin supports natively |

## Sources

- Architecture patterns from encrypted archive design (ZIP encryption, age, tar+gpg)
- busybox openssl capabilities: aes-256-cbc supported, GCM/CCM not supported
- Android SDK javax.crypto and java.util.zip documentation
- Rust crates: `flate2` for compression, plus the RustCrypto crates `aes`, `cbc`, `hmac`, `sha2`
- Encrypt-then-MAC: Hugo Krawczyk (2001), industry standard

**Verification needed:** Run `busybox openssl enc -ciphers` on target device to confirm aes-256-cbc availability.