docs: add project research

NikitolProject
2026-02-24 22:51:05 +03:00
parent 914d88458a
commit 40dcfd4ac0
5 changed files with 841 additions and 0 deletions


@@ -0,0 +1,378 @@
# Architecture Patterns
**Domain:** Custom encrypted archiver with obfuscated binary format
**Researched:** 2026-02-24
## Recommended Architecture
The system decomposes into three independent deliverables (archiver, Kotlin decompressor, shell decompressor) that share a single specification: the binary format. The format is the contract. Everything else is implementation detail.
### High-Level Overview
```
                  +-----------------+
                  |   FORMAT SPEC   |
                  |  (shared doc)   |
                  +--------+--------+
                           |
          +----------------+-----------------+
          |                |                 |
+---------v---------+ +----v------+ +--------v--------+
| RUST ARCHIVER     | | KOTLIN    | | BUSYBOX SHELL   |
| (CLI, Linux/Mac)  | | DECODER   | | DECODER         |
|                   | | (Android) | | (fallback)      |
+-------------------+ +-----------+ +-----------------+
```
### Component Boundaries
| Component | Responsibility | Communicates With | Language |
|-----------|---------------|-------------------|----------|
| **Format Spec** | Defines binary layout, magic bytes strategy, block structure, obfuscation scheme | All three implementations reference this | Documentation |
| **Rust Archiver CLI** | Reads input files, compresses, encrypts, obfuscates, writes archive | Filesystem (input files, output archive) | Rust |
| **Kotlin Decoder** | Reads archive, de-obfuscates, decrypts, decompresses, writes output files | Android filesystem, embedded key | Kotlin |
| **Shell Decoder** | Same as Kotlin but via busybox commands | busybox (dd, xxd, openssl), filesystem | Shell (sh) |
| **Test Harness** | Round-trip validation: archive -> decode -> compare | All three components | Rust + shell scripts |
### Internal Component Structure (Rust Archiver)
The archiver itself has a clear pipeline architecture with five layers:
```
Input Files
          |
          v
+-------------------+
| FILE COLLECTOR    |  Walks paths, reads files, captures metadata
+-------------------+
          |
          v
+-------------------+
| COMPRESSOR        |  gzip (DEFLATE) per-file compression
+-------------------+
          |
          v
+-------------------+
| ENCRYPTOR         |  AES-256-CBC + HMAC-SHA256 per-file
+-------------------+
          |
          v
+-------------------+
| FORMAT BUILDER    |  Assembles binary structure: header, TOC, data blocks
+-------------------+
          |
          v
+-------------------+
| OBFUSCATOR        |  Shuffles blocks, inserts decoys, transforms magic bytes
+-------------------+
          |
          v
Output Archive File
```
## Data Flow: Archival (Packing)
### Step 1: File Collection
```
for each input_path:
    read file bytes
    record: filename, original_size, file_type_hint
-> Vec<FileEntry { name, data, metadata }>
```
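The collection step could be sketched in Rust using only the standard library. This is a minimal sketch, not the project's actual module: the `FileEntry` fields are reduced to what the pseudocode names, `gather` takes a slice rather than an owned `Vec`, and the decision to keep only the final path component as the stored name is an assumption.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// One collected input file (field names are illustrative).
#[derive(Debug)]
pub struct FileEntry {
    pub name: String,       // final path component only (assumption)
    pub data: Vec<u8>,      // raw file contents, held fully in memory
    pub original_size: u64, // recorded for the file table
}

/// Reads every path into memory and stops at the first unreadable file.
pub fn gather(paths: &[PathBuf]) -> io::Result<Vec<FileEntry>> {
    paths.iter().map(|p| read_entry(p)).collect()
}

fn read_entry(path: &Path) -> io::Result<FileEntry> {
    let name = path
        .file_name()
        .and_then(|n| n.to_str())
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "no UTF-8 file name"))?
        .to_owned();
    let data = fs::read(path)?;
    let original_size = data.len() as u64;
    Ok(FileEntry { name, data, original_size })
}
```

Calling `gather(&[PathBuf::from("notes.txt")])` would return one entry whose `original_size` equals the file length on disk.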
### Step 2: Compression (per-file)
Each file is compressed independently. This is critical -- per-file compression means the shell decoder can decompress one file at a time without holding the entire archive in memory.
```
for each FileEntry:
    compressed_data = gzip_compress(data)
    record: compressed_size
-> Vec<CompressedEntry { name, compressed_data, original_size, compressed_size }>
```
**Why compress before encrypt:** Encrypted data has maximum entropy and cannot be compressed. Compress-then-encrypt is the only valid order. This is a fundamental constraint, not a design choice.
### Step 3: Encryption (per-file)
Each compressed file is encrypted independently with a unique IV.
```
for each CompressedEntry:
    iv = random_16_bytes()                // unique per file, AES block size
    ciphertext = aes_256_cbc_encrypt(key, iv, pkcs7_pad(compressed_data))
    hmac = hmac_sha256(key, iv || ciphertext)   // encrypt-then-MAC
-> Vec<EncryptedEntry { name, iv, ciphertext, hmac, sizes... }>
```
**Key decision: AES-256-GCM vs AES-256-CBC vs ChaCha20-Poly1305.**
Use **AES-256-CBC + HMAC-SHA256** because:
- busybox `openssl` supports `aes-256-cbc` natively (GCM is NOT available in busybox openssl)
- Android/Kotlin `javax.crypto` supports AES-256-CBC natively
- Rust RustCrypto crates (`aes`, `cbc`, `hmac`) support it fully
- Qualcomm SoC has AES hardware acceleration (ARMv8 Cryptography Extensions)
- ChaCha20 would require custom implementation for shell fallback
- GCM would require custom implementation for shell fallback
**Encrypt-then-MAC pattern:** HMAC is computed over (IV || ciphertext) to provide authenticated encryption. The decoder verifies HMAC before attempting decryption, preventing padding oracle attacks.
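Two details of the encrypt-then-MAC step are easy to get wrong across three implementations: the exact bytes that are authenticated, and how the tag is compared. The sketch below shows both using only the standard library. The function names are hypothetical, and the HMAC computation itself is left out (in Rust it would come from the `hmac` and `sha2` crates). The hand-written comparison only illustrates the idea of not exiting early; production code would normally rely on the MAC library's own verification routine.

```rust
/// Builds the exact byte string that gets authenticated: IV || ciphertext.
pub fn mac_input(iv: &[u8; 16], ciphertext: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(iv.len() + ciphertext.len());
    buf.extend_from_slice(iv);
    buf.extend_from_slice(ciphertext);
    buf
}

/// Compares two 32-byte tags without returning at the first mismatch.
pub fn tags_match(expected: &[u8; 32], actual: &[u8; 32]) -> bool {
    let mut diff = 0u8;
    for (a, b) in expected.iter().zip(actual.iter()) {
        diff |= a ^ b; // accumulate differences across all bytes
    }
    diff == 0
}
```

A decoder would compute the HMAC over `mac_input(&iv, &ciphertext)`, call `tags_match` against the stored tag, and only then decrypt.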
### Step 4: Format Assembly
The format builder creates the binary layout:
```
+----------------------------------------------------------+
| OBFUSCATED HEADER (variable, see Step 5)                 |
+----------------------------------------------------------+
| FILE TABLE (encrypted)                                   |
|   - number_of_files: u32                                 |
|   - for each file:                                       |
|       filename_len: u16                                  |
|       filename: [u8; filename_len]                       |
|       original_size: u64                                 |
|       compressed_size: u64                               |
|       encrypted_size: u64                                |
|       data_offset: u64                                   |
|       iv: [u8; 16]                                       |
|       hmac: [u8; 32]                                     |
+----------------------------------------------------------+
| DATA BLOCKS                                              |
|   [encrypted_file_1_data]                                |
|   [encrypted_file_2_data]                                |
|   ...                                                    |
+----------------------------------------------------------+
```
**The file table itself is encrypted** with the same key but a dedicated IV. This prevents casual inspection of filenames and sizes.
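To make the record layout concrete, here is a minimal sketch that serializes one file-table entry in the field order listed above, with all integers little-endian. The `TableEntry` type and `encode_entry` function are hypothetical names. Treating `filename_len` as a UTF-8 byte count (not a character count) is an assumption that the format spec would need to state.

```rust
use std::convert::TryFrom;

/// One file-table record, mirroring the field list in the layout above.
pub struct TableEntry {
    pub filename: String,
    pub original_size: u64,
    pub compressed_size: u64,
    pub encrypted_size: u64,
    pub data_offset: u64,
    pub iv: [u8; 16],
    pub hmac: [u8; 32],
}

/// Serializes one record; returns None if the name exceeds u16::MAX bytes.
pub fn encode_entry(e: &TableEntry) -> Option<Vec<u8>> {
    let name = e.filename.as_bytes(); // UTF-8 bytes
    let name_len = u16::try_from(name.len()).ok()?;
    let mut out = Vec::with_capacity(2 + name.len() + 4 * 8 + 16 + 32);
    out.extend_from_slice(&name_len.to_le_bytes());
    out.extend_from_slice(name);
    for v in [e.original_size, e.compressed_size, e.encrypted_size, e.data_offset] {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out.extend_from_slice(&e.iv);
    out.extend_from_slice(&e.hmac);
    Some(out)
}
```

Under these assumptions a record for a 5-byte filename occupies 2 + 5 + 32 + 16 + 32 = 87 bytes, which gives the shell decoder fixed offsets relative to the end of the filename.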
### Step 5: Obfuscation
The obfuscation layer transforms the assembled binary to resist pattern analysis:
1. **No standard magic bytes** -- use random-looking bytes that are actually a known XOR pattern the decoder recognizes
2. **Decoy padding** -- insert random-length garbage blocks between real data blocks
3. **Header scatter** -- split the file table into chunks interleaved with data blocks, with a small "index block" at a known offset that records where the chunks are
4. **Byte-level transforms** -- simple XOR on the header region (not on encrypted data, which is already indistinguishable from random)
```
FINAL BINARY LAYOUT:
[fake_magic: 8 bytes] <- XOR'd known pattern
[decoy_block: random 32-512 bytes]
[index_locator: 4 bytes at offset derived from fake_magic]
[data_block_1]
[file_table_chunk_1]
[decoy_block]
[data_block_2]
[file_table_chunk_2]
[data_block_3]
...
[index_block] <- lists offsets of file_table_chunks and data_blocks
[trailing_garbage: random 0-256 bytes]
```
**Important:** The obfuscation MUST be simple enough to implement in a shell script with `dd` and `xxd`. Anything requiring bit manipulation beyond XOR is too complex. Keep it to:
- Fixed XOR key for header regions (hardcoded in all three decoders)
- Fixed offset calculations (e.g., "index block starts at byte offset stored in bytes 8-11 of file")
- Sequential reads with `dd bs=1 skip=N count=M`
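The two primitives this list allows, a fixed XOR key and a fixed-offset read, can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the key is treated as repeating over the region, and the index offset is taken to be a little-endian `u32` in bytes 8-11 as in the example above.

```rust
use std::convert::TryInto;

/// XORs `region` in place with a repeating key; applying it twice restores the input.
pub fn xor_in_place(region: &mut [u8], key: &[u8]) {
    assert!(!key.is_empty(), "XOR key must not be empty");
    for (i, byte) in region.iter_mut().enumerate() {
        *byte ^= key[i % key.len()];
    }
}

/// Reads the little-endian u32 index-block offset stored in bytes 8..12.
/// Returns None when the archive is too short to contain it.
pub fn read_index_offset(archive: &[u8]) -> Option<u32> {
    let raw: [u8; 4] = archive.get(8..12)?.try_into().ok()?;
    Some(u32::from_le_bytes(raw))
}
```

Because XOR is its own inverse, the archiver and all decoders can share the same routine, which is what keeps the shell implementation feasible.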
## Data Flow: Extraction (Unpacking)
### Kotlin Path (Primary)
```kotlin
// 1. Read archive bytes
val archive = File(path).readBytes()
// 2. De-obfuscate: recover index block location
val indexOffset = deobfuscateHeader(archive)
// 3. Read index block -> get file table chunk offsets
val index = parseIndex(archive, indexOffset)
// 4. Reassemble and decrypt file table
val fileTable = decryptFileTable(index.fileTableChunks, KEY, IV)
// 5. For each file entry in table:
for (entry in fileTable.entries) {
    val ciphertext = readDataBlock(archive, entry.offset, entry.encryptedSize)
    verifyHmac(ciphertext, entry.iv, entry.hmac, KEY)
    val compressed = decryptAesCbc(ciphertext, KEY, entry.iv)
    val original = GZIPInputStream(ByteArrayInputStream(compressed)).readBytes()
    writeFile(outputDir, entry.filename, original)
}
```
**Kotlin compression:** Using gzip (`java.util.zip.GZIPInputStream`) which is built into Android SDK. No native libraries needed.
### Shell Path (Fallback)
```sh
#!/bin/sh
# Hardcoded values
KEY_HEX="abcdef0123456789..." # 64 hex chars = 32 bytes
XOR_KEY_HEX="deadbeef"
ARCHIVE="$1"
OUTDIR="$2"
# 1. De-obfuscate header: read first 8 bytes, XOR to get real magic
MAGIC=$(dd if="$ARCHIVE" bs=1 count=8 2>/dev/null | xxd -p)
# ... validate XOR pattern ...
# 2. Find index block offset (bytes 8-11, little-endian)
INDEX_OFF_HEX=$(dd if="$ARCHIVE" bs=1 skip=8 count=4 2>/dev/null | xxd -p)
# Convert LE hex to decimal
INDEX_OFF=$(printf "%d" "0x$(echo $INDEX_OFF_HEX | \
sed 's/\(..\)\(..\)\(..\)\(..\)/\4\3\2\1/')")
# 3. Read index block, parse file table chunk offsets
# ... dd + xxd to extract offsets ...
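# 4. For each file: verify the HMAC over (IV || ciphertext) BEFORE decrypting.
#    STORED_HMAC_HEX comes from the file table. hexkey: passes the raw key bytes
#    (assumes the device's openssl supports -macopt; plain -hmac would use the hex text as the key).
COMPUTED_HMAC=$( { printf '%s' "$IV_HEX" | xxd -r -p; \
  dd if="$ARCHIVE" bs=1 skip=$DATA_OFFSET count=$ENC_SIZE 2>/dev/null; } | \
  openssl dgst -sha256 -mac HMAC -macopt "hexkey:$KEY_HEX" -hex | awk '{print $NF}')
[ "$COMPUTED_HMAC" = "$STORED_HMAC_HEX" ] || { echo "invalid archive" >&2; exit 1; }
# 5. Decrypt and decompress only after the HMAC matches
dd if="$ARCHIVE" bs=1 skip=$DATA_OFFSET count=$ENC_SIZE 2>/dev/null | \
  openssl aes-256-cbc -d -K "$KEY_HEX" -iv "$IV_HEX" -nosalt | \
  gunzip > "$OUTDIR/$FILENAME"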
```
**Shell limitations that constrain the entire format design:**
- `dd` reads are byte-precise but slow for large files with bs=1
- `xxd` handles hex conversion but no binary arithmetic
- `openssl` in busybox supports limited ciphers (aes-256-cbc YES, GCM/CCM NO)
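- HMAC verification via `openssl dgst -sha256`: note that `-hmac KEY` treats its argument as a literal string, so a hex-encoded key must be passed as `-mac HMAC -macopt hexkey:$KEY_HEX` to match the raw 32-byte key used by Rust and Kotlin (confirm this option exists on the target build)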
- Integer arithmetic limited to shell `$(( ))` -- handles 64-bit on most platforms
- **Endianness:** all multi-byte integers in format MUST be little-endian (ARM native, simpler shell parsing)
## Patterns to Follow
### Pattern 1: Pipeline Architecture (Archiver)
**What:** Each transformation (collect, compress, encrypt, format, obfuscate) is a separate module with a clear input/output type. No module knows about the others.
**When:** Always. This is the core design pattern.
**Why:** Testability (test each stage in isolation), flexibility (swap compression algorithm without touching encryption), clarity (each module has one job).
```rust
// Each stage is a function or module with typed input/output
mod collect; // Vec<PathBuf> -> Vec<FileEntry>
mod compress; // Vec<FileEntry> -> Vec<CompressedEntry>
mod encrypt; // Vec<CompressedEntry> -> Vec<EncryptedEntry>
mod format; // Vec<EncryptedEntry> -> RawArchive (unobfuscated bytes)
mod obfuscate; // RawArchive -> Vec<u8> (final obfuscated bytes)
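// Imports for the pipeline below; `anyhow::Result` is an assumption based on the chosen stack
use anyhow::Result;
use std::path::PathBuf;

// Main pipeline: each stage consumes the previous stage's output
pub fn create_archive(paths: Vec<PathBuf>, key: &[u8; 32]) -> Result<Vec<u8>> {
    let files = collect::gather(paths)?;
    let compressed = compress::compress_all(files)?;
    let encrypted = encrypt::encrypt_all(compressed, key)?;
    let raw = format::build(encrypted)?;
    let obfuscated = obfuscate::apply(raw)?;
    Ok(obfuscated)
}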
```
### Pattern 2: Format Version Field
**What:** Include a format version byte in the archive header (post-deobfuscation). Start at version 1.
**When:** Always. Format will evolve.
**Why:** Forward compatibility. Decoders can check the version and refuse to decode unknown versions with a clear error, rather than silently producing corrupt output.
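A version check along these lines could look like the sketch below. The names `SUPPORTED_VERSION`, `HeaderError`, and `check_version` are hypothetical, and the position of the version byte is passed in because the spec has not fixed it yet. The `Display` output is deliberately generic, so the caller can distinguish error variants internally without showing format details to the user.

```rust
use std::fmt;

/// The only format version this decoder understands.
pub const SUPPORTED_VERSION: u8 = 1;

#[derive(Debug, PartialEq)]
pub enum HeaderError {
    Truncated,
    UnsupportedVersion(u8),
}

impl fmt::Display for HeaderError {
    // User-facing text stays generic; the variant carries detail for logs and tests.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid archive")
    }
}

/// Checks the version byte of a de-obfuscated header at `version_offset`.
pub fn check_version(header: &[u8], version_offset: usize) -> Result<(), HeaderError> {
    match header.get(version_offset) {
        None => Err(HeaderError::Truncated),
        Some(&v) if v == SUPPORTED_VERSION => Ok(()),
        Some(&v) => Err(HeaderError::UnsupportedVersion(v)),
    }
}
```

The Kotlin and shell decoders would perform the same comparison and refuse to continue on any value other than 1.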
### Pattern 3: Per-File Independence
**What:** Each file in the archive is compressed and encrypted independently with its own IV and HMAC.
**When:** Always.
**Why:**
- Shell decoder can extract a single file without processing the entire archive
- A corruption in one file does not cascade to others
- Memory usage is bounded by the largest single file, not the archive total
### Pattern 4: Shared Format Specification as Source of Truth
**What:** A single document defines every byte of the format. All three implementations are derived from this spec.
**When:** Before writing any code.
**Why:** With three independent implementations (Rust, Kotlin, shell), byte-level compatibility is critical. Off-by-one errors in offset calculations will produce silent data corruption.
### Pattern 5: Encrypt-then-MAC
**What:** Apply HMAC after encryption, computed over (IV || ciphertext).
**When:** Always. Non-negotiable for CBC mode.
**Why:** CBC without authentication is vulnerable to padding oracle attacks. Encrypt-then-MAC is the proven pattern. Verify HMAC before decryption on all platforms.
## Anti-Patterns to Avoid
| Anti-Pattern | Why Bad | Instead |
|-------------|---------|---------|
| Streaming/Chunked Encryption | Shell can't seek into stream cipher | Encrypt each file independently |
| Complex Obfuscation | Can't implement in busybox shell | XOR + fixed offsets + decoy padding |
| Obfuscation as Security | Trivially reversible from source code | Encryption = security, obfuscation = anti-detection |
| GCM Mode | busybox openssl doesn't support it | AES-256-CBC + HMAC-SHA256 |
| zstd/lz4 Compression | No busybox/Android SDK support | gzip (DEFLATE) |
| MAC-then-Encrypt | Padding oracle attacks possible | Encrypt-then-MAC |
## Suggested Build Order
```
Phase 1: FORMAT SPEC + SHELL FEASIBILITY PROOF
|
v
Phase 2: RUST ARCHIVER (core pipeline)
|
v
Phase 3: RUST ROUND-TRIP TEST DECODER
|
v
Phase 4: KOTLIN DECODER
|
v
Phase 5: SHELL DECODER
|
v
Phase 6: OBFUSCATION HARDENING + INTEGRATION TESTING
```
**Why this order:**
1. **Format spec first** -- shared contract, constrained by busybox. Validate shell feasibility before investing in Rust/Kotlin code.
2. **Rust archiver before decoders** -- need archives to test decoders against.
3. **Rust test decoder before Kotlin/shell** -- catches format bugs in same language, avoids cross-language debugging.
4. **Kotlin before shell** -- primary path first; if Kotlin works, format is validated.
5. **Obfuscation hardening last** -- core pipeline must work first. Obfuscation is a layer on top.
## Key Architectural Decisions Summary
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Compression | gzip (DEFLATE) via `flate2` | Native on all three platforms |
| Encryption | AES-256-CBC | busybox openssl supports CBC; GCM not available |
| Authentication | HMAC-SHA256 (encrypt-then-MAC) | Authenticated encryption for CBC; verifiable everywhere |
| Byte order | Little-endian | ARM native order; simpler shell parsing |
| File processing | Per-file independent | Shell needs random access; bounded memory; fault isolation |
| Obfuscation | XOR headers + scattered blocks + decoy padding | Simple enough for shell; defeats binwalk/file |
| Format contract | Standalone spec document written first | Three implementations need byte-exact agreement |
| Key storage | Hardcoded 32-byte key in all decoders | Per requirements; sufficient for casual user threat model |
| PKCS7 padding | Standard PKCS7 for CBC mode | openssl uses PKCS7 by default; Kotlin supports natively |
## Sources
- Architecture patterns from encrypted archive design (ZIP encryption, age, tar+gpg)
- busybox openssl capabilities: aes-256-cbc supported, GCM/CCM not supported
- Android SDK javax.crypto and java.util.zip documentation
- Rust RustCrypto ecosystem: `flate2`, `aes`, `cbc`, `hmac`, `sha2`
- Encrypt-then-MAC: Hugo Krawczyk (2001), industry standard
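**Verification needed:** First confirm that an `openssl` binary is actually present on the target device; BusyBox builds normally do not include an `openssl` applet, so it may be a separate binary or may need to be shipped alongside the script. Then run `openssl enc -ciphers` (or `busybox openssl enc -ciphers` if the device exposes it that way) to confirm aes-256-cbc availability.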


@@ -0,0 +1,124 @@
# Feature Landscape
**Domain:** Custom encrypted archiver with proprietary binary format
**Researched:** 2026-02-24
**Confidence:** MEDIUM (based on domain knowledge of archive formats, encryption patterns, and Android constraints)
## Table Stakes
Features that are mandatory for this product to function correctly. Missing any of these means the archive is either non-functional, insecure against casual inspection, or unreliable.
| Feature | Why Expected | Complexity | Notes |
|---------|--------------|------------|-------|
| Multi-file packing/unpacking | Core purpose: bundle texts + APKs into one archive | Medium | Need file table/index structure; handle varied sizes from few KB to tens of MB |
| AES-256-CBC encryption | Without real encryption, any hex editor reveals content | Medium | busybox openssl supports `aes-256-cbc`; Android javax.crypto supports AES natively |
| HMAC-SHA256 integrity (encrypt-then-MAC) | Detect corruption and tampering | Medium | busybox `openssl dgst -sha256 -hmac`; Kotlin `Mac.getInstance("HmacSHA256")` |
| Compression before encryption | Reduce archive size; compression after encryption is ineffective (encrypted data has max entropy) | Low | Use deflate/gzip; must compress BEFORE encrypt |
| Hardcoded key embedding | Project requirement: no user-entered passwords, key baked into dearchiver code | Low | Key lives in the Kotlin code and the shell script; rotating the key requires a new build of both |
| Custom magic bytes | Standard magic bytes (PK, 7z, etc.) let file/binwalk identify format; custom bytes prevent this | Low | Use random-looking bytes, not human-readable strings; avoid patterns that match known formats |
| Round-trip fidelity | Unpacked files must be byte-identical to originals | Low | Verified via checksums; critical for APKs (signature breaks if single byte changes) |
| CLI interface for packing | Archiver runs on Linux/macOS developer machine | Low | Standard CLI: `encrypted_archive pack -o output.bin file1.txt file2.apk` |
| Kotlin unpacker (Android 13) | Primary dearchiver path on target device | High | Pure JVM, no native libs; must handle javax.crypto for AES |
| Busybox shell unpacker (fallback) | Backup when Kotlin app unavailable | High | Only dd, xxd, openssl, sh; format must be simple enough for positional extraction |
| File metadata preservation (name, size) | Unpacker must know which bytes belong to which file and what to name them | Low | Stored in file table; at minimum: filename, original size, compressed size, offset |
## Differentiators
Features that exceed baseline expectations and provide meaningful protection or usability improvements. Not all are needed for MVP, but they strengthen the product.
| Feature | Value Proposition | Complexity | Notes |
|---------|-------------------|------------|-------|
| Format obfuscation: fake headers | Misleads automated analysis tools (binwalk, foremost) that scan for patterns | Medium | Insert decoy headers resembling JPEG, PNG, or random formats at predictable offsets; casual user sees "corrupted image" not "encrypted archive" |
| Format obfuscation: shuffled blocks | File data blocks stored out-of-order with a scramble map | Medium | Prevents sequential extraction even if encryption is somehow bypassed; adds complexity to busybox unpacker |
| Format obfuscation: randomized padding | Variable-length random padding between blocks | Low | Makes block boundaries unpredictable to static analysis; minimal implementation cost |
| Version field in header | Forward-compatible format evolution | Low | Single byte version; unpackers check version and reject incompatible archives gracefully |
| Per-file encryption with derived keys | Each file encrypted with unique key derived from master key + file index/salt | Medium | Limits damage if one file's plaintext is known (known-plaintext attack on specific block) |
| Progress reporting during pack/unpack | UX for large archives (tens of MB of APKs) | Low | CLI progress bar; Kotlin callback for UI integration |
| Dry-run / validation mode | Check archive integrity without full extraction | Low | Verify checksums and structure without writing files to disk; useful for debugging on device |
| Configurable compression level | Trade speed vs size for different content types (APKs are already compressed, texts compress well) | Low | APKs benefit little from compression; allow per-file or global setting |
| Salt / IV per archive | Each archive uses random IV/nonce even with same key; prevents identical plaintext producing identical ciphertext | Low | Standard crypto practice; 16-byte IV for AES-CBC; must be stored in archive header (unencrypted) |
| Error messages that do not leak format info | Unpacker errors say "invalid archive" not "checksum mismatch at block 3 offset 0x4A2" | Low | Defense in depth: even error messages should not help reverse engineering |
## Anti-Features
Features to explicitly NOT build. Each would add complexity without matching the project's threat model or constraints.
| Anti-Feature | Why Avoid | What to Do Instead |
|--------------|-----------|-------------------|
| Password-based key derivation (PBKDF2/Argon2) | Project explicitly uses hardcoded key; password entry UX is unwanted on car head unit | Embed key directly in Kotlin/shell code; accept that key extraction from APK is possible for determined attackers |
| GUI for archiver | Scope creep; CLI is sufficient for developer workflow (pack on laptop, deploy to device) | Well-designed CLI with clear flags and help text |
| Windows archiver support | Out of scope per project constraints; Rust cross-compiles easily IF needed later | Linux/macOS only; document that WSL works if Windows user needs it |
| Streaming/pipe support | Files are small enough (KB to tens of MB) to fit in memory; streaming adds format complexity that breaks busybox compatibility | Load entire file into memory for pack/unpack; document max file size assumption |
| Nested/recursive archives | No use case: archive contains flat list of texts and APKs | Single-level file list only |
| File permissions / ownership metadata | Android target manages its own permissions; Unix permissions from build machine are irrelevant | Store only filename and size; ignore mode/owner/timestamps |
| Compression algorithm selection at runtime | Over-engineering; one good default is sufficient | Use deflate/gzip -- available everywhere: Rust, Kotlin, busybox; hardcode the choice |
| Public-key / asymmetric encryption | Massive complexity increase for no benefit given hardcoded key model | Symmetric encryption only (AES-256) |
| Self-extracting archives | Target is Android, not desktop; shell script IS the extractor | Separate archive file + separate unpacker (Kotlin app or shell script) |
| DRM or license enforcement | Not the purpose; this is content bundling protection, not DRM | Simple encryption is sufficient for the threat model |
| File deduplication within archive | Archive contains distinct files (texts and different APKs); dedup adds complexity with near-zero benefit | Pack files as-is |
| Encryption of filenames in file table | Nice in theory but busybox shell unpacker needs to know filenames to extract; encrypting the file table massively complicates the shell path | Store filenames inside the encrypted payload (entire payload is encrypted, so filenames are protected by archive-level encryption) |
## Feature Dependencies
```
Compression --> Encryption --> Format Assembly (compression MUST happen before encryption)
                                      |
                                      v
                               Integrity Checks (HMAC over encrypted blocks)
Custom Magic Bytes --> Format Header Design
Version Field --> Format Header Design
Salt/IV Storage --> Format Header Design
File Metadata (name, size) --> File Table Structure --> Format Assembly
Format Assembly --> CLI Packer (Rust)
Format Specification --> Kotlin Unpacker
Format Specification --> Busybox Shell Unpacker
Per-file Key Derivation --> Requires Format Specification to include file index/salt
Fake Headers --> Requires Format Assembly to insert decoys at correct positions
Shuffled Blocks --> Requires File Table to store block ordering map
```
**Critical dependency chain:**
```
Format Spec (on paper)
--> Rust Packer (implements spec)
--> Kotlin Unpacker (reads spec)
--> Shell Unpacker (reads spec)
--> Round-trip tests (validates all three agree)
```
The format specification must be finalized BEFORE any implementation begins, because three independent implementations (Rust, Kotlin, shell) must produce identical results.
## MVP Recommendation
**Prioritize (Phase 1 -- must ship):**
1. **Format specification document** -- Define header, file table, block layout, magic bytes, version field, IV/salt placement
2. **Compression + Encryption pipeline** -- Compress with gzip, encrypt with AES-256-CBC, authenticate with HMAC-SHA256
3. **Rust CLI packer** -- Pack multiple files into the custom format
4. **Integrity verification via HMAC-SHA256** -- Encrypt-then-MAC for both integrity and authenticity
5. **Kotlin unpacker** -- Primary extraction path on Android 13. Pure JVM using javax.crypto
6. **Busybox shell unpacker** -- Fallback extraction. This constrains the format to be simple
7. **Round-trip tests** -- Verify Rust-pack, Kotlin-unpack, shell-unpack all produce identical output
**Defer (Phase 2 -- after MVP works):**
- **Fake headers / decoy data** -- Obfuscation layer; adds no functional value, purely anti-analysis
- **Shuffled blocks** -- Significant complexity, especially for busybox
- **Progress reporting** -- Nice UX but not blocking
- **Configurable compression** -- Start with one setting that works; optimize later
- **Dry-run / validation mode** -- Useful for debugging but not for initial delivery
- **Per-file derived keys** -- Defense-in-depth for later
**Key MVP constraint:** The busybox shell unpacker is the most constraining component. Every format decision must be validated against "can busybox dd/xxd/openssl do this?" If the answer is no, the feature must be deferred or redesigned.
## Sources
- Domain knowledge of archive format design (ZIP, tar, 7z format specifications)
- Domain knowledge of cryptographic best practices (NIST, libsodium documentation patterns)
- Domain knowledge of Android crypto APIs (javax.crypto, OpenSSL CLI)
- Domain knowledge of busybox utility capabilities


@@ -0,0 +1,89 @@
# Common Pitfalls
**Domain:** Custom encrypted archiver with busybox/Kotlin decompression
**Researched:** 2026-02-24
**Confidence:** HIGH
## Critical Pitfalls
### Pitfall 1: Busybox OpenSSL Cipher Availability
**What:** Target busybox may not have the chosen cipher. GCM, ChaCha20 are likely unavailable.
**Prevention:** Test `busybox openssl enc -ciphers` on actual device FIRST. Use AES-256-CBC (universally available).
**Phase:** Phase 1 (format design) — blocking decision.
### Pitfall 2: Endianness Mismatch Across Platforms
**What:** Inconsistent byte order between Rust (x86_64), Kotlin (ARM64), shell (xxd parsing).
**Prevention:** Use little-endian everywhere. Rust: `to_le_bytes()`. Kotlin: `ByteBuffer.order(LITTLE_ENDIAN)`. Document in format spec.
**Phase:** Phase 1 (format design).
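One way to pin down byte order across Rust and the shell is to generate the hex text that `xxd -p` would print for a field and use it as a golden value in shell tests. The helpers below are a sketch with hypothetical names, limited to 32-bit fields for brevity.

```rust
/// Hex text that `xxd -p` would show for a u32 stored little-endian.
pub fn le_hex(value: u32) -> String {
    value
        .to_le_bytes()
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect()
}

/// Parses 8 hex digits (as produced by `xxd -p`) back into a little-endian u32.
pub fn parse_le_hex(hex: &str) -> Option<u32> {
    if hex.len() != 8 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut bytes = [0u8; 4];
    for (i, slot) in bytes.iter_mut().enumerate() {
        // Each pair of hex digits is one byte, lowest-order byte first.
        *slot = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(u32::from_le_bytes(bytes))
}
```

For example, an offset of 4096 is stored as the bytes `00 10 00 00`, so the shell decoder must reverse the byte pairs before converting to decimal.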
### Pitfall 3: PKCS7 Padding Incompatibility
**What:** Different padding handling between Rust crates, javax.crypto, and busybox openssl causes last-block corruption.
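**Prevention:** Store the exact compressed-data length in the header. Either let openssl add and strip PKCS7 padding itself (its default behavior for CBC) or pass `-nopad` and truncate to the stored length manually; pick one approach and use it in all three decoders. The `-K`/`-iv` flags select the raw key and IV and do not affect padding. Test with data whose length is not a multiple of 16 bytes, and with data whose length is, since PKCS7 then appends a full extra block.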
**Phase:** Phase 2 (encryption implementation).
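For reference when writing golden test vectors, PKCS7 for a 16-byte block size can be sketched in a few lines. This is an illustrative standalone version with hypothetical function names. A real decoder must run the unpadding check only after the HMAC has been verified, so that padding errors never become an oracle.

```rust
const BLOCK: usize = 16; // AES block size in bytes

/// Appends PKCS7 padding; aligned input gains a full extra block.
pub fn pkcs7_pad(data: &[u8]) -> Vec<u8> {
    let pad = BLOCK - data.len() % BLOCK; // always in 1..=16
    let mut out = data.to_vec();
    out.extend(std::iter::repeat(pad as u8).take(pad));
    out
}

/// Strips PKCS7 padding; returns None if the padding is malformed.
pub fn pkcs7_unpad(padded: &[u8]) -> Option<&[u8]> {
    let &last = padded.last()?;
    let pad = last as usize;
    if pad == 0 || pad > BLOCK || padded.len() % BLOCK != 0 {
        return None;
    }
    let (body, tail) = padded.split_at(padded.len() - pad);
    if tail.iter().all(|&b| b == last) {
        Some(body)
    } else {
        None
    }
}
```

A 5-byte input is padded with eleven bytes of value 11, while a 16-byte input grows to 32 bytes; both cases belong in the cross-platform test set.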
### Pitfall 4: OpenSSL Key Derivation (EVP_BytesToKey vs Raw Key)
**What:** busybox `openssl enc` derives keys via EVP_BytesToKey by default. Rust/Kotlin use raw keys. Decryption produces garbage.
**Prevention:** Use `-K HEX -iv HEX -nosalt` flags for raw key mode. Test on target device FIRST. This is the #1 failure mode.
**Phase:** Phase 1 (format design) — blocking.
### Pitfall 5: Shell Arithmetic Overflow with Large Files
**What:** busybox `sh` arithmetic may be 32-bit signed, overflowing at 2GB offsets.
**Prevention:** Use `dd bs=4096` with block-count math. Limit archive size in spec. Test with >50MB archives.
**Phase:** Phase 1 (format) + Phase 5 (shell decoder).
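The archiver can enforce the size limit at pack time so that the shell decoder never sees an offset it cannot represent. The sketch below assumes the worst case of 32-bit signed shell arithmetic, and it also shows the block-count split that a `dd bs=4096` read would use. All names are hypothetical.

```rust
/// Largest value a 32-bit signed shell integer can hold.
pub const SHELL_MAX: u64 = i32::MAX as u64;

/// True when both the offset and the end of the range fit in 32-bit signed math.
pub fn fits_shell_arithmetic(offset: u64, len: u64) -> bool {
    match offset.checked_add(len) {
        Some(end) => end <= SHELL_MAX,
        None => false, // the sum overflowed even u64
    }
}

/// Splits a byte offset into (whole blocks to skip, leftover bytes) for `dd bs=block_size`.
pub fn split_offset(offset: u64, block_size: u64) -> (u64, u64) {
    (offset / block_size, offset % block_size)
}
```

With `bs=4096`, an offset of 10000 becomes `skip=2` plus 1808 leftover bytes that still need a small byte-granular read.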
### Pitfall 6: IV/Nonce Reuse with Hardcoded Key
**What:** Same key + same IV = identical ciphertext for identical files. Information leak.
**Prevention:** Random 16-byte IV per file, stored in cleartext alongside ciphertext. Never deterministic IV.
**Phase:** Phase 2 (encryption).
### Pitfall 7: Busybox xxd Behavioral Differences
**What:** busybox `xxd` may not support all GNU xxd flags. Key/IV hex conversion fails silently.
**Prevention:** Use only `xxd -p` and `xxd -r -p`. Test on actual device. Fallback: `od -A n -t x1`.
**Phase:** Phase 5 (shell decoder).
## Moderate Pitfalls
### Pitfall 8: APK Files Don't Compress
**What:** APKs are already ZIP-compressed, so gzip saves almost nothing and can make them slightly larger.
**Prevention:** Per-file compression flag. Skip compression if output >= input.
**Phase:** Phase 2 (compression).
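The "skip compression if output >= input" rule can be expressed independently of the compressor. In this sketch the compressor is passed in as a closure (in the real archiver it would wrap `flate2`), and the `Stored` enum stands in for the per-file compression flag. Both names are hypothetical.

```rust
/// How a file's bytes are stored; maps to the per-file compression flag.
#[derive(Debug, PartialEq)]
pub enum Stored {
    Raw(Vec<u8>),  // compression did not help, keep original bytes
    Gzip(Vec<u8>), // compressed output is strictly smaller
}

/// Runs `compress` and keeps its output only when it is strictly smaller.
pub fn choose_payload<F>(original: &[u8], compress: F) -> Stored
where
    F: FnOnce(&[u8]) -> Vec<u8>,
{
    let compressed = compress(original);
    if compressed.len() >= original.len() {
        Stored::Raw(original.to_vec())
    } else {
        Stored::Gzip(compressed)
    }
}
```

Decoders then branch on the flag: the shell path pipes through `gunzip` only for entries marked as compressed.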
### Pitfall 9: Missing Integrity Verification
**What:** No checksums = silent data corruption from bit flips during transfer.
**Prevention:** SHA-256 checksum per file. Verify AFTER decompression in all three decoders.
**Phase:** Phase 1 (format) + all decoder phases.
### Pitfall 10: Over-Engineering Obfuscation
**What:** Complex obfuscation (block shuffling, fake headers) triples implementation complexity with minimal security gain against casual users.
**Prevention:** AES encryption IS obfuscation. Custom magic bytes + encrypted payload is sufficient. Keep it simple.
**Phase:** Phase 1 (format design).
### Pitfall 11: Rust/Kotlin Crypto Output Incompatibility
**What:** "Same algorithm" produces different bytes due to framing differences (IV placement, padding).
**Prevention:** Define wire format explicitly: `[16-byte IV][PKCS7-padded ciphertext]`. Golden test vectors mandatory.
**Phase:** Phase 2 (encryption).
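A small parser for the `[16-byte IV][PKCS7-padded ciphertext]` framing makes the wire format explicit and gives each decoder the same rejection rules. The sketch below uses hypothetical names and checks only the structure; it performs no cryptography.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
pub enum WireError {
    TooShort,   // fewer than 16 IV bytes plus one ciphertext block
    Misaligned, // ciphertext length is not a multiple of 16
}

/// Splits a blob into (IV, ciphertext) after validating its shape.
pub fn split_wire(blob: &[u8]) -> Result<(&[u8; 16], &[u8]), WireError> {
    if blob.len() < 32 {
        return Err(WireError::TooShort);
    }
    let (iv, ciphertext) = blob.split_at(16);
    if ciphertext.len() % 16 != 0 {
        return Err(WireError::Misaligned);
    }
    let iv = <&[u8; 16]>::try_from(iv).expect("split_at(16) yields 16 bytes");
    Ok((iv, ciphertext))
}
```

The minimum of 32 bytes follows from PKCS7: even empty plaintext encrypts to one full block.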
### Pitfall 12: Android Filesystem Permissions
**What:** Extracted files land in wrong location or have wrong permissions on Android.
**Prevention:** Store only relative paths. Don't store Unix permissions. Let each decoder handle permissions.
**Phase:** Phase 4 (Kotlin decoder).
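"Store only relative paths" is worth enforcing at both ends, since a stored name such as `../x` would otherwise escape the output directory. The following sketch, with a hypothetical `safe_output_path` helper, shows the check a Rust test decoder could apply; the Kotlin decoder would need an equivalent.

```rust
use std::path::{Component, Path, PathBuf};

/// Joins a stored name onto `out_dir`, rejecting absolute paths and `..` components.
pub fn safe_output_path(out_dir: &Path, stored_name: &str) -> Option<PathBuf> {
    let rel = Path::new(stored_name);
    let mut has_normal = false;
    for component in rel.components() {
        match component {
            Component::Normal(_) => has_normal = true,
            Component::CurDir => {}
            // RootDir, Prefix, and ParentDir could all leave out_dir.
            _ => return None,
        }
    }
    if has_normal {
        Some(out_dir.join(rel))
    } else {
        None // empty name or only "." components
    }
}
```

Rejecting the entry outright, rather than trying to normalize it, keeps the behavior easy to mirror in the other decoders.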
## Minor Pitfalls
- **#13:** Flush/sync writes in shell script — use `sync` after critical files
- **#14:** Missing version field — add 1-2 byte version after magic bytes (non-negotiable)
- **#15:** Testing only ASCII filenames — include Cyrillic test files
- **#16:** Hardcoded key visible in `strings` — store as byte array, split hex fragments in shell
## Key Insight
**The busybox shell constraint drives everything.** Every format decision must be validated against "can busybox sh + dd + xxd + openssl actually implement this?" Build shell decompressor prototype EARLY, not last.
## Phase Mapping
| Phase | Critical Pitfalls | Action |
|-------|------------------|--------|
| Format design | #1, #2, #4, #5, #10, #14 | Test busybox on device. Write format spec. Keep it simple. |
| Encryption | #3, #6, #8, #11 | Golden test vectors. Random IV. Per-file compression flag. |
| Kotlin decoder | #11, #12 | Explicit wire format. Test on device. |
| Shell decoder | #1, #4, #5, #7, #15 | Busybox compatibility suite. Large file tests. |

.planning/research/STACK.md Normal file

@@ -0,0 +1,174 @@
# Technology Stack
**Project:** encrypted_archive (Custom Encrypted Archiver)
**Researched:** 2026-02-24
**Overall confidence:** MEDIUM (versions from training data — verify before use)
---
## Critical Constraint: Three-Platform Compatibility
Every technology choice is constrained by the weakest link: **busybox shell**. The Rust archiver can use any library, but the format it produces must be decodable by:
1. Kotlin on Android 13 (javax.crypto / java.security)
2. busybox shell (openssl, dd, xxd)
This constraint eliminates many otherwise-superior choices.
---
## Recommended Stack
### Encryption
| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| `aes` crate | ^0.8 | AES-256-CBC block cipher | busybox openssl supports `aes-256-cbc` — the critical three-way constraint. Android javax.crypto supports AES natively. | HIGH |
| `cbc` crate | ^0.1 | CBC mode of operation | Standard mode: openssl CLI `aes-256-cbc`, javax.crypto `AES/CBC/PKCS5Padding` | HIGH |
| `hmac` + `sha2` | ^0.12 / ^0.10 | HMAC-SHA256 integrity | Encrypt-then-MAC. busybox `openssl dgst -sha256 -hmac`. Kotlin `Mac.getInstance("HmacSHA256")` | HIGH |
**Why AES-256-CBC over AES-GCM:** busybox openssl does NOT support AEAD modes (GCM/CCM) in `openssl enc`. AES-CBC + HMAC-SHA256 (encrypt-then-MAC) provides comparable confidentiality and integrity guarantees, provided the HMAC is verified before decryption. CBC is the only AES mode reliably available across all three platforms.
**Why NOT ChaCha20-Poly1305:** busybox openssl does not support ChaCha20, so the shell fallback could not decode it. Android is not the blocker here: javax.crypto has offered `ChaCha20/Poly1305/NoPadding` since API 28. The shell decoder is the constraint that rules it out.
**Why NOT aes-gcm crate:** Not decodable via `openssl enc` in busybox.
### Compression
| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| `flate2` crate | ^1.0 | gzip compression | busybox `gunzip` works natively. Android `GZIPInputStream` works natively. Simplest cross-platform path. | HIGH |

**Why gzip-wrapped DEFLATE:** busybox `gunzip` handles it. Android `GZIPInputStream` handles it. flate2's `GzEncoder` produces standard gzip streams, and `GzDecoder` reads them back for round-trip tests.

**Why NOT zstd/lz4/brotli:** busybox has no decompressors for any of these.
### CLI Framework
| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| `clap` | ^4 | CLI argument parsing | De facto Rust standard. Derive macros. Subcommands (`pack`/`unpack`/`inspect`). | HIGH |
### Binary Format
| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| Manual byte-level I/O | N/A | Custom binary format | Using bincode/serde would make format recognizable by forensic tools. Manual bytes give full control for obfuscation. | HIGH |
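
To make "manual byte-level I/O" concrete, here is a minimal std-only sketch of encoding and decoding one file entry in little-endian. The field layout (`name_len`, `name`, `original_size`, `stored_size`, `flags`) is purely hypothetical and is not taken from the format spec; the real layout belongs in the shared spec document.

```rust
// Hypothetical entry layout (an assumption, NOT the real format), little-endian:
//   name_len: u16 | name: [u8; name_len] | original_size: u32 | stored_size: u32 | flags: u8

#[derive(Debug, PartialEq)]
pub struct Entry {
    pub name: String,
    pub original_size: u32,
    pub stored_size: u32,
    pub flags: u8,
}

/// Serializes an entry; returns None if the name does not fit in a u16 length.
pub fn encode_entry(e: &Entry) -> Option<Vec<u8>> {
    let name = e.name.as_bytes();
    if name.len() > u16::MAX as usize {
        return None;
    }
    let mut out = Vec::with_capacity(2 + name.len() + 9);
    out.extend_from_slice(&(name.len() as u16).to_le_bytes());
    out.extend_from_slice(name);
    out.extend_from_slice(&e.original_size.to_le_bytes());
    out.extend_from_slice(&e.stored_size.to_le_bytes());
    out.push(e.flags);
    Some(out)
}

fn read_u16_le(buf: &[u8], pos: usize) -> Option<u16> {
    let b = buf.get(pos..pos + 2)?;
    Some(u16::from_le_bytes([b[0], b[1]]))
}

fn read_u32_le(buf: &[u8], pos: usize) -> Option<u32> {
    let b = buf.get(pos..pos + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parses one entry from the start of `buf`.
/// Returns the entry and the number of bytes consumed, or None on truncated/invalid input.
pub fn decode_entry(buf: &[u8]) -> Option<(Entry, usize)> {
    let name_len = read_u16_le(buf, 0)? as usize;
    let mut pos = 2;
    let name = String::from_utf8(buf.get(pos..pos + name_len)?.to_vec()).ok()?;
    pos += name_len;
    let original_size = read_u32_le(buf, pos)?;
    pos += 4;
    let stored_size = read_u32_le(buf, pos)?;
    pos += 4;
    let flags = *buf.get(pos)?;
    pos += 1;
    Some((Entry { name, original_size, stored_size, flags }, pos))
}
```

Every read is bounds-checked, so a truncated archive yields `None` instead of a panic. The Kotlin decoder would mirror this with `ByteBuffer.order(ByteOrder.LITTLE_ENDIAN)`, and the shell decoder with `dd` plus `xxd` or `od`.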
### Hashing / Integrity
| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| `sha2` crate | ^0.10 | SHA-256 checksums | busybox `sha256sum`, Android `MessageDigest("SHA-256")` | HIGH |
### Random / IV Generation
| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| `rand` | ^0.8 | Random IV generation | CSPRNG for AES-CBC initialization vectors. Each encrypted file gets its own random 16-byte IV. | HIGH |
### Error Handling
| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| `anyhow` | ^1 | Application errors | CLI app, not library. Ergonomic error chains. | HIGH |
| `thiserror` | ^2 | Typed format errors | Specific errors for format validation, decryption, integrity. | MEDIUM |
### Testing
| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| Built-in `#[test]` | N/A | Unit tests | Round-trip pack/unpack/compare | HIGH |
| `assert_cmd` | ^2 | CLI integration tests | Test actual binary | MEDIUM |
| `tempfile` | ^3 | Temp dirs | Clean test isolation | HIGH |
---
## Kotlin/Android Decompressor Stack (Zero External Dependencies)
| Technology | Source | Purpose |
|------------|--------|---------|
| `javax.crypto.Cipher` | Android SDK | AES-256-CBC: `Cipher.getInstance("AES/CBC/PKCS5Padding")` |
| `javax.crypto.spec.SecretKeySpec` | Android SDK | 32-byte hardcoded key |
| `javax.crypto.spec.IvParameterSpec` | Android SDK | IV from archive header |
| `javax.crypto.Mac` | Android SDK | HMAC-SHA256: `Mac.getInstance("HmacSHA256")` |
| `java.util.zip.GZIPInputStream` | Android SDK | Gzip decompression |
| `java.security.MessageDigest` | Android SDK | SHA-256 integrity |
| `java.nio.ByteBuffer` | Android SDK | Little-endian parsing |
---
## Busybox Shell Decompressor Stack
| Tool | Purpose | Key Flags |
|------|---------|-----------|
| `dd` | Extract byte ranges | `bs=1 skip=N count=M` |
| `xxd` | Hex encode/decode keys | `xxd -p` |
| `openssl enc` | AES-256-CBC decrypt | `-d -aes-256-cbc -K HEX -iv HEX -nosalt` |
| `openssl dgst` | HMAC-SHA256 verify | `-sha256 -mac HMAC -macopt hexkey:HEX -binary` (plain `-hmac KEY` treats KEY as a text string, which breaks for raw key bytes) |
| `gunzip` | Gzip decompress | Standard input/output |
| `sha256sum` | Integrity check | `-c checksums` |
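
**Critical:** by default `openssl enc` derives the key and IV from a password via EVP_BytesToKey. MUST pass `-K` (hex key) + `-iv` (hex IV) to use raw key mode; `-nosalt` is harmless there but only affects password mode. The IV must be stored in cleartext in the archive so the shell decoder can extract it with `dd`.
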
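
The raw-key requirement means the key and IV reach `openssl enc` as hex strings. The sketch below shows one way a Rust test harness might produce those strings and assemble the matching decrypt arguments, so the exact command can be logged next to a golden test vector. The helper names `to_hex` and `openssl_dec_args` and the path parameters are assumptions for illustration; only the `enc`, `-d`, `-aes-256-cbc`, `-K`, `-iv`, `-in` and `-out` arguments come from OpenSSL itself.

```rust
/// Lowercase hex encoding, the form accepted by `openssl enc -K` and `-iv`.
pub fn to_hex(bytes: &[u8]) -> String {
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    let mut s = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        s.push(DIGITS[(b >> 4) as usize] as char);
        s.push(DIGITS[(b & 0x0f) as usize] as char);
    }
    s
}

/// Builds the argument list for a raw-key AES-256-CBC decrypt
/// (hypothetical helper; `in_path` and `out_path` are placeholders).
pub fn openssl_dec_args(
    key: &[u8; 32],
    iv: &[u8; 16],
    in_path: &str,
    out_path: &str,
) -> Vec<String> {
    vec![
        "enc".to_string(),
        "-d".to_string(),
        "-aes-256-cbc".to_string(),
        "-K".to_string(),
        to_hex(key), // 64 hex characters for a 32-byte key
        "-iv".to_string(),
        to_hex(iv), // 32 hex characters for a 16-byte IV
        "-in".to_string(),
        in_path.to_string(),
        "-out".to_string(),
        out_path.to_string(),
    ]
}
```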
---
## Cross-Platform Compatibility Matrix
| Rust | Android SDK | busybox | Notes |
|------|-------------|---------|-------|
| `aes`+`cbc` (PKCS7) | `Cipher("AES/CBC/PKCS5Padding")` | `openssl enc -aes-256-cbc` | PKCS5=PKCS7 for 16-byte blocks |
| `hmac`+`sha2` | `Mac("HmacSHA256")` | `openssl dgst -sha256 -mac HMAC -macopt hexkey:HEX` | Raw key bytes, not a password string |
| `flate2` (GzEncoder) | `GZIPInputStream` | `gunzip` | Standard gzip |
| `sha2` | `MessageDigest("SHA-256")` | `sha256sum` | Hex comparison |
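
The "PKCS5=PKCS7 for 16-byte blocks" note can be checked by hand when building golden test vectors. The following std-only sketch implements PKCS#7 padding for a 16-byte block. It is an illustration only: in the archiver the `cbc` crate would apply padding itself, and `pkcs7_pad` / `pkcs7_unpad` are hypothetical helper names. Note that input already aligned to 16 bytes gains a full extra block, which all three decoders must expect.

```rust
const BLOCK: usize = 16; // AES block size in bytes

/// Appends PKCS#7 padding: `pad` bytes, each with value `pad`, where 1 <= pad <= 16.
pub fn pkcs7_pad(data: &[u8]) -> Vec<u8> {
    let pad = BLOCK - (data.len() % BLOCK);
    let mut out = data.to_vec();
    out.extend(std::iter::repeat(pad as u8).take(pad));
    out
}

/// Strips PKCS#7 padding; returns None if the length or padding bytes are invalid.
pub fn pkcs7_unpad(data: &[u8]) -> Option<&[u8]> {
    if data.is_empty() || data.len() % BLOCK != 0 {
        return None;
    }
    let pad = *data.last()? as usize;
    if pad == 0 || pad > BLOCK {
        return None;
    }
    let (body, tail) = data.split_at(data.len() - pad);
    if tail.iter().all(|&b| b as usize == pad) {
        Some(body)
    } else {
        None
    }
}
```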
---
## Alternatives Considered
| Category | Recommended | Alternative | Why Not |
|----------|-------------|-------------|---------|
| Encryption | AES-256-CBC + HMAC | AES-256-GCM | busybox openssl lacks GCM |
| Encryption | AES-256-CBC + HMAC | ChaCha20-Poly1305 | No AEAD support in `openssl enc` (shell decoder) |
| Compression | flate2 (gzip) | zstd | No busybox decompressor |
| Compression | flate2 (gzip) | lz4 | No busybox decompressor |
| Format | Manual bytes | bincode/serde | Recognizable patterns |
| Crypto ecosystem | RustCrypto (aes+cbc) | ring | ring bundles C/assembly code and offers no AES-CBC |
| Crypto ecosystem | RustCrypto (aes+cbc) | openssl-rs | Unnecessary system dep |
---
## Cargo.toml
```toml
[package]
name = "encrypted_archive"
version = "0.1.0"
edition = "2021"
[dependencies]
aes = "0.8"
cbc = "0.1"
hmac = "0.12"
sha2 = "0.10"
flate2 = "1.0"
clap = { version = "4", features = ["derive"] }
rand = "0.8"
anyhow = "1"
thiserror = "2"
[dev-dependencies]
assert_cmd = "2"
tempfile = "3"
```

**WARNING: Versions from training data (cutoff May 2025). Verify with `cargo search CRATE --limit 1` before use.**

---
## Gaps Requiring Verification
1. Exact latest crate versions (could not verify via crates.io)
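2. Confirm the target device provides an `openssl` binary (upstream BusyBox ships no `openssl` applet, so it would have to come from a separate OpenSSL install or a vendor build)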
3. Confirm `xxd` availability in target busybox (fallback: `od`)
4. Test PKCS7 padding round-trip across all three platforms
5. Test flate2 GzEncoder output with busybox `gunzip` and Android `GZIPInputStream`

@@ -0,0 +1,76 @@
# Research Summary
**Project:** encrypted_archive
**Researched:** 2026-02-24
## Stack Decision
| Layer | Choice | Rationale |
|-------|--------|-----------|
| Language (archiver) | Rust | Memory safety, crypto ecosystem, cross-compilation |
| Encryption | AES-256-CBC + HMAC-SHA256 | Only cipher reliably available in busybox openssl + javax.crypto + RustCrypto |
| Compression | gzip (DEFLATE) via `flate2` | Native everywhere: Rust, Android GZIPInputStream, busybox gunzip |
| CLI | `clap` v4 | De facto Rust standard |
| Binary format | Manual byte I/O | Full control for obfuscation; no recognizable patterns |
| Kotlin decoder | javax.crypto + java.util.zip | Zero external dependencies, Android SDK built-in |
| Shell decoder | dd + xxd + openssl + gunzip | Standard busybox applets |
**Critical constraint:** busybox shell is the weakest link. Every technology choice is validated against "can busybox do this?"
## Table Stakes Features
1. Multi-file packing/unpacking (texts + APKs)
2. AES-256-CBC encryption with HMAC-SHA256 (encrypt-then-MAC)
3. Gzip compression before encryption
4. Custom magic bytes (not recognizable by binwalk/file/7z)
5. Hardcoded 32-byte key in all decoders
6. Per-file IV (random 16 bytes, stored in cleartext)
7. Round-trip fidelity (byte-identical decompression)
8. Kotlin decoder (primary, Android 13)
9. Shell decoder (fallback, busybox)
10. File metadata (name, sizes, offsets)
11. Data integrity (SHA-256 checksums per file)
## Architecture
Three independent implementations unified by a shared format specification:
```
FORMAT SPEC (shared document)
|
+-- Rust Archiver (CLI, Linux/macOS)
+-- Kotlin Decoder (Android 13, primary)
+-- Shell Decoder (busybox, fallback)
```

**Rust archiver pipeline:** collect → compress → encrypt → format → obfuscate

**Key patterns:**
- Per-file independence (each file compressed/encrypted separately)
- Encrypt-then-MAC (HMAC over IV || ciphertext)
- Little-endian everywhere
- Format version field for forward compatibility
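
The encrypt-then-MAC pattern above fixes two details every decoder has to agree on: the exact bytes fed to HMAC (IV followed by ciphertext) and a tag comparison that happens before any decryption. A minimal std-only sketch of both follows. The HMAC computation itself is deliberately left out (it would come from the `hmac` + `sha2` crates), and `mac_input` and `tags_equal` are hypothetical helper names.

```rust
/// Builds the byte string that is authenticated: IV || ciphertext.
pub fn mac_input(iv: &[u8; 16], ciphertext: &[u8]) -> Vec<u8> {
    let mut m = Vec::with_capacity(iv.len() + ciphertext.len());
    m.extend_from_slice(iv);
    m.extend_from_slice(ciphertext);
    m
}

/// Compares two tags without an early exit on the first mismatching byte.
/// Illustrative only: this is not a vetted constant-time primitive.
pub fn tags_equal(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y; // accumulate differences across all bytes
    }
    diff == 0
}
```

A decoder would compute HMAC-SHA256 over `mac_input(iv, ciphertext)`, compare it with the stored tag, and refuse to decrypt on mismatch. In production code the `hmac` crate's own verification method is likely the better choice than a hand-rolled comparison.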
## Top Pitfalls to Prevent
1. **busybox openssl cipher availability** — test on actual device before format design
2. **OpenSSL key derivation mismatch** — use `-K HEX -iv HEX -nosalt` for raw keys
3. **Cross-platform crypto incompatibility** — golden test vectors mandatory
4. **Over-engineering obfuscation** — AES encryption IS obfuscation for casual users
5. **APKs don't compress** — per-file compression flag needed
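
For pitfall 5, one possible shape of the per-file compression flag is sketched below: keep the gzip output only when it is actually smaller, otherwise store the original bytes and record that in a flag. The flag values and the name `choose_payload` are assumptions, not part of the format spec.

```rust
// Hypothetical flag values; the real ones belong in the format spec.
pub const FLAG_STORED: u8 = 0; // payload is the original bytes
pub const FLAG_GZIP: u8 = 1; // payload is gzip-compressed

/// Picks the smaller of the two representations and returns it with its flag.
/// `gzipped` is assumed to be the gzip output for `original`.
pub fn choose_payload<'a>(original: &'a [u8], gzipped: &'a [u8]) -> (u8, &'a [u8]) {
    if gzipped.len() < original.len() {
        (FLAG_GZIP, gzipped)
    } else {
        (FLAG_STORED, original)
    }
}
```

Already-compressed inputs such as APKs typically grow slightly under gzip, so they would end up as `FLAG_STORED`, and the decoders skip `gunzip` / `GZIPInputStream` for those entries.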
## Recommended Build Order
1. **Format spec + busybox feasibility proof** — validate constraints first
2. **Rust archiver** — core pipeline (compress → encrypt → format)
3. **Rust test decoder** — catch format bugs in same language
4. **Kotlin decoder** — primary extraction path
5. **Shell decoder** — busybox fallback
6. **Obfuscation hardening + integration testing** — binwalk/file/strings testing
## Open Questions
- Does the target busybox have `openssl enc -aes-256-cbc` with `-K`/`-iv` flags?
- Is `xxd` available in the target busybox? (fallback: `od`)
- Is `gunzip` available in the target busybox?
- Should HMAC use the same key as AES or a derived subkey?