Architecture Patterns

Domain: Custom encrypted archiver with obfuscated binary format
Researched: 2026-02-24

The system decomposes into three independent deliverables (Rust archiver, Kotlin decoder, shell decoder) that share a single specification: the binary format. The format is the contract. Everything else is implementation detail.

High-Level Overview

                        +-----------------+
                        |  FORMAT SPEC    |
                        |  (shared doc)   |
                        +--------+--------+
                                 |
              +------------------+------------------+
              |                  |                  |
    +---------v---------+  +----v------+  +--------v--------+
    |  RUST ARCHIVER    |  |  KOTLIN   |  |  BUSYBOX SHELL  |
    |  (CLI, Linux/Mac) |  |  DECODER  |  |  DECODER        |
    |                   |  |  (Android)|  |  (fallback)     |
    +-------------------+  +-----------+  +-----------------+

Component Boundaries

Component | Responsibility | Communicates With | Language
----------|----------------|-------------------|---------
Format Spec | Defines binary layout, magic bytes strategy, block structure, obfuscation scheme | All three implementations reference this | Documentation
Rust Archiver CLI | Reads input files, compresses, encrypts, obfuscates, writes archive | Filesystem (input files, output archive) | Rust
Kotlin Decoder | Reads archive, de-obfuscates, decrypts, decompresses, writes output files | Android filesystem, embedded key | Kotlin
Shell Decoder | Same as Kotlin but via busybox commands | busybox (dd, xxd, openssl), filesystem | Shell (sh)
Test Harness | Round-trip validation: archive -> decode -> compare | All three components | Rust + shell scripts

Internal Component Structure (Rust Archiver)

The archiver itself has a clear pipeline architecture with five layers:

Input Files
    |
    v
+-------------------+
| FILE COLLECTOR    |  Walks paths, reads files, captures metadata
+-------------------+
    |
    v
+-------------------+
| COMPRESSOR        |  gzip (DEFLATE) per-file compression
+-------------------+
    |
    v
+-------------------+
| ENCRYPTOR         |  AES-256-CBC + HMAC-SHA256 per-file
+-------------------+
    |
    v
+-------------------+
| FORMAT BUILDER    |  Assembles binary structure: header, TOC, data blocks
+-------------------+
    |
    v
+-------------------+
| OBFUSCATOR        |  Shuffles blocks, inserts decoys, transforms magic bytes
+-------------------+
    |
    v
Output Archive File

Data Flow: Archival (Packing)

Step 1: File Collection

for each input_path:
    read file bytes
    record: filename, original_size, file_type_hint
    -> Vec<FileEntry { name, data, metadata }>
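
The collection step above could look roughly like this in Rust. This is a minimal sketch using only the standard library; the `FileEntry` field names and the `gather` signature are assumptions for illustration, and the `file_type_hint` field is left out because the source does not define it.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// One collected input file (field names are illustrative, not part of the spec).
#[derive(Debug, Clone, PartialEq)]
pub struct FileEntry {
    pub name: String,
    pub data: Vec<u8>,
    pub original_size: u64,
}

/// Reads every path fully into memory and records its base name and size.
/// Fails on the first unreadable path.
pub fn gather(paths: &[PathBuf]) -> io::Result<Vec<FileEntry>> {
    let mut entries = Vec::with_capacity(paths.len());
    for path in paths {
        let data = fs::read(path)?;
        let name = file_name_of(path)?;
        entries.push(FileEntry {
            name,
            original_size: data.len() as u64,
            data,
        });
    }
    Ok(entries)
}

/// Extracts the final path component as UTF-8 (assumption: names are UTF-8).
fn file_name_of(path: &Path) -> io::Result<String> {
    path.file_name()
        .and_then(|n| n.to_str())
        .map(str::to_owned)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no UTF-8 file name"))
}
```

Storing only the base name keeps absolute host paths out of the archive; whether directory structure should be preserved is a decision for the format spec.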

Step 2: Compression (per-file)

Each file is compressed independently. This is critical -- per-file compression means the shell decoder can decompress one file at a time without holding the entire archive in memory.

for each FileEntry:
    compressed_data = gzip_compress(data)
    record: compressed_size
    -> Vec<CompressedEntry { name, compressed_data, original_size, compressed_size }>

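Why compress before encrypt: Ciphertext is statistically indistinguishable from random data, so a compressor applied after encryption finds no redundancy to remove. Compress-then-encrypt is the only order that yields any size reduction. This is a fundamental constraint, not a design choice.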
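
One way to check this claim during testing is to measure byte entropy before and after each stage. The helper below is a sketch, not part of the format; the function name is an assumption. Plain text typically scores well below 8 bits per byte, while gzip output and AES ciphertext of reasonable length both sit close to 8.

```rust
/// Shannon entropy of `data` in bits per byte (0.0 = constant, 8.0 = uniform).
pub fn entropy_bits_per_byte(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // Histogram of byte values.
    let mut counts = [0u64; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let n = data.len() as f64;
    // Sum of -p * log2(p) over all byte values that occur.
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}
```

A test harness could assert that the entropy of each data block is near the maximum, which also catches the mistake of accidentally writing unencrypted data into the archive.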

Step 3: Encryption (per-file)

Each compressed file is encrypted independently with a unique IV.

for each CompressedEntry:
    iv = random_16_bytes()  // unique per file, AES block size
    ciphertext = aes_256_cbc_encrypt(key, iv, pkcs7_pad(compressed_data))
    hmac = hmac_sha256(key, iv || ciphertext)  // encrypt-then-MAC
    -> Vec<EncryptedEntry { name, iv, ciphertext, hmac, sizes... }>
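
The `pkcs7_pad` call above determines `encrypted_size`, which all three decoders must agree on. The following sketch shows the padding rule for a 16-byte block and the resulting size formula. It is illustrative only: a real implementation would normally leave padding to the cipher library, and the function names here are assumptions.

```rust
pub const AES_BLOCK: usize = 16;

/// Appends PKCS7 padding: always 1..=16 bytes, each equal to the pad length.
pub fn pkcs7_pad(data: &[u8]) -> Vec<u8> {
    let pad = AES_BLOCK - (data.len() % AES_BLOCK);
    let mut out = Vec::with_capacity(data.len() + pad);
    out.extend_from_slice(data);
    out.extend(std::iter::repeat(pad as u8).take(pad));
    out
}

/// Strips PKCS7 padding, returning None if the padding is malformed.
pub fn pkcs7_unpad(padded: &[u8]) -> Option<&[u8]> {
    let &last = padded.last()?;
    let pad = last as usize;
    if padded.len() % AES_BLOCK != 0 || pad == 0 || pad > AES_BLOCK {
        return None;
    }
    let (body, tail) = padded.split_at(padded.len() - pad);
    if tail.iter().all(|&b| b == last) {
        Some(body)
    } else {
        None
    }
}

/// Ciphertext length for a given compressed length (IV and HMAC not included).
pub fn encrypted_size(compressed_size: u64) -> u64 {
    (compressed_size / AES_BLOCK as u64 + 1) * AES_BLOCK as u64
}
```

Note that input whose length is already a multiple of 16 still gains a full block of padding, so `encrypted_size` is always strictly larger than `compressed_size`.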

Key decision: AES-256-GCM vs AES-256-CBC vs ChaCha20-Poly1305.

Use AES-256-CBC + HMAC-SHA256 because:

  • busybox openssl supports aes-256-cbc natively (GCM is NOT available in busybox openssl)
  • Android/Kotlin javax.crypto supports AES-256-CBC natively
  • Rust RustCrypto crates (aes, cbc, hmac) support it fully
  • Qualcomm SoC has AES hardware acceleration (ARMv8 Cryptography Extensions)
  • ChaCha20 would require custom implementation for shell fallback
  • GCM would require custom implementation for shell fallback

Encrypt-then-MAC pattern: HMAC is computed over (IV || ciphertext) to provide authenticated encryption. The decoder verifies HMAC before attempting decryption, preventing padding oracle attacks.
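
The verify-before-decrypt ordering can be sketched as below. The MAC and decryption functions are passed in as closures so the example stays free of external crates; in the real archiver they would be HMAC-SHA256 and AES-256-CBC. The names `open_entry` and `DecodeError` are assumptions for illustration.

```rust
/// Compares two byte slices without stopping at the first mismatch,
/// so timing does not reveal how many leading bytes matched.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y;
    }
    diff == 0
}

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    BadMac,
    Decrypt,
}

/// Checks the tag over (IV || ciphertext) and only then calls `decrypt`.
pub fn open_entry<M, D>(
    iv: &[u8; 16],
    ciphertext: &[u8],
    stored_tag: &[u8; 32],
    mac: M,
    decrypt: D,
) -> Result<Vec<u8>, DecodeError>
where
    M: Fn(&[u8]) -> [u8; 32],
    D: Fn(&[u8; 16], &[u8]) -> Option<Vec<u8>>,
{
    let mut msg = Vec::with_capacity(iv.len() + ciphertext.len());
    msg.extend_from_slice(iv);
    msg.extend_from_slice(ciphertext);
    if !constant_time_eq(&mac(&msg), stored_tag) {
        // Decryption is never attempted on unauthenticated data.
        return Err(DecodeError::BadMac);
    }
    decrypt(iv, ciphertext).ok_or(DecodeError::Decrypt)
}
```

Because a tampered ciphertext is rejected before any padding is inspected, the decoder gives an attacker no padding-validity signal to exploit.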

Step 4: Format Assembly

The format builder creates the binary layout:

+----------------------------------------------------------+
| OBFUSCATED HEADER (variable, see Step 5)                 |
+----------------------------------------------------------+
| FILE TABLE (encrypted)                                   |
|   - number_of_files: u32                                 |
|   - for each file:                                       |
|       filename_len: u16                                  |
|       filename: [u8; filename_len]                       |
|       original_size: u64                                 |
|       compressed_size: u64                               |
|       encrypted_size: u64                                |
|       data_offset: u64                                   |
|       iv: [u8; 16]                                       |
|       hmac: [u8; 32]                                     |
+----------------------------------------------------------+
| DATA BLOCKS                                              |
|   [encrypted_file_1_data]                                |
|   [encrypted_file_2_data]                                |
|   ...                                                    |
+----------------------------------------------------------+

The file table itself is encrypted with the same key but a dedicated IV. This prevents casual inspection of filenames and sizes.
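
A possible serializer for the plaintext file table, before it is encrypted, is sketched below. It follows the field order in the diagram and writes every integer little-endian. The struct and function names are assumptions, as is the choice of UTF-8 for filenames.

```rust
use std::convert::TryFrom;

/// One file table record, mirroring the layout diagram.
pub struct TocEntry {
    pub filename: String,
    pub original_size: u64,
    pub compressed_size: u64,
    pub encrypted_size: u64,
    pub data_offset: u64,
    pub iv: [u8; 16],
    pub hmac: [u8; 32],
}

/// Serializes the table as: u32 count, then per file
/// u16 name length, name bytes, four u64 fields, IV, HMAC.
pub fn serialize_file_table(entries: &[TocEntry]) -> Result<Vec<u8>, String> {
    let count = u32::try_from(entries.len()).map_err(|_| "too many files".to_string())?;
    let mut out = Vec::new();
    out.extend_from_slice(&count.to_le_bytes());
    for e in entries {
        let name = e.filename.as_bytes();
        let name_len = u16::try_from(name.len())
            .map_err(|_| format!("filename too long: {}", e.filename))?;
        out.extend_from_slice(&name_len.to_le_bytes());
        out.extend_from_slice(name);
        for v in [e.original_size, e.compressed_size, e.encrypted_size, e.data_offset] {
            out.extend_from_slice(&v.to_le_bytes());
        }
        out.extend_from_slice(&e.iv);
        out.extend_from_slice(&e.hmac);
    }
    Ok(out)
}
```

Each record occupies 2 + filename_len + 32 + 16 + 32 bytes, so the table size is fully determined by the filenames. That makes it straightforward for the shell decoder to walk the table with fixed-size reads once it knows each name length.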

Step 5: Obfuscation

The obfuscation layer transforms the assembled binary to resist pattern analysis:

  1. No standard magic bytes -- use random-looking bytes that are actually a known XOR pattern the decoder recognizes
  2. Decoy padding -- insert random-length garbage blocks between real data blocks
  3. Header scatter -- split the file table into chunks interleaved with data blocks, with a small "index block" at a known-offset that tells where the chunks are
  4. Byte-level transforms -- simple XOR on the header region (not on encrypted data, which is already indistinguishable from random)

FINAL BINARY LAYOUT:

[fake_magic: 8 bytes]            <- XOR'd known pattern
[decoy_block: random 32-512 bytes]
[index_locator: 4 bytes at offset derived from fake_magic]
[data_block_1]
[file_table_chunk_1]
[decoy_block]
[data_block_2]
[file_table_chunk_2]
[data_block_3]
...
[index_block]                    <- lists offsets of file_table_chunks and data_blocks
[trailing_garbage: random 0-256 bytes]
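
The interleaving of decoy and data blocks shown in the layout can be sketched as a function that records where each real block lands, which is the information the index block has to carry. This is a simplified illustration: file table chunks and the header are omitted, the names are assumptions, and the junk source is injected so the example is deterministic (a real archiver would draw decoy bytes and lengths from a cryptographic RNG).

```rust
/// Location of one real block in the final archive.
#[derive(Debug, PartialEq)]
pub struct Placed {
    pub offset: u64,
    pub len: u64,
}

/// Emits `decoy_lens[i]` junk bytes before block `i` and returns the body
/// together with the absolute offset and length of every real block.
/// `start_offset` is where the body begins inside the archive.
pub fn interleave_with_decoys(
    blocks: &[Vec<u8>],
    decoy_lens: &[usize],
    mut next_junk: impl FnMut() -> u8,
    start_offset: u64,
) -> (Vec<u8>, Vec<Placed>) {
    let mut body = Vec::new();
    let mut index = Vec::with_capacity(blocks.len());
    for (i, block) in blocks.iter().enumerate() {
        // Missing decoy lengths are treated as zero.
        let decoy_len = decoy_lens.get(i).copied().unwrap_or(0);
        body.extend((0..decoy_len).map(|_| next_junk()));
        index.push(Placed {
            offset: start_offset + body.len() as u64,
            len: block.len() as u64,
        });
        body.extend_from_slice(block);
    }
    (body, index)
}
```

Because offsets are recorded after the decoys are written, decoders never need to know decoy lengths; they only follow the index.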

Important: The obfuscation MUST be simple enough to implement in a shell script with dd and xxd. Anything requiring bit manipulation beyond XOR is too complex. Keep it to:

  • Fixed XOR key for header regions (hardcoded in all three decoders)
  • Fixed offset calculations (e.g., "index block starts at byte offset stored in bytes 8-11 of file")
  • Sequential reads with dd bs=1 skip=N count=M
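
The two primitives allowed above, a fixed XOR key and a fixed-offset little-endian read, are small enough to sketch directly. The key value mirrors the `XOR_KEY_HEX="deadbeef"` placeholder used in the shell path; the expected magic value is passed in because the source does not define it, and reading the index locator un-XORed from bytes 8-11 is an assumption taken from the example in the list.

```rust
use std::convert::TryInto;

/// Placeholder XOR key, matching the shell sketch's XOR_KEY_HEX.
pub const XOR_KEY: [u8; 4] = [0xde, 0xad, 0xbe, 0xef];

/// XORs `region` with a repeating key. Applying it twice restores the input.
pub fn xor_in_place(region: &mut [u8], key: &[u8]) {
    if key.is_empty() {
        return;
    }
    for (i, b) in region.iter_mut().enumerate() {
        *b ^= key[i % key.len()];
    }
}

/// True if the first 8 bytes, once XORed, equal the expected plain magic.
pub fn magic_matches(archive: &[u8], expected_plain: &[u8; 8]) -> bool {
    let mut magic = [0u8; 8];
    match archive.get(..8) {
        Some(stored) => magic.copy_from_slice(stored),
        None => return false,
    }
    xor_in_place(&mut magic, &XOR_KEY);
    &magic == expected_plain
}

/// Reads the little-endian u32 stored in bytes 8-11, if present.
pub fn index_offset(archive: &[u8]) -> Option<u32> {
    let raw: [u8; 4] = archive.get(8..12)?.try_into().ok()?;
    Some(u32::from_le_bytes(raw))
}
```

Both operations map one-to-one onto shell constructs: the XOR onto per-byte `$(( a ^ b ))` arithmetic and the offset read onto a 4-byte `dd` followed by a hex byte swap.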

Data Flow: Extraction (Unpacking)

Kotlin Path (Primary)

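import java.io.ByteArrayInputStream
import java.io.File
import java.util.zip.GZIPInputStream

// The helper functions (deobfuscateHeader, parseIndex, ...) and the KEY and
// FILE_TABLE_IV constants are placeholders to be defined per the format spec.

// 1. Read archive bytes
val archive = File(path).readBytes()

// 2. De-obfuscate: recover index block location
val indexOffset = deobfuscateHeader(archive)

// 3. Read index block -> get file table chunk offsets
val index = parseIndex(archive, indexOffset)

// 4. Reassemble and decrypt file table (it uses its own dedicated IV)
val fileTable = decryptFileTable(index.fileTableChunks, KEY, FILE_TABLE_IV)

// 5. For each file entry in table:
for (entry in fileTable.entries) {
    val ciphertext = readDataBlock(archive, entry.dataOffset, entry.encryptedSize)
    // Verify HMAC over (IV || ciphertext) first; must throw on mismatch
    verifyHmac(ciphertext, entry.iv, entry.hmac, KEY)
    val compressed = decryptAesCbc(ciphertext, KEY, entry.iv)
    val original = GZIPInputStream(ByteArrayInputStream(compressed)).use { it.readBytes() }
    writeFile(outputDir, entry.filename, original)
}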

Kotlin decompression: uses gzip via java.util.zip.GZIPInputStream, which is built into the Android SDK. No native libraries needed.

Shell Path (Fallback)

#!/bin/sh
# Hardcoded values
KEY_HEX="abcdef0123456789..."   # 64 hex chars = 32 bytes
XOR_KEY_HEX="deadbeef"

ARCHIVE="$1"
OUTDIR="$2"

# 1. De-obfuscate header: read first 8 bytes, XOR to get real magic
MAGIC=$(dd if="$ARCHIVE" bs=1 count=8 2>/dev/null | xxd -p)
# ... validate XOR pattern ...

# 2. Find index block offset (bytes 8-11, little-endian)
INDEX_OFF_HEX=$(dd if="$ARCHIVE" bs=1 skip=8 count=4 2>/dev/null | xxd -p)
# Convert LE hex to decimal
INDEX_OFF=$(printf "%d" "0x$(echo $INDEX_OFF_HEX | \
    sed 's/\(..\)\(..\)\(..\)\(..\)/\4\3\2\1/')")

# 3. Read index block, parse file table chunk offsets
# ... dd + xxd to extract offsets ...

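# 4. For each file: verify HMAC over (IV || ciphertext) BEFORE decrypting.
#    IV_HEX, DATA_OFFSET, ENC_SIZE, FILENAME and STORED_HMAC_HEX come from the
#    decrypted file table (STORED_HMAC_HEX is an illustrative name).
#    -macopt hexkey: passes the key as raw bytes; plain -hmac would treat
#    "$KEY_HEX" as an ASCII string. Confirm xxd -r and -macopt on the target.
COMPUTED_HMAC=$( { printf '%s' "$IV_HEX" | xxd -r -p; \
    dd if="$ARCHIVE" bs=1 skip="$DATA_OFFSET" count="$ENC_SIZE" 2>/dev/null; } | \
    openssl dgst -sha256 -mac HMAC -macopt "hexkey:$KEY_HEX" -hex | awk '{print $NF}')
if [ "$COMPUTED_HMAC" != "$STORED_HMAC_HEX" ]; then
    echo "HMAC mismatch for $FILENAME" >&2
    exit 1
fi

# 5. Decrypt and decompress only after the HMAC check has passed
dd if="$ARCHIVE" bs=1 skip="$DATA_OFFSET" count="$ENC_SIZE" 2>/dev/null | \
    openssl aes-256-cbc -d -K "$KEY_HEX" -iv "$IV_HEX" -nosalt | \
    gunzip > "$OUTDIR/$FILENAME"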

Shell limitations that constrain the entire format design:

  • dd reads are byte-precise but slow for large files with bs=1
  • xxd handles hex conversion but no binary arithmetic
  • openssl in busybox supports limited ciphers (aes-256-cbc YES, GCM/CCM NO)
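  • HMAC verification via openssl dgst -sha256. Note that -hmac takes the key as a literal string, so a raw 32-byte key needs -mac HMAC -macopt hexkey:<hex>; availability of either option must be confirmed on the target build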
  • Integer arithmetic limited to shell $(( )) -- handles 64-bit on most platforms
  • Endianness: all multi-byte integers in format MUST be little-endian (ARM native, simpler shell parsing)

Patterns to Follow

Pattern 1: Pipeline Architecture (Archiver)

What: Each transformation (collect, compress, encrypt, format, obfuscate) is a separate module with a clear input/output type. No module knows about the others.

When: Always. This is the core design pattern.

Why: Testability (test each stage in isolation), flexibility (swap compression algorithm without touching encryption), clarity (each module has one job).

// Each stage is a function or module with typed input/output
mod collect;     // Vec<PathBuf> -> Vec<FileEntry>
mod compress;    // Vec<FileEntry> -> Vec<CompressedEntry>
mod encrypt;     // Vec<CompressedEntry> -> Vec<EncryptedEntry>
mod format;      // Vec<EncryptedEntry> -> RawArchive (unobfuscated bytes)
mod obfuscate;   // RawArchive -> Vec<u8> (final obfuscated bytes)

// Main pipeline
pub fn create_archive(paths: Vec<PathBuf>, key: &[u8; 32]) -> Result<Vec<u8>> {
    let files = collect::gather(paths)?;
    let compressed = compress::compress_all(files)?;
    let encrypted = encrypt::encrypt_all(compressed, key)?;
    let raw = format::build(encrypted)?;
    let obfuscated = obfuscate::apply(raw)?;
    Ok(obfuscated)
}

Pattern 2: Format Version Field

What: Include a format version byte in the archive header (post-deobfuscation). Start at version 1.

When: Always. Format will evolve.

Why: Forward compatibility. Decoders can check the version and refuse to decode unknown versions with a clear error, rather than silently producing corrupt output.
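
A version check of this kind might look as follows. The position of the version byte inside the de-obfuscated header is not defined in the source, so it is passed in as a parameter; the type and function names are likewise assumptions.

```rust
use std::fmt;

/// The only format version this sketch of a decoder understands.
pub const SUPPORTED_VERSION: u8 = 1;

#[derive(Debug, PartialEq)]
pub enum VersionError {
    /// The header is too short to contain a version byte.
    Truncated,
    /// The archive was written in a format this decoder does not know.
    Unsupported { found: u8 },
}

impl fmt::Display for VersionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            VersionError::Truncated => write!(f, "archive header is truncated"),
            VersionError::Unsupported { found } => write!(
                f,
                "unsupported archive format version {} (this decoder supports {})",
                found, SUPPORTED_VERSION
            ),
        }
    }
}

/// Refuses anything other than SUPPORTED_VERSION with a descriptive error.
pub fn check_version(header: &[u8], version_offset: usize) -> Result<(), VersionError> {
    match header.get(version_offset) {
        None => Err(VersionError::Truncated),
        Some(&SUPPORTED_VERSION) => Ok(()),
        Some(&found) => Err(VersionError::Unsupported { found }),
    }
}
```

Running this check before parsing the index block means a future version-2 archive fails fast with a readable message instead of being misparsed.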

Pattern 3: Per-File Independence

What: Each file in the archive is compressed and encrypted independently with its own IV and HMAC.

When: Always.

Why:

  • Shell decoder can extract a single file without processing the entire archive
  • A corruption in one file does not cascade to others
  • Memory usage is bounded by the largest single file, not the archive total
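
Fault isolation follows naturally when the decoder treats each entry as its own unit of work. The sketch below, with assumed names, runs a per-file extraction closure over all entries and keeps going after a failure, collecting a report instead of aborting.

```rust
/// Outcome of extracting every entry in an archive.
#[derive(Debug, Default, PartialEq)]
pub struct ExtractReport {
    pub extracted: Vec<String>,
    /// (filename, reason) for each entry that could not be extracted.
    pub failed: Vec<(String, String)>,
}

/// Calls `extract_one` for each name; a failure affects only that entry.
pub fn extract_each<F>(names: &[&str], mut extract_one: F) -> ExtractReport
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut report = ExtractReport::default();
    for &name in names {
        match extract_one(name) {
            Ok(()) => report.extracted.push(name.to_string()),
            Err(reason) => report.failed.push((name.to_string(), reason)),
        }
    }
    report
}
```

Whether a decoder should continue after an HMAC failure or stop immediately is a policy choice; per-file independence is what makes continuing possible.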

Pattern 4: Shared Format Specification as Source of Truth

What: A single document defines every byte of the format. All three implementations are derived from this spec.

When: Before writing any code.

Why: With three independent implementations (Rust, Kotlin, shell), byte-level compatibility is critical. Off-by-one errors in offset calculations will produce silent data corruption.
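
A bounds-checked parser is one way to turn off-by-one errors into loud failures rather than silent corruption. The sketch below parses the decrypted file table using the field order from the Step 4 diagram and little-endian integers. Names are assumptions, and filenames are assumed to be UTF-8.

```rust
use std::convert::TryInto;

#[derive(Debug, PartialEq)]
pub struct TocEntry {
    pub filename: String,
    pub original_size: u64,
    pub compressed_size: u64,
    pub encrypted_size: u64,
    pub data_offset: u64,
    pub iv: [u8; 16],
    pub hmac: [u8; 32],
}

/// Cursor over a byte slice; every read is bounds-checked.
struct Reader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        let end = self.pos.checked_add(n)?;
        let slice = self.buf.get(self.pos..end)?;
        self.pos = end;
        Some(slice)
    }
    fn u16(&mut self) -> Option<u16> {
        let raw: [u8; 2] = self.take(2)?.try_into().ok()?;
        Some(u16::from_le_bytes(raw))
    }
    fn u32(&mut self) -> Option<u32> {
        let raw: [u8; 4] = self.take(4)?.try_into().ok()?;
        Some(u32::from_le_bytes(raw))
    }
    fn u64(&mut self) -> Option<u64> {
        let raw: [u8; 8] = self.take(8)?.try_into().ok()?;
        Some(u64::from_le_bytes(raw))
    }
}

/// Parses the plaintext file table; returns None on any truncation
/// or invalid filename instead of guessing.
pub fn parse_file_table(plain: &[u8]) -> Option<Vec<TocEntry>> {
    let mut r = Reader { buf: plain, pos: 0 };
    let count = r.u32()?;
    // Do not pre-allocate from `count`: it is untrusted input.
    let mut entries = Vec::new();
    for _ in 0..count {
        let name_len = r.u16()? as usize;
        let filename = String::from_utf8(r.take(name_len)?.to_vec()).ok()?;
        let original_size = r.u64()?;
        let compressed_size = r.u64()?;
        let encrypted_size = r.u64()?;
        let data_offset = r.u64()?;
        let iv: [u8; 16] = r.take(16)?.try_into().ok()?;
        let hmac: [u8; 32] = r.take(32)?.try_into().ok()?;
        entries.push(TocEntry {
            filename, original_size, compressed_size,
            encrypted_size, data_offset, iv, hmac,
        });
    }
    Some(entries)
}
```

Feeding the same hex test vectors to this parser, the Kotlin decoder, and the shell decoder is a cheap way to confirm that all three read the spec identically.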

Pattern 5: Encrypt-then-MAC

What: Apply HMAC after encryption, computed over (IV || ciphertext).

When: Always. Non-negotiable for CBC mode.

Why: CBC without authentication is vulnerable to padding oracle attacks. Encrypt-then-MAC is the proven pattern. Verify HMAC before decryption on all platforms.

Anti-Patterns to Avoid

Anti-Pattern | Why Bad | Instead
-------------|---------|--------
Streaming/Chunked Encryption | Shell can't seek into stream cipher | Encrypt each file independently
Complex Obfuscation | Can't implement in busybox shell | XOR + fixed offsets + decoy padding
Obfuscation as Security | Trivially reversible from source code | Encryption = security, obfuscation = anti-detection
GCM Mode | busybox openssl doesn't support it | AES-256-CBC + HMAC-SHA256
zstd/lz4 Compression | No busybox/Android SDK support | gzip (DEFLATE)
MAC-then-Encrypt | Padding oracle attacks possible | Encrypt-then-MAC

Suggested Build Order

Phase 1: FORMAT SPEC + SHELL FEASIBILITY PROOF
    |
    v
Phase 2: RUST ARCHIVER (core pipeline)
    |
    v
Phase 3: RUST ROUND-TRIP TEST DECODER
    |
    v
Phase 4: KOTLIN DECODER
    |
    v
Phase 5: SHELL DECODER
    |
    v
Phase 6: OBFUSCATION HARDENING + INTEGRATION TESTING

Why this order:

  1. Format spec first -- shared contract, constrained by busybox. Validate shell feasibility before investing in Rust/Kotlin code.
  2. Rust archiver before decoders -- need archives to test decoders against.
  3. Rust test decoder before Kotlin/shell -- catches format bugs in same language, avoids cross-language debugging.
  4. Kotlin before shell -- primary path first; if Kotlin works, format is validated.
  5. Obfuscation hardening last -- core pipeline must work first. Obfuscation is a layer on top.

Key Architectural Decisions Summary

Decision | Choice | Rationale
---------|--------|----------
Compression | gzip (DEFLATE) via flate2 | Native on all three platforms
Encryption | AES-256-CBC | busybox openssl supports CBC; GCM not available
Authentication | HMAC-SHA256 (encrypt-then-MAC) | Authenticated encryption for CBC; verifiable everywhere
Byte order | Little-endian | ARM native order; simpler shell parsing
File processing | Per-file independent | Shell needs random access; bounded memory; fault isolation
Obfuscation | XOR headers + scattered blocks + decoy padding | Simple enough for shell; defeats binwalk/file
Format contract | Standalone spec document written first | Three implementations need byte-exact agreement
Key storage | Hardcoded 32-byte key in all decoders | Per requirements; sufficient for casual user threat model
PKCS7 padding | Standard PKCS7 for CBC mode | openssl uses PKCS7 by default; Kotlin supports natively

Sources

  • Architecture patterns from encrypted archive design (ZIP encryption, age, tar+gpg)
  • busybox openssl capabilities: aes-256-cbc supported, GCM/CCM not supported
  • Android SDK javax.crypto and java.util.zip documentation
  • Rust RustCrypto ecosystem: flate2, aes, cbc, hmac, sha2
  • Encrypt-then-MAC: Hugo Krawczyk (2001), industry standard

Verification needed: Run busybox openssl enc -ciphers on target device to confirm aes-256-cbc availability.