NikitolProject/android-encrypted-archiver

NikitolProject 40dcfd4ac0 docs: add project research

2026-02-24 22:51:05 +03:00

Architecture Patterns

Domain: Custom encrypted archiver with obfuscated binary format
Researched: 2026-02-24

Recommended Architecture

The system decomposes into three independent deliverables (archiver, Kotlin decompressor, shell decompressor) that share a single specification: the binary format. The format is the contract. Everything else is implementation detail.

High-Level Overview

                        +-----------------+
                        |  FORMAT SPEC    |
                        |  (shared doc)   |
                        +--------+--------+
                                 |
              +------------------+------------------+
              |                  |                  |
    +---------v---------+  +----v------+  +--------v--------+
    |  RUST ARCHIVER    |  |  KOTLIN   |  |  BUSYBOX SHELL  |
    |  (CLI, Linux/Mac) |  |  DECODER  |  |  DECODER        |
    |                   |  |  (Android)|  |  (fallback)     |
    +-------------------+  +-----------+  +-----------------+

Component Boundaries

Component	Responsibility	Communicates With	Language
Format Spec	Defines binary layout, magic bytes strategy, block structure, obfuscation scheme	All three implementations reference this	Documentation
Rust Archiver CLI	Reads input files, compresses, encrypts, obfuscates, writes archive	Filesystem (input files, output archive)	Rust
Kotlin Decoder	Reads archive, de-obfuscates, decrypts, decompresses, writes output files	Android filesystem, embedded key	Kotlin
Shell Decoder	Same as Kotlin but via busybox commands	busybox (dd, xxd, openssl), filesystem	Shell (sh)
Test Harness	Round-trip validation: archive -> decode -> compare	All three components	Rust + shell scripts

Internal Component Structure (Rust Archiver)

The archiver itself has a clear pipeline architecture with five layers:

Input Files
    |
    v
+-------------------+
| FILE COLLECTOR    |  Walks paths, reads files, captures metadata
+-------------------+
    |
    v
+-------------------+
| COMPRESSOR        |  gzip (DEFLATE) per-file compression
+-------------------+
    |
    v
+-------------------+
| ENCRYPTOR         |  AES-256-CBC + HMAC-SHA256 per-file
+-------------------+
    |
    v
+-------------------+
| FORMAT BUILDER    |  Assembles binary structure: header, TOC, data blocks
+-------------------+
    |
    v
+-------------------+
| OBFUSCATOR        |  Shuffles blocks, inserts decoys, transforms magic bytes
+-------------------+
    |
    v
Output Archive File

Data Flow: Archival (Packing)

Step 1: File Collection

for each input_path:
    read file bytes
    record: filename, original_size, file_type_hint
    -> Vec<FileEntry { name, data, metadata }>
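
The collection step could be sketched in Rust roughly as follows, using only the standard library. The struct fields follow the pseudocode above; treating the file extension as file_type_hint and using only the final path component as the stored name are assumptions made for illustration, not decisions taken from the format spec.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// One collected input file. `file_type_hint` is simplified to the file
/// extension here (an assumption for this sketch).
#[derive(Debug)]
pub struct FileEntry {
    pub name: String,
    pub data: Vec<u8>,
    pub original_size: u64,
    pub file_type_hint: Option<String>,
}

/// Reads every path fully into memory and records its metadata.
/// Fails on the first path that cannot be read.
pub fn gather(paths: &[PathBuf]) -> io::Result<Vec<FileEntry>> {
    paths.iter().map(|p| read_entry(p)).collect()
}

fn read_entry(path: &Path) -> io::Result<FileEntry> {
    let data = fs::read(path)?;
    let name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?
        .to_string_lossy()
        .into_owned();
    let file_type_hint = path
        .extension()
        .map(|ext| ext.to_string_lossy().into_owned());
    Ok(FileEntry {
        name,
        original_size: data.len() as u64,
        data,
        file_type_hint,
    })
}
```

Returning io::Result keeps a missing or unreadable input from producing a partial archive silently.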

Step 2: Compression (per-file)

Each file is compressed independently. This is critical -- per-file compression means the shell decoder can decompress one file at a time without holding the entire archive in memory.

for each FileEntry:
    compressed_data = gzip_compress(data)
    record: compressed_size
    -> Vec<CompressedEntry { name, compressed_data, original_size, compressed_size }>

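Why compress before encrypt: Properly encrypted data is indistinguishable from random bytes, so it has no redundancy left for a compressor to remove. Compress-then-encrypt is therefore the only order in which compression has any effect. This is a fundamental constraint, not a design choice.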

Step 3: Encryption (per-file)

Each compressed file is encrypted independently with a unique IV.

for each CompressedEntry:
    iv = random_16_bytes()  // unique per file, AES block size
    ciphertext = aes_256_cbc_encrypt(key, iv, pkcs7_pad(compressed_data))
    hmac = hmac_sha256(key, iv || ciphertext)  // encrypt-then-MAC
    -> Vec<EncryptedEntry { name, iv, ciphertext, hmac, sizes... }>
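
The pkcs7_pad call in the pseudocode above determines how encrypted_size relates to compressed_size, which matters for offset calculations in all three decoders. A minimal std-only sketch of the padding rule follows; the function names are illustrative, and a real archiver would more likely rely on the padding support of its cipher crate than on hand-written code.

```rust
/// AES block size in bytes.
const BLOCK: usize = 16;

/// Appends PKCS#7 padding: N bytes of value N, with 1 <= N <= 16.
/// Input that is already block-aligned gains one full block of padding.
pub fn pkcs7_pad(data: &[u8]) -> Vec<u8> {
    let pad_len = BLOCK - (data.len() % BLOCK);
    let mut out = Vec::with_capacity(data.len() + pad_len);
    out.extend_from_slice(data);
    out.extend(std::iter::repeat(pad_len as u8).take(pad_len));
    out
}

/// Removes PKCS#7 padding, returning None if the padding is malformed.
pub fn pkcs7_unpad(data: &[u8]) -> Option<&[u8]> {
    if data.is_empty() || data.len() % BLOCK != 0 {
        return None;
    }
    let pad_len = *data.last()? as usize;
    if pad_len == 0 || pad_len > BLOCK {
        return None;
    }
    let (body, pad) = data.split_at(data.len() - pad_len);
    if pad.iter().all(|&b| b as usize == pad_len) {
        Some(body)
    } else {
        None
    }
}

/// Ciphertext length for a given compressed length under PKCS#7 + CBC.
pub fn encrypted_size(compressed_size: u64) -> u64 {
    (compressed_size / BLOCK as u64 + 1) * BLOCK as u64
}
```

Under this rule a 16-byte input encrypts to 32 bytes, so encrypted_size is always strictly larger than compressed_size.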

Key decision: AES-256-GCM vs AES-256-CBC vs ChaCha20-Poly1305.

Use AES-256-CBC + HMAC-SHA256 because:

busybox openssl supports aes-256-cbc natively (GCM is NOT available in busybox openssl)
Android/Kotlin javax.crypto supports AES-256-CBC natively
Rust RustCrypto crates (aes, cbc, hmac) support it fully
Qualcomm SoC has AES hardware acceleration (ARMv8 Cryptography Extensions)
ChaCha20 would require custom implementation for shell fallback
GCM would require custom implementation for shell fallback

Encrypt-then-MAC pattern: HMAC is computed over (IV || ciphertext) to provide authenticated encryption. The decoder verifies HMAC before attempting decryption, preventing padding oracle attacks.
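
The verify-before-decrypt ordering can be illustrated with a small std-only sketch. It does not compute the HMAC itself (that would need a crypto crate); it only shows the MAC input layout, a comparison that does not stop at the first differing byte, and a gate that refuses to run decryption on a mismatch. All names are hypothetical, and production code should prefer a vetted constant-time comparison from a crypto library, since a compiler is free to optimise a hand-written loop.

```rust
/// Builds the byte string the HMAC is computed over: IV || ciphertext.
pub fn mac_input(iv: &[u8; 16], ciphertext: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(iv.len() + ciphertext.len());
    buf.extend_from_slice(iv);
    buf.extend_from_slice(ciphertext);
    buf
}

/// Compares two tags by accumulating differences over every byte
/// instead of returning at the first mismatch.
pub fn tags_equal(expected: &[u8], computed: &[u8]) -> bool {
    if expected.len() != computed.len() {
        return false;
    }
    let mut diff = 0u8;
    for (a, b) in expected.iter().zip(computed) {
        diff |= a ^ b;
    }
    diff == 0
}

/// Runs `decrypt` only if the tags match; otherwise the ciphertext
/// is never handed to the cipher at all.
pub fn verify_then<T>(
    expected: &[u8],
    computed: &[u8],
    decrypt: impl FnOnce() -> T,
) -> Result<T, &'static str> {
    if tags_equal(expected, computed) {
        Ok(decrypt())
    } else {
        Err("HMAC mismatch")
    }
}
```

Passing decryption as a closure makes the ordering structural: there is no code path that decrypts without a successful tag check.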

Step 4: Format Assembly

The format builder creates the binary layout:

+----------------------------------------------------------+
| OBFUSCATED HEADER (variable, see Step 5)                 |
+----------------------------------------------------------+
| FILE TABLE (encrypted)                                   |
|   - number_of_files: u32                                 |
|   - for each file:                                       |
|       filename_len: u16                                  |
|       filename: [u8; filename_len]                       |
|       original_size: u64                                 |
|       compressed_size: u64                               |
|       encrypted_size: u64                                |
|       data_offset: u64                                   |
|       iv: [u8; 16]                                       |
|       hmac: [u8; 32]                                     |
+----------------------------------------------------------+
| DATA BLOCKS                                              |
|   [encrypted_file_1_data]                                |
|   [encrypted_file_2_data]                                |
|   ...                                                    |
+----------------------------------------------------------+

The file table itself is encrypted with the same key but a dedicated IV. This prevents casual inspection of filenames and sizes.
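
As a sketch of how the plaintext file table above might be serialized before it is encrypted, the following std-only Rust writes each field in the listed order using little-endian integers. The struct and function names are illustrative, and the sketch assumes filenames shorter than 65536 bytes so that filename_len fits in a u16.

```rust
/// One file-table record, mirroring the field list in the layout above.
pub struct TableEntry {
    pub filename: String,
    pub original_size: u64,
    pub compressed_size: u64,
    pub encrypted_size: u64,
    pub data_offset: u64,
    pub iv: [u8; 16],
    pub hmac: [u8; 32],
}

/// Serializes the file table: a u32 count followed by each record.
/// All multi-byte integers are little-endian.
pub fn encode_table(entries: &[TableEntry]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(entries.len() as u32).to_le_bytes());
    for e in entries {
        let name = e.filename.as_bytes();
        // Assumption: the caller has already rejected over-long names.
        debug_assert!(name.len() <= u16::MAX as usize);
        out.extend_from_slice(&(name.len() as u16).to_le_bytes());
        out.extend_from_slice(name);
        for v in [e.original_size, e.compressed_size, e.encrypted_size, e.data_offset] {
            out.extend_from_slice(&v.to_le_bytes());
        }
        out.extend_from_slice(&e.iv);
        out.extend_from_slice(&e.hmac);
    }
    out
}
```

With this layout each record occupies 82 bytes plus the filename length, which gives decoders a simple rule for walking the table.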

Step 5: Obfuscation

The obfuscation layer transforms the assembled binary to resist pattern analysis:

No standard magic bytes -- use random-looking bytes that are actually a known XOR pattern the decoder recognizes
Decoy padding -- insert random-length garbage blocks between real data blocks
Header scatter -- split the file table into chunks interleaved with data blocks, with a small "index block" at a known offset that tells where the chunks are
Byte-level transforms -- simple XOR on the header region (not on encrypted data, which is already indistinguishable from random)

FINAL BINARY LAYOUT:

[fake_magic: 8 bytes]            <- XOR'd known pattern
[decoy_block: random 32-512 bytes]
[index_locator: 4 bytes at offset derived from fake_magic]
[data_block_1]
[file_table_chunk_1]
[decoy_block]
[data_block_2]
[file_table_chunk_2]
[data_block_3]
...
[index_block]                    <- lists offsets of file_table_chunks and data_blocks
[trailing_garbage: random 0-256 bytes]

Important: The obfuscation MUST be simple enough to implement in a shell script with dd and xxd. Anything requiring bit manipulation beyond XOR is too complex. Keep it to:

Fixed XOR key for header regions (hardcoded in all three decoders)
Fixed offset calculations (e.g., "index block starts at byte offset stored in bytes 8-11 of file")
Sequential reads with dd bs=1 skip=N count=M
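
The fixed-XOR transform for header regions could look like the sketch below. The XOR key reuses the deadbeef value from the shell example; REAL_MAGIC is a made-up placeholder, since the actual magic pattern is left to the format spec.

```rust
/// Fixed XOR key shared by all decoders (value taken from the shell example).
pub const XOR_KEY: [u8; 4] = [0xde, 0xad, 0xbe, 0xef];

/// Hypothetical pre-obfuscation magic; the real value belongs in the spec.
pub const REAL_MAGIC: [u8; 8] = *b"ARCHIVE1";

/// XORs `region` in place with a repeating key.
/// Applying it twice restores the original bytes.
pub fn xor_in_place(region: &mut [u8], key: &[u8]) {
    assert!(!key.is_empty(), "XOR key must not be empty");
    for (byte, k) in region.iter_mut().zip(key.iter().cycle()) {
        *byte ^= k;
    }
}

/// The 8 bytes the archiver would write at the start of the file.
pub fn fake_magic() -> [u8; 8] {
    let mut magic = REAL_MAGIC;
    xor_in_place(&mut magic, &XOR_KEY);
    magic
}

/// Decoder-side check: undo the XOR on the first 8 bytes and compare.
pub fn has_valid_magic(archive: &[u8]) -> bool {
    match archive.get(..8) {
        Some(head) => {
            let mut magic = [0u8; 8];
            magic.copy_from_slice(head);
            xor_in_place(&mut magic, &XOR_KEY);
            magic == REAL_MAGIC
        }
        None => false,
    }
}
```

Because XOR is its own inverse, the archiver and every decoder can share one routine, which keeps the shell implementation to a byte-by-byte loop.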

Data Flow: Extraction (Unpacking)

Kotlin Path (Primary)

// 1. Read archive bytes
val archive = File(path).readBytes()

// 2. De-obfuscate: recover index block location
val indexOffset = deobfuscateHeader(archive)

// 3. Read index block -> get file table chunk offsets
val index = parseIndex(archive, indexOffset)

// 4. Reassemble and decrypt file table
val fileTable = decryptFileTable(index.fileTableChunks, KEY, IV)

// 5. For each file entry in table:
for (entry in fileTable.entries) {
    val ciphertext = readDataBlock(archive, entry.dataOffset, entry.encryptedSize)
    // Verify HMAC over (IV || ciphertext) first; extraction must abort on mismatch
    verifyHmac(ciphertext, entry.iv, entry.hmac, KEY)
    val compressed = decryptAesCbc(ciphertext, KEY, entry.iv)
    // use {} closes the stream even if decompression fails
    val original = GZIPInputStream(ByteArrayInputStream(compressed)).use { it.readBytes() }
    writeFile(outputDir, entry.filename, original)
}

Kotlin compression: Using gzip (java.util.zip.GZIPInputStream) which is built into Android SDK. No native libraries needed.

Shell Path (Fallback)

#!/bin/sh
# Hardcoded values
KEY_HEX="abcdef0123456789..."   # 64 hex chars = 32 bytes
XOR_KEY_HEX="deadbeef"

ARCHIVE="$1"
OUTDIR="$2"

# 1. De-obfuscate header: read first 8 bytes, XOR to get real magic
MAGIC=$(dd if="$ARCHIVE" bs=1 count=8 2>/dev/null | xxd -p)
# ... validate XOR pattern ...

# 2. Find index block offset (bytes 8-11, little-endian)
INDEX_OFF_HEX=$(dd if="$ARCHIVE" bs=1 skip=8 count=4 2>/dev/null | xxd -p)
# Convert LE hex to decimal
INDEX_OFF=$(printf "%d" "0x$(echo $INDEX_OFF_HEX | \
    sed 's/\(..\)\(..\)\(..\)\(..\)/\4\3\2\1/')")

# 3. Read index block, parse file table chunk offsets
# ... dd + xxd to extract offsets ...

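# 4. For each file: verify HMAC over (IV || ciphertext) BEFORE decrypting.
#    EXPECTED_HMAC and IV_HEX are the hex values parsed from the file table entry.
#    -macopt hexkey: reads KEY_HEX as raw key bytes (-hmac would use the hex text itself as the key).
COMPUTED_HMAC=$( { printf '%s' "$IV_HEX" | xxd -r -p; \
    dd if="$ARCHIVE" bs=1 skip=$DATA_OFFSET count=$ENC_SIZE 2>/dev/null; } | \
    openssl dgst -sha256 -mac HMAC -macopt hexkey:"$KEY_HEX" | awk '{print $NF}')
if [ "$COMPUTED_HMAC" != "$EXPECTED_HMAC" ]; then
    echo "HMAC mismatch for $FILENAME" >&2
    exit 1
fi

# 5. Decrypt and decompress only after the HMAC check passes
dd if="$ARCHIVE" bs=1 skip=$DATA_OFFSET count=$ENC_SIZE 2>/dev/null | \
    openssl aes-256-cbc -d -K "$KEY_HEX" -iv "$IV_HEX" -nosalt | \
    gunzip > "$OUTDIR/$FILENAME"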

Shell limitations that constrain the entire format design:

dd reads are byte-precise but slow for large files with bs=1
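xxd converts between binary and hex but performs no arithmetic
openssl in busybox supports limited ciphers (aes-256-cbc YES, GCM/CCM NO)
HMAC verification via openssl dgst -sha256; note that -hmac takes the key as a literal string, whereas -mac HMAC -macopt hexkey:... takes raw key bytes given in hex, so all three decoders must agree on which bytes form the key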
Integer arithmetic limited to shell $(( )) -- handles 64-bit on most platforms
Endianness: all multi-byte integers in format MUST be little-endian (ARM native, simpler shell parsing)
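
For the Rust test decoder, the little-endian parsing that the shell script does with sed could be mirrored as in the sketch below. The location of the index offset (bytes 8-11) follows the example used in this document and is not a settled part of the format; the function names are illustrative.

```rust
/// Reads the little-endian u32 stored at bytes 8..12 of the archive.
/// Returns None if the archive is too short.
pub fn read_index_offset(archive: &[u8]) -> Option<u32> {
    let b = archive.get(8..12)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Mirrors the shell byte swap: takes the 8 hex characters as printed by
/// `xxd -p` (file order, least significant byte first) and returns the value.
pub fn le_hex_to_u32(hex: &str) -> Option<u32> {
    if hex.len() != 8 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // Parsing the text as-is yields the byte-reversed value, so swap back.
    u32::from_str_radix(hex, 16).ok().map(u32::swap_bytes)
}
```

Having both forms in the test harness makes it straightforward to check that the shell decoder's hex manipulation and the binary parsers agree on the same archive.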

Patterns to Follow

Pattern 1: Pipeline Architecture (Archiver)

What: Each transformation (collect, compress, encrypt, format, obfuscate) is a separate module with a clear input/output type. No module knows about the others.

When: Always. This is the core design pattern.

Why: Testability (test each stage in isolation), flexibility (swap compression algorithm without touching encryption), clarity (each module has one job).

// Each stage is a function or module with typed input/output
mod collect;     // Vec<PathBuf> -> Vec<FileEntry>
mod compress;    // Vec<FileEntry> -> Vec<CompressedEntry>
mod encrypt;     // Vec<CompressedEntry> -> Vec<EncryptedEntry>
mod format;      // Vec<EncryptedEntry> -> RawArchive (unobfuscated bytes)
mod obfuscate;   // RawArchive -> Vec<u8> (final obfuscated bytes)

// Main pipeline
pub fn create_archive(paths: Vec<PathBuf>, key: &[u8; 32]) -> Result<Vec<u8>> {
    let files = collect::gather(paths)?;
    let compressed = compress::compress_all(files)?;
    let encrypted = encrypt::encrypt_all(compressed, key)?;
    let raw = format::build(encrypted)?;
    let obfuscated = obfuscate::apply(raw)?;
    Ok(obfuscated)
}

Pattern 2: Format Version Field

What: Include a format version byte in the archive header (post-deobfuscation). Start at version 1.

When: Always. Format will evolve.

Why: Forward compatibility. Decoders can check the version and refuse to decode unknown versions with a clear error, rather than silently producing corrupt output.
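
A version check along these lines might look like the following sketch. The position of the version byte within the de-obfuscated header is not fixed in this document, so it is passed in as a parameter; the error type and names are hypothetical.

```rust
use std::fmt;

/// The only format version this decoder understands.
pub const SUPPORTED_VERSION: u8 = 1;

#[derive(Debug, PartialEq, Eq)]
pub enum FormatError {
    /// The header ends before the version byte.
    TruncatedHeader,
    /// The archive declares a version this decoder does not know.
    UnsupportedVersion(u8),
}

impl fmt::Display for FormatError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FormatError::TruncatedHeader => write!(f, "header too short to contain a version"),
            FormatError::UnsupportedVersion(v) => write!(
                f,
                "unsupported format version {} (this decoder supports {})",
                v, SUPPORTED_VERSION
            ),
        }
    }
}

/// Checks the version byte of an already de-obfuscated header.
/// `version_offset` is an assumption; the spec decides the real position.
pub fn check_version(header: &[u8], version_offset: usize) -> Result<(), FormatError> {
    match header.get(version_offset) {
        None => Err(FormatError::TruncatedHeader),
        Some(&v) if v == SUPPORTED_VERSION => Ok(()),
        Some(&v) => Err(FormatError::UnsupportedVersion(v)),
    }
}
```

Failing early with a specific error is what lets a decoder reject a newer archive cleanly instead of misreading its offsets.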

Pattern 3: Per-File Independence

What: Each file in the archive is compressed and encrypted independently with its own IV and HMAC.

When: Always.

Why:

Shell decoder can extract a single file without processing the entire archive
A corruption in one file does not cascade to others
Memory usage is bounded by the largest single file, not the archive total

Pattern 4: Shared Format Specification as Source of Truth

What: A single document defines every byte of the format. All three implementations are derived from this spec.

When: Before writing any code.

Why: With three independent implementations (Rust, Kotlin, shell), byte-level compatibility is critical. Off-by-one errors in offset calculations will produce silent data corruption.

Pattern 5: Encrypt-then-MAC

What: Apply HMAC after encryption, computed over (IV || ciphertext).

When: Always. Non-negotiable for CBC mode.

Why: CBC without authentication is vulnerable to padding oracle attacks. Encrypt-then-MAC is the proven pattern. Verify HMAC before decryption on all platforms.

Anti-Patterns to Avoid

Anti-Pattern	Why Bad	Instead
Streaming/Chunked Encryption	Shell tools cannot easily seek into or resume a chunked cipher stream	Encrypt each file independently
Complex Obfuscation	Can't implement in busybox shell	XOR + fixed offsets + decoy padding
Obfuscation as Security	Trivially reversible from source code	Encryption = security, obfuscation = anti-detection
GCM Mode	busybox openssl doesn't support it	AES-256-CBC + HMAC-SHA256
zstd/lz4 Compression	No busybox/Android SDK support	gzip (DEFLATE)
MAC-then-Encrypt	Padding oracle attacks possible	Encrypt-then-MAC

Suggested Build Order

Phase 1: FORMAT SPEC + SHELL FEASIBILITY PROOF
    |
    v
Phase 2: RUST ARCHIVER (core pipeline)
    |
    v
Phase 3: RUST ROUND-TRIP TEST DECODER
    |
    v
Phase 4: KOTLIN DECODER
    |
    v
Phase 5: SHELL DECODER
    |
    v
Phase 6: OBFUSCATION HARDENING + INTEGRATION TESTING

Why this order:

Format spec first -- shared contract, constrained by busybox. Validate shell feasibility before investing in Rust/Kotlin code.
Rust archiver before decoders -- need archives to test decoders against.
Rust test decoder before Kotlin/shell -- catches format bugs in same language, avoids cross-language debugging.
Kotlin before shell -- primary path first; if Kotlin works, format is validated.
Obfuscation hardening last -- core pipeline must work first. Obfuscation is a layer on top.

Key Architectural Decisions Summary

Decision	Choice	Rationale
Compression	gzip (DEFLATE), via flate2 in Rust	Available natively on all three platforms
Encryption	AES-256-CBC	busybox openssl supports CBC; GCM not available
Authentication	HMAC-SHA256 (encrypt-then-MAC)	Authenticated encryption for CBC; verifiable everywhere
Byte order	Little-endian	ARM native order; simpler shell parsing
File processing	Per-file independent	Shell needs random access; bounded memory; fault isolation
Obfuscation	XOR headers + scattered blocks + decoy padding	Simple enough for shell; defeats binwalk/file
Format contract	Standalone spec document written first	Three implementations need byte-exact agreement
Key storage	Hardcoded 32-byte key in all decoders	Per requirements; sufficient for casual user threat model
PKCS7 padding	Standard PKCS7 for CBC mode	openssl uses PKCS7 by default; Kotlin supports natively

Sources

Architecture patterns from encrypted archive design (ZIP encryption, age, tar+gpg)
busybox openssl capabilities: aes-256-cbc supported, GCM/CCM not supported
Android SDK javax.crypto and java.util.zip documentation
Rust RustCrypto ecosystem: flate2, aes, cbc, hmac, sha2
Encrypt-then-MAC: Hugo Krawczyk (2001), industry standard

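Verification needed: Run openssl enc -ciphers on the target device to confirm that an openssl binary is present and lists aes-256-cbc. Upstream BusyBox does not ship an openssl applet, so the binary may be provided separately by the firmware.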
