Phase 6: Obfuscation Hardening - Research
Researched: 2026-02-25 Domain: Binary format obfuscation (XOR headers, encrypted TOC, decoy padding) Confidence: HIGH
Summary
Phase 6 adds three obfuscation layers to the existing archive format: XOR-obfuscated headers, encrypted file table (TOC), and random decoy padding between data blocks. The specification for all three features is already fully defined in FORMAT.md Sections 9.1-9.3, including the XOR key, flag bits, and decode order. The implementation is straightforward because the format spec was designed from the start to support these features -- the header already has toc_iv (16 bytes), flag bits 1-3, and padding_after fields in every TOC entry.
The critical complexity is that all changes must be applied atomically across four codebases (Rust archiver, Rust unpacker, Kotlin decoder, Shell decoder) while maintaining byte-identical output. The Rust archiver is the only encoder; the three decoders must all handle the new obfuscation features. The shell decoder is the most constrained: it must decrypt the TOC using openssl enc with raw key mode, which requires extracting the encrypted TOC to a temp file first (matching the existing pattern for per-file ciphertext extraction).
Primary recommendation: Implement in two plans: (1) Rust archiver + Rust unpacker with all three obfuscation features + updated unit/integration tests, (2) Kotlin decoder + Shell decoder updates + cross-validation tests confirming byte-identical output across all three decoders.
<phase_requirements>
Phase Requirements
| ID | Description | Research Support |
|---|---|---|
| FMT-06 | XOR-obfuscation headers with fixed key | FORMAT.md Section 9.1 fully defines the 8-byte XOR key (0xA5 0x3C 0x96 0x0F 0xE1 0x7B 0x4D 0xC8), cyclic application across 40-byte header, and bootstrapping detection via magic byte check. Implementation is a simple byte-level XOR loop. |
| FMT-07 | Encrypted file table with separate IV | FORMAT.md Section 9.2 defines AES-256-CBC encryption of the serialized TOC using toc_iv from the header. The toc_size field stores encrypted size (including PKCS7 padding). Same key as file encryption. All three decoders already have AES-CBC decrypt capability. |
| FMT-08 | Decoy padding (random data between blocks) | FORMAT.md Section 9.3 defines padding_after (u16 LE) in each TOC entry. Random bytes inserted after each data block. Decoders skip padding_after bytes. Max padding per file: 65535 bytes. The data_offset field in TOC entries already points to the correct location, so decoders that use absolute offsets (all three) naturally handle this. |
</phase_requirements>
Standard Stack
Core
No new libraries are needed. All three obfuscation features use primitives already present in the codebase.
| Library/Tool | Version | Purpose | Already Present |
|---|---|---|---|
| aes + cbc | 0.8 / 0.1 | AES-256-CBC for TOC encryption | Yes (Cargo.toml) |
| rand | 0.9 | Random IV generation for TOC, random decoy padding bytes | Yes (Cargo.toml) |
| openssl enc | any | Shell decoder AES-CBC decryption (for TOC) | Yes (shell/decode.sh) |
| javax.crypto.Cipher | Android SDK | Kotlin decoder AES-CBC decryption (for TOC) | Yes (ArchiveDecoder.kt) |
Supporting
| Library/Tool | Version | Purpose | When to Use |
|---|---|---|---|
| hex-literal | 1.1 | XOR key constant in tests | Yes (dev-dependencies) |
| binwalk | system | Manual verification that obfuscated archives are undetectable | Testing only |
Alternatives Considered
No alternatives -- the spec is locked. XOR key, AES-CBC for TOC, and random padding are all specified in FORMAT.md Section 9.
Architecture Patterns
Current Codebase Architecture
src/
├── format.rs # Header/TOC structs, read/write serialization
├── crypto.rs # AES-CBC encrypt/decrypt, HMAC, SHA-256, IV generation
├── archive.rs # pack(), unpack(), inspect() orchestration
├── compression.rs # gzip compress/decompress
├── key.rs # 32-byte hardcoded key constant
├── cli.rs # clap CLI definition
├── lib.rs # pub mod re-exports
└── main.rs # entry point
kotlin/
└── ArchiveDecoder.kt # Single-file decoder (parse + decrypt + decompress)
shell/
└── decode.sh # Busybox-compatible POSIX shell decoder
Pattern 1: XOR Header Obfuscation
What: Apply cyclic 8-byte XOR to all 40 header bytes after construction (encoding) and before parsing (decoding).
Implementation in Rust archiver (format.rs or archive.rs):
/// Fixed 8-byte XOR obfuscation key (FORMAT.md Section 9.1).
const XOR_KEY: [u8; 8] = [0xA5, 0x3C, 0x96, 0x0F, 0xE1, 0x7B, 0x4D, 0xC8];
/// XOR-obfuscate or de-obfuscate a 40-byte header buffer in-place.
/// XOR is its own inverse, so the same function encodes and decodes.
fn xor_header(buf: &mut [u8; 40]) {
for (i, byte) in buf.iter_mut().enumerate() {
*byte ^= XOR_KEY[i % 8];
}
}
Decode bootstrapping (FORMAT.md Section 10, step 2):
- Read first 40 bytes raw.
- Check if bytes 0-3 match MAGIC (`0x00 0xEA 0x72 0x63`).
- If YES: header is plain, parse normally.
- If NO: apply XOR to all 40 bytes, re-check magic. If still wrong, reject.
In Kotlin:
val XOR_KEY = byteArrayOf(
0xA5.toByte(), 0x3C, 0x96.toByte(), 0x0F,
0xE1.toByte(), 0x7B, 0x4D, 0xC8.toByte()
)
fun xorHeader(buf: ByteArray) {
for (i in 0 until 40) {
buf[i] = (buf[i].toInt() xor XOR_KEY[i % 8].toInt()).toByte()
}
}
In shell:
# XOR key as hex pairs
XOR_KEY="a53c960fe17b4dc8"
# De-XOR 40 header bytes: read raw, XOR each byte, write back
# This requires per-byte hex manipulation in shell
Pattern 2: TOC Encryption
What: Serialize all TOC entries to a buffer, then encrypt the entire buffer with AES-256-CBC using a random toc_iv, and write the encrypted TOC. Store the encrypted size in toc_size.
Encoding (Rust archiver):
// 1. Serialize TOC entries to a Vec<u8>
let mut toc_buf = Vec::new();
for entry in &entries {
format::write_toc_entry(&mut toc_buf, entry)?;
}
// 2. Generate random toc_iv
let toc_iv = crypto::generate_iv();
// 3. Encrypt the serialized TOC
let encrypted_toc = crypto::encrypt_data(&toc_buf, &KEY, &toc_iv);
let toc_size = encrypted_toc.len() as u32; // encrypted size
// 4. Write header with toc_iv and encrypted toc_size
// 5. Write encrypted_toc bytes at toc_offset
Decoding (all decoders):
- Read `toc_offset`, `toc_size`, `toc_iv` from the (de-XORed) header.
- Check flags bit 1 (`toc_encrypted`).
- If set: read `toc_size` bytes at `toc_offset`, decrypt with AES-256-CBC using `toc_iv` and KEY, remove PKCS7 padding.
- Parse TOC entries from the decrypted buffer.
Shell decoder TOC decryption:
# Extract encrypted TOC to temp file
dd if="$ARCHIVE" bs=1 skip="$toc_offset" count="$toc_size" of="$TMPDIR/toc_enc.bin" 2>/dev/null
# Decrypt TOC
openssl enc -d -aes-256-cbc -nosalt \
-K "$KEY_HEX" -iv "$toc_iv_hex" \
-in "$TMPDIR/toc_enc.bin" -out "$TMPDIR/toc_dec.bin"
# Now parse TOC entries from the decrypted file
# (requires switching from reading TOC fields directly from $ARCHIVE
# to reading from $TMPDIR/toc_dec.bin with offset 0)
Pattern 3: Decoy Padding
What: After writing each file's ciphertext, write random bytes of random length (0-65535).
Encoding (Rust archiver):
use rand::Rng;
let mut rng = rand::rng();
// For each file, generate a random padding length
let padding_after: u16 = rng.random_range(64..=4096); // sensible range
// Write ciphertext, then write padding_after random bytes
let mut padding = vec![0u8; padding_after as usize];
rng.fill(&mut padding[..]);
out_file.write_all(&padding)?;
Decoding: All three decoders already use absolute data_offset from the TOC to seek to each file's data block, so they naturally skip over padding. The padding_after field in TOC entries is already parsed by all decoders (currently always 0). No decoder changes needed for the actual extraction -- the decoders just need to not break when padding_after > 0.
Pattern 4: Flag Bits Management
Current state: The archiver sets flags bit 0 (compression) when any file is compressed. Bits 1-3 are always 0.
Phase 6 changes: When obfuscation is active, set:
- Bit 1 (`0x02`): TOC encrypted
- Bit 2 (`0x04`): XOR header
- Bit 3 (`0x08`): Decoy padding
All three features should be enabled together (flags = 0x0F when compression + all obfuscation). The archiver should always enable all three obfuscation features. There is no user-facing toggle needed (FORMAT.md says "can be activated independently" but the v1 goal is full obfuscation).
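A sketch of the flag composition (the constant names are assumptions, not the actual format.rs identifiers):

```rust
/// Assumed flag constants matching FORMAT.md bit assignments.
const FLAG_COMPRESSED: u8 = 0x01; // bit 0, set per existing behavior
const FLAG_TOC_ENCRYPTED: u8 = 0x02; // bit 1
const FLAG_XOR_HEADER: u8 = 0x04; // bit 2
const FLAG_DECOY_PADDING: u8 = 0x08; // bit 3

/// Compose the header flags byte: compression is per-archive as today,
/// and all three obfuscation bits are set together.
fn flags(any_compressed: bool, obfuscate: bool) -> u8 {
    let mut f = 0u8;
    if any_compressed {
        f |= FLAG_COMPRESSED;
    }
    if obfuscate {
        f |= FLAG_TOC_ENCRYPTED | FLAG_XOR_HEADER | FLAG_DECOY_PADDING;
    }
    f
}

fn main() {
    assert_eq!(flags(true, true), 0x0F); // compression + full obfuscation
    assert_eq!(flags(true, false), 0x01); // legacy archives: bits 1-3 clear
    assert_eq!(flags(false, true), 0x0E);
    println!("ok");
}
```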
Recommended Modification Order
The correct order of operations for the encoder is:
1. Compute data offsets accounting for decoy padding
2. Serialize TOC entries (with padding_after values)
3. Encrypt serialized TOC → encrypted_toc
4. Build header (with toc_iv, encrypted toc_size, flags with bits 1-3 set)
5. Serialize header to 40-byte buffer
6. XOR the 40-byte header buffer
7. Write: XOR'd header || encrypted TOC || (data blocks with interleaved padding)
The correct order of operations for the decoder is (FORMAT.md Section 10):
1. Read 40 raw bytes
2. Check magic → if mismatch, XOR and re-check
3. Parse header fields (including toc_iv, flags)
4. If flags bit 1: decrypt TOC with toc_iv
5. Parse TOC entries from (decrypted) buffer
6. For each file: seek to data_offset, read encrypted_size, verify HMAC, decrypt, decompress, verify SHA-256
(padding_after is naturally skipped because next file uses its own data_offset)
Anti-Patterns to Avoid
- XOR after TOC encryption: The XOR must be applied last (to the header) during encoding, because the header contains the `toc_iv` needed for TOC decryption. If you XOR first and then modify the header, the XOR output is invalid.
- Using piped input for openssl TOC decryption in shell: The existing shell decoder already extracts ciphertext to a temp file before decryption to avoid pipe buffering issues. The same pattern MUST be used for TOC decryption.
- Modifying data_offset calculation without accounting for padding: When computing `data_offset` for each file, the offset must include all preceding files' `encrypted_size + padding_after` values. The current code only sums `encrypted_size`.
- Forgetting the TOC size change: When TOC encryption is on, `toc_size` in the header is the encrypted size (with PKCS7 padding), not the plaintext size. The data block start offset is `toc_offset + toc_size` (encrypted).
- Shell decoder: parsing TOC from archive file vs decrypted buffer: Currently, the shell decoder reads TOC fields directly from `$ARCHIVE` using absolute offsets. With TOC encryption, it must read from the decrypted TOC temp file with relative offsets (starting at 0). This is a significant refactor of the shell decoder's TOC parsing loop.
Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---|---|---|---|
| XOR obfuscation | Custom bit manipulation tricks | Simple `byte ^= key[i % 8]` loop | XOR is trivially simple; any "optimization" adds complexity without benefit |
| TOC encryption | Custom encryption scheme | Existing `crypto::encrypt_data` / `crypto::decrypt_data` | Same AES-256-CBC already used for file encryption |
| Random byte generation | Pseudo-random with manual seeding | `rand::Fill` (Rust), `/dev/urandom` (shell), `SecureRandom` (Kotlin) | CSPRNG is already in use for IV generation |
| PKCS7 padding for TOC | Manual padding logic | `cbc` crate handles PKCS7 automatically | The encrypt/decrypt functions already handle padding |
Key insight: Every cryptographic primitive needed is already in the codebase. Phase 6 is purely about wiring existing functions into the encode/decode pipeline in the correct order.
Common Pitfalls
Pitfall 1: Shell Decoder TOC Parsing Refactor
What goes wrong: The current shell decoder reads TOC fields directly from $ARCHIVE at absolute offsets (pos=$toc_offset, then read_le_u16 "$ARCHIVE" "$pos"). After TOC encryption, the TOC must be decrypted to a temp file first, and all TOC reads must come from that temp file with offsets starting at 0 instead of $toc_offset.
Why it happens: The entire TOC parsing loop in decode.sh (lines 139-244) uses $ARCHIVE as the file argument to read_hex, read_le_u16, read_le_u32, and dd. All of these calls need to be changed to read from the decrypted TOC file with a reset position counter.
How to avoid: Extract the TOC parsing into a section that operates on a "TOC file" variable. When TOC encryption is off, the TOC file is the archive itself (with pos starting at toc_offset). When TOC encryption is on, the TOC file is the decrypted temp file (with pos starting at 0).
Warning signs: Tests pass with TOC encryption off but fail with TOC encryption on; the shell decoder reads garbage field values.
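That abstraction can be sketched as below, using hypothetical `TOC_FILE`/`TOC_BASE` variables and a throwaway fake archive; in the real decoder, the encrypted branch would first `dd` the TOC out and decrypt it with openssl into the temp file.

```shell
# Sketch (hypothetical TOC_FILE/TOC_BASE variables): unify TOC reads so the
# parsing loop does not care whether the TOC was decrypted to a temp file.
TMPDIR=$(mktemp -d)
ARCHIVE="$TMPDIR/archive.bin"
# Fake archive: 10 filler bytes standing in for the header, then TOC bytes.
printf 'HHHHHHHHHHTOCDATA' > "$ARCHIVE"
toc_offset=10
toc_encrypted=0 # would come from header flags bit 1

if [ "$toc_encrypted" -eq 1 ]; then
    # Real decoder: dd the encrypted TOC out, openssl enc -d to a temp file.
    TOC_FILE="$TMPDIR/toc_dec.bin"
    TOC_BASE=0
else
    TOC_FILE="$ARCHIVE"
    TOC_BASE=$toc_offset
fi

# The parsing loop always reads from $TOC_FILE at $TOC_BASE-relative positions.
pos=$TOC_BASE
field=$(dd if="$TOC_FILE" bs=1 skip="$pos" count=7 2>/dev/null)
printf '%s\n' "$field"
rm -rf "$TMPDIR"
```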
Pitfall 2: XOR Header Bootstrapping in Shell
What goes wrong: The shell decoder currently reads magic bytes and immediately validates them. With XOR obfuscation, the first 4 bytes will NOT be the magic bytes -- they'll be XOR'd. The decoder must attempt XOR de-obfuscation before parsing.
Why it happens: The current shell code at line 108-113 reads magic and exits immediately on mismatch. This must become a conditional: try raw first, then try XOR.
How to avoid: Implement the bootstrapping algorithm from FORMAT.md Section 10 step 2: read 40 bytes, check magic, if mismatch XOR all 40 bytes and re-check.
Warning signs: Shell decoder rejects all obfuscated archives with "bad magic bytes".
Pitfall 3: XOR in Shell Requires Per-Byte Hex Manipulation
What goes wrong: Shell/POSIX sh has no native XOR operator for bytes. Implementing XOR in shell requires reading each byte as hex, converting to decimal, XORing with the key byte (also as decimal), and converting back to hex. This is significantly more complex than in Rust or Kotlin.
Why it happens: POSIX sh arithmetic supports XOR ($(( )) with ^ operator), but converting between hex bytes and shell arithmetic requires careful hex string slicing.
How to avoid: Use shell arithmetic: result=$(( 0x${byte_hex} ^ 0x${key_hex} )) and then printf '%02x' "$result". Process all 40 header bytes in a loop, building the de-XORed header either in a hex string or as a temp binary file.
Practical approach: Read the 40-byte header as a hex string, XOR each byte pair in a loop, write the result to a temp file, then use the existing read_le_u16/read_le_u32 functions on the temp file.
# Read 40-byte header as hex
header_hex=$(read_hex "$ARCHIVE" 0 40)
xor_key="a53c960fe17b4dc8"
# XOR each byte: consume two hex chars per iteration, cycling through the key
rest="$header_hex"
i=0
result=""
while [ -n "$rest" ]; do
    byte=$(printf '%.2s' "$rest")
    rest=${rest#??}
    key_pos=$(( (i % 8) * 2 ))
    key_byte=$(printf '%s' "$xor_key" | cut -c$((key_pos + 1))-$((key_pos + 2)))
    result="${result}$(printf '%02x' "$(( 0x$byte ^ 0x$key_byte ))")"
    i=$((i + 1))
done
# Write result to temp file using printf or xxd -r -p
Warning signs: Hex string indexing errors, off-by-one in the XOR loop, wrong byte order.
Pitfall 4: Kotlin Signed Byte XOR
What goes wrong: Kotlin bytes are signed (-128 to 127). XOR operations on bytes require .toInt() and 0xFF masking to avoid sign extension. The XOR key contains bytes > 0x7F (e.g., 0xA5, 0xC8) which are negative in Kotlin's signed byte representation.
Why it happens: 0xA5.toByte() in Kotlin is -91, and .toInt() sign-extends it to 0xFFFFFFA5. The low 8 bits of the XOR still survive a final .toByte(), but any intermediate comparison or arithmetic on the unmasked Int (such as a magic-byte check) silently goes wrong, so masking with 0xFF is the safe habit.
How to avoid: Always use (buf[i].toInt() and 0xFF) xor (XOR_KEY[i % 8].toInt() and 0xFF) and then .toByte() the result. This is the same pattern already used in ArchiveDecoder.kt for other byte operations.
Warning signs: XOR produces wrong values for bytes > 0x7F; magic byte check fails after de-XOR.
Pitfall 5: Data Offset Computation with Padding
What goes wrong: The archiver computes data_offset for each file by summing toc_offset + toc_size + sum(encrypted_sizes_before). With decoy padding, it must also add sum(padding_after_before).
Why it happens: The current pack() function computes offsets in a simple loop without padding.
How to avoid: Generate all padding_after values first, then compute offsets as current_offset += encrypted_size + padding_after for each file.
Warning signs: Data offsets in TOC entries point to wrong locations; decoders read garbage ciphertext.
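A minimal sketch of the corrected offset loop (the struct and function names are illustrative, not the actual pack() code):

```rust
/// Illustrative stand-in for per-file pack state (hypothetical names).
struct PlannedFile {
    encrypted_size: u64,
    padding_after: u16,
}

/// Assign each file's data_offset: the data region starts after the
/// (encrypted) TOC, and every step advances by ciphertext + decoy padding.
fn assign_offsets(files: &[PlannedFile], toc_offset: u64, toc_size: u64) -> Vec<u64> {
    let mut offset = toc_offset + toc_size; // toc_size is the ENCRYPTED size
    let mut offsets = Vec::with_capacity(files.len());
    for f in files {
        offsets.push(offset);
        offset += f.encrypted_size + f.padding_after as u64;
    }
    offsets
}

fn main() {
    let files = vec![
        PlannedFile { encrypted_size: 48, padding_after: 100 },
        PlannedFile { encrypted_size: 32, padding_after: 0 },
    ];
    // Header 40 bytes at offset 0, encrypted TOC of 160 bytes at offset 40.
    let offs = assign_offsets(&files, 40, 160);
    assert_eq!(offs, vec![200, 348]); // 200 = 40 + 160; 348 = 200 + 48 + 100
    println!("{:?}", offs);
}
```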
Pitfall 6: TOC Size for Encrypted TOC
What goes wrong: The toc_size header field must store the encrypted TOC size (which includes PKCS7 padding), not the plaintext serialized size. The encrypted size is ((plaintext_size / 16) + 1) * 16.
Why it happens: The current code sets toc_size to the plaintext size. After encryption, the size grows due to PKCS7 padding.
How to avoid: Serialize TOC to buffer first, encrypt, then use encrypted_toc.len() as toc_size.
Warning signs: Decoder reads wrong number of bytes for encrypted TOC; AES decryption fails with "invalid padding".
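The size relation can be sanity-checked with a few lines. This is only a sketch of the arithmetic; the real toc_size should still come from encrypted_toc.len(), not this formula.

```rust
/// Expected ciphertext size for PKCS7-padded AES-CBC (16-byte blocks).
/// PKCS7 always pads, so a block-aligned plaintext grows by a full block.
fn encrypted_size(plaintext_len: usize) -> usize {
    (plaintext_len / 16 + 1) * 16
}

fn main() {
    assert_eq!(encrypted_size(0), 16);
    assert_eq!(encrypted_size(15), 16);
    assert_eq!(encrypted_size(16), 32); // full padding block added
    assert_eq!(encrypted_size(159), 160); // e.g. a serialized multi-entry TOC
    println!("ok");
}
```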
Pitfall 7: Inspect Command with Obfuscation
What goes wrong: The inspect command currently reads the header and TOC in plaintext. After obfuscation, it must de-XOR the header and decrypt the TOC before printing metadata.
Why it happens: The inspect path shares code with unpack but the developer might forget to update it.
How to avoid: Factor out header de-obfuscation and TOC decryption into reusable functions called by both unpack() and inspect().
Warning signs: inspect command crashes or shows garbage on obfuscated archives.
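A sketch of what the shared helper might look like (the function name is hypothetical). Because the bootstrapping step is pure byte manipulation, it is easy to unit-test in isolation, and both unpack() and inspect() would call it:

```rust
const MAGIC: [u8; 4] = [0x00, 0xEA, 0x72, 0x63];
const XOR_KEY: [u8; 8] = [0xA5, 0x3C, 0x96, 0x0F, 0xE1, 0x7B, 0x4D, 0xC8];

/// Shared bootstrapping (FORMAT.md Section 10 step 2): try the raw magic,
/// otherwise de-XOR all 40 bytes and re-check. Used by unpack AND inspect.
fn deobfuscate_header(buf: &mut [u8; 40]) -> Result<(), String> {
    if buf[0..4] == MAGIC {
        return Ok(()); // plain header (flags bit 2 clear)
    }
    for i in 0..40 {
        buf[i] ^= XOR_KEY[i % 8];
    }
    if buf[0..4] == MAGIC {
        Ok(())
    } else {
        Err("invalid magic bytes after XOR attempt".to_string())
    }
}

fn main() {
    // A plain header passes through unchanged.
    let mut plain = [0u8; 40];
    plain[0..4].copy_from_slice(&MAGIC);
    assert!(deobfuscate_header(&mut plain.clone()).is_ok());

    // An XOR'd header round-trips back to the plain bytes.
    let mut obf = plain;
    for i in 0..40 {
        obf[i] ^= XOR_KEY[i % 8];
    }
    assert!(deobfuscate_header(&mut obf).is_ok());
    assert_eq!(obf, plain);

    // Garbage is rejected even after the XOR attempt.
    let mut junk = [0xFFu8; 40];
    assert!(deobfuscate_header(&mut junk).is_err());
    println!("ok");
}
```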
Code Examples
XOR Header Round-Trip (Rust)
// Source: FORMAT.md Section 9.1
const XOR_KEY: [u8; 8] = [0xA5, 0x3C, 0x96, 0x0F, 0xE1, 0x7B, 0x4D, 0xC8];
fn xor_header_buf(buf: &mut [u8]) {
assert!(buf.len() >= 40);
for i in 0..40 {
buf[i] ^= XOR_KEY[i % 8];
}
}
// Encoding: write header normally, then XOR
let mut header_buf = Vec::new();
write_header(&mut header_buf, &header)?;
xor_header_buf(&mut header_buf);
out_file.write_all(&header_buf)?;
// Decoding: read 40 bytes, check magic, if no match XOR and re-check
let mut buf = [0u8; 40];
reader.read_exact(&mut buf)?;
if buf[0..4] != MAGIC {
xor_header_buf(&mut buf);
anyhow::ensure!(buf[0..4] == MAGIC, "Invalid magic bytes after XOR attempt");
}
// Parse header from buf...
TOC Encryption (Rust)
// Source: FORMAT.md Section 9.2
// Encoding
let mut toc_plaintext = Vec::new();
for entry in &toc_entries {
write_toc_entry(&mut toc_plaintext, entry)?;
}
let toc_iv = crypto::generate_iv();
let encrypted_toc = crypto::encrypt_data(&toc_plaintext, &KEY, &toc_iv);
// encrypted_toc.len() is the toc_size to store in header
// Decoding
let encrypted_toc_buf = /* read toc_size bytes from toc_offset */;
let toc_plaintext = crypto::decrypt_data(&encrypted_toc_buf, &KEY, &header.toc_iv)?;
let mut cursor = Cursor::new(&toc_plaintext);
let entries = read_toc(&mut cursor, header.file_count)?;
Decoy Padding (Rust)
// Source: FORMAT.md Section 9.3
use rand::Rng;
let mut rng = rand::rng();
// For each file, during pack:
let padding_after: u16 = rng.random_range(64..=4096);
let mut padding_bytes = vec![0u8; padding_after as usize];
rng.fill(&mut padding_bytes[..]);
// After writing ciphertext for this file:
out_file.write_all(&pf.ciphertext)?;
out_file.write_all(&padding_bytes)?;
Shell Decoder XOR De-obfuscation
# Source: FORMAT.md Section 9.1 + Section 10 step 2
XOR_KEY_HEX="a53c960fe17b4dc8"
# Read 40-byte header as hex
raw_header_hex=$(read_hex "$ARCHIVE" 0 40)
magic_hex=$(printf '%.8s' "$raw_header_hex")
if [ "$magic_hex" = "00ea7263" ]; then
header_hex="$raw_header_hex"
else
# Apply XOR de-obfuscation
header_hex=""
byte_idx=0
while [ "$byte_idx" -lt 40 ]; do
hex_pos=$((byte_idx * 2))
# Extract byte from raw header
raw_byte_hex=$(printf '%s' "$raw_header_hex" | cut -c$((hex_pos + 1))-$((hex_pos + 2)))
# Extract key byte (cyclic)
key_pos=$(( (byte_idx % 8) * 2 ))
key_byte_hex=$(printf '%s' "$XOR_KEY_HEX" | cut -c$((key_pos + 1))-$((key_pos + 2)))
# XOR
result=$(printf '%02x' "$(( 0x$raw_byte_hex ^ 0x$key_byte_hex ))")
header_hex="${header_hex}${result}"
byte_idx=$((byte_idx + 1))
done
# Verify magic after XOR
magic_hex=$(printf '%.8s' "$header_hex")
if [ "$magic_hex" != "00ea7263" ]; then
printf 'Invalid archive: bad magic bytes\n' >&2
exit 1
fi
fi
# Write de-XORed header to temp file for field parsing
printf '%s' "$header_hex" | xxd -r -p > "$TMPDIR/header.bin"
# Now use read_le_u16/read_le_u32 on "$TMPDIR/header.bin"
Kotlin XOR De-obfuscation
// Source: FORMAT.md Section 9.1
val XOR_KEY = byteArrayOf(
0xA5.toByte(), 0x3C, 0x96.toByte(), 0x0F,
0xE1.toByte(), 0x7B, 0x4D, 0xC8.toByte()
)
fun xorHeader(buf: ByteArray) {
for (i in 0 until minOf(buf.size, 40)) {
buf[i] = ((buf[i].toInt() and 0xFF) xor (XOR_KEY[i % 8].toInt() and 0xFF)).toByte()
}
}
// In decode():
val headerBytes = ByteArray(HEADER_SIZE)
raf.readFully(headerBytes)
// Check magic before XOR
if (!(headerBytes[0] == MAGIC[0] && headerBytes[1] == MAGIC[1] &&
headerBytes[2] == MAGIC[2] && headerBytes[3] == MAGIC[3])) {
// Attempt XOR de-obfuscation
xorHeader(headerBytes)
}
val header = parseHeader(headerBytes)
// If TOC encrypted:
if (header.flags and 0x02 != 0) {
raf.seek(header.tocOffset)
val encryptedToc = ByteArray(header.tocSize.toInt())
raf.readFully(encryptedToc)
val decryptedToc = decryptAesCbc(encryptedToc, header.tocIv, KEY)
val entries = parseToc(decryptedToc, header.fileCount)
// ... proceed with entries
}
State of the Art
| Old Approach (current) | New Approach (Phase 6) | Impact |
|---|---|---|
| Plaintext header with MAGIC visible | XOR-obfuscated header -- no recognizable bytes | file and binwalk cannot identify format |
| Plaintext TOC with filenames visible | AES-encrypted TOC -- `strings` reveals nothing | Hex editors see no metadata |
| Contiguous data blocks | Data blocks with random padding gaps | Size analysis of individual files is defeated |
| `flags = 0x01` (compression only) | `flags = 0x0F` (compression + all obfuscation) | All obfuscation active by default |
Nothing is deprecated: The old approach still works (flags bits 1-3 = 0). The decoder always checks whether obfuscation is active and handles both cases.
Open Questions
- Padding size range
  - What we know: `padding_after` is u16 (0-65535). FORMAT.md doesn't specify a recommended range.
  - What's unclear: Should padding be uniformly random in a fixed range, or proportional to file size?
  - Recommendation: Use a fixed range of 64-4096 bytes per file. This adds meaningful noise without significantly inflating archive size. The exact range is not spec-mandated, so the planner can decide.
- Should obfuscation be the default or opt-in?
  - What we know: The spec says features "can be activated independently." Phase 6 success criteria say "all three decoders still produce byte-identical output after obfuscation is applied."
  - What's unclear: Should `pack` always enable obfuscation, or should there be a `--no-obfuscate` flag?
  - Recommendation: Always enable all three obfuscation features. The whole point of Phase 6 is hardening. Add a `--no-obfuscate` flag for backward compatibility testing only. This simplifies the implementation.
- Existing test archives
  - What we know: Current tests create archives without obfuscation.
  - What's unclear: Should existing tests still pass with obfuscation enabled by default?
  - Recommendation: Existing round-trip tests should still pass because they test pack→unpack, and both sides will now use obfuscation. Golden test vectors for crypto primitives are unaffected. Cross-validation tests (Kotlin, Shell) need to be re-run against obfuscated archives.
- Shell `cut` vs substring approach for hex processing
  - What we know: POSIX sh substring syntax (`${var:offset:length}`) is a bashism not available in strict POSIX sh. The current shell decoder uses `printf '%.2s'` and `${var#??}` patterns for string slicing.
  - What's unclear: Is `cut -c` POSIX-compliant for hex byte extraction in the XOR loop?
  - Recommendation: `cut -c` is POSIX-compliant and available in busybox. Use `printf '%s' "$hex" | cut -c$start-$end` for byte extraction. Alternatively, use the existing `${var#??}` pattern in a loop. Test with busybox sh.
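Both extraction approaches can be sanity-checked side by side (a standalone sketch, not decoder code):

```shell
# Sketch: two POSIX-safe ways to take the i-th byte pair from a hex string.
hex="a53c960fe17b4dc8"
i=2 # zero-based byte index; byte 2 of the XOR key is 0x96

# (a) cut -c uses 1-based columns; available in busybox
b1=$(printf '%s' "$hex" | cut -c$((i * 2 + 1))-$((i * 2 + 2)))

# (b) pure parameter expansion: strip i leading byte pairs, keep two chars
rest="$hex"
j=0
while [ "$j" -lt "$i" ]; do
    rest=${rest#??}
    j=$((j + 1))
done
b2=$(printf '%.2s' "$rest")

printf '%s %s\n' "$b1" "$b2"
```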
Sources
Primary (HIGH confidence)
- FORMAT.md Sections 9.1-9.3 and Section 10 -- complete specification of all three obfuscation features, including XOR key, flag bits, decode order, and bootstrapping algorithm
- Existing codebase (src/format.rs, src/crypto.rs, src/archive.rs, kotlin/ArchiveDecoder.kt, shell/decode.sh) -- verified current implementation patterns
Secondary (MEDIUM confidence)
- OpenSSL enc documentation -- confirms `-K`/`-iv`/`-nosalt` raw key mode works with piped/file input for TOC decryption
- Malwarebytes XOR obfuscation -- confirms XOR obfuscation is standard practice for hiding binary structure
- Security Lab entropy analysis -- confirms random padding disrupts entropy-based analysis tools
Tertiary (LOW confidence)
- None -- all findings verified against primary spec and codebase
Metadata
Confidence breakdown:
- Standard stack: HIGH -- no new dependencies, all primitives already in codebase
- Architecture: HIGH -- FORMAT.md fully specifies all three features with byte-level precision
- Pitfalls: HIGH -- identified by analyzing actual code structure and known shell/Kotlin quirks
Research date: 2026-02-25 Valid until: 2026-03-25 (stable -- format spec is frozen for v1)