Phase 6: Obfuscation Hardening - Research
Researched: 2026-02-25 Domain: Binary format obfuscation (XOR headers, encrypted TOC, decoy padding) Confidence: HIGH
Summary
Phase 6 adds three obfuscation layers to the existing archive format: XOR-obfuscated headers, encrypted file table (TOC), and random decoy padding between data blocks. The specification for all three features is already fully defined in FORMAT.md Sections 9.1-9.3, including the XOR key, flag bits, and decode order. The implementation is straightforward because the format spec was designed from the start to support these features -- the header already has toc_iv (16 bytes), flag bits 1-3, and padding_after fields in every TOC entry.
The critical complexity is that all changes must be applied atomically across four codebases (Rust archiver, Rust unpacker, Kotlin decoder, Shell decoder) while maintaining byte-identical output. The Rust archiver is the only encoder; the three decoders must all handle the new obfuscation features. The shell decoder is the most constrained: it must decrypt the TOC using openssl enc with raw key mode, which requires extracting the encrypted TOC to a temp file first (matching the existing pattern for per-file ciphertext extraction).
Primary recommendation: Implement in two plans: (1) Rust archiver + Rust unpacker with all three obfuscation features + updated unit/integration tests, (2) Kotlin decoder + Shell decoder updates + cross-validation tests confirming byte-identical output across all three decoders.
<phase_requirements>
Phase Requirements
| ID | Description | Research Support |
|---|---|---|
| FMT-06 | XOR-obfuscation headers with fixed key | FORMAT.md Section 9.1 fully defines the 8-byte XOR key (0xA5 0x3C 0x96 0x0F 0xE1 0x7B 0x4D 0xC8), cyclic application across 40-byte header, and bootstrapping detection via magic byte check. Implementation is a simple byte-level XOR loop. |
| FMT-07 | Encrypted file table with separate IV | FORMAT.md Section 9.2 defines AES-256-CBC encryption of the serialized TOC using toc_iv from the header. The toc_size field stores encrypted size (including PKCS7 padding). Same key as file encryption. All three decoders already have AES-CBC decrypt capability. |
| FMT-08 | Decoy padding (random data between blocks) | FORMAT.md Section 9.3 defines padding_after (u16 LE) in each TOC entry. Random bytes inserted after each data block. Decoders skip padding_after bytes. Max padding per file: 65535 bytes. The data_offset field in TOC entries already points to the correct location, so decoders that use absolute offsets (all three) naturally handle this. |
</phase_requirements>
Standard Stack
Core
No new libraries are needed. All three obfuscation features use primitives already present in the codebase.
| Library/Tool | Version | Purpose | Already Present |
|---|---|---|---|
| aes + cbc | 0.8 / 0.1 | AES-256-CBC for TOC encryption | Yes (Cargo.toml) |
| rand | 0.9 | Random IV generation for TOC, random decoy padding bytes | Yes (Cargo.toml) |
| openssl enc | any | Shell decoder AES-CBC decryption (for TOC) | Yes (shell/decode.sh) |
| javax.crypto.Cipher | Android SDK | Kotlin decoder AES-CBC decryption (for TOC) | Yes (ArchiveDecoder.kt) |
Supporting
| Library/Tool | Version | Purpose | When to Use |
|---|---|---|---|
| hex-literal | 1.1 | XOR key constant in tests | Yes (dev-dependencies) |
| binwalk | system | Manual verification that obfuscated archives are undetectable | Testing only |
Alternatives Considered
No alternatives -- the spec is locked. XOR key, AES-CBC for TOC, and random padding are all specified in FORMAT.md Section 9.
Architecture Patterns
Current Codebase Architecture
src/
├── format.rs # Header/TOC structs, read/write serialization
├── crypto.rs # AES-CBC encrypt/decrypt, HMAC, SHA-256, IV generation
├── archive.rs # pack(), unpack(), inspect() orchestration
├── compression.rs # gzip compress/decompress
├── key.rs # 32-byte hardcoded key constant
├── cli.rs # clap CLI definition
├── lib.rs # pub mod re-exports
└── main.rs # entry point
kotlin/
└── ArchiveDecoder.kt # Single-file decoder (parse + decrypt + decompress)
shell/
└── decode.sh # Busybox-compatible POSIX shell decoder
Pattern 1: XOR Header Obfuscation
What: Apply cyclic 8-byte XOR to all 40 header bytes after construction (encoding) and before parsing (decoding).
Implementation in Rust archiver (format.rs or archive.rs):
/// Fixed 8-byte XOR obfuscation key (FORMAT.md Section 9.1).
const XOR_KEY: [u8; 8] = [0xA5, 0x3C, 0x96, 0x0F, 0xE1, 0x7B, 0x4D, 0xC8];
/// XOR-obfuscate or de-obfuscate a 40-byte header buffer in-place.
/// XOR is its own inverse, so the same function encodes and decodes.
fn xor_header(buf: &mut [u8; 40]) {
for (i, byte) in buf.iter_mut().enumerate() {
*byte ^= XOR_KEY[i % 8];
}
}
Decode bootstrapping (FORMAT.md Section 10, step 2):
- Read first 40 bytes raw.
- Check if bytes 0-3 match MAGIC (`0x00 0xEA 0x72 0x63`).
- If YES: header is plain, parse normally.
- If NO: apply XOR to all 40 bytes, re-check magic. If still wrong, reject.
In Kotlin:
val XOR_KEY = byteArrayOf(
0xA5.toByte(), 0x3C, 0x96.toByte(), 0x0F,
0xE1.toByte(), 0x7B, 0x4D, 0xC8.toByte()
)
fun xorHeader(buf: ByteArray) {
for (i in 0 until 40) {
buf[i] = (buf[i].toInt() xor XOR_KEY[i % 8].toInt()).toByte()
}
}
In shell:
# XOR key as hex pairs
XOR_KEY="a53c960fe17b4dc8"
# De-XOR 40 header bytes: read raw, XOR each byte, write back
# This requires per-byte hex manipulation in shell
Pattern 2: TOC Encryption
What: Serialize all TOC entries to a buffer, then encrypt the entire buffer with AES-256-CBC using a random toc_iv, and write the encrypted TOC. Store the encrypted size in toc_size.
Encoding (Rust archiver):
// 1. Serialize TOC entries to a Vec<u8>
let mut toc_buf = Vec::new();
for entry in &entries {
format::write_toc_entry(&mut toc_buf, entry)?;
}
// 2. Generate random toc_iv
let toc_iv = crypto::generate_iv();
// 3. Encrypt the serialized TOC
let encrypted_toc = crypto::encrypt_data(&toc_buf, &KEY, &toc_iv);
let toc_size = encrypted_toc.len() as u32; // encrypted size
// 4. Write header with toc_iv and encrypted toc_size
// 5. Write encrypted_toc bytes at toc_offset
Decoding (all decoders):
- Read `toc_offset`, `toc_size`, `toc_iv` from the (de-XORed) header.
- Check flags bit 1 (`toc_encrypted`).
- If set: read `toc_size` bytes at `toc_offset`, decrypt with AES-256-CBC using `toc_iv` and KEY, remove PKCS7 padding.
- Parse TOC entries from the decrypted buffer.
Shell decoder TOC decryption:
# Extract encrypted TOC to temp file
dd if="$ARCHIVE" bs=1 skip="$toc_offset" count="$toc_size" of="$TMPDIR/toc_enc.bin" 2>/dev/null
# Decrypt TOC
openssl enc -d -aes-256-cbc -nosalt \
-K "$KEY_HEX" -iv "$toc_iv_hex" \
-in "$TMPDIR/toc_enc.bin" -out "$TMPDIR/toc_dec.bin"
# Now parse TOC entries from the decrypted file
# (requires switching from reading TOC fields directly from $ARCHIVE
# to reading from $TMPDIR/toc_dec.bin with offset 0)
Pattern 3: Decoy Padding
What: After writing each file's ciphertext, write random bytes of random length (0-65535).
Encoding (Rust archiver):
use rand::Rng;
let mut rng = rand::rng();
// For each file, generate a random padding length
let padding_after: u16 = rng.random_range(64..=4096); // sensible range
// Write ciphertext, then write padding_after random bytes
let mut padding = vec![0u8; padding_after as usize];
rng.fill(&mut padding[..]);
out_file.write_all(&padding)?;
Decoding: All three decoders already use absolute data_offset from the TOC to seek to each file's data block, so they naturally skip over padding. The padding_after field in TOC entries is already parsed by all decoders (currently always 0). No decoder changes needed for the actual extraction -- the decoders just need to not break when padding_after > 0.
Pattern 4: Flag Bits Management
Current state: The archiver sets flags bit 0 (compression) when any file is compressed. Bits 1-3 are always 0.
Phase 6 changes: When obfuscation is active, set:
- Bit 1 (`0x02`): TOC encrypted
- Bit 2 (`0x04`): XOR header
- Bit 3 (`0x08`): Decoy padding
All three features should be enabled together (flags = 0x0F when compression + all obfuscation). The archiver should always enable all three obfuscation features. There is no user-facing toggle needed (FORMAT.md says "can be activated independently" but the v1 goal is full obfuscation).
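A sketch of the flag composition (the constant names are assumptions, not the actual format.rs identifiers):

```rust
/// Assumed flag constants matching FORMAT.md bit assignments.
const FLAG_COMPRESSED: u8 = 0x01; // bit 0, set per existing behavior
const FLAG_TOC_ENCRYPTED: u8 = 0x02; // bit 1
const FLAG_XOR_HEADER: u8 = 0x04; // bit 2
const FLAG_DECOY_PADDING: u8 = 0x08; // bit 3

/// Compose the header flags byte: compression is per-archive as today,
/// and all three obfuscation bits are set together.
fn flags(any_compressed: bool, obfuscate: bool) -> u8 {
    let mut f = 0u8;
    if any_compressed {
        f |= FLAG_COMPRESSED;
    }
    if obfuscate {
        f |= FLAG_TOC_ENCRYPTED | FLAG_XOR_HEADER | FLAG_DECOY_PADDING;
    }
    f
}

fn main() {
    assert_eq!(flags(true, true), 0x0F); // compression + full obfuscation
    assert_eq!(flags(true, false), 0x01); // legacy archives: bits 1-3 clear
    assert_eq!(flags(false, true), 0x0E);
    println!("ok");
}
```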
Recommended Modification Order
The correct order of operations for the encoder is:
1. Compute data offsets accounting for decoy padding
2. Serialize TOC entries (with padding_after values)
3. Encrypt serialized TOC → encrypted_toc
4. Build header (with toc_iv, encrypted toc_size, flags with bits 1-3 set)
5. Serialize header to 40-byte buffer
6. XOR the 40-byte header buffer
7. Write: XOR'd header || encrypted TOC || (data blocks with interleaved padding)
The correct order of operations for the decoder is (FORMAT.md Section 10):
1. Read 40 raw bytes
2. Check magic → if mismatch, XOR and re-check
3. Parse header fields (including toc_iv, flags)
4. If flags bit 1: decrypt TOC with toc_iv
5. Parse TOC entries from (decrypted) buffer
6. For each file: seek to data_offset, read encrypted_size, verify HMAC, decrypt, decompress, verify SHA-256
(padding_after is naturally skipped because next file uses its own data_offset)
Anti-Patterns to Avoid
- XOR after TOC encryption: The XOR must be applied last (to the header) during encoding, because the header contains the `toc_iv` needed for TOC decryption. If you XOR first and then modify the header, the XOR output is invalid.
- Using piped input for openssl TOC decryption in shell: The existing shell decoder already extracts ciphertext to a temp file before decryption to avoid pipe buffering issues. The same pattern MUST be used for TOC decryption.
- Modifying data_offset calculation without accounting for padding: When computing `data_offset` for each file, the offset must include all preceding files' `encrypted_size + padding_after` values. The current code only sums `encrypted_size`.
- Forgetting the TOC size change: When TOC encryption is on, `toc_size` in the header is the encrypted size (with PKCS7 padding), not the plaintext size. The data block start offset is `toc_offset + toc_size` (encrypted).
- Shell decoder: parsing TOC from archive file vs decrypted buffer: Currently, the shell decoder reads TOC fields directly from `$ARCHIVE` using absolute offsets. With TOC encryption, it must read from the decrypted TOC temp file with relative offsets (starting at 0). This is a significant refactor of the shell decoder's TOC parsing loop.
Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---|---|---|---|
| XOR obfuscation | Custom bit manipulation tricks | Simple `byte ^= key[i % 8]` loop | XOR is trivially simple; any "optimization" adds complexity without benefit |
| TOC encryption | Custom encryption scheme | Existing `crypto::encrypt_data` / `crypto::decrypt_data` | Same AES-256-CBC already used for file encryption |
| Random byte generation | Pseudo-random with manual seeding | `rand::Fill` (Rust), `/dev/urandom` (shell), `SecureRandom` (Kotlin) | CSPRNG is already in use for IV generation |
| PKCS7 padding for TOC | Manual padding logic | `cbc` crate handles PKCS7 automatically | The encrypt/decrypt functions already handle padding |
Key insight: Every cryptographic primitive needed is already in the codebase. Phase 6 is purely about wiring existing functions into the encode/decode pipeline in the correct order.
Common Pitfalls
Pitfall 1: Shell Decoder TOC Parsing Refactor
What goes wrong: The current shell decoder reads TOC fields directly from $ARCHIVE at absolute offsets (pos=$toc_offset, then read_le_u16 "$ARCHIVE" "$pos"). After TOC encryption, the TOC must be decrypted to a temp file first, and all TOC reads must come from that temp file with offsets starting at 0 instead of $toc_offset.
Why it happens: The entire TOC parsing loop in decode.sh (lines 139-244) uses $ARCHIVE as the file argument to read_hex, read_le_u16, read_le_u32, and dd. All of these calls need to be changed to read from the decrypted TOC file with a reset position counter.
How to avoid: Extract the TOC parsing into a section that operates on a "TOC file" variable. When TOC encryption is off, the TOC file is the archive itself (with pos starting at toc_offset). When TOC encryption is on, the TOC file is the decrypted temp file (with pos starting at 0).
Warning signs: Tests pass with TOC encryption off but fail with TOC encryption on; the shell decoder reads garbage field values.
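That abstraction can be sketched as below, using hypothetical `TOC_FILE`/`TOC_BASE` variables and a throwaway fake archive; in the real decoder, the encrypted branch would first `dd` the TOC out and decrypt it with openssl into the temp file.

```shell
# Sketch (hypothetical TOC_FILE/TOC_BASE variables): unify TOC reads so the
# parsing loop does not care whether the TOC was decrypted to a temp file.
TMPDIR=$(mktemp -d)
ARCHIVE="$TMPDIR/archive.bin"
# Fake archive: 10 filler bytes standing in for the header, then TOC bytes.
printf 'HHHHHHHHHHTOCDATA' > "$ARCHIVE"
toc_offset=10
toc_encrypted=0 # would come from header flags bit 1

if [ "$toc_encrypted" -eq 1 ]; then
    # Real decoder: dd the encrypted TOC out, openssl enc -d to a temp file.
    TOC_FILE="$TMPDIR/toc_dec.bin"
    TOC_BASE=0
else
    TOC_FILE="$ARCHIVE"
    TOC_BASE=$toc_offset
fi

# The parsing loop always reads from $TOC_FILE at $TOC_BASE-relative positions.
pos=$TOC_BASE
field=$(dd if="$TOC_FILE" bs=1 skip="$pos" count=7 2>/dev/null)
printf '%s\n' "$field"
rm -rf "$TMPDIR"
```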
Pitfall 2: XOR Header Bootstrapping in Shell
What goes wrong: The shell decoder currently reads magic bytes and immediately validates them. With XOR obfuscation, the first 4 bytes will NOT be the magic bytes -- they'll be XOR'd. The decoder must attempt XOR de-obfuscation before parsing.
Why it happens: The current shell code at line 108-113 reads magic and exits immediately on mismatch. This must become a conditional: try raw first, then try XOR.
How to avoid: Implement the bootstrapping algorithm from FORMAT.md Section 10 step 2: read 40 bytes, check magic, if mismatch XOR all 40 bytes and re-check.
Warning signs: Shell decoder rejects all obfuscated archives with "bad magic bytes".
Pitfall 3: XOR in Shell Requires Per-Byte Hex Manipulation
What goes wrong: Shell/POSIX sh has no native XOR operator for bytes. Implementing XOR in shell requires reading each byte as hex, converting to decimal, XORing with the key byte (also as decimal), and converting back to hex. This is significantly more complex than in Rust or Kotlin.
Why it happens: POSIX sh arithmetic supports XOR ($(( )) with ^ operator), but converting between hex bytes and shell arithmetic requires careful hex string slicing.
How to avoid: Use shell arithmetic: result=$(( 0x${byte_hex} ^ 0x${key_hex} )) and then printf '%02x' "$result". Process all 40 header bytes in a loop, building the de-XORed header either in a hex string or as a temp binary file.
Practical approach: Read the 40-byte header as a hex string, XOR each byte pair in a loop, write the result to a temp file, then use the existing read_le_u16/read_le_u32 functions on the temp file.
# Read 40-byte header as hex
header_hex=$(read_hex "$ARCHIVE" 0 40)
xor_key="a53c960fe17b4dc8"
# XOR each byte: consume two hex chars per iteration, cycling through the key
rest="$header_hex"
i=0
result=""
while [ -n "$rest" ]; do
    byte=$(printf '%.2s' "$rest")
    rest=${rest#??}
    key_pos=$(( (i % 8) * 2 ))
    key_byte=$(printf '%s' "$xor_key" | cut -c$((key_pos + 1))-$((key_pos + 2)))
    result="${result}$(printf '%02x' "$(( 0x$byte ^ 0x$key_byte ))")"
    i=$((i + 1))
done
# Write result to temp file using printf or xxd -r -p
Warning signs: Hex string indexing errors, off-by-one in the XOR loop, wrong byte order.
Pitfall 4: Kotlin Signed Byte XOR
What goes wrong: Kotlin bytes are signed (-128 to 127). XOR operations on bytes require .toInt() and 0xFF masking to avoid sign extension. The XOR key contains bytes > 0x7F (e.g., 0xA5, 0xC8) which are negative in Kotlin's signed byte representation.
Why it happens: 0xA5.toByte() in Kotlin is -91, and .toInt() sign-extends it to 0xFFFFFFA5. The low 8 bits of the XOR still survive a final .toByte(), but any intermediate comparison or arithmetic on the unmasked Int (such as a magic-byte check) silently goes wrong, so masking with 0xFF is the safe habit.
How to avoid: Always use (buf[i].toInt() and 0xFF) xor (XOR_KEY[i % 8].toInt() and 0xFF) and then .toByte() the result. This is the same pattern already used in ArchiveDecoder.kt for other byte operations.
Warning signs: XOR produces wrong values for bytes > 0x7F; magic byte check fails after de-XOR.
Pitfall 5: Data Offset Computation with Padding
What goes wrong: The archiver computes data_offset for each file by summing toc_offset + toc_size + sum(encrypted_sizes_before). With decoy padding, it must also add sum(padding_after_before).
Why it happens: The current pack() function computes offsets in a simple loop without padding.
How to avoid: Generate all padding_after values first, then compute offsets as current_offset += encrypted_size + padding_after for each file.
Warning signs: Data offsets in TOC entries point to wrong locations; decoders read garbage ciphertext.
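A minimal sketch of the corrected offset loop (the struct and function names are illustrative, not the actual pack() code):

```rust
/// Illustrative stand-in for per-file pack state (hypothetical names).
struct PlannedFile {
    encrypted_size: u64,
    padding_after: u16,
}

/// Assign each file's data_offset: the data region starts after the
/// (encrypted) TOC, and every step advances by ciphertext + decoy padding.
fn assign_offsets(files: &[PlannedFile], toc_offset: u64, toc_size: u64) -> Vec<u64> {
    let mut offset = toc_offset + toc_size; // toc_size is the ENCRYPTED size
    let mut offsets = Vec::with_capacity(files.len());
    for f in files {
        offsets.push(offset);
        offset += f.encrypted_size + f.padding_after as u64;
    }
    offsets
}

fn main() {
    let files = vec![
        PlannedFile { encrypted_size: 48, padding_after: 100 },
        PlannedFile { encrypted_size: 32, padding_after: 0 },
    ];
    // Header 40 bytes at offset 0, encrypted TOC of 160 bytes at offset 40.
    let offs = assign_offsets(&files, 40, 160);
    assert_eq!(offs, vec![200, 348]); // 200 = 40 + 160; 348 = 200 + 48 + 100
    println!("{:?}", offs);
}
```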
Pitfall 6: TOC Size for Encrypted TOC
What goes wrong: The toc_size header field must store the encrypted TOC size (which includes PKCS7 padding), not the plaintext serialized size. The encrypted size is ((plaintext_size / 16) + 1) * 16.
Why it happens: The current code sets toc_size to the plaintext size. After encryption, the size grows due to PKCS7 padding.
How to avoid: Serialize TOC to buffer first, encrypt, then use encrypted_toc.len() as toc_size.
Warning signs: Decoder reads wrong number of bytes for encrypted TOC; AES decryption fails with "invalid padding".
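The size relation can be sanity-checked with a few lines. This is only a sketch of the arithmetic; the real toc_size should still come from encrypted_toc.len(), not this formula.

```rust
/// Expected ciphertext size for PKCS7-padded AES-CBC (16-byte blocks).
/// PKCS7 always pads, so a block-aligned plaintext grows by a full block.
fn encrypted_size(plaintext_len: usize) -> usize {
    (plaintext_len / 16 + 1) * 16
}

fn main() {
    assert_eq!(encrypted_size(0), 16);
    assert_eq!(encrypted_size(15), 16);
    assert_eq!(encrypted_size(16), 32); // full padding block added
    assert_eq!(encrypted_size(159), 160); // e.g. a serialized multi-entry TOC
    println!("ok");
}
```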
Pitfall 7: Inspect Command with Obfuscation
What goes wrong: The inspect command currently reads the header and TOC in plaintext. After obfuscation, it must de-XOR the header and decrypt the TOC before printing metadata.
Why it happens: The inspect path shares code with unpack but the developer might forget to update it.
How to avoid: Factor out header de-obfuscation and TOC decryption into reusable functions called by both unpack() and inspect().
Warning signs: inspect command crashes or shows garbage on obfuscated archives.
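A sketch of what the shared helper might look like (the function name is hypothetical). Because the bootstrapping step is pure byte manipulation, it is easy to unit-test in isolation, and both unpack() and inspect() would call it:

```rust
const MAGIC: [u8; 4] = [0x00, 0xEA, 0x72, 0x63];
const XOR_KEY: [u8; 8] = [0xA5, 0x3C, 0x96, 0x0F, 0xE1, 0x7B, 0x4D, 0xC8];

/// Shared bootstrapping (FORMAT.md Section 10 step 2): try the raw magic,
/// otherwise de-XOR all 40 bytes and re-check. Used by unpack AND inspect.
fn deobfuscate_header(buf: &mut [u8; 40]) -> Result<(), String> {
    if buf[0..4] == MAGIC {
        return Ok(()); // plain header (flags bit 2 clear)
    }
    for i in 0..40 {
        buf[i] ^= XOR_KEY[i % 8];
    }
    if buf[0..4] == MAGIC {
        Ok(())
    } else {
        Err("invalid magic bytes after XOR attempt".to_string())
    }
}

fn main() {
    // A plain header passes through unchanged.
    let mut plain = [0u8; 40];
    plain[0..4].copy_from_slice(&MAGIC);
    assert!(deobfuscate_header(&mut plain.clone()).is_ok());

    // An XOR'd header round-trips back to the plain bytes.
    let mut obf = plain;
    for i in 0..40 {
        obf[i] ^= XOR_KEY[i % 8];
    }
    assert!(deobfuscate_header(&mut obf).is_ok());
    assert_eq!(obf, plain);

    // Garbage is rejected even after the XOR attempt.
    let mut junk = [0xFFu8; 40];
    assert!(deobfuscate_header(&mut junk).is_err());
    println!("ok");
}
```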
Code Examples
XOR Header Round-Trip (Rust)
// Source: FORMAT.md Section 9.1
const XOR_KEY: [u8; 8] = [0xA5, 0x3C, 0x96, 0x0F, 0xE1, 0x7B, 0x4D, 0xC8];
fn xor_header_buf(buf: &mut [u8]) {
assert!(buf.len() >= 40);
for i in 0..40 {
buf[i] ^= XOR_KEY[i % 8];
}
}
// Encoding: write header normally, then XOR
let mut header_buf = Vec::new();
write_header(&mut header_buf, &header)?;
xor_header_buf(&mut header_buf);
out_file.write_all(&header_buf)?;
// Decoding: read 40 bytes, check magic, if no match XOR and re-check
let mut buf = [0u8; 40];
reader.read_exact(&mut buf)?;
if buf[0..4] != MAGIC {
xor_header_buf(&mut buf);
anyhow::ensure!(buf[0..4] == MAGIC, "Invalid magic bytes after XOR attempt");
}
// Parse header from buf...
TOC Encryption (Rust)
// Source: FORMAT.md Section 9.2
// Encoding
let mut toc_plaintext = Vec::new();
for entry in &toc_entries {
write_toc_entry(&mut toc_plaintext, entry)?;
}
let toc_iv = crypto::generate_iv();
let encrypted_toc = crypto::encrypt_data(&toc_plaintext, &KEY, &toc_iv);
// encrypted_toc.len() is the toc_size to store in header
// Decoding
let encrypted_toc_buf = /* read toc_size bytes from toc_offset */;
let toc_plaintext = crypto::decrypt_data(&encrypted_toc_buf, &KEY, &header.toc_iv)?;
let mut cursor = Cursor::new(&toc_plaintext);
let entries = read_toc(&mut cursor, header.file_count)?;
Decoy Padding (Rust)
// Source: FORMAT.md Section 9.3
use rand::Rng;
let mut rng = rand::rng();
// For each file, during pack:
let padding_after: u16 = rng.random_range(64..=4096);
let mut padding_bytes = vec![0u8; padding_after as usize];
rng.fill(&mut padding_bytes[..]);
// After writing ciphertext for this file:
out_file.write_all(&pf.ciphertext)?;
out_file.write_all(&padding_bytes)?;
Shell Decoder XOR De-obfuscation
# Source: FORMAT.md Section 9.1 + Section 10 step 2
XOR_KEY_HEX="a53c960fe17b4dc8"
# Read 40-byte header as hex
raw_header_hex=$(read_hex "$ARCHIVE" 0 40)
magic_hex=$(printf '%.8s' "$raw_header_hex")
if [ "$magic_hex" = "00ea7263" ]; then
header_hex="$raw_header_hex"
else
# Apply XOR de-obfuscation
header_hex=""
byte_idx=0
while [ "$byte_idx" -lt 40 ]; do
hex_pos=$((byte_idx * 2))
# Extract byte from raw header
raw_byte_hex=$(printf '%s' "$raw_header_hex" | cut -c$((hex_pos + 1))-$((hex_pos + 2)))
# Extract key byte (cyclic)
key_pos=$(( (byte_idx % 8) * 2 ))
key_byte_hex=$(printf '%s' "$XOR_KEY_HEX" | cut -c$((key_pos + 1))-$((key_pos + 2)))
# XOR
result=$(printf '%02x' "$(( 0x$raw_byte_hex ^ 0x$key_byte_hex ))")
header_hex="${header_hex}${result}"
byte_idx=$((byte_idx + 1))
done
# Verify magic after XOR
magic_hex=$(printf '%.8s' "$header_hex")
if [ "$magic_hex" != "00ea7263" ]; then
printf 'Invalid archive: bad magic bytes\n' >&2
exit 1
fi
fi
# Write de-XORed header to temp file for field parsing
printf '%s' "$header_hex" | xxd -r -p > "$TMPDIR/header.bin"
# Now use read_le_u16/read_le_u32 on "$TMPDIR/header.bin"
Kotlin XOR De-obfuscation
// Source: FORMAT.md Section 9.1
val XOR_KEY = byteArrayOf(
0xA5.toByte(), 0x3C, 0x96.toByte(), 0x0F,
0xE1.toByte(), 0x7B, 0x4D, 0xC8.toByte()
)
fun xorHeader(buf: ByteArray) {
for (i in 0 until minOf(buf.size, 40)) {
buf[i] = ((buf[i].toInt() and 0xFF) xor (XOR_KEY[i % 8].toInt() and 0xFF)).toByte()
}
}
// In decode():
val headerBytes = ByteArray(HEADER_SIZE)
raf.readFully(headerBytes)
// Check magic before XOR
if (!(headerBytes[0] == MAGIC[0] && headerBytes[1] == MAGIC[1] &&
headerBytes[2] == MAGIC[2] && headerBytes[3] == MAGIC[3])) {
// Attempt XOR de-obfuscation
xorHeader(headerBytes)
}
val header = parseHeader(headerBytes)
// If TOC encrypted:
if (header.flags and 0x02 != 0) {
raf.seek(header.tocOffset)
val encryptedToc = ByteArray(header.tocSize.toInt())
raf.readFully(encryptedToc)
val decryptedToc = decryptAesCbc(encryptedToc, header.tocIv, KEY)
val entries = parseToc(decryptedToc, header.fileCount)
// ... proceed with entries
}
State of the Art
| Old Approach (current) | New Approach (Phase 6) | Impact |
|---|---|---|
| Plaintext header with MAGIC visible | XOR-obfuscated header -- no recognizable bytes | file and binwalk cannot identify format |
| Plaintext TOC with filenames visible | AES-encrypted TOC -- `strings` reveals nothing | Hex editors see no metadata |
| Contiguous data blocks | Data blocks with random padding gaps | Size analysis of individual files is defeated |
| `flags = 0x01` (compression only) | `flags = 0x0F` (compression + all obfuscation) | All obfuscation active by default |
Nothing is deprecated: The old approach still works (flags bits 1-3 = 0). The decoder always checks whether obfuscation is active and handles both cases.
Open Questions
- Padding size range
  - What we know: `padding_after` is u16 (0-65535). FORMAT.md doesn't specify a recommended range.
  - What's unclear: Should padding be uniformly random in a fixed range, or proportional to file size?
  - Recommendation: Use a fixed range of 64-4096 bytes per file. This adds meaningful noise without significantly inflating archive size. The exact range is not spec-mandated, so the planner can decide.
- Should obfuscation be the default or opt-in?
  - What we know: The spec says features "can be activated independently." Phase 6 success criteria say "all three decoders still produce byte-identical output after obfuscation is applied."
  - What's unclear: Should `pack` always enable obfuscation, or should there be a `--no-obfuscate` flag?
  - Recommendation: Always enable all three obfuscation features. The whole point of Phase 6 is hardening. Add a `--no-obfuscate` flag for backward compatibility testing only. This simplifies the implementation.
- Existing test archives
  - What we know: Current tests create archives without obfuscation.
  - What's unclear: Should existing tests still pass with obfuscation enabled by default?
  - Recommendation: Existing round-trip tests should still pass because they test pack→unpack, and both sides will now use obfuscation. Golden test vectors for crypto primitives are unaffected. Cross-validation tests (Kotlin, Shell) need to be re-run against obfuscated archives.
- Shell `cut` vs substring approach for hex processing
  - What we know: POSIX sh substring syntax (`${var:offset:length}`) is a bashism not available in strict POSIX sh. The current shell decoder uses `printf '%.2s'` and `${var#??}` patterns for string slicing.
  - What's unclear: Is `cut -c` POSIX-compliant for hex byte extraction in the XOR loop?
  - Recommendation: `cut -c` is POSIX-compliant and available in busybox. Use `printf '%s' "$hex" | cut -c$start-$end` for byte extraction. Alternatively, use the existing `${var#??}` pattern in a loop. Test with busybox sh.
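Both extraction approaches can be sanity-checked side by side (a standalone sketch, not decoder code):

```shell
# Sketch: two POSIX-safe ways to take the i-th byte pair from a hex string.
hex="a53c960fe17b4dc8"
i=2 # zero-based byte index; byte 2 of the XOR key is 0x96

# (a) cut -c uses 1-based columns; available in busybox
b1=$(printf '%s' "$hex" | cut -c$((i * 2 + 1))-$((i * 2 + 2)))

# (b) pure parameter expansion: strip i leading byte pairs, keep two chars
rest="$hex"
j=0
while [ "$j" -lt "$i" ]; do
    rest=${rest#??}
    j=$((j + 1))
done
b2=$(printf '%.2s' "$rest")

printf '%s %s\n' "$b1" "$b2"
```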
Sources
Primary (HIGH confidence)
- FORMAT.md Sections 9.1-9.3 and Section 10 -- complete specification of all three obfuscation features, including XOR key, flag bits, decode order, and bootstrapping algorithm
- Existing codebase (src/format.rs, src/crypto.rs, src/archive.rs, kotlin/ArchiveDecoder.kt, shell/decode.sh) -- verified current implementation patterns
Secondary (MEDIUM confidence)
- OpenSSL enc documentation -- confirms `-K`/`-iv`/`-nosalt` raw key mode works with piped/file input for TOC decryption
- Malwarebytes XOR obfuscation -- confirms XOR obfuscation is standard practice for hiding binary structure
- Security Lab entropy analysis -- confirms random padding disrupts entropy-based analysis tools
Tertiary (LOW confidence)
- None -- all findings verified against primary spec and codebase
Metadata
Confidence breakdown:
- Standard stack: HIGH -- no new dependencies, all primitives already in codebase
- Architecture: HIGH -- FORMAT.md fully specifies all three features with byte-level precision
- Pitfalls: HIGH -- identified by analyzing actual code structure and known shell/Kotlin quirks
Research date: 2026-02-25 Valid until: 2026-03-25 (stable -- format spec is frozen for v1)