Phase 5: Shell Decoder - Research
Researched: 2026-02-25
Domain: POSIX/busybox shell scripting, binary format parsing, AES-256-CBC decryption via CLI
Confidence: HIGH
Summary
The shell decoder is a busybox-compatible shell script that extracts files from archives created by the Rust archiver. The script must parse the binary format (header + TOC) using dd and hex conversion tools, decrypt each file with openssl enc -aes-256-cbc, optionally decompress with gunzip, and verify integrity with sha256sum. The format spec (FORMAT.md Section 13) already provides reference functions for most operations.
The main technical challenges are: (1) openssl is NOT a busybox applet -- it requires a separate openssl binary on the target system; (2) xxd was added to busybox in v1.28 (2017) but older versions lack it -- od must serve as fallback; (3) extracting UTF-8 filenames (Cyrillic) from binary data requires careful byte-range extraction with dd piped to raw output; (4) little-endian integer parsing in shell requires byte-swapping via hex string manipulation.
Primary recommendation: Build a single self-contained decode.sh script using the reference functions from FORMAT.md Section 13 as a foundation, with od-based fallbacks for xxd, graceful degradation for HMAC verification if openssl dgst lacks -mac support, and a cross-validation test script modeled after kotlin/test_decoder.sh.
<phase_requirements>
Phase Requirements
| ID | Description | Research Support |
|---|---|---|
| SHL-01 | Shell script dearchivation via busybox (dd, xxd, openssl, gunzip) | FORMAT.md Section 13 provides complete reference functions. dd and gunzip are native busybox applets. xxd available in busybox >=1.28 with od fallback. openssl requires external binary. |
| SHL-02 | openssl enc -aes-256-cbc with -K/-iv/-nosalt for raw key mode | OpenSSL enc supports -K (hex key), -iv (hex IV), -nosalt and auto-removes PKCS7 padding on decryption. Standard across OpenSSL 1.x and 3.x. |
| SHL-03 | Support files with non-ASCII names (Cyrillic) | dd extracts raw bytes which are valid UTF-8. The filename bytes can be written directly to a variable using command substitution. Shell/filesystem handles UTF-8 natively if LANG is set properly. |
</phase_requirements>
Standard Stack
Core
| Tool | Version | Purpose | Why Standard |
|---|---|---|---|
| dd | busybox built-in | Extract byte ranges from archive | Native applet, universal on all busybox builds |
| openssl | 1.1.1+ or 3.x (external) | AES-256-CBC decryption, HMAC-SHA-256 | The only commonly available CLI tool for raw-key AES-CBC decryption |
| gunzip | busybox built-in | Gzip decompression | Native applet, handles standard gzip streams |
| sha256sum | busybox built-in | SHA-256 integrity verification | Native applet, produces standard hash output |
| sh | busybox ash/sh | Script interpreter | POSIX-compatible shell, always available |
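openssl is the only core tool that is not a busybox applet. A startup dependency check therefore makes a missing binary fail loudly at launch instead of partway through extraction. The following is a minimal sketch. `require_tools` is a hypothetical helper name, and the tool list in the usage comment is an assumption drawn from the tables in this section.

```sh
#!/bin/sh
# Sketch: fail fast when a required tool is missing.
# require_tools is a hypothetical helper; pass every command the decoder needs.
require_tools() {
    missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    if [ -n "$missing" ]; then
        echo "ERROR: required tool(s) not found:$missing" >&2
        return 1
    fi
    return 0
}

# Intended use at the top of decode.sh (assumed tool list):
# require_tools dd openssl gunzip sha256sum od awk tr mktemp || exit 1
```

`command -v` is the POSIX way to test for a command, so the check behaves the same in busybox ash and other sh implementations.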
Supporting
| Tool | Version | Purpose | When to Use |
|---|---|---|---|
| xxd | busybox >=1.28 (optional) | Binary-to-hex conversion | Primary hex encoder; added to busybox in 2017 |
| od | busybox built-in | Binary-to-hex conversion (fallback) | Fallback when xxd is unavailable |
| awk | busybox built-in | Text processing (parse openssl output) | Extract hash values from command output |
| tr | busybox built-in | Character deletion/translation | Remove whitespace/newlines from hex output |
| printf | shell built-in | Hex-to-decimal conversion | Convert 0xNN strings to decimal integers |
| mktemp | busybox built-in | Create temporary files | Temporary storage for ciphertext/decrypted data |
Alternatives Considered
| Instead of | Could Use | Tradeoff |
|---|---|---|
| xxd -p for hex | od -A n -t x1 for hex | od is universally available but its output needs more cleanup (spaces, newlines) |
| openssl dgst -mac HMAC for HMAC | Skip HMAC verification | Older/minimal openssl may not support -mac HMAC -macopt; graceful degradation is the spec's recommendation |
| printf '%d' "0x..." for hex-to-dec | $(( 16#... )) bash arithmetic | printf is more portable across sh implementations; the base#number syntax is a bash extension, not POSIX |
| sha256sum for SHA-256 | openssl dgst -sha256 | Both work; sha256sum is a busybox applet (no external dependency) |
Architecture Patterns
Recommended Script Structure
```
shell/
├── decode.sh         # Main decoder script (single file, self-contained)
└── test_decoder.sh   # Cross-validation test script (Rust pack -> Shell decode)
```
The decoder is a SINGLE self-contained script. No external libraries, no sourced files. This matches the project's deployment model: the script is copied alongside the archive to the target device.
Pattern 1: Detect-and-Fallback for xxd/od
What: Auto-detect if xxd is available; if not, define wrapper functions using od.
When to use: At script startup, before any binary parsing.
Example:
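```sh
# Source: FORMAT.md Section 13.1 + 13.2 (od fallback)
# read_hex FILE OFFSET COUNT -> lowercase hex string without separators
if command -v xxd >/dev/null 2>&1; then
    read_hex() {
        dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | xxd -p | tr -d '\n'
    }
else
    read_hex() {
        # -v: print repeated 16-byte lines instead of collapsing them to '*'
        dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null \
            | od -v -A n -t x1 | tr -d ' \n'
    }
fi
```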
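The `-v` flag matters for the od fallback. By default, od replaces a run of identical output lines with a single `*`, so an all-zero IV or a run of identical ciphertext blocks would come back as a truncated hex string. The sketch below (`read_hex_od` is a hypothetical name) forces the od path and reads 32 zero bytes, which is exactly the input that triggers the collapse.

```sh
#!/bin/sh
# Sketch: od-only variant of read_hex. Without -v, od prints '*' in place of
# repeated identical 16-byte lines, which would silently corrupt the hex string.
read_hex_od() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | od -v -A n -t x1 | tr -d ' \n'
}

demo=$(mktemp) || exit 1
trap 'rm -f "$demo"' EXIT
# 32 zero bytes = two identical od output lines, the case that triggers '*'
dd if=/dev/zero of="$demo" bs=1 count=32 2>/dev/null
hex=$(read_hex_od "$demo" 0 32)
echo "$hex"
```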
Pattern 2: Little-Endian Integer Parsing via Hex Byte Swap
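What: Read N bytes as hex, swap byte order, convert to decimal.
When to use: For every u16/u32 field in the header and TOC.
Example (uses the POSIX ${var%pattern} / ${var#pattern} expansions; the bash-style ${var:offset:length} substring form is not POSIX):
```sh
# Source: FORMAT.md Section 13.1 (byte slicing rewritten with POSIX expansions)
read_le_u16() {
    local hex b0 b1
    hex=$(read_hex "$1" "$2" 2)          # e.g. "3412"
    b0=${hex%??}                         # first byte on disk: "34"
    b1=${hex#??}                         # second byte on disk: "12"
    printf '%d' "0x${b1}${b0}"           # 0x1234
}
read_le_u32() {
    local hex rest b0 b1 b2 b3
    hex=$(read_hex "$1" "$2" 4)          # e.g. "78563412"
    b0=${hex%??????}; rest=${hex#??}
    b1=${rest%????};  rest=${rest#??}
    b2=${rest%??};    b3=${rest#??}
    printf '%d' "0x${b3}${b2}${b1}${b0}" # 0x12345678
}
```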
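A worked example helps check the byte order. The bytes `34 12` on disk are the u16 value 0x1234 = 4660, and the bytes `78 56 34 12` are the u32 value 0x12345678 = 305419896. The sketch below does the same swap with sed back-references instead of parameter-expansion slicing. `le_hex_to_dec` is a hypothetical helper that takes a hex string such as the one `read_hex` returns.

```sh
#!/bin/sh
# Sketch: little-endian hex -> decimal using sed back-references for the swap.
le_hex_to_dec() {
    case ${#1} in
        4) swapped=$(printf '%s\n' "$1" | sed 's/\(..\)\(..\)/\2\1/') ;;
        8) swapped=$(printf '%s\n' "$1" | sed 's/\(..\)\(..\)\(..\)\(..\)/\4\3\2\1/') ;;
        *) echo "le_hex_to_dec: expected 4 or 8 hex digits, got '$1'" >&2; return 1 ;;
    esac
    printf '%d\n' "0x$swapped"
}

u16=$(le_hex_to_dec 3412)      # bytes 34 12       -> 0x1234
u32=$(le_hex_to_dec 78563412)  # bytes 78 56 34 12 -> 0x12345678
echo "$u16 $u32"               # prints: 4660 305419896
```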
Pattern 3: Sequential TOC Parsing with Running Offset
What: Parse variable-length TOC entries using a running byte offset.
When to use: When reading the file table (TOC), where each entry has a variable-length filename.
Example:
```sh
# Start at toc_offset; pos advances past each field as it is read
pos=$toc_offset
for i in $(seq 0 $((file_count - 1))); do
    name_length=$(read_le_u16 "$ARCHIVE" "$pos")
    pos=$((pos + 2))
    # Extract filename (raw UTF-8 bytes via dd)
    filename=$(dd if="$ARCHIVE" bs=1 skip="$pos" count="$name_length" 2>/dev/null)
    pos=$((pos + name_length))
    original_size=$(read_le_u32 "$ARCHIVE" "$pos");   pos=$((pos + 4))
    compressed_size=$(read_le_u32 "$ARCHIVE" "$pos"); pos=$((pos + 4))
    encrypted_size=$(read_le_u32 "$ARCHIVE" "$pos");  pos=$((pos + 4))
    data_offset=$(read_le_u32 "$ARCHIVE" "$pos");     pos=$((pos + 4))
    iv_toc_offset=$pos                                # offset of the raw IV, reused for HMAC input
    iv_hex=$(read_hex "$ARCHIVE" "$pos" 16);          pos=$((pos + 16))
    hmac_hex=$(read_hex "$ARCHIVE" "$pos" 32);        pos=$((pos + 32))
    sha256_hex=$(read_hex "$ARCHIVE" "$pos" 32);      pos=$((pos + 32))
    compression_flag=$(read_hex "$ARCHIVE" "$pos" 1); pos=$((pos + 1))   # "00" or "01"
    padding_after=$(read_le_u16 "$ARCHIVE" "$pos");   pos=$((pos + 2))
    # Process this file entry...
done
```
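Every field after the name has a fixed width, so each TOC entry occupies 101 + name_length bytes. Here name_length counts UTF-8 bytes, not characters. The sketch below makes that arithmetic explicit, and it could serve as a sanity check on the running offset. The field widths are summed from the reads in the loop above, which are assumed to match FORMAT.md Section 5. `TOC_FIXED_BYTES` and `toc_entry_size` are hypothetical names.

```sh
#!/bin/sh
# Sketch: per-entry TOC size, summed from the field widths read in the loop:
# name_length(2) + original/compressed/encrypted size + data_offset (4 x 4)
# + IV(16) + HMAC(32) + SHA-256(32) + compression_flag(1) + padding_after(2)
TOC_FIXED_BYTES=$((2 + 4 * 4 + 16 + 32 + 32 + 1 + 2))   # 101

toc_entry_size() {   # $1 = name_length in BYTES (not characters)
    echo $((TOC_FIXED_BYTES + $1))
}

# Cyrillic letters are 2 bytes each in UTF-8, so measure the name with wc -c
name='отчет.txt'                                   # hypothetical filename
name_bytes=$(( $(printf '%s' "$name" | wc -c) ))   # 14
entry=$(toc_entry_size "$name_bytes")              # 115
echo "name: $name_bytes bytes, entry: $entry bytes"
```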
Pattern 4: Pipe-Based Decryption (dd | openssl)
What: Extract ciphertext with dd and pipe directly to openssl enc -d for decryption.
When to use: For each file's data block decryption.
Example:
```sh
# Source: FORMAT.md Section 13.4
dd if="$ARCHIVE" bs=1 skip="$data_offset" count="$encrypted_size" 2>/dev/null \
    | openssl enc -d -aes-256-cbc -nosalt -K "$KEY_HEX" -iv "$iv_hex" \
    > "$tmpfile"
```
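The following sketch checks the pipe form end to end without a real archive. It encrypts a short plaintext with the same raw-key options, prepends 4 filler bytes to mimic a non-zero data_offset, and decrypts the result through `dd | openssl enc -d`. The IV and filler bytes are hypothetical demo values. When no openssl binary is present, the script reports `skipped` instead of failing.

```sh
#!/bin/sh
# Sketch: round-trip check of the dd | openssl enc -d pattern.
KEY_HEX="7a35c1d94fe82b6a910df358bc74a61e428fd063e5179b2cfa8406cd3e79b550"
IV_HEX="000102030405060708090a0b0c0d0e0f"        # hypothetical demo IV
result=skipped
if command -v openssl >/dev/null 2>&1; then
    work=$(mktemp -d) || exit 1
    trap 'rm -rf "$work"' EXIT
    printf 'hello, archive\n' > "$work/plain.txt"  # 15 bytes
    printf 'HDR!' > "$work/archive.bin"            # filler at offsets 0..3
    openssl enc -e -aes-256-cbc -nosalt -K "$KEY_HEX" -iv "$IV_HEX" \
        -in "$work/plain.txt" >> "$work/archive.bin"
    data_offset=4
    # PKCS7 pads 15 bytes up to one 16-byte block
    encrypted_size=$(( $(wc -c < "$work/archive.bin") - data_offset ))
    dd if="$work/archive.bin" bs=1 skip="$data_offset" count="$encrypted_size" 2>/dev/null \
        | openssl enc -d -aes-256-cbc -nosalt -K "$KEY_HEX" -iv "$IV_HEX" \
        > "$work/out.txt"
    if cmp -s "$work/plain.txt" "$work/out.txt"; then result=ok; else result=mismatch; fi
fi
echo "round trip: $result"
```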
Pattern 5: Graceful HMAC Degradation
What: Detect if openssl dgst -mac HMAC is supported; skip HMAC if not.
When to use: Before the file extraction loop.
Example:
```sh
# Source: FORMAT.md Section 13.3
# printf instead of echo -n: echo options are not portable across shells
SKIP_HMAC=0
if ! printf 'test' | openssl dgst -sha256 -mac HMAC -macopt hexkey:00 >/dev/null 2>&1; then
    echo "WARNING: openssl HMAC not available, skipping HMAC verification" >&2
    SKIP_HMAC=1
fi
```
Anti-Patterns to Avoid
- Using bash-specific syntax: The script must run in busybox ash/sh. No `[[ ]]`, no `$((16#FF))`, no arrays, no process substitution `<()`. Use `[ ]`, `printf '%d' "0x..."`, positional parameters or temp files.
- Reading entire archive into memory: Shell cannot handle binary data in variables. Always use `dd` to extract specific byte ranges to files or pipes.
- Using the `-e` flag with echo for binary: Portability issues across shells. Use `printf` or `dd` instead.
- Storing binary data in shell variables: NULL bytes (`\0`) terminate strings in shell. Only store hex strings in variables, never raw binary.
- Hardcoding `/tmp`: Use `mktemp` for temporary files. Clean up with a trap.
- Using `xxd -e` (little-endian mode): Not supported in busybox xxd. Manual byte swapping is required.
Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---|---|---|---|
| AES-256-CBC decryption | Custom decryption in shell | openssl enc -d -aes-256-cbc | Impractical to implement AES in pure shell; openssl handles PKCS7 removal automatically |
| Gzip decompression | Custom DEFLATE in shell | gunzip -c | Compression algorithms are impractical to implement in shell |
| SHA-256 hashing | Custom hash in shell | sha256sum (busybox) | Cryptographic hash requires proper implementation |
| HMAC-SHA-256 | Custom HMAC in shell | openssl dgst -sha256 -mac HMAC | HMAC construction is subtle; openssl handles it correctly |
| Hex-to-binary conversion | Manual byte construction | xxd -r -p, or printf with octal escapes (\NNN) | Direct hex-to-binary tools already exist; printf '\xNN' is not POSIX |
Key insight: The shell decoder is fundamentally a "glue script" -- it orchestrates existing tools (dd, openssl, gunzip, sha256sum) to implement the decode pipeline. All cryptographic and compression operations are delegated to dedicated tools; the script only handles binary format parsing (offsets, lengths, byte swapping).
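If `xxd -r` is unavailable, POSIX printf can still turn a hex string back into bytes: convert each hex pair to a three-digit octal escape and print it. This is a sketch, and `hex_to_bin` is a hypothetical name. It runs one printf per byte, so it suits short values such as a 16-byte IV, not bulk data.

```sh
#!/bin/sh
# Sketch: hex string -> raw bytes using only POSIX printf (no xxd -r).
hex_to_bin() {
    if [ $(( ${#1} % 2 )) -ne 0 ]; then
        echo "hex_to_bin: odd number of hex digits" >&2
        return 1
    fi
    _rest=$1
    while [ -n "$_rest" ]; do
        _pair=${_rest%"${_rest#??}"}     # first two hex digits
        _rest=${_rest#??}                # drop them
        # '%03o' turns 0xNN into a 3-digit octal number; "\\NNN" then emits that byte
        printf "\\$(printf '%03o' "0x$_pair")"
    done
}

hex_to_bin 48656c6c6f; echo   # prints: Hello
```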
Common Pitfalls
Pitfall 1: openssl enc Output Contains Extra Bytes
What goes wrong: When using openssl enc -d with piped input from dd, the openssl command may behave differently depending on whether input comes from a file or a pipe. Some versions have issues with incomplete reads on pipes.
Why it happens: Pipe buffering and EOF handling can differ from file I/O.
How to avoid: Always extract ciphertext to a temp file first, then decrypt from the file. Alternatively, keep the dd ... | openssl enc -d ... pipe but verify that the output size matches compressed_size (or original_size if uncompressed).
Warning signs: Decrypted output size doesn't match expected compressed_size.
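The size check can be sketched as follows. `file_size` and `check_size` are hypothetical names. In the decoder, the expected value would be compressed_size when compression_flag is `01` and original_size otherwise.

```sh
#!/bin/sh
# Sketch: compare a decrypted file's size with the size recorded in the TOC.
file_size() {
    # Arithmetic expansion strips any padding spaces some wc builds print
    echo $(( $(wc -c < "$1") ))
}

check_size() {   # $1 = file, $2 = expected byte count
    actual=$(file_size "$1")
    if [ "$actual" -ne "$2" ]; then
        echo "size mismatch for $1: got $actual, expected $2" >&2
        return 1
    fi
    return 0
}

f=$(mktemp) || exit 1
trap 'rm -f "$f"' EXIT
printf 'abcde' > "$f"
check_size "$f" 5 && echo "size ok"   # prints: size ok
```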
Pitfall 2: Hex String Case Mismatch in HMAC Comparison
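What goes wrong: openssl dgst prints lowercase hex, and so do xxd -p and od -t x1 by default. The comparison is a plain string match, however, so any step that emits uppercase (for example xxd -u) makes it fail even though the bytes agree.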
Why it happens: Different tools use different case conventions for hex output.
How to avoid: Normalize both sides to lowercase before comparison: echo "$hex" | tr 'A-F' 'a-f'.
Warning signs: HMAC verification always fails despite correct data.
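A small normalizing comparison avoids depending on any one tool's case convention. The sketch below also strips the label that openssl dgst prints before the digest. The exact label text varies between OpenSSL versions, so the sketch keeps only the last whitespace-separated field. `to_lower_hex`, `hex_eq` and `extract_digest` are hypothetical names.

```sh
#!/bin/sh
# Sketch: case-insensitive comparison of two hex digests.
to_lower_hex() {
    printf '%s' "$1" | tr 'A-F' 'a-f'
}

hex_eq() {
    [ "$(to_lower_hex "$1")" = "$(to_lower_hex "$2")" ]
}

# openssl dgst output looks like "<label>(stdin)= <hex>"; keep the last field
extract_digest() {
    awk '{print $NF}'
}

if hex_eq "ABCDEF01" "abcdef01"; then echo "match"; fi   # prints: match
```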
Pitfall 3: Empty File (0 bytes) Causes gunzip Error
What goes wrong: A 0-byte original file stored with compression_flag=1 still decrypts to a gzip stream, just a very small one. These minimum-size blocks are an easy edge case to mishandle.
Why it happens: Gzip of empty input produces a minimal stream (about 20 bytes when no filename is stored in the header). PKCS7 pads those 20 bytes up to the next 16-byte boundary, so encrypted_size = 32. The decrypted output is a valid gzip stream that decompresses to 0 bytes. An uncompressed 0-byte file encrypts to a single 16-byte block consisting entirely of padding.
How to avoid: Check original_size == 0 before decompression; if zero, just create an empty file. Alternatively, let gunzip handle it (it should produce empty output from a valid empty gzip stream).
Warning signs: Script crashes or produces garbage for 0-byte files.
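The expected encrypted_size for any plaintext length follows directly from PKCS7. This makes a handy cross-check when a small file fails: padding is always 1 to 16 bytes, so the ciphertext length is the next multiple of 16 strictly above the plaintext length. `pkcs7_len` is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: AES-CBC with PKCS7 always adds 1..16 padding bytes, so the
# ciphertext length is the next multiple of 16 strictly greater than n.
pkcs7_len() {
    echo $(( ($1 / 16 + 1) * 16 ))
}

pkcs7_len 0    # 16: empty plaintext -> one full block of padding
pkcs7_len 20   # 32: ~20-byte gzip stream of an empty file -> 12 bytes of padding
pkcs7_len 32   # 48: block-aligned input gains a full extra block
```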
Pitfall 4: Cyrillic Filename Extraction Corrupts Characters
What goes wrong: The dd-extracted filename bytes are valid UTF-8, but the shell variable may be corrupted if the locale is not set or if intermediate processing strips high bytes.
Why it happens: Some busybox builds strip non-ASCII bytes, or the LANG/LC_ALL environment may not support UTF-8.
How to avoid: Put export LC_ALL=C (or C.UTF-8 if available) at the top of the script so tools treat the name as plain bytes. Use dd to extract raw bytes directly. Do NOT process the filename through tr, sed, or awk before writing. Verify that printf '%s' preserves the bytes.
Warning signs: Extracted files have garbled names (mojibake) or are named with ? characters.
Pitfall 5: Shell Arithmetic Overflow on Large Files
What goes wrong: Shell arithmetic uses the shell's native integer type. If that type is signed 32-bit, values above 2^31 - 1 (about 2 GB) overflow.
Why it happens: The archive format uses u32 for sizes and offsets (max ~4 GB), which exceeds the signed 32-bit range.
How to avoid: Busybox ash uses 64-bit arithmetic when built with CONFIG_FEATURE_SH_MATH_64 (enabled by default), so the risk is limited to builds that disable it. Even on such builds, only archives larger than 2 GB are affected, so this is LOW risk for v1. If needed, use awk for arithmetic on large numbers.
Warning signs: Negative offsets or sizes in the script output for files larger than 2 GB.
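A startup probe can detect the limited case before any offsets are computed. In the sketch below, the probe runs in a subshell so that a possible arithmetic error cannot abort the script. The awk fallback relies on awk numbers being doubles, which represent integers exactly up to 2^53. `shell_math_ok` and `add_big` are hypothetical names.

```sh
#!/bin/sh
# Sketch: detect whether shell arithmetic can hold u32 values above 2^31 - 1.
# The subshell contains any arithmetic error instead of aborting the script.
shell_math_ok() {
    ( [ "$((4294967295 + 1))" = "4294967296" ] ) 2>/dev/null
}

# Fallback: awk numbers are doubles, exact for integers up to 2^53
add_big() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", a + b }'
}

if shell_math_ok; then
    echo "64-bit shell arithmetic"
else
    echo "limited shell arithmetic: compute offsets with add_big" >&2
fi
```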
Pitfall 6: dd stderr Noise Pollutes Output
What goes wrong: dd writes transfer statistics to stderr (e.g., "32 bytes transferred"). If stderr is not suppressed, it may confuse piped commands or pollute user output.
Why it happens: dd always writes stats to stderr unless suppressed.
How to avoid: Always use 2>/dev/null with dd commands. This is already shown in FORMAT.md Section 13 reference functions.
Warning signs: Unexpected text mixed into hex output or filenames.
Pitfall 7: openssl 3.x Changes HMAC Syntax
What goes wrong: The -mac HMAC -macopt hexkey:KEY syntax works in OpenSSL 1.x and 3.x but is soft-deprecated in 3.x. The new openssl mac subcommand is preferred but has different syntax.
Why it happens: OpenSSL 3.x migrated to provider-based architecture; legacy options still work but may be removed.
How to avoid: Implement graceful degradation (already specified in FORMAT.md Section 13.3). At script startup, test with printf 'test' | openssl dgst -sha256 -mac HMAC -macopt hexkey:00 (printf rather than echo -n, whose behavior varies across shells). If it fails, set SKIP_HMAC=1.
Warning signs: HMAC check produces error messages instead of hash values.
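A startup probe could try the legacy syntax first and the OpenSSL 3.x `openssl mac` subcommand second, and fall back to skipping HMAC only if both fail. This is a sketch that assumes OpenSSL 3.x `mac` syntax (`-digest`, `-macopt`, MAC name last) and that `mac` reads standard input when `-in` is omitted. Its output may be uppercase hex, so it should be lowercased before comparison. `detect_hmac_mode` is a hypothetical name.

```sh
#!/bin/sh
# Sketch: choose an HMAC command at startup. Prints "dgst", "mac" or "none".
detect_hmac_mode() {
    if printf 'test' | openssl dgst -sha256 -mac HMAC -macopt hexkey:00 >/dev/null 2>&1; then
        echo dgst      # syntax accepted by OpenSSL 1.x and 3.x
    elif printf 'test' | openssl mac -digest SHA256 -macopt hexkey:00 HMAC >/dev/null 2>&1; then
        echo mac       # OpenSSL 3.x subcommand (assumed syntax)
    else
        echo none      # no usable HMAC support, or no openssl at all
    fi
}

HMAC_MODE=$(detect_hmac_mode)
if [ "$HMAC_MODE" = none ]; then
    SKIP_HMAC=1
else
    SKIP_HMAC=0
fi
echo "HMAC mode: $HMAC_MODE"
```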
Code Examples
Complete Decode Pipeline for One File
```sh
# Verified pattern from FORMAT.md Section 13.4 + project decisions.
# Steps 1-6 run inside the per-file TOC loop; "continue" skips to the next entry.
# Assumes ARCHIVE, OUTPUT_DIR, SKIP_HMAC and the TOC variables are already set.
KEY_HEX="7a35c1d94fe82b6a910df358bc74a61e428fd063e5179b2cfa8406cd3e79b550"
WORKDIR=$(mktemp -d) || exit 1     # once, before the loop (not named TMPDIR, which mktemp reads)
trap 'rm -rf "$WORKDIR"' EXIT

# Step 1: Extract ciphertext to temp file
dd if="$ARCHIVE" bs=1 skip="$data_offset" count="$encrypted_size" \
    of="$WORKDIR/ct.bin" 2>/dev/null

# Step 2: Verify HMAC over raw IV || ciphertext (if available)
if [ "$SKIP_HMAC" = "0" ]; then
    computed_hmac=$(
        {
            # raw IV bytes; iv_toc_offset is the IV's position in the TOC entry
            dd if="$ARCHIVE" bs=1 skip="$iv_toc_offset" count=16 2>/dev/null
            cat "$WORKDIR/ct.bin"                               # ciphertext
        } | openssl dgst -sha256 -mac HMAC -macopt "hexkey:${KEY_HEX}" -hex 2>/dev/null \
          | awk '{print $NF}' | tr 'A-F' 'a-f'
    )
    if [ "$computed_hmac" != "$hmac_hex" ]; then
        echo "HMAC failed for $filename, skipping" >&2
        continue
    fi
fi

# Step 3: Decrypt (openssl auto-removes PKCS7 padding and fails on bad padding)
if ! openssl enc -d -aes-256-cbc -nosalt \
        -K "$KEY_HEX" -iv "$iv_hex" \
        -in "$WORKDIR/ct.bin" -out "$WORKDIR/dec.bin"; then
    echo "Decryption failed for $filename, skipping" >&2
    continue
fi

# Step 4: Decompress if needed ("01" as returned by read_hex)
if [ "$compression_flag" = "01" ]; then
    if ! gunzip -c "$WORKDIR/dec.bin" > "$WORKDIR/out.bin"; then
        echo "Decompression failed for $filename, skipping" >&2
        continue
    fi
else
    mv "$WORKDIR/dec.bin" "$WORKDIR/out.bin"
fi

# Step 5: Verify SHA-256 of the final plaintext
actual_sha=$(sha256sum "$WORKDIR/out.bin" | awk '{print $1}')
if [ "$actual_sha" != "$sha256_hex" ]; then
    echo "WARNING: SHA-256 mismatch for $filename" >&2
fi

# Step 6: Write output
mv "$WORKDIR/out.bin" "$OUTPUT_DIR/$filename"
```
Key Hex Constant (from src/key.rs)
```sh
# Hardcoded 32-byte AES-256 key as hex string (matching src/key.rs)
KEY_HEX="7a35c1d94fe82b6a910df358bc74a61e428fd063e5179b2cfa8406cd3e79b550"
```
Derivation from src/key.rs:
```
0x7A 0x35 0xC1 0xD9 0x4F 0xE8 0x2B 0x6A
0x91 0x0D 0xF3 0x58 0xBC 0x74 0xA6 0x1E
0x42 0x8F 0xD0 0x63 0xE5 0x17 0x9B 0x2C
0xFA 0x84 0x06 0xCD 0x3E 0x79 0xB5 0x50
```
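A single transcription slip in the 64-character constant would make every file fail HMAC or decryption. It can therefore be worth regenerating KEY_HEX mechanically from the byte list. In this sketch, `KEY_BYTES` is the listing above pasted verbatim (in practice it would be taken from src/key.rs), and `printf '%02x'` normalizes each 0xNN token to two lowercase hex digits.

```sh
#!/bin/sh
# Sketch: rebuild KEY_HEX from the 0xNN byte listing and compare with the constant.
KEY_HEX="7a35c1d94fe82b6a910df358bc74a61e428fd063e5179b2cfa8406cd3e79b550"
KEY_BYTES='0x7A 0x35 0xC1 0xD9 0x4F 0xE8 0x2B 0x6A
0x91 0x0D 0xF3 0x58 0xBC 0x74 0xA6 0x1E
0x42 0x8F 0xD0 0x63 0xE5 0x17 0x9B 0x2C
0xFA 0x84 0x06 0xCD 0x3E 0x79 0xB5 0x50'

# Unquoted expansion splits on spaces and newlines: one token per byte
derived=$(for b in $KEY_BYTES; do printf '%02x' "$b"; done)

if [ "$derived" = "$KEY_HEX" ]; then
    echo "KEY_HEX matches the byte listing"
else
    echo "KEY_HEX mismatch: $derived" >&2
fi
```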
HMAC Verification with IV from Archive (Not from TOC-parsed Variable)
A subtle point: the HMAC is computed over the raw bytes IV || ciphertext, not over hex strings. The IV must therefore reach openssl as raw bytes, not as the iv_hex variable. The simplest way is to read those 16 bytes straight from the archive with dd and concatenate them with the ciphertext in a brace group, { dd ...; dd ...; }, whose combined output is piped to openssl (as shown in FORMAT.md Section 13.3).
However, for the HMAC comparison, we compare hex strings (both from openssl dgst output and from the TOC hex extraction). Both must be lowercase.
UTF-8 Filename Extraction
```sh
# dd extracts raw bytes; if they are valid UTF-8, the shell preserves them
filename=$(dd if="$ARCHIVE" bs=1 skip="$pos" count="$name_length" 2>/dev/null)
# $filename now contains the UTF-8 string, including Cyrillic characters
# Works because: (1) dd copies raw bytes, (2) $() captures them, (3) no null bytes in UTF-8 filenames
# Caveat: $() strips trailing newlines, which a valid filename should not contain anyway
```
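Pitfall 4 recommends verifying that the bytes survive. One way is to hex-dump the variable and compare it with a hex dump of the same byte range read straight from the file. The sketch below uses a throwaway file that holds a hypothetical Cyrillic name in place of a real TOC. `to_hex` is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: check that a filename captured with $(dd ...) keeps its exact bytes.
to_hex() { od -v -A n -t x1 | tr -d ' \n'; }

f=$(mktemp) || exit 1
trap 'rm -f "$f"' EXIT
printf '%s' 'Документ.txt' > "$f"           # hypothetical name: 8 Cyrillic letters x 2 bytes + 4 ASCII
name_length=$(( $(wc -c < "$f") ))          # 20 bytes, not 12 characters

filename=$(dd if="$f" bs=1 skip=0 count="$name_length" 2>/dev/null)
var_hex=$(printf '%s' "$filename" | to_hex)
raw_hex=$(dd if="$f" bs=1 skip=0 count="$name_length" 2>/dev/null | to_hex)

if [ "$var_hex" = "$raw_hex" ]; then echo "bytes preserved"; else echo "bytes altered" >&2; fi
```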
State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|---|---|---|---|
| openssl dgst -mac HMAC -macopt | openssl mac -digest SHA256 -macopt hexkey:... HMAC | OpenSSL 3.0 (2021) | Old syntax still works in 3.x but soft-deprecated |
| xxd not in busybox | xxd applet in busybox | BusyBox 1.28 (2017) | Available on newer builds, but od fallback still needed for older systems |
| openssl enc with -md md5 default | -md sha256 default | OpenSSL 1.1.0 (2016) | No impact for raw key mode (-K/-iv); -md only affects password-derived keys |
Deprecated/outdated:
- `openssl dgst -hmac "key"` (string key): Still works, but `-macopt hexkey:` is required for binary keys. The hex key mode is NOT deprecated.
- busybox builds without xxd: Still common on very old/minimal systems, hence the `od` fallback is essential.
Open Questions
- Does the target system have `openssl`?
  - What we know: `openssl` is NOT a busybox applet. It must be a separate binary on the target system. The project mentions "busybox-compatible" in requirements.
  - What's unclear: Whether the specific target device (Android-based car head unit) has the `openssl` CLI installed.
  - Recommendation: The script MUST fail with a clear error message if `openssl` is not found, and this should be documented as a prerequisite. The startup tool-availability check covers this.
- Does the target `openssl` support `-mac HMAC -macopt hexkey:`?
  - What we know: Standard OpenSSL 1.1.1+ and 3.x support this syntax. Busybox does not include openssl. Minimal/embedded openssl builds may lack HMAC support.
  - What's unclear: Exact openssl version on target.
  - Recommendation: Implement graceful degradation per FORMAT.md Section 13.3. Skip HMAC if unsupported, print warning.
- Performance on large files with `dd bs=1`?
  - What we know: `dd bs=1` performs one read and one write per byte. For extracting large data blocks (megabytes), this is very slow.
  - What's unclear: Whether the shell decoder needs to handle large files efficiently.
  - Recommendation: Keep `dd bs=1` for the small header and TOC fields. For data blocks, `bs=1` stays slow however large `count` is, because the block size, not the count, sets the I/O unit. Faster options are `tail -c +$((offset + 1)) "$ARCHIVE" | head -c "$size"` (head -c is not POSIX but is provided by busybox and GNU coreutils), or `dd bs=4096` with calculated skip/count plus byte-wise handling of the unaligned edges. The added complexity may not be worth it for a fallback decoder.
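The following sketch shows the `tail | head` alternative for data blocks and checks it against the byte-at-a-time dd form. `extract_range` is a hypothetical name. The sketch assumes a `head` that supports `-c`.

```sh
#!/bin/sh
# Sketch: copy LENGTH bytes starting at OFFSET without dd bs=1.
# tail -c +N counts from 1, hence OFFSET + 1.
extract_range() {   # $1 = file, $2 = offset, $3 = length
    tail -c +"$(( $2 + 1 ))" "$1" | head -c "$3"
}

f=$(mktemp) || exit 1
trap 'rm -f "$f"' EXIT
printf 'abcdefghij' > "$f"

fast=$(extract_range "$f" 3 4)
slow=$(dd if="$f" bs=1 skip=3 count=4 2>/dev/null)
echo "$fast $slow"   # prints: defg defg
```

In decode.sh, the output would be redirected to the ciphertext temp file, for example `extract_range "$ARCHIVE" "$data_offset" "$encrypted_size" > "$WORKDIR/ct.bin"`.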
Sources
Primary (HIGH confidence)
- FORMAT.md Section 13 (Shell Decoder Reference) - Complete reference functions for all operations
- FORMAT.md Section 10 (Decode Order of Operations) - Mandatory decode pipeline
- FORMAT.md Section 4-5 (Header/TOC structure) - Binary layout
- `src/key.rs` - Actual hardcoded key bytes
- `src/format.rs` - Rust implementation of header/TOC parsing (reference)
- `src/crypto.rs` - Rust crypto implementation (HMAC scope, encrypt/decrypt)
- `kotlin/ArchiveDecoder.kt` - Working decoder implementation (behavioral reference)
- `kotlin/test_decoder.sh` - Cross-validation test pattern (structural reference)
Secondary (MEDIUM confidence)
- OpenSSL enc documentation (3.3) - `-K`, `-iv`, `-nosalt`, PKCS7 auto-removal
- OpenSSL dgst documentation (3.3) - `-mac HMAC -macopt hexkey:` syntax
- BusyBox xxd commit (2017) - xxd applet added to busybox
- BusyBox applet list - dd, gunzip, sha256sum, od are native applets; openssl is not, and xxd exists only in newer (>=1.28) builds
- BusyBox xxd options - Supported flags: -p, -r, -l, -s, -g, -c, -u
- docker-library/busybox#13 - UTF-8 support limitations in busybox
Tertiary (LOW confidence)
- Various shell scripting guides on binary file handling - General patterns, not project-specific
Metadata
Confidence breakdown:
- Standard stack: HIGH - All tools are well-documented CLI utilities with decades of stability. FORMAT.md Section 13 provides verified reference code.
- Architecture: HIGH - Single-file script pattern is proven by the existing `kotlin/test_decoder.sh`. Binary parsing pattern with dd+xxd/od is well-established.
- Pitfalls: HIGH - Identified from real tool behavior (openssl pipe handling, hex case, PKCS7, busybox limitations). FORMAT.md already anticipates several pitfalls (graceful HMAC degradation, od fallback).
Research date: 2026-02-25
Valid until: 2026-03-25 (stable domain -- shell tools and openssl CLI interface change very slowly)