docs(phase5): research shell decoder domain
.planning/phases/05-shell-decoder/05-RESEARCH.md
@@ -0,0 +1,385 @@
# Phase 5: Shell Decoder - Research

**Researched:** 2026-02-25
**Domain:** POSIX/busybox shell scripting, binary format parsing, AES-256-CBC decryption via CLI
**Confidence:** HIGH

## Summary

The shell decoder is a busybox-compatible shell script that extracts files from archives created by the Rust archiver. The script must parse the binary format (header + TOC) using `dd` and hex conversion tools, decrypt each file with `openssl enc -aes-256-cbc`, optionally decompress with `gunzip`, and verify integrity with `sha256sum`. The format spec (FORMAT.md Section 13) already provides reference functions for most operations.

The main technical challenges are: (1) `openssl` is NOT a busybox applet -- it requires a separate `openssl` binary on the target system; (2) `xxd` was added to busybox in v1.28 (2017) but older versions lack it -- `od` must serve as fallback; (3) extracting UTF-8 filenames (Cyrillic) from binary data requires copying the exact byte range with `dd` and capturing the raw bytes without any further text processing; (4) little-endian integer parsing in shell requires byte-swapping via hex string manipulation.

**Primary recommendation:** Build a single self-contained `decode.sh` script using the reference functions from FORMAT.md Section 13 as a foundation, with `od`-based fallbacks for `xxd`, graceful degradation for HMAC verification if `openssl dgst` lacks `-mac` support, and a cross-validation test script modeled after `kotlin/test_decoder.sh`.

<phase_requirements>
## Phase Requirements

| ID | Description | Research Support |
|----|-------------|-----------------|
| SHL-01 | Shell script archive extraction via busybox (dd, xxd, openssl, gunzip) | FORMAT.md Section 13 provides complete reference functions. `dd` and `gunzip` are native busybox applets. `xxd` available in busybox >=1.28 with `od` fallback. `openssl` requires external binary. |
| SHL-02 | openssl enc -aes-256-cbc with -K/-iv/-nosalt for raw key mode | OpenSSL `enc` supports `-K` (hex key), `-iv` (hex IV), `-nosalt` and auto-removes PKCS7 padding on decryption. Standard across OpenSSL 1.x and 3.x. |
| SHL-03 | Support files with non-ASCII names (Cyrillic) | `dd` extracts raw bytes which are valid UTF-8. The filename bytes can be written directly to a variable using command substitution. Shell/filesystem handles UTF-8 natively if `LANG` is set properly. |
</phase_requirements>

## Standard Stack

### Core

| Tool | Version | Purpose | Why Standard |
|------|---------|---------|--------------|
| `dd` | busybox built-in | Extract byte ranges from archive | Native applet, universal on all busybox builds |
| `openssl` | 1.1.1+ or 3.x (external) | AES-256-CBC decryption, HMAC-SHA-256 | The only CLI tool in this stack that supports raw-key AES-CBC decryption |
| `gunzip` | busybox built-in | Gzip decompression | Native applet, handles standard gzip streams |
| `sha256sum` | busybox built-in | SHA-256 integrity verification | Native applet, produces standard hash output |
| `sh` | busybox ash/sh | Script interpreter | POSIX-compatible shell, always available |

### Supporting

| Tool | Version | Purpose | When to Use |
|------|---------|---------|-------------|
| `xxd` | busybox >=1.28 (optional) | Binary-to-hex conversion | Primary hex encoder; added to busybox in 2017 |
| `od` | busybox built-in | Binary-to-hex conversion (fallback) | Fallback when `xxd` is unavailable |
| `awk` | busybox built-in | Text processing (parse openssl output) | Extract hash values from command output |
| `tr` | busybox built-in | Character deletion/translation | Remove whitespace/newlines from hex output |
| `printf` | shell built-in | Hex-to-decimal conversion | Convert `0xNN` strings to decimal integers |
| `mktemp` | busybox built-in | Create temporary files | Temporary storage for ciphertext/decrypted data |

### Alternatives Considered

| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| `xxd -p` for hex | `od -A n -t x1` for hex | `od` is universally available but output needs more cleanup (spaces, newlines) |
| `openssl dgst -mac HMAC` for HMAC | Skip HMAC verification | Older/minimal openssl may not support `-mac HMAC -macopt`; graceful degradation is the spec's recommendation |
| `printf '%d' "0x..."` for hex-to-dec | `$(( 16#... ))` bash arithmetic | `printf` is more portable across sh implementations; bash arithmetic is not POSIX |
| `sha256sum` for SHA-256 | `openssl dgst -sha256` | Both work; `sha256sum` is a busybox applet (no external dependency) |

## Architecture Patterns

### Recommended Script Structure

```
shell/
├── decode.sh          # Main decoder script (single file, self-contained)
└── test_decoder.sh    # Cross-validation test script (Rust pack -> Shell decode)
```

The decoder is a SINGLE self-contained script. No external libraries, no sourced files. This matches the project's deployment model: the script is copied alongside the archive to the target device.

### Pattern 1: Detect-and-Fallback for xxd/od

**What:** Auto-detect if `xxd` is available; if not, define wrapper functions using `od`.
**When to use:** At script startup, before any binary parsing.
**Example:**

```sh
# Source: FORMAT.md Section 13.1 + 13.2 (od fallback)
if command -v xxd >/dev/null 2>&1; then
    read_hex() {
        dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | xxd -p | tr -d '\n'
    }
else
    read_hex() {
        dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null \
            | od -A n -t x1 | tr -d ' \n'
    }
fi
```

### Pattern 2: Little-Endian Integer Parsing via Hex Byte Swap

**What:** Read N bytes as hex, swap byte order, convert to decimal.
**When to use:** For every u16/u32 field in the header and TOC.
**Example:**

```sh
# Source: FORMAT.md Section 13.1
# Note: `local` and ${var:pos:len} are ash/bash extensions, not POSIX sh.
read_le_u16() {
    local hex=$(read_hex "$1" "$2" 2)
    local b0=${hex:0:2} b1=${hex:2:2}
    # Bytes are stored least significant first, so emit them in reverse order.
    printf '%d' "0x${b1}${b0}"
}

read_le_u32() {
    local hex=$(read_hex "$1" "$2" 4)
    local b0=${hex:0:2} b1=${hex:2:2} b2=${hex:4:2} b3=${hex:6:2}
    printf '%d' "0x${b3}${b2}${b1}${b0}"
}
```

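If the target shell turns out to lack substring expansion, the same byte swap can be done with `cut`. The sketch below is an assumption-laden variant, not the FORMAT.md reference code: the names `hex_of` and `le_u32` and the `le_`-prefixed variables are hypothetical, and it uses only `od` for hex conversion.

```sh
# Hypothetical POSIX-only variants of the helpers above (names are assumptions).
# hex_of FILE OFFSET COUNT: print COUNT bytes starting at OFFSET as lowercase hex.
hex_of() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null \
        | od -A n -t x1 | tr -d ' \n'
}

# le_u32 FILE OFFSET: decode a little-endian u32 without `local` or ${var:pos:len}.
le_u32() {
    le_hex=$(hex_of "$1" "$2" 4)
    # cut -c selects characters by 1-based position, two hex digits per byte.
    le_b0=$(printf '%s' "$le_hex" | cut -c1-2)
    le_b1=$(printf '%s' "$le_hex" | cut -c3-4)
    le_b2=$(printf '%s' "$le_hex" | cut -c5-6)
    le_b3=$(printf '%s' "$le_hex" | cut -c7-8)
    # Reassemble most significant byte first and convert to decimal.
    printf '%d' "0x${le_b3}${le_b2}${le_b1}${le_b0}"
}
```

The trade-off is four extra `cut` processes per field, which is slower but avoids depending on how the busybox shell was configured.
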
### Pattern 3: Sequential TOC Parsing with Running Offset

**What:** Parse variable-length TOC entries using a running byte offset.
**When to use:** When reading the file table (TOC), where each entry has a variable-length filename.
**Example:**

```sh
# Start at toc_offset
pos=$toc_offset

for i in $(seq 0 $((file_count - 1))); do
    name_length=$(read_le_u16 "$ARCHIVE" "$pos")
    pos=$((pos + 2))

    # Extract filename (raw UTF-8 bytes via dd)
    filename=$(dd if="$ARCHIVE" bs=1 skip="$pos" count="$name_length" 2>/dev/null)
    pos=$((pos + name_length))

    original_size=$(read_le_u32 "$ARCHIVE" "$pos"); pos=$((pos + 4))
    compressed_size=$(read_le_u32 "$ARCHIVE" "$pos"); pos=$((pos + 4))
    encrypted_size=$(read_le_u32 "$ARCHIVE" "$pos"); pos=$((pos + 4))
    data_offset=$(read_le_u32 "$ARCHIVE" "$pos"); pos=$((pos + 4))

    # Remember where the raw IV bytes live; HMAC verification reads them from the archive.
    iv_toc_offset=$pos
    iv_hex=$(read_hex "$ARCHIVE" "$pos" 16); pos=$((pos + 16))
    hmac_hex=$(read_hex "$ARCHIVE" "$pos" 32); pos=$((pos + 32))
    sha256_hex=$(read_hex "$ARCHIVE" "$pos" 32); pos=$((pos + 32))

    compression_flag=$(read_hex "$ARCHIVE" "$pos" 1); pos=$((pos + 1))
    padding_after=$(read_le_u16 "$ARCHIVE" "$pos"); pos=$((pos + 2))

    # Process this file entry...
done
```

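As a worked check of the running offset, the size of one TOC entry follows directly from the field widths read in the loop above. The helper name `toc_entry_size` is hypothetical; the field widths are taken from the loop, not independently from FORMAT.md.

```sh
# toc_entry_size NAME_LENGTH: bytes consumed by one TOC entry as parsed above.
toc_entry_size() {
    # 2 (name_length) + name + 4*4 (three sizes and data_offset) + 16 (IV)
    # + 32 (HMAC) + 32 (SHA-256) + 1 (compression_flag) + 2 (padding_after)
    echo $(( 2 + $1 + 4 * 4 + 16 + 32 + 32 + 1 + 2 ))
}
```

For an 8-byte filename (for example four Cyrillic letters in UTF-8) this gives 109 bytes, so `pos` should advance by exactly that amount per iteration; a mismatch points at a misread field.
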
### Pattern 4: Pipe-Based Decryption (dd | openssl)

**What:** Extract ciphertext with `dd` and pipe directly to `openssl enc -d` for decryption.
**When to use:** For each file's data block decryption.
**Example:**

```sh
# Source: FORMAT.md Section 13.4
dd if="$ARCHIVE" bs=1 skip="$data_offset" count="$encrypted_size" 2>/dev/null \
    | openssl enc -d -aes-256-cbc -nosalt -K "$KEY_HEX" -iv "$iv_hex" \
    > "$tmpfile"
```

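Because `dd bs=1` performs one read and one write per byte, extracting a multi-megabyte data block this way is slow. One possible alternative, sketched here under the assumption that the target provides `head -c` (common in busybox, GNU and BSD, but not required by POSIX), is to let `tail` and `head` select the byte range. The function name `extract_range` is hypothetical.

```sh
# extract_range FILE OFFSET COUNT: write COUNT bytes starting at 0-based OFFSET to stdout.
extract_range() {
    # tail -c +N starts output at byte N counted from 1, hence OFFSET + 1.
    tail -c +"$(( $2 + 1 ))" "$1" | head -c "$3"
}
```

It could replace the `dd` stage of the pipeline, for example `extract_range "$ARCHIVE" "$data_offset" "$encrypted_size" | openssl enc -d ...`, while the small header and TOC reads keep using `dd bs=1`.
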
### Pattern 5: Graceful HMAC Degradation

**What:** Detect if `openssl dgst -mac HMAC` is supported; skip HMAC if not.
**When to use:** Before the file extraction loop.
**Example:**

```sh
# Source: FORMAT.md Section 13.3
# printf is used instead of `echo -n`, whose behavior differs between shells.
SKIP_HMAC=0
if ! printf 'test' | openssl dgst -sha256 -mac HMAC -macopt hexkey:00 >/dev/null 2>&1; then
    echo "WARNING: openssl HMAC not available, skipping integrity verification" >&2
    SKIP_HMAC=1
fi
```

### Anti-Patterns to Avoid

- **Using bash-specific syntax:** The script must run in busybox `ash`/`sh`. No `[[ ]]`, no `$((16#FF))`, no arrays, no process substitution `<()`. Use `[ ]`, `printf '%d' "0x..."`, positional parameters or temp files.
- **Reading entire archive into memory:** Shell cannot handle binary data in variables. Always use `dd` to extract specific byte ranges to files or pipes.
- **Using `-e` flag with echo for binary:** Portability issues across shells. Use `printf` or `dd` instead.
- **Storing binary data in shell variables:** NUL bytes (`\0`) cannot be stored in shell variables; they are dropped or truncate the value. Only store hex strings in variables, never raw binary.
- **Hardcoding `/tmp`:** Use `mktemp` for temporary files. Clean up with a trap.
- **Using `xxd -e` (little-endian mode):** Not supported in busybox xxd. Manual byte swapping is required.

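The `mktemp` plus trap advice can be sketched as follows. This is a minimal skeleton under assumptions: `WORK_DIR` is a hypothetical variable name, and the signal handling shown is one common convention rather than something FORMAT.md prescribes.

```sh
# Hypothetical skeleton: WORK_DIR is an assumed name, not taken from FORMAT.md.
WORK_DIR=$(mktemp -d) || exit 1

# Remove the scratch directory whenever the script exits.
trap 'rm -rf "$WORK_DIR"' EXIT
# Turn INT/TERM into a normal exit so the EXIT trap above still runs.
trap 'exit 1' INT TERM
```

All intermediate files (ciphertext, decrypted data, decompressed output) would then live under `"$WORK_DIR"`, so a single `rm -rf` covers every exit path.
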
## Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| AES-256-CBC decryption | Custom decryption in shell | `openssl enc -d -aes-256-cbc` | Impractical to implement AES in pure shell; openssl handles PKCS7 removal automatically |
| Gzip decompression | Custom DEFLATE in shell | `gunzip -c` | Implementing decompression in shell is impractical |
| SHA-256 hashing | Custom hash in shell | `sha256sum` (busybox) | Cryptographic hash requires proper implementation |
| HMAC-SHA-256 | Custom HMAC in shell | `openssl dgst -sha256 -mac HMAC` | HMAC construction is subtle; openssl handles it correctly |
| Hex-to-binary conversion | Manual byte construction | `xxd -r -p` or `printf '\xNN'` | Direct hex-to-binary tools already exist (note that `\x` escapes in `printf` are an extension; POSIX only specifies octal `\NNN`) |

**Key insight:** The shell decoder is fundamentally a "glue script" -- it orchestrates existing tools (`dd`, `openssl`, `gunzip`, `sha256sum`) to implement the decode pipeline. All cryptographic and compression operations are delegated to dedicated tools; the script only handles binary format parsing (offsets, lengths, byte swapping).

## Common Pitfalls

### Pitfall 1: openssl enc Output Contains Extra Bytes

**What goes wrong:** When using `openssl enc -d` with piped input from `dd`, the `openssl` command may behave differently depending on whether input comes from a file or a pipe. Some versions have issues with incomplete reads on pipes.
**Why it happens:** Pipe buffering and EOF handling can differ from file I/O.
**How to avoid:** Prefer extracting the ciphertext to a temp file first and decrypting from that file. If `dd ... | openssl enc -d ...` is used instead, verify that the output size matches `compressed_size` (or `original_size` if uncompressed).
**Warning signs:** Decrypted output size doesn't match expected `compressed_size`.

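The size check suggested above could look like the following sketch. The helper name `check_size` is hypothetical; which expected value to pass (`compressed_size` or `original_size`) depends on `compression_flag`, as described in the pitfall.

```sh
# check_size FILE EXPECTED: succeed only if FILE is exactly EXPECTED bytes long.
check_size() {
    # wc -c may pad its output with spaces; arithmetic expansion strips them.
    actual=$(( $(wc -c < "$1") ))
    [ "$actual" -eq "$2" ]
}
```

A decoder might call `check_size "$tmpfile" "$compressed_size" || echo "size mismatch for $filename" >&2` right after decryption and before decompression.
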
### Pitfall 2: Hex String Case Mismatch in HMAC Comparison

**What goes wrong:** `openssl dgst` outputs lowercase hex, but the stored HMAC extracted via `xxd -p` or `od` may produce different case.
**Why it happens:** Different tools use different case conventions for hex output.
**How to avoid:** Normalize both sides to lowercase before comparison: `echo "$hex" | tr 'A-F' 'a-f'`.
**Warning signs:** HMAC verification always fails despite correct data.

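A small sketch of the normalization step, wrapped so that both sides are always folded the same way. The helper names `to_lower_hex` and `hex_equal` are hypothetical.

```sh
# to_lower_hex STRING: print STRING with A-F folded to a-f.
to_lower_hex() {
    printf '%s' "$1" | tr 'A-F' 'a-f'
}

# hex_equal A B: case-insensitive comparison of two hex strings.
hex_equal() {
    [ "$(to_lower_hex "$1")" = "$(to_lower_hex "$2")" ]
}
```

The comparison in the decoder then becomes `hex_equal "$computed_hmac" "$hmac_hex"`, which behaves the same regardless of which tool produced each string.
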
### Pitfall 3: Empty File (0 bytes) Causes gunzip Error

**What goes wrong:** A 0-byte original file is still stored with `compression_flag=1`, so decryption yields a tiny gzip stream rather than empty output, and size-based edge cases can trip the decoder.
**Why it happens:** Gzip of empty input produces a minimal gzip stream (~20 bytes). PKCS7 pads those ~20 bytes up to the next 16-byte boundary, so `encrypted_size = 32`. The decrypted output is a valid gzip stream that decompresses to 0 bytes.
**How to avoid:** Check `original_size == 0` before decompression; if zero, just create an empty file. Alternatively, let `gunzip` handle it (it should produce empty output from a valid empty gzip stream).
**Warning signs:** Script crashes or produces garbage for 0-byte files.

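The padding arithmetic behind `encrypted_size` can be written out as a one-line helper. This assumes standard PKCS7 padding with a 16-byte AES block, as stated for this format; the function name `pkcs7_padded_size` is hypothetical.

```sh
# pkcs7_padded_size N: ciphertext length for N plaintext bytes under AES-CBC with PKCS7.
pkcs7_padded_size() {
    # PKCS7 always adds 1..16 bytes, so an exact multiple of 16 gains a full extra block.
    echo $(( ($1 / 16 + 1) * 16 ))
}
```

For the empty-file case, a 20-byte gzip stream gives `pkcs7_padded_size 20` = 32, matching the `encrypted_size` described above.
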
### Pitfall 4: Cyrillic Filename Extraction Corrupts Characters

**What goes wrong:** The `dd`-extracted filename bytes are valid UTF-8, but the shell variable may be corrupted if the locale is not set or if intermediate processing strips high bytes.
**Why it happens:** Some busybox builds strip non-ASCII bytes, or the `LANG`/`LC_ALL` environment may not support UTF-8.
**How to avoid:** Set `export LC_ALL=C` (or `C.UTF-8` if available) at the top of the script. Use `dd` to extract raw bytes directly. Do NOT process the filename through `tr`, `sed`, or `awk` before writing. Verify that `printf '%s'` preserves the bytes.
**Warning signs:** Extracted files have garbled names (mojibake) or are named with `?` characters.

### Pitfall 5: Shell Arithmetic Overflow on Large Files

**What goes wrong:** Shell arithmetic uses platform-native integers. In a shell with signed 32-bit arithmetic, values of 2^31 (2 GB) or more overflow.
**Why it happens:** The archive format uses u32 for sizes and offsets (max ~4 GB), but shell arithmetic may be limited to signed 32-bit (max 2^31 - 1).
**How to avoid:** All fields are u32 LE (max ~4 GB). Busybox on ARM may use 32-bit arithmetic, depending on how it was built; `echo $((2147483647 + 1))` printing a negative number is a quick indicator. For v1, file sizes under 4 GB are expected, and overflow only affects offsets and sizes of 2 GB or more, so this is LOW risk for typical archives. If needed, use `awk` for arithmetic on large numbers.
**Warning signs:** Negative offsets or sizes in the script output for files larger than 2 GB.

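One way the `awk` fallback might look is shown below. It is a sketch under assumptions: the name `le_u32_awk` is hypothetical, and it takes the four bytes as separate two-digit hex arguments so that the shell itself never handles a value above 255.

```sh
# le_u32_awk B0 B1 B2 B3: combine four hex bytes (least significant first) into a decimal u32.
le_u32_awk() {
    # Each byte is converted on its own; awk does the wide arithmetic in floating point,
    # which is exact for integers far beyond 2^32.
    awk -v b0="$(printf '%d' "0x$1")" -v b1="$(printf '%d' "0x$2")" \
        -v b2="$(printf '%d' "0x$3")" -v b3="$(printf '%d' "0x$4")" \
        'BEGIN { printf "%.0f\n", b0 + 256 * b1 + 65536 * b2 + 16777216 * b3 }'
}
```

`%.0f` is used rather than `%d` because some awk implementations clamp `%d` to the signed 32-bit range. Note that the result is only safe to print or compare as text; feeding it back into `$(( ))` on a 32-bit shell would overflow again.
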
### Pitfall 6: dd stderr Noise Pollutes Output

**What goes wrong:** `dd` writes transfer statistics to stderr (e.g., "32 bytes transferred"). If stderr is not suppressed, the statistics clutter user-visible output and end up in any capture that merges stderr into stdout (`2>&1`).
**Why it happens:** `dd` always writes stats to stderr unless suppressed.
**How to avoid:** Always use `2>/dev/null` with `dd` commands. This is already shown in FORMAT.md Section 13 reference functions.
**Warning signs:** Transfer statistics interleaved with script output, or unexpected text in hex values or filenames when stderr has been redirected into stdout.

### Pitfall 7: openssl 3.x Changes HMAC Syntax

**What goes wrong:** The `-mac HMAC -macopt hexkey:KEY` syntax works in OpenSSL 1.x and 3.x but is soft-deprecated in 3.x. The new `openssl mac` subcommand is preferred but has different syntax.
**Why it happens:** OpenSSL 3.x migrated to provider-based architecture; legacy options still work but may be removed.
**How to avoid:** Implement graceful degradation (already specified in FORMAT.md Section 13.3). Test with `printf 'test' | openssl dgst -sha256 -mac HMAC -macopt hexkey:00` at script startup. If it fails, set `SKIP_HMAC=1`.
**Warning signs:** HMAC check produces error messages instead of hash values.

## Code Examples

### Complete Decode Pipeline for One File

```sh
# Verified pattern from FORMAT.md Section 13.4 + project decisions
# Setup (once, before the TOC loop)
KEY_HEX="7a35c1d94fe82b6a910df358bc74a61e428fd063e5179b2cfa8406cd3e79b550"
TMPDIR=$(mktemp -d)
trap 'rm -rf "$TMPDIR"' EXIT

# Steps 1-6 run inside the TOC loop (hence `continue`).
# iv_toc_offset is the archive offset of this entry's 16-byte IV field.

# Step 1: Extract ciphertext to temp file
dd if="$ARCHIVE" bs=1 skip="$data_offset" count="$encrypted_size" \
    of="$TMPDIR/ct.bin" 2>/dev/null

# Step 2: Verify HMAC (if available)
if [ "$SKIP_HMAC" = "0" ]; then
    computed_hmac=$(
        {
            dd if="$ARCHIVE" bs=1 skip="$iv_toc_offset" count=16 2>/dev/null  # IV from TOC
            cat "$TMPDIR/ct.bin"                                              # ciphertext
        } | openssl dgst -sha256 -mac HMAC -macopt "hexkey:${KEY_HEX}" -hex 2>/dev/null \
          | awk '{print $NF}' | tr 'A-F' 'a-f'
    )
    if [ "$computed_hmac" != "$hmac_hex" ]; then
        echo "HMAC failed for $filename, skipping" >&2
        continue
    fi
fi

# Step 3: Decrypt (openssl auto-removes PKCS7 padding)
if ! openssl enc -d -aes-256-cbc -nosalt \
    -K "$KEY_HEX" -iv "$iv_hex" \
    -in "$TMPDIR/ct.bin" -out "$TMPDIR/dec.bin"; then
    echo "Decryption failed for $filename, skipping" >&2
    continue
fi

# Step 4: Decompress if needed
if [ "$compression_flag" = "01" ]; then
    gunzip -c "$TMPDIR/dec.bin" > "$TMPDIR/out.bin"
else
    mv "$TMPDIR/dec.bin" "$TMPDIR/out.bin"
fi

# Step 5: Verify SHA-256
actual_sha=$(sha256sum "$TMPDIR/out.bin" | awk '{print $1}')
if [ "$actual_sha" != "$sha256_hex" ]; then
    echo "WARNING: SHA-256 mismatch for $filename" >&2
fi

# Step 6: Write output
mv "$TMPDIR/out.bin" "$OUTPUT_DIR/$filename"
```

### Key Hex Constant (from src/key.rs)

```sh
# Hardcoded 32-byte AES-256 key as hex string (matching src/key.rs)
KEY_HEX="7a35c1d94fe82b6a910df358bc74a61e428fd063e5179b2cfa8406cd3e79b550"
```

Derivation from `src/key.rs`:
```
0x7A 0x35 0xC1 0xD9 0x4F 0xE8 0x2B 0x6A
0x91 0x0D 0xF3 0x58 0xBC 0x74 0xA6 0x1E
0x42 0x8F 0xD0 0x63 0xE5 0x17 0x9B 0x2C
0xFA 0x84 0x06 0xCD 0x3E 0x79 0xB5 0x50
```

### HMAC Verification with IV from Archive (Not from TOC-parsed Variable)

A subtle point: for HMAC verification, the IV bytes must come from the archive file (not from a hex variable). The HMAC is computed over raw `IV || ciphertext` bytes, not hex strings. The approach using `dd` to extract the IV bytes and concatenating them with the ciphertext in a brace group `{ dd ...; dd ...; }` is correct (as shown in FORMAT.md Section 13.3).

However, for the HMAC *comparison*, we compare hex strings (both from `openssl dgst` output and from the TOC hex extraction). Both must be lowercase.

### UTF-8 Filename Extraction

```sh
# dd extracts raw bytes; if they are valid UTF-8, the shell preserves them
filename=$(dd if="$ARCHIVE" bs=1 skip="$pos" count="$name_length" 2>/dev/null)
# $filename now contains UTF-8 string, including Cyrillic characters
# Works because: (1) dd copies raw bytes, (2) $() captures them, (3) no null bytes in UTF-8 filenames
# Caveat: $() strips trailing newlines, so a filename ending in "\n" would not round-trip
```

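To confirm on a given target that the bytes survive command substitution, a self-check along these lines could be run. It is only a sketch: the sample layout (a 2-byte little-endian length followed by the name) mirrors the TOC fields described earlier, and the variable names `sample`, `name` and `name_hex` are hypothetical.

```sh
# Build a sample: u16 LE length (8) followed by the UTF-8 bytes of "тест".
sample=$(mktemp)
printf '\010\000\321\202\320\265\321\201\321\202' > "$sample"

# Same extraction as above: copy 8 bytes starting after the 2-byte length field.
name=$(dd if="$sample" bs=1 skip=2 count=8 2>/dev/null)

# Dump the captured variable as hex to confirm no byte was altered.
name_hex=$(printf '%s' "$name" | od -A n -t x1 | tr -d ' \n')
echo "$name_hex"   # expected: d182d0b5d181d182
rm -f "$sample"
```

If the printed hex differs from the expected value, the shell or locale on the target is altering non-ASCII bytes and filenames will not round-trip.
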
## State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| `openssl dgst -mac HMAC -macopt` | `openssl mac -digest SHA256 -macopt hexkey:... HMAC` | OpenSSL 3.0 (2021) | Old syntax still works in 3.x but soft-deprecated |
| `xxd` not in busybox | `xxd` applet in busybox | BusyBox 1.28 (2017) | Available on newer builds, but `od` fallback still needed for older systems |
| `openssl enc` with `-md md5` default | `-md sha256` default | OpenSSL 1.1.0 (2016) | No impact for raw key mode (`-K`/`-iv`); `-md` only affects password-derived keys |

**Deprecated/outdated:**
- `openssl dgst -hmac "key"` (string key): Still works but `-macopt hexkey:` is required for binary keys. The hex key mode is NOT deprecated.
- busybox builds without xxd: Still common on very old/minimal systems, hence `od` fallback is essential.

## Open Questions

1. **Does the target busybox have `openssl`?**
   - What we know: openssl is NOT a busybox applet. It must be a separate binary on the target system. The project mentions "busybox-compatible" in requirements.
   - What's unclear: Whether the specific target device (Android-based car head unit) has the `openssl` CLI installed.
   - Recommendation: The script MUST fail with a clear error message if `openssl` is not found. Document this as a prerequisite. The script should check for tool availability at startup.

2. **Does the target `openssl` support `-mac HMAC -macopt hexkey:`?**
   - What we know: Standard OpenSSL 1.1.1+ and 3.x support this syntax. Busybox does not include openssl. Minimal/embedded openssl builds may lack HMAC support.
   - What's unclear: Exact openssl version on target.
   - Recommendation: Implement graceful degradation per FORMAT.md Section 13.3. Skip HMAC if unsupported, print warning.

3. **Performance on large files with `dd bs=1`?**
   - What we know: `dd bs=1` reads one byte at a time. For extracting large data blocks (megabytes), this is very slow.
   - What's unclear: Whether the shell decoder needs to handle large files efficiently.
   - Recommendation: Keep `dd bs=1 skip=OFFSET count=SIZE` for header and TOC fields, where only a few bytes are read. For data block extraction (Step 1 of decode), `bs=1` still costs one read and one write per byte, so truly large files would need `dd bs=4096` with calculated skip/count, but the added complexity may not be worth it for a fallback decoder.

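The startup check recommended under the first question could be sketched like this. The function name `require_tools` and the exact tool list are assumptions; the tools named in the usage line come from the Standard Stack tables.

```sh
# require_tools TOOL...: report every missing tool on stderr; fail if any is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "ERROR: required tool not found: $tool" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
}
```

A decoder might start with `require_tools dd od openssl gunzip sha256sum || exit 1`, so that a missing `openssl` is reported before any parsing begins. `xxd` is deliberately left out of that list because `od` serves as its fallback.
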
## Sources

### Primary (HIGH confidence)
- FORMAT.md Section 13 (Shell Decoder Reference) - Complete reference functions for all operations
- FORMAT.md Section 10 (Decode Order of Operations) - Mandatory decode pipeline
- FORMAT.md Section 4-5 (Header/TOC structure) - Binary layout
- `src/key.rs` - Actual hardcoded key bytes
- `src/format.rs` - Rust implementation of header/TOC parsing (reference)
- `src/crypto.rs` - Rust crypto implementation (HMAC scope, encrypt/decrypt)
- `kotlin/ArchiveDecoder.kt` - Working decoder implementation (behavioral reference)
- `kotlin/test_decoder.sh` - Cross-validation test pattern (structural reference)

### Secondary (MEDIUM confidence)
- [OpenSSL enc documentation (3.3)](https://docs.openssl.org/3.3/man1/openssl-enc/) - `-K`, `-iv`, `-nosalt`, PKCS7 auto-removal
- [OpenSSL dgst documentation (3.3)](https://docs.openssl.org/3.3/man1/openssl-dgst/) - `-mac HMAC -macopt hexkey:` syntax
- [BusyBox xxd commit (2017)](https://lists.busybox.net/pipermail/busybox-cvs/2017-January/036600.html) - xxd applet added to busybox
- [BusyBox applet list](https://www.busybox.net/downloads/BusyBox.html) - dd, gunzip, sha256sum, od are native applets; openssl, xxd are not
- [BusyBox xxd options](https://www.boxmatrix.info/wiki/Property:xxd_(bbcmd)) - Supported flags: -p, -r, -l, -s, -g, -c, -u
- [docker-library/busybox#13](https://github.com/docker-library/busybox/issues/13) - UTF-8 support limitations in busybox

### Tertiary (LOW confidence)
- Various shell scripting guides on binary file handling - General patterns, not project-specific

## Metadata

**Confidence breakdown:**
- Standard stack: HIGH - All tools are well-documented CLI utilities with decades of stability. FORMAT.md Section 13 provides verified reference code.
- Architecture: HIGH - Single-file script pattern is proven by the existing `kotlin/test_decoder.sh`. Binary parsing pattern with dd+xxd/od is well-established.
- Pitfalls: HIGH - Identified from real tool behavior (openssl pipe handling, hex case, PKCS7, busybox limitations). FORMAT.md already anticipates several pitfalls (graceful HMAC degradation, od fallback).

**Research date:** 2026-02-25
**Valid until:** 2026-03-25 (stable domain -- shell tools and openssl CLI interface change very slowly)