android-encrypted-archiver/docs/FORMAT.md

# Encrypted Archive Binary Format Specification

**Version:** 1.1
**Date:** 2026-02-26
**Status:** Normative

---

## Table of Contents

1. [Overview and Design Goals](#1-overview-and-design-goals)
2. [Notation Conventions](#2-notation-conventions)
3. [Archive Structure Diagram](#3-archive-structure-diagram)
4. [Archive Header Definition](#4-archive-header-definition)
5. [Table of Contents (TOC) Entry Definition](#5-table-of-contents-toc-entry-definition)
6. [Data Block Layout](#6-data-block-layout)
7. [Encryption and Authentication Details](#7-encryption-and-authentication-details)
8. [Compression Details](#8-compression-details)
9. [Obfuscation Features](#9-obfuscation-features)
10. [Decode Order of Operations](#10-decode-order-of-operations)
11. [Version Compatibility Rules](#11-version-compatibility-rules)
12. [Worked Example](#12-worked-example)
13. [Appendix: Shell Decoder Reference](#13-appendix-shell-decoder-reference)

---

## 1. Overview and Design Goals

This document specifies the binary format for `encrypted_archive` -- a custom archive container designed to be **unrecognizable by standard tools**. Standard utilities (`file`, `binwalk`, `7z`, `tar`, `unzip`) must not be able to identify or extract the contents of an archive produced in this format.

### Target Decoders

Three independent implementations will build against this specification:

1. **Rust CLI archiver** (`encrypted_archive pack`/`unpack`) -- the reference encoder and primary decoder, runs on Linux/macOS.
2. **Kotlin Android decoder** -- runs on Android 13 (Qualcomm SoC) using only `javax.crypto` and `java.util.zip`. Primary extraction path on the target device.
3. **Busybox shell decoder** -- a fallback shell script using only standard busybox commands: `dd`, `xxd`, `openssl`, `gunzip`, and `sh`. Must work without external dependencies.

### Core Constraint

The shell decoder must be able to parse the archive format using `dd` (for byte extraction), `xxd` (for hex conversion), and `openssl enc` (for AES-CBC decryption with raw key mode: `-K`/`-iv`/`-nosalt`). This constraint drives several design choices:

- Fixed-size header at a known offset (no variable-length preamble before the TOC pointer)
- Absolute offsets (no relative offset chains that require cumulative addition)
- IVs stored in the file table, not embedded in data blocks (single `dd` call per extraction)
- Little-endian integers (native byte order on ARM and x86)

---

## 2. Notation Conventions

| Convention | Meaning |
|------------|---------|
| **LE** | Little-endian byte order |
| **u8** | Unsigned 8-bit integer (1 byte) |
| **u16** | Unsigned 16-bit integer (2 bytes) |
| **u32** | Unsigned 32-bit integer (4 bytes) |
| **bytes** | Raw byte sequence (no endianness) |
| Offset `0xNN` | Absolute byte offset from archive byte 0 |
| Size | Always in bytes unless stated otherwise |
| `\|\|` | Concatenation of byte sequences |

- All multi-byte integers are **little-endian (LE)**.
- All sizes are in **bytes** unless stated otherwise.
- All offsets are **absolute** from archive byte 0 (the first byte of the file).
- Entry names are **UTF-8 encoded** relative paths using `/` as the path separator (e.g., `dir/subdir/file.txt`). Names MUST NOT start with `/` or contain `..` components. For top-level files, the name is just the filename (e.g., `readme.txt`). Names are length-prefixed with a u16 byte count (NOT null-terminated).
- Reserved fields are **zero-filled** and MUST be written as `0x00` bytes.
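
As a minimal illustration of these conventions, the following Python sketch decodes little-endian fields with the standard `struct` module. The byte values are taken from the worked example in Section 12; the variable names are illustrative only and are not part of the format.

```python
import struct

# u16 LE: the low byte comes first, so 02 00 decodes to 2.
(entry_count,) = struct.unpack("<H", bytes([0x02, 0x00]))

# u32 LE: 03 01 00 00 decodes to 0x00000103 = 259.
(data_offset,) = struct.unpack("<I", bytes([0x03, 0x01, 0x00, 0x00]))

# A length-prefixed name: u16 LE byte count followed by UTF-8 bytes,
# with no null terminator.
raw = bytes([0x09, 0x00]) + "hello.txt".encode("utf-8")
(name_length,) = struct.unpack_from("<H", raw, 0)
name = raw[2:2 + name_length].decode("utf-8")
```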

---

## 3. Archive Structure Diagram

```
+=======================================+
|          ARCHIVE HEADER               |  Fixed 40 bytes
|  magic(4) | ver(1) | flags(1)        |
|  entry_count(2) | toc_offset(4)      |
|  toc_size(4) | toc_iv(16)            |
|  reserved(8)                          |
+=======================================+
|          FILE TABLE (TOC)             |  Variable size
|  Entry 1: name, type, perms,         |  Optionally encrypted
|    sizes, offset, iv, hmac,           |  Files AND directories
|    sha256, flags                      |  (see Section 9.2)
|  Entry 2: ...                         |
|  ...                                  |
|  Entry N: ...                         |
+=======================================+
|          DATA BLOCK 1                 |  encrypted_size bytes
|  [ciphertext]                         |
+---------------------------------------+
|          [DECOY PADDING 1]            |  Optional (see Section 9.3)
+---------------------------------------+
|          DATA BLOCK 2                 |  encrypted_size bytes
|  [ciphertext]                         |
+---------------------------------------+
|          [DECOY PADDING 2]            |  Optional (see Section 9.3)
+---------------------------------------+
|          ...                          |
+=======================================+
```

The archive consists of three contiguous regions:

1. **Header** (fixed 40 bytes) -- contains magic bytes, version, flags, and a pointer to the file table.
2. **File Table (TOC)** (variable size) -- contains one entry per archived file or directory with all metadata needed for extraction.
3. **Data Blocks** (variable size) -- contains the encrypted (and optionally compressed) file contents, one block per file entry (directory entries have no data block), optionally separated by decoy padding.

---

## 4. Archive Header Definition

The header is a fixed-size 40-byte structure at offset 0x00.

| Offset | Size | Type | Endian | Field | Description |
|--------|------|------|--------|-------|-------------|
| `0x00` | 4 | bytes | - | `magic` | Custom magic bytes: `0x00 0xEA 0x72 0x63`. The leading `0x00` signals binary content; the remaining bytes (`0xEA 0x72 0x63`) do not match any known file signature. |
| `0x04` | 1 | u8 | - | `version` | Format version. Value `2` for this specification (v1.1). Value `1` for legacy v1.0 (no directory support). |
| `0x05` | 1 | u8 | - | `flags` | Feature flags bitfield (see below). |
| `0x06` | 2 | u16 | LE | `entry_count` | Number of entries (files and directories) stored in the archive. |
| `0x08` | 4 | u32 | LE | `toc_offset` | Absolute byte offset of the entry table from archive start. |
| `0x0C` | 4 | u32 | LE | `toc_size` | Size of the entry table in bytes (if TOC encryption is on, this is the encrypted size including PKCS7 padding). |
| `0x10` | 16 | bytes | - | `toc_iv` | Initialization vector for encrypted TOC. Zero-filled (`0x00` x 16) when TOC encryption flag (bit 1) is off. |
| `0x20` | 8 | bytes | - | `reserved` | Reserved for future use. MUST be zero-filled. |

**Total header size: 40 bytes (0x28).**

### Flags Bitfield

| Bit | Mask | Name | Description |
|-----|------|------|-------------|
| 0 | `0x01` | `compression` | Per-file compression enabled. When set, files MAY be individually gzip-compressed (per-file `compression_flag` controls each file). When clear, all files are stored raw. |
| 1 | `0x02` | `toc_encrypted` | File table is encrypted with AES-256-CBC using `toc_iv`. When clear, file table is stored as plaintext. |
| 2 | `0x04` | `xor_header` | Header bytes are XOR-obfuscated (see Section 9.1). When clear, header is stored as-is. |
| 3 | `0x08` | `decoy_padding` | Random decoy bytes are inserted after data blocks (see Section 9.3). When clear, `padding_after` in every file table entry is 0. |
| 4-7 | `0xF0` | reserved | Reserved. MUST be `0`. |
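
The header layout above maps directly onto a fixed `struct` format string. The sketch below is one possible way to parse it in Python, assuming the 40 bytes have already been de-XORed if needed; `parse_header`, `FLAG_NAMES`, and the returned dictionary keys are hypothetical names chosen for this example.

```python
import struct

# magic(4) version(1) flags(1) entry_count(2) toc_offset(4)
# toc_size(4) toc_iv(16) reserved(8), all little-endian, no alignment padding.
HEADER_FORMAT = "<4sBBHII16s8s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 40 bytes
MAGIC = bytes([0x00, 0xEA, 0x72, 0x63])

FLAG_NAMES = {0x01: "compression", 0x02: "toc_encrypted",
              0x04: "xor_header", 0x08: "decoy_padding"}


def parse_header(raw: bytes) -> dict:
    """Parse a plain (already de-XORed) 40-byte header into a dict."""
    if len(raw) < HEADER_SIZE:
        raise ValueError("header is shorter than 40 bytes")
    (magic, version, flags, entry_count,
     toc_offset, toc_size, toc_iv, _reserved) = struct.unpack(
        HEADER_FORMAT, raw[:HEADER_SIZE])
    if magic != MAGIC:
        raise ValueError("bad magic bytes")
    if flags & 0xF0:
        raise ValueError("reserved flag bits 4-7 are set")
    return {
        "version": version,
        "flags": flags,
        "enabled": sorted(n for mask, n in FLAG_NAMES.items() if flags & mask),
        "entry_count": entry_count,
        "toc_offset": toc_offset,
        "toc_size": toc_size,
        "toc_iv": toc_iv,
    }
```

Using a single format string keeps the field order and sizes in one place, which makes it easy to confirm that they sum to 40 bytes.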

---

## 5. Table of Contents (TOC) Entry Definition

The file table (TOC) is a contiguous sequence of variable-length entries, one per file or directory. Entries are stored so that directory entries appear before any files within them (parent-before-child ordering). There is no per-entry delimiter; entries are read sequentially using the `name_length` field to determine where each entry's variable-length name ends.

### Entry Field Table

| Field | Size | Type | Endian | Description |
|-------|------|------|--------|-------------|
| `name_length` | 2 | u16 | LE | Entry name length in bytes (UTF-8 encoded byte count). |
| `name` | `name_length` | bytes | - | Entry name as UTF-8 bytes. NOT null-terminated. Relative path using `/` as separator (see Entry Name Semantics below). |
| `entry_type` | 1 | u8 | - | Entry type: `0x00` = regular file, `0x01` = directory. Directories have `original_size`, `compressed_size`, and `encrypted_size` all set to 0 and no corresponding data block. |
| `permissions` | 2 | u16 | LE | Unix permission bits (lower 12 bits of POSIX `mode_t`). Bit layout: `[suid(1)][sgid(1)][sticky(1)][owner_rwx(3)][group_rwx(3)][other_rwx(3)]`. Example: `0o755` = `0x01ED` = owner rwx, group r-x, other r-x. Stored as u16 LE. |
| `original_size` | 4 | u32 | LE | Original file size in bytes (before compression). For directories: 0. |
| `compressed_size` | 4 | u32 | LE | Size after gzip compression. Equals `original_size` if `compression_flag` is 0 (no compression). For directories: 0. |
| `encrypted_size` | 4 | u32 | LE | Size after AES-256-CBC encryption with PKCS7 padding. Formula: `((compressed_size / 16) + 1) * 16`, where `/` is integer (floor) division. For directories: 0. |
| `data_offset` | 4 | u32 | LE | Absolute byte offset of this entry's data block from archive start. For directories: 0. |
| `iv` | 16 | bytes | - | Random AES-256-CBC initialization vector for this file. For directories: zero-filled. |
| `hmac` | 32 | bytes | - | HMAC-SHA-256 over `iv || ciphertext`. See Section 7 for details. For directories: zero-filled. |
| `sha256` | 32 | bytes | - | SHA-256 hash of the original file content (before compression and encryption). For directories: zero-filled. |
| `compression_flag` | 1 | u8 | - | `0` = raw (no compression), `1` = gzip compressed. For directories: 0. |
| `padding_after` | 2 | u16 | LE | Number of decoy padding bytes after this file's data block. Always `0` when flags bit 3 (decoy_padding) is off. |

### Entry Type Values

| Value | Name | Description |
|-------|------|-------------|
| `0x00` | File | Regular file. Has associated data block with ciphertext. All size fields and data_offset are meaningful. |
| `0x01` | Directory | Directory entry. `original_size`, `compressed_size`, `encrypted_size` are all 0. `data_offset` is 0. `iv` is zero-filled. `hmac` is zero-filled. `sha256` is zero-filled. `compression_flag` is 0. No data block exists for this entry. |

### Permission Bits Layout

| Bits | Mask | Name | Description |
|------|------|------|-------------|
| 11 | `0o4000` | setuid | Set user ID on execution |
| 10 | `0o2000` | setgid | Set group ID on execution |
| 9 | `0o1000` | sticky | Sticky bit |
| 8-6 | `0o0700` | owner | Owner read(4)/write(2)/execute(1) |
| 5-3 | `0o0070` | group | Group read(4)/write(2)/execute(1) |
| 2-0 | `0o0007` | other | Other read(4)/write(2)/execute(1) |

Common examples: `0o755` (rwxr-xr-x) = `0x01ED`, `0o644` (rw-r--r--) = `0x01A4`, `0o700` (rwx------) = `0x01C0`.
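
The following Python sketch shows how a permission value could be masked to its lower 12 bits and stored as u16 LE, and how the nine `rwx` bits can be rendered for debugging. The helper names `encode_permissions` and `describe_permissions` are hypothetical.

```python
import struct


def encode_permissions(mode: int) -> bytes:
    """Keep the lower 12 bits of a POSIX mode and store them as u16 LE."""
    return struct.pack("<H", mode & 0o7777)


def describe_permissions(perms: int) -> str:
    """Render owner/group/other bits as an ls-style string.

    The setuid, setgid, and sticky bits are ignored in this rendering.
    """
    out = []
    for shift in (6, 3, 0):  # owner, group, other
        triplet = (perms >> shift) & 0o7
        out.append("r" if triplet & 4 else "-")
        out.append("w" if triplet & 2 else "-")
        out.append("x" if triplet & 1 else "-")
    return "".join(out)
```

For example, `0o755` is `0x01ED`, so its on-disk bytes are `ED 01`.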

### Entry Name Semantics

- Names are relative paths from the archive root, using `/` as separator.
- Example: a file at `project/src/main.rs` has name `project/src/main.rs`.
- A directory entry for `project/src/` has name `project/src` (no trailing slash).
- Names MUST NOT start with `/` (no absolute paths).
- Names MUST NOT contain `..` components (no directory traversal).
- The encoder MUST sort entries so that directory entries appear before any files within them (parent-before-child ordering). This allows the decoder to `mkdir -p` or create directories in a single sequential pass.
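
A decoder will typically want to enforce these rules before touching the filesystem. The sketch below is one way to do so in Python; `validate_entry_name` is a hypothetical helper, and the length check simply reflects the u16 `name_length` prefix.

```python
def validate_entry_name(name: str) -> None:
    """Raise ValueError if the name violates the entry name rules."""
    if name.startswith("/"):
        raise ValueError("absolute paths are not allowed")
    # Only whole path components equal to ".." are rejected;
    # a name such as "a..b/c" is still valid.
    if ".." in name.split("/"):
        raise ValueError("'..' components are not allowed")
    if len(name.encode("utf-8")) > 0xFFFF:
        raise ValueError("name does not fit in a u16 length prefix")
```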

### Entry Size Formula

Each TOC entry has a total size of:

```
entry_size = 2 + name_length + 1 + 2 + 4 + 4 + 4 + 4 + 16 + 32 + 32 + 1 + 2
           = 104 + name_length bytes
```
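
To make the sequential parsing concrete, here is a minimal Python sketch that reads one version 2 entry from a plaintext TOC buffer and returns the position of the next entry. `parse_entry`, `FIXED_FORMAT`, and the dictionary keys are names assumed for this example.

```python
import struct

# Everything after the name: entry_type, permissions, original_size,
# compressed_size, encrypted_size, data_offset, iv, hmac, sha256,
# compression_flag, padding_after.
FIXED_FORMAT = "<BHIIII16s32s32sBH"
FIXED_SIZE = struct.calcsize(FIXED_FORMAT)  # 102 bytes
KEYS = ("entry_type", "permissions", "original_size", "compressed_size",
        "encrypted_size", "data_offset", "iv", "hmac", "sha256",
        "compression_flag", "padding_after")


def parse_entry(toc: bytes, pos: int):
    """Parse one TOC entry at `pos`; return (entry, position of next entry)."""
    (name_length,) = struct.unpack_from("<H", toc, pos)
    name_start = pos + 2
    name_end = name_start + name_length
    entry = dict(zip(KEYS, struct.unpack_from(FIXED_FORMAT, toc, name_end)))
    entry["name"] = toc[name_start:name_end].decode("utf-8")
    # 2 (length prefix) + name_length + 102 = 104 + name_length
    return entry, name_end + FIXED_SIZE
```

Calling `parse_entry` in a loop `entry_count` times, feeding each returned position back in, walks the whole table.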

### File Table Total Size

The total file table size is the sum of all entry sizes:

```
toc_size = SUM(104 + name_length_i) for i in 0..entry_count-1
```

When TOC encryption (flags bit 1) is active, the encrypted TOC size includes PKCS7 padding:

```
encrypted_toc_size = ((toc_size / 16) + 1) * 16
```

The `toc_size` field in the header stores the **actual size on disk** (encrypted size if TOC encryption is on, plaintext size if off).

---

## 6. Data Block Layout

Each file entry has a single contiguous data block containing **only the ciphertext** (the AES-256-CBC encrypted output). Directory entries (`entry_type = 0x01`) have no data block. The decoder MUST skip directory entries when processing data blocks.

```
[ciphertext: encrypted_size bytes]
```

**Important design decisions:**

- The **IV is stored only in the file table entry**, not duplicated at the start of the data block. The data block contains only ciphertext. This simplifies `dd` extraction in the shell decoder: a single `dd` call with the correct offset and size extracts the complete ciphertext.
- The **HMAC is stored only in the file table entry**, not appended to the data block. The decoder reads the HMAC from the TOC, then verifies against the data block contents.
- If decoy padding is enabled (flags bit 3), `padding_after` bytes of random data follow the ciphertext. The decoder MUST skip these bytes. The next file's data block starts at offset `data_offset + encrypted_size + padding_after`.

### Data Block Ordering

Data blocks appear in the same order as their file table entries. Directory entries have no data block and are skipped, so for the `i`-th file entry (counting file entries only):

```
data_offset_0 = toc_offset + toc_size
data_offset_i = data_offset_{i-1} + encrypted_size_{i-1} + padding_after_{i-1}
```
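
The recurrence above can be sketched in Python as follows. `assign_data_offsets` is a hypothetical encoder-side helper; it assumes each entry is a dictionary with `entry_type`, `encrypted_size`, and `padding_after` keys, and it leaves `data_offset` at 0 for directories.

```python
def assign_data_offsets(toc_offset: int, toc_size: int, entries: list) -> list:
    """Return the data_offset for every entry, in TOC order."""
    offsets = []
    cursor = toc_offset + toc_size  # first data block follows the TOC
    for entry in entries:
        if entry["entry_type"] == 0x01:  # directory: no data block
            offsets.append(0)
            continue
        offsets.append(cursor)
        cursor += entry["encrypted_size"] + entry["padding_after"]
    return offsets
```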

---

## 7. Encryption and Authentication Details

### Pipeline

Each file is processed through the following pipeline, in order:

```
original_file
    |
    v
[1. SHA-256 checksum] --> stored in file table entry as `sha256`
    |
    v
[2. Gzip compress] (if compression_flag = 1) --> compressed_data
    |                                             (size = compressed_size)
    v
[3. PKCS7 pad] --> padded_data
    |               (size = encrypted_size)
    v
[4. AES-256-CBC encrypt] (with random IV) --> ciphertext
    |                                          (size = encrypted_size)
    v
[5. HMAC-SHA-256] (over IV || ciphertext) --> stored in file table entry as `hmac`
```

### AES-256-CBC

- **Key:** 32 bytes (256 bits), hardcoded and shared across all three decoders.
- **IV:** 16 bytes, randomly generated for each file. Stored in the file table entry `iv` field.
- **Block size:** 16 bytes.
- **Mode:** CBC (Cipher Block Chaining).
- The same 32-byte key is used for all files in the archive.

### PKCS7 Padding

PKCS7 padding is applied to the compressed (or raw) data before encryption. PKCS7 **always adds at least 1 byte** of padding. If the input length is already a multiple of 16, a full 16-byte padding block is added.

**Formula:**

```
encrypted_size = ((compressed_size / 16) + 1) * 16
```

Where `/` is integer division (floor).

**Examples:**

| `compressed_size` | Padding bytes | `encrypted_size` |
|-------------------|---------------|------------------|
| 0 | 16 | 16 |
| 1 | 15 | 16 |
| 15 | 1 | 16 |
| 16 | 16 | 32 |
| 17 | 15 | 32 |
| 31 | 1 | 32 |
| 32 | 16 | 48 |
| 100 | 12 | 112 |
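
The table can be reproduced with a few lines of Python. This is a sketch of PKCS7 for a 16-byte block size; the function names are illustrative, and real decoders would normally rely on their crypto library's built-in padding handling.

```python
BLOCK = 16


def encrypted_size(compressed_size: int) -> int:
    """Size after PKCS7 padding; always a multiple of 16 and > input size."""
    return (compressed_size // BLOCK + 1) * BLOCK


def pkcs7_pad(data: bytes) -> bytes:
    """Append n bytes of value n, where 1 <= n <= 16."""
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n


def pkcs7_unpad(padded: bytes) -> bytes:
    """Strip PKCS7 padding, rejecting malformed input."""
    if not padded or len(padded) % BLOCK:
        raise ValueError("length is not a positive multiple of 16")
    n = padded[-1]
    if not 1 <= n <= BLOCK or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS7 padding")
    return padded[:-n]
```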

### HMAC-SHA-256

- **Key:** The same 32-byte key used for AES-256-CBC encryption. (This version of the format uses a single key for both encryption and authentication. A future version may derive separate subkeys using HKDF; see Section 11.)
- **Input:** The concatenation of the 16-byte IV and the ciphertext:

  ```
  HMAC_input = IV (16 bytes) || ciphertext (encrypted_size bytes)
  Total HMAC input length = 16 + encrypted_size bytes
  ```

- **Output:** 32 bytes, stored in the file table entry `hmac` field.

### Encrypt-then-MAC

This format uses the **Encrypt-then-MAC** construction:

1. The HMAC is computed **after** encryption, over the IV and ciphertext.
2. The decoder **MUST verify the HMAC before attempting decryption**. If the HMAC does not match, the decoder MUST reject the file without decrypting. This prevents padding oracle attacks and avoids processing tampered data.
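
A minimal Python sketch of the HMAC computation and the verify-before-decrypt check, using only the standard library. The function names are hypothetical; the constant-time comparison via `hmac.compare_digest` is a choice made for this example, not a requirement stated by this specification.

```python
import hashlib
import hmac


def compute_hmac(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """HMAC-SHA-256 over iv || ciphertext, returning 32 bytes."""
    return hmac.new(key, iv + ciphertext, hashlib.sha256).digest()


def verify_hmac(key: bytes, iv: bytes, ciphertext: bytes, stored: bytes) -> bool:
    """Return True only if the stored tag matches; call this before decrypting."""
    return hmac.compare_digest(compute_hmac(key, iv, ciphertext), stored)
```

A decoder would call `verify_hmac` with the `iv` and `hmac` fields from the file table entry and skip decryption entirely when it returns `False`.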

### SHA-256 Integrity Checksum

- **Input:** The original file content (before compression, before encryption).
- **Output:** 32 bytes, stored in the file table entry `sha256` field.
- **Verification:** After the decoder decrypts and decompresses a file, it computes SHA-256 of the result and compares it to the stored `sha256`. A mismatch indicates data corruption or an incorrect key.
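
The check can be sketched in Python with `hashlib`. The stored value used here is the SHA-256 of `Hello` from the worked example in Section 12; `sha256_matches` is a hypothetical helper name.

```python
import hashlib


def sha256_matches(content: bytes, stored_sha256: bytes) -> bool:
    """Compare the SHA-256 of the recovered content with the stored field."""
    return hashlib.sha256(content).digest() == stored_sha256


stored = bytes.fromhex(
    "185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969")
```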

---

## 8. Compression Details

- **Algorithm:** Standard gzip (RFC 1952), which wraps a DEFLATE stream (RFC 1951).
- **Granularity:** Per-file. Each file has its own `compression_flag` in the file table entry.
- **Global flag:** The header flags bit 0 (`compression`) enables per-file compression. When this bit is clear, ALL files are stored raw regardless of individual `compression_flag` values.
- **Recommendation:** Already-compressed files (APK, ZIP, PNG, JPEG) should use `compression_flag = 0` (raw) to avoid size inflation.

### Size Tracking

- `original_size`: Size of the file before any processing.
- `compressed_size`: Size after gzip compression. If `compression_flag = 0`, then `compressed_size = original_size`.
- `encrypted_size`: Size after AES-256-CBC with PKCS7 padding. Always `>= compressed_size`.
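
The relationship between the three sizes can be sketched in Python as below. `track_sizes` is a hypothetical helper; passing `mtime=0` to `gzip.compress` is an assumption made here only to keep the output reproducible, since the gzip header otherwise embeds a timestamp.

```python
import gzip


def track_sizes(content: bytes, compression_flag: int):
    """Return (original_size, compressed_size, encrypted_size) for one file."""
    original_size = len(content)
    if compression_flag == 1:
        payload = gzip.compress(content, mtime=0)
    else:
        payload = content  # raw: compressed_size equals original_size
    compressed_size = len(payload)
    encrypted_size = (compressed_size // 16 + 1) * 16  # PKCS7, 16-byte blocks
    return original_size, compressed_size, encrypted_size
```

Because gzip adds a header and trailer, very small inputs usually grow when compressed, which is one reason the per-file `compression_flag` exists.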

### Decompression in Each Decoder

| Decoder | Library/Command |
|---------|-----------------|
| Rust | `flate2` crate (`GzDecoder`) |
| Kotlin | `java.util.zip.GZIPInputStream` |
| Shell | `gunzip` (busybox) |

---

## 9. Obfuscation Features

These features are fully defined in this specification but are intended for implementation in Phase 6 (after all three decoders work without obfuscation). Each feature is controlled by a flag bit in the header and can be activated independently.

### 9.1 XOR Header Obfuscation (flags bit 2, mask `0x04`)

When flags bit 2 is set, the entire 40-byte header is XOR-obfuscated with a fixed repeating 8-byte key.

**XOR Key:** `0xA5 0x3C 0x96 0x0F 0xE1 0x7B 0x4D 0xC8` (8 bytes, repeating)

**XOR Range:** Bytes `0x00` through `0x27` (the entire 40-byte header).

**Application:**

- XOR is applied **after** the header is fully constructed (all fields written).
- The 8-byte key repeats cyclically across the 40 bytes: byte `i` of the header is XORed with `key[i % 8]`.

**Decoding:**

- The decoder reads the first 40 bytes and XORs them with the same repeating key (XOR is its own inverse).
- After de-XOR, the decoder reads header fields normally.

**Bootstrapping problem:** When XOR obfuscation is active, the flags byte itself is XORed. The decoder MUST:

1. Always attempt de-XOR on the first 40 bytes.
2. Read the flags byte from the de-XORed header.
3. Check if bit 2 is set. If it is, the de-XOR was correct. If it is not, re-read the header from the original (un-XORed) bytes.

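Alternatively, the decoder can check the magic bytes, which is the procedure Section 10 prescribes: if the first 4 bytes are `0x00 0xEA 0x72 0x63`, the header is not XOR-obfuscated. If they are not, apply de-XOR and re-check the magic; if it still does not match, reject the archive.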

**When flags bit 2 is 0:** The header is stored as-is (no XOR).
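
The XOR transform and the magic-based detection can be sketched in Python as follows. `xor_header` and `recover_header` are hypothetical names; the key and magic bytes are the values given above.

```python
XOR_KEY = bytes([0xA5, 0x3C, 0x96, 0x0F, 0xE1, 0x7B, 0x4D, 0xC8])
MAGIC = bytes([0x00, 0xEA, 0x72, 0x63])
HEADER_SIZE = 40


def xor_header(header: bytes) -> bytes:
    """XOR the first 40 bytes with the repeating 8-byte key (self-inverse)."""
    return bytes(b ^ XOR_KEY[i % 8] for i, b in enumerate(header[:HEADER_SIZE]))


def recover_header(raw: bytes) -> bytes:
    """Return the plain 40-byte header, de-XORing only when needed."""
    head = raw[:HEADER_SIZE]
    if head[:4] == MAGIC:
        return head  # stored as-is
    candidate = xor_header(head)
    if candidate[:4] == MAGIC:
        return candidate  # was XOR-obfuscated
    raise ValueError("not a recognized archive header")
```

With this key, an obfuscated header begins with the bytes `A5 D6 E4 6C` instead of the plain magic.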

### 9.2 TOC Encryption (flags bit 1, mask `0x02`)

When flags bit 1 is set, the entire file table is encrypted with AES-256-CBC.

- **Key:** The same 32-byte key used for file encryption.
- **IV:** The `toc_iv` field in the header (16 bytes, randomly generated).
- **Input:** The serialized file table (all entries concatenated).
- **Padding:** PKCS7 padding is applied to the entire serialized TOC.
- **`toc_size` in header:** Stores the **encrypted** TOC size (including PKCS7 padding), not the plaintext size.

**Decoding:**

1. Read `toc_offset`, `toc_size`, and `toc_iv` from the (de-XORed) header.
2. Read `toc_size` bytes starting at `toc_offset`.
3. Decrypt with AES-256-CBC using `toc_iv` and the 32-byte key.
4. Remove PKCS7 padding.
5. Parse file table entries from the decrypted plaintext.

**When flags bit 1 is 0:** The file table is stored as plaintext. `toc_iv` is zero-filled but unused.

### 9.3 Decoy Padding (flags bit 3, mask `0x08`)

When flags bit 3 is set, random bytes are inserted after each file's data block.

- The number of random padding bytes for each file is stored in the file table entry `padding_after` field (u16 LE).
- Padding bytes are cryptographically random and carry no meaningful data.
- The decoder MUST skip `padding_after` bytes after reading the ciphertext of each file.
- The padding disrupts size-based analysis: an observer who cannot read the file table (for example, when TOC encryption is enabled) cannot determine individual file sizes from the data block layout.

**Next data block offset:**

```
next_data_offset = data_offset + encrypted_size + padding_after
```

**When flags bit 3 is 0:** `padding_after` is `0` for every file table entry. No padding bytes exist between data blocks.

---

## 10. Decode Order of Operations

The following steps MUST be followed in order by all decoders:

```
1. Read 40 bytes from offset 0x00.

2. Attempt XOR de-obfuscation:
   a. Check if bytes 0x00-0x03 equal magic (0x00 0xEA 0x72 0x63).
   b. If YES: header is not XOR-obfuscated. Use as-is.
   c. If NO: XOR bytes 0x00-0x27 with key (0xA5 0x3C 0x96 0x0F 0xE1 0x7B 0x4D 0xC8),
      repeating cyclically. Re-check magic. If still wrong, reject archive.

3. Parse header fields:
   - Verify magic == 0x00 0xEA 0x72 0x63
   - Read version (must be 2 for v1.1)
   - Read flags
   - Check for unknown flag bits (bits 4-7 must be 0; reject if not)
   - Read entry_count
   - Read toc_offset, toc_size, toc_iv

4. Read TOC:
   a. Seek to toc_offset.
   b. Read toc_size bytes.
   c. If flags bit 1 (toc_encrypted) is set:
      - Decrypt TOC with AES-256-CBC using toc_iv and the 32-byte key.
      - Remove PKCS7 padding.
   d. Parse entry_count entries sequentially from the (decrypted) TOC bytes.

5. For each entry (i = 0 to entry_count - 1):
   a. Check entry_type. If 0x01 (directory): create the directory using the entry
      name as a relative path, apply permissions from the `permissions` field,
      and skip to the next entry (no ciphertext to read).

   b. Read ciphertext (file entries only):
      - Seek to data_offset.
      - Read encrypted_size bytes.

   c. Verify HMAC:
      - Compute HMAC-SHA-256(key, iv || ciphertext).
      - Compare with stored hmac (32 bytes).
      - If mismatch: REJECT this file. Do NOT attempt decryption.

   d. Decrypt:
      - Decrypt ciphertext with AES-256-CBC using entry's iv and the 32-byte key.
      - Remove PKCS7 padding.
      - Result = compressed_data (or raw data if compression_flag = 0).

   e. Decompress (if compression_flag = 1):
      - Decompress with gzip.
      - Result = original file content.

   f. Verify integrity:
      - Compute SHA-256 of the decompressed/raw result.
      - Compare with stored sha256 (32 bytes).
      - If mismatch: WARN (data corruption or wrong key).

   g. Write to output:
      - Create parent directories as needed (using the path components of the entry name).
      - Create output file using stored name.
      - Write the verified content.
      - Apply permissions from the entry's `permissions` field.
```
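
Steps 5c through 5f can be sketched in Python as shown below. Python's standard library has no AES implementation, so the AES-256-CBC step is passed in as `aes_cbc_decrypt`, a hypothetical callable `(key, iv, ciphertext) -> padded plaintext` that the caller must supply. `decode_file_entry` and the dictionary keys are likewise names assumed for this example.

```python
import gzip
import hashlib
import hmac


def decode_file_entry(entry: dict, ciphertext: bytes, key: bytes, aes_cbc_decrypt):
    """Return (content, sha256_ok) for one file entry, following step 5."""
    # 5c. Verify the HMAC first; never decrypt unauthenticated data.
    expected = hmac.new(key, entry["iv"] + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, entry["hmac"]):
        raise ValueError("HMAC mismatch: file rejected")

    # 5d. Decrypt, then remove PKCS7 padding.
    padded = aes_cbc_decrypt(key, entry["iv"], ciphertext)
    if not padded or len(padded) % 16:
        raise ValueError("decrypted length is not a positive multiple of 16")
    pad = padded[-1]
    if not 1 <= pad <= 16 or padded[-pad:] != bytes([pad]) * pad:
        raise ValueError("invalid PKCS7 padding")
    data = padded[:-pad]

    # 5e. Decompress only when the per-file flag says so.
    if entry["compression_flag"] == 1:
        data = gzip.decompress(data)

    # 5f. A SHA-256 mismatch is reported, not raised, matching the WARN rule.
    sha256_ok = hashlib.sha256(data).digest() == entry["sha256"]
    return data, sha256_ok
```

Injecting the cipher keeps the ordering logic testable on its own and lets each decoder plug in its platform's AES implementation.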

---

## 11. Version Compatibility Rules

1. **Version field:** The `version` field at offset `0x04` identifies the format version. This specification defines version `2` (v1.1). Version `1` was the original v1.0 format (no directory support, no entry_type/permissions fields).

2. **Version 2 changes from version 1:**
   - TOC entries now include `entry_type` (1 byte) and `permissions` (2 bytes) fields after `name` and before `original_size`.
   - Entry size formula changed from `101 + name_length` to `104 + name_length`.
   - `file_count` header field renamed to `entry_count` (same offset, same type; directories count as entries).
   - Entry names are relative paths with `/` separator (not filename-only).
   - Entries are ordered parent-before-child (directories before their contents).

3. **Forward compatibility:** Decoders MUST reject archives with `version` greater than their supported version. A v2 decoder encountering `version = 3` MUST fail with a clear error message.

4. **Unknown flags:** Decoders MUST reject archives that have any reserved flag bits (bits 4-7) set to `1`. Unknown flags indicate features the decoder does not understand and cannot safely skip. Silent ignoring of unknown flags is prohibited.

5. **Future versions:** Version 3+ MAY:
   - Add fields after the `reserved` bytes in the header (growing header size).
   - Define new flag bits (bits 4-7).
   - Change the `reserved` field to carry metadata.
   - Introduce HKDF-derived per-file keys (replacing single shared key).

6. **Backward compatibility:** Future versions SHOULD maintain the same magic bytes and the same position of the `version` field (offset `0x04`) so that decoders can read the version before deciding how to proceed.
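
Rules 3 and 4 translate into two early checks in a decoder. The Python sketch below assumes a decoder that supports up to version `2`; `check_compatibility` is a hypothetical helper, and how a decoder treats older versions is left out of this example.

```python
SUPPORTED_VERSION = 2
RESERVED_FLAG_MASK = 0xF0  # bits 4-7


def check_compatibility(version: int, flags: int) -> None:
    """Raise ValueError for archives this decoder must not process."""
    if version > SUPPORTED_VERSION:
        raise ValueError(
            f"unsupported format version {version} "
            f"(this decoder supports up to {SUPPORTED_VERSION})")
    if flags & RESERVED_FLAG_MASK:
        raise ValueError(
            f"unknown flag bits set: 0x{flags & RESERVED_FLAG_MASK:02X}")
```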

---

## 12. Worked Example

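This section constructs a complete 2-file archive byte by byte. The example uses the legacy version `1` layout: TOC entries have no `entry_type` or `permissions` fields, so each entry is `101 + name_length` bytes (see Section 11). All offsets, field sizes, and hex values are internally consistent and can be verified by summing field sizes. This example serves as a **golden reference** for implementation testing.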

### 12.1 Input Files

| File | Name | Content | Size |
|------|------|---------|------|
| 1 | `hello.txt` | ASCII string `Hello` (bytes: `48 65 6C 6C 6F`) | 5 bytes |
| 2 | `data.bin` | 32 bytes of `0x01` repeated | 32 bytes |

### 12.2 Parameters

- **Key:** 32 bytes: `00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F`
- **Flags:** `0x01` (compression enabled, no obfuscation)
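- **Version:** `1` (legacy v1.0 layout; a version `2` archive would add `entry_type` and `permissions` to each TOC entry)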

### 12.3 Per-File Pipeline Walkthrough

#### File 1: `hello.txt`

**Step 1: SHA-256 checksum of original content**

```
SHA-256("Hello") = 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969
```

As bytes:
```
18 5F 8D B3 22 71 FE 25 F5 61 A6 FC 93 8B 2E 26
43 06 EC 30 4E DA 51 80 07 D1 76 48 26 38 19 69
```

**Step 2: Gzip compression**

Gzip output is implementation-dependent (timestamps, OS flags vary). For this example, we use a representative compressed size of **25 bytes**. The actual gzip output will differ between implementations, but the pipeline and sizes are computed from this value.

- `compressed_size = 25`

**Step 3: Compute encrypted_size (PKCS7 padding)**

```
encrypted_size = ((25 / 16) + 1) * 16 = ((1) + 1) * 16 = 32 bytes
```

PKCS7 padding adds `32 - 25 = 7` bytes of value `0x07`.

**Step 4: AES-256-CBC encryption**

- IV (randomly chosen for this example): `AA BB CC DD EE FF 00 11 22 33 44 55 66 77 88 99`
- Ciphertext: 32 bytes (actual value depends on the gzip output and IV; representative bytes used in the hex dump below)

**Step 5: HMAC-SHA-256**

```
HMAC_input = IV (16 bytes) || ciphertext (32 bytes) = 48 bytes total
HMAC-SHA-256(key, HMAC_input) = <32 bytes>
```

The HMAC value depends on the actual ciphertext; representative bytes (`0xC1` repeated) are used in the hex dump. In a real implementation, this MUST be computed from the actual IV and ciphertext.

#### File 2: `data.bin`

**Step 1: SHA-256 checksum of original content**

```
SHA-256(0x01 * 32) = 72cd6e8422c407fb6d098690f1130b7ded7ec2f7f5e1d30bd9d521f015363793
```

As bytes:
```
72 CD 6E 84 22 C4 07 FB 6D 09 86 90 F1 13 0B 7D
ED 7E C2 F7 F5 E1 D3 0B D9 D5 21 F0 15 36 37 93
```

**Step 2: Gzip compression**

32 bytes of identical content compresses well. Representative compressed size: **22 bytes**.

- `compressed_size = 22`

**Step 3: Compute encrypted_size (PKCS7 padding)**

```
encrypted_size = ((22 / 16) + 1) * 16 = ((1) + 1) * 16 = 32 bytes
```

PKCS7 padding adds `32 - 22 = 10` bytes of value `0x0A`.

**Step 4: AES-256-CBC encryption**

- IV (randomly chosen for this example): `11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF 00`
- Ciphertext: 32 bytes (representative)

**Step 5: HMAC-SHA-256**

```
HMAC_input = IV (16 bytes) || ciphertext (32 bytes) = 48 bytes total
HMAC-SHA-256(key, HMAC_input) = <32 bytes>
```

Representative bytes (`0xD2` repeated) are used in the hex dump.

### 12.4 Archive Layout

| Region | Start Offset | End Offset | Size | Description |
|--------|-------------|------------|------|-------------|
| Header | `0x0000` | `0x0027` | 40 bytes | Fixed header |
| TOC Entry 1 | `0x0028` | `0x0095` | 110 bytes | `hello.txt` metadata |
| TOC Entry 2 | `0x0096` | `0x0102` | 109 bytes | `data.bin` metadata |
| Data Block 1 | `0x0103` | `0x0122` | 32 bytes | `hello.txt` ciphertext |
| Data Block 2 | `0x0123` | `0x0142` | 32 bytes | `data.bin` ciphertext |
| **Total** | | | **323 bytes** | |

**Offset verification:**

```
TOC offset       = header_size                          = 40 (0x28)    CHECK
TOC size         = entry1_size + entry2_size            = 110 + 109 = 219 (0xDB)    CHECK
Data Block 1     = toc_offset + toc_size                = 40 + 219 = 259 (0x103)    CHECK
Data Block 2     = data_offset_1 + encrypted_size_1     = 259 + 32 = 291 (0x123)    CHECK
Archive end      = data_offset_2 + encrypted_size_2     = 291 + 32 = 323 (0x143)    CHECK
```
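
The same arithmetic can be reproduced with a short Python script. It uses the fixed per-entry size of 101 bytes that this example's entries sum to (no `entry_type` or `permissions` fields); the variable names are illustrative.

```python
HEADER_SIZE = 40
FIXED_ENTRY_SIZE = 101  # per-entry fixed bytes in this example's layout

names = ["hello.txt", "data.bin"]
encrypted_sizes = [32, 32]

entry_sizes = [FIXED_ENTRY_SIZE + len(n.encode("utf-8")) for n in names]
toc_offset = HEADER_SIZE
toc_size = sum(entry_sizes)

# Walk the data region: each block starts where the previous one ended.
data_offsets = []
cursor = toc_offset + toc_size
for size in encrypted_sizes:
    data_offsets.append(cursor)
    cursor += size
archive_size = cursor
```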

### 12.5 Header (Bytes 0x0000 - 0x0027)

| Offset | Hex | Field | Value |
|--------|-----|-------|-------|
| `0x0000` | `00 EA 72 63` | magic | Custom magic bytes |
| `0x0004` | `01` | version | 1 |
| `0x0005` | `01` | flags | `0x01` = compression enabled |
| `0x0006` | `02 00` | file_count | 2 (LE) |
| `0x0008` | `28 00 00 00` | toc_offset | 40 (LE) |
| `0x000C` | `DB 00 00 00` | toc_size | 219 (LE) |
| `0x0010` | `00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00` | toc_iv | Zero-filled (TOC not encrypted) |
| `0x0020` | `00 00 00 00 00 00 00 00` | reserved | Zero-filled |

### 12.6 File Table Entry 1: `hello.txt` (Bytes 0x0028 - 0x0095)

| Offset | Hex | Field | Value |
|--------|-----|-------|-------|
| `0x0028` | `09 00` | name_length | 9 (LE) |
| `0x002A` | `68 65 6C 6C 6F 2E 74 78 74` | name | "hello.txt" (UTF-8) |
| `0x0033` | `05 00 00 00` | original_size | 5 (LE) |
| `0x0037` | `19 00 00 00` | compressed_size | 25 (LE) |
| `0x003B` | `20 00 00 00` | encrypted_size | 32 (LE) |
| `0x003F` | `03 01 00 00` | data_offset | 259 = 0x103 (LE) |
| `0x0043` | `AA BB CC DD EE FF 00 11 22 33 44 55 66 77 88 99` | iv | Example IV for file 1 |
| `0x0053` | `C1 C1 C1 ... (32 bytes)` | hmac | Representative HMAC (actual depends on ciphertext) |
| `0x0073` | `18 5F 8D B3 22 71 FE 25 F5 61 A6 FC 93 8B 2E 26 43 06 EC 30 4E DA 51 80 07 D1 76 48 26 38 19 69` | sha256 | SHA-256 of "Hello" |
| `0x0093` | `01` | compression_flag | 1 (gzip) |
| `0x0094` | `00 00` | padding_after | 0 (no decoy padding) |

**Entry size verification:** `2 + 9 + 4 + 4 + 4 + 4 + 16 + 32 + 32 + 1 + 2 = 110 bytes`. Offset range: `0x0028` to `0x0095` = 110 bytes. CHECK.

### 12.7 File Table Entry 2: `data.bin` (Bytes 0x0096 - 0x0102)

| Offset | Hex | Field | Value |
|--------|-----|-------|-------|
| `0x0096` | `08 00` | name_length | 8 (LE) |
| `0x0098` | `64 61 74 61 2E 62 69 6E` | name | "data.bin" (UTF-8) |
| `0x00A0` | `20 00 00 00` | original_size | 32 (LE) |
| `0x00A4` | `16 00 00 00` | compressed_size | 22 (LE) |
| `0x00A8` | `20 00 00 00` | encrypted_size | 32 (LE) |
| `0x00AC` | `23 01 00 00` | data_offset | 291 = 0x123 (LE) |
| `0x00B0` | `11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF 00` | iv | Example IV for file 2 |
| `0x00C0` | `D2 D2 D2 ... (32 bytes)` | hmac | Representative HMAC (actual depends on ciphertext) |
| `0x00E0` | `72 CD 6E 84 22 C4 07 FB 6D 09 86 90 F1 13 0B 7D ED 7E C2 F7 F5 E1 D3 0B D9 D5 21 F0 15 36 37 93` | sha256 | SHA-256 of 32 x 0x01 |
| `0x0100` | `01` | compression_flag | 1 (gzip) |
| `0x0101` | `00 00` | padding_after | 0 (no decoy padding) |

**Entry size verification:** `2 + 8 + 4 + 4 + 4 + 4 + 16 + 32 + 32 + 1 + 2 = 109 bytes`. Offset range: `0x0096` to `0x0102` = 109 bytes. CHECK.

### 12.8 Data Blocks (Bytes 0x0103 - 0x0142)

**Data Block 1** (bytes `0x0103` - `0x0122`, 32 bytes):

Ciphertext of gzip-compressed "Hello", encrypted with AES-256-CBC. Actual bytes depend on the gzip output (which includes timestamps) and the IV. Representative value: 32 bytes of ciphertext.

**Data Block 2** (bytes `0x0123` - `0x0142`, 32 bytes):

Ciphertext of gzip-compressed `0x01 * 32`, encrypted with AES-256-CBC. Representative value: 32 bytes of ciphertext.

### 12.9 Complete Annotated Hex Dump

The following hex dump shows the full 323-byte archive. HMAC values (`C1...` and `D2...`) and ciphertext (`E7...` and `F8...`) are representative placeholders. SHA-256 hashes are real computed values.

```
Offset  | Hex                                             | ASCII            | Annotation
--------|------------------------------------------------|------------------|------------------------------------------
0x0000  | 00 EA 72 63 01 01 02 00  28 00 00 00 DB 00 00 00 | ..rc....(....... | Header: magic, ver=1, flags=0x01, count=2, toc_off=40, toc_sz=219
0x0010  | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 | ................ | Header: toc_iv (zero-filled, TOC not encrypted)
0x0020  | 00 00 00 00 00 00 00 00  09 00 68 65 6C 6C 6F 2E | ..........hello. | Header: reserved | TOC Entry 1: name_len=9, name="hello."
0x0030  | 74 78 74 05 00 00 00 19  00 00 00 20 00 00 00 03 | txt........ .... | Entry 1: "txt", orig=5, comp=25, enc=32, data_off=
0x0040  | 01 00 00 AA BB CC DD EE  FF 00 11 22 33 44 55 66 | ..........."3DUf | Entry 1: =259(0x103), iv[0..12]
0x0050  | 77 88 99 C1 C1 C1 C1 C1  C1 C1 C1 C1 C1 C1 C1 C1 | w............... | Entry 1: iv[13..15], hmac[0..12]
0x0060  | C1 C1 C1 C1 C1 C1 C1 C1  C1 C1 C1 C1 C1 C1 C1 C1 | ................ | Entry 1: hmac[13..28]
0x0070  | C1 C1 C1 18 5F 8D B3 22  71 FE 25 F5 61 A6 FC 93 | ...._.."q.%.a... | Entry 1: hmac[29..31], sha256[0..12]
0x0080  | 8B 2E 26 43 06 EC 30 4E  DA 51 80 07 D1 76 48 26 | ..&C..0N.Q...vH& | Entry 1: sha256[13..28]
0x0090  | 38 19 69 01 00 00 08 00  64 61 74 61 2E 62 69 6E | 8.i.....data.bin | Entry 1: sha256[29..31], comp=1, pad=0 | Entry 2: name_len=8, name="data.bin"
0x00A0  | 20 00 00 00 16 00 00 00  20 00 00 00 23 01 00 00 |  ....... ...#... | Entry 2: orig=32, comp=22, enc=32, data_off=291(0x123)
0x00B0  | 11 22 33 44 55 66 77 88  99 AA BB CC DD EE FF 00 | ."3DUfw......... | Entry 2: iv[0..15]
0x00C0  | D2 D2 D2 D2 D2 D2 D2 D2  D2 D2 D2 D2 D2 D2 D2 D2 | ................ | Entry 2: hmac[0..15]
0x00D0  | D2 D2 D2 D2 D2 D2 D2 D2  D2 D2 D2 D2 D2 D2 D2 D2 | ................ | Entry 2: hmac[16..31]
0x00E0  | 72 CD 6E 84 22 C4 07 FB  6D 09 86 90 F1 13 0B 7D | r.n."...m......} | Entry 2: sha256[0..15]
0x00F0  | ED 7E C2 F7 F5 E1 D3 0B  D9 D5 21 F0 15 36 37 93 | .~........!..67. | Entry 2: sha256[16..31]
0x0100  | 01 00 00 E7 E7 E7 E7 E7  E7 E7 E7 E7 E7 E7 E7 E7 | ................ | Entry 2: comp=1, pad=0 | Data Block 1: ciphertext[0..12]
0x0110  | E7 E7 E7 E7 E7 E7 E7 E7  E7 E7 E7 E7 E7 E7 E7 E7 | ................ | Data Block 1: ciphertext[13..28]
0x0120  | E7 E7 E7 F8 F8 F8 F8 F8  F8 F8 F8 F8 F8 F8 F8 F8 | ................ | Data Block 1: ciphertext[29..31] | Data Block 2: ciphertext[0..12]
0x0130  | F8 F8 F8 F8 F8 F8 F8 F8  F8 F8 F8 F8 F8 F8 F8 F8 | ................ | Data Block 2: ciphertext[13..28]
0x0140  | F8 F8 F8                                          | ...              | Data Block 2: ciphertext[29..31]
```

**Total: 323 bytes (0x143).**
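
To try the offset arithmetic against a real file, the dump can be written out byte for byte. The sketch below does so with only `printf` and `tr`; `hex_to_bin`, `repeat_hex`, and `build_example_archive` are illustrative helpers, not part of this specification. Because the HMAC and ciphertext bytes are the placeholders from the dump, the resulting file is suitable for exercising header and TOC parsing only; HMAC verification and decryption will not succeed on it.

```sh
# Write the raw bytes for a string of hex pairs (spaces are ignored).
hex_to_bin() {
  local hex rest pair
  hex=$(printf '%s' "$1" | tr -d ' ')
  while [ -n "$hex" ]; do
    rest=${hex#??}          # everything after the first two characters
    pair=${hex%"$rest"}     # the first two characters
    # Emit the byte through an octal escape, which POSIX printf understands.
    printf "\\$(printf '%03o' "0x$pair")"
    hex=$rest
  done
}

# Print <pair> repeated <count> times, e.g. "repeat_hex C1 32".
repeat_hex() {
  local i out
  i=0
  out=
  while [ "$i" -lt "$2" ]; do
    out="$out$1"
    i=$((i + 1))
  done
  printf '%s' "$out"
}

build_example_archive() {
  {
    # Header (40 bytes): magic, version, flags, file_count, toc_offset, toc_size
    hex_to_bin "00EA7263 01 01 0200 28000000 DB000000"
    hex_to_bin "$(repeat_hex 00 24)"     # toc_iv (16) + reserved (8)
    # TOC entry 1: hello.txt (110 bytes)
    hex_to_bin "0900 68656C6C6F2E747874"
    hex_to_bin "05000000 19000000 20000000 03010000"
    hex_to_bin "AABBCCDDEEFF00112233445566778899"
    hex_to_bin "$(repeat_hex C1 32)"     # placeholder HMAC
    hex_to_bin "185F8DB32271FE25F561A6FC938B2E264306EC304EDA518007D1764826381969"
    hex_to_bin "01 0000"
    # TOC entry 2: data.bin (109 bytes)
    hex_to_bin "0800 646174612E62696E"
    hex_to_bin "20000000 16000000 20000000 23010000"
    hex_to_bin "112233445566778899AABBCCDDEEFF00"
    hex_to_bin "$(repeat_hex D2 32)"     # placeholder HMAC
    hex_to_bin "72CD6E8422C407FB6D098690F1130B7DED7EC2F7F5E1D30BD9D521F015363793"
    hex_to_bin "01 0000"
    # Data blocks: placeholder ciphertext, 32 bytes each
    hex_to_bin "$(repeat_hex E7 32)"
    hex_to_bin "$(repeat_hex F8 32)"
  } > "$1"
}

build_example_archive archive.bin
wc -c < archive.bin   # 323
```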

### 12.10 Step-by-Step Shell Decode Walkthrough

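The following shell commands walk through decoding `hello.txt` from this archive. Header and TOC parsing needs only `dd` and `xxd`; HMAC verification, decryption, decompression, and the final checksum additionally use `openssl`, `awk`, `gunzip`, and `sha256sum`. The `read_le_u16` and `read_le_u32` functions are defined in the Appendix (Section 13).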

```sh
# -------------------------------------------------------
# Step 1: Read and verify magic bytes
# -------------------------------------------------------
dd if=archive.bin bs=1 skip=0 count=4 2>/dev/null | xxd -p
# Expected: 00ea7263

# -------------------------------------------------------
# Step 2: Read version
# -------------------------------------------------------
dd if=archive.bin bs=1 skip=4 count=1 2>/dev/null | xxd -p
# Expected: 01

# -------------------------------------------------------
# Step 3: Read flags
# -------------------------------------------------------
dd if=archive.bin bs=1 skip=5 count=1 2>/dev/null | xxd -p
# Expected: 01 (compression enabled)

# -------------------------------------------------------
# Step 4: Read file count
# -------------------------------------------------------
read_le_u16 archive.bin 6
# Expected: 2

# -------------------------------------------------------
# Step 5: Read TOC offset
# -------------------------------------------------------
read_le_u32 archive.bin 8
# Expected: 40

# -------------------------------------------------------
# Step 6: Read TOC size
# -------------------------------------------------------
read_le_u32 archive.bin 12
# Expected: 219

# -------------------------------------------------------
# Step 7: Read TOC Entry 1 -- name_length
# -------------------------------------------------------
read_le_u16 archive.bin 40
# Expected: 9

# -------------------------------------------------------
# Step 8: Read TOC Entry 1 -- filename
# -------------------------------------------------------
dd if=archive.bin bs=1 skip=42 count=9 2>/dev/null
# Expected: hello.txt

# -------------------------------------------------------
# Step 9: Read TOC Entry 1 -- original_size
# -------------------------------------------------------
read_le_u32 archive.bin 51
# Expected: 5

# -------------------------------------------------------
# Step 10: Read TOC Entry 1 -- compressed_size
# -------------------------------------------------------
read_le_u32 archive.bin 55
# Expected: 25

# -------------------------------------------------------
# Step 11: Read TOC Entry 1 -- encrypted_size
# -------------------------------------------------------
read_le_u32 archive.bin 59
# Expected: 32

# -------------------------------------------------------
# Step 12: Read TOC Entry 1 -- data_offset
# -------------------------------------------------------
read_le_u32 archive.bin 63
# Expected: 259

# -------------------------------------------------------
# Step 13: Read TOC Entry 1 -- IV (16 bytes)
# -------------------------------------------------------
dd if=archive.bin bs=1 skip=67 count=16 2>/dev/null | xxd -p
# Expected: aabbccddeeff00112233445566778899

# -------------------------------------------------------
# Step 14: Read TOC Entry 1 -- HMAC (32 bytes)
# -------------------------------------------------------
dd if=archive.bin bs=1 skip=83 count=32 2>/dev/null | xxd -p | tr -d '\n'
# (32 bytes of HMAC as one 64-character hex string; xxd -p wraps its output after 30 bytes)

# -------------------------------------------------------
# Step 15: Extract ciphertext for file 1
# -------------------------------------------------------
dd if=archive.bin bs=1 skip=259 count=32 of=/tmp/file1.enc 2>/dev/null

# -------------------------------------------------------
# Step 16: Verify HMAC for file 1
# -------------------------------------------------------
# Create HMAC input: IV (16 bytes) || ciphertext (32 bytes)
IV_HEX="aabbccddeeff00112233445566778899"
KEY_HEX="000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f"

# Extract IV and ciphertext, concatenate, compute HMAC
{
  dd if=archive.bin bs=1 skip=67 count=16 2>/dev/null   # IV
  dd if=archive.bin bs=1 skip=259 count=32 2>/dev/null  # ciphertext
} | openssl dgst -sha256 -mac HMAC -macopt "hexkey:${KEY_HEX}" -hex 2>/dev/null \
  | awk '{print $NF}'
# Compare output with stored HMAC from step 14

# -------------------------------------------------------
# Step 17: Decrypt file 1
# -------------------------------------------------------
openssl enc -d -aes-256-cbc -nosalt \
  -K "${KEY_HEX}" \
  -iv "${IV_HEX}" \
  -in /tmp/file1.enc -out /tmp/file1.gz

# -------------------------------------------------------
# Step 18: Decompress file 1
# -------------------------------------------------------
gunzip -c /tmp/file1.gz > /tmp/hello.txt

# -------------------------------------------------------
# Step 19: Verify SHA-256 of extracted file
# -------------------------------------------------------
sha256sum /tmp/hello.txt
# Expected: 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969
```
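
The `skip=` values in steps 7 through 14 all follow from the entry's start offset and its `name_length`. The sketch below derives them by walking the field sizes in order; `print_field_offsets` is an illustrative helper, not part of this specification. Running it for Entry 2 gives the offsets needed to repeat the walkthrough for `data.bin`.

```sh
# Print the absolute offset of every field in a TOC entry.
# Usage: print_field_offsets <entry_offset> <name_length>
print_field_offsets() {
  local pos field
  pos=$1
  echo "name_length=$pos"
  pos=$((pos + 2))
  echo "name=$pos"
  pos=$((pos + $2))
  # Each item is <field>:<size in bytes>, in on-disk order.
  for field in original_size:4 compressed_size:4 encrypted_size:4 data_offset:4 \
               iv:16 hmac:32 sha256:32 compression_flag:1 padding_after:2; do
    echo "${field%:*}=$pos"
    pos=$((pos + ${field#*:}))
  done
  echo "next_entry=$pos"
}

print_field_offsets 40 9    # Entry 1: iv=67, hmac=83, next_entry=150
print_field_offsets 150 8   # Entry 2: iv=176, hmac=192, next_entry=259
```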

---

## 13. Appendix: Shell Decoder Reference

This appendix provides reference shell functions for decoding archives using only standard busybox commands.

### 13.1 Little-Endian Integer Reading

```sh
# Read a little-endian u16 from a binary file at a byte offset.
# Usage: read_le_u16 <file> <offset>
# Output: decimal integer value
read_le_u16() {
  local file="$1" offset="$2"
  local hex=$(dd if="$file" bs=1 skip="$offset" count=2 2>/dev/null | xxd -p)
  local b0=${hex:0:2} b1=${hex:2:2}
  printf '%d' "0x${b1}${b0}"
}

# Read a little-endian u32 from a binary file at a byte offset.
# Usage: read_le_u32 <file> <offset>
# Output: decimal integer value
read_le_u32() {
  local file="$1" offset="$2"
  local hex=$(dd if="$file" bs=1 skip="$offset" count=4 2>/dev/null | xxd -p)
  local b0=${hex:0:2} b1=${hex:2:2} b2=${hex:4:2} b3=${hex:6:2}
  printf '%d' "0x${b3}${b2}${b1}${b0}"
}
```

**Busybox compatibility note:** If `xxd` is not available, use `od` as a fallback:

```sh
# Fallback using od instead of xxd
read_le_u32_od() {
  local file="$1" offset="$2"
  local bytes=$(dd if="$file" bs=1 skip="$offset" count=4 2>/dev/null \
    | od -A n -t x1 | tr -d ' \n')
  local b0=${bytes:0:2} b1=${bytes:2:2} b2=${bytes:4:2} b3=${bytes:6:2}
  printf '%d' "0x${b3}${b2}${b1}${b0}"
}
```
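
The `od` fallback above covers only `u32` and still relies on `${var:offset:length}` substring expansion, which not every `sh` build provides. The following is a sketch of a variant that avoids both limits by asking `od` for decimal byte values and combining them with shell arithmetic. `read_le_od` is a hypothetical name, and the sketch assumes the target `od` accepts `-A n -t u1`.

```sh
# Read a little-endian unsigned integer without xxd or substring expansion.
# Usage: read_le_od <file> <offset> <width>   (width in bytes: 1, 2 or 4)
# Output: decimal integer value
read_le_od() {
  local value bits b
  value=0
  bits=0
  # "od -t u1" prints each byte as an unsigned decimal number in file order,
  # so the first byte read is the least significant one.
  for b in $(dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | od -A n -t u1); do
    value=$((value + (b << bits)))
    bits=$((bits + 8))
  done
  printf '%d\n' "$value"
}

# Usage against the example archive from Section 12:
#   read_le_od archive.bin 6 2    # file_count
#   read_le_od archive.bin 8 4    # toc_offset
#   read_le_od archive.bin 12 4   # toc_size
```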

### 13.2 Read Raw Bytes as Hex

```sh
# Read N bytes from file at offset as hex string (no spaces)
# Usage: read_hex <file> <offset> <count>
read_hex() {
  local file="$1" offset="$2" count="$3"
  dd if="$file" bs=1 skip="$offset" count="$count" 2>/dev/null | xxd -p | tr -d '\n'
}
```

### 13.3 HMAC-SHA-256 Verification

```sh
# Verify HMAC-SHA-256 of IV || ciphertext.
# Usage: verify_hmac <file> <iv_offset> <iv_length> <data_offset> <data_length> <expected_hmac_hex> <key_hex>
# Returns: 0 if HMAC matches, 1 if not
verify_hmac() {
  local file="$1"
  local iv_offset="$2" iv_length="$3"
  local data_offset="$4" data_length="$5"
  local expected="$6" key="$7"

  local actual=$(
    {
      dd if="$file" bs=1 skip="$iv_offset" count="$iv_length" 2>/dev/null
      dd if="$file" bs=1 skip="$data_offset" count="$data_length" 2>/dev/null
    } | openssl dgst -sha256 -mac HMAC -macopt "hexkey:${key}" -hex 2>/dev/null \
      | awk '{print $NF}'
  )

  [ "$actual" = "$expected" ]
}
```

**Graceful degradation:** If the target busybox `openssl` does not support `-mac HMAC -macopt`, the shell decoder MAY skip HMAC verification. In this case, print a warning:

```sh
# Check if openssl HMAC is available
if ! printf 'test' | openssl dgst -sha256 -mac HMAC -macopt hexkey:00 >/dev/null 2>&1; then
  echo "WARNING: openssl HMAC not available, skipping integrity verification"
  SKIP_HMAC=1
fi
```
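
An exit-status probe only shows that the options are accepted. A stricter check, sketched below, runs the same pipeline on a published test vector and compares the digest. It assumes RFC 4231 test case 2 (key `Jefe`, message `what do ya want for nothing?`); `hmac_selftest` is a hypothetical name.

```sh
# Return 0 if the openssl HMAC-SHA-256 pipeline reproduces RFC 4231 test case 2.
hmac_selftest() {
  local expected actual
  expected="5bdcc146bf60754e6a042426089575c75a003f089d2739839dec58b964ec3843"
  # hexkey 4a656665 is the ASCII string "Jefe"
  actual=$(printf '%s' "what do ya want for nothing?" \
    | openssl dgst -sha256 -mac HMAC -macopt hexkey:4a656665 -hex 2>/dev/null \
    | awk '{print $NF}')
  [ "$actual" = "$expected" ]
}

if ! hmac_selftest; then
  echo "WARNING: openssl HMAC self-test failed, skipping integrity verification"
  SKIP_HMAC=1
fi
```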

### 13.4 Single-File Decryption

```sh
# Decrypt a single file from the archive.
# Usage: decrypt_file <archive> <data_offset> <encrypted_size> <iv_hex> <key_hex> <output_file> <is_compressed>
decrypt_file() {
  local archive="$1"
  local data_offset="$2" encrypted_size="$3"
  local iv_hex="$4" key_hex="$5"
  local output="$6" is_compressed="$7"

  local tmp="/tmp/_decrypted_$$"

  # Extract the ciphertext and decrypt it; the pipeline's exit status is
  # that of openssl, so a wrong key or corrupt padding aborts here
  dd if="$archive" bs=1 skip="$data_offset" count="$encrypted_size" 2>/dev/null \
    | openssl enc -d -aes-256-cbc -nosalt -K "$key_hex" -iv "$iv_hex" \
    > "$tmp" || { rm -f "$tmp"; return 1; }

  # Decompress if needed
  if [ "$is_compressed" = "1" ]; then
    gunzip -c "$tmp" > "$output" || { rm -f "$tmp"; return 1; }
  else
    mv "$tmp" "$output"
  fi

  rm -f "$tmp"
}
```

### 13.5 SHA-256 Verification

```sh
# Verify SHA-256 of an extracted file.
# Usage: verify_sha256 <file> <expected_hex>
# Returns: 0 if matches, 1 if not
verify_sha256() {
  local file="$1" expected="$2"
  local actual=$(sha256sum "$file" | awk '{print $1}')
  [ "$actual" = "$expected" ]
}
```

### 13.6 Kotlin Decoder Reference

For Android implementations using `javax.crypto`:

```kotlin
import java.io.ByteArrayInputStream
import java.security.MessageDigest
import java.util.zip.GZIPInputStream
import javax.crypto.Cipher
import javax.crypto.Mac
import javax.crypto.spec.IvParameterSpec
import javax.crypto.spec.SecretKeySpec

/**
 * Decrypt a single file entry from the archive.
 *
 * @param ciphertext The encrypted data (encrypted_size bytes from the data block)
 * @param iv The 16-byte IV from the file table entry
 * @param key The 32-byte AES key
 * @return Decrypted data (after PKCS7 unpadding, which is automatic)
 */
fun decryptFileEntry(ciphertext: ByteArray, iv: ByteArray, key: ByteArray): ByteArray {
    val cipher = Cipher.getInstance("AES/CBC/PKCS5Padding")
    // Note: PKCS5Padding in Java/Android == PKCS7 for 16-byte blocks
    val secretKey = SecretKeySpec(key, "AES")
    val ivSpec = IvParameterSpec(iv)
    cipher.init(Cipher.DECRYPT_MODE, secretKey, ivSpec)
    return cipher.doFinal(ciphertext)
}

/**
 * Verify HMAC-SHA-256 of IV || ciphertext.
 *
 * @param iv The 16-byte IV
 * @param ciphertext The encrypted data
 * @param key The 32-byte key (same as AES key in v1)
 * @param expectedHmac The 32-byte HMAC from the file table entry
 * @return true if HMAC matches
 */
fun verifyHmac(iv: ByteArray, ciphertext: ByteArray, key: ByteArray, expectedHmac: ByteArray): Boolean {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(SecretKeySpec(key, "HmacSHA256"))
    mac.update(iv)
    mac.update(ciphertext)
    val computed = mac.doFinal()
    // MessageDigest.isEqual avoids the early-exit timing leak of contentEquals
    return MessageDigest.isEqual(computed, expectedHmac)
}

/**
 * Decompress gzip data.
 *
 * @param compressed Gzip-compressed data
 * @return Decompressed data
 */
fun decompressGzip(compressed: ByteArray): ByteArray {
    return GZIPInputStream(ByteArrayInputStream(compressed)).use { it.readBytes() }
}

/**
 * Verify SHA-256 checksum of extracted content.
 *
 * @param data The decompressed file content
 * @param expectedSha256 The 32-byte SHA-256 from the file table entry
 * @return true if checksum matches
 */
fun verifySha256(data: ByteArray, expectedSha256: ByteArray): Boolean {
    val digest = MessageDigest.getInstance("SHA-256")
    val computed = digest.digest(data)
    return computed.contentEquals(expectedSha256)
}
```

**Full decode flow in Kotlin:**

```kotlin
// For each file entry:
// 1. Read ciphertext from data_offset (encrypted_size bytes)
// 2. Verify HMAC BEFORE decryption
if (!verifyHmac(entry.iv, ciphertext, key, entry.hmac)) {
    throw SecurityException("HMAC verification failed for ${entry.name}")
}
// 3. Decrypt
val compressed = decryptFileEntry(ciphertext, entry.iv, key)
// 4. Decompress if needed
val original = if (entry.compressionFlag == 1) decompressGzip(compressed) else compressed
// 5. Verify SHA-256
if (!verifySha256(original, entry.sha256)) {
    throw SecurityException("SHA-256 verification failed for ${entry.name}")
}
// 6. Write to file
File(outputDir, entry.name).writeBytes(original)
```