android-encrypted-archiver/docs/FORMAT.md
NikitolProject e7535da7ce feat(07-01): update TOC entry definition with entry_type, permissions, and path semantics
- Add entry_type (u8) and permissions (u16 LE) fields to TOC entry
- Add Entry Type Values table (0x00=file, 0x01=directory)
- Add Permission Bits Layout table (POSIX mode_t lower 12 bits)
- Add Entry Name Semantics subsection (relative paths, parent-before-child)
- Update entry size formula: 101 -> 104 + name_length
- Bump format version from 1 to 2
- Rename file_count to entry_count in header
- Update Decode Order of Operations for directory handling
- Update Version Compatibility Rules for v2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 21:21:13 +03:00


Encrypted Archive Binary Format Specification

Version: 1.1
Date: 2026-02-26
Status: Normative


Table of Contents

  1. Overview and Design Goals
  2. Notation Conventions
  3. Archive Structure Diagram
  4. Archive Header Definition
  5. Table of Contents (TOC) Entry Definition
  6. Data Block Layout
  7. Encryption and Authentication Details
  8. Compression Details
  9. Obfuscation Features
  10. Decode Order of Operations
  11. Version Compatibility Rules
  12. Worked Example
  13. Appendix: Shell Decoder Reference

1. Overview and Design Goals

This document specifies the binary format for encrypted_archive -- a custom archive container designed to be unrecognizable by standard tools. Standard utilities (file, binwalk, 7z, tar, unzip) must not be able to identify or extract the contents of an archive produced in this format.

Target Decoders

Three independent implementations will build against this specification:

  1. Rust CLI archiver (encrypted_archive pack/unpack) -- the reference encoder and primary decoder, runs on Linux/macOS.
  2. Kotlin Android decoder -- runs on Android 13 (Qualcomm SoC) using only javax.crypto and java.util.zip. Primary extraction path on the target device.
  3. Busybox shell decoder -- a fallback shell script using only standard busybox commands: dd, xxd, openssl, gunzip, and sh. Must work without external dependencies.

Core Constraint

The shell decoder must be able to parse the archive format using dd (for byte extraction), xxd (for hex conversion), and openssl enc (for AES-CBC decryption with raw key mode: -K/-iv/-nosalt). This constraint drives several design choices:

  • Fixed-size header at a known offset (no variable-length preamble before the TOC pointer)
  • Absolute offsets (no relative offset chains that require cumulative addition)
  • IVs stored in the file table, not embedded in data blocks (single dd call per extraction)
  • Little-endian integers (native byte order on ARM and x86)

2. Notation Conventions

Convention Meaning
LE Little-endian byte order
u8 Unsigned 8-bit integer (1 byte)
u16 Unsigned 16-bit integer (2 bytes)
u32 Unsigned 32-bit integer (4 bytes)
bytes Raw byte sequence (no endianness)
Offset 0xNN Absolute byte offset from archive byte 0
Size Always in bytes unless stated otherwise
|| Concatenation of byte sequences
  • All multi-byte integers are little-endian (LE).
  • All sizes are in bytes unless stated otherwise.
  • All offsets are absolute from archive byte 0 (the first byte of the file).
  • Entry names are UTF-8 encoded relative paths using / as the path separator (e.g., dir/subdir/file.txt). Names MUST NOT start with / or contain .. components. For top-level files, the name is just the filename (e.g., readme.txt). Names are length-prefixed with a u16 byte count (NOT null-terminated).
  • Reserved fields are zero-filled and MUST be written as 0x00 bytes.

3. Archive Structure Diagram

+=======================================+
|          ARCHIVE HEADER               |  Fixed 40 bytes
|  magic(4) | ver(1) | flags(1)        |
|  entry_count(2) | toc_offset(4)      |
|  toc_size(4) | toc_iv(16)            |
|  reserved(8)                          |
+=======================================+
|          FILE TABLE (TOC)             |  Variable size
|  Entry 1: name, type, perms,         |  Optionally encrypted
|    sizes, offset, iv, hmac,           |  Files AND directories
|    sha256, flags                      |  (see Section 9.2)
|  Entry 2: ...                         |
|  ...                                  |
|  Entry N: ...                         |
+=======================================+
|          DATA BLOCK 1                 |  encrypted_size bytes
|  [ciphertext]                         |
+---------------------------------------+
|          [DECOY PADDING 1]            |  Optional (see Section 9.3)
+---------------------------------------+
|          DATA BLOCK 2                 |  encrypted_size bytes
|  [ciphertext]                         |
+---------------------------------------+
|          [DECOY PADDING 2]            |  Optional (see Section 9.3)
+---------------------------------------+
|          ...                          |
+=======================================+

The archive consists of three contiguous regions:

  1. Header (fixed 40 bytes) -- contains magic bytes, version, flags, and a pointer to the file table.
  2. File Table (TOC) (variable size) -- contains one entry per archived file or directory with all metadata needed for extraction.
  3. Data Blocks (variable size) -- contains the encrypted (and optionally compressed) file contents, one block per file entry (directory entries have no data block), optionally separated by decoy padding.

4. Archive Header Definition

The header is a fixed-size 40-byte structure at offset 0x00.

| Offset | Size | Type | Endian | Field | Description |
|--------|------|------|--------|-------|-------------|
| 0x00 | 4 | bytes | - | magic | Custom magic bytes: 0x00 0xEA 0x72 0x63. The leading 0x00 signals binary content; the remaining bytes (0xEA 0x72 0x63) do not match any known file signature. |
| 0x04 | 1 | u8 | - | version | Format version. Value 2 for this specification (v1.1). Value 1 for legacy v1.0 (no directory support). |
| 0x05 | 1 | u8 | - | flags | Feature flags bitfield (see below). |
| 0x06 | 2 | u16 | LE | entry_count | Number of entries (files and directories) stored in the archive. |
| 0x08 | 4 | u32 | LE | toc_offset | Absolute byte offset of the entry table from archive start. |
| 0x0C | 4 | u32 | LE | toc_size | Size of the entry table in bytes (if TOC encryption is on, this is the encrypted size including PKCS7 padding). |
| 0x10 | 16 | bytes | - | toc_iv | Initialization vector for encrypted TOC. Zero-filled (0x00 x 16) when TOC encryption flag (bit 1) is off. |
| 0x20 | 8 | bytes | - | reserved | Reserved for future use. MUST be zero-filled. |

Total header size: 40 bytes (0x28).
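
As a rough illustration of the table above, the header can be decoded with a single fixed-layout unpack. This is a minimal Python sketch, not part of the format: the names `HEADER` and `parse_header` are hypothetical, the sample values are illustrative only, and the sketch assumes any XOR obfuscation (Section 9.1) has already been removed.

```python
import struct

MAGIC = bytes([0x00, 0xEA, 0x72, 0x63])
# magic(4) version(1) flags(1) entry_count(2) toc_offset(4) toc_size(4)
# toc_iv(16) reserved(8); "<" selects little-endian with no alignment padding.
HEADER = struct.Struct("<4sBBHII16s8s")


def parse_header(raw: bytes) -> dict:
    """Decode the 40-byte header (hypothetical helper, de-XORed input assumed)."""
    if len(raw) < HEADER.size:
        raise ValueError("header truncated")
    names = ("magic", "version", "flags", "entry_count",
             "toc_offset", "toc_size", "toc_iv", "reserved")
    header = dict(zip(names, HEADER.unpack_from(raw, 0)))
    if header["magic"] != MAGIC:
        raise ValueError("bad magic bytes")
    return header


# Illustrative values only: version 2, compression flag set, 3 entries,
# TOC placed directly after the header, TOC size 349 bytes.
sample = HEADER.pack(MAGIC, 2, 0x01, 3, 40, 349, bytes(16), bytes(8))
fields = parse_header(sample)
print(fields["version"], fields["entry_count"], fields["toc_offset"], fields["toc_size"])
```

The struct size works out to 4 + 1 + 1 + 2 + 4 + 4 + 16 + 8 = 40 bytes, matching the total above.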

Flags Bitfield

| Bit | Mask | Name | Description |
|-----|------|------|-------------|
| 0 | 0x01 | compression | Per-file compression enabled. When set, files MAY be individually gzip-compressed (per-file compression_flag controls each file). When clear, all files are stored raw. |
| 1 | 0x02 | toc_encrypted | File table is encrypted with AES-256-CBC using toc_iv. When clear, file table is stored as plaintext. |
| 2 | 0x04 | xor_header | Header bytes are XOR-obfuscated (see Section 9.1). When clear, header is stored as-is. |
| 3 | 0x08 | decoy_padding | Random decoy bytes are inserted after data blocks (see Section 9.3). When clear, padding_after in every file table entry is 0. |
| 4-7 | 0xF0 | reserved | Reserved. MUST be 0. |
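
The bitfield can be decoded with plain mask tests. The following Python sketch is illustrative only; the constant and function names are hypothetical, and the rejection of reserved bits follows the rule stated in Section 11.

```python
FLAG_COMPRESSION = 0x01
FLAG_TOC_ENCRYPTED = 0x02
FLAG_XOR_HEADER = 0x04
FLAG_DECOY_PADDING = 0x08
RESERVED_MASK = 0xF0


def decode_flags(flags: int) -> dict:
    """Split the flags byte into named booleans; reject reserved bits."""
    if flags & RESERVED_MASK:
        raise ValueError(f"reserved flag bits set: 0x{flags & RESERVED_MASK:02X}")
    return {
        "compression": bool(flags & FLAG_COMPRESSION),
        "toc_encrypted": bool(flags & FLAG_TOC_ENCRYPTED),
        "xor_header": bool(flags & FLAG_XOR_HEADER),
        "decoy_padding": bool(flags & FLAG_DECOY_PADDING),
    }


print(decode_flags(0x0B))
```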

5. Table of Contents (TOC) Entry Definition

The file table (TOC) is a contiguous sequence of variable-length entries, one per file or directory. Entries are stored so that directory entries appear before any files within them (parent-before-child ordering). There is no per-entry delimiter; entries are read sequentially using the name_length field to determine where each entry's variable-length name ends.

Entry Field Table

| Field | Size | Type | Endian | Description |
|-------|------|------|--------|-------------|
| name_length | 2 | u16 | LE | Entry name length in bytes (UTF-8 encoded byte count). |
| name | name_length | bytes | - | Entry name as UTF-8 bytes. NOT null-terminated. Relative path using / as separator (see Entry Name Semantics below). |
| entry_type | 1 | u8 | - | Entry type: 0x00 = regular file, 0x01 = directory. Directories have original_size, compressed_size, and encrypted_size all set to 0 and no corresponding data block. |
| permissions | 2 | u16 | LE | Unix permission bits (lower 12 bits of POSIX mode_t). Bit layout: [suid(1)][sgid(1)][sticky(1)][owner_rwx(3)][group_rwx(3)][other_rwx(3)]. Example: 0o755 = 0x01ED = owner rwx, group r-x, other r-x. Stored as u16 LE. |
| original_size | 4 | u32 | LE | Original file size in bytes (before compression). For directories: 0. |
| compressed_size | 4 | u32 | LE | Size after gzip compression. Equals original_size if compression_flag is 0 (no compression). For directories: 0. |
| encrypted_size | 4 | u32 | LE | Size after AES-256-CBC encryption with PKCS7 padding. Formula: ((compressed_size / 16) + 1) * 16. For directories: 0. |
| data_offset | 4 | u32 | LE | Absolute byte offset of this entry's data block from archive start. For directories: 0. |
| iv | 16 | bytes | - | Random AES-256-CBC initialization vector for this file. For directories: zero-filled. |
| hmac | 32 | bytes | - | HMAC-SHA-256 over the 16-byte iv followed by the ciphertext (iv concatenated with ciphertext; see Section 7). For directories: zero-filled. |
| sha256 | 32 | bytes | - | SHA-256 hash of the original file content (before compression and encryption). For directories: zero-filled. |
| compression_flag | 1 | u8 | - | 0 = raw (no compression), 1 = gzip compressed. For directories: 0. |
| padding_after | 2 | u16 | LE | Number of decoy padding bytes after this file's data block. Always 0 when flags bit 3 (decoy_padding) is off. |

Entry Type Values

| Value | Name | Description |
|-------|------|-------------|
| 0x00 | File | Regular file. Has associated data block with ciphertext. All size fields and data_offset are meaningful. |
| 0x01 | Directory | Directory entry. original_size, compressed_size, encrypted_size are all 0. data_offset is 0. iv is zero-filled. hmac is zero-filled. sha256 is zero-filled. compression_flag is 0. No data block exists for this entry. |

Permission Bits Layout

| Bits | Mask | Name | Description |
|------|------|------|-------------|
| 11 | 0o4000 | setuid | Set user ID on execution |
| 10 | 0o2000 | setgid | Set group ID on execution |
| 9 | 0o1000 | sticky | Sticky bit |
| 8-6 | 0o0700 | owner | Owner read(4)/write(2)/execute(1) |
| 5-3 | 0o0070 | group | Group read(4)/write(2)/execute(1) |
| 2-0 | 0o0007 | other | Other read(4)/write(2)/execute(1) |

Common examples: 0o755 (rwxr-xr-x) = 0x01ED, 0o644 (rw-r--r--) = 0x01A4, 0o700 (rwx------) = 0x01C0.
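
The octal-to-hex correspondence and the on-disk byte order can be checked with a few lines of Python. This is only a sketch: `encode_permissions` and `describe_permissions` are hypothetical helper names, and the rwx rendering ignores the setuid/setgid/sticky bits for brevity.

```python
import struct


def encode_permissions(mode: int) -> bytes:
    """Keep the lower 12 bits of mode and serialize them as u16 LE."""
    return struct.pack("<H", mode & 0o7777)


def describe_permissions(mode: int) -> str:
    """Render the owner/group/other triplets as an ls-style string."""
    out = []
    for shift in (6, 3, 0):  # owner, group, other
        bits = (mode >> shift) & 0o7
        out.append(("r" if bits & 4 else "-")
                   + ("w" if bits & 2 else "-")
                   + ("x" if bits & 1 else "-"))
    return "".join(out)


print(hex(0o755), encode_permissions(0o755).hex(), describe_permissions(0o755))
```

Note that 0x01ED is written to the archive as the byte pair ED 01, because the field is little-endian.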

Entry Name Semantics

  • Names are relative paths from the archive root, using / as separator.
  • Example: a file at project/src/main.rs has name project/src/main.rs.
  • A directory entry for project/src/ has name project/src (no trailing slash).
  • Names MUST NOT start with / (no absolute paths).
  • Names MUST NOT contain .. components (no directory traversal).
  • The encoder MUST sort entries so that directory entries appear before any files within them (parent-before-child ordering). This allows the decoder to mkdir -p or create directories in a single sequential pass.
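
The name rules above could be enforced roughly as follows. This Python sketch uses hypothetical helper names, and the sort key (comparing lists of path components) is just one ordering that satisfies parent-before-child; the specification does not mandate a particular sort beyond that property.

```python
def validate_entry_name(name: str) -> bytes:
    """Check the name rules and return the UTF-8 bytes to store."""
    if name.startswith("/"):
        raise ValueError("absolute paths are not allowed")
    if ".." in name.split("/"):
        raise ValueError("'..' components are not allowed")
    encoded = name.encode("utf-8")
    if len(encoded) > 0xFFFF:
        raise ValueError("name does not fit the u16 name_length field")
    return encoded


def parent_before_child(names: list) -> list:
    """Order names so every directory precedes the entries inside it.

    A parent's component list is a strict prefix of its children's lists,
    and Python orders a prefix before any longer list that extends it.
    """
    return sorted(names, key=lambda n: n.split("/"))


ordered = parent_before_child(["project/src/main.rs", "project", "project/src"])
print(ordered)
```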

Entry Size Formula

Each TOC entry has a total size of:

entry_size = 2 + name_length + 1 + 2 + 4 + 4 + 4 + 4 + 16 + 32 + 32 + 1 + 2
           = 104 + name_length bytes
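
To make the layout concrete, here is a minimal Python sketch that reads one version 2 entry from plaintext TOC bytes. The names `FIXED_TAIL` and `parse_entry` are hypothetical; the field order is taken from the Entry Field Table above, and the 102-byte fixed tail plus the 2-byte name_length gives the 104-byte constant in the formula.

```python
import struct

# entry_type(1) permissions(2) original_size(4) compressed_size(4)
# encrypted_size(4) data_offset(4) iv(16) hmac(32) sha256(32)
# compression_flag(1) padding_after(2) = 102 bytes
FIXED_TAIL = struct.Struct("<BHIIII16s32s32sBH")
FIELDS = ("entry_type", "permissions", "original_size", "compressed_size",
          "encrypted_size", "data_offset", "iv", "hmac", "sha256",
          "compression_flag", "padding_after")


def parse_entry(toc: bytes, pos: int):
    """Return (entry_dict, position_of_next_entry)."""
    (name_length,) = struct.unpack_from("<H", toc, pos)
    name_start = pos + 2
    name_end = name_start + name_length
    entry = dict(zip(FIELDS, FIXED_TAIL.unpack_from(toc, name_end)))
    entry["name"] = toc[name_start:name_end].decode("utf-8")
    return entry, name_end + FIXED_TAIL.size


# A directory entry for "project/src": all size, offset, and digest fields zero.
name = b"project/src"
raw = (struct.pack("<H", len(name)) + name
       + FIXED_TAIL.pack(0x01, 0o755, 0, 0, 0, 0, bytes(16), bytes(32), bytes(32), 0, 0))
entry, next_pos = parse_entry(raw, 0)
print(entry["name"], entry["entry_type"], oct(entry["permissions"]), next_pos)
```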

File Table Total Size

The total file table size is the sum of all entry sizes:

toc_size = SUM(104 + name_length_i) for i in 0..entry_count-1

When TOC encryption (flags bit 1) is active, the encrypted TOC size includes PKCS7 padding:

encrypted_toc_size = ((toc_size / 16) + 1) * 16

The toc_size field in the header stores the actual size on disk (encrypted size if TOC encryption is on, plaintext size if off).
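
The two size formulas can be expressed directly. A small Python sketch under the definitions above (function names are hypothetical, and the three entry names are illustrative):

```python
def plaintext_toc_size(names: list) -> int:
    """Sum of (104 + name_length) over all entries, name_length in UTF-8 bytes."""
    return sum(104 + len(n.encode("utf-8")) for n in names)


def on_disk_toc_size(plain_size: int, toc_encrypted: bool) -> int:
    """Value stored in the header toc_size field."""
    if toc_encrypted:
        return (plain_size // 16 + 1) * 16  # PKCS7 always adds 1..16 bytes
    return plain_size


names = ["project", "project/src", "project/src/main.rs"]
plain = plaintext_toc_size(names)
print(plain, on_disk_toc_size(plain, False), on_disk_toc_size(plain, True))
```

For these three names the plaintext size is 3 * 104 + 7 + 11 + 19 = 349 bytes, which becomes 352 bytes on disk when TOC encryption is enabled.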


6. Data Block Layout

Each file entry has a single contiguous data block containing only the ciphertext (the AES-256-CBC encrypted output). Directory entries (entry_type = 0x01) have no data block. The decoder MUST skip directory entries when processing data blocks.

[ciphertext: encrypted_size bytes]

Important design decisions:

  • The IV is stored only in the file table entry, not duplicated at the start of the data block. The data block contains only ciphertext. This simplifies dd extraction in the shell decoder: a single dd call with the correct offset and size extracts the complete ciphertext.
  • The HMAC is stored only in the file table entry, not appended to the data block. The decoder reads the HMAC from the TOC, then verifies against the data block contents.
  • If decoy padding is enabled (flags bit 3), padding_after bytes of random data follow the ciphertext. The decoder MUST skip these bytes. The next file's data block starts at offset data_offset + encrypted_size + padding_after.

Data Block Ordering

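Data blocks appear in the same order as file table entries. Directory entries have no data block and are skipped, so the index i below counts file entries only. For file entry i: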

data_offset_0 = toc_offset + toc_size
data_offset_i = data_offset_{i-1} + encrypted_size_{i-1} + padding_after_{i-1}
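
An encoder could assign offsets in a single pass, as in the Python sketch below. The function name and the dict-based entry representation are assumptions made for illustration; directory entries receive data_offset 0 and do not advance the cursor.

```python
def assign_data_offsets(toc_offset: int, toc_size: int, entries: list) -> int:
    """Fill in data_offset for each entry; return the total archive size."""
    cursor = toc_offset + toc_size  # first data block follows the TOC
    for entry in entries:
        if entry["entry_type"] == 0x01:  # directory: no data block
            entry["data_offset"] = 0
            continue
        entry["data_offset"] = cursor
        cursor += entry["encrypted_size"] + entry["padding_after"]
    return cursor


entries = [
    {"entry_type": 0x01, "encrypted_size": 0, "padding_after": 0},
    {"entry_type": 0x00, "encrypted_size": 32, "padding_after": 5},
    {"entry_type": 0x00, "encrypted_size": 16, "padding_after": 0},
]
end = assign_data_offsets(40, 300, entries)
print([e["data_offset"] for e in entries], end)
```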

7. Encryption and Authentication Details

Pipeline

Each file is processed through the following pipeline, in order:

original_file
    |
    v
[1. SHA-256 checksum] --> stored in file table entry as `sha256`
    |
    v
[2. Gzip compress] (if compression_flag = 1) --> compressed_data
    |                                             (size = compressed_size)
    v
[3. PKCS7 pad] --> padded_data
    |               (size = encrypted_size)
    v
[4. AES-256-CBC encrypt] (with random IV) --> ciphertext
    |                                          (size = encrypted_size)
    v
[5. HMAC-SHA-256] (over IV || ciphertext) --> stored in file table entry as `hmac`

AES-256-CBC

  • Key: 32 bytes (256 bits), hardcoded and shared across all three decoders.
  • IV: 16 bytes, randomly generated for each file. Stored in the file table entry iv field.
  • Block size: 16 bytes.
  • Mode: CBC (Cipher Block Chaining).
  • The same 32-byte key is used for all files in the archive.

PKCS7 Padding

PKCS7 padding is applied to the compressed (or raw) data before encryption. PKCS7 always adds at least 1 byte of padding. If the input length is already a multiple of 16, a full 16-byte padding block is added.

Formula:

encrypted_size = ((compressed_size / 16) + 1) * 16

Where / is integer division (floor).

Examples:

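| compressed_size | Padding bytes | encrypted_size |
|-----------------|---------------|----------------|
| 0 | 16 | 16 |
| 1 | 15 | 16 |
| 15 | 1 | 16 |
| 16 | 16 | 32 |
| 17 | 15 | 32 |
| 31 | 1 | 32 |
| 32 | 16 | 48 |
| 100 | 12 | 112 |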
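
The table can be reproduced with a short Python sketch of PKCS7 padding for a 16-byte block. The helper names are hypothetical; real decoders would normally rely on their crypto library's padding handling rather than this code.

```python
BLOCK = 16


def encrypted_size(compressed_size: int) -> int:
    """Size after PKCS7 padding (integer division, as in the formula)."""
    return (compressed_size // BLOCK + 1) * BLOCK


def pkcs7_pad(data: bytes) -> bytes:
    """Append n bytes of value n, where n is in 1..16."""
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n


def pkcs7_unpad(padded: bytes) -> bytes:
    """Strip and validate PKCS7 padding."""
    if not padded or len(padded) % BLOCK:
        raise ValueError("length is not a positive multiple of 16")
    n = padded[-1]
    if not 1 <= n <= BLOCK or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS7 padding")
    return padded[:-n]


for size in (0, 1, 15, 16, 17, 31, 32, 100):
    print(size, encrypted_size(size) - size, encrypted_size(size))
```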

HMAC-SHA-256

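  • Key: The same 32-byte key used for AES-256-CBC encryption. (This specification, v1.x, uses a single key for both encryption and authentication. A later revision may derive separate subkeys using HKDF; Section 11 lists HKDF-derived keys as a possible change for format version 3+.)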

  • Input: The concatenation of the 16-byte IV and the ciphertext:

    HMAC_input = IV (16 bytes) || ciphertext (encrypted_size bytes)
    Total HMAC input length = 16 + encrypted_size bytes
    
  • Output: 32 bytes, stored in the file table entry hmac field.

Encrypt-then-MAC

This format uses the Encrypt-then-MAC construction:

  1. The HMAC is computed after encryption, over the IV and ciphertext.
  2. The decoder MUST verify the HMAC before attempting decryption. If the HMAC does not match, the decoder MUST reject the file without decrypting. This prevents padding oracle attacks and avoids processing tampered data.
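
The verify-before-decrypt rule might look like this in Python, using only the standard `hmac` and `hashlib` modules. The function names are hypothetical, and the constant-time comparison via `hmac.compare_digest` is an implementation choice of this sketch rather than something the specification requires.

```python
import hashlib
import hmac


def compute_entry_hmac(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """HMAC-SHA-256 over the IV followed by the ciphertext."""
    return hmac.new(key, iv + ciphertext, hashlib.sha256).digest()


def hmac_is_valid(key: bytes, iv: bytes, ciphertext: bytes, stored: bytes) -> bool:
    """Return True only if the stored tag matches; decrypt only in that case."""
    return hmac.compare_digest(compute_entry_hmac(key, iv, ciphertext), stored)


key = bytes(range(32))       # illustrative key, same pattern as Section 12.2
iv = bytes(16)               # placeholder IV
ciphertext = bytes(32)       # placeholder ciphertext
tag = compute_entry_hmac(key, iv, ciphertext)
tampered = b"\x01" + ciphertext[1:]
print(hmac_is_valid(key, iv, ciphertext, tag), hmac_is_valid(key, iv, tampered, tag))
```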

SHA-256 Integrity Checksum

  • Input: The original file content (before compression, before encryption).
  • Output: 32 bytes, stored in the file table entry sha256 field.
  • Verification: After the decoder decrypts and decompresses a file, it computes SHA-256 of the result and compares it to the stored sha256. A mismatch indicates data corruption or an incorrect key.
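
The final integrity check is a plain digest comparison. The Python sketch below uses a hypothetical helper name and checks it against the SHA-256 value that Section 12.3 gives for the 5-byte content "Hello".

```python
import hashlib


def sha256_matches(content: bytes, stored_sha256: bytes) -> bool:
    """Compare SHA-256 of the decrypted, decompressed content to the TOC value."""
    return hashlib.sha256(content).digest() == stored_sha256


stored = bytes.fromhex(
    "185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969"
)
print(sha256_matches(b"Hello", stored), sha256_matches(b"hello", stored))
```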

8. Compression Details

  • Algorithm: Standard gzip (RFC 1952), which wraps a DEFLATE stream (RFC 1951).
  • Granularity: Per-file. Each file has its own compression_flag in the file table entry.
  • Global flag: The header flags bit 0 (compression) enables per-file compression. When this bit is clear, ALL files are stored raw regardless of individual compression_flag values.
  • Recommendation: Already-compressed files (APK, ZIP, PNG, JPEG) should use compression_flag = 0 (raw) to avoid size inflation.

Size Tracking

  • original_size: Size of the file before any processing.
  • compressed_size: Size after gzip compression. If compression_flag = 0, then compressed_size = original_size.
  • encrypted_size: Size after AES-256-CBC with PKCS7 padding. Always >= compressed_size.
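
How the three size fields relate can be sketched in Python with the standard `gzip` module. The helper name is hypothetical. The exact gzip output length varies between implementations, so the sketch only relies on the fact that a gzip stream carries at least 18 bytes of header and trailer, which is why tiny inputs grow when compressed.

```python
import gzip


def size_fields(content: bytes, compression_flag: int) -> dict:
    """Compute original_size, compressed_size and encrypted_size for one file."""
    payload = gzip.compress(content) if compression_flag == 1 else content
    compressed_size = len(payload)
    return {
        "original_size": len(content),
        "compressed_size": compressed_size,
        "encrypted_size": (compressed_size // 16 + 1) * 16,
    }


raw = size_fields(b"Hello", 0)
packed = size_fields(b"Hello", 1)
print(raw, packed)
```

For a 5-byte input the compressed form is larger than the original, which illustrates why small or already-compressed files are better stored with compression_flag = 0.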

Decompression in Each Decoder

Decoder Library/Command
Rust flate2 crate (GzDecoder)
Kotlin java.util.zip.GZIPInputStream
Shell gunzip (busybox)

9. Obfuscation Features

These features are defined fully in this v1 specification but are intended for implementation in Phase 6 (after all three decoders work without obfuscation). Each feature is controlled by a flag bit in the header and can be activated independently.

9.1 XOR Header Obfuscation (flags bit 2, mask 0x04)

When flags bit 2 is set, the entire 40-byte header is XOR-obfuscated with a fixed repeating 8-byte key.

XOR Key: 0xA5 0x3C 0x96 0x0F 0xE1 0x7B 0x4D 0xC8 (8 bytes, repeating)

XOR Range: Bytes 0x00 through 0x27 (the entire 40-byte header).

Application:

  • XOR is applied after the header is fully constructed (all fields written).
  • The 8-byte key repeats cyclically across the 40 bytes: byte i of the header is XORed with key[i % 8].

Decoding:

  • The decoder reads the first 40 bytes and XORs them with the same repeating key (XOR is its own inverse).
  • After de-XOR, the decoder reads header fields normally.

Bootstrapping problem: When XOR obfuscation is active, the flags byte itself is XORed. The decoder MUST:

  1. Always attempt de-XOR on the first 40 bytes.
  2. Read the flags byte from the de-XORed header.
  3. Check if bit 2 is set. If it is, the de-XOR was correct. If it is not, re-read the header from the original (un-XORed) bytes.

Alternatively, the decoder can check the magic bytes: if the first 4 bytes are 0x00 0xEA 0x72 0x63, the header is not XOR-obfuscated. If they are not, attempt de-XOR and re-check.

When flags bit 2 is 0: The header is stored as-is (no XOR).
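
The XOR step and the magic-based detection can be sketched as follows in Python. The function names are hypothetical; the key and magic bytes are the ones given above, and the sample header uses illustrative field values.

```python
XOR_KEY = bytes([0xA5, 0x3C, 0x96, 0x0F, 0xE1, 0x7B, 0x4D, 0xC8])
MAGIC = bytes([0x00, 0xEA, 0x72, 0x63])


def xor_header(header: bytes) -> bytes:
    """XOR every header byte with the cyclically repeating 8-byte key."""
    return bytes(b ^ XOR_KEY[i % 8] for i, b in enumerate(header))


def recover_header(raw40: bytes) -> bytes:
    """Return the plain header, de-XORing only when the magic is not visible."""
    if raw40[:4] == MAGIC:
        return raw40
    candidate = xor_header(raw40)
    if candidate[:4] == MAGIC:
        return candidate
    raise ValueError("not a valid archive header")


# Illustrative header: version 2, flags 0x04 (xor_header), remaining bytes zero.
plain = MAGIC + bytes([2, 0x04]) + bytes(34)
obfuscated = xor_header(plain)
print(obfuscated[:4].hex(), recover_header(obfuscated) == plain)
```

Because XOR is its own inverse, the same function serves for both encoding and decoding.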

9.2 TOC Encryption (flags bit 1, mask 0x02)

When flags bit 1 is set, the entire file table is encrypted with AES-256-CBC.

  • Key: The same 32-byte key used for file encryption.
  • IV: The toc_iv field in the header (16 bytes, randomly generated).
  • Input: The serialized file table (all entries concatenated).
  • Padding: PKCS7 padding is applied to the entire serialized TOC.
  • toc_size in header: Stores the encrypted TOC size (including PKCS7 padding), not the plaintext size.

Decoding:

  1. Read toc_offset, toc_size, and toc_iv from the (de-XORed) header.
  2. Read toc_size bytes starting at toc_offset.
  3. Decrypt with AES-256-CBC using toc_iv and the 32-byte key.
  4. Remove PKCS7 padding.
  5. Parse file table entries from the decrypted plaintext.

When flags bit 1 is 0: The file table is stored as plaintext. toc_iv is zero-filled but unused.

9.3 Decoy Padding (flags bit 3, mask 0x08)

When flags bit 3 is set, random bytes are inserted after each file's data block.

  • The number of random padding bytes for each file is stored in the file table entry padding_after field (u16 LE).
  • Padding bytes are cryptographically random and carry no meaningful data.
  • The decoder MUST skip padding_after bytes after reading the ciphertext of each file.
  • The padding disrupts size-based analysis: an observer cannot determine individual file sizes from the data block layout.

Next data block offset:

next_data_offset = data_offset + encrypted_size + padding_after

When flags bit 3 is 0: padding_after is 0 for every file table entry. No padding bytes exist between data blocks.


10. Decode Order of Operations

The following steps MUST be followed in order by all decoders:

1. Read 40 bytes from offset 0x00.

2. Attempt XOR de-obfuscation:
   a. Check if bytes 0x00-0x03 equal magic (0x00 0xEA 0x72 0x63).
   b. If YES: header is not XOR-obfuscated. Use as-is.
   c. If NO: XOR bytes 0x00-0x27 with key (0xA5 0x3C 0x96 0x0F 0xE1 0x7B 0x4D 0xC8),
      repeating cyclically. Re-check magic. If still wrong, reject archive.

3. Parse header fields:
   - Verify magic == 0x00 0xEA 0x72 0x63
   - Read version (must be 2 for v1.1)
   - Read flags
   - Check for unknown flag bits (bits 4-7 must be 0; reject if not)
   - Read entry_count
   - Read toc_offset, toc_size, toc_iv

4. Read TOC:
   a. Seek to toc_offset.
   b. Read toc_size bytes.
   c. If flags bit 1 (toc_encrypted) is set:
      - Decrypt TOC with AES-256-CBC using toc_iv and the 32-byte key.
      - Remove PKCS7 padding.
   d. Parse entry_count entries sequentially from the (decrypted) TOC bytes.

5. For each entry (i = 0 to entry_count - 1):
   a. Check entry_type. If 0x01 (directory): create the directory using the entry
      name as a relative path, apply permissions from the `permissions` field,
      and skip to the next entry (no ciphertext to read).

   b. Read ciphertext (file entries only):
      - Seek to data_offset.
      - Read encrypted_size bytes.

   c. Verify HMAC:
      - Compute HMAC-SHA-256(key, iv || ciphertext).
      - Compare with stored hmac (32 bytes).
      - If mismatch: REJECT this file. Do NOT attempt decryption.

   d. Decrypt:
      - Decrypt ciphertext with AES-256-CBC using entry's iv and the 32-byte key.
      - Remove PKCS7 padding.
      - Result = compressed_data (or raw data if compression_flag = 0).

   e. Decompress (if compression_flag = 1):
      - Decompress with gzip.
      - Result = original file content.

   f. Verify integrity:
      - Compute SHA-256 of the decompressed/raw result.
      - Compare with stored sha256 (32 bytes).
      - If mismatch: WARN (data corruption or wrong key).

   g. Write to output:
      - Create parent directories as needed (using the path components of the entry name).
      - Create output file using stored name.
      - Write the verified content.
      - Apply permissions from the entry's `permissions` field.

11. Version Compatibility Rules

  1. Version field: The version field at offset 0x04 identifies the format version. This specification defines version 2 (v1.1). Version 1 was the original v1.0 format (no directory support, no entry_type/permissions fields).

  2. Version 2 changes from version 1:

    • TOC entries now include entry_type (1 byte) and permissions (2 bytes) fields after name and before original_size.
    • Entry size formula changed from 101 + name_length to 104 + name_length.
    • file_count header field renamed to entry_count (same offset, same type; directories count as entries).
    • Entry names are relative paths with / separator (not filename-only).
    • Entries are ordered parent-before-child (directories before their contents).
  3. Forward compatibility: Decoders MUST reject archives with version greater than their supported version. A v2 decoder encountering version = 3 MUST fail with a clear error message.

  4. Unknown flags: Decoders MUST reject archives that have any reserved flag bits (bits 4-7) set to 1. Unknown flags indicate features the decoder does not understand and cannot safely skip. Silent ignoring of unknown flags is prohibited.

  5. Future versions: Version 3+ MAY:

    • Add fields after the reserved bytes in the header (growing header size).
    • Define new flag bits (bits 4-7).
    • Change the reserved field to carry metadata.
    • Introduce HKDF-derived per-file keys (replacing single shared key).
  6. Backward compatibility: Future versions SHOULD maintain the same magic bytes and the same position of the version field (offset 0x04) so that decoders can read the version before deciding how to proceed.
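
Rules 3 and 4 amount to two checks performed right after the header is parsed. The Python sketch below uses hypothetical names. It implements only the rejections stated above; whether a version 2 decoder also accepts version 1 archives is not specified here, so the sketch leaves lower versions to the caller.

```python
SUPPORTED_VERSION = 2
RESERVED_FLAG_MASK = 0xF0


def check_compatibility(version: int, flags: int) -> None:
    """Raise ValueError for archives this decoder must reject."""
    if version > SUPPORTED_VERSION:
        raise ValueError(
            f"unsupported format version {version} "
            f"(this decoder supports up to {SUPPORTED_VERSION})"
        )
    if flags & RESERVED_FLAG_MASK:
        raise ValueError(f"unknown flag bits set: 0x{flags & RESERVED_FLAG_MASK:02X}")


check_compatibility(2, 0x0F)  # all four defined flags set: accepted
```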


12. Worked Example

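This section constructs a complete 2-file archive byte by byte. All offsets, field sizes, and hex values are internally consistent and can be verified by summing field sizes. This example serves as a golden reference for implementation testing. Note that the example uses the legacy version 1 entry layout (entry size 101 + name_length, without the entry_type and permissions fields); in a version 2 archive each entry is 3 bytes longer, which shifts every subsequent offset.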

12.1 Input Files

| File | Name | Content | Size |
|------|------|---------|------|
| 1 | hello.txt | ASCII string Hello (bytes: 48 65 6C 6C 6F) | 5 bytes |
| 2 | data.bin | 32 bytes of 0x01 repeated | 32 bytes |

12.2 Parameters

  • Key: 32 bytes: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
  • Flags: 0x01 (compression enabled, no obfuscation)
  • Version: 1 (legacy v1.0 layout: TOC entries carry no entry_type or permissions fields)

12.3 Per-File Pipeline Walkthrough

File 1: hello.txt

Step 1: SHA-256 checksum of original content

SHA-256("Hello") = 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969

As bytes:

18 5F 8D B3 22 71 FE 25 F5 61 A6 FC 93 8B 2E 26
43 06 EC 30 4E DA 51 80 07 D1 76 48 26 38 19 69

Step 2: Gzip compression

Gzip output is implementation-dependent (timestamps, OS flags vary). For this example, we use a representative compressed size of 25 bytes. The actual gzip output will differ between implementations, but the pipeline and sizes are computed from this value.

  • compressed_size = 25

Step 3: Compute encrypted_size (PKCS7 padding)

encrypted_size = ((25 / 16) + 1) * 16 = ((1) + 1) * 16 = 32 bytes

PKCS7 padding adds 32 - 25 = 7 bytes of value 0x07.

Step 4: AES-256-CBC encryption

  • IV (randomly chosen for this example): AA BB CC DD EE FF 00 11 22 33 44 55 66 77 88 99
  • Ciphertext: 32 bytes (actual value depends on the gzip output and IV; representative bytes used in the hex dump below)

Step 5: HMAC-SHA-256

HMAC_input = IV (16 bytes) || ciphertext (32 bytes) = 48 bytes total
HMAC-SHA-256(key, HMAC_input) = <32 bytes>

The HMAC value depends on the actual ciphertext; representative bytes (0xC1 repeated) are used in the hex dump. In a real implementation, this MUST be computed from the actual IV and ciphertext.

File 2: data.bin

Step 1: SHA-256 checksum of original content

SHA-256(0x01 * 32) = 72cd6e8422c407fb6d098690f1130b7ded7ec2f7f5e1d30bd9d521f015363793

As bytes:

72 CD 6E 84 22 C4 07 FB 6D 09 86 90 F1 13 0B 7D
ED 7E C2 F7 F5 E1 D3 0B D9 D5 21 F0 15 36 37 93

Step 2: Gzip compression

32 bytes of identical content compresses well. Representative compressed size: 22 bytes.

  • compressed_size = 22

Step 3: Compute encrypted_size (PKCS7 padding)

encrypted_size = ((22 / 16) + 1) * 16 = ((1) + 1) * 16 = 32 bytes

PKCS7 padding adds 32 - 22 = 10 bytes of value 0x0A.

Step 4: AES-256-CBC encryption

  • IV (randomly chosen for this example): 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF 00
  • Ciphertext: 32 bytes (representative)

Step 5: HMAC-SHA-256

HMAC_input = IV (16 bytes) || ciphertext (32 bytes) = 48 bytes total
HMAC-SHA-256(key, HMAC_input) = <32 bytes>

Representative bytes (0xD2 repeated) used in the hex dump.

12.4 Archive Layout

| Region | Start Offset | End Offset | Size | Description |
|--------|--------------|------------|------|-------------|
| Header | 0x0000 | 0x0027 | 40 bytes | Fixed header |
| TOC Entry 1 | 0x0028 | 0x0095 | 110 bytes | hello.txt metadata |
| TOC Entry 2 | 0x0096 | 0x0102 | 109 bytes | data.bin metadata |
| Data Block 1 | 0x0103 | 0x0122 | 32 bytes | hello.txt ciphertext |
| Data Block 2 | 0x0123 | 0x0142 | 32 bytes | data.bin ciphertext |
| Total | | | 323 bytes | |

Offset verification:

TOC offset       = header_size                          = 40 (0x28)    CHECK
TOC size         = entry1_size + entry2_size            = 110 + 109 = 219 (0xDB)    CHECK
Data Block 1     = toc_offset + toc_size                = 40 + 219 = 259 (0x103)    CHECK
Data Block 2     = data_offset_1 + encrypted_size_1     = 259 + 32 = 291 (0x123)    CHECK
Archive end      = data_offset_2 + encrypted_size_2     = 291 + 32 = 323 (0x143)    CHECK
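
The same arithmetic can be replayed mechanically. The Python sketch below recomputes the layout of this example from the file names and encrypted sizes alone. The variable names are hypothetical, and the constant 101 is the per-entry fixed size that this example's entries use (no entry_type or permissions fields).

```python
HEADER_SIZE = 40
ENTRY_FIXED = 101  # fixed bytes per entry in this example's layout

files = [("hello.txt", 32), ("data.bin", 32)]  # (name, encrypted_size)

entry_sizes = [ENTRY_FIXED + len(name.encode("utf-8")) for name, _ in files]
toc_offset = HEADER_SIZE
toc_size = sum(entry_sizes)

data_offsets = []
cursor = toc_offset + toc_size
for _, enc_size in files:
    data_offsets.append(cursor)
    cursor += enc_size  # padding_after is 0 in this example
archive_end = cursor

print(entry_sizes, toc_size, [hex(o) for o in data_offsets], archive_end)
# prints: [110, 109] 219 ['0x103', '0x123'] 323
```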

12.5 Header (Bytes 0x0000 - 0x0027)

| Offset | Hex | Field | Value |
|--------|-----|-------|-------|
| 0x0000 | 00 EA 72 63 | magic | Custom magic bytes |
| 0x0004 | 01 | version | 1 |
| 0x0005 | 01 | flags | 0x01 = compression enabled |
| 0x0006 | 02 00 | file_count | 2 (LE) |
| 0x0008 | 28 00 00 00 | toc_offset | 40 (LE) |
| 0x000C | DB 00 00 00 | toc_size | 219 (LE) |
| 0x0010 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | toc_iv | Zero-filled (TOC not encrypted) |
| 0x0020 | 00 00 00 00 00 00 00 00 | reserved | Zero-filled |

12.6 File Table Entry 1: hello.txt (Bytes 0x0028 - 0x0095)

| Offset | Hex | Field | Value |
|--------|-----|-------|-------|
| 0x0028 | 09 00 | name_length | 9 (LE) |
| 0x002A | 68 65 6C 6C 6F 2E 74 78 74 | name | "hello.txt" (UTF-8) |
| 0x0033 | 05 00 00 00 | original_size | 5 (LE) |
| 0x0037 | 19 00 00 00 | compressed_size | 25 (LE) |
| 0x003B | 20 00 00 00 | encrypted_size | 32 (LE) |
| 0x003F | 03 01 00 00 | data_offset | 259 = 0x103 (LE) |
| 0x0043 | AA BB CC DD EE FF 00 11 22 33 44 55 66 77 88 99 | iv | Example IV for file 1 |
| 0x0053 | C1 C1 C1 ... (32 bytes) | hmac | Representative HMAC (actual depends on ciphertext) |
| 0x0073 | 18 5F 8D B3 22 71 FE 25 F5 61 A6 FC 93 8B 2E 26 43 06 EC 30 4E DA 51 80 07 D1 76 48 26 38 19 69 | sha256 | SHA-256 of "Hello" |
| 0x0093 | 01 | compression_flag | 1 (gzip) |
| 0x0094 | 00 00 | padding_after | 0 (no decoy padding) |

Entry size verification: 2 + 9 + 4 + 4 + 4 + 4 + 16 + 32 + 32 + 1 + 2 = 110 bytes. Offset range: 0x0028 to 0x0095 = 110 bytes. CHECK.

12.7 File Table Entry 2: data.bin (Bytes 0x0096 - 0x0102)

| Offset | Hex | Field | Value |
|--------|-----|-------|-------|
| 0x0096 | 08 00 | name_length | 8 (LE) |
| 0x0098 | 64 61 74 61 2E 62 69 6E | name | "data.bin" (UTF-8) |
| 0x00A0 | 20 00 00 00 | original_size | 32 (LE) |
| 0x00A4 | 16 00 00 00 | compressed_size | 22 (LE) |
| 0x00A8 | 20 00 00 00 | encrypted_size | 32 (LE) |
| 0x00AC | 23 01 00 00 | data_offset | 291 = 0x123 (LE) |
| 0x00B0 | 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF 00 | iv | Example IV for file 2 |
| 0x00C0 | D2 D2 D2 ... (32 bytes) | hmac | Representative HMAC (actual depends on ciphertext) |
| 0x00E0 | 72 CD 6E 84 22 C4 07 FB 6D 09 86 90 F1 13 0B 7D ED 7E C2 F7 F5 E1 D3 0B D9 D5 21 F0 15 36 37 93 | sha256 | SHA-256 of 32 x 0x01 |
| 0x0100 | 01 | compression_flag | 1 (gzip) |
| 0x0101 | 00 00 | padding_after | 0 (no decoy padding) |

Entry size verification: 2 + 8 + 4 + 4 + 4 + 4 + 16 + 32 + 32 + 1 + 2 = 109 bytes. Offset range: 0x0096 to 0x0102 = 109 bytes. CHECK.

12.8 Data Blocks (Bytes 0x0103 - 0x0142)

Data Block 1 (bytes 0x0103 - 0x0122, 32 bytes):

Ciphertext of gzip-compressed "Hello", encrypted with AES-256-CBC. Actual bytes depend on the gzip output (which includes timestamps) and the IV. Representative value: 32 bytes of ciphertext.

Data Block 2 (bytes 0x0123 - 0x0142, 32 bytes):

Ciphertext of gzip-compressed 0x01 * 32, encrypted with AES-256-CBC. Representative value: 32 bytes of ciphertext.

12.9 Complete Annotated Hex Dump

The following hex dump shows the full 323-byte archive. HMAC values (C1... and D2...) and ciphertext (E7... and F8...) are representative placeholders. SHA-256 hashes are real computed values.

Offset  | Hex                                             | ASCII            | Annotation
--------|------------------------------------------------|------------------|------------------------------------------
0x0000  | 00 EA 72 63 01 01 02 00  28 00 00 00 DB 00 00 00 | ..rc....(....... | Header: magic, ver=1, flags=0x01, count=2, toc_off=40, toc_sz=219
0x0010  | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 | ................  | Header: toc_iv (zero-filled, TOC not encrypted)
0x0020  | 00 00 00 00 00 00 00 00  09 00 68 65 6C 6C 6F 2E | ..........hello. | Header: reserved | TOC Entry 1: name_len=9, name="hello."
0x0030  | 74 78 74 05 00 00 00 19  00 00 00 20 00 00 00 03 | txt........ .... | Entry 1: "txt", orig=5, comp=25, enc=32, data_off=
0x0040  | 01 00 00 AA BB CC DD EE  FF 00 11 22 33 44 55 66 | ..........."3DUf | Entry 1: =259(0x103), iv[0..12]
0x0050  | 77 88 99 C1 C1 C1 C1 C1  C1 C1 C1 C1 C1 C1 C1 C1 | w............... | Entry 1: iv[13..15], hmac[0..12]
0x0060  | C1 C1 C1 C1 C1 C1 C1 C1  C1 C1 C1 C1 C1 C1 C1 C1 | ................ | Entry 1: hmac[13..28]
0x0070  | C1 C1 C1 18 5F 8D B3 22  71 FE 25 F5 61 A6 FC 93 | ...._.."q.%.a... | Entry 1: hmac[29..31], sha256[0..12]
0x0080  | 8B 2E 26 43 06 EC 30 4E  DA 51 80 07 D1 76 48 26 | ..&C..0N.Q...vH& | Entry 1: sha256[13..28]
0x0090  | 38 19 69 01 00 00 08 00  64 61 74 61 2E 62 69 6E | 8.i.....data.bin | Entry 1: sha256[29..31], comp=1, pad=0 | Entry 2: name_len=8, name="data.bin"
0x00A0  | 20 00 00 00 16 00 00 00  20 00 00 00 23 01 00 00 |  ....... ...#... | Entry 2: orig=32, comp=22, enc=32, data_off=291(0x123)
0x00B0  | 11 22 33 44 55 66 77 88  99 AA BB CC DD EE FF 00 | ."3DUfw......... | Entry 2: iv[0..15]
0x00C0  | D2 D2 D2 D2 D2 D2 D2 D2  D2 D2 D2 D2 D2 D2 D2 D2 | ................ | Entry 2: hmac[0..15]
0x00D0  | D2 D2 D2 D2 D2 D2 D2 D2  D2 D2 D2 D2 D2 D2 D2 D2 | ................ | Entry 2: hmac[16..31]
0x00E0  | 72 CD 6E 84 22 C4 07 FB  6D 09 86 90 F1 13 0B 7D | r.n."...m......} | Entry 2: sha256[0..15]
0x00F0  | ED 7E C2 F7 F5 E1 D3 0B  D9 D5 21 F0 15 36 37 93 | .~........!..67. | Entry 2: sha256[16..31]
0x0100  | 01 00 00 E7 E7 E7 E7 E7  E7 E7 E7 E7 E7 E7 E7 E7 | ................ | Entry 2: comp=1, pad=0 | Data Block 1: ciphertext[0..12]
0x0110  | E7 E7 E7 E7 E7 E7 E7 E7  E7 E7 E7 E7 E7 E7 E7 E7 | ................ | Data Block 1: ciphertext[13..28]
0x0120  | E7 E7 E7 F8 F8 F8 F8 F8  F8 F8 F8 F8 F8 F8 F8 F8 | ................ | Data Block 1: ciphertext[29..31] | Data Block 2: ciphertext[0..12]
0x0130  | F8 F8 F8 F8 F8 F8 F8 F8  F8 F8 F8 F8 F8 F8 F8 F8 | ................ | Data Block 2: ciphertext[13..28]
0x0140  | F8 F8 F8                                          | ...              | Data Block 2: ciphertext[29..31]

Total: 323 bytes (0x143).
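
The offsets and the total in the dump can be re-derived from the sizes alone. The following is a minimal sketch, assuming the layout used by this example: a 40-byte header, 101 fixed bytes per TOC entry plus the name, PKCS7 padding on every data block, and no padding between blocks (pad=0). The helper name `pkcs7_size` is illustrative, not part of the specification.

```shell
#!/bin/sh
# Recompute the layout of the worked example from its sizes.
HEADER_SIZE=40     # fixed header size in this example
ENTRY_FIXED=101    # fixed bytes per TOC entry, excluding the name

# PKCS7 always adds 1..16 bytes, so the result is the next multiple of 16
# strictly greater than the input size.
pkcs7_size() {
  echo $(( ($1 / 16 + 1) * 16 ))
}

toc_size=$(( (ENTRY_FIXED + 9) + (ENTRY_FIXED + 8) ))   # "hello.txt" + "data.bin"
data1_off=$(( HEADER_SIZE + toc_size ))                 # first data block follows the TOC
data2_off=$(( data1_off + $(pkcs7_size 25) ))           # 25 compressed bytes -> 32 encrypted
total=$(( data2_off + $(pkcs7_size 22) ))               # 22 compressed bytes -> 32 encrypted

echo "toc_size=$toc_size data1_off=$data1_off data2_off=$data2_off total=$total"
# Prints: toc_size=219 data1_off=259 data2_off=291 total=323
```

These values match toc_sz, both data_off fields, and the 323-byte total shown above.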

12.10 Step-by-Step Shell Decode Walkthrough

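The following shell commands demonstrate decoding this archive. Parsing uses only dd and xxd; verification, decryption, and decompression additionally use openssl, awk, gunzip, and sha256sum. The read_le_u16 and read_le_u32 functions are defined in the Appendix (Section 13.1).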

# -------------------------------------------------------
# Step 1: Read and verify magic bytes
# -------------------------------------------------------
dd if=archive.bin bs=1 skip=0 count=4 2>/dev/null | xxd -p
# Expected: 00ea7263

# -------------------------------------------------------
# Step 2: Read version
# -------------------------------------------------------
dd if=archive.bin bs=1 skip=4 count=1 2>/dev/null | xxd -p
# Expected: 01

# -------------------------------------------------------
# Step 3: Read flags
# -------------------------------------------------------
dd if=archive.bin bs=1 skip=5 count=1 2>/dev/null | xxd -p
# Expected: 01 (compression enabled)

# -------------------------------------------------------
# Step 4: Read file count
# -------------------------------------------------------
read_le_u16 archive.bin 6
# Expected: 2

# -------------------------------------------------------
# Step 5: Read TOC offset
# -------------------------------------------------------
read_le_u32 archive.bin 8
# Expected: 40

# -------------------------------------------------------
# Step 6: Read TOC size
# -------------------------------------------------------
read_le_u32 archive.bin 12
# Expected: 219

# -------------------------------------------------------
# Step 7: Read TOC Entry 1 -- name_length
# -------------------------------------------------------
read_le_u16 archive.bin 40
# Expected: 9

# -------------------------------------------------------
# Step 8: Read TOC Entry 1 -- filename
# -------------------------------------------------------
dd if=archive.bin bs=1 skip=42 count=9 2>/dev/null
# Expected: hello.txt

# -------------------------------------------------------
# Step 9: Read TOC Entry 1 -- original_size
# -------------------------------------------------------
read_le_u32 archive.bin 51
# Expected: 5

# -------------------------------------------------------
# Step 10: Read TOC Entry 1 -- compressed_size
# -------------------------------------------------------
read_le_u32 archive.bin 55
# Expected: 25

# -------------------------------------------------------
# Step 11: Read TOC Entry 1 -- encrypted_size
# -------------------------------------------------------
read_le_u32 archive.bin 59
# Expected: 32

# -------------------------------------------------------
# Step 12: Read TOC Entry 1 -- data_offset
# -------------------------------------------------------
read_le_u32 archive.bin 63
# Expected: 259

# -------------------------------------------------------
# Step 13: Read TOC Entry 1 -- IV (16 bytes)
# -------------------------------------------------------
dd if=archive.bin bs=1 skip=67 count=16 2>/dev/null | xxd -p
# Expected: aabbccddeeff00112233445566778899

# -------------------------------------------------------
# Step 14: Read TOC Entry 1 -- HMAC (32 bytes)
# -------------------------------------------------------
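dd if=archive.bin bs=1 skip=83 count=32 2>/dev/null | xxd -p | tr -d '\n'
# (32 bytes of HMAC for verification; xxd -p wraps after 30 bytes,
#  so newlines are stripped to get a single 64-character hex string)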

# -------------------------------------------------------
# Step 15: Extract ciphertext for file 1
# -------------------------------------------------------
dd if=archive.bin bs=1 skip=259 count=32 of=/tmp/file1.enc 2>/dev/null

# -------------------------------------------------------
# Step 16: Verify HMAC for file 1
# -------------------------------------------------------
# Create HMAC input: IV (16 bytes) || ciphertext (32 bytes)
IV_HEX="aabbccddeeff00112233445566778899"
KEY_HEX="000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f"

# Extract IV and ciphertext, concatenate, compute HMAC
{
  dd if=archive.bin bs=1 skip=67 count=16 2>/dev/null   # IV
  dd if=archive.bin bs=1 skip=259 count=32 2>/dev/null  # ciphertext
} | openssl dgst -sha256 -mac HMAC -macopt "hexkey:${KEY_HEX}" -hex 2>/dev/null \
  | awk '{print $NF}'
# Compare output with stored HMAC from step 14

# -------------------------------------------------------
# Step 17: Decrypt file 1
# -------------------------------------------------------
openssl enc -d -aes-256-cbc -nosalt \
  -K "${KEY_HEX}" \
  -iv "${IV_HEX}" \
  -in /tmp/file1.enc -out /tmp/file1.gz

# -------------------------------------------------------
# Step 18: Decompress file 1
# -------------------------------------------------------
gunzip -c /tmp/file1.gz > /tmp/hello.txt

# -------------------------------------------------------
# Step 19: Verify SHA-256 of extracted file
# -------------------------------------------------------
sha256sum /tmp/hello.txt
# Expected: 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969
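
The hard-coded offsets in the walkthrough (51, 55, 59, 63, 67, 83) all follow from the entry start and the name length. The sketch below derives them, assuming the field order and widths visible in the hex dump. The function name `entry_field_offsets` and the printed labels are illustrative only.

```shell
#!/bin/sh
# Print the absolute offset of each field in a TOC entry of the worked example.
# Usage: entry_field_offsets <entry_start> <name_length>
# The body runs in a subshell so the helper variables stay local.
entry_field_offsets() (
  name_off=$(( $1 + 2 ))            # after the u16 name_length
  orig_off=$(( name_off + $2 ))     # after the name bytes
  comp_off=$(( orig_off + 4 ))      # u32 original_size
  enc_off=$(( comp_off + 4 ))       # u32 compressed_size
  doff_off=$(( enc_off + 4 ))       # u32 encrypted_size
  iv_off=$(( doff_off + 4 ))        # u32 data_offset
  hmac_off=$(( iv_off + 16 ))       # 16-byte IV
  sha_off=$(( hmac_off + 32 ))      # 32-byte HMAC
  flag_off=$(( sha_off + 32 ))      # 32-byte SHA-256
  pad_off=$(( flag_off + 1 ))       # u8 compression flag
  next_entry=$(( pad_off + 2 ))     # u16 padding field
  echo "name=$name_off original_size=$orig_off compressed_size=$comp_off" \
       "encrypted_size=$enc_off data_offset=$doff_off iv=$iv_off" \
       "hmac=$hmac_off sha256=$sha_off comp=$flag_off pad=$pad_off" \
       "next_entry=$next_entry"
)

entry_field_offsets 40 9     # Entry 1 starts at toc_offset
entry_field_offsets 150 8    # Entry 2 starts where Entry 1 ends
```

For Entry 1 this yields iv=67 and hmac=83, the values used in steps 13 and 14, and next_entry=150 (0x96), where the dump shows the name_length of Entry 2.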

13. Appendix: Shell Decoder Reference

This appendix provides reference shell functions for decoding archives using only standard busybox commands.

13.1 Little-Endian Integer Reading

# Read a little-endian u16 from a binary file at a byte offset.
# Usage: read_le_u16 <file> <offset>
# Output: decimal integer value
read_le_u16() {
  local file="$1" offset="$2"
  local hex=$(dd if="$file" bs=1 skip="$offset" count=2 2>/dev/null | xxd -p)
  local b0=${hex:0:2} b1=${hex:2:2}
  printf '%d' "0x${b1}${b0}"
}

# Read a little-endian u32 from a binary file at a byte offset.
# Usage: read_le_u32 <file> <offset>
# Output: decimal integer value
read_le_u32() {
  local file="$1" offset="$2"
  local hex=$(dd if="$file" bs=1 skip="$offset" count=4 2>/dev/null | xxd -p)
  local b0=${hex:0:2} b1=${hex:2:2} b2=${hex:4:2} b3=${hex:6:2}
  printf '%d' "0x${b3}${b2}${b1}${b0}"
}

Busybox compatibility note: If xxd is not available, use od as a fallback:

# Fallback using od instead of xxd
read_le_u32_od() {
  local file="$1" offset="$2"
  local bytes=$(dd if="$file" bs=1 skip="$offset" count=4 2>/dev/null \
    | od -A n -t x1 | tr -d ' \n')
  local b0=${bytes:0:2} b1=${bytes:2:2} b2=${bytes:4:2} b3=${bytes:6:2}
  printf '%d' "0x${b3}${b2}${b1}${b0}"
}
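
A u16 counterpart of the od fallback can avoid hex parsing and substring expansion altogether. The following is a sketch under the assumption that the target od supports `-A n -t u1`; the name `read_le_u16_od` is hypothetical and mirrors the naming above.

```shell
#!/bin/sh
# Read a little-endian u16 using od decimal output and POSIX arithmetic.
# Usage: read_le_u16_od <file> <offset>
# Output: decimal integer value; exits non-zero on a short read.
read_le_u16_od() (
  # od prints each byte as an unsigned decimal, e.g. "  40   0".
  # The unquoted substitution is intentional: it splits the bytes into $1 and $2.
  set -- $(dd if="$1" bs=1 skip="$2" count=2 2>/dev/null | od -A n -t u1)
  [ "$#" -eq 2 ] || exit 1          # fewer than 2 bytes were available
  echo $(( $1 + $2 * 256 ))         # low byte first, high byte second
)
```

Because the function body is a subshell, `set --` does not disturb the caller's positional parameters. The same pattern extends to u32 with four bytes, provided the shell's arithmetic is at least 64 bits wide.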

13.2 Read Raw Bytes as Hex

# Read N bytes from file at offset as hex string (no spaces)
# Usage: read_hex <file> <offset> <count>
read_hex() {
  local file="$1" offset="$2" count="$3"
  dd if="$file" bs=1 skip="$offset" count="$count" 2>/dev/null | xxd -p | tr -d '\n'
}
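
Where xxd is unavailable, the same hex read can be approximated with od. This is a sketch, not part of the reference decoder: `read_hex_od` and `has_magic` are hypothetical names, and the `-v` flag is assumed to be supported by the target od.

```shell
#!/bin/sh
# Read N bytes at an offset as a lowercase hex string, using od instead of xxd.
# Usage: read_hex_od <file> <offset> <count>
read_hex_od() {
  # -v is required: without it od collapses repeated identical 16-byte
  # lines into "*", which would corrupt reads of 32 bytes or more.
  dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | od -v -A n -t x1 | tr -d ' \n'
}

# Return 0 if the file starts with the magic bytes 00 EA 72 63.
# Usage: has_magic <file>
has_magic() {
  [ "$(read_hex_od "$1" 0 4)" = "00ea7263" ]
}
```

A decoder could call `has_magic archive.bin || exit 1` before parsing any other header field.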

13.3 HMAC-SHA-256 Verification

# Verify HMAC-SHA-256 of IV || ciphertext.
# Usage: verify_hmac <file> <iv_offset> <iv_length> <data_offset> <data_length> <expected_hmac_hex> <key_hex>
# Returns: 0 if HMAC matches, 1 if not
verify_hmac() {
  local file="$1"
  local iv_offset="$2" iv_length="$3"
  local data_offset="$4" data_length="$5"
  local expected="$6" key="$7"

  local actual=$(
    {
      dd if="$file" bs=1 skip="$iv_offset" count="$iv_length" 2>/dev/null
      dd if="$file" bs=1 skip="$data_offset" count="$data_length" 2>/dev/null
    } | openssl dgst -sha256 -mac HMAC -macopt "hexkey:${key}" -hex 2>/dev/null \
      | awk '{print $NF}'
  )

  [ "$actual" = "$expected" ]
}

Graceful degradation: If the openssl binary on the target device does not support -mac HMAC -macopt, the shell decoder MAY skip HMAC verification. In this case, print a warning:

# Check if openssl HMAC is available
if ! echo -n "test" | openssl dgst -sha256 -mac HMAC -macopt hexkey:00 >/dev/null 2>&1; then
  echo "WARNING: openssl HMAC not available, skipping integrity verification"
  SKIP_HMAC=1
fi
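
One way to honour the SKIP_HMAC flag at each call site is a thin wrapper around verify_hmac. The sketch below assumes verify_hmac from this section is already defined and that the IV is always 16 bytes; `check_entry_hmac` is a hypothetical helper name.

```shell
#!/bin/sh
# Verify an entry's HMAC unless SKIP_HMAC=1 was set by the availability check.
# Usage: check_entry_hmac <file> <iv_offset> <data_offset> <data_length> <expected_hmac_hex> <key_hex>
# Returns: 0 if the HMAC matches or verification is skipped, non-zero otherwise
check_entry_hmac() {
  if [ "${SKIP_HMAC:-0}" = "1" ]; then
    # Warn on stderr so the message never mixes with extracted data on stdout
    echo "WARNING: HMAC not verified for data at offset $3" >&2
    return 0
  fi
  verify_hmac "$1" "$2" 16 "$3" "$4" "$5" "$6"
}
```

With the worked example, the call for file 1 would be `check_entry_hmac archive.bin 67 259 32 "$HMAC_HEX" "$KEY_HEX"`, where HMAC_HEX holds the value read in step 14.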

13.4 Single-File Decryption

# Decrypt a single file from the archive.
# Usage: decrypt_file <archive> <data_offset> <encrypted_size> <iv_hex> <key_hex> <output_file> <is_compressed>
decrypt_file() {
  local archive="$1"
  local data_offset="$2" encrypted_size="$3"
  local iv_hex="$4" key_hex="$5"
  local output="$6" is_compressed="$7"
  local tmp="/tmp/_decrypted_$$"
  local status=0

  # Extract ciphertext and decrypt; the pipeline status is that of openssl
  if ! dd if="$archive" bs=1 skip="$data_offset" count="$encrypted_size" 2>/dev/null \
    | openssl enc -d -aes-256-cbc -nosalt -K "$key_hex" -iv "$iv_hex" \
    > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi

  # Decompress if needed
  if [ "$is_compressed" = "1" ]; then
    gunzip -c "$tmp" > "$output" || status=1
  else
    mv "$tmp" "$output" || status=1
  fi

  # Clean up and report failure of decompression or move to the caller
  rm -f "$tmp"
  return "$status"
}

13.5 SHA-256 Verification

# Verify SHA-256 of an extracted file.
# Usage: verify_sha256 <file> <expected_hex>
# Returns: 0 if matches, 1 if not
verify_sha256() {
  local file="$1" expected="$2"
  local actual=$(sha256sum "$file" | awk '{print $1}')
  [ "$actual" = "$expected" ]
}
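
The functions from 13.3 to 13.5 can be chained into a single per-entry step. This is a sketch rather than the reference decoder: `extract_entry` is a hypothetical name, it assumes verify_hmac, decrypt_file, and verify_sha256 are already defined as above, and it assumes a 16-byte IV.

```shell
#!/bin/sh
# Extract one entry: HMAC check first, then decrypt (and gunzip), then SHA-256 check.
# Usage: extract_entry <archive> <iv_offset> <data_offset> <encrypted_size> \
#          <iv_hex> <hmac_hex> <sha256_hex> <is_compressed> <key_hex> <output_file>
# Returns: 0 on success, 1 on any verification or decryption failure
extract_entry() {
  # Authenticate IV || ciphertext before touching the ciphertext
  verify_hmac "$1" "$2" 16 "$3" "$4" "$6" "$9" || {
    echo "ERROR: HMAC mismatch, not decrypting ${10}" >&2
    return 1
  }

  # Decrypt, and decompress when the entry's compression flag is 1
  decrypt_file "$1" "$3" "$4" "$5" "$9" "${10}" "$8" || return 1

  # Confirm the extracted content matches the stored checksum
  verify_sha256 "${10}" "$7" || {
    echo "ERROR: SHA-256 mismatch for ${10}" >&2
    return 1
  }
}
```

For file 1 of the worked example the arguments would be `archive.bin 67 259 32 "$IV_HEX" "$HMAC_HEX" "$SHA256_HEX" 1 "$KEY_HEX" /tmp/hello.txt`, with the hex values read from the TOC entry.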

13.6 Kotlin Decoder Reference

For Android implementations using javax.crypto:

import java.io.ByteArrayInputStream
import java.io.File
import java.security.MessageDigest
import java.util.zip.GZIPInputStream
import javax.crypto.Cipher
import javax.crypto.Mac
import javax.crypto.spec.IvParameterSpec
import javax.crypto.spec.SecretKeySpec

/**
 * Decrypt a single file entry from the archive.
 *
 * @param ciphertext The encrypted data (encrypted_size bytes from the data block)
 * @param iv The 16-byte IV from the file table entry
 * @param key The 32-byte AES key
 * @return Decrypted data (after PKCS7 unpadding, which is automatic)
 */
fun decryptFileEntry(ciphertext: ByteArray, iv: ByteArray, key: ByteArray): ByteArray {
    val cipher = Cipher.getInstance("AES/CBC/PKCS5Padding")
    // Note: PKCS5Padding in Java/Android == PKCS7 for 16-byte blocks
    val secretKey = SecretKeySpec(key, "AES")
    val ivSpec = IvParameterSpec(iv)
    cipher.init(Cipher.DECRYPT_MODE, secretKey, ivSpec)
    return cipher.doFinal(ciphertext)
}

/**
 * Verify HMAC-SHA-256 of IV || ciphertext.
 *
 * @param iv The 16-byte IV
 * @param ciphertext The encrypted data
 * @param key The 32-byte key (same as AES key in v1)
 * @param expectedHmac The 32-byte HMAC from the file table entry
 * @return true if HMAC matches
 */
fun verifyHmac(iv: ByteArray, ciphertext: ByteArray, key: ByteArray, expectedHmac: ByteArray): Boolean {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(SecretKeySpec(key, "HmacSHA256"))
    mac.update(iv)
    mac.update(ciphertext)
    val computed = mac.doFinal()
    // Constant-time comparison avoids leaking the mismatch position via timing
    return MessageDigest.isEqual(computed, expectedHmac)
}

/**
 * Decompress gzip data.
 *
 * @param compressed Gzip-compressed data
 * @return Decompressed data
 */
fun decompressGzip(compressed: ByteArray): ByteArray {
    // use {} closes the stream and releases the underlying Inflater
    return GZIPInputStream(ByteArrayInputStream(compressed)).use { it.readBytes() }
}

/**
 * Verify SHA-256 checksum of extracted content.
 *
 * @param data The decompressed file content
 * @param expectedSha256 The 32-byte SHA-256 from the file table entry
 * @return true if checksum matches
 */
fun verifySha256(data: ByteArray, expectedSha256: ByteArray): Boolean {
    val digest = MessageDigest.getInstance("SHA-256")
    val computed = digest.digest(data)
    return computed.contentEquals(expectedSha256)
}

Full decode flow in Kotlin:

// For each file entry:
// 1. Read ciphertext from data_offset (encrypted_size bytes)
// 2. Verify HMAC BEFORE decryption
if (!verifyHmac(entry.iv, ciphertext, key, entry.hmac)) {
    throw SecurityException("HMAC verification failed for ${entry.name}")
}
// 3. Decrypt
val compressed = decryptFileEntry(ciphertext, entry.iv, key)
// 4. Decompress if needed
val original = if (entry.compressionFlag == 1) decompressGzip(compressed) else compressed
// 5. Verify SHA-256
if (!verifySha256(original, entry.sha256)) {
    throw SecurityException("SHA-256 verification failed for ${entry.name}")
}
// 6. Write to file
File(outputDir, entry.name).writeBytes(original)