NikitolProject/android-encrypted-archiver


NikitolProject fcd37f531b feat(01-01): write format specification with byte-level field definitions

- Archive header definition (40 bytes) with complete field table
- File table entry definition (11 fields, variable-length per entry)
- AES-256-CBC + HMAC-SHA-256 encryption pipeline with encrypt-then-MAC
- PKCS7 padding formula with 8 worked examples
- Gzip compression details with per-file flag
- Obfuscation features: XOR header, encrypted TOC, decoy padding
- Decode order of operations (full step-by-step)
- Version compatibility rules

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-24 23:18:37 +03:00


Encrypted Archive Binary Format Specification

Version: 1.0
Date: 2026-02-24
Status: Normative

Overview and Design Goals
Notation Conventions
Archive Structure Diagram
Archive Header Definition
File Table Entry Definition
Data Block Layout
Encryption and Authentication Details
Compression Details
Obfuscation Features
Decode Order of Operations
Version Compatibility Rules
Worked Example
Appendix: Shell Decoder Reference

1. Overview and Design Goals

This document specifies the binary format for encrypted_archive -- a custom archive container designed to be unrecognizable by standard tools. Standard utilities (file, binwalk, 7z, tar, unzip) must not be able to identify or extract the contents of an archive produced in this format.

Target Decoders

Three independent implementations will build against this specification:

Rust CLI archiver (encrypted_archive pack/unpack) -- the reference encoder and primary decoder, runs on Linux/macOS.
Kotlin Android decoder -- runs on Android 13 (Qualcomm SoC) using only javax.crypto and java.util.zip. Primary extraction path on the target device.
Busybox shell decoder -- a fallback shell script using busybox applets (dd, xxd, gunzip, sh) plus the openssl command-line tool, which busybox itself does not provide. Must work without any other external dependencies.

Core Constraint

The shell decoder must be able to parse the archive format using dd (for byte extraction), xxd (for hex conversion), and openssl enc (for AES-CBC decryption with raw key mode: -K/-iv/-nosalt). This constraint drives several design choices:

Fixed-size header at a known offset (no variable-length preamble before the TOC pointer)
Absolute offsets (no relative offset chains that require cumulative addition)
IVs stored in the file table, not embedded in data blocks (single dd call per extraction)
Little-endian integers (native byte order on x86 and on ARM in its standard little-endian configuration, as used by Android)

2. Notation Conventions

Convention	Meaning
LE	Little-endian byte order
u8	Unsigned 8-bit integer (1 byte)
u16	Unsigned 16-bit integer (2 bytes)
u32	Unsigned 32-bit integer (4 bytes)
bytes	Raw byte sequence (no endianness)
Offset `0xNN`	Absolute byte offset from archive byte 0
Size	Always in bytes unless stated otherwise
`||`	Concatenation of byte sequences

All multi-byte integers are little-endian (LE).
All sizes are in bytes unless stated otherwise.
All offsets are absolute from archive byte 0 (the first byte of the file).
Filenames are UTF-8 encoded, length-prefixed with a u16 byte count (NOT null-terminated).
Reserved fields are zero-filled and MUST be written as 0x00 bytes.

3. Archive Structure Diagram

+=======================================+
|          ARCHIVE HEADER               |  Fixed 40 bytes
|  magic(4) | ver(1) | flags(1)         |
|  file_count(2) | toc_offset(4)        |
|  toc_size(4) | toc_iv(16)             |
|  reserved(8)                          |
+=======================================+
|          FILE TABLE (TOC)             |  Variable size
|  Entry 1: name, sizes, offset,        |  Optionally encrypted
|    iv, hmac, sha256, flags            |  (see Section 9.2)
|  Entry 2: ...                         |
|  ...                                  |
|  Entry N: ...                         |
+=======================================+
|          DATA BLOCK 1                 |  encrypted_size bytes
|  [ciphertext]                         |
+---------------------------------------+
|          [DECOY PADDING 1]            |  Optional (see Section 9.3)
+---------------------------------------+
|          DATA BLOCK 2                 |  encrypted_size bytes
|  [ciphertext]                         |
+---------------------------------------+
|          [DECOY PADDING 2]            |  Optional (see Section 9.3)
+---------------------------------------+
|          ...                          |
+=======================================+

The archive consists of three contiguous regions:

Header (fixed 40 bytes) -- contains magic bytes, version, flags, and a pointer to the file table.
File Table (TOC) (variable size) -- contains one entry per archived file with all metadata needed for extraction.
Data Blocks (variable size) -- contains the encrypted (and optionally compressed) file contents, one block per file, optionally separated by decoy padding.

4. Archive Header Definition

The header is a fixed-size 40-byte structure at offset 0x00.

Offset	Size	Type	Endian	Field	Description
`0x00`	4	bytes	-	`magic`	Custom magic bytes: `0x00 0xEA 0x72 0x63`. The leading `0x00` signals binary content. The remaining bytes (`0xEA 0x72 0x63`) were chosen not to match any common file signature.
`0x04`	1	u8	-	`version`	Format version. Value `1` for this specification (v1).
`0x05`	1	u8	-	`flags`	Feature flags bitfield (see below).
`0x06`	2	u16	LE	`file_count`	Number of files stored in the archive.
`0x08`	4	u32	LE	`toc_offset`	Absolute byte offset of the file table from archive start.
`0x0C`	4	u32	LE	`toc_size`	Size of the file table in bytes (if TOC encryption is on, this is the encrypted size including PKCS7 padding).
`0x10`	16	bytes	-	`toc_iv`	Initialization vector for encrypted TOC. Zero-filled (`0x00` x 16) when TOC encryption flag (bit 1) is off.
`0x20`	8	bytes	-	`reserved`	Reserved for future use. MUST be zero-filled.

Total header size: 40 bytes (0x28).
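A minimal Python sketch of reading and writing a plain (non-XOR-obfuscated) header with the standard `struct` module. The format string follows the field table above. The helper names `build_header` and `parse_header` are hypothetical. The sketch checks only the magic bytes and leaves version and reserved-field policy to the decoder.

```python
import struct

# magic(4s) version(B) flags(B) file_count(H) toc_offset(I) toc_size(I)
# toc_iv(16s) reserved(8s); "<" = little-endian, no alignment padding.
HEADER = struct.Struct("<4sBBHII16s8s")
MAGIC = b"\x00\xEA\x72\x63"
assert HEADER.size == 40  # 0x28, matches the field table


def build_header(flags: int, file_count: int, toc_offset: int, toc_size: int,
                 toc_iv: bytes = bytes(16)) -> bytes:
    """Serialize a v1 header; reserved bytes are always written as zeros."""
    return HEADER.pack(MAGIC, 1, flags, file_count, toc_offset, toc_size,
                       toc_iv, bytes(8))


def parse_header(raw: bytes) -> dict:
    """Parse the first 40 bytes of an archive whose header is not XOR-obfuscated."""
    if len(raw) < HEADER.size:
        raise ValueError("truncated header")
    (magic, version, flags, file_count,
     toc_offset, toc_size, toc_iv, _reserved) = HEADER.unpack_from(raw, 0)
    if magic != MAGIC:
        raise ValueError("bad magic bytes")
    return {"version": version, "flags": flags, "file_count": file_count,
            "toc_offset": toc_offset, "toc_size": toc_size, "toc_iv": toc_iv}
```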

Flags Bitfield

Bit	Mask	Name	Description
0	`0x01`	`compression`	Per-file compression enabled. When set, files MAY be individually gzip-compressed (per-file `compression_flag` controls each file). When clear, all files are stored raw.
1	`0x02`	`toc_encrypted`	File table is encrypted with AES-256-CBC using `toc_iv`. When clear, file table is stored as plaintext.
2	`0x04`	`xor_header`	Header bytes are XOR-obfuscated (see Section 9.1). When clear, header is stored as-is.
3	`0x08`	`decoy_padding`	Random decoy bytes are inserted after data blocks (see Section 9.3). When clear, `padding_after` in every file table entry is 0.
4-7	`0xF0`	reserved	Reserved. MUST be `0`.
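The following Python sketch shows one way to decode the bitfield. Rejecting an archive whose reserved bits are set is an assumed decoder policy; the table only states that those bits must be zero. The constant names are illustrative.

```python
FLAG_COMPRESSION = 0x01     # bit 0
FLAG_TOC_ENCRYPTED = 0x02   # bit 1
FLAG_XOR_HEADER = 0x04      # bit 2
FLAG_DECOY_PADDING = 0x08   # bit 3
FLAG_RESERVED_MASK = 0xF0   # bits 4-7


def decode_flags(flags: int) -> dict:
    """Split the header flags byte into named booleans."""
    if flags & FLAG_RESERVED_MASK:
        # Assumed policy: set reserved bits mean an unsupported archive.
        raise ValueError(f"reserved flag bits set: {flags:#04x}")
    return {
        "compression": bool(flags & FLAG_COMPRESSION),
        "toc_encrypted": bool(flags & FLAG_TOC_ENCRYPTED),
        "xor_header": bool(flags & FLAG_XOR_HEADER),
        "decoy_padding": bool(flags & FLAG_DECOY_PADDING),
    }
```

For example, a flags byte of `0x0B` decodes to compression, TOC encryption and decoy padding enabled, with XOR header obfuscation off.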

5. File Table Entry Definition

The file table (TOC) is a contiguous sequence of variable-length entries, one per file. Entries are stored in the order files were added to the archive. There is no per-entry delimiter; entries are read sequentially using the name_length field to determine where each entry's variable-length name ends.

Entry Field Table

Field	Size	Type	Endian	Description
`name_length`	2	u16	LE	Filename length in bytes (UTF-8 encoded byte count).
`name`	`name_length`	bytes	-	Filename as UTF-8 bytes. NOT null-terminated. May contain path separators (`/`).
`original_size`	4	u32	LE	Original file size in bytes (before compression).
`compressed_size`	4	u32	LE	Size after gzip compression. Equals `original_size` if `compression_flag` is 0 (no compression).
`encrypted_size`	4	u32	LE	Size after AES-256-CBC encryption with PKCS7 padding. Formula: `((compressed_size / 16) + 1) * 16`.
`data_offset`	4	u32	LE	Absolute byte offset of this file's data block from archive start.
`iv`	16	bytes	-	Random AES-256-CBC initialization vector for this file.
`hmac`	32	bytes	-	HMAC-SHA-256 over `iv || ciphertext` (the 16-byte IV followed by this file's ciphertext; see Section 7).
`sha256`	32	bytes	-	SHA-256 hash of the original file content (before compression and encryption).
`compression_flag`	1	u8	-	`0` = raw (no compression), `1` = gzip compressed.
`padding_after`	2	u16	LE	Number of decoy padding bytes after this file's data block. Always `0` when flags bit 3 (decoy_padding) is off.
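Entries carry no delimiter, so a decoder walks the plaintext (or already decrypted) TOC sequentially. For each entry it reads the u16 `name_length`, slices out the name, and then unpacks the 99-byte fixed tail. A minimal Python sketch follows. The function name `parse_toc` and the dictionary keys are illustrative and not part of the format.

```python
import struct

# Fixed-size fields after the name: original_size, compressed_size,
# encrypted_size, data_offset (u32 LE), iv(16), hmac(32), sha256(32),
# compression_flag (u8), padding_after (u16 LE).
ENTRY_TAIL = struct.Struct("<IIII16s32s32sBH")
FIELDS = ("original_size", "compressed_size", "encrypted_size", "data_offset",
          "iv", "hmac", "sha256", "compression_flag", "padding_after")


def parse_toc(toc: bytes, file_count: int) -> list:
    """Parse file_count entries from a plaintext TOC."""
    entries, pos = [], 0
    for _ in range(file_count):
        (name_length,) = struct.unpack_from("<H", toc, pos)
        pos += 2
        if pos + name_length + ENTRY_TAIL.size > len(toc):
            raise ValueError("truncated TOC entry")
        entry = {"name": toc[pos:pos + name_length].decode("utf-8")}
        pos += name_length
        entry.update(zip(FIELDS, ENTRY_TAIL.unpack_from(toc, pos)))
        pos += ENTRY_TAIL.size
        entries.append(entry)
    return entries
```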

Entry Size Formula

Each file table entry has a total size of:

entry_size = 2 + name_length + 4 + 4 + 4 + 4 + 16 + 32 + 32 + 1 + 2
           = 101 + name_length bytes

File Table Total Size

The plaintext file table size is the sum of all entry sizes:

plaintext_toc_size = SUM(101 + name_length_i) for i in 0..file_count-1

When TOC encryption (flags bit 1) is active, the encrypted TOC size includes PKCS7 padding:

encrypted_toc_size = ((plaintext_toc_size / 16) + 1) * 16

The toc_size field in the header stores the actual size on disk: encrypted_toc_size if TOC encryption is on, plaintext_toc_size if it is off.
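Here is the same arithmetic as a small Python sketch. `name_length` counts UTF-8 bytes, not characters, so a non-ASCII name adds more bytes to its entry than its character count. The helper names are illustrative.

```python
def entry_size(name_length: int) -> int:
    # name_length(2) + name + 4 x u32 + iv(16) + hmac(32) + sha256(32)
    # + compression_flag(1) + padding_after(2)
    return 2 + name_length + 4 * 4 + 16 + 32 + 32 + 1 + 2


def header_toc_size(names: list, toc_encrypted: bool) -> int:
    """Value to store in the header's toc_size field."""
    plaintext = sum(entry_size(len(n.encode("utf-8"))) for n in names)
    if toc_encrypted:
        # PKCS7 adds 1..16 bytes: the next multiple of 16 strictly above plaintext.
        return (plaintext // 16 + 1) * 16
    return plaintext
```

For the names `a.txt` (5 bytes) and `дом.txt` (10 bytes in UTF-8), the plaintext TOC is 106 + 111 = 217 bytes. The encrypted TOC is 224 bytes.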

6. Data Block Layout

Each file has a single contiguous data block containing only the ciphertext (the AES-256-CBC encrypted output).

[ciphertext: encrypted_size bytes]

Important design decisions:

The IV is stored only in the file table entry, not duplicated at the start of the data block. The data block contains only ciphertext. This simplifies dd extraction in the shell decoder: a single dd call with the correct offset and size extracts the complete ciphertext.
The HMAC is stored only in the file table entry, not appended to the data block. The decoder reads the HMAC from the TOC, then verifies against the data block contents.
If decoy padding is enabled (flags bit 3), padding_after bytes of random data follow the ciphertext. The decoder MUST skip these bytes. The next file's data block starts at offset data_offset + encrypted_size + padding_after.

Data Block Ordering

Data blocks appear in the same order as file table entries. For file entry i:

data_offset_0 = toc_offset + toc_size
data_offset_i = data_offset_{i-1} + encrypted_size_{i-1} + padding_after_{i-1}
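An encoder can assign `data_offset` values by walking the entries in order, as in this Python sketch. A decoder can run the same loop as a consistency check, but it should still extract each block from the stored absolute `data_offset`. The function names are hypothetical. `extract_block` mirrors what a single `dd if=ARCHIVE bs=1 skip=OFFSET count=SIZE` call does in the shell decoder.

```python
def assign_offsets(toc_offset: int, toc_size: int, blocks: list) -> list:
    """blocks: (encrypted_size, padding_after) pairs in TOC order."""
    offsets, offset = [], toc_offset + toc_size  # first block follows the TOC
    for encrypted_size, padding_after in blocks:
        offsets.append(offset)
        offset += encrypted_size + padding_after  # skip ciphertext + decoy bytes
    return offsets


def extract_block(archive: bytes, data_offset: int, encrypted_size: int) -> bytes:
    """Slice one file's ciphertext; decoy padding after it is never read."""
    block = archive[data_offset:data_offset + encrypted_size]
    if len(block) != encrypted_size:
        raise ValueError("data block runs past end of archive")
    return block
```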

7. Encryption and Authentication Details

Pipeline

Each file is processed through the following pipeline, in order:

original_file
    |
    v
[1. SHA-256 checksum] --> stored in file table entry as `sha256`
    |
    v
[2. Gzip compress] (if compression_flag = 1) --> compressed_data
    |                                             (size = compressed_size)
    v
[3. PKCS7 pad] --> padded_data
    |               (size = encrypted_size)
    v
[4. AES-256-CBC encrypt] (with random IV) --> ciphertext
    |                                          (size = encrypted_size)
    v
[5. HMAC-SHA-256] (over IV || ciphertext) --> stored in file table entry as `hmac`

AES-256-CBC

Key: 32 bytes (256 bits), hardcoded and shared across all three decoders.
IV: 16 bytes, randomly generated for each file. Stored in the file table entry iv field.
Block size: 16 bytes.
Mode: CBC (Cipher Block Chaining).
The same 32-byte key is used for all files in the archive.

PKCS7 Padding

PKCS7 padding is applied to the compressed (or raw) data before encryption. PKCS7 always adds at least 1 byte of padding. If the input length is already a multiple of 16, a full 16-byte padding block is added.

Formula:

encrypted_size = ((compressed_size / 16) + 1) * 16

Where / is integer division (floor).

Examples:

`compressed_size`	Padding bytes	`encrypted_size`
0	16	16
1	15	16
15	1	16
16	16	32
17	15	32
31	1	32
32	16	48
100	12	112
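The Python sketch below implements the padding rule and reproduces every row of the table. AES libraries usually apply and strip this padding themselves (`openssl enc` pads by default unless `-nopad` is given). A hand-written version like this is therefore mainly useful for size calculations and tests.

```python
BLOCK = 16


def encrypted_size(compressed_size: int) -> int:
    # Floor division, then one more block: PKCS7 always adds 1..16 bytes.
    return (compressed_size // BLOCK + 1) * BLOCK


def pkcs7_pad(data: bytes) -> bytes:
    n = BLOCK - len(data) % BLOCK  # 1..16, never 0
    return data + bytes([n]) * n


def pkcs7_unpad(padded: bytes) -> bytes:
    if not padded or len(padded) % BLOCK:
        raise ValueError("padded length is not a positive multiple of 16")
    n = padded[-1]
    if not 1 <= n <= BLOCK or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS7 padding")
    return padded[:-n]
```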

HMAC-SHA-256

Key: The same 32-byte key used for AES-256-CBC encryption. (v1 uses a single key for both encryption and authentication. v2 will derive separate subkeys using HKDF.)

Input: The concatenation of the 16-byte IV and the ciphertext:

HMAC_input = IV (16 bytes) || ciphertext (encrypted_size bytes)
Total HMAC input length = 16 + encrypted_size bytes

Output: 32 bytes, stored in the file table entry hmac field.
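The tag can be computed with Python's standard `hmac` and `hashlib` modules, as in this sketch. The function name is illustrative. As specified for v1, `key` is the same 32-byte key used for AES.

```python
import hashlib
import hmac


def compute_tag(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """HMAC-SHA-256 over IV || ciphertext (16 + encrypted_size bytes)."""
    if len(key) != 32 or len(iv) != 16 or not ciphertext or len(ciphertext) % 16:
        raise ValueError("unexpected key, IV, or ciphertext length")
    return hmac.new(key, iv + ciphertext, hashlib.sha256).digest()
```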

Encrypt-then-MAC

This format uses the Encrypt-then-MAC construction:

The HMAC is computed after encryption, over the IV and ciphertext.
The decoder MUST verify the HMAC before attempting decryption. If the HMAC does not match, the decoder MUST reject the file without decrypting. This prevents padding oracle attacks and avoids processing tampered data.
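The required ordering is sketched in Python below. `hmac.compare_digest` keeps the comparison time independent of how many bytes match. The Python standard library has no AES, so `decrypt` is a hypothetical callback that stands in for the decoder's AES-256-CBC routine. The sketch shows that the callback is never invoked when the tag check fails.

```python
import hashlib
import hmac
from typing import Callable


def verify_then_decrypt(key: bytes, iv: bytes, ciphertext: bytes, stored_tag: bytes,
                        decrypt: Callable[[bytes, bytes, bytes], bytes]) -> bytes:
    expected = hmac.new(key, iv + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, stored_tag):
        # Reject before any decryption or padding check runs.
        raise ValueError("HMAC mismatch: file rejected")
    return decrypt(key, iv, ciphertext)
```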

SHA-256 Integrity Checksum

Input: The original file content (before compression, before encryption).
Output: 32 bytes, stored in the file table entry sha256 field.
Verification: After the decoder decrypts and decompresses a file, it computes SHA-256 of the result and compares it to the stored sha256. A mismatch indicates data corruption or an incorrect key.
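The final integrity check is sketched in Python below. The length comparison against `original_size` is an extra, cheap check assumed by this sketch; the specification itself only requires the SHA-256 comparison.

```python
import hashlib


def check_integrity(plaintext: bytes, stored_sha256: bytes, original_size: int) -> None:
    """Raise if the recovered file does not match its TOC metadata."""
    if len(plaintext) != original_size:
        raise ValueError("size does not match original_size")
    if hashlib.sha256(plaintext).digest() != stored_sha256:
        raise ValueError("SHA-256 mismatch: corrupted data or wrong key")
```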

8. Compression Details

Algorithm: Standard gzip format (RFC 1952) wrapping DEFLATE-compressed data (RFC 1951).
Granularity: Per-file. Each file has its own compression_flag in the file table entry.
Global flag: The header flags bit 0 (compression) enables per-file compression. When this bit is clear, ALL files are stored raw regardless of individual compression_flag values.
Recommendation: Already-compressed files (APK, ZIP, PNG, JPEG) should use compression_flag = 0 (raw) to avoid size inflation.

Size Tracking

original_size: Size of the file before any processing.
compressed_size: Size after gzip compression. If compression_flag = 0, then compressed_size = original_size.
encrypted_size: Size after AES-256-CBC with PKCS7 padding. Always > compressed_size, since PKCS7 adds between 1 and 16 bytes.

Decompression in Each Decoder

Decoder	Library/Command
Rust	`flate2` crate (`GzDecoder`)
Kotlin	`java.util.zip.GZIPInputStream`
Shell	`gunzip` (busybox)
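A Python sketch of the per-file decision, using the standard `gzip` module (which reads and writes the RFC 1952 format). It applies the rule that a clear global flag (bit 0) forces every file to be stored raw. Whether to compress a given file (`want_gzip`) is left to the caller; for example, it would be `False` for APK or JPEG inputs. The function names are illustrative.

```python
import gzip

FLAG_COMPRESSION = 0x01


def prepare_payload(data: bytes, header_flags: int, want_gzip: bool):
    """Encoder side: return (payload, compression_flag, compressed_size)."""
    use_gzip = bool(header_flags & FLAG_COMPRESSION) and want_gzip
    payload = gzip.compress(data) if use_gzip else data
    # When stored raw, compressed_size equals original_size.
    return payload, int(use_gzip), len(payload)


def restore_payload(payload: bytes, header_flags: int, compression_flag: int) -> bytes:
    """Decoder side: gunzip only when both the global and per-file flags are set."""
    if header_flags & FLAG_COMPRESSION and compression_flag == 1:
        return gzip.decompress(payload)
    return payload
```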

9. Obfuscation Features

These features are defined fully in this v1 specification but are intended for implementation in Phase 6 (after all three decoders work without obfuscation). Each feature is controlled by a flag bit in the header and can be activated independently.

9.1 XOR Header Obfuscation (flags bit 2, mask `0x04`)

When flags bit 2 is set, the entire 40-byte header is XOR-obfuscated with a fixed repeating 8-byte key.

XOR Key: 0xA5 0x3C 0x96 0x0F 0xE1 0x7B 0x4D 0xC8 (8 bytes, repeating)

XOR Range: Bytes 0x00 through 0x27 (the entire 40-byte header).

Application:

XOR is applied after the header is fully constructed (all fields written).
The 8-byte key repeats cyclically across the 40 bytes: byte i of the header is XORed with key[i % 8].

Decoding:

The decoder reads the first 40 bytes and XORs them with the same repeating key (XOR is its own inverse).
After de-XOR, the decoder reads header fields normally.
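XOR is its own inverse, so a single Python function both obfuscates and de-obfuscates the header. This sketch simply applies `key[i % 8]` to each of the 40 bytes. As a worked check, the magic bytes `00 EA 72 63` become `A5 D6 E4 6C` after obfuscation.

```python
XOR_KEY = bytes([0xA5, 0x3C, 0x96, 0x0F, 0xE1, 0x7B, 0x4D, 0xC8])


def xor_header(header: bytes) -> bytes:
    """XOR the 40-byte header with the repeating 8-byte key (encode and decode)."""
    if len(header) != 40:
        raise ValueError("header must be exactly 40 bytes")
    return bytes(b ^ XOR_KEY[i % 8] for i, b in enumerate(header))
```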

Bootstrapping problem: When XOR obfuscation is active, the flags byte itself is XORed. The decoder MUST:

Always attempt de-XOR on the first 40 bytes.
Read the flags byte from the de-XORed header.
Check if bit 2 is set. If it is, the de-XOR was correct. If it is not, re-read the header from the original (un-XORed) bytes.

Alternatively, the decoder can check the magic bytes: if the first 4 bytes are 0x00 0xEA 0x72 0x63, the header is not XOR-obfuscated. If they are not, attempt de-XOR and re-check.

When flags bit 2 is 0: The header is stored as-is (no XOR).
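The Python sketch below implements the magic-byte approach. It also confirms that bit 2 is set in the de-XORed flags byte; combining the two checks is an assumption of this sketch, not a requirement of the format. The constants repeat the values given in this section so that the block stands alone.

```python
MAGIC = b"\x00\xEA\x72\x63"
XOR_KEY = bytes([0xA5, 0x3C, 0x96, 0x0F, 0xE1, 0x7B, 0x4D, 0xC8])
FLAG_XOR_HEADER = 0x04


def recover_header(data: bytes) -> bytes:
    """Return the plain 40-byte header, undoing XOR obfuscation if present."""
    if len(data) < 40:
        raise ValueError("file shorter than a header")
    raw = data[:40]
    if raw[:4] == MAGIC:
        return raw  # stored as-is
    plain = bytes(b ^ XOR_KEY[i % 8] for i, b in enumerate(raw))
    if plain[:4] == MAGIC and plain[5] & FLAG_XOR_HEADER:
        return plain
    raise ValueError("not a recognized archive header")
```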

9.2 TOC Encryption (flags bit 1, mask `0x02`)

When flags bit 1 is set, the entire file table is encrypted with AES-256-CBC.

Key: The same 32-byte key used for file encryption.
IV: The toc_iv field in the header (16 bytes, randomly generated).
Input: The serialized file table (all entries concatenated).
Padding: PKCS7 padding is applied to the entire serialized TOC.
toc_size in header: Stores the encrypted TOC size (including PKCS7 padding), not the plaintext size.