android-encrypted-archiver/.planning/phases/07-format-spec-update/07-01-PLAN.md
2026-02-26 21:13:34 +03:00


---
phase: 07-format-spec-update
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - docs/FORMAT.md
autonomous: true
requirements:
  - FMT-09
  - FMT-10
  - FMT-11
  - FMT-12
must_haves:
  truths:
    - "FORMAT.md defines entry_type field (1 byte, u8) in File Table Entry: 0x00=file, 0x01=directory"
    - "FORMAT.md defines permissions field (2 bytes, u16 LE) in File Table Entry with POSIX mode_t lower 12 bits"
    - "FORMAT.md specifies entry names are relative paths using / separator (e.g. dir/subdir/file.txt)"
    - "FORMAT.md worked example includes a directory archive with nested directory, file inside it, and empty directory"
    - "FORMAT.md version field is bumped to 2 reflecting the v1.1 format changes"
    - "Entry size formula is updated to include entry_type (1 byte) and permissions (2 bytes)"
  artifacts:
    - path: docs/FORMAT.md
      provides: "Complete v1.1 binary format specification"
      contains: "entry_type.*u8"
  key_links:
    - from: "docs/FORMAT.md Section 5 (File Table Entry)"
      to: "docs/FORMAT.md Section 12 (Worked Example)"
      via: "New TOC fields (entry_type, permissions) appear in both definition and worked example"
      pattern: "entry_type.*permissions"
---
Update FORMAT.md to fully document the v1.1 TOC entry layout with entry type, permission bits, and relative path semantics.

Purpose: All three decoders (Rust, Kotlin, Shell) need an unambiguous specification to build their v1.1 directory support against. This phase updates the normative format document before any code changes.

Output: Updated docs/FORMAT.md with v1.1 TOC entry fields and a new worked example showing a directory archive.

<execution_context> @/home/nick/.claude/get-shit-done/workflows/execute-plan.md @/home/nick/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @docs/FORMAT.md

Key decisions from STATE.md:

  • v1.1: No backward compatibility with v1.0 archives (format version bump to 2)
  • v1.1: Only mode bits (no uid/gid, no timestamps, no symlinks)
  • v1.0: Filename-only entry names -- v1.1 changes this to relative paths with / separator

Existing FORMAT.md patterns (from Phase 1):

  • Field table pattern: offset, size, type, endian, field name, description for every binary structure
  • Worked example pattern: concrete inputs -> pipeline walkthrough -> hex dump -> shell decode commands
  • Entry size formula: 101 + name_length bytes per entry
  • All offsets absolute from archive byte 0
Task 1: Update TOC entry definition with entry_type, permissions, and path semantics
Files: docs/FORMAT.md
Action: Update docs/FORMAT.md with the following changes. Preserve the existing document structure and style conventions (field tables, notation, etc.).

1. Version bump (Section 1 and header):

  • Change document version from "1.0" to "1.1" in the front matter
  • Note that format version field in archives is now 2 (header byte at offset 0x04)
  • In Section 11 (Version Compatibility), add that v2 introduces entry_type and permissions fields
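Below is a sketch of the version gate on the decoder side. It assumes POSIX `od` and places the version byte at header offset 0x04, as stated above. The helper names `read_version` and `check_version` are hypothetical, and the sketch does not check the magic bytes that come before offset 0x04.

```shell
# Hypothetical helper: print the format version byte stored at offset 0x04.
read_version() {
    # -An: no address column, -tu1: unsigned 1-byte decimals,
    # -j4: skip the first 4 bytes, -N1: read exactly one byte
    od -An -tu1 -j4 -N1 "$1" | tr -d ' \n'
}

# Hypothetical helper: accept only format version 2 (v1.1) archives.
check_version() {
    v=$(read_version "$1")
    if [ "$v" != "2" ]; then
        echo "unsupported format version: ${v:-missing}" >&2
        return 1
    fi
}
```

Example usage: `check_version archive.bin && echo ok`. A version-1 archive is rejected, which matches the STATE.md decision that v1.1 does not read v1.0 archives.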

2. Section 2 (Notation Conventions):

  • Update the filenames note: change "Filenames are UTF-8 encoded" to "Entry names are UTF-8 encoded relative paths using / as the path separator (e.g., dir/subdir/file.txt). Names MUST NOT start with / or contain .. components. For top-level files, the name is just the filename (e.g., readme.txt)."

3. Section 3 (Archive Structure Diagram):

  • Update the TOC description comment: entries now represent files AND directories

4. Section 4 (Archive Header):

  • Change version field description: "Format version. Value 2 for this specification (v1.1). Value 1 for legacy v1.0 (no directory support)."
  • Rename the file_count field to entry_count and update its description: "Number of entries (files and directories) stored in the archive."
  • In the toc_offset and toc_size field descriptions, replace "file table" with "entry table"

5. Section 5 (File Table Entry Definition) -- the core change:

Rename section title to "Table of Contents (TOC) Entry Definition" for clarity.

Add two new fields to the Entry Field Table AFTER name and BEFORE original_size:

| Field | Size | Type | Endian | Description |
|-------|------|------|--------|-------------|
| entry_type | 1 | u8 | - | Entry type: 0x00 = regular file, 0x01 = directory. Directories have original_size, compressed_size, and encrypted_size all set to 0 and no corresponding data block. |
| permissions | 2 | u16 | LE | Unix permission bits (lower 12 bits of POSIX mode_t). Bit layout, from bit 11 down to bit 0: [suid(1)][sgid(1)][sticky(1)][owner_rwx(3)][group_rwx(3)][other_rwx(3)]. Example: 0o755 = 0x01ED = owner rwx, group r-x, other r-x. Stored as u16 LE. |
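The following read-side sketch shows where the two new fields sit in the bytes. It uses POSIX `od` to pull both fields out of a single TOC entry. It assumes that an entry starts with a 2-byte little-endian name_length followed by the name bytes. This plan does not restate that layout, so check it against the existing Section 5 table. `read_le` and `read_type_and_mode` are hypothetical helpers.

```shell
# Hypothetical helper: read an unsigned little-endian integer of $3 bytes
# starting at absolute offset $2 in file $1.
read_le() {
    # od prints the bytes in file order (lowest address first) as decimals.
    set -- $(od -An -tu1 -j"$2" -N"$3" "$1")
    v=0
    bits=0
    for b in "$@"; do
        v=$(( v + (b << bits) ))   # byte k contributes b * 256^k
        bits=$(( bits + 8 ))
    done
    echo "$v"
}

# Hypothetical helper: print "<entry_type> <permissions in octal>" for the TOC
# entry that starts at absolute offset $2 in file $1.
# ASSUMPTION: the entry begins with name_length (u16 LE) and then the name bytes,
# so entry_type is at +2+name_length and permissions at +3+name_length.
read_type_and_mode() {
    f=$1
    off=$2
    name_len=$(read_le "$f" "$off" 2)
    etype=$(read_le "$f" $(( off + 2 + name_len )) 1)
    mode=$(read_le "$f" $(( off + 3 + name_len )) 2)
    printf '%s %o\n' "$etype" "$mode"
}
```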

Add a subsection "### Entry Type Values" with a table:

| Value | Name | Description |
|-------|------|-------------|
| 0x00 | File | Regular file. Has an associated data block with ciphertext. All size fields and data_offset are meaningful. |
| 0x01 | Directory | Directory entry. original_size, compressed_size, and encrypted_size are all 0. data_offset is 0. iv, hmac, and sha256 are zero-filled. compression_flag is 0. No data block exists for this entry. |

Add a subsection "### Permission Bits Layout" with a table:

| Bits | Mask | Name | Description |
|------|------|------|-------------|
| 11 | 0o4000 | setuid | Set user ID on execution |
| 10 | 0o2000 | setgid | Set group ID on execution |
| 9 | 0o1000 | sticky | Sticky bit |
| 8-6 | 0o0700 | owner | Owner read(4)/write(2)/execute(1) |
| 5-3 | 0o0070 | group | Group read(4)/write(2)/execute(1) |
| 2-0 | 0o0007 | other | Other read(4)/write(2)/execute(1) |

Common examples: 0o755 (rwxr-xr-x) = 0x01ED, 0o644 (rw-r--r--) = 0x01A4, 0o700 (rwx------) = 0x01C0.
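Shell arithmetic can reproduce the conversions above. The sketch below uses a hypothetical helper, `mode_to_le`. It masks an octal mode to its lower 12 bits, then prints the hex value and the little-endian byte order that the permissions field stores.

```shell
# Hypothetical helper: show how an octal mode is stored in the permissions field.
mode_to_le() {
    m=$(( 0$1 & 07777 ))   # leading 0 = octal literal; keep only the lower 12 bits
    printf '0o%03o = 0x%04X (LE: %02X %02X)\n' "$m" "$m" $(( m & 0xFF )) $(( m >> 8 ))
}

mode_to_le 755   # prints: 0o755 = 0x01ED (LE: ED 01)
mode_to_le 644   # prints: 0o644 = 0x01A4 (LE: A4 01)
mode_to_le 700   # prints: 0o700 = 0x01C0 (LE: C0 01)
```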

Add a subsection "### Entry Name Semantics" explaining:

  • Names are relative paths from the archive root, using / as separator
  • Example: a file at project/src/main.rs has name project/src/main.rs
  • A directory entry for project/src/ has name project/src (no trailing slash)
  • Names MUST NOT start with / (no absolute paths)
  • Names MUST NOT contain .. components (no directory traversal)
  • The encoder MUST sort entries so that each directory entry appears before any entries (files or subdirectories) nested inside it (parent-before-child ordering). This lets the decoder create all directories in a single sequential pass, for example with mkdir -p.
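Here is a minimal validator for the name rules above, written as a POSIX `case` match. `valid_entry_name` is a hypothetical helper. Rejecting the empty string is an extra assumption that the rules do not state. The sketch checks individual names only, not parent-before-child ordering.

```shell
# Hypothetical helper: succeed if $1 is an acceptable v1.1 entry name.
valid_entry_name() {
    case $1 in
        '') return 1 ;;                          # ASSUMPTION: empty names rejected
        /*) return 1 ;;                          # absolute path
        */) return 1 ;;                          # trailing slash
        .. | ../* | */.. | */../*) return 1 ;;   # '..' component (traversal)
    esac
    return 0
}
```

Names such as `notes..txt` pass, because `..` is rejected only as a whole path component.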

6. Update Entry Size Formula:

  • Old: entry_size = 101 + name_length bytes
  • New: entry_size = 104 + name_length bytes (added 1 byte entry_type + 2 bytes permissions = +3)
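The new formula fits in a small shell helper (hypothetical `entry_size`). One assumption is worth flagging: the sketch treats name_length as a count of UTF-8 bytes, not characters. It therefore measures the name with `wc -c` rather than `${#name}`, which counts characters in a UTF-8 locale.

```shell
# Hypothetical helper: size in bytes of one v1.1 TOC entry.
# ASSUMPTION: name_length counts UTF-8 bytes, so measure with wc -c.
entry_size() {
    name_len=$(printf '%s' "$1" | wc -c | tr -d ' ')
    echo $(( 104 + name_len ))
}

entry_size project/src            # 115
entry_size project/src/main.rs    # 123
entry_size project/empty          # 117
```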

7. Section 6 (Data Block Layout):

  • Add note: "Directory entries (entry_type = 0x01) have no data block. The decoder MUST skip directory entries when processing data blocks."

8. Section 10 (Decode Order of Operations):

  • In step 3, update version check: "Read version (must be 2 for v1.1)"
  • In step 5, add substep before reading ciphertext: "Check entry_type. If 0x01 (directory): create the directory using the entry name as a relative path, apply permissions, and skip to the next entry (no ciphertext to read)."
  • In step 5f (Write to output), add: "Create parent directories as needed (using the path components of the entry name). Apply permissions from the entry's permissions field."
Verify: grep -c "entry_type" docs/FORMAT.md | xargs test 5 -le
Done when:
  • Section 5 has entry_type (u8) and permissions (u16 LE) fields in the Entry Field Table
  • Entry type values table documents 0x00=file, 0x01=directory
  • Permission bits layout table with POSIX mode_t lower 12 bits
  • Entry name semantics subsection specifies relative paths with / separator
  • Entry size formula updated to 104 + name_length
  • Decode order updated for directory handling
  • Version bumped to 2
Task 2: Write updated worked example with directory archive
Files: docs/FORMAT.md
Action: Replace Section 12 (Worked Example) in docs/FORMAT.md entirely with a new worked example that demonstrates the v1.1 directory archive format. Do not retain the v1.0 example: it is no longer valid because both the version field and the entry format changed.

New Worked Example: Directory Archive

Use the following input structure:

project/
  project/src/           (directory, mode 0755)
  project/src/main.rs    (file, mode 0644, content: "fn main() {}\n" = 13 bytes)
  project/empty/         (empty directory, mode 0755)

This demonstrates:

  • A nested directory (project/src/)
  • A file inside a nested directory (project/src/main.rs)
  • An empty directory (project/empty/)
  • Three entries total: 2 directories + 1 file

Parameters:

  • Key: same 32 bytes as v1.0 example (00 01 02 ... 1F)
  • Flags: 0x01 (compression enabled, no obfuscation -- keep example simple)
  • Version: 2

Per-entry walkthrough:

Entry 1: project/src (directory)

  • entry_type: 0x01
  • permissions: 0o755 = 0x01ED (LE: ED 01)
  • name: "project/src" (11 bytes)
  • original_size: 0, compressed_size: 0, encrypted_size: 0
  • data_offset: 0, iv: zero-filled, hmac: zero-filled, sha256: zero-filled
  • compression_flag: 0, padding_after: 0

Entry 2: project/src/main.rs (file)

  • entry_type: 0x00
  • permissions: 0o644 = 0x01A4 (LE: A4 01)
  • name: "project/src/main.rs" (19 bytes)
  • original_size: 13
  • SHA-256 of "fn main() {}\n": compute the real hash
  • compressed_size: representative (e.g., 30 bytes for small gzip output)
  • encrypted_size: (floor(30 / 16) + 1) * 16 = 32 (PKCS7 padding always adds 1 to 16 bytes)
  • IV: representative (e.g., AA BB CC DD EE FF 00 11 22 33 44 55 66 77 88 99)
  • hmac: representative, sha256: real value
  • compression_flag: 1, padding_after: 0
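The commands below check the input length, produce the real SHA-256 for the file entry, and show the PKCS7 size rule behind encrypted_size. `sha256sum` comes from GNU coreutils; macOS ships `shasum -a 256` instead. `pkcs7_size` is a hypothetical helper. The value 30 is the plan's representative compressed size, not a measured gzip output.

```shell
# Confirm the byte length of the example file content (original_size).
printf 'fn main() {}\n' | wc -c        # 13

# Real SHA-256 for the sha256 field (GNU coreutils; on macOS: shasum -a 256).
printf 'fn main() {}\n' | sha256sum

# Hypothetical helper: ciphertext size after PKCS7 padding to the 16-byte
# AES block. PKCS7 always adds 1..16 bytes, hence the "+ 1" before rounding.
pkcs7_size() {
    echo $(( ($1 / 16 + 1) * 16 ))
}

pkcs7_size 30                          # 32, for the representative compressed_size
```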

Entry 3: project/empty (directory)

  • entry_type: 0x01
  • permissions: 0o755 = 0x01ED (LE: ED 01)
  • name: "project/empty" (13 bytes)
  • All sizes 0, data_offset 0, iv/hmac/sha256 zero-filled
  • compression_flag: 0, padding_after: 0

Layout table: Compute all offsets using the new entry size formula (104 + name_length per entry):

  • Header: 40 bytes (0x00 - 0x27)
  • TOC Entry 1: 104 + 11 = 115 bytes
  • TOC Entry 2: 104 + 19 = 123 bytes
  • TOC Entry 3: 104 + 13 = 117 bytes
  • TOC total: 115 + 123 + 117 = 355 bytes
  • Data block 1 (only file entry): starts at 40 + 355 = 395, size = 32 bytes
  • Archive total: 395 + 32 = 427 bytes
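The offset arithmetic above can be rechecked mechanically. The sketch below (POSIX sh, hypothetical helper name) sums 104 + name bytes over the three entry names. It then adds the 40-byte header and the single 32-byte data block, assuming the TOC immediately follows the header as in the layout above.

```shell
# Hypothetical re-check of the worked-example layout.
entry_size() {                     # 104 fixed bytes + name length in bytes
    echo $(( 104 + $(printf '%s' "$1" | wc -c | tr -d ' ') ))
}

header=40                          # archive header, offsets 0x00-0x27
toc=0
for name in project/src project/src/main.rs project/empty; do
    toc=$(( toc + $(entry_size "$name") ))
done
data_start=$(( header + toc ))     # data_offset of the single file entry
total=$(( data_start + 32 ))       # plus its 32-byte encrypted data block

echo "toc_size=$toc data_start=$data_start archive_size=$total"
# prints: toc_size=355 data_start=395 archive_size=427
```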

Include:

  1. Input description table (entries, types, permissions, content)
  2. Parameters (key, flags, version)
  3. Per-entry pipeline walkthrough (SHA-256 for the file, show directory entries have all-zero crypto fields)
  4. Archive layout offset table with CHECK verification
  5. Header hex table (version=2, entry_count=3)
  6. Each TOC entry hex table showing entry_type and permissions fields
  7. Data block hex (only 1 block for the single file)
  8. Complete annotated hex dump
  9. Updated shell decode walkthrough showing directory handling: "if entry_type is 0x01, mkdir -p and chmod, then skip to next entry"
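Here is a sketch of the per-entry branch that the shell decode walkthrough needs. It assumes the caller has already parsed entry_type, permissions (as a decimal u16) and the name. `handle_entry` and its return-code convention are hypothetical. Directories are created and chmod-ed immediately, with no data block read. For files, the helper only creates the parent directories; the caller continues with decryption and applies the mode after writing.

```shell
# Hypothetical per-entry step for the shell decode walkthrough.
#   $1 = entry_type (0 or 1)      $2 = permissions as a decimal u16 (e.g. 493)
#   $3 = entry name (relative)    $4 = output root directory
# Return codes: 0 = directory handled (skip to next entry),
#               1 = regular file (caller reads/decrypts its data block, writes
#                   it, then runs: chmod "$mode" "$4/$3"; $mode is left set),
#               2 = error.
handle_entry() {
    mode=$(printf '%o' "$2")                     # 493 -> 755
    case $1 in
        1)  # Directory: no data block; create it and apply its mode.
            mkdir -p "$4/$3" && chmod "$mode" "$4/$3" || return 2
            return 0 ;;
        0)  # File: make sure parent directories exist before writing.
            mkdir -p "$(dirname "$4/$3")" || return 2
            return 1 ;;
        *)  echo "unknown entry_type: $1" >&2
            return 2 ;;
    esac
}
```

One design caveat: if a directory mode lacks owner write permission (for example 0o555) and is applied before that directory's children are extracted, the later writes fail. A decoder may therefore defer directory chmod until all entries are written.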

Style: Follow the exact same conventions as the v1.0 worked example: field tables, offset verification formulas, annotated hex dump format, and shell decode walkthrough.
Verify: grep -c "project/src/main.rs" docs/FORMAT.md | xargs test 3 -le
Done when:

  • Worked example shows 3 entries: 2 directories (project/src, project/empty) and 1 file (project/src/main.rs)
  • Each entry shows entry_type and permissions fields in hex tables
  • Directory entries show all-zero crypto fields (iv, hmac, sha256, sizes)
  • File entry shows full crypto pipeline (SHA-256, gzip, PKCS7, AES-CBC, HMAC)
  • Archive layout table has internally consistent offsets verified by formulas
  • Annotated hex dump covers all bytes
  • Shell decode walkthrough handles directory entries (mkdir -p + chmod)
After both tasks complete, verify:
  1. grep -c "entry_type" docs/FORMAT.md returns >= 5 (field table + entry type values + worked example + decode order)
  2. grep -c "permissions" docs/FORMAT.md returns >= 5 (field table + permission bits layout + worked example entries)
  3. grep "entry_size = 104" docs/FORMAT.md returns the updated formula
  4. grep "project/src/main.rs" docs/FORMAT.md returns matches in the worked example
  5. grep "project/empty" docs/FORMAT.md returns matches showing the empty directory entry
  6. grep "version.*2" docs/FORMAT.md returns the bumped version
  7. No stale v1.0 references (check that entry_size formula no longer says 101)
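The seven checks can be bundled into one function so that a failure names the check that broke. `verify_format_md` is a hypothetical helper. Check 7 is interpreted here as "the string `entry_size = 101` no longer appears".

```shell
# Hypothetical helper running the plan's verification checks against file $1.
# Prints the first failing check to stderr and returns 1; returns 0 if all pass.
verify_format_md() {
    f=$1
    [ "$(grep -c 'entry_type' "$f")" -ge 5 ] ||
        { echo "FAIL 1: entry_type appears fewer than 5 times" >&2; return 1; }
    [ "$(grep -c 'permissions' "$f")" -ge 5 ] ||
        { echo "FAIL 2: permissions appears fewer than 5 times" >&2; return 1; }
    grep -q 'entry_size = 104' "$f" ||
        { echo "FAIL 3: new entry size formula missing" >&2; return 1; }
    grep -q 'project/src/main.rs' "$f" ||
        { echo "FAIL 4: worked example file entry missing" >&2; return 1; }
    grep -q 'project/empty' "$f" ||
        { echo "FAIL 5: empty directory entry missing" >&2; return 1; }
    grep -q 'version.*2' "$f" ||
        { echo "FAIL 6: version bump not found" >&2; return 1; }
    ! grep -q 'entry_size = 101' "$f" ||
        { echo "FAIL 7: stale v1.0 entry size formula" >&2; return 1; }
    echo "all checks passed"
}

# Usage: verify_format_md docs/FORMAT.md
```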

<success_criteria>

  1. FORMAT.md Section 5 defines entry_type (1 byte, u8) and permissions (2 bytes, u16 LE) fields in the TOC entry
  2. Entry type values table distinguishes files (0x00) from directories (0x01) with clear rules for zero-filled fields on directories
  3. Permission bits table matches POSIX mode_t lower 12 bits with examples (0o755, 0o644)
  4. Entry names documented as relative paths with / separator, no leading /, no ..
  5. Worked example includes nested directory, file, and empty directory with correct offsets
  6. Entry size formula is 104 + name_length (was 101 + name_length)
  7. Version bumped to 2
  8. Decode order of operations updated for directory entry handling </success_criteria>
After completion, create `.planning/phases/07-format-spec-update/07-01-SUMMARY.md`