docs(07-format-spec-update): create phase plan

NikitolProject
2026-02-26 21:13:34 +03:00
parent a716d09178
commit a7c3e009c9
2 changed files with 278 additions and 2 deletions


@@ -141,7 +141,10 @@ Plans:
 2. FORMAT.md defines the Unix permissions field (2 bytes, u16 little-endian) in TOC entries with bit layout matching POSIX mode_t lower 12 bits
 3. FORMAT.md specifies that entry names are relative paths using `/` as separator (e.g., `dir/subdir/file.txt`), replacing the previous filename-only convention
 4. FORMAT.md includes an updated worked example showing a directory archive with at least one nested directory, one file, and one empty directory
-**Plans**: TBD
+**Plans**: 1 plan
+Plans:
+- [ ] 07-01-PLAN.md -- Update TOC entry definition (entry_type, permissions, path semantics) and worked example with directory archive
 ### Phase 8: Rust Directory Archiver
 **Goal**: `pack` accepts directories and recursively archives them with full path hierarchy and permissions; `unpack` restores the complete directory tree
@@ -202,7 +205,7 @@ Phases execute in numeric order: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> 9 -> 10
 | 4. Kotlin Decoder | v1.0 | 1/1 | Complete | 2026-02-25 |
 | 5. Shell Decoder | v1.0 | 2/2 | Complete | 2026-02-25 |
 | 6. Obfuscation Hardening | v1.0 | 2/2 | Complete | 2026-02-25 |
-| 7. Format Spec Update | v1.1 | 0/TBD | Not started | - |
+| 7. Format Spec Update | v1.1 | 0/1 | Planned | - |
 | 8. Rust Directory Archiver | v1.1 | 0/TBD | Not started | - |
 | 9. Kotlin Decoder Update | v1.1 | 0/TBD | Not started | - |
 | 10. Shell Decoder Update | v1.1 | 0/TBD | Not started | - |


@@ -0,0 +1,273 @@
---
phase: 07-format-spec-update
plan: 01
type: execute
wave: 1
depends_on: []
files_modified: [docs/FORMAT.md]
autonomous: true
requirements: [FMT-09, FMT-10, FMT-11, FMT-12]
must_haves:
truths:
- "FORMAT.md defines entry_type field (1 byte, u8) in File Table Entry: 0x00=file, 0x01=directory"
- "FORMAT.md defines permissions field (2 bytes, u16 LE) in File Table Entry with POSIX mode_t lower 12 bits"
- "FORMAT.md specifies entry names are relative paths using / separator (e.g. dir/subdir/file.txt)"
- "FORMAT.md worked example includes a directory archive with nested directory, file inside it, and empty directory"
- "FORMAT.md version field is bumped to 2 reflecting the v1.1 format changes"
- "Entry size formula is updated to include entry_type (1 byte) and permissions (2 bytes)"
artifacts:
- path: "docs/FORMAT.md"
provides: "Complete v1.1 binary format specification"
contains: "entry_type.*u8"
key_links:
- from: "docs/FORMAT.md Section 5 (File Table Entry)"
to: "docs/FORMAT.md Section 12 (Worked Example)"
via: "New TOC fields (entry_type, permissions) appear in both definition and worked example"
pattern: "entry_type.*permissions"
---
<objective>
Update FORMAT.md to fully document the v1.1 TOC entry layout with entry type, permission bits, and relative path semantics.
Purpose: All three decoders (Rust, Kotlin, Shell) need an unambiguous specification to build their v1.1 directory support against. This phase updates the normative format document before any code changes.
Output: Updated `docs/FORMAT.md` with v1.1 TOC entry fields and a new worked example showing a directory archive.
</objective>
<execution_context>
@/home/nick/.claude/get-shit-done/workflows/execute-plan.md
@/home/nick/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@docs/FORMAT.md
Key decisions from STATE.md:
- v1.1: No backward compatibility with v1.0 archives (format version bump to 2)
- v1.1: Only mode bits (no uid/gid, no timestamps, no symlinks)
- v1.0: Filename-only entry names -- v1.1 changes this to relative paths with `/` separator
Existing FORMAT.md patterns (from Phase 1):
- Field table pattern: offset, size, type, endian, field name, description for every binary structure
- Worked example pattern: concrete inputs -> pipeline walkthrough -> hex dump -> shell decode commands
- Entry size formula: `101 + name_length bytes` per entry
- All offsets absolute from archive byte 0
</context>
<tasks>
<task type="auto">
<name>Task 1: Update TOC entry definition with entry_type, permissions, and path semantics</name>
<files>docs/FORMAT.md</files>
<action>
Update docs/FORMAT.md with the following changes. Preserve the existing document structure and style conventions (field tables, notation, etc.).
**1. Version bump (Section 1 and header):**
- Change document version from "1.0" to "1.1" in the front matter
- Note that format version field in archives is now `2` (header byte at offset 0x04)
- In Section 11 (Version Compatibility), add that v2 introduces entry_type and permissions fields
**2. Section 2 (Notation Conventions):**
- Update the filenames note: change "Filenames are UTF-8 encoded" to "Entry names are UTF-8 encoded relative paths using `/` as the path separator (e.g., `dir/subdir/file.txt`). Names MUST NOT start with `/` or contain `..` components. For top-level files, the name is just the filename (e.g., `readme.txt`)."
**3. Section 3 (Archive Structure Diagram):**
- Update the TOC description comment: entries now represent files AND directories
**4. Section 4 (Archive Header):**
- Change version field description: "Format version. Value `2` for this specification (v1.1). Value `1` for legacy v1.0 (no directory support)."
- Rename the `file_count` field to `entry_count` and update its description: "Number of entries (files and directories) stored in the archive."
- Update the `toc_offset` and `toc_size` field descriptions to say "entry table" wherever they currently say "file table"
**5. Section 5 (File Table Entry Definition) -- the core change:**
Rename section title to "Table of Contents (TOC) Entry Definition" for clarity.
Add two new fields to the Entry Field Table AFTER `name` and BEFORE `original_size`:
| Field | Size | Type | Endian | Description |
|-------|------|------|--------|-------------|
| `entry_type` | 1 | u8 | - | Entry type: `0x00` = regular file, `0x01` = directory. Directories have `original_size`, `compressed_size`, and `encrypted_size` all set to 0 and no corresponding data block. |
| `permissions` | 2 | u16 | LE | Unix permission bits (lower 12 bits of POSIX `mode_t`). Bit layout: `[suid(1)][sgid(1)][sticky(1)][owner_rwx(3)][group_rwx(3)][other_rwx(3)]`. Example: `0o755` = `0x01ED` = owner rwx, group r-x, other r-x. Stored as u16 LE. |
Add a subsection "### Entry Type Values" with a table:
| Value | Name | Description |
|-------|------|-------------|
| `0x00` | File | Regular file. Has associated data block with ciphertext. All size fields and data_offset are meaningful. |
| `0x01` | Directory | Directory entry. `original_size`, `compressed_size`, `encrypted_size` are all 0. `data_offset` is 0. `iv` is zero-filled. `hmac` is zero-filled. `sha256` is zero-filled. `compression_flag` is 0. No data block exists for this entry. |
Add a subsection "### Permission Bits Layout" with a table:
| Bits | Mask | Name | Description |
|------|------|------|-------------|
| 11 | `0o4000` | setuid | Set user ID on execution |
| 10 | `0o2000` | setgid | Set group ID on execution |
| 9 | `0o1000` | sticky | Sticky bit |
| 8-6 | `0o0700` | owner | Owner read(4)/write(2)/execute(1) |
| 5-3 | `0o0070` | group | Group read(4)/write(2)/execute(1) |
| 2-0 | `0o0007` | other | Other read(4)/write(2)/execute(1) |
Common examples: `0o755` (rwxr-xr-x) = `0x01ED`, `0o644` (rw-r--r--) = `0x01A4`, `0o700` (rwx------) = `0x01C0`.
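The u16 LE encoding of the permission bits can be sanity-checked with a minimal Python sketch. The function names `encode_permissions`, `decode_permissions`, and `describe` are hypothetical helpers for illustration, not part of any decoder in this project.

```python
import struct

PERMISSION_MASK = 0o7777  # lower 12 bits of POSIX mode_t


def encode_permissions(mode: int) -> bytes:
    """Pack the lower 12 mode bits as a 2-byte u16 little-endian field."""
    return struct.pack("<H", mode & PERMISSION_MASK)


def decode_permissions(field: bytes) -> int:
    """Unpack a 2-byte u16 LE permissions field back into mode bits."""
    (value,) = struct.unpack("<H", field)
    return value & PERMISSION_MASK


def describe(mode: int) -> str:
    """Render the nine rwx bits ls-style (setuid/setgid/sticky are ignored)."""
    flags = "rwxrwxrwx"
    return "".join(
        flag if mode & (1 << (8 - i)) else "-" for i, flag in enumerate(flags)
    )


print(encode_permissions(0o755).hex())  # ed01
print(encode_permissions(0o644).hex())  # a401
print(describe(0o755))                  # rwxr-xr-x
```

Masking with `0o7777` on encode means a full `st_mode` value (which also carries file-type bits) can be passed in directly.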
Add a subsection "### Entry Name Semantics" explaining:
- Names are relative paths from the archive root, using `/` as separator
- Example: a file at `project/src/main.rs` has name `project/src/main.rs`
- A directory entry for `project/src/` has name `project/src` (no trailing slash)
- Names MUST NOT start with `/` (no absolute paths)
- Names MUST NOT contain `..` components (no directory traversal)
- The encoder MUST sort entries so that directory entries appear before any files within them (parent-before-child ordering). This allows the decoder to `mkdir -p` or create directories in a single sequential pass.
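The two MUST NOT rules and the ordering requirement above can be sketched in Python as follows. `is_valid_entry_name` and `parent_before_child` are hypothetical names, and sorting by path components is only one ordering that satisfies parent-before-child; the plan does not mandate a specific sort key.

```python
def is_valid_entry_name(name: str) -> bool:
    """Check the two MUST NOT rules: no leading '/' and no '..' component."""
    if name.startswith("/"):
        return False
    return ".." not in name.split("/")


def parent_before_child(names: list[str]) -> list[str]:
    """Order names so that every directory precedes the entries inside it.

    Comparing lists of path components puts a path before any longer path
    that it is a prefix of, which is exactly the parent-before-child property.
    """
    return sorted(names, key=lambda n: n.split("/"))


names = ["project/src/main.rs", "project/empty", "project/src"]
print(parent_before_child(names))
# ['project/empty', 'project/src', 'project/src/main.rs']
```

Splitting on `/` before testing for `..` keeps names such as `a..b/file` valid, since only a whole component equal to `..` is a traversal.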
**6. Update Entry Size Formula:**
- Old: `entry_size = 101 + name_length bytes`
- New: `entry_size = 104 + name_length bytes` (added 1 byte entry_type + 2 bytes permissions = +3)
**7. Section 6 (Data Block Layout):**
- Add note: "Directory entries (entry_type = 0x01) have no data block. The decoder MUST skip directory entries when processing data blocks."
**8. Section 10 (Decode Order of Operations):**
- In step 3, update version check: "Read version (must be 2 for v1.1)"
- In step 5, add substep before reading ciphertext: "Check entry_type. If 0x01 (directory): create the directory using the entry name as a relative path, apply permissions, and skip to the next entry (no ciphertext to read)."
- In step 5f (Write to output), add: "Create parent directories as needed (using the path components of the entry name). Apply permissions from the entry's `permissions` field."
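The directory branch of the decode loop might look like the following Python sketch. It assumes the TOC has already been parsed and the file data already decrypted and decompressed; the `Entry` dataclass, its `plaintext` field, and `restore` are hypothetical stand-ins, and name validation is omitted for brevity.

```python
import os
import stat
import tempfile
from dataclasses import dataclass

ENTRY_TYPE_FILE = 0x00
ENTRY_TYPE_DIRECTORY = 0x01


@dataclass
class Entry:
    """Hypothetical in-memory form of one parsed TOC entry."""
    name: str
    entry_type: int
    permissions: int
    plaintext: bytes = b""  # stands in for the decrypted, decompressed data


def restore(entries, out_root):
    """Recreate files and directories under out_root in TOC order."""
    for entry in entries:
        target = os.path.join(out_root, *entry.name.split("/"))
        if entry.entry_type == ENTRY_TYPE_DIRECTORY:
            # Directory: create it, apply permissions, no data block to read.
            # Note: a mode without owner write would block later child writes.
            os.makedirs(target, exist_ok=True)
            os.chmod(target, entry.permissions)
            continue
        # File: create missing parents, write the content, apply permissions.
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as fh:
            fh.write(entry.plaintext)
        os.chmod(target, entry.permissions)


demo = [
    Entry("project/src", ENTRY_TYPE_DIRECTORY, 0o755),
    Entry("project/src/main.rs", ENTRY_TYPE_FILE, 0o644, b"fn main() {}\n"),
    Entry("project/empty", ENTRY_TYPE_DIRECTORY, 0o755),
]
with tempfile.TemporaryDirectory() as root:
    restore(demo, root)
    created_dirs = sorted(
        os.path.relpath(os.path.join(d, sub), root).replace(os.sep, "/")
        for d, subs, _ in os.walk(root)
        for sub in subs
    )
    main_rs = os.path.join(root, "project", "src", "main.rs")
    file_mode = stat.S_IMODE(os.stat(main_rs).st_mode)
    with open(main_rs, "rb") as fh:
        file_content = fh.read()

print(created_dirs)
```

Because `project` itself has no TOC entry in this demo, it is created implicitly as a parent, which is why the file branch still needs its own `makedirs` call.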
</action>
<verify>
<automated>grep -c "entry_type" docs/FORMAT.md | xargs test 5 -le</automated>
</verify>
<done>
- Section 5 has entry_type (u8) and permissions (u16 LE) fields in the Entry Field Table
- Entry type values table documents 0x00=file, 0x01=directory
- Permission bits layout table with POSIX mode_t lower 12 bits
- Entry name semantics subsection specifies relative paths with `/` separator
- Entry size formula updated to 104 + name_length
- Decode order updated for directory handling
- Version bumped to 2
</done>
</task>
<task type="auto">
<name>Task 2: Write updated worked example with directory archive</name>
<files>docs/FORMAT.md</files>
<action>
Replace Section 12 (Worked Example) in docs/FORMAT.md in its entirety with a new worked example that demonstrates the v1.1 directory archive format. Do not retain the v1.0 example: it is no longer valid because the version field and the entry format have both changed.
**New Worked Example: Directory Archive**
Use the following input structure:
```
project/
project/src/ (directory, mode 0755)
project/src/main.rs (file, mode 0644, content: "fn main() {}\n" = 13 bytes)
project/empty/ (empty directory, mode 0755)
```
This demonstrates:
- A nested directory (`project/src/`)
- A file inside a nested directory (`project/src/main.rs`)
- An empty directory (`project/empty/`)
- Three entries total: 2 directories + 1 file
**Parameters:**
- Key: same 32 bytes as v1.0 example (00 01 02 ... 1F)
- Flags: `0x01` (compression enabled, no obfuscation -- keep example simple)
- Version: `2`
**Per-entry walkthrough:**
Entry 1: `project/src` (directory)
- entry_type: 0x01
- permissions: 0o755 = 0x01ED (LE: ED 01)
- name: "project/src" (11 bytes)
- original_size: 0, compressed_size: 0, encrypted_size: 0
- data_offset: 0, iv: zero-filled, hmac: zero-filled, sha256: zero-filled
- compression_flag: 0, padding_after: 0
Entry 2: `project/src/main.rs` (file)
- entry_type: 0x00
- permissions: 0o644 = 0x01A4 (LE: A4 01)
- name: "project/src/main.rs" (19 bytes)
- original_size: 13
- SHA-256 of "fn main() {}\n": compute the real hash
- compressed_size: representative (e.g., 30 bytes for small gzip output)
- encrypted_size: ((30 / 16) + 1) * 16 = 32 (integer division; PKCS7 padding to the 16-byte AES block size)
- IV: representative (e.g., AA BB CC DD EE FF 00 11 22 33 44 55 66 77 88 99)
- hmac: representative, sha256: real value
- compression_flag: 1, padding_after: 0
Entry 3: `project/empty` (directory)
- entry_type: 0x01
- permissions: 0o755 = 0x01ED (LE: ED 01)
- name: "project/empty" (13 bytes)
- All sizes 0, data_offset 0, iv/hmac/sha256 zero-filled
- compression_flag: 0, padding_after: 0
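The derived values for the file entry can be computed rather than counted by hand. The Python sketch below measures the content length, computes the real SHA-256, and applies the padding formula to the representative compressed size of 30 bytes; `pkcs7_padded_size` is a hypothetical helper name.

```python
import hashlib

AES_BLOCK = 16  # AES block size in bytes


def pkcs7_padded_size(n: int) -> int:
    """Size after PKCS7 padding: always adds 1..16 bytes, so an exact
    multiple of the block size grows by one full block."""
    return (n // AES_BLOCK + 1) * AES_BLOCK


content = b"fn main() {}\n"
original_size = len(content)
sha256_hex = hashlib.sha256(content).hexdigest()
encrypted_size = pkcs7_padded_size(30)  # 30 = representative gzip output size

print("original_size :", original_size)
print("sha256        :", sha256_hex)
print("encrypted_size:", encrypted_size)
```

The printed digest is the value to paste into the `sha256` field of Entry 2; the compressed size, IV, and HMAC remain representative values as stated above.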
**Layout table:**
Compute all offsets using the new entry size formula (104 + name_length per entry):
- Header: 40 bytes (0x00 - 0x27)
- TOC Entry 1: 104 + 11 = 115 bytes
- TOC Entry 2: 104 + 19 = 123 bytes
- TOC Entry 3: 104 + 13 = 117 bytes
- TOC total: 115 + 123 + 117 = 355 bytes
- Data block 1 (only file entry): starts at 40 + 355 = 395, size = 32 bytes
- Archive total: 395 + 32 = 427 bytes
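The offset arithmetic above can be double-checked with a short Python sketch. The constants come from this plan (40-byte header, 104 fixed bytes per entry); `entry_size` and `layout` are hypothetical helper names.

```python
HEADER_SIZE = 40        # bytes, 0x00 - 0x27
ENTRY_FIXED_SIZE = 104  # 101 (v1.0) + 1 (entry_type) + 2 (permissions)


def entry_size(name: str) -> int:
    """TOC entry size: fixed part plus the UTF-8 byte length of the name."""
    return ENTRY_FIXED_SIZE + len(name.encode("utf-8"))


def layout(names, data_block_sizes):
    """Return (toc_size, first_data_offset, archive_total) in bytes."""
    toc_size = sum(entry_size(n) for n in names)
    data_start = HEADER_SIZE + toc_size
    return toc_size, data_start, data_start + sum(data_block_sizes)


names = ["project/src", "project/src/main.rs", "project/empty"]
print([entry_size(n) for n in names])  # [115, 123, 117]
print(layout(names, [32]))             # (355, 395, 427)
```

Using the encoded byte length rather than the character count matters once names contain non-ASCII characters, since the name field is UTF-8.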
**Include:**
1. Input description table (entries, types, permissions, content)
2. Parameters (key, flags, version)
3. Per-entry pipeline walkthrough (SHA-256 for the file, show directory entries have all-zero crypto fields)
4. Archive layout offset table with CHECK verification
5. Header hex table (version=2, entry_count=3)
6. Each TOC entry hex table showing entry_type and permissions fields
7. Data block hex (only 1 block for the single file)
8. Complete annotated hex dump
9. Updated shell decode walkthrough showing directory handling: "if entry_type is 0x01, mkdir -p and chmod, then skip to next entry"
**Style:** Follow exact same conventions as v1.0 worked example -- field tables, offset verification formulas, annotated hex dump format, shell decode walkthrough.
</action>
<verify>
<automated>grep -c "project/src/main.rs" docs/FORMAT.md | xargs test 3 -le</automated>
</verify>
<done>
- Worked example shows 3 entries: 2 directories (project/src, project/empty) and 1 file (project/src/main.rs)
- Each entry shows entry_type and permissions fields in hex tables
- Directory entries show all-zero crypto fields (iv, hmac, sha256, sizes)
- File entry shows full crypto pipeline (SHA-256, gzip, PKCS7, AES-CBC, HMAC)
- Archive layout table has internally consistent offsets verified by formulas
- Annotated hex dump covers all bytes
- Shell decode walkthrough handles directory entries (mkdir -p + chmod)
</done>
</task>
</tasks>
<verification>
After both tasks complete, verify:
1. `grep -c "entry_type" docs/FORMAT.md` returns >= 5 (field table + entry type values + worked example + decode order)
2. `grep -c "permissions" docs/FORMAT.md` returns >= 5 (field table + permission bits layout + worked example entries)
3. `grep "entry_size = 104" docs/FORMAT.md` returns the updated formula
4. `grep "project/src/main.rs" docs/FORMAT.md` returns matches in the worked example
5. `grep "project/empty" docs/FORMAT.md` returns matches showing the empty directory entry
6. `grep "version.*2" docs/FORMAT.md` returns the bumped version
7. No stale v1.0 references (check that entry_size formula no longer says 101)
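For environments without `grep`, the same checks could be approximated in Python. This is a sketch under assumptions: `count_matching_lines` mimics `grep -c` (matching lines, not total occurrences), the minimum counts mirror the list above, and the stale-formula pattern is a guess at how the old text appears in FORMAT.md.

```python
import re

# (pattern, minimum number of matching lines), mirroring checks 1-6 above
CHECKS = [
    (r"entry_type", 5),
    (r"permissions", 5),
    (r"entry_size = 104", 1),
    (r"project/src/main\.rs", 1),
    (r"project/empty", 1),
    (r"version.*2", 1),
]
STALE_PATTERN = r"101 \+ name_length"  # assumed wording of the v1.0 formula


def count_matching_lines(text: str, pattern: str) -> int:
    """Count lines containing a match, like `grep -c`."""
    return sum(1 for line in text.splitlines() if re.search(pattern, line))


def failed_checks(text: str) -> list[str]:
    """Return a description of every check that the text does not satisfy."""
    failures = [
        pattern
        for pattern, minimum in CHECKS
        if count_matching_lines(text, pattern) < minimum
    ]
    if count_matching_lines(text, STALE_PATTERN) > 0:
        failures.append("stale v1.0 entry size formula")
    return failures


sample = "entry_type a\nentry_type entry_type\nentry_size = 101 + name_length\n"
print(count_matching_lines(sample, r"entry_type"))  # 2
print(failed_checks(sample))
```

To run it against the real document, read `docs/FORMAT.md` into a string and pass it to `failed_checks`; an empty list means every check passed.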
</verification>
<success_criteria>
1. FORMAT.md Section 5 defines entry_type (1 byte, u8) and permissions (2 bytes, u16 LE) fields in the TOC entry
2. Entry type values table distinguishes files (0x00) from directories (0x01) with clear rules for zero-filled fields on directories
3. Permission bits table matches POSIX mode_t lower 12 bits with examples (0o755, 0o644)
4. Entry names documented as relative paths with `/` separator, no leading `/`, no `..`
5. Worked example includes nested directory, file, and empty directory with correct offsets
6. Entry size formula is 104 + name_length (was 101 + name_length)
7. Version bumped to 2
8. Decode order of operations updated for directory entry handling
</success_criteria>
<output>
After completion, create `.planning/phases/07-format-spec-update/07-01-SUMMARY.md`
</output>