android-encrypted-archiver/.planning/phases/04-kotlin-decoder/04-RESEARCH.md
2026-02-25 00:53:59 +03:00


Phase 4: Kotlin Decoder - Research

Researched: 2026-02-25
Domain: Kotlin/JVM binary format parsing, AES-256-CBC decryption, HMAC-SHA-256 verification, gzip decompression
Confidence: HIGH

Summary

The Kotlin decoder must extract files from custom encrypted archives produced by the Rust archiver (Phase 2). The archive format is fully specified in docs/FORMAT.md with a 40-byte fixed header, variable-length TOC (table of contents), and per-file encrypted data blocks. The decoder must use only Android SDK built-in libraries: javax.crypto for AES/CBC/PKCS5Padding and HMAC-SHA256, java.security.MessageDigest for SHA-256 verification, java.util.zip.GZIPInputStream for decompression, and java.nio.ByteBuffer for little-endian integer parsing.

All required cryptographic primitives (AES/CBC/PKCS5Padding, HmacSHA256, SHA-256) are available on Android 13 (API 33) without any third-party dependencies. Despite its name, the "AES/CBC/PKCS5Padding" transformation applies PKCS#7 padding to AES data: PKCS#5 is formally defined for 8-byte blocks, but Java's implementation pads to the cipher's block size, which is 16 bytes for AES. FORMAT.md Section 13.6 already contains a complete Kotlin reference implementation that validates this approach.

The primary complexity is in binary format parsing: reading little-endian integers from a byte stream, sequential TOC entry parsing with variable-length filenames, and correctly seeking to absolute data offsets for each file. The Kotlin decoder can be implemented as a single .kt file (or a small set of files) that runs on any JVM, testable via kotlinc + java -jar without a full Android project.

Primary recommendation: Implement as a standalone Kotlin file using java.nio.ByteBuffer.order(ByteOrder.LITTLE_ENDIAN) for integer parsing, RandomAccessFile for seeking to data blocks, and the exact crypto API calls already validated in FORMAT.md Section 13.6. Test by creating archives with the Rust CLI and decoding with the Kotlin decoder, comparing SHA-256 checksums.

<phase_requirements>

Phase Requirements

| ID | Description | Research Support |
|----|-------------|------------------|
| KOT-01 | Kotlin-code archive extraction on Android 13 without native libraries | All required APIs (javax.crypto, java.util.zip, java.nio, java.security) are built into Android SDK API 33. No native libraries or third-party dependencies needed. Decoder can be a single .kt file. |
| KOT-02 | Use javax.crypto (AES/CBC/PKCS5Padding) and java.util.zip.GZIPInputStream | Cipher.getInstance("AES/CBC/PKCS5Padding") is guaranteed on all JVM/Android. PKCS5Padding == PKCS7 for 16-byte blocks. GZIPInputStream(ByteArrayInputStream(compressed)).readBytes() handles decompression. |
| KOT-03 | Verify HMAC before decryption | Mac.getInstance("HmacSHA256") with SecretKeySpec(key, "HmacSHA256"), mac.update(iv), mac.update(ciphertext), mac.doFinal() then contentEquals() comparison. Must reject file if HMAC fails -- no decryption attempt. |
| KOT-04 | Verify SHA-256 checksum after decompression | MessageDigest.getInstance("SHA-256").digest(decompressedData) then contentEquals() comparison against stored sha256 from TOC entry. |
</phase_requirements>

Standard Stack

Core

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| javax.crypto.Cipher | Android SDK built-in | AES-256-CBC decryption with PKCS5Padding | Guaranteed on all Android versions. "AES/CBC/PKCS5Padding" transformation. |
| javax.crypto.Mac | Android SDK built-in | HMAC-SHA256 computation | Mac.getInstance("HmacSHA256") -- standard JCE provider. |
| javax.crypto.spec.SecretKeySpec | Android SDK built-in | Wraps raw 32-byte key for Cipher and Mac | Standard JCE key specification class. |
| javax.crypto.spec.IvParameterSpec | Android SDK built-in | Wraps 16-byte IV for AES-CBC | Standard JCE IV parameter class. |
| java.security.MessageDigest | Android SDK built-in | SHA-256 integrity checksum | MessageDigest.getInstance("SHA-256") -- standard JCA provider. |
| java.util.zip.GZIPInputStream | Android SDK built-in | Gzip decompression | Standard Java I/O for RFC 1952 gzip streams. |
| java.nio.ByteBuffer | Android SDK built-in | Little-endian integer parsing | ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN) for getShort/getInt. |
| java.io.RandomAccessFile | Android SDK built-in | Binary file reading with seek | Used to seek to the absolute data_offset of each file's ciphertext without loading the whole archive. |

Supporting

| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| java.io.ByteArrayInputStream | Android SDK built-in | Wraps ByteArray for GZIPInputStream | When decompressing decrypted data |
| java.io.File | Android SDK built-in | Output file creation | When writing extracted files to disk |
| File.readBytes() (kotlin.io) | Kotlin stdlib | Read entire file as ByteArray | Alternative to RandomAccessFile for small archives |

Alternatives Considered

| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| RandomAccessFile + seek | File.readBytes() (read entire archive into memory) | Simpler code but higher memory usage for large archives. For archives <100MB on Android, reading entirely into memory is acceptable. |
| ByteBuffer for LE integers | Manual byte shifting with an "and 0xFF" mask | ByteBuffer is cleaner and less error-prone. Manual shifting requires careful masking of signed bytes. |
| Single .kt file | Full Android project with Gradle | Single file is simpler, testable on any JVM, easily embeddable into an Android app later. |

No external dependencies required. Everything uses Android SDK / JVM standard library.

Architecture Patterns

kotlin/
├── ArchiveDecoder.kt       # Main decoder: header parsing, TOC parsing, file extraction
└── test_decoder.sh         # Shell script: create archive with Rust CLI, decode with Kotlin, verify

Alternatively (if splitting into multiple files is preferred for clarity):

kotlin/
├── ArchiveDecoder.kt       # Main entry point + decode orchestration
├── FormatParser.kt         # Header/TOC binary parsing
├── CryptoUtils.kt          # AES decrypt, HMAC verify, SHA-256 verify
└── test_decoder.sh         # Cross-validation test script

Pattern 1: Sequential Binary Parsing with ByteBuffer

What: Parse the archive format by reading bytes sequentially and using ByteBuffer with LITTLE_ENDIAN order for integer conversion.
When to use: For all header and TOC entry parsing.
Example:

// Source: java.nio.ByteBuffer JavaDoc + FORMAT.md Section 4-5
import java.nio.ByteBuffer
import java.nio.ByteOrder

fun readLeU16(data: ByteArray, offset: Int): Int {
    return ByteBuffer.wrap(data, offset, 2)
        .order(ByteOrder.LITTLE_ENDIAN)
        .short.toInt() and 0xFFFF  // Unsigned conversion
}

fun readLeU32(data: ByteArray, offset: Int): Long {
    return ByteBuffer.wrap(data, offset, 4)
        .order(ByteOrder.LITTLE_ENDIAN)
        .int.toLong() and 0xFFFFFFFFL  // Unsigned conversion
}
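For comparison with the manual-shifting alternative listed under Alternatives Considered, this sketch reads the same u32 both ways. The function names are hypothetical; the per-byte masks are what stop bytes >= 0x80 from sign-extending into the higher bits.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper: manual little-endian u32. Each byte is masked with 0xFF
// before shifting; without the mask a byte >= 0x80 sign-extends and sets every higher bit.
fun readLeU32Manual(data: ByteArray, offset: Int): Long =
    ((data[offset].toLong() and 0xFFL) or
        ((data[offset + 1].toLong() and 0xFFL) shl 8) or
        ((data[offset + 2].toLong() and 0xFFL) shl 16) or
        ((data[offset + 3].toLong() and 0xFFL) shl 24))

// Hypothetical helper: the ByteBuffer approach of readLeU32() above, repeated so this block compiles alone.
fun readLeU32Buffer(data: ByteArray, offset: Int): Long =
    ByteBuffer.wrap(data, offset, 4).order(ByteOrder.LITTLE_ENDIAN).int.toLong() and 0xFFFFFFFFL
```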

Pattern 2: Decode Pipeline per File

What: Process each file entry following the FORMAT.md Section 10 decode order exactly.
When to use: For every file in the archive.
Example:

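// Source: FORMAT.md Section 10 + Section 13.6
import java.io.ByteArrayInputStream
import java.io.File
import java.io.RandomAccessFile
import java.security.MessageDigest
import java.util.zip.GZIPInputStream

// extractAll is an illustrative name. TocEntry, KEY, verifyHmac() and decryptAesCbc()
// are the definitions from Pattern 3 and the Code Examples section.
fun extractAll(raf: RandomAccessFile, tocEntries: List<TocEntry>, outputDir: File) {
    for (entry in tocEntries) {
        // 1. Read ciphertext from the absolute data_offset
        raf.seek(entry.dataOffset)
        val ciphertext = ByteArray(entry.encryptedSize)
        raf.readFully(ciphertext)

        // 2. Verify HMAC BEFORE decryption (Encrypt-then-MAC)
        if (!verifyHmac(entry.iv, ciphertext, KEY, entry.hmac)) {
            System.err.println("HMAC failed for ${entry.name}, skipping")
            continue
        }

        // 3. Decrypt (PKCS5Padding removes padding automatically)
        val decrypted = decryptAesCbc(ciphertext, entry.iv, KEY)

        // 4. Decompress only if compression_flag == 1; use {} closes the stream
        val original = if (entry.compressionFlag == 1) {
            GZIPInputStream(ByteArrayInputStream(decrypted)).use { it.readBytes() }
        } else {
            decrypted
        }

        // 5. Verify SHA-256 (a mismatch warns but the file is still written)
        val hash = MessageDigest.getInstance("SHA-256").digest(original)
        if (!hash.contentEquals(entry.sha256)) {
            System.err.println("SHA-256 mismatch for ${entry.name}")
        }

        // 6. Write output. entry.name comes from the archive, so validate it
        //    (e.g. reject ".." components and absolute paths) before using it as a path.
        File(outputDir, entry.name).writeBytes(original)
    }
}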

Pattern 3: Hardcoded Key as Byte Array

What: The 32-byte AES key is hardcoded as a constant byte array.
When to use: To match the Rust src/key.rs pattern.
Example:

// Source: src/key.rs in Rust archiver
private val KEY = byteArrayOf(
    0x7A, 0x35, 0xC1.toByte(), 0xD9.toByte(), 0x4F, 0xE8.toByte(), 0x2B, 0x6A,
    0x91.toByte(), 0x0D, 0xF3.toByte(), 0x58, 0xBC.toByte(), 0x74, 0xA6.toByte(), 0x1E,
    0x42, 0x8F.toByte(), 0xD0.toByte(), 0x63, 0xE5.toByte(), 0x17, 0x9B.toByte(), 0x2C,
    0xFA.toByte(), 0x84.toByte(), 0x06, 0xCD.toByte(), 0x3E, 0x79, 0xB5.toByte(), 0x50,
)
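If the .toByte() casts become hard to review, one option (an assumption, not what key.rs does) is to keep the key as a single hex string and decode it once. hexToBytes and KEY_FROM_HEX are hypothetical names; the hex digits are the same 32 bytes as the KEY array above.

```kotlin
// Hypothetical helper: decodes a hex string such as "7A35C1D9..." into bytes.
// A non-hex digit throws NumberFormatException.
fun hexToBytes(hex: String): ByteArray {
    require(hex.length % 2 == 0) { "Hex string must have an even length" }
    return ByteArray(hex.length / 2) { i -> hex.substring(2 * i, 2 * i + 2).toInt(16).toByte() }
}

// Hypothetical constant: the same 32 bytes as KEY, one row of the array per 16-digit chunk.
val KEY_FROM_HEX: ByteArray = hexToBytes(
    "7A35C1D94FE82B6A" + "910DF358BC74A61E" + "428FD063E5179B2C" + "FA8406CD3E79B550"
)
```

The byteArrayOf form has the advantage of mirroring key.rs line by line, which makes a side-by-side comparison with the Rust source easier; the hex form trades that for fewer casts.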

Anti-Patterns to Avoid

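  • Reading the entire archive into memory for very large archives: Use RandomAccessFile.seek() + readFully() to read only the ciphertext block needed. For the expected archive sizes (under ~100MB), reading the entire file into a ByteArray is acceptable and simpler; near-1GB archives are not a good fit, since Android per-app heap limits are typically a few hundred MB.
  • Using the BouncyCastle provider: Explicitly requesting the "BC" provider is deprecated since Android 9 (API 28), and for apps targeting API 28+ it throws NoSuchAlgorithmException for algorithms such as AES. Use the default provider (no provider argument to Cipher.getInstance()).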
  • Comparing HMAC with == on arrays: Kotlin ByteArray uses reference equality with ==. Must use contentEquals() for value comparison.
  • Forgetting toByte() for values > 0x7F: Kotlin bytes are signed (-128 to 127). Literal values like 0xC1 require .toByte() cast. This is a compile error, not a runtime error, so it will be caught early.
  • Not masking signed byte to unsigned: When converting Byte.toInt(), values > 127 sign-extend to negative integers. Always use byte.toInt() and 0xFF or ByteBuffer with proper byte order.

Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| AES-256-CBC decryption | Custom AES implementation | Cipher.getInstance("AES/CBC/PKCS5Padding") | Cipher handles PKCS7 unpadding automatically. Hand-rolling is insecure. |
| HMAC-SHA256 | Custom HMAC construction | Mac.getInstance("HmacSHA256") | Correct MAC construction is subtle (inner/outer key padding). The tag comparison is a separate step: contentEquals() is not constant-time, MessageDigest.isEqual() is. |
| Little-endian integer parsing | Manual bit shifting | ByteBuffer.order(ByteOrder.LITTLE_ENDIAN).getInt() | ByteBuffer handles byte order correctly; unsigned fields still need masking (and 0xFFFF, and 0xFFFFFFFFL). |
| Gzip decompression | Custom DEFLATE decoder | GZIPInputStream | Standard implementation handles all gzip header variations. |
| PKCS7 unpadding | Manual padding removal | Cipher's built-in PKCS5Padding | Cipher.doFinal() handles unpadding. Manual removal risks padding oracle issues. |

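Key insight: The entire pipeline (verify HMAC, decrypt, decompress, verify SHA-256) needs only built-in classes: Cipher, Mac, SecretKeySpec, IvParameterSpec and MessageDigest from JCE/JCA, plus GZIPInputStream for decompression. Zero hand-rolling needed.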

Common Pitfalls

Pitfall 1: Signed Byte Arithmetic in Kotlin/Java

What goes wrong: Kotlin Byte is signed (-128 to 127). Byte array literals with values above 0x7F (like 0xEA, 0xC1) do not compile without a .toByte() cast, and converting bytes to integers sign-extends, producing wrong values.
Why it happens: The JVM has no unsigned byte type. 0xEA.toByte() == -22, and (-22).toInt() == -22, not 234.
How to avoid:

  • Always use .toByte() for byte literals > 0x7F: 0xEA.toByte()
  • Always mask when converting to Int: byte.toInt() and 0xFF
  • Use ByteBuffer for multi-byte integers (it assembles the bytes in the right order), then mask unsigned fields with 0xFFFF or 0xFFFFFFFFL
Warning signs: HMAC/SHA-256 mismatches, wrong offsets when parsing header fields, magic byte verification failures.
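A minimal illustration of the three conversions, using the second magic byte (0xEA) from the header-parsing example. The names are hypothetical, and toUByte() assumes Kotlin 1.5 or later.

```kotlin
// Hypothetical constant: 0xEA is the Int 234, outside Byte's -128..127 range,
// so it needs an explicit conversion.
val magicByte1: Byte = 0xEA.toByte()

fun signExtended(b: Byte): Int = b.toInt()        // 0xEA -> -22 (sign-extended)
fun masked(b: Byte): Int = b.toInt() and 0xFF     // 0xEA -> 234
fun viaUByte(b: Byte): Int = b.toUByte().toInt()  // 0xEA -> 234 (Kotlin unsigned types)
```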

Pitfall 2: PKCS5Padding vs PKCS7Padding Naming

What goes wrong: Developer uses "AES/CBC/PKCS7Padding", which the standard desktop JDK provider does not recognize (relevant because this decoder is tested on a plain JVM) and which is not guaranteed across providers. Or developer manually removes PKCS7 padding after decryption.
Why it happens: Java/Android JCE uses the name "PKCS5Padding" even for 16-byte blocks. Strictly, PKCS#5 padding is defined for 8-byte blocks, but Java's implementation pads to the cipher's block size, so with AES it produces exactly PKCS#7 padding.
How to avoid: Always use "AES/CBC/PKCS5Padding" (not PKCS7Padding). Never manually remove padding; cipher.doFinal() already does it.
Warning signs: NoSuchAlgorithmException at runtime, or corrupted decrypted output from double-unpadding.
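To check the naming claim directly, this sketch encrypts with "AES/CBC/PKCS5Padding" and decrypts with "AES/CBC/NoPadding", which leaves the padding bytes visible. It uses a throwaway key and an all-zero IV (never the archive key, and never a fixed IV outside a test) and assumes a JVM without the old AES-256 export restrictions (JDK 8u161+ or Android). paddedPlaintext is a hypothetical name.

```kotlin
import javax.crypto.Cipher
import javax.crypto.spec.IvParameterSpec
import javax.crypto.spec.SecretKeySpec

// Hypothetical test helper: encrypt with PKCS5Padding, decrypt with NoPadding,
// and return the plaintext with its padding still attached.
fun paddedPlaintext(plaintext: ByteArray): ByteArray {
    val key = SecretKeySpec(ByteArray(32) { it.toByte() }, "AES") // throwaway test key, NOT the archive KEY
    val iv = IvParameterSpec(ByteArray(16))                        // all-zero IV: test only
    val encrypt = Cipher.getInstance("AES/CBC/PKCS5Padding")
    encrypt.init(Cipher.ENCRYPT_MODE, key, iv)
    val ciphertext = encrypt.doFinal(plaintext)
    val decrypt = Cipher.getInstance("AES/CBC/NoPadding")
    decrypt.init(Cipher.DECRYPT_MODE, key, iv)
    return decrypt.doFinal(ciphertext)
}
```

A 5-byte input comes back as 16 bytes ending in eleven 0x0B bytes, and a 16-byte input gains a full block of sixteen 0x10 bytes: PKCS#7 padding for a 16-byte block.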

Pitfall 3: Forgetting to Verify HMAC Before Decryption

What goes wrong: Decoder decrypts first, then checks HMAC. This violates Encrypt-then-MAC and opens the door to padding oracle attacks.
Why it happens: Natural inclination to "decrypt and see if it worked." The spec explicitly mandates HMAC verification FIRST (FORMAT.md Section 7, Section 10 step 5b).
How to avoid: Code structure must enforce if (!verifyHmac(...)) { skip } before any cipher.doFinal() call.
Warning signs: Code review reveals decrypt before HMAC check.
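One way to make the ordering structural rather than a code-review item is to keep the Cipher out of reach until the tag has been checked. This is a sketch with hypothetical names, assuming (as in Pattern 2) that the same 32-byte key is passed to both HMAC and AES and that the MAC covers iv || ciphertext as in verifyHmac(). It also swaps contentEquals() for MessageDigest.isEqual(), which current JDKs document as examining every byte regardless of where the first difference is.

```kotlin
import java.security.MessageDigest
import javax.crypto.Cipher
import javax.crypto.Mac
import javax.crypto.spec.IvParameterSpec
import javax.crypto.spec.SecretKeySpec

// Hypothetical helper: HMAC-SHA256 over iv || ciphertext, same input order as verifyHmac().
fun hmacIvCiphertext(key: ByteArray, iv: ByteArray, ciphertext: ByteArray): ByteArray {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(SecretKeySpec(key, "HmacSHA256"))
    mac.update(iv)
    mac.update(ciphertext)
    return mac.doFinal()
}

// Hypothetical helper: returns the plaintext, or null if the tag does not verify.
// The Cipher is created only after the tag check passes, so forged data is never
// decrypted and no padding check ever runs on it.
fun decryptIfAuthentic(key: ByteArray, iv: ByteArray, ciphertext: ByteArray, tag: ByteArray): ByteArray? {
    val computed = hmacIvCiphertext(key, iv, ciphertext)
    if (!MessageDigest.isEqual(computed, tag)) return null // constant-time comparison
    val cipher = Cipher.getInstance("AES/CBC/PKCS5Padding")
    cipher.init(Cipher.DECRYPT_MODE, SecretKeySpec(key, "AES"), IvParameterSpec(iv))
    return cipher.doFinal(ciphertext)
}
```

Returning null rather than throwing fits the "skip this file and continue" policy recommended under Open Questions.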

Pitfall 4: ByteArray contentEquals vs ==

What goes wrong: Using == to compare two ByteArray instances checks reference equality, not value equality, so HMAC and SHA-256 verification always fails.
Why it happens: Kotlin arrays are Java arrays, whose equals() (and therefore ==) is reference equality.
How to avoid: Always use computed.contentEquals(expected) for byte array comparison.
Warning signs: All HMAC checks fail even on valid archives.
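The difference is easy to see side by side (hypothetical function names). For HMAC tags, MessageDigest.isEqual() gives the same answer as contentEquals() but is documented in current JDKs as a constant-time comparison, so its running time does not reveal how many leading bytes of a forged tag were correct.

```kotlin
import java.security.MessageDigest

fun equalsOperator(a: ByteArray, b: ByteArray): Boolean = a == b                // arrays: identity only
fun contentsEqual(a: ByteArray, b: ByteArray): Boolean = a.contentEquals(b)      // element by element
fun tagsEqual(a: ByteArray, b: ByteArray): Boolean = MessageDigest.isEqual(a, b) // element by element, constant-time
```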

Pitfall 5: TOC Parsing Offset Drift

What goes wrong: When TOC entries are parsed sequentially, a parsing error in one entry misaligns every entry after it.
Why it happens: TOC entries are variable-length (101 + name_length bytes). If name_length is misread, every subsequent field offset is wrong.
How to avoid: Parse TOC entries with a cursor that advances by exactly the bytes consumed. After parsing all file_count entries, validate that the cursor equals toc_size, e.g. check(cursor.toLong() == tocSize). Prefer check() over assert(): Kotlin's assert() only runs when JVM assertions are enabled with -ea.
Warning signs: The first file extracts correctly, but the second fails HMAC or has corrupted data.
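A sketch of the drift check, assuming `toc` holds exactly the plaintext TOC bytes. The 101-byte fixed part is the sum of the field widths read by parseTocEntry() in the Code Examples (2 + 4 * 4 + 16 + 32 + 32 + 1 + 2); tocEntryOffsets and TOC_ENTRY_FIXED_BYTES are hypothetical names.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical constant: bytes in every TOC entry besides the name itself:
// name_length(2) + original/compressed/encrypted size + data_offset (4 * 4)
// + iv(16) + hmac(32) + sha256(32) + compression_flag(1) + padding_after(2)
val TOC_ENTRY_FIXED_BYTES = 101

// Hypothetical helper: returns the start offset of each entry inside `toc`, failing
// fast if an entry overruns the buffer or the entries do not fill it exactly.
fun tocEntryOffsets(toc: ByteArray, fileCount: Int): List<Int> {
    val buf = ByteBuffer.wrap(toc).order(ByteOrder.LITTLE_ENDIAN)
    val offsets = ArrayList<Int>(fileCount)
    var cursor = 0
    repeat(fileCount) { i ->
        check(cursor + 2 <= toc.size) { "TOC truncated before entry $i" }
        val nameLength = buf.getShort(cursor).toInt() and 0xFFFF
        val entrySize = TOC_ENTRY_FIXED_BYTES + nameLength
        check(cursor + entrySize <= toc.size) { "Entry $i overruns the TOC" }
        offsets += cursor
        cursor += entrySize
    }
    // Drift check: bytes consumed must equal the TOC size exactly.
    check(cursor == toc.size) { "TOC drift: consumed $cursor of ${toc.size} bytes" }
    return offsets
}
```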

Pitfall 6: GZIPInputStream on Non-Compressed Data

What goes wrong: Passing data with compression_flag = 0 through GZIPInputStream throws an exception (not a valid gzip header).
Why it happens: Ignoring the per-file compression_flag field and always decompressing.
How to avoid: Check entry.compressionFlag == 1 before wrapping in GZIPInputStream. If 0, use the decrypted bytes directly.
Warning signs: java.util.zip.ZipException: Not in GZIP format.
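A sketch that branches on the flag. maybeDecompress is a hypothetical name, and treating any value other than 0 or 1 as an error is an assumption here, not something taken from FORMAT.md. Passing uncompressed bytes with flag 1 reproduces the ZipException listed under warning signs.

```kotlin
import java.io.ByteArrayInputStream
import java.util.zip.GZIPInputStream

// Hypothetical helper: decompresses only when compression_flag == 1;
// with flag 0 the decrypted bytes already are the original file.
fun maybeDecompress(decrypted: ByteArray, compressionFlag: Int): ByteArray = when (compressionFlag) {
    0 -> decrypted
    1 -> GZIPInputStream(ByteArrayInputStream(decrypted)).use { it.readBytes() }
    // Assumption: any other flag value is treated as a format error.
    else -> throw IllegalArgumentException("Unknown compression_flag: $compressionFlag")
}
```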

Code Examples

Verified patterns from official sources and FORMAT.md:

Complete AES-256-CBC Decryption

// Source: FORMAT.md Section 13.6, Android Developers Cryptography docs
import javax.crypto.Cipher
import javax.crypto.spec.IvParameterSpec
import javax.crypto.spec.SecretKeySpec

fun decryptAesCbc(ciphertext: ByteArray, iv: ByteArray, key: ByteArray): ByteArray {
    val cipher = Cipher.getInstance("AES/CBC/PKCS5Padding")
    cipher.init(Cipher.DECRYPT_MODE, SecretKeySpec(key, "AES"), IvParameterSpec(iv))
    return cipher.doFinal(ciphertext)
}

Complete HMAC-SHA256 Verification

// Source: FORMAT.md Section 13.6, javax.crypto.Mac JavaDoc
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

fun verifyHmac(iv: ByteArray, ciphertext: ByteArray, key: ByteArray, expectedHmac: ByteArray): Boolean {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(SecretKeySpec(key, "HmacSHA256"))
    mac.update(iv)          // 16 bytes
    mac.update(ciphertext)  // encrypted_size bytes
    val computed = mac.doFinal()
    return computed.contentEquals(expectedHmac)
}

Complete SHA-256 Verification

// Source: FORMAT.md Section 13.6, java.security.MessageDigest JavaDoc
import java.security.MessageDigest

fun verifySha256(data: ByteArray, expectedSha256: ByteArray): Boolean {
    val digest = MessageDigest.getInstance("SHA-256")
    val computed = digest.digest(data)
    return computed.contentEquals(expectedSha256)
}

Gzip Decompression

// Source: java.util.zip.GZIPInputStream JavaDoc, Android Developer Reference
import java.io.ByteArrayInputStream
import java.util.zip.GZIPInputStream

fun decompressGzip(compressed: ByteArray): ByteArray {
    // use {} closes the stream, which also releases the Inflater's native memory promptly
    return GZIPInputStream(ByteArrayInputStream(compressed)).use { it.readBytes() }
}

Header Parsing (40 bytes)

// Source: FORMAT.md Section 4
import java.nio.ByteBuffer
import java.nio.ByteOrder

data class ArchiveHeader(
    val version: Int,
    val flags: Int,
    val fileCount: Int,
    val tocOffset: Long,
    val tocSize: Long,
    val tocIv: ByteArray,
)

val MAGIC = byteArrayOf(0x00, 0xEA.toByte(), 0x72, 0x63)

fun parseHeader(data: ByteArray): ArchiveHeader {
    require(data.size >= 40) { "Header too short: ${data.size} bytes" }

    // Verify magic bytes
    require(data[0] == MAGIC[0] && data[1] == MAGIC[1] && data[2] == MAGIC[2] && data[3] == MAGIC[3]) {
        "Invalid magic bytes"
    }

    val version = data[4].toInt() and 0xFF
    require(version == 1) { "Unsupported version: $version" }

    val flags = data[5].toInt() and 0xFF
    require(flags and 0xF0 == 0) { "Unknown flags set: 0x${flags.toString(16)}" }

    val buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN)
    val fileCount = buf.getShort(6).toInt() and 0xFFFF
    val tocOffset = buf.getInt(8).toLong() and 0xFFFFFFFFL
    val tocSize = buf.getInt(12).toLong() and 0xFFFFFFFFL

    val tocIv = data.copyOfRange(16, 32)

    return ArchiveHeader(version, flags, fileCount, tocOffset, tocSize, tocIv)
}
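parseHeader() expects the raw header bytes. A minimal way to obtain them (and, later, the on-disk TOC bytes) is a small ranged read; readRange is a hypothetical helper, not part of FORMAT.md.

```kotlin
import java.io.File
import java.io.RandomAccessFile

// Hypothetical helper: reads exactly `length` bytes starting at absolute `offset`.
// RandomAccessFile.readFully throws EOFException if the file ends first.
fun readRange(file: File, offset: Long, length: Int): ByteArray =
    RandomAccessFile(file, "r").use { raf ->
        raf.seek(offset)
        ByteArray(length).also { raf.readFully(it) }
    }
```

For example, parseHeader(readRange(archiveFile, 0, 40)) reads the header, and readRange(archiveFile, header.tocOffset, header.tocSize.toInt()) reads the on-disk TOC bytes. For the per-file data blocks, opening the archive once and reusing one RandomAccessFile, as in Pattern 2, avoids reopening it for every entry.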

TOC Entry Parsing

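// Source: FORMAT.md Section 5
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Note: the generated equals()/hashCode() compare ByteArray fields by reference, not by content.
data class TocEntry(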
    val name: String,
    val originalSize: Long,
    val compressedSize: Long,
    val encryptedSize: Int,
    val dataOffset: Long,
    val iv: ByteArray,
    val hmac: ByteArray,
    val sha256: ByteArray,
    val compressionFlag: Int,
    val paddingAfter: Int,
)

fun parseTocEntry(data: ByteArray, offset: Int): Pair<TocEntry, Int> {
    val buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN)
    var pos = offset

    // name_length (u16 LE)
    val nameLength = buf.getShort(pos).toInt() and 0xFFFF
    pos += 2

    // name (UTF-8 bytes)
    val name = String(data, pos, nameLength, Charsets.UTF_8)
    pos += nameLength

    // Fixed fields
    val originalSize = buf.getInt(pos).toLong() and 0xFFFFFFFFL; pos += 4
    val compressedSize = buf.getInt(pos).toLong() and 0xFFFFFFFFL; pos += 4
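    val encryptedSizeU32 = buf.getInt(pos).toLong() and 0xFFFFFFFFL; pos += 4
    // u32 on disk, but ByteArray sizes are Int: values above Int.MAX_VALUE (~2 GiB) cannot be buffered
    require(encryptedSizeU32 <= Int.MAX_VALUE) { "encrypted_size too large: $encryptedSizeU32" }
    val encryptedSize = encryptedSizeU32.toInt()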
    val dataOffset = buf.getInt(pos).toLong() and 0xFFFFFFFFL; pos += 4

    val iv = data.copyOfRange(pos, pos + 16); pos += 16
    val hmac = data.copyOfRange(pos, pos + 32); pos += 32
    val sha256 = data.copyOfRange(pos, pos + 32); pos += 32

    val compressionFlag = data[pos].toInt() and 0xFF; pos += 1
    val paddingAfter = buf.getShort(pos).toInt() and 0xFFFF; pos += 2

    return Pair(TocEntry(name, originalSize, compressedSize, encryptedSize,
        dataOffset, iv, hmac, sha256, compressionFlag, paddingAfter), pos)
}

Hardcoded Key (matching Rust key.rs)

Identical to the KEY array shown in Pattern 3 (source: src/key.rs); keep a single definition in the decoder.

State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| BouncyCastle provider for AES | Default JCE provider (no provider arg) | Android 9 (deprecated BC) | Must NOT specify "BC" provider. Use Cipher.getInstance("AES/CBC/PKCS5Padding") without provider. |
| Crypto JCA provider | Removed | Android 9 (removed) | Cannot use SecureRandom.getInstance("SHA1PRNG", "Crypto"). Not needed for decoder (no random generation). |
| Manual PKCS7 unpadding | Cipher.doFinal() auto-strips | Always | Never manually strip PKCS7 padding. |

Deprecated/outdated:

  • BouncyCastle provider ("BC"): Deprecated on Android. Use default provider.
  • Crypto JCA provider: Removed in Android 9.
  • Using PKCS7Padding string: Not universally available. PKCS5Padding is the correct JCE name.

Open Questions

  1. File structure: single file vs multiple files?

    • What we know: The decoder is conceptually simple (~200-300 lines). FORMAT.md Section 13.6 shows it as helper functions.
    • What's unclear: Whether the user prefers a single self-contained file or split by concern.
    • Recommendation: Start with a single ArchiveDecoder.kt file with a main() function for CLI testing. It can always be refactored into an Android class later.
  2. Error handling strategy: exceptions vs return codes?

    • What we know: Rust archiver uses anyhow::Result with continue on HMAC/SHA-256 failure. FORMAT.md Section 10 says "REJECT this file. Do NOT attempt decryption" for HMAC failure.
    • What's unclear: Exact behavior for Kotlin -- throw exception and abort entire archive, or skip file and continue?
    • Recommendation: Match Rust behavior: HMAC failure skips the file (print error, continue to next). SHA-256 mismatch warns but writes the file. Throw exception only for fatal errors (bad magic, wrong version).
  3. Testing approach: JVM-only or Android instrumented tests?

    • What we know: All APIs used (javax.crypto, java.util.zip, java.nio) are standard JVM APIs available without Android framework.
    • What's unclear: Whether tests should run on JVM only (faster, no device needed) or also on Android emulator.
    • Recommendation: JVM-only tests via kotlinc compilation. Create test archives with Rust CLI, decode with Kotlin, verify byte-identical output. No Android device/emulator needed for crypto validation.
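For the cross-validation step, a sketch of the comparison written in Kotlin itself (sha256Hex and treesMatch are hypothetical helpers): it streams each file through SHA-256 and checks that every file in the original input tree has an identically hashed counterpart in the decoder's output directory. Extra files in the output directory are not detected by this check.

```kotlin
import java.io.File
import java.security.MessageDigest

// Hypothetical helper: lowercase hex SHA-256 of a file, streamed in 64 KiB chunks.
fun sha256Hex(file: File): String {
    val md = MessageDigest.getInstance("SHA-256")
    file.inputStream().use { input ->
        val buffer = ByteArray(64 * 1024)
        while (true) {
            val n = input.read(buffer)
            if (n < 0) break
            md.update(buffer, 0, n)
        }
    }
    return md.digest().joinToString("") { "%02x".format(it) }
}

// Hypothetical helper: true if every regular file under expectedDir has a file at the
// same relative path under actualDir with an identical SHA-256.
fun treesMatch(expectedDir: File, actualDir: File): Boolean =
    expectedDir.walkTopDown().filter { it.isFile }.all { f ->
        val other = File(actualDir, f.relativeTo(expectedDir).path)
        other.isFile && sha256Hex(f) == sha256Hex(other)
    }
```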

Sources

Primary (HIGH confidence)

  • FORMAT.md Section 4-10, Section 13.6 -- Complete binary format specification and Kotlin reference code
  • Android Developers - Cryptography -- AES/CBC/PKCS5Padding recommended, HmacSHA256 supported, SHA-256 supported on all Android versions
  • java.nio.ByteBuffer JavaDoc -- ByteOrder.LITTLE_ENDIAN for getShort/getInt
  • javax.crypto.Cipher JavaDoc -- "AES/CBC/PKCS5Padding" transformation guaranteed
  • src/key.rs (lines 1-9) -- Exact 32-byte key bytes that Kotlin decoder must use identically
  • src/format.rs (lines 1-211) -- Rust implementation of header/TOC parsing that Kotlin must match
  • src/crypto.rs (lines 1-78) -- Rust implementation of encrypt/decrypt/HMAC/SHA-256 pipeline

Secondary (MEDIUM confidence)

Tertiary (LOW confidence)

  • None. All claims verified with primary or secondary sources.

Metadata

Confidence breakdown:

  • Standard stack: HIGH -- All APIs are Android SDK built-ins, verified against official Android docs
  • Architecture: HIGH -- FORMAT.md already provides Kotlin reference code (Section 13.6) and worked example (Section 12)
  • Pitfalls: HIGH -- Signed byte issues and PKCS5/7 naming are well-documented JVM pitfalls, verified with multiple sources

Research date: 2026-02-25
Valid until: 2026-03-25 (stable; JVM crypto APIs change rarely)