Flash CTF - Cogwork

First look

cogwork is a static, stripped ELF. Run it and it just asks for a code:

cogwork> hunter2
Access denied.

Drop it in a decompiler and main is short, but the interesting part is a loop with a giant switch on a single byte pulled from a const array. There's no strcmp, no comparison against the flag anywhere. The check is happening inside that switch, which means it's an interpreter and the real logic lives in the byte array it walks. This is a bytecode VM.

Reversing the VM

Reading the switch arm by arm, each case pops/pushes an 8-bit stack and advances an instruction pointer ip. There's also an index register (call it i) and the program's three data tables: your input, an 8-byte KTAB, and a TGT table the same length as the expected code. The opcodes:

OpNameEffect0x10PUSH bpush immediate byte b (I guess we're rushing b)0x11POPdrop top0x12DUPduplicate top0x13LDINi = pop; push input[i]0x14LDKi = pop; push KTAB[i]0x15LDTi = pop; push TGT[i]0x16XORpush a ^ b0x17ADDpush a + b0x19ROL bpush rol8(pop, b)0x1aAND bpush pop & b0x1bNEQpush (a != b)0x20JMP off16ip = off0x21JNZ off16if pop: ip = off0x22JZ off16if !pop: ip = off0x30GETIpush index register i0x32INCIi++0x33LENpush the expected length0x34FAILmark the attempt failed0x35GETApush accumulator acc0x36ORAacc |= pop0x37RNDpush one byte read from /dev/urandom0x3fHALTstop

Disassembling the bytecode

The blob is 61 bytes, and it's two pieces. The first piece (0x00000x001f) is a noise routine; the actual check is the loop at 0x0020.

; --- noise prologue: random benign work driven by /dev/urandom ---
0000 RND ; n = random byte (loop count)
0001 DUP
0002 JZ 0x001f ; n == 0 -> done
0005 RND ; r = random byte
0006 DUP
0007 AND 1
0009 JZ 0x0014 ; even -> the other op-chain
000c RND ; odd: r ^ random, rotate, discard
000d XOR
000e ROL 3
0010 POP
0011 JMP 0x0019
0014 RND ; even: r + random, mask, discard
0015 ADD
0016 AND 63
0018 POP
0019 PUSH 1
001b SUB ; n--
001c JMP 0x0001
001f POP ; drop the spent counter

; --- the real check ---
0020 GETI ; push i
0021 LDIN ; push input[i]
0022 GETI ; push i
0023 AND 7 ; i & 7
0025 LDK ; push KTAB[i & 7]
0026 XOR ; input[i] ^ KTAB[i & 7]
0027 GETI
0028 ADD ; + i
0029 ROL 3 ; rol8(.., 3)
002b GETI
002c LDT ; push TGT[i]
002d NEQ ; (transformed != TGT[i]) -> 0/1
002e ORA ; acc |= mismatch (no early-out)
002f INCI ; i++
0030 GETI
0031 LEN
0032 NEQ ; (i != len) ?
0033 JNZ 0x0020 ; not done -> loop (always runs len times)
0036 GETA ; push acc
0037 JNZ 0x003b ; any byte differed -> reject
003a HALT ; acc == 0 -> accepted
003b FAIL ; reject
003c HALT

The noise prologue is a dead end. It reads a random count with RND, loops that many times, and each pass uses another random bit to pick one of two benign op-chains — all working on scratch values it immediately drops. It never touches input, i, or acc, so it changes nothing about the verdict; it just makes the executed op stream (and the host instruction count) different on every run. Skip it.

The real check is one transform applied to each input byte:

rol8((input[i] ^ KTAB[i & 7]) + i, 3) == TGT[i]

Note the check loop doesn't bail on the first wrong byte. Each mismatch is folded into acc with ORA, the loop always runs the full length, and a single GETA; JNZ at the end decides pass/fail. That's deliberate: branching out early would let you recover the code one character at a time by watching how far the VM gets (instruction count, single-stepping, ltrace) before it gives up. Here a wrong guess does identical work whether the first byte or the last byte is off — so even with the noise stripped out, the check leaks nothing per byte, and the RND-driven prologue buries the whole run in entropy on top of that.

Inverting it

Each step is reversible, so there's no brute force needed. Pull KTAB and TGT out of .rodata and run the transform backwards:

KTAB = [246, 165, 118, 211, 192, 23, 218, 62]
TGT = [45, 126, 9, 22, 133, 210, 205, 138, 172, 126, 130, 253, 30, 161, 181,
        35, 205, 71, 168, 134, 62, 234, 0, 59, 14, 7, 73, 22, 120, 140, 64,
        203, 47, 216, 209, 48, 198, 107, 238, 137, 205, 237, 249, 206]

def ror8(x, r):
    x &= 0xff
    return ((x >> r) | (x << (8 - r))) & 0xff

flag = bytearray()
for i, t in enumerate(TGT):
    x = ror8(t, 3)
    x = (x - i) & 0xff
    x ^= KTAB[i & 7]
    flag.append(x)

print(flag.decode())

The + i term is why the two repeated _n0t_-style chunks don't produce repeated bytes in TGT: the position is folded into every byte, so identical input characters encode differently depending on where they sit.

solve.py

#!/usr/bin/env python3
"""
Cogwork solve script.

After reversing the VM and disassembling the embedded bytecode, the per-byte
check is:

    x = input[i]
    x ^= KTAB[i & 7]
    x = (x + i) & 0xff
    x = rol8(x, 3)
    reject unless x == TGT[i]

Both tables are constants sitting in the binary's .rodata. Recover them, then
invert the transform to print the flag.

    ror8(TGT[i], 3) - i (mod 256), then xor KTAB[i & 7]
"""

# Recovered from the binary's .rodata.
KTAB = [246, 165, 118, 211, 192, 23, 218, 62]
TGT = [221, 14, 32, 173, 60, 66, 21, 98, 236, 244, 216, 125, 231, 147, 253,
       131, 164, 15, 81, 157, 157, 225, 86, 195, 101, 175, 224, 223, 253, 34,
       230, 97, 78, 103, 67, 86, 30, 252, 160, 4, 127, 127, 169]


def ror8(x, r):
    x &= 0xff
    return ((x >> r) | (x << (8 - r))) & 0xff


flag = bytearray()
for i, t in enumerate(TGT):
    x = ror8(t, 3)
    x = (x - i) & 0xff
    x ^= KTAB[i & 7]
    flag.append(x)

print(flag.decode())

Interested in joining our team? Let’s connect!