Commit 29e3af6

Implement breakpoint disassembly support for Intel APX (#114120)

Teach the amd64 breakpoint disassembler about APX, specifically the REX2 and extended EVEX encodings. Update the tools to work with newer versions of gcc/gdb, for example by handling the new gdb output format in the parsing regular expressions. Because of these newer versions, the non-APX tables also differ, apparently due to gcc/gdb bug fixes and improvements (e.g., support for instructions that were previously unsupported).

Note that the APX code is untested due to lack of APX hardware. Also, the Windows SDK CONTEXT record does not yet define the APX extended GPR (eGPR) registers, so accessing those registers is disabled.

The tables were generated using the following versions of gcc/gdb on Ubuntu 24.04.2 LTS, in WSL2:

```
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
```

Details:

1. Change the "createOpcodes.cpp" tool to generate more varieties of possible instructions, including the encoding forms for REX2 and extended EVEX.
2. Change createOpcodes to always generate 16 bytes of code. Previously, the parser looked for a "58" followed by "59 pop" to indicate the end of an instruction sequence, which failed in various cases. Note that the longest legal x86 instruction is 15 bytes, as defined by the architecture.
3. Update the parser and table generation tool (Amd64InstructionTableGenerator.cs) to parse REX2 and extended EVEX instructions, and to generate a new EVEX table for EVEX map 4.
4. The parser was updated to handle new gdb disassembly formats, such as different whitespace usage (spaces versus tabs) and the "BCST" tag that indicates EVEX embedded broadcast.
5. The native walker was updated to understand the new tables, including when to use them (so it needs to recognize the REX2 and extended EVEX formats).
6. Fixed bugs in the existing AVX-512 (EVEX) handling of the `b`, `L'L`, and `pp` bits: they were being read from the wrong prefix byte.
7. There seem to be a couple of existing bugs in `NativeWalker::Decode`, which I annotated but did not feel confident fixing:
   a. The loop that reads and processes instruction prefixes only reads a single prefix. As a result, a case like 0x66 (operand size) followed by 0x40 (REX) wrongly treats the REX byte as the instruction opcode.
   b. If the instruction opcode (after the prefix) is 0xcc, `DebuggerController::GetPatchedOpcode()` is called to read the actual opcode, but it reads from the wrong address.
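
The REX2 handling from items 1 and 5 can be illustrated with a minimal C++ sketch. It is based on a reading of the Intel APX specification, which lays out the payload byte after 0xD5 as M0, R4, X4, B4, W, R3, X3, B3 from bit 7 down to bit 0. The names `Rex2Fields`, `DecodeRex2`, and `ExtendReg` are hypothetical and are not taken from walker.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded view of a REX2 prefix (0xD5 followed by one payload byte).
struct Rex2Fields {
    bool m0;        // 0: legacy map 0, 1: legacy map 1 (the 0x0F escape space)
    std::uint8_t r; // (R4 << 1) | R3: extends ModRM.reg
    std::uint8_t x; // (X4 << 1) | X3: extends SIB.index
    std::uint8_t b; // (B4 << 1) | B3: extends ModRM.rm / SIB.base / opcode register
    bool w;         // operand-size promotion, like REX.W
};

// In 64-bit mode, 0xD5 (AAD in 32-bit mode) is repurposed as the REX2 prefix.
constexpr std::uint8_t kRex2Prefix = 0xD5;

// Returns false if code does not start with REX2; otherwise fills *out.
bool DecodeRex2(const std::uint8_t* code, Rex2Fields* out) {
    assert(code != nullptr && out != nullptr);
    if (code[0] != kRex2Prefix) {
        return false;
    }
    const std::uint8_t p = code[1];
    out->m0 = ((p >> 7) & 1u) != 0;
    out->r = static_cast<std::uint8_t>((((p >> 6) & 1u) << 1) | ((p >> 2) & 1u));
    out->x = static_cast<std::uint8_t>((((p >> 5) & 1u) << 1) | ((p >> 1) & 1u));
    out->b = static_cast<std::uint8_t>((((p >> 4) & 1u) << 1) | (p & 1u));
    out->w = ((p >> 3) & 1u) != 0;
    return true;
}

// Combines a 2-bit REX2 extension with a 3-bit ModRM/SIB field into a GPR number 0-31.
std::uint8_t ExtendReg(std::uint8_t ext, std::uint8_t low3) {
    return static_cast<std::uint8_t>(((ext & 3u) << 3) | (low3 & 7u));
}
```

For example, `D5 18` decodes to W = 1 with a `b` extension of 2, so a ModRM.rm value of 1 would select r17 (`ExtendReg(2, 1)`). That register is one of the eGPRs that the commit cannot yet access through the Windows CONTEXT record.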
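
The field positions behind item 6, and the map selection behind item 3, can be sketched as a small hypothetical helper (not the walker.cpp code). It assumes the standard EVEX layout `62 P0 P1 P2`, where `pp` is P1[1:0], `L'L` is P2[6:5], and `b` is P2[4]. It reads the map from P0[2:0], which lets map 4 (the APX map for promoted legacy instructions) be routed to its own table.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded view of the EVEX fields relevant to table lookup.
struct EvexFields {
    std::uint8_t map; // P0[2:0] (mmm): opcode map; APX promoted-legacy instructions use map 4
    bool w;           // P1[7]
    std::uint8_t pp;  // P1[1:0]: implied prefix (0: none, 1: 66, 2: F3, 3: F2)
    std::uint8_t ll;  // P2[6:5]: L'L vector length
    bool b;           // P2[4]: embedded broadcast / rounding / SAE
};

// In 64-bit mode, 0x62 always introduces EVEX. Returns false otherwise.
bool DecodeEvex(const std::uint8_t* code, EvexFields* out) {
    assert(code != nullptr && out != nullptr);
    if (code[0] != 0x62) {
        return false;
    }
    const std::uint8_t p0 = code[1];
    const std::uint8_t p1 = code[2];
    const std::uint8_t p2 = code[3];
    out->map = static_cast<std::uint8_t>(p0 & 0x07u);
    out->w = ((p1 >> 7) & 1u) != 0;
    out->pp = static_cast<std::uint8_t>(p1 & 0x03u);  // pp lives in P1, not P2
    out->ll = static_cast<std::uint8_t>((p2 >> 5) & 0x03u);
    out->b = ((p2 >> 4) & 1u) != 0;
    return true;
}

// True when the instruction should be looked up in a map-4 table.
bool IsApxMap4(const EvexFields& e) {
    return e.map == 4;
}
```

With the bytes `62 F1 7D 48`, the sketch reports map 1, `pp` = 1 (implied 66), `L'L` = 2 (512-bit), and `b` clear.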
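
One possible shape for a fix to item 7a is to consume every legacy prefix before checking for REX. The sketch below is simplified and hypothetical. It recognizes the eleven legacy prefix bytes, caps the scan at 15 bytes (the architectural maximum from item 2), and treats a 0x40-0x4F byte as REX only after the legacy prefixes. It deliberately ignores REX2, VEX, and EVEX. It also does not model the rule that a REX byte is ignored when another legacy prefix follows it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// The legacy prefixes: operand/address size, LOCK, REP/REPNE, and segment overrides.
bool IsLegacyPrefix(std::uint8_t b) {
    switch (b) {
        case 0x66: case 0x67: case 0xF0: case 0xF2: case 0xF3:
        case 0x2E: case 0x36: case 0x3E: case 0x26: case 0x64: case 0x65:
            return true;
        default:
            return false;
    }
}

// Hypothetical summary of the prefix bytes preceding the opcode.
struct PrefixInfo {
    std::size_t legacyCount;  // number of legacy prefix bytes consumed
    bool hasRex;              // a REX byte immediately precedes the opcode
    std::uint8_t rex;         // the REX byte, valid only when hasRex is true
    std::size_t opcodeOffset; // offset of the first opcode byte
};

constexpr std::size_t kMaxInstructionLength = 15;

PrefixInfo ScanPrefixes(const std::uint8_t* code, std::size_t len) {
    assert(code != nullptr);
    PrefixInfo info{0, false, 0, 0};
    const std::size_t limit = len < kMaxInstructionLength ? len : kMaxInstructionLength;
    std::size_t i = 0;
    // Loop over all legacy prefixes instead of stopping after the first one.
    while (i < limit && IsLegacyPrefix(code[i])) {
        ++i;
    }
    info.legacyCount = i;
    // In 64-bit mode, 0x40-0x4F directly before the opcode is a REX prefix.
    if (i < limit && (code[i] & 0xF0u) == 0x40u) {
        info.hasRex = true;
        info.rex = code[i];
        ++i;
    }
    info.opcodeOffset = i;
    return info;
}
```

For `66 40 90`, the scan reports one legacy prefix, a REX byte of 0x40, and the opcode at offset 2. It no longer mistakes 0x40 for the opcode.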

1 parent 54dbfa7, commit 29e3af6

4 files changed (+1930, -518 lines)

src/coreclr/debug/ee/amd64
- amd64InstrDecode.h
- gen_amd64InstrDecode
  - Amd64InstructionTableGenerator.cs
  - createOpcodes.cpp
- walker.cpp
