Skip to content

Conversation

@gvanrossum
Copy link
Member

@gvanrossum gvanrossum commented Nov 4, 2022

@markshannon Please see "PLEASE NOTE" below.

This is pretty straightforward: the expansion of

super(FOO__BAR) = FOO + BAR;

generates code like this:

TARGET(FOO__BAR) {
    {
        <code block for FOO>
    }
    NEXTOPARG();
    next_instr++;
    {
        <code block for BAR>
    }
    DISPATCH();
}

The only tricky thing is that the code block for LOAD_CONST contains a PREDICTED() macro call, so we strip ththoseat out. (Eventually we'll auto-generate the PREDICTED() macros so we won't need to do this any more, but super-instruction generation is a simpler project than auto-PREDICTED insertion, so I figured I'd tackle it first.)

The new super-instructions are all placed at the end (the parser doesn't keep track of how instruction definitions are interspersed), so the diff for generated_cases.c.h looks perhaps a bit more imposing than it is.

PLEASE NOTE: The code for some of the generated super-instructions is different than the hand-written super-instructions because somehow the hand-written versions moved the NEXTOPARG(); next_instr++; pair up a few lines. I don't know why the originals were written that way -- @markshannon do you remember? Do you think it matters? (Another difference is that the parts each define their own variables, whereas the hand-written versions reused PyObject *value, but I assume the optimizer will handle that.)

@gvanrossum
Copy link
Member Author

@CAM-Gerlach Can you double check the markup in the NEWS entry? What's the recommended way to refer to a markdown file in a different repo owned by a different org?

Copy link
Member

@CAM-Gerlach CAM-Gerlach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a few small suggestions to clarify the news entry and move the link to be inline where the DSL introduced, instead of a bare URL at the end of the body, but nothing too major.

Maybe out of scope for this particular PR, but while it may not affect too many people's work directly, this #98831 overall seems a significant enough change to deserve a What's New entry in the (e.g. in the Build Section, or as appropriate), which this NEWS entry seems like a great basis for.

@CAM-Gerlach
Copy link
Member

What's the recommended way to refer to a markdown file in a different repo owned by a different org?

Right now, same or different repo/org, there isn't (AFAIK) currently an easy way to do this with a nice role as opposed to just a plain reST link.

However, following discussion and some experimentation in the devguide repo and Discord, I'm working on a little Sphinx extension with a base "SmarkLink" role that, with just a couple lines of customization can be used to link files on the CPython repo, other repos/orgs or even other websites (like PEPs, RFCs, etc, as a smarter and more powerful version of the built-in PEP/RFC roles). If we implement that, you'll be able to do e.g. :gh-file:`Tools/cases_generator` to generate Tools/case-generator (for the current branch by default), and also supports files on other repos, e.g. :gh-file:`faster-cpython/ideas:3.12/interpreter_definition.md`, as well as specifying branch/tag/commit, standard Sphinx ~, ! and <> syntax, # for subsections, etc.

Here's an earlier prototype.

@gvanrossum
Copy link
Member Author

Right now, same or different repo/org, there isn't (AFAIK) currently an easy way to do this with a nice role as opposed to just a plain reST link.

Thanks for your suggestions! All applied.

I'm looking forward to using your "SnarkLink" extension next time. :-)

@markshannon
Copy link
Member

We need the code generator to have an understanding of stack effects before breaking instructions into their parts.
Consider a hypothetical LOAD_FAST__STORE_FAST
Simple concatenation would produce this:

TARGET(STORE_FAST__LOAD_FAST) {
    {
        PyObject *value = GETLOCAL(oparg);
        assert(value != NULL);
        Py_INCREF(value);
        PUSH(value);
    }
    NEXTOPARG();
    next_instr++;
    {
        PyObject *value = POP();
        SETLOCAL(oparg, value);
    }
    DISPATCH();
}

This is clearly inefficient.
The output should look something like:

TARGET(STORE_FAST__LOAD_FAST) {
    PyObject *_stack_tmp1;
    {
        PyObject *value = GETLOCAL(oparg);
        assert(value != NULL);
        Py_INCREF(value);
        _stack_tmp1 = value;
    }
    NEXTOPARG();
    next_instr++;
    {
        PyObject *value = _stack_tmp1
        SETLOCAL(oparg, value);
    }
    DISPATCH();
}

In order to do this, we first need to have stack effects that the tooling understands.

This probably doesn't matter for superinstructions, but does matter if we want to split specialized instructions into guard and action parts.

@markshannon
Copy link
Member

markshannon commented Nov 4, 2022

I think the next enhancements to generated code should be generating the code for dispatch related macros.

  • Strip out remaining DISPATCH() macros from bytecodes.c
  • Strip out all PREDICTED() macros. The code generator can insert them as required by PREDICT(name) macros.
  • Replace all gotos in bytecodes.c with ERROR, UNWIND, RESUME_FRAME, etc., so that they are easily identifiable by code-gen. This is necessary, so that the code generator can ensure that the stack in consistent in case of error or unwind. It will also allow us to experiment with different layouts and inlining of code sequences.

Implementing stack effects can be done before or after the above.

Our goal is to expose "micro ops" for optimization:
E.g.

op(FLOAT_CHECK2, (left right -- left right)) {
    DEOPT_IF(!PyFloat_CheckExact(left));
    DEOPT_IF(Py_TYPE(right) != Py_TYPE(left));
}

op(FLOAT_ADD, (left right  -- sum)) {
    STAT_INC(BINARY_OP, hit);
    double dsum = ((PyFloatObject *)left)->ob_fval +
        ((PyFloatObject *)right)->ob_fval;
    PyObject *sum = PyFloat_FromDouble(dsum);
    if (sum == NULL) {
        ERROR();
    }
    _Py_DECREF_SPECIALIZED(right, _PyFloat_ExactDealloc);
    _Py_DECREF_SPECIALIZED(left, _PyFloat_ExactDealloc);
}

BINARY_OP_ADD_FLOAT = counter/1 + FLOAT_CHECK2 + FLOAT_ADD;

And be able to generate code for BINARY_OP_ADD_FLOAT that is at least as efficient as what we have now.

@gvanrossum
Copy link
Member Author

gvanrossum commented Nov 4, 2022

@markshannon, I asked you one question you didn't answer yet: Does it matter that the placement of the NEXTOPARG(); next_instr++; pairs in the generated super-instructions is different than it was in the hand-written versions? (You can compare them in the diff.)

I think the next enhancements to generated code should be generating the code for dispatch related macros.

  • Strip out remaining DISPATCH() macros from bytecodes.c

I imagine this is important for opcodes that can be combined into others. It doesn't matter for top-level instructions, right?

Doing this everywhere seems complicated, since there are quite a few that have conditional DISPATCH() all over the place (e.g. UNARY_NOT). What do you propose I do in those cases? Add a label at the end of the instruction and goto there? That seems to defeat half the purpose of DISPATCH(). (There are also easy cases, e.g. for some reason the extract_cases script didn't elide the DISPATCH() at the end of STOPITERATION_ERROR. Those I can do of course.)

  • Strip out all PREDICTED() macros. The code generator can insert them as required by PREDICT(name) macros.

That was going to be my next project, indeed.

  • Replace all gotos in bytecodes.c with ERROR, UNWIND, RESUME_FRAME, etc., so that they are easily identifiable by code-gen. This is necessary, so that the code generator can ensure that the stack in consistent in case of error or unwind. It will also allow us to experiment with different layouts and inlining of code sequences.

What purpose does this serve? The generator can recognize goto error just as easily.

Anyway, my own planned sequence of projects is roughly:

After that we should decide what's next, e.g. combining uops into instrs, cache effects, array stack effects, conditional stack effects, register opcodes, who knows.

@markshannon
Copy link
Member

Does it matter that the placement of the NEXTOPARG(); next_instr++; pairs in the generated super-instructions is different than it was in the hand-written versions?

Only if it impairs performance. I don't think it should.

@markshannon
Copy link
Member

We need to implement stack effects before superinstructions. Without the stack effects the code generator cannot efficiently move values from one op to the next.

@markshannon
Copy link
Member

The generator can recognize goto error just as easily.

Sure, it is just about of consistency. It is easier to see what's going on if SHOUTY_CASE_MACROS need to be handled by the code generator, and everything can be treated as an opaque blob.

@markshannon
Copy link
Member

Replacing all the DISPATCH()s with gotos to the end of the instruction means that the code generator doesn't need to handle DISPATCH. It should be fine to leave it for now, as the code generator doesn't do anything with the DISPATCH() anyway.

@gvanrossum
Copy link
Member Author

We need to implement stack effects before superinstructions. Without the stack effects the code generator cannot efficiently move values from one op to the next.

Okay, but the current crop of 5 superinstructions don't need this. There's not a single case where the second half does something with the result of the first case (no LOAD_FAST__STORE_FAST).

@gvanrossum gvanrossum merged commit 7dcd28e into python:main Nov 6, 2022
@gvanrossum gvanrossum deleted the super-instrs branch November 6, 2022 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants