
[RFC] [performance] {.inline.} guarantees, --stacktrace:inline, and --ignoreInline #198

@timotheecour

some of these proposals (especially P1) should give a large speedup for builds with --stacktrace:on (e.g. debug builds)

links

RFC proposals

  • P1 an {.inline.} proc should by default NOT generate nimfr_ + popFrame (regardless of -d:release or not) even with --stacktrace:on. Otherwise --stacktrace:on can be very slow

  • P2 a new option should be introduced --stacktrace:inline (implies --stacktrace:on) to generate nimfr_ + popFrame even for {.inline.}
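To make the cost behind P1 and P2 concrete, here is a simplified C model of per-call frame bookkeeping. It is only a sketch: `Frame`, `push_frame`, `pop_frame`, `frame_top` and `frames_pushed` are hypothetical names for illustration and are not the definitions behind `nimfr_` and `popFrame` in the real runtime. The point it tries to show is that a one-line body such as `a * 2` ends up surrounded by several stores on every call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified frame record; not the real Nim runtime type. */
typedef struct Frame {
    struct Frame *prev;     /* caller's frame */
    const char *procname;
    const char *filename;
    long line;
} Frame;

static Frame *frame_top = NULL;  /* innermost live frame */
static long frames_pushed = 0;   /* how many times the bookkeeping ran */

/* Link a new frame in front of the current one. */
static void push_frame(Frame *f, const char *procname, const char *filename) {
    f->procname = procname;
    f->filename = filename;
    f->line = 0;
    f->prev = frame_top;
    frame_top = f;
    frames_pushed++;
}

/* Unlink the innermost frame. */
static void pop_frame(void) {
    assert(frame_top != NULL);
    frame_top = frame_top->prev;
}

/* Roughly the shape of a traced proc: bookkeeping around a trivial body. */
static long fun2x_traced(long a) {
    Frame fr;
    push_frame(&fr, "fun2x", "example.nim");
    long result = a * 2;
    pop_frame();
    return result;
}

/* What P1 asks for by default on inline procs: only the body remains. */
static long fun2x_plain(long a) {
    return a * 2;
}

/* Call each variant n times; both compute the same sum. */
long sum_traced(long n) {
    long s = 0;
    for (long i = 0; i < n; i++) s += fun2x_traced(i);
    return s;
}

long sum_plain(long n) {
    long s = 0;
    for (long i = 0; i < n; i++) s += fun2x_plain(i);
    return s;
}
```

Calling `sum_traced(n)` and `sum_plain(n)` returns the same value, but the traced variant runs the push/pop pair `n` times. Under this model, P2's --stacktrace:inline would correspond to opting back into `fun2x_traced` when the extra frames are wanted.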

  • P3 currently proc fun2x(a: int): int {.inline.} generates:

static N_INLINE(NI, fun2x__rxUfcN4ZvcApCx0waSVH8At10303b)(NI a) {...

regardless of -d:danger or not
according to https://stackoverflow.com/questions/25602813/force-a-function-to-be-inline-in-clang-llvm it seems static should maybe not be used:

So in order to guarantee that the function is inlined:
Don’t use static inline.
Make sure not to compile with GNU89.

we should investigate to make sure it works as intended, and whether __attribute__((always_inline)) is more appropriate (cf hints vs guarantees)
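A small C file along these lines could serve as a starting point for that investigation. It is a sketch under assumptions: `HINT_INLINE` and `FORCE_INLINE` are hypothetical macro names invented here (nimbase.h has its own `N_INLINE`), and the choice to keep `static` in both is mine, not something the linked answer endorses. `__attribute__((always_inline))` is the GCC/Clang spelling and `__forceinline` the MSVC one.

```c
/* Hypothetical macros for comparing an inline hint with a forced inline. */
#if defined(__GNUC__) || defined(__clang__)
#  define HINT_INLINE(rettype, name)  static inline rettype name
#  define FORCE_INLINE(rettype, name) \
     static inline __attribute__((always_inline)) rettype name
#elif defined(_MSC_VER)
#  define HINT_INLINE(rettype, name)  static __inline rettype name
#  define FORCE_INLINE(rettype, name) static __forceinline rettype name
#else
   /* Unknown compiler: fall back to a plain hint for both. */
#  define HINT_INLINE(rettype, name)  static inline rettype name
#  define FORCE_INLINE(rettype, name) static inline rettype name
#endif

/* Hint only: the compiler is free to emit a real call. */
HINT_INLINE(long, fun2x_hint)(long a) {
    return a * 2;
}

/* Request inlining regardless of the optimization level. */
FORCE_INLINE(long, fun2x_force)(long a) {
    return a * 2;
}

/* A caller to inspect in the generated assembly. */
long call_both(long a) {
    return fun2x_hint(a) + fun2x_force(a);
}
```

Compiling this without optimization and reading the assembly for `call_both` (for example with `gcc -O0 -S`) should show whether each callee survives as a separate call. I would expect only the hinted one to, but that is exactly what needs checking per compiler.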

  • P4 #define N_INLINE_PTR(rettype, name) rettype (*name) is dead code: no use of N_INLINE_PTR (and no idea what it's for)

  • P5 we should have a compiler option --ignoreInline to skip inlining (for debug and maybe even release builds; it's an orthogonal decision), which helps with debugging, especially stacktraces
    => this would only control nim code that cgen's inline, and would NOT be conflated with C-specific flags that control inlining, so as not to interfere with other libraries; user can always pass --passC:XXX in addition to that if needed

  • P6 --ignoreInline shall be usable in cmdline as well as in {.push.}/{.pop.} sections just like --rangeChecks + friends

  • P7 an imported {.inline.} proc currently gets its C source code duplicated in every module it's used, which could potentially slow down c code compilation (but no need to fix if benchmarking reveals it's not a bottleneck). So it's some kind of double inlining since we already have the N_INLINE macro static N_INLINE(NI, fun2x__rxUfcN4ZvcApCx0waSVH8At10303b)(NI a) {...}
    Instead we can use what's recommended here https://gustedt.wordpress.com/2010/11/29/myth-and-reality-about-inline-in-c99/ and here https://stackoverflow.com/a/43442230/1426932: put the inline proc definition in a header file, inline NI fun2x__rxUfcN4ZvcApCx0waSVH8At10303b(NI a) { ... }, included by all clients of it, and then use an extern declaration in a single location in one source file

// in inline_definitions.h
inline double dabs(double x) { return x < 0.0 ? -x : x; }
// in exactly one compilation unit inline_definitions.c
extern inline double dabs(double x);

Nowadays, where the C99 standard still isn’t completely implemented in many compilers, many of them still get it wrong. Gcc corrected its first approach and now (since version 4.2.1) works with the method above. Others that mimic gcc like opencc or icc are still stuck on the old model that gcc had once invented. The include file p99_compiler.h of P99 helps you make sure that your code will work even with these compilers
