Address multiple-parent `:lincomb` in C code generation #77

mfasi · 2024-01-12T21:50:27Z

No description provided.

+ is_trivial_left + is_trivial_right + has_identity_lincomb + has_trivial_nodes

+ has_lincomb_with_identity + has_identity_lincomb

+ Full support for OpenBLAS + Support for real types for MKL

+ Support for complex types for MKL

codecov-commenter · 2024-01-12T21:51:37Z

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (a8ec53a) 92.41% compared to head (78753f9) 94.29%.

Files	Patch %	Lines
src/code_gen/gen_c_code.jl	97.98%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #77      +/-   ##
==========================================
+ Coverage   92.41%   94.29%   +1.87%     
==========================================
  Files          30       30              
  Lines        2546     2560      +14     
==========================================
+ Hits         2353     2414      +61     
+ Misses        193      146      -47

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/compress_graph.jl

jarlebring · 2024-01-15T15:48:03Z

Julia code:

julia> (g,_)=graph_ps_degopt([2.2,3,4,0,3.3,0.9,0.1]);
julia> gen_code("/tmp/bla.c",g,lang=LangC_MKL());
julia> eval_graph(g,[0 3.0; 0 0])
2×2 Matrix{Float64}:
 2.2  9.0
 0.0  2.2

C-code /tmp/mainfile.c badly written...

#include<stdio.h>
#include "bla.c"
typedef double blas_type;
int main() {
    size_t i;
    size_t n = 2;
    blas_type *A = malloc(n * n * sizeof(*A));
    for(i = 0; i < n * n; i++){
      A[i] = 0.0;
    }
    A[1]=3.0;
    printf("A\n%lf\n",A[0]);
    printf("%lf\n",A[1]);
    printf("%lf\n",A[2]);
    printf("%lf\n",A[3]);
    blas_type *B = malloc(n * n * sizeof(*A));
    ddummy(A, n, B);
    printf("B\n%lf\n",B[0]);
    printf("%lf\n",B[1]);
    printf("%lf\n",B[2]);
    printf("%lf\n",B[3]);
    return 0;
}

bash:

jarl@beech:/tmp$ gcc -o main_compiled ./mainfile.c -lmkl_rt 
jarl@beech:/tmp$ ./main_compiled 
A
0.000000
3.000000
0.000000
0.000000
B
2.200000
3.000000
0.000000
2.200000

I think the julia and c output are not consistent. Should be 9 instead of 3. Right?

jarlebring · 2024-01-15T16:00:09Z

Generated code

    /* Computing P1 with operation: lincomb */
    /* Computing P1 = x*A3 + x*B_1_2 */
    double coeff_P1_1 = 0.1;
    double coeff_P1_2 = 1.0;
   /* Smart lincomb recycle B_1_2. */
    for (size_t i = 0; i < n * n; i += n + 1) {
        *(memslots[2] + i) = coeff_P1_1 * *(memslots[4] + i) +
            coeff_P1_2 * *(memslots[2] + i);
    }

Should it really be i += n + 1 ? We want to do this on all elements, so shouldn't it be i++ ? For-loops adding multiples of the identity should still have i += n+1.

mfasi · 2024-01-15T16:35:12Z

Thank you for spotting that bug! Your diagnosis is correct. I introduced that bug when switching to memset – I must have copied the wrong line. I've fixed that commit in c1e0dc7 and I've updated the branch.

jarlebring · 2024-01-15T16:48:20Z

Shouldn't there be a coeff_Bb4_1?

    double coeff_Bb4_2 = 0.626189194268691;
    double coeff_Bb4_3 = 0.796941757132641;
    double coeff_Bb4_4 = 0.21311590849480466;
    for (size_t i = 0; i < n * n; i++) {
        *(memslots[3] + i) = coeff_Bb4_2 * *(memslots[0] + i) + 
            coeff_Bb4_3 * *(memslots[1] + i) + 
            coeff_Bb4_4 * *(memslots[2] + i);
    }
    double coeff_Bb4_0 = 0.5728465547708623;
    for (size_t i = 0; i < n * n; i += n + 1) {
        *(memslots[3] + i) += coeff_Bb4_0;
    }

jarlebring · 2024-01-15T16:56:32Z

For me, the generated main for complex numbers does not work.

julia> (g,_)=graph_ps_degopt(rand(10)+1im*rand(10));
julia> gen_code("/tmp/bla.c",g,lang=LangC_MKL(true));

 jarl@beech:/tmp$ gcc -o main_compiled /tmp/bla.c -lmkl_rt
/tmp/bla.c: In function ‘main’:
/tmp/bla.c:208:16: error: incompatible types when assigning to type ‘blas_type’ {aka ‘struct _MKL_Complex16’} from type ‘double’
  208 |         A[i] = rand() / (1.0 * RAND_MAX);

mfasi · 2024-01-15T17:16:46Z

Shouldn't there be a coeff_Bb4_1?

    double coeff_Bb4_2 = 0.626189194268691;
    double coeff_Bb4_3 = 0.796941757132641;
    double coeff_Bb4_4 = 0.21311590849480466;
    for (size_t i = 0; i < n * n; i++) {
        *(memslots[3] + i) = coeff_Bb4_2 * *(memslots[0] + i) + 
            coeff_Bb4_3 * *(memslots[1] + i) + 
            coeff_Bb4_4 * *(memslots[2] + i);
    }
    double coeff_Bb4_0 = 0.5728465547708623;
    for (size_t i = 0; i < n * n; i += n + 1) {
        *(memslots[3] + i) += coeff_Bb4_0;
    }

Currently, I'm using as index the position of the matrix as parent in the lincomb node, and using the special index 0 to accumulate the coefficients of all the identities. I can change this to use a counter that is incremented every time a new coefficient is found – will do.

For me, the generated main for complex numbers does not work.

julia> (g,_)=graph_ps_degopt(rand(10)+1im*rand(10));
julia> gen_code("/tmp/bla.c",g,lang=LangC_MKL(true));

 jarl@beech:/tmp$ gcc -o main_compiled /tmp/bla.c -lmkl_rt
/tmp/bla.c: In function ‘main’:
/tmp/bla.c:208:16: error: incompatible types when assigning to type ‘blas_type’ {aka ‘struct _MKL_Complex16’} from type ‘double’
  208 |         A[i] = rand() / (1.0 * RAND_MAX);

I don't think this has ever worked for complex types with LangC_MKL(), because assigning a single floating-point number is not valid with MKL's complex types. This PR is probably a good chance to fix this, however.

+ Use consecutive indices starting from 0 for non-identity nodes + Do not generate code for combinations with coefficient 0

src/code_gen/gen_c_code.jl

jarlebring · 2024-01-16T08:42:08Z

Good idea to try to use the opportunity to fix the generated main-file for complex numbers.

Here is another small test that almost works. Maybe I got something wrong in MKL types...

Julia:

julia> (g,_)=graph_ps_degopt([3 0.3 2.2 1.0 0.0 .01] + 1im*[0.1 0.2 0.01 0.1 0.03 -0.1]);
julia> gen_code("/tmp/bla.c",g,lang=LangC_MKL());
julia> A=[10.0+0.1im 3.0; 0.1 1.1];
julia> eval_graph(g,A)
2×2 Matrix{ComplexF64}:
 2733.37-9629.62im  879.357-3243.66im
 29.3119-108.122im  16.4857-36.0639im

C-code:

#include<stdio.h>
#include "bla.c"
typedef MKL_Complex16 blas_type;
int main() {
    size_t i;
    size_t n = 2;
    blas_type *A = malloc(n * n * sizeof(*A));
    for(i = 0; i < n * n; i++){
      A[i].real = 0.0;
      A[i].imag = 0.0;
    }
    A[0].real=10.0;
    A[0].imag=0.1;
    A[1].real=3.0;
    A[2].real=0.1;
    A[3].real=1.1;
    printf("A[1,1]=%lf + 1im* %lf\n",A[0].real, A[0].imag);
    printf("A[1,2]=%lf + 1im* %lf\n",A[1].real, A[1].imag);
    printf("A[2,1]=%lf + 1im* %lf\n",A[2].real, A[2].imag);
    printf("A[2,2]=%lf + 1im* %lf\n",A[3].real, A[3].imag);
    blas_type *B = malloc(n * n * sizeof(*A));
    zdummy(A, n, B);
    printf("B[1,1]=%lf + 1im* %lf\n",B[0].real, B[0].imag);
    printf("B[1,2]=%lf + 1im* %lf\n",B[1].real, B[1].imag);
    printf("B[2,1]=%lf + 1im* %lf\n",B[2].real, B[2].imag);
    printf("B[2,2]=%lf + 1im* %lf\n",B[3].real, B[3].imag);
    return 0;
}

bash:

jarl@beech:/tmp$gcc -o main_compiled ./mainfile.c -lmkl_rt
jarl@beech:/tmp$ ./main_compiled 
A[1,1]=10.000000 + 1im* 0.100000
A[1,2]=3.000000 + 1im* 0.000000
A[2,1]=0.100000 + 1im* 0.000000
A[2,2]=1.100000 + 1im* 0.000000
B[1,1]=2733.367085 + 1im* -9631.026233
B[1,2]=879.357063 + 1im* -3244.084110
B[2,1]=29.311902 + 1im* -108.136137
B[2,2]=16.485661 + 1im* -36.217942

Real parts are "okay". Imaginary parts different but not super different?

mfasi · 2024-01-16T10:01:47Z

It looks like a bug – I'll try to debug it more carefully.

One unrelated comment is that the matrices should be contiguous by columns rather than by row (the BLAS and LAPACK routines use CblasColMajor and LAPACK_COL_MAJOR, respectively), so A[1] and A[2] should be swapped.

mfasi · 2024-01-16T10:46:22Z

Found the bug. The two functions:

// Compute c <- c + a * b.
void fma_MKL_Complex16(MKL_Complex16 *c,
                    const MKL_Complex16 *a,
                    const MKL_Complex16 *b) {
    c->real += a->real * b->real - a->imag * b->imag;
    c->imag += a->real * b->imag + a->imag * b->real;
}

// Compute c <- a * b.
void prod_MKL_Complex16(MKL_Complex16 *c,
                    const MKL_Complex16 *a,
                    const MKL_Complex16 *b) {
    c->real = a->real * b->real - a->imag * b->imag;
    c->imag = a->real * b->imag + a->imag * b->real;
}

should really be

// Compute c <- c + a * b.
void fma_MKL_Complex16(MKL_Complex16 *c,
                    const MKL_Complex16 *a,
                    const MKL_Complex16 *b) {
    double real_part = a->real * b->real - a->imag * b->imag;
    double imag_part = a->real * b->imag + a->imag * b->real;
    c->real += real_part;
    c->imag += imag_part;
}

// Compute c <- a * b.
void prod_MKL_Complex16(MKL_Complex16 *c,
                    const MKL_Complex16 *a,
                    const MKL_Complex16 *b) {
    double real_part = a->real * b->real - a->imag * b->imag;
    double imag_part = a->real * b->imag + a->imag * b->real;
    c->real = real_part;
    c->imag = imag_part;
}

because c can be the same as a or b if we are reusing memory.

Fixed in my latest commit.

src/code_gen/gen_c_code.jl

jarlebring · 2024-01-16T18:38:54Z

Nice work!

When I export with LangC_OpenBLAS(true), I get this compilation error:

/tmp/bla.c:9:13: error: unknown type name ‘openblas_complex_double’; did you mean ‘lapack_complex_double’?
    9 | void zdummy(openblas_complex_double *A, const size_t n, openblas_complex_double *output) {

It could be my openblas installation, which I know is a bit funny.

jarlebring · 2024-01-16T18:54:09Z

GraphMatFun.jl/src/code_gen/gen_c_code.jl

Lines 623 to 626 in fac0e48

    
           function gen_main(lang::LangC, T, fname, funname, graph; A = 10::Union{Integer,Matrix}) 
        
               code = init_code(lang) 
        
               if (lang.gen_main)

As far as I understand, there is no way for a user to influence kwarg A. An end-user anyway shouldn't directly call gen_main. Maybe we can just remove support for what is not default. That way we can also remove print_indented_matrix.

The alternative is to expose it to the user through a parameter in the LangC-type, but it seems a bit much for functionality that has never been used. "The best code is no code"

This is actually already present before this PR, but when we are anyway messing with this code...

mfasi · 2024-01-17T11:17:03Z

lapack_complex_double

It works with both openblas_complex_double and lapack_complex_double, so I suggest sticking to the latter if that means it will work for you as well. I've made the change.

mfasi · 2024-01-17T11:23:32Z

GraphMatFun.jl/src/code_gen/gen_c_code.jl

Lines 623 to 626 in fac0e48

function gen_main(lang::LangC, T, fname, funname, graph; A = 10::Union{Integer,Matrix})

code = init_code(lang)

if (lang.gen_main)

As far as I understand, there is no way for a user to influence kwarg A. An end-user anyway shouldn't directly call gen_main. Maybe we can just remove support for what is not default. That way we can also remove print_indented_matrix.

The alternative is to expose it to the user through a parameter in the LangC-type, but it seems a bit much for. "The best code is no code"

This is actually already present before this PR, but when we are anyway messing with this code...

Indeed I was a bit puzzled by the kwarg that cannot be used – there must have been a way at some point in the past, but I don't remember how we did it. I've removed A and all related functionalities (which were not – and I believe could not – be covered by our tests).

jarlebring · 2024-01-17T11:24:44Z

Great! LGTM 👍

mfasi · 2024-01-17T11:29:56Z

One other thing I noticed by looking at the generated code is that we now have a number of new trivial lincomb nodes, such as

    X ← + α × A + 0 × B

which should really be

    X ← + α × A             (a)

or

X ← 1 × A

which is just a copy and should be removed. In fact, nodes of the form (a) could also be removed by pushing α to the corresponding coefficient(s) of subsequent lincombs, although this seems a bit messier to do.

mfasi · 2024-01-17T11:31:06Z

Great! LGTM 👍

Great! I'll create an issue from my last comment and let's discuss it there.

jarlebring · 2024-01-17T11:34:45Z

    X ← + α × A + 0 × B

which should really be

    X ← + α × A             (a)

or

I actually think we this type of code optimization should not be done in the code generation. A user can always run compress_trivial! before calling gen_code and those cases will be removed. It can be handled at graph-level. I think we get cleaner code-generation code if we see the code generator as being a bit stupid. Code generation philosophy: "Generate code for exactly this graph".

mfasi added 13 commits January 12, 2024 07:51

Refactor compress_graphs.jl

7853e9d

+ is_trivial_left + is_trivial_right + has_identity_lincomb + has_trivial_nodes

Add function to check if graph has lincomb with identity

fa93b89

Add tests for graph compression

b623cd2

+ has_lincomb_with_identity + has_identity_lincomb

Fix spacing in generated C code

3f2027c

Fix spacing in generated Julia code

fd0c432

Add partial support for lincomb with multiple parents in C

f34a262

+ Full support for OpenBLAS + Support for real types for MKL

Complete support for lincomb with multiple parents in C

2f623a5

+ Support for complex types for MKL

Fix error with lincomb of identity in C code generators

0e4e0c9

Remove repeated code

0fc78b7

Use per-lincomb coefficient declaration

f3829a8

Remove now unused allocation in C code generation

faa5289

Improve formatting of generated code

e0f63a7

Reduce number of operations if coefficients of lincomb are real

cba0acc

mfasi requested a review from jarlebring January 12, 2024 21:50

mfasi force-pushed the max-changes branch from ca53084 to 85f9ecb Compare January 13, 2024 09:48

jarlebring reviewed Jan 13, 2024

View reviewed changes

src/compress_graph.jl Show resolved Hide resolved

mfasi force-pushed the max-changes branch from 9732771 to fadce65 Compare January 14, 2024 17:12

mfasi added 5 commits January 15, 2024 16:25

Use memset to initialize empty matrices

c1e0dc7

Add tests to increase line coverage

950d92a

Add tests to increase coverage of matrix inversion

26544b1

Add test to check if node with single lincomb is trivial

27749c0

Increase test coverage of Julia code generator

1af8990

mfasi force-pushed the max-changes branch from fadce65 to 1af8990 Compare January 15, 2024 16:26

Improve coefficients of lincomb nodes in C generated code

ee32732

+ Use consecutive indices starting from 0 for non-identity nodes + Do not generate code for combinations with coefficient 0

mfasi force-pushed the max-changes branch from 4d16819 to ee32732 Compare January 15, 2024 17:29

mfasi added 2 commits January 15, 2024 21:19

Remove unnecessary operations in generated C code

b8e774f

Fix bug in generated C code

6e5fd67

jarlebring reviewed Jan 16, 2024

View reviewed changes

src/code_gen/gen_c_code.jl Outdated Show resolved Hide resolved

Add minimal documentation for fma_, prod_, acc_ functions

0b3915c

Fix bug in generated MKL C code

db33faf

jarlebring reviewed Jan 16, 2024

View reviewed changes

src/code_gen/gen_c_code.jl Outdated Show resolved Hide resolved

mfasi added 2 commits January 16, 2024 17:31

Fix generation of random matrix for MKL complex types

4cec625

Simplify data passed across functions

fac0e48

Switch to lapack_complex_ types for OpenBLAS generated code

68a2e88

Remove unreachable matrix generation functionality in C generator

78753f9

mfasi mentioned this pull request Jan 17, 2024

New trivial lincomb nodes #78

Closed

mfasi merged commit e4d6efd into main Jan 17, 2024

Address multiple-parent :lincomb in C code generation #77

Address multiple-parent :lincomb in C code generation #77

Uh oh!

Conversation

mfasi commented Jan 12, 2024

Uh oh!

codecov-commenter commented Jan 12, 2024 • edited by codecov bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

jarlebring commented Jan 15, 2024

Uh oh!

jarlebring commented Jan 15, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mfasi commented Jan 15, 2024

Uh oh!

jarlebring commented Jan 15, 2024

Uh oh!

jarlebring commented Jan 15, 2024

Uh oh!

mfasi commented Jan 15, 2024

Uh oh!

Uh oh!

jarlebring commented Jan 16, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mfasi commented Jan 16, 2024

Uh oh!

mfasi commented Jan 16, 2024

Uh oh!

Uh oh!

jarlebring commented Jan 16, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jarlebring commented Jan 16, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mfasi commented Jan 17, 2024

Uh oh!

mfasi commented Jan 17, 2024

Uh oh!

jarlebring commented Jan 17, 2024

Uh oh!

mfasi commented Jan 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mfasi commented Jan 17, 2024

Uh oh!

jarlebring commented Jan 17, 2024

Uh oh!

Uh oh!

Address multiple-parent `:lincomb` in C code generation #77

Address multiple-parent `:lincomb` in C code generation #77

codecov-commenter commented Jan 12, 2024 •

edited by codecov bot

Loading

jarlebring commented Jan 15, 2024 •

edited

Loading

jarlebring commented Jan 16, 2024 •

edited

Loading

jarlebring commented Jan 16, 2024 •

edited

Loading

jarlebring commented Jan 16, 2024 •

edited

Loading

mfasi commented Jan 17, 2024 •

edited

Loading