Skip to content

Conversation

mfasi
Copy link
Collaborator

@mfasi mfasi commented Jan 12, 2024

No description provided.

@mfasi mfasi requested a review from jarlebring January 12, 2024 21:50
@codecov-commenter
Copy link

codecov-commenter commented Jan 12, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (a8ec53a) 92.41% compared to head (78753f9) 94.29%.

Files Patch % Lines
src/code_gen/gen_c_code.jl 97.98% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #77      +/-   ##
==========================================
+ Coverage   92.41%   94.29%   +1.87%     
==========================================
  Files          30       30              
  Lines        2546     2560      +14     
==========================================
+ Hits         2353     2414      +61     
+ Misses        193      146      -47     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jarlebring
Copy link
Collaborator

Julia code:

julia> (g,_)=graph_ps_degopt([2.2,3,4,0,3.3,0.9,0.1]);
julia> gen_code("/tmp/bla.c",g,lang=LangC_MKL());
julia> eval_graph(g,[0 3.0; 0 0])
2×2 Matrix{Float64}:
 2.2  9.0
 0.0  2.2

C-code /tmp/mainfile.c badly written...

#include<stdio.h>
#include "bla.c"
typedef double blas_type;
int main() {
    size_t i;
    size_t n = 2;
    blas_type *A = malloc(n * n * sizeof(*A));
    for(i = 0; i < n * n; i++){
      A[i] = 0.0;
    }
    A[1]=3.0;
    printf("A\n%lf\n",A[0]);
    printf("%lf\n",A[1]);
    printf("%lf\n",A[2]);
    printf("%lf\n",A[3]);
    blas_type *B = malloc(n * n * sizeof(*A));
    ddummy(A, n, B);
    printf("B\n%lf\n",B[0]);
    printf("%lf\n",B[1]);
    printf("%lf\n",B[2]);
    printf("%lf\n",B[3]);
    return 0;
}

bash:

jarl@beech:/tmp$ gcc -o main_compiled ./mainfile.c -lmkl_rt 
jarl@beech:/tmp$ ./main_compiled 
A
0.000000
3.000000
0.000000
0.000000
B
2.200000
3.000000
0.000000
2.200000

I think the julia and c output are not consistent. Should be 9 instead of 3. Right?

@jarlebring
Copy link
Collaborator

jarlebring commented Jan 15, 2024

Generated code

    /* Computing P1 with operation: lincomb */
    /* Computing P1 = x*A3 + x*B_1_2 */
    double coeff_P1_1 = 0.1;
    double coeff_P1_2 = 1.0;
   /* Smart lincomb recycle B_1_2. */
    for (size_t i = 0; i < n * n; i += n + 1) {
        *(memslots[2] + i) = coeff_P1_1 * *(memslots[4] + i) +
            coeff_P1_2 * *(memslots[2] + i);
    }

Should it really be i += n + 1 ? We want to do this on all elements, so shouldn't it be i++ ? For-loops adding multiples of the identity should still have i += n+1.

@mfasi
Copy link
Collaborator Author

mfasi commented Jan 15, 2024

Thank you for spotting that bug! Your diagnosis is correct. I introduced that bug when switching to memset – I must have copied the wrong line. I've fixed that commit in c1e0dc7 and I've updated the branch.

@jarlebring
Copy link
Collaborator

Shouldn't there be a coeff_Bb4_1?

    double coeff_Bb4_2 = 0.626189194268691;
    double coeff_Bb4_3 = 0.796941757132641;
    double coeff_Bb4_4 = 0.21311590849480466;
    for (size_t i = 0; i < n * n; i++) {
        *(memslots[3] + i) = coeff_Bb4_2 * *(memslots[0] + i) + 
            coeff_Bb4_3 * *(memslots[1] + i) + 
            coeff_Bb4_4 * *(memslots[2] + i);
    }
    double coeff_Bb4_0 = 0.5728465547708623;
    for (size_t i = 0; i < n * n; i += n + 1) {
        *(memslots[3] + i) += coeff_Bb4_0;
    }

@jarlebring
Copy link
Collaborator

For me, the generated main for complex numbers does not work.

julia> (g,_)=graph_ps_degopt(rand(10)+1im*rand(10));
julia> gen_code("/tmp/bla.c",g,lang=LangC_MKL(true));
 jarl@beech:/tmp$ gcc -o main_compiled /tmp/bla.c -lmkl_rt
/tmp/bla.c: In function ‘main’:
/tmp/bla.c:208:16: error: incompatible types when assigning to type ‘blas_type’ {aka ‘struct _MKL_Complex16’} from type ‘double’
  208 |         A[i] = rand() / (1.0 * RAND_MAX);

@mfasi
Copy link
Collaborator Author

mfasi commented Jan 15, 2024

Shouldn't there be a coeff_Bb4_1?

    double coeff_Bb4_2 = 0.626189194268691;
    double coeff_Bb4_3 = 0.796941757132641;
    double coeff_Bb4_4 = 0.21311590849480466;
    for (size_t i = 0; i < n * n; i++) {
        *(memslots[3] + i) = coeff_Bb4_2 * *(memslots[0] + i) + 
            coeff_Bb4_3 * *(memslots[1] + i) + 
            coeff_Bb4_4 * *(memslots[2] + i);
    }
    double coeff_Bb4_0 = 0.5728465547708623;
    for (size_t i = 0; i < n * n; i += n + 1) {
        *(memslots[3] + i) += coeff_Bb4_0;
    }

Currently, I'm using as index the position of the matrix as parent in the lincomb node, and using the special index 0 to accumulate the coefficients of all the identities. I can change this to use a counter that is incremented every time a new coefficient is found – will do.

For me, the generated main for complex numbers does not work.

julia> (g,_)=graph_ps_degopt(rand(10)+1im*rand(10));
julia> gen_code("/tmp/bla.c",g,lang=LangC_MKL(true));
 jarl@beech:/tmp$ gcc -o main_compiled /tmp/bla.c -lmkl_rt
/tmp/bla.c: In function ‘main’:
/tmp/bla.c:208:16: error: incompatible types when assigning to type ‘blas_type’ {aka ‘struct _MKL_Complex16’} from type ‘double’
  208 |         A[i] = rand() / (1.0 * RAND_MAX);

I don't think this has ever worked for complex types with LangC_MKL(), because assigning a single floating-point number is not valid with MKL's complex types. This PR is probably a good chance to fix this, however.

+ Use consecutive indices starting from 0 for non-identity nodes
+ Do not generate code for combinations with coefficient 0
@jarlebring
Copy link
Collaborator

jarlebring commented Jan 16, 2024

Good idea to try to use the opportunity to fix the generated main-file for complex numbers.

Here is another small test that almost works. Maybe I got something wrong in MKL types...

Julia:

julia> (g,_)=graph_ps_degopt([3 0.3 2.2 1.0 0.0 .01] + 1im*[0.1 0.2 0.01 0.1 0.03 -0.1]);
julia> gen_code("/tmp/bla.c",g,lang=LangC_MKL());
julia> A=[10.0+0.1im 3.0; 0.1 1.1];
julia> eval_graph(g,A)
2×2 Matrix{ComplexF64}:
 2733.37-9629.62im  879.357-3243.66im
 29.3119-108.122im  16.4857-36.0639im

C-code:

#include<stdio.h>
#include "bla.c"
typedef MKL_Complex16 blas_type;
int main() {
    size_t i;
    size_t n = 2;
    blas_type *A = malloc(n * n * sizeof(*A));
    for(i = 0; i < n * n; i++){
      A[i].real = 0.0;
      A[i].imag = 0.0;
    }
    A[0].real=10.0;
    A[0].imag=0.1;
    A[1].real=3.0;
    A[2].real=0.1;
    A[3].real=1.1;
    printf("A[1,1]=%lf + 1im* %lf\n",A[0].real, A[0].imag);
    printf("A[1,2]=%lf + 1im* %lf\n",A[1].real, A[1].imag);
    printf("A[2,1]=%lf + 1im* %lf\n",A[2].real, A[2].imag);
    printf("A[2,2]=%lf + 1im* %lf\n",A[3].real, A[3].imag);
    blas_type *B = malloc(n * n * sizeof(*A));
    zdummy(A, n, B);
    printf("B[1,1]=%lf + 1im* %lf\n",B[0].real, B[0].imag);
    printf("B[1,2]=%lf + 1im* %lf\n",B[1].real, B[1].imag);
    printf("B[2,1]=%lf + 1im* %lf\n",B[2].real, B[2].imag);
    printf("B[2,2]=%lf + 1im* %lf\n",B[3].real, B[3].imag);
    return 0;
}

bash:

jarl@beech:/tmp$gcc -o main_compiled ./mainfile.c -lmkl_rt
jarl@beech:/tmp$ ./main_compiled 
A[1,1]=10.000000 + 1im* 0.100000
A[1,2]=3.000000 + 1im* 0.000000
A[2,1]=0.100000 + 1im* 0.000000
A[2,2]=1.100000 + 1im* 0.000000
B[1,1]=2733.367085 + 1im* -9631.026233
B[1,2]=879.357063 + 1im* -3244.084110
B[2,1]=29.311902 + 1im* -108.136137
B[2,2]=16.485661 + 1im* -36.217942

Real parts are "okay". Imaginary parts different but not super different?

@mfasi
Copy link
Collaborator Author

mfasi commented Jan 16, 2024

It looks like a bug – I'll try to debug it more carefully.

One unrelated comment is that the matrices should be contiguous by columns rather than by row (the BLAS and LAPACK routines use CblasColMajor and LAPACK_COL_MAJOR, respectively), so A[1] and A[2] should be swapped.

@mfasi
Copy link
Collaborator Author

mfasi commented Jan 16, 2024

Found the bug. The two functions:

// Compute c <- c + a * b.
void fma_MKL_Complex16(MKL_Complex16 *c,
                    const MKL_Complex16 *a,
                    const MKL_Complex16 *b) {
    c->real += a->real * b->real - a->imag * b->imag;
    c->imag += a->real * b->imag + a->imag * b->real;
}

// Compute c <- a * b.
void prod_MKL_Complex16(MKL_Complex16 *c,
                    const MKL_Complex16 *a,
                    const MKL_Complex16 *b) {
    c->real = a->real * b->real - a->imag * b->imag;
    c->imag = a->real * b->imag + a->imag * b->real;
}

should really be

// Compute c <- c + a * b.
void fma_MKL_Complex16(MKL_Complex16 *c,
                    const MKL_Complex16 *a,
                    const MKL_Complex16 *b) {
    double real_part = a->real * b->real - a->imag * b->imag;
    double imag_part = a->real * b->imag + a->imag * b->real;
    c->real += real_part;
    c->imag += imag_part;
}

// Compute c <- a * b.
void prod_MKL_Complex16(MKL_Complex16 *c,
                    const MKL_Complex16 *a,
                    const MKL_Complex16 *b) {
    double real_part = a->real * b->real - a->imag * b->imag;
    double imag_part = a->real * b->imag + a->imag * b->real;
    c->real = real_part;
    c->imag = imag_part;
}

because c can be the same as a or b if we are reusing memory.

Fixed in my latest commit.

@jarlebring
Copy link
Collaborator

jarlebring commented Jan 16, 2024

Nice work!

When I export with LangC_OpenBLAS(true), I get this compilation error:

/tmp/bla.c:9:13: error: unknown type name ‘openblas_complex_double’; did you mean ‘lapack_complex_double’?
    9 | void zdummy(openblas_complex_double *A, const size_t n, openblas_complex_double *output) {

It could be my openblas installation, which I know is a bit funny.

@jarlebring
Copy link
Collaborator

jarlebring commented Jan 16, 2024

function gen_main(lang::LangC, T, fname, funname, graph; A = 10::Union{Integer,Matrix})
code = init_code(lang)
if (lang.gen_main)

As far as I understand, there is no way for a user to influence kwarg A. An end-user anyway shouldn't directly call gen_main. Maybe we can just remove support for what is not default. That way we can also remove print_indented_matrix.

The alternative is to expose it to the user through a parameter in the LangC-type, but it seems a bit much for functionality that has never been used. "The best code is no code"

This is actually already present before this PR, but when we are anyway messing with this code...

@mfasi
Copy link
Collaborator Author

mfasi commented Jan 17, 2024

lapack_complex_double

It works with both openblas_complex_double and lapack_complex_double, so I suggest sticking to the latter if that means it will work for you as well. I've made the change.

@mfasi
Copy link
Collaborator Author

mfasi commented Jan 17, 2024

function gen_main(lang::LangC, T, fname, funname, graph; A = 10::Union{Integer,Matrix})
code = init_code(lang)
if (lang.gen_main)

As far as I understand, there is no way for a user to influence kwarg A. An end-user anyway shouldn't directly call gen_main. Maybe we can just remove support for what is not default. That way we can also remove print_indented_matrix.

The alternative is to expose it to the user through a parameter in the LangC-type, but it seems a bit much for. "The best code is no code"

This is actually already present before this PR, but when we are anyway messing with this code...

Indeed I was a bit puzzled by the kwarg that cannot be used – there must have been a way at some point in the past, but I don't remember how we did it. I've removed A and all related functionalities (which were not – and I believe could not – be covered by our tests).

@jarlebring
Copy link
Collaborator

Great! LGTM 👍

@mfasi
Copy link
Collaborator Author

mfasi commented Jan 17, 2024

One other thing I noticed by looking at the generated code is that we now have a number of new trivial lincomb nodes, such as

    X ← + α × A + 0 × B

which should really be

    X ← + α × A             (a)

or

X ← 1 × A

which is just a copy and should be removed. In fact, nodes of the form (a) could also be removed by pushing α to the corresponding coefficient(s) of subsequent lincombs, although this seems a bit messier to do.

@mfasi
Copy link
Collaborator Author

mfasi commented Jan 17, 2024

Great! LGTM 👍

Great! I'll create an issue from my last comment and let's discuss it there.

@mfasi mfasi mentioned this pull request Jan 17, 2024
@mfasi mfasi merged commit e4d6efd into main Jan 17, 2024
@jarlebring
Copy link
Collaborator

    X ← + α × A + 0 × B

which should really be

    X ← + α × A             (a)

or

I actually think we this type of code optimization should not be done in the code generation. A user can always run compress_trivial! before calling gen_code and those cases will be removed. It can be handled at graph-level. I think we get cleaner code-generation code if we see the code generator as being a bit stupid. Code generation philosophy: "Generate code for exactly this graph".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants