Refine GraalVM native-image configuration and fix native test execution #823

kkriske · 2022-12-27T01:22:39Z

Refine the native-image configuration and make inclusion of the metadata conditional
Update the jni code to load and cache all java class/field/method references in JNI_OnLoad. This makes it easier to validate native-image configuration as successfully loading the library means all required JNI configuration is present.
fix native-image test execution and include as much tests as currently possible
- Tests using AssertJ assume* calls are excluded because AssertJ relies on dynamic class loading which I couldn't get to play nice.

The commits currently are a bit of a mess, I can clean it up a bit before review if preferred.

Future work (probably separate PR):
I want to move the JNI configuration to a Feature implementation, as I once upon a time suggested here: #631 (comment). Such a feature is imo easier to maintain because it's just typechecked java code and not a json file. Additionally it would allow dynamic configuration of which resources get included in the native-image, so only the relevant library for the target OS is included instead of all of them reducing the overall size of the resulting executable. It would also enable unpacking the required library during the build, placing it next to the target exeuctable so it doesn't need to be unpacked to a temp dir at all. This would benefit the startup time of the image.

/cc @gotson Taking the liberty to tag you. The git history and this branch of yours tell me you've worked on this part of the code before.

…t execution

gotson · 2023-01-01T13:04:22Z

/cc @gotson Taking the liberty to tag you. The git history and this branch of yours tell me you've worked on this part of the code before.

I'm also the maintainer of the repo :) I will need some time to review this, might ask some questions when i do.

gotson · 2023-01-03T09:05:13Z

Hi @kkriske , i had a look at the PR, but i don't understand most of the changes. What are the changes in the native code and why are they required?

and make inclusion of the metadata conditional

what does that mean ?

Update the jni code to load and cache all java class/field/method references in JNI_OnLoad. This makes it easier to validate native-image configuration as successfully loading the library means all required JNI configuration is present.

I don't understand this bit.

The commits currently are a bit of a mess, I can clean it up a bit before review if preferred.

No need, it will be squashed :)

kkriske

I don't know how up-to-date you are with how GraalVM native-image works. It works with a closed world assumption, because to convert everything to native-code you need to know everything that will be executed. Everything that's not used will be removed but some usages are pretty much impossible to detect using bytecode analysis, for example reflection and JNI calls. So that's why the jni-config.json file is required, to tell native-image which JNI calls will be required so the needed metadata for calling it is available, and the method is not entirely removed either because it may not be detected as being used.

As a library however, we can't assume that just because we're on the classpath means we're used. So the config metadata should always be conditional. If we don't do that, we will always force these jni calls and the reachable code through them to be included in the native-image. So by adding this condition:

"condition": {
    "typeReachable": "org.sqlite.core.NativeDB"
}

The JNI calls will only be included if the bytecode analysis detects that the org.sqlite.core.NativeDB class is actually used.
So if sqlite-jdbc is on the classpath, but never used/called, we also don't include any code.

For the resource config, there is no such thing as conditional inclusion yet, which is why I want to do a follow-up to create a Feature implementation, because it is possible to do it through the java native-image API. It's then also possible to only include the resources for the relevant target OS instead of everything.

Update the jni code to load and cache all java class/field/method references in JNI_OnLoad. This makes it easier to validate native-image configuration as successfully loading the library means all required JNI configuration is present.

I moved all FindClass, GetFieldID and GetMethodID calls to the JNI_OnLoad function. This makes maintenance easier in my opinion as it groups all native->java lookups together. Every call of one of these functions results in an entry in the jni-config.json file, so grouping them also makes it easier to keep the native lookups and the required native-image configuration in sync.
This is partially what I ment with This makes it easier to validate native-image configuration. A second part of this is that if you're running in native-image and the library loads successfully (meaning the JNI_OnLoad does not result in an exception) then all native->java lookups already happened and you're certain no additional calls will happen at runtime for which you may have overlooked the configuration (this was the case for FindClass(env, "[Z") for example).

What this boils down to, is that if the native test execution in the GitHub Actions succeeds, we can be certain that the configuration is complete.

What are the changes in the native code and why are they required?

For the remaining native code changes, I have added some comments in the code. Let me know if I glossed over some other stuff that needs explaining.

kkriske · 2023-01-03T09:37:38Z

src/main/java/org/sqlite/core/NativeDB.c

+    exclass = (*env)->FindClass(env, "java/lang/Throwable");
+    if(!exclass) return JNI_ERR;
+    exclass = (*env)->NewWeakGlobalRef(env, exclass);


Lack of global reference to this class caused the Segfault mentioned in the surefire exclusions. The class reference got garbage collected which invalidates the jmethodID which caused usage of freed memory.

It seems bizarre to me that java.lang.Throwable can be unloaded when we instantiate an Exception where Throwable is an interface.

Also, since this is a call to toString(), I think that we can use java.lang.Object, rather than Throwable.

Also, as we're holding a weak global reference, I'm confused at what would be stopping the VM from unloading the class fully, as a weak reference shouldn't prevent this.

With this said, I am not super familiar with these APIs.

I am conflused why NewWeakGlobalRef calls are used as well, I would expect those to be NewGlobalRef instead, but I merely expanded on what was already there.

As for the Throwable reference, the class doesn't get unloaded, but the native reference is. I'm not entirely sure what causes this, possibly the garbage collector moving around where the class is stored in memory, while creating the global ref makes sure it stays valid? This was definitely a bug which for some reason does not seem to manifest itself in a JVM run. It could be possible to force it to happen in memory constrained situations?

Looking into this more, I think what we ran into is this: https://stackoverflow.com/questions/11907832/jni-local-vs-global-reference-is-not-a-valid-jni-reference

It appears we're just lucky that the regular JVM isn't moving these classes around.

It seems that the proper fix is that these should be all global references (these classes should not be unloaded), or FindClass should be called in the places where these classes are used. Calling FindClass more often is likely cleaner, but allows for more runtime exceptions due to graal native image misconfiguration and may be slower.

kkriske · 2023-01-03T09:37:42Z

src/main/java/org/sqlite/core/NativeDB.c

+    bool_array_class = (*env)->FindClass(env, "[Z");
+    if(!bool_array_class) return JNI_ERR;
+    bool_array_class = (*env)->NewWeakGlobalRef(env, bool_array_class);


The jni-config entry for boolean[] was missing, which caused the java.sql.SQLException: Out of memory mentioned in the surefire exclusions.

Do we need to maintain a global weak reference to this, can we just make sure that the returned class is not null which implies it is accessible from JNI?

Do you mean not storing the class ref and doing the FindClass call again later when needed? While that would work, I think it would be a bit pointless. The lookup itself is "slow" so it makes sense to cache it if the lookup happens anyway in the JNI_OnLoad. It would also break with my suggestion to keep JNI lookups confined in the JNI_OnLoad function.

kkriske · 2023-01-03T09:40:43Z

src/main/java/org/sqlite/core/NativeDB.c

-        busyHandlerContext->methodId = (*env)->GetMethodID(  env,
-                                                   (*env)->GetObjectClass(env, busyHandlerContext->obj),
-                                                   "callback",
-                                                   "(I)I");


This did a lookup to BusyHandler#callback for every implementation of BusyHandler that was passed to this function. It is sufficient to store the reference to the interface method, which will allow calling it on every subclass as well.

The lookup on the subclasses also ment that a jni-config entry would have been needed for every used implementation class which pushes the configuration requirement to the end-user. By storing the interface method id there is no more configuration needed by the user.

Nice fix (:+1:

kkriske · 2023-01-03T09:41:13Z

src/main/java/org/sqlite/core/NativeDB.c

-        progressHandlerContext->mth = (*env)->GetMethodID(  env,
-                                                   (*env)->GetObjectClass(env, progressHandlerContext->phandler),
-                                                   "progress",
-                                                   "()I");


Same thing as BusyHandler#callback.

kkriske · 2023-01-03T09:45:14Z

src/test/resources/META-INF/native-image/org.xerial/sqlite-jdbc/reflect-config.json

@@ -0,0 +1,15 @@
+[


This config is only added to the test-suite, so it is not provided to the end-user.
It is only required when using rowsets, which is not code maintained by sqlite-jdbc so it's not up to us to provide the user with this config. It would also cause a direct dependency to java internals which may not be compatible over all jdks/versions.

Same thing for the resource-config.json next to this file.

gotson · 2023-01-04T06:39:25Z

Thanks @kkriske for the details, that is very useful.

gotson · 2023-01-04T06:40:48Z

Hi @pyckle , if you have a bit of spare time, would you mind reviewing the changes done in the native C code ? 🙏🏻

pom.xml

pyckle · 2023-01-05T08:57:41Z

I'm happy to review, but I won't be able to until next week.

gotson · 2023-01-05T08:58:24Z

I'm happy to review, but I won't be able to until next week.

no rush, everyone's working for free here, and your help and expertise is much appreciated !

kkriske · 2023-01-05T09:04:42Z

FYI, I am not a C or JNI expert, I'm just working with whatever knowledge I've picked up along the way.

pettyalex · 2023-01-05T21:19:52Z

Awesome, I wish I would have checked this before debugging the failure to run in a native image myself.

I did notice one difference between your jni-config.json and mine: my tracing agent also found

  {
    "name":"java.lang.Boolean",
    "methods":[{"name":"getBoolean","parameterTypes":["java.lang.String"] }]
  },

in addition to Throwable. Is that necessary with the JNI access to boolean arrays, or is my traced usage probably coming from somewhere in Spring Boot?

kkriske · 2023-01-05T22:32:06Z

Looks like I'll have to do a small update to accomodate the changes of #818

kkriske · 2023-01-25T23:19:42Z

So about the "github actions bug", turns out I messed up the config. Who would've thought that I was at fault and not GitHub 🙄

Build is green again: https://github.com/kkriske/sqlite-jdbc/actions/runs/4010885484

gotson · 2023-01-26T07:12:47Z

@kkriske - can you add pyckle@cad5235 into your branch and then we can see if graal 17 native image works?

Hi @pyckle , i'm not sure that's right for the second change, which is for Android.

gotson · 2023-01-26T07:17:52Z

Linux-aarch64 is not defined in the known_targets in Makefile.common, so it would default to using the Default values. On top of that we should probably add the -lm to other Linux architectures like Linux-x86, Linux-arm, Linux-armv6 and Linux-armv7.

pyckle · 2023-01-26T07:42:22Z

@gotson - nice catch. I think you're correct on all of your points.

Adding -lm to all of the linux targets would be proper. And it's certainly necessary on the Linux aarch64 target as graal native-image will likely be used on aarch64. The other targets are less important, as graalvm native-image can't make 32 bits arm/x86 images (nor is it likely they will).

@kkriske - can you make these changes? Would you prefer if I make these changes?

As a side note, it seems that it would be more proper to explicitly link with pthread too, as pthreads is also used:

$ objdump -T  ./src/main/resources/org/sqlite/native/Linux/x86_64/libsqlitejdbc.so  | grep pthread
0000000000000000      D  *UND*	0000000000000000              pthread_mutex_trylock
0000000000000000      D  *UND*	0000000000000000              pthread_mutexattr_settype
0000000000000000      DF *UND*	0000000000000015  GLIBC_2.2.5 pthread_mutex_init
0000000000000000      D  *UND*	0000000000000000              pthread_mutexattr_destroy
0000000000000000      D  *UND*	0000000000000000              pthread_create
0000000000000000      D  *UND*	0000000000000000              pthread_mutexattr_init
0000000000000000      DF *UND*	0000000000000015  GLIBC_2.2.5 pthread_mutex_lock
0000000000000000      DF *UND*	0000000000000015  GLIBC_2.2.5 pthread_mutex_destroy
0000000000000000      DF *UND*	0000000000000015  GLIBC_2.2.5 pthread_mutex_unlock
0000000000000000      D  *UND*	0000000000000000              pthread_join

kkriske · 2023-01-27T17:47:55Z

@pyckle I have little experience with Makefiles so I would definitely be more confortable if you're willing to do the changes. Besides it would simplify things over you knowing what I should add and me fumbling around util I get it right.

kkriske · 2023-01-28T09:56:10Z

In the meantime, I was taking a look at adding a Feature in the code instead of the config files. However I'm having some dificulties making it work with the multirelease jar, and adding it would require a dependency on the org.graalvm.sdk module in the moduleinfo but that would probably force compilation to happen on a GraalVM JDK. I did reach out to some folks at GraalVM to find out what the best practices are for libraries to add support through Feature implementations.

gotson · 2023-01-28T12:50:21Z

As a side note, it seems that it would be more proper to explicitly link with pthread too, as pthreads is also used:

we can add that too. If you want to prepare a separate PR with those changes (-lm and pthreads) i will review it.

pyckle · 2023-01-28T17:43:15Z

@kkriske @gotson take a look at pyckle@0681710
I added -pthread -lm by default as it should be in all linux builds.

@kkriske -Regarding the Feature implementation: I think there are two options.

Add the library as a dependency with maven scope provided. That way this dependency will not be implicitly added to anybody depending upon the sqlite-jdbc library, as provided dependencies are not transitive. The Feature class is present though, but won't cause issues unless somebody explicitly tries to access it.
Make the sqlite project a multi module maven project, and put these classes in a separate module.

I think @gotson should decide which option is preferable, or suggest another option.

With that said, I think these changes should be merged before finishing the better implementation. People are using the graalvm native images with sqlite-jdbc, and this restores what previously worked.

kkriske · 2023-01-28T18:24:32Z

@pyckle

Add the library as a dependency with maven scope provided.

How I understand it, a provided scope means that the JDK will provide the dependency. So that should work as long as you are compiling on a GraalVM JDK. What I was thinking was a optional modifier, which would also not make it transitive, and would allow compiling on any JDK.

You're right though that a possible Feature implementation is definitely a followup after this is done, so maybe we should move that discussion to a separate issue.

Just FYI, this is where I reached out to clarify library development best practices: oracle/graal#5811
And this is a different issue I ran into with multimodule jars & native tests: graalvm/native-build-tools#112. However this may be possible to work around using a separate testing module.

pyckle · 2023-01-28T23:58:25Z

Having read through the maven documentation, as well as the linked graalvm thread, I think that provided makes more sense than optional, but in practice it's a semantic difference.

How I understand it, a provided scope means that the JDK will provide the dependency. So that should work as long as you are compiling on a GraalVM JDK.

Quoting from the maven documentation https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html:

provided
This is much like compile, but indicates you expect the JDK or a container to provide the dependency at runtime. For example, when building a web application for the Java Enterprise Edition, you would set the dependency on the Servlet API and related Java EE APIs to scope provided because the web container provides those classes. A dependency with this scope is added to the classpath used for compilation and test, but not the runtime classpath. It is not transitive.

Therefore, a regular JDK would compile the classes which use graalvm sdk just fine. They would even load during the maven testing phase.

What I was thinking was a optional modifier, which would also not make it transitive, and would allow compiling on any JDK.

This implies that there's a case where this library should be included on the runtime classpath. But this is not the case. Either an application is used with graalvm which provides this library, or using a regular JDK and these classes should never be loaded. In neither case is there a reason to include graal-sdk on the classpath.

gotson · 2023-01-29T00:07:54Z

@kkriske @gotson take a look at pyckle@0681710
I added -pthread -lm by default as it should be in all linux builds.

i left a comment on that commit, the default could apply to any platform that is not a known target, and it could be a non-linux OS. I don't know if that could have an impact.

pyckle · 2023-01-29T00:27:20Z

i left a comment on that commit, the default could apply to any platform that is not a known target, and it could be a non-linux OS. I don't know if that could have an impact.

Thanks for the feedback (:

I believe that it is only used for Linux based OS. See my response.

pyckle · 2023-01-30T17:40:31Z

@kkriske - Please cherry-pick pyckle@0681710 and pyckle@0310731
and rebuild the jni libs. @gotson found an issue with my fix, thus the second commit being necessary.

Also, please feel free to squash my changes into a single commit without any need to include my name/credit - they're minor changes. And once again, very nice work with these improvements (:

gotson · 2023-01-31T03:40:53Z

and rebuild the jni libs.

this can be done via Github Actions, see https://github.com/xerial/sqlite-jdbc/blob/master/CONTRIBUTING.md#build-from-ci

Also, please feel free to squash my changes into a single commit without any need to include my name/credit - they're minor changes. And once again, very nice work with these improvements (:

the PR will be squashed anyhow :)

@pyckle are we all good with the code changes in the C JNI code ? I think so but just wanted to make sure!

pyckle · 2023-01-31T07:18:12Z

@pyckle are we all good with the code changes in the C JNI code ? I think so but just wanted to make sure!

Looks good to me. There's a few issues in the existing JNI code that this PR exposed to me, but those should be addressed in a different PR.

kkriske · 2023-01-31T08:21:35Z

@kkriske - Please cherry-pick pyckle@0681710 and pyckle@0310731
and rebuild the jni libs.

@pyckle I'll try to find some time for that this evening.
Was there anything that still needed to be done with NewWeakGlobalRef vs NewGlobalRef?

pyckle · 2023-01-31T17:44:38Z

Was there anything that still needed to be done with NewWeakGlobalRef vs NewGlobalRef?

I think this change should be in a separate PR. It's been this way for a long time, and those classes are referenced strongly elsewhere so I think this is more of a theoretical bug.

kkriske · 2023-01-31T18:44:39Z

I think this change should be in a separate PR. It's been this way for a long time, and those classes are referenced strongly elsewhere so I think this is more of a theoretical bug.

I don't think that's accurate.
It's not about the class being garbage collected, it's about the native reference to the class being released which could result in the garbage collection of the class reference. But it's not one and the same.
So in a memory constrained situation, the weak reference could be invalidated in order to free up that little bit of memory, while still keeping the other strong references. Either way, they should be the last thing to be released before going OOM if I understand it correctly, so usually it won't be enough to get out of the memory constrained situation anyway and the risk of it being an issue is extremely small.

Either way, I have cherry-picked the makefile changes and rebuilt the native libraries.

pyckle · 2023-01-31T20:10:32Z

Looking into this a bit more, I actually think WeakReferences are correct. In fact, there's a comment that documents this in NativeDB.c:

// These classes are weak references to that if the classloader is no longer referenced (garbage)
// It can be garbage collected. The weak references are freed on unload.

If these were strongly referenced, we'd have a circular reference that would prevent this classloader from ever be freed resulting in a memory leak. Loaded classes are always strongly referenced until a classloader is closed.

I think that this can be merged. Nice work.

gotson · 2023-02-01T02:51:41Z

/native

gotson · 2023-02-01T03:50:03Z

@kkriske thanks a lot for the work on this PR ! It's merged and available in the snapshot repo.

@pyckle thanks for your help on that one!

kkriske · 2023-02-01T23:05:24Z

I created a first draft for the feature implementation, you can have a look already and see if it's something you want to persue.
#835

kkriske and others added 16 commits December 26, 2022 14:39

ci: switch to official GraalVM setup action

ee51c2d

refactor: store jmethodID references for callbacks

292ee15

chore: update native libraries

48f67f8

refactor: store jmethodID references for callbacks

be914b5

chore: update native libraries

e1e5466

refactor: store jmethodID and jfieldID references on library load

c9366e6

refactor: store jmethodID and jfieldID references on library load

4230aa4

chore: update native libraries

73baea1

refactor: store jclass in weak global ref

435a06b

chore: update native libraries

9a2df7d

refactor: store primitive jclass in non-weak ref

60bab16

refactor: accidentally used a local scoped variable

4fd66d6

chore: update native libraries

af210b2

refactor: store jclass global reference of exception class

0b5d8cb

chore: update native libraries

1073b86

refactor: update GraalVM native-image configuration to fix native tes…

7f34d50

…t execution

gotson self-assigned this Jan 3, 2023

kkriske commented Jan 3, 2023

View reviewed changes

gotson reviewed Jan 4, 2023

View reviewed changes

pom.xml Show resolved Hide resolved

pom.xml Show resolved Hide resolved

pom.xml Show resolved Hide resolved

gotson added the impacts:JNI Has impact on JNI C code label Jan 4, 2023

pettyalex mentioned this pull request Jan 5, 2023

fix: graalVM native image with runtime class initialization #826

Closed

chore: update native libraries

4788698

Andrew Pikler and others added 3 commits January 31, 2023 19:05

fix(jni) fix makefile linker opts to include -lm and -pthreads

5dd9f27

fix(jni) Fix Makefile to use default linker flags for x86_64

202da6d

chore: update native libraries

e345c62

chore: update native libraries

c3cc33d

gotson merged commit e437b3f into xerial:master Feb 1, 2023

gotson linked an issue Feb 2, 2023 that may be closed by this pull request

Native image broken in 3.40.0.0 #816

Closed

kkriske deleted the fix-native-image-setup branch June 12, 2023 18:33

Refine GraalVM native-image configuration and fix native test execution #823

Refine GraalVM native-image configuration and fix native test execution #823

Uh oh!

Conversation

kkriske commented Dec 27, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gotson commented Jan 1, 2023

Uh oh!

gotson commented Jan 3, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kkriske left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gotson commented Jan 4, 2023

Uh oh!

gotson commented Jan 4, 2023

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pyckle commented Jan 5, 2023

Uh oh!

gotson commented Jan 5, 2023

Uh oh!

kkriske commented Jan 5, 2023

Uh oh!

pettyalex commented Jan 5, 2023

Uh oh!

kkriske commented Jan 5, 2023

Uh oh!

kkriske commented Jan 25, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gotson commented Jan 26, 2023

Uh oh!

gotson commented Jan 26, 2023

Uh oh!

pyckle commented Jan 26, 2023

Uh oh!

kkriske commented Jan 27, 2023

Uh oh!

kkriske commented Jan 28, 2023

Uh oh!

gotson commented Jan 28, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pyckle commented Jan 28, 2023

Uh oh!

kkriske commented Jan 28, 2023

Uh oh!

pyckle commented Jan 28, 2023

Uh oh!

gotson commented Jan 29, 2023

Uh oh!

pyckle commented Jan 29, 2023

Uh oh!

kkriske commented Dec 27, 2022 •

edited

Loading

gotson commented Jan 3, 2023 •

edited

Loading

kkriske commented Jan 25, 2023 •

edited

Loading

gotson commented Jan 28, 2023 •

edited

Loading

kkriske commented Feb 1, 2023 •

edited

Loading