-
Notifications
You must be signed in to change notification settings - Fork 655
Refine GraalVM native-image configuration and fix native test execution #823
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'm also the maintainer of the repo :) I will need some time to review this, might ask some questions when i do. |
|
Hi @kkriske , i had a look at the PR, but i don't understand most of the changes. What are the changes in the native code and why are they required?
what does that mean ?
I don't understand this bit.
No need, it will be squashed :) |
kkriske
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how up-to-date you are with how GraalVM native-image works. It works with a closed world assumption, because to convert everything to native-code you need to know everything that will be executed. Everything that's not used will be removed but some usages are pretty much impossible to detect using bytecode analysis, for example reflection and JNI calls. So that's why the jni-config.json file is required, to tell native-image which JNI calls will be required so the needed metadata for calling it is available, and the method is not entirely removed either because it may not be detected as being used.
As a library however, we can't assume that just because we're on the classpath means we're used. So the config metadata should always be conditional. If we don't do that, we will always force these jni calls and the reachable code through them to be included in the native-image. So by adding this condition:
"condition": {
"typeReachable": "org.sqlite.core.NativeDB"
}The JNI calls will only be included if the bytecode analysis detects that the org.sqlite.core.NativeDB class is actually used.
So if sqlite-jdbc is on the classpath, but never used/called, we also don't include any code.
For the resource config, there is no such thing as conditional inclusion yet, which is why I want to do a follow-up to create a Feature implementation, because it is possible to do it through the java native-image API. It's then also possible to only include the resources for the relevant target OS instead of everything.
Update the jni code to load and cache all java class/field/method references in JNI_OnLoad. This makes it easier to validate native-image configuration as successfully loading the library means all required JNI configuration is present.
I moved all FindClass, GetFieldID and GetMethodID calls to the JNI_OnLoad function. This makes maintenance easier in my opinion as it groups all native->java lookups together. Every call of one of these functions results in an entry in the jni-config.json file, so grouping them also makes it easier to keep the native lookups and the required native-image configuration in sync.
This is partially what I ment with This makes it easier to validate native-image configuration. A second part of this is that if you're running in native-image and the library loads successfully (meaning the JNI_OnLoad does not result in an exception) then all native->java lookups already happened and you're certain no additional calls will happen at runtime for which you may have overlooked the configuration (this was the case for FindClass(env, "[Z") for example).
What this boils down to, is that if the native test execution in the GitHub Actions succeeds, we can be certain that the configuration is complete.
What are the changes in the native code and why are they required?
For the remaining native code changes, I have added some comments in the code. Let me know if I glossed over some other stuff that needs explaining.
| exclass = (*env)->FindClass(env, "java/lang/Throwable"); | ||
| if(!exclass) return JNI_ERR; | ||
| exclass = (*env)->NewWeakGlobalRef(env, exclass); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lack of global reference to this class caused the Segfault mentioned in the surefire exclusions. The class reference got garbage collected which invalidates the jmethodID which caused usage of freed memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems bizarre to me that java.lang.Throwable can be unloaded when we instantiate an Exception where Throwable is an interface.
Also, since this is a call to toString(), I think that we can use java.lang.Object, rather than Throwable.
Also, as we're holding a weak global reference, I'm confused at what would be stopping the VM from unloading the class fully, as a weak reference shouldn't prevent this.
With this said, I am not super familiar with these APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am conflused why NewWeakGlobalRef calls are used as well, I would expect those to be NewGlobalRef instead, but I merely expanded on what was already there.
As for the Throwable reference, the class doesn't get unloaded, but the native reference is. I'm not entirely sure what causes this, possibly the garbage collector moving around where the class is stored in memory, while creating the global ref makes sure it stays valid? This was definitely a bug which for some reason does not seem to manifest itself in a JVM run. It could be possible to force it to happen in memory constrained situations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking into this more, I think what we ran into is this: https://stackoverflow.com/questions/11907832/jni-local-vs-global-reference-is-not-a-valid-jni-reference
It appears we're just lucky that the regular JVM isn't moving these classes around.
It seems that the proper fix is that these should be all global references (these classes should not be unloaded), or FindClass should be called in the places where these classes are used. Calling FindClass more often is likely cleaner, but allows for more runtime exceptions due to graal native image misconfiguration and may be slower.
| bool_array_class = (*env)->FindClass(env, "[Z"); | ||
| if(!bool_array_class) return JNI_ERR; | ||
| bool_array_class = (*env)->NewWeakGlobalRef(env, bool_array_class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The jni-config entry for boolean[] was missing, which caused the java.sql.SQLException: Out of memory mentioned in the surefire exclusions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to maintain a global weak reference to this, can we just make sure that the returned class is not null which implies it is accessible from JNI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean not storing the class ref and doing the FindClass call again later when needed? While that would work, I think it would be a bit pointless. The lookup itself is "slow" so it makes sense to cache it if the lookup happens anyway in the JNI_OnLoad. It would also break with my suggestion to keep JNI lookups confined in the JNI_OnLoad function.
| busyHandlerContext->methodId = (*env)->GetMethodID( env, | ||
| (*env)->GetObjectClass(env, busyHandlerContext->obj), | ||
| "callback", | ||
| "(I)I"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This did a lookup to BusyHandler#callback for every implementation of BusyHandler that was passed to this function. It is sufficient to store the reference to the interface method, which will allow calling it on every subclass as well.
The lookup on the subclasses also ment that a jni-config entry would have been needed for every used implementation class which pushes the configuration requirement to the end-user. By storing the interface method id there is no more configuration needed by the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice fix (:+1:
| progressHandlerContext->mth = (*env)->GetMethodID( env, | ||
| (*env)->GetObjectClass(env, progressHandlerContext->phandler), | ||
| "progress", | ||
| "()I"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing as BusyHandler#callback.
| @@ -0,0 +1,15 @@ | |||
| [ | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This config is only added to the test-suite, so it is not provided to the end-user.
It is only required when using rowsets, which is not code maintained by sqlite-jdbc so it's not up to us to provide the user with this config. It would also cause a direct dependency to java internals which may not be compatible over all jdks/versions.
Same thing for the resource-config.json next to this file.
|
Thanks @kkriske for the details, that is very useful. |
|
Hi @pyckle , if you have a bit of spare time, would you mind reviewing the changes done in the native C code ? 🙏🏻 |
|
I'm happy to review, but I won't be able to until next week. |
no rush, everyone's working for free here, and your help and expertise is much appreciated ! |
|
FYI, I am not a C or JNI expert, I'm just working with whatever knowledge I've picked up along the way. |
|
Awesome, I wish I would have checked this before debugging the failure to run in a native image myself. I did notice one difference between your jni-config.json and mine: my tracing agent also found in addition to Throwable. Is that necessary with the JNI access to boolean arrays, or is my traced usage probably coming from somewhere in Spring Boot? |
|
Looks like I'll have to do a small update to accomodate the changes of #818 |
|
So about the "github actions bug", turns out I messed up the config. Who would've thought that I was at fault and not GitHub 🙄 Build is green again: https://github.com/kkriske/sqlite-jdbc/actions/runs/4010885484 |
Hi @pyckle , i'm not sure that's right for the second change, which is for Android. |
|
|
|
@gotson - nice catch. I think you're correct on all of your points. Adding @kkriske - can you make these changes? Would you prefer if I make these changes? As a side note, it seems that it would be more proper to explicitly link with |
|
@pyckle I have little experience with Makefiles so I would definitely be more confortable if you're willing to do the changes. Besides it would simplify things over you knowing what I should add and me fumbling around util I get it right. |
|
In the meantime, I was taking a look at adding a Feature in the code instead of the config files. However I'm having some dificulties making it work with the multirelease jar, and adding it would require a dependency on the |
we can add that too. If you want to prepare a separate PR with those changes ( |
|
@kkriske @gotson take a look at pyckle@0681710 @kkriske -Regarding the
I think @gotson should decide which option is preferable, or suggest another option. With that said, I think these changes should be merged before finishing the better implementation. People are using the graalvm native images with sqlite-jdbc, and this restores what previously worked. |
How I understand it, a provided scope means that the JDK will provide the dependency. So that should work as long as you are compiling on a GraalVM JDK. What I was thinking was a You're right though that a possible Feature implementation is definitely a followup after this is done, so maybe we should move that discussion to a separate issue. Just FYI, this is where I reached out to clarify library development best practices: oracle/graal#5811 |
|
Having read through the maven documentation, as well as the linked graalvm thread, I think that
Quoting from the maven documentation https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html:
Therefore, a regular JDK would compile the classes which use graalvm sdk just fine. They would even load during the maven testing phase.
This implies that there's a case where this library should be included on the runtime classpath. But this is not the case. Either an application is used with graalvm which provides this library, or using a regular JDK and these classes should never be loaded. In neither case is there a reason to include |
i left a comment on that commit, the default could apply to any platform that is not a known target, and it could be a non-linux OS. I don't know if that could have an impact. |
Thanks for the feedback (: I believe that it is only used for Linux based OS. See my response. |
|
@kkriske - Please cherry-pick pyckle@0681710 and pyckle@0310731 Also, please feel free to squash my changes into a single commit without any need to include my name/credit - they're minor changes. And once again, very nice work with these improvements (: |
this can be done via Github Actions, see https://github.com/xerial/sqlite-jdbc/blob/master/CONTRIBUTING.md#build-from-ci
the PR will be squashed anyhow :) @pyckle are we all good with the code changes in the C JNI code ? I think so but just wanted to make sure! |
Looks good to me. There's a few issues in the existing JNI code that this PR exposed to me, but those should be addressed in a different PR. |
@pyckle I'll try to find some time for that this evening. |
I think this change should be in a separate PR. It's been this way for a long time, and those classes are referenced strongly elsewhere so I think this is more of a theoretical bug. |
I don't think that's accurate. Either way, I have cherry-picked the makefile changes and rebuilt the native libraries. |
|
Looking into this a bit more, I actually think WeakReferences are correct. In fact, there's a comment that documents this in If these were strongly referenced, we'd have a circular reference that would prevent this classloader from ever be freed resulting in a memory leak. Loaded classes are always strongly referenced until a classloader is closed. I think that this can be merged. Nice work. |
|
/native |
|
I created a first draft for the feature implementation, you can have a look already and see if it's something you want to persue. |
assume*calls are excluded because AssertJ relies on dynamic class loading which I couldn't get to play nice.The commits currently are a bit of a mess, I can clean it up a bit before review if preferred.
Future work (probably separate PR):
I want to move the JNI configuration to a
Featureimplementation, as I once upon a time suggested here: #631 (comment). Such a feature is imo easier to maintain because it's just typechecked java code and not a json file. Additionally it would allow dynamic configuration of which resources get included in the native-image, so only the relevant library for the target OS is included instead of all of them reducing the overall size of the resulting executable. It would also enable unpacking the required library during the build, placing it next to the target exeuctable so it doesn't need to be unpacked to a temp dir at all. This would benefit the startup time of the image./cc @gotson Taking the liberty to tag you. The git history and this branch of yours tell me you've worked on this part of the code before.