Skip to content

Conversation

@kkriske
Copy link
Contributor

@kkriske kkriske commented Dec 27, 2022

  • Refine the native-image configuration and make inclusion of the metadata conditional
  • Update the jni code to load and cache all java class/field/method references in JNI_OnLoad. This makes it easier to validate native-image configuration as successfully loading the library means all required JNI configuration is present.
  • fix native-image test execution and include as much tests as currently possible
    • Tests using AssertJ assume* calls are excluded because AssertJ relies on dynamic class loading which I couldn't get to play nice.

The commits currently are a bit of a mess, I can clean it up a bit before review if preferred.

Future work (probably separate PR):
I want to move the JNI configuration to a Feature implementation, as I once upon a time suggested here: #631 (comment). Such a feature is imo easier to maintain because it's just typechecked java code and not a json file. Additionally it would allow dynamic configuration of which resources get included in the native-image, so only the relevant library for the target OS is included instead of all of them reducing the overall size of the resulting executable. It would also enable unpacking the required library during the build, placing it next to the target exeuctable so it doesn't need to be unpacked to a temp dir at all. This would benefit the startup time of the image.

/cc @gotson Taking the liberty to tag you. The git history and this branch of yours tell me you've worked on this part of the code before.

@gotson
Copy link
Collaborator

gotson commented Jan 1, 2023

/cc @gotson Taking the liberty to tag you. The git history and this branch of yours tell me you've worked on this part of the code before.

I'm also the maintainer of the repo :) I will need some time to review this, might ask some questions when i do.

@gotson gotson self-assigned this Jan 3, 2023
@gotson
Copy link
Collaborator

gotson commented Jan 3, 2023

Hi @kkriske , i had a look at the PR, but i don't understand most of the changes. What are the changes in the native code and why are they required?

  • and make inclusion of the metadata conditional

what does that mean ?

  • Update the jni code to load and cache all java class/field/method references in JNI_OnLoad. This makes it easier to validate native-image configuration as successfully loading the library means all required JNI configuration is present.

I don't understand this bit.

The commits currently are a bit of a mess, I can clean it up a bit before review if preferred.

No need, it will be squashed :)

Copy link
Contributor Author

@kkriske kkriske left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how up-to-date you are with how GraalVM native-image works. It works with a closed world assumption, because to convert everything to native-code you need to know everything that will be executed. Everything that's not used will be removed but some usages are pretty much impossible to detect using bytecode analysis, for example reflection and JNI calls. So that's why the jni-config.json file is required, to tell native-image which JNI calls will be required so the needed metadata for calling it is available, and the method is not entirely removed either because it may not be detected as being used.

As a library however, we can't assume that just because we're on the classpath means we're used. So the config metadata should always be conditional. If we don't do that, we will always force these jni calls and the reachable code through them to be included in the native-image. So by adding this condition:

"condition": {
    "typeReachable": "org.sqlite.core.NativeDB"
}

The JNI calls will only be included if the bytecode analysis detects that the org.sqlite.core.NativeDB class is actually used.
So if sqlite-jdbc is on the classpath, but never used/called, we also don't include any code.

For the resource config, there is no such thing as conditional inclusion yet, which is why I want to do a follow-up to create a Feature implementation, because it is possible to do it through the java native-image API. It's then also possible to only include the resources for the relevant target OS instead of everything.


Update the jni code to load and cache all java class/field/method references in JNI_OnLoad. This makes it easier to validate native-image configuration as successfully loading the library means all required JNI configuration is present.

I moved all FindClass, GetFieldID and GetMethodID calls to the JNI_OnLoad function. This makes maintenance easier in my opinion as it groups all native->java lookups together. Every call of one of these functions results in an entry in the jni-config.json file, so grouping them also makes it easier to keep the native lookups and the required native-image configuration in sync.
This is partially what I ment with This makes it easier to validate native-image configuration. A second part of this is that if you're running in native-image and the library loads successfully (meaning the JNI_OnLoad does not result in an exception) then all native->java lookups already happened and you're certain no additional calls will happen at runtime for which you may have overlooked the configuration (this was the case for FindClass(env, "[Z") for example).

What this boils down to, is that if the native test execution in the GitHub Actions succeeds, we can be certain that the configuration is complete.


What are the changes in the native code and why are they required?

For the remaining native code changes, I have added some comments in the code. Let me know if I glossed over some other stuff that needs explaining.

Comment on lines +500 to +502
exclass = (*env)->FindClass(env, "java/lang/Throwable");
if(!exclass) return JNI_ERR;
exclass = (*env)->NewWeakGlobalRef(env, exclass);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lack of global reference to this class caused the Segfault mentioned in the surefire exclusions. The class reference got garbage collected which invalidates the jmethodID which caused usage of freed memory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems bizarre to me that java.lang.Throwable can be unloaded when we instantiate an Exception where Throwable is an interface.

Also, since this is a call to toString(), I think that we can use java.lang.Object, rather than Throwable.

Also, as we're holding a weak global reference, I'm confused at what would be stopping the VM from unloading the class fully, as a weak reference shouldn't prevent this.

With this said, I am not super familiar with these APIs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am conflused why NewWeakGlobalRef calls are used as well, I would expect those to be NewGlobalRef instead, but I merely expanded on what was already there.

As for the Throwable reference, the class doesn't get unloaded, but the native reference is. I'm not entirely sure what causes this, possibly the garbage collector moving around where the class is stored in memory, while creating the global ref makes sure it stays valid? This was definitely a bug which for some reason does not seem to manifest itself in a JVM run. It could be possible to force it to happen in memory constrained situations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into this more, I think what we ran into is this: https://stackoverflow.com/questions/11907832/jni-local-vs-global-reference-is-not-a-valid-jni-reference

It appears we're just lucky that the regular JVM isn't moving these classes around.

It seems that the proper fix is that these should be all global references (these classes should not be unloaded), or FindClass should be called in the places where these classes are used. Calling FindClass more often is likely cleaner, but allows for more runtime exceptions due to graal native image misconfiguration and may be slower.

Comment on lines +506 to +508
bool_array_class = (*env)->FindClass(env, "[Z");
if(!bool_array_class) return JNI_ERR;
bool_array_class = (*env)->NewWeakGlobalRef(env, bool_array_class);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The jni-config entry for boolean[] was missing, which caused the java.sql.SQLException: Out of memory mentioned in the surefire exclusions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to maintain a global weak reference to this, can we just make sure that the returned class is not null which implies it is accessible from JNI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean not storing the class ref and doing the FindClass call again later when needed? While that would work, I think it would be a bit pointless. The lookup itself is "slow" so it makes sense to cache it if the lookup happens anyway in the JNI_OnLoad. It would also break with my suggestion to keep JNI lookups confined in the JNI_OnLoad function.

Comment on lines -619 to -622
busyHandlerContext->methodId = (*env)->GetMethodID( env,
(*env)->GetObjectClass(env, busyHandlerContext->obj),
"callback",
"(I)I");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This did a lookup to BusyHandler#callback for every implementation of BusyHandler that was passed to this function. It is sufficient to store the reference to the interface method, which will allow calling it on every subclass as well.

The lookup on the subclasses also ment that a jni-config entry would have been needed for every used implementation class which pushes the configuration requirement to the end-user. By storing the interface method id there is no more configuration needed by the user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix (:+1:

Comment on lines -1669 to -1717
progressHandlerContext->mth = (*env)->GetMethodID( env,
(*env)->GetObjectClass(env, progressHandlerContext->phandler),
"progress",
"()I");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing as BusyHandler#callback.

@@ -0,0 +1,15 @@
[
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This config is only added to the test-suite, so it is not provided to the end-user.
It is only required when using rowsets, which is not code maintained by sqlite-jdbc so it's not up to us to provide the user with this config. It would also cause a direct dependency to java internals which may not be compatible over all jdks/versions.

Same thing for the resource-config.json next to this file.

@gotson
Copy link
Collaborator

gotson commented Jan 4, 2023

Thanks @kkriske for the details, that is very useful.

@gotson
Copy link
Collaborator

gotson commented Jan 4, 2023

Hi @pyckle , if you have a bit of spare time, would you mind reviewing the changes done in the native C code ? 🙏🏻

@gotson gotson added the impacts:JNI Has impact on JNI C code label Jan 4, 2023
@pyckle
Copy link
Contributor

pyckle commented Jan 5, 2023

I'm happy to review, but I won't be able to until next week.

@gotson
Copy link
Collaborator

gotson commented Jan 5, 2023

I'm happy to review, but I won't be able to until next week.

no rush, everyone's working for free here, and your help and expertise is much appreciated !

@kkriske
Copy link
Contributor Author

kkriske commented Jan 5, 2023

FYI, I am not a C or JNI expert, I'm just working with whatever knowledge I've picked up along the way.

@pettyalex
Copy link

Awesome, I wish I would have checked this before debugging the failure to run in a native image myself.

I did notice one difference between your jni-config.json and mine: my tracing agent also found

  {
    "name":"java.lang.Boolean",
    "methods":[{"name":"getBoolean","parameterTypes":["java.lang.String"] }]
  },

in addition to Throwable. Is that necessary with the JNI access to boolean arrays, or is my traced usage probably coming from somewhere in Spring Boot?

@kkriske
Copy link
Contributor Author

kkriske commented Jan 5, 2023

Looks like I'll have to do a small update to accomodate the changes of #818

@kkriske
Copy link
Contributor Author

kkriske commented Jan 25, 2023

So about the "github actions bug", turns out I messed up the config. Who would've thought that I was at fault and not GitHub 🙄

Build is green again: https://github.com/kkriske/sqlite-jdbc/actions/runs/4010885484

@gotson
Copy link
Collaborator

gotson commented Jan 26, 2023

@kkriske - can you add pyckle@cad5235 into your branch and then we can see if graal 17 native image works?

Hi @pyckle , i'm not sure that's right for the second change, which is for Android.

@gotson
Copy link
Collaborator

gotson commented Jan 26, 2023

Linux-aarch64 is not defined in the known_targets in Makefile.common, so it would default to using the Default values. On top of that we should probably add the -lm to other Linux architectures like Linux-x86, Linux-arm, Linux-armv6 and Linux-armv7.

@pyckle
Copy link
Contributor

pyckle commented Jan 26, 2023

@gotson - nice catch. I think you're correct on all of your points.

Adding -lm to all of the linux targets would be proper. And it's certainly necessary on the Linux aarch64 target as graal native-image will likely be used on aarch64. The other targets are less important, as graalvm native-image can't make 32 bits arm/x86 images (nor is it likely they will).

@kkriske - can you make these changes? Would you prefer if I make these changes?

As a side note, it seems that it would be more proper to explicitly link with pthread too, as pthreads is also used:

$ objdump -T  ./src/main/resources/org/sqlite/native/Linux/x86_64/libsqlitejdbc.so  | grep pthread
0000000000000000      D  *UND*	0000000000000000              pthread_mutex_trylock
0000000000000000      D  *UND*	0000000000000000              pthread_mutexattr_settype
0000000000000000      DF *UND*	0000000000000015  GLIBC_2.2.5 pthread_mutex_init
0000000000000000      D  *UND*	0000000000000000              pthread_mutexattr_destroy
0000000000000000      D  *UND*	0000000000000000              pthread_create
0000000000000000      D  *UND*	0000000000000000              pthread_mutexattr_init
0000000000000000      DF *UND*	0000000000000015  GLIBC_2.2.5 pthread_mutex_lock
0000000000000000      DF *UND*	0000000000000015  GLIBC_2.2.5 pthread_mutex_destroy
0000000000000000      DF *UND*	0000000000000015  GLIBC_2.2.5 pthread_mutex_unlock
0000000000000000      D  *UND*	0000000000000000              pthread_join

@kkriske
Copy link
Contributor Author

kkriske commented Jan 27, 2023

@pyckle I have little experience with Makefiles so I would definitely be more confortable if you're willing to do the changes. Besides it would simplify things over you knowing what I should add and me fumbling around util I get it right.

@kkriske
Copy link
Contributor Author

kkriske commented Jan 28, 2023

In the meantime, I was taking a look at adding a Feature in the code instead of the config files. However I'm having some dificulties making it work with the multirelease jar, and adding it would require a dependency on the org.graalvm.sdk module in the moduleinfo but that would probably force compilation to happen on a GraalVM JDK. I did reach out to some folks at GraalVM to find out what the best practices are for libraries to add support through Feature implementations.

@gotson
Copy link
Collaborator

gotson commented Jan 28, 2023

As a side note, it seems that it would be more proper to explicitly link with pthread too, as pthreads is also used:

we can add that too. If you want to prepare a separate PR with those changes (-lm and pthreads) i will review it.

@pyckle
Copy link
Contributor

pyckle commented Jan 28, 2023

@kkriske @gotson take a look at pyckle@0681710
I added -pthread -lm by default as it should be in all linux builds.

@kkriske -Regarding the Feature implementation: I think there are two options.

  1. Add the library as a dependency with maven scope provided. That way this dependency will not be implicitly added to anybody depending upon the sqlite-jdbc library, as provided dependencies are not transitive. The Feature class is present though, but won't cause issues unless somebody explicitly tries to access it.
  2. Make the sqlite project a multi module maven project, and put these classes in a separate module.

I think @gotson should decide which option is preferable, or suggest another option.

With that said, I think these changes should be merged before finishing the better implementation. People are using the graalvm native images with sqlite-jdbc, and this restores what previously worked.

@kkriske
Copy link
Contributor Author

kkriske commented Jan 28, 2023

@pyckle

Add the library as a dependency with maven scope provided.

How I understand it, a provided scope means that the JDK will provide the dependency. So that should work as long as you are compiling on a GraalVM JDK. What I was thinking was a optional modifier, which would also not make it transitive, and would allow compiling on any JDK.

You're right though that a possible Feature implementation is definitely a followup after this is done, so maybe we should move that discussion to a separate issue.

Just FYI, this is where I reached out to clarify library development best practices: oracle/graal#5811
And this is a different issue I ran into with multimodule jars & native tests: graalvm/native-build-tools#112. However this may be possible to work around using a separate testing module.

@pyckle
Copy link
Contributor

pyckle commented Jan 28, 2023

Having read through the maven documentation, as well as the linked graalvm thread, I think that provided makes more sense than optional, but in practice it's a semantic difference.

How I understand it, a provided scope means that the JDK will provide the dependency. So that should work as long as you are compiling on a GraalVM JDK.

Quoting from the maven documentation https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html:

provided
This is much like compile, but indicates you expect the JDK or a container to provide the dependency at runtime. For example, when building a web application for the Java Enterprise Edition, you would set the dependency on the Servlet API and related Java EE APIs to scope provided because the web container provides those classes. A dependency with this scope is added to the classpath used for compilation and test, but not the runtime classpath. It is not transitive.

Therefore, a regular JDK would compile the classes which use graalvm sdk just fine. They would even load during the maven testing phase.

What I was thinking was a optional modifier, which would also not make it transitive, and would allow compiling on any JDK.

This implies that there's a case where this library should be included on the runtime classpath. But this is not the case. Either an application is used with graalvm which provides this library, or using a regular JDK and these classes should never be loaded. In neither case is there a reason to include graal-sdk on the classpath.

@gotson
Copy link
Collaborator

gotson commented Jan 29, 2023

@kkriske @gotson take a look at pyckle@0681710
I added -pthread -lm by default as it should be in all linux builds.

i left a comment on that commit, the default could apply to any platform that is not a known target, and it could be a non-linux OS. I don't know if that could have an impact.

@pyckle
Copy link
Contributor

pyckle commented Jan 29, 2023

i left a comment on that commit, the default could apply to any platform that is not a known target, and it could be a non-linux OS. I don't know if that could have an impact.

Thanks for the feedback (:

I believe that it is only used for Linux based OS. See my response.

@pyckle
Copy link
Contributor

pyckle commented Jan 30, 2023

@kkriske - Please cherry-pick pyckle@0681710 and pyckle@0310731
and rebuild the jni libs. @gotson found an issue with my fix, thus the second commit being necessary.

Also, please feel free to squash my changes into a single commit without any need to include my name/credit - they're minor changes. And once again, very nice work with these improvements (:

@gotson
Copy link
Collaborator

gotson commented Jan 31, 2023

and rebuild the jni libs.

this can be done via Github Actions, see https://github.com/xerial/sqlite-jdbc/blob/master/CONTRIBUTING.md#build-from-ci

Also, please feel free to squash my changes into a single commit without any need to include my name/credit - they're minor changes. And once again, very nice work with these improvements (:

the PR will be squashed anyhow :)

@pyckle are we all good with the code changes in the C JNI code ? I think so but just wanted to make sure!

@pyckle
Copy link
Contributor

pyckle commented Jan 31, 2023

@pyckle are we all good with the code changes in the C JNI code ? I think so but just wanted to make sure!

Looks good to me. There's a few issues in the existing JNI code that this PR exposed to me, but those should be addressed in a different PR.

@kkriske
Copy link
Contributor Author

kkriske commented Jan 31, 2023

@kkriske - Please cherry-pick pyckle@0681710 and pyckle@0310731
and rebuild the jni libs.

@pyckle I'll try to find some time for that this evening.
Was there anything that still needed to be done with NewWeakGlobalRef vs NewGlobalRef?

@pyckle
Copy link
Contributor

pyckle commented Jan 31, 2023

Was there anything that still needed to be done with NewWeakGlobalRef vs NewGlobalRef?

I think this change should be in a separate PR. It's been this way for a long time, and those classes are referenced strongly elsewhere so I think this is more of a theoretical bug.

@kkriske
Copy link
Contributor Author

kkriske commented Jan 31, 2023

I think this change should be in a separate PR. It's been this way for a long time, and those classes are referenced strongly elsewhere so I think this is more of a theoretical bug.

I don't think that's accurate.
It's not about the class being garbage collected, it's about the native reference to the class being released which could result in the garbage collection of the class reference. But it's not one and the same.
So in a memory constrained situation, the weak reference could be invalidated in order to free up that little bit of memory, while still keeping the other strong references. Either way, they should be the last thing to be released before going OOM if I understand it correctly, so usually it won't be enough to get out of the memory constrained situation anyway and the risk of it being an issue is extremely small.

Either way, I have cherry-picked the makefile changes and rebuilt the native libraries.

@pyckle
Copy link
Contributor

pyckle commented Jan 31, 2023

Looking into this a bit more, I actually think WeakReferences are correct. In fact, there's a comment that documents this in NativeDB.c:

// These classes are weak references to that if the classloader is no longer referenced (garbage)
// It can be garbage collected. The weak references are freed on unload.

If these were strongly referenced, we'd have a circular reference that would prevent this classloader from ever be freed resulting in a memory leak. Loaded classes are always strongly referenced until a classloader is closed.

I think that this can be merged. Nice work.

@gotson
Copy link
Collaborator

gotson commented Feb 1, 2023

/native

@gotson gotson merged commit e437b3f into xerial:master Feb 1, 2023
@gotson
Copy link
Collaborator

gotson commented Feb 1, 2023

@kkriske thanks a lot for the work on this PR ! It's merged and available in the snapshot repo.

@pyckle thanks for your help on that one!

@kkriske
Copy link
Contributor Author

kkriske commented Feb 1, 2023

I created a first draft for the feature implementation, you can have a look already and see if it's something you want to persue.
#835

@gotson gotson linked an issue Feb 2, 2023 that may be closed by this pull request
@kkriske kkriske deleted the fix-native-image-setup branch June 12, 2023 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

impacts:JNI Has impact on JNI C code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Native image broken in 3.40.0.0

4 participants