Add Support for JFR Java Monitor Wait and Thread Sleep events #4676

roberttoyonaga · 2022-06-28T16:06:32Z

With regard to the jdk.JavaMonitorWait event, I am using a queue to keep track of the TIDs of notifiers so that waiters have access to them when they resume and emit the jdk.JavaMonitorWait event. This takes takes advantage of Condition.signal() signalling waiters in the order they started waiting.

I have created another PR #4698, that uses substitutions more closely mimic the way this event is handled in Hotspot. This alternative approach is simpler, but requires maintenance of the substituted methods when the original methods change.

...tratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/events/JavaMonitorWaitEvent.java

...ratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/events/JavaMonitorEnterEvent.java

...tratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/events/JavaMonitorWaitEvent.java

christianhaeubl · 2022-06-29T07:40:55Z

substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/thread/JavaThreads.java

    static void sleep(long millis) throws InterruptedException {
+        long startTicks = com.oracle.svm.core.jfr.JfrTicks.elapsedTicks();
        if (supportsVirtual() && isVirtualDisallowLoom(Thread.currentThread())) {
            VirtualThreads.singleton().sleepMillis(millis);


Does HotSpot really emit this JFR event for virtual threads?

I tried it out using virtual threads with openjdk and the event does get emitted. Please see here

christianhaeubl · 2022-06-29T08:56:50Z

substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/monitor/JavaMonitor.java

+    /*
+     * This queue is used to record the TIDs of notifiers so that waiters have access to them when they eventually
+     * acquire the monitor lock and resume.
+     * Adding and removing TIDs from this queue in a FIFO fashion is correct because Condition.signal(), notifies waiters in a FIFO fashion.


I don't think that this assumption is correct because:
a. it is possible to specify a timeout (see wait(long))
b. it is possible to interrupt a thread

a. If a thread wakes up from a timeout then it will not pull anything off of the TID queue. It will instead just emit an event with the timedout flag set. see here:

b. If a waiting thread is interrupted it should also not affect the TID queue.

So shouldn't it be true that a waiter will modify the notifier TID queue if and only if it has been notified? So even if all waiters do not resume in the order they began waiting (due to timeouts/interrupts), notified waiters should resume in the order that they were notified and therefore remove TIDs from the queue in the correct order.

The current code breaks as soon as a timeout or interrupt occurs because waitingThreads will be incorrect. I also think that your approach breaks if multiple threads are notified consecutively. The problems that I see, in more detail:

Scenario A

Thread A calls wait(long) and times out

Thread B acquires the monitor and tries to notify thread A

Thread B adds an entry to notifiers

Thread A emits the JFR event that it timed out

From that point on, we leak memory because there are too many entries in notifiers.

The next thread that calls wait() or wait(long) successfully will use the wrong notifier.

Scenario B

Thread A calls wait()

Thread B acquires the monitor and tries to notify thread A

Thread B adds an entry to notifiers

Thread C interrupts thread A

Thread A throws an InterruptedException - no JFR event is emitted

From that point on, we leak memory because there are too many entries in notifiers.

The next thread that calls wait() successfully will use the wrong notifier.

Scenario C

Thread A calls wait()

Thread B calls wait()

Thread C calls notify() and wakes up thread A

Thread D calls notify() and wakes up thread B

Thread A and thread B wake up and try to acquire the monitor. It is non-deterministic which thread will be able to acquire the monitor first, which violates your FIFO assumption.

Thank you Christian. I have changed things so that we always clean up after a waiter resumes (through timeout, interrupt, or signal). I have also changed things so that waiters are checked to still be waiting before we mark them as being notified (so that notifications don't get associated with waiters that have timed out or interrupted). Some drawbacks of this approach are that a linear scan of the maintained queue is required when waiters wake up. This is because the arbitrary order in which threads reacquire the lock.

In Hotspot, a notifierTid field is added to the Node struct that sits on the wait and sync queues. This way, when a waiter wakes up, it can simply inspect that field. However, in substrateVM, the wait queue and sync queue is abstracted away in Condition.signal() and Condition.await() the solution in this PR tries to mirror these low level queues this with its own queue. I have created another PR here, that uses substitutions to copy the behavior in Hotspot more closely. This alternative approach is simpler, but requires maintenance of the substituted methods when the original methods change.

christianhaeubl · 2022-06-30T07:12:38Z

substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/monitor/JavaMonitor.java

+    /*
+     * This queue is used to record the TIDs of notifiers so that waiters have access to them when they eventually
+     * acquire the monitor lock and resume.
+     * Adding and removing TIDs from this queue in a FIFO fashion is correct because Condition.signal(), notifies waiters in a FIFO fashion.


The current code breaks as soon as a timeout or interrupt occurs because waitingThreads will be incorrect. I also think that your approach breaks if multiple threads are notified consecutively. The problems that I see, in more detail:

Scenario A

Thread A calls wait(long) and times out

Thread B acquires the monitor and tries to notify thread A

Thread B adds an entry to notifiers

Thread A emits the JFR event that it timed out

From that point on, we leak memory because there are too many entries in notifiers.

The next thread that calls wait() or wait(long) successfully will use the wrong notifier.

Scenario B

Thread A calls wait()

Thread B acquires the monitor and tries to notify thread A

Thread B adds an entry to notifiers

Thread C interrupts thread A

Thread A throws an InterruptedException - no JFR event is emitted

From that point on, we leak memory because there are too many entries in notifiers.

The next thread that calls wait() successfully will use the wrong notifier.

Scenario C

Thread A calls wait()

Thread B calls wait()

Thread C calls notify() and wakes up thread A

Thread D calls notify() and wakes up thread B

Thread A and thread B wake up and try to acquire the monitor. It is non-deterministic which thread will be able to acquire the monitor first, which violates your FIFO assumption.

christianhaeubl

Thanks for working on that but the current solution still has major issues. Regarding #4698: we can't use substitutions for this because those substitutions would affect the JDK as a whole (and therefore other usages of ReentrantLock) and not only the Java monitors. The other issue is that we would potentially need different substitutions for different JDK versions.

Maybe it is really time to move to a custom Java monitor implementation that is only loosely based on the ReentrantLock implementation but that no longer uses the JDK classes directly. This is something that we will need eventually anyways. Then, we can modify the internals of the locking code as needed.

Moving forward, I would suggest to split the work into the following PRs:

Move the sleep event & all infrastructure changes that you currently made to a separate PR. We can merge this PR right away.
Copy the JDK code for ReentrantLock, NonfairSync, and ConditionObject from the JDK19 sources. Strip the implementation down to the bare minimum that is needed by Native Image for implementing Java monitors and use that implementation instead of the JDK classes.
Add support for the event JavaMonitorWait by modifying the new Native Image-specific Java monitor implementation.

christianhaeubl · 2022-07-05T07:38:20Z

substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/monitor/JavaMonitor.java

+                break;
+            }
+        }
+        assert waiters.remove(target);


This is only executed when assertions are enabled, so you are currently leaking memory.

christianhaeubl · 2022-07-05T07:47:39Z

substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/monitor/JavaMonitor.java

+            }
+        } catch (InterruptedException e) {
+            // Similar to hotspot, we should not emit event if interrupted
+            getNotifierTid();


When the thread is interrupted, it won't hold the lock. So, the assertion in getNotifierTid() should fail. Without assertions enabled there will be a race.

christianhaeubl · 2022-07-05T07:49:30Z

substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/monitor/JavaMonitor.java

+                    JavaMonitorWaitEvent.emit(startTicks, obj, getNotifierTid(), timeoutMillis, false);
+                } else {
+                    // remove waiter from queue and check it wasn't notified
+                    assert getNotifierTid() < 0;


This is only executed when assertions are enabled, so probably not what you want.
Besides that, the thread won't hold the lock when a wait times out. So, the assertion in `getNotifierTid()So, the assertion in getNotifierTid() should fail and without assertions there will be a race. should fail. Without assertions enabled there will be a race.

roberttoyonaga · 2022-07-05T17:21:18Z

@christianhaeubl Thank you Christian. I have factored out thread sleep events into another PR here #4702. I'll begin working on the custom monitor implementation.

roberttoyonaga added 6 commits June 21, 2022 15:40

add thread sleep events

775bf69

include changes form addJavaMonitorEnter

9e2f972

monitorwait added using queue

c0a608b

fix lock bug

e6f1cff

add comments and correct logic errors

eac426f

fix waiter counting

4ffcf51

graalvmbot added the oca-signed label Jun 28, 2022

roberttoyonaga changed the title ~~Support jdkJavaMonitorWait and jdk.ThreadSleep events~~ Add Support for Java Monitor Wait and Thread Sleep events Jun 28, 2022

roberttoyonaga changed the title ~~Add Support for Java Monitor Wait and Thread Sleep events~~ Add Support for JFR Java Monitor Wait and Thread Sleep events Jun 28, 2022

zakkak requested a review from christianhaeubl June 29, 2022 05:08

zakkak added the native-image label Jun 29, 2022

christianhaeubl requested changes Jun 29, 2022

View reviewed changes

roberttoyonaga added 4 commits June 29, 2022 10:42

refactor with emit0 for compiler opt. Adjust imports

1f0de84

remove uninterruptable annotation no longer needed

4d0e6b2

refactor. add getStackTraceId(JfrEvent, int)

3d97124

fix conflicts merge upstream master into addMoreEvents

7376370

roberttoyonaga requested a review from christianhaeubl June 29, 2022 17:50

remove unused import and call threadId() statically

472884f

christianhaeubl requested changes Jun 30, 2022

View reviewed changes

roberttoyonaga added 5 commits June 30, 2022 16:04

account for interrupts and tiemouts

b762020

cleanup

02c8acf

fix hiding

1f19876

refactor

a0c805c

comments and fix hiding

66d09fd

roberttoyonaga mentioned this pull request Jul 4, 2022

Add JavaMonitorWait events using substitutions #4698

Closed

fix conflicts

45a8aa2

roberttoyonaga requested a review from christianhaeubl July 4, 2022 20:18

christianhaeubl requested changes Jul 5, 2022

View reviewed changes

roberttoyonaga closed this Jul 5, 2022

roberttoyonaga mentioned this pull request Jul 11, 2022

Add Custom Java Monitors #4713

Closed

Add Support for JFR Java Monitor Wait and Thread Sleep events #4676

Add Support for JFR Java Monitor Wait and Thread Sleep events #4676

Uh oh!

Conversation

roberttoyonaga commented Jun 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

roberttoyonaga Jun 29, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

roberttoyonaga Jul 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

christianhaeubl left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

roberttoyonaga commented Jul 5, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

roberttoyonaga commented Jun 28, 2022 •

edited

Loading

roberttoyonaga Jun 29, 2022 •

edited

Loading

roberttoyonaga Jul 4, 2022 •

edited

Loading