-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Add Support for JFR Java Monitor Wait and Thread Sleep events #4676
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
...tratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/events/JavaMonitorWaitEvent.java
Show resolved
Hide resolved
...ratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/events/JavaMonitorEnterEvent.java
Outdated
Show resolved
Hide resolved
...tratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/events/JavaMonitorWaitEvent.java
Outdated
Show resolved
Hide resolved
static void sleep(long millis) throws InterruptedException { | ||
long startTicks = com.oracle.svm.core.jfr.JfrTicks.elapsedTicks(); | ||
if (supportsVirtual() && isVirtualDisallowLoom(Thread.currentThread())) { | ||
VirtualThreads.singleton().sleepMillis(millis); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does HotSpot really emit this JFR event for virtual threads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried it out using virtual threads with openjdk and the event does get emitted. Please see here
/* | ||
* This queue is used to record the TIDs of notifiers so that waiters have access to them when they eventually | ||
* acquire the monitor lock and resume. | ||
* Adding and removing TIDs from this queue in a FIFO fashion is correct because Condition.signal(), notifies waiters in a FIFO fashion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this assumption is correct because:
a. it is possible to specify a timeout (see wait(long)
)
b. it is possible to interrupt a thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a. If a thread wakes up from a timeout then it will not pull anything off of the TID queue. It will instead just emit an event with the timedout
flag set. see here:
b. If a waiting thread is interrupted it should also not affect the TID queue.
So shouldn't it be true that a waiter will modify the notifier TID queue if and only if it has been notified? So even if all waiters do not resume in the order they began waiting (due to timeouts/interrupts), notified waiters should resume in the order that they were notified and therefore remove TIDs from the queue in the correct order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current code breaks as soon as a timeout or interrupt occurs because waitingThreads
will be incorrect. I also think that your approach breaks if multiple threads are notified consecutively. The problems that I see, in more detail:
Scenario A
- Thread A calls
wait(long)
and times out - Thread B acquires the monitor and tries to notify thread A
- Thread B adds an entry to notifiers
- Thread A emits the JFR event that it timed out
- From that point on, we leak memory because there are too many entries in
notifiers
. - The next thread that calls
wait()
orwait(long)
successfully will use the wrong notifier.
Scenario B
- Thread A calls
wait()
- Thread B acquires the monitor and tries to notify thread A
- Thread B adds an entry to notifiers
- Thread C interrupts thread A
- Thread A throws an InterruptedException - no JFR event is emitted
- From that point on, we leak memory because there are too many entries in
notifiers
. - The next thread that calls
wait()
successfully will use the wrong notifier.
Scenario C
- Thread A calls
wait()
- Thread B calls
wait()
- Thread C calls
notify()
and wakes up thread A - Thread D calls
notify()
and wakes up thread B - Thread A and thread B wake up and try to acquire the monitor. It is non-deterministic which thread will be able to acquire the monitor first, which violates your FIFO assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Christian. I have changed things so that we always clean up after a waiter resumes (through timeout, interrupt, or signal). I have also changed things so that waiters are checked to still be waiting before we mark them as being notified (so that notifications don't get associated with waiters that have timed out or interrupted). Some drawbacks of this approach are that a linear scan of the maintained queue is required when waiters wake up. This is because the arbitrary order in which threads reacquire the lock.
In Hotspot, a notifierTid field is added to the Node struct that sits on the wait and sync queues. This way, when a waiter wakes up, it can simply inspect that field. However, in substrateVM, the wait queue and sync queue is abstracted away in Condition.signal()
and Condition.await()
the solution in this PR tries to mirror these low level queues this with its own queue. I have created another PR here, that uses substitutions to copy the behavior in Hotspot more closely. This alternative approach is simpler, but requires maintenance of the substituted methods when the original methods change.
/* | ||
* This queue is used to record the TIDs of notifiers so that waiters have access to them when they eventually | ||
* acquire the monitor lock and resume. | ||
* Adding and removing TIDs from this queue in a FIFO fashion is correct because Condition.signal(), notifies waiters in a FIFO fashion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current code breaks as soon as a timeout or interrupt occurs because waitingThreads
will be incorrect. I also think that your approach breaks if multiple threads are notified consecutively. The problems that I see, in more detail:
Scenario A
- Thread A calls
wait(long)
and times out - Thread B acquires the monitor and tries to notify thread A
- Thread B adds an entry to notifiers
- Thread A emits the JFR event that it timed out
- From that point on, we leak memory because there are too many entries in
notifiers
. - The next thread that calls
wait()
orwait(long)
successfully will use the wrong notifier.
Scenario B
- Thread A calls
wait()
- Thread B acquires the monitor and tries to notify thread A
- Thread B adds an entry to notifiers
- Thread C interrupts thread A
- Thread A throws an InterruptedException - no JFR event is emitted
- From that point on, we leak memory because there are too many entries in
notifiers
. - The next thread that calls
wait()
successfully will use the wrong notifier.
Scenario C
- Thread A calls
wait()
- Thread B calls
wait()
- Thread C calls
notify()
and wakes up thread A - Thread D calls
notify()
and wakes up thread B - Thread A and thread B wake up and try to acquire the monitor. It is non-deterministic which thread will be able to acquire the monitor first, which violates your FIFO assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on that but the current solution still has major issues. Regarding #4698: we can't use substitutions for this because those substitutions would affect the JDK as a whole (and therefore other usages of ReentrantLock
) and not only the Java monitors. The other issue is that we would potentially need different substitutions for different JDK versions.
Maybe it is really time to move to a custom Java monitor implementation that is only loosely based on the ReentrantLock
implementation but that no longer uses the JDK classes directly. This is something that we will need eventually anyways. Then, we can modify the internals of the locking code as needed.
Moving forward, I would suggest to split the work into the following PRs:
- Move the sleep event & all infrastructure changes that you currently made to a separate PR. We can merge this PR right away.
- Copy the JDK code for
ReentrantLock
,NonfairSync
, andConditionObject
from the JDK19 sources. Strip the implementation down to the bare minimum that is needed by Native Image for implementing Java monitors and use that implementation instead of the JDK classes. - Add support for the event
JavaMonitorWait
by modifying the new Native Image-specific Java monitor implementation.
break; | ||
} | ||
} | ||
assert waiters.remove(target); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only executed when assertions are enabled, so you are currently leaking memory.
} | ||
} catch (InterruptedException e) { | ||
// Similar to hotspot, we should not emit event if interrupted | ||
getNotifierTid(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the thread is interrupted, it won't hold the lock. So, the assertion in getNotifierTid()
should fail. Without assertions enabled there will be a race.
JavaMonitorWaitEvent.emit(startTicks, obj, getNotifierTid(), timeoutMillis, false); | ||
} else { | ||
// remove waiter from queue and check it wasn't notified | ||
assert getNotifierTid() < 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only executed when assertions are enabled, so probably not what you want.
Besides that, the thread won't hold the lock when a wait times out. So, the assertion in `getNotifierTid()So, the assertion in getNotifierTid() should fail and without assertions there will be a race. should fail. Without assertions enabled there will be a race.
@christianhaeubl Thank you Christian. I have factored out thread sleep events into another PR here #4702. I'll begin working on the custom monitor implementation. |
With regard to the jdk.JavaMonitorWait event, I am using a queue to keep track of the TIDs of notifiers so that waiters have access to them when they resume and emit the jdk.JavaMonitorWait event. This takes takes advantage of
Condition.signal()
signalling waiters in the order they started waiting.I have created another PR #4698, that uses substitutions more closely mimic the way this event is handled in Hotspot. This alternative approach is simpler, but requires maintenance of the substituted methods when the original methods change.