Skip to content

Conversation

roberttoyonaga
Copy link
Collaborator

@roberttoyonaga roberttoyonaga commented Jun 28, 2022

With regard to the jdk.JavaMonitorWait event, I am using a queue to keep track of the TIDs of notifiers so that waiters have access to them when they resume and emit the jdk.JavaMonitorWait event. This takes takes advantage of Condition.signal() signalling waiters in the order they started waiting.

I have created another PR #4698, that uses substitutions more closely mimic the way this event is handled in Hotspot. This alternative approach is simpler, but requires maintenance of the substituted methods when the original methods change.

@roberttoyonaga roberttoyonaga changed the title Support jdkJavaMonitorWait and jdk.ThreadSleep events Add Support for Java Monitor Wait and Thread Sleep events Jun 28, 2022
@roberttoyonaga roberttoyonaga changed the title Add Support for Java Monitor Wait and Thread Sleep events Add Support for JFR Java Monitor Wait and Thread Sleep events Jun 28, 2022
@zakkak zakkak requested a review from christianhaeubl June 29, 2022 05:08
static void sleep(long millis) throws InterruptedException {
long startTicks = com.oracle.svm.core.jfr.JfrTicks.elapsedTicks();
if (supportsVirtual() && isVirtualDisallowLoom(Thread.currentThread())) {
VirtualThreads.singleton().sleepMillis(millis);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does HotSpot really emit this JFR event for virtual threads?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it out using virtual threads with openjdk and the event does get emitted. Please see here

/*
* This queue is used to record the TIDs of notifiers so that waiters have access to them when they eventually
* acquire the monitor lock and resume.
* Adding and removing TIDs from this queue in a FIFO fashion is correct because Condition.signal(), notifies waiters in a FIFO fashion.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this assumption is correct because:
a. it is possible to specify a timeout (see wait(long))
b. it is possible to interrupt a thread

Copy link
Collaborator Author

@roberttoyonaga roberttoyonaga Jun 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a. If a thread wakes up from a timeout then it will not pull anything off of the TID queue. It will instead just emit an event with the timedout flag set. see here:

b. If a waiting thread is interrupted it should also not affect the TID queue.

So shouldn't it be true that a waiter will modify the notifier TID queue if and only if it has been notified? So even if all waiters do not resume in the order they began waiting (due to timeouts/interrupts), notified waiters should resume in the order that they were notified and therefore remove TIDs from the queue in the correct order.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code breaks as soon as a timeout or interrupt occurs because waitingThreads will be incorrect. I also think that your approach breaks if multiple threads are notified consecutively. The problems that I see, in more detail:

Scenario A

  • Thread A calls wait(long) and times out
  • Thread B acquires the monitor and tries to notify thread A
  • Thread B adds an entry to notifiers
  • Thread A emits the JFR event that it timed out
  • From that point on, we leak memory because there are too many entries in notifiers.
  • The next thread that calls wait() or wait(long) successfully will use the wrong notifier.

Scenario B

  • Thread A calls wait()
  • Thread B acquires the monitor and tries to notify thread A
  • Thread B adds an entry to notifiers
  • Thread C interrupts thread A
  • Thread A throws an InterruptedException - no JFR event is emitted
  • From that point on, we leak memory because there are too many entries in notifiers.
  • The next thread that calls wait() successfully will use the wrong notifier.

Scenario C

  • Thread A calls wait()
  • Thread B calls wait()
  • Thread C calls notify() and wakes up thread A
  • Thread D calls notify() and wakes up thread B
  • Thread A and thread B wake up and try to acquire the monitor. It is non-deterministic which thread will be able to acquire the monitor first, which violates your FIFO assumption.

Copy link
Collaborator Author

@roberttoyonaga roberttoyonaga Jul 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Christian. I have changed things so that we always clean up after a waiter resumes (through timeout, interrupt, or signal). I have also changed things so that waiters are checked to still be waiting before we mark them as being notified (so that notifications don't get associated with waiters that have timed out or interrupted). Some drawbacks of this approach are that a linear scan of the maintained queue is required when waiters wake up. This is because the arbitrary order in which threads reacquire the lock.

In Hotspot, a notifierTid field is added to the Node struct that sits on the wait and sync queues. This way, when a waiter wakes up, it can simply inspect that field. However, in substrateVM, the wait queue and sync queue is abstracted away in Condition.signal() and Condition.await() the solution in this PR tries to mirror these low level queues this with its own queue. I have created another PR here, that uses substitutions to copy the behavior in Hotspot more closely. This alternative approach is simpler, but requires maintenance of the substituted methods when the original methods change.

/*
* This queue is used to record the TIDs of notifiers so that waiters have access to them when they eventually
* acquire the monitor lock and resume.
* Adding and removing TIDs from this queue in a FIFO fashion is correct because Condition.signal(), notifies waiters in a FIFO fashion.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code breaks as soon as a timeout or interrupt occurs because waitingThreads will be incorrect. I also think that your approach breaks if multiple threads are notified consecutively. The problems that I see, in more detail:

Scenario A

  • Thread A calls wait(long) and times out
  • Thread B acquires the monitor and tries to notify thread A
  • Thread B adds an entry to notifiers
  • Thread A emits the JFR event that it timed out
  • From that point on, we leak memory because there are too many entries in notifiers.
  • The next thread that calls wait() or wait(long) successfully will use the wrong notifier.

Scenario B

  • Thread A calls wait()
  • Thread B acquires the monitor and tries to notify thread A
  • Thread B adds an entry to notifiers
  • Thread C interrupts thread A
  • Thread A throws an InterruptedException - no JFR event is emitted
  • From that point on, we leak memory because there are too many entries in notifiers.
  • The next thread that calls wait() successfully will use the wrong notifier.

Scenario C

  • Thread A calls wait()
  • Thread B calls wait()
  • Thread C calls notify() and wakes up thread A
  • Thread D calls notify() and wakes up thread B
  • Thread A and thread B wake up and try to acquire the monitor. It is non-deterministic which thread will be able to acquire the monitor first, which violates your FIFO assumption.

Copy link
Member

@christianhaeubl christianhaeubl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on that but the current solution still has major issues. Regarding #4698: we can't use substitutions for this because those substitutions would affect the JDK as a whole (and therefore other usages of ReentrantLock) and not only the Java monitors. The other issue is that we would potentially need different substitutions for different JDK versions.

Maybe it is really time to move to a custom Java monitor implementation that is only loosely based on the ReentrantLock implementation but that no longer uses the JDK classes directly. This is something that we will need eventually anyways. Then, we can modify the internals of the locking code as needed.

Moving forward, I would suggest to split the work into the following PRs:

  • Move the sleep event & all infrastructure changes that you currently made to a separate PR. We can merge this PR right away.
  • Copy the JDK code for ReentrantLock, NonfairSync, and ConditionObject from the JDK19 sources. Strip the implementation down to the bare minimum that is needed by Native Image for implementing Java monitors and use that implementation instead of the JDK classes.
  • Add support for the event JavaMonitorWait by modifying the new Native Image-specific Java monitor implementation.

break;
}
}
assert waiters.remove(target);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only executed when assertions are enabled, so you are currently leaking memory.

}
} catch (InterruptedException e) {
// Similar to hotspot, we should not emit event if interrupted
getNotifierTid();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the thread is interrupted, it won't hold the lock. So, the assertion in getNotifierTid() should fail. Without assertions enabled there will be a race.

JavaMonitorWaitEvent.emit(startTicks, obj, getNotifierTid(), timeoutMillis, false);
} else {
// remove waiter from queue and check it wasn't notified
assert getNotifierTid() < 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only executed when assertions are enabled, so probably not what you want.
Besides that, the thread won't hold the lock when a wait times out. So, the assertion in `getNotifierTid()So, the assertion in getNotifierTid() should fail and without assertions there will be a race. should fail. Without assertions enabled there will be a race.

@roberttoyonaga
Copy link
Collaborator Author

@christianhaeubl Thank you Christian. I have factored out thread sleep events into another PR here #4702. I'll begin working on the custom monitor implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants