-
Notifications
You must be signed in to change notification settings - Fork 929
[Merged by Bors] - Detailed validator monitoring #2151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bb284d5 to
c2cb527
Compare
realbigsean
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome! Can't wait till it's in 🚀
I left a bunch of comments, most of them are just typos. The only sort of major thing I would be concerned about is opening a read lock on ValidatorMonitor whenever we register something, and subsequently opening a write lock on the SummaryMap (which is held in the ValidatorMonitor struct) during this registration. I'm not exactly sure if/how it could deadlock but wanted to bring it up
| ); | ||
| } | ||
|
|
||
| fn u64_to_i64(n: impl Into<u64>) -> i64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit odd defined in a for loop, you could move it outside of the ValidatorMonitor impl
| metrics::set_int_gauge( | ||
| &metrics::VALIDATOR_MONITOR_SLASHED, | ||
| &[id], | ||
| if validator.slashed { 0 } else { 1 }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this backwards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
| .and_then(|slot_start| seen_timestamp.checked_sub(slot_start)) | ||
| .and_then(|gross_delay| { | ||
| let production_delay = slot_clock.slot_duration() / 3; | ||
| gross_delay.checked_sub(production_delay) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might actually be interesting to track where this is negative (and others like it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I think I might leave it for later works though. I'm not exactly sure how Prom/Graf will handle negative a duration and I'm keen to get this out the door.
| .start_of(data.slot) | ||
| .and_then(|slot_start| seen_timestamp.checked_sub(slot_start)) | ||
| .and_then(|gross_delay| { | ||
| let production_delay = slot_clock.slot_duration() / 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be * (2/3)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch!
Co-authored-by: realbigsean <[email protected]>
Co-authored-by: realbigsean <[email protected]>
Co-authored-by: realbigsean <[email protected]>
|
All comments addressed! Thanks for the review, super helpful. I added a couple of extra metrics in 16d1f08, so I'll have to update the dashboard and replace the PR you made (sorry!). For your convenience, here's a diff including all my changes since your review (excluding merging in EDIT: I've merged sigp/lighthouse-metrics#20 and added the link to this PR. |
realbigsean
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Everything looks good
|
bors r+ |
## Issue Addressed - Resolves #2064 ## Proposed Changes Adds a `ValidatorMonitor` struct which provides additional logging and Grafana metrics for specific validators. Use `lighthouse bn --validator-monitor` to automatically enable monitoring for any validator that hits the [subnet subscription](https://ethereum.github.io/eth2.0-APIs/#/Validator/prepareBeaconCommitteeSubnet) HTTP API endpoint. Also, use `lighthouse bn --validator-monitor-pubkeys` to supply a list of validators which will always be monitored. See the new docs included in this PR for more info. ## TODO - [x] Track validator balance, `slashed` status, etc. - [x] ~~Register slashings in current epoch, not offense epoch~~ - [ ] Publish Grafana dashboard, update TODO link in docs - [x] ~~#2130 is merged into this branch, resolve that~~
|
Pull request successfully merged into unstable. Build succeeded: |
Issue Addressed
Proposed Changes
Adds a
ValidatorMonitorstruct which provides additional logging and Grafana metrics for specific validators.Use
lighthouse bn --validator-monitorto automatically enable monitoring for any validator that hits the subnet subscription HTTP API endpoint.Also, use
lighthouse bn --validator-monitor-pubkeysto supply a list of validators which will always be monitored.See the new docs included in this PR for more info.
TODO
slashedstatus, etc.Register slashings in current epoch, not offense epoch[Merged by Bors] - Disallow attestation production earlier than head #2130 is merged into this branch, resolve that