Skip to content

Conversation

@paulhauner
Copy link
Member

@paulhauner paulhauner commented Jan 8, 2021

Issue Addressed

Proposed Changes

Adds a ValidatorMonitor struct which provides additional logging and Grafana metrics for specific validators.

Use lighthouse bn --validator-monitor to automatically enable monitoring for any validator that hits the subnet subscription HTTP API endpoint.

Also, use lighthouse bn --validator-monitor-pubkeys to supply a list of validators which will always be monitored.

See the new docs included in this PR for more info.

TODO

@paulhauner paulhauner added the work-in-progress PR is a work-in-progress label Jan 8, 2021
@paulhauner paulhauner changed the base branch from stable to unstable January 8, 2021 05:33
@paulhauner paulhauner added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Jan 18, 2021
@paulhauner paulhauner marked this pull request as ready for review January 18, 2021 01:31
Copy link
Member

@realbigsean realbigsean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! Can't wait till it's in 🚀

I left a bunch of comments, most of them are just typos. The only sort of major thing I would be concerned about is opening a read lock on ValidatorMonitor whenever we register something, and subsequently opening a write lock on the SummaryMap (which is held in the ValidatorMonitor struct) during this registration. I'm not exactly sure if/how it could deadlock but wanted to bring it up

);
}

fn u64_to_i64(n: impl Into<u64>) -> i64 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit odd defined in a for loop, you could move it outside of the ValidatorMonitor impl

metrics::set_int_gauge(
&metrics::VALIDATOR_MONITOR_SLASHED,
&[id],
if validator.slashed { 0 } else { 1 },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this backwards?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

.and_then(|slot_start| seen_timestamp.checked_sub(slot_start))
.and_then(|gross_delay| {
let production_delay = slot_clock.slot_duration() / 3;
gross_delay.checked_sub(production_delay)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might actually be interesting to track where this is negative (and others like it)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I think I might leave it for later works though. I'm not exactly sure how Prom/Graf will handle negative a duration and I'm keen to get this out the door.

.start_of(data.slot)
.and_then(|slot_start| seen_timestamp.checked_sub(slot_start))
.and_then(|gross_delay| {
let production_delay = slot_clock.slot_duration() / 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be * (2/3)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch!

@paulhauner
Copy link
Member Author

paulhauner commented Jan 20, 2021

All comments addressed! Thanks for the review, super helpful.

I added a couple of extra metrics in 16d1f08, so I'll have to update the dashboard and replace the PR you made (sorry!).

For your convenience, here's a diff including all my changes since your review (excluding merging in unstable): 854c562...16d1f08

EDIT: I've merged sigp/lighthouse-metrics#20 and added the link to this PR.

Copy link
Member

@realbigsean realbigsean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Everything looks good

@paulhauner
Copy link
Member Author

bors r+

@paulhauner paulhauner added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Jan 20, 2021
bors bot pushed a commit that referenced this pull request Jan 20, 2021
## Issue Addressed

- Resolves #2064

## Proposed Changes

Adds a `ValidatorMonitor` struct which provides additional logging and Grafana metrics for specific validators.

Use `lighthouse bn --validator-monitor` to automatically enable monitoring for any validator that hits the [subnet subscription](https://ethereum.github.io/eth2.0-APIs/#/Validator/prepareBeaconCommitteeSubnet) HTTP API endpoint.

Also, use `lighthouse bn --validator-monitor-pubkeys` to supply a list of validators which will always be monitored.

See the new docs included in this PR for more info.

## TODO

- [x] Track validator balance, `slashed` status, etc.
- [x] ~~Register slashings in current epoch, not offense epoch~~
- [ ] Publish Grafana dashboard, update TODO link in docs
- [x] ~~#2130 is merged into this branch, resolve that~~
@bors
Copy link

bors bot commented Jan 20, 2021

@bors bors bot changed the title Detailed validator monitoring [Merged by Bors] - Detailed validator monitoring Jan 20, 2021
@bors bors bot closed this Jan 20, 2021
@paulhauner paulhauner deleted the validator-monitor branch February 15, 2021 07:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-merge This PR is ready to merge.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants