Configuration for the Validator Client Monitor
The Validator Client Monitor tracks client-observed performance metrics for validators in the Sui network. It runs from the perspective of a fullnode and monitors:
- Transaction submission latency
- Effects retrieval latency
- Health check response times
- Success/failure rates
§Tuning Guide
§Monitoring Metrics
The following Prometheus metrics can help tune the configuration:
- validator_client_observed_latency: Histogram of operation latencies per validator
- validator_client_operation_success_total: Counter of successful operations
- validator_client_operation_failure_total: Counter of failed operations
- validator_client_observed_score: Current score for each validator (0-1)
- validator_client_consecutive_failures: Current consecutive failure count
- validator_client_selections_total: How often each validator is selected
§Configuration Parameters
§Health Check Settings
- health_check_interval: How often to probe validator health
  - Default: 10s
  - Decrease for more responsive failure detection (higher overhead)
  - Increase to reduce network traffic
  - Monitor validator_client_operation_success_total{operation="health_check"} to see probe frequency
- health_check_timeout: Maximum time to wait for health check response
  - Default: 2s
  - Should be less than health_check_interval
  - Set based on p99 of validator_client_observed_latency{operation="health_check"}
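The two constraints above (timeout below interval, timeout derived from p99 latency) can be checked with a few lines of Rust. This is a minimal sketch, not the crate's own code: `HealthCheckSettings`, `validate`, and `p99` are hypothetical names introduced for illustration, and the nearest-rank percentile is an assumption about how one might summarize raw latency samples.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the two health check settings described above.
/// The real configuration struct may use different names and types.
#[derive(Debug, Clone, PartialEq)]
pub struct HealthCheckSettings {
    pub health_check_interval: Duration,
    pub health_check_timeout: Duration,
}

impl Default for HealthCheckSettings {
    /// Uses the documented defaults: 10s interval, 2s timeout.
    fn default() -> Self {
        Self {
            health_check_interval: Duration::from_secs(10),
            health_check_timeout: Duration::from_secs(2),
        }
    }
}

impl HealthCheckSettings {
    /// Returns an error when the timeout is not strictly below the interval.
    pub fn validate(&self) -> Result<(), String> {
        if self.health_check_timeout >= self.health_check_interval {
            return Err(format!(
                "health_check_timeout ({:?}) must be less than health_check_interval ({:?})",
                self.health_check_timeout, self.health_check_interval
            ));
        }
        Ok(())
    }
}

/// Nearest-rank p99 over raw latency samples (an assumed summary method;
/// a Prometheus histogram would estimate this from buckets instead).
pub fn p99(samples: &[Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // ceil(0.99 * n) computed in integer arithmetic, then made zero-based.
    let rank = (sorted.len() * 99 + 99) / 100;
    Some(sorted[rank - 1])
}
```

A timeout chosen this way would typically sit somewhat above the measured p99 so that ordinary slow responses are not counted as failures.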
§Failure Handling
- max_consecutive_failures: Failures before temporary exclusion
  - Default: 5
  - Lower values = faster exclusion of problematic validators
  - Higher values = more tolerance for transient issues
  - Monitor validator_client_consecutive_failures to see failure patterns
- failure_cooldown: How long to exclude failed validators
  - Default: 30s
  - Should be several times the health_check_interval
  - Too short = thrashing between exclusion/inclusion
  - Too long = reduced validator pool during transient issues
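How max_consecutive_failures and failure_cooldown interact can be sketched as a small per-validator state machine. This is an illustrative model only: `FailureTracker` and its methods are hypothetical, and the choice to reset the counter only on success (not when the cooldown expires) is an assumption, not documented behavior of the monitor.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-validator tracker illustrating the two settings above.
pub struct FailureTracker {
    max_consecutive_failures: u32,
    failure_cooldown: Duration,
    consecutive_failures: u32,
    excluded_until: Option<Instant>,
}

impl FailureTracker {
    pub fn new(max_consecutive_failures: u32, failure_cooldown: Duration) -> Self {
        Self {
            max_consecutive_failures,
            failure_cooldown,
            consecutive_failures: 0,
            excluded_until: None,
        }
    }

    /// A success ends the current failure streak (assumed behavior).
    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
    }

    /// Counts a failure; once the threshold is reached, the validator is
    /// excluded until `now + failure_cooldown`.
    pub fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.max_consecutive_failures {
            self.excluded_until = Some(now + self.failure_cooldown);
        }
    }

    /// True while the cooldown window is still open.
    pub fn is_excluded(&self, now: Instant) -> bool {
        matches!(self.excluded_until, Some(until) if now < until)
    }
}
```

In this model, a validator that keeps failing after its cooldown is re-excluded on the next failure, which is one way a short failure_cooldown leads to the thrashing described above.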
§Score Weights
Scores combine reliability and latency metrics. Adjust weights based on priorities:
- reliability: Weight for success rate (0-1)
  - Default: 0.6
  - Increase if consistency is critical
  - Decrease if latency is more important than occasional failures
- latency: Weight for latency scores
  - Default: 0.4
  - Increase for latency-sensitive applications
  - Individual operation weights can be tuned separately
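To see how the two weights trade off, the following sketch combines a success rate and a latency score into a single value in the 0-1 range. The weighted-average formula is an assumption made for illustration; the source does not state the exact scoring formula. `ExampleScoreWeights` and `combined_score` are hypothetical names, and both inputs are assumed to be already normalized to 0-1.

```rust
/// Hypothetical weights mirroring the documented defaults.
#[derive(Debug, Clone, Copy)]
pub struct ExampleScoreWeights {
    pub reliability: f64,
    pub latency: f64,
}

impl Default for ExampleScoreWeights {
    fn default() -> Self {
        Self {
            reliability: 0.6,
            latency: 0.4,
        }
    }
}

/// Weighted average of a success rate and a latency score, both in 0-1.
/// Dividing by the weight sum keeps the result in 0-1 even when the
/// weights do not add up to 1 (an assumed normalization).
pub fn combined_score(weights: ExampleScoreWeights, success_rate: f64, latency_score: f64) -> f64 {
    let total = weights.reliability + weights.latency;
    if total <= 0.0 {
        return 0.0;
    }
    let reliability_part = weights.reliability * success_rate.clamp(0.0, 1.0);
    let latency_part = weights.latency * latency_score.clamp(0.0, 1.0);
    (reliability_part + latency_part) / total
}
```

Under this model, a validator with a perfect success rate and a middling latency score of 0.5 scores about 0.8 with the default weights, and about 0.65 when the weights are flipped to favor latency (0.7 latency, 0.3 reliability).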
§Example Configurations
§Low Latency Priority
validator-client-monitor-config:
  health-check-interval: 5s
  health-check-timeout: 1s
  max-consecutive-failures: 3
  failure-cooldown: 20s
  score-weights:
    latency: 0.7
    reliability: 0.3
    effects-latency-weight: 0.6  # Effects queries are critical
§High Reliability Priority
validator-client-monitor-config:
  health-check-interval: 15s
  max-consecutive-failures: 10  # Very tolerant
  failure-cooldown: 60s
  score-weights:
    latency: 0.2
    reliability: 0.8
Structs§
- Weights for different factors in score calculation
- Configuration for validator client monitoring from the client perspective