Client metrics cover the traffic between the client and the Varnish cache.
Monitor sess_conn
and client_req
to keep track of traffic volume - is it increasing or decreasing, is it spiking etc. Sudden changes might indicate problems.
Monitor sess_dropped
to see if the cache is dropping any sessions. If so you might need to increase thread_pool_max
.
varnishstat -1 | grep "sess_conn\|client_req \|sess_dropped"
MAIN.sess_conn 62449574 3.38 Sessions accepted
MAIN.client_req 184697229 9.99 Good client requests received
MAIN.sess_dropped 0 0.00 Sessions dropped for thread
Perhaps the most important performance metric is the hitrate.
Varnish routes it's incoming requests like this:
hit
or miss
depending on the state of the cache.A hash with a miss
and a hitpass
will be fetched from the server backend and delivered. A hash with a hit
will be delivered directly from the cache.
Metrics to monitor:
varnishstat -1 | grep "cache_hit \|cache_miss \|cache_hitpass"
MAIN.cache_hit 99032838 5.36 Cache hits
MAIN.cache_hitpass 0 0.00 Cache hits for pass
MAIN.cache_miss 42484195 2.30 Cache misses
Calculate the actual hitrate like this:
cache_hit / (cache_hit + cache_miss)
In this example the hitrate is 0.7 or 70%. You want to keep this as high as possible. 70% is a decent number. You can improve hitrate by increasing memory and customizing your vcl. Also monitor big changes in your hitrate.
You monitor the cached objects to see how often they expire and if they are "nuked".
varnishstat -1 | grep "n_expired\|n_lru_nuked"
MAIN.n_expired 42220159 . Number of expired objects
MAIN.n_lru_nuked 264005 . Number of LRU nuked objects
The one to watch here is n_lru_nuked
, if the rate is increasing (the rate, not only the number) your cache is pushing out objects faster and faster because of lack of space. You need to increase the cache size.
The n_expired
is more up to your application. A longer time to live will decrease this number but on the other hand not renew the objects as often. Also the cache might require more size.
You need to keep track of some threads metrics to watch your Varnish Cache. Is it running out of OS resources or is it functioning well.
varnishstat -1 | grep "threads\|thread_queue_len\|sess_queued"
MAIN.threads 100 . Total number of threads
MAIN.threads_limited 1 0.00 Threads hit max
MAIN.threads_created 3715 0.00 Threads created
MAIN.threads_destroyed 3615 0.00 Threads destroyed
MAIN.threads_failed 0 0.00 Thread creation failed
MAIN.thread_queue_len 0 . Length of session queue
MAIN.sess_queued 2505 0.00 Sessions queued for thread
If thread_queue_len
isn't 0 it means that Varnish is out of resources and have started to queue requests. This will decrease performance of those requests. You need to investigate why.
Watch also out for threads_failed
. If this increases it means your server is out of resources somehow. Increasing numbers in threads_limited
means you might need to increase your servers thread_pool_max
.
There are a number of metrics describing the communication between Varnish and it's backends.
The most important metrics here might be these:
varnishstat -1 | grep "backend_"
MAIN.backend_conn 86913481 4.70 Backend conn. success
MAIN.backend_unhealthy 0 0.00 Backend conn. not attempted
MAIN.backend_busy 0 0.00 Backend conn. too many
MAIN.backend_fail 7 0.00 Backend conn. failures
MAIN.backend_reuse 0 0.00 Backend conn. reuses
MAIN.backend_toolate 0 0.00 Backend conn. was closed
MAIN.backend_recycle 0 0.00 Backend conn. recycles
MAIN.backend_retry 0 0.00 Backend conn. retry
MAIN.backend_req 86961073 4.70 Backend requests made