Monitoring Varnish

Client metrics - incoming traffic

Client metrics cover the traffic between the client and the Varnish cache.

sess_conn - Cumulative number of connections.
client_req - Cumulative number of client requests.
sess_dropped - Dropped connections because of a full queue.

Monitor sess_conn and client_req to keep track of traffic volume: is it increasing, decreasing, or spiking? Sudden changes might indicate problems.

Monitor sess_dropped to see if the cache is dropping any sessions. If so, you might need to increase thread_pool_max.

varnishstat -1 | grep "sess_conn\|client_req \|sess_dropped"
MAIN.sess_conn              62449574         3.38 Sessions accepted
MAIN.client_req            184697229         9.99 Good client requests received
MAIN.sess_dropped                  0         0.00 Sessions dropped for thread

Cache performance

Perhaps the most important performance metric is the hitrate.

Varnish routes its incoming requests like this:

Hash, a cacheable request. This might be either a hit or a miss depending on the state of the cache.
Hitpass, a non-cacheable request.

A hash with a miss, or a hitpass, is fetched from the backend server and delivered. A hash with a hit is delivered directly from the cache.

Metrics to monitor:

cache_hit - Number of hashes with a hit in the cache.
cache_miss - Number of hashes with a miss in the cache.
cache_hitpass - Number of hitpasses as above.

varnishstat -1 | grep "cache_hit \|cache_miss \|cache_hitpass"
MAIN.cache_hit              99032838         5.36 Cache hits
MAIN.cache_hitpass                 0         0.00 Cache hits for pass
MAIN.cache_miss             42484195         2.30 Cache misses

Calculate the actual hitrate like this:

cache_hit / (cache_hit + cache_miss)

In this example the hitrate is about 0.7, or 70%. You want to keep this as high as possible. 70% is a decent number. You can improve the hitrate by increasing memory and customizing your VCL. Also watch for big changes in your hitrate.
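As a rough sketch, the calculation above can be scripted against the same varnishstat output. The function name hitrate is hypothetical (it is not part of Varnish), and the sketch assumes the MAIN.cache_hit and MAIN.cache_miss lines keep the format shown above.

```shell
# hitrate (hypothetical helper): reads `varnishstat -1` output on stdin and
# prints cache_hit / (cache_hit + cache_miss) with four decimals.
# Usage: varnishstat -1 | hitrate
hitrate() {
  awk '
    $1 == "MAIN.cache_hit"  { hit  = $2 }
    $1 == "MAIN.cache_miss" { miss = $2 }
    END {
      if (hit + miss > 0) printf "%.4f\n", hit / (hit + miss)
      else print "n/a"   # no cacheable lookups recorded yet
    }'
}
```

With the counters from the example output, this prints 0.6998. Note that the counters are cumulative, so the result is the average hitrate since Varnish started, not the current one.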

Monitoring cached objects

You monitor the cached objects to see how often they expire and if they are "nuked".

n_expired - Number of expired objects.
n_lru_nuked - Least recently used (LRU) nuked objects. Number of objects nuked (removed) from the cache because of lack of space.

varnishstat -1 | grep "n_expired\|n_lru_nuked"
MAIN.n_expired              42220159          .   Number of expired objects
MAIN.n_lru_nuked              264005          .   Number of LRU nuked objects

The one to watch here is n_lru_nuked. If the rate is increasing (the rate, not only the number), your cache is pushing out objects faster and faster because of lack of space, and you need to increase the cache size.
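Because n_lru_nuked is a cumulative counter, watching its rate means comparing two samples taken some time apart. The following is a minimal sketch of that idea. The helper names stat_value and counter_rate are hypothetical, and the 60 second interval in the usage comment is only an assumption.

```shell
# stat_value NAME (hypothetical helper): reads `varnishstat -1` output on
# stdin and prints the value of MAIN.NAME.
stat_value() {
  awk -v name="MAIN.$1" '$1 == name { print $2 }'
}

# counter_rate PREV CURR SECONDS (hypothetical helper): per-second increase
# of a cumulative counter between two samples.
counter_rate() {
  awk -v prev="$1" -v curr="$2" -v secs="$3" 'BEGIN {
    # A bad interval or a shrinking counter (e.g. after a restart) gives no usable rate.
    if (secs + 0 <= 0 || curr + 0 < prev + 0) { print "n/a"; exit }
    printf "%.2f\n", (curr - prev) / secs
  }'
}

# Usage (assumed 60 second sampling interval):
#   before=$(varnishstat -1 | stat_value n_lru_nuked)
#   sleep 60
#   after=$(varnishstat -1 | stat_value n_lru_nuked)
#   counter_rate "$before" "$after" 60
```

Recording this value at regular intervals shows whether the nuke rate itself is climbing, which is the signal described above.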

n_expired depends more on your application. A longer time to live (TTL) decreases this number, but objects are then refreshed less often, and the cache might need more space.

Monitoring threads

You need to keep track of some thread metrics to see whether Varnish is running out of OS resources or functioning well.

threads - Number of threads in all pools.
threads_created - Number of created threads.
threads_failed - Number of times Varnish failed to create a thread.
threads_limited - Number of times Varnish was forced not to create a thread since it was maxed out.
thread_queue_len - Current queue length. Number of requests waiting for a thread.
sess_queued - Number of times there weren't any threads available so a request had to be queued.

varnishstat -1 | grep "threads\|thread_queue_len\|sess_queued"
MAIN.threads                     100          .   Total number of threads
MAIN.threads_limited               1         0.00 Threads hit max
MAIN.threads_created            3715         0.00 Threads created
MAIN.threads_destroyed          3615         0.00 Threads destroyed
MAIN.threads_failed                0         0.00 Thread creation failed
MAIN.thread_queue_len              0          .   Length of session queue
MAIN.sess_queued                2505         0.00 Sessions queued for thread

If thread_queue_len isn't 0, it means that Varnish is out of resources and has started to queue requests. This decreases performance for those requests, so you need to investigate why.

Also watch out for threads_failed. If this increases, it means your server is out of resources somehow. Increasing numbers in threads_limited mean you might need to increase your server's thread_pool_max.
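These checks could be wired into a simple monitoring script. The sketch below is one possible shape, not an official tool: the function name thread_report is hypothetical, and using a non-zero exit status to signal a non-empty queue is an assumption about how your monitoring system consumes results.

```shell
# thread_report (hypothetical helper): reads `varnishstat -1` output on stdin,
# prints the three thread counters discussed above on one line, and returns a
# non-zero exit status when requests are currently being queued.
# Usage: varnishstat -1 | thread_report || echo "requests are queuing"
thread_report() {
  awk '
    $1 == "MAIN.thread_queue_len" { queue   = $2 }
    $1 == "MAIN.threads_failed"   { failed  = $2 }
    $1 == "MAIN.threads_limited"  { limited = $2 }
    END {
      printf "thread_queue_len=%d threads_failed=%d threads_limited=%d\n", queue, failed, limited
      # Only thread_queue_len is a current value; the other two are cumulative
      # and should be compared against earlier samples instead.
      exit (queue > 0 ? 1 : 0)
    }'
}
```

Since threads_failed and threads_limited only ever grow, the printed values are meant to be logged and compared over time rather than checked against zero.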

Monitoring backend metrics

There are a number of metrics describing the communication between Varnish and its backends.

The most important metrics here might be these:

backend_busy - Number of times a connection was not made because the backend already had too many connections (its connection limit was reached). With VCL you can configure Varnish to try another backend if this happens.
backend_fail - Number of times Varnish couldn't connect to the backend. This can have a number of causes (no TCP connection, long time to first byte, long time between bytes). If this happens, your backend isn't healthy.
backend_unhealthy - Number of times Varnish did not attempt a connection because the backend was marked unhealthy, for example because it failed to answer the health probe ("ping") with an HTTP 200 response.

varnishstat -1 | grep "backend_"
MAIN.backend_conn           86913481         4.70 Backend conn. success
MAIN.backend_unhealthy             0         0.00 Backend conn. not attempted
MAIN.backend_busy                  0         0.00 Backend conn. too many
MAIN.backend_fail                  7         0.00 Backend conn. failures
MAIN.backend_reuse                 0         0.00 Backend conn. reuses
MAIN.backend_toolate               0         0.00 Backend conn. was closed
MAIN.backend_recycle               0         0.00 Backend conn. recycles
MAIN.backend_retry                 0         0.00 Backend conn. retry
MAIN.backend_req            86961073         4.70 Backend requests made
