1
0
Fork 0
telegraf/plugins/inputs/zfs/README.md

419 lines
17 KiB
Markdown
Raw Normal View History

# ZFS Input Plugin
This ZFS plugin provides metrics from your ZFS filesystems. It supports ZFS on
Linux and FreeBSD. It gets ZFS stat from `/proc/spl/kstat/zfs` on Linux and
from `sysctl`, 'zfs' and `zpool` on FreeBSD.
## Global configuration options <!-- @/docs/includes/plugin_config.md -->
In addition to the plugin-specific configuration settings, plugins support
additional global and plugin configuration settings. These settings are used to
modify metrics, tags, and field or create aliases and configure ordering, etc.
See the [CONFIGURATION.md][CONFIGURATION.md] for more details.
[CONFIGURATION.md]: ../../../docs/CONFIGURATION.md#plugins
## Configuration
```toml @sample.conf
# Read metrics of ZFS from arcstats, zfetchstats, vdev_cache_stats, pools and datasets
# This plugin ONLY supports Linux & FreeBSD
[[inputs.zfs]]
## ZFS kstat path. Ignored on FreeBSD
## If not specified, then default is:
# kstatPath = "/proc/spl/kstat/zfs"
## By default, telegraf gather all zfs stats
## Override the stats list using the kstatMetrics array:
## For FreeBSD, the default is:
# kstatMetrics = ["arcstats", "zfetchstats", "vdev_cache_stats"]
## For Linux, the default is:
# kstatMetrics = ["abdstats", "arcstats", "dnodestats", "dbufcachestats",
# "dmu_tx", "fm", "vdev_mirror_stats", "zfetchstats", "zil"]
## By default, don't gather zpool stats
# poolMetrics = false
## By default, don't gather dataset stats
# datasetMetrics = false
```
## Metrics
By default this plugin collects metrics about ZFS internals pool and dataset.
These metrics are either counters or measure sizes
in bytes. These metrics will be in the `zfs` measurement with the field
names listed below.
If `poolMetrics` is enabled then additional metrics will be gathered for
each pool.
If `datasetMetrics` is enabled then additional metrics will be gathered for
each dataset.
- zfs
With fields listed below.
### ARC Stats (FreeBSD and Linux)
- arcstats_allocated (FreeBSD only)
- arcstats_anon_evict_data (Linux only)
- arcstats_anon_evict_metadata (Linux only)
- arcstats_anon_evictable_data (FreeBSD only)
- arcstats_anon_evictable_metadata (FreeBSD only)
- arcstats_anon_size
- arcstats_arc_loaned_bytes (Linux only)
- arcstats_arc_meta_limit
- arcstats_arc_meta_max
- arcstats_arc_meta_min (FreeBSD only)
- arcstats_arc_meta_used
- arcstats_arc_no_grow (Linux only)
- arcstats_arc_prune (Linux only)
- arcstats_arc_tempreserve (Linux only)
- arcstats_c
- arcstats_c_max
- arcstats_c_min
- arcstats_data_size
- arcstats_deleted
- arcstats_demand_data_hits
- arcstats_demand_data_misses
- arcstats_demand_hit_predictive_prefetch (FreeBSD only)
- arcstats_demand_metadata_hits
- arcstats_demand_metadata_misses
- arcstats_duplicate_buffers
- arcstats_duplicate_buffers_size
- arcstats_duplicate_reads
- arcstats_evict_l2_cached
- arcstats_evict_l2_eligible
- arcstats_evict_l2_ineligible
- arcstats_evict_l2_skip (FreeBSD only)
- arcstats_evict_not_enough (FreeBSD only)
- arcstats_evict_skip
- arcstats_hash_chain_max
- arcstats_hash_chains
- arcstats_hash_collisions
- arcstats_hash_elements
- arcstats_hash_elements_max
- arcstats_hdr_size
- arcstats_hits
- arcstats_l2_abort_lowmem
- arcstats_l2_asize
- arcstats_l2_cdata_free_on_write
- arcstats_l2_cksum_bad
- arcstats_l2_compress_failures
- arcstats_l2_compress_successes
- arcstats_l2_compress_zeros
- arcstats_l2_evict_l1cached (FreeBSD only)
- arcstats_l2_evict_lock_retry
- arcstats_l2_evict_reading
- arcstats_l2_feeds
- arcstats_l2_free_on_write
- arcstats_l2_hdr_size
- arcstats_l2_hits
- arcstats_l2_io_error
- arcstats_l2_misses
- arcstats_l2_read_bytes
- arcstats_l2_rw_clash
- arcstats_l2_size
- arcstats_l2_write_buffer_bytes_scanned (FreeBSD only)
- arcstats_l2_write_buffer_iter (FreeBSD only)
- arcstats_l2_write_buffer_list_iter (FreeBSD only)
- arcstats_l2_write_buffer_list_null_iter (FreeBSD only)
- arcstats_l2_write_bytes
- arcstats_l2_write_full (FreeBSD only)
- arcstats_l2_write_in_l2 (FreeBSD only)
- arcstats_l2_write_io_in_progress (FreeBSD only)
- arcstats_l2_write_not_cacheable (FreeBSD only)
- arcstats_l2_write_passed_headroom (FreeBSD only)
- arcstats_l2_write_pios (FreeBSD only)
- arcstats_l2_write_spa_mismatch (FreeBSD only)
- arcstats_l2_write_trylock_fail (FreeBSD only)
- arcstats_l2_writes_done
- arcstats_l2_writes_error
- arcstats_l2_writes_hdr_miss (Linux only)
- arcstats_l2_writes_lock_retry (FreeBSD only)
- arcstats_l2_writes_sent
- arcstats_memory_direct_count (Linux only)
- arcstats_memory_indirect_count (Linux only)
- arcstats_memory_throttle_count
- arcstats_meta_size (Linux only)
- arcstats_mfu_evict_data (Linux only)
- arcstats_mfu_evict_metadata (Linux only)
- arcstats_mfu_ghost_evict_data (Linux only)
- arcstats_mfu_ghost_evict_metadata (Linux only)
- arcstats_metadata_size (FreeBSD only)
- arcstats_mfu_evictable_data (FreeBSD only)
- arcstats_mfu_evictable_metadata (FreeBSD only)
- arcstats_mfu_ghost_evictable_data (FreeBSD only)
- arcstats_mfu_ghost_evictable_metadata (FreeBSD only)
- arcstats_mfu_ghost_hits
- arcstats_mfu_ghost_size
- arcstats_mfu_hits
- arcstats_mfu_size
- arcstats_misses
- arcstats_mru_evict_data (Linux only)
- arcstats_mru_evict_metadata (Linux only)
- arcstats_mru_ghost_evict_data (Linux only)
- arcstats_mru_ghost_evict_metadata (Linux only)
- arcstats_mru_evictable_data (FreeBSD only)
- arcstats_mru_evictable_metadata (FreeBSD only)
- arcstats_mru_ghost_evictable_data (FreeBSD only)
- arcstats_mru_ghost_evictable_metadata (FreeBSD only)
- arcstats_mru_ghost_hits
- arcstats_mru_ghost_size
- arcstats_mru_hits
- arcstats_mru_size
- arcstats_mutex_miss
- arcstats_other_size
- arcstats_p
- arcstats_prefetch_data_hits
- arcstats_prefetch_data_misses
- arcstats_prefetch_metadata_hits
- arcstats_prefetch_metadata_misses
- arcstats_recycle_miss (Linux only)
- arcstats_size
- arcstats_sync_wait_for_async (FreeBSD only)
### Zfetch Stats (FreeBSD and Linux)
- zfetchstats_bogus_streams (Linux only)
- zfetchstats_colinear_hits (Linux only)
- zfetchstats_colinear_misses (Linux only)
- zfetchstats_hits
- zfetchstats_max_streams (FreeBSD only)
- zfetchstats_misses
- zfetchstats_reclaim_failures (Linux only)
- zfetchstats_reclaim_successes (Linux only)
- zfetchstats_streams_noresets (Linux only)
- zfetchstats_streams_resets (Linux only)
- zfetchstats_stride_hits (Linux only)
- zfetchstats_stride_misses (Linux only)
### Vdev Cache Stats (FreeBSD)
- vdev_cache_stats_delegations
- vdev_cache_stats_hits
- vdev_cache_stats_misses
### Pool Metrics (optional)
On Linux (reference: kstat accumulated time and queue length statistics):
- zfs_pool
- nread (integer, bytes)
- nwritten (integer, bytes)
- reads (integer, count)
- writes (integer, count)
- wtime (integer, nanoseconds)
- wlentime (integer, queuelength * nanoseconds)
- wupdate (integer, timestamp)
- rtime (integer, nanoseconds)
- rlentime (integer, queuelength * nanoseconds)
- rupdate (integer, timestamp)
- wcnt (integer, count)
- rcnt (integer, count)
For ZFS >= 2.1.x the format has changed significantly:
- zfs_pool
- writes (integer, count)
- nwritten (integer, bytes)
- reads (integer, count)
- nread (integer, bytes)
- nunlinks (integer, count)
- nunlinked (integer, count)
For ZFS >= 2.2.x the following additional fields are available:
- additional fields for ZFS > 2.2.x
- zil_commit_count (integer, count)
- zil_commit_writer_count (integer, count)
- zil_itx_count (integer, count)
- zil_itx_indirect_count (integer, count)
- zil_itx_indirect_bytes (integer, bytes)
- zil_itx_copied_count (integer, count)
- zil_itx_copied_bytes (integer, bytes)
- zil_itx_needcopy_count (integer, count)
- zil_itx_needcopy_bytes (integer, bytes)
- zil_itx_metaslab_normal_count (integer, count)
- zil_itx_metaslab_normal_bytes (integer, bytes)
- zil_itx_metaslab_normal_write (integer, bytes)
- zil_itx_metaslab_normal_alloc (integer, bytes)
- zil_itx_metaslab_slog_count (integer, count)
- zil_itx_metaslab_slog_bytes (integer, bytes)
- zil_itx_metaslab_slog_write (integer, bytes)
- zil_itx_metaslab_slog_alloc (integer, bytes)
On FreeBSD:
- zfs_pool
- allocated (integer, bytes)
- capacity (integer, bytes)
- dedupratio (float, ratio)
- free (integer, bytes)
- size (integer, bytes)
- fragmentation (integer, percent)
### Dataset Metrics (optional, only on FreeBSD)
- zfs_dataset
- avail (integer, bytes)
- used (integer, bytes)
- usedsnap (integer, bytes
- usedds (integer, bytes)
### Tags
- ZFS stats (`zfs`) will have the following tag:
- pools - A `::` concatenated list of all ZFS pools on the machine.
- datasets - A `::` concatenated list of all ZFS datasets on the machine.
- Pool metrics (`zfs_pool`) will have the following tag:
- pool - with the name of the pool which the metrics are for.
- health - the health status of the pool. (FreeBSD only)
- dataset - ZFS >= 2.1.x only. (Linux only)
- Dataset metrics (`zfs_dataset`) will have the following tag:
- dataset - with the name of the dataset which the metrics are for.
## Example Output
```text
zfs_pool,health=ONLINE,pool=zroot allocated=1578590208i,capacity=2i,dedupratio=1,fragmentation=1i,free=64456531968i,size=66035122176i 1464473103625653908
zfs_dataset,dataset=zata avail=10741741326336,used=8564135526400,usedsnap=0,usedds=90112
zfs,pools=zroot arcstats_allocated=4167764i,arcstats_anon_evictable_data=0i,arcstats_anon_evictable_metadata=0i,arcstats_anon_size=16896i,arcstats_arc_meta_limit=10485760i,arcstats_arc_meta_max=115269568i,arcstats_arc_meta_min=8388608i,arcstats_arc_meta_used=51977456i,arcstats_c=16777216i,arcstats_c_max=41943040i,arcstats_c_min=16777216i,arcstats_data_size=0i,arcstats_deleted=1699340i,arcstats_demand_data_hits=14836131i,arcstats_demand_data_misses=2842945i,arcstats_demand_hit_predictive_prefetch=0i,arcstats_demand_metadata_hits=1655006i,arcstats_demand_metadata_misses=830074i,arcstats_duplicate_buffers=0i,arcstats_duplicate_buffers_size=0i,arcstats_duplicate_reads=123i,arcstats_evict_l2_cached=0i,arcstats_evict_l2_eligible=332172623872i,arcstats_evict_l2_ineligible=6168576i,arcstats_evict_l2_skip=0i,arcstats_evict_not_enough=12189444i,arcstats_evict_skip=195190764i,arcstats_hash_chain_max=2i,arcstats_hash_chains=10i,arcstats_hash_collisions=43134i,arcstats_hash_elements=2268i,arcstats_hash_elements_max=6136i,arcstats_hdr_size=565632i,arcstats_hits=16515778i,arcstats_l2_abort_lowmem=0i,arcstats_l2_asize=0i,arcstats_l2_cdata_free_on_write=0i,arcstats_l2_cksum_bad=0i,arcstats_l2_compress_failures=0i,arcstats_l2_compress_successes=0i,arcstats_l2_compress_zeros=0i,arcstats_l2_evict_l1cached=0i,arcstats_l2_evict_lock_retry=0i,arcstats_l2_evict_reading=0i,arcstats_l2_feeds=0i,arcstats_l2_free_on_write=0i,arcstats_l2_hdr_size=0i,arcstats_l2_hits=0i,arcstats_l2_io_error=0i,arcstats_l2_misses=0i,arcstats_l2_read_bytes=0i,arcstats_l2_rw_clash=0i,arcstats_l2_size=0i,arcstats_l2_write_buffer_bytes_scanned=0i,arcstats_l2_write_buffer_iter=0i,arcstats_l2_write_buffer_list_iter=0i,arcstats_l2_write_buffer_list_null_iter=0i,arcstats_l2_write_bytes=0i,arcstats_l2_write_full=0i,arcstats_l2_write_in_l2=0i,arcstats_l2_write_io_in_progress=0i,arcstats_l2_write_not_cacheable=380i,arcstats_l2_write_passed_headroom=0i,arcstats_l2_write_pios=0i,arcstats_l2_write_spa_mismatch=0i,arcstats_l2_write_trylock_fail=0i,arcstats_l2_writes_done=0i,arcstats_l2_writes_error=0i,arcstats_l2_writes_lock_retry=0i,arcstats_l2_writes_sent=0i,arcstats_memory_throttle_count=0i,arcstats_metadata_size=17014784i,arcstats_mfu_evictable_data=0i,arcstats_mfu_evictable_metadata=16384i,arcstats_mfu_ghost_evictable_data=5723648i,arcstats_mfu_ghost_evictable_metadata=10709504i,arcstats_mfu_ghost_hits=1315619i,arcstats_mfu_ghost_size=16433152i,arcstats_mfu_hits=7646611i,arcstats_mfu_size=305152i,arcstats_misses=3676993i,arcstats_mru_evictable_data=0i,arcstats_mru_evictable_metadata=0i,arcstats_mru_ghost_evictable_data=0i,arcstats_mru_ghost_evictable_metadata=80896i,arcstats_mru_ghost_hits=324250i,arcstats_mru_ghost_size=80896i,arcstats_mru_hits=8844526i,arcstats_mru_size=16693248i,arcstats_mutex_miss=354023i,arcstats_other_size=34397040i,arcstats_p=4172800i,arcstats_prefetch_data_hits=0i,arcstats_prefetch_data_misses=0i,arcstats_prefetch_metadata_hits=24641i,arcstats_prefetch_metadata_misses=3974i,arcstats_size=51977456i,arcstats_sync_wait_for_async=0i,vdev_cache_stats_delegations=779i,vdev_cache_stats_hits=323123i,vdev_cache_stats_misses=59929i,zfetchstats_hits=0i,zfetchstats_max_streams=0i,zfetchstats_misses=0i 1464473103634124908
```
## Description
A short description for some of the metrics.
### ARC Stats
`arcstats_hits` Total amount of cache hits in the arc.
`arcstats_misses` Total amount of cache misses in the arc.
`arcstats_demand_data_hits` Amount of cache hits for demand data, this is what
matters (is good) for your application/share.
`arcstats_demand_data_misses` Amount of cache misses for demand data, this is
what matters (is bad) for your application/share.
`arcstats_demand_metadata_hits` Amount of cache hits for demand metadata, this
matters (is good) for getting filesystem data (ls,find,…)
`arcstats_demand_metadata_misses` Amount of cache misses for demand metadata,
this matters (is bad) for getting filesystem data (ls,find,…)
`arcstats_prefetch_data_hits` The zfs prefetcher tried to prefetch something,
but it was already cached (boring)
`arcstats_prefetch_data_misses` The zfs prefetcher prefetched something which
was not in the cache (good job, could become a demand hit in the future)
`arcstats_prefetch_metadata_hits` Same as above, but for metadata
`arcstats_prefetch_metadata_misses` Same as above, but for metadata
`arcstats_mru_hits` Cache hit in the “most recently used cache”, we move this to
the mfu cache.
`arcstats_mru_ghost_hits` Cache hit in the “most recently used ghost list” we
had this item in the cache, but evicted it, maybe we should increase the mru
cache size.
`arcstats_mfu_hits` Cache hit in the “most frequently used cache” we move this
to the beginning of the mfu cache.
`arcstats_mfu_ghost_hits` Cache hit in the “most frequently used ghost list” we
had this item in the cache, but evicted it, maybe we should increase the mfu
cache size.
`arcstats_allocated` New data is written to the cache.
`arcstats_deleted` Old data is evicted (deleted) from the cache.
`arcstats_evict_l2_cached` We evicted something from the arc, but its still
cached in the l2 if we need it.
`arcstats_evict_l2_eligible` We evicted something from the arc, and its not in
the l2 this is sad. (maybe we hadnt had enough time to store it there)
`arcstats_evict_l2_ineligible` We evicted something which cannot be stored in
the l2. Reasons could be:
- We have multiple pools, we evicted something from a pool without an l2 device.
- The zfs property secondary cache.
`arcstats_c` Arc target size, this is the size the system thinks the arc should
have.
`arcstats_size` Total size of the arc.
`arcstats_l2_hits` Hits to the L2 cache. (It was not in the arc, but in the l2
cache)
`arcstats_l2_misses` Miss to the L2 cache. (It was not in the arc, and not in
the l2 cache)
`arcstats_l2_size` Size of the l2 cache.
`arcstats_l2_hdr_size` Size of the metadata in the arc (ram) used to manage
(lookup if something is in the l2) the l2 cache.
### Zfetch Stats
`zfetchstats_hits` Counts the number of cache hits, to items which are in the
cache because of the prefetcher.
`zfetchstats_misses` Counts the number of prefetch cache misses.
`zfetchstats_colinear_hits` Counts the number of cache hits, to items which are
in the cache because of the prefetcher (prefetched linear reads)
`zfetchstats_stride_hits` Counts the number of cache hits, to items which are in
the cache because of the prefetcher (prefetched stride reads)
### Vdev Cache Stats (FreeBSD only)
note: the vdev cache is deprecated in some ZFS implementations
`vdev_cache_stats_hits` Hits to the vdev (device level) cache.
`vdev_cache_stats_misses` Misses to the vdev (device level) cache.
### ABD Stats (Linux Only)
ABD is a linear/scatter dual typed buffer for ARC
`abdstats_linear_cnt` number of linear ABDs which are currently allocated
`abdstats_linear_data_size` amount of data stored in all linear ABDs
`abdstats_scatter_cnt` number of scatter ABDs which are currently allocated
`abdstats_scatter_data_size` amount of data stored in all scatter ABDs
### DMU Stats (Linux Only)
`dmu_tx_dirty_throttle` counts when writes are throttled due to the amount of
dirty data growing too large
`dmu_tx_memory_reclaim` counts when memory is low and throttling activity
`dmu_tx_memory_reserve` counts when memory footprint of the txg exceeds the ARC
size
### Fault Management Ereport errors (Linux Only)
`fm_erpt-dropped` counts when an error report cannot be created (eg available
memory is too low)
### ZIL (Linux Only)
note: `zil` measurements in `kstatMetrics` are system-wide, in `poolMetrics`
they are pool-wide
`zil_commit_count` counts when ZFS transactions are committed to a ZIL