# Telegraf Output Buffer Strategy

## Objective

Introduce a new agent-level config option to choose a disk buffer strategy for
output plugin metric queues.

## Overview

Currently, when a Telegraf output metric queue fills, either because metrics
arrive faster than they can be written or because writes to the output fail,
the oldest metrics are overwritten and never written to the output. This
specification defines a set of options to make this output queue more durable
by persisting pending metrics to disk rather than holding them only in a
size-limited in-memory queue.
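
The current behavior can be illustrated with a minimal sketch of a
fixed-capacity queue that overwrites its oldest entry when full. The
`ringBuffer` type and its methods are hypothetical names chosen for this
example and are not Telegraf's actual buffer implementation; metrics are
represented as plain strings and the capacity is assumed to be greater than
zero.

```go
package main

import "fmt"

// ringBuffer is a hypothetical fixed-capacity FIFO queue that overwrites its
// oldest entry when full. It assumes a capacity greater than zero.
type ringBuffer struct {
	items   []string
	head    int // index of the oldest entry
	size    int // number of stored entries
	dropped int // entries overwritten before they were read
}

func newRingBuffer(capacity int) *ringBuffer {
	return &ringBuffer{items: make([]string, capacity)}
}

// Add appends a metric, overwriting the oldest one if the buffer is full.
func (b *ringBuffer) Add(m string) {
	if b.size == len(b.items) {
		b.items[b.head] = m
		b.head = (b.head + 1) % len(b.items)
		b.dropped++
		return
	}
	b.items[(b.head+b.size)%len(b.items)] = m
	b.size++
}

// Batch returns the stored metrics, oldest first.
func (b *ringBuffer) Batch() []string {
	out := make([]string, 0, b.size)
	for i := 0; i < b.size; i++ {
		out = append(out, b.items[(b.head+i)%len(b.items)])
	}
	return out
}

func main() {
	buf := newRingBuffer(3)
	for _, m := range []string{"m1", "m2", "m3", "m4", "m5"} {
		buf.Add(m)
	}
	fmt.Println(buf.Batch(), "dropped:", buf.dropped) // [m3 m4 m5] dropped: 2
}
```

In this sketch `m1` and `m2` are lost once the queue fills, which is the loss
the disk-backed strategies are meant to avoid.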

## Keywords

output plugins, agent configuration, persist to disk

## Agent Configuration

The buffer strategy is configured at the agent level, with the following options:

- **Memory**, the current implementation, with no persistence to disk
- **Write-through**, all metrics are also written to disk using a
  Write Ahead Log (WAL) file
- **Disk-overflow**, when the memory buffer fills, metrics are flushed to a
  WAL file to avoid dropping overflow metrics

The configuration also includes an option to specify the directory where WAL
files are stored on disk, with a default value. These settings are global;
leaving them unchanged selects memory-only mode, which retains the current
behavior.
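
As a rough illustration of how the three options and the memory-only default
could be modeled, the following sketch parses a strategy name into an
enumeration. The string values (`memory`, `write-through`, `disk-overflow`),
the `bufferStrategy` type, and the `parseStrategy` function are assumptions
made for this example; this specification does not define the option names.

```go
package main

import "fmt"

// bufferStrategy enumerates the three strategies listed above.
type bufferStrategy int

const (
	strategyMemory bufferStrategy = iota
	strategyWriteThrough
	strategyDiskOverflow
)

// parseStrategy maps a configured value to a strategy. An unset (empty)
// value selects memory mode, so existing configurations keep their behavior.
// The accepted strings are hypothetical.
func parseStrategy(value string) (bufferStrategy, error) {
	switch value {
	case "", "memory":
		return strategyMemory, nil
	case "write-through":
		return strategyWriteThrough, nil
	case "disk-overflow":
		return strategyDiskOverflow, nil
	}
	return strategyMemory, fmt.Errorf("unknown buffer strategy %q", value)
}

// usesDisk reports whether the strategy may create WAL files.
func (s bufferStrategy) usesDisk() bool {
	return s != strategyMemory
}

func main() {
	for _, v := range []string{"", "write-through", "disk-overflow", "bogus"} {
		s, err := parseStrategy(v)
		fmt.Printf("value=%q usesDisk=%v err=%v\n", v, s.usesDisk(), err)
	}
}
```

Rejecting unknown values instead of silently falling back to memory mode is a
design choice of this sketch; it keeps a typo from disabling persistence
without notice.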

## Metric Ordering and Tracking

Tracking metrics will be accepted on a successful write to the output
destination. Metrics will be written to their appropriate output in the order
they are received in the buffer regardless of which buffer strategy is chosen.
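
The ordering requirement has one subtle consequence for disk-overflow mode:
once any metric has overflowed, newer metrics must queue behind it even if
memory has free space again. The sketch below shows one way to preserve
arrival order under that constraint. The `overflowBuffer` type is
hypothetical, the `disk` slice merely stands in for a WAL file, and the
capacity is assumed to be greater than zero.

```go
package main

import "fmt"

// overflowBuffer is a hypothetical disk-overflow queue. The disk slice stands
// in for a WAL file so that the example stays self-contained.
type overflowBuffer struct {
	capacity int      // maximum number of metrics held in memory (> 0)
	memory   []string // oldest pending metrics
	disk     []string // newer metrics that did not fit in memory
}

// Add stores a metric without dropping any. While anything is on disk, new
// metrics also go to disk; otherwise they would overtake older metrics.
func (b *overflowBuffer) Add(m string) {
	if len(b.disk) == 0 && len(b.memory) < b.capacity {
		b.memory = append(b.memory, m)
		return
	}
	b.disk = append(b.disk, m)
}

// Next removes and returns the oldest metric, then refills the freed memory
// slot from disk so that memory always holds the oldest pending metrics.
func (b *overflowBuffer) Next() (string, bool) {
	if len(b.memory) == 0 {
		return "", false
	}
	m := b.memory[0]
	b.memory = b.memory[1:]
	if len(b.disk) > 0 {
		b.memory = append(b.memory, b.disk[0])
		b.disk = b.disk[1:]
	}
	return m, true
}

func main() {
	buf := &overflowBuffer{capacity: 2}
	for _, m := range []string{"m1", "m2", "m3", "m4"} {
		buf.Add(m)
	}
	first, _ := buf.Next() // frees one memory slot
	buf.Add("m5")          // still queued behind m4 on disk
	fmt.Print(first)
	for m, ok := buf.Next(); ok; m, ok = buf.Next() {
		fmt.Print(" ", m)
	}
	fmt.Println() // m1 m2 m3 m4 m5
}
```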

## Disk Utilization and File Handling

Each output plugin has its own in-memory output buffer and will therefore
have its own WAL file for buffer persistence. In disk-overflow mode, this
file may not exist if Telegraf is able to write all of its metrics without
filling the in-memory buffer; in memory mode, it never exists.
Telegraf should use one file per output plugin, and remove entries from the
WAL file as they are written to the output.

Telegraf will not make any attempt to limit the size on disk taken by these
files beyond cleaning up WAL files for metrics that have successfully been
flushed to their output destination. It is the user's responsibility to ensure
these files do not entirely fill the disk, both during Telegraf uptime and
with lingering files from previous instances of the program.

If WAL files exist for an output plugin from previous instances of Telegraf,
they will be picked up and flushed before any new metrics are written to the
output. This ensures that these metrics are not lost and that output write
order remains consistent.

Telegraf must additionally provide a way to manually flush WAL files via
some separate plugin or similar. This could be used to ensure that WAL files
are still written to the output when the output plugin changes and a new
instance of Telegraf is unable to detect the existing WAL file. The buffer
strategy must work without this plugin.

## Is/Is-not

- Is a way to prevent metrics from being dropped due to a full memory buffer
- Is not a way to guarantee data safety in the event of a crash or system failure
- Is not a way to manage file system allocation size; file space will be used
  until the disk is full

## Prior art

- [Initial issue](https://github.com/influxdata/telegraf/issues/802)
- [Loose specification issue](https://github.com/influxdata/telegraf/issues/14805)