
Adding upstream version 1.34.4.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-05-24 07:26:29 +02:00
parent e393c3af3f
commit 4978089aab
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
4963 changed files with 677545 additions and 0 deletions


@@ -0,0 +1,206 @@
# Azure Monitor Output Plugin
This plugin writes metrics to [Azure Monitor][azure_monitor] which has
a metric resolution of one minute. To accommodate this in Telegraf, the
plugin will automatically aggregate metrics into one-minute buckets and send
them to the service on every flush interval.
> [!IMPORTANT]
> The Azure Monitor custom metrics service is currently in preview and might
> not be available in all Azure regions.
> Please also take the [metric time limitations](#metric-time-limitations) into
> account!
The metrics from each input plugin will be written to a separate Azure Monitor
namespace, prefixed with `Telegraf/` by default. The field name for each metric
is written as the Azure Monitor metric name. All field values are written as a
summarized set that includes: min, max, sum, count. Tags are written as a
dimension on each Azure Monitor metric.
⭐ Telegraf v1.8.0
🏷️ cloud, datastore
💻 all
[azure_monitor]: https://learn.microsoft.com/en-us/azure/azure-monitor
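
For illustration, here is a minimal sketch of the bucketing and summarization
described above (assuming Go 1.21+ for the built-in `min`/`max`; the names are
illustrative and not the plugin's internals):

```go
package main

import (
	"fmt"
	"time"
)

// fieldAggregate mirrors the summarized set sent per field:
// min, max, sum and count over a one-minute bucket.
type fieldAggregate struct {
	min, max, sum float64
	count         int64
}

func main() {
	// Three samples of a "usage" field of a "cpu" input, all collected
	// within the same minute.
	samples := []float64{12.0, 80.5, 43.2}

	buckets := make(map[time.Time]*fieldAggregate)
	now := time.Now()
	for _, v := range samples {
		bucket := now.Truncate(time.Minute) // one-minute resolution
		if agg, ok := buckets[bucket]; ok {
			agg.min = min(agg.min, v)
			agg.max = max(agg.max, v)
			agg.sum += v
			agg.count++
		} else {
			buckets[bucket] = &fieldAggregate{min: v, max: v, sum: v, count: 1}
		}
	}

	// Such an aggregate would be published as metric "usage" in the
	// "Telegraf/cpu" namespace, with the metric's tags as dimensions.
	for bucket, agg := range buckets {
		fmt.Printf("%s min=%g max=%g sum=%g count=%d\n",
			bucket.Format(time.RFC3339), agg.min, agg.max, agg.sum, agg.count)
	}
}
```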
## Global configuration options <!-- @/docs/includes/plugin_config.md -->
In addition to the plugin-specific configuration settings, plugins support
additional global and plugin configuration settings. These settings are used to
modify metrics, tags, and fields or create aliases and configure ordering, etc.
See the [CONFIGURATION.md][CONFIGURATION.md] for more details.
[CONFIGURATION.md]: ../../../docs/CONFIGURATION.md#plugins
## Configuration
```toml @sample.conf
# Send aggregate metrics to Azure Monitor
[[outputs.azure_monitor]]
## Timeout for HTTP writes.
# timeout = "20s"
## Set the namespace prefix, defaults to "Telegraf/<input-name>".
# namespace_prefix = "Telegraf/"
## Azure Monitor doesn't have a string value type, so convert string
## fields to dimensions (a.k.a. tags) if enabled. Azure Monitor allows
## a maximum of 10 dimensions so Telegraf will only send the first 10
## dimensions in alphabetical order.
# strings_as_dimensions = false
## Both region and resource_id must be set or be available via the
## Instance Metadata service on Azure Virtual Machines.
#
## Azure Region to publish metrics against.
## ex: region = "southcentralus"
# region = ""
#
## The Azure Resource ID against which metrics will be logged.
## ex: resource_id = "/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Compute/virtualMachines/<vm_name>"
# resource_id = ""
## Optionally, if in Azure US Government, China, or other sovereign
## cloud environment, set the appropriate REST endpoint for receiving
## metrics. (Note: region may be unused in this context)
# endpoint_url = "https://monitoring.core.usgovcloudapi.net"
## Time limitations for metrics to send
## Documentation can be found here:
## https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/metrics-store-custom-rest-api?tabs=rest#timestamp
## However, the returned (400) error message might document stricter or more
## relaxed settings. By default, only past metrics within the limit are sent.
# timestamp_limit_past = "30m"
# timestamp_limit_future = "-1m"
```
## Setup
1. [Register the `microsoft.insights` resource provider in your Azure
subscription][resource provider].
1. If using Managed Service Identities to authenticate an Azure VM, [enable
system-assigned managed identity][enable msi].
1. Use a region that supports Azure Monitor Custom Metrics. For regions with
Custom Metrics support, an endpoint is available with the format
`https://<region>.monitoring.azure.com`.
[resource provider]: https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-supported-services
[enable msi]: https://docs.microsoft.com/en-us/azure/active-directory/managed-service-identity/qs-configure-portal-windows-vm
### Region and Resource ID
The plugin will attempt to discover the region and resource ID using the Azure
VM Instance Metadata service. If Telegraf is not running on a virtual machine or
the VM Instance Metadata service is not available, the following variables are
required for the output to function.
* region
* resource_id
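
If you want to verify what the plugin would discover, the metadata can be
queried manually. A minimal sketch, assuming it runs on an Azure VM (this is
the same endpoint and API version the plugin uses):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	const url = "http://169.254.169.254/metadata/instance?api-version=2017-12-01"
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		panic(err)
	}
	// The instance metadata service requires this header on every request.
	req.Header.Set("Metadata", "true")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err) // only reachable from inside an Azure VM
	}
	defer resp.Body.Close()

	// The JSON response contains compute.location (the region) and the
	// fields needed to construct the resource ID.
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```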
### Authentication
This plugin uses one of several authentication methods. The preferred
authentication methods are different from the *order* in which each
authentication is checked. Here are the preferred authentication methods:
1. Managed Service Identity (MSI) token: This is the preferred authentication
method. Telegraf will automatically authenticate using this method when
running on Azure VMs.
2. AAD Application Tokens (Service Principals)
* Primarily useful if Telegraf is writing metrics for other resources.
[More information][principal].
* A Service Principal or User Principal needs to be assigned the `Monitoring
Metrics Publisher` role on the resource(s) metrics will be emitted
against.
3. AAD User Tokens (User Principals)
* Allows Telegraf to authenticate like a user. It is best to use this method
for development.
[principal]: https://docs.microsoft.com/en-us/azure/active-directory/develop/active-directory-application-objects
The plugin will authenticate using the first available of the following
configurations, as sketched in the example after this list:
1. **Client Credentials**: Azure AD Application ID and Secret. Set the following
environment variables:
* `AZURE_TENANT_ID`: Specifies the Tenant to which to authenticate.
* `AZURE_CLIENT_ID`: Specifies the app client ID to use.
* `AZURE_CLIENT_SECRET`: Specifies the app secret to use.
1. **Client Certificate**: Azure AD Application ID and X.509 Certificate.
* `AZURE_TENANT_ID`: Specifies the Tenant to which to authenticate.
* `AZURE_CLIENT_ID`: Specifies the app client ID to use.
* `AZURE_CERTIFICATE_PATH`: Specifies the certificate path to use.
* `AZURE_CERTIFICATE_PASSWORD`: Specifies the certificate password to use.
1. **Resource Owner Password**: Azure AD User and Password. This grant type is
*not recommended*, use device login instead if you need interactive login.
* `AZURE_TENANT_ID`: Specifies the Tenant to which to authenticate.
* `AZURE_CLIENT_ID`: Specifies the app client ID to use.
* `AZURE_USERNAME`: Specifies the username to use.
* `AZURE_PASSWORD`: Specifies the password to use.
1. **Azure Managed Service Identity**: Delegate credential management to the
platform. Requires that code is running in Azure, e.g. on a VM. All
configuration is handled by Azure. See [Azure Managed Service Identity][msi]
for more details. Only available when using the [Azure Resource
Manager][arm].
[msi]: https://docs.microsoft.com/en-us/azure/active-directory/msi-overview
[arm]: https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-overview
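
As a sketch, the credential lookup boils down to a single go-autorest call
that consumes the environment variables listed above and falls back to MSI;
the resource URI is the Azure Monitor endpoint the plugin authenticates
against:

```go
package main

import (
	"fmt"

	"github.com/Azure/go-autorest/autorest/azure/auth"
)

func main() {
	// Checks client credentials, client certificate, username/password and
	// finally MSI, in that order, depending on which variables are set.
	authorizer, err := auth.NewAuthorizerFromEnvironmentWithResource("https://monitoring.azure.com/")
	if err != nil {
		fmt.Println("creating authorizer failed:", err)
		return
	}
	// The plugin attaches this authorizer to every outgoing request; the
	// token is refreshed automatically when needed.
	_ = authorizer
}
```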
> [!NOTE]
> As shown above, the last option (#4) is the preferred way to authenticate
> when running Telegraf on Azure VMs.
## Dimensions
Azure Monitor only accepts values with a numeric type. The plugin will drop
fields with a string type by default. The plugin can set all string type fields
as extra dimensions in the Azure Monitor custom metric by setting the
configuration option `strings_as_dimensions` to `true`.
Keep in mind, Azure Monitor allows a maximum of 10 dimensions per metric. The
plugin will deterministically drop any dimensions that exceed the 10-dimension
limit.
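
A minimal sketch of that deterministic behavior, assuming tags sorted by key
(Telegraf keeps tag lists sorted, so the first ten keys in alphabetical order
survive):

```go
package main

import (
	"fmt"
	"sort"
)

// limitDimensions keeps at most ten dimensions, chosen deterministically by
// sorting the tag keys alphabetically.
func limitDimensions(tags map[string]string) (names, values []string) {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		if len(names) >= 10 {
			break // everything beyond the tenth key is dropped
		}
		names = append(names, k)
		values = append(values, tags[k])
	}
	return names, values
}

func main() {
	tags := map[string]string{"host": "localhost", "region": "test", "zone": "1"}
	names, values := limitDimensions(tags)
	fmt.Println(names, values)
}
```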
To convert only a subset of string-typed fields to dimensions, enable
`strings_as_dimensions` and use the [`fieldinclude` or `fieldexclude`
modifiers][conf-modifiers] to limit the string-typed fields that are sent to
the plugin.
[conf-modifiers]: ../../../docs/CONFIGURATION.md#modifiers
## Metric time limitations
Azure Monitor won't accept metrics too far in the past or future. Keep this in
mind when configuring your output buffer limits or other variables, such as
flush intervals, or when using input sources that could cause metrics to fall
outside this allowed range.
According to the [documentation][timestamp_docs], the timestamp should not be
older than 20 minutes or more than 5 minutes in the future at the time when the
metric is sent to the Azure Monitor service. However, HTTP `400` error messages
returned by the service might specify other values such as 30 minutes in the
past and 4 minutes in the future.
You can control the timeframe actually sent using the `timestamp_limit_past` and
`timestamp_limit_future` settings. By default, only metrics between 30 minutes
and one minute in the past are sent. The lower limit represents the more
permissive limit received in the `400` error messages. The upper limit leaves
enough time for aggregation to happen by not sending aggregations too early.
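
A small sketch of the resulting window arithmetic with the default settings
(`timestamp_limit_past = "30m"`, `timestamp_limit_future = "-1m"`):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	now := time.Now()
	limitPast := 30 * time.Minute   // timestamp_limit_past = "30m"
	limitFuture := -1 * time.Minute // timestamp_limit_future = "-1m"

	// Metrics with timestamps outside [earliest, latest] are rejected.
	earliest := now.Add(-limitPast)
	latest := now.Add(limitFuture)
	fmt.Printf("accepting metrics between %s and %s\n",
		earliest.Format(time.RFC3339), latest.Format(time.RFC3339))
}
```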
> [!IMPORTANT]
> When adapting the limits you need to take into account both the limits
> permitted by the service and the latency of sending metrics. Furthermore, you
> should not send metrics too early, as aggregation might not have happened yet
> and the values would be misleading.
[timestamp_docs]: https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/metrics-store-custom-rest-api?tabs=rest#timestamp


@@ -0,0 +1,597 @@
//go:generate ../../../tools/readme_config_includer/generator
package azure_monitor
import (
"bytes"
"compress/gzip"
"context"
_ "embed"
"encoding/binary"
"encoding/json"
"errors"
"fmt"
"hash/fnv"
"io"
"net/http"
"regexp"
"strings"
"time"
"github.com/Azure/go-autorest/autorest"
"github.com/Azure/go-autorest/autorest/azure/auth"
"github.com/influxdata/telegraf"
"github.com/influxdata/telegraf/config"
"github.com/influxdata/telegraf/internal"
"github.com/influxdata/telegraf/metric"
"github.com/influxdata/telegraf/plugins/outputs"
"github.com/influxdata/telegraf/selfstat"
)
//go:embed sample.conf
var sampleConfig string
const (
vmInstanceMetadataURL = "http://169.254.169.254/metadata/instance?api-version=2017-12-01"
resourceIDTemplate = "/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/virtualMachines/%s"
resourceIDScaleSetTemplate = "/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/virtualMachineScaleSets/%s"
maxRequestBodySize = 4000000
)
var invalidNameCharRE = regexp.MustCompile(`[^a-zA-Z0-9_]`)
type dimension struct {
name string
value string
}
type aggregate struct {
name string
min float64
max float64
sum float64
count int64
dimensions []dimension
updated bool
}
type AzureMonitor struct {
Timeout config.Duration `toml:"timeout"`
NamespacePrefix string `toml:"namespace_prefix"`
StringsAsDimensions bool `toml:"strings_as_dimensions"`
Region string `toml:"region"`
ResourceID string `toml:"resource_id"`
EndpointURL string `toml:"endpoint_url"`
TimestampLimitPast config.Duration `toml:"timestamp_limit_past"`
TimestampLimitFuture config.Duration `toml:"timestamp_limit_future"`
Log telegraf.Logger `toml:"-"`
url string
preparer autorest.Preparer
client *http.Client
cache map[time.Time]map[uint64]*aggregate
timeFunc func() time.Time
MetricOutsideWindow selfstat.Stat
}
func (*AzureMonitor) SampleConfig() string {
return sampleConfig
}
func (a *AzureMonitor) Init() error {
a.cache = make(map[time.Time]map[uint64]*aggregate, 36)
authorizer, err := auth.NewAuthorizerFromEnvironmentWithResource("https://monitoring.azure.com/")
if err != nil {
return fmt.Errorf("creating authorizer failed: %w", err)
}
a.preparer = autorest.CreatePreparer(authorizer.WithAuthorization())
return nil
}
func (a *AzureMonitor) Connect() error {
a.client = &http.Client{
Transport: &http.Transport{
Proxy: http.ProxyFromEnvironment,
},
Timeout: time.Duration(a.Timeout),
}
// If information is missing, try to retrieve it from the Azure VM
// instance metadata service
if a.Region == "" || a.ResourceID == "" {
region, resourceID, err := vmInstanceMetadata(a.client)
if err != nil {
return fmt.Errorf("getting VM metadata failed: %w", err)
}
if a.Region == "" {
a.Region = region
}
if a.ResourceID == "" {
a.ResourceID = resourceID
}
}
if a.ResourceID == "" {
return errors.New("no resource ID configured or available via VM instance metadata")
}
if a.EndpointURL == "" {
if a.Region == "" {
return errors.New("no region configured or available via VM instance metadata")
}
a.url = fmt.Sprintf("https://%s.monitoring.azure.com%s/metrics", a.Region, a.ResourceID)
} else {
a.url = a.EndpointURL + a.ResourceID + "/metrics"
}
a.Log.Debugf("Writing to Azure Monitor URL: %s", a.url)
a.MetricOutsideWindow = selfstat.Register(
"azure_monitor",
"metric_outside_window",
map[string]string{
"region": a.Region,
"resource_id": a.ResourceID,
},
)
a.Reset()
return nil
}
// Close shuts down any active connections
func (a *AzureMonitor) Close() error {
a.client.CloseIdleConnections()
a.client = nil
return nil
}
// Add will append a metric to the output aggregate
func (a *AzureMonitor) Add(m telegraf.Metric) {
// Metrics older than the configured past time limit are dropped here;
// metrics too far in the future are held back until Push and checked
// again on Write.
tbucket := m.Time().Truncate(time.Minute)
if tbucket.Before(a.timeFunc().Add(-time.Duration(a.TimestampLimitPast))) {
a.MetricOutsideWindow.Incr(1)
return
}
// Azure Monitor doesn't have a string value type, so convert string fields
// to dimensions (a.k.a. tags) if enabled.
if a.StringsAsDimensions {
for _, f := range m.FieldList() {
if v, ok := f.Value.(string); ok {
m.AddTag(f.Key, v)
}
}
}
for _, f := range m.FieldList() {
fv, err := internal.ToFloat64(f.Value)
if err != nil {
continue
}
// Azure Monitor does not support fields so the field name is appended
// to the metric name.
sanitizedKey := invalidNameCharRE.ReplaceAllString(f.Key, "_")
name := m.Name() + "-" + sanitizedKey
id := hashIDWithField(m.HashID(), f.Key)
// Create the time bucket if it doesn't exist
if _, ok := a.cache[tbucket]; !ok {
a.cache[tbucket] = make(map[uint64]*aggregate)
}
// Fetch existing aggregate
agg, ok := a.cache[tbucket][id]
if !ok {
dimensions := make([]dimension, 0, len(m.TagList()))
for _, tag := range m.TagList() {
dimensions = append(dimensions, dimension{
name: tag.Key,
value: tag.Value,
})
}
a.cache[tbucket][id] = &aggregate{
name: name,
dimensions: dimensions,
min: fv,
max: fv,
sum: fv,
count: 1,
updated: true,
}
continue
}
if fv < agg.min {
agg.min = fv
}
if fv > agg.max {
agg.max = fv
}
agg.sum += fv
agg.count++
agg.updated = true
}
}
// Push sends metrics to the output metric buffer
func (a *AzureMonitor) Push() []telegraf.Metric {
var metrics []telegraf.Metric
for tbucket, aggs := range a.cache {
// Do not send metrics early
if tbucket.After(a.timeFunc().Add(time.Duration(a.TimestampLimitFuture))) {
continue
}
for _, agg := range aggs {
// Only send aggregates that have had an update since the last push.
if !agg.updated {
continue
}
tags := make(map[string]string, len(agg.dimensions))
for _, tag := range agg.dimensions {
tags[tag.name] = tag.value
}
m := metric.New(agg.name,
tags,
map[string]interface{}{
"min": agg.min,
"max": agg.max,
"sum": agg.sum,
"count": agg.count,
},
tbucket,
)
metrics = append(metrics, m)
}
}
return metrics
}
// Reset prunes expired aggregates from the cache and marks pushed
// aggregates as unchanged
func (a *AzureMonitor) Reset() {
for tbucket := range a.cache {
// Remove aggregates older than the configured past time limit
if tbucket.Before(a.timeFunc().Add(-time.Duration(a.TimestampLimitPast))) {
delete(a.cache, tbucket)
continue
}
// Aggregates newer than the future time limit have not been pushed yet
// and should not be cleared.
if tbucket.After(a.timeFunc().Add(time.Duration(a.TimestampLimitFuture))) {
continue
}
for id := range a.cache[tbucket] {
a.cache[tbucket][id].updated = false
}
}
}
// Write writes metrics to the remote endpoint
func (a *AzureMonitor) Write(metrics []telegraf.Metric) error {
now := a.timeFunc()
tsEarliest := now.Add(-time.Duration(a.TimestampLimitPast))
tsLatest := now.Add(time.Duration(a.TimestampLimitFuture))
writeErr := &internal.PartialWriteError{
MetricsAccept: make([]int, 0, len(metrics)),
}
azmetrics := make(map[uint64]*azureMonitorMetric, len(metrics))
for i, m := range metrics {
// Skip metrics that are outside of the valid timespan
if m.Time().Before(tsEarliest) || m.Time().After(tsLatest) {
a.Log.Tracef("Metric outside acceptable time window: %v", m)
a.MetricOutsideWindow.Incr(1)
writeErr.Err = errors.New("metric(s) outside of acceptable time window")
writeErr.MetricsReject = append(writeErr.MetricsReject, i)
continue
}
amm, err := translate(m, a.NamespacePrefix)
if err != nil {
a.Log.Errorf("Could not create azure metric for %q; discarding point", m.Name())
if writeErr.Err == nil {
writeErr.Err = errors.New("translating metric(s) failed")
}
writeErr.MetricsReject = append(writeErr.MetricsReject, i)
continue
}
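// Group metrics that share a name, tag keys and timestamp into a single
// request object; their series are appended to one batch entry.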
id := hashIDWithTagKeysOnly(m)
if azm, ok := azmetrics[id]; !ok {
azmetrics[id] = amm
azmetrics[id].index = i
} else {
azmetrics[id].Data.BaseData.Series = append(
azm.Data.BaseData.Series,
amm.Data.BaseData.Series...,
)
azmetrics[id].index = i
}
}
if len(azmetrics) == 0 {
if writeErr.Err == nil {
return nil
}
return writeErr
}
var buffer bytes.Buffer
buffer.Grow(maxRequestBodySize)
batchIndices := make([]int, 0, len(azmetrics))
for _, m := range azmetrics {
// Azure Monitor accepts new batches of points in new-line delimited
// JSON, following RFC 4288 (see https://github.com/ndjson/ndjson-spec).
buf, err := json.Marshal(m)
if err != nil {
writeErr.MetricsReject = append(writeErr.MetricsReject, m.index)
writeErr.Err = err
continue
}
// Azure Monitor has a maximum request body size of 4MB. Flush batches
// that would exceed this size via a separate write request first.
if buffer.Len()+len(buf)+1 > maxRequestBodySize {
if retryable, err := a.send(buffer.Bytes()); err != nil {
writeErr.Err = err
if !retryable {
writeErr.MetricsReject = append(writeErr.MetricsReject, batchIndices...)
}
return writeErr
}
writeErr.MetricsAccept = append(writeErr.MetricsAccept, batchIndices...)
batchIndices = make([]int, 0, len(azmetrics))
buffer.Reset()
}
// The current metric is only written to the (possibly fresh) buffer
// below, so record its index after a potential flush.
batchIndices = append(batchIndices, m.index)
if _, err := buffer.Write(buf); err != nil {
return fmt.Errorf("writing to buffer failed: %w", err)
}
if err := buffer.WriteByte('\n'); err != nil {
return fmt.Errorf("writing to buffer failed: %w", err)
}
}
if retryable, err := a.send(buffer.Bytes()); err != nil {
writeErr.Err = err
if !retryable {
writeErr.MetricsReject = append(writeErr.MetricsReject, batchIndices...)
}
return writeErr
}
writeErr.MetricsAccept = append(writeErr.MetricsAccept, batchIndices...)
if writeErr.Err == nil {
return nil
}
return writeErr
}
func (a *AzureMonitor) send(body []byte) (bool, error) {
var buf bytes.Buffer
g := gzip.NewWriter(&buf)
if _, err := g.Write(body); err != nil {
return false, fmt.Errorf("zipping content failed: %w", err)
}
if err := g.Close(); err != nil {
return false, fmt.Errorf("closing gzip writer failed: %w", err)
}
req, err := http.NewRequest("POST", a.url, &buf)
if err != nil {
return false, fmt.Errorf("creating request failed: %w", err)
}
req.Header.Set("Content-Encoding", "gzip")
req.Header.Set("Content-Type", "application/x-ndjson")
// Add the authorization header. WithAuthorization will automatically
// refresh the token if needed.
req, err = a.preparer.Prepare(req)
if err != nil {
return false, fmt.Errorf("unable to fetch authentication credentials: %w", err)
}
resp, err := a.client.Do(req)
if err != nil {
if errors.Is(err, context.DeadlineExceeded) {
a.client.CloseIdleConnections()
a.client = &http.Client{
Transport: &http.Transport{
Proxy: http.ProxyFromEnvironment,
},
Timeout: time.Duration(a.Timeout),
}
}
return true, err
}
defer resp.Body.Close()
if resp.StatusCode >= 200 && resp.StatusCode <= 299 {
return false, nil
}
retryable := resp.StatusCode != 400
if respbody, err := io.ReadAll(resp.Body); err == nil {
return retryable, fmt.Errorf("failed to write batch: [%d] %s: %s", resp.StatusCode, resp.Status, string(respbody))
}
return retryable, fmt.Errorf("failed to write batch: [%d] %s", resp.StatusCode, resp.Status)
}
// vmInstanceMetadata retrieves metadata about the current Azure VM
func vmInstanceMetadata(c *http.Client) (region, resourceID string, err error) {
req, err := http.NewRequest("GET", vmInstanceMetadataURL, nil)
if err != nil {
return "", "", fmt.Errorf("error creating request: %w", err)
}
req.Header.Set("Metadata", "true")
resp, err := c.Do(req)
if err != nil {
return "", "", err
}
defer resp.Body.Close()
body, err := io.ReadAll(resp.Body)
if err != nil {
return "", "", err
}
if resp.StatusCode >= 300 || resp.StatusCode < 200 {
return "", "", fmt.Errorf("unable to fetch instance metadata: [%s] %d",
vmInstanceMetadataURL, resp.StatusCode)
}
var metadata virtualMachineMetadata
if err := json.Unmarshal(body, &metadata); err != nil {
return "", "", err
}
region = metadata.Compute.Location
resourceID = metadata.ResourceID()
return region, resourceID, nil
}
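// hashIDWithField returns a hash that is unique for the combination of a
// metric's series hash and a single field key.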
func hashIDWithField(id uint64, fk string) uint64 {
h := fnv.New64a()
b := make([]byte, binary.MaxVarintLen64)
n := binary.PutUvarint(b, id)
h.Write(b[:n])
h.Write([]byte("\n"))
h.Write([]byte(fk))
h.Write([]byte("\n"))
return h.Sum64()
}
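// hashIDWithTagKeysOnly hashes the metric name, tag keys and timestamp while
// ignoring tag values, so matching metrics can be grouped into one batch.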
func hashIDWithTagKeysOnly(m telegraf.Metric) uint64 {
h := fnv.New64a()
h.Write([]byte(m.Name()))
h.Write([]byte("\n"))
for _, tag := range m.TagList() {
if tag.Key == "" || tag.Value == "" {
continue
}
h.Write([]byte(tag.Key))
h.Write([]byte("\n"))
}
b := make([]byte, binary.MaxVarintLen64)
n := binary.PutUvarint(b, uint64(m.Time().UnixNano()))
h.Write(b[:n])
h.Write([]byte("\n"))
return h.Sum64()
}
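// translate converts an aggregated Telegraf metric into the Azure Monitor
// custom metric format. The metric name is split on the first dash into the
// (prefixed) namespace and the metric name; up to ten tags become dimensions.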
func translate(m telegraf.Metric, prefix string) (*azureMonitorMetric, error) {
dimensionNames := make([]string, 0, len(m.TagList()))
dimensionValues := make([]string, 0, len(m.TagList()))
for _, tag := range m.TagList() {
// Azure custom metrics service supports up to 10 dimensions
if len(dimensionNames) >= 10 {
continue
}
if tag.Key == "" || tag.Value == "" {
continue
}
dimensionNames = append(dimensionNames, tag.Key)
dimensionValues = append(dimensionValues, tag.Value)
}
vmin, err := getFloatField(m, "min")
if err != nil {
return nil, err
}
vmax, err := getFloatField(m, "max")
if err != nil {
return nil, err
}
vsum, err := getFloatField(m, "sum")
if err != nil {
return nil, err
}
vcount, err := getIntField(m, "count")
if err != nil {
return nil, err
}
mn, ns := "Missing", "Missing"
names := strings.SplitN(m.Name(), "-", 2)
if len(names) > 1 {
mn = names[1]
}
if len(names) > 0 {
ns = names[0]
}
ns = prefix + ns
return &azureMonitorMetric{
Time: m.Time(),
Data: &azureMonitorData{
BaseData: &azureMonitorBaseData{
Metric: mn,
Namespace: ns,
DimensionNames: dimensionNames,
Series: []*azureMonitorSeries{
{
DimensionValues: dimensionValues,
Min: vmin,
Max: vmax,
Sum: vsum,
Count: vcount,
},
},
},
},
}, nil
}
func getFloatField(m telegraf.Metric, key string) (float64, error) {
fv, ok := m.GetField(key)
if !ok {
return 0, fmt.Errorf("missing field: %s", key)
}
if value, ok := fv.(float64); ok {
return value, nil
}
return 0, fmt.Errorf("unexpected type: %s: %T", key, fv)
}
func getIntField(m telegraf.Metric, key string) (int64, error) {
fv, ok := m.GetField(key)
if !ok {
return 0, fmt.Errorf("missing field: %s", key)
}
if value, ok := fv.(int64); ok {
return value, nil
}
return 0, fmt.Errorf("unexpected type: %s: %T", key, fv)
}
func init() {
outputs.Add("azure_monitor", func() telegraf.Output {
return &AzureMonitor{
NamespacePrefix: "Telegraf/",
TimestampLimitPast: config.Duration(20 * time.Minute),
TimestampLimitFuture: config.Duration(-1 * time.Minute),
Timeout: config.Duration(5 * time.Second),
timeFunc: time.Now,
}
})
}


@@ -0,0 +1,619 @@
package azure_monitor
import (
"bufio"
"compress/gzip"
"encoding/json"
"net/http"
"net/http/httptest"
"sync/atomic"
"testing"
"time"
"github.com/Azure/go-autorest/autorest"
"github.com/Azure/go-autorest/autorest/adal"
"github.com/stretchr/testify/require"
"github.com/influxdata/telegraf"
"github.com/influxdata/telegraf/config"
"github.com/influxdata/telegraf/metric"
"github.com/influxdata/telegraf/testutil"
)
func TestAggregate(t *testing.T) {
tests := []struct {
name string
stringdim bool
metrics []telegraf.Metric
addTime time.Time
pushTime time.Time
expected []telegraf.Metric
expectedOutsideWindow int64
}{
{
name: "add metric outside window is dropped",
metrics: []telegraf.Metric{
testutil.MustMetric(
"cpu",
map[string]string{},
map[string]interface{}{
"value": 42,
},
time.Unix(0, 0),
),
},
addTime: time.Unix(3600, 0),
pushTime: time.Unix(3600, 0),
expectedOutsideWindow: 1,
},
{
name: "metric not sent until period expires",
metrics: []telegraf.Metric{
testutil.MustMetric(
"cpu",
map[string]string{},
map[string]interface{}{
"value": 42,
},
time.Unix(0, 0),
),
},
addTime: time.Unix(0, 0),
pushTime: time.Unix(0, 0),
},
{
name: "add strings as dimensions",
stringdim: true,
metrics: []telegraf.Metric{
testutil.MustMetric(
"cpu",
map[string]string{
"host": "localhost",
},
map[string]interface{}{
"value": 42,
"message": "howdy",
},
time.Unix(0, 0),
),
},
addTime: time.Unix(0, 0),
pushTime: time.Unix(3600, 0),
expected: []telegraf.Metric{
testutil.MustMetric(
"cpu-value",
map[string]string{
"host": "localhost",
"message": "howdy",
},
map[string]interface{}{
"min": 42.0,
"max": 42.0,
"sum": 42.0,
"count": 1,
},
time.Unix(0, 0),
),
},
},
{
name: "add metric to cache and push",
metrics: []telegraf.Metric{
testutil.MustMetric(
"cpu",
map[string]string{},
map[string]interface{}{
"value": 42,
},
time.Unix(0, 0),
),
},
addTime: time.Unix(0, 0),
pushTime: time.Unix(3600, 0),
expected: []telegraf.Metric{
testutil.MustMetric(
"cpu-value",
map[string]string{},
map[string]interface{}{
"min": 42.0,
"max": 42.0,
"sum": 42.0,
"count": 1,
},
time.Unix(0, 0),
),
},
},
{
name: "added metrics are aggregated",
metrics: []telegraf.Metric{
testutil.MustMetric(
"cpu",
map[string]string{},
map[string]interface{}{
"value": 42,
},
time.Unix(0, 0),
),
testutil.MustMetric(
"cpu",
map[string]string{},
map[string]interface{}{
"value": 84,
},
time.Unix(0, 0),
),
testutil.MustMetric(
"cpu",
map[string]string{},
map[string]interface{}{
"value": 2,
},
time.Unix(0, 0),
),
},
addTime: time.Unix(0, 0),
pushTime: time.Unix(3600, 0),
expected: []telegraf.Metric{
testutil.MustMetric(
"cpu-value",
map[string]string{},
map[string]interface{}{
"min": 2.0,
"max": 84.0,
"sum": 128.0,
"count": 3,
},
time.Unix(0, 0),
),
},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
msiEndpoint, err := adal.GetMSIVMEndpoint()
require.NoError(t, err)
t.Setenv("MSI_ENDPOINT", msiEndpoint)
// Setup plugin
plugin := &AzureMonitor{
Region: "test",
ResourceID: "/test",
StringsAsDimensions: tt.stringdim,
TimestampLimitPast: config.Duration(30 * time.Minute),
TimestampLimitFuture: config.Duration(-1 * time.Minute),
Log: testutil.Logger{},
timeFunc: func() time.Time { return tt.addTime },
}
require.NoError(t, plugin.Init())
require.NoError(t, plugin.Connect())
defer plugin.Close()
// Reset statistics
plugin.MetricOutsideWindow.Set(0)
// Add the data
for _, m := range tt.metrics {
plugin.Add(m)
}
// Push out the data at a later time
plugin.timeFunc = func() time.Time { return tt.pushTime }
metrics := plugin.Push()
plugin.Reset()
// Check the results
require.Equal(t, tt.expectedOutsideWindow, plugin.MetricOutsideWindow.Get())
testutil.RequireMetricsEqual(t, tt.expected, metrics)
})
}
}
func TestWrite(t *testing.T) {
// Set up a fake environment for the authorizer. This used to fake an MSI
// environment, but since https://github.com/Azure/go-autorest/pull/670/files
// this is no longer possible, so we fake a user/password authentication.
t.Setenv("AZURE_CLIENT_ID", "fake")
t.Setenv("AZURE_USERNAME", "fake")
t.Setenv("AZURE_PASSWORD", "fake")
tests := []struct {
name string
metrics []telegraf.Metric
expectedCalls uint64
expectedMetrics uint64
errmsg string
}{
{
name: "if not an azure metric nothing is sent",
metrics: []telegraf.Metric{
testutil.MustMetric(
"cpu",
map[string]string{},
map[string]interface{}{
"value": 42,
},
time.Unix(0, 0),
),
},
errmsg: "translating metric(s) failed",
},
{
name: "single azure metric",
metrics: []telegraf.Metric{
testutil.MustMetric(
"cpu-value",
map[string]string{},
map[string]interface{}{
"min": float64(42),
"max": float64(42),
"sum": float64(42),
"count": int64(1),
},
time.Unix(0, 0),
),
},
expectedCalls: 1,
expectedMetrics: 1,
},
{
name: "multiple azure metrics",
metrics: []telegraf.Metric{
testutil.MustMetric(
"cpu-value",
map[string]string{},
map[string]interface{}{
"min": float64(42),
"max": float64(42),
"sum": float64(42),
"count": int64(1),
},
time.Unix(0, 0),
),
testutil.MustMetric(
"cpu-value",
map[string]string{},
map[string]interface{}{
"min": float64(42),
"max": float64(42),
"sum": float64(42),
"count": int64(1),
},
time.Unix(60, 0),
),
},
expectedCalls: 1,
expectedMetrics: 2,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// Setup test server to collect the sent metrics
var calls atomic.Uint64
var metrics atomic.Uint64
ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
calls.Add(1)
gz, err := gzip.NewReader(r.Body)
if err != nil {
w.WriteHeader(http.StatusInternalServerError)
t.Logf("cannot create gzip reader: %v", err)
t.Fail()
return
}
scanner := bufio.NewScanner(gz)
for scanner.Scan() {
var m azureMonitorMetric
if err := json.Unmarshal(scanner.Bytes(), &m); err != nil {
w.WriteHeader(http.StatusInternalServerError)
t.Logf("cannot unmarshal JSON: %v", err)
t.Fail()
return
}
metrics.Add(1)
}
w.WriteHeader(http.StatusOK)
}))
defer ts.Close()
// Setup the plugin
plugin := AzureMonitor{
EndpointURL: "http://" + ts.Listener.Addr().String(),
Region: "test",
ResourceID: "/test",
TimestampLimitPast: config.Duration(30 * time.Minute),
TimestampLimitFuture: config.Duration(-1 * time.Minute),
Log: testutil.Logger{},
timeFunc: func() time.Time { return time.Unix(120, 0) },
}
require.NoError(t, plugin.Init())
// Override with testing setup
plugin.preparer = autorest.CreatePreparer(autorest.NullAuthorizer{}.WithAuthorization())
require.NoError(t, plugin.Connect())
defer plugin.Close()
err := plugin.Write(tt.metrics)
if tt.errmsg != "" {
require.ErrorContains(t, err, tt.errmsg)
return
}
require.NoError(t, err)
require.Equal(t, tt.expectedCalls, calls.Load())
require.Equal(t, tt.expectedMetrics, metrics.Load())
})
}
}
func TestWriteTimelimits(t *testing.T) {
// Set up a fake environment for the authorizer. This used to fake an MSI
// environment, but since https://github.com/Azure/go-autorest/pull/670/files
// this is no longer possible, so we fake a user/password authentication.
t.Setenv("AZURE_CLIENT_ID", "fake")
t.Setenv("AZURE_USERNAME", "fake")
t.Setenv("AZURE_PASSWORD", "fake")
// Setup input metrics
tref := time.Now().Truncate(time.Minute)
inputs := []telegraf.Metric{
metric.New(
"cpu-value",
map[string]string{
"status": "too old",
},
map[string]interface{}{
"min": float64(42),
"max": float64(42),
"sum": float64(42),
"count": int64(1),
},
tref.Add(-time.Hour),
),
metric.New(
"cpu-value",
map[string]string{
"status": "30 min in the past",
},
map[string]interface{}{
"min": float64(42),
"max": float64(42),
"sum": float64(42),
"count": int64(1),
},
tref.Add(-30*time.Minute),
),
metric.New(
"cpu-value",
map[string]string{
"status": "20 min in the past",
},
map[string]interface{}{
"min": float64(42),
"max": float64(42),
"sum": float64(42),
"count": int64(1),
},
tref.Add(-20*time.Minute),
),
metric.New(
"cpu-value",
map[string]string{
"status": "10 min in the past",
},
map[string]interface{}{
"min": float64(42),
"max": float64(42),
"sum": float64(42),
"count": int64(1),
},
tref.Add(-10*time.Minute),
),
metric.New(
"cpu-value",
map[string]string{
"status": "now",
},
map[string]interface{}{
"min": float64(42),
"max": float64(42),
"sum": float64(42),
"count": int64(1),
},
tref,
),
metric.New(
"cpu-value",
map[string]string{
"status": "1 min in the future",
},
map[string]interface{}{
"min": float64(42),
"max": float64(42),
"sum": float64(42),
"count": int64(1),
},
tref.Add(1*time.Minute),
),
metric.New(
"cpu-value",
map[string]string{
"status": "2 min in the future",
},
map[string]interface{}{
"min": float64(42),
"max": float64(42),
"sum": float64(42),
"count": int64(1),
},
tref.Add(2*time.Minute),
),
metric.New(
"cpu-value",
map[string]string{
"status": "4 min in the future",
},
map[string]interface{}{
"min": float64(42),
"max": float64(42),
"sum": float64(42),
"count": int64(1),
},
tref.Add(4*time.Minute),
),
metric.New(
"cpu-value",
map[string]string{
"status": "5 min in the future",
},
map[string]interface{}{
"min": float64(42),
"max": float64(42),
"sum": float64(42),
"count": int64(1),
},
tref.Add(5*time.Minute),
),
metric.New(
"cpu-value",
map[string]string{
"status": "too far in the future",
},
map[string]interface{}{
"min": float64(42),
"max": float64(42),
"sum": float64(42),
"count": int64(1),
},
tref.Add(time.Hour),
),
}
// Error message for status 400
msg := `{"error":{"code":"BadRequest","message":"'time' should not be older than 30 minutes and not more than 4 minutes in the future\r\n"}}`
tests := []struct {
name string
input []telegraf.Metric
limitPast time.Duration
limitFuture time.Duration
expectedCount int
expectedError string
}{
{
name: "only good metrics",
input: inputs[1 : len(inputs)-2],
limitPast: 48 * time.Hour,
limitFuture: 48 * time.Hour,
expectedCount: len(inputs) - 3,
},
{
name: "metrics out of bounds",
input: inputs,
limitPast: 48 * time.Hour,
limitFuture: 48 * time.Hour,
expectedCount: len(inputs),
expectedError: "400 Bad Request: " + msg,
},
{
name: "default limit",
input: inputs,
limitPast: 20 * time.Minute,
limitFuture: -1 * time.Minute,
expectedCount: 2,
expectedError: "metric(s) outside of acceptable time window",
},
{
name: "permissive limit",
input: inputs,
limitPast: 30 * time.Minute,
limitFuture: 5 * time.Minute,
expectedCount: len(inputs) - 2,
expectedError: "metric(s) outside of acceptable time window",
},
{
name: "very strict",
input: inputs,
limitPast: 19*time.Minute + 59*time.Second,
limitFuture: 3*time.Minute + 59*time.Second,
expectedCount: len(inputs) - 6,
expectedError: "metric(s) outside of acceptable time window",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// Counter for the number of received metrics
var count atomic.Int32
// Setup test server
ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
defer r.Body.Close()
reader, err := gzip.NewReader(r.Body)
if err != nil {
w.WriteHeader(http.StatusInternalServerError)
t.Logf("unzipping content failed: %v", err)
t.Fail()
return
}
defer reader.Close()
status := http.StatusOK
scanner := bufio.NewScanner(reader)
for scanner.Scan() {
var data map[string]interface{}
if err := json.Unmarshal(scanner.Bytes(), &data); err != nil {
w.WriteHeader(http.StatusInternalServerError)
t.Logf("decoding JSON failed: %v", err)
t.Fail()
return
}
timestamp, err := time.Parse(time.RFC3339, data["time"].(string))
if err != nil {
w.WriteHeader(http.StatusInternalServerError)
t.Logf("decoding time failed: %v", err)
t.Fail()
return
}
if timestamp.Before(tref.Add(-30*time.Minute)) || timestamp.After(tref.Add(5*time.Minute)) {
status = http.StatusBadRequest
}
count.Add(1)
}
w.WriteHeader(status)
if status == http.StatusBadRequest {
//nolint:errcheck // Ignoring returned error as it is not relevant for the test
w.Write([]byte(msg))
}
}))
defer ts.Close()
// Setup plugin
plugin := AzureMonitor{
EndpointURL: "http://" + ts.Listener.Addr().String(),
Region: "test",
ResourceID: "/test",
TimestampLimitPast: config.Duration(tt.limitPast),
TimestampLimitFuture: config.Duration(tt.limitFuture),
Log: testutil.Logger{},
timeFunc: func() time.Time { return tref },
}
require.NoError(t, plugin.Init())
// Override with testing setup
plugin.preparer = autorest.CreatePreparer(autorest.NullAuthorizer{}.WithAuthorization())
require.NoError(t, plugin.Connect())
defer plugin.Close()
// Test writing
err := plugin.Write(tt.input)
if tt.expectedError == "" {
require.NoError(t, err)
} else {
require.ErrorContains(t, err, tt.expectedError)
}
require.Equal(t, tt.expectedCount, int(count.Load()))
})
}
}


@@ -0,0 +1,37 @@
# Send aggregate metrics to Azure Monitor
[[outputs.azure_monitor]]
## Timeout for HTTP writes.
# timeout = "20s"
## Set the namespace prefix, defaults to "Telegraf/<input-name>".
# namespace_prefix = "Telegraf/"
## Azure Monitor doesn't have a string value type, so convert string
## fields to dimensions (a.k.a. tags) if enabled. Azure Monitor allows
## a maximum of 10 dimensions so Telegraf will only send the first 10
## dimensions in alphabetical order.
# strings_as_dimensions = false
## Both region and resource_id must be set or be available via the
## Instance Metadata service on Azure Virtual Machines.
#
## Azure Region to publish metrics against.
## ex: region = "southcentralus"
# region = ""
#
## The Azure Resource ID against which metrics will be logged.
## ex: resource_id = "/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Compute/virtualMachines/<vm_name>"
# resource_id = ""
## Optionally, if in Azure US Government, China, or other sovereign
## cloud environment, set the appropriate REST endpoint for receiving
## metrics. (Note: region may be unused in this context)
# endpoint_url = "https://monitoring.core.usgovcloudapi.net"
## Time limitations for metrics to send
## Documentation can be found here:
## https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/metrics-store-custom-rest-api?tabs=rest#timestamp
## However, the returned (400) error message might document stricter or more
## relaxed settings. By default, only past metrics within the limit are sent.
# timestamp_limit_past = "30m"
# timestamp_limit_future = "-1m"


@@ -0,0 +1,60 @@
package azure_monitor
import (
"fmt"
"time"
)
type azureMonitorMetric struct {
Time time.Time `json:"time"`
Data *azureMonitorData `json:"data"`
index int
}
type azureMonitorData struct {
BaseData *azureMonitorBaseData `json:"baseData"`
}
type azureMonitorBaseData struct {
Metric string `json:"metric"`
Namespace string `json:"namespace"`
DimensionNames []string `json:"dimNames"`
Series []*azureMonitorSeries `json:"series"`
}
type azureMonitorSeries struct {
DimensionValues []string `json:"dimValues"`
Min float64 `json:"min"`
Max float64 `json:"max"`
Sum float64 `json:"sum"`
Count int64 `json:"count"`
}
// virtualMachineMetadata contains information about a VM from the metadata service
type virtualMachineMetadata struct {
Compute struct {
Location string `json:"location"`
Name string `json:"name"`
ResourceGroupName string `json:"resourceGroupName"`
SubscriptionID string `json:"subscriptionId"`
VMScaleSetName string `json:"vmScaleSetName"`
} `json:"compute"`
}
func (m *virtualMachineMetadata) ResourceID() string {
if m.Compute.VMScaleSetName != "" {
return fmt.Sprintf(
resourceIDScaleSetTemplate,
m.Compute.SubscriptionID,
m.Compute.ResourceGroupName,
m.Compute.VMScaleSetName,
)
}
return fmt.Sprintf(
resourceIDTemplate,
m.Compute.SubscriptionID,
m.Compute.ResourceGroupName,
m.Compute.Name,
)
}