1
0
Fork 0

Adding upstream version 1.34.4.

Signed-off-by: Daniel Baumann <daniel@debian.org>
This commit is contained in:
Daniel Baumann 2025-05-24 07:26:29 +02:00
parent e393c3af3f
commit 4978089aab
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
4963 changed files with 677545 additions and 0 deletions

View file

@ -0,0 +1,272 @@
# Grok Parser Plugin
The grok data format parses line delimited data using a regular expression like
language.
The best way to get acquainted with grok patterns is to read the logstash docs,
which are available [here][1].
The grok parser uses a slightly modified version of logstash "grok"
patterns, with the format:
```text
%{<capture_syntax>[:<semantic_name>][:<modifier>]}
```
The `capture_syntax` defines the grok pattern that's used to parse the input
line and the `semantic_name` is used to name the field or tag. The extension
`modifier` controls the data type that the parsed item is converted to or
other special handling.
By default all named captures are converted into string fields.
If a pattern does not have a semantic name it will not be captured.
Timestamp modifiers can be used to convert captures to the timestamp of the
parsed metric. If no timestamp is parsed the metric will be created using the
current time.
You must capture at least one field per line.
- Available modifiers:
- string (default if nothing is specified)
- int
- float
- duration (ie, 5.23ms gets converted to int nanoseconds)
- tag (converts the field into a tag)
- drop (drops the field completely)
- measurement (use the matched text as the measurement name)
- Timestamp modifiers:
- ts (This will auto-learn the timestamp format)
- ts-ansic ("Mon Jan _2 15:04:05 2006")
- ts-unix ("Mon Jan _2 15:04:05 MST 2006")
- ts-ruby ("Mon Jan 02 15:04:05 -0700 2006")
- ts-rfc822 ("02 Jan 06 15:04 MST")
- ts-rfc822z ("02 Jan 06 15:04 -0700")
- ts-rfc850 ("Monday, 02-Jan-06 15:04:05 MST")
- ts-rfc1123 ("Mon, 02 Jan 2006 15:04:05 MST")
- ts-rfc1123z ("Mon, 02 Jan 2006 15:04:05 -0700")
- ts-rfc3339 ("2006-01-02T15:04:05Z07:00")
- ts-rfc3339nano ("2006-01-02T15:04:05.999999999Z07:00")
- ts-httpd ("02/Jan/2006:15:04:05 -0700")
- ts-epoch (seconds since unix epoch, may contain decimal)
- ts-epochnano (nanoseconds since unix epoch)
- ts-epochmilli (milliseconds since unix epoch)
- ts-syslog ("Jan 02 15:04:05", parsed time is set to the current year)
- ts-"CUSTOM"
CUSTOM time layouts must be within quotes and be the representation of the
"reference time", which is `Mon Jan 2 15:04:05 -0700 MST 2006`. To match a
comma decimal point you can use a period. For example
`%{TIMESTAMP:timestamp:ts-"2006-01-02 15:04:05.000"}` can be used to match
`"2018-01-02 15:04:05,000"` To match a comma decimal point you can use a period
in the pattern string. See [Goloang Time
docs](https://golang.org/pkg/time/#Parse) for more details.
Telegraf has many of its own [built-in patterns][] as well as support for most
of the Logstash builtin patterns using [these Go compatible
patterns][grok-patterns].
**Note:** Golang regular expressions do not support lookahead or lookbehind.
Logstash patterns that use these features may not be supported, or may use a Go
friendly pattern that is not fully compatible with the Logstash pattern.
[built-in patterns]: /plugins/parsers/grok/influx_patterns.go
[grok-patterns]: https://github.com/vjeantet/grok/blob/master/patterns/grok-patterns
If you need help building patterns to match your logs, you will find the [Grok
Debug](https://grokdebug.herokuapp.com) application quite useful!
[1]: https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
## Configuration
```toml
[[inputs.file]]
## Files to parse each interval.
## These accept standard unix glob matching rules, but with the addition of
## ** as a "super asterisk". ie:
## /var/log/**.log -> recursively find all .log files in /var/log
## /var/log/*/*.log -> find all .log files with a parent dir in /var/log
## /var/log/apache.log -> only tail the apache log file
files = ["/var/log/apache/access.log"]
## The dataformat to be read from files
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "grok"
## This is a list of patterns to check the given log file(s) for.
## Note that adding patterns here increases processing time. The most
## efficient configuration is to have one pattern.
## Other common built-in patterns are:
## %{COMMON_LOG_FORMAT} (plain apache & nginx access logs)
## %{COMBINED_LOG_FORMAT} (access logs + referrer & agent)
grok_patterns = ["%{COMBINED_LOG_FORMAT}"]
## Full path(s) to custom pattern files.
grok_custom_pattern_files = []
## Custom patterns can also be defined here. Put one pattern per line.
grok_custom_patterns = '''
'''
## Timezone allows you to provide an override for timestamps that
## don't already include an offset
## e.g. 04/06/2016 12:41:45 data one two 5.43µs
##
## Default: "" which renders UTC
## Options are as follows:
## 1. Local -- interpret based on machine localtime
## 2. "Canada/Eastern" -- Unix TZ values like those found in https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
## 3. UTC -- or blank/unspecified, will return timestamp in UTC
grok_timezone = "Canada/Eastern"
## When set to "disable" timestamp will not incremented if there is a
## duplicate.
# grok_unique_timestamp = "auto"
## Enable multiline messages to be processed.
# grok_multiline = false
```
### Timestamp Examples
This example input and config parses a file using a custom timestamp conversion:
```text
2017-02-21 13:10:34 value=42
```
```toml
[[inputs.file]]
grok_patterns = ['%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05"} value=%{NUMBER:value:int}']
```
This example input and config parses a file using a timestamp in unix time:
```text
1466004605 value=42
1466004605.123456789 value=42
```
```toml
[[inputs.file]]
grok_patterns = ['%{NUMBER:timestamp:ts-epoch} value=%{NUMBER:value:int}']
```
This example parses a file using a built-in conversion and a custom pattern:
```text
Wed Apr 12 13:10:34 PST 2017 value=42
```
```toml
[[inputs.file]]
grok_patterns = ["%{TS_UNIX:timestamp:ts-unix} value=%{NUMBER:value:int}"]
grok_custom_patterns = '''
TS_UNIX %{DAY} %{MONTH} %{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND} %{TZ} %{YEAR}
'''
```
This example input and config parses a file using a custom timestamp conversion
that doesn't match any specific standard:
```text
21/02/2017 13:10:34 value=42
```
```toml
[[inputs.file]]
grok_patterns = ['%{MY_TIMESTAMP:timestamp:ts-"02/01/2006 15:04:05"} value=%{NUMBER:value:int}']
grok_custom_patterns = '''
MY_TIMESTAMP (?:\d{2}.\d{2}.\d{4} \d{2}:\d{2}:\d{2})
'''
```
For cases where the timestamp itself is without offset, the `timezone` config
var is available to denote an offset. By default (with `timezone` either omit,
blank or set to `"UTC"`), the times are processed as if in the UTC timezone. If
specified as `timezone = "Local"`, the timestamp will be processed based on the
current machine timezone configuration. Lastly, if using a timezone from the
list of Unix
[timezones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones), grok
will offset the timestamp accordingly.
#### TOML Escaping
When saving patterns to the configuration file, keep in mind the different TOML
[string](https://github.com/toml-lang/toml#string) types and the escaping
rules for each. These escaping rules must be applied in addition to the
escaping required by the grok syntax. Using the Multi-line line literal
syntax with `'''` may be useful.
The following config examples will parse this input file:
```text
|42|\uD83D\uDC2F|'telegraf'|
```
Since `|` is a special character in the grok language, we must escape it to
get a literal `|`. With a basic TOML string, special characters such as
backslash must be escaped, requiring us to escape the backslash a second time.
```toml
[[inputs.file]]
grok_patterns = ["\\|%{NUMBER:value:int}\\|%{UNICODE_ESCAPE:escape}\\|'%{WORD:name}'\\|"]
grok_custom_patterns = "UNICODE_ESCAPE (?:\\\\u[0-9A-F]{4})+"
```
We cannot use a literal TOML string for the pattern, because we cannot match a
`'` within it. However, it works well for the custom pattern.
```toml
[[inputs.file]]
grok_patterns = ["\\|%{NUMBER:value:int}\\|%{UNICODE_ESCAPE:escape}\\|'%{WORD:name}'\\|"]
grok_custom_patterns = 'UNICODE_ESCAPE (?:\\u[0-9A-F]{4})+'
```
A multi-line literal string allows us to encode the pattern:
```toml
[[inputs.file]]
grok_patterns = ['''
\|%{NUMBER:value:int}\|%{UNICODE_ESCAPE:escape}\|'%{WORD:name}'\|
''']
grok_custom_patterns = 'UNICODE_ESCAPE (?:\\u[0-9A-F]{4})+'
```
#### Tips for creating patterns
Writing complex patterns can be difficult, here is some advice for writing a new
pattern or testing a pattern developed
[online](https://grokdebug.herokuapp.com).
Create a file output that writes to stdout, and disable other outputs while
testing. This will allow you to see the captured metrics. Keep in mind that
the file output will only print once per `flush_interval`.
```toml
[[outputs.file]]
files = ["stdout"]
```
- Start with a file containing only a single line of your input.
- Remove all but the first token or piece of the line.
- Add the section of your pattern to match this piece to your configuration file.
- Verify that the metric is parsed successfully by running Telegraf.
- If successful, add the next token, update the pattern and retest.
- Continue one token at a time until the entire line is successfully parsed.
#### Performance
Performance depends heavily on the regular expressions that you use, but there
are a few techniques that can help:
- Avoid using patterns such as `%{DATA}` that will always match.
- If possible, add `^` and `$` anchors to your pattern:
```toml
[[inputs.file]]
grok_patterns = ["^%{COMBINED_LOG_FORMAT}$"]
```

View file

@ -0,0 +1,43 @@
package grok
//nolint:lll // conditionally long lines allowed
const DefaultPatterns = `
# Example log file pattern, example log looks like this:
# [04/Jun/2016:12:41:45 +0100] 1.25 200 192.168.1.1 5.432µs
# Breakdown of the DURATION pattern below:
# NUMBER is a builtin logstash grok pattern matching float & int numbers.
# [nuµm]? is a regex specifying 0 or 1 of the characters within brackets.
# s is also regex, this pattern must end in "s".
# so DURATION will match something like '5.324ms' or '6.1µs' or '10s'
DURATION %{NUMBER}[nuµm]?s
RESPONSE_CODE %{NUMBER:response_code:tag}
RESPONSE_TIME %{DURATION:response_time_ns:duration}
EXAMPLE_LOG \[%{HTTPDATE:ts:ts-httpd}\] %{NUMBER:myfloat:float} %{RESPONSE_CODE} %{IPORHOST:clientip} %{RESPONSE_TIME}
# Wider-ranging username matching vs. logstash built-in %{USER}
NGUSERNAME [a-zA-Z0-9\.\@\-\+_%]+
NGUSER %{NGUSERNAME}
# Wider-ranging client IP matching
CLIENT (?:%{IPV6}|%{IPV4}|%{HOSTNAME}|%{HOSTPORT})
##
## COMMON LOG PATTERNS
##
# apache & nginx logs, this is also known as the "common log format"
# see https://en.wikipedia.org/wiki/Common_Log_Format
COMMON_LOG_FORMAT %{CLIENT:client_ip} %{NOTSPACE:ident} %{NOTSPACE:auth} \[%{HTTPDATE:ts:ts-httpd}\] "(?:%{WORD:verb:tag} %{NOTSPACE:request}(?: HTTP/%{NUMBER:http_version:float})?|%{DATA})" %{NUMBER:resp_code:tag} (?:%{NUMBER:resp_bytes:int}|-)
# Combined log format is the same as the common log format but with the addition
# of two quoted strings at the end for "referrer" and "agent"
# See Examples at http://httpd.apache.org/docs/current/mod/mod_log_config.html
COMBINED_LOG_FORMAT %{COMMON_LOG_FORMAT} "%{DATA:referrer}" "%{DATA:agent}"
# HTTPD log formats
HTTPD20_ERRORLOG \[%{HTTPDERROR_DATE:timestamp}\] \[%{LOGLEVEL:loglevel:tag}\] (?:\[client %{IPORHOST:clientip}\] ){0,1}%{GREEDYDATA:errormsg}
HTTPD24_ERRORLOG \[%{HTTPDERROR_DATE:timestamp}\] \[%{WORD:module}:%{LOGLEVEL:loglevel:tag}\] \[pid %{POSINT:pid:int}:tid %{NUMBER:tid:int}\]( \(%{POSINT:proxy_errorcode:int}\)%{DATA:proxy_errormessage}:)?( \[client %{IPORHOST:client}:%{POSINT:clientport}\])? %{DATA:errorcode}: %{GREEDYDATA:message}
HTTPD_ERRORLOG %{HTTPD20_ERRORLOG}|%{HTTPD24_ERRORLOG}
# DATA spanning multiple lines
MULTILINEDATA (.|\n)*
`

View file

@ -0,0 +1,603 @@
package grok
import (
"bufio"
"bytes"
"errors"
"fmt"
"os"
"regexp"
"strconv"
"strings"
"time"
"github.com/vjeantet/grok"
"github.com/influxdata/telegraf"
"github.com/influxdata/telegraf/internal"
"github.com/influxdata/telegraf/metric"
"github.com/influxdata/telegraf/plugins/parsers"
)
var timeLayouts = map[string]string{
"ts-ansic": "Mon Jan _2 15:04:05 2006",
"ts-unix": "Mon Jan _2 15:04:05 MST 2006",
"ts-ruby": "Mon Jan 02 15:04:05 -0700 2006",
"ts-rfc822": "02 Jan 06 15:04 MST",
"ts-rfc822z": "02 Jan 06 15:04 -0700", // RFC822 with numeric zone
"ts-rfc850": "Monday, 02-Jan-06 15:04:05 MST",
"ts-rfc1123": "Mon, 02 Jan 2006 15:04:05 MST",
"ts-rfc1123z": "Mon, 02 Jan 2006 15:04:05 -0700", // RFC1123 with numeric zone
"ts-rfc3339": "2006-01-02T15:04:05Z07:00",
"ts-rfc3339nano": "2006-01-02T15:04:05.999999999Z07:00",
"ts-httpd": "02/Jan/2006:15:04:05 -0700",
// These four are not exactly "layouts", but they are special cases that
// will get handled in the ParseLine function.
"ts-epoch": "EPOCH",
"ts-epochnano": "EPOCH_NANO",
"ts-epochmilli": "EPOCH_MILLI",
"ts-syslog": "SYSLOG_TIMESTAMP",
"ts": "GENERIC_TIMESTAMP", // try parsing all known timestamp layouts.
}
const (
Measurement = "measurement"
Int = "int"
Tag = "tag"
Float = "float"
String = "string"
Duration = "duration"
Drop = "drop"
Epoch = "EPOCH"
EpochMilli = "EPOCH_MILLI"
EpochNano = "EPOCH_NANO"
SyslogTimestamp = "SYSLOG_TIMESTAMP"
GenericTimestamp = "GENERIC_TIMESTAMP"
)
var (
// matches named captures that contain a modifier.
// ie,
// %{NUMBER:bytes:int}
// %{IPORHOST:clientip:tag}
// %{HTTPDATE:ts1:ts-http}
// %{HTTPDATE:ts2:ts-"02 Jan 06 15:04"}
modifierRe = regexp.MustCompile(`%{\w+:(\w+):(ts-".+"|t?s?-?\w+)}`)
// matches a plain pattern name. ie, %{NUMBER}
patternOnlyRe = regexp.MustCompile(`%{(\w+)}`)
)
// Parser is the primary struct to handle and grok-patterns defined in the config toml
type Parser struct {
Patterns []string `toml:"grok_patterns"`
// namedPatterns is a list of internally-assigned names to the patterns
// specified by the user in Patterns.
// They will look like:
// GROK_INTERNAL_PATTERN_0, GROK_INTERNAL_PATTERN_1, etc.
NamedPatterns []string `toml:"grok_named_patterns"`
CustomPatterns string `toml:"grok_custom_patterns"`
CustomPatternFiles []string `toml:"grok_custom_pattern_files"`
Multiline bool `toml:"grok_multiline"`
Measurement string `toml:"-"`
DefaultTags map[string]string `toml:"-"`
Log telegraf.Logger `toml:"-"`
// Timezone is an optional component to help render log dates to
// your chosen zone.
// Default: "" which renders UTC
// Options are as follows:
// 1. Local -- interpret based on machine localtime
// 2. "America/Chicago" -- Unix TZ values like those found in https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
// 3. UTC -- or blank/unspecified, will return timestamp in UTC
Timezone string `toml:"grok_timezone"`
loc *time.Location
// UniqueTimestamp when set to "disable", timestamp will not incremented if there is a duplicate.
UniqueTimestamp string `toml:"grok_unique_timestamp"`
// typeMap is a map of patterns -> capture name -> modifier,
// ie, {
// "%{TESTLOG}":
// {
// "bytes": "int",
// "clientip": "tag"
// }
// }
typeMap map[string]map[string]string
// tsMap is a map of patterns -> capture name -> timestamp layout.
// ie, {
// "%{TESTLOG}":
// {
// "httptime": "02/Jan/2006:15:04:05 -0700"
// }
// }
tsMap map[string]map[string]string
// patternsMap is a map of all of the parsed patterns from CustomPatterns
// and CustomPatternFiles.
// ie, {
// "DURATION": "%{NUMBER}[nuµm]?s"
// "RESPONSE_CODE": "%{NUMBER:rc:tag}"
// }
patternsMap map[string]string
// foundTSLayouts is a slice of timestamp patterns that have been found
// in the log lines. This slice gets updated if the user uses the generic
// 'ts' modifier for timestamps. This slice is checked first for matches,
// so that previously-matched layouts get priority over all other timestamp
// layouts.
foundTSLayouts []string
timeFunc func() time.Time
g *grok.Grok
tsModder *tsModder
}
// Compile is a bound method to Parser which will process the options for our parser
func (p *Parser) Compile() error {
p.typeMap = make(map[string]map[string]string)
p.tsMap = make(map[string]map[string]string)
p.patternsMap = make(map[string]string)
p.tsModder = &tsModder{}
var err error
p.g, err = grok.NewWithConfig(&grok.Config{NamedCapturesOnly: true})
if err != nil {
return err
}
if p.UniqueTimestamp == "" {
p.UniqueTimestamp = "auto"
}
// Give Patterns fake names so that they can be treated as named
// "custom patterns"
p.NamedPatterns = make([]string, 0, len(p.Patterns))
for i, pattern := range p.Patterns {
pattern = strings.TrimSpace(pattern)
if pattern == "" {
continue
}
name := fmt.Sprintf("GROK_INTERNAL_PATTERN_%d", i)
p.CustomPatterns += "\n" + name + " " + pattern + "\n"
p.NamedPatterns = append(p.NamedPatterns, "%{"+name+"}")
}
if len(p.NamedPatterns) == 0 {
return errors.New("pattern required")
}
// Combine user-supplied CustomPatterns with DEFAULT_PATTERNS and parse
// them together as the same type of pattern.
p.CustomPatterns = DefaultPatterns + p.CustomPatterns
if len(p.CustomPatterns) != 0 {
scanner := bufio.NewScanner(strings.NewReader(p.CustomPatterns))
p.addCustomPatterns(scanner)
}
// Parse any custom pattern files supplied.
for _, filename := range p.CustomPatternFiles {
file, fileErr := os.Open(filename)
if fileErr != nil {
return fileErr
}
scanner := bufio.NewScanner(bufio.NewReader(file))
p.addCustomPatterns(scanner)
}
p.loc, err = time.LoadLocation(p.Timezone)
if err != nil {
p.Log.Warnf("Improper timezone supplied (%s), setting loc to UTC", p.Timezone)
p.loc = time.UTC
}
if p.timeFunc == nil {
p.timeFunc = time.Now
}
return p.compileCustomPatterns()
}
// ParseLine is the primary function to process individual lines, returning the metrics
func (p *Parser) ParseLine(line string) (telegraf.Metric, error) {
var err error
// values are the parsed fields from the log line
var values map[string]string
// the matching pattern string
var patternName string
for _, pattern := range p.NamedPatterns {
if values, err = p.g.Parse(pattern, line); err != nil {
return nil, err
}
if len(values) != 0 {
patternName = pattern
break
}
}
if len(values) == 0 {
p.Log.Debugf("Grok no match found for or no data extracted from: %q", line)
return nil, nil
}
fields := make(map[string]interface{})
tags := make(map[string]string)
// add default tags
for k, v := range p.DefaultTags {
tags[k] = v
}
timestamp := time.Now()
for k, v := range values {
if k == "" || v == "" {
continue
}
// t is the modifier of the field
var t string
// check if pattern has some modifiers
if types, ok := p.typeMap[patternName]; ok {
t = types[k]
}
// if we didn't find a modifier, check if we have a timestamp layout
if t == "" {
if ts, ok := p.tsMap[patternName]; ok {
// check if the modifier is a timestamp layout
if layout, ok := ts[k]; ok {
t = layout
}
}
}
// if we didn't find a type OR timestamp modifier, assume string
if t == "" {
t = String
}
switch t {
case Measurement:
p.Measurement = v
case Int:
iv, err := strconv.ParseInt(v, 0, 64)
if err != nil {
p.Log.Errorf("Error parsing %s to int: %s", v, err)
} else {
fields[k] = iv
}
case Float:
fv, err := strconv.ParseFloat(v, 64)
if err != nil {
p.Log.Errorf("Error parsing %s to float: %s", v, err)
} else {
fields[k] = fv
}
case Duration:
d, err := time.ParseDuration(v)
if err != nil {
p.Log.Errorf("Error parsing %s to duration: %s", v, err)
} else {
fields[k] = int64(d)
}
case Tag:
tags[k] = v
case String:
fields[k] = v
case Epoch:
parts := strings.SplitN(v, ".", 2)
if len(parts) == 0 {
p.Log.Errorf("Error parsing %s to timestamp: %s", v, err)
break
}
sec, err := strconv.ParseInt(parts[0], 10, 64)
if err != nil {
p.Log.Errorf("Error parsing %s to timestamp: %s", v, err)
break
}
ts := time.Unix(sec, 0)
if len(parts) == 2 {
padded := fmt.Sprintf("%-9s", parts[1])
nsString := strings.ReplaceAll(padded[:9], " ", "0")
nanosec, err := strconv.ParseInt(nsString, 10, 64)
if err != nil {
p.Log.Errorf("Error parsing %s to timestamp: %s", v, err)
break
}
ts = ts.Add(time.Duration(nanosec) * time.Nanosecond)
}
timestamp = ts
case EpochMilli:
ms, err := strconv.ParseInt(v, 10, 64)
if err != nil {
p.Log.Errorf("Error parsing %s to int: %s", v, err)
} else {
timestamp = time.Unix(0, ms*int64(time.Millisecond))
}
case EpochNano:
iv, err := strconv.ParseInt(v, 10, 64)
if err != nil {
p.Log.Errorf("Error parsing %s to int: %s", v, err)
} else {
timestamp = time.Unix(0, iv)
}
case SyslogTimestamp:
ts, err := internal.ParseTimestamp(time.Stamp, v, p.loc)
if err == nil {
if ts.Year() == 0 {
ts = ts.AddDate(timestamp.Year(), 0, 0)
}
timestamp = ts
} else {
p.Log.Errorf("Error parsing %s to time layout [%s]: %s", v, t, err)
}
case GenericTimestamp:
var foundTS bool
// first try timestamp layouts that we've already found
for _, layout := range p.foundTSLayouts {
ts, err := internal.ParseTimestamp(layout, v, p.loc)
if err == nil {
timestamp = ts
foundTS = true
break
}
}
// if we haven't found a timestamp layout yet, try all timestamp
// layouts.
if !foundTS {
for _, layout := range timeLayouts {
ts, err := internal.ParseTimestamp(layout, v, p.loc)
if err == nil {
timestamp = ts
foundTS = true
p.foundTSLayouts = append(p.foundTSLayouts, layout)
break
}
}
}
// if we still haven't found a timestamp layout, log it and we will
// just use time.Now()
if !foundTS {
p.Log.Errorf("Error parsing timestamp [%s], could not find any "+
"suitable time layouts.", v)
}
case Drop:
// goodbye!
default:
v = strings.ReplaceAll(v, ",", ".")
ts, err := internal.ParseTimestamp(t, v, p.loc)
if err == nil {
if ts.Year() == 0 {
ts = ts.AddDate(timestamp.Year(), 0, 0)
}
timestamp = ts
} else {
p.Log.Errorf("Error parsing %s to time layout [%s]: %s", v, t, err)
}
}
}
if p.UniqueTimestamp != "auto" {
return metric.New(p.Measurement, tags, fields, timestamp), nil
}
return metric.New(p.Measurement, tags, fields, p.tsModder.tsMod(timestamp)), nil
}
func (p *Parser) Parse(buf []byte) ([]telegraf.Metric, error) {
metrics := make([]telegraf.Metric, 0)
if p.Multiline {
m, err := p.ParseLine(string(buf))
if err != nil {
return nil, err
}
if m != nil {
metrics = append(metrics, m)
}
return metrics, nil
}
scanner := bufio.NewScanner(bytes.NewReader(buf))
for scanner.Scan() {
line := scanner.Text()
m, err := p.ParseLine(line)
if err != nil {
return nil, err
}
if m == nil {
continue
}
metrics = append(metrics, m)
}
return metrics, nil
}
func (p *Parser) SetDefaultTags(tags map[string]string) {
p.DefaultTags = tags
}
func (p *Parser) addCustomPatterns(scanner *bufio.Scanner) {
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
if len(line) > 0 && line[0] != '#' {
names := strings.SplitN(line, " ", 2)
p.patternsMap[names[0]] = names[1]
}
}
}
func (p *Parser) compileCustomPatterns() error {
var err error
// check if the pattern contains a subpattern that is already defined
// replace it with the subpattern for modifier inheritance.
for i := 0; i < 2; i++ {
for name, pattern := range p.patternsMap {
subNames := patternOnlyRe.FindAllStringSubmatch(pattern, -1)
for _, subName := range subNames {
if subPattern, ok := p.patternsMap[subName[1]]; ok {
pattern = strings.Replace(pattern, subName[0], subPattern, 1)
}
}
p.patternsMap[name] = pattern
}
}
// check if pattern contains modifiers. Parse them out if it does.
for name, pattern := range p.patternsMap {
if modifierRe.MatchString(pattern) {
// this pattern has modifiers, so parse out the modifiers
pattern, err = p.parseTypedCaptures(name, pattern)
if err != nil {
return err
}
p.patternsMap[name] = pattern
}
}
return p.g.AddPatternsFromMap(p.patternsMap)
}
// parseTypedCaptures parses the capture modifiers, and then deletes the
// modifier from the line so that it is a valid "grok" pattern again.
//
// ie,
// %{NUMBER:bytes:int} => %{NUMBER:bytes} (stores %{NUMBER}->bytes->int)
// %{IPORHOST:clientip:tag} => %{IPORHOST:clientip} (stores %{IPORHOST}->clientip->tag)
func (p *Parser) parseTypedCaptures(name, pattern string) (string, error) {
matches := modifierRe.FindAllStringSubmatch(pattern, -1)
// grab the name of the capture pattern
patternName := "%{" + name + "}"
// create type map for this pattern
p.typeMap[patternName] = make(map[string]string)
p.tsMap[patternName] = make(map[string]string)
// boolean to verify that each pattern only has a single ts- data type.
hasTimestamp := false
for _, match := range matches {
// regex capture 1 is the name of the capture
// regex capture 2 is the modifier of the capture
if strings.HasPrefix(match[2], "ts") {
if hasTimestamp {
return pattern, fmt.Errorf("logparser pattern compile error: "+
"Each pattern is allowed only one named "+
"timestamp data type. pattern: %s", pattern)
}
if layout, ok := timeLayouts[match[2]]; ok {
// built-in time format
p.tsMap[patternName][match[1]] = layout
} else {
// custom time format
p.tsMap[patternName][match[1]] = strings.TrimSuffix(strings.TrimPrefix(match[2], `ts-"`), `"`)
}
hasTimestamp = true
} else {
p.typeMap[patternName][match[1]] = match[2]
}
// the modifier is not a valid part of a "grok" pattern, so remove it
// from the pattern.
pattern = strings.Replace(pattern, ":"+match[2]+"}", "}", 1)
}
return pattern, nil
}
// tsModder is a struct for incrementing identical timestamps of log lines
// so that we don't push identical metrics that will get overwritten.
type tsModder struct {
dupe time.Time
last time.Time
incr time.Duration
incrn time.Duration
rollover time.Duration
}
// tsMod increments the given timestamp one unit more from the previous
// duplicate timestamp.
// the increment unit is determined as the next smallest time unit below the
// most significant time unit of ts.
//
// ie, if the input is at ms precision, it will increment it 1µs.
func (t *tsModder) tsMod(ts time.Time) time.Time {
if ts.IsZero() {
return ts
}
defer func() { t.last = ts }()
// don't mod the time if we don't need to
if t.last.IsZero() || ts.IsZero() {
t.incrn = 0
t.rollover = 0
return ts
}
if !ts.Equal(t.last) && !ts.Equal(t.dupe) {
t.incr = 0
t.incrn = 0
t.rollover = 0
return ts
}
if ts.Equal(t.last) {
t.dupe = ts
}
if ts.Equal(t.dupe) && t.incr == time.Duration(0) {
tsNano := ts.UnixNano()
d := int64(10)
counter := 1
for {
a := tsNano % d
if a > 0 {
break
}
d = d * 10
counter++
}
switch {
case counter <= 6:
t.incr = time.Nanosecond
case counter <= 9:
t.incr = time.Microsecond
case counter > 9:
t.incr = time.Millisecond
}
}
t.incrn++
if t.incrn == 999 && t.incr > time.Nanosecond {
t.rollover = t.incr * t.incrn
t.incrn = 1
t.incr = t.incr / 1000
if t.incr < time.Nanosecond {
t.incr = time.Nanosecond
}
}
return ts.Add(t.incr*t.incrn + t.rollover)
}
func (p *Parser) Init() error {
if len(p.Patterns) == 0 {
p.Patterns = []string{"%{COMBINED_LOG_FORMAT}"}
}
if p.UniqueTimestamp == "" {
p.UniqueTimestamp = "auto"
}
if p.Timezone == "" {
p.Timezone = "UTC"
}
return p.Compile()
}
func init() {
parsers.Add("grok",
func(defaultMetricName string) telegraf.Parser {
return &Parser{
Measurement: defaultMetricName,
}
},
)
}

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,14 @@
# Test A log line:
# [04/Jun/2016:12:41:45 +0100] 1.25 200 192.168.1.1 5.432µs 101
DURATION %{NUMBER}[nuµm]?s
RESPONSE_CODE %{NUMBER:response_code:tag}
RESPONSE_TIME %{DURATION:response_time:duration}
TEST_LOG_A \[%{HTTPDATE:timestamp:ts-httpd}\] %{NUMBER:myfloat:float} %{RESPONSE_CODE} %{IPORHOST:clientip} %{RESPONSE_TIME} %{NUMBER:myint:int}
# Test B log line:
# [04/06/2016--12:41:45] 1.25 mystring dropme nomodifier
TEST_TIMESTAMP %{MONTHDAY}/%{MONTHNUM}/%{YEAR}--%{TIME}
TEST_LOG_B \[%{TEST_TIMESTAMP:timestamp:ts-"02/01/2006--15:04:05"}\] %{NUMBER:myfloat:float} %{WORD:mystring:string} %{WORD:dropme:drop} %{WORD:nomodifier}
TEST_TIMESTAMP %{MONTHDAY}/%{MONTHNUM}/%{YEAR}--%{TIME}
TEST_LOG_BAD \[%{TEST_TIMESTAMP:timestamp:ts-"02/01/2006--15:04:05"}\] %{NUMBER:myfloat:float} %{WORD:mystring:int} %{WORD:dropme:drop} %{WORD:nomodifier}

View file

@ -0,0 +1 @@
[04/Jun/2016:12:41:45 +0100] 1.25 200 192.168.1.1 5.432µs 101

View file

@ -0,0 +1 @@
[04/06/2016--12:41:45] 1.25 mystring dropme nomodifier

View file

@ -0,0 +1,3 @@
2022-12-01T12:41:45Z Error A long and
multiline
message