Adding upstream version 1.34.4.
Signed-off-by: Daniel Baumann <daniel@debian.org>
This commit is contained in:
parent
e393c3af3f
commit
4978089aab
4963 changed files with 677545 additions and 0 deletions
272
plugins/parsers/grok/README.md
Normal file
272
plugins/parsers/grok/README.md
Normal file
|
@ -0,0 +1,272 @@
|
|||
# Grok Parser Plugin
|
||||
|
||||
The grok data format parses line delimited data using a regular expression like
|
||||
language.
|
||||
|
||||
The best way to get acquainted with grok patterns is to read the logstash docs,
|
||||
which are available [here][1].
|
||||
|
||||
The grok parser uses a slightly modified version of logstash "grok"
|
||||
patterns, with the format:
|
||||
|
||||
```text
|
||||
%{<capture_syntax>[:<semantic_name>][:<modifier>]}
|
||||
```
|
||||
|
||||
The `capture_syntax` defines the grok pattern that's used to parse the input
|
||||
line and the `semantic_name` is used to name the field or tag. The extension
|
||||
`modifier` controls the data type that the parsed item is converted to or
|
||||
other special handling.
|
||||
|
||||
By default all named captures are converted into string fields.
|
||||
If a pattern does not have a semantic name it will not be captured.
|
||||
Timestamp modifiers can be used to convert captures to the timestamp of the
|
||||
parsed metric. If no timestamp is parsed the metric will be created using the
|
||||
current time.
|
||||
|
||||
You must capture at least one field per line.
|
||||
|
||||
- Available modifiers:
|
||||
- string (default if nothing is specified)
|
||||
- int
|
||||
- float
|
||||
- duration (ie, 5.23ms gets converted to int nanoseconds)
|
||||
- tag (converts the field into a tag)
|
||||
- drop (drops the field completely)
|
||||
- measurement (use the matched text as the measurement name)
|
||||
- Timestamp modifiers:
|
||||
- ts (This will auto-learn the timestamp format)
|
||||
- ts-ansic ("Mon Jan _2 15:04:05 2006")
|
||||
- ts-unix ("Mon Jan _2 15:04:05 MST 2006")
|
||||
- ts-ruby ("Mon Jan 02 15:04:05 -0700 2006")
|
||||
- ts-rfc822 ("02 Jan 06 15:04 MST")
|
||||
- ts-rfc822z ("02 Jan 06 15:04 -0700")
|
||||
- ts-rfc850 ("Monday, 02-Jan-06 15:04:05 MST")
|
||||
- ts-rfc1123 ("Mon, 02 Jan 2006 15:04:05 MST")
|
||||
- ts-rfc1123z ("Mon, 02 Jan 2006 15:04:05 -0700")
|
||||
- ts-rfc3339 ("2006-01-02T15:04:05Z07:00")
|
||||
- ts-rfc3339nano ("2006-01-02T15:04:05.999999999Z07:00")
|
||||
- ts-httpd ("02/Jan/2006:15:04:05 -0700")
|
||||
- ts-epoch (seconds since unix epoch, may contain decimal)
|
||||
- ts-epochnano (nanoseconds since unix epoch)
|
||||
- ts-epochmilli (milliseconds since unix epoch)
|
||||
- ts-syslog ("Jan 02 15:04:05", parsed time is set to the current year)
|
||||
- ts-"CUSTOM"
|
||||
|
||||
CUSTOM time layouts must be within quotes and be the representation of the
|
||||
"reference time", which is `Mon Jan 2 15:04:05 -0700 MST 2006`. To match a
|
||||
comma decimal point you can use a period. For example
|
||||
`%{TIMESTAMP:timestamp:ts-"2006-01-02 15:04:05.000"}` can be used to match
|
||||
`"2018-01-02 15:04:05,000"` To match a comma decimal point you can use a period
|
||||
in the pattern string. See [Goloang Time
|
||||
docs](https://golang.org/pkg/time/#Parse) for more details.
|
||||
|
||||
Telegraf has many of its own [built-in patterns][] as well as support for most
|
||||
of the Logstash builtin patterns using [these Go compatible
|
||||
patterns][grok-patterns].
|
||||
|
||||
**Note:** Golang regular expressions do not support lookahead or lookbehind.
|
||||
Logstash patterns that use these features may not be supported, or may use a Go
|
||||
friendly pattern that is not fully compatible with the Logstash pattern.
|
||||
|
||||
[built-in patterns]: /plugins/parsers/grok/influx_patterns.go
|
||||
[grok-patterns]: https://github.com/vjeantet/grok/blob/master/patterns/grok-patterns
|
||||
|
||||
If you need help building patterns to match your logs, you will find the [Grok
|
||||
Debug](https://grokdebug.herokuapp.com) application quite useful!
|
||||
|
||||
[1]: https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
|
||||
|
||||
## Configuration
|
||||
|
||||
```toml
|
||||
[[inputs.file]]
|
||||
## Files to parse each interval.
|
||||
## These accept standard unix glob matching rules, but with the addition of
|
||||
## ** as a "super asterisk". ie:
|
||||
## /var/log/**.log -> recursively find all .log files in /var/log
|
||||
## /var/log/*/*.log -> find all .log files with a parent dir in /var/log
|
||||
## /var/log/apache.log -> only tail the apache log file
|
||||
files = ["/var/log/apache/access.log"]
|
||||
|
||||
## The dataformat to be read from files
|
||||
## Each data format has its own unique set of configuration options, read
|
||||
## more about them here:
|
||||
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
|
||||
data_format = "grok"
|
||||
|
||||
## This is a list of patterns to check the given log file(s) for.
|
||||
## Note that adding patterns here increases processing time. The most
|
||||
## efficient configuration is to have one pattern.
|
||||
## Other common built-in patterns are:
|
||||
## %{COMMON_LOG_FORMAT} (plain apache & nginx access logs)
|
||||
## %{COMBINED_LOG_FORMAT} (access logs + referrer & agent)
|
||||
grok_patterns = ["%{COMBINED_LOG_FORMAT}"]
|
||||
|
||||
## Full path(s) to custom pattern files.
|
||||
grok_custom_pattern_files = []
|
||||
|
||||
## Custom patterns can also be defined here. Put one pattern per line.
|
||||
grok_custom_patterns = '''
|
||||
'''
|
||||
|
||||
## Timezone allows you to provide an override for timestamps that
|
||||
## don't already include an offset
|
||||
## e.g. 04/06/2016 12:41:45 data one two 5.43µs
|
||||
##
|
||||
## Default: "" which renders UTC
|
||||
## Options are as follows:
|
||||
## 1. Local -- interpret based on machine localtime
|
||||
## 2. "Canada/Eastern" -- Unix TZ values like those found in https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
|
||||
## 3. UTC -- or blank/unspecified, will return timestamp in UTC
|
||||
grok_timezone = "Canada/Eastern"
|
||||
|
||||
## When set to "disable" timestamp will not incremented if there is a
|
||||
## duplicate.
|
||||
# grok_unique_timestamp = "auto"
|
||||
|
||||
## Enable multiline messages to be processed.
|
||||
# grok_multiline = false
|
||||
```
|
||||
|
||||
### Timestamp Examples
|
||||
|
||||
This example input and config parses a file using a custom timestamp conversion:
|
||||
|
||||
```text
|
||||
2017-02-21 13:10:34 value=42
|
||||
```
|
||||
|
||||
```toml
|
||||
[[inputs.file]]
|
||||
grok_patterns = ['%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05"} value=%{NUMBER:value:int}']
|
||||
```
|
||||
|
||||
This example input and config parses a file using a timestamp in unix time:
|
||||
|
||||
```text
|
||||
1466004605 value=42
|
||||
1466004605.123456789 value=42
|
||||
```
|
||||
|
||||
```toml
|
||||
[[inputs.file]]
|
||||
grok_patterns = ['%{NUMBER:timestamp:ts-epoch} value=%{NUMBER:value:int}']
|
||||
```
|
||||
|
||||
This example parses a file using a built-in conversion and a custom pattern:
|
||||
|
||||
```text
|
||||
Wed Apr 12 13:10:34 PST 2017 value=42
|
||||
```
|
||||
|
||||
```toml
|
||||
[[inputs.file]]
|
||||
grok_patterns = ["%{TS_UNIX:timestamp:ts-unix} value=%{NUMBER:value:int}"]
|
||||
grok_custom_patterns = '''
|
||||
TS_UNIX %{DAY} %{MONTH} %{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND} %{TZ} %{YEAR}
|
||||
'''
|
||||
```
|
||||
|
||||
This example input and config parses a file using a custom timestamp conversion
|
||||
that doesn't match any specific standard:
|
||||
|
||||
```text
|
||||
21/02/2017 13:10:34 value=42
|
||||
```
|
||||
|
||||
```toml
|
||||
[[inputs.file]]
|
||||
grok_patterns = ['%{MY_TIMESTAMP:timestamp:ts-"02/01/2006 15:04:05"} value=%{NUMBER:value:int}']
|
||||
|
||||
grok_custom_patterns = '''
|
||||
MY_TIMESTAMP (?:\d{2}.\d{2}.\d{4} \d{2}:\d{2}:\d{2})
|
||||
'''
|
||||
```
|
||||
|
||||
For cases where the timestamp itself is without offset, the `timezone` config
|
||||
var is available to denote an offset. By default (with `timezone` either omit,
|
||||
blank or set to `"UTC"`), the times are processed as if in the UTC timezone. If
|
||||
specified as `timezone = "Local"`, the timestamp will be processed based on the
|
||||
current machine timezone configuration. Lastly, if using a timezone from the
|
||||
list of Unix
|
||||
[timezones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones), grok
|
||||
will offset the timestamp accordingly.
|
||||
|
||||
#### TOML Escaping
|
||||
|
||||
When saving patterns to the configuration file, keep in mind the different TOML
|
||||
[string](https://github.com/toml-lang/toml#string) types and the escaping
|
||||
rules for each. These escaping rules must be applied in addition to the
|
||||
escaping required by the grok syntax. Using the Multi-line line literal
|
||||
syntax with `'''` may be useful.
|
||||
|
||||
The following config examples will parse this input file:
|
||||
|
||||
```text
|
||||
|42|\uD83D\uDC2F|'telegraf'|
|
||||
```
|
||||
|
||||
Since `|` is a special character in the grok language, we must escape it to
|
||||
get a literal `|`. With a basic TOML string, special characters such as
|
||||
backslash must be escaped, requiring us to escape the backslash a second time.
|
||||
|
||||
```toml
|
||||
[[inputs.file]]
|
||||
grok_patterns = ["\\|%{NUMBER:value:int}\\|%{UNICODE_ESCAPE:escape}\\|'%{WORD:name}'\\|"]
|
||||
grok_custom_patterns = "UNICODE_ESCAPE (?:\\\\u[0-9A-F]{4})+"
|
||||
```
|
||||
|
||||
We cannot use a literal TOML string for the pattern, because we cannot match a
|
||||
`'` within it. However, it works well for the custom pattern.
|
||||
|
||||
```toml
|
||||
[[inputs.file]]
|
||||
grok_patterns = ["\\|%{NUMBER:value:int}\\|%{UNICODE_ESCAPE:escape}\\|'%{WORD:name}'\\|"]
|
||||
grok_custom_patterns = 'UNICODE_ESCAPE (?:\\u[0-9A-F]{4})+'
|
||||
```
|
||||
|
||||
A multi-line literal string allows us to encode the pattern:
|
||||
|
||||
```toml
|
||||
[[inputs.file]]
|
||||
grok_patterns = ['''
|
||||
\|%{NUMBER:value:int}\|%{UNICODE_ESCAPE:escape}\|'%{WORD:name}'\|
|
||||
''']
|
||||
grok_custom_patterns = 'UNICODE_ESCAPE (?:\\u[0-9A-F]{4})+'
|
||||
```
|
||||
|
||||
#### Tips for creating patterns
|
||||
|
||||
Writing complex patterns can be difficult, here is some advice for writing a new
|
||||
pattern or testing a pattern developed
|
||||
[online](https://grokdebug.herokuapp.com).
|
||||
|
||||
Create a file output that writes to stdout, and disable other outputs while
|
||||
testing. This will allow you to see the captured metrics. Keep in mind that
|
||||
the file output will only print once per `flush_interval`.
|
||||
|
||||
```toml
|
||||
[[outputs.file]]
|
||||
files = ["stdout"]
|
||||
```
|
||||
|
||||
- Start with a file containing only a single line of your input.
|
||||
- Remove all but the first token or piece of the line.
|
||||
- Add the section of your pattern to match this piece to your configuration file.
|
||||
- Verify that the metric is parsed successfully by running Telegraf.
|
||||
- If successful, add the next token, update the pattern and retest.
|
||||
- Continue one token at a time until the entire line is successfully parsed.
|
||||
|
||||
#### Performance
|
||||
|
||||
Performance depends heavily on the regular expressions that you use, but there
|
||||
are a few techniques that can help:
|
||||
|
||||
- Avoid using patterns such as `%{DATA}` that will always match.
|
||||
- If possible, add `^` and `$` anchors to your pattern:
|
||||
|
||||
```toml
|
||||
[[inputs.file]]
|
||||
grok_patterns = ["^%{COMBINED_LOG_FORMAT}$"]
|
||||
```
|
43
plugins/parsers/grok/influx_patterns.go
Normal file
43
plugins/parsers/grok/influx_patterns.go
Normal file
|
@ -0,0 +1,43 @@
|
|||
package grok
|
||||
|
||||
//nolint:lll // conditionally long lines allowed
|
||||
const DefaultPatterns = `
|
||||
# Example log file pattern, example log looks like this:
|
||||
# [04/Jun/2016:12:41:45 +0100] 1.25 200 192.168.1.1 5.432µs
|
||||
# Breakdown of the DURATION pattern below:
|
||||
# NUMBER is a builtin logstash grok pattern matching float & int numbers.
|
||||
# [nuµm]? is a regex specifying 0 or 1 of the characters within brackets.
|
||||
# s is also regex, this pattern must end in "s".
|
||||
# so DURATION will match something like '5.324ms' or '6.1µs' or '10s'
|
||||
DURATION %{NUMBER}[nuµm]?s
|
||||
RESPONSE_CODE %{NUMBER:response_code:tag}
|
||||
RESPONSE_TIME %{DURATION:response_time_ns:duration}
|
||||
EXAMPLE_LOG \[%{HTTPDATE:ts:ts-httpd}\] %{NUMBER:myfloat:float} %{RESPONSE_CODE} %{IPORHOST:clientip} %{RESPONSE_TIME}
|
||||
|
||||
# Wider-ranging username matching vs. logstash built-in %{USER}
|
||||
NGUSERNAME [a-zA-Z0-9\.\@\-\+_%]+
|
||||
NGUSER %{NGUSERNAME}
|
||||
# Wider-ranging client IP matching
|
||||
CLIENT (?:%{IPV6}|%{IPV4}|%{HOSTNAME}|%{HOSTPORT})
|
||||
|
||||
##
|
||||
## COMMON LOG PATTERNS
|
||||
##
|
||||
|
||||
# apache & nginx logs, this is also known as the "common log format"
|
||||
# see https://en.wikipedia.org/wiki/Common_Log_Format
|
||||
COMMON_LOG_FORMAT %{CLIENT:client_ip} %{NOTSPACE:ident} %{NOTSPACE:auth} \[%{HTTPDATE:ts:ts-httpd}\] "(?:%{WORD:verb:tag} %{NOTSPACE:request}(?: HTTP/%{NUMBER:http_version:float})?|%{DATA})" %{NUMBER:resp_code:tag} (?:%{NUMBER:resp_bytes:int}|-)
|
||||
|
||||
# Combined log format is the same as the common log format but with the addition
|
||||
# of two quoted strings at the end for "referrer" and "agent"
|
||||
# See Examples at http://httpd.apache.org/docs/current/mod/mod_log_config.html
|
||||
COMBINED_LOG_FORMAT %{COMMON_LOG_FORMAT} "%{DATA:referrer}" "%{DATA:agent}"
|
||||
|
||||
# HTTPD log formats
|
||||
HTTPD20_ERRORLOG \[%{HTTPDERROR_DATE:timestamp}\] \[%{LOGLEVEL:loglevel:tag}\] (?:\[client %{IPORHOST:clientip}\] ){0,1}%{GREEDYDATA:errormsg}
|
||||
HTTPD24_ERRORLOG \[%{HTTPDERROR_DATE:timestamp}\] \[%{WORD:module}:%{LOGLEVEL:loglevel:tag}\] \[pid %{POSINT:pid:int}:tid %{NUMBER:tid:int}\]( \(%{POSINT:proxy_errorcode:int}\)%{DATA:proxy_errormessage}:)?( \[client %{IPORHOST:client}:%{POSINT:clientport}\])? %{DATA:errorcode}: %{GREEDYDATA:message}
|
||||
HTTPD_ERRORLOG %{HTTPD20_ERRORLOG}|%{HTTPD24_ERRORLOG}
|
||||
|
||||
# DATA spanning multiple lines
|
||||
MULTILINEDATA (.|\n)*
|
||||
`
|
603
plugins/parsers/grok/parser.go
Normal file
603
plugins/parsers/grok/parser.go
Normal file
|
@ -0,0 +1,603 @@
|
|||
package grok
|
||||
|
||||
import (
|
||||
"bufio"
|
||||
"bytes"
|
||||
"errors"
|
||||
"fmt"
|
||||
"os"
|
||||
"regexp"
|
||||
"strconv"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"github.com/vjeantet/grok"
|
||||
|
||||
"github.com/influxdata/telegraf"
|
||||
"github.com/influxdata/telegraf/internal"
|
||||
"github.com/influxdata/telegraf/metric"
|
||||
"github.com/influxdata/telegraf/plugins/parsers"
|
||||
)
|
||||
|
||||
var timeLayouts = map[string]string{
|
||||
"ts-ansic": "Mon Jan _2 15:04:05 2006",
|
||||
"ts-unix": "Mon Jan _2 15:04:05 MST 2006",
|
||||
"ts-ruby": "Mon Jan 02 15:04:05 -0700 2006",
|
||||
"ts-rfc822": "02 Jan 06 15:04 MST",
|
||||
"ts-rfc822z": "02 Jan 06 15:04 -0700", // RFC822 with numeric zone
|
||||
"ts-rfc850": "Monday, 02-Jan-06 15:04:05 MST",
|
||||
"ts-rfc1123": "Mon, 02 Jan 2006 15:04:05 MST",
|
||||
"ts-rfc1123z": "Mon, 02 Jan 2006 15:04:05 -0700", // RFC1123 with numeric zone
|
||||
"ts-rfc3339": "2006-01-02T15:04:05Z07:00",
|
||||
"ts-rfc3339nano": "2006-01-02T15:04:05.999999999Z07:00",
|
||||
"ts-httpd": "02/Jan/2006:15:04:05 -0700",
|
||||
// These four are not exactly "layouts", but they are special cases that
|
||||
// will get handled in the ParseLine function.
|
||||
"ts-epoch": "EPOCH",
|
||||
"ts-epochnano": "EPOCH_NANO",
|
||||
"ts-epochmilli": "EPOCH_MILLI",
|
||||
"ts-syslog": "SYSLOG_TIMESTAMP",
|
||||
"ts": "GENERIC_TIMESTAMP", // try parsing all known timestamp layouts.
|
||||
}
|
||||
|
||||
const (
|
||||
Measurement = "measurement"
|
||||
Int = "int"
|
||||
Tag = "tag"
|
||||
Float = "float"
|
||||
String = "string"
|
||||
Duration = "duration"
|
||||
Drop = "drop"
|
||||
Epoch = "EPOCH"
|
||||
EpochMilli = "EPOCH_MILLI"
|
||||
EpochNano = "EPOCH_NANO"
|
||||
SyslogTimestamp = "SYSLOG_TIMESTAMP"
|
||||
GenericTimestamp = "GENERIC_TIMESTAMP"
|
||||
)
|
||||
|
||||
var (
|
||||
// matches named captures that contain a modifier.
|
||||
// ie,
|
||||
// %{NUMBER:bytes:int}
|
||||
// %{IPORHOST:clientip:tag}
|
||||
// %{HTTPDATE:ts1:ts-http}
|
||||
// %{HTTPDATE:ts2:ts-"02 Jan 06 15:04"}
|
||||
modifierRe = regexp.MustCompile(`%{\w+:(\w+):(ts-".+"|t?s?-?\w+)}`)
|
||||
// matches a plain pattern name. ie, %{NUMBER}
|
||||
patternOnlyRe = regexp.MustCompile(`%{(\w+)}`)
|
||||
)
|
||||
|
||||
// Parser is the primary struct to handle and grok-patterns defined in the config toml
|
||||
type Parser struct {
|
||||
Patterns []string `toml:"grok_patterns"`
|
||||
// namedPatterns is a list of internally-assigned names to the patterns
|
||||
// specified by the user in Patterns.
|
||||
// They will look like:
|
||||
// GROK_INTERNAL_PATTERN_0, GROK_INTERNAL_PATTERN_1, etc.
|
||||
NamedPatterns []string `toml:"grok_named_patterns"`
|
||||
CustomPatterns string `toml:"grok_custom_patterns"`
|
||||
CustomPatternFiles []string `toml:"grok_custom_pattern_files"`
|
||||
Multiline bool `toml:"grok_multiline"`
|
||||
Measurement string `toml:"-"`
|
||||
DefaultTags map[string]string `toml:"-"`
|
||||
Log telegraf.Logger `toml:"-"`
|
||||
|
||||
// Timezone is an optional component to help render log dates to
|
||||
// your chosen zone.
|
||||
// Default: "" which renders UTC
|
||||
// Options are as follows:
|
||||
// 1. Local -- interpret based on machine localtime
|
||||
// 2. "America/Chicago" -- Unix TZ values like those found in https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
|
||||
// 3. UTC -- or blank/unspecified, will return timestamp in UTC
|
||||
Timezone string `toml:"grok_timezone"`
|
||||
loc *time.Location
|
||||
|
||||
// UniqueTimestamp when set to "disable", timestamp will not incremented if there is a duplicate.
|
||||
UniqueTimestamp string `toml:"grok_unique_timestamp"`
|
||||
|
||||
// typeMap is a map of patterns -> capture name -> modifier,
|
||||
// ie, {
|
||||
// "%{TESTLOG}":
|
||||
// {
|
||||
// "bytes": "int",
|
||||
// "clientip": "tag"
|
||||
// }
|
||||
// }
|
||||
typeMap map[string]map[string]string
|
||||
// tsMap is a map of patterns -> capture name -> timestamp layout.
|
||||
// ie, {
|
||||
// "%{TESTLOG}":
|
||||
// {
|
||||
// "httptime": "02/Jan/2006:15:04:05 -0700"
|
||||
// }
|
||||
// }
|
||||
tsMap map[string]map[string]string
|
||||
// patternsMap is a map of all of the parsed patterns from CustomPatterns
|
||||
// and CustomPatternFiles.
|
||||
// ie, {
|
||||
// "DURATION": "%{NUMBER}[nuµm]?s"
|
||||
// "RESPONSE_CODE": "%{NUMBER:rc:tag}"
|
||||
// }
|
||||
patternsMap map[string]string
|
||||
// foundTSLayouts is a slice of timestamp patterns that have been found
|
||||
// in the log lines. This slice gets updated if the user uses the generic
|
||||
// 'ts' modifier for timestamps. This slice is checked first for matches,
|
||||
// so that previously-matched layouts get priority over all other timestamp
|
||||
// layouts.
|
||||
foundTSLayouts []string
|
||||
|
||||
timeFunc func() time.Time
|
||||
g *grok.Grok
|
||||
tsModder *tsModder
|
||||
}
|
||||
|
||||
// Compile is a bound method to Parser which will process the options for our parser
|
||||
func (p *Parser) Compile() error {
|
||||
p.typeMap = make(map[string]map[string]string)
|
||||
p.tsMap = make(map[string]map[string]string)
|
||||
p.patternsMap = make(map[string]string)
|
||||
p.tsModder = &tsModder{}
|
||||
var err error
|
||||
p.g, err = grok.NewWithConfig(&grok.Config{NamedCapturesOnly: true})
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
if p.UniqueTimestamp == "" {
|
||||
p.UniqueTimestamp = "auto"
|
||||
}
|
||||
|
||||
// Give Patterns fake names so that they can be treated as named
|
||||
// "custom patterns"
|
||||
p.NamedPatterns = make([]string, 0, len(p.Patterns))
|
||||
for i, pattern := range p.Patterns {
|
||||
pattern = strings.TrimSpace(pattern)
|
||||
if pattern == "" {
|
||||
continue
|
||||
}
|
||||
name := fmt.Sprintf("GROK_INTERNAL_PATTERN_%d", i)
|
||||
p.CustomPatterns += "\n" + name + " " + pattern + "\n"
|
||||
p.NamedPatterns = append(p.NamedPatterns, "%{"+name+"}")
|
||||
}
|
||||
|
||||
if len(p.NamedPatterns) == 0 {
|
||||
return errors.New("pattern required")
|
||||
}
|
||||
|
||||
// Combine user-supplied CustomPatterns with DEFAULT_PATTERNS and parse
|
||||
// them together as the same type of pattern.
|
||||
p.CustomPatterns = DefaultPatterns + p.CustomPatterns
|
||||
if len(p.CustomPatterns) != 0 {
|
||||
scanner := bufio.NewScanner(strings.NewReader(p.CustomPatterns))
|
||||
p.addCustomPatterns(scanner)
|
||||
}
|
||||
|
||||
// Parse any custom pattern files supplied.
|
||||
for _, filename := range p.CustomPatternFiles {
|
||||
file, fileErr := os.Open(filename)
|
||||
if fileErr != nil {
|
||||
return fileErr
|
||||
}
|
||||
|
||||
scanner := bufio.NewScanner(bufio.NewReader(file))
|
||||
p.addCustomPatterns(scanner)
|
||||
}
|
||||
|
||||
p.loc, err = time.LoadLocation(p.Timezone)
|
||||
if err != nil {
|
||||
p.Log.Warnf("Improper timezone supplied (%s), setting loc to UTC", p.Timezone)
|
||||
p.loc = time.UTC
|
||||
}
|
||||
|
||||
if p.timeFunc == nil {
|
||||
p.timeFunc = time.Now
|
||||
}
|
||||
|
||||
return p.compileCustomPatterns()
|
||||
}
|
||||
|
||||
// ParseLine is the primary function to process individual lines, returning the metrics
|
||||
func (p *Parser) ParseLine(line string) (telegraf.Metric, error) {
|
||||
var err error
|
||||
// values are the parsed fields from the log line
|
||||
var values map[string]string
|
||||
// the matching pattern string
|
||||
var patternName string
|
||||
for _, pattern := range p.NamedPatterns {
|
||||
if values, err = p.g.Parse(pattern, line); err != nil {
|
||||
return nil, err
|
||||
}
|
||||
if len(values) != 0 {
|
||||
patternName = pattern
|
||||
break
|
||||
}
|
||||
}
|
||||
|
||||
if len(values) == 0 {
|
||||
p.Log.Debugf("Grok no match found for or no data extracted from: %q", line)
|
||||
return nil, nil
|
||||
}
|
||||
|
||||
fields := make(map[string]interface{})
|
||||
tags := make(map[string]string)
|
||||
|
||||
// add default tags
|
||||
for k, v := range p.DefaultTags {
|
||||
tags[k] = v
|
||||
}
|
||||
|
||||
timestamp := time.Now()
|
||||
for k, v := range values {
|
||||
if k == "" || v == "" {
|
||||
continue
|
||||
}
|
||||
// t is the modifier of the field
|
||||
var t string
|
||||
// check if pattern has some modifiers
|
||||
if types, ok := p.typeMap[patternName]; ok {
|
||||
t = types[k]
|
||||
}
|
||||
// if we didn't find a modifier, check if we have a timestamp layout
|
||||
if t == "" {
|
||||
if ts, ok := p.tsMap[patternName]; ok {
|
||||
// check if the modifier is a timestamp layout
|
||||
if layout, ok := ts[k]; ok {
|
||||
t = layout
|
||||
}
|
||||
}
|
||||
}
|
||||
// if we didn't find a type OR timestamp modifier, assume string
|
||||
if t == "" {
|
||||
t = String
|
||||
}
|
||||
|
||||
switch t {
|
||||
case Measurement:
|
||||
p.Measurement = v
|
||||
case Int:
|
||||
iv, err := strconv.ParseInt(v, 0, 64)
|
||||
if err != nil {
|
||||
p.Log.Errorf("Error parsing %s to int: %s", v, err)
|
||||
} else {
|
||||
fields[k] = iv
|
||||
}
|
||||
case Float:
|
||||
fv, err := strconv.ParseFloat(v, 64)
|
||||
if err != nil {
|
||||
p.Log.Errorf("Error parsing %s to float: %s", v, err)
|
||||
} else {
|
||||
fields[k] = fv
|
||||
}
|
||||
case Duration:
|
||||
d, err := time.ParseDuration(v)
|
||||
if err != nil {
|
||||
p.Log.Errorf("Error parsing %s to duration: %s", v, err)
|
||||
} else {
|
||||
fields[k] = int64(d)
|
||||
}
|
||||
case Tag:
|
||||
tags[k] = v
|
||||
case String:
|
||||
fields[k] = v
|
||||
case Epoch:
|
||||
parts := strings.SplitN(v, ".", 2)
|
||||
if len(parts) == 0 {
|
||||
p.Log.Errorf("Error parsing %s to timestamp: %s", v, err)
|
||||
break
|
||||
}
|
||||
|
||||
sec, err := strconv.ParseInt(parts[0], 10, 64)
|
||||
if err != nil {
|
||||
p.Log.Errorf("Error parsing %s to timestamp: %s", v, err)
|
||||
break
|
||||
}
|
||||
ts := time.Unix(sec, 0)
|
||||
|
||||
if len(parts) == 2 {
|
||||
padded := fmt.Sprintf("%-9s", parts[1])
|
||||
nsString := strings.ReplaceAll(padded[:9], " ", "0")
|
||||
nanosec, err := strconv.ParseInt(nsString, 10, 64)
|
||||
if err != nil {
|
||||
p.Log.Errorf("Error parsing %s to timestamp: %s", v, err)
|
||||
break
|
||||
}
|
||||
ts = ts.Add(time.Duration(nanosec) * time.Nanosecond)
|
||||
}
|
||||
timestamp = ts
|
||||
case EpochMilli:
|
||||
ms, err := strconv.ParseInt(v, 10, 64)
|
||||
if err != nil {
|
||||
p.Log.Errorf("Error parsing %s to int: %s", v, err)
|
||||
} else {
|
||||
timestamp = time.Unix(0, ms*int64(time.Millisecond))
|
||||
}
|
||||
case EpochNano:
|
||||
iv, err := strconv.ParseInt(v, 10, 64)
|
||||
if err != nil {
|
||||
p.Log.Errorf("Error parsing %s to int: %s", v, err)
|
||||
} else {
|
||||
timestamp = time.Unix(0, iv)
|
||||
}
|
||||
case SyslogTimestamp:
|
||||
ts, err := internal.ParseTimestamp(time.Stamp, v, p.loc)
|
||||
if err == nil {
|
||||
if ts.Year() == 0 {
|
||||
ts = ts.AddDate(timestamp.Year(), 0, 0)
|
||||
}
|
||||
timestamp = ts
|
||||
} else {
|
||||
p.Log.Errorf("Error parsing %s to time layout [%s]: %s", v, t, err)
|
||||
}
|
||||
case GenericTimestamp:
|
||||
var foundTS bool
|
||||
// first try timestamp layouts that we've already found
|
||||
for _, layout := range p.foundTSLayouts {
|
||||
ts, err := internal.ParseTimestamp(layout, v, p.loc)
|
||||
if err == nil {
|
||||
timestamp = ts
|
||||
foundTS = true
|
||||
break
|
||||
}
|
||||
}
|
||||
// if we haven't found a timestamp layout yet, try all timestamp
|
||||
// layouts.
|
||||
if !foundTS {
|
||||
for _, layout := range timeLayouts {
|
||||
ts, err := internal.ParseTimestamp(layout, v, p.loc)
|
||||
if err == nil {
|
||||
timestamp = ts
|
||||
foundTS = true
|
||||
p.foundTSLayouts = append(p.foundTSLayouts, layout)
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
// if we still haven't found a timestamp layout, log it and we will
|
||||
// just use time.Now()
|
||||
if !foundTS {
|
||||
p.Log.Errorf("Error parsing timestamp [%s], could not find any "+
|
||||
"suitable time layouts.", v)
|
||||
}
|
||||
case Drop:
|
||||
// goodbye!
|
||||
default:
|
||||
v = strings.ReplaceAll(v, ",", ".")
|
||||
ts, err := internal.ParseTimestamp(t, v, p.loc)
|
||||
if err == nil {
|
||||
if ts.Year() == 0 {
|
||||
ts = ts.AddDate(timestamp.Year(), 0, 0)
|
||||
}
|
||||
timestamp = ts
|
||||
} else {
|
||||
p.Log.Errorf("Error parsing %s to time layout [%s]: %s", v, t, err)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if p.UniqueTimestamp != "auto" {
|
||||
return metric.New(p.Measurement, tags, fields, timestamp), nil
|
||||
}
|
||||
|
||||
return metric.New(p.Measurement, tags, fields, p.tsModder.tsMod(timestamp)), nil
|
||||
}
|
||||
|
||||
func (p *Parser) Parse(buf []byte) ([]telegraf.Metric, error) {
|
||||
metrics := make([]telegraf.Metric, 0)
|
||||
|
||||
if p.Multiline {
|
||||
m, err := p.ParseLine(string(buf))
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
if m != nil {
|
||||
metrics = append(metrics, m)
|
||||
}
|
||||
return metrics, nil
|
||||
}
|
||||
|
||||
scanner := bufio.NewScanner(bytes.NewReader(buf))
|
||||
for scanner.Scan() {
|
||||
line := scanner.Text()
|
||||
m, err := p.ParseLine(line)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
if m == nil {
|
||||
continue
|
||||
}
|
||||
metrics = append(metrics, m)
|
||||
}
|
||||
|
||||
return metrics, nil
|
||||
}
|
||||
|
||||
func (p *Parser) SetDefaultTags(tags map[string]string) {
|
||||
p.DefaultTags = tags
|
||||
}
|
||||
|
||||
func (p *Parser) addCustomPatterns(scanner *bufio.Scanner) {
|
||||
for scanner.Scan() {
|
||||
line := strings.TrimSpace(scanner.Text())
|
||||
if len(line) > 0 && line[0] != '#' {
|
||||
names := strings.SplitN(line, " ", 2)
|
||||
p.patternsMap[names[0]] = names[1]
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func (p *Parser) compileCustomPatterns() error {
|
||||
var err error
|
||||
// check if the pattern contains a subpattern that is already defined
|
||||
// replace it with the subpattern for modifier inheritance.
|
||||
for i := 0; i < 2; i++ {
|
||||
for name, pattern := range p.patternsMap {
|
||||
subNames := patternOnlyRe.FindAllStringSubmatch(pattern, -1)
|
||||
for _, subName := range subNames {
|
||||
if subPattern, ok := p.patternsMap[subName[1]]; ok {
|
||||
pattern = strings.Replace(pattern, subName[0], subPattern, 1)
|
||||
}
|
||||
}
|
||||
p.patternsMap[name] = pattern
|
||||
}
|
||||
}
|
||||
|
||||
// check if pattern contains modifiers. Parse them out if it does.
|
||||
for name, pattern := range p.patternsMap {
|
||||
if modifierRe.MatchString(pattern) {
|
||||
// this pattern has modifiers, so parse out the modifiers
|
||||
pattern, err = p.parseTypedCaptures(name, pattern)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
p.patternsMap[name] = pattern
|
||||
}
|
||||
}
|
||||
|
||||
return p.g.AddPatternsFromMap(p.patternsMap)
|
||||
}
|
||||
|
||||
// parseTypedCaptures parses the capture modifiers, and then deletes the
|
||||
// modifier from the line so that it is a valid "grok" pattern again.
|
||||
//
|
||||
// ie,
|
||||
// %{NUMBER:bytes:int} => %{NUMBER:bytes} (stores %{NUMBER}->bytes->int)
|
||||
// %{IPORHOST:clientip:tag} => %{IPORHOST:clientip} (stores %{IPORHOST}->clientip->tag)
|
||||
func (p *Parser) parseTypedCaptures(name, pattern string) (string, error) {
|
||||
matches := modifierRe.FindAllStringSubmatch(pattern, -1)
|
||||
|
||||
// grab the name of the capture pattern
|
||||
patternName := "%{" + name + "}"
|
||||
// create type map for this pattern
|
||||
p.typeMap[patternName] = make(map[string]string)
|
||||
p.tsMap[patternName] = make(map[string]string)
|
||||
|
||||
// boolean to verify that each pattern only has a single ts- data type.
|
||||
hasTimestamp := false
|
||||
for _, match := range matches {
|
||||
// regex capture 1 is the name of the capture
|
||||
// regex capture 2 is the modifier of the capture
|
||||
if strings.HasPrefix(match[2], "ts") {
|
||||
if hasTimestamp {
|
||||
return pattern, fmt.Errorf("logparser pattern compile error: "+
|
||||
"Each pattern is allowed only one named "+
|
||||
"timestamp data type. pattern: %s", pattern)
|
||||
}
|
||||
if layout, ok := timeLayouts[match[2]]; ok {
|
||||
// built-in time format
|
||||
p.tsMap[patternName][match[1]] = layout
|
||||
} else {
|
||||
// custom time format
|
||||
p.tsMap[patternName][match[1]] = strings.TrimSuffix(strings.TrimPrefix(match[2], `ts-"`), `"`)
|
||||
}
|
||||
hasTimestamp = true
|
||||
} else {
|
||||
p.typeMap[patternName][match[1]] = match[2]
|
||||
}
|
||||
|
||||
// the modifier is not a valid part of a "grok" pattern, so remove it
|
||||
// from the pattern.
|
||||
pattern = strings.Replace(pattern, ":"+match[2]+"}", "}", 1)
|
||||
}
|
||||
|
||||
return pattern, nil
|
||||
}
|
||||
|
||||
// tsModder is a struct for incrementing identical timestamps of log lines
|
||||
// so that we don't push identical metrics that will get overwritten.
|
||||
type tsModder struct {
|
||||
dupe time.Time
|
||||
last time.Time
|
||||
incr time.Duration
|
||||
incrn time.Duration
|
||||
rollover time.Duration
|
||||
}
|
||||
|
||||
// tsMod increments the given timestamp one unit more from the previous
|
||||
// duplicate timestamp.
|
||||
// the increment unit is determined as the next smallest time unit below the
|
||||
// most significant time unit of ts.
|
||||
//
|
||||
// ie, if the input is at ms precision, it will increment it 1µs.
|
||||
func (t *tsModder) tsMod(ts time.Time) time.Time {
|
||||
if ts.IsZero() {
|
||||
return ts
|
||||
}
|
||||
defer func() { t.last = ts }()
|
||||
// don't mod the time if we don't need to
|
||||
if t.last.IsZero() || ts.IsZero() {
|
||||
t.incrn = 0
|
||||
t.rollover = 0
|
||||
return ts
|
||||
}
|
||||
if !ts.Equal(t.last) && !ts.Equal(t.dupe) {
|
||||
t.incr = 0
|
||||
t.incrn = 0
|
||||
t.rollover = 0
|
||||
return ts
|
||||
}
|
||||
if ts.Equal(t.last) {
|
||||
t.dupe = ts
|
||||
}
|
||||
|
||||
if ts.Equal(t.dupe) && t.incr == time.Duration(0) {
|
||||
tsNano := ts.UnixNano()
|
||||
|
||||
d := int64(10)
|
||||
counter := 1
|
||||
for {
|
||||
a := tsNano % d
|
||||
if a > 0 {
|
||||
break
|
||||
}
|
||||
d = d * 10
|
||||
counter++
|
||||
}
|
||||
|
||||
switch {
|
||||
case counter <= 6:
|
||||
t.incr = time.Nanosecond
|
||||
case counter <= 9:
|
||||
t.incr = time.Microsecond
|
||||
case counter > 9:
|
||||
t.incr = time.Millisecond
|
||||
}
|
||||
}
|
||||
|
||||
t.incrn++
|
||||
if t.incrn == 999 && t.incr > time.Nanosecond {
|
||||
t.rollover = t.incr * t.incrn
|
||||
t.incrn = 1
|
||||
t.incr = t.incr / 1000
|
||||
if t.incr < time.Nanosecond {
|
||||
t.incr = time.Nanosecond
|
||||
}
|
||||
}
|
||||
return ts.Add(t.incr*t.incrn + t.rollover)
|
||||
}
|
||||
|
||||
func (p *Parser) Init() error {
|
||||
if len(p.Patterns) == 0 {
|
||||
p.Patterns = []string{"%{COMBINED_LOG_FORMAT}"}
|
||||
}
|
||||
|
||||
if p.UniqueTimestamp == "" {
|
||||
p.UniqueTimestamp = "auto"
|
||||
}
|
||||
|
||||
if p.Timezone == "" {
|
||||
p.Timezone = "UTC"
|
||||
}
|
||||
|
||||
return p.Compile()
|
||||
}
|
||||
|
||||
func init() {
|
||||
parsers.Add("grok",
|
||||
func(defaultMetricName string) telegraf.Parser {
|
||||
return &Parser{
|
||||
Measurement: defaultMetricName,
|
||||
}
|
||||
},
|
||||
)
|
||||
}
|
1243
plugins/parsers/grok/parser_test.go
Normal file
1243
plugins/parsers/grok/parser_test.go
Normal file
File diff suppressed because it is too large
Load diff
14
plugins/parsers/grok/testdata/test-patterns
vendored
Normal file
14
plugins/parsers/grok/testdata/test-patterns
vendored
Normal file
|
@ -0,0 +1,14 @@
|
|||
# Test A log line:
|
||||
# [04/Jun/2016:12:41:45 +0100] 1.25 200 192.168.1.1 5.432µs 101
|
||||
DURATION %{NUMBER}[nuµm]?s
|
||||
RESPONSE_CODE %{NUMBER:response_code:tag}
|
||||
RESPONSE_TIME %{DURATION:response_time:duration}
|
||||
TEST_LOG_A \[%{HTTPDATE:timestamp:ts-httpd}\] %{NUMBER:myfloat:float} %{RESPONSE_CODE} %{IPORHOST:clientip} %{RESPONSE_TIME} %{NUMBER:myint:int}
|
||||
|
||||
# Test B log line:
|
||||
# [04/06/2016--12:41:45] 1.25 mystring dropme nomodifier
|
||||
TEST_TIMESTAMP %{MONTHDAY}/%{MONTHNUM}/%{YEAR}--%{TIME}
|
||||
TEST_LOG_B \[%{TEST_TIMESTAMP:timestamp:ts-"02/01/2006--15:04:05"}\] %{NUMBER:myfloat:float} %{WORD:mystring:string} %{WORD:dropme:drop} %{WORD:nomodifier}
|
||||
|
||||
TEST_TIMESTAMP %{MONTHDAY}/%{MONTHNUM}/%{YEAR}--%{TIME}
|
||||
TEST_LOG_BAD \[%{TEST_TIMESTAMP:timestamp:ts-"02/01/2006--15:04:05"}\] %{NUMBER:myfloat:float} %{WORD:mystring:int} %{WORD:dropme:drop} %{WORD:nomodifier}
|
1
plugins/parsers/grok/testdata/test_a.log
vendored
Normal file
1
plugins/parsers/grok/testdata/test_a.log
vendored
Normal file
|
@ -0,0 +1 @@
|
|||
[04/Jun/2016:12:41:45 +0100] 1.25 200 192.168.1.1 5.432µs 101
|
1
plugins/parsers/grok/testdata/test_b.log
vendored
Normal file
1
plugins/parsers/grok/testdata/test_b.log
vendored
Normal file
|
@ -0,0 +1 @@
|
|||
[04/06/2016--12:41:45] 1.25 mystring dropme nomodifier
|
3
plugins/parsers/grok/testdata/test_multiline.log
vendored
Normal file
3
plugins/parsers/grok/testdata/test_multiline.log
vendored
Normal file
|
@ -0,0 +1,3 @@
|
|||
2022-12-01T12:41:45Z Error A long and
|
||||
multiline
|
||||
message
|
Loading…
Add table
Add a link
Reference in a new issue