logstash本地调试日志方法详解

文章发布较早，内容可能过时，阅读注意甄别。

以前在新增项目需要添加日志采集配置的时候，我的操作流程都是 filebeat 添加配置采集，然后 logstash 配置中转，接着就创建索引交付了，现在来看，这个流程不无不妥，但是在配置 logstash 的时候，往往简单化地参考了以往项目的配置，拿着就上了，事实上，这样是有问题的。

因为以往的操作中，忽略了一个重要的环节，那就是将日志当中的时间问题，按照以往的操作，可能日志采集也是通的，看起来似乎没啥问题，但是如果时间格式不注意，可能会出问题，所以这个问题，应该首先在正式上之前，先自行测试齐备。

这篇文章记录一下测试的整个流程以及注意点，而不过度解读时间格式，时间格式这个问题会在另外一篇文章中详细探讨。

# 1，准备。

主机：centos-7.4。

logstash：6.5.4。

在实际生产环境之外，自己搭建专有的 elk 测试环境，然后进行测试。或者使用如下几条命令，在一台主机安装 logstash 环境，进行测试。

# 1，安装依赖。

主要是依赖 jdk 环境。

yum -y install lrzsz vim curl wget java ntpdate && ntpdate -u cn.pool.ntp.org

# 2，配置 yum 源。

添加源：

cat > /etc/yum.repos.d/elk.repo << EOF
[elasticsearch-6.x]
name=Elasticsearch repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF

1
2
3
4
5
6
7
8
9
10

导入 key：

rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

如果执行这一步报错，则有可能是主机时间问题，可以同步一下主机时间，在执行！

# 3，安装 logstash。

yum -y install logstash-6.5.4

安装之后就可以直接进行如下的测试了，不需要配置其他任何东东。

# 2，思路。

这里不搞太过复杂，不需要从头配置走通整个采集流程，主要是为了验证 logstash 解析的配置，所以直接在配置里读取本地文件，以模拟日志文件，然后通过输出到控制台的方式，进行判断分析。

# 3，两种模式。

先放一个常规配置，然后根据配置来解释含义：

input {
	file {
		path 			=> ["/data/log/test.log"]  # 读取本地日志目录
		start_position 	=> "beginning"  # 每次重新启动从日志的开头读
		sincedb_path 	=> "/dev/null"  # 该参数用来指定sincedb文件名，但是如果我们设置为/dev/null这个linux系统上特殊的空洞文件，那么logstash每次重启进程的时候，尝试读取sincedb内容，都只会读到空洞，也就可以理解为前不有过运行记录，自然就从初始位置开始读取了！
		codec 			=> "json"  # 如果日志格式是json的，可以启动此参数
	}
 }

output {
	stdout {
		codec => rubydebug  # 以rubydebug格式在控制台输出
		# codec => json  # 以json格式在控制台输出
	}
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

接下来先分别看一下两个输出模式的内容有什么不一样。

# 1，json 格式输出。

使用配置文件如下：

$cat /usr/share/logstash/test.conf  # 因为安装之后在这里，所以测试的配置文件直接放到这里了

input {
	file {
		path 			=> ["/data/log/test.log"]
		start_position 	=> "beginning"
		sincedb_path 	=> "/dev/null"
	 #	codec 			=> "json" # 因为是测试，随便写点数据，因此先注释此配置
	}
 }

output {
	stdout {
		# codec => rubydebug
		codec => json
	}
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

配置完成之后，可用如下命令先检测一下配置有没有问题：

$ /usr/share/logstash/bin/logstash -f /usr/share/logstash/test.conf -t

WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[INFO ] 2019-10-12 17:29:35.035 [main] writabledirectory - Creating directory {:setting=>"path.queue", :path=>"/usr/share/logstash/data/queue"}
[INFO ] 2019-10-12 17:29:35.050 [main] writabledirectory - Creating directory {:setting=>"path.dead_letter_queue", :path=>"/usr/share/logstash/data/dead_letter_queue"}
[WARN ] 2019-10-12 17:29:35.707 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
Configuration OK
[INFO ] 2019-10-12 17:29:38.302 [LogStash::Runner] runner - Using config.test_and_exit mode. Config Validation Result: OK. Exiting Logstash

1
2
3
4
5
6
7
8
9

看到了有 OK 的输出，说明配置没有问题，接下来可以去掉 -t，直接监听测试的日志目录内容：

$ /usr/share/logstash/bin/logstash -f /usr/share/logstash/test.conf

WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[WARN ] 2019-10-12 17:32:18.537 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2019-10-12 17:32:18.562 [LogStash::Runner] runner - Starting Logstash {"logstash.version"=>"6.5.4"}
[INFO ] 2019-10-12 17:32:18.609 [LogStash::Runner] agent - No persistent UUID file found. Generating new UUID {:uuid=>"9f86b0f6-ea25-476a-b6ad-627f02e0d720", :path=>"/usr/share/logstash/data/uuid"}
[INFO ] 2019-10-12 17:32:22.343 [Converge PipelineAction::Create<main>] pipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[INFO ] 2019-10-12 17:32:22.836 [Converge PipelineAction::Create<main>] pipeline - Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x1a225873 run>"}
[INFO ] 2019-10-12 17:32:22.962 [Ruby-0-Thread-1: /usr/share/logstash/lib/bootstrap/environment.rb:6] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2019-10-12 17:32:23.172 [[main]<file] observingtail - START, creating Discoverer, Watch with file and sincedb collections
[INFO ] 2019-10-12 17:32:23.688 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}

1
2
3
4
5
6
7
8
9
10
11
12

启动之后，将会在前台进行监听，现在随便写一条内容放到 log 里边，看看有什么效果：

$ cat test.log

this is test json

1
2
3

然后看到控制台输出如下：

{
  "@version": "1",
  "host": "localhost.localdomain",
  "message": "this is test json",
  "tags": ["_jsonparsefailure"],
  "@timestamp": "2019-10-12T09:35:25.836Z",
  "path": "/data/log/test.log"
}

1
2
3
4
5
6
7
8

注意此条内容中，时间是 9 点，而上边的输出则是当前时间 17 点，这是因为应用自身的时区问题，logstash 采用 UTC 的时间，比我们当前时间晚 8 个小时，这个时区问题，会在其他地方细讲，这里不赘述。

# 2，rubydebug 格式输出。

同样，还是先调整配置：

$cat /usr/share/logstash/test.conf

input {
	file {
		path 			=> ["/data/log/test.log"]
		start_position 	=> "beginning"
		sincedb_path 	=> "/dev/null"
	 #	codec 			=> "json" # 因为是测试，随便写点数据，因此先注释此配置
	}
 }

output {
	stdout {
		codec => rubydebug
	# codec => json
	}
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

然后启动此条管道。

$ /usr/share/logstash/bin/logstash -f /usr/share/logstash/test.conf

接着写一条日志，可以看到输出如下：

$ /usr/share/logstash/bin/logstash -f /usr/share/logstash/test.conf

WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[WARN ] 2019-10-12 17:40:07.301 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2019-10-12 17:40:07.332 [LogStash::Runner] runner - Starting Logstash {"logstash.version"=>"6.5.4"}
[INFO ] 2019-10-12 17:40:12.008 [Converge PipelineAction::Create<main>] pipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[INFO ] 2019-10-12 17:40:12.555 [Converge PipelineAction::Create<main>] pipeline - Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x2d0a9cba sleep>"}
[INFO ] 2019-10-12 17:40:12.651 [Ruby-0-Thread-1: /usr/share/logstash/lib/bootstrap/environment.rb:6] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2019-10-12 17:40:12.680 [[main]<file] observingtail - START, creating Discoverer, Watch with file and sincedb collections
[INFO ] 2019-10-12 17:40:13.350 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}

{
    "@timestamp" => 2019-10-12T09:40:13.929Z,
          "host" => "localhost.localdomain",
          "path" => "/data/log/test.log",
       "message" => "this is test json",
      "@version" => "1"
}

{
    "@timestamp" => 2019-10-12T09:41:15.295Z,
          "host" => "localhost.localdomain",
          "path" => "/data/log/test.log",
       "message" => "this is test rubydebug",
      "@version" => "1"
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27

可以看到，这种输出能够更加清晰的呈现出日志被解析之后的样式，相当于把上边的 json 格式化了，从而便于观察，因此接下来的调试就用这种方案了。

当然还有另外两条是否从文件开头读取日志的参数，这个就在此演示了，有兴趣可以自己做实验。

# 4，时间格式调试。

学会了上边的调试方法，接下来就可以进行一波真刀真枪的调试了。

从上到下没少提时间格式这个名词，不过貌似还没有对其进行简单解释，其实很简单，就是我们希望日志在 kibana 当中呈现的时候，时间戳是与日志当中当条日志记录的时间戳一致，这听起来似乎理所应当，但是如果我们没有用合适的配置对时间格式进行匹配，那么很可能最终结果将不是我们所想要的。

这段话应当反复研读，以认同这一配置细节的重要性，然后才能有信心继续往内部深挖。

举个例子，比如我本条日志里的时间是2019-10-12T09:41:15，我希望写入到 elasticsearch 的时候，这条日志的时间也是这个，而不是提前或者延后，那么就需要进行严格的时间格式配置，一旦配置是错误的，那么 logstash 会将采集到日志的时间作为此条日志的时间戳，这个误差，在某些特定搜索场景下，可能会导致结果失准，那么接下来，就来实际验证一下刚刚的这两种情况。

比如我有如下日志一条：

{
  "timestamp": "2019-10-10T15:37:08+08:00",
  "level": "info",
  "message": "这是一条测试日志",
  "content": "{\"taskId\":\"213292\",\"checkList\":[\"20191012\"],\"userinfoId\":\"28881951\"}",
  "trace_id": "461d7df6633fdebda346d7e73e6813e8"
}

1
2
3
4
5
6
7

今天是阳历 2019-10-12，我这里为了展示效果，就把日志的时间调整为了两天前 2019-10-10，希望日志采集之后写入到 es 的时间也是 2019-10-10，但是如果配置不对，那么时间也将跑偏。

# 1，错误示范。

首先还是写好解析的配置，因为这里需要针对时间进行解析了，就要用到了 data 关键字，我以往不太注意这个地方，可能随便拷贝了其他类似日志的配置，大概如下：

input {
	file {
		path 			=> ["/data/log/test.log"]
		start_position 	=> "beginning"
		sincedb_path 	=> "/dev/null"
		codec 			=> "json"
	}
}

filter {
	date {
		match => ["time", "yyyy-MM-dd HH:mm:ss"] # 获取time字段的时间戳，作为es的@timestamp
		remove_field => "time"  # 丢弃掉日志中的time字段
	}
}

output {
	stdout {
		codec => rubydebug
	}
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

现在整条配置的核心是中间的 filter，但是不解里边内容的真意，胡乱复制过来，就用上了，但事实是有问题的，具体问题，马上就能看到。

那么，现在继续加载此配置进行监听，看看输出什么，关于日志生成的调试，我个人一个小小心得是，先把日志放到另一个文件中，然后用 cat 模拟应用将日志写入 log 文件。

$ cat a
{"timestamp":"2019-10-10T15:37:08+08:00","level":"info","message":"这是一条测试日志","content":"{\"taskId\":\"213292\",\"checkList\":[\"20191012\"],\"userinfoId\":\"28881951\"}","trace_id":"461d7df6633fdebda346d7e73e6813e8"}

$ cat a >> test.log

1
2
3
4

接着可以看到控制台那边的输出：

$ /usr/share/logstash/bin/logstash -f /usr/share/logstash/test.conf

。。。。。。
打印信息忽略
。。。。。。

{
       "message" => "这是一条测试日志",
    "@timestamp" => 2019-10-12T10:20:14.733Z,
     "timestamp" => "2019-10-10T15:37:08+08:00",
         "level" => "info",
      "@version" => "1",
       "content" => "{\"taskId\":\"213292\",\"checkList\":[\"20191012\"],\"userinfoId\":\"28881951\"}",
      "trace_id" => "461d7df6633fdebda346d7e73e6813e8",
          "host" => "localhost.localdomain",
          "path" => "/data/log/test.log"
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

从这里可以很明显看到，我们日志是 10 生成的，结果记录到 es 里边的 @timestamp字段中，确实当前时间，这不是我们想要的。

# 2，正确示范。

现在再来重新审视一下关于时间格式解析的配置文件：

	date {
		match => ["time", "yyyy-MM-dd HH:mm:ss"]
	}

1
2
3

date 关键字就是针对时间格式进行处理的，可以将日志当中一些不符合默认标准($time_iso8601)的进行处理，从而让日志以我们想要的效果展示。

match 用于配置具体的匹配内容规则，前边的 time 表示匹配实际日志当中的时间戳的名称，这里用 time 明显是不对的，因为原始日志里边时间戳的字段名是 timestamp，后边的是匹配具体日志时间戳规则的，这个内容，将专门在另外文章讲解，这里不详细展开。

理解了这个地方的要点之后，得出如下配置：

input {
	file {
		path 			=> ["/data/log/test.log"]
		start_position 	=> "beginning"
		sincedb_path 	=> "/dev/null"
		codec 			=> "json"
	}
}

filter {
	date {
		match => [ "timestamp" , "yyyy-MM-dd'T'HH:mm:ss+08:00" ]
		remove_field => "timestamp"
	}
}

output {
	stdout {
		codec => rubydebug
	}
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

写入到配置中去，然后重复上边的操作，将日志写入，看看控制台输出：

$ /usr/share/logstash/bin/logstash -f /usr/share/logstash/test.conf

。。。。。。
打印信息忽略
。。。。。。

{
          "host" => "localhost.localdomain",
       "message" => "这是一条测试日志",
    "@timestamp" => 2019-10-10T07:37:08.000Z,
      "trace_id" => "461d7df6633fdebda346d7e73e6813e8",
     "timestamp" => "2019-10-10T15:37:08+08:00",
       "content" => "{\"taskId\":\"213292\",\"checkList\":[\"20191012\"],\"userinfoId\":\"28881951\"}",
          "path" => "/data/log/test.log",
      "@version" => "1",
         "level" => "info"
}