ansible-playbook编排使用tips原创

文章发布较早，内容可能过时，阅读注意甄别。

# 0，常用变量

在日常配置编排过程中，我们经常需要用到一些内置的变量来进行一些判断或者配置的工作，这里整理一些常用的变量，以便于使用查阅。

部署客户端主机名：ansible_hostname
部署客户端主机 IP：ansible_default_ipv4.address
部署客户端主机详细信息：hostvars，返回主机详细信息，可以通过点操作定位具体需要的内容。

# 1，判断错误

有时候我们在部署服务的时候，会针对一些服务状态进行检测，从而依据检测结果来判断是否将主机放回到负载列表当中，这里举一个 web 检测的例子，首先通过 NGINX 定义了一个返回值：

$ curl localhost/get_info
{"status":"success","result":"hello world!"}

1
2

然后定义剧本如下：

$ cat site.yaml
---
- hosts: localhost
  tasks:
    - uri:
        url: "http://localhost/get_info"
        method: GET
      register: webpage
      failed_when: webpage.status != 200
    - debug:
        var: webpage

1
2
3
4
5
6
7
8
9
10
11

这里使用本机进行验证此类功能，直接通过如下命令即可运行：

$ ansible-playbook site.yaml
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
PLAY [localhost] ********************************************************************************************************************************************************************
TASK [Gathering Facts] **************************************************************************************************************************************************************
ok: [localhost]
TASK [uri] **************************************************************************************************************************************************************************
ok: [localhost]
TASK [debug] ************************************************************************************************************************************************************************
ok: [localhost] => {
    "webpage": {
        "changed": false,
        "connection": "close",
        "content_length": "44",
        "content_type": "application/json",
        "cookies": {},
        "cookies_string": "",
        "date": "Tue, 02 Jun 2020 06:30:53 GMT",
        "elapsed": 0,
        "failed": false,
        "failed_when_result": false,
        "json": {
            "result": "hello world!",
            "status": "success"
        },
        "msg": "OK (44 bytes)",
        "redirected": false,
        "server": "openresty",
        "status": 200,
        "url": "http://localhost/get_info"
    }
}
PLAY RECAP **************************************************************************************************************************************************************************
localhost                  : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33

上边通过输出的状态码进行判断，除此之外，还可以通过输出当中的具体内容匹配来进行判断，比如上边内容当中，我们看到对应的 json 字段里输出的返回内容，也可以根据具体内容来进行判断，此时调整剧本内容如下：

---
- hosts: localhost
  tasks:
    - uri:
        url: "http://localhost/get_info"
        method: GET
      register: webpage
      failed_when: webpage.status != 200
    - fail:
        msg: "check status field"
      when: "'hello' not in webpage.json.result"

1
2
3
4
5
6
7
8
9
10
11

webpage.json.result可以将返回的内容定位到相应位置，然后再进行判断，当然不少时候也可以用如下方式判断：

---
- hosts: localhost
  tasks:
    - uri:
        url: "http://localhost/get_info"
        method: GET
        return_content: yes
      register: webpage
      failed_when: webpage.status != 200
    - fail:
        msg: "check status field"
      when: "'hello' not in webpage.content"

1
2
3
4
5
6
7
8
9
10
11
12

多了一个参数 return_content: yes，但需要注意的一个地方是：webpage.content中的 json 字段是无法展开的。

# 2，只跑一次

run_once: true

有一些在列表中的任务，只需要跑一次就好，比如上线部署剧本中，发布列表可能有 10 台，那么如下步骤中，一些地方就不需要执行 10 次了，比如生成版本号以及同步到本地的两个步骤。

---
- name: "创建远程主机上的版本目录"
  file: path=/release/{{project}}/{{_version}} state=directory
  tags: deploy
- name: "将代码同步到远程主机版本目录"
  synchronize:
    src: /{{WORKSPACE}}/
    dest: /release/{{project}}/{{_version}}/
    delete: yes
  tags: deploy
- name: "将项目部署到生产目录"
  file: path=/data/www/{{project}} state=link src=/release/{{project}}/{{_version}}
  tags: deploy
- name: "使版本目录保持五个版本历史"
  script: chdir=/release/{{project}} keepfive.sh
  tags: deploy
- name: "生成远程版本号"
  shell: "ls /release/{{project}} > /release/{{project}}.log"
  tags: deploy
- name: "同步版本号到本地"
  synchronize: "src=/release/{{project}}.log dest=/root/.jenkins/version/{{JOB_NAME}}.log mode=pull"
  tags: deploy
- name: "执行自由脚本"
  script: chdir=/release/{{project}} free.sh
  tags: deploy

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25

通过添加 run_once: true可以定义对应的任务只执行一次：

---
- name: "创建远程主机上的版本目录"
  file: path=/release/{{project}}/{{_version}} state=directory
  tags: deploy
- name: "将代码同步到远程主机版本目录"
  synchronize:
    src: /{{WORKSPACE}}/
    dest: /release/{{project}}/{{_version}}/
    delete: yes
  tags: deploy
- name: "将项目部署到生产目录"
  file: path=/data/www/{{project}} state=link src=/release/{{project}}/{{_version}}
  tags: deploy
- name: "使版本目录保持五个版本历史"
  script: chdir=/release/{{project}} keepfive.sh
  tags: deploy
- name: "生成远程版本号"
  shell: "ls /release/{{project}} > /release/{{project}}.log"
  run_once: true
  tags: deploy
- name: "同步版本号到本地"
  synchronize: "src=/release/{{project}}.log dest=/root/.jenkins/version/{{JOB_NAME}}.log mode=pull"
  run_once: true
  tags: deploy
- name: "执行自由脚本"
  script: chdir=/release/{{project}} free.sh
  tags: deploy

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27

这样即便是发布了 10 台机器，那么版本号这两步也就只执行一次。

# 3，滚动更新

ansible 默认情况下，执行剧本时是从剧本角度出发，每个任务在主机列表中逐一执行，有时候我们发布一些服务，需要一台一台从负载中摘出部署，然后放回去，这个时候，走默认的方向就不太合适了。用如下一个简单例子来说明：

$ cat site.yml
---
- hosts: remote
  tasks:
    - debug:
        msg: "this is task a"
    - debug:
        msg: "this is task b"

1
2
3
4
5
6
7
8

执行如上剧本看输出：

ansible-playbook -i test_hosts s.yaml
PLAY [remote] ***********************************************************************************************************************************************************************
TASK [Gathering Facts] **************************************************************************************************************************************************************
ok: [10.3.22.90]
ok: [10.3.22.87]
TASK [debug] ************************************************************************************************************************************************************************
ok: [10.3.22.90] => {
    "msg": "this is task a"
}
ok: [10.3.22.87] => {
    "msg": "this is task a"
}
TASK [debug] ************************************************************************************************************************************************************************
ok: [10.3.22.90] => {
    "msg": "this is task b"
}
ok: [10.3.22.87] => {
    "msg": "this is task b"
}
PLAY RECAP **************************************************************************************************************************************************************************
10.3.22.87                 : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
10.3.22.90                 : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22

可以看到两个任务被分开，在两台主机上分别来执行。

ansible 中的 serial就是用来控制任务执行数量的参数，上面的场景，我们可以把对应值设为 1，剧本就会每台主机逐一执行了。

---
- hosts: remote
  serial: 1
  tasks:
    - debug:
        msg: "this is task a"
    - debug:
        msg: "this is task b"

1
2
3
4
5
6
7
8

执行结果如下：

ansible-playbook -i test_hosts s.yaml
PLAY [remote] ***********************************************************************************************************************************************************************
TASK [Gathering Facts] **************************************************************************************************************************************************************
ok: [10.3.22.90]
TASK [debug] ************************************************************************************************************************************************************************
ok: [10.3.22.90] => {
    "msg": "this is task a"
}
TASK [debug] ************************************************************************************************************************************************************************
ok: [10.3.22.90] => {
    "msg": "this is task b"
}
PLAY [remote] ***********************************************************************************************************************************************************************
TASK [Gathering Facts] **************************************************************************************************************************************************************
ok: [10.3.22.87]
TASK [debug] ************************************************************************************************************************************************************************
ok: [10.3.22.87] => {
    "msg": "this is task a"
}
TASK [debug] ************************************************************************************************************************************************************************
ok: [10.3.22.87] => {
    "msg": "this is task b"
}
PLAY RECAP **************************************************************************************************************************************************************************
10.3.22.87                 : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
10.3.22.90                 : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26

结果与上边的一对比就很容易理解这个参数的含义了。

除了定义 serial: 1，当然还可以定义更多，比如主机列表有 30 台，我们可以定义 serial: 5，就变成 5 台 5 台执行了。

在实际应用中，某些主机执行失败，只会中断当次构建，但不影响其他主机往后执行，这对我们来说并不安全，因为这可能会让没有走完发布流程的主机放回到负载列表中，因此，常常还有一个搭配使用的参数是：max_fail_percentage，表示当最大失败主机的比例达到多少时，ansible 就让整个 play 失败。

---
- hosts: remote
  serial: 1
  max_fail_percentage: 25
  tasks:
    - debug:
        msg: "this is task a"
    - debug:
        msg: "this is task b"

1
2
3
4
5
6
7
8
9

25 是一个比例数据，表示构建列表中，如果有四分之一的失败，则会退出构建，如果我们希望一有失败就退出，则可以将之设为 0。

注意：当serial: 1时，就不必再用 run_once: true，因为这种定义是无意义的。

# 4，打印多行

有时候我们在运行完一些任务之后，希望能够批量打印一些内容返回出来，此时可以使用如下方式进行打印：

$ cat test.yaml
---
- hosts: localhost
  connection: local
  vars:
    msg: |
      lookupd-tcp
       {{ node1 }}:4160
       {{ node2 }}:4160
       {{ node3 }}:4160
       lookupd-http
       {{ node1 }}:4161
       {{ node2 }}:4161
       {{ node3 }}:4161
       data-tcp
       {{ node1 }}:41501
       {{ node1 }}:41502
       {{ node2 }}:41501
       {{ node2 }}:41502
       {{ node3 }}:41501
       {{ node3 }}:41502
       data-http
       {{ node1 }}:41511
       {{ node1 }}:41512
       {{ node2 }}:41511
       {{ node2 }}:41512
       {{ node3 }}:41511
       {{ node3 }}:41512
  tasks:
    - name: test
      debug:
        msg: "{{ msg.split('\n') }}"

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32

然后运行看下：

$ansible-playbook test.yaml -e "node1=10.3.0.0 node2=10.0.0.1 node3=10.0.0.2"
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
PLAY [localhost] ***************************************************************************************************************************************************************************************************************************************************************
TASK [Gathering Facts] *********************************************************************************************************************************************************************************************************************************************************
ok: [localhost]
TASK [test] ********************************************************************************************************************************************************************************************************************************************************************
ok: [localhost] => {
    "msg": [
        "lookupd-tcp",
        " 10.3.0.0:4160",
        " 10.0.0.1:4160",
        " 10.0.0.2:4160",
        " lookupd-http",
        " 10.3.0.0:4161",
        " 10.0.0.1:4161",
        " 10.0.0.2:4161",
        " data-tcp",
        " 10.3.0.0:41501",
        " 10.3.0.0:41502",
        " 10.0.0.1:41501",
        " 10.0.0.1:41502",
        " 10.0.0.2:41501",
        " 10.0.0.2:41502",
        " data-http",
        " 10.3.0.0:41511",
        " 10.3.0.0:41512",
        " 10.0.0.1:41511",
        " 10.0.0.1:41512",
        " 10.0.0.2:41511",
        " 10.0.0.2:41512",
        ""
    ]
}
PLAY RECAP *********************************************************************************************************************************************************************************************************************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35

# 6，只同步远程目录没有的文件

在 Nginx 对接 consul 的场景中，upsync 的 dump 文件一般只需要第一次同步上去，而后就不需要再同步，这个时候可以用 rsync 的参数来实现：

- name: sync the upsync config
  synchronize:
    src: /data/.jenkins/jobs/ops-prod/jobs/ops-nginx-config/workspace/prod-nginx/upsync/
    dest: /etc/nginx/upsync/
    mode: push
    rsync_opts: "--ignore-existing"
    delete: yes
    rsync_timeout: 30
  tags:
    - prod-nginx