Extending Ansible with callback plugins

Dani Hodovic March 16, 2018 6 min read

Introduction

Ansible allows you to extend the system by using plugins. Plugins are executed at various stages of a run and allow you to hook into the system and add your own logic. Plugins are written in Python.

Callback plugins respond to events Ansible sends and can be used to notify external systems. My use case was notifying Slack whenever an Ansible run failed. I run Ansible in pull mode as a cron job on our instances, which means there is no immediate feedback when a run fails.

I could implement this by wrapping my entire playbook in a block and using the rescue directive to send a Slack message with the Slack module when the playbook fails. However, with the rescue directive I wouldn't have enough data on why the playbook failed unless I used clever ways of registering failures in variables. Overall it doesn't feel like a sustainable solution because it requires extensive logic in the playbook for the sole purpose of sending a Slack notification.

Callback plugins are a more elegant solution. A callback plugin is injected with data about the Ansible state when an event (such as a playbook error) occurs and you can extract the exact reason why a playbook failed. I send this data to Slack so that an engineer can quickly understand and debug an Ansible failure.

Developing a callback plugin

We're going to develop a simple callback plugin which sends us a Slack message any time a playbook fails. We treat two events as a failure: Ansible can't connect to a host machine, or a playbook task fails.

In order to notify Slack we'll need to create a Slack webhook which I'm not going to cover here, but you can find this in the Slack API documentation.

I will create a class which inherits from Ansible's CallbackBase. The class will define three class properties:

  • CALLBACK_VERSION = 2.0 - This is not the version of our plugin, but rather what version of Ansible the plugin is run with. Ansible v1 and Ansible v2 define different interfaces for plugins. I'm running Ansible 2.5 so I'll use the v2 plugin interface.

  • CALLBACK_NEEDS_WHITELIST = True - Defines whether the plugin needs to be whitelisted in ansible.cfg in order to run. If this value is False I can simply drop the plugin into our project and it should always run. However I prefer to be explicit and whitelist my plugins in ansible.cfg.

  • CALLBACK_NAME = 'slack' - I'm actually not sure why this variable is used, because when Ansible runs plugins it reads the name from the filename. However, we'll define it anyway because the Ansible documentation states that it's required.

Callback plugins work by implementing methods which are executed at different stages of a playbook run.

  • #set_options - is run when the callback plugin is set up and allows us to read environment variables and assign them to Python variables.

  • #v2_runner_on_failed - is run when a playbook task fails. We'll send a Slack message notifying us of the failure.

  • #v2_runner_on_unreachable - occurs when Ansible can't connect to a specific host. We'll also notify Slack when this happens.
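
To make the idea of "methods executed at different stages" concrete, here is a toy model of event dispatch. It is only a sketch: `ToyCallback` and `send_event` are hypothetical names invented for illustration and are not Ansible's actual classes or internals. The assumption is simply that an event is delivered by looking up a method by name on each plugin and calling it if it exists.

```python
class ToyCallback(object):
    """Hypothetical stand-in for a callback plugin (not Ansible's CallbackBase)."""

    def __init__(self):
        # Record every event we were notified about.
        self.seen = []

    def v2_runner_on_failed(self, result, ignore_errors=False):
        self.seen.append(('failed', result))

    def v2_runner_on_unreachable(self, result):
        self.seen.append(('unreachable', result))


def send_event(plugins, method_name, *args, **kwargs):
    """Toy dispatcher: call `method_name` on every plugin that defines it."""
    for plugin in plugins:
        handler = getattr(plugin, method_name, None)
        if callable(handler):
            handler(*args, **kwargs)


plugin = ToyCallback()
send_event([plugin], 'v2_runner_on_failed', {'rc': 1})
# ToyCallback does not define this method, so the event is silently skipped.
send_event([plugin], 'v2_runner_on_ok', {'rc': 0})
print(plugin.seen)
```

The point of the sketch is that a plugin only needs to define the methods for the events it cares about; everything else passes it by.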

Below is an example of a simple Slack callback plugin.

# These imports are defined for every callback plugin I've seen so far.
# If you don't import `absolute_import` standard library modules may be
# overridden by Ansible python modules with the same name. For example: I use the
# standard library `json` module, but Ansible has a callback plugin with the
# same name. When I excluded `absolute_import` and imported `json` the json
# module I got was Ansible's json module and not the standard library one.
# I'm not sure why `division` and `print_function` need to be imported.
from __future__ import (absolute_import, division, print_function)
from ansible.plugins.callback import CallbackBase

__metaclass__ = type

import json
import urllib2
import sys
import os

# Ansible documentation of the module. I'm also not sure why this is required,
# but other plugins add documentation so it seems to be a standard.
DOCUMENTATION = '''
    callback: slack
    options:
      slack_webhook_url:
        required: True
        env:
          - name: SLACK_WEBHOOK_URL
      slack_channel:
        required: False
        env:
          - name: SLACK_CHANNEL
'''

class CallbackModule(CallbackBase):
    CALLBACK_VERSION = 2.0
    CALLBACK_NAME = 'slack'
    CALLBACK_NEEDS_WHITELIST = True

    def __init__(self):
        super(CallbackModule, self).__init__()

    def set_options(self, task_keys=None, var_options=None, direct=None):
        super(CallbackModule, self).set_options(task_keys=task_keys, var_options=var_options, direct=direct)

        # Read and assign environment variables to memory so that we can use
        # them later.
        self.slack_webhook_url = os.environ.get('SLACK_WEBHOOK_URL')
        self.slack_channel = os.environ.get('SLACK_CHANNEL')

        if self.slack_webhook_url is None:
            self._display.display('Error: The slack callback plugin requires `SLACK_WEBHOOK_URL` to be defined in the environment')
            sys.exit(1)

    def v2_runner_on_failed(self, taskResult, ignore_errors=False):
        notify(self.slack_webhook_url, taskResult, self.slack_channel)

    def v2_runner_on_unreachable(self, taskResult):
        notify(self.slack_webhook_url, taskResult, self.slack_channel)


def notify(slack_webhook_url, taskResult, slack_channel=None):
    # Format the Slack message. We'll use message attachments
    # https://api.slack.com/docs/message-attachments
    payload = {
        'username': 'Ansible',
        'attachments': [
            {
                'title': 'Ansible run has failed. HOST: {} {}'.format(taskResult._host, taskResult._task),
                'color': '#FF0000',
                'text': '```{}```'.format(json.dumps(taskResult._result, indent=2))
            }
        ]
    }

    # The webhook has a default channel. If no channel is configured, Slack
    # posts to that default.
    if slack_channel:
        payload['channel'] = slack_channel

    req = urllib2.Request(slack_webhook_url)
    urllib2.urlopen(req, data=json.dumps(payload))
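
The plugin above uses `urllib2`, which only exists on Python 2. If your control node runs Ansible under Python 3, the notification part could be sketched roughly as follows. This is an assumption-laden variant, not the code I ran: the split into `build_payload` and `build_request`, the explicit `Content-Type` header, and the 10 second timeout are my additions for illustration, and the function takes the host, task and result as plain values instead of a task result object.

```python
import json
import urllib.request


def build_payload(host, task, result, slack_channel=None):
    """Build the Slack message body (same fields as the plugin above)."""
    code_fence = '`' * 3  # Slack renders text between triple backticks as code
    payload = {
        'username': 'Ansible',
        'attachments': [
            {
                'title': 'Ansible run has failed. HOST: {} {}'.format(host, task),
                'color': '#FF0000',
                'text': code_fence + json.dumps(result, indent=2) + code_fence,
            }
        ],
    }
    # Only override the channel when one was configured.
    if slack_channel:
        payload['channel'] = slack_channel
    return payload


def build_request(slack_webhook_url, payload):
    """Create a POST request; on Python 3 the body must be bytes, not str."""
    body = json.dumps(payload).encode('utf-8')
    return urllib.request.Request(
        slack_webhook_url,
        data=body,
        headers={'Content-Type': 'application/json'},  # assumption: added here
    )


def notify(slack_webhook_url, host, task, result, slack_channel=None):
    """Send the message. The timeout value is an arbitrary choice."""
    payload = build_payload(host, task, result, slack_channel)
    urllib.request.urlopen(build_request(slack_webhook_url, payload), timeout=10)
```

Keeping the payload construction separate from the network call makes it possible to check the message format without talking to Slack.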

Configuring the plugin

To enable the plugin you'll need to whitelist it in ansible.cfg. Add the following lines to your ansible.cfg file.

[defaults]
callback_whitelist = slack

It's important to note that the Ansible source code does not care what's defined as the Python attribute CALLBACK_NAME in the plugin. Instead, the callback_whitelist entry needs to match the filename of your callback plugin (without the .py extension). I spent an hour debugging why my callback plugin wasn't being called before digging into the source code and finding this:

(callback_name, _) = os.path.splitext(os.path.basename(callback_plugin._original_path))

It takes the filename of your plugin and compares it to the whitelist. If it matches, your callback is run. I'm still not sure what CALLBACK_NAME is used for, but according to the documentation it's required.
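
To see what that line produces, here is a small standalone sketch. The helper names `callback_name_from_path` and `is_whitelisted` are hypothetical and only wrap the expression quoted above; they are not functions from Ansible.

```python
import os


def callback_name_from_path(plugin_path):
    """Hypothetical helper: file name of the plugin without its extension."""
    callback_name, _ = os.path.splitext(os.path.basename(plugin_path))
    return callback_name


def is_whitelisted(plugin_path, whitelist):
    """Hypothetical helper: compare the derived name against the whitelist."""
    return callback_name_from_path(plugin_path) in whitelist


# A file named slack.py yields the name "slack".
print(callback_name_from_path('callback_plugins/slack.py'))
# A file named slack_notify.py would not match a whitelist entry of "slack",
# no matter what CALLBACK_NAME says inside the file.
print(is_whitelisted('callback_plugins/slack_notify.py', ['slack']))
```

So if you rename the plugin file, the whitelist entry has to be renamed with it.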

Note that the CALLBACK_VERSION and CALLBACK_NAME definitions are required for properly functioning plugins for Ansible >=2.0.

We'll also need to place the python file into a directory that is read by Ansible. Ansible documentation states:

You can activate a custom callback by either dropping it into a callback_plugins directory adjacent to your play, inside a role, or by putting it in one of the callback directory sources configured in ansible.cfg.

I have a per-project ansible.cfg that exists in the same directory as my main playbook. This means that I place my callback plugins in the directory adjacent to ansible.cfg.

$ tree ~/repos/myproject/ansible
/home/dani/repos/myproject/ansible
├── ansible.cfg
├── callback_plugins
│   ├── slack.py
├── inventory.yml
├── main.yml

Using the callback plugin

Our Slack callback plugin should now run anytime we have a failing task or can't connect to a host. We can create an example task which is going to fail immediately.

Example playbook

$ cat slack-playbook.yml
- name: Slack example
  hosts: localhost
  tasks:
    - name: Fail
      shell: exit 1

Example inventory

$ cat inventory.yml
all:
  hosts:
    localhost:
      ansible_connection: local

Example ansible.cfg

$ cat ansible.cfg
[defaults]
callback_whitelist = slack

When running our playbook we need to define the environment variable SLACK_WEBHOOK_URL and optionally SLACK_CHANNEL (if the channel is not defined, the message goes to the default channel the webhook was configured for). The Ansible documentation states that you can pass variables from ansible.cfg to plugins, but I didn't find any instructions on how this is done. The standard way I've seen for most plugins is to use environment variables.

$ SLACK_WEBHOOK_URL=https://hooks.slack.com/services/xxxxxxxxxx ANSIBLE_CONFIG=./ansible.cfg ansible-playbook slack-playbook.yml -i inventory.yml

PLAY [Slack example] *****************************************************************

TASK [Gathering Facts] ***************************************************************
ok: [localhost]

TASK [Fail] **************************************************************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "exit 1", "delta": "0:00:00.001367", "end": "2018-03-22 05:54:31.217632", "msg": "non-zero return code", "rc": 1, "start": "2018-03-22 05:54:31.216265", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
        to retry, use: --limit @/tmp/tmp.S1IR7jIb1p/slack.retry

PLAY RECAP ***************************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=1

The Slack output

I wrote a simple callback plugin that implements three of the interface methods callback plugins provide. I didn't find any official documentation on which methods you can override, but you can find all of them in the source code.