Matcher Engine Implementation Details

The Matcher crate contains the core logic of the Tornado Engine. It is in charge of:

  • Receiving events from the Collectors

  • Processing incoming events and detecting which Filters and Rules they satisfy

  • Triggering the expected actions

Due to its strategic position, its performance is of utmost importance for global throughput.

The code’s internal structure is kept simple on purpose, and the final objective is reached by splitting the global process into a set of modular, isolated and well-tested blocks of logic. Each “block” communicates with the others through a well-defined API, which at the same time hides its internal implementation.

The benefits of this modularization are twofold: first, it minimizes the risk that local changes have a global impact; second, it separates functional complexity from technical complexity, so that growing functional complexity does not lead to growing code complexity. As a consequence, the maintenance and evolution costs of the code base are expected to stay linear in the short, mid, and long term.

At a high level, the Matcher follows these steps when it initializes:

  • Configuration (see the code in the “config” module): The configuration phase loads a set of files from the file system. Each file is a Filter or a Rule in JSON format. The outcome of this step is a processing tree composed of Filter and Rule configurations created from the JSON files.

  • Validation (see the code in the “validator” module): The Validator receives the processing tree configuration and verifies that all nodes respect a set of predefined constraints (e.g., the identifiers cannot contain dots). The output is either the same processing tree received as input, or an error. A minimal sketch of this kind of constraint check is shown after this list.

  • Match Preparation (see the code in the “matcher” module): The Matcher receives the processing tree configuration, and for each node:

    • if the node is a Filter:

      • Builds the Accessors for accessing the event properties using the AccessorBuilder (see the code in the “accessor” module).

      • Builds an Operator for evaluating whether an event matches the Filter itself (using the OperatorBuilder, code in the “operator” module).

    • if the node is a Rule:

      • Builds the Accessors for accessing the event properties using the AccessorBuilder (see the code in the “accessor” module).

      • Builds the Operator for evaluating whether an event matches the “WHERE” clause of the rule (using the OperatorBuilder, code in the “operator” module).

      • Builds the Extractors for generating the user-defined variables using the ExtractorBuilder (see the code in the “extractor” module).

    The output of the Match Preparation step is an instance of the Matcher that contains all the logic required to process an event against all the defined Rules. The Matcher is stateless and thread-safe, so a single instance can serve the entire application load (a sketch illustrating this pattern is shown after this list).

  • Listening: The Matcher listens for incoming events and matches them against the stored Filters and Rules.
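
As an illustration of the Validation step, the following is a minimal sketch of a constraint check that rejects node identifiers containing dots. The function name and the error type are hypothetical and do not reflect the Validator’s real API:

    /// Hypothetical check mirroring the constraint mentioned above:
    /// a Filter or Rule identifier must not contain dots.
    fn validate_identifier(id: &str) -> Result<(), String> {
        if id.contains('.') {
            // The real Validator returns a dedicated error type; a String
            // keeps this sketch self-contained.
            Err(format!("invalid identifier '{}': dots are not allowed", id))
        } else {
            Ok(())
        }
    }

    fn main() {
        assert!(validate_identifier("emails_with_temperature").is_ok());
        assert!(validate_identifier("emails.with.temperature").is_err());
    }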
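
The statelessness of the Matcher can also be illustrated with a short sketch. The Matcher type, its field, and the process method below are simplified stand-ins rather than the real API; they only show how a single immutable instance can be shared across threads:

    use std::sync::Arc;
    use std::thread;

    // Simplified stand-in for the real Matcher: it holds only immutable,
    // pre-built matching logic, so processing an event needs just `&self`.
    struct Matcher {
        // The real crate holds the compiled processing tree here
        // (Accessors, Operators, Extractors); a placeholder is enough.
        rules: Vec<String>,
    }

    impl Matcher {
        fn process(&self, event: &str) -> usize {
            // Dummy "matching": count rules whose name appears in the event.
            self.rules.iter().filter(|r| event.contains(r.as_str())).count()
        }
    }

    fn main() {
        // Build the Matcher once, then share the single instance.
        let matcher = Arc::new(Matcher {
            rules: vec!["temperature".to_string(), "humidity".to_string()],
        });

        let handles: Vec<_> = (0..4)
            .map(|i| {
                let matcher = Arc::clone(&matcher);
                thread::spawn(move || {
                    let event = format!("event {} with temperature payload", i);
                    matcher.process(&event)
                })
            })
            .collect();

        for handle in handles {
            println!("matched rules: {}", handle.join().unwrap());
        }
    }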

Tornado Monitoring and Statistics

Tornado Engine performance metrics are exposed via the Tornado APIs and periodically collected by a dedicated Telegraf instance (telegraf_tornado_monitoring.service). Metrics are stored in the master_tornado_monitoring database in InfluxDB.

Tornado Monitoring and Statistics gives insight into what data Tornado is processing and how. This information can be useful in several scenarios, including workload inspection, identification of bottlenecks, and issue debugging. A common use case is spotting performance-related issues: for example, a growing gap between the number of events received and the number of events processed may indicate that Tornado does not have enough resources to handle the current workload (a minimal sketch of this check follows the metrics list below).

Examples of collected metrics are:

  • events_processed_counter: total number of events processed by Tornado Engine

  • events_received_counter: total number of events received by Tornado Engine through all Collectors

  • actions_processed_counter: total number of actions executed by Tornado Engine
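
As a sketch of the performance check described above, the snippet below compares the two counters. The values are hard-coded placeholders and the threshold is arbitrary; in a real setup both counters would be read from the Tornado APIs or from InfluxDB:

    fn main() {
        // Placeholder counter values; in practice they come from the
        // Tornado APIs or from the master_tornado_monitoring database.
        let events_received_counter: u64 = 125_000;
        let events_processed_counter: u64 = 118_500;

        // A steadily growing backlog suggests that Tornado does not have
        // enough resources to handle the current workload.
        let backlog = events_received_counter.saturating_sub(events_processed_counter);
        if backlog > 5_000 {
            println!("WARNING: processing backlog of {} events", backlog);
        } else {
            println!("backlog of {} events is within the expected range", backlog);
        }
    }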

Metrics will be automatically deleted according to the selected retention policy.

The user can configure Tornado Monitoring and Statistics via the GUI under Configuration / Modules / tornado / Configuration. Two parameters are available:

  • Tornado Monitoring Retention Policy: defines the number of days for which metrics are retained in InfluxDB (default: 7 days); once this period has expired, the data is no longer available.

  • Tornado Monitoring Polling Interval: sets how often the Collector queries the Tornado APIs to gather metrics (default: 5 seconds).

To apply the changes, either run neteye_secure_install, which works for both options, or execute /usr/share/neteye/tornado/scripts/apply_tornado_monitoring_retention_policy.sh or /usr/share/neteye/tornado/scripts/apply_tornado_monitoring_polling_interval.sh, depending on which parameter was changed.

Note

On a NetEye Cluster, execute the command on the node where icingaweb2 is active.

Tornado Engine (Executable)

This crate contains the Tornado Engine executable code, which is a configuration of the Engine based on actix and built as a portable executable.

Structure of Tornado Engine

This specific Tornado Engine executable is composed of the following components:

  • A JSON Collector

  • The Engine

  • The Archive Executor

  • The Elasticsearch Executor

  • The Foreach Executor

  • The Icinga 2 Executor

  • The Director Executor

  • The Monitoring Executor

  • The Logger Executor

  • The Script Executor

  • The Smart Monitoring Executor

Each component is wrapped in a dedicated actix actor.
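
The following is a minimal sketch of what wrapping a component in an actix actor looks like. The EventMessage and LoggerActor names are illustrative only and are not the executable’s real types:

    use actix::prelude::*;

    // Illustrative message carrying a raw event payload.
    #[derive(Message)]
    #[rtype(result = "()")]
    struct EventMessage(String);

    // Illustrative actor standing in for one of the components listed
    // above (e.g., a logger-like Executor).
    struct LoggerActor;

    impl Actor for LoggerActor {
        type Context = Context<Self>;
    }

    impl Handler<EventMessage> for LoggerActor {
        type Result = ();

        fn handle(&mut self, msg: EventMessage, _ctx: &mut Context<Self>) {
            // A real Executor would perform its specific action here;
            // this sketch just prints the payload.
            println!("received event: {}", msg.0);
        }
    }

    #[actix::main]
    async fn main() {
        // Each component runs as a dedicated actor; other components
        // communicate with it by sending messages to its address.
        let addr = LoggerActor.start();
        addr.send(EventMessage("{\"type\":\"email\"}".to_string()))
            .await
            .expect("actor mailbox closed");
    }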

This configuration is only one of many possible configurations. Each component has been developed as an independent library, allowing for greater flexibility in deciding whether and how to use it.

At the same time, nothing forces all of the components to live in the same executable. While this is the simplest way to assemble them into a working product, the Collectors and Executors could reside in their own executables and communicate with the Tornado Engine via remote calls. This can be achieved through a direct TCP or HTTP call, with an RPC technology (e.g., Protobuf, FlatBuffers, or Cap'n Proto), or with a message queue system (e.g., NATS or Kafka) in the middle, allowing Tornado to be deployed as a distributed system.
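
As an illustration of the “direct TCP call” option, the sketch below shows a Collector-side push of a JSON event over a plain TCP connection. The address, the port, and the newline-delimited framing are assumptions made for this example, not Tornado defaults:

    use std::io::Write;
    use std::net::TcpStream;

    fn main() -> std::io::Result<()> {
        // A hand-written JSON event; a real Collector would serialize its
        // own Event structure instead.
        let event = r#"{"type":"email","created_ms":1672531200000,"payload":{"subject":"test"}}"#;

        // Address and framing are illustrative assumptions only.
        let mut stream = TcpStream::connect("127.0.0.1:4747")?;
        stream.write_all(event.as_bytes())?;
        stream.write_all(b"\n")?;
        Ok(())
    }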

Tornado Interaction with Icinga 2

The interaction between Tornado and Icinga 2 is explained in detail in the sections Icinga 2 and Smart Monitoring Check Result. In particular, the Smart Monitoring Executor interacts with Icinga 2 to create objects and set their statuses. To ensure that the statuses of the Icinga 2 objects do not get lost, NetEye provides an automatism that stops the execution of Smart Monitoring Actions during any Icinga 2 restart or Icinga Director deployment.

The automatism keeps track of all Icinga 2 restarts (Icinga Director deployments are also treated as Icinga 2 restarts) in the icinga2_restarts table of the director database. As soon as an Icinga 2 restart takes place, a new entry with PENDING status is added to that table and, at the same time, the Tornado Smart Monitoring Executor is deactivated via API.

The icinga-director.service unit monitors the Icinga 2 restarts that are in PENDING status and sets them to FINISHED as soon as it recognizes that Icinga 2 has completed the restart; the Tornado Smart Monitoring Executor is then reactivated. In case of Icinga 2 errors, see the troubleshooting page ::ref::icinga2-not-starting.
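
The lifecycle described above can be summarized with a small sketch of the state transitions. All names are hypothetical; the real automatism lives in the icinga-director.service unit and in the director database, not in application code like this:

    // PENDING/FINISHED mirror the statuses stored in the icinga2_restarts table.
    #[derive(Debug, PartialEq)]
    enum RestartStatus {
        Pending,
        Finished,
    }

    struct Restart {
        status: RestartStatus,
        smart_monitoring_enabled: bool,
    }

    impl Restart {
        // An Icinga 2 restart (or Director deployment) begins: record it
        // as PENDING and deactivate the Smart Monitoring Executor.
        fn begin() -> Self {
            Restart { status: RestartStatus::Pending, smart_monitoring_enabled: false }
        }

        // Icinga 2 has completed the restart: mark the entry FINISHED and
        // reactivate the Smart Monitoring Executor.
        fn complete(&mut self) {
            self.status = RestartStatus::Finished;
            self.smart_monitoring_enabled = true;
        }
    }

    fn main() {
        let mut restart = Restart::begin();
        assert_eq!(restart.status, RestartStatus::Pending);
        restart.complete();
        assert!(restart.smart_monitoring_enabled);
        println!("restart tracked as {:?}", restart.status);
    }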