Cluster Upgrade from 4.47 to 4.48¶
This guide leads you through the steps specific to upgrading a NetEye Cluster installation from version 4.47 to 4.48.
Provided the environment has seamless connectivity, the upgrade procedure may take up to 30 minutes per node.
Warning
Remember that you must upgrade sequentially without skipping versions; therefore an upgrade to 4.48 is possible only from 4.47. For example, if you have version 4.27, you must first upgrade to 4.28, then 4.29, and so on.
Breaking Changes¶
SLM Filters migration to Icinga DB¶
Starting from NetEye 4.48, Objects Filters configured in SLM Contracts accept only Icinga DB filter expressions and no longer support the old IDO filter syntax.
Existing Object Filters will be automatically migrated to the new syntax during the upgrade process. At the same time, any new SLM Contract created after the upgrade must use the new syntax for Object Filters. To create Object Filters with the new syntax, it is recommended to use the search filter builder in the Icinga DB Overview and then copy the generated expression.
For example, the old filter expression host_name=neteye* will now need to be written in the new syntax as host.name~neteye*.
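A few more illustrative translations may help; note that these Icinga DB expressions are assumptions based on the standard Icinga DB Web filter syntax (`~` for wildcard matches, `&` for conjunction), so verify them with the search filter builder mentioned above before use:

```
# Old IDO syntax                          New Icinga DB syntax
host_name=neteye*                    ->   host.name~neteye*
service_description=disk*            ->   service.name~disk*
host_name=neteye*&hostgroup_name=db  ->   host.name~neteye*&hostgroup.name=db
```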
Monitoring module removed¶
Starting from NetEye 4.48, the Monitoring module will be removed. All modules that previously relied on the Monitoring module must be fully migrated to Icinga DB before upgrading. The previously selected roles and permissions of the Monitoring module have been migrated to the equivalent Icinga DB roles and permissions.
IDO Support removed¶
Starting from NetEye 4.48, the IDO backend is no longer supported, so all monitoring data and integrations must be migrated to Icinga DB.
As a consequence, since IDO is no longer used, idoreports will also be removed in favor of Icinga DB reports.
Migration is needed before proceeding with the upgrade.
Before proceeding, ensure a backup of the IDO database has been created.
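As a minimal sketch of such a backup, assuming the default IDO database name `icinga` (the same name used by the cleanup commands later in this guide) and `/root` as a writable backup target:

```shell
# Dump the IDO database to a compressed file before upgrading
mysqldump icinga | gzip > /root/icinga-ido-backup.sql.gz
```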
Log Manager module removed¶
Starting from NetEye 4.48, the deprecated Log Manager
module and its UI will be removed.
As a result, the file /neteye/shared/rsyslog/conf/rsyslog.d/logmanager-hosts.conf, which maps the IP addresses received from Rsyslog to
hostnames and host groups, will no longer be managed via the UI. Any future additions or changes to these mappings will therefore need to be
made manually by updating the file directly.
During the upgrade procedure, the logmanager database will be dropped.
If you want to preserve its data, we recommend backing it up before running the upgrade.
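A minimal sketch of such a backup, assuming the database is reachable via the local mysql client and `/root` is a suitable target (the backup file name is arbitrary):

```shell
# Dump the logmanager database before the upgrade drops it
mysqldump logmanager | gzip > /root/logmanager-backup.sql.gz
```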
Additionally, the retention-policy-neteyelocal service will be removed during the upgrade.
The Log Manager module was responsible for managing the retention of the log files written by Rsyslog. After the upgrade, the Logscleaner service handles log retention by compressing log files and deleting them after 30 days. For more information, including how to customize the log retention policy, see the Rsyslog section.
telegraf-local User Customization¶
During the upgrade process, the system will optimize the current telegraf-local configuration.
Currently, every NetEye tenant creates a separate Telegraf Consumer instance, reading metrics from the tenant’s NATS topic
and writing them into the tenant’s InfluxDB database. If the Alyvix module is installed and enabled for the tenant, an
additional Telegraf Consumer per tenant is created. For example, the Telegraf Consumer for Telegraf metrics is created
under /neteye/local/telegraf/conf/neteye_consumer_influxdb_<tenant>.conf, with its own dropin directory in
/neteye/local/telegraf/conf/neteye_consumer_influxdb_<tenant>.d/.
With NetEye 4.48, there will be only one Telegraf Consumer per PCS node (two if Alyvix is installed) reading from all
tenants' NATS topics and writing the data into the respective tenant's InfluxDB database. This significantly reduces
the number of Telegraf Consumers and optimizes resource usage, especially in environments with many tenants.
Any customization made for a specific tenant's Telegraf Consumer will be automatically backed up under
/root/telegraf_local_consumer_conf_backup/ and deleted during the Upgrade process.
If you have made any customizations, you will need to adapt them to the new architecture, which is now
tenant-agnostic, and place the updated configuration in
/neteye/local/telegraf/conf/neteye_consumer_telegraf_metrics.d (or
/neteye/local/telegraf/conf/neteye_consumer_alyvix_metrics.d for Alyvix). The new architecture works in such a
way that for all InfluxDB Nodes (the default influxdb.neteyelocal and all the ones created under InfluxDBOnlyNodes
in case of a Cluster), there will be one Telegraf Output plugin configured to dynamically write into the correct
tenant’s database, using the tagpass feature to filter the
metrics by tenant. Two tags are now added by Telegraf processors: tenant_influxdb_db, which states the tenant's
InfluxDB database to write into, and tenant, which states the tenant name. The tenant name is derived from the
name of the NATS subject, which has the format <tenant>.<module>.<data_type> (e.g. acme_tenant.telegraf.metrics
refers to the subject for the acme_tenant tenant).
To migrate your previous custom configuration, you will need to exploit the same tagpass feature. For inputs, you need to enrich the metrics with the tenant and tenant_influxdb_db tags, and for processors and outputs you need to use the tagpass to filter the metrics by tenant. Below is an example of all the three configurations:
# Example of custom configuration for a tenant named `acme_tenant`
# with InfluxDB database `acme_tenant-alyvix`

# Input plugin configuration, with the addition of the tenant tags
[[inputs.cpu]]
  # other configuration options...

  # Add the tenant tags to the metrics
  [inputs.cpu.tags]
    tenant = "acme_tenant"
    tenant_influxdb_db = "acme_tenant-alyvix"

# Processor plugin configuration, with the addition of the tagpass to filter by tenant
[[processors.override]]
  # other configuration options...

  # Add the tagpass to filter the metrics by tenant
  [processors.override.tagpass]
    tenant = ["acme_tenant"]

# Output plugin configuration, with the addition of the tagpass to filter by tenant
# (the `database_tag` option is provided by the `outputs.influxdb` plugin)
[[outputs.influxdb]]
  # other configuration options...

  # Optionally, you can also make use of the `tenant_influxdb_db` tag
  database_tag = "tenant_influxdb_db"

  # Add the tagpass to filter the metrics by tenant
  [outputs.influxdb.tagpass]
    tenant = ["acme_tenant"]
Warning
You must now ensure that any custom configuration for the Telegraf Consumer is compatible with the new setup, where a single Consumer handles metrics for all tenants. To check before the upgrade, you can use the telegraf --config ./telegraf.conf --config-directory ./telegraf.d --test command, which validates the configuration and prints any errors found.
Once you are ready, and you have ensured that the current configuration is compatible with the new setup, you need to do two steps:
First, prepare the new configuration for the Telegraf Consumers: create the drop-in directory
/neteye/local/telegraf/conf/neteye_consumer_telegraf_metrics.d (or /neteye/local/telegraf/conf/neteye_consumer_alyvix_metrics.d for Alyvix) and place there the custom configuration files adapted to the new setup as explained above, on all PCS nodes in case of a Cluster. Update the owner and mode of the new configuration files to match the previous ones, which are telegraf:telegraf and 0640 respectively, and telegraf:telegraf and 0750 for the drop-in directories.
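As a sketch, assuming the Telegraf metrics drop-in directory and a custom file named my_custom.conf (a hypothetical name), the expected owner and mode can be set as follows:

```shell
# Create the drop-in directory with the expected owner and mode
mkdir -p /neteye/local/telegraf/conf/neteye_consumer_telegraf_metrics.d
chown telegraf:telegraf /neteye/local/telegraf/conf/neteye_consumer_telegraf_metrics.d
chmod 0750 /neteye/local/telegraf/conf/neteye_consumer_telegraf_metrics.d

# Copy the adapted custom configuration and fix its owner and mode
cp my_custom.conf /neteye/local/telegraf/conf/neteye_consumer_telegraf_metrics.d/
chown telegraf:telegraf /neteye/local/telegraf/conf/neteye_consumer_telegraf_metrics.d/my_custom.conf
chmod 0640 /neteye/local/telegraf/conf/neteye_consumer_telegraf_metrics.d/my_custom.conf
```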
Then, enable the acknowledgement flag in the UI, specifically under Configuration > Modules > neteye > Configuration.
Note
There is no need to remove the previous configuration, since it will be automatically disabled and cleaned up during the Upgrade.
During the upgrade, tenant-specific read-only InfluxDB users will be created. If you previously created read-only
users for visualization, such as a Grafana Data Source, and happened to name them <db_name>_ro, an
upgrade prerequisite check will fail, asking you to add these users' passwords to the corresponding password file,
following this pattern: /root/.pwd_influxdb_${influxdb_host}_${influxdb_user}, where ${influxdb_host} is the
hostname of the InfluxDB node (e.g. influxdb.neteyelocal) and ${influxdb_user} is the username of the read-only user
(e.g. acme_tenant_ro). Passwords will be validated against the correct InfluxDB node according to the tenant's
configured InfluxDB node; if no such user is found, passwords will be automatically generated.
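A minimal sketch of creating such a password file, using the example host and user names from the text above (the restrictive file mode is an assumption, chosen because the file holds a credential):

```shell
# Store the current password of the read-only user acme_tenant_ro
# for the InfluxDB node influxdb.neteyelocal
echo -n 'the-current-password' > /root/.pwd_influxdb_influxdb.neteyelocal_acme_tenant_ro
chmod 0600 /root/.pwd_influxdb_influxdb.neteyelocal_acme_tenant_ro
```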
OCS Inventory Module removed¶
Starting from NetEye 4.48, the deprecated OCS Inventory module will be removed in favour of the GLPI Asset Management module, which provides more advanced features and better integration with the rest of the NetEye ecosystem. This only applies to installations with the neteye-asset Feature Module installed.
Warning
All of the configuration related to OCS Inventory will be automatically removed, including the shared
directories (/neteye/shared/ocsinventory-server and /neteye/shared/ocsinventory-ocsreports), the OCS
Inventory database (ocsweb) and all the PCS and DRBD resources if on a cluster. If you want to preserve any of
these, we recommend backing them up before running the upgrade.
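A minimal sketch of such a backup, using the database name and directory paths from the warning above (the backup target under /root is an assumption):

```shell
# Back up the OCS Inventory database before the upgrade removes it
mysqldump ocsweb | gzip > /root/ocsweb-backup.sql.gz

# Archive the OCS Inventory shared directories
tar -czf /root/ocsinventory-dirs-backup.tar.gz \
    /neteye/shared/ocsinventory-server \
    /neteye/shared/ocsinventory-ocsreports
```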
Finally, you will be asked to confirm the removal of OCS Inventory by enabling the acknowledgement flag in the UI, specifically under Configuration > Modules > assetmanagement > Configuration, before proceeding with the upgrade.
Elastic Stack upgrade to 9.4.1¶
In NetEye 4.48, Elastic Stack upgrades from version 9.3.3 to 9.4.1. To ensure compatibility, review the official Elastic Stack breaking changes for these versions.
Prerequisites¶
Before starting the upgrade, carefully read the latest release notes on NetEye’s blog and check the features that will change or be deprecated.
All NetEye packages installed on a currently running version must be updated according to the update procedure prior to running the upgrade.
NetEye must be up and running in a healthy state.
Disk Space required:

- 3GB for / and /var
- 150MB for /boot
If the NetEye Elastic Stack module is installed:
The rubygems.org domain should be reachable by the NetEye Master only during the update/upgrade procedure. This domain is needed to update additional Logstash plugins and thus is required only if you manually installed any Logstash plugin that is not present by default.
There are a number of configuration items that should not be modified in order to avoid issues during the update/upgrade of your instance. Please check out Protected Configuration Items for details.
Make sure you have migrated all your monitoring data from IDO to Icinga DB, because it’s a mandatory requirement before upgrading to NetEye 4.48. The migration is performed using the neteye cluster upgrade-prerequisites ido-migration command.
If idoreports is in use, run icingacli reporting migrate idoreports on the node where Icinga Web 2 is running to migrate to Icinga DB reports before proceeding with the upgrade. It is highly recommended to first run icingacli reporting migrate idoreports --dry-run to verify compatibility with your existing report filters without applying any changes.
Before starting the upgrade, you must set the corresponding flags under Configuration / Modules / neteye / Configuration to disable the IDO DB, IDO reports and Monitoring module. You can proceed with the upgrade only after selecting these flags.
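The two idoreports migration commands mentioned above, in the order they should be run on the node where Icinga Web 2 is running:

```shell
# Dry run first: verify report filter compatibility without applying changes
icingacli reporting migrate idoreports --dry-run

# Then perform the actual migration to Icinga DB reports
icingacli reporting migrate idoreports
```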
To confirm that you read the breaking changes regarding the removal of the Log Manager module, set the flag under Configuration / Modules / logmanager / Configuration.
1. Run the Upgrade¶
The Cluster Upgrade is carried out by running the following command:
cluster# (nohup neteye upgrade &) && tail --retry -f nohup.out
Warning
If the NetEye Elastic Stack feature module is installed and a new version of Elasticsearch is available, please note that the procedure may take a while to upgrade the Elasticsearch cluster. For more information on the Elasticsearch cluster upgrade and how to customize the upgrade process, please consult the dedicated section.
After the command has been executed, the output will inform you whether the upgrade was successful:
In case of a successful upgrade you might need to restart the nodes to properly apply the upgrades. If a reboot is not needed, please skip the next step.
In case the command fails refer to the troubleshooting section.
2. Reboot Nodes¶
Restart each node, one at a time, to apply the upgrades correctly.
Run the reboot command
cluster-node-N# neteye node reboot
In case of a standard NetEye node, put it back online once the reboot is finished
cluster-node-N# pcs node unstandby --wait=300
You can now reboot the next node.
3. Cluster Reactivation¶
At this point you can proceed to restore the cluster to high availability operation.
Run the checks in the section Checking that the Cluster Status is Normal. If any of the above checks fail, please contact our service and support team before proceeding.
Re-enable fencing on the last standard node, if it was enabled prior to the upgrade:
cluster# pcs property set stonith-enabled=true
4. Additional Tasks¶
The IDO database is not removed automatically during the upgrade, so if you want to delete it you have to run the following commands:
mysql -e "DROP DATABASE icinga;"
mysql -e "DROP USER 'icinga'@'localhost';"
If you have the Elastic Stack installed, the retention-policy-neteyelocal
service will be removed during the upgrade procedure. After the upgrade is
complete, you will need to perform a Director deployment to make the changes effective.