Network device monitoring with telegraf, grafana and SNMP

This post covers the steps to build a dashboard for monitoring a Juniper SRX110H2-VA. A FreeBSD 13.0 instance will be used to collect and display the data. We start with a simple graph that displays a single field, and each subsequent panel introduces a new feature or technique, building up to a complete dashboard.

First we need to configure telegraf's SNMP input with the details for connecting to the SRX. For this example we are using SNMP v2c (forgive me!):

[[inputs.snmp]]
  agents = [ "192.168.1.250:161" ]
  timeout = "5s"
  interval = "30s"
  version = 2
  community = "grafana"
  retries = 3

Telegraf can collect data from SNMP fields and tables. Our first dashboard will keep things simple, and collect the value for an OID which refers to a single field:

System temperature

[[inputs.snmp.field]]
    name = "hostname"
    oid = "1.3.6.1.2.1.1.5.0"
    is_tag = true

[[inputs.snmp.field]]
    name = "jnxOperatingTemp.9.1.0.0"
    oid = "1.3.6.1.4.1.2636.3.1.13.1.7.9.1.0.0"

The ‘is_tag’ option stores the value of the SNMP field as a tag rather than a field, so it is indexed in your influxdb database and can be used for filtering.

It is always a good idea to sanity check the telegraf.conf file each time you make an edit to ensure it will parse correctly when you run the telegraf service:

# telegraf --test --config /usr/local/etc/telegraf.conf

2021-06-11T19:41:47Z I! Starting Telegraf 1.17.3
> snmp,agent_host=192.168.1.250,host=thinker,hostname=CS7-FWEDGE01 jnxOperatingTemp.9.1.0.0=61i 1623440509000000000
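To make the tag/field split in that output easier to see, here is a rough Python sketch that pulls the sample line apart. It is only an illustration: the function name parse_line is my own, and it assumes the line contains no escaped commas, spaces or equals signs (which real line protocol does allow).

```python
def parse_line(line):
    """Naively split one metric line from `telegraf --test` output.

    Assumption: no escaped commas, spaces or equals signs in the line.
    """
    line = line.strip()
    # --test prefixes each metric with "> "
    if line.startswith("> "):
        line = line[2:]
    head, field_part, timestamp = line.split(" ")
    # First item is the measurement name, the rest are tag key=value pairs
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in field_part.split(","):
        key, value = pair.split("=", 1)
        # A trailing "i" after digits marks an integer field value
        if value.endswith("i") and value[:-1].lstrip("-").isdigit():
            value = int(value[:-1])
        fields[key] = value
    return measurement, tags, fields, int(timestamp)


sample = ("> snmp,agent_host=192.168.1.250,host=thinker,hostname=CS7-FWEDGE01 "
          "jnxOperatingTemp.9.1.0.0=61i 1623440509000000000")
measurement, tags, fields, ts = parse_line(sample)
print(measurement)
print(tags)
print(fields)
```

Here ‘hostname’ lands in the tags (because of ‘is_tag’) while the temperature reading lands in the fields.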

If your telegraf.conf is correctly constructed, running the above command will display SNMP output for each of the OIDs you are retrieving. If this is the case, start the service with ‘service telegraf start’ and let influxdb fill with data for a short while. Then go to the grafana web GUI and create a new dashboard:

Then create an empty panel:

The following adjustments have been made to the default settings to produce this graph:

  • SELECT : the alias modifier has been added and given the parameter ‘Routing Engine’. This is used in the legend instead of the [[inputs.snmp.field]] name parameter as it appears in the telegraf.conf file.
  • GROUP BY: a time interval of 5 minutes has been used to smooth out rapid minor fluctuations in temperature.
  • ALIAS BY: By specifying just ‘$col’ we remove the table name from each legend entry, going from ‘snmp.Routing Engine’ to just ‘Routing Engine’.
  • AXIS -> Left Y -> Unit: A minor cosmetic tweak to make the axis display Celsius.
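If you are wondering what the 5 minute GROUP BY actually does to the data, the following Python sketch mimics it. It assumes the aggregation is a mean (the usual default in the grafana query editor); the function name and the sample readings are made up for illustration.

```python
from collections import defaultdict


def mean_by_bucket(samples, bucket_seconds=300):
    """Average (timestamp_seconds, value) samples per fixed time bucket."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Floor each timestamp to the start of its bucket
        buckets[ts - ts % bucket_seconds].append(value)
    return {start: sum(vals) / len(vals)
            for start, vals in sorted(buckets.items())}


# Hypothetical readings every 30 seconds, flickering between 60 and 62 degrees
samples = [(i * 30, 60 + 2 * (i % 2)) for i in range(20)]
smoothed = mean_by_bucket(samples)
print(smoothed)
```

Twenty jittery readings collapse into two points, one per 5 minute bucket, which is why the graph line looks so much calmer.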

Uptime

[[inputs.snmp.field]]
    name = "sysUpTime"
    oid = "1.3.6.1.2.1.1.3.0"

The sysUpTime OID returns an integer value of hundredths of seconds since the system was last initialised. To make that value usable, we make the following new tweak:

  • SELECT : add the ‘math’ modifier and give it the parameter ‘/ 6000’ to divide the returned value by 6000. At 100 ticks per second and 60 seconds per minute, this turns the value into minutes.
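The arithmetic behind the 6000 is easy to check. This is just a worked example in Python; the function name and the sample reading are hypothetical.

```python
TICKS_PER_SECOND = 100   # sysUpTime counts hundredths of a second
SECONDS_PER_MINUTE = 60


def ticks_to_minutes(ticks):
    """Convert a sysUpTime reading to minutes (same as '/ 6000')."""
    return ticks / (TICKS_PER_SECOND * SECONDS_PER_MINUTE)


# Hypothetical reading: 8,640,000 ticks is exactly one day of uptime
minutes = ticks_to_minutes(8_640_000)
print(minutes)  # 1440.0
```

If you would rather graph hours or days, the same reasoning gives divisors of 360000 and 8640000.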

IPv4 and IPv6 Flow Sessions

Let's combine two complementary fields into one graph:

[[inputs.snmp.field]]
    name = "jnxJsSPUMonitoringFlowSessIPv4.0"
    oid = "1.3.6.1.4.1.2636.3.39.1.12.1.1.1.12.0"

[[inputs.snmp.field]]
    name = "jnxJsSPUMonitoringFlowSessIPv6.0"
    oid = "1.3.6.1.4.1.2636.3.39.1.12.1.1.1.13.0"

Whilst you could add a new query to pull another field into this graph, it is simpler to add another field to the first query's SELECT statement:

  • SELECT : add the ‘field’ modifier, which will add a new line to the SELECT statement.

Interfaces – xDSL

  [[inputs.snmp.table]]
    name = "interfaces"
    inherit_tags = [ "hostname" ]
    oid = "1.3.6.1.2.1.2.2"

    [[inputs.snmp.table.field]]
      name = "ifDescr"
      oid = "1.3.6.1.2.1.2.2.1.2"
      is_tag = true

Now that we are walking the contents of an SNMP table, we use the ‘inherit_tags’ attribute to associate the ‘hostname’ field value with each table entry to aid our filtering from within grafana. Again ‘is_tag’ is used to allow the ifDescr value to be used as an index and permit filtering on the associated row values.

  • FROM – ‘ifDescr = pp0’ – here the tagged ifDescr field is used to filter the SRX interfaces down to just the one associated with the VDSL interface.
  • SELECT – ‘non_negative_derivative’ is used because the interface counters will eventually wrap around and start back at zero, which would otherwise show up as a huge negative rate. This transformation discards those negative results and stops your graph from plunging to wild values.
  • Panel -> Series Override – This one is personal preference, but I like to have my RX and TX values on different sides of the X Axis. Some basic regex is used, matching the value ‘TX’ by surrounding it in forward slashes.
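To show why the wrap matters, here is a small Python sketch of what a non-negative derivative does to a series of counter readings. It is my own simplified version working on raw samples, not the influxdb implementation, and the counter values are invented.

```python
def non_negative_derivative(samples, unit_seconds=1):
    """Rate of change between consecutive (timestamp_seconds, counter)
    samples, dropping any negative result (e.g. after a counter wrap)."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rate = (v1 - v0) / (t1 - t0) * unit_seconds
        if rate >= 0:
            rates.append((t1, rate))
    return rates


# Hypothetical 32-bit octet counter polled every 30 seconds;
# it wraps back towards zero before the third poll
samples = [(0, 4294960000), (30, 4294963000), (60, 1000), (90, 4000)]
rates = non_negative_derivative(samples)
print(rates)
```

The interval containing the wrap is simply skipped, so the graph shows a steady 100 bytes per second either side of it rather than a spike of minus 143 million.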

Interfaces – fixed FastEthernet

Duplicating the previous xDSL panel, here we make one adjustment:

  • FROM – regex: ‘ifDescr =~ /fe-\d\/\d\/\d$/’ – matches all interface names ending in that pattern. The trailing ‘$’ also filters out the logical unit interfaces (i.e. fe-0/0/0.0).
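If you want to test a pattern like this before putting it in a panel, a few lines of Python will do. The interface names below are examples I made up, and Python does not need the forward slashes escaped, but otherwise the pattern is the same.

```python
import re

# Same pattern as the panel filter; "/" needs no escaping in Python
pattern = re.compile(r"fe-\d/\d/\d$")

# Hypothetical ifDescr values as they might appear on an SRX
names = ["fe-0/0/0", "fe-0/0/0.0", "fe-0/0/7", "pp0", "vlan.10"]
matched = [name for name in names if pattern.search(name)]
print(matched)  # ['fe-0/0/0', 'fe-0/0/7']
```

The unit interface fe-0/0/0.0 drops out because the ‘$’ insists the name ends straight after the third digit.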

Conclusion

Hopefully the above has demonstrated the ease with which you can filter and display your SNMP data. Probably the hardest task is tracking down useful OIDs to monitor. Good luck!
