We found an error in the metric Invalid ordered sets (er_bad_os) in BVQ. Closer analysis showed that BNA itself calculates the metric incorrectly and then reports the wrong value via SMI-S.

Problem:

  • Offline switch ports on which Invalid ordered sets were measured earlier permanently show very high Invalid ordered sets values in BVQ (approx. 430 million Count/s in our lab).
  • BNA displays the identical values in its own user interface.
  • If you check the er_bad_os metric directly on the switch via CLI, the counter shows no increase.
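
To make the CLI check concrete: a rate in Count/s is normally the difference between two readings of a cumulative counter, divided by the time between them. The following is a minimal sketch under that assumption. The function name and the sample readings are hypothetical, and it does not describe how BNA or BVQ compute the metric internally.

```python
def counter_rate(prev_count: int, curr_count: int, interval_s: float) -> float:
    """Return the per-second rate between two readings of a cumulative counter."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr_count - prev_count
    # A negative delta means the counter was reset between the readings;
    # treat the current reading as the increase since the reset.
    if delta < 0:
        delta = curr_count
    return delta / interval_s


# Hypothetical er_bad_os readings on an offline port, taken 300 s apart.
rate = counter_rate(prev_count=12345, curr_count=12345, interval_s=300)
print(rate)  # 0.0
```

A counter that does not move between two readings yields 0 Count/s, which is what an offline port would be expected to report.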

Workaround:

If you have set up alerts that check this metric, we recommend restricting the alert rule in BVQ so that it only checks online switch ports.

Example:

Alert rule selection for object type SAN switch port: san_agg_enabledstate != "Enabled but Offline"
Alert condition Error: PI_san_switch_fc_port_shortterm_bad_osMaxAvg >= "1.000,00Count/s"
Alert condition Warn: PI_san_switch_fc_port_shortterm_bad_osMaxAvg >= "50,00Count/s"
Alert condition Info: PI_san_switch_fc_port_shortterm_bad_osMaxAvg >= "5,00Count/s"
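
The logic of the example rule can be sketched in a few lines of Python. This is only an illustration of the selection and the thresholds listed above, not BVQ's implementation. The function name, the constant names and the "Online" state string used in the usage note are hypothetical.

```python
from typing import Optional

# Thresholds in Count/s, taken from the example rule above,
# ordered from most to least severe.
THRESHOLDS = [("Error", 1000.0), ("Warn", 50.0), ("Info", 5.0)]
OFFLINE_STATE = "Enabled but Offline"


def classify_port(enabled_state: str, bad_os_rate: float) -> Optional[str]:
    """Return the alert severity for a port, or None if no alert applies."""
    # Rule selection: skip offline ports, whose values are unreliable.
    if enabled_state == OFFLINE_STATE:
        return None
    # Report the first (most severe) threshold that is reached.
    for severity, limit in THRESHOLDS:
        if bad_os_rate >= limit:
            return severity
    return None
```

With this sketch, `classify_port("Enabled but Offline", 430_000_000.0)` returns `None`, so the inflated value on an offline port raises no alert, while `classify_port("Online", 75.0)` returns `"Warn"`.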

Hint:

If the error counters on the corresponding switch port are reset on the SAN switch, BNA & BVQ report the correct values for the offline ports (0 count/s) again.
We will try to raise the issue with Broadcom. However, since BNA has been discontinued, it is unlikely that this error will be fixed.

A request to our customers:

So far, we have seen the problem in our lab only for Invalid ordered sets. This does not mean that the problem cannot occur with other error counters as well.
If you find any other incorrect metrics, please let us know.

This issue is only relevant for customers using the BVQ SAN Package (Brocade).
