AN25 IP WatchDog 2 from 10 – list of monitored devices, failure indication by power output (Lua script)

Do you need an indication that any of your 10 LAN devices is not working or is unavailable? Whether a cable came loose, WiFi was disconnected or the UPS failed, when one or more of the ten devices fail to respond to PING, the Lua script described in this Application Note lights up a red warning light.

 


Supported devices: NETIO 4All, NETIO PowerPDU 4C, NETIO 4

 

Every 15 seconds, the Lua script described below checks all the IP addresses in the list. For each IP address, the script checks whether it responds, and if it doesn’t, for how long. If a failure lasts longer than 60 seconds, the script turns on an output to signal the problem.

  • The output can be connected to a warning light or a horn. The light can be placed, for instance, at a receptionist desk with a 24/7 service.
  • Several instances of the script can run in parallel to monitor different groups of IT devices.
  • The number of monitored devices is not fixed to 10; there can be 5 devices just as well as 20.
  • The script can be configured to signal the alarm (by switching on the output) only after a certain number of devices from the list (e.g. one, two, or three) fails to respond.
  • The alarm can also be signaled by e-mail.

 

Note: The devices are monitored using the PING function. Some devices may not respond to PING (reply disabled) even when they work. Conversely, some IP cameras, for example, may respond to PING (and so appear OK) even when the video encoding has stopped working.

 

IP WatchDog by NETIO products

NETIO can work as an IP WatchDog in several ways:

1) IP WatchDog function in device settings (web administration).
Checks for a response from a single IP address. Can be configured for each output separately.
 

2) AN09: IP Watchdog 1 of 2 - PING based failure detection for 1 or 2 devices (Lua script). The status is considered OK if a reply is received from at least one of the two IP addresses. Can be used to monitor internet connectivity because a backup address (e.g. Google server 8.8.8.8) should always work.
 

3) AN24: IP WatchDog 1 to 1 - device LAN connectivity detection and indication (Lua script). Detects and indicates the network presence of a device. Suitable e.g. to monitor the status of a network printer in order to switch off certain equipment when the printer is powered off.
 

4) AN25: IP WatchDog 2 from 10 - list of monitored devices, failure indication by power output (Lua script). This is a simple function for IT departments; a failure can be signaled by turning on a red light. The light turns on when 1 or more IP addresses specified in a table (10 target addresses) fail to respond for a specified time (or after a specified number of retries).

 

Do you need an IP WatchDog in a different configuration? Please contact our support.

 

Creating the rule

To create and run the Lua script, do the following:

1) In the Actions section of NETIO 4 web administration, click Create Rule to add a rule.

 

2) Fill in the following parameters:

  • Enabled: checked
  • Name: IP Watchdog 1 from 10 (user-defined)
  • Description: Watchdog for 10 IP devices (user-defined)
  • Trigger: System started up
  • Schedule: Always

3) Copy the following code and paste it into the field for the Lua script:

 
------------NETIO AN25------------

------------Section 1------------
local IPs = -- Watched IP addresses
{
"192.168.101.185",
"192.168.101.136",
"192.168.1.155",
"192.168.1.25",
"192.168.0.36",
"192.168.2.56"  
}

local period = 5 -- period between pings [s]
local disconectedIPs = 1 -- minimum number of unresponsive IP addresses that counts as a failed cycle
local maxMissedCycles = 2 -- number of consecutive failed cycles that triggers actionNoOK
local controlOutput = 1 -- output for signalization (1-4)
local actionOK = 1 -- number of action for OK state (0-5)
local actionNoOK = 4 -- number of action for no OK state (0-5)
local periodAction = 1 -- periodical execution of actionNoOK (0/1)
local shortTimeMs = 2000 -- time for actions 2 and 3 [ms]
---------End of Section 1---------

local iterator = 1
local deadIPs = {}
local missedCycles = 0
local actionExecuted = false

function doAction()
  log("Doing NO OK action")
  setOutput(controlOutput,actionNoOK)
end

function doOKAction()
  log("Doing OK action")
  setOutput(controlOutput,actionOK)
end


function setOutput(output,action)
  if action == 0 then -- turn off
    devices.system.SetOut{output = output, value = false}
  elseif action == 1 then -- turn on
    devices.system.SetOut{output = output, value = true}
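  elseif action == 2 then -- short off
    devices.system.SetOut{output = output, value = false}
    -- switch back on after shortTimeMs (the listing defines no short() helper, so SetOut is called directly)
    milliDelay(shortTimeMs,function() devices.system.SetOut{output = output, value = true} end)
  elseif action == 3 then -- short on
    devices.system.SetOut{output = output, value = true}
    -- switch back off after shortTimeMs
    milliDelay(shortTimeMs,function() devices.system.SetOut{output = output, value = false} end)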
  elseif action == 4 then -- toggle
    if devices.system["output" ..output.. "_state"] == 'on' then 
      devices.system.SetOut{output=output,value=false}
    else
      devices.system.SetOut{output=output, value=true}
    end
  elseif action == 5 then
    -- do nothing
  end
end


function pingIP()
  local IP = IPs[iterator]
  ping{address=IP, timeout=5, callback=checkResult}
end

function checkResult(o)
  if not o.success then
    logf("IP:%s is not OK",IPs[iterator])
    table.insert(deadIPs, IPs[iterator]) 
  end
  if iterator == #IPs then
    log("Last IP, checking dead IPs")
    checkCounters()
  end
  iterator = (iterator)%#IPs + 1
  delay(period,function() pingIP() end)
end  


function checkCounters()
  local reportString = "{"
  for i=1,#deadIPs,1 do 
    reportString = reportString .. deadIPs[i]
    if i<#deadIPs then
      reportString = reportString .. ", "
    end
  end
  reportString = reportString .. "}"
  logf("Dead IPs : %d - " .. reportString,#deadIPs)
  if #deadIPs >= disconectedIPs then
    log("Number of dead IPs exceeded limit.")
    missedCycles = missedCycles + 1
    logf("Missed cycles: %d/%d",missedCycles,maxMissedCycles)
    if missedCycles >= maxMissedCycles then
      log("Missed cycles limit exceeded")
      if (not actionExecuted) or (toboolean(periodAction)) then
        actionExecuted = true
        doAction()
      end
    end
  else
    missedCycles = 0
    if actionExecuted then
      log("Number of disconnected IPs is now within the limit, executing OK action")
      actionExecuted = false
      doOKAction()
    end
  end
  deadIPs = {}
end

pingIP()

 

4) To finish creating the rule, click Create Rule at the bottom of the screen.

 

 

Method of operation

  • The script periodically checks all the specified IP addresses (IPs).
  • If, during one testing cycle, the number of unresponsive IP addresses is at least disconectedIPs, the script increments the missedCycles counter.
  • When missedCycles reaches maxMissedCycles (the script compares with >=), the specified action (actionNoOK) is performed on the controlled output (controlOutput).
  • As soon as the number of unresponsive IP addresses drops below disconectedIPs, missedCycles is reset to 0 and, if actionNoOK was performed before, the actionOK action is performed on the controlled output.
  • The periodAction variable specifies whether the actionNoOK action is performed, once the limit is reached, in every cycle (periodAction = 1) or just once (periodAction = 0).

 

  • The following picture illustrates the script operation. The blue line shows the number of unresponsive IPs, while the red line shows the threshold for the number of unresponsive IPs (disconectedIPs). The white circles indicate how many times (cumulatively) the red line has been reached. The maxMissedCycles variable is set to 3 and the periodAction variable is set to 1. The actionNoOK action is first performed in the 9th cycle, and in every cycle thereafter up to and including the 11th cycle. In the 12th cycle, the number of unresponsive IPs has fallen below the disconectedIPs threshold, so the actionOK action is performed.
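The steps above can be tried out off the device with a small simulation. This is only a sketch in plain Lua: the function name simulate and the input series of unresponsive-IP counts are made up for illustration, and the comparisons mirror those in checkCounters() from the listing.

```lua
-- Standalone simulation of the counting logic in checkCounters().
-- deadCounts[i] = number of unresponsive IPs in cycle i (made-up input).
-- Returns one entry per cycle: "NoOK", "OK" or "-" (no action).
local function simulate(deadCounts, disconectedIPs, maxMissedCycles, periodAction)
  local missedCycles = 0
  local actionExecuted = false
  local actions = {}
  for i, dead in ipairs(deadCounts) do
    local action = "-"
    if dead >= disconectedIPs then
      -- failed cycle: count it and check the limit
      missedCycles = missedCycles + 1
      if missedCycles >= maxMissedCycles then
        if (not actionExecuted) or periodAction == 1 then
          actionExecuted = true
          action = "NoOK"
        end
      end
    else
      -- good cycle: reset the counter, report recovery once
      missedCycles = 0
      if actionExecuted then
        actionExecuted = false
        action = "OK"
      end
    end
    actions[i] = action
  end
  return actions
end

-- Threshold of 2 unresponsive IPs, 2 failed cycles in a row, periodic action on.
local result = simulate({0, 2, 1, 2, 2, 2, 0}, 2, 2, 1)
print(table.concat(result, " "))  -- prints: - - - - NoOK NoOK OK
```

Note that the single failed cycle (cycle 2) does not raise the alarm, because the good cycle that follows resets the counter.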

 

 

Setting the variables

  • IPs
    • A table of strings containing the monitored IP addresses.
    • The table starts and ends with curly braces; individual IPs must be enclosed in double quotes and separated with commas.
    • Instead of IP addresses, DNS names can also be specified (e.g. "www.google.com"); however, if the DNS server fails, the device will be considered unresponsive even if it works OK.
       
    • Example with 10 different IP addresses:

local IPs = {
"192.168.101.186",
"192.168.101.3",
"192.168.101.21",
"22.128.12.252",
"157.25.2.5",
"32.21.51.32",
"51.211.56.25",
"128.11.18.8",
"135.51.18.105",
"143.38.15.202"
}

 

  • period
    • This variable sets the time between the successive ping checks of individual IP addresses (in seconds).
    • The minimum value is 1.
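    • When the period is set to 5 seconds and there are 10 IPs, one cycle through all IPs takes about 50 seconds, plus the time the individual pings take to complete (the delay before the next ping starts only after the previous ping result arrives).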
    • Example – to set the period to 5 seconds: period = 5
       
  • disconectedIPs
    • Minimum number of non-responding IP addresses to trigger the action.
    • Example – to trigger the action when 2 or more IPs keep failing to respond during a certain time: disconectedIPs = 2
       
  • maxMissedCycles
    • The number of successive cycles during which at least disconectedIPs addresses are unresponsive before actionNoOK is invoked. The script invokes the action as soon as the count of such cycles reaches this value (it tests missedCycles >= maxMissedCycles).
    • Example – to trigger the action in the 4th successive cycle of unavailability: maxMissedCycles = 4 (the first three failed cycles are tolerated; the action is triggered in the 4th cycle, as long as the specified number of IPs is still unreachable)
       
  • controlOutput
    • Specifies the controlled output that is used to signal failures (1 to 4).
    • Example – to specify output no. 2: controlOutput = 2
       
  • actionOK
    • Specifies the action to perform with the controlled output if the number of unresponsive IPs drops below the disconectedIPs threshold.
    • Actions:
      •  0 – output switched off
      •  1 – output switched on
      •  2 – “short off”, output is set to 0, and after a delay specified in the shortTimeMs variable, the output is set to 1
      •  3 – “short on”, output is set to 1, and after a delay specified in the shortTimeMs variable, the output is set to 0
      •  4 – “toggle”, if the output was on, it is turned off, and vice versa
      •  5 – no action, the output is left unchanged
         
  • actionNoOK
    • Specifies the action to perform with the controlled output when at least disconectedIPs addresses have been unresponsive for maxMissedCycles successive cycles.
    • The action numbers are the same as for actionOK.
       
  • periodAction
    • Specifies whether actionNoOK is performed in every cycle once the maxMissedCycles threshold is reached (periodAction = 1), or only once (periodAction = 0).
       
  • shortTimeMs
    •  Specifies (in milliseconds) for how long the output stays off or on for actions 2 and 3 respectively.
    •  The minimum value is 100 ms.
    •  Example – for 2 seconds between state changes: shortTimeMs = 2000
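The limits listed above can be checked before the script is uploaded. The following is a minimal sketch in plain Lua; validateSettings is a hypothetical helper that is not part of the AN25 script, and two of its checks are assumptions rather than documented limits: that disconectedIPs should not exceed the number of listed IPs (otherwise the threshold could never be reached) and that maxMissedCycles should be at least 1.

```lua
-- Hypothetical helper (not part of the AN25 script): checks Section 1 values
-- against the limits stated in this note and returns a list of problems.
local function validateSettings(s)
  local problems = {}
  local function check(ok, message)
    if not ok then problems[#problems + 1] = message end
  end
  -- true when v is a whole number within [min, max]
  local function isInt(v, min, max)
    return type(v) == "number" and v % 1 == 0 and v >= min and v <= max
  end
  local ipCount = type(s.IPs) == "table" and #s.IPs or 0

  check(ipCount > 0, "IPs must contain at least one address")
  check(type(s.period) == "number" and s.period >= 1, "period must be at least 1 s")
  -- Assumption: a threshold above the list length could never be reached.
  check(isInt(s.disconectedIPs, 1, math.max(ipCount, 1)),
        "disconectedIPs must be between 1 and the number of IPs")
  -- Assumption: fewer than 1 cycle makes no sense as a limit.
  check(isInt(s.maxMissedCycles, 1, math.huge), "maxMissedCycles must be 1 or more")
  check(isInt(s.controlOutput, 1, 4), "controlOutput must be 1 to 4")
  check(isInt(s.actionOK, 0, 5), "actionOK must be 0 to 5")
  check(isInt(s.actionNoOK, 0, 5), "actionNoOK must be 0 to 5")
  check(s.periodAction == 0 or s.periodAction == 1, "periodAction must be 0 or 1")
  check(type(s.shortTimeMs) == "number" and s.shortTimeMs >= 100,
        "shortTimeMs must be at least 100 ms")
  return problems
end

-- Usage with values similar to the defaults in Section 1 of the script.
local problems = validateSettings{
  IPs = {"192.168.101.185", "192.168.101.136"},
  period = 5, disconectedIPs = 1, maxMissedCycles = 2,
  controlOutput = 1, actionOK = 1, actionNoOK = 4,
  periodAction = 1, shortTimeMs = 2000,
}
for _, message in ipairs(problems) do print(message) end
```

With the values shown, the returned list is empty and nothing is printed.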

 

Starting the script

After configuring all the parameters and saving the script, restart the NETIO 4x device. After the device reboots, the script starts and begins checking the availability of the defined IP addresses.

 

The log will contain:

  • Failure of a device, with a timestamp
  • Alarm activation (output turned on / off)
  • Number of unresponsive IPs at a given time

 

FAQ:

 

1) Is it possible to monitor more than 10 IP addresses?

Yes, the IPs variable can contain as many IP addresses as needed. The 10 IPs are chosen as an example. Having more IPs extends the reaction time; allow for 3-5 seconds per IP address for the check.
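To get a feel for how the list length affects the reaction time, the cycle duration can be estimated with a few lines of plain Lua. This is a rough sketch only: the function names are made up, pingSeconds (the average time one ping takes to return a result) is an assumed input, and the estimate ignores where in the cycle a device happens to fail.

```lua
-- Rough estimate of one cycle through the list, in seconds.
-- pingSeconds is an assumed average time for a single ping to return.
local function cycleSeconds(ipCount, period, pingSeconds)
  return ipCount * (period + pingSeconds)
end

-- Rough estimate of the time covered by maxMissedCycles full cycles,
-- i.e. roughly how long an outage lasts before actionNoOK is performed.
local function alarmDelaySeconds(ipCount, period, pingSeconds, maxMissedCycles)
  return maxMissedCycles * cycleSeconds(ipCount, period, pingSeconds)
end

print(cycleSeconds(10, 5, 0))          -- 10 IPs, 5 s period, ping time ignored
print(alarmDelaySeconds(20, 5, 0, 2))  -- 20 IPs, 2 failed cycles required
```

Doubling the number of IPs doubles both values, so for long lists it may be worth running several instances of the script with shorter lists.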

 

2) Is it possible to send an e-mail in response to an outage?

Of course; however, e-mail must be correctly configured in the Settings - Email section. Then, insert the following command into the doAction() or doOKAction() function (depending on when the e-mail should be sent):

mail("my.email@gmail.com","NETIO IP Watchdog","Too many disconnected IP addresses")
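As an illustration, the e-mail can also name the addresses that failed, because doAction() is called from checkCounters() before the deadIPs table is cleared. The sketch below is not the original script: buildAlarmBody is a hypothetical helper, the deadIPs values are example data, and the two stand-in definitions at the top exist only so that the snippet also runs outside a NETIO device, where mail() and log() are not available.

```lua
-- Stand-ins for running outside a NETIO device (assumption: on the device,
-- mail() and log() are provided by the firmware and these lines are unused).
local mail = mail or function(to, subject, body) print(to, subject, body) end
local log = log or print

-- Example data; in the script this table is filled by checkResult().
local deadIPs = {"192.168.1.155", "192.168.0.36"}

-- Hypothetical helper: turns the list of failed addresses into a message body.
local function buildAlarmBody(ips)
  return "Unresponsive IP addresses: " .. table.concat(ips, ", ")
end

function doAction()
  log("Doing NO OK action")
  mail("my.email@gmail.com", "NETIO IP Watchdog", buildAlarmBody(deadIPs))
  -- In the full script, setOutput(controlOutput, actionNoOK) follows here.
end

doAction()
```

Keep in mind that with periodAction = 1 the doAction() function runs in every cycle while the outage lasts, so an e-mail placed there would be sent in every cycle as well.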

 

3) Where can I see which device out of the 10 monitored devices has failed?

Information about unresponsive devices is recorded in the device log.

 

4) Why check for failure of exactly 2 out of 10 devices?

This is just an example; the number of devices can be set in the disconectedIPs variable.

 


 

Supported FW versions:

3.0.0 and higher (Firmware archive)

More about Lua:

https://wiki.netio-products.com

 


 

 

This Application Note is compatible with:


 

NETIO 4

NETIO 4 is a smart power socket (smart power strip) with four 230V/8A sockets, connected to LAN and WiFi. Each of the four power sockets can be individually switched on/off using various M2M API protocols. NETIO 4 is a unique product designed for IT, industry, smart homes, multimedia installations and other applications. Use the product whenever you need 230V sockets controlled by a mobile app, by a computer program (via M2M API) or by a custom script (Lua), and featuring a timer (Scheduler) or auto reboot functionality (IP WatchDog).

More about NETIO 4

 


 

NETIO 4All

NETIO 4All is a PDU module featuring four 230V/8A power sockets with consumption metering for each socket as well as LAN and WiFi connectivity. Each of the four sockets can be individually switched on/off over the Web or using various M2M API protocols. Electricity consumption (A, W, kWh) can be measured at each power socket. NETIO 4All smart sockets are designed for remote measurement and control of electrical sockets. Use the product whenever you need 230V sockets controlled by a mobile app, by a computer program (via M2M API) or by a custom script (Lua) that runs directly in the NETIO 4All smart socket device.

More about NETIO 4All

 

 

NETIO PowerPDU 4C

NETIO PowerPDU 4C is a small 110/230V PDU (Power Distribution Unit). Each of the four IEC-320 C13 outlets can be independently controlled (On / Off / Reset / Toggle). Electrical parameters (A, W, kWh, TPF, V, Hz) are measured with high accuracy at each outlet. The device features two LAN ports (and a built-in Ethernet switch) for connecting to a LAN. Each power output supports ZCS (Zero Current Switching) to protect the connected equipment.

More about NETIO PowerPDU 4C

 


For device testing use name/password demo/demo