View Revisions: Issue #16521

Summary 0016521: Combination of kernel + iputils leads to erroneously reported packet loss
Revision 2019-10-04 00:41 by toracat
Description: This is a kernel bug that can be worked around in iputils.

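With kernel 3.10.0-1062.1.1.el7.x86_64, IPv4 ICMP Redirects are apparently reported as socket errors and show up in the socket error queue (MSG_ERRQUEUE). This can be seen in the following output, where the redirect for icmp_seq=2 is printed twice and counted as an error: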

[root@scratch ~]# uname -a
Linux scratch 3.10.0-1062.1.1.el7.x86_64 #1 SMP Fri Sep 13 22:55:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@scratch ~]# ping -c5 192.168.0.20
PING 192.168.0.20 (192.168.0.20) 56(84) bytes of data.
64 bytes from 192.168.0.20: icmp_seq=1 ttl=63 time=0.501 ms
From 192.168.2.1 icmp_seq=2 Redirect Host(New nexthop: 192.168.0.20)
From 192.168.2.1: icmp_seq=2 Redirect Host(New nexthop: 192.168.0.20)
64 bytes from 192.168.0.20: icmp_seq=2 ttl=64 time=0.682 ms
64 bytes from 192.168.0.20: icmp_seq=3 ttl=64 time=0.819 ms
64 bytes from 192.168.0.20: icmp_seq=4 ttl=64 time=0.859 ms

--- 192.168.0.20 ping statistics ---
4 packets transmitted, 4 received, +1 errors, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 0.501/0.715/0.859/0.141 ms

Receiving redirects may also result in:

 4 packets transmitted, 3 received, +2 errors, 25% packet loss

This spurious packet loss triggers an error in Nagios.
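
To illustrate why a summary like the one above trips monitoring, here is a minimal sketch of a Nagios-style check that parses the ping summary line. The function names and the 20% critical threshold are hypothetical and only for illustration; this is not the actual Nagios plugin logic, and it only handles the summary formats shown in this report.

```python
import re

# Matches the summary line printed by ping in this report, e.g.
# "4 packets transmitted, 3 received, +2 errors, 25% packet loss"
SUMMARY_RE = re.compile(
    r"(?P<tx>\d+) packets transmitted, (?P<rx>\d+) received"
    r"(?:, \+(?P<errors>\d+) errors)?"
    r", (?P<loss>\d+(?:\.\d+)?)% packet loss"
)


def parse_ping_summary(line):
    """Extract the counters from a ping summary line."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError("not a ping summary line: %r" % line)
    return {
        "transmitted": int(m.group("tx")),
        "received": int(m.group("rx")),
        # The "+N errors" field is absent when no errors were reported.
        "errors": int(m.group("errors") or 0),
        "loss_percent": float(m.group("loss")),
    }


def check_loss(line, critical_percent=20.0):
    """Return (status, summary); the threshold is a hypothetical example."""
    summary = parse_ping_summary(line)
    status = "CRITICAL" if summary["loss_percent"] >= critical_percent else "OK"
    return status, summary


if __name__ == "__main__":
    print(check_loss("4 packets transmitted, 3 received, +2 errors, 25% packet loss"))
    print(check_loss("5 packets transmitted, 5 received, 0% packet loss, time 4037ms"))
```

With the affected kernel, the check reports a failure even though the target host answered, because the redirect is counted against the reply statistics.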

Compare the output of the same command under the Xen4CentOS kernel:

[root@scratch ~]# uname -a
Linux scratch 4.9.188-35.el7.x86_64 #1 SMP Wed Aug 7 11:27:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@scratch ~]# ping -c5 192.168.0.20
PING 192.168.0.20 (192.168.0.20) 56(84) bytes of data.
64 bytes from 192.168.0.20: icmp_seq=1 ttl=63 time=0.469 ms
From 192.168.2.1: icmp_seq=2 Redirect Host(New nexthop: 192.168.0.20)
64 bytes from 192.168.0.20: icmp_seq=2 ttl=64 time=0.793 ms
From 192.168.2.1: icmp_seq=3 Redirect Host(New nexthop: 192.168.0.20)
64 bytes from 192.168.0.20: icmp_seq=3 ttl=64 time=0.623 ms
64 bytes from 192.168.0.20: icmp_seq=4 ttl=64 time=0.563 ms
64 bytes from 192.168.0.20: icmp_seq=5 ttl=64 time=0.530 ms

--- 192.168.0.20 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4037ms
rtt min/avg/max/mdev = 0.469/0.595/0.793/0.113 ms

The attached patch to iputils also mitigates the issue, and is probably easier to apply than trying to isolate the kernel bug.
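
For readers unfamiliar with the error queue, the following is a minimal sketch of how a userspace program could recognise an ICMP Redirect among the entries it reads from MSG_ERRQUEUE. It is not the attached patch and makes no claim about what that patch does. It assumes the Linux `struct sock_extended_err` layout from `<linux/errqueue.h>`; the helper names are hypothetical.

```python
import struct
from collections import namedtuple

SO_EE_ORIGIN_ICMP = 2  # error originated from an ICMP message (<linux/errqueue.h>)
ICMP_REDIRECT = 5      # ICMP type 5, Redirect (RFC 792)

# struct sock_extended_err: u32 errno, u8 origin, u8 type, u8 code, u8 pad,
# u32 info, u32 data, in host byte order (16 bytes in total).
_SOCK_EE = struct.Struct("=IBBBBII")

ExtendedErr = namedtuple(
    "ExtendedErr",
    "ee_errno ee_origin ee_type ee_code ee_pad ee_info ee_data",
)


def decode_extended_err(cmsg_data):
    """Decode the leading sock_extended_err from an error-queue control message."""
    if len(cmsg_data) < _SOCK_EE.size:
        raise ValueError("control message too short for sock_extended_err")
    return ExtendedErr._make(_SOCK_EE.unpack_from(cmsg_data))


def is_icmp_redirect(err):
    """True if the queued error was caused by an ICMP Redirect."""
    return err.ee_origin == SO_EE_ORIGIN_ICMP and err.ee_type == ICMP_REDIRECT
```

Here `cmsg_data` stands for the payload of the control message returned by `recvmsg()` with the `MSG_ERRQUEUE` flag on a socket that has `IP_RECVERR` enabled. A tool that wanted to avoid counting redirects as failures could skip entries for which `is_icmp_redirect()` returns true.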
Revision 2019-10-02 21:11 by sarahn
Description: This is a kernel bug that can be worked around in iputils.

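With kernel 3.10.0-1062.1.1.el7.x86_64, IPv4 ICMP Redirects are apparently reported as socket errors and show up in the socket error queue (MSG_ERRQUEUE). This can be seen in the following output, where the redirect for icmp_seq=2 is printed twice and counted as an error: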

[root@scratch ~]# uname -a
Linux scratch 3.10.0-1062.1.1.el7.x86_64 #1 SMP Fri Sep 13 22:55:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@scratch ~]# ping -c5 192.168.0.20
PING 192.168.0.20 (192.168.0.20) 56(84) bytes of data.
64 bytes from 192.168.0.20: icmp_seq=1 ttl=63 time=0.501 ms
From 192.168.2.1 icmp_seq=2 Redirect Host(New nexthop: 192.168.0.20)
From 192.168.2.1: icmp_seq=2 Redirect Host(New nexthop: 192.168.0.20)
64 bytes from 192.168.0.20: icmp_seq=2 ttl=64 time=0.682 ms
64 bytes from 192.168.0.20: icmp_seq=3 ttl=64 time=0.819 ms
64 bytes from 192.168.0.20: icmp_seq=4 ttl=64 time=0.859 ms

--- 192.168.0.20 ping statistics ---
4 packets transmitted, 4 received, +1 errors, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 0.501/0.715/0.859/0.141 ms

Receiving redirects may also result in:

 4 packets transmitted, 3 received, +2 errors, 25% packet loss

This spurious packet loss triggers an error in Nagios.

Compare the output of the same command under the Xen4CentOS kernel:

[root@scratch ~]# uname -a
Linux scratch 4.9.188-35.el7.x86_64 #1 SMP Wed Aug 7 11:27:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@scratch ~]# ping -c5 192.168.0.20
PING 192.168.0.20 (192.168.0.20) 56(84) bytes of data.
64 bytes from 192.168.0.20: icmp_seq=1 ttl=63 time=0.469 ms
From 192.168.2.1: icmp_seq=2 Redirect Host(New nexthop: 192.168.0.20)
64 bytes from 192.168.0.20: icmp_seq=2 ttl=64 time=0.793 ms
From 192.168.2.1: icmp_seq=3 Redirect Host(New nexthop: 192.168.0.20)
64 bytes from 192.168.0.20: icmp_seq=3 ttl=64 time=0.623 ms
64 bytes from 192.168.0.20: icmp_seq=4 ttl=64 time=0.563 ms
64 bytes from 192.168.0.20: icmp_seq=5 ttl=64 time=0.530 ms

--- 192.168.0.20 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4037ms
rtt min/avg/max/mdev = 0.469/0.595/0.793/0.113 ms

The attached patch to iputils also mitigates the issue, and is probably easier to apply than trying to isolate the kernel bug.