View Issue Details

ID: 0016521
Project: CentOS-7
Category: iputils
View Status: public
Last Update: 2019-10-04 01:34
Reporter: sarahn
Priority: normal
Severity: major
Reproducibility: always
Status: assigned
Resolution: open
Product Version: 7.7-1908
Summary: 0016521: Combination of kernel + iputils leads to erroneously reported packet loss
Description: This is a kernel bug that can be worked around in iputils.

With kernel 3.10.0-1062.1.1.el7.x86_64, IPv4 ICMP redirects are apparently reported as socket errors and show up in MSG_ERRQUEUE. This can be seen from the following output:

[root@scratch ~]# uname -a
Linux scratch 3.10.0-1062.1.1.el7.x86_64 #1 SMP Fri Sep 13 22:55:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@scratch ~]# ping -c5 192.168.0.20
PING 192.168.0.20 (192.168.0.20) 56(84) bytes of data.
64 bytes from 192.168.0.20: icmp_seq=1 ttl=63 time=0.501 ms
From 192.168.2.1 icmp_seq=2 Redirect Host(New nexthop: 192.168.0.20)
From 192.168.2.1: icmp_seq=2 Redirect Host(New nexthop: 192.168.0.20)
64 bytes from 192.168.0.20: icmp_seq=2 ttl=64 time=0.682 ms
64 bytes from 192.168.0.20: icmp_seq=3 ttl=64 time=0.819 ms
64 bytes from 192.168.0.20: icmp_seq=4 ttl=64 time=0.859 ms

--- 192.168.0.20 ping statistics ---
4 packets transmitted, 4 received, +1 errors, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 0.501/0.715/0.859/0.141 ms

Receiving redirects may also result in:

 4 packets transmitted, 3 received, +2 errors, 25% packet loss

Such a result triggers an error in Nagios.
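With -c5, only 4 probes went out in both runs. A simplified C model of that accounting reproduces both summaries, assuming ping stops sending once replies plus errors reach the -c count (which matches what the attached patch describes). It is not iputils' actual code, and count_reached() and loss_percent() are hypothetical names.

```c
/* Simplified model, not iputils' actual code. Assumption: ping stops
 * sending once replies plus errors reach the -c count (npackets). */
int count_reached(long npackets, long nreceived, long nerrors)
{
	return npackets > 0 && nreceived + nerrors >= npackets;
}

/* Packet loss as a whole-number percentage of probes transmitted. */
int loss_percent(long ntransmitted, long nreceived)
{
	if (ntransmitted <= 0)
		return 0;  /* nothing sent, nothing lost */
	return (int)(((ntransmitted - nreceived) * 100) / ntransmitted);
}
```

In the second case, count_reached(5, 3, 2) is already true after the fourth probe, and loss_percent(4, 3) yields the 25% that Nagios flags, even though no probe was actually lost.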

Compare the same output using the Xen4CentOS kernel:

[root@scratch ~]# uname -a
Linux scratch 4.9.188-35.el7.x86_64 #1 SMP Wed Aug 7 11:27:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@scratch ~]# ping -c5 192.168.0.20
PING 192.168.0.20 (192.168.0.20) 56(84) bytes of data.
64 bytes from 192.168.0.20: icmp_seq=1 ttl=63 time=0.469 ms
From 192.168.2.1: icmp_seq=2 Redirect Host(New nexthop: 192.168.0.20)
64 bytes from 192.168.0.20: icmp_seq=2 ttl=64 time=0.793 ms
From 192.168.2.1: icmp_seq=3 Redirect Host(New nexthop: 192.168.0.20)
64 bytes from 192.168.0.20: icmp_seq=3 ttl=64 time=0.623 ms
64 bytes from 192.168.0.20: icmp_seq=4 ttl=64 time=0.563 ms
64 bytes from 192.168.0.20: icmp_seq=5 ttl=64 time=0.530 ms

--- 192.168.0.20 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4037ms
rtt min/avg/max/mdev = 0.469/0.595/0.793/0.113 ms

The attached patch to iputils also mitigates the issue, and is probably easier to apply than trying to isolate the kernel bug.
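For reference, this is roughly how a program reads the ICMP errors that show up in MSG_ERRQUEUE. It is a sketch, not ping's code: find_recverr() and read_queued_error() are hypothetical names, the buffer sizes are arbitrary, and the socket is assumed to have IP_RECVERR enabled. On the affected kernel, a redirect presumably arrives as an entry with ee_origin == SO_EE_ORIGIN_ICMP and ee_type == ICMP_REDIRECT, which is the field the attached patch tests.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>            /* struct timespec, needed by some linux/errqueue.h versions */
#include <netinet/in.h>      /* IPPROTO_IP, IP_RECVERR */
#include <netinet/ip_icmp.h> /* ICMP_REDIRECT */
#include <sys/socket.h>      /* recvmsg, MSG_ERRQUEUE, CMSG_* macros */
#include <sys/uio.h>         /* struct iovec */
#include <linux/errqueue.h>  /* struct sock_extended_err, SO_EE_ORIGIN_ICMP */

/* Return the IP_RECVERR payload of a message read with MSG_ERRQUEUE,
 * or NULL if the message carries none. */
const struct sock_extended_err *find_recverr(struct msghdr *msg)
{
	struct cmsghdr *c;

	for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
		/* IP_RECVERR messages use level SOL_IP, which equals IPPROTO_IP. */
		if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_RECVERR)
			return (const struct sock_extended_err *)CMSG_DATA(c);
	}
	return NULL;
}

/* Hypothetical helper: pop one entry from the error queue of fd, which is
 * assumed to have IP_RECVERR enabled. Returns 1 and copies the entry into
 * *out on success, 0 if the entry had no IP_RECVERR cmsg, and -1 on failure
 * (errno is EAGAIN when the queue is empty). */
int read_queued_error(int fd, struct sock_extended_err *out)
{
	char data[576];               /* receives (part of) the offending packet */
	union {
		char buf[512];        /* room for the cmsg and offender address */
		struct cmsghdr align; /* forces cmsghdr alignment */
	} control;
	struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
	struct msghdr msg;
	const struct sock_extended_err *e;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = control.buf;
	msg.msg_controllen = sizeof(control.buf);

	if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) < 0)
		return -1;
	e = find_recverr(&msg);
	if (e == NULL)
		return 0;
	memcpy(out, e, sizeof(*out));
	return 1;
}
```

A caller would then compare out->ee_type against ICMP_REDIRECT before deciding whether to treat the entry as a real error.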
Steps To Reproduce: Put two different virtual machines on the same bridge but on different networks.

Assign the gateway IP address for each network to the bridge.

Ensure:

net.ipv4.conf.<bridge>.send_redirects=1

is set on the host.

Ping from one host to the other.
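To confirm the host setting from a program, the sysctl can be read through its /proc/sys file, where the dots in the key become slashes. A minimal sketch: read_send_redirects() is a hypothetical helper, and it reads only the per-interface value.

```c
#include <stdio.h>

/* Hypothetical helper: read net.ipv4.conf.<ifname>.send_redirects via
 * /proc/sys/net/ipv4/conf/<ifname>/send_redirects. Returns the value read
 * (normally 0 or 1), or -1 if the file cannot be opened or parsed. */
int read_send_redirects(const char *ifname)
{
	char path[256];
	FILE *f;
	int value;
	int n;

	n = snprintf(path, sizeof(path),
		     "/proc/sys/net/ipv4/conf/%s/send_redirects", ifname);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;  /* encoding error or name too long for the buffer */

	f = fopen(path, "r");
	if (f == NULL)
		return -1;  /* no such interface, or /proc/sys not available */
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

For example, read_send_redirects("br0") should return 1 on the host described above, with "br0" standing in for the actual bridge name.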
Tags: No tags attached.

Activities

sarahn (reporter), 2019-10-02 21:11

iputils-do-not-count-icmp-redirect-errors-as-errors.patch (1,927 bytes)
From 5e2d2c9eea9f50ac518e6af258a6b2a33bf645de Mon Sep 17 00:00:00 2001
From: Sarah Newman <srn@prgmr.com>
Date: Tue, 1 Oct 2019 19:12:14 -0700
Subject: [PATCH] ping: do not count icmp redirect errors as errors

When sending a specific number of probes using a raw socket, currently
receiving an icmp redirect will cause the number of actual probes sent
to be less than the requested count, as shown by this sanitized nagios
debug output:

CMD: /usr/bin/ping -n -U -W 10 -c 5 example.com
Output: PING example.com (10.1.1.1) 56(84) bytes of data.
Output: From 10.2.2.2 icmp_seq=1 Redirect Host(New nexthop: 10.1.1.1)
Output: From 10.2.2.2: icmp_seq=1 Redirect Host(New nexthop: 10.1.1.1)
Output: 64 bytes from 10.1.1.1: icmp_seq=1 ttl=63 time=0.490 ms
Output: 64 bytes from 10.1.1.1: icmp_seq=2 ttl=63 time=0.613 ms
Output: 64 bytes from 10.1.1.1: icmp_seq=3 ttl=63 time=0.565 ms
Output: From 10.2.2.2 icmp_seq=4 Redirect Host(New nexthop: 10.1.1.1)
Output:
Output: --- example.com ping statistics ---
Output: 4 packets transmitted, 3 received, +2 errors, 25% packet loss, time 3000ms
Output: rtt min/avg/max/mdev = 0.490/0.556/0.613/0.050 ms
PING CRITICAL - Packet loss = 25%, RTA = 0.56 ms|rta=0.556000ms;100.000000;500.000000;0.000000 pl=25%;2;6;0
100.000000:2% 500.000000:6%

Suppress counting icmp redirect errors as errors given that the program
should still expect to see a response.

Signed-off-by: Sarah Newman <srn@prgmr.com>
---
 ping.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/ping.c b/ping.c
index 070cefe..9795f09 100644
--- a/ping.c
+++ b/ping.c
@@ -914,7 +914,10 @@ int ping4_receive_error_msg(socket_st *sock)
 		}
 
 		net_errors++;
-		nerrors++;
+		/* A redirect should not really indicate an error since we should still
+		   get the response. */
+		if (e->ee_type != ICMP_REDIRECT)
+		  nerrors++;
 		if (options & F_QUIET)
 			goto out;
 		if (options & F_FLOOD) {
-- 
2.17.1
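The patch's effect on ping's counters can be modeled on its own. This sketch mirrors the hunk: every error-queue entry still increments net_errors, but a redirect no longer increments nerrors. struct ping_counters and account_error() are hypothetical, and the caller is assumed to pass only ICMP-originated entries, as the surrounding ping4_receive_error_msg() code presumably does.

```c
#include <netinet/ip_icmp.h> /* ICMP_REDIRECT */
#include <time.h>            /* struct timespec, needed by some linux/errqueue.h versions */
#include <linux/errqueue.h>  /* struct sock_extended_err */

/* Hypothetical counters mirroring the two ping.c variables the patch touches. */
struct ping_counters {
	long net_errors;  /* every ICMP error-queue entry, redirects included */
	long nerrors;     /* entries reported as "+N errors" */
};

/* Account for one ICMP-originated error-queue entry the way the patched
 * code does: always bump net_errors, but skip nerrors for redirects. */
void account_error(struct ping_counters *c, const struct sock_extended_err *e)
{
	c->net_errors++;
	if (e->ee_type != ICMP_REDIRECT)
		c->nerrors++;
}
```

With this change a redirect no longer appears in the "+N errors" total, while net_errors still records that it happened.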

mchapman (reporter), 2019-10-03 23:59, ~0035310

Possibly fixed in the kernel by:

commit 8d65b1190ddc548b0411477f308d04f4595bac57
Author: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Date: Fri Sep 20 18:21:25 2013 +0800

    net: raw: do not report ICMP redirects to user space
    
    Redirect isn't an error condition, it should leave
    the error handler without touching the socket.
    
    Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
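The commit moves the decision into the kernel's raw-socket ICMP error handler instead. A heavily simplified user-space model of it follows; raw_error_reaches_user() is a hypothetical name, this is not kernel code, and only two inputs are modeled (the ICMP type and whether the socket enabled IP_RECVERR).

```c
#include <stdbool.h>
#include <netinet/ip_icmp.h> /* ICMP_REDIRECT, ICMP_DEST_UNREACH */

/* Simplified model of the quoted commit; other cases the kernel handles,
 * such as connected raw sockets without IP_RECVERR, are ignored. */
bool raw_error_reaches_user(int icmp_type, bool recverr_enabled)
{
	/* After the fix: a redirect is not an error, so the handler
	 * returns without touching the socket. */
	if (icmp_type == ICMP_REDIRECT)
		return false;
	/* Other ICMP errors are queued only if the socket asked for them. */
	return recverr_enabled;
}
```

Dropping the first check approximates the pre-commit behaviour that the report observes on 3.10.0-1062.1.1.el7.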

toracat (manager), 2019-10-04 00:41, ~0035311

I see that the kernel patch referenced by @mchapman is not in the distro kernel, but it is in kernel-4.9.188, which @sarahn used to show the normal behavio(u)r.

CentOS can add the patch to the centosplus kernel (kernel-plus). To get it into the distro kernel, it needs to be reported upstream at http://bugzilla.redhat.com ; the CentOS kernel will then inherit it.

sarahn (reporter), 2019-10-04 01:34, ~0035313

Thanks @mchapman and @toracat. Submitted https://bugzilla.redhat.com/show_bug.cgi?id=1758386

Issue History

Date Modified     Username  Field                       Change
2019-10-02 21:11  sarahn    New Issue
2019-10-02 21:11  sarahn    File Added: iputils-do-not-count-icmp-redirect-errors-as-errors.patch
2019-10-03 23:59  mchapman  Note Added: 0035310
2019-10-04 00:41  toracat   Note Added: 0035311
2019-10-04 00:41  toracat   Status                      new => assigned
2019-10-04 00:41  toracat   Description Updated
2019-10-04 00:41  toracat   Steps to Reproduce Updated
2019-10-04 01:34  sarahn    Note Added: 0035313