View Issue Details

ID: 0013836
Project: CentOS-7
Category: Cloud-Images
View Status: public
Last Update: 2018-12-06 01:12
Reporter: fmbiete
Priority: high
Severity: major
Reproducibility: always
Status: feedback
Resolution: open
Platform: Amazon Web Services
OS: CentOS
OS Version: 7.4
Product Version:
Target Version:
Fixed in Version:
Summary: 0013836: RHBA-2017:2283 - AWS AMI rebuild with ENA support
Description: We need to rebuild the CentOS 7 AWS AMI to include the latest fixes and improvements from upstream:

https://access.redhat.com/errata/RHBA-2017:2283
https://bugzilla.redhat.com/show_bug.cgi?id=1410047

The changes from 7.3 are not included either.
Tags: No tags attached.
abrt_hash:
URL:

Activities

fmbiete

2017-10-14 10:45

reporter   ~0030373

This only requires adding 2 flags to the AMI build process, when registering the image into AWS.

--sriov-net-support simple --ena-support

No change in the kickstart part is required

Example:

aws ec2 register-image --name 'CentOS-7 HVM with updates' --description 'fixed networking support' --virtualization-type hvm --root-device-name /dev/xvda1 --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs": { "SnapshotId": "snap-whatever", "VolumeSize":8, "DeleteOnTermination": false, "VolumeType": "gp2"}}]' --architecture x86_64 --sriov-net-support simple --ena-support
fmbiete

2017-12-18 08:40

reporter   ~0030775

No activity in this ticket, but AMI 1708_11.01 includes those attributes (ENA & SRIOV support). I suppose someone noticed that CentOS AMI couldn't run in the new instance types. Is this bugzilla even read by the Cloud/Virt members?

{
            "ProductCodes": [
                {
                    "ProductCodeId": "aw0evgkw8e5c1q413zgy5pjce",
                    "ProductCodeType": "marketplace"
                }
            ],
            "Description": "CentOS Linux 7 x86_64 HVM EBS 1708_11.01",
            "VirtualizationType": "hvm",
            "Hypervisor": "xen",
            "ImageOwnerAlias": "aws-marketplace",
            "EnaSupport": true,
            "SriovNetSupport": "simple",
            "ImageId": "ami-192a9460",
            "State": "available",
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/sda1",
                    "Ebs": {
                        "Encrypted": false,
                        "DeleteOnTermination": false,
                        "VolumeType": "standard",
                        "VolumeSize": 8,
                        "SnapshotId": "snap-013406753fcf8e3df"
                    }
                }
            ],
            "Architecture": "x86_64",
            "ImageLocation": "aws-marketplace/CentOS Linux 7 x86_64 HVM EBS 1708_11.01-b7ee8a69-ee97-4a49-9e68-afaee216db2e-ami-95096eef.4",
            "RootDeviceType": "ebs",
            "OwnerId": "679593333241",
            "RootDeviceName": "/dev/sda1",
            "CreationDate": "2017-12-05T14:49:45.000Z",
            "Public": true,
            "ImageType": "machine",
            "Name": "CentOS Linux 7 x86_64 HVM EBS 1708_11.01-b7ee8a69-ee97-4a49-9e68-afaee216db2e-ami-95096eef.4"
        },
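For anyone wanting to repeat this check on another AMI, here is a minimal sketch that scans saved describe-images JSON for the two attributes discussed in this ticket. The function names `has_ami_flag` and `check_enhanced_networking` are made up for illustration, and the sketch assumes the JSON describes a single image; it is a plain text match, not a real JSON parser.

```shell
#!/bin/sh
# Sketch: check describe-images JSON (read from stdin) for the
# enhanced-networking attributes. Helper names are hypothetical.
# Assumption: the input describes exactly one image.

has_ami_flag() {
    # $1 = attribute name, $2 = expected JSON value (keep quotes for strings)
    grep -Eq "\"$1\"[[:space:]]*:[[:space:]]*$2"
}

check_enhanced_networking() {
    json=$(cat)
    if ! printf '%s\n' "$json" | has_ami_flag EnaSupport true; then
        echo "missing EnaSupport"
        return 1
    fi
    if ! printf '%s\n' "$json" | has_ami_flag SriovNetSupport '"simple"'; then
        echo "missing SriovNetSupport"
        return 1
    fi
    echo "ENA and SR-IOV flags present"
}
```

It could be fed directly from the CLI, for example `aws ec2 describe-images --image-ids ami-192a9460 | check_enhanced_networking`.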
arronax

2017-12-27 11:02

reporter   ~0030829

Stumbled upon this issue while investigating ENA on the older AMIs, so I think it could be useful to update it. There are quite a lot of other issues on C5/M5 in the bug list, too, but this is the one popping up on `centos7 ena` search.


Short summary: some instances started with ami-192a9460 may show connectivity issues. Can be fixed by restart or updating network device names.


As fmbiete writes, ami-192a9460 does include necessary flags, and it even has everything for ENA in the system itself, but it MAY NOT properly work with newer instance types without some fixes.

Sometimes, if you spin up an m5 instance off ami-192a9460, it will start up (no complaints about ENA from AWS) but you won't be able to connect to it. The AWS EC2 console shows "Instance reachability check failed".

I started up a bunch of instances, most m5, and one c5, and hit the issue twice. Actually, I hit the issue on the first one I started, so that's why I looked into it. I later hit this again on 5th instance started.

I checked this a bit and was able to make ENA work on "bad" instance started from ami-192a9460, at least according to checks provided by AWS documentation.

Documentation is here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html

According to the doc, system might show connection issues if there's no eth0 device present. I changed instance type to m4 and confirmed device is named ens3. Renamed ens3 to eth0 and disabled Predictable Network Interface Names. After that, once I made sure that cloud-init is happy with new device name and generates ifcfg-eth0 properly, I changed instance type again to m5. Instance then started up and I was able to connect to it, and it generally seems to be fine.

I also noted following: when m5 instance starts up properly, device name seems to always be ens5.

Later I found that restarting a faulty instance may also fix it, and the device name is ens5 after that. This makes me believe that another restart might actually break it again, if the device name changes. However, a few restarts didn't show this behavior, so maybe I'm wrong.


It seems that updating AMI and setting default device name to eth0 would be the best way to prevent instances from being faulty in the first place.


Following steps were needed to make "bad" instance based on ami-192a9460 work properly:
* start instance as m4 or anything non-ENA
* add `net.ifnames=0` to `GRUB_CMDLINE_LINUX` line in `/etc/default/grub` and regenerate grub config with `grub2-mkconfig -o /boot/grub2/grub.cfg`
* update `/etc/udev/rules.d/70-persistent-net.rules` and rename device to eth0
* change instance type to m5 and start it
* check that ena is used as a driver for eth0: `ethtool -i eth0`

Note: not sure disabling predictable interface names is even necessary, but it's in the docs.
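
The grub edit in the steps above can be scripted. This is only a sketch under a few assumptions: `add_ifnames_flag` is a made-up helper name, the file has a single double-quoted `GRUB_CMDLINE_LINUX="..."` line, and you have backed the file up before running it as root.

```shell
#!/bin/sh
# Sketch: append net.ifnames=0 to GRUB_CMDLINE_LINUX in a grub defaults
# file, at most once. add_ifnames_flag is a hypothetical helper name.
# Assumption: the value is a single double-quoted string on one line.
add_ifnames_flag() {
    grub_file=$1
    # Nothing to do if the flag is already on the GRUB_CMDLINE_LINUX line.
    if grep -q '^GRUB_CMDLINE_LINUX=.*net\.ifnames=0' "$grub_file"; then
        return 0
    fi
    tmp_file=$(mktemp) || return 1
    # Insert the flag just before the closing double quote of the value,
    # then write the result back in place (keeps the file's permissions).
    sed 's/^\(GRUB_CMDLINE_LINUX="[^"]*\)"/\1 net.ifnames=0"/' "$grub_file" > "$tmp_file" &&
        cat "$tmp_file" > "$grub_file"
    status=$?
    rm -f "$tmp_file"
    return $status
}
```

After `add_ifnames_flag /etc/default/grub`, the config still has to be regenerated with `grub2-mkconfig -o /boot/grub2/grub.cfg` as in the list above.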
siebrand

2018-01-23 20:57

reporter   ~0031010

> AMI 1708_11.01 includes those attributes (ENA & SRIOV support)

Unfortunately the latest CentOS images (1801_01) do not appear to contain these settings, as they cannot be used with instance type m5 on AWS.

(I've tried to look really hard for an up-to-date changelog or release notes for the cloud images, but I wasn't able to find any -- I expected https://wiki.centos.org/Manuals/ReleaseNotes/CentOS7 or similar, but that doesn't appear to mention any cloud image builds).
luckyknight

2018-01-30 16:19

reporter   ~0031112

Can confirm that the latest AWS image does not contain these settings and will not let me build a C5 or M5 server due to lack of ENA.

This is the message from AWS:

"Instance type is disabled.
This instance type requires ENA support. To enable this instance type, return to the previous step and select an AMI that is enabled for ENA."
nan008

2018-02-05 15:32

reporter   ~0031157

Any news if this issue will be fixed soon?

M5s are cheaper and we would like to replace our existing machines with m5. They all run the latest CentOS 7 build.

We are getting the same error as luckyknight above my post after trying to launch the ami-6e28b517 and ami-4bf3d731 (1801_01) from AWS Marketplace. M5/C5 are greyed out lacking ENA support
icemanncsu

2018-02-09 21:48

reporter   ~0031200

You can use AMI ami-02e98f78 in the meantime. Just add this to your user-data and it will be updated to 1801 and rebooted, ready to use!

---------
#cloud-config
package_upgrade: true

power_state:
  delay: "now"
  mode: reboot
  message: Rebooting-System
  timeout: 30
  condition: True
---------
icemanncsu

2018-03-12 15:56

reporter   ~0031415

Is there no update on this issue?
siebrand

2018-03-12 16:16

reporter   ~0031416

> Is there no update on this issue?

Apparently not. I've tried to find someone on IRC, no luck. I've sent a tweet to someone who I thought might have updated the wiki page about CentOS cloud images recently, and got no reply. I'm at a loss on who to even poke.
icemanncsu

2018-03-29 17:47

reporter   ~0031525

How does an issue with Priority: high & Severity: major, go months with no action?
BlueH2O

2018-04-03 20:56

reporter   ~0031552

Hoping someone blows the dust off this report soon!
jwitko

2018-04-04 17:02

reporter   ~0031561

Hi, I've also been waiting for this fix since October 2017. It would be greatly appreciated to be applied to CentOS 6 as well.
lucy

2018-04-08 15:40

reporter   ~0031585

I noticed that version 1803_01 was added to the AWS marketplace, and it supports ENA!
AMI ID in N.Virginia is ami-b81dbfc5

Did anyone try it already?
jwitko

2018-04-08 16:17

reporter   ~0031586

Yes, I have the AWS ENA drivers installed and running on M5 instances with CentOS 7 1803_01 AMI.

It does not appear that CentOS 6 1803_01 has the requirements to use the ENA drivers.
lucy

2018-04-08 17:01

reporter   ~0031587

When you say that you have the drivers installed, you mean that you installed them manually, or that you confirm that the ones that come with the image by default work properly?
Also, does anyone know where the release notes of CentOS 7 1803_01 are? Or at least some list of changes from the previous version?
Thanks!
jordan.davies

2018-04-19 20:17

reporter   ~0031641

This also applies to the x1e instance types.

Any update on when CentOS 7 will be compatible with the newer instance types?
guss77

2018-05-01 05:59

reporter   ~0031695

I have used the new AMI - CentOS Linux 7 x86_64 HVM EBS ENA 1803_01 - successfully to launch c5.large servers.

The network interface name after launch is "ens5"

Unfortunately, this AMI is not published anywhere, and specifically not on the CentOS 7 cloud image list - so automated scripts that read that list to update their AMI list will not pick it up.
kbsingh@karan.org

2018-05-03 13:59

administrator   ~0031704

You are using an unpublished AMI? Can you elaborate a bit? This is in fact the Marketplace-listed AMI as of many weeks ago.
guss77

2018-05-03 15:36

reporter   ~0031709

Unfortunately, I don't understand the AWS Marketplace - it looks like they specifically work very hard to not let you see what AMI you'll be actually running until you go through the entire Marketplace launch wizard - so I don't know how the AMI I'm currently using relates to the CentOS 7 Marketplace entry.

I know that it isn't listed in https://wiki.centos.org/Cloud/AWS , which is my reference to official AMI images, but if you go to the AMI registry and search for the text I listed above, you can see it. Example on us-east-1:

$ aws ec2 describe-images --filters Name=owner-alias,Values=aws-marketplace Name=name,Values='CentOS Linux 7 x86_64 HVM EBS ENA*' --output text
IMAGES x86_64 2018-04-04T00:11:39.000Z CentOS Linux 7 x86_64 HVM EBS ENA 1803_01 True xen ami-0ebdd976 aws-marketplace/CentOS Linux 7 x86_64 HVM EBS ENA 1803_01-b7ee8a69-ee97-4a49-9e68-afaee216db2e-ami-8274d6ff.4 aws-marketplace machine CentOS Linux 7 x86_64 HVM EBS ENA 1803_01-b7ee8a69-ee97-4a49-9e68-afaee216db2e-ami-8274d6ff.4 679593333241 True /dev/sda1 ebs simple available hvm
BLOCKDEVICEMAPPINGS /dev/sda1
EBS False False snap-0b665edcc96bbb410 8 gp2
PRODUCTCODES aw0evgkw8e5c1q413zgy5pjce marketplace
vsheffer

2018-11-02 17:46

reporter   ~0033054

Is there any update for this? I would like to use x1e instance types that don't work with ami-9887c6e7.
siebrand

2018-11-04 19:21

reporter   ~0033066

> Is there any update for this?

CentOS 7 (x86_64) - with Updates HVM version 1805_01 was built with ENA support. I'm using it for m5/c5 instances with no problems at all in eu-west-1 as ami-3548444c.

$ modinfo ena
filename: /lib/modules/3.10.0-862.14.4.el7.x86_64/kernel/drivers/net/ethernet/amazon/ena/ena.ko.xz
version: 1.2.0k
license: GPL
description: Elastic Network Adapter (ENA)
author: Amazon.com, Inc. or its affiliates
retpoline: Y
rhelversion: 7.5
srcversion: 15437686CDA6E16E3FE5EE2
alias: pci:v00001D0Fd0000EC21sv*sd*bc*sc*i*
alias: pci:v00001D0Fd0000EC20sv*sd*bc*sc*i*
alias: pci:v00001D0Fd00001EC2sv*sd*bc*sc*i*
alias: pci:v00001D0Fd00000EC2sv*sd*bc*sc*i*
depends:
intree: Y
vermagic: 3.10.0-862.14.4.el7.x86_64 SMP mod_unload modversions
signer: CentOS Linux kernel signing key
sig_key: E4:A1:B6:8F:46:8A:CA:5C:22:84:50:53:18:FD:9D:AD:72:4B:13:03
sig_hashalgo: sha256
parm: debug:Debug level (0=none,...,16=all) (int)

Is it possible that 1805_01 not working for x1e instances has another cause?
vsheffer

2018-11-05 02:09

reporter   ~0033071

Before more or less randomly finding a Chef-based CentOS AMI, I came across the official CentOS/AWS Marketplace AMI. I assumed it would be a better version.

After having my X1e instance reject Packer built AMIs based on the CentOS AMI, I started investigating this issue in detail.

My conclusion is simple:

Do not use the CentOS AMI from the AWS Marketplace.

A very good AMI, from our experience, is the following (or the DCOS version from which it is derived):

https://github.com/irvingpop/packer-chef-highperf-centos7-ami

It not only supports x1e instance types, but provides better transparency as to the contents of the AMI than what centos.org is currently providing. The CentOS organization does a great job in so many areas, but not in providing AMIs.
pakdel

2018-11-19 18:20

reporter   ~0033128

I have been creating m4 CentOS 7 instances, and all of the recent ones have ENA. This is the AMI I am using in eu-west-1: ami-3548444c

However, they also come with /etc/sysconfig/network-scripts/ifcfg-eth0, which causes confusion for the network service, because it cannot restart dhclient properly. Consequently, systemctl status network fails.
This is what I have to do every time I create a new instance, if there is no eth0:
> rm -i /etc/sysconfig/network-scripts/ifcfg-eth0
> for PATH_DHCLIENT_PID in /var/run/dhclient*
> do
> # dhclient reads the PID file location from this environment variable
> export PATH_DHCLIENT_PID
> dhclient -r
> # Making sure it really truly stopped
> kill $(<"$PATH_DHCLIENT_PID")
> rm -f "$PATH_DHCLIENT_PID"
> done
> systemctl restart network
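
Before removing anything, it may help to confirm which ifcfg files actually point at a missing device. The following is a rough sketch, not part of the workaround above: `list_stale_ifcfg` is a made-up name, and it assumes the device name can be taken from the `ifcfg-<device>` file name rather than from a `DEVICE=` line inside the file.

```shell
#!/bin/sh
# Sketch: print ifcfg files whose device does not exist on this system.
# list_stale_ifcfg is a hypothetical helper name.
# Assumption: the device name is the suffix of the file name (ifcfg-<dev>);
# a DEVICE= line inside the file is not consulted.
list_stale_ifcfg() {
    scripts_dir=${1:-/etc/sysconfig/network-scripts}
    sys_net_dir=${2:-/sys/class/net}
    for cfg in "$scripts_dir"/ifcfg-*; do
        # Skip the unexpanded pattern when no ifcfg files exist.
        [ -f "$cfg" ] || continue
        dev=${cfg##*/ifcfg-}
        # The loopback config is always expected to be there.
        [ "$dev" = "lo" ] && continue
        # Every present network device has an entry under /sys/class/net.
        [ -e "$sys_net_dir/$dev" ] || printf '%s\n' "$cfg"
    done
}
```

On an affected instance with only ens5 present, running `list_stale_ifcfg` with no arguments would be expected to print the ifcfg-eth0 path.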
alau

2018-12-06 01:12

reporter   ~0033235

When CentOS Linux 7 x86_64 HVM EBS ENA 1805_01-b7ee8a69-ee97-4a49-9e68-afaee216db2e-ami-77ec9308.4 (ami-d8c21dba in ap-southeast-2)
is launched on any EC2 instance type that supports Enhanced Networking via the ena or ixgbevf drivers, network.service actually fails:

# systemctl status network.service
● network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network; bad; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2018-12-06 00:38:24 UTC; 20min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 588 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=1/FAILURE)
   CGroup: /system.slice/network.service
           └─779 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient--ens5.lease -pf /var/run/dhclient-ens5.pid ens5

Dec 06 00:38:22 localhost.localdomain dhclient[719]: DHCPACK from 172.31.32.1 (xid=0x66b200d6)
Dec 06 00:38:24 ip-172-31-43-240 dhclient[719]: bound to 172.31.43.240 -- renewal in 1389 seconds.
Dec 06 00:38:24 ip-172-31-43-240 network[588]: Determining IP information for ens5... done.
Dec 06 00:38:24 ip-172-31-43-240 network[588]: [ OK ]
Dec 06 00:38:24 ip-172-31-43-240 network[588]: Bringing up interface eth0: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Device eth0 does not seem to be present, delaying initialization.
Dec 06 00:38:24 ip-172-31-43-240 network[588]: [FAILED]
Dec 06 00:38:24 ip-172-31-43-240 systemd[1]: network.service: control process exited, code=exited status=1
Dec 06 00:38:25 ip-172-31-43-240 systemd[1]: Failed to start LSB: Bring up/down networking.
Dec 06 00:38:25 ip-172-31-43-240 systemd[1]: Unit network.service entered failed state.
Dec 06 00:38:25 ip-172-31-43-240 systemd[1]: network.service failed.

As arronax@ has pointed out, this is because of Predictable Network Interface Names, which causes ens3 or ens5 to be used instead of eth0.
Because PERSISTENT_DHCLIENT="1" has also been set in ifcfg-eth0, network.service will fail when restarted, because it will try to spawn another instance of dhclient that clashes with the already running instance.

# systemctl status network.service
● network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network; bad; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2018-12-06 01:07:32 UTC; 6s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 1441 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=1/FAILURE)
   CGroup: /system.slice/network.service
           └─779 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient--ens5.lease -pf /var/run/dhclient-ens5.pid ens5

Dec 06 01:07:32 ip-172-31-43-240.ap-southeast-2.compute.internal network[1441]: RTNETLINK answers: File exists
Dec 06 01:07:32 ip-172-31-43-240.ap-southeast-2.compute.internal network[1441]: RTNETLINK answers: File exists
Dec 06 01:07:32 ip-172-31-43-240.ap-southeast-2.compute.internal network[1441]: RTNETLINK answers: File exists
Dec 06 01:07:32 ip-172-31-43-240.ap-southeast-2.compute.internal network[1441]: RTNETLINK answers: File exists
Dec 06 01:07:32 ip-172-31-43-240.ap-southeast-2.compute.internal network[1441]: RTNETLINK answers: File exists
Dec 06 01:07:32 ip-172-31-43-240.ap-southeast-2.compute.internal network[1441]: RTNETLINK answers: File exists
Dec 06 01:07:32 ip-172-31-43-240.ap-southeast-2.compute.internal systemd[1]: network.service: control process exited, code=exited status=1
Dec 06 01:07:32 ip-172-31-43-240.ap-southeast-2.compute.internal systemd[1]: Failed to start LSB: Bring up/down networking.
Dec 06 01:07:32 ip-172-31-43-240.ap-southeast-2.compute.internal systemd[1]: Unit network.service entered failed state.
Dec 06 01:07:32 ip-172-31-43-240.ap-southeast-2.compute.internal systemd[1]: network.service failed.

Please set
GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto console=ttyS0,115200 net.ifnames=0"
in /etc/default/grub for your next AMI release.

--
Andrew Lau | EC2 Support Operations | Amazon Web Services (Sydney)

Issue History

Date Modified Username Field Change
2017-09-17 10:45 fmbiete New Issue
2017-10-14 10:45 fmbiete Note Added: 0030373
2017-12-18 08:40 fmbiete Note Added: 0030775
2017-12-27 11:02 arronax Note Added: 0030829
2018-01-23 20:57 siebrand Note Added: 0031010
2018-01-30 16:19 luckyknight Note Added: 0031112
2018-02-05 15:32 nan008 Note Added: 0031157
2018-02-09 21:48 icemanncsu Note Added: 0031200
2018-03-12 15:56 icemanncsu Note Added: 0031415
2018-03-12 16:16 siebrand Note Added: 0031416
2018-03-29 17:47 icemanncsu Note Added: 0031525
2018-04-03 20:56 BlueH2O Note Added: 0031552
2018-04-04 17:02 jwitko Note Added: 0031561
2018-04-08 15:40 lucy Note Added: 0031585
2018-04-08 16:17 jwitko Note Added: 0031586
2018-04-08 17:01 lucy Note Added: 0031587
2018-04-19 20:17 jordan.davies Note Added: 0031641
2018-05-01 05:59 guss77 Note Added: 0031695
2018-05-03 13:59 kbsingh@karan.org Note Added: 0031704
2018-05-03 13:59 kbsingh@karan.org Status new => feedback
2018-05-03 15:36 guss77 Note Added: 0031709
2018-11-02 17:46 vsheffer Note Added: 0033054
2018-11-04 19:21 siebrand Note Added: 0033066
2018-11-05 02:09 vsheffer Note Added: 0033071
2018-11-19 18:20 pakdel Note Added: 0033128
2018-12-06 01:12 alau Note Added: 0033235