Friday, 2021-12-17

opendevreviewVerification of a change to openstack/ironic-python-agent master failed: [trivial] Fix typo in __init__.py  https://review.opendev.org/c/openstack/ironic-python-agent/+/82204900:04
*** sshnaidm is now known as sshnaidm|afk02:45
arne_wiebalckGood morning, Ironic!07:27
jingvar\007:36
rpittaugood morning ironic! o/07:44
*** amoralej|off is now known as amoralej08:23
jandershey arne_wiebalck rpittau and Ironic o/09:12
arne_wiebalckhey janders o/09:13
arne_wiebalckhey rpittau and jingvar o/09:13
ajyaHi, happy Friday! Can 2nd core take a look at this https://review.opendev.org/c/openstack/ironic/+/821576 ? It's tiny.09:19
rpittauhey arne_wiebalck janders ajya :)09:34
rpittauHappy Friday!09:34
rpittauajya: done09:35
ajyathanks, rpittau 09:35
holtgrewearne_wiebalck: for feedback, I was able to create a UEFI buildable image with dib but it does not boot when installed on a software RAID110:17
arne_wiebalckholtgrewe: it does boot w/o RAID?10:21
holtgrewearne_wiebalck: yes10:23
holtgrewebut let me triple-check10:24
arne_wiebalckholtgrewe: can you see on the console where it gets stuck?10:25
arne_wiebalckholtgrewe: also, check the deploy logs (on the conductor in /var/log/ironic/deploy) to see if the IPA complained about anything10:25
arne_wiebalckholtgrewe: which release is this again?10:26
holtgrewearne_wiebalck: OS xena installed with kayobe/kolla and I'm trying to boot a CentOS7.9 image.10:28
arne_wiebalckholtgrewe: does the image have support for md and the rootfs UUID as metadata ?10:30
holtgrewearne_wiebalck: good questions. I'm running the installation on a non-software RAID right now. I can answer the first question after looking whether mdadm is present, right? How would I find the answer to your second question?10:31
holtgrewearne_wiebalck: The conductor has the following log entry. Could not get 'rootfs_uuid' property for image a4aa2287-a543-4e2e-ada5-57f4e279ed18 from Glance for node 9f0fe673-02a7-403a-b30f-f9dc30fa2ac3. KeyError: 'rootfs_uuid'.10:33
holtgreweOK, so I gather your meta data question refers to the meta data in OS/glance?10:36
*** redrobot6 is now known as redrobot10:37
holtgreweOK, mdadm package missing inside *sigh*, so the answer to your first question is no, obviously10:38
arne_wiebalckholtgrewe: sorry, got distracted10:42
arne_wiebalckholtgrewe: yes, mdadm/kernel module in the image10:42
arne_wiebalckholtgrewe:  `openstack image show` should have `rootfs_uuid` as a property (this is maybe not needed for UEFI since the EFI content is  copied over)10:44
holtgrewearne_wiebalck: `disk-image-create -p mdadm` should be enough to get md support for CentOS7.9 as the kernel contains the module10:46
holtgreweI guess.10:46
arne_wiebalckholtgrewe: yes10:51
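A minimal sketch of the build command being discussed; everything besides `-p mdadm` (output name, element list) is an assumption and depends on the local element set, not what was actually run here:

```shell
# Sketch (untested): build a CentOS 7 image with the mdadm package
# installed, so the deployed OS can manage the md devices.
# Elements other than `-p mdadm` are illustrative.
export DIB_RELEASE=7
disk-image-create -o centos7-md -p mdadm centos vm block-device-efi
```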
holtgrewearne_wiebalck: Ah, I can set the --root-label with diskimage-builder. Should I just pass in a random UUID there?10:51
arne_wiebalckholtgrewe: it should be the UUID that is used in the image10:52
arne_wiebalckholtgrewe: so, if you have an instance, grab the UUID and make it a property10:52
arne_wiebalckholtgrewe: Ironic will need this UUID to find the rootfs and mount it10:52
arne_wiebalckholtgrewe: to run grub2-install10:53
holtgreweOK, rebuilding the image now.10:54
holtgrewearne_wiebalck: so this is what blkid would give me for the file system to be mounted at "/"? It tells me >>/dev/nbd0p3: LABEL="cloudimg-rootfs" UUID="967cd880-ab18-4bd4-a92c-976131bb6ab3" TYPE="ext4" PARTLABEL="root" PARTUUID="19b7fce5-ba7b-4bbb-9769-fd50e2a57137"<<10:58
holtgreweor is it the PTUUID of the device? no, you said file system10:59
arne_wiebalck"967cd880-ab18-4bd4-a92c-976131bb6ab3"11:01
holtgreweGreat. I think that I can even set this explicitly when providing DIB_BLOCK_DEVICE_CONFIG.11:01
holtgreweIt looks like dib is a pretty sharp tool in my box after all...11:02
arne_wiebalckholtgrewe: oh, wasn't aware, nice11:05
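If the filesystem UUID can indeed be pinned at build time via DIB_BLOCK_DEVICE_CONFIG, a heavily hedged sketch would pass `-U` through the mkfs options; whether this exact form is honoured should be checked against the diskimage-builder block-device docs for the version in use:

```shell
# Hedged sketch: fix the rootfs filesystem UUID at image-build time by
# passing -U to mkfs via the block-device config. Verify the exact
# schema against your diskimage-builder version's documentation.
export DIB_BLOCK_DEVICE_CONFIG='
- mkfs:
    name: root_fs
    base: root
    type: ext4
    label: img-rootfs
    opts: "-U 967cd880-ab18-4bd4-a92c-976131bb6ab3"
    mount:
      mount_point: /
      fstab:
        options: "defaults"
'
```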
* holtgrewe is cleaning machine and applying RAID configuration ...11:15
* holtgrewe is deploying the image...11:19
holtgrewehttps://paste.openstack.org/show/811743/12:08
holtgrewearne_wiebalck: these are my notes, is this helpful enough for your docs?12:08
holtgreweyikes, it boots into dracut ... https://paste.openstack.org/show/811744/12:09
holtgrewe*sigh* one more round12:09
holtgreweI guess the problem is the kernel command line >>BOOT_IMAGE=/boot/vmlinuz-3.10.0-1160.49.1.el7.x86_64 root=LABEL=img-rootfs ro console=tty0 crashkernel=auto net.ifnames=0 console=ttyS0 console=tty0 console=ttyS0,115200 no_timer_check nofb nomodeset gfxpayload=text<<12:13
holtgreweyeah, you have to provide this to disk-image-create12:18
arne_wiebalckholtgrewe: we have `rd.auto` to auto-assemble the md devices12:37
arne_wiebalckholtgrewe: and `root=UUID=` should point to the rootfs UUID12:38
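One way to bake those flags into the image is the bootloader element's DIB_BOOTLOADER_DEFAULT_CMDLINE variable; a sketch, assuming this variable replaces (rather than appends to) the element's default flags, so the stock flags are repeated:

```shell
# Sketch: extend the baked-in kernel command line so dracut
# auto-assembles the md arrays at boot (rd.auto). The stock default
# flags are kept, since this variable replaces them wholesale.
export DIB_BOOTLOADER_DEFAULT_CMDLINE="nofb nomodeset gfxpayload=text rd.auto=1"
```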
holtgrewearne_wiebalck: ok, adding rd.auto=1 now12:49
* holtgrewe oh the suspense13:04
* arne_wiebalck is crossing fingers13:11
holtgrewearne_wiebalck: sadly, no13:16
holtgrewecat /proc/cmdline => BOOT_IMAGE=/boot/vmlinuz-3.10.0-1160.49.1.el7.x86_64 root=LABEL=root_fs ro console=tty0 crashkernel=auto net.ifnames=0 console=ttyS0 console=tty0 console=ttyS0,115200 no_timer_check nofb nomodeset gfxpayload=text rd.auto=113:16
holtgreweno /dev/md013:16
arne_wiebalckholtgrewe: you logged into the instance?13:17
holtgreweI'm stuck in dracut after boot on serial console.13:17
holtgreweno mdadm in dracut13:18
*** amoralej is now known as amoralej|lunch13:19
arne_wiebalckholtgrewe: yep, that is needed ... IIRC we had this removed by accident as well at some point13:20
holtgreweok, looks like I want to have the dracut-regenerate element13:23
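A sketch of what the dracut-regenerate element would need so mdadm actually lands inside the initramfs; the module/package list is an assumption to verify against that element's documentation:

```shell
# Sketch: have the dracut-regenerate element rebuild the initramfs with
# md RAID support, so dracut can assemble /dev/md0 before pivoting to /.
export DIB_DRACUT_ENABLED_MODULES="
- name: mdraid
  packages:
    - mdadm
"
# then add `dracut-regenerate` to the disk-image-create element list
```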
holtgrewehere we go again13:28
holtgreweWherever this goes, thanks a lot! I would never have gotten that far without you.13:28
arne_wiebalckholtgrewe: np, let's see if we can make it work :)13:29
holtgrewearne_wiebalck: heh, I'm learning things that I never intended to learn... all I want is to do some science ;-)13:35
arne_wiebalckholtgrewe: what is the context of your Ironic deployment if I may ask?13:36
holtgreweI'm replacing the infrastructure of our "mid size" HPC system. It used to be Proxmox for VMs and xCAT for bare metal.13:38
holtgreweit's about 250 nodes plus 3 ... mid-sized ceph clusters13:39
holtgreweonly a "few" PB of HDD storage and only a "few" 100 TB of NVMe (ceph)13:40
holtgreweNothing compared to CERN but quite something for our life science context.13:40
holtgrewePlus we are running a number of data management and analysis systems that used to run in a separate proxmox cluster.13:43
holtgreweOK, reinstall appears to be through, I did not do the cleaning step this time, maybe that was a mistake.13:44
holtgrewe"md/raid1:md127: not clean -- starting background reconstruction"13:46
holtgreweand a thousand times "dracut-initqueue[969]: Warning: dracut-initqueue timeout - starting timeout scripts"13:48
holtgreweI think the clean step would have been necessary13:49
arne_wiebalckprobably, we even recreate the s/w RAID on every cleaning13:50
holtgreweyeah, that's also kind of for free, the expensive part is booting the machine into IPA, the raid operations are instantaneous13:53
holtgreweif you were to wipe the disks that would probably be dominating time13:54
holtgreweNot to forget the "cooling off" time between making a node available and it actually becoming available via nova. ;-) You have to love distributed systems with async calls.13:56
arne_wiebalckdepends on how you wipe (shred may take days, secure erase may only take seconds)13:56
arne_wiebalckholtgrewe: thanks for the infra overview!13:56
arne_wiebalckholtgrewe: and for the paste with the steps!13:57
holtgreweI'll create a new paste once I have it working end-to-end.13:57
arne_wiebalckholtgrewe: cool, ty13:58
*** amoralej|lunch is now known as amoralej14:01
rpittauhey if anyone has a minute I added all the classifier patches to ironic-week-prio, they require just an approval and should be a very quick review, thanks!14:14
rpittaummm I have the terrible suspicion that something's off either with pbr or with pip14:19
rpittauprobably pip14:20
rpittauorrr could be setuptools also14:21
holtgrewearne_wiebalck: this works now https://paste.openstack.org/show/811746/14:37
arne_wiebalckholtgrewe: it booted off the s/w RAID now?14:40
holtgrewearne_wiebalck: I'm 99% certain14:43
holtgreweI have to rebuild the image with a devuser now14:43
holtgreweto figure out why cloud-init is not working14:43
holtgreweit got over the dracut14:43
holtgreweearlier, I forgot to put the dracut-regenerate element which is why it did not work14:44
arne_wiebalckholtgrewe: right14:48
arne_wiebalckholtgrewe: sounds like progress, though14:48
holtgrewein ~5min I should know whether it really worked14:50
holtgreweand then on to the next problem14:50
TheJuliagood morning14:50
TheJuliaHappy Friday!14:53
holtgrewearne_wiebalck: I can confirm that it now booted from an md raid1 array!15:50
arne_wiebalckholtgrewe: awesome15:50
arne_wiebalckGood morning, TheJulia o/15:50
holtgreweso the latest notes paste should be fine15:50
holtgreweNow on to the next riddle... how is the ironic host supposed to get its IP address in "flat" network mode?15:51
rpittaubye everyone, have a great weekend! o/15:51
TheJuliaholtgrewe: not sure I grok the question you're seeking to answer15:53
TheJuliathe baremetal node, or.... the conductor host?15:53
holtgreweTheJulia: I got my baremetal/ironic node to boot a UEFI image at last via nova. I attached a port with a static IP as I usually would for VMs. I put configdrive userdata via Ansible as I usually would. The node boots up and has dhcp running.15:55
holtgreweShould it get its IP via dhcp from the ironic neutron agent as it does on deployment?15:55
TheJuliaholtgrewe: generally that is how people do it15:57
holtgreweTheJulia: OK... so at least I understood how it should work. Now I can try to figure out where things go wrong. Thanks.16:01
holtgreweOK... enough for today. Thanks all, have a nice weekend o/16:08
TheJuliaholtgrewe: okay, have a wonderful weekend!16:10
arne_wiebalckholtgrewe: o/16:10
holtgreweI'll also set up UEFI+software RAID1 with Rocky 8.x once I get my GPFS upgrade through and will share the command lines that worked for me.16:11
holtgreweIs the format from my paste above enough?16:11
holtgreweI could also wrap this in some explanatory text if it helps you. If you point me at the right place in the repositories I can also add a documentation patch.16:13
holtgreweanyway, off for today16:13
*** holtgrewe is now known as holtgrewe^gone16:13
arne_wiebalckholtgrewe^gone: if you would do that, that would be great ofc16:13
arne_wiebalckholtgrewe^gone: should go to the admin section in the ironic repo16:14
opendevreviewJulia Kreger proposed openstack/ironic-tempest-plugin master: WIP: An idea for rbac positive/negative testing  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/81916516:41
opendevreviewMerged openstack/ironic master: Fix redfish update_firmware for newer Sushy  https://review.opendev.org/c/openstack/ironic/+/82157617:53
*** amoralej is now known as amoralej|off17:55
arne_wiebalckbye everyone, see you next week o/18:05
opendevreviewMerged openstack/ironic-python-agent master: [trivial] Fix typo in __init__.py  https://review.opendev.org/c/openstack/ironic-python-agent/+/82204918:22
-opendevstatus- NOTICE: The review.opendev.org server is being rebooted to validate a routing configuration update, and should return to service shortly22:28

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!