Tuesday, 2021-02-16

*** clarkb has joined #openstack-infra00:06
*** tosky has quit IRC00:30
*** mgoddard has quit IRC00:51
*** mgoddard has joined #openstack-infra00:57
*** dviroel has quit IRC01:06
*** noonedeadpunk has quit IRC01:19
*** noonedeadpunk has joined #openstack-infra01:21
*** zxiiro has quit IRC01:45
*** hamalq has quit IRC01:47
*** dmsimard has quit IRC02:16
*** mtreinish has quit IRC02:16
*** dmsimard has joined #openstack-infra02:17
*** rcernin has quit IRC02:31
*** zzzeek has quit IRC02:53
*** zzzeek has joined #openstack-infra02:55
*** prometheanfire has quit IRC02:58
*** prometheanfire has joined #openstack-infra03:01
*** prometheanfire has quit IRC03:22
*** rcernin has joined #openstack-infra03:23
*** rcernin has quit IRC03:28
*** rcernin has joined #openstack-infra03:28
*** prometheanfire has joined #openstack-infra03:36
*** ykarel has joined #openstack-infra03:52
*** ykarel_ has joined #openstack-infra03:58
*** ykarel has quit IRC04:01
*** ysandeep|away is now known as ysandeep|ruck04:54
*** ykarel_ has quit IRC04:57
*** vishalmanchanda has joined #openstack-infra05:27
*** ianw has quit IRC06:18
*** ianw has joined #openstack-infra06:19
*** zzzeek has quit IRC06:26
*** zzzeek has joined #openstack-infra06:29
*** slaweq has joined #openstack-infra06:35
*** mugsie has quit IRC06:37
<tkajinam> is anybody from the infra team around?  06:47
<tkajinam> I noticed some of the CI jobs are not being invoked, although we haven't made any changes to the zuul configuration in the individual repos  06:48
<tkajinam> https://review.opendev.org/c/openstack/puppet-nova/+/764763  06:48
<tkajinam> for example, the puppet-openstack-integration-* jobs are not executed in the check queue now  06:48
<tkajinam> was there any change made at the infra layer?  06:48
*** ysandeep|ruck is now known as ysandeep|lunch07:00
*** jcapitao has joined #openstack-infra07:12
*** amoralej|off is now known as amoralej07:37
*** eolivare has joined #openstack-infra07:49
*** sboyron_ has joined #openstack-infra08:10
*** piotrowskim has joined #openstack-infra08:14
*** andrewbonney has joined #openstack-infra08:23
*** ysandeep|lunch is now known as ysandeep|ruck08:24
*** rpittau|afk is now known as rpittau08:37
*** rcernin has quit IRC08:40
*** tosky has joined #openstack-infra08:44
*** hashar has joined #openstack-infra08:45
*** dtantsur|afk is now known as dtantsur08:53
*** xek has joined #openstack-infra08:54
*** jpena|off is now known as jpena08:59
*** lucasagomes has joined #openstack-infra09:08
*** yamamoto has quit IRC09:13
*** rcernin has joined #openstack-infra09:18
*** rcernin has quit IRC09:35
*** yamamoto has joined #openstack-infra09:48
*** zzzeek has quit IRC09:53
*** zzzeek has joined #openstack-infra09:53
*** yamamoto has quit IRC10:01
*** derekh has joined #openstack-infra10:13
*** yamamoto has joined #openstack-infra10:18
*** derekh has quit IRC10:29
*** derekh has joined #openstack-infra10:29
*** ociuhandu has joined #openstack-infra10:31
*** ociuhandu has quit IRC10:40
*** ociuhandu has joined #openstack-infra10:46
*** ociuhandu has quit IRC10:46
*** ociuhandu has joined #openstack-infra10:47
*** yamamoto has quit IRC10:49
*** ociuhandu has quit IRC10:52
*** yamamoto has joined #openstack-infra10:53
*** yamamoto has quit IRC10:55
*** yamamoto has joined #openstack-infra10:58
*** dviroel has joined #openstack-infra11:01
*** ociuhandu has joined #openstack-infra11:01
*** yamamoto has quit IRC11:02
*** gfidente has joined #openstack-infra11:07
*** ociuhandu has quit IRC11:13
*** mugsie has joined #openstack-infra11:13
*** ociuhandu has joined #openstack-infra11:17
*** yamamoto has joined #openstack-infra11:21
*** vishalmanchanda has quit IRC11:22
*** vishalmanchanda has joined #openstack-infra11:29
*** yamamoto has quit IRC11:32
*** ysandeep|ruck is now known as ysandeep|afk11:34
*** yamamoto has joined #openstack-infra11:39
*** yamamoto has quit IRC11:44
*** yamamoto has joined #openstack-infra11:46
*** jcapitao is now known as jcapitao_lunch12:12
*** ociuhandu has quit IRC12:13
*** ociuhandu has joined #openstack-infra12:14
*** ysandeep|afk is now known as ysandeep|ruck12:16
*** ociuhandu has quit IRC12:21
*** slaweq has quit IRC12:23
*** mtreinish has joined #openstack-infra12:27
*** jpena is now known as jpena|lunch12:27
*** rlandy has joined #openstack-infra12:32
*** ramishra has quit IRC12:41
*** ramishra has joined #openstack-infra12:41
*** ramishra has quit IRC12:42
*** ramishra has joined #openstack-infra12:44
*** redrobot has joined #openstack-infra13:01
*** yamamoto has quit IRC13:07
*** jcapitao_lunch is now known as jcapitao13:08
*** amoralej is now known as amoralej|lunch13:12
*** yamamoto has joined #openstack-infra13:13
*** yamamoto has quit IRC13:18
*** yamamoto has joined #openstack-infra13:18
*** yamamoto has quit IRC13:18
*** ociuhandu has joined #openstack-infra13:23
*** jpena|lunch is now known as jpena13:28
*** rcernin has joined #openstack-infra13:33
*** rcernin has quit IRC13:38
*** yamamoto has joined #openstack-infra13:56
*** tbarron|out has quit IRC13:59
*** nweinber has joined #openstack-infra14:07
*** yamamoto has quit IRC14:08
*** amoralej|lunch is now known as amoralej14:17
<fungi> tkajinam: i'll take a look and see if i can spot the problem  14:17
<tkajinam> fungi, thanks a lot!  14:18
<fungi> tkajinam: can you be specific about which jobs you expected aren't running for that change?  14:19
<fungi> it looks like it's running 4 jobs on that change  14:19
<fungi> oh, sorry, you said puppet-openstack-integration-*  14:20
<tkajinam> fungi, this is an example which doesn't get the expected ci jobs https://review.opendev.org/c/openstack/puppet-ceilometer/+/775730  14:20
<tkajinam> we expect items like this https://review.opendev.org/c/openstack/puppet-ceilometer/+/765100  14:20
* fungi drinks some more coffee and tries again ;)  14:20
*** slaweq has joined #openstack-infra14:21
<tkajinam> fungi, I sent an email to openstack-discuss. I'll be away soon because it's getting quite late here, so it would be nice if you could share your findings in a reply to that email  14:21
<fungi> tkajinam: yep, thanks, i'll follow up there  14:21
<tkajinam> fungi, one more input: it seems this is affecting not only master but also some stable branches  14:23
<tkajinam> https://review.opendev.org/c/openstack/puppet-oslo/+/774118  14:23
<fungi> i see puppet-openstack-integration-6-scenario001-tempest-ubuntu-bionic running against a puppet-ceilometer change yesterday  14:23
<fungi> and on a puppet-nova change over the weekend  14:24
<tkajinam> it should have puppet-openstack-unit-* and puppet-openstack-integration-6-scenario00*-tempest-centos-8  14:24
<tkajinam> yeah  14:24
<tkajinam> the interesting thing is that the expected jobs were triggered for 765100 when we merged it at 18:00 UTC  14:25
<tkajinam> but when I submitted 775730 at 02:00 UTC, some jobs were missing  14:26
<tkajinam> and there were no changes made in the puppet repos between those two timestamps  14:26
<tkajinam> iiuc  14:26
<fungi> tkajinam: keep in mind that not all jobs run on all changes; it can depend on how the jobs or their parent jobs are defined, and on which files are being changed  14:27
<fungi> for example this is an ancestor of that job: https://zuul.opendev.org/t/openstack/job/puppet-openstack-integration-run-base  14:27
<tkajinam> fungi, yeah I understand that point  14:28
*** ociuhandu has quit IRC14:28
<fungi> if only files in the irrelevant_files list there are being changed, and the descendants of that job don't override it, then the job won't be included  14:28
<tkajinam> fungi, but unfortunately I don't see anything clearly explaining the difference.  14:28
<tkajinam> I mean, both commits make changes to files under the manifests directory, so they should trigger that job  14:29
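
For readers following along, here is a minimal sketch of the file-matcher inheritance fungi describes (the attribute is spelled irrelevant-files in job definitions). The job names echo the ones discussed above, but the file patterns are illustrative assumptions, not the actual puppet-openstack configuration:

    - job:
        name: puppet-openstack-integration-run-base
        # Illustrative patterns only; the real list lives in the
        # puppet-openstack-integration repository.
        irrelevant-files:
          - ^doc/.*$
          - ^releasenotes/.*$

    - job:
        name: puppet-openstack-integration-6-scenario001-tempest-centos-8
        parent: puppet-openstack-integration-run-base
        # Per the explanation above: if a change touches only files matched
        # by the parent's irrelevant-files, and this child job does not
        # override that attribute, the job is not added for that change.
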
*** ociuhandu has joined #openstack-infra14:29
<fungi> looks like those jobs chain up to puppet-openstack-integration-base, which is no longer defined? if i click on it in the job browser zuul gives an error: https://zuul.opendev.org/t/openstack/job/puppet-openstack-integration-base  14:30
<fungi> i'll see if i can tell where that is/was defined  14:31
<tkajinam> fungi, it is defined here https://github.com/openstack/puppet-openstack-integration/blob/master/zuul.d/base.yaml#L3  14:31
<fungi> tkajinam: aha, here we are: https://zuul.opendev.org/t/openstack/config-errors  14:31
<fungi> look in there for "puppet"  14:31
<fungi> "Unknown projects: openstack/tempest-horizon"  14:31
<tkajinam> ahh ok  14:32
<tkajinam> that makes sense  14:32
<fungi> that project got retired recently, and it's still listed as a required-project  14:32
<tkajinam> we should remove that because it was retired  14:32
<tkajinam> yeah  14:32
<tkajinam> that explains why things were broken without any change in the puppet repos  14:32
<fungi> yep  14:32
<tkajinam> fungi, thanks a lot. I'll look into it  14:33
<fungi> looks like openstack-ansible and vexxhost's openstack-operator are also affected: https://codesearch.opendev.org/?q=openstack/tempest-horizon  14:33
<tkajinam> yeah, I still see some projects with references to it  14:37
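
To make the failure mode concrete, here is a hypothetical stanza of the shape that produces the "Unknown projects: openstack/tempest-horizon" error; the job name and the other entries are placeholders, and only the retired project reference matches the situation above. The fix is simply to drop that line:

    - job:
        name: puppet-openstack-integration-6-scenario001-tempest-centos-8
        required-projects:
          - openstack/puppet-openstack-integration
          # The entry below points at a retired repository; Zuul can no
          # longer resolve it, which invalidates the job definition and
          # silently drops the job from the check queue.
          - openstack/tempest-horizon
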
*** ysandeep|ruck is now known as ysandeep|away14:43
*** slaweq has quit IRC14:46
*** sboyron_ is now known as sboyron14:51
*** ociuhandu has quit IRC14:52
*** ociuhandu has joined #openstack-infra14:52
*** ociuhandu has quit IRC14:57
*** jcapitao_ has joined #openstack-infra15:03
*** jcapitao has quit IRC15:06
*** ociuhandu has joined #openstack-infra15:08
*** jcapitao_ is now known as jcapitao15:09
*** hashar has quit IRC15:29
*** rcernin has joined #openstack-infra15:34
*** dklyle has joined #openstack-infra15:37
*** rcernin has quit IRC15:39
<frickler> jrosser: noonedeadpunk: that could explain the jobs not running you mentioned earlier ^^  15:39
<noonedeadpunk> we don't use tests from https://opendev.org/openstack/openstack-ansible-os_horizon/src/branch/master/tests/os_horizon-overrides.yml#L26-L29  15:41
<noonedeadpunk> if you're talking about that  15:41
<fungi> yeah, there are no related zuul config errors for ansible; i just happened to notice that file when doing a codesearch, so thought i'd point it out as potential cleanup  15:42
<fungi> the errors seem to all be for various puppet-openstack branches and also some old horizon branches  15:43
<noonedeadpunk> we've added the test here https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/utility_all.yml#L103 but it seems that `tempest_test_whitelist` does not have it for some reason  15:43
<fungi> and i think at least one (if not both) of the horizon branches affected are slated to be deleted  15:43
<frickler> o.k., I didn't look at it in that much detail, I just matched the mention of horizon tests not running with the above  15:46
<noonedeadpunk> gotcha :) well yes, we've also just added that new tempest test, so it's kind of related, yes :)  15:47
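
For context on that variable, a minimal sketch of the kind of group_vars override being referenced; `tempest_test_whitelist` is the variable named above, but the file excerpt and the test identifier here are placeholders, not the actual horizon scenario test:

    # inventory/group_vars/utility_all.yml (excerpt, hypothetical values)
    tempest_test_whitelist:
      - tempest_horizon.tests.scenario.test_placeholder  # placeholder test id
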
*** jcapitao is now known as jcapitao|off16:05
*** yamamoto has joined #openstack-infra16:06
*** rpittau is now known as rpittau|afk16:10
*** yamamoto has quit IRC16:17
*** amoralej is now known as amoralej|off16:21
*** hashar has joined #openstack-infra16:22
*** gyee has joined #openstack-infra16:27
*** ociuhandu has quit IRC16:41
*** ociuhandu has joined #openstack-infra16:41
*** ociuhandu has quit IRC16:44
*** ociuhandu has joined #openstack-infra16:44
*** lucasagomes has quit IRC17:09
*** zul has joined #openstack-infra17:21
*** slaweq has joined #openstack-infra17:24
*** slaweq has quit IRC17:24
*** slaweq has joined #openstack-infra17:26
*** ociuhandu has quit IRC17:30
*** ociuhandu has joined #openstack-infra17:31
*** ociuhandu has quit IRC17:31
*** slaweq has quit IRC17:31
*** ociuhandu has joined #openstack-infra17:31
*** rcernin has joined #openstack-infra17:34
*** piotrowskim has quit IRC17:35
*** gfidente is now known as gfidente|afk17:37
*** rcernin has quit IRC17:39
*** eolivare has quit IRC17:44
*** jcapitao|off has quit IRC17:51
*** ociuhandu_ has joined #openstack-infra17:57
*** ociuhandu has quit IRC18:01
*** ociuhandu_ has quit IRC18:02
*** hashar is now known as hasharDinner18:07
*** jpena is now known as jpena|off18:24
*** ralonsoh has quit IRC18:25
*** derekh has quit IRC18:32
*** hamalq has joined #openstack-infra18:35
<openstackgerrit> Merged openstack/ptgbot master: Bot is now openinfraptg on #openinfra-events  https://review.opendev.org/c/openstack/ptgbot/+/774863  18:46
*** andrewbonney has quit IRC18:54
*** slaweq has joined #openstack-infra19:07
*** dtantsur is now known as dtantsur|afk19:20
*** jcapitao|off has joined #openstack-infra19:24
*** hasharDinner is now known as hashar19:27
*** rcernin has joined #openstack-infra19:35
*** rcernin has quit IRC19:40
*** hashar has quit IRC19:45
*** hashar has joined #openstack-infra19:45
<clarkb> dansmith: melwitt: if we have a nova flavor that says we get an 80GB disk and a ~9GB image that ends up with only ~15GB of disk in reality, is there a nova behavior that would explain that?  19:46
<clarkb> I suspect that our images may actually pad out to 15GB, which may explain that choice. But growing the filesystems occasionally seems to not work in a particular cloud region, and I'm wondering if nova will happily give you an instance even though the disk flavor can't be met  19:46
<clarkb> corvus: fungi: ^ possible that dansmith and melwitt may recognize this behavior?  19:47
<frickler> clarkb: in all cases that I saw, the disk according to ansible was 100G, only the fs was stuck at 15G  19:50
<clarkb> frickler: oh, where was ansible reporting that? In the facts collection file?  19:50
<clarkb> (I had missed that and want to take a look)  19:50
<frickler> yeah  19:50
<clarkb> cool, let me take a look at that  19:50
<frickler> https://ccc35cfa38f56032f297-95ab28bd06b01f2c7089eb38812248e0.ssl.cf2.rackcdn.com/759091/5/gate/nova-grenade-multinode/8997228/zuul-info/host-info.primary.yaml is what I looked at on Friday, but that gives me conn refused currently  19:52
<fungi> clarkb: something i was considering was adding an lsblk call in validate-host along with the df, but if we already have an example which proves the block device is larger, then what we really need is whatever errors growroot would have emitted  19:52
<clarkb> frickler: that link loads for me  19:53
<fungi> which may be included in the syslog  19:53
<fungi> or journal  19:53
<fungi> but we don't normally collect those if we fail in pre  19:53
<clarkb> and ya, that shows vda is 100GB  19:53
<clarkb> but https://ccc35cfa38f56032f297-95ab28bd06b01f2c7089eb38812248e0.ssl.cf2.rackcdn.com/759091/5/gate/nova-grenade-multinode/8997228/zuul-info/zuul-info.primary.txt shows only 15GB, so ya, growroot fs failures?  19:53
<clarkb> or, as tobiash suggests, possibly it has not finished yet for some reason  19:54
<clarkb> https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/growroot/init-scripts/systemd/growroot.service is the unit we should be running to grow the root  20:00
<clarkb> that specifically says WantedBy=multi-user.target  20:00
<clarkb> I think that rules out tobiash's suggestion, as long as systemd is working properly  20:01
<fungi> yeah, zuul shouldn't be able to ssh in before that, right?  20:01
<clarkb> sort of  20:01
<clarkb> I think networking is also wanted by multi-user.target, so there may be a race there  20:01
* clarkb checks the local system units  20:02
<fungi> i thought only root was supposed to be able to authenticate before multi-user.target was reached  20:02
<clarkb> my sshd says WantedBy=multi-user.target too  20:02
<fungi> yeah, but if you try to ssh in after sshd starts and before multi-user.target, i think that's when you get the "system is still starting up, try again shortly" sort of message  20:04
<clarkb> ah  20:05
<clarkb> that would imply growroot is actually failing. So ya, adding a journalctl -u growroot log to things may be a good idea?  20:05
<clarkb> ianw: ^ fyi, that may interest you from an image building and dib perspective  20:08
<ianw> ugh.  it should be captured in syslog?  20:18
<fungi> yeah, that's what i was suggesting too, try to grep it out of syslog or something  20:18
<fungi> or on debian derivatives, it may also be in boot.log  20:18
*** gfidente|afk has quit IRC20:19
<fungi> mm, maybe only up through the ubuntu xenial and debian stretch timeframe  20:20
<fungi> looks like that probably went away with the switch to systemd  20:21
<fungi> or soon thereafter  20:21
<fungi> should hopefully still get copied to /var/log/syslog though, yes  20:23
<clarkb> maybe in our prerun when we gather system info we can just do journalctl -u growroot?  20:33
<ianw> ++, will try poking at some hosts  20:35
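
A rough sketch of what that collection step could look like as Ansible tasks in the pre-run system-info gathering, also folding in the lsblk idea from earlier; the task names, options, and output path are assumptions for illustration, not the actual opendev playbook:

    - name: Capture the growroot unit journal (assumes systemd-based images)
      command: journalctl --no-pager -u growroot.service
      register: growroot_journal
      become: true
      failed_when: false

    - name: Capture block device layout alongside the existing df output
      command: lsblk --bytes --output NAME,SIZE,TYPE,MOUNTPOINT
      register: lsblk_layout
      failed_when: false

    - name: Stash both outputs on the node for later log collection
      copy:
        content: "{{ growroot_journal.stdout | default('') }}\n\n{{ lsblk_layout.stdout | default('') }}\n"
        dest: /var/tmp/growroot-debug.txt
      become: true
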
*** hashar has quit IRC20:35
*** zul has quit IRC20:42
<melwitt> clarkb: off the top of my head, no, I don't know of a behavior like that. and nova shouldn't be giving out an instance without honoring the flavor... I don't know why that's happening. I'll look around and see if I can find anything related  20:50
<fungi> melwitt: could be entirely unrelated to nova, just grasping at straws until we have more useful output captured from early boot  20:52
<melwitt> understood  20:52
<fungi> problem is, it happens infrequently. the same provider, image and flavor work fine most of the time  20:52
<melwitt> fungi: in keeping with grasping at straws, I was just reading this: https://docs.openstack.org/image-guide/openstack-images.html#disk-partitions-and-resize-root-partition-on-boot-cloud-init Is it possible that some images may not have the necessary cloud-initramfs-growroot package in them?  20:58
<melwitt> but you'd think nova would fail the boot if that grow couldn't happen. I'll look and see if/where we check that, if we can check it  21:00
<fungi> yeah, we see it fail intermittently with the same image, which also sometimes succeeds  21:00
<fungi> even in the same provider  21:01
<melwitt> ok, I see  21:01
*** rcernin has joined #openstack-infra21:20
*** rcernin has quit IRC21:29
*** rcernin has joined #openstack-infra21:34
*** rcernin has quit IRC21:39
*** nweinber has quit IRC21:47
*** sboyron has quit IRC21:47
<dansmith> clarkb: sorry, just saw this, but no, I don't know of any behavior that would cause that  21:49
<fungi> so far we've seen different host-ids associated with each failure, so i'm doubtful it's anything specific to nova or the hypervisor layer  21:50
<dansmith> clarkb: does the device appear to be 80G with a 15G partition in it, or does the device show 80 and it was only grown to 15?  21:50
<fungi> but they at least are all happening in the same provider  21:51
<fungi> dansmith: 100GB block device with a 15GB filesystem  21:51
<dansmith> is that provider running libvirt?  21:51
<clarkb> I believe it is, yes  21:51
<clarkb> (and this should be after growroot ran)  21:51
<dansmith> fungi: okay  21:51
<dansmith> any chance they're using encrypted storage?  21:53
<dansmith> that just adds more layers on top of things which sometimes have some restrictions  21:53
<dansmith> but it would be a lot of overhead to be using that stuff for root unless you're paying for it  21:54
<fungi> dansmith: i would not be surprised if it's encrypted storage, this is in citynetwork and their target clientele are exceedingly security-conscious  21:54
<fungi> but i don't know either way  21:54
<clarkb> ya, I don't know either  21:55
<fungi> noonedeadpunk might know  21:55
<dansmith> okay, I think that the libvirt driver will often grow the disk _and_ filesystem if it's clear, but I guess I'm not positive.. but it can't always do that if there is encryption going on  21:55
<dansmith> yeah,  21:59
<dansmith> so that could explain the difference I guess. some image backend configuration that differs from what you're normally getting on other hosts  21:59
<clarkb> that is helpful to know as a possibility, thank you  22:00
<dansmith> https://github.com/openstack/nova/blob/7b5ac717bd338be32414ae25f60a4bfe4c94c0f4/nova/virt/disk/api.py#L110-L158  22:00
*** jcapitao|off has quit IRC22:03
*** yamamoto has joined #openstack-infra22:11
*** xek has quit IRC22:19
*** vishalmanchanda has quit IRC22:22
*** rcernin has joined #openstack-infra22:24
*** slaweq has quit IRC22:30
*** arxcruz|rover has quit IRC22:46
*** arxcruz has joined #openstack-infra22:47
*** tosky has quit IRC23:57

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!