Saturday, 2022-03-12

*** mazzy5098812929580859 is now known as mazzy509881292958085 [01:35]
<mnaser> https://review.opendev.org/c/vexxhost/ansible-collection-atmosphere/+/833476/15/.zuul.yaml [01:53]
<mnaser> somehow.. zuul isn't responding to this change at all [01:53]
<mnaser> not an error, but also not a job running [01:53]
<mnaser> hmm, seeing buildset sitting in queue for >10 minutes with no jobs appearing [02:18]
<fungi> mnaser: so it was working as of patchset 14 but not from 15 onwards? [02:28]
<mnaser> fungi: yeah.. and now its just not responding, even if the diff from patch 13 to the current should work (two extra files changed) [02:30]
<fungi> and gerrit says nothing changed in .zuul.yaml between those [02:30]
<mnaser> https://review.opendev.org/c/vexxhost/ansible-collection-atmosphere/+/833476/13..23 [02:30]
<mnaser> my thought process said "let me go back to what i know works" [02:30]
<fungi> we had an issue where one of the schedulers lost contact with gerrit earlier, i wonder if it's continuing to occur [02:31]
<mnaser> on recheck it shows up on zuul status and then disappears with no report [02:32]
<fungi> looking in the debug log on the scheduler, a recheck of that change has been waiting for a merger to return the layout for some time [02:40]
<fungi> there was a large spike in the merger queue at 02:00 when openstack's periodic jobs started [02:41]
<fungi> executors are all chewing through those right now [02:41]
<fungi> the executor queue is ~600 [02:42]
<fungi> and there's a node request backlog of around 500 at the moment [02:42]
<fungi> 2022-03-12 02:31:27,374 DEBUG zuul.Pipeline.vexxhost.check: [e: 5ee2a58c5c16417ebbf6ebefc2836469] Scheduling merge for item <QueueItem da28318bde374e6b907d3b0881809f46 for <Change 0x7f93f688fb80 vexxhost/ansible-collection-atmosphere 833476,23> in check> (files: ['zuul.yaml', '.zuul.yaml'], dirs: ['zuul.d', '.zuul.d']) [02:47]
<fungi> that was 18 minutes ago [02:47]
<fungi> hasn't returned yet [02:47]
<fungi> i wonder if we're running lots more periodic jobs because of all the stable/yoga branches getting added [02:54]
<fungi> the executor impact seems much more prolonged than in prior windows [02:55]
<fungi> i misread the executor queue graph, we're nearly caught up on the backlog there finally [02:57]
<mnaser> fungi: i wondered if like maybe there was a timeout where a job didnt start fast enough [03:04]
<fungi> i don't think so, it's still waiting on the config merge results [03:10]
<mnaser> fungi: so.. everything operating as expected, just gotta be more patient? [03:11]
<fungi> well, i don't see signs that it's not simply overloaded still from the 02:00 openstack periodic job burst [03:12]
<fungi> though the executors have finally caught back up [03:13]
<mnaser> so should i kick off a recheck again? [03:13]
<mnaser> (i also don't wanna make the problem worse by rechecking) [03:13]
<fungi> so it's not clear to me why the merge request hasn't been handled yet [03:14]
<fungi> it's been waiting for about 45 minutes already [03:15]
<mnaser> i mean i saw it show up in the queue and then disappear after a while (no jobs were spawned) [03:17]
<fungi> zm08 claims to have handled it at 02:31:30 [03:18]
<fungi> 2022-03-12 02:31:30,965 DEBUG zuul.MergerApi: [e: 5ee2a58c5c16417ebbf6ebefc2836469] Removing request <MergeRequest b8b108cd446c452fa30402ceb9dbf4b6, job_type=merge, state=completed, path=/zuul/merger/requests/b8b108cd446c452fa30402ceb9dbf4b6> [03:19]
<mnaser> hmm [03:21]
<mnaser> recheck'd, lets see [03:21]
<mnaser> welp [03:21]
<mnaser> showed up in status, disappeared, and no response from zuul [03:21]
<fungi> oh! i found it i think [03:22]
<fungi> i assumed it was zuul01 because that's where the merge request originated, but zuul02 is what handled the enqueuing [03:22]
<fungi> 2022-03-12 02:31:48,712 DEBUG zuul.Pipeline.vexxhost.check: [e: 5ee2a58c5c16417ebbf6ebefc2836469] <QueueItem da28318bde374e6b907d3b0881809f46 for <Change 0x7ff71a575760 vexxhost/ansible-collection-atmosphere 833476,23> in check> is a failing item because ['it has an invalid configuration'] [03:22]
<mnaser> welp, would be nice to actually get that notification :p [03:24]
<mnaser> now why is it an invalid configuration... [03:25]
<mnaser> and why isnt zuul telling me that =( [03:25]
<mnaser> and the same config worked .. earlier? [03:25]
<mnaser> i wonder if this is a weird cache busting thing, i will try to rename the job and try again? [03:25]
<fungi> right, i'm sure what's going on there, the debug log on the scheduler doesn't actually seem to mention what's invalid about the config either [03:26]
<mnaser> nope.. nothing [03:26]
<fungi> er, i meant i'm NOT sure what's going on there [03:27]
<mnaser> https://opendev.org/zuul/zuul/src/branch/master/zuul/manager/__init__.py#L1456-L1457 [03:27]
<mnaser> https://opendev.org/zuul/zuul/src/branch/master/zuul/manager/__init__.py#L1519-L1537 i guess thats how it ends up here? [03:29]
<fungi> my clone of your change also wasn't tested for the same reason: [03:32]
<fungi> 2022-03-12 03:29:26,335 DEBUG zuul.Pipeline.vexxhost.check: [e: 2898824303784a5d92e3b4369f1ce586] <QueueItem f7233b166890434bbb6acb2c88715fb9 for <Change 0x7f93d47f7c40 vexxhost/ansible-collection-atmosphere 833492,1> in check> is a failing item because ['it has an invalid configuration'] [03:32]
<mnaser> im so confused [03:35]
<mnaser> but whats invalid about it? [03:36]
<mnaser> unless that change was like [03:36]
<mnaser> like the report was not for change 13 when it passed [03:36]
<mnaser> like the comment @ change 13 was maybe for change 12 or 11? [03:37]
<mnaser> let me look at zuul builds [03:37]
<mnaser> nope, that was 833476,13 .. wth [03:38]
<fungi> mnaser: you have a .zuul.yaml and a zuul.d in the tree [03:39]
<fungi> is that intentional? [03:40]
<mnaser> oh man [03:40]
<mnaser> its not but i thought they would all get parsed [03:40]
<mnaser> that must be whyyyyy [03:40]
<fungi> i think zuul expects the yaml files inside zuul.d/playbooks to be zuul configs not ansible [03:41]
<mnaser> but [03:41]
<fungi> so it tries to parse them and then fails [03:41]
<mnaser> i saw openstack-ansible repo does the same [03:41]
<mnaser> https://opendev.org/openstack/openstack-ansible/src/branch/master/zuul.d/playbooks [03:43]
<mnaser> but i think you must either have zuul.d or yaml file [03:43]
<mnaser> not both [03:43]
<fungi> mnaser: ahh, yeah they just have a zuul.d/ [03:43]
<fungi> no .zuul.yaml [03:43]
<mnaser> that might just be it.. [03:44]
<fungi> so maybe git mv .zuul.yaml zuul.d/project.yaml [03:44]
<fungi> or something like that [03:44]
<fungi> https://zuul-ci.org/docs/zuul/latest/project-config.html#configuration-loading [03:45]
<fungi> "Zuul looks first for a file named zuul.yaml or a directory named zuul.d, and if they are not found, .zuul.yaml or .zuul.d (with a leading dot)." [03:45]
<fungi> so it found your zuul.d and expected it to contain zuul configuration. it did not even bother to read your .zuul.yaml [03:46]
<mnaser> yeah i think that's what probably happened [03:47]
<fungi> it's clearly too late at night for me to adequately troubleshoot such things, i should have looked at your entire change and spotted that straight away, sorry! [03:48]
<mnaser> ahhhhh [03:48]
<mnaser> also i think you're right about the files thing [03:48]
<mnaser> in SOA they use .yml for playbooks [03:48]
<mnaser> OSA* [03:48]
<mnaser> but .yaml for zuul files [03:48]
<fungi> aha [03:49]
<fungi> that's sneaky [03:49]
<fungi> well, anyway, sounds like you're probably on the right track now. i think i'm going to put down the computer and watch some cartoons before i pass out. have a great night/weekend! [03:50]
<mnaser> no worries, take care [03:51]
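
The lookup order in the documentation sentence quoted in this exchange can be sketched in a few lines of Python. This is only an illustration of that one sentence, not Zuul's actual loader: the function name `config_sources` and the `LOOKUP_GROUPS` table are hypothetical, and the sketch does not model which files inside a `zuul.d` directory get parsed.

```python
from typing import Iterable, List

# Lookup groups, in the order given by the quoted documentation sentence:
# the un-dotted names are tried first, the dotted names only as a fallback.
LOOKUP_GROUPS = [("zuul.yaml", "zuul.d"), (".zuul.yaml", ".zuul.d")]


def config_sources(top_level_entries: Iterable[str]) -> List[str]:
    """Return the config sources that would be read for a repo.

    top_level_entries is the list of names at the root of the repository.
    Hypothetical helper: it mirrors the quoted sentence, nothing more.
    """
    entries = set(top_level_entries)
    for group in LOOKUP_GROUPS:
        found = [name for name in group if name in entries]
        if found:
            # Something in this group exists, so later groups are ignored.
            return found
    return []


# The situation from the conversation: both .zuul.yaml and zuul.d exist.
print(config_sources([".zuul.yaml", "zuul.d", "README.md"]))
```

With both `.zuul.yaml` and `zuul.d` at the repository root, the sketch returns only `zuul.d`, which matches the diagnosis that `.zuul.yaml` was never read.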
<frickler> o.k., since I haven't seen any further feedback on the neutron yoga issue, and the recheck still didn't start any jobs, I will now run another set of full-reconfigure, first on zuul02, then on 01 [08:44]
<frickler> hmm, in the list of Cat jobs that the executor is submitting, neutron stable/yoga is also missing, which doesn't make this very promising to me [08:52]
*** mgoddard- is now known as mgoddard [08:53]
<frickler> so I'm going to delete the branch in gerrit now and recreate it (sha is 452a3093f62b314d0508bc92eee3e7912f12ecf1 for reference) [09:04]
<frickler> this is looking better: 2022-03-12 09:11:33,251 INFO zuul.Scheduler: Tenant reconfiguration beginning for openstack due to projects {('opendev.org/openstack/neutron', 'stable/yoga')} [09:13]
*** odyssey4me|away is now known as odyssey4me [09:48]
*** odyssey4me is now known as odyssey4me|away [09:49]
<frickler> #status log recreated neutron:stable/yoga branch @452a3093f62b314d0508bc92eee3e7912f12ecf1 in order to have zuul learn about this branch [09:56]
<opendevstatus> frickler: finished logging [09:56]
*** odyssey4me|away is now known as odyssey4me [10:25]
*** odyssey4me is now known as odyssey4me|away [11:00]
*** odyssey4me|away is now known as odyssey4me [11:00]
*** odyssey4me is now known as odyssey4me|away [11:15]
*** dviroel_ is now known as dviroel|out [11:55]
<fungi> frickler: thanks! and good idea. did you have to abandon and restore the open changes on it? [12:25]
<fungi> never mind, i see that you did [12:27]
*** odyssey4me|away is now known as odyssey4me [14:35]
*** odyssey4me is now known as odyssey4me|away [14:35]
*** odyssey4me|away is now known as odyssey4me [15:03]
*** odyssey4me is now known as odyssey4me|away [15:04]
*** odyssey4me|away is now known as odyssey4me [15:07]
*** odyssey4me is now known as odyssey4me|away [15:23]
*** odyssey4me|away is now known as odyssey4me [16:30]
*** odyssey4me is now known as odyssey4me|away [16:30]
*** odyssey4me|away is now known as odyssey4me [16:45]
*** odyssey4me is now known as odyssey4me|away [16:45]
*** odyssey4me|away is now known as odyssey4me [18:18]
*** odyssey4me is now known as odyssey4me|away [18:19]
<mnaser> infra-root: i think nodepool might be borked. no available nodes, a bunch of deleting ones, only 11 changes in queue, change of mine queued for 24 minutes now [21:54]
<fungi> looking [21:55]
*** odyssey4me|away is now known as odyssey4me [22:01]
<fungi> 2022-03-12 22:01:00,368 INFO nodepool.NodeLauncher: [e: 88a1bf654b794b97bbdc4820b1c00d8f] [node_request: 300-0017541887] [node: 0028811545] Creating server with hostname ubuntu-focal-rax-dfw-0028811545 in rax-dfw from image ubuntu-focal [22:01]
<fungi> 2022-03-12 22:01:00,997 DEBUG nodepool.NodeLauncher: [e: 88a1bf654b794b97bbdc4820b1c00d8f] [node_request: 300-0017541887] [node: 0028811545] Waiting for server c10d466c-9a54-492b-b6c6-13734f2edbf3 [22:01]
*** odyssey4me is now known as odyssey4me|away [22:02]
<fungi> that was on nl01, maybe another one timed out, looking harder [22:02]
<fungi> no, not an earlier one unless the node request id changed [22:04]
<fungi> mnaser: looks like there was a delay in handling that node request, but it did end up getting one [22:10]
<mnaser> fungi: yup indeed.. strange [22:12]
<fungi> oh! i forgot we have an nl04 ;) [22:15]
<fungi> it first locked the request at 21:30:39 [22:15]
<fungi> 2022-03-12 21:30:42,358 DEBUG nodepool.NodeLauncher: [e: 88a1bf654b794b97bbdc4820b1c00d8f] [node_request: 300-0017541887] [node: 0028811529] Waiting for server 59a78014-cbe8-4dbc-bf5b-da23e8c45f61 [22:16]
<fungi> that was in ovh-bhs1 [22:16]
<fungi> 21:40:42 Launch attempt 1/3 failed [22:16]
<fungi> 21:50:44 Launch attempt 2/3 failed [22:16]
<fungi> 22:00:47 Launch attempt 3/3 failed [22:17]
<fungi> 2022-03-12 22:00:57,214 DEBUG nodepool.driver.NodeRequestHandler[nl04.opendev.org-PoolWorker.ovh-bhs1-main-72cd222540804b9ea0137d5a7d41ec59]: [e: 88a1bf654b794b97bbdc4820b1c00d8f] [node_request: 300-0017541887] Declining node request because nodes failed [22:17]
<fungi> so that explains it [22:17]
<fungi> nl04 had three failed attempts in ovh-bhs1 waiting 10 minutes for each, then after that wasted half hour nl01 took the request and booted one successfully in rax-dfw [22:18]
<fungi> openstack.exceptions.ResourceTimeout: Timeout waiting for the server to come up. [22:19]
<fungi> is all nodepool got back from the api... not all that helpful [22:19]
<fungi> but yeah, looks like ovh-bhs1 may be having a bad day [22:20]
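
As a rough check of the "wasted half hour" estimate, the timestamps quoted in this exchange can simply be subtracted. This is plain arithmetic on the log times, not anything nodepool itself reports. The helper name `gaps` is made up for illustration, and starting the clock at the 21:30:39 lock and stopping it at nl01's 22:01:00 "Creating server" line is an assumption about where to measure.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Timestamps copied from the log lines above (all on 2022-03-12).
locked = datetime.strptime("21:30:39", FMT)      # nl04 first locks the request
failures = [datetime.strptime(t, FMT)            # launch attempts 1/3 .. 3/3 fail
            for t in ("21:40:42", "21:50:44", "22:00:47")]
relaunch = datetime.strptime("22:01:00", FMT)    # nl01 starts creating a server


def gaps(start, events):
    """Return the elapsed time between consecutive events, beginning at start."""
    out = []
    prev = start
    for event in events:
        out.append(event - prev)
        prev = event
    return out


attempt_durations = gaps(locked, failures)
total_delay = relaunch - locked

for number, duration in enumerate(attempt_durations, start=1):
    print(f"attempt {number}: {duration}")
print(f"lock to successful launch start: {total_delay}")
```

Each failed attempt accounts for roughly ten minutes, and the request sat for a little over thirty minutes before nl01 picked it up.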
<fungi> we got some maintenance notices in french the other day, maybe said they were doing something this weekend. none of our sysadmins is fluent but apparently when someone in fr headquarters approves the account it's locked to receiving all communications in français [22:21]
<fungi> i'll see if i get some time to paste them into google translate [22:21]
<mnaser> fungi: feel free to paste it here [22:31]
<mnaser> a lot of people forget i speak french fluently =P [22:32]
<fungi> touché ;) [23:55]
<fungi> eh, i can work this one out without the translator, i think. api impacting maintenance tuesday 19:00-01:00 utc for gra11 (gra1 to us probably?) [23:58]
<fungi> Une opération de maintenance est programmée pour la région GRA11 le 15 mars 2022, entre 19h00 et 01h00 (UTC). Aucune perturbation n'est à prévoir sur vos instances. Certaines API resteront injoignables pendant la durée de l'opération. (English: A maintenance operation is scheduled for the GRA11 region on 15 March 2022, between 19:00 and 01:00 UTC. No disruption to your instances is expected. Some APIs will remain unreachable for the duration of the operation.) [23:59]
<fungi> the other two announcements were similar. they're all for lateish tuesday for the same region, looks like [23:59]
