Friday, 2021-11-12

opendevreviewMerged opendev/system-config master: reprepro: stop mirroring Debian stretch  https://review.opendev.org/c/opendev/system-config/+/817340 00:45
*** odyssey4me is now known as Guest5655 00:47
ianwi have locks in a root screen to cleanup debian-security/debian repos for ^00:54
ianwdebian-security done01:29
opendevreviewlikui proposed openstack/diskimage-builder master: Replace deprecated assertEquals  https://review.opendev.org/c/openstack/diskimage-builder/+/817663 01:46
opendevreviewChris Stone proposed openstack/diskimage-builder master: Adding allow-remove-essential for ubuntu grub install.  https://review.opendev.org/c/openstack/diskimage-builder/+/817666 02:13
opendevreviewChris Stone proposed openstack/diskimage-builder master: Adding allow-remove-essential for ubuntu grub removal.  https://review.opendev.org/c/openstack/diskimage-builder/+/817666 02:15
Alex_GaynorIs linaro having issues? None of our arm64 jobs have started in 10 minutes02:16
corvusAlex_Gaynor: i think that's about when the periodic jobs got enqueued, so the queues more or less instantaneously max out02:29
corvusit seems like there should be more capacity in linaro though...02:30
corvusah, our max servers are higher than the quota on that provider, so the graph is misleading02:35
corvusso yeah, they're waiting on quota02:35
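A minimal sketch of why the graph misled: the number of nodes a provider can actually boot is the smaller of the launcher's configured max-servers and the cloud's instance quota, so any max-servers headroom above the quota only shows up as waiting requests. The function names and numbers below are illustrative assumptions, not Linaro's real settings or nodepool's code.

```python
def effective_capacity(max_servers: int, quota_instances: int, used_elsewhere: int = 0) -> int:
    """How many nodes a provider can really run at once.

    max_servers: launcher-side cap configured for the provider.
    quota_instances: cloud-side instance quota for the tenant.
    used_elsewhere: instances counted against the quota but not owned by the launcher.
    """
    available_quota = max(0, quota_instances - used_elsewhere)
    # Whichever limit is hit first is the real ceiling.
    return min(max_servers, available_quota)


def waiting_requests(requests: int, max_servers: int, quota_instances: int) -> int:
    """Node requests left queued once every bootable slot is in use."""
    return max(0, requests - effective_capacity(max_servers, quota_instances))


# Hypothetical numbers: a cap of 40 looks roomy, but a quota of 16 is what binds.
print(effective_capacity(40, 16))    # 16
print(waiting_requests(30, 40, 16))  # 14
```

A graph scaled to max-servers would show this provider as less than half full while 14 requests sit waiting.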
corvusAlex_Gaynor, ianw: oh, neat -- it looks like osu osl has arm64 images too?02:41
corvushttps://zuul.opendev.org/t/pyca/stream/bd04cde127334fedb6bc9b03e69a0fea?logfile=console.log is about to run on one i think02:42
*** diablo_rojo is now known as Guest5669 02:59
ianwAlex_Gaynor: yep, what corvus said about things maxing out at about that hour due to all periodic jobs kicking off then.  we moved it forward a bit to avoid .eu timezones recently, so that might be a bit different03:04
ianwor backwards, depending on your POV I guess.  anyway, it used to happen a couple of hours from now03:05
ianwso far Fedora 35 has managed to crash my work VM daily at a random point, generally in the middle of scrolling a largish web page.  helpfully giving no message, oops, or anything but instantly dying :/03:07
ianwok, debian mirror now done too https://grafana.opendev.org/d/T5zTt6PGk/afs?orgId=1 03:09
corvusianw: year of the linux desktop!03:14
ianw#status log debian-stretch has been yeeted from nodepool and AFS mirrors03:14
ianwi guess now i'm going to diagnose why the status bot has also yeeted itself03:15
ianwwe merged a couple of changes, and it looks like it restarted 29 hours ago03:22
ianwwe do not appear to get any logging out of it03:24
ianwok, i found the logging03:28
ianw2021-11-12 03:22:57,183 DEBUG irc.client: connect(server='irc.oftc.net', port=6697, nickname='opendevstatus', ...)03:29
ianw2021-11-12 03:22:57,355 DEBUG irc.client: TO SERVER: NICK opendevstatus03:29
ianw2021-11-12 03:22:57,356 DEBUG irc.client: TO SERVER: USER opendevstatus 0 * :opendevstatus03:29
ianw2021-11-12 03:22:57,356 DEBUG irc.client: process_forever(timeout=0.2)03:29
ianw2021-11-12 03:22:57,400 DEBUG irc.client: _dispatcher: disconnect03:29
ianwit is an endless loop like that03:29
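The debug lines above can be checked mechanically: a connect() followed a fraction of a second later by a disconnect, with no server replies logged in between, suggests the connection was closed before IRC registration finished. This stdlib-only sketch measures that gap from log lines in the format shown; it assumes each line starts with the 23-character timestamp, and it does not diagnose the cause (the thread later traced it to the SSL change).

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # e.g. "2021-11-12 03:22:57,183"


def connect_to_disconnect_intervals(lines):
    """Seconds between each irc.client connect() and the disconnect that follows it."""
    intervals = []
    connected_at = None
    for line in lines:
        if not line[:4].isdigit():
            continue  # not a timestamped log line
        stamp = datetime.strptime(line[:23], LOG_TIME_FORMAT)
        if "irc.client: connect(" in line:
            connected_at = stamp
        elif "_dispatcher: disconnect" in line and connected_at is not None:
            intervals.append((stamp - connected_at).total_seconds())
            connected_at = None
    return intervals
```

Fed the five lines pasted above, this returns a single interval of about 0.217 seconds; an endless loop shows up as a long list of sub-second values.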
fungiianw: did the restart coincide with one of the changes we merged? or was it prior to them merging?03:42
ianwyes, it restarted after those changes we merged.03:42
ianwi'm going to try reverting the ssl one and see if that helps.  i can't see the problem but it's my first suspect03:42
fungiyeah, looks like similar timing to me03:43
fungiand agreed, the other changes don't really go anywhere near the connection setup03:43
ianwi've now managed to restart all containers, because git was so out of date it hadn't managed to figure out the recent letsencrypt update.  not sure, but we should also double-check unattended-upgrades03:44
fungistatusbot's running from a container, right? so it could be just about anything which changed in its dependency set or base image since the last time it was built?03:45
ianwok, reverting that change, it is back (i docker cp'd a reverted file, overwrote it in the container and did a docker restart on the container)03:48
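The hot-fix described here is two Docker CLI steps: `docker cp` a file into the running container, then `docker restart` it. A small sketch that builds (and optionally runs) those commands; the container name and in-container path are placeholders, not the real statusbot layout. The copied file lives only in that container's writable layer, so recreating the container from its image discards it, which is why the merged revert and a fresh image are still needed.

```python
import subprocess
from typing import List


def hotfix_commands(container: str, local_file: str, path_in_container: str) -> List[List[str]]:
    """Docker CLI invocations that overwrite one file in a container and restart it."""
    return [
        ["docker", "cp", local_file, f"{container}:{path_in_container}"],
        ["docker", "restart", container],
    ]


def apply_hotfix(container: str, local_file: str, path_in_container: str, dry_run: bool = True):
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    commands = hotfix_commands(container, local_file, path_in_container)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands
```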
ianw#status log debian-stretch has been yeeted from nodepool and AFS mirrors03:48
ianwoh it's still joining03:49
opendevstatusianw: finished logging03:50
fungiso good hunch03:50
ianwi guess we don't get statusbot change logs from gerritbot?03:51
ianwhttps://review.opendev.org/c/opendev/statusbot/+/817694 03:52
Clark[m]ianw: unattended upgrades on eavesdrop?03:54
ianwoh, i also added "use_ssl=True" manually to the config and that didn't work either (although with hindsight, we shouldn't have changed the default action)03:55
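On the hindsight point: a new option is safest when its default reproduces the old behaviour, so existing configs that never mention it keep working. A minimal sketch, assuming the bot previously always used TLS (port 6697 in the log above is OFTC's TLS port) and using a hypothetical `[ircbot]` section name; it is not statusbot's actual code. The returned wrapper has the shape that connection factories such as the `irc` library's `irc.connection.Factory(wrapper=...)` accept: a callable that takes a socket and returns one.

```python
import configparser
import ssl


def read_use_ssl(cfg: configparser.ConfigParser, section: str = "ircbot") -> bool:
    # Hypothetical section name. fallback=True keeps TLS on for configs that
    # predate the option, so introducing it does not change existing behaviour.
    return cfg.getboolean(section, "use_ssl", fallback=True)


def make_socket_wrapper(use_ssl: bool, server: str):
    """Return a callable that either passes a socket through or wraps it in TLS."""
    if not use_ssl:
        return lambda sock: sock
    context = ssl.create_default_context()

    def wrap(sock):
        # server_hostname enables SNI and certificate hostname verification.
        return context.wrap_socket(sock, server_hostname=server)

    return wrap
```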
ianwClark[m]: yeah, that's the next weird thing, there were a lot of non-upgraded packages there, including git, meaning i couldn't clone from opendev.org03:55
ianwtristanC: ^03:55
fungiunattended-upgrades only applies security fixes by default, i think03:56
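fungi's point can be checked on a host: on a stock Ubuntu install, unattended-upgrades' default allowed origins are essentially the release pocket and `-security` (plus ESM where enabled), so an update published only in `-updates` or a third-party repository stays pending. This sketch sorts `apt list --upgradable` output by whether any offering suite ends in `-security`. It is a rough heuristic over human-readable output (apt itself warns that its CLI is not a stable interface for scripts), and the sample lines in the test are made up.

```python
def split_upgradable(apt_list_output: str):
    """Split `apt list --upgradable` output into (security, other) package names."""
    security, other = [], []
    for line in apt_list_output.splitlines():
        if "/" not in line or "[upgradable from:" not in line:
            continue  # skip the "Listing..." header and blank lines
        name, rest = line.split("/", 1)
        # The first field after the name is a comma-separated list of suites.
        suites = rest.split(" ", 1)[0].split(",")
        if any(suite.endswith("-security") for suite in suites):
            security.append(name)
        else:
            other.append(name)
    return security, other
```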
ianwi'm going to have to go running around to music lessons etc. in about 30 minutes -- if we approve the revert, that should push and restart with a good image, which puts us in a steady state03:56
fungiit's possible gerritbot is broken? i would have expected it to announce that change, it's not in-channel at all03:57
fungi2021-11-12 03:39:29     <--     opendevreview (~opendevre@104.239.144.232) has quit (Remote host closed the connection)03:58
ianwthat was my fault, but it has restarted03:59
fungiahh, okay04:00
fungii wonder why it ignored the statusbot change04:00
fungiits config definitely says opendev/statusbot changes should be announced to this channel04:01
ianwyeah it tried to send it and got an exception04:01
ianwNov 12 03:50:20 eavesdrop01 docker-gerritbot[2961013]: irc.client.ServerNotConnectedError: Not connected.04:01
fungihuh04:01
fungimaybe it was still starting up?04:02
fungianyway, as things seem under control, i'm going to go back to losing consciousness04:02
ianw++04:03
ianwnope, it's just failed to send again.  it is getting the message04:04
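Whatever the root cause turns out to be, a bot can avoid dropping announcements in the window fungi suspected (sending before the IRC session is registered) by buffering messages and retrying once connected, for example from a handler for the server's 001 welcome reply. A sketch with hypothetical names: `NotConnectedError` stands in for the client library's own exception (here `irc.client.ServerNotConnectedError`), and this is not how gerritbot is written.

```python
import collections


class NotConnectedError(Exception):
    """Stand-in for the client library's 'not connected' error (hypothetical)."""


class BufferedSender:
    """Queue outgoing messages and deliver them in order once sending succeeds."""

    def __init__(self, send_func, maxlen=100):
        self._send = send_func
        # A bounded deque silently drops the oldest message if it overflows.
        self._pending = collections.deque(maxlen=maxlen)

    def send(self, channel, message):
        self._pending.append((channel, message))
        self.flush()

    def flush(self):
        """Try to deliver everything queued; return True if the queue is empty."""
        while self._pending:
            channel, message = self._pending[0]
            try:
                self._send(channel, message)
            except NotConnectedError:
                return False  # keep the message; call flush() again after connecting
            self._pending.popleft()
        return True

    def pending(self):
        return len(self._pending)
```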
opendevreviewIan Wienand proposed opendev/statusbot master: [dnm] testing notifications  https://review.opendev.org/c/opendev/statusbot/+/817695 04:06
ianwi dunno, i just restarted it04:06
opendevreviewMerged opendev/statusbot master: Revert "Add use_ssl option"  https://review.opendev.org/c/opendev/statusbot/+/817694 04:17
ianw^ just promoted; i pulled and restarted statusbot with it manually cause i'm under a bit of time pressure right now.  it has connected04:21
*** pojadhav|sick is now known as pojadhav04:40
*** ysandeep|out is now known as ysandeep06:07
*** gouthamr_ is now known as gouthamr06:19
fricklerslightly related we don't seem to have rDNS records for eavesdrop, is that intentional somehow or just an oversight?06:56
*** ysandeep is now known as ysandeep|lunch08:09
*** diablo_rojo is now known as Guest5692 08:09
*** gibi is now known as giblet08:22
opendevreviewAlfredo Moralejo proposed opendev/system-config master: Enable mirroring of centos stream 9 contents  https://review.opendev.org/c/opendev/system-config/+/817136 08:41
*** ykarel is now known as ykarel|lunch08:45
opendevreviewFabio Verboso proposed openstack/project-config master: Iotronic-pythonclient and Iotronic-UI update. Jobs moved to py38 (set in .zuul.yaml in the project repositories).  https://review.opendev.org/c/openstack/project-config/+/817719 09:09
ianwfrickler: we don't really have an "eavesdrop" any more ... when we moved the services to opendev we moved it to "meetings.opendev.org" so it's a bit less creepy-sounding; the services just run on eavesdrop01 for historical reasons09:17
*** ysandeep|lunch is now known as ysandeep09:23
fricklerianw: well I was talking about the host eavesdrop01.opendev.org, is there a reason for it to not have rDNS?09:24
ianwfrickler: oh, you said rDNS.  indeed that is an oversight, i just fixed that :)09:24
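Reverse DNS is a PTR record under `in-addr.arpa` (IPv4) or `ip6.arpa` (IPv6). The standard library can compute the record name and perform the lookup, which makes for a quick check after adding one. This is a sketch only; the address used in the test is the one from the 03:58 quit message and serves purely as an example input.

```python
import ipaddress
import socket


def ptr_name(address: str) -> str:
    """Name of the PTR record that a reverse lookup queries for this address."""
    return ipaddress.ip_address(address).reverse_pointer


def reverse_lookup(address: str):
    """Hostname the address resolves back to, or None if there is no PTR record."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(address)
    except (socket.herror, socket.gaierror):
        return None
    return hostname
```

Note that `gethostbyaddr` may also consult `/etc/hosts`, so a DNS-only check is better done against the PTR name with a resolver tool.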
ianwjinx :)09:24
frickler^5 09:24
*** chandankumar is now known as raukadah10:05
*** ykarel|lunch is now known as ykarel10:25
*** lbragstad0 is now known as lbragstad11:06
*** ysandeep is now known as ysandeep|afk12:01
*** ykarel_ is now known as ykarel12:32
*** ysandeep|afk is now known as ysandeep12:59
*** jpena|off is now known as jpena13:15
opendevreviewMerged opendev/irc-meetings master: Update policy popup meeting time & details  https://review.opendev.org/c/opendev/irc-meetings/+/817496 13:32
opendevreviewMerged opendev/irc-meetings master: Update Interop meeting details  https://review.opendev.org/c/opendev/irc-meetings/+/817225 13:33
*** pojadhav is now known as pojadhav|afk14:09
*** ykarel is now known as ykarel|away14:20
*** ysandeep is now known as ysandeep|dinner15:49
*** frenzy_friday is now known as frenzyfriday|PTO15:59
*** marios is now known as marios|out16:47
*** ysandeep|dinner is now known as ysandeep16:57
*** ysandeep is now known as ysandeep|out16:58
*** jpena is now known as jpena|off17:41
clarkbfwiw feeling a lot better today. but still going to try and take it easy. I've caught up on some PBR stuff, and next will be looking at some zuul things. Ping me if there is anything else i should be looking at that I've missed over the last couple of days17:44
fungiclarkb: it's mostly been a quiet week other than trying to shake out multi-scheduler bugs, good friday to take it easy if you ask me17:50
*** Guest5508 is now known as melwitt19:26
*** melwitt is now known as Guest5716 19:27
*** Guest5716 is now known as melwitt19:32
*** melwitt is now known as jgwentworth19:34
*** outbrito_ is now known as outbrito19:42
corvusi'm seeing repeated exceptions like this:20:44
corvus2021-11-12 20:39:50,687 ERROR zuul.Scheduler: [e: e3b2e8d75d704c9cb21fbf3954a61297] Exception while removing nodeset from build <Build 6ac065cf12574e0d9434ab215c2cf6fa of neutron-ovn-tempest-slow voting:True> for change <Change 0x7f9c5f5cb9a0 openstack/neutron 805391,13>20:44
corvusbut maybe they've stopped now?  maybe that's a transient problem that was eventually corrected?  possibly instigated by a gate reset?20:47
clarkbwhen it says removing nodeset from build is that the actual nodeset of nodes used to run the jobs or the abstract nodeset definition?20:52
fungithere were a couple of connection reset exceptions today20:58
fungilooks like they were both gerrit.GerritPoller getting disconnected from googlesource so maybe they were working on something20:59
clarkbfungi: there was a big google outage overnight20:59
clarkbaffected youtube and other things as well as gerrit upstream21:00
fungicorvus: were those the "Exception: No job nodeset for ..." exceptions you were looking at?21:00
fungioh, seems at least some of those are secondary, following a zuul.exceptions.MergeFailure21:01
corvusi haven't tracked the issue down yet, but it seems to have stopped.21:05
fungilooks like all the ones i'm finding are changes for openstack/neutron, but different changes (some more than once)21:08
fungioh, though here's one where it wasn't followed by a mergefailure21:09
fungier, wasn't following a mergefailure21:09
fungistill for an openstack/neutron change though21:10
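Tallying which changes the errors hit is a quick way to see the pattern described here. A sketch that counts ERROR lines per (project, change number) using the `<Change 0x... project number,patchset>` text seen in the log line pasted at 20:44; the regex is inferred from that single example, not from a documented Zuul log format.

```python
import collections
import re

# Matches e.g. "<Change 0x7f9c5f5cb9a0 openstack/neutron 805391,13>".
CHANGE_RE = re.compile(
    r"<Change 0x[0-9a-f]+ (?P<project>\S+) (?P<number>\d+),(?P<patchset>\d+)>"
)


def errors_by_change(lines):
    """Count ERROR log lines per (project, change number)."""
    counts = collections.Counter()
    for line in lines:
        if " ERROR " not in line:
            continue
        match = CHANGE_RE.search(line)
        if match:
            counts[(match["project"], match["number"])] += 1
    return counts
```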
corvusit's doing it again... same change interestingly21:33
clarkbcorvus: 805391's parent is 811411 and 811411 had a failure causing jobs to be cancelled for 805391. I guess this is why you were asking about resets?21:44
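Why a failure in 811411 touches 805391: in a dependent (gate) pipeline each item is tested on top of the items ahead of it, so when one fails, the jobs of everything behind it were running against a state that will not merge and have to be cancelled. A toy model of that cancellation step with hypothetical classes; Zuul itself then restarts those jobs without the failed change, which this sketch omits.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueueItem:
    change: str
    running_jobs: List[str] = field(default_factory=list)
    cancelled_jobs: List[str] = field(default_factory=list)


def reset_behind(queue: List[QueueItem], failed_change: str) -> List[str]:
    """Cancel running jobs of every item queued behind failed_change."""
    index = next((i for i, item in enumerate(queue) if item.change == failed_change), None)
    if index is None:
        raise ValueError(f"{failed_change} is not in the queue")
    affected = []
    for item in queue[index + 1:]:
        item.cancelled_jobs.extend(item.running_jobs)
        item.running_jobs.clear()
        affected.append(item.change)
    return affected
```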
corvusyeah... i'm tracking it down now; i suspect something wrong with cancelJobs(prime=True); using the repl now21:44
corvusi expect it to be called with prime=True, but it's not behaving like it is.  i can't tell for certain whether it is called with true or not21:54
corvusoh i totally misread that, it's being called with prime=False21:56
corvusso basically, it's intentional that it would just try to cancel them over and over; i still feel like i'm missing something21:58
corvusah, i think tests.unit.test_scheduler.TestScheduler.test_failing_dependent_changes actually shows the error, it just doesn't fail (for the same reason it's not fatal in prod -- it does work, just not efficiently)22:07
corvusaffects both master and 4.10.4 22:08
corvusokay, before sos this was all protected by a conditional; so i think i see the fix22:11
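The actual fix is not shown in this log. As a generic illustration of the pattern corvus describes (a cleanup step protected by a conditional so that repeated cancellation is harmless), here is a sketch in which `Build`, `Item`, `cancel_jobs` and `release_nodeset` are hypothetical names, not Zuul's real classes or methods.

```python
class Build:
    def __init__(self, name, nodeset):
        self.name = name
        self.nodeset = nodeset  # None once the nodes have been handed back


class Item:
    def __init__(self, builds):
        self.builds = builds


def cancel_jobs(item, release_nodeset):
    """Cancel builds, releasing each build's nodeset at most once."""
    released = 0
    for build in item.builds:
        # The guard: builds whose nodes were already released are skipped,
        # so calling this again is a no-op rather than an error.
        if build.nodeset is None:
            continue
        release_nodeset(build.nodeset)
        build.nodeset = None
        released += 1
    return released
```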
