Monday, 2021-09-13

clarkbI'm working on an ansible update fwiw00:00
corvusmm logs for opendev site lgtm00:01
corvus++exim00:01
fungiexim started, sending my update now00:02
opendevreviewClark Boylan proposed opendev/system-config master: Add support for Ubuntu Focal to our mailman ansible  https://review.opendev.org/c/opendev/system-config/+/80857000:03
fungiexim logged receipt of my message00:03
clarkbI don't see it in the archive or my inbox yet00:04
fungiyeah, mailman doesn't seem to have picked it up, checking its logs00:05
corvuswhat's the queue id of the msg?00:07
corvusexim q id00:08
fungi1mPZR5-00018k-Q300:08
fungiclaims it was handed off to mailman00:09
corvussame cgi wrapper00:14
corvusexit 0 is going to make exim think it succeeded00:15
corvusprobably need to change all uses of MAILMAN_SITE_DIR to HOST00:15
corvus/var/lib/mailman/mail/mailman is the binary exim calls00:15
fungishould i stop exim?00:15
corvusyeah00:15
corvuswe're bitbucketing incoming messages00:16
fungistopped00:16
clarkbthe other place we set it outside of ansible to create lists is the init scripts for each of the sites in /etc/init.d/mailman-*00:16
corvusand the exim config00:16
clarkbah yup00:17
clarkbI'll work on the ansible side fixup00:17
fungiokay, cleared my schedule for the next little while, we need the per-site configs updated, and the initscripts00:20
fungii guess i should stop all the mailman services as well00:20
fungidoing that now00:21
clarkbfungi: ++00:21
fungiall the mailman services are stopped00:21
fungii'll fix up the local copies of initscripts00:21
clarkbthe exim one is trickier and I'm not entirely sure how to update it00:23
clarkbthe current exim config does the lookup for mm_cfg.py but we want mm_cfg.py to do the lookup00:23
clarkbI think I do something like environment = HOST=${lc:$domain}00:24
clarkbother places we do lc::$domain. Is there a difference?00:24
clarkbcorvus: ^00:25
clarkbfungi: ya that's the exim file that needs to update, I'm not entirely sure of the syntax. I expect you and corvus understand it better00:27
fungiyep, i'm looking00:27
fungiright now the mailman_transport sets00:27
fungienvironment = MAILMAN_SITE_DIR=${lookup{${lc:$domain}}lsearch{/etc/mailman/sites}}00:27
clarkbI think we want something like environment = HOST=${lc:$domain}00:28
fungii agree HOST=${lc:$domain} should suffice00:28
fungiit's really just a subset of MAILMAN_SITE_DIR without the external file mapping00:28
clarkbmm_cfg.py is doing the external file mapping for us now when HOST is set00:29
fungiright, we essentially moved it there00:29
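A minimal sketch of the lookup that was moved out of the exim transport and into mm_cfg.py, assuming the mapping file holds one `hostname: directory` pair per line. The function name `site_dir_for_host`, the example hostnames' directories, and the file format are illustrative assumptions, not the actual mm_cfg.py contents.

```python
import os


def site_dir_for_host(host, mapping_lines):
    """Return the site directory mapped to host, or None if there is no entry.

    The host is lowercased first, mirroring what ${lc:$domain} does on the
    exim side before the value is handed over in the HOST variable.
    """
    wanted = host.strip().lower()
    for line in mapping_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == wanted:
            return value.strip()
    return None


# Hypothetical mapping content; the directories are made up for illustration.
EXAMPLE_SITES = [
    "lists.opendev.org: /srv/mailman/opendev",
    "lists.zuul-ci.org: /srv/mailman/zuul",
]

if __name__ == "__main__":
    # HOST would be set by the exim transport or by the init script.
    host = os.environ.get("HOST", "lists.opendev.org")
    print(site_dir_for_host(host, EXAMPLE_SITES))
```

With this shape, exim and the init scripts only need to export HOST, and the file mapping lives in one place.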
fungii've made that edit on the server as well as adding HOST to the initscripts if we want to start the opendev site up again, start exim, and send another test00:30
clarkbfungi: the only question I had was above in the file we do ${lc::$domain} note the extra : not sure what that difference is00:31
clarkbbut ya I think starting exim4 and opendev services and trying again is probably worthwhile00:32
fungioh, https://www.exim.org/exim-html-current/doc/html/spec_html/ch-string_expansions.html00:34
fungi ${lc:<string>}00:34
fungiThis forces the letters in the string into lower-case00:34
clarkbah we want that00:34
fungii'm not finding any lc:: examples00:34
clarkbfungi: it's in the file you edited00:34
clarkbunless that is for some other file? it is in a different config section in the host vars area of ansible00:34
clarkbI think we want lower case for the lookup to work so that lgtm00:34
fungi ${<op>:<string>}00:35
fungiThe string is first itself expanded, and then the operation specified by <op> is applied to it.00:35
clarkbaha00:35
opendevreviewClark Boylan proposed opendev/system-config master: Add support for Ubuntu Focal to our mailman ansible  https://review.opendev.org/c/opendev/system-config/+/80857000:36
clarkbthat change is trying to keep a running list of the updates necessary00:36
corvusback; 1 sec00:37
corvusexim.conf lgtm00:39
fungiif there are no objections, i'll start mailman-opendev and exim4 and try to send another message00:40
clarkbfungi: none from me00:40
clarkbre 808570 I think what we can do is hold a node that it executed on in zuul then compare all the files to be happy it will do what we want when deployed00:40
clarkbbut that is feeling like a tomorrow activity00:40
fungicool, in that case i'll test again now00:43
fungisent00:45
fungiand received00:45
clarkbI confirm I have received it as well00:45
clarkbhttp://lists.opendev.org/pipermail/service-discuss/2021-September/000282.html and it is in the archive00:45
fungii'll start the others and send similar replies to their discussion lists00:46
corvus\o/00:46
opendevreviewClark Boylan proposed opendev/system-config master: Add support for Ubuntu Focal to our mailman ansible  https://review.opendev.org/c/opendev/system-config/+/80857000:46
clarkbthat fixes an issue with my transcribing from the list server and also adds a small fix for a thing that isn't on the list server but also shouldn't be a problem on the listserver currently00:47
clarkbtomorrow morning I can set it up to hold the test nodes for ^ and then we can cross check similar to when we switched from puppet to ansible00:48
clarkbalso yay00:48
clarkbI'll write these notes on the etherpad00:48
clarkbdo we want to do one more reboot?00:51
clarkbwe had talked about that earlier though it may be ok as is?00:51
fungias soon as i see the last reply get through the list i'll status notice the upgrade to provide closure for those watching the status log, but yeah i think we can save the reboot for after the ansible changes are in00:51
fungifolks can continue fiddling, but i need to call it a night momentarily00:52
fungiand seems like we've gotten it to a safe enough state that we can tackle it with fresh eyes after a bit of sleep00:53
clarkbagreed00:53
fungiyay, lists at all 5 sites distributed my post, so i think we're good00:53
clarkbI'm focusing on writing down my notes in the etherpad so that we can refer to that tomorrow00:53
fungithanks, i started that but then ended up heads-down adjusting files for new fixes00:54
clarkbianw: the one thing that might be good for you to look at is borg on that server. We commented out the cron jobs because borg is in a venv iirc and we changed python under borg00:54
clarkbianw: it's possible that running ansible will automatically fix that for us but I'm not sure. Thought you might have input on that topic in particular00:54
fungiyeah, odds are it's a venv for something like python 3.5 and now we've got 3.8 there00:55
clarkbfungi: I'm marking off the bits on the etherpad for sending the announcement and all that00:55
fungithanks!00:55
ianwclarkb: hrm, i imagine it would be broken; i think probably just rm-ing the venv would get it recreated00:55
ianwi can take a look in a bit00:55
fungi#status log The mailing list services for lists.airshipit.org, lists.opendev.org, lists.openstack.org, lists.starlingx.io, and lists.zuul-ci.org are back in operation once again and successfully delivering messages00:56
clarkbianw: thanks. Note that the server is in the emergency file and we'll keep it there as we need to land 808570 and figure out if we need to disable autoremove of packages on this server too00:56
opendevstatusfungi: finished logging00:56
fungiclarkb: worst case we set a package hold on the kernel packages we don't want it to uninstall00:57
fungithat keeps it from upgrading a package anyway, pretty sure it would also block autoremoval00:57
clarkbfungi: yup I took some notes around that on the etherpad. I think we want to reboot before we remove it from the emergency file00:57
fungioh, or we could just toggle the metadata to manually installed instead of autoinstalled00:57
clarkbas that will tell us if it reliably boots on the new kernel00:57
clarkbthen we can set the new kernel to manual00:58
clarkband have it ignore us with package updates (which will overwrite the decompressed file)00:58
clarkbwe can also do a test boot with a compressed file and see if the chainloader can handle that00:58
clarkbif it can then we're fine00:58
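The two options raised above for protecting kernel packages from autoremoval (a package hold, or flipping the package to manually installed) both map onto `apt-mark` subcommands. The following is a rough sketch only: the helper names and the package name are hypothetical, and it defaults to a dry run so nothing is changed unless explicitly requested.

```python
import subprocess

# apt-mark subcommands relevant to the discussion above.
ALLOWED_ACTIONS = ("hold", "unhold", "manual", "auto")


def apt_mark_command(action, packages):
    """Build the argv list for an apt-mark invocation."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError("unsupported apt-mark action: %s" % action)
    if not packages:
        raise ValueError("at least one package name is required")
    return ["apt-mark", action] + list(packages)


def run_apt_mark(action, packages, dry_run=True):
    """Return the command; execute it only when dry_run is False."""
    cmd = apt_mark_command(action, packages)
    if not dry_run:
        # Needs root on a Debian/Ubuntu host; raises on a non-zero exit.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # "linux-image-example" is a placeholder, not a real package name.
    print(run_apt_mark("manual", ["linux-image-example"]))
    print(run_apt_mark("hold", ["linux-image-example"]))
```

Which of the two is appropriate depends on whether later upgrades of the package should still be allowed: a hold also blocks upgrades, while marking as manual does not.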
clarkbanyway I agree we are in a happy spot and I got my notes written down00:58
clarkbthank you everyone for the help and on a weekend too :/00:58
funginotes on the pad lgtm, thanks for summarizing01:01
clarkbianw: corvus  https://etherpad.opendev.org/p/listserv-inplace-upgrade-testing-2021 is the etherpad if you don't have it. The interesting bits are towards the bottom01:01
clarkbI just added plan to replace the server to the list as well01:02
clarkband with that I should go find dinner. Thanks again everyone!01:03
ianwclarkb / fungi : i've done the simplest thing which is to just re-create the borg virtualenv.  i've run a backup to RAX manually and it worked06:52
ianwThis archive:                6.37 GB              2.64 GB            117.18 MB06:52
ianwlast is dedup size, so that seems reasonable06:53
ianwi've uncommented the cron jobs06:53
*** jpena|off is now known as jpena07:05
*** frenzy_friday is now known as anbanerj|ruck07:06
*** ykarel__ is now known as ykarel07:28
*** jpena is now known as jpena|away07:40
*** ykarel is now known as ykarel|lunch08:03
opendevreviewArx Cruz proposed opendev/elastic-recheck rdo: Add common.js to openstack-health  https://review.opendev.org/c/opendev/elastic-recheck/+/80864408:46
opendevreviewTakashi Kajinami proposed openstack/project-config master: Retire puppet-freezer - Step 3: Remove Project  https://review.opendev.org/c/openstack/project-config/+/80867508:57
*** ykarel|lunch is now known as ykarel09:15
*** jpena|away is now known as jpena09:38
*** ysandeep is now known as ysandeep|brb10:55
*** ysandeep|brb is now known as ysandeep10:59
*** dviroel|out is now known as dviroel11:32
*** jpena is now known as jpena|lunch11:35
*** odyssey4me is now known as Guest712612:12
*** jpena|lunch is now known as jpena12:30
*** hjensas is now known as hjensas|afk12:45
*** tosky is now known as Guest713113:40
*** tosky_ is now known as tosky13:40
*** ykarel is now known as ykarel|away14:36
*** odyssey4me is now known as Guest713515:00
clarkbianw: thanks!15:15
fungiclarkb: see martin's comment on https://review.opendev.org/808479 don't we have a full update mode we can use to correct all of those? or does it happen with new renames but was missed in some older ones?15:24
clarkbfungi: you have to manually run the playbook to fix that up. it's the same playbook but you have to select the "do everything" flag15:25
clarkbfungi: when we've tried to be more aggressive about setting that stuff it causes the playbook to take forever and we have had those api errors from gitea too15:26
fungigot it, is that something we'd be able to update but restrict to specific projects during renames?15:26
clarkbfungi: possibly yes, but it would require updating the code for renames. Currently renames are very specifically just doing the rename bit, but we could add steps to enforce the metadata too15:27
clarkbthat is all reasonably well tested now if you want to give it a go15:27
opendevreviewClark Boylan proposed opendev/system-config master: Add support for Ubuntu Focal to our mailman ansible  https://review.opendev.org/c/opendev/system-config/+/80857015:30
clarkbthat should fix a testing issue with the change15:30
clarkbfungi: ^ would probably be good if you could look over that change and evaluate some of the small deltas I've made compared to what is on the server. In particular the line.startswith(host + ':') check in the mm_cfg.py file and the variable setting in the init script template15:40
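To illustrate why the `line.startswith(host + ':')` form matters, here is a small sketch contrasting it with a bare prefix check. The mapping lines, directories, and helper names are hypothetical and only meant to show the failure mode, not the real mapping file.

```python
def loose_match(line, host):
    # Bare prefix check: also matches any longer hostname sharing the prefix.
    return line.startswith(host)


def strict_match(line, host):
    # The check named in the log: host must be followed directly by ':'.
    return line.startswith(host + ':')


def first_match(lines, host, matcher):
    """Return the value of the first line accepted by matcher, else None."""
    for line in lines:
        if matcher(line, host):
            return line.split(':', 1)[1].strip()
    return None


# Hypothetical entries; the first hostname merely shares a prefix.
LINES = [
    "lists.opendev.org.example: /srv/mailman/example",
    "lists.opendev.org: /srv/mailman/opendev",
]

if __name__ == "__main__":
    print(first_match(LINES, "lists.opendev.org", loose_match))
    print(first_match(LINES, "lists.opendev.org", strict_match))
```

The loose check picks up the wrong entry whenever one hostname is a prefix of another that appears earlier in the file; appending the separator avoids that.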
fungiyep, now that i'm well-rested and caffeinated i should be able to do that16:07
*** odyssey4me is now known as Guest714116:07
clarkbfungi: I am neither of those things, it's a good thing one of us is :)16:08
clarkbit helps if I can spell starlingx too :) new patchset up shortly16:09
opendevreviewClark Boylan proposed opendev/system-config master: Add support for Ubuntu Focal to our mailman ansible  https://review.opendev.org/c/opendev/system-config/+/80857016:10
Clark[m]corvus: ianw left a question https://review.opendev.org/c/zuul/zuul-registry/+/808624 as well16:24
clarkbwow I wrong windows that message :)16:24
*** jpena is now known as jpena|off16:25
corvuslooking over zuul stats, everything seems nominal, except that the zuul event processing times seem longer than normal.  that measures the amount of time it takes for an event once received from gerrit to make it to the scheduler.  i don't think our nodepool changes should have affected this, unless they are just slowing everything down, a lot.  the times still seem spikey, which is typical, as usually the spikes are caused by tenant16:52
corvusreconfiguration events.  looking at the logs, it seems like we have an unusually high number of tenant reconfig events happening right now -- about 3x so far compared to some random days last week.  one potential cause could be larger than usual numbers of zuul.yaml changes due to the release cycle.16:52
corvusin short, the only metric change i've seen appears to be related to a change in usage patterns, so i think i'm happy with the nodepool series.16:53
clarkbthat is a neat observation of behavioral changes and ya release time seems like it would be good for making those changes as bugs in ci are fixed or papered over16:54
fungiand also new branches being created (with zuul configuration on them) and changes being prepped to set up jobs for the next openstack development cycle16:55
*** odyssey4me is now known as Guest714617:00
opendevreviewClark Boylan proposed opendev/system-config master: DNM forcing lists testinfra failure to hold nodes  https://review.opendev.org/c/opendev/system-config/+/80880517:03
fungii guess there's an autohold already for that?17:04
clarkbfungi: I just created one yes17:05
clarkbthen we can compare results between the testnode and prod17:05
fungiclarkb: inline q on 80857017:27
fungithough maybe that codepath is never reached without a multi-site install17:28
fungiahh, yep, it's never hit in that case17:33
clarkbyup it's a bit hierarchical based on the top level role config17:36
clarkbfungi: corvus 149.202.176.57 is the held node we can cross check with lists.openstack.org to ensure that 808570 does what we expect18:02
clarkbI'm going to keep reviewing this zuul stack so I don't lose that context but wanted to point out that job is done and the node is held if others want to take a look18:03
clarkbThen I can probably look at ^ after lunch myself18:03
fungiso far, the weekly memory usage graph suggests focal may require slightly less ram for our mailman sites than xenial did: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=219&rra_id=all18:16
fungithough we probably won't know until it's been running a week or so18:17
*** ysandeep is now known as ysandeep|out18:20
clarkbmy isp is failing to route to review.opendev.org right now :(18:38
clarkbI can hit it from my phone and mtr shows it failing with a last hop in their AS18:39
clarkbis anyone else having trouble as a sanity check?18:39
clarkbok packets get through again and it appears the issue is happening between my ISP and HE at NWAX18:49
clarkbhopefully it doesn't become a persistent issue18:49
fungiipv6 or v4 (or both)?18:52
fungialso, diff with shell subprocess file substitution is a really quick way to compare files between hosts18:56
fungidiff -u <(ssh lists cat /etc/mailman/mm_cfg.py) <(ssh 149.202.176.57 cat /etc/mailman/mm_cfg.py)18:56
clarkbfungi: v4, I don't have ipv6 from this isp (yet, they keep saying it is coming soon)18:59
fungithe slight differences between the files on production and the held test node are ~trivial, looks correct19:00
clarkbthanks for checking. I'm eating lunch now then have to run an errand then will do my own cross checking. The diff with ssh subshells is a neat hack. Last time i did this I checked checksums and then when they didn't match looked directly iirc19:01
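The two cross-checking approaches mentioned here (checksums first, then a direct look at the differences) can be combined. The sketch below is an illustration under assumptions: it works on file contents already fetched as strings, leaves the ssh retrieval out entirely, and the function names and labels are made up.

```python
import difflib
import hashlib


def sha256_of(text):
    """Hex digest of the text, used as a cheap equality check."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def compare_configs(prod_text, held_text, name="mm_cfg.py"):
    """Return '' when the contents match, otherwise a unified diff."""
    if sha256_of(prod_text) == sha256_of(held_text):
        return ""
    diff = difflib.unified_diff(
        prod_text.splitlines(keepends=True),
        held_text.splitlines(keepends=True),
        fromfile="prod/" + name,
        tofile="held/" + name,
    )
    return "".join(diff)


if __name__ == "__main__":
    # Made-up contents standing in for the files on the two hosts.
    prod = "HOST_CHECK = 'loose'\nDEBUG = False\n"
    held = "HOST_CHECK = 'strict'\nDEBUG = False\n"
    print(compare_configs(prod, held) or "files match")
```

An empty result means the digests agree and no diff was computed; anything else is a `diff -u` style listing suitable for a quick review.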
fungithe only file which is a 1:1 match is the exim config, the other local fixes were not entirely 1:1 (for example i didn't get rid of the old envvar, didn't alter the pidfile definitions, wasn't checking for a trailing : after hostnames in the mapping file...)19:03
clarkbfungi: line 641 of exim.conf is the place where we do lc::$domain fwiw20:11
clarkbbut that has been like that and not a difference from our updates here20:11
clarkbI agree the exim config is the same on the two servers20:12
clarkbfor mm_cfg.py the only difference seems to be the ':' append in the line startswith check20:13
fungiagreed20:15
fungii have a feeling exim is simply collapsing the double :: there into a single :20:16
clarkbok I checked exim config, the mm_cfg.py, the init scripts and the apache vhost configs and they all lgtm between the two servers. https://review.opendev.org/c/opendev/system-config/+/808570 is probably ready to land if we can get another set of eyeballs on it20:18
clarkbianw: corvus ^ fyi20:18
corvuslgtm +220:23
clarkbthanks for looking. I'm thinking we land that today, double check it didn't do anything to lists.katacontainers.io unexpectedly (it really shouldn't as all the code there is for vhostd mailman). Then tomorrow followup with running ansible on lists.o.o and rebooting it and holding any packages we might want to hold etc20:27
clarkbfungi: ^ does that make sense to you? If so maybe you can +A it?20:28
clarkbI need to get a meeting agenda sent out and am trying to do some more zuul review (but its going slowly because I'm slow today)20:28
fungiclarkb: sounds great20:31
*** dviroel is now known as dviroel|out20:37
clarkbfungi: while putting together the meeting agenda I'm noting the osf -> openinfra renames are not on the wiki yet20:42
clarkbfungi: and for the inspur/ and osf/ prefixes are all repos getting moved out of there? I wonder if we separately need to remove the orgs from gitea (I don't think any of the rename stuff does that sort of cleanup today)20:54
clarkbfungi: should I +A https://review.opendev.org/c/opendev/system-config/+/808570 or will you?21:08
clarkbfwiw I checked the held lists.kc.io on the test job and I don't see any unexpected leaks of things21:10
clarkbwhich is the only other remaining concern to landing that I think since we have the other host in the emergency file currently21:11
fungiclarkb: i just approved 80857021:43
fungisorry, was doing dinner21:43
fungii'll add the project rename changes to the wiki momentarily21:43
clarkbthanks!21:44
fungirenames list on the meeting agenda page is now updated21:55
*** odyssey4me is now known as Guest716222:18
opendevreviewMerged opendev/system-config master: Add support for Ubuntu Focal to our mailman ansible  https://review.opendev.org/c/opendev/system-config/+/80857022:23
fungiclarkb: ^22:29
fungiso we want to do a test reboot before or after we take it out of the emergency disable list?22:29
clarkbthanks22:29
clarkbfungi: I think we should do test reboots before we take it out of the emergency list because taking it out of the emergency list will potentially delete the old kernels we have22:30
clarkbbut if we can reliably boot the modern kernel that becomes less of a concern22:30
fungisure, should we reboot now or hold off?22:31
clarkbup to you I guess. I'm starting to fade fast and not sure I want to resurrect the server this evening if it needs to be rescued again22:31
fungiat least we know how to recover it fairly quickly if the reboot chokes, but sure we can save that for tomorrow22:31
fungii'm in no hurry there22:31
clarkbya we know the process now :) and my homedir has a bunch of copies of files we can easily move into place22:32
clarkbany idea why 808570 merging is running infra-prod-manage-projects? It should noop but that is unexpected22:49
clarkbok the job to update lists ran for 808570 and lists.o.o was left alone as expected and lists.kc.io continues to seem happy. No HOST configs in any of the files there23:11
clarkbI'm going to take a break now and try to be more well rested for tomorrow and reboot testing and all that23:12
fungisounds great, thanks!23:53
