Tuesday, 2022-08-30

diablo_rojoHello... ummm it seems there is no ptgbot in the #openinfra-events channel? Would someone be able to restart it? 00:02
fungii just spotted the same. looking into it now00:03
diablo_rojolol yeah...00:03
diablo_rojoWhoops00:03
fungithe last thing it logged to its debug log was on 2022-05-2500:04
diablo_rojoOh okay so before the Summit even. 00:04
diablo_rojoHeh00:04
fungithough the process is running and says it was started 2022-05-1300:04
diablo_rojoWhoops00:04
fungiit probably lost contact with the irc server it was connected to and never realized it's been listening on a dead socket since then00:05
fungii'll restart the container00:05
diablo_rojoThank you fungi !00:05
diablo_rojomuch appreciated00:05
fungi#status log Restarted the ptgbot container on eavesdrop01 since it seems to have fallen off the IRC network on 2022-05-25 and never realized it needed to reconnect00:06
opendevstatusfungi: finished logging00:06
fungithe container's equivalent of "have you tried turning it off and on again"00:07
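A minimal sketch of the kind of liveness check that keeps a bot from listening on a dead socket: remember when the server was last heard from, send a PING after a quiet period, and reconnect if nothing arrives for longer still. The class name, the thresholds, and the injectable clock are hypothetical and are not taken from ptgbot.

```python
import time


class LivenessWatchdog:
    """Track inbound traffic so a silently dead connection gets noticed."""

    def __init__(self, ping_after=60.0, dead_after=180.0, clock=time.monotonic):
        if dead_after <= ping_after:
            raise ValueError("dead_after must be larger than ping_after")
        self._ping_after = ping_after
        self._dead_after = dead_after
        self._clock = clock
        self._last_seen = clock()

    def record_activity(self):
        """Call whenever any line (including a PONG) arrives from the server."""
        self._last_seen = self._clock()

    def idle_seconds(self):
        """Seconds since the server was last heard from."""
        return self._clock() - self._last_seen

    def should_ping(self):
        """True once the connection has been quiet long enough to probe it."""
        return self.idle_seconds() >= self._ping_after

    def is_dead(self):
        """True once even the probe window has passed without any traffic."""
        return self.idle_seconds() >= self._dead_after
```

A bot's main loop would call record_activity() on every received line, send a PING when should_ping() is true, and tear down and reconnect when is_dead() is true.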
opendevreviewMerged openstack/project-config master: Match the ansible-lint <6.5 pin from zuul-jobs  https://review.opendev.org/c/openstack/project-config/+/85509800:24
diablo_rojoI maybe now have killed the ptgbot site? :D00:34
fungii can check the apache logs00:37
fungiConnection refused: AH00957: HTTP: attempt to connect to 127.0.0.1:8000 (localhost) failed00:38
diablo_rojoIn my defense, it wasn't faulty json this time lol00:38
diablo_rojoAt least syntactically00:38
fungi2022-08-30 00:31:22,254 ERROR ptgbot.bot: Bot airbag activated: Unusually large message: diablo_rojo: Error loading DB: [Errno Expecting property name enclosed in double quotes]00:41
fungiwhatever it was crashed the process that listens on 8000/tcp, i think00:42
fungiseems like maybe the db needs to be wiped?00:43
fungii guess it managed to partially write something while crashing?00:43
fungii can try restarting it and see if it's really the db contents causing the problem00:44
fungithe ptgbot-web process started up okay after downing and upping the container again00:46
diablo_rojoI cleared the db00:47
diablo_rojoGonna do some local testing before I go loading the bot again. I thought it was a simple json tweak, but I don't want to keep needing you to restart.00:48
diablo_rojoThank you for all your help already fungi !00:48
fungiyou bet. i'll be around for a while longer if you need to test some more00:49
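For the local testing mentioned above, a small sketch of checking a JSON document before handing it to the bot. The helper name and the sample input are hypothetical. The error text is what Python's json module reports when an object key is not in double quotes, which matches the message in the log.

```python
import json


def check_json(text):
    """Return (True, parsed) if text is valid JSON, else (False, error message)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as exc:
        return False, f"{exc.msg} (line {exc.lineno}, column {exc.colno})"


# Single-quoted keys are valid Python but not valid JSON.
ok, result = check_json("{'track': 'example'}")
print(ok, result)
```

Running a check like this before loading catches syntax problems without involving the running bot.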
*** rlandy|bbl is now known as rlandy00:54
*** rlandy is now known as rlandy|out01:09
*** dasm is now known as dasm|off02:00
*** ysandeep|out is now known as ysandeep04:52
*** pojadhav|out is now known as pojadhav|ruck04:59
opendevreviewKe Niu proposed opendev/system-config master: remove unicode prefix from code  https://review.opendev.org/c/opendev/system-config/+/85448705:38
*** ysandeep is now known as ysandeep|afk05:41
*** ysandeep|afk is now known as ysandeep06:02
opendevreviewRafal Lewandowski proposed openstack/diskimage-builder master: added elrepo element  https://review.opendev.org/c/openstack/diskimage-builder/+/85381707:06
*** ysandeep is now known as ysandeep|afk07:36
*** jpena|off is now known as jpena07:36
*** elodilles_pto is now known as elodilles08:06
*** ysandeep|afk is now known as ysandeep09:12
opendevreviewRafal Lewandowski proposed openstack/diskimage-builder master: removed -l from shebang  https://review.opendev.org/c/openstack/diskimage-builder/+/85515409:28
*** soniya29 is now known as soniya29|afk10:26
*** rlandy|out is now known as rlandy10:38
*** ysandeep is now known as ysandeep|break11:27
*** dviroel|out is now known as dviroel11:30
*** soniya29|afk is now known as soniya2911:31
*** ysandeep|break is now known as ysandeep12:20
*** dasm|off is now known as dasm13:46
*** ysandeep is now known as ysandeep|dinner14:39
*** dviroel is now known as dviroel|mtg14:43
*** ysandeep|dinner is now known as ysandeep15:00
*** dviroel|mtg is now known as dviroel15:18
*** artom__ is now known as artom15:34
*** dviroel is now known as dviroel|lunch15:40
fungiseen as a topic branch in the docker-jitsi-meet repo, there was apparently recent work to add support for etherpad's ep_whiteboard plugin. that might be worth looking into15:48
clarkbfungi: related, is there a change to update the meetpad config for the audio stuff yet?15:49
clarkbdoes it make sense to start with just setting the one setting and then reconciling the config deltas afterwards?15:49
fungii was restarting my work on that, which is what caused me to notice the wbo stuff15:50
fungiand yeah, the challenge is that settings have moved from one place to another, and at the same time some things we were setting change to become default15:51
fungiso i'm probably going to try a clean import of the configs first15:51
fungiand then identify what bits we still may want to override15:51
clarkbfungi: on the mm3 update a good canary I've found is that the hyperkitty listing renders properly: https://4b7ee2cce31df7bf2b0a-c162fa8e75cb459a7d10e69223bc94c7.ssl.cf1.rackcdn.com/851248/64/check/system-config-run-lists3/915494e/bridge.openstack.org/screenshots/mm3-opendev-archives.png I think those updates lgtm based on that having lists listed15:52
clarkbafter my current meeting I'm going to hunt down some food, but then I'll look into comparing migrated lists against new list configs to see how they compare and if we should be setting any additional config items15:53
*** ysandeep is now known as ysandeep|out15:56
fungiin theory all the relevant options were mapped over, and at least the incidents list didn't end up with a public archive, which is good15:56
clarkbya I mean for the new lists15:56
fungioh, right-o15:57
clarkbusing migrated lists as a sanity check against our defaults for new lists15:57
fungiyes, agreed15:57
fungiit should be possible to dump the configs and do a sxs comparison15:57
fungithe held node has some of both (i only migrated three lists)15:58
fungii can also try importing some others if there are specific kinds we're interested in, but the three i did were our basic three archetypes (announcements, discussion, private)15:59
*** NeilHanlon_ is now known as NeilHanlon16:03
clarkbyup I was going to compare service-discuss and openstack-discuss to start. And ya the rest api will dump a json representation of the config which we can diff16:04
fungianother option might be to hold a second node and dump the config from the pre-import service-discuss for comparison to post-import16:16
fungior just redo the migration on a new held node and compare before/after configs16:17
clarkbI don't think that is necessary since all of them are created the same way with ansible and the rest api16:32
clarkbso they've all got the same config when created new16:33
*** dviroel|lunch is now known as dviroel16:33
*** jpena is now known as jpena|off16:42
clarkbfungi: on the imported list these are the settings that differ from the default created list: convert_html_to_plaintext, filter_extensions, process_bounces, pass_types16:43
clarkbfungi: on the held node in my homedir you can diff service-discuss2.json and openstack-discuss2.json16:43
clarkbhttps://docs.mailman3.org/projects/mailman/en/latest/src/mailman/handlers/docs/filtering.html#conversion-to-plain-text for the first one16:46
clarkbmaybe we double check lynx is installed in the docker images and then default to that?16:46
clarkbI don't think lynx is installed in the mailman-core image so that may be an image bug16:47
clarkbhttps://docs.mailman3.org/projects/mailman/en/latest/src/mailman/handlers/docs/filtering.html#passing-and-filtering-extensions is the next thing. Filtering that list of file extensions by default seems reasonable16:48
clarkbI'm not sure about process_bounces. Haven't found a good doc for it. I think that is having mailman process the bounces for sent email (and possibly unsubbing people?)16:50
clarkbhttps://docs.mailman3.org/projects/mailman/en/latest/src/mailman/handlers/docs/filtering.html#passing-mime-types for the last thing. That also seems reasonable16:51
clarkbLet me know if you think we should be setting those to match the imported list when we create new lists and I can work on updating our ansible for that16:52
fungiclarkb: yes, turning off process_bounces was a workaround for dmarc-enforcing recipients getting unsubscribed when their mtas rejected messages with broken dkim sigs16:54
fungiwell, not really unsubscribed, but by default their subscriptions get set to non-delivering16:54
clarkbgotcha. Seems like we should turn that off by default then. At least to start that seems like a good safety toggle16:54
clarkbI'll work on a patch16:55
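The comparison of a migrated list's config against a newly created one can be sketched as a key-by-key diff of the two JSON dumps. The function name and the sample values below are illustrative assumptions. Only the setting names come from the discussion, and the real dumps contain many more keys.

```python
import json

MISSING = "<missing>"


def differing_settings(imported, created):
    """Map each setting whose value differs to its (imported, created) pair."""
    keys = sorted(set(imported) | set(created))
    return {
        key: (imported.get(key, MISSING), created.get(key, MISSING))
        for key in keys
        if imported.get(key, MISSING) != created.get(key, MISSING)
    }


# Made-up excerpts standing in for the two dumped configs.
imported_cfg = json.loads('{"process_bounces": false, "filter_extensions": ["exe"], "list_name": "a"}')
created_cfg = json.loads('{"process_bounces": true, "filter_extensions": [], "list_name": "a"}')

for name, (old, new) in differing_settings(imported_cfg, created_cfg).items():
    print(f"{name}: imported={old!r} created={new!r}")
```

Keys that are legitimately list-specific (names, addresses, descriptions) will also show up in a real diff and need to be ignored by eye or filtered out.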
opendevreviewClark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server  https://review.opendev.org/c/opendev/system-config/+/85124817:01
fungiright, if we switch to using some of the active dmarc mitigation mechanisms, we can probably go back to processing bounce messages again17:04
fungithough it not only helps avoid mass-unsub when people send dkim-signed messages to a list, it also protects against mass-unsub events when somebody decides to put the server on a spam blocklist17:05
fungibut on the flip side of that, it means we'll continue sending to more and more invalid addresses over time as people change jobs or delete accounts without fixing their subscriptions17:05
fungiso some mail systems may decide the listserv is spammer-controlled because it's constantly sending to lots of dead addresses17:06
*** efoley_ is now known as efoley17:24
*** pojadhav|ruck is now known as pojadhav|out17:53
clarkbre lynx that might be another vote towards modifying the upstream images since we can layer that in pretty easily18:40
fungigood point18:52
*** dasm is now known as dasm|off19:21
opendevreviewClark Boylan proposed opendev/system-config master: Install refstack with openstack constraints  https://review.opendev.org/c/opendev/system-config/+/85527919:51
clarkbinfra-root ^ refstack is erroring because cryptography 37 removed a method for rsa key verification. I expect that to fix things by pinning cryptography to an older version, but want to make sure that builds properly and check installed versions first19:56
clarkbAnd now lunch, then I'll look at cleaning up that mm3 change a bit20:00
diablo_rojoThank you clarkb !20:01
fungii'll start the openstack-discuss import test by analyzing the amount of data i need to rsync over from prod20:01
fungidf is taking... a while20:15
fungier, du i mean20:16
fungi22G /srv/mailman/openstack20:33
fungiyeeowch20:34
fungibut the held node has plenty of room to spare, so not too concerned. the rsync will just take a while20:34
fungibecause of the container uid/gid issue, the easiest solution is to rsync over to ~ as mailman and then mv the directory into ~mailman/import and chown it to the container-friendly owner20:45
fungibut at least that way if we mount a volume at ~mailman for all the containers, it will be a quick atomic move20:45
clarkbbah no wget and no curl on the python-builder image21:17
clarkbthat's ok I can install it. but I should've checked that first21:18
clarkboh we already install curl. /me changes to that one21:18
opendevreviewClark Boylan proposed opendev/system-config master: Install refstack with openstack constraints  https://review.opendev.org/c/opendev/system-config/+/85527921:20
ianwweird that refstack depends on a rsa key thing from cryptography...?21:22
clarkbianw: it uses pubkey auth for some reason21:29
clarkbI think because the idea was you'd run it from a script in a system updating your results rather than interactively via a web browser21:29
clarkbbut yes21:29
fungirsync is roughly half done21:34
*** lbragstad1 is now known as lbragstad21:35
opendevreviewClark Boylan proposed opendev/system-config master: Add a mailman3 list server  https://review.opendev.org/c/opendev/system-config/+/85124821:47
opendevreviewClark Boylan proposed opendev/system-config master: DNM force mm3 failure to hold the node  https://review.opendev.org/c/opendev/system-config/+/85529221:47
opendevreviewClark Boylan proposed opendev/system-config master: Add a mailman3 list server  https://review.opendev.org/c/opendev/system-config/+/85124821:54
opendevreviewClark Boylan proposed opendev/system-config master: DNM force mm3 failure to hold the node  https://review.opendev.org/c/opendev/system-config/+/85529221:54
clarkbok that restores all the regular CI jobs and adds an infra prod job21:55
clarkband cleans up a number of todos and so on. I think we can call this ready for review now. I'm going to put a hold on 855292 so that we can double check things like the pipermail redirect work as expected21:55
clarkbI'm keeping the old hold in place because fungi is still using it to test the migration of service-discuss21:55
clarkbhttps://review.opendev.org/c/opendev/system-config/+/855279 appears to have installed cryptography 36.0.2 as expected22:00
fungiclarkb: thanks, yep i'm shoving more data to it now in fact22:04
funginearly done copying22:04
clarkbfungi: we should probably pay attention to timing data for the openstack-discuss migration too just to get an idea of how long their downtime will last22:05
clarkbalso I think I might have fixed the auto field warnings.22:05
clarkbYou're supposed to set the path to the class as a string and not the actual class object22:05
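A sketch of the string-versus-class distinction, assuming the warnings are Django's automatic primary key warnings and the setting involved is DEFAULT_AUTO_FIELD. Neither is named explicitly in the log, and the field class chosen here is only an example.

```python
# Settings fragment (assumed location; the log does not say which file was edited).

# Expected form: the dotted import path to the field class, as a string.
DEFAULT_AUTO_FIELD = "django.db.models.AutoField"

# Form to avoid: assigning the class object itself.
# from django.db import models
# DEFAULT_AUTO_FIELD = models.AutoField
```

Django resolves the string to a class at startup, which is why the setting has to be a string and not an imported class object.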
clarkbwow we run a lot of jobs against 851248 when I put them back in. I wonder what I edited to trigger that22:16
clarkbbah something broke. Maybe the auto field thing? we set no log though :/22:21
clarkbI'll be able to check it on the held node soon enough22:22
fungirsync of the openstack site took approximately two hours to complete. for the real migration, i expect we'll seed the copy while everything is running, then do one final rsync once services are stopped in order to minimize downtime from that22:25
clarkb++22:27
clarkbok my held node change didn't fail22:27
clarkbwell it will fail through the forced test failure but it looks like it passed otherwise?22:27
clarkbit's failing on the django admin user creation step which was always flaky before because you have to delay for enough of the db to be ready. I thought I had done that but apparently not22:30
clarkber I thought I had added sufficient checks for the db to be ready22:30
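The general shape of a readiness check like the one described can be sketched as a bounded retry loop. The function name, the attempt count, and the delay are hypothetical, and the actual check in the change is done in Ansible, not Python.

```python
import time


def wait_until(check, attempts=30, delay=2.0, sleep=time.sleep):
    """Call check() until it returns True; return False if attempts run out."""
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            # Only pause when another attempt is still coming.
            sleep(delay)
    return False
```

The check callable would be whatever proves the database is usable, for example that the migrations have finished, and the dependent step runs only once wait_until returns True.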
clarkbI think the auto field thing is working comparing https://zuul.opendev.org/t/openstack/build/915494ed931a4fdeb3794bf82fe6bae8/log/job-output.txt#20442 to https://zuul.opendev.org/t/openstack/build/ba392150076f475d9eae0a90f7efcdca/log/job-output.txt#2071522:33
clarkbfungi: 104.130.4.104 is the newly held node. That one can be used to check the pipermail stuff when you are done with openstack-discuss. But no rush22:34
*** dviroel is now known as dviroel|out22:37
fungii took a sushi break, but am going to start testing the import now (and timing each step)22:40
fungiokay, got everything moved to the right place and ownership applied to the files, import is proceeding22:51
fungii expect this to take a while22:51
fungiactually, the first stage is already completed: real 1m54.675s22:53
fungihopefully the other steps go as quickly22:54
fungithe archive import will probably be the slowest step22:54
fungiit's to the point of displaying the counter22:56
fungiit reported skipping a message22:56
clarkbdid it indicate why?22:57
fungiMySQLdb._exceptions.OperationalError: (1153, "Got a packet bigger than 'max_allowed_packet' bytes")22:57
fungimaybe we need to tune the db?22:57
fungiit's up to 10% imported now and so far just the one skipped message, at least22:58
clarkbinteresting. I wonder if that is a mismatch between client lib expectations and mariadb22:58
opendevreviewClark Boylan proposed opendev/system-config master: Add a mailman3 list server  https://review.opendev.org/c/opendev/system-config/+/85124822:59
opendevreviewClark Boylan proposed opendev/system-config master: DNM force mm3 failure to hold the node  https://review.opendev.org/c/opendev/system-config/+/85529222:59
clarkbThat adds another db is steady state check attempt22:59
clarkbI'm going to rotate out the newer node that I held22:59
fungiarchive import is up to 1/3 complete now23:01
clarkbwow much quicker than I anticipated23:02
clarkbfungi: don't forget to check the size of the xapian index dirs after wards23:02
fungiyep23:03
fungithey were pretty small after the opendev list imports, at least23:03
clarkband then I guess we need to look into that skipped message23:05
fungii can almost guarantee it's because of some massive attachment23:05
clarkbhttps://dba.stackexchange.com/questions/886/changed-max-allowed-packet-and-still-receiving-packet-too-large-error23:06
fungiunfortunately, all it gave me to go on is an opaque gmail message-id23:06
fungiclarkb: yeah, i looked through a few similar posts. basically sounds like we could increase it in the container's my.cnf23:06
clarkbyup23:06
fungiif we want to preserve this one message23:07
fungi(i mean, there might be more, but this is almost 3/4 done already and just the one skip so far)23:07
clarkbI guess that means that the actual mail content is in mysql?23:07
clarkbno more mbox or whatever typical email storage?23:07
fungiseems that way23:07
clarkband then xapian indexes what is in mysql23:08
fungiit might also keep an mbox copy too, i don't know23:08
clarkbya I'm mostly trying to figure out how we estimate storage space needs23:08
clarkbbut I suppose if we deploy with /var/lib/mailman on lvm backed with volumes we can roughly ballpark then adjust later if necessary23:09
fungigrep of the mbox found me the message: https://lists.openstack.org/pipermail/openstack-discuss/2018-November/000114.html23:12
fungi17MiB of attachments, looks like23:15
fungithough the last one is the vast majority of that23:16
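To find other messages that would hit the same limit before an import, one option is to scan the mbox for messages above a size threshold. This is a sketch using Python's standard mailbox module. The function name is hypothetical, and the 16 MiB figure is an assumed packet limit, since the log does not state the configured max_allowed_packet.

```python
import mailbox

ASSUMED_LIMIT_BYTES = 16 * 1024 * 1024  # assumption, not read from the server


def oversized_messages(mbox_path, limit_bytes=ASSUMED_LIMIT_BYTES):
    """Yield (message-id, size in bytes) for each message larger than the limit."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for _key, msg in box.iteritems():
            size = len(msg.as_bytes())
            if size > limit_bytes:
                yield msg.get("Message-ID", "<unknown>"), size
    finally:
        box.close()
```

The serialized message size is only a proxy for what the importer sends to the database, so anything close to the limit is worth a look as well.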
fungireal 18m42.199s23:16
funginot bad for the import23:16
funginow for the reindex23:16
fungiunderway23:17
clarkbI think I'm ok losing that message. However, maybe this indicates new messages with large attachments will also fail. Not sure if that is desirable23:17
fungiIndexing 30183 emails23:20
fungiand yeah, i feel like we should probably adjust the db config and do more test imports23:21
fungiwe should probably perform test imports of all our existing lists on a held node in order to shake out potential gotchas before we migrate them for real23:21
fungijust to reduce the chances of aborting a maintenance window and possibly leaving ourselves with a mess to clean up23:22
fungialso good for getting more accurate timing data for the actual migrations23:22
fungireal 6m41.204s23:30
fungiso wall clock time for all three steps 1m54.675s + 18m42.199s + 6m41.204s = 27m18.078s23:34
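The sum above can be reproduced from the three "real" values with a short script. The helper names are hypothetical, and integer milliseconds are used to avoid floating-point rounding.

```python
import re

_REAL = re.compile(r"^(\d+)m(\d+)\.(\d{3})s$")


def to_ms(real):
    """Convert a `time` style value such as '1m54.675s' to milliseconds."""
    match = _REAL.match(real)
    if match is None:
        raise ValueError(f"unexpected format: {real!r}")
    minutes, seconds, millis = (int(part) for part in match.groups())
    return (minutes * 60 + seconds) * 1000 + millis


def from_ms(total):
    """Format milliseconds back into the same 'XmYY.ZZZs' style."""
    minutes, rest = divmod(total, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{minutes}m{seconds}.{millis:03d}s"


steps = ["1m54.675s", "18m42.199s", "6m41.204s"]
total = from_ms(sum(to_ms(step) for step in steps))
print(total)
```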
fungi1.2G /var/lib/mailman/web-data/fulltext_index23:39
fungi687M /var/lib/mailman/database23:39
fungi1.4G /var/lib/mailman/import/lists.openstack.org/archives/private/openstack-discuss23:40
fungifor comparison23:40
fungiso between the database and the search index, the on-disk size is around a third larger than the original23:42
funginothing worrisome, i don't think23:42
fungiwe only have 46G used on / in prod, much of which is the operating system, so a 100gb volume would be plenty for years to come, but we could do 250gb just to be safe23:44
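The "around a third larger" estimate follows from the three du figures above. A quick check, treating the human-readable sizes as binary units, which is an assumption about how du was invoked; the inputs are already rounded, so the result is approximate.

```python
UNITS_IN_MIB = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}


def to_mib(text):
    """Convert a du -h style size such as '1.2G' or '687M' to MiB."""
    return float(text[:-1]) * UNITS_IN_MIB[text[-1].upper()]


original = to_mib("1.4G")                  # imported pipermail archive
migrated = to_mib("1.2G") + to_mib("687M")  # search index plus database
growth = migrated / original - 1
print(f"{growth:.0%} larger")
```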
clarkbNot too bad23:57
clarkbmy new check for django migrations isn't working and it isn't clear to me why23:57
*** rlandy is now known as rlandy|out23:57
clarkbbah because command doesn't do |23:58
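The remark presumably refers to Ansible's command module, which runs a program directly instead of through a shell, so a pipe character is passed along as an ordinary argument. The same behaviour can be illustrated in Python with subprocess. The helper name and the tiny child programs are hypothetical and exist only to make the example self-contained.

```python
import subprocess
import sys


def run(args, stdin_text=None):
    """Run a program without a shell and return its standard output."""
    done = subprocess.run(args, input=stdin_text, capture_output=True, text=True, check=True)
    return done.stdout


# Child programs: one echoes its arguments, one counts characters on stdin.
echo = [sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))"]
count = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]

# Without a shell, "|" is just another argument to the first program.
literal = run(echo + ["hello", "|", "wc", "-c"])

# A real pipeline has to be wired up explicitly (or run through a shell).
piped = run(count, stdin_text=run(echo + ["hello"]))
```

In Ansible terms the equivalent fix is to use a module that goes through a shell, or to split the pipeline into separate steps.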
