Tuesday, 2021-08-03

*** diablo_rojo__ is now known as diablo_rojo09:02
*** corvus is now known as Guest315411:18
*** Guest3154 is now known as notcorvus14:39
*** notcorvus is now known as corvus14:40
*** corvus is now known as Guest319714:40
*** Guest3197 is now known as corvus14:42
*** corvus is now known as notcorvus14:42
*** notcorvus is now known as corvus14:42
*** corvus is now known as Guest320014:43
*** Guest3200 is now known as corvus15:17
*** corvus is now known as notcorvus15:17
*** notcorvus is now known as corvus15:17
clarkbAnyone else here for the opendev infra team meeting?19:00
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Aug  3 19:01:19 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2021-August/000272.html Our Agenda19:01
ianwo/19:01
fungiohai19:01
clarkbHello19:02
clarkb#topic Announcements19:02
clarkbI had no announcements19:02
clarkb#topic Actions from last meeting19:02
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-07-27-19.01.txt minutes from last meeting19:02
clarkb#action someone write spec to replace Cacti with Prometheus19:02
clarkbtristanC has some prometheus checks in the matrix gerritbot. I'm still hopeful I may have time this week to start a draft of this spec19:03
clarkbconsidering that people want to add this to new software and some of our existing software has prometheus integration (gitea) getting this moving forward seems like a good idea19:03
corvusZuul has it too19:04
clarkbah neat19:04
corvus(For health, not job data)19:05
clarkb#topic Topics19:06
fungithe topics topic!19:06
clarkb#topic Service Coordinator Election19:06
clarkbfungi: I kept it because you enjoy it so much :)19:06
* fungi blushes19:06
clarkbAs mentioned in the email sent out last week, nominations will run for another week. I'm really encouraging someone else to take this on now so I can not do it :)19:06
clarkbfrickler ianw fungi corvus ^ I know we're all busy but if you're interested please feel free to volunteer :)19:07
clarkb#topic Review Upgrade19:08
clarkbI expect that we are close to being able to drop this topic from the meeting agenda, but wanted to keep it for at least this week19:08
clarkbWe just merged a change to bump the review.opendev.org dns record ttl up to one hour from 5 minutes19:08
clarkb#link https://review.opendev.org/c/opendev/system-config/+/803372 Stack of gerrit fixes19:09
clarkbI've also got this stack of changes which should pull in the mariadb and openid fixes that we landed upstream as well as switch us over to using the mariadb connector and then removing the mysql connector from our images19:09
clarkbThey are split out this way because I suspect that we'll want to land the first two. Do a gerrit restart, then after we have happy gerrit for a bit we can land the third19:10
clarkbianw: The other major followup is the old server cleanups?19:10
clarkbI think frickler ack'd the file preservation question. Do we want to set a date for cleaning those up?19:10
ianwyep, i think i can probably just go ahead with that at this point19:11
fungiagreed19:11
clarkbcool and thank you for doing a ton of work to make this happen19:11
clarkb#topic Gerrit User Cleanups19:13
clarkbI put together another set of proposed user cleanups. There are about 103 remaining conflicts. I think I've got 73 proposed for cleanup. That will take us to 30 remaining19:13
clarkbMy goal here is that we'll do the 73 then the 30 will be manageable as a single commit pushed back to All-Users and we won't have to do the more hacky retire and delete external ids thing we have been doing19:14
clarkbI plan to reach out to those ~30 when we get there as well19:14
clarkbfungi has reviewed this list. If anyone has time to take a look that is appreciated. I basically put together an audit file with a bunch of user info and then separately indicate which of those we'll be retiring and cleaning up19:15
clarkbBut I plan to try and run the cleanup scripts against that list tomorrow19:15
clarkb#topic Project Renames19:16
clarkbI'll pull this topic off the agenda for next week but wanted to follow up on how the renames went last week19:16
clarkbThey went really well. Having the testing is great19:16
fungismoothly, i thought19:16
clarkb#link https://review.opendev.org/c/opendev/system-config/+/802922 Need to manage ssh known_hosts for review02 users19:16
fungiyeah, the testing was a huge help19:17
clarkbI think ^ was the major thing that we worked around by hand19:17
fungiand we predicted that ahead of time so it didn't disrupt the maintenance19:17
clarkbif others can review that and indicate the change looks good I'll add the host var data to prod host vars and then I can approve that19:17
fungithe only surprise was that it choked on storyboard-dev being unreachable19:17
clarkbyup19:17
clarkb#link https://review.opendev.org/c/opendev/system-config/+/803266 Further testing improvements19:17
clarkbis another related change which aims to make the testing we've added even more useful19:17
fungithat will hopefully merge in the next hour or so19:18
ianwahh sorry yep i looked at that yesterday19:18
ianwthat doesn't add the host key for giteas?19:18
ianwwas that intentional?19:18
clarkbianw: yes. The idea is we'll do it via private vars.19:18
clarkbianw: the reason for this is the previous patchset tried to add the giteas and the localhost known_hosts but failed because that overrode the testing group vars19:19
clarkbianw: this meant the known hosts entry we had for the testing review was actually the prod review and host key verification failed19:19
clarkbianw: to work around this the plan is to just set those values in the private host vars which I'll do as soon as we think that change is mergeable.19:19
clarkbThe content I'll add is at https://review.opendev.org/c/opendev/system-config/+/802922/6/inventory/service/host_vars/review02.opendev.org.yaml19:20
clarkbianw: if you think a different approach makes sense feel free to leave that in review. There are a few options there but fungi and corvus felt that keeping the pubkey and private key stuff close together in private vars made sense19:21
ianwok, i feel like we could grab the keys from gitea servers directly but i'll rethink on it19:21
clarkb#topic Matrix eavesdrop and gerritbot bots19:22
clarkbcorvus and tristanC have been working to get these bots deployed. Testing has shown a couple of issues. Specifically creating log dirs properly and not using fedora as the base image so that ssh works for gerritbot19:23
clarkbI think fixes for both of those items are on their way in which means I expect we can interact with those bots in the opendev test room19:23
clarkbcorvus: ^ is there anything else to be aware of as the matrix bots get deployed?19:23
corvusNope just waiting to check back19:24
clarkb#topic gitea01 backups19:24
clarkbgitea01 backups are still sad. This is more important now that we have done a project rename. We do have backups to the other host working. If you have to restore gitea database from backup we need to be careful to use the up to date backup19:25
clarkbbackups are listed by date so this should be pretty apparent but calling it out here so that others see it19:25
clarkbOne option available to us is to drop the AAAA record for the backup server in vexxhost. Now that we actually want backups to update I think we should consider this but I know others feel more strongly about ipv6 than I do (I don't have local ipv6 yet :( )19:25
ianwnote we do have daily copies going to rax, so we're not too bad19:26
clarkbyup19:26
ianwi dunno, perhaps email is the next step.  i haven't heard any more on the underlying issue19:27
clarkbthat seems reasonable to me19:27
ianwi'm open to ideas but i feel like i've more or less exhausted my options for fiddling things client side19:27
clarkbyup I think we need to rely on the cloud to correct it. The only thing we can do now is workaround it by dropping the aaaa record19:28
clarkbI'm happy if we start with email19:29
ianwi can do that 19:29
clarkbthanks19:29
clarkb#topic Gitea 1.15.0 upgrade prep19:29
clarkbWhen I did the gitea 1.14 upgrade recently I noticed there was a release candidate for gitea 1.15.19:29
clarkbThis turned into a thing yesterday where 1.15.0 didn't work for a number of reasons19:30
clarkbThe first is nodejs 16 and gitea are no longer compatible despite their release notes saying this is the default nodejs to build with. We have since (just a few minutes ago) landed a change to use nodejs 14 to work around that as 14 is happy with old and new gitea19:31
clarkbNext we've discovered that some of our old web ui based project management for things like project settings updates and renames doesn't necessarily work with new gitea (specifically the renames)19:31
clarkbI took this as a good indication we should rewrite to use the proper rest api as the api does support these actions now19:32
clarkb#link https://review.opendev.org/c/opendev/system-config/+/803366/ Update project renaming and its child19:32
clarkbAgain that should be safe with older and newer gitea19:32
fungii expect we didn't use the api for those in the past because there weren't methods/objects to accomplish that?19:32
ianwoh nice, and gate tested with the new playbooks?19:32
clarkbfungi: yup that is what corvus thought19:32
clarkbianw: yup19:32
clarkbthen at the end of the stack I've placed a WIP change for gitea v1.15.0-rc2 that does a bunch of v1.15.0 specific things around templates and file paths and some config19:33
clarkbI don't want to land the last change until we can do so with a real release. But we should be in good shape to deploy that release when it is ready19:33
clarkbI'm happy I caught this early because there are a number of things that all needed updating to make this work right. Also when we get a real release we can hold nodes and double check it all looks good (I did that with the rc recently and it was mostly ok other than the rename not working)19:34
clarkb#topic Mailman Ansible and Server Upgrades19:35
clarkb#link https://review.opendev.org/c/opendev/system-config/+/803263 fix newlist command19:36
clarkbcorvus: ianw ^ if one of you can review that it would be great. It seems small on the surface but I think it gets deep into how ansible handles commands and how newlist reads prompts19:36
fungithat was a fun subtle "we're not testing *exactly* like production" bug19:36
clarkbotherwise I'd go ahead and approve it19:36
clarkbfungi did test redirecting a file with only a newline in it to newlist on a held test node and that worked19:36
clarkbso we expect it will work but we got this wrong once already so eyeballs are a good thing :)19:37
fungii guess the way we could have caught that was if we'd not used --quiet in the newlist test and instead blackholed delivery in exim19:37
clarkbya I suppose that is still an option but someone that groks exim better than me may want to take a shot at it19:37
clarkbI've also got a todo I need to get back to which is "upgrade the list serve servers"19:38
clarkbIt occurred to me that the lists.katacontainers.io server is a good guinea pig. In part because I expect it to be easier to snapshot and therefore easier to iterate on for testing. But also because if it goes really sideways we can emergency migrate those lists onto lists.openstack.org (long planned anyway) and then figure out why it was so broken19:39
clarkbfungi: ^ I'll probably bug you for a second set of eyeballs as I try to bootstrap that testing19:39
ianwhrm, i wonder if running it under nohup might work too?19:39
fungioh, on a related note, a user approached me privately today to find out why they'd stopped receiving posts from one of the lists, and in researching i discovered it was because the ipv6 address for lists.o.o ended up on the spamhaus xbl. as of ~16:30 deliveries were working to the problem destinations again19:39
clarkbianw: ya that might work too. We went with the ansible version to avoid needing to use the shell module, but nohup should do that as well19:40
clarkbif we feel strongly about it I expect fungi can test newlist on the test server again using nohup19:41
fungirough timeframe on the xbl listing is it likely got added around july 2119:41
clarkbfungi: fun. I guess I solve that problem by not having aaaa records on my mailservers19:42
clarkb#topic Open Discussion19:43
clarkbThat was what I had on the agenda.19:43
clarkbApologies for the ton of various docker image and gitea and gerrit and so on changes19:44
ianwi'm hopeful i can remove debian-stretch soon19:44
clarkbI just started looking at this as I did gitea things and the thread was very long. I think it does get us to a better spot overall once we get through them19:44
clarkbianw: jayf noticed the fedora 32 mirror removal. dib functests stopped working19:44
corvuswe can configure exim to not use ipv6 for outgoing if we want19:44
ianwyeah, mea culpa for not updating that19:44
ianw#link https://review.opendev.org/c/zuul/zuul-jobs/+/80298119:44
clarkbianw: https://review.opendev.org/c/openstack/diskimage-builder/+/799341 that apparently fixes it which I'll review after the meeting19:45
fungii have working ipv6 on my mta and spamassassin was logging that the RCVD_IN_XBL test was matching on messages from the lists, but it wasn't subtracting enough from the score to send them to my spam inbox19:45
fungiso i didn't spot it19:45
ianwremoves centos-8-stream from using centos-8 wheels19:45
ianwi'm working on getting centos-8-wheels built; it basically works19:45
ianw#link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/80298819:45
ianwi just need to get the publish jobs working19:45
ianwi have added some volumes19:46
clarkbianw: will that change fix it? I think we still want to use the pypi mirrors but not the wheel19:46
ianwfungi: if you could confirm on https://review.opendev.org/c/opendev/base-jobs/+/802639 removal is the plan that would be great19:46
clarkbianw: that role seems to only do the distro mirrors though19:47
ianwclarkb: my intention was to turn off just the wheel bits but i may have missed the mark 19:47
clarkbianw: ya the tasks/mirror.yaml top level tasks lists does the pypi mirror config19:48
fungican do19:48
clarkbianw: ianw  left a note19:48
ianwok will cycle back19:48
clarkbtab completion fail19:49
ianwoh and clarkb you mentioned backup pruning too?19:49
clarkbianw: oh ya I mentioned it yesterday since we're getting the emails warning us of disk space19:49
clarkbianw: did that get run?19:49
clarkbseems we still got the warning email overnight19:50
ianwno i didn't yet.  how about i clean up review-test etc. and then i'll do that19:50
clarkbsounds good thanks19:50
clarkbsounds like that may be it. As always feel free to bring up discussion on IRC or on the mailing list19:51
clarkbwe aren't limited to this hour block19:51
clarkbThanks everyone!19:52
clarkb#endmeeting19:52
opendevmeetMeeting ended Tue Aug  3 19:52:03 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:52
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2021/infra.2021-08-03-19.01.html19:52
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-08-03-19.01.txt19:52
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2021/infra.2021-08-03-19.01.log.html19:52
fungithanks clarkb!19:53
