19:01:06 <clarkb> #startmeeting infra
19:01:06 <openstack> Meeting started Tue Apr 21 19:01:06 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:07 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:09 <openstack> The meeting name has been set to 'infra'
19:01:16 <mordred> o/
19:01:16 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2020-April/000010.html Our Agenda
19:01:19 <ianw> o/
19:01:41 <clarkb> #topic Announcements
19:01:46 <zbr> o/
19:02:00 <clarkb> I wanted to call out here that splitting opendev into its own comms channels seems to be working for getting more people to engage
19:02:09 <AJaeger> o/
19:02:27 <clarkb> welcome! to all those people (not sure if any are here in this channel now but we've seen more traffic on the mailing list)
19:03:00 <fungi> we're up to 80 nicks currently in the #opendev channel
19:03:38 <fungi> (still a far cry from the 250+ in #openstack-infra, but many of those may be zombies for all intents and purposes)
19:04:38 <clarkb> #topic Actions from last meeting
19:04:38 <fungi> also 20 subscribers to service-discuss and 25 to service-announce
19:04:44 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-04-14-19.01.txt minutes from last meeting
19:04:51 <clarkb> there were no actions.
19:04:57 <clarkb> #topic Priority Efforts
19:05:03 <clarkb> #topic Update Config Management
19:05:28 <clarkb> maybe mordred can update on his activity here then ianw?
19:07:16 <clarkb> maybe we lost mordred
19:07:30 <clarkb> my understanding of it is that we've continued to push towards zuul driven CD of things
19:07:48 <clarkb> in particular we are now looking at cleaning up puppetry as and where necessary
19:07:56 <corvus> does anyone know what the status of containerized zuul is?
19:08:01 <mordred> heya - sorry
19:08:07 <mordred> yes!
19:08:23 <mordred> so - three things going on
19:08:26 <corvus> (i'd like to proceed with the tls work, so catching up on that would be helpful for me)
19:09:06 <mordred> first - I'm still working through the followup from the gerrit rollout - next on that list is gerritbot - this led me to eavesdrop which has turned into reorganizing how we run puppet a bit
19:09:18 <mordred> so - sorry for that rabbithole - but I think it'll be worth it
19:09:36 <mordred> https://review.opendev.org/#/q/topic:puppet-apply-jobs
19:09:40 <mordred> that's the topic related to that
19:09:48 <mordred> second and third are nodepool and zuul
19:10:21 <mordred> https://review.opendev.org/#/q/topic:container-zuul
19:10:26 <corvus> (i think that rabbit hole -- getting the puppet jobs down to size -- is great and worth it)
19:10:40 <mordred> nodepool-launcher is ready to go and I think now safe to land: https://review.opendev.org/#/c/720527/
19:10:56 <mordred> it won't restart containers in prod, so I think we can land it then do a manual rolling restart of the launchers
19:11:20 <mordred> if people are happy with what we did there with starting vs. not starting docker-compose ... I can apply the same thing to the zuul patch:
19:11:30 <mordred> https://review.opendev.org/#/c/717620/
19:12:08 <mordred> (issue being we don't necessarily want ansible to run docker-compose up every time it runs - but we DO want that to happen in the gate)
19:12:34 <mordred> I believe once I update that patch with the start boolean - it'll also be ready to go
19:12:42 <mordred> and I think also safe to land
19:12:56 <mordred> but - since that's nodepool and zuul - please review with an eye to "is this safe to land"
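[The gated-start pattern mordred describes - running docker-compose up only when a start boolean is set, so the gate exercises full startup while production restarts stay manual - could look roughly like the sketch below. Task names, paths, and the variable name are illustrative assumptions, not the actual system-config code:]

```
# Hypothetical Ansible sketch of the "start vs. not start" pattern;
# names and paths are assumptions, not the real system-config playbook.
- name: Pull latest container images
  shell: docker-compose pull
  args:
    chdir: /etc/nodepool-launcher-compose/

- name: Start the service with docker-compose
  shell: docker-compose up -d
  args:
    chdir: /etc/nodepool-launcher-compose/
  # Only start automatically where we have opted in (e.g. the gate);
  # production hosts keep this false and get a manual rolling restart.
  when: nodepool_launcher_start | default(false) | bool
```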
19:13:05 <corvus> we probably could start nodepool-launcher every time
19:13:39 <mordred> corvus: maybe we land first with nothing starting - because we have to stop the systemd stuff ...
19:13:42 <corvus> yeah
19:13:49 <corvus> i'm okay with starting conservative there
19:13:50 <mordred> and then land a patch to flip the var on the things where we're happy to do it every time
19:14:04 <corvus> what's the thinking on nodepool builders?  i didn't get the full story yesterday
19:14:18 <fungi> also ianw discovered that debootstrap (used by dib to make debian/ubuntu images) needs a couple of patches to work from a container, so has published a custom build of it in a ppa and confirmed that's working from a container
19:14:19 <corvus> (is nb04 broken? or what?)
19:14:21 <mordred> ianw is further with diagnosing the issue - I think he's got a working build
19:14:25 <fungi> we debated switching to something newer like mmdebstrap, but don't want dib to break for users of older platforms where those newer tools aren't yet shipped as part of the distro
19:14:29 <mordred> but it involves two unlanded merge requests
19:14:44 <mordred> corvus: nb04 is broken for debuntu builds
19:14:48 <mordred> so they have been removed from it
19:14:53 <ianw> yes, a few things in progress
19:14:58 <mordred> oh good - it's ianw
19:15:03 <clarkb> specifically because debootstrap in docker containers explodes the next thing that runs in the container?
19:15:19 <ianw> yes, it likes to unmount /proc
19:15:19 <frickler> https://review.opendev.org/721394
19:15:41 <corvus> it sounds like we can't really run our builders or executors in containers at the moment
19:15:54 <corvus> i'm a little worried that the zuul tls work is starting to collide with this
19:16:13 <corvus> the zuul patch has the executors running outside of containers
19:16:40 <corvus> should we rethink what we're doing with the builders?  or can we get them into a consistent state soon?
19:16:42 <mordred> corvus: well - I think we can get to full ansible
19:16:43 <ianw> i am working to get our dib functional tests converted to building from the container
19:16:57 <fungi> it sounds like we should be able to run builders from containers with a patched debootstrap
19:17:00 <mordred> corvus: which would be the part of the story that would most impact tls work, yes?
19:17:06 <corvus> mordred: yeah
19:17:26 <corvus> so we're looking at having 3 builders run from ansible+puppet, and 1 from ansible+containers?
19:17:35 <mordred> so - yeah - let's give ianw a little bit to see if we can get a solid container story for the builder with patched debootstrap
19:17:37 <corvus> or 3 from just "ansible"
19:17:49 <ianw> please don't forget there is an arm64 builder which has not had a lot of attention, but i would not like to drop
19:17:53 <mordred> I think just ansible if we can't get the container build going
19:18:01 <mordred> ianw: I have thoughts on that - let's come back to arm
19:18:19 <corvus> what kind of time are we talking about there, cause it sounds like ianw is working on a rabbit hole of his own with the container functional testing?
19:18:36 <corvus> basically, we're holding a zuul release on opendev being able to test this stuff
19:18:59 <mordred> well - it sounds like the patched debootstrap works - so now it's about updating testing to prove that it works and make sure we don't regress, yes?
19:19:29 <fungi> (and working out the arm story)
19:19:36 <corvus> so i think we need to either get the system into a place where we can realistically land a coordinated configuration change to the whole system in a day or two, or else sever the dependency between opendev and zuul releases (at least, temporarily)
19:19:57 <mordred> ok. so - there are a couple of options for that
19:20:40 <mordred> we can work on an ansible+pip install (I can work on that right now)- based on the current ansible+docker install and similar to how we did zuul-executors in the zuul patch
19:20:50 <mordred> we'll need focal nodes for them to be new enough
19:21:03 <clarkb> mordred: why do we need focal for that?
19:21:14 <clarkb> everything is pip installed so shouldn't depend on focal?
19:21:22 <mordred> because of the reasons we're using the containers in the first place- the rpm helper tools on bionic are old or missing
19:21:36 <clarkb> oh for builders specifically. Got it
19:21:38 <mordred> yeah
19:21:38 <ianw> mordred: if you mean, just use pip on a plain host to install, i.e. replicating the puppet in ansible, i have a patch that does that
19:21:44 <AJaeger> for focal, we need to merge https://review.opendev.org/#/c/720718/ to mirror it - and stop mirroring trusty
19:21:48 <mordred> I think it's not unreasonable to upload a focal base image
19:22:15 <mordred> yes
19:22:27 <mordred> and that would be good so that we can have integration test jobs
19:22:38 <mordred> but - I think we can work those in parallel
19:22:48 <ianw> also, arm only builds xenial/buster/bionic/centos atm.  we don't need the updated tools which are required for fedora, as of right now
19:22:53 <corvus> fungi: i don't understand your comment in 720718
19:23:06 <mordred> and get a focal base image uploaded to rax-dfw and boot a nb on it that we can use for fedora builds
19:23:09 <corvus> fungi: i don't know what the differences between those two hosts are
19:23:15 <mordred> as ianw says - we only need that for fedora builds
19:23:30 <clarkb> corvus: its a response to my comment
19:23:48 <ianw> corvus: we have not done the work to move reprepro from puppet to ansible yet
19:23:52 <corvus> clarkb: i understand that.  i don't understand how mirror-update.opendev.org and mirror-update.openstack.org are different
19:23:53 <mordred> so we can boot the other ansible-based builders on bionic
19:24:12 <clarkb> corvus: mirror-update.opendev.org is ansible managed and only does rsync based mirror updates currently
19:24:21 <fungi> the opendev.org server is the new one cron jobs are being migrated to, off the older openstack.org server
19:24:27 <clarkb> corvus: mirror-update.openstack.org does all the other mirror updates (reprepro and maybe other tools too)
19:24:45 <corvus> but they're both afs heads?
19:25:03 <fungi> yes, both write into afs
19:25:35 <mordred> I think we should not tie this to reworking anything about how reprepro and old mirror-update works - really just upping the quota and doing a manual release should be fine to get this moving, yes?
19:25:43 <clarkb> mordred: yes
19:25:43 <corvus> sorry, this is proving a distraction.  i still don't understand fungi's comment and the implications, but i'll just follow up later.
19:26:07 <clarkb> basically my comment was calling out that you need to bump the quota and do the manual release
19:26:19 <clarkb> if you do that it should all be fine
19:26:21 <corvus> and fungi said something isn't necessary, but i don't know what.
19:26:35 <mordred> great - so I think tasks would be: get focal mirroring going, get nodepool building focal nodes, build a manual focal-minimal to upload as a base image into rax-dfw, get a pure-ansible port of nodepool-builder
19:26:53 <mordred> most of those can be done in parallel
19:27:04 <fungi> corvus: oh, because mirror-update.opendev.org get vos release run remotely by ansible and uses localauth to avoid timeouts
19:27:08 <corvus> so do we want to switch all of the nb nodes to pure-ansible, retiring the current nb04?
19:27:12 <mordred> I'm happy to take the pure-ansible port since I'm cranking on that stuff- can someone else help drive the mirror update?
19:27:14 <fungi> sorry, i had to page all that back in
19:27:24 <corvus> then make the container switch later after there's lots more testing?
19:27:55 <ianw> mordred: bionic is sufficient to build fedora.  in fact, i already did all of that, let me find the patch
19:27:55 <fungi> my comment was specifically in response to clarkb's
19:27:56 <mordred> yeah - I think that's a sane thing to do for now - although I do think that continuing the container debugging and testing work is important
19:28:05 <mordred> ianw: but not suse
19:28:09 <mordred> ianw: because it doesn't have zypper
19:28:25 <fungi> in response to clarkb's "Its possible this is no longer a concern..."
19:29:23 <mordred> so - I think we should operate under the assumption that having at least one focal node would be beneficial - and that we also might need at least one bionic node because arm. hopefully we can coalesce on only focal once we can prove out that it works fine for arm
19:29:24 <clarkb> mordred: ianw I think the only major risk with the focal plan is focal + arm64. But it sounds like we can maybe keep that on xenial or bionic for a bit longer
19:29:29 <fungi> so i was saying initial vos release timeouts *are* a concern for anything added to mirror-update.openstack.org (like reprepro-based mirroring which is still there for the moment) but not for things mirrored using mirror-update.opendev.org (like rsync-based stuff)
19:29:30 <mordred> yeah
19:29:39 <mordred> I don't think the ansible differences between bionic and focal are likely large
19:29:45 <fungi> corvus: does that answer your question?
19:29:46 <mordred> we don't have big things like systemd vs sysvinit
19:30:28 <corvus> fungi: does that change apply to opendev or openstack?
19:30:57 <corvus> mordred: that sounds reasonable
19:31:02 <fungi> corvus: openSTACK because it's an ubuntu mirror
19:31:14 <fungi> so vos release timeouts are still a concern for that change
19:31:36 <fungi> i probably should have quoted clarkb's comment in my reply, but thought it was obvious what i was replying to (clearly it wasn't, sorry!)
19:32:04 <corvus> well, the part you were referring to with "this" would have been helpful
19:32:13 <mordred> I'm happy to work on the ansible nodepool-builder (which ianw may have already done) and the focal-minimal image in rax-dfw - can someone else drive the steps needed to get the vos release done safely?
19:32:16 <ianw> https://review.opendev.org/#/c/692924/
19:32:20 <mordred> ianw: awesome, thanks!
19:32:28 <clarkb> ianw: ^ does that seem reasonable? I think we can switch to containers on top of ansible without containers easily enough when we have that working reliably
19:33:10 <ianw> tbh this is what i was proposing about 6 months ago :)
19:33:17 <mordred> ianw: although I think a lot of the current code can stay as it is in container-builder - so it'll be about picking out the appropriate things that we need for pip nodepool
19:33:25 <clarkb> mordred: I can start a root screen on mirror-update and grab the lock to get this started
19:33:42 <clarkb> then we need to update the quota, merge the change, and manually trigger the update
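[For reference, the manual steps clarkb outlines map to roughly the following AFS commands. This is an illustrative runbook sketch only - the volume name, mount path, and quota value are placeholders, not the real mirror volume; per fungi's note above, the release is run with -localauth from the fileserver to avoid token timeouts on long releases:]

```
# Illustrative only -- volume name, path, and quota are placeholders.
# Bump the volume quota (run as an AFS admin; quota is in KB):
fs setquota /afs/.openstack.org/mirror/ubuntu -max 500000000

# After the mirror update lands, release the read-write volume to the
# read-only replicas, using -localauth on the server to avoid timeouts:
vos release -v mirror.ubuntu -localauth
```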
19:33:50 <mordred> clarkb: ++
19:34:07 <clarkb> but first I need to load my ssh key
19:34:19 <mordred> ianw: well - I think we would have been golden with the container work you did -- if there wasn't this crazy debootstrap bug :)
19:34:39 <mordred> of all the things to derail this - I wasn't expecting _that_ ;)
19:34:57 <ianw> it's just a lot of uncharted territory all around
19:35:25 <ianw> anyway, i'll keep working on it
19:35:51 <mordred> ianw: cool - and yes - I'd love for us to be able to switch back to pure-containers there
19:35:57 <clarkb> mordred: if we can swing back to gerrit things, we have a third party ci operator that is using devstack-gate and discovered that we are still not replicating to review.o.o/p/ properly
19:36:04 <mordred> although there is also an arm thing we have to solve
19:36:18 <clarkb> mordred: I triggered replication of openstack/requirements (the out of date repo) just in case that was something that didn't get rereplicated after config updates, and saw no change
19:36:19 <mordred> I've got thoughts on the arm thing - it's solvable
19:36:46 <mordred> http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-04-19.log.html#t2020-04-19T15:04:32 <-- basic summary of what we need to do to build arm and x86 images when we build images
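[One common way to produce both x86 and arm64 images from a single build pipeline is docker buildx with QEMU binfmt emulation; whether this matches the approach in mordred's linked summary is an assumption, and the image name below is a placeholder:]

```
# Hypothetical sketch: build and push a multi-arch image with buildx.
# Register QEMU handlers so arm64 stages can run on an x86 builder:
docker run --privileged --rm tonistiigi/binfmt --install arm64
# Create and select a buildx builder instance:
docker buildx create --use --name multiarch
# Build both platforms in one invocation and push a manifest list:
docker buildx build --platform linux/amd64,linux/arm64 \
    -t example.org/opendev/nodepool-builder:latest --push .
```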
19:36:50 <clarkb> I also noticed we have replication things failing for the local /opt/git thing there
19:37:01 <mordred> clarkb: ok - we need to investigate that - it should be fixed
19:37:18 <mordred> can we swing back to that after the meeting?
19:37:55 <clarkb> mordred: yup
19:38:02 <corvus> weird, a quick check shows that checks out for me
19:38:19 <clarkb> corvus: you can clone it but you end up on an older commit aiui
19:39:04 <corvus> clarkb: i'm saying i don't see an older commit
19:39:25 <mordred> oh - I know what it is
19:39:27 <corvus> but anyway, mordred requested we defer this
19:39:55 <mordred> well - I looked anyway
19:40:09 <mordred> the issue is a few missing repos that we created while the mount wasn't properly in place
19:40:19 <mordred> so we didn't actually create them on the real filesystem
19:40:22 <corvus> (my local test case is opendev/system-config, because that's the ancient checkout i noticed the problem with)
19:40:33 <mordred> starlingx/kernel would be one I believe
19:41:32 <mordred> actually - it's owned by root
19:41:54 <mordred> anywho - we can fix that
19:42:06 <mordred> we should figure out if we're running manage-projects as the wrong user
19:42:19 <mordred> and thus creating the local replication target repos as the wrong user
19:42:47 <clarkb> k anything else on this subject? as a time check we have 18 minutes left and a few other things to get to (but this was also a huge chunk of change last week so want to make sure we get through it)
19:43:32 <mordred> I'm good
19:43:40 <fungi> all clear here
19:44:47 <clarkb> #topic OpenDev
19:45:01 <clarkb> As mentioned we seem to be picking up some new traffic which is good
19:45:26 <clarkb> Fungi has proposed that openstack-infra become a SIG and the openstack TC is on board with that
19:45:32 <clarkb> I think it makes sense too
19:46:12 <frickler> why not fold it into qa?
19:46:13 <fungi> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-April/014296.html Forming a Testing and Collaboration Tools (TaCT) SIG
19:46:16 <AJaeger> The TaCT SIG - I love the proposal ;)
19:46:28 <clarkb> frickler: I think gmann is concerned they don't have the knowledge to manage some of the tools officially like that
19:46:37 <fungi> frickler: members/leaders of the openstack qa team have expressed concern over suddenly becoming responsible for "more stuff"
19:46:50 <clarkb> frickler: I think long term that may make sense but for now this gives us the ability to keep it a thing that is more tightly scoped and work with qa team as necessary
19:47:02 <fungi> (also i could see qa shutting down as a team and folding into the same sig eventually)
19:47:17 <frickler> oh, maybe that direction might work, too
19:47:28 <frickler> or split off some things like devstack
19:47:31 <frickler> o.k.
19:48:52 <fungi> we could stand some volunteers to serve as chairs for that sig, if it's something folks are generally in favor of forming
19:49:21 <clarkb> If you'd like to volunteer to be service coordinator for opendev now is the time to do so
19:49:39 <clarkb> I mentioned I'd be happy to continue but also think new involvement is good as well
19:49:49 <fungi> keep in mind that there is little responsibility as a sig chair, mostly just be aware of what's going on generally within the sig and be able to serve as a representative for it
19:50:08 <clarkb> ya we'll need a sig chair as well but that's less involved I expect
19:50:22 <clarkb> #topic General Topics
19:50:28 <clarkb> It is PTG planning time
19:50:58 <clarkb> we have been given some constraints in an effort to have collaboration happen between projects and keep hours sane for attendees
19:51:04 <fungi> i have a link i was going to paste with sig chair responsibilities, but maybe i'll just follow up to that ml post with it
19:51:12 <frickler> so is the vPTG to run completely on meetpad? or something else like zoom or bj?
19:51:16 <fungi> (since we're short on time)
19:51:23 <clarkb> the result of that is a giant ethercalc where we need to sign up for time
19:51:33 <fungi> frickler: not determined yet
19:51:35 <clarkb> frickler: I think that is still being sorted out. I'd like to be able to make meetpad an option
19:51:41 <clarkb> so we should keep pushing on it
19:52:03 <clarkb> from that etherpad I've identified 3 two hour blocks that I think work with our global presence
19:52:04 <fungi> i gather there's communication going out in the next day or two from the event planners to try and nail down requirements for collaboration software
19:52:05 <corvus> i'll see about working out what the deal is with the python version there
19:52:13 <clarkb> corvus: I think mordred pushed a fix for it
19:52:14 <frickler> o.k., but that'd need some big push IMO. I'll try to get some time allocated for that
19:52:18 <corvus> oh awesome
19:52:30 <mordred> corvus: the root cause was fun
19:52:34 <clarkb> Monday 1300-1500 UTC, Monday 2300-0100 UTC, Wednesday 0400-0600 UTC
19:52:40 <mordred> corvus: https://review.opendev.org/#/c/721707/
19:52:49 <clarkb> those are the blocks I think will work so that we can each attend ~2 out of ~3 without too much pain
19:53:31 <clarkb> if there isn't any immediate objection to those blocks I can go ahead and sign us up for them (and tweak later if necessary)
19:54:09 <clarkb> (and yes it will mean an early morning or a late night for many of us if you intend to hit 2 of the 3)
19:54:28 <clarkb> (but it seemed to be an equitable distribution when I wrote out the times in a table)
19:55:12 <frickler> +1 from me
19:55:23 <fungi> yeah, i'll make myself available whenever
19:55:42 <AJaeger> LGTM
19:55:44 <fungi> maybe i can even swing all three with appropriate quantities of caffeine coursing through my veins
19:55:48 <clarkb> cool. I probably won't get to signing up for those times until later today so let me know if there is a major conflict
19:55:57 <clarkb> Next up is the wiki update but I think we can skip it due to time
19:56:00 <clarkb> which takes us to etherpad
19:56:10 <fungi> etherpad is dead, long live etherpad?
19:56:19 <clarkb> as part of mordred's container/ansible/cd work the old etherpad servers are gone including etherpad-dev
19:56:29 <clarkb> we haven't replaced that server and will instead rely on system-config end to end testing
19:56:45 <clarkb> the idea mordred and I had was if we need to verify UI behavior we can hold a test node and use it to manually verify off what zuul built
19:57:02 <clarkb> I expect this will work reasonably well as a tool we can leverage for various services
19:57:10 <mordred> yeah - if we find that sucks - we can always spin up a new etherpad-dev
19:57:18 <clarkb> Wanted to call this out as a separate agenda because if it isn't working well then that feedback would be good to hear
19:57:19 <clarkb> mordred: ++
19:57:46 <clarkb> #topic Open Discussion
19:57:52 <clarkb> alright have a few minutes for any thing else
19:59:26 <clarkb> Sounds like that might be it. Thank you everyone!
19:59:30 <clarkb> #endmeeting