19:01:37 #startmeeting infra
19:01:38 Meeting started Tue May 1 19:01:37 2018 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:39 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:41 The meeting name has been set to 'infra'
19:01:44 o\
19:01:46 * fungi trickles
19:01:52 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:01:59 #topic Announcements
19:02:26 The TC election is now complete and we have new members on the TC. Thank you to fungi and the rest of our election officials for running that
19:02:48 those would be diablo_rojo_phon, tonyb and persia
19:02:57 Congrats and thanks to everyone :D
19:03:10 \o/
19:03:14 yes, went smoothly, and our automation and zuul integration for maintaining the elections is improving with each one
19:03:44 Summit is fast approaching.
19:04:22 I think it will be a fun one for infra related topics, hope to see you there
19:04:51 #topic Actions from last meeting
19:04:57 #link http://eavesdrop.openstack.org/meetings/infra/2018/infra.2018-04-24-19.01.txt minutes from last meeting
19:05:20 We didn't formally record these as actions but cmurphy pabelanger and mordred have all updated specs for the future of infra config management
19:05:37 we'll talk more about that in a moment, but thank you for getting that together
19:05:47 3 humans enter, 3 humans leave?
19:05:57 :)
19:06:18 I'm going to mash specs approval and priority efforts together since ^ is the bulk of it
19:06:20 cmurphy has a bit of a head start. i've seen tons of amazing puppet cleanup changes merging
19:06:33 #topic Priority Efforts
19:06:43 fungi: cmurphy also has a bit of a head start due to just generally speaking being awesome
19:06:53 :)
19:06:53 before we talk about config things any storyboard related content we want to go over first?
19:06:54 well, sure
19:06:56 fungi: ^
19:07:13 the barbican deliverables migrated their task tracking from lp to storyboard on friday
19:07:37 there were also some user guide improvements which landed
19:08:15 and the octavia team provided the sb maintainers with an excellent list of feedback, some of which is already addressed by changes which landed recently or are in flight
19:08:35 not much else i'm aware of over the past week on that front
19:08:40 cool. Any idea if the slow queries have been looked at yet?
19:08:51 no, i don't think they have
19:08:58 uhoh. there are slow queries?
19:09:11 if someone wants to try to zero in on that, it seems to be related to queries against story tags
19:09:26 automatic boards/worklists are hard hit by it
19:09:29 mordred: probably a missing index, your favorite thing :)
19:09:44 \o/
19:09:53 oh, and i don't know if i mentioned last week
19:10:01 but the outreachy intern for sb got approved
19:10:14 cool
19:10:19 fungi: when does that start?
19:10:26 so they'll likely be showing up after their semester finals wrap up, i don't remember exactly when
19:10:32 * fungi checks the sb meeting minutes
19:11:06 fungi, clarkb: well- it seems there are just no relevant indexes on tags at all :)
19:11:25 see, mordred's already solved it! ;)
19:11:46 alright anything else before we discuss the config management specs?
19:12:03 meeting log says "early/mid may"
19:12:06 for the intern starting
19:12:09 cool
19:12:15 woot
19:12:27 #link http://eavesdrop.openstack.org/meetings/storyboard/2018/storyboard.2018-04-25-19.02.log.html#l-35
19:12:33 that's all i have
19:12:40 #topic Modern Config Management
19:12:46 #link https://review.openstack.org/449933 Puppet 4 Infra
19:12:51 #link https://review.openstack.org/469983 Ansible Infra
19:12:59 #link https://review.openstack.org/565550 Containerized Infra
19:13:16 we now have specs for the three ideas that were thrown out in the we need to stop using puppet 3 discussion
19:13:18 PAC Infra!
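[A brief illustration of the "no relevant indexes on tags" diagnosis discussed above. This is a minimal sketch using an in-memory SQLite database with a made-up story_tags table; StoryBoard itself runs on MySQL and its real schema and index names differ, so everything here is hypothetical.]

```python
# Demonstrate how a missing index turns a tag lookup into a full table
# scan, and how adding one changes the query plan to an index search.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical tags table, loosely modeled on the discussion above.
cur.execute("CREATE TABLE story_tags (story_id INTEGER, name TEXT)")

query = "SELECT story_id FROM story_tags WHERE name = ?"

# Without an index, the planner can only scan the whole table.
before = cur.execute("EXPLAIN QUERY PLAN " + query, ("zuul",)).fetchall()
print(before[0][3])  # a SCAN over story_tags

# Index the column the slow queries filter on.
cur.execute("CREATE INDEX ix_story_tags_name ON story_tags (name)")

# The same query now resolves via the index.
after = cur.execute("EXPLAIN QUERY PLAN " + query, ("zuul",)).fetchall()
print(after[0][3])  # a SEARCH using ix_story_tags_name
```

The same before/after check works against MySQL with `EXPLAIN`, which is presumably how the actual slow boards/worklists queries would be confirmed.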
19:13:52 I have not yet had a chance to read through them all, but I think we should all take that as a TODO item: start reading them and discussing those options
19:13:57 ++
19:14:25 We probably don't need to work out every little detail on all three of them; instead we can use the specs as a starting point for making our decision, then refine the selected spec(s) from there
19:15:41 So please everyone, if you are interested in this go read the specs and leave your thoughts. Then as needed we can take specific topics that come up to the mailing list and irc/meeting
19:15:56 pabelanger: cmurphy mordred is there anything specific you think people should look out for?
19:16:43 nothing comes to mind
19:16:48 me either
19:17:04 also no
19:17:18 cool, I'll continue on so that we can have more time for reviewing :) thanks again!
19:17:31 #topic General Topics
19:18:06 clarkb: I do think we should answer the config management question before final vote on ianw's 3rd party ci spec - as I think our answer there might color how we want to deal with that
19:18:15 mordred: noted
19:18:31 maybe I should say that on the review of that spec
19:18:39 mordred: ++
19:19:02 Really quickly before pabelanger's Gerrit host upgrade topic, I wanted to mention that we (mostly corvus) have updated the way zuul handles configuration data internally, drastically reducing memory consumption and cpu requirements to load configs. This is great news for zuul users but we've also noticed a couple bugs in config handling. So if you notice any weird job behavior please do bring it up
19:19:15 ++ that spec is still WIP, pending such discussions
19:19:19 +100
19:19:26 All known bugs with that zuul config update are fixed though
19:19:36 yeah, we've found 3 brown-paper-bag fixes so far :)
19:19:49 Yay for less memory
19:20:00 at least one of those was not actually a regression, just something we hadn't realized was wrong yet, right?
19:20:07 fungi: yup
19:20:26 (the extra layouts on dependent configuration change series)
19:20:58 pabelanger: ok I think the floor is yours, Gerrit host upgrade time
19:21:26 indeed! Mostly wanted to remind people of our replacement of gerrit tomorrow
19:21:47 I don't actually expect it to take too long, and directions are at: https://etherpad.openstack.org/p/review01-xenial-upgrade
19:21:47 #link http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64792&rra_id=all zuul scheduler memory usage
19:22:13 pabelanger: ok infra-root should probably give those a look over today/early tomorrow so we are comfortable with them
19:22:28 we're at line 21.5?
19:22:36 yes
19:22:41 we reviewed those heavily in the context of the -dev server replacement, so hopefully no surprises
19:22:46 * corvus is still not used to the new etherpad line numbering system :)
19:23:02 We still need to reduce the dns ttl, I can do that either later today or first thing in morning
19:23:05 #link https://etherpad.openstack.org/p/review01-xenial-upgrade xenial upgrade for review01.o.o
19:23:29 corvus: yeah, i should check whether the fix for the font sizing/spacing has appeared in a new tag yet
19:23:30 pabelanger: looks like it's only an hour ttl so anytime between now and an hour or so before the change is probably fine
19:23:38 yah
19:24:18 only question is around backups, so if somebody can answer that, it would be great
19:24:18 and do we still have enough rooters around for that? I intend on being here for it (but more recently daughter has picked up a doctor visit at 9am ~4 hours before, which should be enough time)
19:24:22 should line 23.5 say "ssh into review.o.o"?
19:24:31 (no new etherpad tags since the font fix merged, btw, just rechecked)
19:24:51 * fungi laughs heartily at "line 23.5"
19:24:55 corvus: yes
19:25:10 fixing
19:25:20 pabelanger: re backups you'll need to accept the backup server's host key iirc, and if we don't preserve the ssh keypair, add the new one to the backup server
19:25:49 pabelanger: the way to confirm they are working (at least what I've done in the past) is to just run it in foreground in screen
19:25:53 pabelanger: yeah, backups needs to be manually updated, i can take action item for that
19:25:59 ianw: thanks!
19:26:06 great
19:26:10 pabelanger: the link on line 52.5 is for review-dev
19:26:43 yah, looks like it. let me find the correct review
19:30:01 looks like the link is updated now
19:30:32 yah, looks like I never pushed up the new patch
19:31:10 pabelanger: ok so we need a new patch before we are ready?
19:31:54 clarkb: no, https://review.openstack.org/565566/ is the correct patch now
19:32:05 I just forgot to push it up
19:32:05 oh gotcha didn't update the etherpad. cool
19:32:17 oh and it was just pushed got it
19:32:54 alright anything else we want to go over re review.o.o host upgrade?
19:33:10 pabelanger: we should remind the release team directly too
19:33:31 yah, i can do that today
19:33:40 scheduled start time is 20:00 utc, right?
19:33:52 fungi: that is what my email client tells me
19:34:05 yeah, seems i already added a reminder for that to my calendar
19:34:24 also, should we apt-get update review01.o.o and reboot too? It's been online for almost a month, maybe a good idea to pick up the latest kernel before gerrit starts running there?
19:34:35 pabelanger: seems reasonable
19:34:41 yeah, you could update and reboot it now
19:34:48 great idea
19:34:52 k, I'll do that today
19:34:53 pabelanger: ya lets do that
19:35:38 then on the old server side maybe we image it, then delete the actual server, then delete the image after some period? rather than keeping the large instance around for longer?
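[The image-then-delete approach floated above could look roughly like the following with the openstack CLI. This is a sketch, not the exact procedure used: the image name and the server placeholder are made up, and whether the snapshot captures the large ephemeral disk (vs only the ~40g rootfs) depends on the provider, as discussed below.]

```shell
# Snapshot the old, powered-off server into a Glance image.
openstack server image create --name review-old-archive <old-server-uuid>

# Wait for the image to go ACTIVE before touching the instance.
openstack image show review-old-archive -f value -c status

# Delete the large instance so we stop paying for it.
openstack server delete <old-server-uuid>

# After the agreed retention period, drop the archive image too.
openstack image delete review-old-archive
```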
19:35:43 that may be a next week task though
19:35:57 wfm
19:36:20 well, the rootfs is only 40g
19:36:24 but sure
19:36:40 that matches how i've archived some of our other systems i've taken down
19:36:46 it's a 60gb ram instance though
19:37:04 yeah, but the image size will be determined by the rootfs i think
19:37:14 unless they include the ephemeral disk
19:37:26 (which, yes, is massive at 236g)
19:37:51 ya snapshot will be small iirc
19:38:05 alright anything else? otherwise we can have open discussion and maybe end the meeting early
19:38:06 should we snapshot before we poweroff and detach volumes?
19:38:15 may also make sense to go through and remove any old snapshots we don't need for stuff we took down ages ago
19:38:17 because we poweroff to detach volumes in step 1
19:38:30 fungi: ++
19:38:37 pabelanger: I don't think you need to
19:38:41 should be able to snapshot the server after it's down
19:38:47 great
19:38:48 ++
19:39:52 #topic Open Discussion
19:40:06 dmsimard: mentions that 0.15rc1 for ara is a thing
19:40:40 apparently once that release happens we'll want https://review.openstack.org/#/c/558688/ to improve performance of ara as wsgi app
19:41:03 noticed a few issues in our gerrit periodic jobs, so pushed up a patch to clean it up: https://review.openstack.org/564506/ that should also be the last place we are using install-distro-package.sh
19:42:31 #link https://review.openstack.org/564506/ cleanup install-distro-package.sh usage
19:42:58 other than that, fedora-28 is released today. Does anybody have a preference if we create a fedora-latest / fedora-stable nodeset?
The idea would be for jobs to use that, over the fedora-28 label, to make it a little easier to change out this node every 6 months
19:43:20 pabelanger: I like that particularly for fedora since there isn't really long term support available
19:43:23 then send warning to ML when we swap out the label in ozj
19:43:38 seems reasonable to me
19:43:41 but I imagine that it changes enough stuff each release that there will be fallout
19:43:51 others are probably in a better position than me to judge the impact
19:44:14 yah, there is always some fallout
19:44:29 but fedora-latest should minimize the amount of zuul.yaml files we need to change
19:44:34 does fedora stop supporting the last release as soon as the new one is available, or is there some overlap?
19:44:57 it's a 9 month cycle isn't it? so three months after new release old release is done?
19:45:03 last 2 releases are supported, so fedora-27 / fedora-28 now. However, we usually try to only support latest in openstack-infra
19:45:11 mordred: one month after release + 2
19:45:15 are we going to continue to have, say, fedora-27 vs fedora-28 nodes but swap them in a nodeset called fedora-latest?
19:45:17 mostly to keep number of images down
19:45:18 mordred: so roughly every 13 months it goes away
19:45:20 clarkb: ah - neat
19:45:27 ubuntu is 9 months on non lts
19:45:39 that's what I was thinking of
19:45:41 fungi: no, I'd remove fedora-27 once the fedora-latest nodeset is created and jobs updated
19:46:20 and the nodes and images will just be called fedora-latest?
19:46:32 I think we're doing something similar with suse leap too
19:46:54 fungi: I think the nodes will be fedora-28 but the label will be fedora-latest
19:47:08 only nodes, we'd still have the fedora-28 label, which allows us to bring online fedora-29 in 6 months.
Then send a warning that we are switching fedora-latest to fedora-29, then delete the fedora-28 image
19:47:10 it would probably be nice if we could pre-verify devstack before switching
19:47:12 fungi: that way we can transition once we have working images rather than just updating and hoping it works
19:47:18 what clarkb said
19:47:56 ianw: should be able to depends-on a devstack patch with the update-fedora-latest patch
19:48:45 yah
19:49:06 that's convenient enough, i suppose
19:49:24 yep, just as long as it's a part of the process :)
19:49:28 we can have a brief wip window for that config change and ask people who are concerned about their jobs to test against it via depends-on?
19:49:30 still waiting for the AFS mirror to catch up on the fedora-28 release, then we can test the devstack job
19:50:04 fungi: yah, we can document it and send it out to the ML during our cutover warning emails
19:50:39 as long as we've given people some opportunity to notice and fix whatever it breaks for their jobs (however brief) that covers my concerns
19:50:51 maybe also point out common gotchas (if known)
19:50:52 ++
19:52:01 big one in a future release is dropping of python2
19:52:05 but fedora isn't there yet
19:52:13 (I think they said first of 2019 would get that?)
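[The fedora-latest scheme discussed above would boil down to a small Zuul nodeset definition; jobs reference the stable nodeset name while the node label underneath tracks the current release. This fragment is illustrative only and assumes standard Zuul nodeset syntax; it is not the exact change that merged.]

```yaml
# Hypothetical zuul.yaml fragment: jobs use "fedora-latest", while the
# label points at whatever Fedora release is current.
- nodeset:
    name: fedora-latest
    nodes:
      - name: primary
        label: fedora-28

# When fedora-29 images are verified (e.g. with a Depends-On devstack
# patch, per the discussion above), only the label line changes; the
# jobs consuming the nodeset are untouched.
```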
19:52:28 * mordred looks forward to further ditching of python2
19:52:43 *first release of 2019 would get that
19:52:54 so next spring
19:53:00 * dmsimard is back before the end of the meeting \o/
19:53:30 to add on top of what clarkb mentioned, https://review.openstack.org/#/c/558688/ will increase not only ara's wsgi performance but also os_loganalyze
19:53:43 * mordred hands dmsimard a box of mostly not unhappy wild boars he found
19:54:00 fedora-30 I think is the goal to delete python2
19:54:08 I found that the logstash workers hammer logs.o.o pretty hard and that since there was no wsgi app pool set, it was mostly pegging some poor threads
19:54:31 oh, fedora-33 now: https://fedoraproject.org/wiki/FinalizingFedoraSwitchtoPython3#Phase_3:_Maybe_get_rid_of_Python_2
19:54:39 dmsimard: poor apache :/
19:55:17 wow. TIL https://github.com/naftaliharris/tauthon
19:55:42 mordred: spring in which hemisphere? ;)
19:56:49 fungi: fair
19:56:56 alright before we debate the merits of a python2.7 fork I'm calling the meeting :) Thank you everyone!
19:57:06 #endmeeting