19:01:29 #startmeeting infra
19:01:29 Meeting started Tue Dec 12 19:01:29 2017 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:31 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:33 The meeting name has been set to 'infra'
19:01:41 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:02:01 #topic Announcements
19:02:03 o/
19:02:15 o/
19:02:26 It is now official: dmsimard and frickler are infra root
19:02:32 \o
19:02:48 welcome to the jungle!
19:03:04 yay! try not to delete the backup server!
19:03:11 the assistance is most heartily welcomed
19:03:11 lolo
19:03:17 o/
19:03:24 at least, not by accident
19:03:53 Other than that, it is the time of year when those of us in the northern hemisphere end up sick and everyone is randomly on vacation or otherwise not working
19:04:09 mordred and I apparently have the plague (though I am getting over it finally)
19:04:17 so ya, stay healthy
19:04:22 +1
19:04:27 #topic Actions from last meeting
19:04:30 plenty of limes in my grog
19:04:38 #action fungi delete jenkins account -1 verify votes from open changes in gerrit and announce it
19:04:42 #link http://eavesdrop.openstack.org/meetings/infra/2017/infra.2017-12-05-19.01.txt Minutes from last meeting
19:04:42 i haven't done that yet. wanted to find a quiet time due to the number of code review events that's going to generate, and ended up busy with other things over the weekend
19:04:56 fungi: it also hasn't been very quiet when it's not the weekend
19:05:10 now, for example, would not be a great time ;)
19:05:38 #topic Specs approval
19:05:46 #link https://review.openstack.org/#/c/524024/ Top level project hosting
19:06:14 I've now reviewed this spec a few times. I think it is ready to be put up for council voting. I think others have managed to take a look as well
19:06:22 What do we think? ready to push forward?
19:06:33 i think so
19:06:37 as do i
19:06:59 +1
19:07:09 I haven't looked at the latest, but I think there is momentum around it
19:07:15 plan to look again after the meeting
19:07:53 fyi, i just put the mailman multi-site patches up for review
19:07:54 the diff from ps1 is pretty small
19:08:52 there is opportunity for other folks to jump in as well (there are 4 services this addresses)
19:09:02 jeblair: we are aiming to get it done before PTG right?
19:09:09 I'm ok with the proposal, I can add my +1
19:09:37 yes, i would like to use this for zuul before we release 3.0, which i want to do before the ptg
19:09:38 ok, let's put it up for council vote then. I'll look over reviews probably thursday afternoon
19:09:47 wfm
19:09:59 I just put up another spec if we want to share it for this topic
19:10:02 so that's driving my enthusiasm for quick implementation :)
19:10:16 jeblair: i suppose the goal being that we can have the updated urls/addresses for docs, git repo and mailing list?
19:10:41 fungi: yep, so we're not wasting a marketing opportunity
19:10:44 included in the zuul 3.0.0 release announcement i mean
19:10:52 ya, I think you don't want to tell everyone to look somewhere new shortly after a major release
19:10:53 (and in its docs and whatnot)
19:10:58 best to get that all lined up along with the release
19:11:24 also, it will help convince folks we're not solely openstack-focused
19:11:34 which is a big line item on the 3.0 release
19:11:55 +1
19:12:05 "zuul: it's not just for openstack any more"
19:12:13 also, since lists.katacontainers.io exists, we can fold that back into the one true listserver after this lands
19:12:14 there is no openstack, only zuul
19:12:14 (not that it particularly was before)
19:12:44 Yah, ptg is coming up fast, so agree we should vote
19:13:15 #agreed https://review.openstack.org/#/c/524024/ up for council vote this week. Please vote by Thursday afternoon Pacific Time
19:13:35 dmsimard: did you intend to rollcall +1 that spec?
19:13:44 can I do that ?
19:13:49 or is it just TC ?
19:13:54 infra council
19:14:05 I mean, I see I have the permissions to do that, but I didn't think I would -- okay, I will.
19:14:10 if you see the option to use it, then yes you can use it ;)
19:14:20 your membership in infraroot is what gives you that, and yes you can use it
19:14:30 * dmsimard still learning
19:14:37 dmsimard, frickler: https://docs.openstack.org/infra/system-config/project.html#teams
19:15:12 I didn't even know the infrastructure council was a thing, TIL.
19:15:47 it's essentially our group decision making process
19:16:10 hopefully just the right amount of structure without being too much bureaucracy
19:16:19 wfm
19:16:21 are there any other details/concerns about this spec and/or effort that we want to talk about here?
19:16:32 (otherwise I think we should keep moving because there are things to talk about!)
19:16:45 clarkb: eot from me
19:16:57 #topic Priority Efforts
19:17:05 #topic Zuul v3
19:17:26 Couple of things on the zuulv3 front off the top of my head: the new dashboard has been deployed. It is shiny
19:17:34 But also we are currently suffering from a memory leak
19:17:58 (probably don't want to debug the memory leak here, but worth communicating it is a problem we know about)
19:18:20 debugging is happening in #openstack-infra-incident
19:19:11 GitPython got a release
19:19:19 * fungi cheers
19:19:22 Command socket changes are happening, as is the finger server work
19:19:35 Seems like a steady amount of progress
19:19:54 yep, chugging along
19:20:46 in yesterday's meeting we also confirmed some initial direction for having a usable but minimal base job in the stdlib
19:20:48 sorry for being late
19:21:27 also confirmed with leifmadsen that the zuul quickstart docs are in progress. going to sync up and help with testing them
19:21:36 pabelanger: are there patches yet?
19:22:05 clarkb: just an etherpad, let me get the url
19:22:17 https://etherpad.openstack.org/p/zuulv3-quickstart
19:22:28 I was going to see about a first pass and moving that into RST format
19:24:03 ok, one question. As a non-user of the red hat distros, would it make more sense to provide a distro-agnostic guide? install zk from a tarball, run zuul from a virtualenv sort of thing?
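[Editor's note: a minimal Ansible sketch of the distro-agnostic approach floated in the question above (ZooKeeper from a tarball, Zuul from source in a virtualenv). This is not the content of the quickstart etherpad; the ZooKeeper version, paths and repository URL are placeholders.]

    # Hedged sketch only; versions, paths and URLs are placeholders,
    # not the actual quickstart guide being drafted.
    - hosts: zuul
      become: true
      tasks:
        - name: Unpack a ZooKeeper release tarball (no distro packages needed)
          unarchive:
            src: https://archive.apache.org/dist/zookeeper/zookeeper-3.4.11/zookeeper-3.4.11.tar.gz
            dest: /opt
            remote_src: true

        - name: Install Zuul from source into a virtualenv
          pip:
            name: git+https://git.openstack.org/openstack-infra/zuul
            virtualenv: /opt/zuul-env
            virtualenv_python: python3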
19:24:11 I guess we don't have to sort that out here and now
19:24:27 clarkb: yah, I think we could
19:24:38 that way lies the trap of the openstack install guides ;)
19:24:55 i don't think the distro part of it is that big
19:25:01 yeah ;( Been there and had the fun ;(
19:25:27 granted, i don't have a good answer for the balance between lots of distro-specific install guides and trying to make distro-less install instructions
19:25:38 the zuul/nodepool bits will all be from source
19:25:52 yah
19:25:58 that doesn't leave much distro-specific stuff
19:26:06 the distro part is mostly around whether you want to install packaged versions of things
19:26:08 fedora has bwrap by default, xenial needs an external PPA atm
19:26:09 like zookeeper, etc
19:26:16 we can have distro-specific initial paragraphs - if somebody provides them. Like: install packages A, B, C
19:26:17 jeblair: ya, I think the thing that made it clunky to me is that for e.g. centos you already have to break out of the distro for zk
19:26:24 so maybe just do that in the base docs
19:26:32 or create a bindep.txt file :D
19:27:21 maybe that specific section could be "install from tarball"; if you don't have a distro package for it then it applies, but details like that aren't important for now :)
19:28:46 sure
19:29:07 sure, i suppose we don't need the guide to cover "how to install zookeeper", we can just link to their documentation
19:29:20 fungi: yah, good point
19:29:38 yes - we're targeting knowledgeable admins
19:30:16 no doubt the zk docs have ample detail on how to get up and running on a variety of distros anyway
19:30:21 ... who might be more interested in reading puppet/ansible anyway?
19:30:49 who knows, maybe this'll fork out to windmill and/or the nodepool/zuul roles anyway
19:30:51 :D
19:31:19 ya, I think the thing I would try to avoid is having so much detail that we end up with 10 ways to zookeeper
19:31:25 (currently we have 2)
19:31:57 any other zuul-related topics people want to discuss?
19:32:30 yes
19:32:44 some folks are confused by the change to zuul.projects
19:33:02 and I failed to find an announcement for that, do we have one or is that still pending?
19:33:02 frickler: from a list to a dict?
19:33:06 clarkb: yep
19:33:40 it's already out there, what did i miss?
19:33:48 some jobs are currently failing since it was activated last week
19:33:48 i thought we updated users
19:34:16 there was a case earlier today in -infra
19:34:31 sounds like we thought we updated all the uses of it but may have missed some?
19:34:46 why don't we just use codesearch to find the remaining cases and fix them?
19:34:55 ianw: possible we missed a few cases.. after all, we reverted like twice and during that time span people may have used it
19:34:58 http://logs.openstack.org/59/527059/1/check/ansible-role-k8s-glance-kubernetes-centos/657411d/job-output.txt.gz#_2017-12-12_06_16_17_847993
19:35:18 flaper87 wanted to fix that ^
19:35:27 yep ... i can take an action item to double check things
19:36:41 hrm, that project doesn't show up in codesearch
19:36:43 ianw: frickler: and maybe send a note to the dev list saying it's due to a change and we are working to correct occurrences of it; let us know if you find them?
19:37:07 ok, if we think there's more than one?
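[Editor's note: an illustration of the zuul.projects change being discussed, not taken from the meeting. Before the change, zuul.projects was a list that job playbooks looped over directly; afterwards it is a dictionary keyed by canonical project name, so loops need .values() and direct lookups use the key. The field name src_dir and the example key reflect my reading of the Zuul job-variable docs and should be verified there.]

    # Before: zuul.projects was a list, so this worked in a job playbook
    - name: List source directories of required projects
      debug:
        msg: "{{ item.src_dir }}"
      with_items: "{{ zuul.projects }}"

    # After: zuul.projects is a dict keyed by canonical project name
    - name: List source directories of required projects
      debug:
        msg: "{{ item.src_dir }}"
      with_items: "{{ zuul.projects.values() | list }}"

    # Direct lookup by canonical name is now also possible
    - debug:
        msg: "{{ zuul.projects['git.openstack.org/openstack/nova'].src_dir }}"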
19:37:16 there's likely to continue to be people who have pending changes which reintroduce the old structure, so it will take time for them to catch on and fix their changes too
19:37:19 jeblair: possible codesearch is stuck, last time that happened puppet hadn't run or wasn't updating the list of projects properly
19:37:20 i'm not seeing that in codesearch, which is why i'm guessing i didn't fix it :/
19:37:34 pabelanger fixed it iirc
19:37:55 we need to stop/start codesearch to pick up new projects
19:37:58 it is a manual process
19:38:13 I guess step one is restart the service then
19:38:14 pabelanger: that's it? just restart? We could set up a cron to run daily or something ..
19:38:20 then requery and see what we missed
19:38:20 ? that's the first i've heard of that ... that seems like something we should puppet
19:38:23 yes, hound doesn't have a mechanism for adding or removing repos to index on the fly
19:38:36 ianw: it was puppeted in the past but that resulted in the service always being down iirc
19:38:52 dmsimard: right, we've never picked a day for an outage. Takes about 5-10 mins to reindex
19:38:57 yeah, because it takes so long to reindex everything when starting
19:39:06 probably more than that these days
19:39:24 let's time it this time around; then we'll have info on what might work for automating it
19:39:27 well, also the puppeting was broken and it kept restarting the service whether or not there were new projects to add
19:39:30 (like pick a time for it once a day or something)
19:39:40 I guess that is what dmsimard said
19:40:00 if it just replaced the webpage with "indexing ... give me a minute" i think that would be enough
19:40:04 rather than just disappearing
19:40:43 i will volunteer to 1) reindex 2) look into .projects failures 3) propose some approach to automated reindexing if we like
19:40:52 cool, sounds like a plan
19:40:55 that node is to be upgraded to xenial anyway, isn't it?
19:41:01 frickler: yes
19:41:32 let's go with that plan; the upgrade will happen somewhere in there, but one doesn't really block the other
19:41:42 good point, i can try bringing it up as xenial first ... if it's going ok, then maybe switching is easy
19:42:04 but if it gets stuck we can use the existing one
19:42:31 this sets up the next topic nicely though, which we should get into before we run out of time
19:42:34 #topic General topics
19:42:58 We are upgrading control plane servers in a virtual sprint this week
19:43:15 if you go over xenial updates, please also check whether we have dead repos or files in system-config. Let's retire and clean up.
19:43:25 I think things have gone reasonably well but we've also hit a few speedbumps.
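[Editor's note: one possible shape of the "restart once a day" idea from the codesearch discussion above, expressed as an Ansible cron task since hound only reads its project list at startup. The service name, user and time are assumptions; the actual approach was left as an action item for ianw.]

    # Hedged sketch; assumes codesearch runs as an init service named "hound".
    # A daily restart picks up newly added projects, at the cost of a few
    # minutes of reindexing downtime at a quiet hour.
    - name: Restart hound daily so newly created projects get indexed
      cron:
        name: restart hound to pick up new projects
        user: root
        hour: "6"
        minute: "0"
        job: service hound restart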
19:43:30 My small cleanup: https://review.openstack.org/#/q/topic:retire-puppet-apps - retire puppet-apps_site
19:43:56 There is a bit of boilerplate necessary to add digits to server fqdns, for example
19:44:04 I'm trying to figure out a generic playbook (which might end up living in a role) to re-install things, especially considering 18.04 is coming soon: https://review.openstack.org/#/c/527301/ I'll work on it some more and let you know when I feel it's ready for review, but feel free to leave comments etc
19:44:33 AJaeger: maybe put that under the infra-xenial topic if you like, i'm checking that regularly
19:44:37 dmsimard: I'm worried that we'll need service-specific upgrade roles/playbooks just to cover all the bases
19:44:48 i'd be wary of automating reinstallation given we have services relying on remote databases, with data on cinder volumes, et cetera
19:44:53 clarkb: yeah, that's why it might end up forking out to different roles
19:44:56 dmsimard: because we run such a diverse set of services there isn't always consistency in ... ya, what fungi said
19:45:06 it is possible we could do this in the post pipeline now, and only trigger when we add new projects
19:45:16 ianw: ^
19:45:17 re-install what?
19:45:19 clarkb: shall I follow ianw's suggestion to change the topic?
19:45:29 fungi: the goal is mostly to target the "easy" ones, like logstash-workers.
19:45:43 There's a lot of them and it's not very hard to automate.
19:45:56 AJaeger: ya, that works for me, especially since it is related
19:46:09 I understand there's maybe 50% of "pets" we might not be able to automate, but if we can automate the 50% of cattle, it'll save a good amount of time
19:46:22 http://git.openstack.org/cgit/openstack/ansible-role-cloud-launcher also has logic to control server creation, we just never finished off running puppet after the server was launched
19:46:39 clarkb, ianw: will do
19:46:45 well, also a lot of the work this time around, like clarkb noted, is switching from our old model of replacing servers with more of the same name to having numbered servers
19:46:57 that won't have to be redone in the future (in theory)
19:47:41 AJaeger: do you know the status of install_tox.sh removal in projects?
19:47:44 i think anything's automatable ... the question for me is if you spend more time automating something than actually using the automation. i do think there's diminishing returns
19:48:29 pabelanger: all the ones that can be done, are done ;) Meaning: repos that need to install other repos, like horizon or neutron, are not done yet. We don't have a solution for them so far.
19:48:39 ianw: cue xkcd :D
19:48:45 Given what we've gone through the last couple of days or so, are we finding this time is valuable (new roots finding it useful)?
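[Editor's note: a rough sketch of what automating the "easy 50%" could look like for cattle-style servers such as the logstash workers; this is not the content of https://review.openstack.org/#/c/527301/. The cloud name, image and flavor are hypothetical, and DNS, volumes and puppet wiring are deliberately omitted, which is exactly the pets-vs-cattle caveat raised above.]

    # Hedged sketch with hypothetical names; not dmsimard's actual change.
    - hosts: localhost
      tasks:
        - name: Launch a replacement logstash worker on the new series
          os_server:
            cloud: openstackci
            name: logstash-worker01.openstack.org
            image: Ubuntu 16.04 LTS
            flavor: 8GB Standard
            key_name: infra-root
            state: present
          # After this, the existing launch-node style steps (puppet cert,
          # DNS records, adding the host to the inventory) would still apply.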
19:48:52 dmsimard: https://xkcd.com/1205/
19:48:58 I've sort of been all over the place playing support, so want to make sure that I'm not missing anything important
19:49:05 fungi: oh, I had another one in mind
19:49:17 ahh
19:49:27 clarkb: I think so, we've had new people creating servers
19:49:34 fungi: https://xkcd.com/1319/
19:49:53 dmsimard: yes, also appropriate
19:50:02 pabelanger: https://review.openstack.org/#/q/status:open++topic:rm-tox_install shows open ones - and we currently have 84 tox_install.sh files still in tree (including the open ones)
19:50:07 clarkb: I'm learning a bunch so it's been a great experience for me
19:50:14 definitely a good way to ramp up new roots imo
19:50:23 pabelanger: so, those need some more work - mordred wanted to find a solution, help is welcome.
19:50:41 heh, i mean you should see the makefile for my thesis. it's a work of make art ... built on a solid foundation of procrastination
19:50:45 AJaeger: thanks, will look
19:51:31 anything else for general topics?
19:51:42 ok, I think there was some concern the boilerplate stuff was a lot of effort for minimal return and that we might consider a different approach. I am glad we decided to do this early and not wait until next month, as I think we are all learning things :)
19:52:16 +1
19:52:48 The last general topic I wanted to bring up was project renames and possibly trying ianw's fix for nova-specs
19:53:16 maybe what I'll do is send out an email and poll for when is a good day to try and do that. Mordred had at least one additional project rename that isn't listed on the meeting agenda yet iirc
19:53:55 sounds good
19:54:11 do we have a procedure worked out now?
19:54:32 ianw: not a complete one, I don't think
19:54:44 that means step 0 is to write up a doc first, I guess
19:54:49 then propose days
19:54:53 ok, we can reuse the old doc @ https://etherpad.openstack.org/p/rename-2017-10-20
19:54:55 it was close
19:55:16 i will be happy to translate that into actual docs after we're done
19:55:21 ok, I'll see if I have time to poke at that
19:55:28 i'm fairly sure we're going to hit unexpected issues :)
19:55:32 indeed
19:55:39 #topic Open Discussion
19:55:56 we have ~5 minutes left by my local clock. Anything else?
19:55:58 Better late than never, but I put up the spec for the continuous deployment dashboard: https://review.openstack.org/#/c/527500/
19:56:16 It's a first draft but ready for reviews
19:58:08 Holidays are fast approaching. I will be around next week and can host a meeting then (probably good for recapping sprint and zuul things), but the week after I won't be here, and then the week after that I am not sure
19:58:29 good point
19:58:30 I guess the 1st is a monday, not tuesday, so I will probably be here the first week of the year
19:58:51 I guess the infrastructure is not solicited as much during the holidays, so I guess it's infra-root best effort during that period?
19:59:04 dmsimard: it's best effort any time
19:59:13 yah, I have some PTO I have to burn before Dec. 31, I might be getting an early start in the coming days. Will keep people updated
19:59:16 and it definitely gets very quiet around here starting around now
19:59:17 fungi: you're correct :D
19:59:23 but yes, there tend to be fewer people around to fix the fewer things that break this time of year
19:59:52 and the community has been generally accustomed to that idea
19:59:53 i'll also be away the last 2 weeks
20:00:00 i am out from 12/21 - 01/07 fyi
20:00:16 i won't be away, but i'll be dealing with visiting family so won't be at the computer as much
20:00:45 so ya, just be aware that it is quiet and we should probably be a bit more slushy/frozen, as people may not be able to fix problems as quickly
20:00:51 but I think that should naturally happen with people being afk
20:00:55 and we are at time
20:01:01 Thank you everyone
20:01:03 #endmeeting