19:00:09 #startmeeting infra
19:00:10 Meeting started Tue Sep 19 19:00:09 2017 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:11 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:13 The meeting name has been set to 'infra'
19:00:24 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:00:33 #topic Announcements
19:00:50 really quickly going to remind everyone to sign the queens release key. I haven't done it yet and feel very guilty over this
19:01:14 o/
19:01:20 Also if you haven't noticed we just upgraded gerrit, and while it was a bumpy start I think we are starting to get a handle on things? more on that later
19:01:32 #topic Actions from last meeting
19:01:49 We had a couple but I think we agreed to put them on the back burner for now. Particularly the infracloud one (also more on that later :) )
19:01:52 o/
19:02:08 #topic Specs approval
19:02:29 I haven't really had a chance to look at these since the last meeting. I don't think there is anything outstanding right now. Please do ping me if there is something urgent
19:02:38 o/
19:02:56 and with that we can move on to the portion of the meeting that everyone is waiting for
19:02:58 #topic Priority Efforts
19:03:05 #topic Zuul v3
19:03:22 Lots of things related to Zuulv3 happened last week at the PTG
19:03:36 lots of non-zuulv3 things happened too which we'll get to after the priority specs
19:03:44 lots
19:04:00 my memory is that we worked on jobs + migration script a lot
19:04:03 ;)
19:04:04 I know the zuul meeting was cancelled yesterday in favor of getting gerrit upgraded so now is the chance to catch up on all that goodness
19:04:15 lots of open changes in the queue to review...
19:04:18 mordred is continuing to work on the migration script
19:04:28 mordred: i think it emits actual job output now?
19:04:44 mordred: do you think we can, say, run some nova check jobs tomorrow or something?
19:05:24 andreaf wrote a basic native v3 tempest job that builds on the basic native v3 devstack job
19:05:30 i haven't reviewed that yet, but hope to soon
19:05:44 that's in the devstack-gate repo (for now; we'll move it to tempest soon)
19:05:49 is that the first job addition from outside of zuul development?
19:06:26 clarkb: i don't want to figure out who's "inside" or "outside" enough to answer that question :)
19:06:48 but having andreaf self-bootstrap into writing a v3 job is awesome :)
19:06:54 ++
19:06:59 * fungi cheers
19:07:14 Yeah!
19:07:29 tristanC showed us his in-progress dashboard work which will be important once we finish the cutover
19:07:38 (hopefully we're not far away from having a job history dashboard)
19:07:58 we have some ideas of how it and openstack-health can be complementary
19:08:22 tobias has patches in progress to nodepool to improve quota support
19:08:35 that will allow us to use smaller flavors for, say, pep8 jobs
19:08:35 jeblair: yes - there is a giant stack of patches I'll bug people about tomorrow when this migration is done
19:08:41 (and therefore increase our apparent capacity)
19:08:49 mordred: awesome!
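A rough sketch of the capacity arithmetic being discussed here; the quota values and flavor sizes below are hypothetical illustrations, not the actual tenant quotas or nodepool configuration.

```python
# Hypothetical illustration: under a fixed RAM quota, handing out smaller
# flavors for lightweight jobs (e.g. pep8) lets more nodes fit, until the
# per-tenant instance quota becomes the binding limit instead.

RAM_QUOTA_MB = 512000   # assumed tenant RAM quota (~500 GB), illustrative only
INSTANCE_QUOTA = 64     # assumed tenant instance quota, illustrative only


def max_nodes(flavor_ram_mb):
    """Nodes of a single flavor that fit under both the RAM and instance quotas."""
    return min(RAM_QUOTA_MB // flavor_ram_mb, INSTANCE_QUOTA)


print("8192 MB flavor:", max_nodes(8192))  # 62 nodes, bounded by the RAM quota
print("2048 MB flavor:", max_nodes(2048))  # 64 nodes, bounded by the instance quota
```

The instance quota becoming the binding limit for the smaller flavor is exactly the quota adjustment raised in the next message.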
19:09:12 for us to make use of that we may need some quota changes (as we have instance quotas that line up with ram use in most cases)
19:09:26 clarkb: yeah, but it'll be a good problem to have :)
19:09:30 indeed
19:09:41 I have a series of patches that should be mostly ready to land as far as multinode in v3 is concerned
19:09:58 https://review.openstack.org/#/q/topic:zuulv3-multinode
19:10:17 we also did a roadmap exercise at the end of the week. we were exhausted, so it's only in rough etherpad form at this point. some time this week or next, i'll write it up for further discussion, so we can all get on the same page for what we do between the openstack cutover and v3 release, and what comes after.
19:10:18 zuul-jobs has the role bits while the jobs/integration testing is in openstack-zuul-jobs
19:10:27 dmsimard: great!
19:10:59 pabelanger and I discussed that it's worth thinking about when and where we'll want to trigger the base-integration and multinode-integration jobs (filtering on files?)
19:11:30 i need to track down problems with logstash job submission. that's the only transition blocker (aside from the migration script) that i'm aware of
19:11:42 the stack of patches related to the migration script from me ends with: https://review.openstack.org/#/c/504968
19:11:57 #link https://review.openstack.org/#/c/504968 Zuulv3 job migration scripting
19:12:09 #link https://review.openstack.org/#/q/topic:zuulv3-multinode Native zuulv3 multinode jobs
19:12:33 Once the zuulv3-multinode stack has landed, I'll try and see how hard it would be to bring up a native devstack multinode
19:12:47 #link https://etherpad.openstack.org/p/zuulv3-roadmap rough draft post-cutover roadmap
19:12:56 hrm - I'm going to re-topic migration patches to zuulv3-migration
19:13:31 rather than just zuulv3?
19:13:38 #link https://review.openstack.org/#/q/topic:zuulv3-migration Also for migration scripting
19:13:58 oh, right, they're on the feature/zuulv3 branch already
19:14:06 so they show up in queries that way
19:14:34 it's probably trivial to filter on topic:^zuulv3.* or something regardless
19:14:52 because those patches are not necessarily on the zuul repo with the feature/v3 branch
19:15:00 i.e., project-config, zuul-jobs, etc
19:15:14 i guess i should update my query
19:15:28 if folks could use 'zuulv3' as the topic for any patch not on a feature/zuulv3 branch, that would be great
19:15:48 on feature/zuulv3, a topic *other than* zuulv3 is helpful. :)
19:16:03 #action everyone use 'zuulv3' as the topic for any patch not on a feature/zuulv3 branch, that would be great. on feature/zuulv3, a topic *other than* zuulv3 is helpful.
19:16:40 hmm, at first glance it doesn't seem like it's possible to put a regex for the topic field search :/
19:16:40 (so mordred's zuulv3-migration here is great)
19:17:04 jeblair: WELL - except that zuulv3-migration has a bunch of patches on project-config too - it's a tricky little devil
19:17:37 oh, gerrit has an 'intopic' search parameter so https://review.openstack.org/#/q/intopic:zuulv3 works
19:17:53 mordred: those are the most important to have 'zuulv3' as the topic
19:18:19 #link https://review.openstack.org/#/q/intopic:zuulv3 for everything zuulv3 related
19:18:42 jeblair: mordred would probably be good to have a summary of the current plan going forward as well
19:18:48 (once through general status updates)
19:19:31 mordred: did you send out that email?
19:19:40 he did
19:19:42 he did :)
19:20:28 that probably covers the current plan afaik
19:20:35 k.
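As an aside, the intopic: query mentioned above can also be issued through Gerrit's REST change-query endpoint. The following is a minimal sketch of pulling the same list programmatically with only the standard library; it is not existing infra tooling, just an illustration of the query discussed here.

```python
# Minimal sketch: list the open changes matching intopic:zuulv3, i.e. the same
# query as https://review.openstack.org/#/q/intopic:zuulv3, via the Gerrit REST API.
import json
import urllib.request

QUERY_URL = "https://review.openstack.org/changes/?q=status:open+intopic:zuulv3"

with urllib.request.urlopen(QUERY_URL) as response:
    raw = response.read().decode("utf-8")

# Gerrit prefixes REST responses with the magic ")]}'" line to defeat XSSI,
# so strip that first line before parsing the JSON body.
changes = json.loads(raw.split("\n", 1)[1])

for change in changes:
    print(change["_number"], change["project"], change["subject"])
```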
I've updated the zuulv3-migration topic on project-config patches to just be zuulv3
19:20:36 ok let me dig up a link for it
19:21:05 well - that plan is "planning on doing a rollout on Monday" - we probably want to have a slightly more detailed infra-team plan
19:21:10 #link http://lists.openstack.org/pipermail/openstack-dev/2017-September/122241.html Zuulv3 rollout plans email
19:23:19 anything else we want to go over on Zuulv3 related items?
19:23:39 well - lemme say one more quick thing about the migration script
19:24:07 once the patches from me in zuul, project-config, openstack-zuul-jobs are landed ...
19:24:14 to my knowledge the migration script is good to go
19:24:37 mordred: I'd like to test drive some sanity checks against what I feel is most vulnerable -- deployment projects (openstack-ansible, puppet-openstack, kolla, tripleo, etc)
19:24:40 so it's important to start combing through the generated content for bugs
19:25:07 Can I go ahead and "whitelist" those projects in main.yaml and submit what would be their "migrated" jobs as zuul.yaml in their repos?
19:25:24 i second mordred's suggestion. there WILL be bugs. some of that script is... complicated
19:25:36 dmsimard: let's not submit any auto-generated content to project repos
19:25:47 dmsimard: I'd prefer we didn't, as we won't have any way of knowing which things we've submitted to project repos
19:25:53 gah. jeblair said that with fewer words
19:26:05 dmsimard: let's put that in project-config and/or openstack-zuul-jobs
19:26:07 jeblair, mordred: I'm not planning on merging anything, it will be -W
19:26:26 if we want test drives to be in project-config and openstack-zuul-jobs, that requires merging and reverting things
19:27:05 dmsimard: i'm fine with that.
19:27:18 ok, sure.
19:27:27 i'd rather do that than have people get the mistaken idea that we might want to merge the auto-generated stuff in their repos
19:27:36 mordred: perhaps we should outline what to expect from the migration script output? like expanded templates, job name changes, etc. might help others review
19:27:39 this is a PR issue, not a technical one. :)
19:27:43 Shrews: ++
19:27:56 Shrews: I think that would be helpful
19:28:07 mordred: did that zuul-sphinx zuul.d fix land so we could at least put the tests in a separate file?
19:28:27 also - it's worth pointing out that several of the migration script 'fixes' I've made over the weekend were actually fixes or changes to old jobs or just making a new v3 version
19:28:27 mordred: do you think we need to add anything to the infra-manual migration doc?
19:28:40 dmsimard: https://review.openstack.org/#/c/504797/ did not merge yet
19:28:53 oh - right ... jeblair ^^
19:29:02 AJaeger: thanks, just added the zuulv3 topic
19:29:02 jeblair: zuul-sphinx doesn't support zuul.d yet
19:29:20 ++
19:30:03 so I think a doc (and/or update to infra-manual) about the expected outcome from the migration script - as well as a few notes about where these things are expected to go ...
19:30:29 like, migrated project-pipeline stuff goes into project-config zuul.yaml but jobs and project-templates go into openstack-zuul-jobs
19:30:36 and playbooks
19:31:08 that sounds helpful
19:31:22 ("i heard you migrated stuff, where the cloud is it?")
19:31:34 but - just spot-checking things randomly often finds things with the eyeballs
19:31:41 for instance, I now notice this:
19:31:43 http://logs.openstack.org/79/505379/1/check/zuul-migrate/06587d2/playbooks/legacy/manila-tempest-minimal-dsvm-lvm-centos-7/run.yaml
19:31:51 echo "Detailed logs: http://logs.openstack.org/$LOG_PATH/"
19:32:01 should likely do something about that :)
19:32:36 who wants to write the infra-manual changes?
19:33:04 I can take that on
19:33:10 unless someone else wants to :)
19:34:04 let's say mordred, tag you are it, and if anyone else finds time or wants to help they can ping mordred?
19:34:14 mordred: i'll email you a section on "matchers" that you can include
19:34:31 #action mordred update infra-manual documentation on what to do with migrated zuulv3 jobs
19:34:36 That look right?
19:36:33 ready to move on to the other priority spec relevant to yesterday and today?
19:36:41 +
19:36:46 indeed!
19:37:05 #topic Gerrit 2.13 Upgrade
19:37:16 This mostly happened yesterday
19:37:34 amazingly we fit it into the allocated outage time even though database migrations took 5 hours
19:37:42 Thank you everyone for helping make that happen
19:37:46 that was nuts
19:38:09 Unfortunately we've turned up some new and exciting behavior around stream-events and gerrit email and memory usage and so on that we are trying to address
19:38:19 and special thanks to zaro for doing work a year ago to cause us to finish within our window :)
19:38:23 ++
19:38:38 Current issues are being tracked at https://etherpad.openstack.org/p/gerrit-2.13-issues
19:38:42 #link https://etherpad.openstack.org/p/gerrit-2.13-issues
19:38:58 For the most part I think we have a handle on the problems and a good chunk of fixes or attempts at fixes
19:39:00 absolutely. i don't know for sure that the reindex ordering optimization sped it up, but i'm happy to believe that was it
19:39:56 My biggest concerns right now are the memory use and email slowness. I think the memory use is actually likely to be ok based on utilization today. jeblair's change to bump max memory seemed to get us to a happy place and gerrit isn't leaking memory all over the floor (it decreases even)
19:40:30 though the weirdness around edits via api/webui is troubling
19:40:36 That all said, does anyone think we need to be considering a rollback? For me I think that was on the table before the memory stuff got sorted, but I am feeling much more comfortable with moving forward and fixing things
19:40:46 now
19:41:01 i'm still good with rolling forward at this point
19:41:08 yeah, the problems yesterday may have been caused by some anomalous event (there was a 500mbps outbound spike when it got bad). or we may have needed more jvm ram. it's not clear.
19:41:46 yeah, if we've managed to get this far without a memory catastrophe, i think forward is the way to go
19:42:09 agree ... it's not out of bounds from what we saw with the previous release that went mad occasionally too
19:42:53 ok good, I didn't want to put a bunch of effort into fixing problems if we were feeling like a revert is warranted. Glad to know no one is ready to go down that route yet :)
19:43:18 After the meeting we will be applying fixes for a bunch of the problems on that list, hopefully we see improvements.
19:43:30 Once again thank you everyone for helping
19:44:20 Probably the other big item related to this is that we are not merging new project creation changes
19:44:39 want to get everything working and happy before doing that as we have had some puppet related issues around project creation and nodepool image builds
19:44:44 ++
19:45:44 anything else I've missed here? I guess see everyone after the meeting to do the great service restart of Tuesday September 19
19:46:33 #topic PTG recap
19:46:39 #topic Infracloud
19:47:02 We talked about things that weren't zuulv3 at the PTG too. I think one of the more important items was the future of infracloud
19:47:45 I've summarized the plans around that at http://lists.openstack.org/pipermail/openstack-infra/2017-September/005585.html if you have an interest in infracloud and haven't read that email please do. Feedback is very welcome
19:48:24 TL;DR is that it probably isn't viable to move the existing hardware due to cost and logistical problems (we need rails..). As a result we likely don't want to put effort into upgrading the current cloud regions and instead let them die on the vine
19:48:44 #topic PTG recap
19:49:41 Other items that came up were the future of puppet testing. I think we've got a plan where we can replace beaker jobs with something a bit more test suite agnostic and even config management tool independent. This fits into zuulv3 and potentially allows for individuals to write ansible replacements of puppet things and have them be tested in similar manners
19:50:01 We also talked about replacing puppetboard with ara. I think dmsimard said there will be a spec up for that
19:50:03 ooh that would be nice
19:50:07 that would be nice too
19:50:15 (my oohs and aahs are lagging)
19:50:18 :)
19:50:32 The other big item on my list was talking about monitoring of control plane services
19:50:57 There was general agreement this would be ok as long as any alerting was purely opt in by roots and if someone or some group would be responsible for tuning things
19:51:15 I expect that that will be an interesting spec with probably lots of alternatives weighing the pros and cons of a variety of monitoring tools out there
19:51:21 dmsimard: ^ you still up to writing those?
19:52:01 clarkb: will you be issuing pagers :)
19:52:08 (I know everyone is currently busy with zuul and gerrit things so no rush, but expect to see that in infra-specs at some point if interested or maybe you can even help write the specs)
19:52:18 ianw: please no :) and in fact that was basically what jeblair said. No pager duty
19:52:34 we can collect the info and use it but we aren't oncall and shouldn't have our sleep interrupted
19:53:20 Ok I think I managed to get through the entire agenda I had. Sorry if it felt rushed.
19:53:25 #topic Open Discussion
19:53:39 anything else for the last ~5 minutes or so of the allotted time?
19:53:42 monitoring and metrics can help inform future development and such, but shouldn't be waking volunteers up :D
19:53:45 good work on the agenda!
19:54:09 * AJaeger put one item on the agenda:
19:54:22 There's a discussion about <= mitaka EOL branches at http://lists.openstack.org/pipermail/openstack-dev/2017-August/121432.html
19:54:27 oh the mitaka eol. Sorry that didn't make it to my local text file
19:54:35 and mordred has a change up at https://review.openstack.org/#/c/504964
19:54:57 Do we want to remove all the regexes etc like in 504964 - or first retire all <= mitaka branches?
19:55:02 there was also something on the agenda about dropping cached git clones (or not) but i think that was from the meeting prior to the ptg?
19:55:06 or are those not related?
19:55:24 fungi: ya I think that is leftover from the last meeting when infra cohosted zuul
19:55:40 AJaeger: I think we can likely do them independently, especially if the plan is to remove those branches anyway
19:55:53 AJaeger: but we can sync up with tonyb later today to make sure that works for him
19:56:37 clarkb: yeah, syncing with tonyb sounds like the best next step...
19:57:01 AJaeger / tonyb: i can also help with branch removal in our tz if needs be ... i figured it out :)
19:57:01 AJaeger: I know your timezones don't overlap much. I can try pinging him once I'm not swamped with gerrit related items
19:57:06 oh ++ to ianw
19:57:24 on a related note, we're slowly getting feedback from projects okay with us deleting their date-based releases from pypi
19:57:44 i did a batch of them right before the ptg where the release team was able to make the call
19:57:48 ianw: will you take care of it and discuss with tonyb? Would be best IMHO
19:58:01 but now the ones which aren't under release management are slowly trickling in
19:58:06 AJaeger, ianw: we could also just remove those right before the migration
19:58:20 clarkb: want to give ianw an #action?
19:58:20 the main thing is that they cause a bunch of projects to not actually use the project-template in v3
19:58:29 as long as ianw is ok with it
19:58:39 mordred: I don't care about timing, just that it gets done if needed
19:58:43 yes, ok
19:58:45 #action ianw Work with tonyb to coordinate old stable branch removal from projects
19:58:47 and in v3 we can, should we want, add a branch exclusion to the project-template itself
19:58:54 AJaeger: cool
19:59:13 and we are out of time
19:59:16 Thanks everyone!
19:59:18 #endmeeting