19:02:40 #startmeeting infra
19:02:41 Meeting started Tue Nov 22 19:02:40 2016 UTC and is due to finish in 60 minutes. The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:02:42 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:02:44 The meeting name has been set to 'infra'
19:02:48 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:02:55 #topic Announcements
19:03:32 #info Our "Ocata Cycle" [2048R/0x8B1B03FD54E2AC07] signing key is now in production. Thanks to all who attested to it!
19:04:03 o/
19:04:07 we should probably also announce the upcoming xenial job change for improved visibility
19:04:56 #info REMINDER: Remaining master and stable/newton jobs will be switched from ubuntu-trusty to ubuntu-xenial nodes on December 6.
19:05:01 #link http://lists.openstack.org/pipermail/openstack-dev/2016-November/106906.html
19:05:37 i won't bother doing an info for the gerrit maintenance since people would only have an hour to see it in the minutes before we take gerrit down
19:05:45 as always, feel free to hit me up with announcements you want included in future meetings
19:05:55 #topic Actions from last meeting
19:06:11 here is where i publicly shame myself again
19:06:19 #action fungi send summit session summary to infra ml
19:06:24 it's still being edited :/
19:06:33 pabelanger un-wip "Force gate-{name}-pep8-{node} to build needed wheels" change
19:06:36 fungi: you didn't say which summit ;)
19:06:37 that happened, right?
19:06:45 jeblair: touché!
19:07:08 it happened, yes
19:07:53 there we are
19:07:58 #link http://git.openstack.org/cgit/openstack-infra/project-config/tree/jenkins/jobs/python-jobs.yaml#n106
19:08:26 so all standard gate-{name}-pep8 job runs now happen without our custom pre-built wheel mirrors
19:08:34 thanks pabelanger!
19:08:46 clarkb pabelanger work on replacing translate-dev.openstack.org with xenial server running newer zanata release
19:08:54 i saw some of that going on
19:09:05 that's in progress, https://review.openstack.org/#/c/399789/ is next
19:09:05 the change to allow translate-dev01.o.o just merged, right?
19:09:12 hoping to get that in then boot a replacement today
19:09:18 #link https://review.openstack.org/399789
19:09:24 thanks for working on that
19:09:43 #link https://etherpad.openstack.org/p/zanata-dev-xenial-update
19:09:47 is the rough plan outline
19:10:00 oh, right, pabelanger is busy with openstackday montreal which explains why he's so quiet ;)
19:10:12 oh right. canada
19:10:21 * clarkb wonders if those events are strategically timed to keep muricans out
19:10:31 apparently there is an openstack day paris this week too
19:10:40 i mean, i would do it that way if i were them
19:10:53 clarkb: yeah, that is today too
19:10:55 fungi: ++
19:11:13 ianw send maintenance announcement for project rename
19:11:15 #link http://lists.openstack.org/pipermail/openstack-dev/2016-November/107379.html
19:11:26 we'll be working on that right after the tc meeting today
19:11:50 clarkb is picking up the pieces from my incomplete patch to fix the renames playbook
19:12:07 I am about to push a new patchset
19:12:16 so here's hoping this time around sees fewer hangups than i ran into last time
19:12:17 would be great if people that grok ansible better than I do could review it after the meeting
19:12:23 thanks ianw and clarkb for working on this
19:12:36 not sure i do, but will take a look...
19:12:59 and pushed
19:13:00 and yeah, i grok so little ansible that i had a jolly time testing that playbook on the last rename
19:13:13 #link https://review.openstack.org/365067
19:13:27 mordred: ^ hint hint :P
19:13:30 okay, i think that's it for our action items from last week
19:13:47 if only rcarillocruz wasn't watching spongebob
19:14:02 you mean if only you _were_ watching spongebob, right?
19:14:20 #topic Specs approval
19:14:31 we have a couple of these up
19:14:51 #topic Specs approval: PROPOSED [priority] Gerrit 2.13 Upgrade (zaro)
19:15:14 #link https://review.openstack.org/388190 Spec for upgrading gerrit from 2.11.x to 2.13.x
19:15:15 anybody interested in this?
19:15:41 absolutely
19:15:59 i think i linked the wrong second change to make it a priority effort, just a sec
19:16:16 ahh, yep, here
19:16:31 #link https://review.openstack.org/388202 add gerrit-upgrade to priority efforts query
19:17:23 looks like AJaeger and jhesketh have already given the spec a once-over
19:18:01 * AJaeger just commented again. I wonder what kind of cool features we get...
19:18:12 So far I'm neutral ;)
19:18:18 many, many.. :)
19:18:27 I will be happy if the GC bug goes away
19:18:36 s/GC/memory leak that leads to GC/
19:18:38 not falling behind on gerrit releases is the biggest "feature" i'm interested in, honestly
19:18:40 for that one alone I would update ;)
19:18:43 but i don't remember them all off the top of my head
19:18:59 but yes, the supposed fix for our gc woes is very high on my interest list too
19:19:00 i think topic submission was in 2.12
19:19:23 zaro: If you have one or two really noteworthy features, or the bug fix, feel free to add them. Or link to the announcements...
19:19:27 I know it's not in this release but I saw a thread on email ingestion on the gerrit mailing list
19:19:35 jeblair: ^
19:19:43 yeah, that's pretty cool!
19:19:54 i guess one procedural question for those here... should i still be putting specs up for council vote until 19:00 utc thursday, or with it being a national holiday in lots of countries should i give it until sometime next week? maybe up until 15:00 utc tuesday?
19:20:09 AJaeger: i have a few up already, topic is gerrit-upgrade
19:20:30 #link https://review.openstack.org/#/q/status:open+topic:gerrit-upgrade
19:20:42 fungi: ++ to delaying this week
19:20:44 clarkb: oh neat, i'll look that up
19:20:59 I personally have a ton of stuff to do for infra and at home; a short week doesn't mean less work, just less time to get it done :)
19:21:01 i'm leaning toward having specs approved early-ish in the day tuesday to give people time to review but still an opportunity for me to have them merge before the next meeting
19:21:20 fungi: wfm
19:21:41 i've already done some testing with 2.13 and it's looking good so far
19:21:46 zaro: great!
19:22:08 #info The "Gerrit 2.13 Upgrade" spec is open for Infra Council vote until 15:00 UTC Tuesday, November 29.
19:22:41 #info The "add gerrit-upgrade to priority efforts query" change is open for Infra Council vote until 15:00 UTC Tuesday, November 29.
19:22:43 jeblair: you might also want to take a look at the robot comments feature that the google guys are working on
19:23:10 zaro: is there a link where one could read about that?
19:25:13 mordred: #link https://gerrit.googlesource.com/summit/2016/+/HEAD/index.md
19:25:20 "robot comments"
19:25:29 asimov would be proud
19:26:39 so it sounds like the "toggle ci" effect we're doing?
19:27:08 jeblair: batch plugin might be a good fit with zuul
19:27:10 oh, but also the generated fixes idea is interesting
19:27:41 #link https://gerrit.googlesource.com/plugins/batch/
19:28:31 fungi: toggle ci is sorta independent and it's already in. but will not be available until 2.14
19:28:37 oh, cool
19:28:52 okay, anything else we need to cover on this in the meeting instead of in review?
19:29:35 i would like people to review changes to get all the things installed on zuul-dev.o.o, like scheduler, launcher, nodepool, etc.
19:30:02 that is to get a dev env to test gerrit-zuul-nodepool integration
19:30:24 zaro: do they all use a common topic? I am guessing there is a topic in the spec
19:30:31 i think i'll need help getting those working
19:30:43 clarkb: topic is gerrit-upgrade
19:30:43 #info In addition to the specs changes, please review open Gerrit/Puppet changes under topic:gerrit-upgrade
19:30:50 thanks
19:31:16 thanks, zaro!
19:31:19 ok. i think that's it for me
19:31:26 #topic Specs approval: PROPOSED Zuul v3: Add section on secrets (jeblair)
19:31:33 #link https://review.openstack.org/386281 Zuul v3: Add section on secrets
19:31:39 sick kid at home so gotta run for now..
19:33:06 i think this is fairly well reviewed, so it's ready for formal voting
19:33:14 i guess there's not too much to say on this spec... it's been out there for a while, so... yeah
19:33:27 jeblair: also - I swear those two looked the same to me
19:33:31 it doesn't really go in a different direction or anything, it's just very detailed and complicated and worth close examination
19:33:36 we talked through much of it in the barcelona infra devroom on friday
19:33:41 mordred: yeah, base64 is fun that way :)
19:34:01 4 characters are different
19:34:25 i guess the rest is base64 padding, etc
19:34:28 #info The "Zuul v3: Add section on secrets" spec addition is open for Infra Council vote until 15:00 UTC Tuesday, November 29.
19:35:07 i'm eager to see what others think about it
19:35:38 but yeah, we already have 4 rollcall votes on it
19:35:47 anything else, jeblair?
19:36:23 fungi: nope!
19:36:26 #topic Priority Efforts: Nodepool: Use Zookeeper for Workers (jeblair)
19:36:35 something something production rollout something
19:36:50 yeah!
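[Editor's aside on the secrets exchange above: mordred's observation that two base64 blobs "looked the same" with only 4 characters different follows from how base64 works: each 4-character output group encodes 3 input bytes, so payloads that differ only near the end produce encodings that differ only in the trailing groups. A minimal sketch; the sample strings are made up for illustration, not the actual encrypted secrets from the review:]

```python
import base64

# Two hypothetical payloads differing only in their final byte, standing
# in for two similar ciphertexts.
a = base64.b64encode(b"the quick brown fox jumps over the lazy dog").decode()
b = base64.b64encode(b"the quick brown fox jumps over the lazy dot").decode()

# Count character positions where the two encodings disagree; only the
# trailing base64 group (and not the "=" padding) changes.
diff = sum(1 for x, y in zip(a, b) if x != y)
print(diff)
```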
19:36:57 i take it we're close then
19:37:01 we're ready to start running the new zookeeper-based builder in parallel
19:37:08 basically, as soon as we can get the puppet changes for it
19:37:20 i would have said this week
19:37:36 pesky feast-oriented holidays
19:37:38 but apparently everyone is conferencing or spongebobing or something
19:37:39 nodepoold won't make use of those images until we do an explicit change of the code/config there, but this allows us to see how the builder does in the real world before using it
19:37:52 so that might be pushing it. but i'm still going to work on it :)
19:37:57 clarkb: that's correct
19:38:12 we can run this as long as we want without interfering with the prod setup
19:38:34 ++
19:38:37 and then, when we're ready, we can switch nodepoold to use it
19:38:38 jeblair: one thing that occurred to me with that
19:38:39 do we have puppetry in place to deploy from the feature branch, or is this stuff that's all merged down to master?
19:38:50 fungi: we'll deploy the builder from the feature branch
19:38:55 okay, cool
19:38:59 is maybe we can put apache in front of the image files too and see how it does (since we aren't using it for prod prod yet)
19:39:02 (there's already revision arguments in puppet-nodepool)
19:39:12 perfect
19:39:20 since a common request is for people to get our images, and that allows us to fairly easily check if that has an impact on the service without impacting the service
19:39:44 we'll be deploying this and running it on a new host
19:39:48 nb01.o.o
19:40:02 amusing abbrev.
19:40:09 o/
19:40:20 ohai! it's a pabelanger
19:40:22 and i intend to keep running the separate builder on that after it's in prod
19:40:31 clarkb: and i think that will work out well with your idea of serving images
19:40:53 we can use a longer name if folks want; pabelanger seemed to prefer the shorter one :)
19:41:11 yay
19:41:14 we have zm and zl, so whatev.
19:41:15 "nib" would also be a good option ;)
19:41:49 too bad no one is here to advocate for that one. :(
19:41:54 i guess we'll go with nb
19:42:04 it's one of my favorite black sabbath songs
19:42:09 if that counts
19:42:18 fungi: for something, yes
19:42:57 given n and d are adjacent on a qwerty layout, i can see some finger memory confusion between nib and dib
19:43:11 anyway, unless there are any concerns... i guess that's about it for this?
19:43:15 oh, wait, they're not ;)
19:43:33 i guess builder01.nodepool.openstack.org would be a pita
19:43:44 jeblair: oh! yes one last thing, I would recommend an ssd volume
19:43:56 clarkb: oh, interesting...
19:44:07 I think I went with not-ssd when I expanded the volume and I think that may be related to some of the slowness in building
19:44:13 i wouldn't mind creative use of subdomains (we already do it for infra-cloud), but i'm not here to bikeshed on naming things
19:44:22 +1 for ssd
19:44:24 jeblair: dib is very io-bound so ya
19:45:03 right, we had it on sata, then switched to ssd for a while but didn't see much improvement at the time, then switched back to sata and since then have added a lot more elements/images
19:45:14 so worth trying again
19:46:07 hrm, i'm actually a little surprised by that
19:46:53 looks like we have 1T of space for that
19:47:09 what's the max ssd volume size we can set up?
19:47:09 so we're saying we want 1T of ssd space?
19:47:29 jeblair: yes I think so (we have been getting closer to that 1TB too)
19:47:39 maybe time to revisit adding a control plane server in vexxhost (they claim a high-performance ceph backend)
19:48:11 a big issue is we end up with at least 3 copies of a 20GB image during building in order to do conversions and such
19:48:17 cacti says we push 15MB/s and 3000 iops
19:48:19 so we need a fair bit of breathing room just to do a build
19:48:22 we did say we should consider them first for i/o-bound uses
19:48:28 the 3k iops is a shorter peak
19:48:42 fungi: Ya, would be interesting to test
19:51:23 do we need to cover anything else on the push to production for this?
19:51:44 i'd like to make sure we really want the 1tb ssd volume
19:51:53 yep, i'm with you on that
19:52:19 i'm not even sure rackspace will let us make one, so we'd need to spread an lv over several if not
19:52:31 I think non-ssd will continue to work
19:52:32 what kind of time difference are we talking about here?
19:52:48 jeblair: currently builds run from 1100 UTC to ~1700 UTC iirc
19:53:11 but if the question is what kind of speedup should we expect, i don't think we know
19:53:15 not even ballpark
19:53:15 maybe we should consider doing a benchmark between the 2 options?
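[Editor's aside on the benchmark idea: a real SATA-vs-SSD comparison would use a purpose-built tool like fio against the mounted volumes, but even a quick-and-dirty sequential-write timing gives a first ballpark before paying for the ~4x-cost SSD volume. A self-contained sketch; the mount paths in the comment are placeholders, not actual server paths:]

```python
import os
import tempfile
import time

def write_benchmark(path, total_mb=256, block_kb=1024):
    """Time fsync'd sequential writes to a temp file under `path`;
    returns an approximate throughput in MB/s."""
    block = os.urandom(block_kb * 1024)
    start = time.monotonic()
    with tempfile.NamedTemporaryFile(dir=path) as f:
        for _ in range(total_mb * 1024 // block_kb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # force data to disk before stopping the clock
    return total_mb / (time.monotonic() - start)

# Run the same call once per candidate volume, e.g.:
#   write_benchmark("/mnt/sata") vs write_benchmark("/mnt/ssd")
```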
19:53:19 I expect we can trim an hour or so off with an ssd (but I haven't tested that on rax, just based on my local io performance)
19:54:33 playing with rackspace, looks like the sata volume size range is 75-1024gb and ssd is 50-1024gb so i guess it's possible without splitting
19:55:04 i think they used to limit ssd to 300 or something, but have apparently upped it if so
19:55:16 yeah, it's about 4x as expensive, so i hesitate to ask unless we need it
19:55:41 we can also have more than one builder
19:55:53 short-term benchmarking seems like a reasonable compromise if someone has the time and interest to do it
19:55:56 (that's less than 2x as expensive :)
19:56:06 also, yes, that
19:56:07 yay for conversions
19:56:10 gah
19:56:16 (more than one builder means more than one uploader too)
19:56:23 2 builders would be interesting to test too
19:56:51 more uploaders is awesome as well
19:56:53 although it also means less re-use of cache - but if it's taking us 6 hours to build an image, I'm guessing something is going poorly with caching anyway
19:57:00 we're about out of time. i'm going to move to open discussion for the last few minutes but we can keep talking about this too
19:57:04 mordred: 6 hours to build all the images
19:57:08 jeblair: yah
19:57:12 sorry - that's what I meant
19:57:14 #topic Open discussion
19:57:48 i think i'd like to start with sata, and switch if someone does a benchmark that suggests we'd get a noticeable improvement
19:57:49 so anyway, let's plan to deploy on sata and someone who wants to can test ssd for comparison
19:57:54 yeah, that
19:58:00 (or possibly just add an 02 if we feel like it)
19:58:07 jeblair: wfm
19:58:17 pabelanger: can we work on gem mirror anytime soon? https://review.openstack.org/#/c/253616/
19:58:18 fungi: you convinced me
19:58:21 mordred: the problem is mostly how much data we write
19:58:27 it's reasonably trivial to switch back and forth between sata and ssd too, we just lose the warm cache unless we bother to rsync them
19:58:30 mordred: so cache works fine but then we spend 20 minutes doing an image copy
19:58:33 then we make several image copies
19:58:35 adds up
19:58:45 pabelanger: it's not high prio, so we can postpone it again
20:00:01 pabelanger: I tried to build a local gem mirror at home; it took 500 GB to download (and got my account blocked by my ISP in the meantime) - we might need some space
20:00:08 we're out of time--thanks everyone!
20:00:12 #endmeeting
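[Editor's postscript on the image-copy discussion: the figures quoted in the meeting (a ~20GB image, ~15MB/s of sustained writes per cacti, and at least 3 copies per build for format conversions) are mutually consistent with the "20 minutes doing an image copy" observation, as a back-of-the-envelope check shows:]

```python
# Back-of-the-envelope check of the throughput figures from the meeting.
image_mb = 20 * 1024      # ~20GB image
throughput_mb_s = 15      # sustained write rate reported by cacti
copies = 3                # raw image plus converted formats

minutes_per_copy = image_mb / throughput_mb_s / 60
total_minutes = minutes_per_copy * copies
print(round(minutes_per_copy), round(total_minutes))  # ~23 min/copy, ~68 min of pure I/O per build
```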