19:01:05 #startmeeting infra
19:01:06 Meeting started Tue Nov 24 19:01:05 2015 UTC and is due to finish in 60 minutes. The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:08 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:10 o/
19:01:11 The meeting name has been set to 'infra'
19:01:13 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:01:15 o/
19:01:20 #info Many thanks to pleia2 for chairing last week!
19:01:25 #topic Announcements
19:01:25 * mordred does it gladly, after all
19:01:33 #info The gerrit 2.11 upgrade scheduled for last Wednesday (November 18) was indefinitely postponed pending completion of a thorough rollback plan and one pending OpenID redirect URL fix.
19:01:38 #link http://lists.openstack.org/pipermail/openstack-dev/2015-November/079769.html
19:01:41 o/
19:01:43 #link https://review.openstack.org/245598
19:01:49 #link https://code.google.com/p/gerrit/issues/detail?id=3365
19:01:59 o/
19:02:04 oh good, someone told fungi we didn't upgrade :)
19:02:08 any other announcements before i move on to other topics?
19:02:14 jeblair: wait, we didn't?
19:02:26 o/
19:02:31 fungi: ansible 2.0 is in rc1 now
19:02:34 mordred: (new gerrit looks _a lot_ like the old gerrit)
19:02:45 mordred: neat--what are the implications for us?
19:02:46 fungi: so, it's a thing to keep our eyes on, as I believe our servers will upgrade when it releases
19:02:51 o/
19:02:54 fungi: it SHOULD have zero effect
19:03:05 fungi: I've poked at our playbooks and everything _should_ be solid
19:03:13 so the implication is something other than massively broken, we hope
19:03:16 but still - just keep eyes out
19:03:18 yah
19:03:24 they've done a good job with it
19:03:39 yay ansible 2.0 being good!
19:03:45 I had a test change in d-g to make sure d-g won't break either
19:03:54 #info Be on the lookout for the ansible 2.0 release in case we see any automation fallout resulting.
19:04:08 #topic Actions from last meeting
19:04:13 #link http://eavesdrop.openstack.org/meetings/infra/2015/infra.2015-11-17-19.00.html
19:04:18 there were none, all executed successfully!
19:04:24 #topic Specs approval
19:04:30 phschwartz has a spec to extend openstackci, but i've taken its proposal off the current agenda since it seems to still have some unaddressed comments
19:04:36 #link https://review.openstack.org/239810
19:04:47 add it back to the agenda when it's ready for council voting again
19:04:58 fungi: correct, I am addressing the last comments currently
19:05:06 thanks for bearing with us!
19:05:13 #topic Priority Efforts: Gerrit 2.11 Upgrade
19:05:16 now is the time on sprockets when we dance...
19:05:21 zaro: mordred: any status on fixing the rollback plan, and guesses at a timeline for a fix/workaround to the openid url bug?
19:06:06 * mordred has not touched it - zaro - any luck in getting a happy reproduction of the rollback with changes past the data overlap?
19:06:09 o/
19:06:30 zaro has a test instance up that I am playing with
19:06:31 yeah, i got a working post-rollback site.
19:06:45 I have no conclusions as of yet, just getting oriented
19:06:47 anteaya is testing it now
19:06:53 cool!
19:07:11 anyone else want to help test?
19:07:12 the only issue is in regards to sortkey getting removed in 2.9
19:07:32 zaro: that's an issue with the rollback?
19:07:38 sounds like we probably want to put off discussions of a rescheduled window until these last known details are ironed out
19:07:41 #link explanation of sortkey getting removed in this patch https://review.openstack.org/#/c/245598/
19:07:47 I think that's just used to put changes in some order from a query.
19:07:55 clarkb: yes
19:08:13 I'm not sure what we want to do there.
19:08:14 same change i linked in the announcements, fwiw
19:08:34 sorry
19:08:55 there are probably a few things we can do to fix it
19:08:58 or make it work
19:09:03 zaro: _david_'s suggestion seems like the most correct?
19:09:07 clarkb: ++
19:09:09 basically make sort keys for any new changes
19:09:15 yes, i believe so
19:09:43 what i did was just put a duplicate key in the table and reindex worked.
19:10:02 \o
19:10:30 https://gerrit.googlesource.com/gerrit/+/e800b1e0f3452e5be1537a67f1fa3e44a58c6dda/gerrit-server/src/main/java/com/google/gerrit/server/ChangeUtil.java#181
19:10:41 oh, that looks like we could probably even make a quick python script to do it?
19:10:48 oh
19:10:51 maybe even a sql query?
19:10:52 I could do that in sql
19:10:53 yea
19:10:56 in the downgrade script
19:11:06 it's just a simple data manipulation
19:11:11 yep, my only objection was to the suggestion that we could consider not writing it until we discover we need to use it
19:11:19 cool!
19:11:27 mordred: sql++
19:11:28 "The encoding uses minutes since Wed Oct 1 00:00:00 2008 UTC."
19:11:30 WHAT????
19:11:33 come on guys
19:11:35 mordred: yay
19:11:40 it's the "new epoch!"
19:11:43 Morning
19:11:47 * mordred cries
19:11:50 epoch 2.0?
19:11:51 i measure time from when sortkey was removed
19:11:52 * jhesketh joins a little late
19:11:55 also known as "gerrit crazypants time"
19:11:57 hey jhesketh
19:12:03 fungi: Gerrit Standard Time
19:13:28 mordred: would you like to come up with something? then i can test it?
19:14:22 zaro: yes. I will write you some lovely sql
19:14:42 lovely then..
19:15:01 okay, so it _sounds_ like we'll probably have that sorted and retested by next week's meeting
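(For reference, a minimal python sketch of the sort_key backfill discussed above. It assumes the encoding shown in the linked ChangeUtil.java -- 8 hex digits of minutes since the 2008-10-01 epoch followed by 8 hex digits of the change number -- and the old ReviewDB table/column names; the database access at the end is only a hypothetical outline, not the "lovely sql" mordred promised.)

    #!/usr/bin/env python
    # Sketch: recompute gerrit's pre-2.9 sort_key for changes created
    # while running 2.11, so a rollback to 2.8's schema has valid keys.
    # Table/column names (changes, change_id, last_updated_on, sort_key)
    # follow the old ReviewDB schema and are assumptions here.

    import calendar
    import datetime

    # "The encoding uses minutes since Wed Oct 1 00:00:00 2008 UTC."
    SORT_KEY_EPOCH = calendar.timegm(
        datetime.datetime(2008, 10, 1, 0, 0, 0).timetuple())

    def sort_key(last_updated_on, change_id):
        """Return the 16-hex-digit legacy sort key for one change."""
        minutes = (calendar.timegm(last_updated_on.timetuple())
                   - SORT_KEY_EPOCH) // 60
        return '%08x%08x' % (minutes, change_id)

    # quick sanity check against an arbitrary timestamp/change number:
    print sort_key(datetime.datetime(2015, 11, 24, 19, 1, 5), 245598)

    # Hypothetical outline of the downgrade-script usage, e.g. with a
    # MySQLdb cursor named cur:
    #
    #   cur.execute("SELECT change_id, last_updated_on FROM changes"
    #               " WHERE sort_key IS NULL OR sort_key = ''")
    #   for cid, updated in cur.fetchall():
    #       cur.execute("UPDATE changes SET sort_key = %s"
    #                   " WHERE change_id = %s",
    #                   (sort_key(updated, cid), cid))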
19:15:08 regarding the other issue.. gerrit url redirect
19:15:16 the openid redirect issue was also brought up as an upgrade blocker though, right?
19:15:19 yeah, that
19:15:28 I am not sure it has to be a blocker but it is quite annoying
19:15:38 I am happy with a revert of the change that breaks it until upstream figures it out
19:15:54 o/
19:15:59 o/
19:16:04 I also can live with a revert of the change that breaks it
19:16:13 the info is up on the bug #link https://code.google.com/p/gerrit/issues/detail?id=3365
19:16:30 (also #link'd in the announcements part of the meeting, for reference)
19:16:49 my lack of knowledge regarding apache rewrite config is shining through here.
19:17:10 i'm hoping hugo either reverts or finds some sort of fix in the current code.
19:17:37 we are open to changing our apache config, that might be an option as well.
19:17:59 so as a way forward, we would backport a revert of that change and build our custom war with the revert in place if it ends up being the only blocker and we can't fix it in apache?
19:18:05 we are open to changing our config but not at the expense of those features
19:18:29 I have attempted to fix it in apache
19:18:34 +1 what fungi said
19:18:35 as have i
19:18:39 can we change our config while keeping the rewrite rules?
19:18:49 the trouble there appears to be that http://foo.openstack.org// is treated as foo.openstack.org/ in many situations
19:19:00 i hacked on possible workarounds for a while back when we first encountered it last may
19:19:03 zaro: that is what we have tried and nothing works, and I think it's due to ^
19:19:33 I turned on rewrite debug logging and it wasn't 100% clear, but it seemed like we weren't matching // because raisins
19:19:40 and it's possibly my apache regex foo was just bad
19:20:04 so i guess due diligence should include testing review-dev with a custom-built 2.11.x war with that commit reverted just to make sure it's not hiding other problems
19:20:30 makes sense
19:20:31 is there a way to use both 'ProxyPass / http://127.0.0.1:8081/ nocanon' with our current rewrite rules?
19:20:37 fungi: ++
19:20:41 zaro: happy to keep testing things you serve up
19:20:48 zaro: no
19:20:56 clarkb: if you need any help with apache regex i'm happy to take a gander - but i trust your regex foo.
19:21:02 the ProxyPass rule is going to proxy everything
19:21:39 ok. i'll install a revert of that change 57800 on review-dev.o.o for testing.
19:21:48 notmorgan: i think it's more dealing with apache's selective application of regular expressions
19:21:59 zaro: thanks
19:22:22 fungi: ++
19:22:46 i was also getting tripped up on the stacking order of rewrites and proxying in apache when i was trying to work around that
19:22:47 fungi: probably, but if i can help, lmk - happy to dust off my apache skills if it will make a difference :)
19:23:08 * notmorgan used to do tons of rewrite/proxy/etc stuff for webhosting.
19:23:16 notmorgan: i believe we have an easy reproducer if you get bored and want to stand up a vm to play around with
19:23:35 fungi: cool, will ping you / zaro post-meeting and look into it
19:23:38 :)
19:23:39 reverted 57800 is now on review-dev.o.o
19:23:45 (same goes for anyone who wants to take a crack at leveraging their apache-fu)
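(A quick sketch of the kind of reproducer mentioned above: send a request path containing "//" verbatim and see whether the Apache rewrite/proxy layer collapses it. Python 2 stdlib only; the target host is just an example, point it at whatever test instance you stand up.)

    #!/usr/bin/env python
    # Probe whether a double slash in the request path survives the
    # apache rewrite/proxy layer in front of gerrit.

    import httplib

    HOST = 'review-dev.openstack.org'  # example target; use your own VM

    def probe(path):
        # httplib sends the path as given, without client-side
        # normalization, so any "//" collapsing we observe happened
        # on the server side
        conn = httplib.HTTPSConnection(HOST)
        conn.request('GET', path)
        resp = conn.getresponse()
        print '%s -> %s %s' % (path, resp.status,
                               resp.getheader('location', ''))
        conn.close()

    # on a broken setup these two behave identically ("//" treated
    # as "/"); on a fixed one the second keeps its double slash
    probe('/')
    probe('//')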
19:23:51 thanks zaro!
19:24:18 dimtruck has graciously volunteered to help repro it.
19:24:23 +1
19:24:39 dimtruck: thank you
19:24:41 okay, so we're agreed we'll defer scheduling of the 2.11 upgrade until at least next tuesday's meeting?
19:24:46 agreed
19:24:52 sure
19:24:54 rather, REscheduling
19:25:34 #agreed Rescheduling of the Gerrit 2.11 upgrade is deferred to the next meeting while final details are worked out and suggested fixes tested.
19:26:01 anything else on this topic (almost typed tapioca) before we move on?
19:26:08 ha ha ha
19:26:35 the redirect on review-dev.o.o seems to work great for me now.
19:26:46 great news
19:26:57 okay, next topic!
19:26:59 #topic Priority Efforts: maniphest migration
19:27:05 ruagair has a comment in the agenda about maniphest early adopters which i didn't spot discussed in last week's log--any details on who and why?
19:27:13 this is more for my edification/education
19:27:21 * anteaya has no details on who and why
19:27:28 i got beta access for maniphest
19:27:31 if he's not around and nobody knows i won't waste meeting time on it
19:27:36 and GheRivero as well, to work on infra cloud
19:27:39 may be that?
19:27:58 he says "There are a number of users already using Manifest in anger, as guinea pigs."
19:28:20 oh
19:28:22 maybe?
19:28:24 i take that to mean as a (semi?)production location for tracking bugs in their projects
19:28:26 i wish we'd talked about that before doing it?
19:28:31 same
19:28:46 i just created a project, and Ghe poked a bit, but nothing on production at the moment
19:28:52 the topic was raised last week in the meeting
19:29:12 I don't think anything was discussed about anyone using it though
19:29:17 okay, i'll try to sync up with him when he's awake and get some details on what that comment means
19:29:21 I'm still just trying to get mod auth openid working with apache
19:29:30 i would not say we use it in production, but we got access to poke a bit
19:29:40 yolanda: that's fine
19:29:53 that would make perfect sense, of course. i'm hopefully just misinterpreting the agenda comment there
19:29:58 i just don't think any project should start using it for real until we think it's ready for that
19:30:07 yes, he may not even be talking about us
19:30:44 #agreed Maniphest test deployment is for testing the feasibility of Maniphest, not for production use.
19:31:06 yolanda: is the openid thing there and working?
19:31:15 mordred: not yet, just a user/pass
19:31:29 kk
19:31:46 #topic Enable Tempest for Magnum (dimtruck)
19:31:49 #link https://review.openstack.org/248123
19:32:08 woo hoo! yes please :) yolanda already +2'ed ... this is needed to add tempest to magnum
19:32:10 dimtruck: can you explain what's unusual about this change?
19:32:24 nothing unusual. Just wanted to get it in front of cores :)
19:32:29 mostly just trying to figure out why it's a meeting topic
19:32:36 dimtruck: this was not the way to do that
19:32:43 oh, sorry about that
19:32:43 dimtruck: yeah, adding it to the meeting topic isn't a good choice
19:32:52 o/
19:32:57 that was one of the suggestions proposed to me. won't do it again.
19:33:16 dimtruck: can you also give that feedback to whoever proposed it to you
19:33:21 i will.
19:33:23 and thanks
19:33:24 yep, i thought maybe there was something happening with that change we needed to discuss as a group, but i didn't check it at all before starting the meeting. sorry about that!
19:33:36 #topic Translations check website (pleia2)
19:33:40 hi there
19:33:43 #link http://lists.openstack.org/pipermail/openstack-infra/2015-November/003465.html
19:33:45 so Daisy sent an email to the list ^^
19:34:04 I replied, but I'd really like to see others chime in to help make it possible for Daisy to get this to the finish line
19:34:17 if there's anything I missed, references that may be helpful, etc
19:34:53 I also wanted to mention that we never decided upon a URL for accessing this service, and none is defined in the spec, so it would be great to get some ideas
19:35:13 pleia2: what do you think is appropriate?
19:35:14 #link http://specs.openstack.org/openstack-infra/infra-specs/specs/translation_check_site.html
19:35:30 i18n-devstack.openstack.org ?
19:35:51 daisy suggested translation-test.openstack.org
19:36:03 pleia2: what do you like?
19:36:03 but i18n-devstack.openstack.org or translation-devstack.o.o works too
19:36:28 why devstack in the url?
19:36:33 yeah, i'd prefer not to bikeshed too much. but i'm also not keen on embedding the names of tools in domain names (in case we change the tools we use to provide them later). that said, we're pretty inconsistent between naming servers after the services they provide and naming them for the software they run
19:37:13 it seems very specific to the tool, hence my suggestion
19:37:22 whereas translation-test could be anything
19:37:36 * pleia2 nods
19:37:39 but I don't care strongly enough to want one over the other
19:37:43 either will work
19:37:54 so daisy's suggestion works for me
19:37:58 * fungi wonders if it's there to test anything besides horizon... do we have translated api responses for example?
19:38:18 fungi, no, we don't have those.
19:38:29 glance is currently working on something in this area.
19:38:44 and their focus for this has been horizon
19:39:00 fungi, https://review.openstack.org/232304 is related to the glance work
19:39:03 anyway, at the end of the day, i don't care too much what we call it. pick something that seems appropriate
19:39:18 pleia2: throw the dice ;)
19:39:22 ok, I'll propose a change to the spec once I decide on one
19:39:36 use the turbo-hipster model of naming things ;)
19:39:55 thanks pleia2!
19:40:02 that's all
19:40:12 #topic Open discussion
19:40:28 looks like we blew through this week's agenda with 20 minutes to spare
19:40:35 ok, infra-cloud
19:40:42 we got nearly 100 nodes up on east
19:40:51 nice!
19:40:53 wanted to check with greghaynes, crinkle, about status of west
19:40:55 do they cloud?
19:40:59 woot!
19:41:10 hiya
19:41:12 clarkb, mordred, we are pending on a glean fix
19:41:17 oh - which one?
19:41:19 yeah
19:41:24 i'm still working out some issues automating the baremetal deployment in west
19:41:27 https://review.openstack.org/#/c/244625/3
19:41:36 yolanda: rcarrillocruz: so what does up mean?
19:41:49 clarkb, they are correctly deployed with bifrost
19:41:59 yolanda, crinkle: is the hardware entirely different on east and west? is that why you are working in two groups?
19:42:02 we had some blockers with ilos, disk problems, network
19:42:14 anteaya: there are different network setups
19:42:20 crinkle: ah thanks
19:42:20 anteaya, Ricky and myself joined recently, so we got agreement to work on East
19:42:41 I'm going to approve that change unless someone else wants to scream
19:42:42 yolanda: right, but yeah, with different networks you would have different issues
19:42:52 it's the vlan interfaces bug for debian in glean we discussed last week
19:42:54 mordred: what change?
19:42:55 mordred: no, that appears to be what we agreed on after the discussion
19:43:04 anteaya, that's right. networking is different, as well as hardware
19:43:04 mordred: interface.vlan
19:43:05 crinkle: anything we can assist on? we have some playbooks on ansible we've been using ad-hoc in east
19:43:10 not sure what you're at
19:43:11 yolanda: okay thanks
19:43:18 oh wait
19:43:19 mordred: hold on
19:43:27 clarkb: kk
19:43:29 the tests are still broken
19:43:30 so yes, crinkle, we can help on network, hardware problems... we had to open a few tickets
19:43:34 rcarrillocruz: ^ I pointed that out somewhere, I thought
19:43:37 rcarrillocruz: there is no eth4
19:43:49 rcarrillocruz: i've been working from the patches in gerrit, if you have ad hoc playbooks that aren't pushed up that's not really good
19:43:51 right?
19:44:05 fungi: https://review.openstack.org/#/c/244625
19:44:13 thanks!
19:44:15 mordred: also, mind if i de-op you in here?
19:44:21 fungi: sure thing
19:44:21 crinkle, we consume the same playbooks
19:44:21 crinkle: pending push, yep
19:44:27 but we have additional tools
19:44:39 i wonder what those automation issues are
19:44:40 fungi: i'm wondering if there are plans for an infra midcycle?
19:44:44 rcarrillocruz: or did we end up deciding that was correct?
19:45:00 I need to find the input to the tests to figure out if the outputs look good
19:45:12 clarkb: not sure
19:45:18 crinkle: i need to circle back around with pleia2, she was looking into logistics for one good option
19:45:37 yeah, vmbrasseur just followed up again internally for me
19:45:44 so hopefully we'll know something soon
19:45:48 the short answer is that we're hoping to do something in mid february and focus on driving whatever's left to knock out for infra-cloud
19:46:08 cool, i think some puppet-apply things could get knocked out too
19:46:26 or turn it into a get-to-know-you session for infra-cloud perhaps, if there's nothing left to knock out ;)
19:46:53 crinkle: what we have pending push is some helper scripts to wipe servers on ironic and do rebuilds
19:46:53 that helps us re-test deployments
19:46:53 i'll push up tomorrow
19:47:16 fungi, ++ on a midcycle for that
19:47:16 crinkle: I'm _hoping_ to finish puppet-apply this week :)
19:47:23 mordred: oooh
19:47:41 crinkle, so you got an operational cloud in west, right?
19:47:48 yolanda: not currently
19:47:49 crinkle: yep, it's risky coming up with an agenda this far in advance since we don't know what we'll have finished when, but that seemed like a good time, and when the other infra core reviewers/root admins were discussing in tokyo we concluded we'll probably have something worthy of an in-person sprint
19:47:58 ++
19:48:00 clarkb: it was not clear to me why, with 5 interfaces in the config drive fixture, glean doesn't produce eth4, no
19:48:01 :/
19:48:11 crinkle, i connected to the controller node last week and saw some services up, what's missing?
19:48:21 rcarrillocruz: it is making an eth4
19:48:25 crinkle: yeah, we can assist
19:48:27 rcarrillocruz: but the old change wasn't, iirc
19:48:33 * mordred wants to point out that LCA is late this year, so mid-feb is potentially weird timing-wise
19:48:34 yolanda: there weren't static ips assigned at the time, and now i've restarted dnsmasq enough that the ips they had are lost
19:48:40 rcarrillocruz: so I am just trying to reconcile the two in my head and figure out if the current "make an eth4" is correct
19:49:00 yolanda: rcarrillocruz: are you using static dhcp ip assignments in east?
19:49:02 clarkb: k, looking forward to your input on the change
19:49:07 also, AnsibleFest London is Feb 18 and they're talking about having a one-day contributors summit the day before, which Shrews and I probably want to be in
19:49:15 crinkle: no, to speed up testing we are using dynamic dhcp
19:49:17 crinkle, no
19:49:24 ah, that's what i'm working on
19:49:25 cos we really needed to know what's broken and what's not from an HW perspective
19:49:29 mordred: feb 1-5: http://lcabythebay.org.au/
19:49:41 yep, we can try to work the date so it's at least not overlapping, or hopefully not too back-to-back, with lca
19:49:42 and concurrently i've pushed that glean fix i mentioned earlier, cos vlan in glean is broken
19:49:47 mordred: so if mid-feb = week of feb 15th we should be okay
19:50:00 mordred: depending on how much time after lca you want to be in geelong
19:50:06 rcarrillocruz: looks like there are 5 mac addrs so there should be 5 interfaces, eth0-eth4
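(To make the eth0-eth4 exchange concrete, here is a simplified illustration -- not glean's actual code -- of the expected behavior: one ethN stanza per physical MAC in the config-drive network info, plus a Debian-style vlan sub-interface, the case change 244625 fixes. The input field names are loose approximations of the config-drive JSON.)

    #!/usr/bin/env python
    # Illustration: 5 physical MACs in the fixture should yield
    # eth0..eth4 stanzas, and a vlan link should yield a child
    # interface with its own stanza.

    network_info = {
        'links': [
            {'type': 'phy',
             'ethernet_mac_address': '00:00:00:00:00:0%d' % i}
            for i in range(5)                     # 5 MACs -> eth0..eth4
        ] + [
            {'type': 'vlan', 'vlan_id': 25, 'vlan_link': 'eth0'},
        ],
    }

    stanzas = []
    eth = 0
    for link in network_info['links']:
        if link['type'] == 'phy':
            name = 'eth%d' % eth
            eth += 1
            stanzas.append('auto %s\niface %s inet dhcp\n' % (name, name))
        elif link['type'] == 'vlan':
            # the vlan case is the one broken on debian: the child
            # interface (eth0.25) needs its own stanza naming the
            # raw device it rides on
            name = '%s.%d' % (link['vlan_link'], link['vlan_id'])
            stanzas.append('auto %s\niface %s inet dhcp\n'
                           '    vlan-raw-device %s\n'
                           % (name, name, link['vlan_link']))

    print '\n'.join(stanzas)   # five eth stanzas plus eth0.25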
19:50:10 rcarrillocruz: I think it's correct
19:50:13 mordred: ^ if you want to approve
19:50:15 \o/
19:50:54 woot!
19:50:55 crinkle: if you need help or want to split the work on testing the HW, let us know
19:50:55 done
19:50:58 i've split the log_processor python scripts out of the puppet module and into the new project openstack-infra/log_processor, interested persons should add it to their watched projects list
19:51:05 happy to help
19:51:13 yolanda: rcarrillocruz: do y'all sometimes have problems with glean not writing out /etc/network/interfaces or /etc/network/interfaces.d/.cfg on boot?
19:51:25 heh
19:51:28 hmm, yeah
19:51:39 BUT
19:51:49 i need to see if that happens on our vlans when the fix merges
19:51:58 not sure where that's coming from, whether it's legit or not
19:52:01 maybe a race
19:52:05 if we did feb 10-12 wed-fri that should fit in between lca and ansiblefest
19:52:07 i think it's unrelated to the vlan change, i think it's an issue with the upstart script
19:52:18 yeah, unrelated
19:52:36 i mean, i didn't want to go down that rabbit hole before fixing the vlan thing
19:52:48 and see if we hit that on the vlan interfaces, since it's the one we will care about
19:52:51 in these weeks we've been mostly focused on hardware problems
19:52:53 i also think it's about upstart
19:53:00 i hit similar races in the past in gozer
19:53:18 where glean would run before network interfaces would come up, but iirc clarkb fixed that
19:53:29 so not sure, anyway, we'll let you know what we get
19:53:36 i think we really should stress test the deployments
19:53:42 fungi, clarkb, jhesketh, yolanda, pleia2: while we're here and talking - could I get a +A on https://review.openstack.org/#/c/241495/ so that I can move forward with puppet apply?
19:53:43 and document failures
19:54:09 rcarrillocruz, i agree. We've been bringing up machines slowly, but we need to be sure that it's repeatable
19:54:36 rcarrillocruz: the only problem is if you do that too much you end up having no nodes left
19:54:44 because ILO bugs, and motherboards die
19:55:06 really? you are scaring me... :-)
19:55:51 we're still on the short-term hardware, right?
19:56:05 clarkb, we had several ilo problems, but they were fixed by resetting the ilos at the time
19:56:07 jeblair: yeah, thing is we have to make an 'inventory' of what's working and what's not
19:56:07 rcarrillocruz: aiui that was part of the problems in west
19:56:11 jeblair, yes
19:56:17 then come up to purp with that info
19:56:18 where the plan is "continue working on this to get the framework in place and identify potential problems, but don't spend too much time worrying about this particular hardware"
19:56:21 jeblair: we won't see additional hardware until at least january
19:56:23 What's the status on the wheel mirror patches? I've got a patch chain that's waiting on it.
19:56:51 krotscheck: greghaynes is awaiting reviews on it
19:56:52 krotscheck: they are up and in review with passing CI results now, I think
19:57:53 clarkb, crinkle: Thanks. The last activity I see is on the 17th, one week ago.
19:58:31 I'm guessing this is not super high priority when compared to the other infra goals?
19:58:36 mordred: yep, lgtm
19:59:20 jhesketh: thank you!
19:59:37 krotscheck: compared to last week and the gerrit upgrade, correct
19:59:49 If I could solicit some reviews on greghaynes's behalf, it'd be much appreciated :)
19:59:49 we're just about out of time--thanks everyone!
20:00:01 #endmeeting