19:00:47 #startmeeting infra
19:00:49 Meeting started Tue Apr 14 19:00:47 2015 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:50 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:53 The meeting name has been set to 'infra'
19:01:01 #link agenda https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:01:01 #link previous meeting http://eavesdrop.openstack.org/meetings/infra/2015/infra.2015-04-07-19.03.html
19:01:06 o/
19:01:08 #topic Actions from last meeting
19:01:08 o/
19:01:08 o/
19:01:12 o/
19:01:13 o/
19:01:18 fungi: thanks for chairing the previous meeting
19:01:28 np
19:01:29 o/
19:01:30 any time
19:01:50 hello !
19:02:47 clarkb migrate more jobs to swift log hosting
19:02:56 o/
19:03:08 I have not done that yet, worked with jhesketh on a plan for non-log data in swift first
19:03:22 clarkb: does the plan have an etherpad?
19:03:25 there is a stack of os-loganalyze changes to make that work much better that he pushed today
19:03:25 that sounds important
19:03:32 ah
19:03:35 anteaya: no, it was mostly in review
19:03:37 I can get links
19:03:40 k thanks
19:03:56 #link https://review.openstack.org/#/c/107267/ starts there
19:04:01 do they use a common change topic? oh just that series
19:04:09 if they have the enable_swift topic they should be discoverable
19:04:23 ya I can update the topics on them now
19:04:36 thanks!
19:05:08 zaro test gerrit 2.10
19:05:43 looks good.
19:05:50 it's running on review-dev now, yeah?
19:05:58 an issue we need to deal with: https://review.openstack.org/#/c/172534/
19:06:08 yes, it's running on review-dev.
19:06:11 #link https://review.openstack.org/172534
19:06:19 aw bummer (re https://review.openstack.org/#/c/172534/)
19:06:21 right ^ concerns me because we intentionally installed from system packages to get security updates
19:06:33 and bouncy castle is the one that is most likely to have important security updates
19:06:42 I was testing replication and it seems like i'm getting the following error: org.eclipse.jgit.errors.TransportException: git@github.com:testproj1.git: reject HostKey: github.com
19:07:20 zaro: does bouncy castle from packages work with gerrit 2.9?
19:07:27 o/
19:07:31 yes, it does
19:07:42 * anteaya leans toward 2.9
19:07:59 zaro: does 2.10 really not work with 1.49, or does 2.10 just say that it requires it?
19:08:07 are there specific features in 2.10 that use new functionality in later bc?
19:08:48 jeblair: no it doesn't.
19:09:05 jeblair: 2.10 does not work with 1.49
19:09:33 i wonder what the best way would be to keep on top of bc security updates if we take this path
19:09:41 i've tested 2.10 and it seems like it will only work with 1.51 libs
19:09:53 * anteaya notes we are 3.5 weeks out from the gerrit upgrade
19:10:20 was there anything specific we wanted in 2.10 that 2.9 doesn't give us?
19:10:34 anteaya: yes, but we're likely to want to install 2.10 on trusty, so it's worth seeing what would be involved in solving this problem.
19:10:34 fungi: i don't know the answer to your question.
19:10:34 I want close connection which works with both
19:10:51 jeblair: ah right
19:11:17 the next ubuntu lts won't be until next spring
19:11:47 but might there be a new ubuntu bouncy castle package before the next lts release?
19:11:49 i tried looking for newer libs on backports but didn't find it.
19:11:56 Next northern spring?
19:12:03 tchaypo: good point
19:12:08 12 months from now
19:12:09 the ubuntu maintainer for bc might be amenable to adding newer packages on trusty-backports
19:12:29 tchaypo: yes, my bad
19:12:36 next april
19:12:37 do we have contact with the bouncy castle maintainer?
19:13:02 this might be a possibility: https://groups.google.com/forum/#!topic/repo-discuss/2F6eeXZwABE
19:13:40 looks like ubuntu vivid still only has 1.49 anyway
19:13:46 I'm assuming that doing our own backport is not something we want to consider
19:13:54 pretty new, so i don't think anybody has tried
19:14:01 debian experimental has 1.51
19:14:14 but jessie is releasing with 1.49
19:14:47 my guess is the debian maintainer for it plans to put 1.51 in unstable once jessie releases, so it might appear in jessie-backports thereafter
19:14:58 ubuntu seems to just be importing from debian on this one
19:15:23 so our options are: (a) stick with 2.9 for now, (b) ask ubuntu for newer bc backport, (c) install from maven and keep an eye on it ourselves, (d) switch to debian, (e) hold 2.10 until next ubuntu lts
19:15:31 anyway, seems like it might be possible but would likely require some help
19:15:53 #vote (a) and (b)
19:15:57 (d) implies also running packages from unstable or waiting for a backport there anyway
19:16:18 maybe c and wait until a is available?
19:16:23 fungi: yeah, i'm assuming that the next lts would get the newer debian backport, but timing could slip
19:16:32 i'd be okay with c transitioning to b
19:16:41 fungi: ya I am happy with that too
19:16:53 like do it from maven and keep an eye on it while trying to get it into trusty-backports a couple months from now
19:17:07 i agree with fungi & clarkb
19:17:11 e.g. once it ends up in debian/testing
19:17:13 zaro, tchaypo: we don't have a packaging build infrastructure, so we can't build one ourselves (still has the same update problem anyway), and we can't use the gerritforge packages directly if we want to be able to apply local patches (we'd still need our build infra to do that)
19:17:38 I'm more in line with c transitioning to b
19:18:05 +1 for setting up a packaging team
19:18:18 is it in vivid?
19:18:21 aren’t we working on that already?
19:18:27 SpamapS: no
19:18:33 what's the process to get b?
19:18:39 SpamapS: nope, they're importing from debian and it looks like the maintainer is holding it in experimental until jessie releases
19:18:39 so have to wait for 'w' to open
19:18:43 tchaypo: it's on the list - no work has been done towards it yet - largely due to needing to clear the current priority efforts first
19:18:57 yeah debian and ubuntu frozen at the same time always sucks. ;)
19:19:10 let's clarify that 2.9 is still an upgrade, since we are currently running 2.8
19:19:21 #agreed install bouncy castle for gerrit 2.10 from maven, and see about adding it to trusty-backports after it appears in debian unstable
19:19:27 jeblair: ++
19:19:31 anyone disagree with that ^ ?
19:19:41 looks like it's group-maintained by the debian-java team so it probably will end up in testing within the next couple months
19:19:51 * fungi #agrees
19:19:54 it is more work, but it isn't me who will be doing it, so if that is what folks want
19:19:55 +1
19:20:07 +1
19:20:20 anteaya: zaro wrote the patch already, and it's less work than upgrading twice
19:20:27 installing gerrit 2.9 when 2.10 is tested working is more tech debt than keeping up with bc for a little while, in my opinion
19:20:32 so technically not less work
19:20:44 okay
19:21:02 I thought we already had a 2.9 branch that was working
19:21:12 and was just experimenting with 2.10
19:21:13 thank you for the work, zaro and fungi
19:21:28 zaro: any other gerrit 2.10 things we need to look into/prepare for?
19:21:36 we do, but we're going to need to keep upgrading gerrit indefinitely, so installing 2.9 now is falling further behind the curve than we should probably aim for
19:21:50 fungi: fair enough
19:21:58 given that we seem to get windows to upgrade gerrit only about every 6 months
19:22:05 jeblair: i wanted to make sure replication still works, so maybe someone can help with the error i mentioned above.
19:22:07 true
19:22:13 zaro: oh right
19:22:14 hostkey error
19:22:31 jeblair: I think you just need to have that host accept github's hostkey(s)
19:22:36 yah
19:22:38 er
19:22:39 zaro: ^
19:22:44 other than that, i think just need to fix LP integration.
19:22:45 you know - we could probably put github's hostkey into puppet
19:22:53 so start an ssh connection or use ssh-keyscan
19:23:04 o/
19:23:05 it's a known quantity - and we should probably know about it via puppet when it changes
19:23:25 zaro: can you try accepting the host key and grab an infra-root if that still doesn't work?
19:23:44 mordred: agreed, but i think we can make puppeting that be something outside the critical path for the gerrit upgrade
19:23:48 jeblair: ++
19:24:00 zaro: what's broken with lp integration?
19:24:05 jeblair: not sure what that means, but can follow up with an infra-root to flesh out
19:24:36 jeblair: jeepyb says it doesn't know about change-owner
19:24:37 zaro: yeah, i can help you after the meeting
19:24:54 a bit late but o/ Hello
19:24:59 i thought that was fixed a while ago, but i see the error in the gerrit log
19:25:02 * jhesketh joins late
19:25:12 jeepyb's gerrit hook processing will just need a patch to accept that parameter if it exists but not depend on it existing
19:25:31 there are a few others already in there, i can show you if necessary
19:25:31 anyways that LP issue is probably not a big deal.
19:25:57 np, i've done that before (i think)
19:26:23 cool, thanks!
19:26:27 #topic Schedule next project renames
19:26:42 jhesketh: I think you're joining EARLY
19:27:24 Heh (yay 5am)
19:27:31 gotta love it
19:27:34 so there are 11 projects that want to rename, 10 non-attic ones
19:27:48 all 10 are going from stackforge to openstack
19:27:55 jeblair: don't approve more projects into the big tent in the tc ;)
19:28:17 AJaeger_: i want to move all of them and then stop adding things to stackforge, but we'll talk about that some other time :)
19:28:33 or shall we wait for more to come before changing?
19:28:43 jeblair: that works as well...
19:29:00 AJaeger_: yeah, i think it's reasonable to try to batch them since we expect a lot
19:29:29 my only two time constraints are I will be in seattle thursday not in front of a computer if doctor gives ok, and I will be busy on the 25th and 26th
19:29:41 otherwise I can likely be around to help
19:29:47 this weekend is pretty bad for me. races on both saturday and sunday
19:30:09 * mordred has a todo list item to write an ansible playbook we can try to use for one of these renames ... has not done it yet
19:30:14 also, we are in the RC phase... we should probably either do it this friday, or wait for the release.
19:30:40 jeblair: I'm fine with this friday
19:31:20 yeah, i can do friday
19:31:33 I'm around this friday
19:31:36 would it be appropriate to schedule the gerrit db migration with this?
19:31:43 but not much help with renames
19:32:03 so proposal: renames friday april 17 us afternoons, next renames (if needed) during the migration
19:32:13 "afternoon"; not plural
19:32:41 the utf-8 table update? we'll need to time the database export/transform/import for this dataset since it's largeish. just so we know how much extra outage we'll need
19:33:03 jeblair: I'm fine with your proposal
19:33:15 has utf8 process been tested with production data?
19:33:20 jeblair: wfm
19:33:25 i believe it takes 1 hr for the process.
19:33:49 that's what i remember but can make sure if you like.
19:33:53 zaro tested it on a copy of our production database and spot-checked the results, i think he said?
19:34:13 fungi is correct
19:34:24 i know i provided him with access to a copy of the db anyway and he made success noises a little while later
19:34:42 do we want to do that on friday, or wait for the upgrade for that?
19:35:09 (that would make friday a 1.5 hour outage)
19:35:17 * jhesketh wonders what a success noise is...
19:35:22 * anteaya too
19:35:30 wait for upgrade
19:35:32 i prefer 2 separate steps in case we need to debug errors
19:35:36 it might be nice to have that done this week and baking for a few weeks prior to the upgrade, just so we can tell if one introduces a subtle problem and not have to untangle which it was
19:36:00 yeah, baking is the word
19:37:49 raw bugs are so much less palatable than baked bugs
19:37:49 * zaro is available fri
19:37:59 #agreed schedule a 2-hour outage for friday april 17 for project renames and utf8 conversion
19:38:09 if trove supports replication now - there is a cleverer way to do the operation without resorting to 1.5 hour downtime
19:38:15 fungi: I hear eating crickets are quite the thing
19:38:52 mordred: i'm more of a fan of an announced and planned outage than a clever but untested alternative
19:39:07 which itself could turn into a lengthy unplanned outage ;)
19:39:08 some of us ate bugs during the seattle meetup :)
19:39:20 which involves turning on replication, doing a dump, loading the dump into a slave, stopping replication on the slave, performing the data transformation, turning replication back on so that it's caught up, and then the actual migration involves stopping incoming data to the master, letting the slave lag become zero, and swapping the two
19:39:20 how about 2200 utc for friday?
19:39:26 * anteaya notes to avoid seattle meetups
19:39:29 cinerama: i eat bugs constantly. the ocean here is full of them
19:39:32 I'm not saying we should do it right now
19:39:34 2200 sounds great
19:39:36 jeblair: I'm up
19:39:40 but we should, for future things, test such a process
19:39:42 so that we can use it
19:39:55 because it's the standard way to do long mysql transformations
19:40:01 #action jeblair send announcement for april 17 2200 utc 2-hour outage for renames and utf8 conversion
19:40:13 mordred: or we can try to never do a transform again :)
19:40:22 i like the second plan
19:40:36 have we scheduled a time for the 2.10 upgrade?
19:40:37 jeblair: sure. avoiding maintenance is always the best way to make sure the plan for maintenance is solid ;)
19:40:43 good luck with utf-8 conversion :/
19:40:49 jeblair: just a date so far I think
19:40:53 hashar: thanks!
19:41:00 jeblair: may 9th?
19:41:01 Wikimedia Gerrit has been utf-8 since day 1 iirc
19:41:04 jeblair: and I don't recall there has been a post to the ml yet
19:41:22 clarkb: that is the date I am planning for
19:41:42 mordred: okay, you have convinced me it is worthwhile for you to write up and test your alternative. thank you.
19:42:02 clarkb: yeah, i think we did decide on that date... did we pick a time?
19:42:12 I don't recall a time
19:42:18 neither do I
19:42:34 I vote for >=9am PDT
19:42:54 roughly how long do we think we need for the upgrade?
19:43:01 that is 1600 utc or later, I believe
19:43:01 +1
19:43:26 jeblair: I shall add it to my list
19:43:48 * anteaya has no answer to fungi's question
19:44:05 zaro: what do you think?
19:44:29 30 mins
19:44:43 just install the new stuff, and, what, run a reindex?
19:45:09 oops, reindex takes about 30mins
19:45:12 well, keep in mind we need time for all the steps to be performed, and then to test it out to make sure it's sane, and then to _undo_ the upgrade and double-check it on the off-chance that something goes horribly wrong
19:45:35 zaro: where are the steps?
19:45:38 there was an etherpad, right?
19:45:50 let me look
19:46:19 yikes it looks horrible: https://etherpad.openstack.org/p/gerrit-2.9-upgrade
19:46:35 oops, refresh fixed it
19:46:42 so my guess, if the upgrade steps are 30 minutes and the reindex is 30 minutes, is that we need a bare minimum of 2 hours planned outage, and we should probably inflate that so we have breathing room
19:46:58 i'm just guessing we need a reindex
19:47:17 fungi: +
19:47:21 yep it's in the steps
19:47:27 2 hrs will be plenty.
19:47:44 okay, so 4 hours in the announcement then :)
19:47:46 ?
19:47:50 agreed
19:48:07 i have a script ready anyways, #link https://github.com/zaro0508/gerrit-upgrade
19:48:08 always add the scotty factor
19:48:13 #action jeblair send announcement for may 9 1600 utc 4-hour outage for 2.10 upgrade
19:48:37 if you tell people 4 hours and have it done in 1, they'll be impressed. if you tell them 1 hour and take 2, they'll be annoyed
19:48:41 no-one ever gets upset when you don’t use the scotty time
19:48:56 +1
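
For context on the 30-plus-30-minute estimate above: the core of a Gerrit 2.x upgrade window is a site init (schema upgrade) followed by an offline reindex, bracketed by stopping and restarting the service. The sketch below only illustrates that shape and is not the agreed procedure, which lives in the upgrade etherpad and zaro's gerrit-upgrade script; the war location, site path, and service name are assumptions.

    #!/usr/bin/env python
    # Illustrative sketch only: the real steps are in the upgrade etherpad and
    # https://github.com/zaro0508/gerrit-upgrade. Paths and the service name
    # below are assumptions, not the production values.
    import subprocess

    SITE = '/home/gerrit2/review_site'     # assumed Gerrit site path
    WAR = '/home/gerrit2/gerrit-2.10.war'  # assumed location of the new war

    def run(cmd):
        # echo each command before running it, and fail fast on errors
        print('+ ' + ' '.join(cmd))
        subprocess.check_call(cmd)

    run(['service', 'gerrit', 'stop'])                          # start of the outage
    run(['java', '-jar', WAR, 'init', '-d', SITE, '--batch'])   # upgrade site/schema
    run(['java', '-jar', WAR, 'reindex', '-d', SITE])           # offline reindex, ~30 min
    run(['service', 'gerrit', 'start'])                         # then verify, or roll back

Budgeting roughly 30 minutes each for the init and the reindex, plus time to verify the result and potentially undo the whole thing, is what pushes the announced window out to 4 hours.
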
19:49:02 #topic Spec proposal - Integration tests for System-config Openstack_project using containers (fbo)
19:49:44 Hello, yes I submitted this spec, did you already read it?
19:50:03 i have not reviewed this spec
19:50:07 * anteaya has not read the spec in question
19:50:14 The purpose of this spec/work is to be able to apply puppet modules in a test environment in order to check if services are properly configured
19:50:15 The idea is to improve integration testing for the openstack_project puppet module
19:50:25 +1 on the idea
19:50:29 i need to review the implementation
19:50:34 yeah, it sounds pretty cool
19:50:36 +1 for the concept, yes
19:50:36 I think we should do functional testing of the puppet modules first
19:50:45 the other thing we should keep in mind is the module splits we're doing
19:50:49 it will be easier to get started and fix many bugs that will be easier to debug in that env
19:50:58 then once we have that rolling work on integration testing
19:51:17 and that quite a bit of the openstack_project module will probably end up in the puppet-openstackci module
19:51:31 but i don't think we settled on how to test puppet-openstackci yet
19:51:36 I'm a fan of testing, how critical is the 'using containers' bit in the meeting agenda topic?
19:51:38 clarkb: though the end goal is to have integration tests like: does a gerrit change trigger a test and does zuul report properly, for example
19:51:40 fbo, http://specs.openstack.org/openstack-infra/infra-specs/specs/openstackci.html
19:51:41 Yes you are right ! that's the idea and after we can add more smoke tests to validate a deployment
19:51:44 so i banged on envassert, doesn't really seem like it's for us
19:51:53 since it's very tied to fabric
19:51:58 like if we want to test with another non-container method, will that be considered?
19:51:59 so maybe the same thing applies, but possibly just with a different module
19:52:06 tristanC: yes I just think getting there now is harder than the other thing
19:52:21 tristanC: and it will be far more beneficial in the near term and long term to do the functional testing soon
19:52:22 and we'd probably prefer to build that kind of testing out of ansible than fabric
19:52:51 anteaya: i think it's a reasonable way to get a number of services installed on one host; i don't necessarily think we need to have 6 nodes spun up for one test
19:52:52 yah - especially since the plan is to use ansible to deploy our servers with puppet
19:52:54 having the integration tests not do that seems like the wrong direction
19:53:24 jeblair: that is fair, just wondering how integral that point is to the spec
19:53:33 sounds like it is a large part
19:53:59 One other main benefit of that spec is it will force us to remove more hardcoded values in manifests
19:54:24 fbo: okay, it sounds like we generally think it's a good idea, we need to get into details on spec review, and we also need to look at how it fits in with other methods of puppet testing
19:54:33 ++
19:54:40 yep, no immediate objections
19:54:42 So making the openstack_project module more reusable, allowing others wanting to build a CI
19:54:51 +1 on that
19:54:52 I really want to see functional testing with envassert first
19:54:52 if it comes with someone willing to work on implementation, all the better ;)
19:55:06 fbo: well the openstack_project module is designed to be not reusable
19:55:08 we should learn from openstack and do this correctly
19:55:15 fbo: make sure you've read http://specs.openstack.org/openstack-infra/infra-specs/specs/openstackci.html
19:55:24 anteaya: ++
19:55:27 jeblair: regarding infra-manual: https://review.openstack.org/#/c/138595/ is waiting for a final approval for some time - others wanted you to have a chance to review it.
19:55:28 fbo: because that's where we're going, so we'll want to incorporate your ideas with that
19:55:30 fbo: that was the effort of splitting all the modules out of it in january
19:55:38 jeblair: oops, wrong channel. Sorry!
19:55:56 evaluating success of the puppet run is harder than picking a virt tech to run puppet on
19:56:05 clarkb: you disagree with nibalizer's analysis of envassert, or you just mean functional testing of individual modules in general?
19:56:07 clarkb: yeah, though it sounds like we need a followup conversation to figure out how to do that
19:56:22 fbo: please spend some time with asselin_ and nibalizer
19:56:27 fungi: in general
19:56:30 we don't have to use envassert
19:56:35 Ok so let's discuss more about the details via gerrit comments ?
19:56:37 but asselin_ basically has functional testing already
19:56:38 fbo: would be great to have your efforts with theirs
19:56:40 ok I'll do that !
19:56:54 and I would hate to see that work die while we spin our wheels doing integration testing that would be easier if we just had functional tests first
19:56:56 nice glad you appreciate the idea
19:57:08 fbo: testing is good
19:57:15 sounds like we should do both, and just need to work out prioritization then
19:57:21 clarkb: I do not understand why envassert is important
19:57:25 fbo: just sync up with ongoing puppet efforts
19:57:30 mordred: it's not
19:57:31 but we can take that out of meeting
19:57:33 clarkb: awesome
19:57:34 mordred: functional testing is important
19:57:36 yes
19:57:49 clarkb, fbo: yeah, let's look at both. we may end up needing to do similar things to determine success anyway
19:57:53 fbo: okay, thanks!
19:58:03 #topic Antoine "hashar" Musso (can not attend)
19:58:09 hashar has his own topic! :)
19:58:10 actually around :)
19:58:12 nice!
19:58:16 will be short
19:58:21 I approve of hashar
19:58:22 hashar lies in the topic! ;)
19:58:25 we are going to deploy Nodepool on Debian/Jessie
19:58:35 and at wikimedia we use debian packages so I am packaging nodepool \o/
19:58:42 that's wonderful! and terrifying! : )
19:58:45 we have a bunch of debian developers internally so that helps
19:59:05 that said I noticed a change in nodepool that changes one of the requirements ( https://review.openstack.org/#/c/170289/ )
19:59:05 i can tag nodepool
19:59:09 could use a new tag
19:59:14 awesome
19:59:19 and I am not sure what the usual policy is when people change requirements.txt files
19:59:22 hashar: do you need zuul tagged too?
19:59:32 hashar: i'm not sure i've tagged it in a very, very, very long time
19:59:34 a zuul minor tag would be nice as well
19:59:47 hashar: you are aware of the upcoming shade patches for nodepool, yeah?
19:59:55 yeah, the nodepool 0.0.1 tag is pretty ancient
20:00:02 mordred: what is it ?
20:00:09 we should also tag a few other things, but no need to take up meeting time on those
20:00:18 yeah we can follow up on list
20:00:26 I am going to package zuul for debian/jessie as well :)
20:00:41 hashar: thanks, as always!
20:00:45 and thanks everyone else
20:00:54 maybe next meeting, we'll talk about our priority efforts :)
20:00:54 so if you guys could tag zuul/nodepool before huge changes land in, that would be nice :)
20:00:58 hehe
20:01:01 #endmeeting
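
On the replication error raised earlier (reject HostKey: github.com), the fix suggested in the meeting is for the account that runs replication to accept github's host key, either by opening an ssh connection once or by using ssh-keyscan; longer term the key could be managed in puppet as mordred noted. A minimal sketch follows, assuming the key belongs in the replicating user's ~/.ssh/known_hosts — the exact account and path on review-dev may differ.

    #!/usr/bin/env python
    # Minimal sketch: fetch github.com's SSH host keys with ssh-keyscan and
    # append any that are missing to known_hosts, so jgit stops rejecting the
    # host key during replication. The path is an assumption; on review-dev it
    # would be the gerrit service account's known_hosts.
    import os
    import subprocess

    KNOWN_HOSTS = os.path.expanduser('~/.ssh/known_hosts')

    # ssh-keyscan prints one "host keytype key" line per key the server offers
    keys = subprocess.check_output(['ssh-keyscan', 'github.com']).decode().splitlines()

    existing = set()
    if os.path.exists(KNOWN_HOSTS):
        with open(KNOWN_HOSTS) as f:
            existing = {line.strip() for line in f}

    with open(KNOWN_HOSTS, 'a') as f:
        for key in keys:
            if key and not key.startswith('#') and key not in existing:
                f.write(key + '\n')
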