19:02:10 #startmeeting infra
19:02:12 Meeting started Tue Dec 9 19:02:10 2014 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:02:13 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:02:15 The meeting name has been set to 'infra'
19:02:24 o/ (despite thunderstorms and my isp doing their best to keep me away)
19:02:31 I'm here
19:02:33 \o
19:02:43 oh yay
19:02:46 Morning
19:02:50 o/
19:02:52 o/
19:02:57 yo
19:03:00 asselin has a time constraint today, so we'll take his topic first
19:03:11 jeblair, thanks.
19:03:22 I'm proposing an in-tree 3rd party ci solution.
19:03:30 #topic in-tree 3rd party ci solution (asselin)
19:03:54 I have a spec written. looking for link...
19:04:02 #link https://review.openstack.org/#/c/139745/
19:04:21 o/
19:04:24 cool, i think this sounds like a good idea
19:04:36 o/
19:04:37 thanks. I've been discussing it in the 3rd party meeting and with others, and generally there's lots of support for the idea
19:04:37 and a logical next step after the puppet module breakup
19:04:50 (i have to leave at quarter-till tho)
19:05:29 asselin: flagged that to read soon. i had similar thoughts a while back
19:05:29 jeblair: right, I don't think having a new independent repo helps much if we do that before we have the module split done
19:05:30 I was hoping to start looking at the possible solutions and get something proposed by end of K1.
19:05:45 hogepodge: you may also be interested in that spec
19:06:20 I took an initial look at what it would take to set up a log server.
19:06:26 clarkb: yeah, i'm assuming this depends on finishing the module split
19:06:32 and it's going to uncover a lot of other gotchas too
19:06:43 but it should help us start to nail down our interfaces
19:06:47 ++
19:06:53 got good feedback, and looking at starting to
19:06:58 since having >1 consumer is really helpful for that sort of thing :)
19:07:03 +1
19:07:03 jeblair, right, exactly :)
19:07:11 o/
19:07:29 also, we might be able to do better testing for this limited part of our system
19:07:45 asselin: the logserver is likely to be the hardest part, since setting up a public-facing webserver is often a clash with corporate network admins' firewall policies and needs extra deployment considerations
19:07:55 but definitely still worth covering the simple case
19:08:00 jeblair, +1 can add that to the spec
19:08:09 anyway, so it sounds like next steps are for us to try to find some time to review the spec, and if we find any contentious/complicated bits, come back here and hash them out?
19:08:22 fungi, the assumption is to set up the log server in a public place, and the rest can operate behind the firewall
19:08:29 asselin: also, offline from this meeting, I'd like to sync up with you on the rework-launch-node things I've been poking at and haven't written up
19:08:48 mordred, ok sure
19:09:02 jeblair, yes
19:09:21 #link https://review.openstack.org/#/q/topic:thirdpartyci,n,z
19:09:40 I created a topic to track the spec and the initial attempt at the log server ^
19:10:06 so that's it, just wanted to get awareness and support
19:10:11 asselin: cool, thanks very much!
19:10:16 asselin: nice work
19:10:22 a worthwhile endeavor
19:10:29 thanks
19:10:37 #topic Actions from last meeting
19:10:50 o/
19:10:57 oh, forgot my links
19:11:00 #link agenda https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting
19:11:02 #link previous meeting http://eavesdrop.openstack.org/meetings/infra/2014/infra.2014-12-02-19.01.html
19:11:23 anteaya draft messaging to communicate the new third-party account process
19:11:33 that happened
19:11:34 she even sent it out
19:11:39 I did
19:11:41 above and beyond!
19:11:46 heh
19:11:57 and we are mostly transitioned off of the old stuff.
19:12:02 yay
19:12:14 so that seems to be going well, aside from the fact that we apparently had a way to block gerrit emails being sent on behalf of 3p systems
19:12:16 #link http://lists.openstack.org/pipermail/third-party-announce/2014-December/000130.html
19:12:17 the old groups still exist but are owned by administrators and are not visible
19:12:19 which we lost
19:12:28 pleia2: has a system-config patch up to remove the old ml from puppet
19:12:36 we have plans to archive it
19:12:49 the third-party requests ml
19:12:56 what should we do about the email functionality?
19:13:20 drop the "feature", try to get people to manage a non-voting-ci group, exim rules?
19:13:41 I like not managing this group
19:14:03 can we filter on IC
19:14:07 CI
19:14:11 laggy wifi
19:14:28 create a "don't send e-mail" group in gerrit and have a lot of people who can drop accounts in there for well-defined reasons?
19:14:38 ohhh, I like that
19:14:41 doesn't necessarily have to be ci-specific, but likely would be anyway
19:14:42 anteaya: yes, we could filter outgoing mail with exim based on having "CI" in the name
19:14:49 who can add to that group?
19:14:51 I strongly feel this should be managed in clients. Similar to filtering noise in irc channels
19:15:11 not everyone will agree on how to filter a thing, and thankfully for email there are lots of tools available to do this independent of gerrit
19:15:12 clarkb: except it isn't happening
19:15:20 anteaya: why not? anyone can do it
19:15:23 and folks make noise in -infra
19:15:41 based on the number of questions so far it isn't happening
19:15:53 if I had to pick a way to go back to sort of what we had before, I would create some central CI group that has a pretty large management group to add/remove members
19:16:14 I like fungi's no-email group
19:16:15 i think some of the complaint is devs using muas they can't or don't know how to configure to filter this (which should be a lot easier now that we have naming consistency)
19:16:20 we could reuse the existing Third-Party CI group
19:16:28 which is preseeded with a number of ci users
19:16:38 and anyone who complains gets added to the management group for that group ;)
19:16:43 ha ha ha
19:16:44 jeblair: +1 :)
19:16:46 yes!
19:16:47 hahaha
19:17:24 i'm on board. if we do implement a no-emails group, infra core shouldn't add anyone to it; we should only add coordinators and make them take responsibility
19:17:40 * anteaya touches her nose
19:17:43 so I can make this change. Should I go ahead and remake Third-Party CI a thing, give Third-Party Coordinators ownership, and preseed that group with people to manage it?
19:18:04 unfortunately, there's no real audit trail on group membership management, so the larger the list of coordinators, the less likely you'll be able to figure out when, why and by whom a member was added/removed
19:18:04 everyone attending the third-party ci meetings should be in the management group, i think
19:18:14 fungi: so you wouldn't start with the preexisting list of accounts?
19:18:16 terrific
19:18:17 fungi: isn't there such a thing in the db?
19:18:21 fungi: just not exposed?
19:18:48 clarkb: yeah, whether or not the group is pre-seeded doesn't change the future accountability problem potential though
19:18:52 clarkb: i think we should start with the current list. just not add any more :)
19:19:00 then folks can complain at the meetings, not the infra channel
19:19:03 jeblair: ya that is what I was going with
19:19:11 jeblair: i don't think there is, unless you count the mysql journal in trove
19:19:32 fungi: hrm, there's an audit table for groups, but we can dig into that later
19:19:44 oh, indeed. i'll double-check it
19:20:30 #action clarkb add DENY Email Third-Party CI rule to gerrit ACLs, give Third-Party Coordinators ownership of Third-Party CI, seed Third-Party Coordinators with third party meeting attendees
19:20:38 is that what we have agreed on?
19:20:42 i think so
19:20:51 I like it
19:20:55 yep, i'm on board
19:20:55 also, we should make it clear that jenkins and anything not a third-party ci is off-limits :)
19:21:09 jeblair: ya I can put that in the description of the group too
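
A minimal sketch of what the DENY Email rule recorded in the action above might look like, assuming it is implemented via Gerrit's emailReviewers permission in the All-Projects ACL; the exact mechanism and wording of the change are not specified in this log:

    # All-Projects project.config (hypothetical excerpt)
    [access "refs/*"]
        emailReviewers = deny group Third-Party CI

Handing ownership of the Third-Party CI group to Third-Party Coordinators would be a separate change made in the group's settings in Gerrit rather than in project.config.
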
19:21:34 back to actions...
19:21:35 fungi nibalizer get pip and github modules split out
19:21:46 according to nibalizer that was already done last week
19:21:53 yup
19:21:55 oh neato
19:21:59 and so i readded it to the actions list in error
19:22:05 clarkb script new gerrit group creation for self service third party accounts
19:22:13 also done :)
19:22:17 fungi close openstack-ci and add openstack-gate to e-r bugs
19:22:21 done
19:22:31 we haven't needed the hat of shame at all this week
19:22:39 :D
19:22:45 * mordred is the hat of shame
19:22:48 #topic Priority Efforts (swift logs)
19:23:13 haz
19:23:19 i have the hat of shame, i didn't make the docker thing i was supposed to do
19:23:33 Okay, so zuul-swift-upload now works as a publisher, so we can get logs on failed tests
19:23:40 WOOT
19:23:40 yay
19:23:47 those are the best kind to have :)
19:23:52 The experimental job (as seen here https://review.openstack.org/#/c/133179/) has disk log storage turned off.
19:24:07 for the most recent run
19:24:11 So everything should be in place now to switch jobs over. Some of the project-config jobs are logging to swift. The next step is to turn off disk logging for infra to be the guinea pigs
19:24:18 and no more races with getting the end of the console log? or is that still sometimes an issue?
19:24:29 What do people think?
19:24:37 jhesketh: I support this
19:24:37 fungi: non-issue with swift
19:25:05 jhesketh: progress progress progress let's make some
19:25:17 yes I think I am ready to dogfood. I did have some ideas that came up looking at the index page for the above job. It would be nice if we had timestamps and file sizes there, but I think we can add that later
19:25:36 #agreed start infra dogfooding logs in swift
19:25:37 fungi: do you mean that we miss the end due to fetching? We kinda do in that we cut off the wget stuff
19:25:38 I'm excited that 4 years in we may be about to use swift
19:25:41 looks good.
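
A rough sketch of the kind of index page improvement suggested above (timestamps and file sizes next to each link); this assumes the uploader builds the index from the local files before pushing them to swift, and is illustrative rather than the actual zuul-swift-upload code:

    #!/usr/bin/env python
    # Hypothetical index generator: list each file with its size and
    # modification time, roughly what an uploader could write out as
    # index.html before pushing the files (and the index) to swift.
    import html
    import os
    import time


    def generate_index(file_paths):
        """Return index.html content for the given local files."""
        rows = []
        for path in sorted(file_paths):
            info = os.stat(path)
            mtime = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(info.st_mtime))
            name = html.escape(os.path.basename(path))
            rows.append('<tr><td><a href="%s">%s</a></td><td>%d</td><td>%s</td></tr>'
                        % (name, name, info.st_size, mtime))
        return ('<html><body><table>\n'
                '<tr><th>Name</th><th>Size (bytes)</th><th>Last modified (UTC)</th></tr>\n'
                + '\n'.join(rows) + '\n</table></body></html>\n')


    if __name__ == '__main__':
        # Example: index the files in the current directory.
        print(generate_index([f for f in os.listdir('.') if os.path.isfile(f)]))
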
our publisher is still a little noisy with certificate warnings when grabbing the console from jenkins
19:25:54 jhesketh: right
19:25:55 fungi: is that because of the RFC deprecation thing?
19:26:07 mordred: nope, it's because it's self-signed
19:26:09 ah
19:26:38 also we still need index page text besides the links themselves
19:26:39 We could try and silence wget
19:26:54 or we could just get 8 real certs
19:27:02 mordred: --
19:27:15 fungi:?
19:27:35 Do you mean the returned links?
19:27:44 jhesketh: the readme we embed in on-disk apache autoindexes
19:28:06 Ah, right, yes
19:28:10 fungi: not for our dogfooding though
19:28:18 hrm, except for d-g I guess
19:28:22 but d-g is the interesting one
19:28:30 I guess each job can do their own index somehow
19:28:49 what if we just put that template in a known location and link to it in every index?
19:28:51 we could add a generate-index macro in jjb maybe
19:28:53 Either that or we make os-loganalyze smarter
19:29:14 where link is something different than a hyperlink so that it renders nicely
19:29:14 i think os-loganalyze seems like a better place to fix that
19:29:46 since that will allow us to alter readmes over time rather than having them stuck in whatever the state was at the time the logs were uploaded
19:30:17 the opposite approach has advantages too -- as things evolve, the readmes can co-evolve
19:30:28 Except the readme might not match the old logs, so storing it with the job may make more sense
19:30:30 true
19:30:34 (if, say, devstack-gate wrote its own readme)
19:30:54 Doing it as a macro adds the greatest flexibility to the jobs
19:31:34 what writes the index right now?
19:31:40 yeah, and i guess the inefficiency of having hundreds of thousands of copies isn't terrible since that'll be a small bit of text in the end
19:31:49 This shouldn't affect the project-config jobs we want to dogfood, so maybe we tackle it when we move the other jobs over
19:32:02 btw, logs.openstack.org/79/133179/1/experimental/experimental-swift-logs-system-config-pep8-centos6/cc75c20 is really slow to load for me
19:32:30 jeblair: the upload script can generate an index.html which is just a list of files it uploaded
19:33:05 on performance, yes it does seem that the index page generation is slow. requesting specific files is very quick by comparison
19:33:43 fungi: the index.html isn't generated and is also a specific file. So I think any file may have that problem
19:33:44 huh... suddenly it sped up for me
19:33:46 jhesketh: so we could add times/sizes to that, and have it insert the text of a readme if one exists
19:33:50 Yes, so indexes suck in that the object in the url is first attempted to be fetched, and failing that it appends index.html and tries again
19:34:00 So it needs to make a few calls to swift
19:34:12 jeblair: yep
19:34:41 jhesketh: can we have os-loganalyze append 'index.html' if the uri terminates with a '/'?
19:34:55 (which it should anyway, and then we can go update zuul, etc, to leave the proper /-terminated link)
19:35:03 fungi: the speed will depend on whether there is an established connection with swift available in the pool
19:35:21 ahh
19:35:38 jeblair: that seems reasonable (so we assume object names never end in a trailing slash)
19:36:04 i _think_ for our pseudo-filesystem case we can make that assumption
19:36:20 Yep, good idea
19:36:22 after all, that's basically how it's working with apache and a real filesystem
19:36:38 I like that idea
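
A minimal sketch of the trailing-slash behaviour agreed just above, written as a generic WSGI wrapper; the real change would live in os-loganalyze itself, and the class and variable names here are illustrative assumptions:

    # Hypothetical WSGI middleware: when a request path ends in '/',
    # rewrite it to '<path>index.html' so the backend only has to ask
    # swift for one object instead of probing twice.
    class IndexOnTrailingSlash(object):
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            path = environ.get('PATH_INFO', '')
            if path.endswith('/'):
                environ['PATH_INFO'] = path + 'index.html'
            return self.app(environ, start_response)

    # Usage sketch: wrap whatever WSGI application serves the swift-backed logs.
    # application = IndexOnTrailingSlash(existing_log_wsgi_app)
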
19:36:59 Also swift apparently has a plan to not be terrible at threads, which will help our connection pool management and stop so much lag (hopefully)
19:37:37 cool, agreement to dogfood and some next steps... anything else?
19:37:53 Okay, so it sounds like we're ready to dogfood and will make some tweaks to the index loading and then documentation as we go
19:38:10 jhesketh: thanks!
19:38:12 #topic Priority Efforts (Puppet module split)
19:38:13 jhesketh: couldn't the output page be cached by osloganalyze?
19:38:57 asselin is probably gone by now...
19:39:05 nibalizer: anything related to this we should chat about? anything blocking?
19:39:13 fairly related to this, i have two changes for the httpd puppet module that need final reviews
19:39:20 ianw: hmm, with something like memcache it might not be a bad idea. Let's take that as an offline improvement
19:39:22 https://review.openstack.org/136959 (update rakefile)
19:39:35 https://review.openstack.org/136962 (adding the module)
19:39:38 there are a couple of things that can be merged
19:39:41 #link https://review.openstack.org/136959
19:39:41 #link https://review.openstack.org/#/q/status:open++topic:module-split%29,n,z
19:39:47 jeblair: i think we're swell
19:40:05 mmedvede: ah thanks
19:40:19 i think things have been sorta slow lately, but i attribute that to the infra-manual sprint and thanksgiving, so im not worried
19:40:50 so we should consider those priority reviews
19:41:23 There was also some movement towards more automation on splits
19:41:30 ianw: maybe you should change your topics to 'module-split' ?
19:42:10 i mean, it's split out, but it still seems related to the effort :)
19:42:54 mmedvede: can you expand on the automation
19:42:59 I have some questions
19:43:15 i think there's a pending change to add a script to system-config
19:43:22 cool
19:43:56 #link https://review.openstack.org/#/c/137991/
19:44:01 anteaya: we are trying to maintain all the splits automatically, before they are pulled in
19:44:19 great
19:44:20 asselin and sweston were the ones who worked on it
19:44:24 I have to step out, thanks everyone!
19:44:45 and asselin says that the script updates sweston's github repo
19:44:53 what is the trigger for the update?
19:44:55 anteaya: correct
19:45:34 my concern is the extra overhead of having source under sweston's control and asselin's patch
19:45:48 in case the seed repo needs to be respun
19:45:52 jeblair: the important one, https://review.openstack.org/#/c/136962/ , has had two +2's and 4 +1's ... so it's been seen ... but we can't use the module until it's in
19:45:56 which can happen
19:46:08 anteaya: I see. There could be another step added that actually would validate
19:46:26 i.e. run a simple diff of their upstream vs system-config
19:46:40 mmedvede: if we can have anyone trigger a respin of the repo, especially a patch owner, that is all I am looking for
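
A sketch of the validation step suggested above (a simple diff of a split-out module against the corresponding tree still in system-config); the paths and invocation are assumptions for illustration, not the script actually under review:

    #!/usr/bin/env python
    # Hypothetical drift check between a module tree in system-config and
    # the seeded standalone repo, printing anything that does not match.
    import filecmp
    import sys


    def report_drift(left, right, prefix=''):
        """Return True if the two directory trees differ."""
        dc = filecmp.dircmp(left, right)
        drift = False
        for name in dc.diff_files:
            print('differs:   %s%s' % (prefix, name))
            drift = True
        for name in dc.left_only + dc.right_only:
            print('one side:  %s%s' % (prefix, name))
            drift = True
        for name, sub in dc.subdirs.items():
            # Recurse into directories present on both sides.
            drift = report_drift(sub.left, sub.right, prefix + name + '/') or drift
        return drift


    if __name__ == '__main__':
        # e.g. report_drift('system-config/modules/httpd', 'puppet-httpd')
        sys.exit(1 if report_drift(sys.argv[1], sys.argv[2]) else 0)
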
19:47:47 #topic Priority Efforts (Nodepool DIB)
19:47:59 dib images work in rackspace now
19:48:05 congrats!
19:48:08 I have repeatable scripts to make them work and whatnot
19:48:12 mordred, using glance import and swift?
19:48:16 yolanda: yes
19:48:31 it turns out that in rax we need to use glance v2 and swift, and in HP we need to use glance v1
19:48:51 jroll also got rax to put info we need into config-drive
19:48:52 the trusty nodepool server is not working at scale, and we're digging into why.
19:49:08 so I think before we roll this out, we want to wait for that change to go live so that we can drop nova-agent
19:49:12 however, nova-agent does work
19:49:17 and we have dib elements for it
19:49:22 mordred: ++
19:49:30 my next step is to turn the shell scripts I have into python
19:49:43 at which point I'm probably going to want to have a conversation about nodepool+shade
19:50:03 because "I want to upload an image" having two completely different paths per cloud is not really nodepool-specific logic
19:50:08 also we seem to possibly be overloading our git server farm during snapshot image updates, and switching to dib will reduce that a whole bunch
19:50:09 mordred: and everyone else really. I would really appreciate it if we stopped pushing so many new features to nodepool until we get the current ones working :)
19:50:29 it's great that people are excited about nodepool, but there are a few things currently wrong with it and I don't think they are getting much attention
19:50:59 in any case, if people want to look at the elements, they're up for review against system-config
19:51:07 I'll move them to project-config and repropose soon
19:51:25 mordred: i'm probably going to want to have a conversation about shade's project hosting and maintenance situation :)
19:51:32 jeblair: me too :)
19:51:43 jeblair: I imagine that will be part of that conversation
19:51:58 ALSO - I'd like to suggest that we move nodepool/elements to just elements/ - because I think we're going to wind up with non-nodepool elements too (see sane base images for infra servers)
19:52:02 but that's not urgent
19:52:33 *nod*
19:52:34 mordred: they are in different repos though
19:52:46 clarkb: the idea is to not propose the elements I've been working on to system-config
19:52:50 but to project-config
19:52:57 and have one set of elements and have it be there
19:53:18 elements for non-nodepool things would live in project-config?
19:53:34 i feel like we're getting really close to a chicken and egg problem
19:53:50 what non-nodepool elements would be likely to get used for project-specific stuff?
19:54:39 probably we should start with duplication and see what we end up with and whether it makes sense to de-dup?
19:54:58 this might be a bit hard to reason about at the current level of abstraction/understanding :)
19:55:14 ya I think that is something to revisit when dib is a bit more concrete for us
19:55:21 right now it's very much hand-wavy
19:55:29 cart before the horse then
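
A sketch of the per-cloud branching described above ("I want to upload an image" taking two completely different paths); the function names and the config key are hypothetical placeholders, not nodepool or shade API:

    # Hypothetical illustration: one provider wants the image staged in
    # swift and imported through a glance v2 task, the other takes a
    # direct glance v1 upload.

    def upload_via_swift_and_glance_v2(image_path, name):
        # Stub for the Rackspace-style path: push the file to a swift
        # container, then ask glance v2 to import it as an image.
        raise NotImplementedError


    def upload_via_glance_v1(image_path, name):
        # Stub for the HP-style path: stream the file straight to glance v1.
        raise NotImplementedError


    def upload_image(cloud_config, image_path, name):
        """Pick an upload path based on a per-cloud setting."""
        # 'image_upload_method' is an assumed config key for illustration.
        if cloud_config.get('image_upload_method') == 'swift-glance-v2':
            return upload_via_swift_and_glance_v2(image_path, name)
        return upload_via_glance_v1(image_path, name)

The point of moving this into something like shade is that the branching is shared cloud logic rather than anything nodepool-specific.
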
19:56:02 #topic Priority Efforts (jobs on trusty)
19:56:10 i prodded bug 1348954, bug 1367907 and bug 1382607 again a couple weeks ago, but no response from anyone in ubuntu/canonical on having a less-bug-ridden py3k in trusty
19:56:14 Launchpad bug 1348954 in python3.4 "update Python3 for trusty" [Undecided,Confirmed] https://launchpad.net/bugs/1348954
19:56:15 Launchpad bug 1367907 in python3.4 "Segfault in gc with cyclic trash" [High,Fix released] https://launchpad.net/bugs/1367907
19:56:16 Launchpad bug 1382607 in python3.4 "[SRU] Backport python3.4 logging module backward incompatibility fix." [High,Fix released] https://launchpad.net/bugs/1382607
19:56:28 i'm open to suggestions on how to raise the visibility of those
19:56:53 zul asked me for the bug numbers again last week, but not sure if he was looking into them now too
19:57:05 switch to an os that supports python3.4?
19:57:22 there's an alternative
19:57:42 or build our own real python interpreters for tests rather than using distro-provided python ;)
19:57:54 or use the ones they pushed into that other repo
19:58:06 (which come with who-knows-what features back/forwardported into them)
19:58:14 that other repo?
19:58:17 i could look at pushing it to a ppa again as well
19:58:39 fungi: py3.4 is fixed in one of the package maintainers' repos iirc
19:58:48 fungi: which you tested to confirm that the new package fixed our problem
19:59:00 clarkb: yeah
19:59:06 zul: can you just yell at them to release it already? :)
19:59:11 clarkb: at least fixed for that one patch
19:59:16 jeblair: ive done that
20:00:12 thanks everyone
20:00:17 we skipped over them, but core eyeballs on the priority specs would be much appreciated
20:00:23 anyway, that's the current state. patched python 3.4 in trusty is currently the only blocker to moving our py3k jobs to 3.4/trusty
20:00:25 pleia2: ++
20:00:36 and gerrit topics are up next time
20:00:38 #endmeeting