19:02:38 #startmeeting infra
19:02:38 o/
19:02:38 Meeting started Tue Apr 9 19:02:38 2013 UTC. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:02:39 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:02:41 The meeting name has been set to 'infra'
19:02:44 o/
19:03:08 mordred: do you have anything before you get on a plane?
19:03:20 jeblair: I need to unbreak 2.6 for pbr projects
19:03:25 #topic mordred plane flight
19:03:55 we had wanted to just wait for rhel to fix that for us
19:04:06 fungi: but i'm guessing that's not going to happen today
19:04:10 yeah - but rhel seems unhappy too I thought?
19:04:25 or is it legit just 2.6 ubuntu that's giving us the problem?
19:04:25 mordred: and pbr is broken now
19:04:26 i believe dprince was still working on backports from master
19:04:51 which is to say we could switch to rhel6 today for master/havana afaik
19:05:09 mordred: so what if we switch the pbr projects to rhel for 26?
19:05:21 jeblair: let's try that as step one
19:05:32 jeblair: and see how it goes before trying a plan b
19:05:39 mordred: i'm worried that otherwise it involves puppet hacking
19:05:43 it _should_ solve the underlying problem
19:05:43 mordred: have an example of a job you'd like me to fire on a rhel6 slave as a test?
19:06:00 mordred: which would be very useful, i mean, clarkb thinks we are going to run into this again with python3
19:06:06 or is it already testing successfully on rhel6 non-voting?
19:06:15 jeblair: I actually spoke to someone about that and assuming pip -E works properly we should be able to patch the puppet provider
19:06:27 or we run puppet twice with two different python envs set
19:06:29 fungi: gate-os-config-applier-python26 and gate-gear-python26
19:06:44 jeblair: yah. I believe we ultimately need to solve this more resiliently
19:06:46 mordred, clarkb: you want to fight over which of you hacks that? :)
19:06:55 but I think that doing that with some time to think about it properly would be nice
19:06:55 jeblair: I did install zmq for the log pusher script through apt and not pip to avoid this problem :)
19:07:06 mordred: those are also failing on rhel6...
19:07:07 clarkb: also, that's preferred anyway. :)
19:07:09 #link https://jenkins.openstack.org/job/gate-os-config-applier-python26-rhel6/
19:07:20 fungi: GREAT
19:07:35 no idea if they're failing the same way, but they're failing
19:07:37 jeblair: I can bring it back up with finch over at puppetlabs, and see if we can hack something useful for all of puppet users
19:07:38 mordred: fascinating -- and that's with a 2.6 egg
19:08:00 ok. that blows my previous theory
19:08:39 basically requires me to do more pip testing and feed him the info so that we can get a patch into the provider
19:08:50 how about I spin up a rhel env and debug and get back to everyone. in the mean time, os-config-applier and gear could disable python2.6 tests for the time being if they're blocked
19:08:58 mordred: +1
19:09:05 sounds good
19:09:09 clarkb: +1 as well
19:09:13 (not ideal, but, you know, neither project will die without 2.6 for a day)
19:09:16 clarkb: ++
19:09:27 * mordred will add better 2.6 testing to pbr as well
19:09:30 * mordred cries
19:09:35 mordred: and did you see the zuul bug i filed?
19:09:44 * dprince is willing to help out if needed too
19:09:46 #link https://bugs.launchpad.net/zuul/+bug/1166937
19:09:47 Launchpad bug 1166937 in zuul "Option to group multiple jobs together in job trees" [Wishlist,Triaged]
19:09:54 jeblair: I did not. I'll look
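
For context on bug 1166937: zuul runs each project's jobs as a tree, where the jobs in a sub-tree only start once their parent job has succeeded, and the bug asks for a way to group several jobs together within such a tree. A simplified sketch of the structure in Python (illustrative only, not the actual zuul/model.py code):

    # Simplified sketch of zuul's job-tree idea (not the real zuul code):
    # each node holds one job plus the sub-trees of jobs that run only if
    # that job succeeds.  Bug 1166937 asks for a way to group several jobs
    # together at one level of such a tree.
    class JobTree(object):
        def __init__(self, job=None):
            self.job = job          # None for the root of a project's tree
            self.job_trees = []     # children; run only after self.job succeeds

        def addJob(self, job):
            tree = JobTree(job)
            self.job_trees.append(tree)
            return tree
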
19:09:59 awesome
19:10:20 dprince - if you happen to get bored and figure out why https://jenkins.openstack.org/job/gate-os-config-applier-python26-rhel6/ is breaking in the next hour before I get back online, I will buy you a puppy
19:10:22 i encapsulated what we talked about, including after you dropped off
19:10:29 jeblair: thanks!
19:10:47 mordred: i think that implementation is mostly in zuul/model.py
19:10:58 dprince: make sure it is a house trained puppy
19:10:59 excellent
19:11:00 mordred: and a little bit in the config parser in scheduler.py
19:11:07 mordred: I already have one (a humpy one)
19:11:14 nice
19:11:18 ok - me run to plane
19:11:21 back online in a bit
19:11:29 mordred: godspeed
19:11:53 there were no actions from last meeting
19:11:57 #topi gerrit/lp groups
19:11:59 #topic gerrit/lp groups
19:12:25 mmm, did we still have any to-dos on that?
19:12:42 i think it's wrapped up aside from any other cleanup ttx might have wanted to do in lp
19:12:52 fungi: did the ptl change land?
19:13:01 i was just checking...
19:13:24 #link https://review.openstack.org/25806
19:13:30 merged yesterday
19:13:53 woo
19:13:55 oh, i probably should add a note in that groups cleanup bug of ttx's
19:14:03 #topic grenade
19:14:19 dtroyer pointed me at some changes he wants to merge first, and then...
19:14:30 we can cut stable/grizzly branches of grenade and devstack
19:14:50 and then i think we'll be set to run non-voting grenade jobs widely on both master and stable/grizzly
19:15:14 \o/
19:15:20 is it working now?
19:15:29 clarkb: it has occasionally succeeded
19:16:14 nice
19:16:16 clarkb: i haven't really analyzed the failures to know more about when it succeeds/fails
19:17:09 #topic gearman
19:17:40 so on my side, i wrote a new python gearman client that is much more suited to how we want to use it in zuul
19:17:43 this is depressing me
19:17:58 i've been debugging.
19:18:02 #link https://github.com/openstack-infra/gear
19:18:11 finally figured out exactly why we're getting double builds.
19:18:57 it is because an error occurs when attempting to reregister functions while the current build is running.
19:19:17 zaro: i can't see a need to register functions while a build is running
19:19:43 when an error occurs on the worker it will close the connection with gearman then reopen, but the build is still on the gearman queue so it runs again.
19:20:34 jeblair: code i've got re-registers on events from jenkins.
19:20:55 zaro: right, but it doesn't need to do that while a build is running
19:20:57 jeblair: you might want to register at any time.
19:21:24 jeblair: you mean block until build finishes?
19:21:30 zaro: functions are registered per-worker; a worker doesn't need to change its functions while a build is running
19:22:26 zaro: i would postpone changing functions until after the build is complete (which i included in my sketch of a worker routine i sent the other day)
19:22:46 jeblair: i see what you mean. i was looking for a way to get more granular in registering, but didn't see a way. i can look again.
19:24:28 ok. will try this approach again. can't remember why i gave up last time.
19:24:33 zaro: i'm of the opinion that the gearman-java GearmanWorkerImpl makes too many assumptions about how it's being used; I think we probably will need to write our own GearmanWorker. I'm still reading, but I'd like you to consider that as you continue to dig into it.
19:25:08 will do.
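
The ordering jeblair suggests — let the current build finish, then change the worker's registered functions between jobs — can be sketched in Python with the gear library linked above (assuming its Worker/addServer/registerFunction/getJob/sendWorkComplete interface; the real fix belongs in the Java gearman-plugin, so this only illustrates the shape of the idea, and run_build is a hypothetical helper):

    import gear

    worker = gear.Worker('example-jenkins-worker')
    worker.addServer('127.0.0.1')            # gearman server host (assumed)
    worker.registerFunction('build:example-job')

    pending_functions = []                   # changes queued by jenkins events

    while True:
        job = worker.getJob()                # blocks until a build is assigned
        run_build(job)                       # hypothetical: run the jenkins build
        job.sendWorkComplete(b'SUCCESS')
        # Only now, with no build in flight, apply queued (re)registrations,
        # so a registration error can't drop the connection and leave a
        # running build on the gearman queue to be run a second time.
        for name in pending_functions:
            worker.registerFunction(name)
        del pending_functions[:]
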
19:25:43 #topic pypi mirror/requirements
19:26:07 we are gating openstack/requirements on the ability to install all requirements together
19:26:38 and when https://review.openstack.org/#/c/26490/ merges, we will actually be running the requirements gate jobs for projects
19:27:51 we probably need to make the jobs and repo branch aware pretty soon...
19:28:06 i think that depends on how openstack/requirements wants to handle branches. maybe a summit question.
19:28:47 I do have a question about reviewing openstack/requirements. currently we have +2 and approve perms, but it seems like we should defer to the PTLs for most of those reviews?
19:29:35 clarkb: i think so. i only really intend on weighing in when it seems to affect build/test oriented things...
19:29:47 i've been refraining from approving them in most cases if it's only ci core votes on them, unless there's some urgency
19:30:26 usually only when it's breaking the gate or holding back a ci project
19:30:39 i don't feel i have a lot of input on random library versions, so yeah, i'd say we should be conservative and mostly the openstack-common and ptls should be weighing in most of the time
19:30:40 fungi: +1
19:31:04 cool. I figured we had the perms to sort out problems, but wasn't sure if we had been asked to actually manage the repo
19:31:08 er, poor wording. not a ci project but rather ci work on an openstack project which uses the requirements repo
19:31:40 markmc did explicitly want us to be involved, so afaik, we're not stepping on anyone's toes.
19:32:10 we didn't accidentally get perms to the repo, we really are supposed to have them. :)
19:32:57 #topic releasing git-review
19:33:03 it happened
19:33:10 woo!
19:33:16 1.21 is on pypi, manually this time
19:33:33 1.22 may be automated, if the gods are willing
19:33:52 i've been running back through and closing out bug reports if they're fixed in 1.21
19:34:02 fungi: do we need to schedule a chat (perhaps when mordred is around) about pbr/etc for git-review?
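
As background for that question: moving git-review onto pbr would mostly mean putting its packaging metadata into setup.cfg and shrinking setup.py to the pbr boilerplate, which (per pbr's documentation; the exact version pins in use at the time varied) looks roughly like:

    import setuptools

    # Minimal pbr-style setup.py: the real metadata lives in setup.cfg,
    # and pbr derives the version from git tags (e.g. the 1.21 tag above).
    setuptools.setup(
        setup_requires=['pbr'],
        pbr=True)
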
19:34:29 yes, some time in one of those rare moments when he's not on a plane
19:34:41 fungi: i'll put it on the agenda so we don't forget
19:34:51 thanks
19:35:07 #topic baremetal testing
19:35:31 so the tripleo folks have been changing up diskimage-builder and how it creates the bootstrap node a bit
19:36:04 so I've been testing changes as they come along so we're ready once they're ready to start doing formal testing
19:37:13 also been working on getting this going https://github.com/openstack-infra/devstack-gate/blob/master/README.md#developer-setup but keep bumping into issues with the instructions (they're a bit slim, need some local additions and modifications)
19:37:23 * ttx lurks
19:37:32 pleia2: they may have bitrotted too, let me know if you have questions
19:37:44 pleia2: i haven't actually had to follow those in months
19:37:49 did have a wip-devstack-precise-1365534386.template.openstack.org started on hpcloud this morning though, even if it failed once it tried to grab hiera data from puppet
19:38:09 jeblair: great, thanks
19:38:24 just trying to work with it to get a feel for how this works
19:38:31 (aside from just reading scripts)
19:38:50 that's about it though for baremetal
19:38:56 that's awesome progress
19:39:03 o/
19:39:03 pleia2: you may need to combine the 'install_jenkins_slave.sh' trick of running puppet apply with the devstack-gate developer setup to avoid it trying to talk to our puppetmaster
19:39:28 jeblair: makes sense, thanks
19:39:32 pleia2: did you get past the sqlite db content requirements, i guess?
19:39:44 fungi: yeah, devananda got me sorted :) (I'll be updating the docs)
19:39:53 pleia2: thanks much. :)
19:39:59 excellent
19:40:33 i think the config portion of the sqlite db should become a yaml file (though the status portion should probably remain a sqlite db)
19:40:55 that is not high on my todo list. :(
19:41:23 #topic open discussion
19:41:37 logstash
19:41:45 o/ I added openstackwatch as an agenda item to the wrong wikipage
19:41:53 anteaya: which page?
19:42:06 https://wiki.openstack.org/wiki/Meetings/CITeamMeeting
19:42:13 anteaya: yeah should have been https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting
19:42:13 * clarkb blazes ahead with logstash because it should be short
19:42:21 yeah
19:42:22 d'oh
19:42:23 clarkb: go
19:42:39 logstash is running on logstash.openstack.org. you can get the web gui and query it at http://logstash.openstack.org
19:42:39 anteaya: (didn't know that existed; will delete)
19:42:44 k
19:42:51 currently all jenkins job console logs should be getting indexed there.
19:43:42 I may delete data for cleanup purposes
19:44:03 and some of the current data is ugly, but it is getting better as I add filters to logstash to properly parse things
19:44:18 clarkb: i'd say feel free to delete/reset at will as we work this out up until (if/when) we decide logstash is the primary repository instead of logs.o.o
19:44:18 over the course of 24 hours we have added over 21 million log lines
19:44:44 any feel for how much retention we can reasonably shoot for there?
19:44:46 the index for today (UTC time) is up to almost 12GB compressed
19:44:55 and this is just console logs
19:45:16 at the end of today UTC time I will run the optimize operation on that index to see if that results in a smaller index
19:45:43 clarkb: that is a lot.
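
The "optimize operation" mentioned above is elasticsearch's per-index optimize API. Assuming logstash's default daily logstash-YYYY.MM.DD index naming and an elasticsearch endpoint reachable from the logstash host (host, port and index name here are illustrative), forcing today's index down to a single segment looks roughly like:

    import requests

    # Ask elasticsearch to merge the day's logstash index down to one
    # segment, which is what "running the optimize operation" amounts to.
    resp = requests.post(
        'http://localhost:9200/logstash-2013.04.09/_optimize',
        params={'max_num_segments': 1})
    print(resp.status_code, resp.text)
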
19:46:15 yeah, we may need to aggressively filter and/or run a proper elasticsearch cluster if we want to use this for long term storage
19:46:17 clarkb: as in, almost certainly too much, especially since it's a fraction of what we're storing.
19:46:25 yeah, I think we're averaging about 2G/day compressed in static form
19:46:59 fwiw I think logstash is viable as a short term storage location for easy querying
19:47:12 then have an archive like logs.openstack.org for long term storage.
19:47:30 then we can run the log-pusher script over particular logs to reshove them into logstash if we need something from the past
19:48:01 that's about all I have
19:48:14 clarkb: if we have to compromise on what we use it for, we should actually set out some goals and requirements and make sure we achieve them.
19:48:27 clarkb: good summit conversation fodder
19:48:29 reminds me, someone might want to check that my find command to rotate logs is behaving as expected
19:48:33 jeblair: yup
19:48:37 pretty sure it should have deleted some by now
19:49:09 pleia2: maybe not... as i said before i didn't restore any logs from prior to september 26 when i rebuilt the server
19:49:19 anteaya: I think you are up
19:49:31 openstackwatch is alive: http://rss.cdn.openstack.org/cinder.xml
19:49:37 but serves no content
19:49:43 fungi: oh right, I had an off by one month in my head month-wise
19:49:55 this is what it should be serving: http://rss.chmouel.com/cinder.xml
19:50:11 so somewhere part of the script is not getting what it expected
19:50:32 so the question came up, do we stay with making swift work or do we go with serving xml files
19:50:34 anteaya: i thought the hypothesis was that review-dev was overwriting it?
19:50:49 that was a potential hypothesis yes
19:50:57 clarkb might be able to expand on that more
19:51:13 jeblair: i chimed in later when i got back from dinner and pointed out that the config on review-dev lacks swift credentials, so could not
19:51:20 ah
19:51:28 I too had missed that, thank you fungi
19:51:36 so in terms of a way forward
19:51:57 stay with swift, or go with xml was my understanding of the question
19:52:05 and also suggested that the stdout capability openstackwatch has as a fallback would be a useful way to troubleshoot it
19:52:39 yes, it seems the lack of content is a separate question from the output format. that just needs debugging.
19:52:45 so at this point, do we have a way to debug what is running?
19:53:06 at least to understand why no content is being served?
19:53:16 i think it would be useful for this group to decide what we actually want to do with this
19:53:18 anteaya: you can run it yourself but don't put swift credentials in the config and it should spew on stdout what it would otherwise upload to swift
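
The stdout fallback described just above can be pictured as something like the following (an illustrative sketch of the pattern, not the actual jeepyb code; publish, the container and the object name are made up for the example):

    def publish(feed_xml, swift_conn=None, container='rss', name='cinder.xml'):
        # With no swift connection configured, dump the generated feed to
        # stdout so it can be inspected locally while debugging; with
        # credentials, upload it to the swift container behind rss.cdn.o.o.
        if swift_conn is None:
            print(feed_xml)
        else:
            swift_conn.put_object(container, name, feed_xml)
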
19:53:29 what service are we trying to provide? and how should we host that service?
19:53:34 at least from what i could tell reading through the script
19:53:53 yes, well chmouel's feed bears witness to that
19:54:09 but I am at a loss as to why our configuration of the script serves no content
19:54:24 well, the short description is "rss feeds of new changes uploaded for review on individual projects"
19:54:29 jeblair: I think the service here is providing reviewers/interested parties an alternative to the gerrit project watches and email
19:54:30 maybe it would be best to generate a static xml file rather than uploading to swift?
19:55:02 clarkb: that sounds like a useful service; so in that case, i think we should have it automatically generate a feed for every project on review.o.o
19:55:04 chmouel any idea why our config would serve no content yet yours does?
19:55:31 humm i'm not sure
19:55:36 let me check the scrollback
19:55:53 chmouel: that's one suggestion which came up. basically modify it so that we serve those rss xml files directly from the apache instance on our gerrit server
19:56:00 chmouel: the script is the same: https://github.com/openstack-infra/jeepyb/blob/master/jeepyb/cmd/openstackwatch.py
19:56:01 if we want to make this a more seamless integration with gerrit, then i think we should host it at review.o.o. perhaps at a url like 'review.openstack.org/rss/org/project.xml'
19:56:22 and then link that in the gerrit theme?
19:56:32 jeblair: the proper solution would be for gerrit itself to provide rss feeds :)
19:56:35 we could have it either read project.yaml or 'gerrit ls-projects' to get the list
19:57:05 chmouel: true. :) people so rarely volunteer for java hacking projects around here.
19:57:32 heh fair, java+xml is not much fun
19:58:11 fungi: the theme linking will take some thought i think, especially a way to handle the per-project feeds
19:58:33 jeblair: should we continue to mull on it and discuss it again next week?
19:58:40 I'm not feeling a decision is nigh
19:58:42 right, i'm not immediately coming up with any great ideas as to how to make that visible in the gerrit interface
19:59:03 the projects list is not something people hit often, for example
19:59:06 next week is probably going to be beer^H^Hsummit time :)
19:59:18 plenty of time for discussion :)
19:59:19 * anteaya notes to use the correct wiki page next time
19:59:25 yeah, let's think about that. there's always just documenting it in the wiki; but it would be nice to get some kind of link going on.
19:59:33 thanks everyone!
19:59:41 see you next week, in person, i hope
19:59:44 see you all soon!
19:59:52 see you soon!
19:59:57 #endmeeting