19:02:40 #startmeeting infra
19:02:41 Meeting started Tue Jul 12 19:02:40 2016 UTC and is due to finish in 60 minutes. The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:02:42 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:02:44 The meeting name has been set to 'infra'
19:02:50 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:02:55 #topic Announcements
19:03:01 #info Reminder: late-cycle joint Infra/QA get-together to be held September 19-21 (CW38) at SAP offices in Walldorf, DE
19:03:06 #link https://wiki.openstack.org/wiki/Sprints/QAInfraNewtonSprint
19:03:16 #topic Actions from last meeting
19:03:20 Gerrit hackathon is the same week!
19:03:24 #link http://eavesdrop.openstack.org/meetings/infra/2016/infra.2016-07-05-19.02.html
19:03:33 there were "(none)"
19:03:34 zaro: at the same place, see you there? :)
19:03:41 #topic Specs approval
19:03:44 i'll be there!
19:03:50 #topic PROPOSED: Spec for testing Newton on Xenial transition (clarkb)
19:03:56 #link https://review.openstack.org/337905
19:04:02 #info Council voting is open on "Spec for testing Newton on Xenial transition" until 19:00 UTC, Thursday, July 14
19:04:09 note that's being proposed as a priority spec
19:04:32 anything we need to say about it this week, clarkb?
19:04:40 other than it's basically already underway
19:04:59 just that there is also a zuul and nodepool change proposed that I should update the topics on, and I am doing a ton of testing for tox things across the projects that run python-jobs in zuul
19:05:17 cool
19:05:19 and other cleanup stuff is shaking out of that, like empty repos and repos missing tox.ini and so on
19:05:30 so will likely transition to trying to clean that up post-xenial switch
19:05:48 it's amazing what you find when you run all the tests in a giant loop :)
19:05:58 yeah, daunting!
19:06:14 but no major issues with xenial yet
19:06:17 thank you for dusting out the corners
19:06:26 a possible sqlite3 problem, and nodepool is unhappy with the newer mysql
19:06:34 notmorgan: ^ I think you have a change to address the nodepool thing?
19:06:43 looks like bindep is the trick to make this work, however i don't see any examples of an actual other-requirements.txt file. can we add something that's definitive?
19:06:53 i mean in the docs
19:07:25 zaro: AJaeger has been proposing a bunch of them to projects so far, so get up with him when he's around
19:07:34 or i can find you some after the meeting
19:07:35 cool.
19:07:42 oh right, for python-jenkins
19:07:45 I can help with that too
19:08:01 thanks clarkb!
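[For the other-requirements.txt question above, a minimal sketch of the bindep file format: one distro package per line, optionally qualified with platform/profile selectors. The package choices below are illustrative, not an official sample.]

    # other-requirements.txt (consumed by bindep; later renamed bindep.txt)
    build-essential [platform:dpkg]
    gcc [platform:rpm]
    libffi-dev [platform:dpkg]
    libffi-devel [platform:rpm]
    # only considered when the "test" profile is requested, e.g. "bindep test"
    mysql-client [platform:dpkg test]
    mysql [platform:rpm test]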
19:08:06 #topic PROPOSED: Firehose, a unified message bus for Infra services (mtreinish, fungi)
19:08:13 #link https://review.openstack.org/329115
19:08:19 #info Council voting is open on "Firehose, a unified message bus for Infra services" until 19:00 UTC, Thursday, July 14
19:08:30 I am excited for this
19:08:34 fun side project i promised to help mtreinish with
19:08:52 don't really need to discuss it in meeting unless there are serious concerns with it at this point
19:09:13 my questions are already addressed in the spec, so yay exciting
19:09:19 but please review.
we're submitting a talk for the upstream dev track about it
19:09:50 i'm very intrigued by what things we'll be able to do with it :-)
19:10:18 #topic Priority Efforts: Zuul v3 and Nodepool+Zookeeper (jeblair)
19:10:37 i guess i should have listed Shrews there as well
19:10:46 prepared text bomb to save time:
19:10:47 (consider yourself pinged!)
19:10:58 i think the zuulv2.5 work is stable now, and i and some other folks are focusing significantly on zuulv3 now. we're setting an ambitious goal of having enough done to do serious migration work by the time of the infra/qa latecycle.
19:10:58 i'm volunteering to help people find things to work on in v3 -- here's what people are working on now
19:10:58 Shrews is working on zookeeper for nodepool, which i consider in the critical path for v3
19:10:58 notmorgan is working on converting zuul to py3 so that we can have nice websockets and console streaming
19:10:58 i'm working on the mind-bending configuration changes in the v3 spec
19:10:59 jhesketh is working on forward-porting some of the stuff we learned about ansible in v2.5
19:10:59 mordred is working on figuring out how we can securely run jobs that are native ansible
19:11:00 so if you want to pitch in or follow any of those efforts, those are the folks to talk to
19:11:00 if you have significant time and ability to contribute to zuulv3, let me know and i can find something for you
19:11:00 if you are looking for smaller things, then at this point, probably helping review those efforts would be best,
19:11:01 or, alternatively, i have been sprinkling the code with lots of "TODOv3" items -- many of those could probably be worked in smaller chunks (they have my name on them, but that just means i have to explain what they mean, not that i have to do them)
19:11:05 [bam]
19:11:29 (also, i'm stuffing my face with a sandwich, so that helps)
19:11:54 jeblair: can we also tack on there "fix floating ip support in nodepool"
19:12:12 it's not directly related, but it is important because we have lots of users of nodepool that may not be able to use provider nets
19:12:24 i have doubts that fip support in nodepool is tied to zookeeper
19:12:27 so fwiw
19:12:28 it isn't
19:12:32 clarkb: s/nodepool/openstack/? :) .... but er, what's that about?
19:12:39 but yeah, that's something that can be worked on regardless i'm sure
19:12:40 the py3 stuff, zuulv3 will not be py27 compat.
19:12:48 jeblair: well that too, but we know that nodepool can be made to work better because it did work better in the past
19:12:54 we will be dropping (currently planned) py27 support
19:13:01 jeblair: but recent changes have been a major regression in reliability, even though things weren't perfect before
19:13:06 I'm happy to help get our zookeeper cluster online. https://review.openstack.org/#/c/324037/ adds a single instance to nodepool.o.o but I wonder if we should expand it to the full cluster out of the box
19:13:15 jeblair: i'm happy to get streams of work
19:13:26 pabelanger: i'd like to just run one zk on nodepool.o.o
19:13:36 so i'll follow up with people you mentioned, or feel free to ping me later with whatever you think i could work on
19:13:40 jeblair: then we are ready :)
19:13:40 rcarrillocruz: cool, we'll chat later
19:14:03 we're mainly not impacted by the fip issues at this point because one of our fip-requiring providers went out of business, one switched to provider networks and one is offline
19:14:34 but presumably other users of nodepool are in situations where fip use is mandatory for silly reasons
19:14:51 clarkb: this feels like a +mordred+Shrews conversation; mordred is not in a compatible tz atm
19:15:10 ok, maybe we can pick this up when mordred and shrews are in compatible timezones
19:15:39 yeah, let's pick this up later... but i agree, i don't want to be in a position where we have to turn down a cloud because fips don't work
19:15:48 i think the fip takeaway (and i share clarkb's concern) is that we ought to make sure that new development on nodepool stuff doesn't cause us to forget that we still need to fix a use case we were supporting and severely regressed
19:16:08 ++
19:16:09 we are technically turning down bluebox right now for that reason
19:16:12 I'd also be interested in a discussion to maybe convert a few of our JJB templates into ansible roles. When time permits, maybe get a short list together
19:16:22 or ansible playbooks
19:16:23 i hope the fix for that is in shade anyway
19:16:38 so it should at least have a good api barrier
19:16:40 morning all
19:16:41 fungi: yup, that
19:16:49 pabelanger: what's the impetus behind that?
19:16:58 pabelanger: i'm hoping that i can pitch in on the jjb->ansible conversion effort when zuul v3 draws closer
19:17:06 bkero: we won't have JJB for zuulv3
19:17:24 Ah, makes sense
19:17:44 pabelanger: you can likely automate/compile the vast majority to start
19:17:53 pabelanger: shell builder becomes ansible shell module
19:17:54 fungi, pabelanger: yeah, i'm hoping we can do a lot of that closer to the end -- maybe during the latecycle meetup. but starting to poke at it a little may help inform development/testing
19:17:55 and so on
19:18:47 cool, anything else on the zuul v3 and nodepool+zookeeper topic?
19:18:49 clarkb: there are two forks -- how to convert all of our jobs, but also, how would we redesign the jobs with the additional tools we have
19:18:49 Moving to ansible should get us better resources to analyze run failures too \o/
19:18:51 clarkb: Yup, if we get some good working examples, other projects can be quickly bootstrapped for the conversion
19:19:02 bkero: yep
19:19:10 jeblair: ya, definitely don't want to stick to the jjb model long term if we can make things better
19:19:21 ++
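[A rough sketch of the "shell builder becomes ansible shell module" point above; the job name and workspace path are hypothetical, not actual project-config contents.]

    # JJB builder (hypothetical):
    - builder:
        name: run-unit-tests
        builders:
          - shell: 'tox -e py27'

    # Roughly equivalent Ansible task:
    - name: Run unit tests
      shell: tox -e py27
      args:
        chdir: "{{ ansible_user_dir }}/workspace"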
19:19:26 fungi: the only other point i wanted to raise is....
19:19:49 aha, see it in the agenda now
19:19:51 that i had been thinking we would drop dedicated zuul mergers in v3, in favor of combined ansible merge/launch workers
19:20:22 but after writing 340597, i think zuul may need some horsepower to gather proposed config changes
19:20:40 and i'm wondering if having a dedicated merger option might be a good idea
19:20:52 it does seem reasonable to make the merge work independently scalable from the job running
19:20:58 i honestly don't know, but it's a potentially significant deployment difference
19:21:12 one is a per-changeish/enqueue workload, the other is a per-job-run workload
19:21:13 so i wanted to bring it up as something for folks to think about
19:21:36 fungi: exactly -- and i'm wondering, can we skim off the top of the merge-launchers for that, or will it be too much...
19:22:07 jeblair: the underlying concern being related to in-project configs?
19:22:13 fortunately, i hadn't gotten far in gutting them, so it's not too hard if we want to keep 'em around
19:22:19 jeblair: i guess if the merge task is treated in the abstract like a kind of job, then maybe?
19:22:23 clarkb: yes -- we need mergers to merge and read those files
19:22:52 will this affect zuul downstream consumers?
19:22:55 so you have a "job" to create the merge commits, and then your other jobs depend on that "job"
19:23:17 fungi: it's a bit less abstract than that because it happens before zuul decides what jobs to run
19:23:30 and i guess the v3 ref push model does get simpler if the launcher has those commits locally on hand already
19:23:48 fungi: but the main thing is that it's using *part* of the merger-launcher resources, but not as much as a real launched job
19:23:54 jeblair: good point, since the merge commits reflect repo states that have job configuration in them
19:24:54 oh, though one launcher would be generating the merge commits, other launchers might run the jobs?
19:25:04 so you'd still need to shuffle them around the network
19:25:18 fungi: indeed. so instead of that, we'd probably just end up merging commits more than once. :(
19:25:25 anteaya: yes, the whole thing is a large non-backward-compatible change
19:25:34 clarkb: okey dokey
19:25:35 anteaya: so anyone downstream will need new configs and deployment stuff
19:25:48 or can pin to v2.5 and stay there a while
19:25:58 (if we don't have dedicated workers, there is a small chance a merge-launcher could reuse a commit on hand, but we will be large enough that won't happen often)
19:26:04 zuul 3.0.0 will be a major configuration, dependency and behavior change from <3
19:26:21 ack
19:26:26 anteaya: yeah, so all this is happening in a branch; deployment of it will be a very considered affair later on
19:26:42 fungi: (that's eot from me)
19:26:42 ah okay, sorry, I'll table my question
19:26:59 anybody else have questions on this work before i move on to general meeting topics?
19:27:43 What is the current status of the gerrit its-storyboard plugin? (anteaya)
19:27:46 er
19:27:52 #topic: What is the current status of the gerrit its-storyboard plugin? (anteaya)
19:28:07 #undo
19:28:08 Removing item from minutes:
19:28:11 #topic What is the current status of the gerrit its-storyboard plugin? (anteaya)
19:28:21 must be time for my afternoon coffee
19:28:27 this is a question for zaro
19:28:35 so there are some patches: https://review.openstack.org/#/q/topic:its-storyboard
19:28:44 #link https://review.openstack.org/#/q/topic:its-storyboard
19:28:44 and some look ready to go, save for some reviews
19:28:59 zaro: do you just need reviews at this point, or do you need anything else?
19:29:00 yes, they are all ready
19:29:08 jesusaur says change https://review.openstack.org/#/c/340605/ is failing due to https://review.openstack.org/#/c/340529/
19:29:25 well, the one system-config patch depends on two puppet-gerrit patches, one of which is failing lint tests
19:29:27 i really have no idea if that's true or not
19:29:34 i guess it would be good to go ahead and get those through. i should probably have put this under Priority Efforts: A Task Tracker for OpenStack
19:30:00 zaro: this has a lint failure: https://review.openstack.org/#/c/331523/
19:30:11 ah sorry, I'll put it there next time
19:30:22 yes, 340529 is causing that lint failure according to jesusaur
19:30:30 zaro: ah, thank you
19:31:06 but he also says he wasn't able to test it, so i'm not sure whether it's a fact
19:31:12 so nibalizer, crinkle, rcarrillocruz: could anyone spare some puppety-type review time for https://review.openstack.org/#/c/340529/
19:31:27 and I guess that is it for this topic, thank you
19:31:34 unless you have more, zaro
19:31:53 nope, other than it's been tested with review-dev & sb-dev
19:32:02 seems to be working
19:32:04 \o/
19:32:08 wonderful, thank you
19:32:11 thanks fungi
19:32:21 that's good news, thanks for working on this zaro :)
19:32:28 awesome
19:32:35 #topic Mascot/logo for your project (heidijoy, fungi)
19:32:40 #link http://lists.openstack.org/pipermail/openstack-dev/2016-July/099046.html
19:32:44 just a heads-up that the foundation wants some consistent logos/mascots for each project team, primarily for them to use on foundation sites, but they have an illustrator on board to work with us on the design and we can reuse the artwork for any other community things we want
19:33:26 i figure there are probably some on the team who have an affinity for this sort of bikeshed, so we can do that via ml or etherpad
19:33:26 infra badger?
19:33:28 awww
19:33:28 preference?
19:33:41 well, we have had a gear
19:33:42 infra animal
19:33:44 fungi's pet betta?
19:33:45 * rcarrillocruz remembers pleia2 showing zuul mascot in some talk :D
19:34:04 that was a pink dragon I think
19:34:05 yeah, pleia2's awesome mascot is disqualified for being a mythological creature
19:34:14 hah
19:34:16 it was a pink dragon
19:34:29 fungi: what, there are rules?
19:34:34 apparently that's an important aspect for reasons i'm unable to entirely reconcile
19:34:40 ocelot
19:34:44 anteaya: see the linked e-mail for details
19:35:22 It's hard to get mythical creatures for the OpenStack field trip to the Sydney petting zoo
19:35:23 but anyway, i don't want to take up meeting time soliciting mascots, just to figure out where we should coordinate ideas and come to some consensus (for those who care about it)
19:35:31 it just says "from nature"
19:35:38 it doesn't say no dragons
19:36:00 It says "animal or natural feature" at http://www.openstack.org/project-mascots
19:36:00 anteaya: touché.
komodo dragon is probably already taken though
19:36:01 I'll pick it up later, moving on
19:36:12 we can try
19:36:30 I'm arguing for dragon as a feature
19:36:35 okay, I'll stop
19:37:10 #link https://etherpad.openstack.org/p/infra-mascot-ideas
19:37:44 #info post mascot ideas and discussion to the etherpad, and we'll have a vote in the meeting next week
19:38:13 or we can decide in next week's meeting to do a civs poll instead
19:38:17 whatever
19:38:26 anyway, a week for feedback
19:38:39 #topic Intel/Mellanox Barcelona talk proposal: Openstack CI - testing SRIOV/NFV (wznoinsk)
19:39:18 wznoinsk_: around?
19:39:58 i guess let's come back to this at the end if we still have time
19:40:02 #topic Translations Checksite status and call for help, now with more data! (pleia2)
19:40:05 #link http://lists.openstack.org/pipermail/openstack-infra/2016-July/004524.html
19:40:58 mostly I just need some feedback here
19:41:06 I outlined the issues we're having, hopefully clearly, in that email
19:41:15 happy to have this discussion on list, I just need that to happen :)
19:41:40 I'll admit to reading the email and not knowing where to start
19:41:43 yep, so the primary issue is that redeploying devstack periodically is extremely failure-prone?
19:42:33 yeah, i'm confused why devstack is needed
19:42:41 specifically deploying from master branch tip with most services (those getting active translations) enabled?
19:43:07 fungi: and configuring the fakes on the backend, I think
19:43:19 rcarrillocruz: well, we don't need devstack exactly, we just need running openstack services deployed from master branch tip, with accounts configured for translators to log into and check their applied translations
19:43:27 well
19:43:30 yeah, so translations cleanly apply
19:43:37 since they're also translating from dev
19:43:38 that looks like a job the launcher could help with
19:43:49 i assume the issue is doing the initial bootstrap of the 'cloud'
19:43:51 projects, users, etc.?
19:44:01 rcarrillocruz: no, I don't think so, that all works fine with devstack
19:44:02 yeah, we need to import/carry over the config
19:44:28 so yeah, or no?
19:44:29 could we run it as a periodic zuul job on a dedicated static host?
19:44:37 do we also need to preserve database contents between redeploys?
19:44:48 then we don't have to worry about puppet or any of that
19:44:49 the trouble is we're not really involving CI in this right now, since the in-progress translations come from Zanata, not git
19:45:05 they only hit git when they reach 75% complete
19:45:23 fungi: no
19:46:12 well, we _could_ have a zuul job which does the devstack deployment and retrieves the translations i guess, with a static node as clarkb mentioned, though i'm not sure that changes the nature of the problem much
19:46:24 fungi: it avoids the runtime issues
19:46:24 yeah, i mean
19:46:24 fungi: nods
19:46:31 the real issue is fragility of puppetry
19:46:31 fungi: and uses the tools we expect to work all day
19:46:33 ?
19:46:42 it would be cool if we could use selenium somehow to log into devstack and take screenshots of the current pages, then store them for people to review
19:46:51 I have no idea how much work that is
19:46:51 basically trying to keep up with master of openstack is hard
19:47:04 the benefit of using the CI tooling is it by definition needs to keep up
19:47:15 pabelanger: we already do that
19:47:20 could you snapshot a .qcow2 that people run locally?
19:47:21 pabelanger: in the horizon functional job
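[A rough sketch of the selenium screenshot idea above -- log into a running horizon and save a page capture for translators to review; the URL, credentials, and element IDs here are hypothetical, not the checksite's actual values.]

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    try:
        # hypothetical checksite login page
        driver.get("http://checksite.example.org/auth/login/")
        driver.find_element(By.ID, "id_username").send_keys("i18n-demo")
        driver.find_element(By.ID, "id_password").send_keys("secret")
        driver.find_element(By.ID, "loginBtn").click()
        # capture the landing page so applied translations can be reviewed later
        driver.save_screenshot("horizon-overview.png")
    finally:
        driver.quit()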
19:47:22 clarkb: really?
19:47:26 wow
19:47:26 pabelanger: yup
19:47:27 TIL
19:47:28 problem though is unreliability of devstack deploying, so whether it's puppet or ansible/shell doing that, if we're reusing the same environment it's hard to fall back to an old but working version
19:47:42 clarkb: yeah, that's why I had my #4 in my email - we have this "make devstack every day" tooling, can we use it?
19:48:39 for red/blue i guess we could deploy to alternating nodes and update dns, but we don't currently have any dns automation either
19:48:49 ianw: most translators don't have access to the resources (and sometimes lack the expertise) to run such a thing, they need a hosted solution they can just log into with a web browser
19:49:56 yep, which was basically the crux of the request from the translation team to start with
19:50:10 run something where we can demo our translations, because we can't all run openstack
19:50:14 * pleia2 nods
19:50:54 I don't think we need to solve this in this meeting, but I could use one or two of you to hash over some options with
19:51:06 it boils down to a very constrained continuous deployment of openstack problem, but it is, as noted, still very hard to continuously deploy openstack from master with any real consistency
19:51:14 fungi: it won't solve anything to use the gate infrastructure, but so much of that tooling is literally "make devstack reliable"
19:51:53 clarkb: yep, i think using the ci for this is a viable choice, it just wasn't what got chosen in the original spec and subsequently partially implemented
19:52:15 well, i can help with devstack ... but yeah, devstack just isn't very good for keeping things running longer-term in a no-touch manner
19:52:28 anyway, i can try to carve out some time to poke at this, though hopefully there are others with available time to help
19:52:47 I'm not traveling again until the end of the month, so just grab me whenever, I'm around
19:52:59 #topic Make it more obvious that OpenStack source on GitHub is a mirror (pleia2)
19:53:13 this will almost certainly eat up the remainder of the meeting time
19:53:28 wznoinsk_: if you're around, find us in #openstack-infra after the meeting
19:53:30 pleia2: yeah, happy to help on the ansible/puppet front
19:53:38 we talked about this before the meeting a bit :)
19:53:43 #link http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2016-07-12.log.html#t2016-07-12T18:35:15
19:53:44 i need to really read the whole thing to understand the problems
19:54:21 I think the idea in general, pleia2. Updating the descriptions is a good first step I think
19:54:24 rcarrillocruz: thanks, the spec + what we have in the puppet module is pretty comprehensive
19:54:35 I've seen an increase in people submitting pull requests on github against OpenStack projects, about once a day
19:54:37 s/think/like/
19:54:52 I think this is just an increase in popularity of github in general, but we don't have obvious blinking banners telling people not to do that
19:55:18 Just a link to the cgit repo in the README is probably enough for most things.
19:55:32 Part of the issue is that github has better google-fu than cgit
19:55:35 persia: yep, and we do that, but this is solving the corner cases
19:55:38 There's always something like https://nopullrequests.appspot.com/
19:55:53 pull request bot that closes them and leaves them a message about the proper place to submit patches
19:55:58 bkero: we also automatically close all pull requests with a polite message
19:56:00 bkero: we have one
19:56:06 oh cool
19:56:07 but it makes people feel sad and angry
19:56:14 yeah, iirc jeepyb does something like that
19:56:19 they put in effort, just to be told "nope, go here"
19:56:26 Yup, I can confirm the rage of closing a PR
19:56:43 anyway, the point of this agenda item is to make sure it's worth my time and effort for a multi-pronged approach to improve the situation
19:56:57 as pabelanger said, maybe marking the descriptions we sync with "Read Only:"
19:57:00 pleia2: what do you want to fix, and do you think it will be fixable?
19:57:02 as mentioned before the meeting, i'm in favor of doing whatever else is relatively easy and automated to help make this even more obvious. just be wary that we'll never cover 100% of people who don't read what's right in front of them, nor people who get incredulous over the idea of using anything besides github
19:57:03 also looking into adding a PR template explaining
19:57:13 if you want to spend time doing this, I support you
19:57:13 fungi: My suggestion is to have that near the top, rather than later on. As an example, maybe replace "OpenStack Compute (Nova) http://openstack.org" with "OpenStack Compute (Nova) http://openstack.org https://git.openstack.org/cgit/openstack/nova/" or something.
19:57:21 I feel like better contributors would be willing to understand, so I'd be wary of trying too hard to please the grumps
19:57:30 Zara: agreed
19:57:32 but I don't think we will reduce or eliminate people who have bad experiences
19:57:38 it won't solve the problem, but I seek to reduce it as much as possible
19:57:42 also be aware that mass api operations against github are terribly slow and unreliable
19:57:56 we used to update project descriptions via jeepyb/manage-projects
19:58:00 we had to stop
19:58:15 I didn't realize we had stopped
19:58:22 /o\
19:58:23 a couple years ago, yeah
19:58:43 alright, I'll see what's feasible for automation and report back in the form of patches and things
19:58:43 it was making important ops, like project creation, fail
19:58:45 you can fairly reliably update _a_ project's description via the github api
19:58:52 I think you get 5k requests per hour
19:59:12 if you try to update a hundred in a loop, you should expect it to not work so well
19:59:15 for a one-time thing, i'm sure we can handle updating descriptions slowly
19:59:28 and if you want to do all 1k+ of our repos in one go, brace yourself for disappointment
19:59:43 noted
20:00:05 yep, that's basically my point. don't expect that we'll be able to dynamically adjust this at will. plan for it to be a manual update and for it to take a while
20:00:21 thanks for the meeting
20:00:31 thanks everyone!
20:00:35 #endmeeting
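[A rough sketch of the description-update approach discussed above, using the GitHub "edit repository" REST endpoint and pacing the loop per the rate-limit caveats; the token, notice text, and repo list are placeholders.]

    import time
    import requests

    TOKEN = "xxxx"  # placeholder: a token with repo scope
    NOTICE = "Mirror of git.openstack.org -- submit changes via review.openstack.org"
    REPOS = ["openstack/nova", "openstack/neutron"]  # illustrative subset, not all 1k+ repos

    for repo in REPOS:
        resp = requests.patch(
            "https://api.github.com/repos/%s" % repo,
            headers={"Authorization": "token %s" % TOKEN,
                     "Accept": "application/vnd.github.v3+json"},
            # the v3 edit endpoint historically required "name" alongside any change
            json={"name": repo.split("/")[1], "description": NOTICE},
        )
        resp.raise_for_status()
        time.sleep(5)  # pace updates; bulk GitHub API operations are slow/unreliable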