19:15:04 #startmeeting infra
19:15:05 Meeting started Tue Mar 1 19:15:04 2016 UTC and is due to finish in 60 minutes. The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:15:06 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:15:08 The meeting name has been set to 'infra'
19:15:11 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:15:19 #topic Announcements
19:15:28 #info Huge thanks to crinkle, yolanda, rcarrillocruz, greghaynes, nibalizer, SpamapS and jeblair for their hard work and planning on infra-cloud prior to our event last week.
19:15:30 #info Thanks to pleia2, jhesketh, wendar, purp and Hewlett-Packard Enterprise for logistics/sponsoring of the infra-cloud sprint.
19:15:32 #info Also thanks to HPE and IBM for picking up the group tab at dinner a couple of nights in beautiful Fort Collins.
19:15:34 #link http://princessleia.com/journal/?p=11335 pleia2's awesome event summary
19:15:36 #topic Actions from last meeting
19:15:40 #link http://eavesdrop.openstack.org/meetings/infra/2016/infra.2016-02-16-19.03.html
19:15:42 "1. (none)"
19:15:49 #topic Specs approval
19:15:51 nothing new this week
19:15:55 #topic Priority Efforts: maniphest migration (craige)
19:15:57 #link https://etherpad.openstack.org/p/maniphest-migration Created the maniphest-migration etherpad to track issues and progress
19:16:02 okay, now we're caught up
19:16:20 craige: that looks nice and detailed
19:16:35 craige: anything specific you need to call out on that list you need help with next?
19:18:12 i'm taking the silence here as a "no"
19:18:24 i guess we can come back to this one if craige returns
19:18:28 fungi: craige was after a new db dump but not sure if he got it yet
19:18:53 craige: get up with one of our infra-root admins to get you an updated database dump
19:19:12 #topic Stackviz deployment options (timothyb89, austin81)
19:19:16 #link https://etherpad.openstack.org/p/BKgWlKIjgQ
19:19:18 #link https://review.openstack.org/279317
19:19:20 #link https://review.openstack.org/212207
19:19:33 care to catch us up on the current state there?
19:19:52 sure, basically, we're looking for some high-level feedback on the two linked patches
19:20:36 they both have wip in the title of the commit message
19:20:38 our goal is to get stackviz running during gate cleanup, and we're hoping that what we've got can accomplish that without impacting anything else
19:20:39 timothyb89: what is stackviz and what makes it that special?
19:20:54 would you care to remove that in case they gather enough reviews to be merged?
19:21:14 anteaya: Sure, we just wanted to make sure they were in a close-to-merge state first
19:21:16 * AJaeger misses the context for this
19:21:22 stackviz is a visualization utility for tempest runs
19:21:29 probably what's special is that they want to have it run in/after every devstack-based job
19:21:36 I think there is a freenode split which is killing our meeting, FYI
19:21:41 for example, https://stackviz.timothyb89.org/s/5a325c4c-be4d-463f-b282-c313c0d987b0/#/
19:21:46 oh, or maybe just the ones that use tempest at least
19:21:48 some folks are complaining about "meeting is silent" in another channel
19:21:58 greghaynes: which channel?
19:22:01 a private one
19:22:05 greghaynes: I thought I had checked them all
19:22:06 * craige returns
19:22:28 a meeting in a private channel? weird
19:22:31 So, stackviz just visualizes tempest runs?
19:22:47 AJaeger: currently, yes, and hopefully other items in the future
19:22:49 a meeting in a private channel where they miss logging? Really weird
19:22:52 We think it's a netsplit
19:23:02 AJaeger: Its primary purpose is to help devs get to the root of the problem when their patch fails and see what caused it
19:23:04 heh ;) in a social channel, just wanted to let you all know the likely reason for the silence
19:23:26 without diving into stackviz, I feel moving it out of dib elements and into puppet-stackviz is more in line with how we manage dependencies.
19:23:36 lines 7 and 8 of the etherpad have demo links
19:23:48 that's my only real comment atm
19:23:52 pabelanger: is the goal to bake this in to our test images?
19:24:00 pabelanger: would moving it into a puppet module still allow it to run on build nodes?
19:24:01 * nibz returns from the wrong side of the netsplit
19:24:19 nibz: wb
19:24:21 greghaynes: I am not sure of the goal, to be honest
19:24:39 I really like the details page: https://static.timothyb89.org/stackviz/#/0/timeline
19:24:40 pabelanger: ok. Asking because the future for images we build is more dib, less puppet AFAIK
19:24:41 we do have a longer-term goal of possibly dropping image-build-time puppet from our workers since we're in the process of drastically simplifying what puppet does for us on them
19:24:42 timothyb89: yes, we manage all of our jenkins slaves mostly with puppet, which runs after we do image builds
19:25:11 greghaynes: right. but this would be the first time where we don't manage with puppet. Just confirming if we are okay with that
19:25:24 pabelanger: ah, good question to ask
19:25:49 so, our requirements are to build our repository, and then make the output available on build nodes during devstack-gate host cleanup - would that be possible to do with puppet?
19:26:14 i'm curious why this isn't baked into devstack
19:26:24 it seems like a natural fit
19:26:54 e.g. add a script to the devstack libs dir which pulls in and sets up stackvis before you run tempest
19:27:06 er, stackviz
19:27:09 fungi: that's been discussed before, but I believe the verdict was that it made more sense for performance to do it during image build
19:27:27 the build process is fairly time consuming so we wanted to cache the output
19:27:33 setup takes a while then? would add too much run-time repetition to devstack?
19:27:51 it takes ~10 minutes
19:27:59 is that because of npm install?
19:28:04 nibz: yes
19:28:05 nibz: yup
19:28:12 ahh, okay, i recall you saying something about it pulling reference data from our subunit2sql database?
19:28:13 install? or js minimizing/lint?
19:28:22 greghaynes: all of the above
19:28:38 we didn't deploy npm mirrors yet? or did we?
19:28:45 fungi: it's similar to subunit2sql, but it reads test repositories directly
19:28:49 the typical pattern for solving that is to make a package OOB from installation, so installation happens quickly...
19:29:08 okay, so npm-provided dependencies are the time-consuming part
19:29:41 fungi: yes, exactly
19:29:46 and yeah, being able to pre-cache your node.js deps locally isn't an option? you need to fully install them on the worker images?
19:29:47 Having reviewed https://review.openstack.org/#/c/279317 earlier - it did not include any reference to other patches or what stackviz was doing - I was very confused.
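For context on fungi's caching question just above, the difference between pre-caching node.js dependencies and fully installing them looks roughly like this with npm (a sketch; the package name and the stackviz checkout directory are purely illustrative):

    # seeding the local cache only stores tarballs and metadata under ~/.npm;
    # nothing is installed into any project yet
    npm cache add lodash@4.0.0

    # a full install resolves and unpacks dependencies into ./node_modules in
    # the current directory, and even with a warm cache it still contacts the
    # registry to verify package versions
    cd stackviz && npm install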
19:30:02 * AJaeger is now getting more of what you intend to do - and suggests you write a spec for this
19:30:20 They should be cacheable in ~/.npm/$PACKAGE/$VER/
19:30:36 timothyb89: how do you see this potentially interacting with other job payloads that use packages from npm? is there good sandboxing/isolation in node.js to prevent cross-dependency interactions?
19:31:03 fungi: afaik node dependencies are local to the directory you run 'npm install' from
19:31:13 fungi: yes, npm install creates a dir node_modules where you run it
19:31:17 fungi: I am still unclear if it's just installing deps or js minimization (apparently it's both?)
19:31:33 fungi: since those two need to be solved differently
19:31:41 greghaynes: it does both, we need to install the dependencies required to run the build process
19:31:54 okay, so it's preinstalling rather than simply caching, but inherent isolation with npm in theory makes that safe
19:31:57 greghaynes: it's basically 2 commands: 'npm install' and then 'gulp prod' to build
19:32:02 timothyb89: yea, but I am curious which one consumes most of the time
19:32:02 umm, meetbot seems to be taking the helm regarding the topic
19:32:13 greghaynes: npm install by far
19:32:17 austin81: ah, ty
19:32:18 greghaynes: 'npm install' is the time-consuming one, 'gulp prod' takes maybe 15 seconds
19:32:19 npm caches things in ~/.npm, and if not there it fetches it from the internet
19:32:26 timothyb89: awesome
19:32:29 leguin apparently unsplit and reset our topic, but that's fine
19:32:34 So if you throw them in there they should be immutable, useful, and fast
19:32:51 it'll get updated on next topic change unless something goes terribly wrong with the meetbot
19:33:36 timothyb89: so is most of the time in npm install spent downloading, or unpacking/building?
19:33:44 bkero: npm may cache things, but the install process is still time consuming because it will check remotely to verify package versions
19:33:52 AJaeger: So with your added context, what do you think of https://review.openstack.org/#/c/279317 now?
19:33:59 fungi: downloading
19:34:18 austin81: Please write a spec ;)
19:34:48 timothyb89: so in that case, if we pre-cache them it should speed that up tremendously? or are you saying that it's actually the link checking/remote indexing which is the time-consuming part of the download process?
19:35:00 austin81: with the questions here, there's too much unknown and unusual going on, so we should document this properly
19:35:38 fungi: in my experience, pre-caching npm packages doesn't help much, you have to run a full 'npm install' on the project directory to see a real benefit
19:35:44 yeah, i'm mostly trying to nail down what aspect of this makes it necessary to fully install components into the workers when it will only be used by a subset of jobs
19:35:44 AJaeger: Absolutely. Any helpful links to get us started?
19:36:13 austin81: see the infra-specs repository
19:36:15 since that seems to be a major part of the design thrust
19:36:48 #link http://git.openstack.org/cgit/openstack-infra/infra-specs/tree/template.rst
19:36:55 #link https://git.openstack.org/cgit/openstack-infra/infra-specs/tree/README.rst
19:36:58 has instructions
19:37:34 thanks fungi and anteaya for the links
19:37:42 AJaeger: welcome
19:37:46 timothyb89: i'm also wondering if the npm install phase gets considerably faster when krotscheck's npm mirrors get deployed
19:37:51 fungi, anteaya: Thanks guys.
19:37:56 fungi: I think it would!
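Pulling together the numbers above, the image-build-time work for stackviz is essentially the two commands quoted in the discussion (a sketch; the clone URL, the destination path, and gulp being on PATH are assumptions):

    # fetch the stackviz source (location illustrative)
    git clone https://git.openstack.org/openstack/stackviz /opt/stackviz
    cd /opt/stackviz

    # resolve and install the node.js build dependencies into ./node_modules;
    # this is the ~10 minute step, dominated by downloads from the npm registry
    npm install

    # run the production build, which reportedly takes around 15 seconds
    # (gulp assumed available, e.g. from ./node_modules/.bin)
    gulp prod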
19:38:00 <-- female
19:38:04 and you are welcome
19:38:21 fwiw i'd also like to see a spec, because we've gone through having it in devstack, having a specialised worker, installing via nodepool, but now have better mirroring infrastructure and other options around puppet
19:38:26 anteaya: My mistake :)
19:38:35 because we have jobs already which npm install a lot of stuff (horizon functional test, i'm looking at you) which make improving that experience during job run-time imperative anyway
19:38:46 austin81: no worries
19:38:56 * david-lyle blushes
19:38:57 a proper spec seems like a good plan :)
19:39:17 I think we've taken up enough of y'all's time? Thanks for all the help
19:39:31 sounds great--thanks timothyb89, austin81!
19:39:38 thanks everyone!
19:39:41 #topic Spec review: dib from git for nodepool (ianw / 2016-03-01)
19:39:48 #link https://review.openstack.org/283877
19:39:59 just a request for review on this spec, i don't have much to say that isn't in the spec
19:40:00 what's the current contention on this one?
19:40:04 david-lyle: thanks for sharing
19:40:19 :)
19:40:27 oh, i see. new spec just as most of our reviewers disappeared to sprint on infra-cloud
19:40:29 ianw: i haven't read this but i'll add it to the list
19:40:39 there was some contention in the reviews that the extra flag was not great
19:41:04 i confess, i'm surprised
19:41:05 fungi: The fear I would have is the general tightly-coupled projects issue; rather than nodepool breaking when someone who releases is around, it'll be whenever we merge something
19:41:09 so this goes back to the conversation we were having a couple of weeks ago about whether or not installing dib from git vs having to wait for a dib release to have new features in image builds was desirable
19:41:18 which is fine by me, but it's something the folks who run nodepool should be aware of
19:41:24 why is dib changing so fast that this matters?
19:41:39 jeblair: mostly fedora releases IME
19:41:41 I'm not crazy about running from git on projects that infra-core does not have +2 on
19:41:44 is dib encoding too much image building logic in itself?
19:41:51 it sounded like there was disdain at the up-to-several-day wait times for dib to tag a new release
19:42:08 fungi: yeah, but i feel like i expected dib to be a mostly static dependency
19:42:12 i don't know if disdain is the word, but it can really slow things down
19:42:19 there are some elements we use
19:42:23 jeblair: it is, until it isn't ... especially if you're bringing up something new
19:42:23 like simple-init
19:42:26 that are in the dib repo
19:42:31 for me the problem is not that it doesn't change fast enough, but rather that it changes at all
19:42:39 and if we need to update it - it means that we have to update it in dib and then get a release
19:42:41 infra-specific needs should probably be in our nodepool elements rather than wrapped up in dib's default element set, yeah?
19:42:45 well
19:42:51 ianw: could we solve this with better dib testing?
19:43:00 but simple-init and ubuntu-minimal are generally applicable elements
19:43:11 we just happen to be their primary user, so we find the errors before other people
19:43:16 I don't think we have ever really had any desire for new dib changes beyond 1) a random bugfix, which IME isn't too common, 2) something we're not in a rush to get, or 3) a new distro release, which is what we seem to hit a lot
19:43:26 jeblair: better dib testing will not hurt.
but when you've fixed an issue, and then it takes literally up to a week to get out there, it can be a real pain
19:44:02 so i guess the real push here is to find a way to be able to consume fixes for issues in dib elements more quickly?
19:44:08 ianw: right, so i'm thinking a change of focus: it's *dib* that should support a new distro release, and we should find a way to make that work happen in dib
19:44:11 jeblair: ianw re: better dib testing - I think testing of the images infra builds would help with this a lot, especially if we can test dib from git in those tests
19:44:32 greghaynes: yeah, so maybe if we put a job on dib that builds the images specified in project-config?
19:44:46 we are working on that
19:44:47 jeblair: or a job in project-config?
19:44:58 is simply building them sufficient, or do we need to try to boot and exercise them to turn up most of the errors in question?
19:45:04 greghaynes: i think we wanted that too
19:45:39 mostly curious where the split between build-time and boot-time falls for the bulk of the issues encountered
19:45:52 fungi: good q
19:46:24 when you're bringing up a new platform, a lot of the issues are build time
19:46:28 As for dib testing - there's a large patch series I have been getting slowly merged about adding more functional testing; the issue we seem to hit most often is integration between dib and services it installs (simple-init being the great example)
19:46:38 which is harder to test in dib
19:46:40 for stable platforms, not so much
19:47:23 but if you get a change into project-config, it is there on the next build
19:47:35 my primary concern with this suggestion is that we're trading bugs in new stuff we want to try but take time to get working for potentially hitting more bugs in stuff we already rely on
19:48:05 but once a change is in dib, it's not like it gets any more testing until release
19:48:09 the main thing holding back the functional dib testing is just work hours, but I also worry it won't help us as much as we want since those tests are good at determining if a build can succeed and if it can boot, but not so much 'does glean run at the correct time to set up networking'
19:48:29 of course, my theory is based on the (possibly baseless) assumption that dib releases get more testing than just merged commits in master
19:50:01 fungi: that is kind of my point, as described in the spec
19:50:12 hello o/
19:50:26 hello \o
19:50:27 i don't want nodepool to be the last line of dib's ci
19:50:48 greghaynes: just throwing this out there, but octavia jobs use dib from master to build their service vms, boot them, and then do load balancy tests through them. they've been fairly breaky due to that, and might be a reasonable short-term stand-in for functional tests.
19:50:55 i agree about the problem, but i'd like the solution to head the other direction
19:51:01 so on the issue of consuming unreleased code of deps not maintained by infra-core, i suppose that concern is also driven by an assumption that dib releases are better tested than dib's master branch state at any given point in time?
19:51:32 dougwig: Yea, we have some dib tests which build and boot images and ping them, but there's a hard question of to what level dib should be testing the services it installs (since dib's job is really just to install them)
19:51:49 or is it that we can more easily downgrade to an earlier dib release if a new one is broken, but figuring out what arbitrary commit to pin ourselves to is harder?
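The trade-off in that last question, rolling back to an earlier release versus pinning to an arbitrary commit, looks roughly like this if the nodepool builders install diskimage-builder with pip (a sketch; the version number and the commit placeholder are illustrative):

    # status quo: dib comes from a tagged release on pypi, so a bad release
    # can be rolled back by pinning the previous version
    pip install 'diskimage-builder==1.10.0'

    # what the spec proposes, roughly: install from git master to pick up
    # element fixes without waiting for a release
    pip install 'git+https://git.openstack.org/openstack/diskimage-builder#egg=diskimage-builder'

    # if master breaks, recovery means identifying a known-good commit to pin
    pip install 'git+https://git.openstack.org/openstack/diskimage-builder@<known-good-sha>#egg=diskimage-builder'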
19:51:56 fungi: They aren't better tested, no
19:52:05 My main concern as dib-core would be firefighting
19:52:13 I try to release when I am around to solve issues
19:52:18 but I don't necessarily merge with that mindset
19:52:45 nodepool is quite tolerant to failures, and many of us watch the builds
19:52:48 fungi: er, sorry, they *aren't
19:52:55 so release time is a controlled breakage window, as opposed to having to be careful not to merge a change unless you're around to see whether it breaks your downstream consumers
19:53:06 yep
19:53:07 i'd say it's better for us to fix any issue that affects nodepool the day after it hits, rather than wait until more changes are behind it
19:53:41 that definitely turns nodepool into a post-merge ci test of dib though
19:53:49 perhaps what mordred was getting at was an expansion of dib-core to include nodepool-core to mitigate that
19:53:58 yah
19:54:09 that's worth bringing up to the tripleo team i guess
19:54:13 that's my concern - not being able to react swiftly to a production outage
19:54:33 if we're running from pip, we can always pin back to a previous pip revision if a release happens that breaks the world
19:54:37 now - it's image builders
19:54:42 so breaking isn't QUITE breaking the gate
19:54:49 and there is more buffer
19:54:51 It's a good thing to talk about, dib *really* needs more cores. I don't know how wide infra-core is so that might be a bit overkill
19:55:06 but it _can_ be if the issue is one which causes a run-time problem rather than build-time/boot-time
19:55:07 greghaynes: there's 8 of us right now
19:55:17 plus ianw on nodepool-core
19:55:17 what i want is for dib to be solid and unchanging. if that's not possible, then if we are going to consider tighter integration, i think we need to consider *much* tighter integration. like, shared cores and shared change queues so we have working cross-testing.
19:55:31 though in that case we have the option of deleting the current images if we catch it within a day
19:55:58 I feel like dib has been pretty solid everywhere except adding new features which infra wants, which is something that can't be solved dib-side
19:56:16 specifically supporting new distros
19:56:56 jeblair: definitely better testing, and as mentioned, focus for dib currently. but speed of working with infra is not always great, and this can be a pain point
19:57:13 mordred: Yea, so we should bring that up, I am pretty sure I would be +1 on that
19:57:14 especially for new distros
19:57:36 anyway, maybe it's just me :) not sure what other distros are being worked on
19:57:39 ianw: yeah, which is why i think the solution is to not do all of this work in openstack's production nodepool. get it out on the edges in dib.
19:58:01 also, we're down to 2 minutes, so i'm going to need to punt the last several agenda topics to next week. i hope that's okay pabelanger, yorkisar, zaro, AJaeger? if not, we can try to cover stuff in #-infra between now and next week
19:58:02 at least, the *ideal* solution :)
19:58:22 fungi, I'll paste my prepared lines to #openstack-infra ;)
19:58:22 all good here
19:58:25 sorry, didn't expect so much conversation on that :)
19:58:34 so... nodepool-dev where we run the same config as production but with master dib instead of releases?
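As a rough illustration of moving this testing "out on the edges in dib", a build-only check job could run disk-image-create against the elements named in this discussion (a sketch; the element list and DIB_RELEASE value are assumptions, and real infra images pull in additional elements from project-config):

    # build, but do not boot, an image resembling what nodepool produces
    export DIB_RELEASE=trusty
    disk-image-create -o ./test-image ubuntu-minimal vm simple-init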
19:58:43 Yea, I think I didn't word it quite correctly; dib should definitely be sure that a newly supported distro works, but using a new distro as a downstream consumer will always be a matter of pulling in some change, so there has to be some level of integration testing
19:59:33 nodepool does have namespacing for its alien cleanup now, so in theory production and dev can share an environment as long as we tweak max-servers to accommodate a small slice of quota dedicated to dev
19:59:48 i wasn't suggesting that
19:59:56 and i don't like the idea of maintaining 2 nodepools
20:00:01 one is hard enough
20:00:15 i know, i was throwing out another idea between drinking from the firehose in production and only being able to use dib releases
20:00:34 anyway, we're out of time, but can continue in the review or #-infra
20:00:41 thanks everybody!
20:00:45 #endmeeting