19:15:04 <fungi> #startmeeting infra
19:15:05 <openstack> Meeting started Tue Mar  1 19:15:04 2016 UTC and is due to finish in 60 minutes.  The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:15:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:15:08 <openstack> The meeting name has been set to 'infra'
19:15:11 <fungi> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:15:19 <fungi> #topic Announcements
19:15:28 <fungi> #info Huge thanks to crinkle, yolanda, rcarrillocruz, greghaynes, nibalizer, SpamapS and jeblair for their hard work and planning on infra-cloud prior to our event last week.
19:15:30 <fungi> #info Thanks to pleia2, jhesketh, wendar, purp and Hewlett-Packard Enterprise for logistics/sponsoring of the infra-cloud sprint.
19:15:32 <fungi> #info Also thanks to HPE and IBM for picking up the group tab at dinner a couple nights in beautiful Fort Collins.
19:15:34 <fungi> #link http://princessleia.com/journal/?p=11335 pleia2's awesome event summary
19:15:36 <fungi> #topic Actions from last meeting
19:15:40 <fungi> #link http://eavesdrop.openstack.org/meetings/infra/2016/infra.2016-02-16-19.03.html
19:15:42 <fungi> "1. (none)"
19:15:49 <fungi> #topic Specs approval
19:15:51 <fungi> nothing new this week
19:15:55 <fungi> #topic Priority Efforts: maniphest migration (craige)
19:15:57 <fungi> #link https://etherpad.openstack.org/p/maniphest-migration Created the maniphest-migration etherpad to track issues and progress
19:16:02 <fungi> okay, now we're caught up
19:16:20 <fungi> craige: that looks nice and detailed
19:16:35 <fungi> craige: anything specific on that list you want to call out that you need help with next?
19:18:12 <fungi> i'm taking the silence here as a "no"
19:18:24 <fungi> i guess we can come back to this one if craige returns
19:18:28 <jhesketh> fungi: craige was after a new db dump but not sure if he got it yet
19:18:53 <fungi> craige: get in touch with one of our infra-root admins to get you an updated database dump
19:19:12 <fungi> #topic Stackviz deployment options (timothyb89, austin81)
19:19:16 <fungi> #link https://etherpad.openstack.org/p/BKgWlKIjgQ
19:19:18 <fungi> #link https://review.openstack.org/279317
19:19:20 <fungi> #link https://review.openstack.org/212207
19:19:33 <fungi> care to catch us up on the current state there?
19:19:52 <timothyb89> sure, basically, we're looking for some high-level feedback on the two linked patches
19:20:36 <anteaya> they both have wip in the title of the commit message
19:20:38 <timothyb89> our goal is to get stackviz running during gate cleanup, and we're hoping that what we've got can accomplish that without impacting anything else
19:20:39 <AJaeger> timothyb89: what is stackviz and what makes it that special?
19:20:54 <anteaya> would you care to remove that in case they gather enough reviews to be merged?
19:21:14 <austin81> anteaya: Sure, we just wanted to make sure they were in a close-to-merge state first
19:21:16 * AJaeger misses the context for this
19:21:22 <timothyb89> stackviz is a visualization utility for tempest runs
19:21:29 <fungi> probably what's special is that they want to have it run in/after every devstack-based job
19:21:36 <greghaynes> I think there is a freenode split which is killing our meeting, FYI
19:21:41 <timothyb89> for example, https://stackviz.timothyb89.org/s/5a325c4c-be4d-463f-b282-c313c0d987b0/#/
19:21:46 <fungi> oh, or maybe just the ones that use tempest at least
19:21:48 <greghaynes> some folks are complaining about "meeting is silent" in another channel
19:21:58 <anteaya> greghaynes: which channel?
19:22:01 <greghaynes> a private one
19:22:05 <anteaya> greghaynes: I thought I had checked them all
19:22:06 * craige returns
19:22:28 <fungi> a meeting in a private channel? weird
19:22:31 <AJaeger> So, stackviz just visualizes tempest runs?
19:22:47 <timothyb89> AJaeger: currently, yes, and hopefully other items in the future
19:22:49 <AJaeger> a meeting in a private channel where they miss logging? Really weird
19:22:52 <bkero> We think it's a netsplit
19:23:02 <austin81> AJaeger: Its primary purpose is to help devs get to the root of the problem when their patch fails and see what caused it
19:23:04 <greghaynes> heh ;) in a social channel, just wanted to let you all know the likely reason for the silence
19:23:26 <pabelanger> without diving into stackviz, I feel moving it out of dib elements and into puppet-stackviz is more in line with how we manage dependencies.
19:23:36 <anteaya> lines 7 and 8 of the etherpad have demo links
19:23:48 <pabelanger> that's my only real comment atm
19:23:52 <greghaynes> pabelanger: is the goal to bake this in to our test images?
19:24:00 <timothyb89> pabelanger: would moving it into a puppet module still allow it to run on build nodes?
19:24:01 * nibz returns from the wrong side of the netsplit
19:24:19 <bkero> nibz: wb
19:24:21 <pabelanger> greghaynes: I am not sure of the goal, to be honest
19:24:39 <anteaya> I really like the details page: https://static.timothyb89.org/stackviz/#/0/timeline
19:24:40 <greghaynes> pabelanger: ok. Asking because the future for images we build is more dib, less puppet AFAIK
19:24:41 <fungi> we do have a longer-term goal of possibly dropping image-build-time puppet from our workers since we're in the process of drastically simplifying what puppet does for us on them
19:24:42 <pabelanger> timothyb89: yes, we manage all of our jenkins slaves mostly with puppet, which runs after we do image builds
19:25:11 <pabelanger> greghaynes: right. but this would be the first time where we don't manage with puppet.  Just confirming if we are okay with that
19:25:24 <greghaynes> pabelanger: ah, good question to ask
19:25:49 <timothyb89> so, our requirements are to build our repository, and then make the output available on build nodes during devstack-gate host cleanup - would that be possible to do with puppet?
19:26:14 <fungi> i'm curious why this isn't baked into devstack
19:26:24 <fungi> it seems like a natural fit
19:26:54 <fungi> e.g. add a script to the devstack libs dir which pulls in and sets up stackvis before you run tempest
19:27:06 <fungi> er, stackviz
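
A minimal sketch of the devstack-side idea fungi floats here (not what the linked patches do), assuming a hypothetical lib/stackviz file sourced from stack.sh; the file name, function name, STACKVIZ_* variables and repo URL are all illustrative:

    # lib/stackviz (hypothetical)
    STACKVIZ_REPO=${STACKVIZ_REPO:-https://git.openstack.org/openstack/stackviz}
    STACKVIZ_BRANCH=${STACKVIZ_BRANCH:-master}
    STACKVIZ_DIR=${STACKVIZ_DIR:-$DEST/stackviz}

    function install_stackviz {
        # fetch and build stackviz before tempest runs; the npm step is the
        # slow part discussed below
        git_clone $STACKVIZ_REPO $STACKVIZ_DIR $STACKVIZ_BRANCH
        pushd $STACKVIZ_DIR
        npm install
        ./node_modules/.bin/gulp prod
        popd
    }
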
19:27:09 <timothyb89> fungi: that's been discussed before, but I believe the verdict was that it made more sense for performance to do it during image build
19:27:27 <timothyb89> the build process is fairly time consuming so we wanted to cache the output
19:27:33 <fungi> setup takes a while then? would add too much run-time repetition to devstack?
19:27:51 <timothyb89> it takes ~10 minutes
19:27:59 <nibz> is that because of npm install?
19:28:04 <timothyb89> nibz: yes
19:28:05 <austin81> nibz: yup
19:28:12 <fungi> ahh, okay, i recall you saying something about it pulling reference data from our subunit2sql database?
19:28:13 <greghaynes> install? or js minimizing/lint?
19:28:22 <timothyb89> greghaynes: all of the above
19:28:38 <nibz> we didn't deploy npm mirrors yet? or did we?
19:28:45 <timothyb89> fungi: it's similar to subunit2sql, but it reads test repositories directly
19:28:49 <greghaynes> the typical pattern for solving that is to make a package OOB from installation, so installation happens quickly...
19:29:08 <fungi> okay, so npm-provided dependencies are the time-consuming part
19:29:41 <timothyb89> fungi: yes, exactly
19:29:46 <fungi> and yeah, being able to pre-cache your node.js deps locally isn't an option? you need to fully install them on the worker images?
19:29:46 <AJaeger> Having reviewed https://review.openstack.org/#/c/279317 earlier - that did not include any reference to other patches or what stackviz was doing - I was very confused.
19:30:02 * AJaeger is now getting more of what you intend to do - and suggests you write a spec for this
19:30:20 <bkero> They should be cacheable in ~/.npm/$PACKAGE/$VER/
19:30:36 <fungi> timothyb89: how do you see this potentially interacting with other job payloads that use packages from npm? is there good sandboxing/isolation in node.js to prevent cross-dependency interactions?
19:31:03 <timothyb89> fungi: afaik node dependencies are local to the directory you run 'npm install' from
19:31:13 <nibz> fungi: yes, npm install creates a dir node_modules where you run it
19:31:17 <greghaynes> fungi: I am still unclear if it's just installing deps or js minimization (apparently it's both?)
19:31:33 <greghaynes> fungi: since those two need to be solved differently
19:31:41 <timothyb89> greghaynes: it does both, we need to install the dependencies required to run the build process
19:31:54 <fungi> okay, so it's preinstalling rather than simply caching, but inherent isolation with npm in theory makes that safe
19:31:57 <timothyb89> greghaynes: it's basically 2 commands: 'npm install' and then 'gulp prod' to build
19:32:02 <greghaynes> timothyb89: yea, but I am curious which one consumes most of the time
19:32:02 <anteaya> umm meetbot seems to be taking the helm regarding the topic
19:32:13 <austin81> greghaynes: npm install by far
19:32:17 <greghaynes> austin81: ah, ty
19:32:18 <timothyb89> greghaynes: 'npm install' is the time consuming one, 'gulp prod' takes maybe 15 seconds
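
For context, the two commands timothyb89 describes amount to roughly the following, run from a stackviz checkout (the checkout path and the local gulp invocation are assumptions about the project layout; the timings are the ones quoted above):

    cd /opt/stack/stackviz            # illustrative checkout location
    npm install                       # resolve and download node deps; the ~10 minute step
    ./node_modules/.bin/gulp prod     # build the minified static output; ~15 seconds
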
19:32:19 <bkero> npm caches things in ~/.npm, and if not there it fetches it from the internet
19:32:26 <greghaynes> timothyb89: awesome
19:32:29 <fungi> leguin apparently unsplit and reset our topic, but that's fine
19:32:34 <bkero> So if you throw them in there they should be immutable, useful, and fast
19:32:51 <fungi> it'll get updated on next topic change unless something goes terribly wrong with the meetbot
19:33:36 <fungi> timothyb89: so is most of the time in npm install spent downloading, or unpacking/building?
19:33:44 <timothyb89> bkero: npm may cache things, but the install process is still time consuming because it will check remotely to verify package versions
19:33:52 <austin81> AJaeger: So with your added context, what do you think of https://review.openstack.org/#/c/279317 now?
19:33:59 <timothyb89> fungi: downloading
19:34:18 <AJaeger> austin81: Please write a spec ;)
19:34:48 <fungi> timothyb89: so in that case, if we pre-cache them it should speed that up tremendously? or are you saying that it's actually the link checking/remote indexing  which is the time-consuming part of the download process?
19:35:00 <AJaeger> austin81: with the questions here, there's too much unknown and unusual going on, so we should document this properly
19:35:38 <timothyb89> fungi: in my experience, pre-caching npm packages doesn't help much, you have to run a full 'npm install' on the project directory to see a real benefit
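
A rough sketch of the distinction being drawn here (exact behaviour depends on the npm version on the images):

    # option 1 - pre-seed only the shared cache at image-build time: a later
    # 'npm install' in the job still round-trips to the registry to resolve
    # versions, so mostly just the downloads are saved
    cd /opt/stack/stackviz && npm install && rm -rf node_modules   # leaves ~/.npm warm

    # option 2 - what the linked patches describe: do the full install and
    # build once at image-build time, so no npm step has to run while a job
    # is executing and only the built output gets collected during cleanup
    cd /opt/stack/stackviz && npm install && ./node_modules/.bin/gulp prod
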
19:35:44 <fungi> yeah, i'm mostly trying to nail down what aspect of this makes it necessary to fully install components into the workers when it will only be used by a subset of jobs
19:35:44 <austin81> AJaeger: Absolutely. Any helpful links to get us started?
19:36:13 <AJaeger> austin81: see the infra-specs repository
19:36:15 <fungi> since that seems to be a major part of the design thrust
19:36:48 <anteaya> #link http://git.openstack.org/cgit/openstack-infra/infra-specs/tree/template.rst
19:36:55 <fungi> #link https://git.openstack.org/cgit/openstack-infra/infra-specs/tree/README.rst
19:36:58 <fungi> has instructions
19:37:34 <AJaeger> thanks fungi and anteaya for the links
19:37:42 <anteaya> AJaeger: welcome
19:37:46 <fungi> timothyb89: i'm also wondering if the npm install phase gets considerably faster when krotscheck's npm mirrors get deployed
19:37:51 <austin81> fungi, anteaya: Thanks guys.
19:37:56 <timothyb89> fungi: I think it would!
19:38:00 <anteaya> <-- female
19:38:04 <anteaya> and you are welcome
19:38:21 <ianw> fwiw i'd also like to see a spec, because we've gone through having it in devstack, having a specialised worker, installing via nodepool, but now have better mirroring infrastructure and other options around puppet
19:38:26 <austin81> anteaya: My mistake :)
19:38:35 <fungi> because we have jobs already which npm install a lot of stuff (horizon functional test, i'm looking at you), which makes improving that experience during job run-time imperative anyway
19:38:46 <anteaya> austin81: no worries
19:38:56 * david-lyle blushes
19:38:57 <timothyb89> a proper spec seems like a good plan :)
19:39:17 <austin81> I think we've taken up enough of y'all's time? Thanks for all the help
19:39:31 <fungi> sounds great--thanks timothyb89, austin81!
19:39:38 <timothyb89> thanks everyone!
19:39:41 <fungi> #topic Spec review : dib from git for nodepool (ianw / 2016-03-01)
19:39:48 <fungi> #link https://review.openstack.org/283877
19:39:59 <ianw> just a request for review on this spec, i don't have much to say that isn't in the spec
19:40:00 <fungi> what's the current contention on this one?
19:40:04 <anteaya> david-lyle: thanks for sharing
19:40:19 <anteaya> :)
19:40:27 <fungi> oh, i see. new spec just as most of our reviewers disappeared to sprint on infra-cloud
19:40:29 <nibalizer> ianw: i haven't read this but i'll add it to the list
19:40:39 <ianw> there was some contention in the reviews that the extra flag was not great
19:41:04 <jeblair> i confess, i'm surprised
19:41:05 <greghaynes> fungi: The fear I would have is the general tightly-coupled projects issue: rather than nodepool breaking when someone who releases is around, it'll break whenever we merge something
19:41:09 <fungi> so this goes back to the conversation we were having a couple weeks ago about whether or not installing dib from git vs having to wait for a dib release to have new features in image builds was desirable
19:41:18 <greghaynes> which is fine by me, but its something the folks who run nodepool should be aware of
19:41:24 <jeblair> why is dib changing so fast this matters?
19:41:39 <greghaynes> jeblair: mostly fedora releases IME
19:41:41 <mordred> I'm not crazy about running from git on projects that infra-core does not have +2 on
19:41:44 <jeblair> is dib encoding too much image building logic in itself?
19:41:51 <fungi> it sounded like there was disdain at the up-to-several-day wait times for dib to tag a new release
19:42:08 <jeblair> fungi: yeah, but i feel like i expected dib to be a mostly static dependency
19:42:12 <ianw> i don't know if disdain is the word, but it can really slow things down
19:42:19 <mordred> there are some elements we use
19:42:23 <ianw> jeblair: it is, until it isn't ... especially if you're bringing up something new
19:42:23 <mordred> like simple-init
19:42:26 <mordred> that are in the dib repo
19:42:31 <jeblair> for me the problem is not that it doesn't change fast enough, but rather that it changes at all
19:42:39 <mordred> and if we need to update it - it means that we have to update it in dib and then get a release
19:42:41 <fungi> infra-specific needs should probably be in our nodepool elements rather than wrapped up in dib's default element set, yeah?
19:42:45 <mordred> well
19:42:51 <jeblair> ianw: could we solve this with better dib testing?
19:43:00 <mordred> but simple-init and ubuntu-minimal are generally applicable elements
19:43:11 <mordred> we just happen to be their primary user, so we find the errors before other people
19:43:16 <greghaynes> I don't think we have ever really had any desire for new dib changes beyond 1) a random bugfix, which IME isn't too common, 2) something we're not in a rush to get, or 3) a new distro release, which is what we seem to hit a lot
19:43:26 <ianw> jeblair: better dib testing will not hurt.  but when you've fixed an issue, and then it takes literally up to a week to get out there, it can be a real pain
19:44:02 <fungi> so i guess the real push here is to find a way to be able to consume fixes for issues in dib elements more quickly?
19:44:08 <jeblair> ianw: right, so i'm thinking a change of focus: it's *dib* that should support a new distro release, and we should find a way to make that work happen in dib
19:44:11 <greghaynes> jeblair: ianw re: better dib testing - I think testing of the images infra builds would help with this a lot, especially if we can test dib from git in those tests
19:44:32 <jeblair> greghaynes: yeah, so maybe if we put a job on dib that builds the images specified in project-config?
19:44:46 <ianw> we are working on that
19:44:47 <greghaynes> jeblair: or a job in project-config?
19:44:58 <fungi> is simply building them sufficient, or do we need to try to boot and exercise them to turn up most of the errors in question?
19:45:04 <jeblair> greghaynes: i think we wanted that too
19:45:39 <fungi> mostly curious where the split between build-time and boot-time falls for the bulk of the issues encountered
19:45:52 <jeblair> fungi: good q
19:46:24 <ianw> when you're bringing up a new platform, a lot of the issues are build time
19:46:28 <greghaynes> As for dib testing - there's a large patch series I have been slowly getting merged to add more functional testing; the issue we seem to hit most often is integration between dib and the services it installs (simple-init being the prime example)
19:46:38 <greghaynes> which is harder to test in dib
19:46:40 <ianw> for stable platforms, not so much
19:47:23 <ianw> but if you get a change into project-config, it is there on the next build
19:47:35 <fungi> my primary concern with this suggestion is that we're trading bugs in new stuff we want to try but take time to get working for potentially hitting more bugs in stuff we already rely on
19:48:05 <ianw> but once a change is in dib, it's not like it gets any more testing until release
19:48:09 <greghaynes> the main thing holding back the functional dib testing is just work hours, but I also worry it won't help us as much as we want since those tests are good at determining if a build can succeed and if it can boot, but not so much 'does glean run at the correct time to set up networking'
19:48:29 <fungi> of course, my theory is based on the (possibly baseless) assumption that dib releases get more testing than just merged commits in master
19:50:01 <ianw> fungi: that is kind of my point, as described in the spec
19:50:12 <hashar> hello o/
19:50:26 <abregman> hello \o
19:50:27 <jeblair> i don't want nodepool to be the last line of dib's ci
19:50:48 <dougwig> greghaynes: just throwing this out there, but octavia jobs use dib from master to build their service vms, boot them, and then do load balancy tests through them. they've been fairly breaky due to that, and might be a reasonable short-term stand in for functional tests.
19:50:55 <jeblair> i agree about the problem, but i'd like the solution to head the other direction
19:51:01 <fungi> so on the issue of consuming unreleased code of deps not maintained by infra-core, i suppose that concern is also driven by an assumption that dib releases are better tested than dib's master branch state at any given point in time?
19:51:32 <greghaynes> dougwig: Yea, we have some dib tests which build and boot images and ping them, but there's a hard question of to what level dib should be testing the services it installs (since dib's job is really just to install them)
19:51:49 <fungi> or is it that we can more easily downgrade to an earlier dib release if a new one is broken, but figuring out what arbitrary commit to pin ourselves to is harder?
19:51:56 <greghaynes> fungi: They aren't better tested, no
19:52:05 <greghaynes> My main concern as dib-core would be firefighting
19:52:13 <greghaynes> I try to release when I am around to solve issues
19:52:18 <greghaynes> but I don't necessarily merge with that mindset
19:52:45 <ianw> nodepool is quite tolerant to failures, and many of us watch the builds
19:52:48 <greghaynes> fungi: er, sorry, they *aren't
19:52:55 <fungi> so release time is a controlled breakage window, as opposed to having to be careful not to merge a change unless you're around to see whether it breaks your downstream consumers
19:53:06 <greghaynes> yep
19:53:07 <ianw> i'd say it's better for us to fix any issue that affects nodepool the day after it hits, rather than wait until more changes are behind it
19:53:41 <fungi> that definitely turns nodepool into a post-merge ci test of dib though
19:53:49 <jeblair> perhaps what mordred was getting at was an expansion of dib-core to include nodepool-core to mitigate that
19:53:58 <mordred> yah
19:54:09 <fungi> that's worth bringing up to the tripleo team i guess
19:54:13 <mordred> that's my concern - not being able to react swiftly to a production outage
19:54:33 <mordred> if we're running from pip, we can always pin back at a previous pip revision if a release happens that breaks the world
19:54:37 <mordred> now - it's image builders
19:54:42 <mordred> so breaking isn't QUITE breaking the gate
19:54:49 <mordred> and there is more buffer
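
Concretely, the trade-off mordred is describing looks roughly like this (a sketch; the version number is a placeholder and the nodepool puppet manifests may express this differently):

    # today: nodepool hosts install dib from a release, so a broken release
    # can be backed out by pinning to the previous one
    pip install 'diskimage-builder==1.10.0'   # placeholder version

    # what the spec proposes: install dib from git master, picking up fixes
    # (e.g. new distro support) as soon as they merge
    pip install git+https://git.openstack.org/openstack/diskimage-builder

    # rolling back then means finding and pinning an arbitrary known-good commit
    pip install git+https://git.openstack.org/openstack/diskimage-builder@<sha>
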
19:54:51 <greghaynes> It's a good thing to talk about - dib *really* needs more cores. I don't know how wide infra-core is, so that might be a bit overkill
19:55:06 <fungi> but it _can_ be if the issue is one which causes a run-time problem rather than build-time/boot-time
19:55:07 <mordred> greghaynes: there's 8 of us right now
19:55:17 <mordred> plus ianw on nodepool-core
19:55:17 <jeblair> what i want is for dib to be solid and unchanging.  if that's not possible, then if we are going to consider tighter integration, i think we need to consider *much* tighter integration.  like, shared cores and shared change queues so we have working cross-testing.
19:55:31 <fungi> though in that case we have the option of deleting the current images if we catch it within a day
19:55:58 <greghaynes> I feel like dib has been pretty solid everywhere except adding new features which infra wants, which is something that can't be solved dib side
19:56:16 <greghaynes> specifically supporting new distros
19:56:56 <ianw> jeblair: definitely better testing, and as mentioned, that's the focus for dib currently.  but the speed of working with infra is not always great, and this can be a pain point
19:57:13 <greghaynes> mordred: Yea, so we should bring that up, I am pretty sure I would be +1 on that
19:57:14 <ianw> especially for new distros
19:57:36 <ianw> anyway, maybe it's just me :)  not sure what other distros are being worked on
19:57:39 <jeblair> ianw: yeah, which is why i think the solution is to not do all of this work in openstack's production nodepool.  get it out on the edges in dib.
19:58:01 <fungi> also, we're down to 2 minutes, so i'm going to need to punt the last several agenda topics to next week. i hope that's okay pabelanger, yorkisar, zaro, AJaeger? if not, we can try to cover stuff in #-infra between now and next week
19:58:02 <jeblair> at least, the *ideal* solution :)
19:58:22 <AJaeger> fungi, I'll paste my prepared lines to #openstack-infra ;)
19:58:22 <pabelanger> all good here
19:58:25 <ianw> sorry, didn't expect so much conversation on that :)
19:58:34 <fungi> so... nodepool-dev where we run the same config as production but with master dib instead of releases?
19:58:43 <greghaynes> Yea, I think I didn't word that quite right: dib should definitely be sure that a newly supported distro works, but using a new distro as a downstream consumer will always be a matter of pulling in some change, so there has to be some level of integration testing
19:59:33 <fungi> nodepool does have namespacing for its alien cleanup now, so in theory production and dev can share an environment as long as we tweak max-servers to accommodate a small slice of quota dedicated to dev
19:59:48 <jeblair> i wasn't suggesting that
19:59:56 <jeblair> and i don't like the idea of maintaining 2 nodepools
20:00:01 <jeblair> one is hard enough
20:00:15 <fungi> i know, i was throwing out another idea between drinking from the firehose in production and only being able to use dib releases
20:00:34 <fungi> anyway, we're out of time, but can continue in the review or #-infra
20:00:41 <fungi> thanks everybody!
20:00:45 <fungi> #endmeeting