19:01:01 <clarkb> #startmeeting infra
19:01:02 <openstack> Meeting started Tue Sep 25 19:01:01 2018 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:03 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:05 <openstack> The meeting name has been set to 'infra'
19:01:07 <ianw> o/ yay, last !summer-time meeting
19:01:11 <clarkb> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:01:20 <clarkb> #topic Announcements
19:01:40 <clarkb> Forum session submissions are due tomorrow at midnight Pacific time (I think). If you want to propose a session do so now
19:01:42 <cmurphy> o/
19:02:11 <clarkb> #topic Actions from last meeting
19:02:17 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2018/infra.2018-09-18-19.01.txt minutes from last meeting
19:02:42 <clarkb> last meeting was a little different in that I tried to use it to get PTG recap thoughts written down in some format before all that memory gets replaced
19:03:31 <mordred> we had a PTG?
19:03:47 <clarkb> No actions, but if you are interested in PTG happenings that is a great place to start. I think ttx is hoping there will be more emails to the dev list recapping the various happenings. I'll try to get one out this week based on what I wrote down in the meeting last week
19:04:12 <clarkb> This reminds me I will be at ansiblefest next week and likely won't be able to run the meeting
19:04:33 <clarkb> if anyone is willing to volunteer as meeting chair next week let me know
19:04:47 <clarkb> #topic Specs approval
19:05:16 <clarkb> ianw: did you have a chance to see my response on the third party ci direction spec?
19:05:44 <ianw> ummm, not that i remember ...
19:06:03 <clarkb> #link https://review.openstack.org/#/c/563849/ third party testing direction spec
19:06:12 <ianw> oh right, yeah, windmil v SF
19:06:43 <clarkb> ianw: summary is I think we can go with the proposal as is since windmill has some advantages over SF that make it good for general direction setting (multi platform in particular). Then reevaluate if everyone simply uses SF
19:07:19 <ianw> right, i mean i've put it under help wanted, so it's going to be a situation of someone who needs this says "what do i do" and we say "here's what we'd like to help you do"
19:07:34 <clarkb> yup. Given that thoughts on putting it up for approval this week?
19:07:49 <ianw> i'm fine with that, i don't have anything more to add
19:08:24 <clarkb> ok infra specs reviewers can you review ^ and I'll check back in on it later this week (likely friday morning local time) for approval
19:09:12 <clarkb> #topic Priority Efforts
19:09:19 <clarkb> #topic Update Config Management
19:10:07 <clarkb> topic:puppet-4 and topic:update-cfg-mgmt is the gerrit query to follow for this
19:10:44 <clarkb> I need to pick up the puppet futureparser updates again. Those have been going reasonably well as our testing catching most problems before we run into them
19:11:54 <clarkb> There are a few outstanding items on the ansible/zuul/cd front though. We've got the inventory group membership improvements, triggering of ansible on bridge.openstack.org from zuul executors, and running useful playbooks in that way on the todo list I think
19:12:15 <mordred> yeah
19:12:16 <clarkb> mordred: re inventory group membership fixes, did you figure out the negative membership? is that change ready for review?
19:12:38 <mordred> I need to swing back around to finish that off - and no, it is not ready for review yet
19:12:52 <clarkb> ok
19:13:08 <clarkb> #link https://review.openstack.org/#/c/604932/2 and its parent aim to fix the ssh issues from zuul to bridge.openstack.org
19:13:08 <mordred> the negative membership is still weird ... but, it's no worse than today, so maybe we get it in shape to be faster
19:13:16 <mordred> an then making disabled: a special thing can come later
19:13:26 <clarkb> mordred: that seems reasonable
19:13:36 <corvus> wfm
19:14:01 <clarkb> for 604932 that should probably get careful review from people that understand ansible better than I do :)
19:14:40 <clarkb> the basic idea there is we want to manage multiple dynamic ssh keys (so they can be rotated) in the zuul user's authorized keys file so that zuul jobs can ssh into the bastion and run playbooks for CD stuff
19:15:02 <clarkb> I've written a small ansible module in python to handle the logic around generating the authorized_keys content for that
19:16:13 <clarkb> once that gets in we can start writing job that run playbooks that hopefully do interesting things :)
19:16:28 <corvus> ++
19:16:36 <clarkb> corvus: any thoughts on the getting to where we can decouple a useful unit of work from the puppet?
19:18:03 <corvus> clarkb: our last thoughts at the ptg were that in order to do the scheduler reload we were thinking of, we'd need to port a bunch of puppet to ansible first.  it's probably a few days of work, and i think needs to be done anyway.  we can proceed with that, or if someone has another idea of something smaller to test out first, we could try that first.
19:18:18 <corvus> though, in the mean time, i'd be happy with a working "debug: msg='hello world'" playbook :)
19:18:33 <clarkb> ok I'll have to keep my eyes out for useful units of work
19:18:36 <clarkb> corvus: ++
19:19:08 <clarkb> if anyone else has suggestions ^ let us know or push a change to run that playbook instead of the current hello world playbook
19:19:24 <clarkb> Any other config management update changes/topics worth calling out?
19:20:28 * cmurphy nope
19:20:32 <clarkb> #topic Storyboard
19:20:43 <clarkb> I wasn't able to join the storyboard room in denver
19:21:08 <clarkb> diablo_rojo: did send out a PTG recap though
19:21:17 * clarkb finds a link to that
19:21:51 <clarkb> #link http://lists.openstack.org/pipermail/openstack-dev/2018-September/134923.html Storyboard PTG recap
19:21:56 <clarkb> thank you for putting that together diablo_rojo
19:22:16 <clarkb> diablo_rojo: SotK fungi anything worht calling out on ^ or new news since that was sent?
19:22:57 <fungi> i'm currently working on the task footer linking configuration for the its-storyboard gerrit plugin, diablo_rojo is working through test imports of oslo and neutron
19:23:35 <diablo_rojo> I'm running a Neutron migration (don't expect it to finish for a week or so)
19:23:45 <fungi> there's yet another resurgence of the bucket-priority discussion on the -dev ml
19:23:58 <diablo_rojo> Also investigating the discrepancy in numbers of oslo bugs in lp vs sb
19:25:03 <clarkb> diablo_rojo: neutron is migrating soon then?
19:25:39 <diablo_rojo> clarkb, they are interested
19:25:51 <diablo_rojo> they wanted to see what it would look like in dev first
19:26:13 <clarkb> gotcha
19:26:42 <clarkb> I think one of the big items of interest to the infra team is the work around file attachments to stories
19:27:11 <clarkb> diablo_rojo: fungi when the spec gets written we should try to get as many eyebalsl on that as possible
19:27:21 <diablo_rojo> Agreed.
19:27:37 <diablo_rojo> After I figure out this oslo bugs mismatch I think thats the next thing on my todo list
19:27:46 <clarkb> great
19:28:53 <clarkb> anything else before we go to the next topic
19:29:32 <clarkb> #topic General Topics
19:29:41 <clarkb> fuentess: hello, still hanging out?
19:29:51 <fuentess> clarkb, hi
19:30:02 <ssbarnea> i do also have a subject i want to propose
19:30:10 <clarkb> ssbarnea: ok, I expect we'll have time
19:30:37 <clarkb> everyone meet fuentess. fuentess works on kata things and has helped with my work of getting a zuul job running against kata
19:30:56 <fuentess> hello everyone, first meeting here
19:31:10 <clarkb> I think we got the general thing working, but there are some rough edges (mnaser doesn't seem to be around to talk about the one related to the new vexxhost region)
19:31:10 <fuentess> we also have gabyc_, which also helps on the kata CI
19:31:27 <mnaser> o/
19:31:38 <clarkb> oh there is mnaser
19:31:40 * mordred waves to fuentess and gabyc_
19:31:44 <mordred> and to mnaser
19:31:57 <fuentess> clarkb, ok, for that, I was thinking that as a short term solution, we can modify our job to run with another storage driver, instead of using devicemapper
19:32:05 <clarkb> a quick recap (at least as I understand things and fuentess can fill in where I miss details or am just wrong):
19:32:38 <clarkb> we have got a generalized kata functional job running that compiles the kata components then tests them with various test suites and with docker, crio, k8s (and probably other things I don't even know about)
19:32:56 <mnaser> kata has a custom flavor in the montreal region, which we added ephemeral storage to get /dev/vdb (and nested virt).  because we have nested virt in all of sjc1, i defaulted to the 'normal' flavor.. but that doesn't have /dev/vdb in it because we don't provide it by default.. and thats the differences
19:33:02 <mnaser> i'll let clarkb continue and comment if/when necessary :)
19:33:23 <clarkb> kata requires nested virt for these tests because kata relies on kvm to run the vm containers
19:33:45 <clarkb> due to this we set up jobs only in vexxhost since vexxhost was already running their jenkins jobs (meant we didn't have to spend so much time debugging nested virt)
19:33:54 <fuentess> mnaser, do you think that in the future, it would be easy to add those ephemeral volumes?
19:34:10 <clarkb> I think this is stable in the montreal region, but is resource constrained (special host aggregate there to run the kata tests)
19:34:38 <clarkb> in the sjc1 region this aggregate and special flavor don't exist because the entire region supports nested virt
19:35:00 <clarkb> but that exposed that the tests currently rely on the special flavor to provide a /dev/vdb disk device in the montreal region
19:35:07 <mnaser> maybe it could be a zuul feature to attach an extra empty volume as part of a node definition :)
19:35:14 <mnaser> but then im signing up people for work :-P
19:35:20 <clarkb> fuentess: I think the tests are still running against the proxy repo but you have disabled the app on the other repo?
19:35:46 <fuentess> mnaser, clarkb: we used to use loopback devices, but we had some stability issues with cri-o, so we moved to use a block device
19:36:04 <fuentess> clarkb, right, we had it in the runtime repo, but removed it for the moment
19:36:10 <fungi> having nodepool create, attach, track and delete volumes is probably a nodepool design discussion
19:36:44 <clarkb> ok I think that covers the current setup. I think fuentess and I and probably others are looking to see if we can figure out how to move forward
19:36:53 <clarkb> and get back to running reliable jobs for them
19:37:32 <fungi> nodepool's focus at the moment is around managing pools of compute instances, though there has been discussion among the nodepool maintainers of turning it into a more generic thingpool manager
19:37:52 <clarkb> ya I think we should decouple having nodepool manage those resources from this effort. We've talked about adding other resource tracking but its not the current dev priority for anyone that I know of
19:38:05 <mnaser> yeah.. also, i feel like it makes it easier for users to run tests locally if a device is not required (imho)
19:38:07 <ianw> hrmmm, i'm hesitant to suggest it but DIB has LVM support ... you could create an image with volumes if that's really what's required
19:38:10 <corvus> in openstack, we've use loopback for this... but that doesn't work?
19:38:10 <clarkb> and I think it would represent significant work in nodepool (so not a quick solution)
19:38:17 <fuentess> clarkb, yes, it would be great if we can have the jobs running successfully. So if currently nodepool cannot handle these requests, I can patch my tests to use another storage-driver when running on zuul
19:38:28 <clarkb> corvus: ya that was my suggestion. We use it for swift and cinder to great effect
19:39:15 <clarkb> fuentess: what is the difference between the storage drivers?
19:39:15 <fuentess> corvus, we have tried with loopback, but seems like cri-o has some stability issues when using them
19:39:54 <fuentess> clarkb, well, cri-o and docker support different storage drivers, the others are overlay and btrfs, which do not require block devices
19:40:18 <fuentess> so I think that we could use overlay, which is the default of cri-o
19:40:23 <clarkb> got it. We wouldn't get test coverage of support for that specific driver, but get coverage of the general api stuff?
19:40:32 <clarkb> fuentess: that seems reasonable if it is their default
19:40:50 <fuentess> right, we still have tests running on devicemapper in jenkins jobs
19:40:58 <fungi> pretty sure it's also possible to feed lvm2 a loopback block device as a pvm and create volumes in there
19:41:10 * fungi tries
19:41:11 <clarkb> fungi: that is what cinder tests do
19:41:19 <fungi> ahh, saves me trying ;)
19:41:52 <fuentess> fungi, clarkb: ohh, that could also handle our case, although not sure if the stability issues would come back
19:41:57 <fungi> so anyway, wouldn't necessarily have to point cri-o at a raw loopback device, could still hand it lvm2 volumes or a vg
19:42:26 <clarkb> fuentess: assuming one of those options gets us back to reliable testing. How do you see the next steps from there going? Add the job back to the runtime repo, then if stable run it more globally?
19:42:52 <fungi> usually when we've seen stability issues it's been due to sparse backing/thin provisioning
19:42:56 <clarkb> fuentess: my other concern was with the on demand rather than continuous testing, is it a problem if zuul continues to do continuous testing?
19:43:06 <fuentess> right, now if we make this work on the proxy repository, we could enable the jobs on the other repos
19:43:32 <fuentess> clarkb, right now I dont think it is a problem
19:43:52 <fuentess> maybe in the future, we would like to change it to on-demand
19:43:58 <fuentess> is this possible for zuul?
19:44:11 <clarkb> ok good, configuring things to handle the on demand stuff is probably out of scope for what I'd be interested in :) it is possible with zuul but we would have to add kata specific pipelines
19:44:23 <clarkb> and I think we'd like to see kata drive more of that itself if that was the need/interest
19:44:35 <fungi> openstack currently has an on-demand independent zuul pipeline it calls "experimental"
19:44:36 <clarkb> (happy to help, but I won't be able to debug github's esoteric behavior when it doesnt' work right)
19:44:45 <fuentess> clarkb, ok, cool, I think it would be great as a next step, once everything is stable
19:44:51 <fungi> presumably something similar would be possible with a gh-triggered pipeline
19:45:23 <clarkb> fungi: ya should be possible, but getting all the little details right with github api is likely to be a learning experience :)
19:45:54 <fungi> agreed. those of us who eschew github have the privilege of not concerning ourselves with its proprietary api behaviors
19:46:04 <clarkb> ok I think that gives us a bit of a path forward. The other thing I'll mention is that most of the infra team don't subscribe to the kata ci (and other repo) issues on github
19:46:34 <clarkb> fuentess: if there are issues or concerns (particularly if urgent) it would be helpful to reach out directly to the team to get eyes on things. #openstack-infra or the openstack-infra mailing list are probably the best places for that today
19:46:58 <fungi> openstack-infra@lists.openstack.org is the mailing list (in case that wasn't clear)
19:47:06 <fuentess> clarkb, ok, good to know, I didnt know this until last week, hehe
19:47:12 <fuentess> thanks for the infio
19:47:21 <clarkb> alright anything else? if not I'll give the floor to ssbarnea
19:47:34 <fuentess> I think thats it, lets move on
19:47:37 <corvus> fuentess: thanks!
19:47:40 <fuentess> thanks everyone for your comments
19:47:42 <ssbarnea> i would like to propose changing the ansible output callback used by zuul from"default" to a more human readable one, either "yaml" or "debug" ones (debug used to be called human_log). We already have a wip change to do this on tripleo. initial attempt to change to yaml didn't work very well because this callback was introduced only in ansible 2.5, but debug one which is similar was introduced in ~2.2.
19:48:13 <ssbarnea> for those that do not know the differences, it may help to have a look at https://sbarnea.com/f/ansible-output/
19:48:33 <ssbarnea> click on "default", "debug" and "yaml" to see how they differ.
19:48:33 <fungi> we're using ansible 2.5 on the zuul.openstack.org executors currently, right?
19:48:46 <corvus> ssbarnea: what's the use case?
19:48:50 <ssbarnea> mainly is about *not* displaying stdout/err as JSON
19:49:18 <fungi> the text console logfile doesn't serve that purpose?
19:49:33 <clarkb> fungi: you still get the json output in the console logfile when there are errors iirc
19:49:40 <fungi> ahh
19:49:57 <ssbarnea> default callback displays the stdout, stderr, (and ther splitted version) in a way that makes it very user unfriendly, especially for commands that return more than a couple of lines.
19:50:33 <ssbarnea> maybe my example does not contian enough lines to make the difference obvious
19:50:36 <clarkb> ya I agree that the yaml is a bit more readable
19:51:15 <corvus> ssbarnea: we (zuul) have discussed using the json output that we save to render a console log which is more readable but also can have functionality like expanding/contracting plays/tasks/etc.
19:51:27 <corvus> ssbarnea: would that serve your purpose?
19:52:19 <ssbarnea> corvus: not really because a fix can be achieved with one line, and you are talking about a complex feature which needs to developed, tested.....
19:52:27 <corvus> (we still need a streaming console output which would probably be similar to what we have now, but that doesn't have to be the same thing that shows up in the archived logs)
19:52:42 <ssbarnea> and colapsable output is not something you ship after a weekend of writing it.
19:52:57 <corvus> ssbarnea: wait, you're saying it won't satisfy your use-case because it will be hard to do?
19:53:24 <ssbarnea> is what we can do now, vs what we could do in the future (as in long term).
19:53:27 <clarkb> I think ssbarnea is pointing out that its a minor change to make that would make the situation better today (anddoesn't conflict with the harder but maybe overal better UX work)
19:53:50 <ssbarnea> i am not against the colapsable feature, but I am also realistic. if I can sort something today, i do.
19:54:03 <corvus> i think it's fine to point that out.  i would, however, appreciate a straight answer to my question since i'm trying rather hard to understand the request.
19:54:05 <ssbarnea> clarkb: well pointed.
19:55:59 <clarkb> maybe this discussion would be a good one to have in #zuul or on the zuul mailing list? we have a few minutes left in our hour. I'd like to make sure we don't ignore any other pressing concerns
19:56:11 <clarkb> #topic Open Discussion
19:56:27 <ssbarnea> in few minutes I can update the examples to make it easier to spot the difference in output
19:56:52 <corvus> ssbarnea: that would be helpful.  also, answering my question would be helpful.
19:56:55 <clarkb> corvus: I spoke quickly with the foundation last week about drafting the opendev message and they agreed that it made sense to figure that out. I'll try to get a discussion going in some venue now that we are all back from travel and vacation
19:57:20 <clarkb> ianw: gary_perkins: anything we can do to help with the new arm cloud work?
19:58:07 <ianw> clarkb: i have outstanding queries to gary_perkins ATM ...
19:58:11 <fungi> i think the branding discussion we had in irc at the beginning of the name selection process worked well. something along those lines again seems reasonable
19:59:39 <clarkb> alright we are basically at time. as mentioned before find us ing #openstack-infra or at openstack-infra@lists.openstack.org for further discussion
19:59:45 <clarkb> Thank you everyone!
19:59:53 <clarkb> #endmeeting