19:01:01 #startmeeting infra
19:01:02 Meeting started Tue Sep 25 19:01:01 2018 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:03 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:05 The meeting name has been set to 'infra'
19:01:07 o/ yay, last !summer-time meeting
19:01:11 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:01:20 #topic Announcements
19:01:40 Forum session submissions are due tomorrow at midnight Pacific time (I think). If you want to propose a session do so now
19:01:42 o/
19:02:11 #topic Actions from last meeting
19:02:17 #link http://eavesdrop.openstack.org/meetings/infra/2018/infra.2018-09-18-19.01.txt minutes from last meeting
19:02:42 last meeting was a little different in that I tried to use it to get PTG recap thoughts written down in some format before all that memory gets replaced
19:03:31 we had a PTG?
19:03:47 No actions, but if you are interested in PTG happenings that is a great place to start. I think ttx is hoping there will be more emails to the dev list recapping the various happenings. I'll try to get one out this week based on what I wrote down in the meeting last week
19:04:12 This reminds me I will be at ansiblefest next week and likely won't be able to run the meeting
19:04:33 if anyone is willing to volunteer as meeting chair next week let me know
19:04:47 #topic Specs approval
19:05:16 ianw: did you have a chance to see my response on the third party ci direction spec?
19:05:44 ummm, not that i remember ...
19:06:03 #link https://review.openstack.org/#/c/563849/ third party testing direction spec
19:06:12 oh right, yeah, windmill vs SF
19:06:43 ianw: summary is I think we can go with the proposal as is since windmill has some advantages over SF that make it good for general direction setting (multi-platform in particular). Then reevaluate if everyone simply uses SF
19:07:19 right, i mean i've put it under help wanted, so it's going to be a situation where someone who needs this says "what do i do" and we say "here's what we'd like to help you do"
19:07:34 yup. Given that, thoughts on putting it up for approval this week?
19:07:49 i'm fine with that, i don't have anything more to add
19:08:24 ok infra specs reviewers can you review ^ and I'll check back in on it later this week (likely friday morning local time) for approval
19:09:12 #topic Priority Efforts
19:09:19 #topic Update Config Management
19:10:07 topic:puppet-4 and topic:update-cfg-mgmt are the gerrit queries to follow for this
19:10:44 I need to pick up the puppet futureparser updates again. Those have been going reasonably well, as our testing catches most problems before we run into them
19:11:54 There are a few outstanding items on the ansible/zuul/cd front though. We've got the inventory group membership improvements, triggering of ansible on bridge.openstack.org from zuul executors, and running useful playbooks in that way on the todo list I think
19:12:15 yeah
19:12:16 mordred: re inventory group membership fixes, did you figure out the negative membership? is that change ready for review?
19:12:38 I need to swing back around to finish that off - and no, it is not ready for review yet
19:12:52 ok
19:13:08 #link https://review.openstack.org/#/c/604932/2 and its parent aim to fix the ssh issues from zuul to bridge.openstack.org
19:13:08 the negative membership is still weird ...
but, it's no worse than today, so maybe we get it in shape to be faster
19:13:16 and then making disabled: a special thing can come later
19:13:26 mordred: that seems reasonable
19:13:36 wfm
19:14:01 for 604932 that should probably get careful review from people that understand ansible better than I do :)
19:14:40 the basic idea there is we want to manage multiple dynamic ssh keys (so they can be rotated) in the zuul user's authorized keys file so that zuul jobs can ssh into the bastion and run playbooks for CD stuff
19:15:02 I've written a small ansible module in python to handle the logic around generating the authorized_keys content for that
19:16:13 once that gets in we can start writing jobs that run playbooks that hopefully do interesting things :)
19:16:28 ++
19:16:36 corvus: any thoughts on getting to where we can decouple a useful unit of work from the puppet?
19:18:03 clarkb: our last thoughts at the ptg were that in order to do the scheduler reload we were thinking of, we'd need to port a bunch of puppet to ansible first. it's probably a few days of work, and i think needs to be done anyway. we can proceed with that, or if someone has another idea of something smaller to test out first, we could try that first.
19:18:18 though, in the meantime, i'd be happy with a working "debug: msg='hello world'" playbook :)
19:18:33 ok I'll have to keep my eyes out for useful units of work
19:18:36 corvus: ++
19:19:08 if anyone else has suggestions ^ let us know or push a change to run that playbook instead of the current hello world playbook
19:19:24 Any other config management update changes/topics worth calling out?
19:20:28 * cmurphy nope
19:20:32 #topic Storyboard
19:20:43 I wasn't able to join the storyboard room in denver
19:21:08 diablo_rojo did send out a PTG recap though
19:21:17 * clarkb finds a link to that
19:21:51 #link http://lists.openstack.org/pipermail/openstack-dev/2018-September/134923.html Storyboard PTG recap
19:21:56 thank you for putting that together diablo_rojo
19:22:16 diablo_rojo: SotK fungi anything worth calling out on ^ or new news since that was sent?
19:22:57 i'm currently working on the task footer linking configuration for the its-storyboard gerrit plugin, diablo_rojo is working through test imports of oslo and neutron
19:23:35 I'm running a Neutron migration (don't expect it to finish for a week or so)
19:23:45 there's yet another resurgence of the bucket-priority discussion on the -dev ml
19:23:58 Also investigating the discrepancy in numbers of oslo bugs in lp vs sb
19:25:03 diablo_rojo: neutron is migrating soon then?
19:25:39 clarkb, they are interested
19:25:51 they wanted to see what it would look like in dev first
19:26:13 gotcha
19:26:42 I think one of the big items of interest to the infra team is the work around file attachments to stories
19:27:11 diablo_rojo: fungi when the spec gets written we should try to get as many eyeballs on that as possible
19:27:21 Agreed.
19:27:37 After I figure out this oslo bugs mismatch I think that's the next thing on my todo list
19:27:46 great
19:28:53 anything else before we go to the next topic?
19:29:32 #topic General Topics
19:29:41 fuentess: hello, still hanging out?
19:29:51 clarkb, hi
19:30:02 i do also have a subject i want to propose
19:30:10 ssbarnea: ok, I expect we'll have time
19:30:37 everyone meet fuentess.
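For reference, the trivial starter playbook corvus asks for above could be as small as the sketch below; the file path, target host, and task name are illustrative assumptions, not the actual change in system-config.

# playbooks/hello-world.yaml (hypothetical path) -- a minimal playbook for
# exercising the zuul-to-bridge.openstack.org CD plumbing end to end.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Confirm a zuul-triggered playbook actually ran
      debug:
        msg: "hello world"

It can be run by hand with ansible-playbook to sanity-check it before a Zuul job is pointed at it.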
fuentess works on kata things and has helped with my work of getting a zuul job running against kata
19:30:56 hello everyone, first meeting here
19:31:10 I think we got the general thing working, but there are some rough edges (mnaser doesn't seem to be around to talk about the one related to the new vexxhost region)
19:31:10 we also have gabyc_, who also helps on the kata CI
19:31:27 o/
19:31:38 oh there is mnaser
19:31:40 * mordred waves to fuentess and gabyc_
19:31:44 and to mnaser
19:31:57 clarkb, ok, for that, I was thinking that as a short-term solution, we can modify our job to run with another storage driver, instead of using devicemapper
19:32:05 a quick recap (at least as I understand things and fuentess can fill in where I miss details or am just wrong):
19:32:38 we have got a generalized kata functional job running that compiles the kata components then tests them with various test suites and with docker, crio, k8s (and probably other things I don't even know about)
19:32:56 kata has a custom flavor in the montreal region, to which we added ephemeral storage to get /dev/vdb (and nested virt). because we have nested virt in all of sjc1, i defaulted to the 'normal' flavor.. but that doesn't have /dev/vdb in it because we don't provide it by default.. and that's the difference
19:33:02 i'll let clarkb continue and comment if/when necessary :)
19:33:23 kata requires nested virt for these tests because kata relies on kvm to run the vm containers
19:33:45 due to this we set up jobs only in vexxhost since vexxhost was already running their jenkins jobs (meant we didn't have to spend so much time debugging nested virt)
19:33:54 mnaser, do you think that in the future, it would be easy to add those ephemeral volumes?
19:34:10 I think this is stable in the montreal region, but is resource constrained (special host aggregate there to run the kata tests)
19:34:38 in the sjc1 region this aggregate and special flavor don't exist because the entire region supports nested virt
19:35:00 but that exposed that the tests currently rely on the special flavor to provide a /dev/vdb disk device in the montreal region
19:35:07 maybe it could be a zuul feature to attach an extra empty volume as part of a node definition :)
19:35:14 but then I'm signing up people for work :-P
19:35:20 fuentess: I think the tests are still running against the proxy repo but you have disabled the app on the other repo?
19:35:46 mnaser, clarkb: we used to use loopback devices, but we had some stability issues with cri-o, so we moved to use a block device
19:36:04 clarkb, right, we had it in the runtime repo, but removed it for the moment
19:36:10 having nodepool create, attach, track and delete volumes is probably a nodepool design discussion
19:36:44 ok I think that covers the current setup. I think fuentess and I and probably others are looking to see if we can figure out how to move forward
19:36:53 and get back to running reliable jobs for them
19:37:32 nodepool's focus at the moment is around managing pools of compute instances, though there has been discussion among the nodepool maintainers of turning it into a more generic thingpool manager
19:37:52 ya I think we should decouple having nodepool manage those resources from this effort. We've talked about adding other resource tracking but it's not the current dev priority for anyone that I know of
19:38:05 yeah..
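To make the flavor dependency described above concrete: the kata job reaches the nested-virt, /dev/vdb-capable hosts by pinning its nodeset to a nodepool label backed by the special flavor, roughly as sketched below. The job, playbook, and label names are assumptions, not the real definitions.

- job:
    name: kata-functional            # hypothetical job name
    run: playbooks/kata/run.yaml     # hypothetical playbook path
    nodeset:
      nodes:
        - name: kata-test-node
          # hypothetical label; it would map to the vexxhost flavor that
          # provides nested virt and the extra ephemeral disk
          label: ubuntu-xenial-vexxhost-kata

Running in another region then means either adding an equivalent label there or removing the job's reliance on /dev/vdb altogether.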
also, i feel like it makes it easier for users to run tests locally if a device is not required (imho)
19:38:07 hrmmm, i'm hesitant to suggest it but DIB has LVM support ... you could create an image with volumes if that's really what's required
19:38:10 in openstack, we've used loopback for this... but that doesn't work?
19:38:10 and I think it would represent significant work in nodepool (so not a quick solution)
19:38:17 clarkb, yes, it would be great if we can have the jobs running successfully. So if currently nodepool cannot handle these requests, I can patch my tests to use another storage driver when running on zuul
19:38:28 corvus: ya that was my suggestion. We use it for swift and cinder to great effect
19:39:15 fuentess: what is the difference between the storage drivers?
19:39:15 corvus, we have tried with loopback, but it seems like cri-o has some stability issues when using them
19:39:54 clarkb, well, cri-o and docker support different storage drivers, the others are overlay and btrfs, which do not require block devices
19:40:18 so I think that we could use overlay, which is the default of cri-o
19:40:23 got it. We wouldn't get test coverage of support for that specific driver, but get coverage of the general api stuff?
19:40:32 fuentess: that seems reasonable if it is their default
19:40:50 right, we still have tests running on devicemapper in jenkins jobs
19:40:58 pretty sure it's also possible to feed lvm2 a loopback block device as a pv and create volumes in there
19:41:10 * fungi tries
19:41:11 fungi: that is what cinder tests do
19:41:19 ahh, saves me trying ;)
19:41:52 fungi, clarkb: ohh, that could also handle our case, although not sure if the stability issues would come back
19:41:57 so anyway, wouldn't necessarily have to point cri-o at a raw loopback device, could still hand it lvm2 volumes or a vg
19:42:26 fuentess: assuming one of those options gets us back to reliable testing. How do you see the next steps from there going? Add the job back to the runtime repo, then if stable run it more globally?
19:42:52 usually when we've seen stability issues it's been due to sparse backing/thin provisioning
19:42:56 fuentess: my other concern was with the on-demand rather than continuous testing: is it a problem if zuul continues to do continuous testing?
19:43:06 right, now if we make this work on the proxy repository, we could enable the jobs on the other repos
19:43:32 clarkb, right now I don't think it is a problem
19:43:52 maybe in the future, we would like to change it to on-demand
19:43:58 is this possible for zuul?
19:44:11 ok good, configuring things to handle the on-demand stuff is probably out of scope for what I'd be interested in :) it is possible with zuul but we would have to add kata-specific pipelines
19:44:23 and I think we'd like to see kata drive more of that itself if that was the need/interest
19:44:35 openstack currently has an on-demand independent zuul pipeline it calls "experimental"
19:44:36 (happy to help, but I won't be able to debug github's esoteric behavior when it doesn't work right)
19:44:45 clarkb, ok, cool, I think it would be great as a next step, once everything is stable
19:44:51 presumably something similar would be possible with a gh-triggered pipeline
19:45:23 fungi: ya should be possible, but getting all the little details right with the github api is likely to be a learning experience :)
19:45:54 agreed.
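A rough sketch of what the GitHub-triggered, on-demand pipeline discussed above could look like, modelled on OpenStack's gerrit "experimental" pipeline; the connection name, comment regex, and reporter settings are assumptions that would need to be checked against the Zuul GitHub driver documentation.

- pipeline:
    name: experimental
    description: On-demand jobs run when someone asks for them on a pull request.
    manager: independent
    trigger:
      github:                        # assumes the connection is named "github"
        - event: pull_request
          action: comment
          comment: (?i)^\s*check experimental\s*$
    success:
      github:
        status: success              # report back as a commit status
    failure:
      github:
        status: failure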
those of us who eschew github have the privilege of not concerning ourselves with its proprietary api behaviors
19:46:04 ok I think that gives us a bit of a path forward. The other thing I'll mention is that most of the infra team don't subscribe to the kata ci (and other repo) issues on github
19:46:34 fuentess: if there are issues or concerns (particularly if urgent) it would be helpful to reach out directly to the team to get eyes on things. #openstack-infra or the openstack-infra mailing list are probably the best places for that today
19:46:58 openstack-infra@lists.openstack.org is the mailing list (in case that wasn't clear)
19:47:06 clarkb, ok, good to know, I didn't know this until last week, hehe
19:47:12 thanks for the info
19:47:21 alright anything else? if not I'll give the floor to ssbarnea
19:47:34 I think that's it, let's move on
19:47:37 fuentess: thanks!
19:47:40 thanks everyone for your comments
19:47:42 i would like to propose changing the ansible output callback used by zuul from "default" to a more human-readable one, either "yaml" or "debug" (debug used to be called human_log). We already have a wip change to do this on tripleo. The initial attempt to change to yaml didn't work very well because that callback was introduced only in ansible 2.5, but the debug one, which is similar, was introduced in ~2.2.
19:48:13 for those that do not know the differences, it may help to have a look at https://sbarnea.com/f/ansible-output/
19:48:33 click on "default", "debug" and "yaml" to see how they differ.
19:48:33 we're using ansible 2.5 on the zuul.openstack.org executors currently, right?
19:48:46 ssbarnea: what's the use case?
19:48:50 mainly it is about *not* displaying stdout/err as JSON
19:49:18 the text console logfile doesn't serve that purpose?
19:49:33 fungi: you still get the json output in the console logfile when there are errors iirc
19:49:40 ahh
19:49:57 the default callback displays stdout, stderr (and their split versions) in a way that makes it very user-unfriendly, especially for commands that return more than a couple of lines.
19:50:33 maybe my example does not contain enough lines to make the difference obvious
19:50:36 ya I agree that the yaml is a bit more readable
19:51:15 ssbarnea: we (zuul) have discussed using the json output that we save to render a console log which is more readable but also can have functionality like expanding/contracting plays/tasks/etc.
19:51:27 ssbarnea: would that serve your purpose?
19:52:19 corvus: not really because a fix can be achieved with one line, and you are talking about a complex feature which needs to be developed, tested.....
19:52:27 (we still need a streaming console output which would probably be similar to what we have now, but that doesn't have to be the same thing that shows up in the archived logs)
19:52:42 and collapsible output is not something you ship after a weekend of writing it.
19:52:57 ssbarnea: wait, you're saying it won't satisfy your use-case because it will be hard to do?
19:53:24 it is what we can do now vs what we could do in the future (as in long term).
19:53:27 I think ssbarnea is pointing out that it's a minor change to make that would make the situation better today (and doesn't conflict with the harder but maybe overall better UX work)
19:53:50 i am not against the collapsible feature, but I am also realistic. if I can sort something today, i do.
19:54:03 i think it's fine to point that out.
i would, however, appreciate a straight answer to my question since i'm trying rather hard to understand the request.
19:54:05 clarkb: well pointed.
19:55:59 maybe this discussion would be a good one to have in #zuul or on the zuul mailing list? we have a few minutes left in our hour. I'd like to make sure we don't ignore any other pressing concerns
19:56:11 #topic Open Discussion
19:56:27 in a few minutes I can update the examples to make it easier to spot the difference in output
19:56:52 ssbarnea: that would be helpful. also, answering my question would be helpful.
19:56:55 corvus: I spoke quickly with the foundation last week about drafting the opendev message and they agreed that it made sense to figure that out. I'll try to get a discussion going in some venue now that we are all back from travel and vacation
19:57:20 ianw: gary_perkins: anything we can do to help with the new arm cloud work?
19:58:07 clarkb: i have outstanding queries to gary_perkins ATM ...
19:58:11 i think the branding discussion we had in irc at the beginning of the name selection process worked well. something along those lines again seems reasonable
19:59:39 alright we are basically at time. as mentioned before, find us in #openstack-infra or at openstack-infra@lists.openstack.org for further discussion
19:59:45 Thank you everyone!
19:59:53 #endmeeting