19:03:17 <clarkb> #startmeeting infra
19:03:18 <openstack> Meeting started Tue Aug 15 19:03:17 2017 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:19 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:22 <openstack> The meeting name has been set to 'infra'
19:03:39 <clarkb> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:03:57 <clarkb> #topic Announcements
19:04:27 <clarkb> I wasn't told of any announcements and don't see any on the wiki. I'll remind people that the PTG is coming up, so make sure you are prepared if you're planning on attending
19:04:40 <clarkb> is there anything else that people want to add?
19:05:01 <jeblair> o/ (sorry late)
19:05:10 * mordred waves - also sorry late
19:05:18 <cmurphy> congrats clarkb on being PTL!
19:05:43 <pabelanger> ++
19:05:45 <jeblair> clarkb: congrats!
19:05:50 <clarkb> I suppose it's mostly official at this point since no one else is running
19:05:54 <AJaeger> clarkb: congrats!
19:06:04 <clarkb> don't forget to vote in the docs or ironic elections if eligible
19:06:05 <jeblair> or as fungi would say: condolences!
19:06:10 <AJaeger> clarkb: yeah, no chance to get out of it anymore ;)
19:06:14 <clarkb> ha
19:06:43 <clarkb> ok, if nothing else let's move on to actions
19:07:05 <clarkb> #topic Actions from last meeting
19:07:16 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2017/infra.2017-08-08-19.03.txt Minutes from last meeting
19:07:24 <clarkb> #action fungi get switchport counts for infra-cloud
19:07:44 <clarkb> I haven't seen that happen yet. I don't think we've heard back from HPE on the networking questions we sent them either
19:08:34 <clarkb> ianw: replacing the mirror-update server and upgrading bandersnatch was also on the list of items from last meeting. Have you had a chance to do that yet (or any word from the release team on it)?
19:08:52 <ianw> oh, no, i haven't got into the details of that yet
19:09:56 <clarkb> #action ianw Upgrade mirror-update server and upgrade bandersnatch on that server
19:10:24 <clarkb> #topic Specs approval
19:10:59 <clarkb> Skimming through the specs I don't see any new specs that are ready for approval
19:11:18 <clarkb> we do have a couple cleanups for completed items though.
19:11:32 <clarkb> #link https://review.openstack.org/#/c/482422/ is an implemented zuul spec
19:11:49 <clarkb> jeblair: ^ maybe you can review that one, and if ACK'd we can go ahead and get fungi to approve it?
19:12:36 <clarkb> the other is fungi's change to mark the contact info changes to gerrit as completed, but I think we'll talk about that under priority efforts
19:12:37 <jeblair> i'll take a look
19:12:38 <clarkb> #link https://review.openstack.org/#/c/492287/
19:13:08 <clarkb> are there any specs that I missed or that people want to bring up in order to get more review?
19:14:16 <clarkb> ok moving on
19:14:18 <clarkb> #topic Priority Efforts
19:14:55 <clarkb> Fungi has several changes up to complete the Gerrit Contact store removal process
19:15:12 <clarkb> #link https://review.openstack.org/#/c/492287/
19:15:20 <clarkb> #link https://review.openstack.org/#/c/491090/
19:15:28 <clarkb> #link https://review.openstack.org/#/c/492329/
19:15:49 <clarkb> at this point Gerrit is running without that feature enabled, so we just need reviews and approvals on those changes
19:16:39 <clarkb> Related to that is the Gerrit 2.13 upgrade, but I don't have anything new on that effort. It's mostly on hold while I deal with release items; I'll pick it up again around the PTG to make sure we are ready to implement the upgrade
19:16:52 <clarkb> I should however send another gerrit outage announcement to the dev list
19:17:06 <clarkb> #action clarkb Send reminder notice for Gerrit upgrade outage to dev mailing list
19:17:37 <clarkb> jeblair: do you have anything on the zuulv3 effort?
19:18:54 <jeblair> i don't have anything pressing for the group
19:19:20 <clarkb> #topic General topics
19:20:01 <clarkb> There weren't any general topics added to the agenda. I did want to mention general CI resources though
19:20:29 <pabelanger> Ya, I've been trying to keep an eye on tripleo jobs, they are doing a lot of gate resets currently
19:20:34 <pabelanger> which doesn't really help
19:20:46 <clarkb> The OSIC cloud has gone away and we are temporarily without OVH while we wait on word of a voucher reset for our account
19:21:02 <pabelanger> most of the issues were jobs not using our AFS mirrors for things; most have been corrected, but docker.io is now the issue
19:21:03 <clarkb> this has put extra strain on our other clouds
19:21:44 <clarkb> Thank you to everyone who has helped out in addressing these issues. Rackspace and vexxhost have given us more quota than we had before, which has been helpful, and a lot of people have been working on debugging mirror-related issues
19:22:03 <clarkb> As a result we are a lot more stable today than we were a week ago
19:23:07 <clarkb> One lesson here is that we may want to take on artificial cloud outages once a week or so: turn off a cloud or a region every week and ensure we still hold up. I haven't really thought about the logistics of that, but in theory we should handle it well, and proving it in practice is probably a good idea
19:23:30 <clarkb> in the past it wasn't artificial, we had outages all the time, but the clouds we use seem to have gotten a lot more reliable over the years
19:23:37 <clarkb> (we don't have to discuss this now was just an idea I had)
19:23:42 <clarkb> (and probably needs a spec)
19:24:14 <clarkb> #topic Open discussion
19:24:22 <mordred> clarkb: ++ to artificial outages
19:25:11 <pabelanger> So, for some reason we are still getting a large number of failures downloading yum packages: http://status.openstack.org/elastic-recheck/#1708704 I'd be interested in talking with more people about it when they have time
19:26:36 <clarkb> pabelanger: I can probably dig into that later today or tomorrow. I'm trying to make sure we are as stable as possible for when all the final RCs get cut next week
19:26:57 <ianw> pabelanger: is it slow mirrors?
19:26:57 <pabelanger> lastly, it would be great for people to look at https://review.openstack.org/492671/ again. It adds the zuulv3-dev ssh key so we can upload tarballs to that server
19:27:30 <pabelanger> ianw: I am not sure, for sto2 and kna1, we just bumped them to 4C/8GB, from 2C/4GB
19:27:30 <clarkb> #link https://review.openstack.org/492671/ Adds zuulv3-dev ssh key for uploading tarballs to that server
19:27:45 <clarkb> pabelanger: I'll take a look after lunch. I'm also going to review those changes of fungi's to finish up the contact store work
19:27:46 <pabelanger> but it is possible that it is just flaky networking
19:27:56 <ianw> i have definitely seen issues, mostly in dib testing, where our known slow afs problems have led to timeouts
19:29:10 <ianw> i've been pretty closely watching devstack+centos7 runs lately (https://etherpad.openstack.org/p/centos7-dsvm-triage) and not seen mirror issues there
19:29:13 <pabelanger> it mostly seems limited to rax and citycloud, so I am leaning toward a provider issue
19:29:40 <clarkb> #link https://review.openstack.org/#/c/493057/ updates devstack-gate to do grenade upgrades from pike to master and ocata to pike
19:29:52 <clarkb> this change isn't passing yet, but we should keep it on our radar for merging once it does
19:30:46 <ianw> http://logs.openstack.org/39/492339/1/gate/gate-tripleo-ci-centos-7-undercloud-oooq/72dc8e2/console.html#_2017-08-10_15_59_32_571607
19:30:58 <ianw> random change ^ ... that looks like mirror.rax.ord hung up on it
19:31:05 <clarkb> #link http://logs.openstack.org/39/492339/1/gate/gate-tripleo-ci-centos-7-undercloud-oooq/72dc8e2/console.html#_2017-08-10_15_59_32_571607 for yum related failures
19:31:23 <ianw> oh, that's very old though
19:31:52 <clarkb> ya we probably want to be looking at logs from today since the apache cache storage config was changed out yesterday
19:31:55 <ianw> in the time when that server was quite unhealthy i guess
19:32:45 <clarkb> we've also upgraded the size of the servers over the last couple days
19:32:47 <pabelanger> ya, today forward will be a good sample
19:33:30 <ianw> not many in last 12 hours
19:33:32 <ianw> http://logs.openstack.org/11/487611/4/gate/gate-kolla-dsvm-build-centos-source-centos-7/2b70414/console.html#_2017-08-15_09_33_48_479030
19:33:46 <clarkb> #link http://logs.openstack.org/11/487611/4/gate/gate-kolla-dsvm-build-centos-source-centos-7/2b70414/console.html#_2017-08-15_09_33_48_479030 newer yum failure
19:33:55 <pabelanger> maybe another issue, but not all mirrors have swap. I noticed our make_swap.sh script hasn't been creating swapfiles if we have extra drives
19:34:13 <pabelanger> so, the fix for that is to move the d-g swap function into an ansible role
19:34:24 <pabelanger> which happens to line up well with the zuulv3 effort
19:34:28 <ianw> 2017-08-15 09:33:48.464417 | INFO:kolla.image.build.zun-base:http://mirror.iad.rax.openstack.org/centos/7/os/x86_64/Packages/centos-logos-70.0.6-3.el7.centos.noarch.rpm: [Errno 14] curl#18 - "transfer closed with 4024836 bytes remaining to read"
19:34:33 <ianw> is the root cause there
19:35:57 <clarkb> probably best to debug that in the infra channel? Is there anything else we want to talk about in the meeting?
19:36:30 <clarkb> I'll give it a couple of minutes, but if there isn't anything else I'm going to give everyone ~20 minutes back for lunch or breakfast or afternoon tea
19:36:49 <jeblair> how will i decide!
19:37:19 <AJaeger> already past lunch, breakfast, and afternoon tea ;(
19:37:46 <clarkb> AJaeger: beer and dinner are good too :)
19:37:52 <AJaeger> ;)
19:39:03 <clarkb> Alright, I think that is it then. Thank you everyone
19:39:08 <clarkb> #endmeeting