19:03:17 #startmeeting infra
19:03:18 Meeting started Tue Aug 15 19:03:17 2017 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:19 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:22 The meeting name has been set to 'infra'
19:03:39 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:03:57 #topic Announcements
19:04:27 I wasn't told of any announcements and don't see any on the wiki. I'll remind people that the PTG is coming up, so make sure you are prepared if you plan on attending
19:04:40 is there anything else that people want to add?
19:05:01 o/ (sorry late)
19:05:10 * mordred waves - also sorry late
19:05:18 congrats clarkb on being PTL!
19:05:43 ++
19:05:45 clarkb: congrats!
19:05:50 I suppose it's mostly official at this point since no one else is running
19:05:54 clarkb: congrats!
19:06:04 don't forget to vote in the docs or ironic elections if eligible
19:06:05 or as fungi would say: condolences!
19:06:10 clarkb: yeah, no chance to get out of it anymore ;)
19:06:14 ha
19:06:43 ok, if nothing else let's move on to actions
19:07:05 #topic Actions from last meeting
19:07:16 #link http://eavesdrop.openstack.org/meetings/infra/2017/infra.2017-08-08-19.03.txt Minutes from last meeting
19:07:24 #action fungi get switchport counts for infra-cloud
19:07:44 I haven't seen that happen yet. I don't think we have heard back from hpe on the networking questions we sent them either
19:08:34 ianw: replacing the mirror-update server and upgrading bandersnatch was also on the list of items from last meeting. Have you had a chance to do that yet (or any word from the release team on it)?
19:08:52 oh, no, i haven't got into the details of that yet
19:09:56 #action ianw Upgrade mirror-update server and upgrade bandersnatch on that server
19:10:24 #topic Specs approval
19:10:59 Skimming through the specs I don't see any new specs that are ready for approval
19:11:18 we do have a couple of cleanups for completed items though.
19:11:32 #link https://review.openstack.org/#/c/482422/ is an implemented zuul spec
19:11:49 jeblair: ^ maybe you can review that one, and if ACK'd we can go ahead and get fungi to approve it?
19:12:36 the other is fungi's change to mark the contact info changes to gerrit as completed, but I think we'll talk about that under priority efforts
19:12:37 i'll take a look
19:12:38 #link https://review.openstack.org/#/c/492287/
19:13:08 are there any specs that I missed or that people want to bring up in order to get more review?
19:14:16 ok, moving on
19:14:18 #topic Priority Efforts
19:14:55 Fungi has several changes up to complete the Gerrit contact store removal process
19:15:12 #link https://review.openstack.org/#/c/492287/
19:15:20 #link https://review.openstack.org/#/c/491090/
19:15:28 #link https://review.openstack.org/#/c/492329/
19:15:49 at this point Gerrit is running without that feature enabled, so we just need reviews and approvals on those changes
19:16:39 Related to that is the Gerrit 2.13 upgrade, but I don't have anything new on that effort. It's mostly on hold while I deal with release items; I will pick it up again around the PTG to make sure we are ready to implement the upgrade
19:16:52 I should however send another gerrit outage announcement to the dev list
19:17:06 #action clarkb Send reminder notice for Gerrit upgrade outage to dev mailing list
19:17:37 jeblair: do you have anything on the zuulv3 effort?
19:18:54 i don't have anything pressing for the group
19:19:20 #topic General topics
19:20:01 There weren't any general topics added to the agenda. I did want to mention general CI resources though
19:20:29 Ya, I've been trying to keep an eye on tripleo jobs, they are doing a lot of gate resets currently
19:20:34 which doesn't really help
19:20:46 The OSIC cloud has gone away and we are temporarily without OVH while we wait on word of a voucher reset for our account
19:21:02 most of the issues were jobs not using our AFS mirrors for things; most have been corrected, but docker.io is now the issue
19:21:03 this has put extra strain on our other clouds
19:21:44 Thank you to everyone that has helped out in addressing these issues. Rackspace and vexxhost have given us more quota than we had before, which has been helpful, and a lot of people have been working on debugging mirror related issues
19:22:03 As a result we are a lot more stable today than we were a week ago
19:23:07 One lesson here is we may want to take on artificial cloud outages once a week or something. We turn off a cloud or a region every week and ensure we still hold up? I haven't really thought about the logistics of that, but in theory we should handle it well, and proving it in practice is probably a good idea
19:23:30 in the past it wasn't artificial, we had outages all the time, but the clouds we use seem to have gotten a lot more reliable over the years
19:23:37 (we don't have to discuss this now, it was just an idea I had)
19:23:42 (and probably needs a spec)
19:24:14 #topic Open discussion
19:24:22 clarkb: ++ to artificial outages
19:25:11 So, for some reason we are still getting a large amount of failures downloading yum packages: http://status.openstack.org/elastic-recheck/#1708704 I'd be interested in talking with more people about it when they have time
19:26:36 pabelanger: I can probably dig into that later today or tomorrow. I'm trying to make sure we are as stable as possible for when all the final RCs get cut next week
19:26:57 pabelanger: is it slow mirrors?
19:26:57 lastly, it would be great for people to look at https://review.openstack.org/492671/ again. Adds the zuulv3-dev ssh key so we can upload tarballs to that server
19:27:30 ianw: I am not sure; for sto2 and kna1, we just bumped them to 4C/8GB from 2C/4GB
19:27:30 #link https://review.openstack.org/492671/ Adds zuulv3-dev ssh key for uploading tarballs to that server
19:27:45 pabelanger: I'll take a look after lunch. I'm also going to review those changes of fungi's to finish up the contact store work
19:27:46 but it is possible that it is just flaky networking
19:27:56 i have definitely seen issues, mostly in dib testing, where our known slow afs problems have led to timeouts
19:29:10 i've been pretty closely watching devstack+centos7 runs lately (https://etherpad.openstack.org/p/centos7-dsvm-triage) and not seen mirror issues there
19:29:13 it mostly seems limited to rax and citycloud, so I am leaning toward a provider issue
19:29:40 #link https://review.openstack.org/#/c/493057/ updates devstack-gate to do grenade upgrades from pike to master and ocata to pike
19:29:52 this change isn't passing yet, but we should keep it on our radar for merging once it does
19:30:46 http://logs.openstack.org/39/492339/1/gate/gate-tripleo-ci-centos-7-undercloud-oooq/72dc8e2/console.html#_2017-08-10_15_59_32_571607
19:30:58 random change ^ ... that looks like mirror.rax.ord hung up on it
19:31:05 #link http://logs.openstack.org/39/492339/1/gate/gate-tripleo-ci-centos-7-undercloud-oooq/72dc8e2/console.html#_2017-08-10_15_59_32_571607 for yum related failures
19:31:23 oh, that's very old though
19:31:52 ya, we probably want to be looking at logs from today, since the apache cache storage config was changed out yesterday
19:31:55 in the time when that server was quite unhealthy, i guess
19:32:45 we've also upgraded the size of the servers over the last couple of days
19:32:47 ya, today forward will be a good sample
19:33:30 not many in the last 12 hours
19:33:32 http://logs.openstack.org/11/487611/4/gate/gate-kolla-dsvm-build-centos-source-centos-7/2b70414/console.html#_2017-08-15_09_33_48_479030
19:33:46 #link http://logs.openstack.org/11/487611/4/gate/gate-kolla-dsvm-build-centos-source-centos-7/2b70414/console.html#_2017-08-15_09_33_48_479030 newer yum failure
19:33:55 maybe another issue, but not all mirrors have swap. I noticed our make_swap.sh script hasn't been creating swapfiles if we have extra drives
19:34:13 so, the fix for that is to move the d-g swap function into an ansible role
19:34:24 which happens to line up well with the zuulv3 effort
19:34:28 2017-08-15 09:33:48.464417 | INFO:kolla.image.build.zun-base:http://mirror.iad.rax.openstack.org/centos/7/os/x86_64/Packages/centos-logos-70.0.6-3.el7.centos.noarch.rpm: [Errno 14] curl#18 - "transfer closed with 4024836 bytes remaining to read"
19:34:33 is the root cause there
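For reference: that curl#18 error means the connection was closed before the full advertised Content-Length had arrived. Below is a minimal sketch, not part of any existing infra tooling, of one way to check a mirror for that symptom from a test node; it assumes Python 3 and only the standard library, and the URL is simply the package from the kolla log above (any mirror URL can be substituted).

    import http.client
    import sys
    import urllib.request

    # Package that failed in the kolla log above; any mirror URL can be passed instead.
    URL = ("http://mirror.iad.rax.openstack.org/centos/7/os/x86_64/"
           "Packages/centos-logos-70.0.6-3.el7.centos.noarch.rpm")

    def probe(url, attempts=5):
        """Fetch the URL a few times and compare advertised vs. received bytes."""
        for i in range(1, attempts + 1):
            try:
                with urllib.request.urlopen(url, timeout=60) as resp:
                    expected = int(resp.headers.get("Content-Length", 0))
                    received = len(resp.read())
            except http.client.IncompleteRead as exc:
                # Connection closed mid-transfer: the same symptom yum reports as
                # curl#18 "transfer closed with N bytes remaining to read".
                print("attempt %d: truncated after %d bytes (%s more expected)"
                      % (i, len(exc.partial), exc.expected))
                continue
            print("attempt %d: expected %d bytes, received %d -> %s"
                  % (i, expected, received,
                     "OK" if received == expected else "SHORT"))

    if __name__ == "__main__":
        probe(sys.argv[1] if len(sys.argv) > 1 else URL)

Run as-is or pass a different URL as the first argument; a healthy mirror should report the same byte count on every attempt.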
19:35:57 probably best to debug that in the infra channel? Is there anything else we want to talk about in the meeting?
19:36:30 I'll give it a couple of minutes, but if there isn't anything else I'll give everyone ~20 minutes back for lunch or breakfast or afternoon tea
19:36:49 how will i decide!
19:37:19 already past lunch, breakfast, and afternoon tea ;(
19:37:46 AJaeger: beer and dinner is good too :)
19:37:52 ;)
19:39:03 Alright, I think that is it then. Thank you everyone
19:39:08 #endmeeting