19:01:16 #startmeeting infra
19:01:17 Meeting started Tue Oct 29 19:01:16 2019 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:18 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:20 The meeting name has been set to 'infra'
19:01:23 #link http://lists.openstack.org/pipermail/openstack-infra/2019-October/006502.html Our Agenda
19:01:29 #topic Announcements
19:02:22 The summit and ptg are next week in shanghai, china
19:02:38 For this reason we'll cancel next week's meeting
19:03:20 safe travels to all going
19:04:17 thanks!
19:04:49 I expect we'll meet per usual the week after
19:04:55 I'll be jet lagged but around
19:05:00 same
19:05:41 i'll be away that week
19:06:52 I expect it will be quiet all around due to everyone else traveling/vacationing/being jetlagged
19:07:07 but I'll do my best to recap the events in shanghai that tuesday
19:07:48 #topic Actions from last meeting
19:07:54 #link http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-10-22-19.01.txt minutes from last meeting
19:08:05 No actions
19:08:10 #topic Specs approval
19:08:25 #link https://review.opendev.org/#/c/683852/ Migrating off of static.openstack.org spec
19:08:29 * fungi saw a spec approved
19:08:54 well I haven't approved it yet because fungi was the only person to review its latest patchset
19:09:08 current plan is to review it today, giving everyone a last chance to look it over
19:09:16 corvus: ^ did you want to check that the edits cover your comment?
19:09:16 oh! i misread the gerrit notification in that case
19:09:29 will do
19:09:40 thank you!
19:09:41 but i suppose i should still be thrilled that i managed to review something for a change
19:09:55 #topic Priority Efforts
19:10:01 #topic OpenDev
19:10:06 #link https://etherpad.openstack.org/p/rCF58JvzbF Governance email draft
19:10:07 sorry, late joining due to the daylight saving time switch ;(
19:10:24 did it actually save you any daylight?
19:10:26 Speaking of things to review ^ if you have thoughts or comments on that they would be much appreciated. At this rate I probably won't send it until after the summit anyway
19:10:57 i'm good with it, only left the one comment
19:11:06 which was more a question for the group i suppose
19:12:04 * AJaeger is also fine with the email
19:12:24 clarkb: lgtm
19:13:17 thanks
19:13:21 #topic Update Config Management
19:13:36 I know mordred has been making progress with review-dev recently, though I've not been able to keep up the last few days
19:13:55 he isn't here to correct me but I believe the ansible has been run by hand and is working, and now we just need to get changes merged
19:14:25 #link https://review.opendev.org/#/c/630406/69 stack starts there
19:14:49 if you have time to review I'm sure mordred would be happy for that, and we can probably let mordred approve things that he is confident in? though it is also the -dev server so not really scary in any case
19:15:44 Any other config management related items to bring up?
19:17:53 #topic Storyboard
19:18:06 we've got working webclient drafts again!
19:18:31 via the plan to wildcard sources?
19:19:00 all it took was a couple new config features for the api server and supporting an additional action for the auth endpoint
19:19:25 yeah, now cors and openid acls can have entries which start with ^ and those are treated as a regex
19:20:04 \o/
19:20:25 however we discovered that the client redirect url is so long on our draft builds (due to object storage) that launchpad switches from get to post for passing the data along to the api
19:20:43 so that endpoint needed to be adjusted to also accept post
19:21:06 and then of course we updated the config for storyboard-dev to basically allow webclients hosted anywhere
19:22:14 because our rax swift cdn hosted files can come from ~3 * 4096 urls?
19:22:33 well, that's just the hostnames
19:22:40 but yeah
19:23:14 previously storyboard-dev.o.o allowed clients at storyboard-dev.o.o and logs.o.o
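A minimal sketch of the ACL behavior described above, assuming a flat list of allowed origins where an entry beginning with ^ is treated as a regular expression and anything else as a literal match. The entries, pattern, and helper name here are invented for illustration; this is not StoryBoard's actual implementation.

```python
import re

# Hypothetical allowed-origins list mixing literal entries with a ^-prefixed regex.
ALLOWED_ORIGINS = [
    'https://storyboard-dev.openstack.org',
    'https://logs.openstack.org',
    r'^https://[a-z0-9]+\.cdn\.example\.org$',  # placeholder pattern, not the real one
]

def origin_allowed(origin, acl=ALLOWED_ORIGINS):
    """Return True if origin matches a literal entry or a ^-prefixed regex entry."""
    for entry in acl:
        if entry.startswith('^'):
            if re.match(entry, origin):
                return True
        elif entry == origin:
            return True
    return False
```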
19:24:59 anything else on this subject or should we continue?
19:25:12 there's been some new changes pushed by SotK to support deploying on newer platforms
19:25:22 but no, nothing else i'm aware of to highlight right now
19:26:09 #topic General topics
19:26:15 First up is graphite.opendev.org
19:26:29 We've seen a few questions come up about this so I wanted to make sure everyone was up to speed on the situation there.
19:26:55 Sometime last week we ran out of disk on the server. After investigating I found that the stats.timers. paths in particular were using the bulk of the disk
19:27:21 From there we updated the config to reduce retention of that subpath to a year (instead of 5 years) as well as reducing the granularity in that time period
19:27:51 in order for that to take effect we had to stop services and run whisper-resize.py across all of the files, which took ~3 days
19:28:10 This means that we have a statsd gap of ~3 days, but all the data should match its config and we have plenty of disk to spare too
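A rough sketch of that kind of resize pass, run while carbon is stopped. The whisper path and retention values below are placeholders for illustration; the real retentions are whatever storage-schemas.conf now specifies.

```python
import os
import subprocess

WHISPER_ROOT = '/opt/graphite/storage/whisper/stats/timers'  # assumed location
NEW_RETENTIONS = ['1m:30d', '1h:1y']  # hypothetical: per-minute for 30 days, hourly for a year

for dirpath, _dirs, files in os.walk(WHISPER_ROOT):
    for name in files:
        if name.endswith('.wsp'):
            # whisper-resize.py rewrites each archive in place; --nobackup avoids
            # temporarily doubling usage on a disk that is already nearly full.
            subprocess.run(
                ['whisper-resize.py', '--nobackup',
                 os.path.join(dirpath, name)] + NEW_RETENTIONS,
                check=True)
```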
19:28:56 in retrospect, we probably didn't need to change the retention period, as it's likely that the openstacksdk bug was responsible for a large amount of the space used.
19:29:11 corvus: ya the sdk bug was a good chunk of it
19:29:19 i confess i was partial to the old retention period -- i've found year-on-year stats to be really helpful in the past
19:29:38 corvus: I expect we can increase the retention now without doing a whisper resize
19:29:45 since the resize is for reducing retention?
19:30:03 yeah, though we've lost the old stats now presumably
19:30:05 but before we do that we should check that sdk hasn't created a bunch of resource id specific files
19:30:07 i'm not entirely sure about that; i thought the retention was encoded in the whisper files when they were created?
19:30:12 was this the sdk bug that was already fixed?
19:30:18 fungi: yes corvus ah
19:30:24 Shrews: yep
19:30:33 Shrews: we think it may have been fixed and cleared out the data that was buggy so that it will be apparent if it returns
19:30:45 *nod*
19:30:47 Shrews: I can check the filesystem after the meeting to see if new buggy resources have shown up
19:30:55 note the swift api is still "buggy"
19:31:04 but that is because its api is different
19:31:20 and for our use case has a tiny impact comparatively
19:31:27 clarkb: how is that manifesting?
19:31:40 corvus: containers and object names get stats files
19:31:52 clarkb: we make a lot of objects
19:31:53 it's probably ok to have per-container stats but per-object is a bit much
19:32:24 disk consumption is like 70MB for swift vs 200GB for nova though
19:32:34 so I think we can sort that out without it being an emergency
19:32:42 why is nova so large?
19:33:29 corvus: I think it is because we only report swift stats to statsd for image uploads to rax. But compute was recording every instance as its own thing
19:33:44 we create ~20 images a day and ~20000 instances
19:33:45 oh, 200 is the old nova usage
19:33:47 ya
19:34:09 verb tenses to the rescue
19:34:32 i am understanding better now than i was before
19:34:53 The other thing we will want to bring up with sdk is that splitting operations by tenant or cloud is likely desirable
19:35:06 it doesn't?
19:35:32 no, project ids are removed from the urls (or that is the intent and was failing with the bug) and I don't think there is any cloud info
19:35:48 this means our statsd data isn't useful since we have a bunch of clouds all reporting to the same buckets and it will end up being noise
19:35:49 Shrews: can nodepool pass in a statsd prefix with the provider info?
19:36:11 clarkb: yes, that means it's completely useless for the purpose we created it for
19:36:17 corvus: we can probably make it so
19:36:39 but iirc, i think maybe the statsd prefix is something that can be passed in per-connection...
19:36:55 corvus: ah if so then we'll need to update the config but that is doable
19:36:59 if so, then nodepool could fix that for us without a change to sdk, since nodepool provider is how we want that grouped anyway
19:37:07 ++
19:37:49 for reference, these stats are supposed to enable the graphs at the bottom of pages like http://grafana.openstack.org/d/BhcSH5Iiz/nodepool-ovh?orgId=1
19:37:53 under "api operations"
19:38:55 we can follow up on that after the meeting. Basically there may be other sdk bugs or nodepool bugs and we should fix them, but the emergency is over
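Not nodepool's or openstacksdk's actual code, just a sketch of the per-connection prefix idea being discussed: one statsd prefix per provider so each cloud's API timings land in their own buckets, shown here with the python statsd client. The bucket layout and provider name are invented for illustration.

```python
import statsd

def provider_api_stats(provider_name, host='localhost', port=8125):
    # e.g. nodepool.provider.ovh-bhs1.api.compute.POST.servers
    prefix = 'nodepool.provider.{}.api'.format(provider_name)
    return statsd.StatsClient(host=host, port=port, prefix=prefix)

client = provider_api_stats('ovh-bhs1')
with client.timer('compute.POST.servers'):
    pass  # the API call being timed would go here
```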
19:39:11 Next up is PTG/Summit planning
19:39:56 I mentioned this last week but for those of us attending we should expect to be flexible and roll with things that pop up
19:40:14 #link https://etherpad.openstack.org/p/OpenDev-Shanghai-PTG-2019 Feel free to add topics here for discussion/ptg hacking
19:40:18 i finally got my entry visa yesterday, so i am definitely going to try to be there (barring issues with any of the various modes of transportation which lurk between here and there)
19:40:21 #link https://www.openstack.org/ptg/#tab_schedule PTG Schedule
19:40:55 Monday night during the booth crawl there is a meet-the-project-leaders time, and there is also a lunch on wednesday
19:41:20 I plan to be there wearing my opendev and zuul hats
19:41:35 other than that I expect it will be an adventure
19:41:37 * fungi needs a zuul logo fez
19:42:29 Next I wanted to bring up the opensuse-150 image cleanup
19:42:47 we've been cleaning up images recently (opensuse-423, fedora-28)
19:43:01 opensuse-150 is next on the cleanup list (was replaced by opensuse-15)
19:43:09 #link https://review.opendev.org/#/q/topic:use-opensuse-15
19:43:31 clarkb: I'm using that topic now as well...
19:43:41 cleaning these up gives us capacity to add newer images and reduces overall load (building and uploading large images is a time sink)
19:43:43 AJaeger: thank you
19:43:45 (but not for those in flight)
19:44:22 Also be aware that by removing fedora-28 we may break jobs that were consuming those atomic images
19:44:29 I think the official story from us is to use 29 or 30 instead
19:45:04 and finally fungi any updates on the wiki replacement?
19:45:06 (that should all be cleaned up & released as of now)
19:45:58 i've not found time to press further on the wiki. looks like i need to start checking for errors logged to apache as the next step
19:46:22 likely it will entail syncing up the main config for it
19:46:49 i've still got a stack of changes for it which need review too
19:47:08 topic:xenial-upgrades project:opendev/puppet-mediawiki
19:47:29 getting those landed will help me make more progress
19:47:48 * clarkb adds them to the review list
19:48:02 #topic Open Discussion
19:48:36 I am likely to take it easy the next couple days as I'm traveling straight through the weekend and want a day or two of rest prior to the summit marathon
19:48:52 #link https://review.opendev.org/691939 Replace old Train cycle signing key with Ussuri
19:49:05 i leave tomorrow and don't expect to be around the rest of this week
19:49:16 if any config-core reviewer is up for approving that today then we don't need to amend the dependent change for the release docs
19:49:37 +2 from me
19:50:20 #link https://storyboard.openstack.org/#!/story/2006762
19:50:38 this is a story to replace our mirrors with opendev.org versions
19:51:05 ianw: thanks for putting that together; I guess we should all grab one or two and start getting them done
19:51:10 i drive to durham thursday and then fly out friday morning. need to find out if Shrews has any ideas for what i should be doing around there on halloween
19:51:29 yeah, if anyone wants to do mirrors, welcome ...
19:52:00 and speaking of, i've spent an increasing amount of time dealing with broken vos releases lately
19:52:02 #link https://review.opendev.org/691824
19:52:47 there's hopefully sufficient detail in that review, but it has been suggested that using -localauth with "vos release" is really the best you can do
19:53:25 if that idea is acceptable to us, i'd like to get that in and probably stage it with fedora initially
19:54:06 in that case I guess we rely on ssh to be a reliable connection, but it typically is so should be fine
19:54:44 did you ever manage to work out why vos release of the fedora volumes takes so very, very long to complete?
19:55:02 sort of, not really ...
19:55:34 https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror-update/files/fedora-mirror-update#L124
19:55:54 is about the best guess ... but it still seems to want to do full releases constantly
19:57:30 that is not how i understood vos release to work, but i'm no expert
19:58:15 ouch, that's quite the strangeness
19:58:39 We are just about at time. Thank you everyone! We'll be back in 2 weeks.
19:58:44 i thought it behaved more like your standard cow
19:59:04 but i guess there's lots of varieties of cows
19:59:11 Feel free to continue discussions over in #openstack-infra or on the mailing list but I'm going to formally end the meeting now
19:59:15 #endmeeting