19:01:16 <clarkb> #startmeeting infra
19:01:17 <openstack> Meeting started Tue Oct 29 19:01:16 2019 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:18 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:20 <openstack> The meeting name has been set to 'infra'
19:01:23 <clarkb> #link http://lists.openstack.org/pipermail/openstack-infra/2019-October/006502.html Our Agenda
19:01:29 <clarkb> #topic Announcements
19:02:22 <clarkb> The summit and ptg are next week in shanghai, china
19:02:38 <clarkb> For this reason we'll cancel next week's meeting
19:03:20 <Shrews> safe travels to all going
19:04:17 <fungi> thanks!
19:04:49 <clarkb> I expect we'll meet per usual the week after
19:04:55 <clarkb> I'll be jet lagged but around
19:05:00 <fungi> same
19:05:41 <corvus> i'll be away that week
19:06:52 <clarkb> I expect it will be quiet all around due to everyone else traveling/vacationing/being jetlagged
19:07:07 <clarkb> but I'll do my best to recap the events in shanghai that tuesday
19:07:48 <clarkb> #topic Actions from last meeting
19:07:54 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-10-22-19.01.txt minutes from last meeting
19:08:05 <clarkb> No actions
19:08:10 <clarkb> #topic Specs approval
19:08:25 <clarkb> #link https://review.opendev.org/#/c/683852/ Migrating off of static.openstack.org spec
19:08:29 * fungi saw a spec approved
19:08:54 <clarkb> well I haven't approved it yet because fungi was the only person to review its latest patchset
19:09:08 <clarkb> current plan is to review it today giving everyone a last chance to look it over
19:09:16 <clarkb> corvus: ^ did you want to check that the edits cover your comment?
19:09:16 <fungi> oh! i misread the gerrit notification in that case
19:09:29 <corvus> will do
19:09:40 <clarkb> thank you!
19:09:41 <fungi> but i suppose i should still be thrilled that i managed to review something for a change
19:09:55 <clarkb> #topic Priority Efforts
19:10:01 <clarkb> #topic OpenDev
19:10:06 <clarkb> #link https://etherpad.openstack.org/p/rCF58JvzbF Governance email draft
19:10:07 <AJaeger> sorry, late joining due to daylight saving time switch ;(
19:10:24 <fungi> did it actually save you any daylight?
19:10:26 <clarkb> Speaking of things to review ^ if you have thoughts or comments on that they would be much appreciated. At this rate I probably won't send it until after the summit anyway
19:10:57 <fungi> i'm good with it, only left the one comment
19:11:06 <fungi> which was more a question for the group i suppose
19:12:04 * AJaeger is also fine with the email
19:12:24 <corvus> clarkb: lgtm
19:13:17 <clarkb> thanks
19:13:21 <clarkb> #topic Update Config Management
19:13:36 <clarkb> I know mordred has been making progress with review-dev recently though I've not been able to keep up the last few days
19:13:55 <clarkb> he isn't here to correct me but I believe the ansible has been run by hand and is working and now we just need to get changes merged
19:14:25 <clarkb> #link https://review.opendev.org/#/c/630406/69 stack starts there
19:14:49 <clarkb> if you have time to review I'm sure mordred would be happy for that and we can probably let mordred approve things that he is confident in? though it is also the -dev server so not really scary in any case
19:15:44 <clarkb> Any other config management related items to bring up?
19:17:53 <clarkb> #topic Storyboard
19:18:06 <fungi> we've got working webclient drafts again!
19:18:31 <clarkb> via the plan to wildcard sources?
19:19:00 <fungi> all it took was a couple new config features for the api server and supporting an additional action for the auth endpoint
19:19:25 <fungi> yeah, now cors and openid acls can have entries which start with ^ and those are treated as a regex
19:20:04 <diablo_rojo> \o/
19:20:25 <fungi> however we discovered that the client redirect url is so long on our draft builds (due to object storage) that launchpad switches from get to post for passing the data along to the api
19:20:43 <fungi> so that endpoint needed to be adjusted to also accept post
19:21:06 <fungi> and then of course we updated the config for storyboard-dev to basically allow webclients hosted anywhere
19:22:14 <clarkb> because our rax swift cdn hosted files can come from ~3 * 4096 urls?
19:22:33 <fungi> well, that's just the hostnames
19:22:40 <fungi> but yeah
19:23:14 <fungi> previously storyboard-dev.o.o allowed clients at storyboard-dev.o.o and logs.o.o
19:24:59 <clarkb> anything else on this subject or should we continue?
19:25:12 <fungi> there's been some new changes pushed by SotK to support deploying on newer platforms
19:25:22 <fungi> but no, nothing else i'm aware of to highlight right now
19:26:09 <clarkb> #topic General topics
19:26:15 <clarkb> First up is graphite.opendev.org
19:26:29 <clarkb> We've seen a few questions come up about this so I wanted to make sure everyone was up to speed on the situation there.
19:26:55 <clarkb> Sometime last week we ran out of disk on the server. After investigating I found that the stats.timers. stats paths in particular were using the bulk of the disk
19:27:21 <clarkb> From there we updated the config to reduce retention of that subpath to a year (instead of 5 years) as well as reducing the granularity in that time period
19:27:51 <clarkb> in order for that to take effect we had to stop services and run whisper-resize.py across all of the files which took ~3 days
19:28:10 <clarkb> This means that we have a statsd gap of ~3 days, but all the data should match its config and we have plenty of disk to spare too
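For context on the retention change discussed above: graphite retention policies live in storage-schemas.conf. The sketch below is illustrative only — the pattern and periods are assumptions, not the actual values on graphite.opendev.org:

```ini
[timers]
pattern = ^stats\.timers\.
# coarser data points, kept for one year instead of five
retentions = 1m:30d, 1h:1y
```

Existing whisper files don't pick up a schema change automatically, which is why whisper-resize.py had to be run over every file (and why services were stopped for the ~3 days that took).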
19:28:56 <corvus> in retrospect, we probably didn't need to change the retention period, as it's likely that the openstacksdk bug was responsible for a large amount of the space used.
19:29:11 <clarkb> corvus: ya the sdk bug was a good chunk of it
19:29:19 <corvus> i confess i was partial to the old retention period -- i've found year-on-year stats to be really helpful in the past
19:29:38 <clarkb> corvus: I expect we can increase the retention now without doing a whisper resize
19:29:45 <clarkb> since the resize is for reducing retention?
19:30:03 <fungi> yeah, though we've lost the old stats now presumably
19:30:05 <clarkb> but before we do that we should check that sdk hasn't created a bunch of resource id specific files
19:30:07 <corvus> i'm not entirely sure about that; i thought the retention was encoded in the whisper files when they were created?
19:30:12 <Shrews> was this the sdk bug that was already fixed?
19:30:18 <clarkb> fungi: yes corvus ah
19:30:24 <corvus> Shrews: yep
19:30:33 <clarkb> Shrews: we think it may have been fixed and cleared out the data that was buggy so that it will be apparent if it returns
19:30:45 <Shrews> *nod*
19:30:47 <clarkb> Shrews: I can check the filesystem after the meeting to see if new buggy resources have shown up
19:30:55 <clarkb> note the swift api is still "buggy"
19:31:04 <clarkb> but that is because its api is different
19:31:20 <clarkb> and for our use case has a tiny impact comparatively
19:31:27 <corvus> clarkb: how is that manifesting?
19:31:40 <clarkb> corvus: containers and object names get stats files
19:31:52 <corvus> clarkb: we make a lot of objects
19:31:53 <clarkb> it's probably ok to have per container stats but per object is a bit much
19:32:24 <clarkb> disk consumption is like 70MB for swift vs 200GB for nova though
19:32:34 <clarkb> so I think we can sort that out without it being an emergency
19:32:42 <corvus> why is nova so large?
19:33:29 <clarkb> corvus: I think it is because we only report swift stats to statsd for image uploads to rax. But compute was recording every instance as its own thing
19:33:44 <clarkb> we create ~20 images a day and ~20000 instances
19:33:45 <corvus> oh, 200 is the old nova usage
19:33:47 <clarkb> ya
19:34:09 <fungi> verb tenses to the rescue
19:34:32 <corvus> i am understanding better now than i was before
19:34:53 <clarkb> The other thing we will want to bring up with sdk is that splitting operations by tenant or cloud is likely desirable
19:35:06 <corvus> it doesn't?
19:35:32 <clarkb> no, project ids are removed from the urls (or that is the intent and was failing with the bug) and I don't think there is any cloud info
19:35:48 <clarkb> this means our statsd data isn't useful since we have a bunch of clouds all reporting to the same buckets and it will end up being noise
19:35:49 <corvus> Shrews: can nodepool pass in a statsd prefix with the provider info?
19:36:11 <corvus> clarkb: yes, that means it's completely useless for the purpose we created it for
19:36:17 <Shrews> corvus: we can probably make it so
19:36:39 <corvus> but iirc, i think maybe the statsd prefix is something that can be passed in per-connection...
19:36:55 <clarkb> corvus: ah if so then we'll need to update the config but that is doable
19:36:59 <corvus> if so, then nodepool could fix that for us without a change to sdk, since nodepool provider is how we want that grouped anyway
19:37:07 <clarkb> ++
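A minimal sketch of what the per-provider statsd prefix idea above could look like. The naming scheme and helper here are illustrative assumptions, not nodepool's or openstacksdk's real code:

```python
# Hypothetical sketch: scope API-operation stats to one nodepool provider
# by building a statsd prefix per connection, so stats from different
# clouds land in separate buckets instead of one shared, noisy bucket.
# The "nodepool.provider.<name>.api" layout is an assumption.

def provider_stats_prefix(provider_name: str) -> str:
    """Build a statsd prefix scoped to a single provider."""
    return f"nodepool.provider.{provider_name}.api"

# A connection created with this prefix would then report e.g.
#   nodepool.provider.rax-dfw.api.<operation>
print(provider_stats_prefix("rax-dfw"))
```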
19:37:49 <corvus> for reference, these starts are supposed to enable the graphs at the bottom of pages like http://grafana.openstack.org/d/BhcSH5Iiz/nodepool-ovh?orgId=1
19:37:53 <corvus> under "api operations"
19:38:07 <corvus> s/starts/stats/
19:38:55 <clarkb> we can follow up on that after the meeting. Basically there may be other sdk bugs or nodepool bugs and we should fix them but the emergency is over
19:39:11 <clarkb> Next up is PTG/Summit planning
19:39:56 <clarkb> I mentioned this last week but for those of us attending we should expect to be flexible and roll with things that pop up
19:40:14 <clarkb> #link https://etherpad.openstack.org/p/OpenDev-Shanghai-PTG-2019 Feel free to add topics here for discussion/ptg hacking
19:40:18 <fungi> i finally got my entry visa yesterday, so i am definitely going to try to be there (barring issues with any of the various modes of transportation which lurk between here and there)
19:40:21 <clarkb> #link https://www.openstack.org/ptg/#tab_schedule PTG Schedule
19:40:55 <clarkb> Monday night during the booth crawl thing is a meet the project leaders time as well as lunch on wednesday
19:41:20 <clarkb> I plan to be there wearing my opendev and zuul hats
19:41:35 <clarkb> other than that I expect it will be an adventure
19:41:37 * fungi needs a zuul logo fez
19:42:29 <clarkb> Next I wanted to bring up the opensuse-150 image cleanup
19:42:47 <clarkb> we've been cleaning up images recently (opensuse-423, fedora-28)
19:43:01 <clarkb> opensuse 150 is next on the cleanup list (was replaced by opensuse-15)
19:43:09 <clarkb> #link https://review.opendev.org/#/q/topic:use-opensuse-15
19:43:31 <AJaeger> clarkb: I'm using that topic now as well...
19:43:41 <clarkb> cleaning these up gives us capacity to add newer images and reduces overall load (building and uploading large images is a time sink)
19:43:43 <clarkb> AJaeger: thank you
19:43:45 <AJaeger> (but not for those in flight)
19:44:22 <clarkb> Also be aware that by removing fedora-28 we may break jobs that were consuming those atomic images
19:44:29 <clarkb> I think the official story from us is use 29 or 30 instead
19:45:04 <clarkb> and finally fungi any updates on the wiki replacement?
19:45:06 <ianw> (that should all be cleaned up & released as of now)
19:45:58 <fungi> i've not found time to press further on the wiki. looks like i need to start checking for errors logged to apache as the next step
19:46:22 <fungi> likely it will entail sync'ing up the main config for it
19:46:49 <fungi> i've still got a stack of changes for it which need review too
19:47:08 <fungi> topic:xenial-upgrades project:opendev/puppet-mediawiki
19:47:29 <fungi> getting those landed will help me make more progress
19:47:48 * clarkb adds them to the review list
19:48:02 <clarkb> #topic Open Discussion
19:48:36 <clarkb> I am likely to take it easy the next couple days as I'm traveling straight through the weekend and want a day or two of rest prior to the summit marathon
19:48:52 <fungi> #link https://review.opendev.org/691939 Replace old Train cycle signing key with Ussuri
19:49:05 <corvus> i leave tomorrow and don't expect to be around the rest of this week
19:49:16 <fungi> if any config-core reviewer is up for approving that today then we don't need to amend the dependent change for the release docs
19:49:37 <clarkb> +2 from me
19:50:20 <ianw> #link https://storyboard.openstack.org/#!/story/2006762
19:50:38 <ianw> this is a story to replace our mirrors with opendev.org versions
19:51:05 <clarkb> ianw: thanks for putting that together I guess we should all grab one or two and start getting them done
19:51:10 <fungi> i drive to durham thursday and then fly out friday morning. need to find out if Shrews has any ideas for what i should be doing around there on halloween
19:51:29 <ianw> yeah, if anyone wants to do mirrors, welcome ...
19:52:00 <ianw> and speaking of, i've spent an increasing amount of time dealing with broken vos releases lately
19:52:02 <ianw> #link https://review.opendev.org/691824
19:52:47 <ianw> there's hopefully sufficient detail in that review, but it has been suggested that using -localauth with "vos release" is really the best you can do
19:53:25 <ianw> if that idea is acceptable to us, i'd like to get that in and probably stage it with fedora initially
19:54:06 <clarkb> in that case I guess we rely on ssh to be a reliable connection but it typically is so should be fine
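A hedged sketch of what the -localauth approach above could look like in practice: running "vos release" on the AFS fileserver itself over ssh, so it uses the server's local key rather than a network token. The hostname and volume name below are placeholders, not our real values:

```shell
# Hypothetical sketch: build the ssh command that would run "vos release"
# with -localauth directly on the fileserver. Placeholders, not production
# names.
AFS_SERVER="afs01.example.org"
VOLUME="mirror.fedora"
RELEASE_CMD="ssh ${AFS_SERVER} vos release ${VOLUME} -localauth"
echo "${RELEASE_CMD}"
```

As noted in the meeting, this trades token expiry problems for a dependency on ssh being a reliable connection, which it typically is.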
19:54:44 <fungi> did you ever manage to work out why vos release of the fedora volumes takes so very, very long to complete?
19:55:02 <ianw> sort of, not really ...
19:55:34 <ianw> https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror-update/files/fedora-mirror-update#L124
19:55:54 <ianw> is about the best guess ... but it still seems to want to do full releases constantly
19:57:30 <corvus> that is not how i understood vos release to work, but i'm no expert
19:58:15 <fungi> ouch, that's quite the strangeness
19:58:39 <clarkb> We are just about at time. Thank you everyone ! We'll be back in 2 weeks.
19:58:44 <corvus> i thought it behaved more like your standard cow
19:59:04 <corvus> but i guess there's lots of varieties of cows
19:59:11 <clarkb> Feel free to continue discussions over in #openstack-infra or on the mailing list but I'm going to formally end the meeting now
19:59:15 <clarkb> #endmeeting