19:01:55 #startmeeting infra
19:01:56 Meeting started Tue Dec 4 19:01:55 2018 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:57 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:59 The meeting name has been set to 'infra'
19:02:26 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:02:34 o/
19:02:42 #topic Announcements
19:03:06 Yesterday fungi merged the openstack mailing lists (tc, ops, dev, general, sigs) into the new openstack-discuss list
19:03:15 if you have not yet joined that list now is a good time to do so
19:03:31 we're up to 515 subscribers now!
19:03:43 Woohoo!
19:03:47 Also k8s just had its first major CVE so if you are running a k8s you should upgrade (I upgraded nodepool's small cluster yesterday)
19:03:49 that's pretty close to our expected level based on active posters, yeah?
19:03:57 reasonably
19:04:24 my stats suggest we had 933 unique posters to any of the original lists in the first 2/3 of 2018
19:04:38 oh i thought it was 600. i guess i misremembered
19:04:48 factor in address changes and people who only joined to ask a question and it's likely a lot lower
19:04:55 fungi: did you send a final email to all the old lists?
19:05:03 yes, i see it now :)
19:05:06 yep, to each one before closing them
19:05:06 didn'y you have an active (eg more than 2 emails sent) list that was closer to 500?
19:05:13 still problems with that 't' key
19:05:22 * mordred waves from aeroplane
19:05:27 yeah, i had graduated counts. i can always do some fresh ones
19:05:28 aeroplane internet!
19:05:30 Note that the infra list is still there as its own thing
19:05:55 #topic Actions from last meeting
19:06:03 #link http://eavesdrop.openstack.org/meetings/infra/2018/infra.2018-11-27-19.01.txt minutes from last meeting
19:06:09 fungi did the thing with the mailing lists.
Thank you
19:06:30 ++
19:06:32 ianw I think you did end up cleaning up pypi volume in afs. What about the fedora 27 volume?
19:07:03 clarkb: pypi volume is all cleared out and gone. f27 was just some directory components, but that's gone too
19:07:18 afs servers have plenty of space now, so that's good
19:07:20 http://grafana.openstack.org/d/ACtl1JSmz/afs?orgId=1 graphs look much happier
19:07:30 ianw: thank you for cleaning that stuff up
19:07:32 o/
19:08:21 np, there's nothing like running out of space for the thing you're interested in to prompt action ;)
19:08:29 #topic Specs Approval
19:08:49 #link https://review.openstack.org/#/c/607377/ Storyboard Attachments spec
19:08:59 I'll advertise this one again, but it looks like sphinx may be failing on it?
19:09:09 diablo_rojo: ^ fyi probably want to clean that up so it shows up in people's review queues
19:09:13 Yeah I will have that fixed today
19:09:37 #link https://review.openstack.org/#/c/581214/ Anomaly detection in CI logs
19:09:52 this is the other one I'll "advertise". I intend on reviewing that one myself shortly.
19:10:07 I was in tristanC and dirk's summit presentation and it's neat stuff
19:10:51 #link https://review.openstack.org/622439 Zuul governance proposal
19:11:02 probably worth noting that change here as well.
19:11:15 These changes all deserve (y)our feedback :)
19:11:27 #topic Priority Efforts
19:11:34 #topic Storyboard
19:12:00 diablo_rojo: fungi SotK seems like last week was the summit feedback digestion week.
19:12:06 anything to add here?
19:12:42 SotK has posted rather a few excellent looking improvements based on feedback we got at the ptg and summit
19:12:58 Yeah, definitely have patches to review
19:13:17 What needs to be done to actually move the trove db?
19:13:36 * diablo_rojo has brought this up twice but keeps forgetting to actually get action items out of the conversation
19:14:05 diablo_rojo: create the new db server somewhere, schedule a storyboard outage window, stop storyboard, migrate data to new db, start storyboard pointed at new db
19:14:15 i think we need to add the mysql module to the manifest for the server first so that we get a local db created
19:14:24 regarding attachments, why not use logs.openstack.org for storing attachments? attachment retention policy could be decided later, but it would be a system that is already serving files.
19:14:24 fungi: ya that would be step zero if doing local db
19:14:44 then a manual step to shut down the service and dump/source the db content
19:14:55 clarkb, this sounds like a thing I need an infra core friend for.
19:14:58 clarkb, diablo_rojo: i'd add to that that you can do test migrations beforehand without an outage to verify that it works and establish timing (so you know what the outage window will be)
19:15:12 ssbarnea|rover: I think the big concern with that is we'd like to not start with "legacy" file storage and just move ahead with scalable storage instead
19:15:14 corvus: ++
19:15:26 diablo_rojo: yup you'll need an infra root in particular for the outage window tasks
19:16:17 corvus, good idea
19:16:27 ssbarnea|rover: i recommend reading through the spec if you haven't already. the requirements/expectations are already laid out there
19:16:42 logs.o.o is on its way out (slowly)
19:16:59 I can add why we aren't planning on using logs.o.o into the spec too
19:17:25 well, the current implementation of logs.o.o is on its way out anyway. we may still have something *named* logs.o.o with all new backend implementation?
19:17:37 * mordred can't wait for logs.o.o to go away
19:17:38 clarkb: i do not find those two conflicting. i read it, is ok.
still i have a more practical view: some attachment support today is much more useful than a very well designed (and scalable) solution that is delivered in 2020 or later ;)
19:17:39 diablo_rojo: fungi: I think as far as actionable next steps go getting the new db set up via puppet/ansible is step 0 then step 0.5 is write up an etherpad document for the process you'll need infra-root for and we can schedule a time for that and get volunteers?
19:17:57 diablo_rojo: fungi: similar to how we do gerrit project renames and other outages with a documented process
19:18:47 clarkb, happy to take point, but I might need a little help getting the ball rolling
19:18:48 diablo_rojo: I can help fill in the details of the process too
19:18:49 ssbarnea|rover: the main point is that we provide automatic indices on logs.o.o. we would need to _not_ do that to satisfy the need for attachments on private stories, which is one of the primary reasons we want attachments for sb at all
19:18:53 ++ some of the zanata work we had great checklists for, it was very easy
19:18:54 clarkb, sweet
19:19:53 anything else storyboard before we move on to config management modernization?
19:20:05 ssbarnea|rover: so an attachments "solution" which doesn't meet our need for private story attachments isn't really any solution at all
19:20:10 ssbarnea|rover: fungi diablo_rojo I do think we can take that to the spec review (and maybe add a section for that so it's clear)
19:20:18 yep
19:20:34 fungi: yep, but indexes can be disabled on "/sb/" folder. anyway.
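[Editor's note: the suggestion above to rehearse the database migration beforehand, so that the outage window can be sized from measured timings, can be sketched as follows. This is a minimal illustration only, not the team's actual process. The function names and the placeholder steps are hypothetical; a real rehearsal would replace each placeholder with the actual dump, load and verification commands.]

```python
import time


def run_timed(steps):
    """Run (name, callable) steps in order; return a list of (name, seconds)."""
    timings = []
    for name, step in steps:
        start = time.monotonic()
        step()
        timings.append((name, time.monotonic() - start))
    return timings


def rehearsal_steps(log):
    """Hypothetical placeholders; real steps would dump and load the database."""
    return [
        ("dump data from the current db", lambda: log.append("dump")),
        ("load the dump into the new db", lambda: log.append("load")),
        ("verify the new db", lambda: log.append("verify")),
    ]


if __name__ == "__main__":
    log = []
    timings = run_timed(rehearsal_steps(log))
    for name, seconds in timings:
        print(f"{name}: {seconds:.3f}s")
    total = sum(seconds for _, seconds in timings)
    # The measured total is only a starting point for sizing the outage window.
    print(f"rehearsal total: {total:.3f}s")
```

[The per-step timings could be copied into the etherpad checklist so that whoever runs the real outage knows roughly how long each step should take.]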
19:21:03 I got nothing else clarkb :)
19:21:05 ssbarnea|rover: sure, there's lots of things we *can* change to make something which isn't a solution into something which is
19:21:09 #topic Update Config Management
19:21:29 Last week we ran into a couple of issues with dynamic ansible inventories
19:21:44 we fixed one, which ran puppet/ansible once which installed new things which broke it in different ways
19:21:52 or rather one was fixed (we didn't fix it)
19:22:03 The end result was a switch to using static inventory
19:22:21 that seems to be working well so far though I don't think anyone has created new servers since
19:22:39 oh, yeah, we need a script for that
19:22:44 mordred: ^ were you planning on adding a script to generate the static inventory dynamically? I seem to recall that was on your list of todos?
19:23:00 that's how i remember it too :)
19:23:01 .... or we could get a free jira instance running and use it, with all features already implemented. it would not be OSI kind of opensource but... also it would allow us to focus on making openstack better.
19:23:35 clarkb: yes
19:23:39 mordred: awesome
19:23:51 clarkb: I should probably write that :)
19:23:51 #action mordred Add script to generate static inventory dynamically
19:24:02 so /etc/ansible/hosts/openstack.yaml on bridge is now not used, right?
19:24:11 can i move that to make it clear?
19:24:23 ianw: it is used but it is no longer dynamically generated it should be static content
19:24:32 or did we rename the file too? /me checks
19:24:40 it's not, i assumed we're taking it out of system-config
19:24:59 oh sorry it is the file in system-config/inventory that is used not /etc/ansible/hosts
19:25:00 huh, was that ever used?
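[Editor's note: the action item above, a script that generates the static inventory, might look roughly like the sketch below. It is an assumption-laden illustration and not the script that was eventually written. The host names and addresses are made up (documentation ranges), the function name is hypothetical, and the output uses the generic layout accepted by Ansible's YAML inventory plugin rather than the actual contents of inventory/openstack.yaml.]

```python
def render_inventory(servers):
    """Render a {hostname: address} mapping as Ansible YAML inventory text.

    Hosts are sorted so that regenerating the file gives a stable diff.
    """
    lines = ["all:", "  hosts:"]
    for name in sorted(servers):
        lines.append(f"    {name}:")
        lines.append(f"      ansible_host: {servers[name]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical data; a real script would query the cloud provider instead.
    example = {
        "review01.example.org": "203.0.113.10",
        "bridge01.example.org": "203.0.113.11",
    }
    print(render_inventory(example), end="")
```

[Writing the result to a file that is then committed would keep the inventory static at run time while still letting a tool produce it when new servers are created.]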
19:25:11 i got a little confused on it last night, because the emergency file was broken and had stopped puppet
19:25:25 ianw: I believe that file is safe to remove
19:25:26 because we changed a file that looked just like that in system-config
19:25:42 ok, i've moved it so it's clear, i'll note in infra
19:25:44 inventory=/opt/system-config/inventory/openstack.yaml,/opt/system-config/inventory/groups.yaml,/etc/ansible/hosts/emergency.yaml <- doesn't include that file
19:25:56 from ansible.cfg
19:26:15 yah - the file originally was a pointer for ansible, so it would read it, see it was a plugin description file, then run the dynamic inventory
19:26:28 it is now not needed because we point directly to the file in system-config
19:26:55 (yes, and i think that was true even before the change to static inventory)
19:27:05 corvus: I think that is correct
19:27:07 ah - yes, you're probably right about that
19:27:09 ++ i like it when reality matches with what i think is going on :)
19:27:12 ianw: me too
19:27:46 ianw: mordred: for the docker base changes any news there? Do we need to review things?
19:28:31 not quite yet
19:28:31 ianw: oh, i must have broken the emergency disable list when i removed the entry for static.o.o... i guess the disabled list can't be empty?
19:28:41 I'm so behind on that - I need to catch up with ianw on what he's been doing
19:28:55 ianw: sorry about that!
19:29:21 fungi: or maybe needed []? (I don't know what the list looked like)
19:29:46 should probably put a comment in there when we figure it out :)
19:29:57 corvus: ++
19:29:59 fungi: np, i just got all confused because puppet seemed broken, then it seemed like the static inventory hadn't rolled out, but eventually i saw what was going on :)
19:30:00 ++
19:30:06 Anything else on this effort?
I think the emergency inventory changes sort of took precedence last week so not surprising we moved slowly otherwise
19:30:27 ianw's topic is /sort of/ related to cfg mgmt
19:30:39 clarkb: yeah, [] would likely have solved it
19:30:58 dmsimard: ya but ianw put them under general topics so I figured we'd discuss them there :)
19:31:03 ++
19:31:17 fungi: happy to try that, i've just commented out the dict keys atm so that you don't have to figure out what to put in
19:31:49 ok let's move on then.
19:31:55 #topic General Topics
19:32:18 ssbarnea|rover: asks if we should run with ops on all our channels (did I read that correctly?)
19:32:31 i have no interest in appearing special in irc at all
19:33:12 Does the "infra-root" ping not work?
19:33:24 corvus: ya I too want to avoid the impression this is a reward to work towards
19:33:24 clarkb: yeah. just an idea, for visibility.
19:33:32 there are lots of folks well equipped to answer infrastructure questions regardless of whether they have approval rights to certain repos. i don't want to establish the marking of a two-tier system in irc.
19:33:50 ssbarnea|rover: is there a specific visibility concern you have in mind? is it finding people who can help you with IRC, or with infra services or something else?
19:33:52 for example i didn't know about the infra-root, and I think many others do not.
19:34:12 but i also worry a little bit about potential abuse.
19:34:42 clarkb: the only concern was to make it visible who from infra is online.
19:34:48 there's not much that needs an infra-root anyway.
19:34:48 we do have operators, they just don't get the hat by default all the time. So the ops have to escalate privs in order to handle things like abuse
19:35:14 ssbarnea|rover: There's two keywords that are set up to light our irc clients like christmas trees, "infra-root" and "config-core". We already get a lot of pings every day, is it an issue that people do not know where to ask things?
19:35:39 ssbarnea|rover: https://docs.openstack.org/infra/manual/irc.html#technical-support
19:35:48 these keywords are probably enough
19:36:11 ok, fwiw I also try to skim through scrollback to get a sense for when questions were asked and left unanswered so I can pick those up
19:36:15 not always possible though
19:36:21 I think others (fungi in particular) do similar
19:36:22 being able to see who from the infra root sysadmins is online and active sounds like a risk, not a reward
19:36:27 fungi: the topic in #openstack-infra is already long enough but perhaps we could add that link to it
19:36:53 yeah, looking at the topic ... maybe making it something like "Welcome, we're happy to help, please see ... ^" might be a better topic
19:36:59 i frequently go on "silent running" in irc so i can get other work done without being interrupted
19:37:01 ok, you convinced me. subject closed.
19:37:07 not sure links to git are that helpful there
19:37:20 to summarize though, those of us with ops would rather not wear the hat all the time and instead maybe we can point people to docs more effectively using the channel topic
19:37:30 ianw: ya that does sound more welcoming
19:37:42 especially part-time ops :)
19:37:46 we can brainstorm a replacement after the meeting then one of us ops can apply it
19:37:51 +1
19:37:52 indeed, channel topic could clearly help.
19:38:12 git is super important in the topic
19:38:12 Next item is OpenDev.
corvus has configured DNS
19:38:36 we need to let folks know everything we do is open
19:38:45 and inspectable and self-service
19:39:37 i could see collapsing the topic down to one link, and then linking off from there
19:39:48 but the main thing is that whatever we do, that needs to be front and center
19:40:03 corvus: I actually like that idea as it will make the topic fit in my window :)
19:40:22 let's move on to opendev though as we only have 20 minutes left
19:40:31 dns is working
19:40:37 we're hosting a website with no content
19:40:39 I have a todo item to reboot the one opendev dns server that is unhappy with an update
19:40:45 I will do that after the meeting
19:40:53 i think someone who is not me should write some initial content for the website
19:41:08 it goes here: http://git.openstack.org/cgit/openstack-infra/opendev-website/
19:41:22 #link http://git.openstack.org/cgit/openstack-infra/opendev-website/ The new website needs content
19:41:36 I can take a stab at writing some stuff
19:41:39 should we say which framework we want to use? or up to initial commit?
19:41:47 I'm far less useful at the formatting for browser rendering aspect of it
19:41:53 What's the starting point? opendevconf's code?
19:42:03 i rather like our http 401 website ;)
19:42:10 pabelanger: dmsimard: I think we have a clean slate here and using something like what zuul has done is probably simplest to start
19:42:27 we don't need complex tooling for it and can splat out a fairly straightforward set of html and css?
19:42:28 clarkb: agree, I think what zuul has done is a fine start
19:42:34 i believe we suggested that having a page that described the effort and current status was desirable
19:42:41 yeah, splatting raw html/css from a git repo onto the fileserver seems plenty fine for now
19:42:43 from a maintenance perspective, it's probably best to keep things similar
19:42:44 corvus: yup for content that was the intent
19:42:58 and we could post updates there so people can track the effort
19:43:13 great idea
19:43:29 Why don't I start with a change with some unformatted content then y'all can help with the rendering aspects
19:44:03 clarkb: thanks; you want to action yourself? :)
19:44:18 #action clarkb Write up opendev website content then seek help for formatting it nicely
19:44:29 we were looking at using gatsby for the zuul website ... so we could do netlify-cms integration
19:44:39 also, if folks want to use a static site generator for a blog thing, that will work fine
19:44:44 I have a todo list item for zuul to get that done - I agree, not blocking on that for this is better
19:44:46 mordred: yes, i think we should move to that when it exists
19:44:51 yes
19:45:05 #info move to gatsby with netlify-cms integration when that exists
19:45:17 mordred: in the meantime, i don't think there are any wrong choices that might prevent us from doing that later
19:45:25 I agree with that
19:45:38 (at least, any static site generator, or just plain static site, should be easy to port into that)
19:45:57 ok anything else opendev before we talk about ara?
19:45:57 I've used hugo for the new ara website -- mostly because that's what I was familiar with. Works well.
19:46:02 that said - getting just gatsby up with clarkb's unformatted text will be easy
19:46:07 next i'm going to try to write up a plan to move gerrit
19:46:21 so consider my hand raised to work with clarkb after there is some content
19:46:29 mordred: thanks
19:46:43 clarkb: ++
19:46:49 corvus: ++
19:46:49 and that's [eot] on opendev from me
19:47:18 Collecting inner ara from system-config integration jobs
19:47:25 #link https://review.openstack.org/#/q/topic:inner-ara-results+(status:open+OR+status:merged)
19:47:48 ianw: dmsimard: is this a matter of getting reviews and adding the feature or are there design aspects/concerns we should be aware of?
19:47:58 so dmsimard did all the work to install ara on bridge.o.o, but we got a bit stuck on where to save the data and how or if we would export it
19:47:58 dmsimard: I like gatsby because it's react with javascript tooling like the zuul dashboard - but basically just consider it to be a javascript tooling version of hugo - basically all the same idea
19:48:29 ianw: stuck how?
19:49:03 fungi: we can't move forward until we get consensus around what to do with bridge.o.o and ara -- do we re-install on a larger flavor, use a trove instance, a local mysql, etc
19:49:08 fungi: as in, do we do it locally, via trove, to local sqlite, do we export it, does it have sensitive stuff ...
19:49:23 in the meantime, ianw got in-job "nested" ara reports working for system-config
19:49:51 right, that's my proposal, can we move forward with retargeting this for the CI jobs first
19:49:53 ianw: dmsimard: we already said in at least a couple prior meetings that we would use the trove instance for now but plan to make the replacement bridge.o.o large enough to house its own mysql server we can move to using in the future
19:50:09 ya I think there are two things here. The first is updating the CI jobs to have ara for the integration runs.
I think we should just go ahead and do that
19:50:21 that will then give us data on the sorts of information that ara will expose on the production server
19:50:34 so that will be a sanity check of whether or not it's safe to use an externally hosted db for ara
19:51:27 ok, great, well all the changes are out there in that linked topic list, i'd appreciate reviews
19:51:51 ok so to validate my understanding -- 1) enable ara in nested ansible CI 2) enable ara on bridge.o.o pointing to a trove instance
19:52:05 dmsimard: assuming that 1) doesn't reveal any deal breakers for 2)
19:52:06 although for #2, the web application should not be available publicly
19:52:37 right ... is ara that useful without visualisation?
19:52:56 ianw: the idea would be to browse the data over tunnel to audit it first
19:53:07 i.e, passwords without no_log, etc
19:53:32 ya I think we likely want to do a sanity check of the actual production data too once we have it
19:53:43 but then if everything looks good consider making it more public
19:54:08 fwiw, nested results are already available, for example: http://logs.openstack.org/16/617216/20/check/system-config-run-base/0b8074c/bridge.o.o/ara-report/
19:54:13 to recap, the discussion was in october
19:54:17 #link http://eavesdrop.openstack.org/meetings/infra/2018/infra.2018-10-23-19.01.log.html#l-70
19:54:30 #link http://logs.openstack.org/16/617216/20/check/system-config-run-base/0b8074c/bridge.o.o/ara-report/ example CI ara for system-config integration jobs
19:55:13 fungi: my understanding from back then was that we'd wait to resize bridge.o.o first and then privilege a local mysql installation
19:55:14 fungi: i think my point was mostly that nobody is really driving it, but here is something we can use right now, and would be helpful especially as we do things like incorporate more complex docker stuff into ci jobs
19:55:40 the plan: Update CI jobs for nested ARA, check that ara data set for leaked data, fix any leaks, deploy ara against
trove in production but don't expose it publicly. Check this again for any leaks, fix leaks, make public
19:55:58 If we only implement that first step and improve the CI jobs I think that is worthwhile
19:56:02 clarkb: works for me
19:56:12 yep, i think we concluded in october that the risk associated with using an external db to store the ara data was acceptable since the work had already been done to build it
19:56:20 clarkb: it's the rinse and repeat bit i'd be worried about ... it seems like we could very easily leak things into the logs
19:56:36 ianw: as we make new system-config changes you mean?
19:56:57 clarkb: yep, unless someone is constantly auditing it
19:57:22 it's possible, though hopefully our experiences cleaning things up show us what not to do? I'm mostly trying to avoid assuming it's all bad until we have some examples to look at (and the CI examples are great for that)
19:57:27 we could consider setting up http authentication to the ara web page, unless that defeats the purpose
19:57:46 dmsimard: what the old dashboard gave us was the ability for anyone to check if their thing had worked even if they weren't root
19:58:02 and I think if we set up http auth for this we'd still need to restrict it to roots (since they are the individuals that can see the secrets)
19:58:20 clarkb: yeah, that's what I meant by purpose but I forget the amount of details that puppetboard provided
19:58:29 timecheck: we have two minutes left and a couple other topics. I think we can pick this back up in the infra channel or on the mailing list but I'd like to call out the other topics really quickly before we run out of time
19:58:42 we can follow up in -infra
19:58:58 #link https://review.openstack.org/#/q/status:open+topic:fedora29 ianw has work to add network manager support to glean. This work is in an effort to support newer fedora and rhel/centos 8
19:59:13 I've reviewed the stack. They look good to me.
Maybe someone else that has glean paged in can also take a look
19:59:25 this is just a request for reviews. it needs reviews, glean releases, staged dib builds etc so i need some help, but everything is there
20:00:05 #link https://review.openstack.org/#/c/617218/4 This updates ansible to 2.7.3 on bridge.o.o to fix the cloud launcher playbook. It has the +2's and just needs to be approved now. I didn't approve it in case there was some reason that others knew to not use newer ansible
20:00:15 ok, one other thing i wanted to call out was
20:00:19 #link http://grafana.openstack.org/d/qzQ_v2oiz/bridge-runtime?orgId=1
20:00:20 I think ianw should likely approve that one when ready if no one objects in the near future :)
20:00:37 ianw: what sends those stats?
20:00:39 which shows runtimes, and actually illustrates pretty well when things are broken (without ara :)
20:00:57 corvus: we have instrumented the scripts that run under cron to send those stats
20:01:13 so if you check on it occasionally, it can be helpful
20:01:38 ianw: can you point me to where those stats are transmitted?
20:02:18 And we are slightly over time. Thanks everyone. We'll see you next week
20:02:20 ianw: ah is it here? http://git.openstack.org/cgit/openstack-infra/system-config/tree/run_all.sh#n53
20:02:20 #endmeeting
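[Editor's note: the closing exchange mentions cron-driven scripts that report their runtimes as stats for the grafana dashboard. The sketch below shows the general idea using the plain-text statsd timer format over UDP. It is not the code in run_all.sh, which is not quoted in this log; the metric name, host and port are assumptions chosen for illustration.]

```python
import socket
import time


def timing_metric(name, seconds):
    """Format a statsd timer line: '<name>:<milliseconds>|ms'."""
    return f"{name}:{int(round(seconds * 1000))}|ms"


def send_timing(name, seconds, host="127.0.0.1", port=8125):
    """Send one timer sample to a statsd daemon over UDP (fire and forget)."""
    payload = timing_metric(name, seconds).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    start = time.monotonic()
    time.sleep(0.01)  # stand-in for the real work done by the cron job
    # Hypothetical metric name, not one taken from the dashboard.
    send_timing("example.bridge.run_all.runtime", time.monotonic() - start)
```

[Because statsd uses UDP, a missing or unreachable collector does not block the job being timed, which is one reason this pattern suits cron scripts.]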