19:01:10 <clarkb> #startmeeting infra
19:01:11 <openstack> Meeting started Tue Oct 23 19:01:10 2018 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:15 <openstack> The meeting name has been set to 'infra'
19:01:20 <ianw> o.
19:01:21 <clarkb> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:01:23 <ianw> o/
19:01:30 <clarkb> #topic Announcements
19:01:50 <clarkb> Just another friendly reminder that the summit and forum fast approach and you may wish to double check the schedule
19:01:54 <clarkb> #link https://www.openstack.org/summit/berlin-2018/summit-schedule/#track=262 Berlin Forum schedule up for feedback
19:02:00 <fungi> so fast
19:02:20 <clarkb> I think it is fairly solid at this point so mostly just a reminder to go look at the schedule if you haven't already. Unsure if changes can be made easily now
19:02:38 <clarkb> fungi: There is rumor that food awaits me after the meeting. Good motivation :)
19:03:01 <fungi> oh, i was simply commenting on the fast approach of the summit and forum
19:03:06 <clarkb> oh that :)
19:03:07 <fungi> but yes, fast food too
19:03:13 <clarkb> indeed it is like 3 weeks away
19:03:49 <clarkb> I'll be visiting the dentist tomorrow midday, but don't expect any drilling or drugs to happen so should be around. Otherwise not seen any announcements
19:04:03 <clarkb> #topic Actions from last meeting
19:04:11 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2018/infra.2018-10-16-19.01.txt minutes from last meeting
19:04:32 <clarkb> No explicit actions called out but we did some work on things we talked about (which we'll get to later in the agenda)
19:05:30 <clarkb> #topic Specs approval
19:05:57 <clarkb> Any specs we should be aware of? I think we are mostly head down on implementing work that is already captured by specs or general maintenance
19:06:09 <clarkb> I guess the storyboard spec is still outstanding. And I keep meaning to read it
19:06:23 <fungi> you and a bunch of us
19:06:27 <fungi> well, at least me anyway
19:06:52 <clarkb> #link https://review.openstack.org/#/c/607377/ Storyboard attachments spec. Not ready, but input would be helpful
19:07:13 <clarkb> frickler has review comments. diablo_rojo not sure if you've seen those
19:08:10 <clarkb> #topic Priority Efforts
19:08:16 <clarkb> #topic Storyboard
19:08:23 <clarkb> That makes a good transition to storyboard topic
19:08:51 <diablo_rojo> clarkb, I saw them, was waiting for some more before I did an update
19:09:04 <diablo_rojo> just poked SotK for comments
19:09:06 <clarkb> diablo_rojo: ok, I'll try to take a look at it today after the nodepool builder work
19:09:14 <diablo_rojo> clarkb, that would be suuuper helpful
19:09:49 <diablo_rojo> fungi, too if you have some free time in between the house stuff.
19:09:52 <clarkb> mordred: not sure if you are here, but I think you started looking at using pbrx with storyboard to work on server upgrades?
19:09:54 <fungi> yep!
19:10:15 * clarkb will try to be a stand in for mordred
19:10:40 <clarkb> mordred volunteered to work on the storyboard trusty server upgrades and seems to have taken the approach of using storyboard(-dev) as an early container deployment system
19:10:59 <diablo_rojo> Seems like a good approach to me
19:11:01 <clarkb> The upside to this is that, as a python + js application, it is actually fairly similar to zuul, which means pbrx should work for it
19:11:05 <fungi> it's a fairly self-contained service so might be a good test case
19:11:30 <clarkb> I want to say the early work has been in updating bindep.txt type files to have pbrx produce working container images for storyboard
19:11:52 <clarkb> if you are interested in this aspect of the config mgmt updates this is probably a good place to follow along and/or participate
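[For reference: bindep.txt lists the distro packages a project needs, one per line, with bracketed profiles selecting platform or use case, and pbrx consults it when building images. A minimal sketch — the package names here are illustrative, not storyboard's actual list:

    # Each line is a distro package, with optional [profile] selectors.
    libmysqlclient-dev [platform:dpkg]
    mariadb-devel [platform:rpm]
    # Packages only needed when running tests:
    mysql-client [platform:dpkg test]
]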
19:11:57 <fungi> the api server is straightforward and the webclient is just a wad of files that need to be splatted somewhere a webserver can find
19:12:03 <clarkb> ianw: ^ you may have thoughts in particular since you've looked at similar with graphite
19:12:40 <fungi> though there's also the need to have rabbitmq running, and also the worker process
19:12:48 <fungi> so i guess not completely straightforward
19:13:06 <clarkb> fungi: it should be a good real world application though as it isn't trivial
19:13:17 <clarkb> while still tying into the tooling we've already built
19:13:19 <ianw> clarkb: yeah, i need to get back to the actual "get docker on control plane server" bit, some prelim reviews out there that didn't quite work
19:14:13 <clarkb> diablo_rojo: fungi any other storyboard topics to bring up before we move on to the config mgmt updates?
19:14:28 <fungi> nah
19:14:28 <diablo_rojo> I don't think so.
19:14:34 <fungi> thanks!
19:14:41 <diablo_rojo> Thank you :)
19:14:43 <clarkb> #topic Config Mgmt Update
19:15:14 <clarkb> As mentioned in the context of storyboard the work of updating our config management processes continues to happen
19:16:05 <fungi> i guess there are a couple base classes of work going on there? 1. containerizing things, and 2. automatic deployment/replacement of containers?
19:16:07 <clarkb> #link https://review.openstack.org/#/c/604925/ add zuul user to bridge.o.o for CD activities could use a second review
19:16:39 <corvus> also https://review.openstack.org/609556
19:16:45 <clarkb> fungi: ya part of the spec is the idea that we'll build images regularly to pick up updates and to avoid being stuck on insecure software unexpectedly
19:16:50 <corvus> i guess i should have used the topic on that
19:16:59 <clarkb> fungi: to make that useful we also need to update the running deployments with the new images
19:17:19 <clarkb> #link https://review.openstack.org/#/c/609556/ Install ansible 2.7.0 release on bridge.openstack.org
19:18:03 <clarkb> mostly I think the work here needs a few reviews. The topic:puppet-4 stuff is largely blocked on reviews as well
19:18:24 <clarkb> if we can find time to take a look at topic:puppet-4 and topic:update-cfg-mgmt we should be able to make some pretty big progress over the short term
19:18:42 <dmsimard> FWIW I sent patches to see what it would look like to enable ara on bridge.o.o, would love feedback https://review.openstack.org/#/q/topic:ara-on-bridge
19:19:11 <clarkb> #link https://review.openstack.org/#/q/topic:ara-on-bridge changes to run ara on the bridge server to visualize ansible runs there
19:19:37 <corvus> should we change the topic to update-cfg-mgmt?
19:19:49 <corvus> i think ara is in scope for that
19:19:52 <clarkb> corvus: dmsimard ++ I think this is an aid for that spec and could use the topic
19:19:58 <dmsimard> sure
19:20:35 <dmsimard> updated
19:21:44 <clarkb> Other than "please review as you can" any other items for this topic?
19:22:06 <corvus> re ara --
19:22:19 <corvus> do we need a mysql db for that or are we going to stick with sqlite for now?
19:22:35 <dmsimard> sqlite works well in the context of executors, each job is running on its own database
19:22:41 <dmsimard> the problem with sqlite is write concurrency
19:22:41 <fungi> sounded like there was a concurrent write issue with sqlite?
19:22:46 <fungi> yeah, that
19:23:03 <corvus> so we should have an infra-root set up a mysql db before we land those ara changes?
19:23:13 * mordred waves
19:23:28 <fungi> do we want trove or just orchestrate a mysql service onto the bridge server?
19:23:29 <clarkb> or add that to the changes and run one locally?
19:23:45 <dmsimard> corvus: the legwork is done in hostvars on bridge.o.o, pabelanger helped
19:24:12 <clarkb> dmsimard: meaning there is already a mysql db running somewhere?
19:24:24 <dmsimard> yes, a trove instance
19:25:40 <corvus> dmsimard: did he provide the connection info so we can update that change to use it?
19:25:41 <dmsimard> We can definitely start with sqlite, I'm just not sure how things might behave with concurrency
19:25:55 <corvus> let's not start with sqlite, let's use the trove which apparently exists :)
19:26:05 <clarkb> and it would be helpful if we remember to #status log changes like that which happen outside of config mgmt
19:26:16 <dmsimard> clarkb: my default
19:26:19 <dmsimard> fault*
19:26:44 <clarkb> corvus: ++
19:26:52 <dmsimard> corvus: the connection information is secret though, so it needs to live on bridge.o.o ?
19:27:20 <dmsimard> I guess we could add a node to the job, install mysql on it and use that for the testinfra tests
19:27:23 <corvus> dmsimard: only the password i think; paul should have added that to private hiera and given you the key he put it under so the change can reference that key
19:27:46 <corvus> i think sqlite is ok for testinfra
19:27:55 <fungi> if it's a trove instance, we usually keep the hostname secret too
19:28:05 <corvus> ok 2 things then :)
19:28:14 <fungi> since there is no connection security there
19:28:16 <dmsimard> corvus: I'm the one who committed the changes, they're in hostvars for bridge.o.o as well as ara.o.o
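[For context: ara (0.x at the time) reads its settings from an [ara] section in ansible.cfg, or from matching ARA_* environment variables, and takes an SQLAlchemy-style connection string. Pointing it at the trove instance would look roughly like this sketch — host and credentials are placeholders, not the real values:

    [ara]
    # Default is a local sqlite file; swapping in the trove endpoint avoids
    # sqlite's write-concurrency limits. Placeholder credentials only.
    database = mysql+pymysql://ara:SECRET@ara-db.example.net/ara
]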
19:28:30 <ianw> is there some potential for keys/passwords to make it into this?  do we care?
19:28:43 <fungi> any machine for any tenant in that same region can open sockets to the trove mysql service for it, after all
19:28:56 <dmsimard> pabelanger helped me through the process of doing that, he didn't do it for me, sorry for the confusion
19:28:59 <clarkb> ianw: that was going to be my next question. Do we need to use no_log type attributes on tasks to avoid data exposure?
19:29:00 <corvus> dmsimard: oh i thought you said paul did it.  ok then :)
19:29:21 <clarkb> maybe to start we should run it locally only, and we'd have to ssh port forward into bridge to see the web server?
19:29:27 <clarkb> to get a feel for what data is exposed
19:29:33 <corvus> clarkb: wfm
19:29:34 <dmsimard> clarkb: yes
19:29:58 <dmsimard> ianw: and yes, data can be exposed
19:30:02 <fungi> sounds fine
19:30:12 <fungi> (not publishing the ara reports i mean)
19:30:25 <clarkb> ok lets start with localhost only webserver to see what we are exposing, remediate it and learn from there then maybe make it public
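[The localhost-only approach just means reaching the report over an ssh tunnel, something like the following — the port number is an assumption, use whatever the ara webserver actually binds to:

    # forward a local port to the webserver listening on bridge's loopback
    ssh -L 9191:localhost:9191 bridge.openstack.org
    # then browse to http://localhost:9191/
]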
19:30:47 <corvus> and go ahead and have bridge report to mysql
19:30:47 <clarkb> (goal should be making it open in some capacity though)
19:30:52 <clarkb> corvus: yup
19:31:07 <ianw> oh, i wasn't really thinking publishing the website, i was more back at "data at rest not on bridge.o.o" tbh
19:31:17 <fungi> yeah, eventually it would be nice to get back what we lost with puppetboard being abandoned
19:31:20 <dmsimard> clarkb: the original spec was to replace puppetboard
19:31:24 <corvus> dmsimard: thanks for this, it's going to help a lot :)
19:32:03 <clarkb> ianw: ah the actual db contents themselves
19:32:20 <fungi> that's also a great point
19:32:38 <dmsimard> are the trove instances on the public internet ?
19:32:43 <clarkb> dmsimard: no
19:32:47 <fungi> possible we're compromising the security of bridge.o.o by putting the database somewhere not locally on that server
19:32:52 <clarkb> they are on the rax private network
19:32:56 <fungi> dmsimard: they might as well be though
19:33:10 <clarkb> which is "private"
19:33:12 <fungi> they're on the shared "private" network which all rackspace tenants can access
19:33:49 <dmsimard> if this is a concern, putting mysql on bridge.o.o is another option which would also happen to reduce latency and improve performance (i.e., no roundtrip to put stuff in a remote database far away)
19:33:49 <fungi> so you need access to a rackspace account/vm and to know (or guess) the database address and credentials
19:34:02 <fungi> or a pre-authentication zero-day vulnerability in mysql
19:34:14 <dmsimard> but then we might as well set up the web application there as well (instead of a dedicated ara.o.o server)
19:34:18 <ianw> yeah, also possibly the communications aren't encrypted?
19:34:19 <corvus> bridge is not sized correctly to even run ansible; there's no way we can add mysql there without a rebuild.
19:35:15 <fungi> ianw: no possible about it. the database connections aren't encrypted, so someone with the ability to subvert that network could also sniff them maybe? i can't remember whether they're sent in the clear or as a challenge/response
19:35:47 <clarkb> another option would be to run a mysql off host ourselves
19:35:53 <clarkb> then add encryption and such
19:35:57 <fungi> but also possible to inject into an established socket with some effort i guess
19:36:03 <clarkb> then we don't need to rebuild bridge.o.o for this
19:36:34 <dmsimard> Need to drop momentarily to pick up kids
19:36:38 <corvus> meh, we need to rebuild bridge anyway, so if we want to stick with on-host, i wouldn't count that as a strike against it
19:36:39 <clarkb> also possible trove has grown the feature to allow us to use tls/ssl for connections
19:36:44 <clarkb> corvus: fair enough
19:36:52 <fungi> maybe we use trove initially with the understanding that when (not if) we resize bridge (because it's already not appropriately-sized for ansible anyway) we add enough capacity to also put mysql on it?
19:36:58 <fungi> or also what corvus said
19:37:03 <clarkb> why don't we pick this back up in #openstack-infra when dmsimard can rejoin? and we can continue with the meeting agenda?
19:37:06 <ianw> it may also be that we consider what we'd be leaking low risk enough in the first place that it doesn't matter
19:37:13 <ianw> or what fungi said :) ++
19:37:34 <clarkb> I do think being careful around this is worthwhile while we work to understand what we are exposing
19:37:36 <corvus> yeah, we're not *supposed* to be putting private stuff in this db, the question is what happens if we accidentally do
19:37:54 <fungi> we're all just repeating one another at this point anyway, so yeah we can pick it back up in #-infra later ;)
19:38:13 <corvus> maybe we should pick it back up in #-infra later
19:38:19 <fungi> heh
19:38:21 <clarkb> #topic General topics
19:38:33 <clarkb> I'll take that as a cue
19:39:12 <clarkb> OpenDev is a thing and we now have some concrete early task items planned out. Step 0 is getting dns servers running for opendev.org (thank you corvus for getting this going)
19:39:35 <clarkb> before we can start using opendev.org everywhere we need to communicate what we mean by opendev and what the goals we have are
19:40:06 <clarkb> #link https://etherpad.openstack.org/p/infra-to-opendev-messaging is the document the infra team and foundation staff started working on yesterday to try to capture some of that
19:40:43 <clarkb> I will be working with the foundation staff to write up the short message with a Q&A and use that to send a community-wide email type thing. There is also a plan to talk about this at the summit, at least at the board meeting
19:41:10 <clarkb> My hunch is once we get through the summit we'll be able to start concretely using opendev and maybe etherpad becomes etherpad.opendev.org the week after summit
19:41:35 <clarkb> Still a lot of details to sort out, I expect we'll be reviewing documents in the near future to make sure they make sense from our perspective
19:41:58 <clarkb> and for anyone wondering the current working one liner is: "Community hosted tools and infrastructure for developing and maintaining open source software."
19:42:17 <clarkb> questions, concerns about ^?
19:43:38 <fungi> i like the specificity of "free/libre open source software" but as a first go it's not bad
19:44:09 <fungi> "community hosted and operated" might also be a good expansion
19:44:16 <clarkb> Feel free to reach out to me directly as well if this venue doesn't work for you
19:44:24 <clarkb> I'm reachable via email and irc
19:44:32 <fungi> something to make it more obvious we're running these things as a community, not just hosting them for communities
19:44:34 <clarkb> fungi: you should add those thoughts to the etherpad :)
19:44:36 <fungi> yup
19:44:46 <corvus> ++
19:45:38 <clarkb> Next item is the trusty upgrade sprint. Last week ended up being a weird week for this as various individuals had less availability than initially anticipated
19:46:08 <clarkb> That said I did manage to upgrade logstash.o.o, etherpad-dev.o.o, and etherpad.o.o and am working with shrews this week to delete nodepool.o.o and use our newer zookeeper cluster instead
19:46:17 <clarkb> Thank you to all of you that helped me with random reviews to make that possible
19:46:45 <clarkb> I'd like to keep pushing on this as a persistent thing because trusty EOL is about 6 months away
19:46:59 <clarkb> if we are able to do a handful of servers a week we should be done relatively quickly
19:47:26 <clarkb> #link https://etherpad.openstack.org/p/201808-infra-server-upgrades-and-cleanup list of trusty servers that need to be upgraded. Please put your name next to those you can help with
19:48:17 <clarkb> That brings up the last thing I wanted to talk about which is the zookeeper cluster move
19:48:42 <clarkb> I got the zookeeper cluster up and running as a cluster yesterday. Shrews and I will be moving the nodepool builders to the cluster after this meeting
19:48:58 <clarkb> Those will then run for a day and a half or so to populate images on the new db
19:49:27 <clarkb> Then on thursday I'd like us to update the zookeeper config on the nodepool launchers and zuul scheduler to use the new cluster
19:49:47 <clarkb> We will likely implement this as a full cluster shutdown (may as well get everything running latest code, and maybe zuul can make a release based on that)
19:50:24 <clarkb> hrm apparently thursday is the stein-1 milestone day though :/
19:50:26 <fungi> sounds great, thanks for working on that
19:50:37 <clarkb> we might have to wait for the milestone stuff to happen first?
19:50:43 <clarkb> I'll talk to the release team after this meeting
19:50:53 <fungi> milestone 1 is usually a non-event, especially now that openstack is no longer tagging milestones
19:51:03 <clarkb> oh in that case ya should be fine. But I will double check
19:51:29 <fungi> though they may force releases for cycle-with-intermediary projects which haven't released prior to the milestone week
19:51:30 <clarkb> Once all that is done we'll want to go back through and cleanup any images and nodepool nodes that are not automatically cleaned up
19:52:10 <clarkb> I'm volunteering myself to drive and do a lot of that because I do think this is an important switch and early cycle is the time to do it
19:52:20 <clarkb> you are now all warned and if you want to follow along or help let me know :)
19:52:25 <clarkb> #topic Open Discussion
19:52:34 <clarkb> And now ~7 minutes for anything else
19:52:45 <Shrews> we need to get them cleaned up since the image numbers will reset (to avoid name collisions), but we should have plenty of leeway on doing that
19:53:51 <clarkb> that's a good point, but ya ~1 year of nodepool with zk should give us a big runway for that
19:54:11 <Shrews> smallest sequence number i saw was 300 some, but still plenty
19:54:24 <corvus> we will definitely.  almost certainly.  probably.  get it all cleaned up in a year.
19:54:44 <fungi> i like your optimism!
19:56:44 <clarkb> if anyone is wondering, zk has a weird way of determining which ip address to listen on
19:57:03 <clarkb> you configure a list of all the cluster members and for the entry that belongs to the current host it binds to the address in that entry
19:57:23 <clarkb> so if you use hostnames that resolve to localhost because of /etc/hosts then you only listen on localhost and can't talk to your cluster
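[A sketch of the relevant zoo.cfg lines, with illustrative hostnames, showing the pitfall clarkb describes:

    # the same server list is configured on every cluster member
    # server.N=hostname:quorum-port:leader-election-port
    server.1=zk01.openstack.org:2888:3888
    server.2=zk02.openstack.org:2888:3888
    server.3=zk03.openstack.org:2888:3888
    # On zk01, zookeeper binds ports 2888/3888 to whatever address its own
    # entry (server.1) resolves to -- so an /etc/hosts line mapping
    # zk01.openstack.org to 127.0.0.1 leaves them listening on localhost only.
]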
19:58:15 <clarkb> looks/sounds like we are basically done here. Thank you everyone. Find us on the openstack-infra mailing list or in #openstack-infra for further discussion
19:58:23 <clarkb> I'll go ahead and end the meeting now
19:58:25 <clarkb> #endmeeting