#airshipit log

14:00:16 <mattmceuen> #startmeeting Airship
14:00:17 <openstack> Meeting started Tue Feb 12 14:00:16 2019 UTC and is due to finish in 60 minutes.  The chair is mattmceuen. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:19 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:20 <mattmceuen> #topic rollcall
14:00:21 <openstack> The meeting name has been set to 'airship'
14:00:37 <mattmceuen> Here's our agenda for today folks: https://etherpad.openstack.org/p/airship-meeting-2019-02-12
14:01:02 <mattmceuen> Please go ahead and add anything you'd like to discuss as folks trickle in
14:01:52 <roman_g> o/
14:01:56 <georgk> hi
14:01:58 <b-str> o/
14:02:24 <nishant_> o/
14:02:31 <mattmceuen> GM all :)
14:03:00 <mattmceuen> #topic Treasuremap Snapshot
14:03:07 <mattmceuen> https://github.com/openstack/airship-treasuremap/releases
14:03:12 <michael-beaver> o/ Having some connection problems, hopefully don't get cut out in the middle of the meeting
14:03:26 <mattmceuen> good luck michael-beaver
14:03:28 <srwilkers> o/
14:03:45 <mattmceuen> we have a new cut of our approximately-monthly Airship stable release in the airship-treasuremap project
14:04:10 <mattmceuen> this is recommended for new deployments.  Please let us know if any issues are discovered in it; I know issues have been fixed in it
14:05:08 <mattmceuen> #topic Armada/Tiller locking
14:05:21 <mattmceuen> Next -- michael-beaver has been working on https://review.openstack.org/632483
14:05:39 <michael-beaver> Thanks mattmceuen, this has been a change i've been working on for a little while
14:05:43 <mattmceuen> Michael, can you give us an overview of this work, its purpose & status?
14:05:55 <michael-beaver> The locking change is a new feature that was made mainly to keep multiple instances of armada from attempting to make conflicting actions against the cluster at the same time.
14:06:04 <michael-beaver> Although the functionality can be used to lock down any function
14:06:19 <michael-beaver> It uses the K8s API to create a custom resource defining the lock, and continuously updates it while the locked function is still running.
14:06:54 <michael-beaver> Currently I am waiting on reviews for it as I believe it is pretty close to being finished
14:07:04 <mattmceuen> To put you on the spot - do you happen to have an example document for the CRD handy? :)
14:07:59 <michael-beaver> Haha, sorry I don't have my VM running at the moment, but at the moment it is pretty basic, it just has a field to keep track of when it was last updated
14:08:08 <michael-beaver> Although I tried to design it so if we wanted to put more data there
14:08:16 <mattmceuen> No worries, maybe you can share it in the channel later?
14:08:26 <michael-beaver> It could easily be used to store metadata about what is currently holding the lock
14:08:32 <michael-beaver> Sure, I'll try to pull one up in the mean time
14:09:01 <nishant_> I was going through the PS yesterday Michael,  will continue to review it today.  good work  on it. Would love to see any documentation added for this feature. :)
14:09:33 <michael-beaver> Thanks nishant_
14:09:58 <mattmceuen> I think another good piece of doc would be, in the commit message too, why do we want multiple Armadas in the cluster!
14:10:41 <mattmceuen> I think the main reason is that the single armada pod is a single point of failure, but that's not explicitly stated anywhere, and might not be obvious to someone coming in w/o the background
14:10:57 <michael-beaver> Sure mattmceuen, I can work on wording up something like that
14:11:24 <mattmceuen> Really nice work michael-beaver -- thanks for the work on this, very cool!
14:11:37 <mattmceuen> any other questions for michael-beaver on this, team?
14:12:01 <nishant_> Yes I think it was also developed from security point of view. So would be good to have it mentioned in the commit message.
14:12:43 <michael-beaver> That's a good point, the other main idea of it was so we could have tiller live in the armada pod so it does not communicate over insecure channels
14:13:18 <mattmceuen> michael-beaver yup
14:14:16 <michael-beaver> Any other questions or comments?
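
The locking scheme michael-beaver describes above (a lock record that carries a last-updated timestamp and is refreshed while the locked function runs) can be sketched roughly as follows. This is a minimal illustration, not Armada's implementation: a plain dict stands in for the Kubernetes custom resource, and the `LeaseLock` class, the `holder` field, and the 60-second expiry are all assumptions made for the example.

```python
import time


class LockHeld(Exception):
    """Raised when another holder owns a lock that has not yet expired."""


class LeaseLock:
    """Lease-style lock; `store` is a dict standing in for the K8s API (assumption)."""

    def __init__(self, store, name, holder, expiry_seconds=60, clock=time.time):
        self.store = store
        self.name = name
        self.holder = holder                  # hypothetical metadata field
        self.expiry_seconds = expiry_seconds  # hypothetical expiry window
        self.clock = clock

    def acquire(self):
        """Take the lock unless another holder refreshed it recently."""
        now = self.clock()
        current = self.store.get(self.name)
        if current is not None:
            expired = now - current["last_updated"] > self.expiry_seconds
            if current["holder"] != self.holder and not expired:
                raise LockHeld(f"{self.name} is held by {current['holder']}")
        self.store[self.name] = {"holder": self.holder, "last_updated": now}

    def renew(self):
        """Refresh the timestamp; call periodically while the work runs."""
        current = self.store.get(self.name)
        if current is None or current["holder"] != self.holder:
            raise LockHeld(f"{self.name} is no longer held by {self.holder}")
        current["last_updated"] = self.clock()

    def release(self):
        """Delete the lock record if this holder still owns it."""
        current = self.store.get(self.name)
        if current is not None and current["holder"] == self.holder:
            del self.store[self.name]
```

A holder that crashes stops renewing, so its lock ages out and another instance can take over. A dict offers no atomicity across processes; a real version would have to rely on the API server rejecting a conflicting create or update.
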
14:14:35 <levmorgan> Hello, I'm here. Matt, do you still want me to go over the additions to Pegleg?
14:14:44 <mattmceuen> Hey levmorgan, perfect timing :)
14:14:48 <mattmceuen> #topic Recent Pegleg Enhancements
14:15:14 <mattmceuen> Yup, you and some other folks have been doing several interlocking pieces of work in Pegleg recently, would you mind giving us a bit of overview & status?
14:15:52 <levmorgan> Hey, great! So, there have been three main additions to Pegleg. Certificate generation, passphrase generation, and genesis bundle generation (which is still in review).
14:16:44 <levmorgan> Certificate generation uses cfssl to generate certs and keypairs, which are self-signed and encrypted by default.
14:18:27 <levmorgan> Passphrase generation has two parts. It can generate a master passphrase, which can be used to encrypt documents, or it can generate passphrase documents as specified in a passphrase catalog.
14:19:16 <mattmceuen> A primary use case for those certs is for Kubernetes components & etcd -- currently those certs are generated by Promenade, but it makes sense to move that functionality into Pegleg as the external yaml management CLI
14:19:53 <levmorgan> Yep. As for genesis bundle generation, we've pulled in the functionality from Promenade as well.
14:19:54 <mattmceuen> and for the passphrases, a primary use case is for service passwords e.g. for databases and openstack service users
14:20:10 <levmorgan> Yep, exactly.
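
The catalog-driven passphrase generation levmorgan describes can be pictured with a short sketch. Everything here is illustrative rather than Pegleg's actual code or document schema: the catalog is assumed to be a list of entries with a `name` and an optional `length`, and the default length of 24 is likewise an assumption.

```python
import secrets
import string

# Character set for generated passphrases (an assumption for this sketch).
ALPHABET = string.ascii_letters + string.digits


def generate_passphrase(length=24):
    """Return a random passphrase drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def generate_from_catalog(catalog):
    """Generate one passphrase per catalog entry, keyed by the entry's name."""
    return {
        entry["name"]: generate_passphrase(entry.get("length", 24))
        for entry in catalog
    }


# Hypothetical catalog entries for a database and an OpenStack service user.
example_catalog = [
    {"name": "example_db_password", "length": 32},
    {"name": "example_service_user_password"},
]
generated = generate_from_catalog(example_catalog)
```

In the workflow discussed in the meeting, the generated values would then be encrypted and committed alongside the other site YAML; that step is left out here.
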
14:20:47 <mattmceuen> That's awesome levmorgan - nice work and looking forward to using
14:21:09 <mattmceuen> If folks want some more info on Pegleg secret management, more deets can be found in this spec: https://github.com/openstack/airship-specs/blob/master/specs/approved/pegleg-secrets.rst
14:21:58 <mattmceuen> The primary use case for the certificate generation is for "cluster internal" certificates, but for a lab, pegleg could generate the externally-accessible ingress certificates as well, right levmorgan?
14:22:06 <mattmceuen> with all the caveats that come with using self-signed certificates
14:23:18 <georgk> mattmceuen: if I may hook in here
14:23:23 <levmorgan> Yep
14:23:28 <mattmceuen> Basically, the pegleg passphrase catalog doc and pki catalog doc are general-purpose descriptions that tell pegleg whatever secrets you want it to generate (and encrypt), and after they're generated, they can be added to your SCM repo
14:23:31 <mattmceuen> along with other yaml
14:23:35 <mattmceuen> Go for it georgk
14:24:00 <georgk> well, regarding self-signing. our folks are trying to deploy airship and are struggling with the certs
14:24:09 <mattmceuen> what kind of issues are you seeing?
14:24:52 <georgk> currently, the deployment fails when bringing up barbican because of an SSL handshake mismatch
14:25:02 <georgk> we assume it is due to incorrect certs
14:25:31 <georgk> so, we are wondering if we can work with self-signed certs or need a full ca chain
14:25:47 <georgk> is there a "lab" switch?
14:26:14 <mattmceuen> hmm that's odd - barbican is not something we expose externally in airship so it seems like it might be a problem with the internal certificates that are generated -- today by promenade, soon by pegleg
14:26:29 <mattmceuen> self-signed certs should be fine for that
14:27:09 <mattmceuen> are you using the promenade pkicatalog to generate the certs -- I believe this is how airship-seaworthy is set up today, or are you substituting in custom certificates that you generate out-of-band?
14:27:53 <georgk> after running into this issue, we have started to generate certs out-of-band
14:28:09 <georgk> https://hastebin.com/iriloweqif.sql
14:29:52 <mattmceuen> Here's a bit of documentation on the normal certificate generation process that would be done as part of site definition: https://airship-treasuremap.readthedocs.io/en/latest/authoring_and_deployment.html#building-site-documents
14:29:53 <georgk> I don't want to hijack the entire meeting - we can follow this up later
14:30:31 <mattmceuen> Cool - thanks for the heads up on the issue, definitely looks cert-related but I don't see any smoking gun.  We'll need to dig in a little more post-meeting I think
14:31:06 <georgk> ok, catch you later
14:31:18 <mattmceuen> #topic New Shipyard pod structure
14:31:29 <mattmceuen> b-str this is you -- enlighten us!
14:31:33 <b-str> Shipyard's k8s pod configuration has changed to combine several items and eliminate a few others.
14:31:41 <b-str> Previously, the following pods existed:
14:31:42 <b-str> shipyard-api (1 container, Shipyard api)
14:31:42 <b-str> airflow-web (1 container, Airflow api and gui)
14:31:42 <b-str> airflow-scheduler (1 container, Airflow celery instance)
14:31:42 <b-str> airflow-flower (1 container, Airflow celery flower instance)
14:31:42 <b-str> airflow-worker (StatefulSet, 2 containers, Airflow worker; logrotate container)
14:32:07 <b-str> The first issue being resolved was that the airflow-scheduler and airflow-worker are both required to keep a workflow running.
14:32:16 <b-str> If the scheduler was updated, the workers would no longer receive new steps to process until the scheduler was running again.
14:32:35 <b-str> Worse, if the scheduler was updated such that it was incompatible with the version of Airflow running in the worker, everything was effectively stopped.
14:32:48 <b-str> A manual deletion of the Workers would rectify the situation, but the running workflow failed.
14:33:04 <b-str> Invocation of an update_site or update_software action in Shipyard deletes/recreates the Airflow Worker after the workflow process has concluded, but this did not occur if the scheduler and worker were out of sync.
14:33:30 <b-str> The second issue was whether Airflow needs to be exposed outside the Airship deployment, which requires further configuration of the GUI components.
14:33:48 <b-str> Since Shipyard provides the entrypoint for Airship, it was decided to eliminate this exposure of the Airflow GUI components.
14:34:08 <b-str> The new pod configuration is:
14:34:08 <b-str> shipyard-api (2 containers, Shipyard api; Airflow api and gui)
14:34:08 <b-str> airflow-worker (StatefulSet, 3 containers, Airflow worker; logrotate container; Airflow scheduler)
14:34:27 <b-str> Such that:
14:34:44 <b-str> - The airflow api and gui container is bound to localhost, such that it is only accessible to Shipyard's api
14:35:01 <b-str> - The airflow worker and scheduler are now in the same StatefulSet, so they share the same lifecycle.
14:35:10 <b-str> - The airflow-flower pod has been removed from deployment with Shipyard, as it was superfluous to the deployment.
14:35:20 <b-str> - An airflow-scheduler pod is left in service for the upgrade process, and is removable after being on the new version.
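
The before-and-after pod layouts b-str lists above can be written down as data to make the lifecycle change concrete. This is only a sketch: the container labels are shorthand invented for the example, not actual container names from the Shipyard chart, and the transitional airflow-scheduler pod kept for the upgrade is left out of the new layout.

```python
# Pod name -> containers, following the lists given in the meeting.
OLD_LAYOUT = {
    "shipyard-api": ["shipyard-api"],
    "airflow-web": ["airflow-web"],
    "airflow-scheduler": ["airflow-scheduler"],
    "airflow-flower": ["airflow-flower"],
    "airflow-worker": ["airflow-worker", "logrotate"],
}

NEW_LAYOUT = {
    "shipyard-api": ["shipyard-api", "airflow-web"],
    "airflow-worker": ["airflow-worker", "logrotate", "airflow-scheduler"],
}


def share_lifecycle(layout, first, second):
    """True if both containers run in the same pod, so they are updated together."""
    return any(
        first in containers and second in containers
        for containers in layout.values()
    )


def removed_containers(old, new):
    """Containers present in the old layout but absent from the new one."""
    old_set = {c for containers in old.values() for c in containers}
    new_set = {c for containers in new.values() for c in containers}
    return sorted(old_set - new_set)
```

With these definitions, `share_lifecycle(OLD_LAYOUT, "airflow-worker", "airflow-scheduler")` is `False`, the same call on `NEW_LAYOUT` is `True`, and `removed_containers(OLD_LAYOUT, NEW_LAYOUT)` returns `["airflow-flower"]`.
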
14:35:38 <b-str> Finally, another somewhat related change is to the way the Airflow image is built.
14:35:48 <b-str> It now builds the workflows from Shipyard directly into the image, as opposed to transferring them in at deployment time.
14:36:03 <b-str> Commits related to Shipyard and Airflow pod restructuring:
14:36:03 <b-str> https://github.com/openstack/airship-shipyard/commit/6b75c7119a850ee7f0c0651021076dfcf220e0f7 - Move airflow scheduler to worker statefulset
14:36:03 <b-str> https://github.com/openstack/airship-shipyard/commit/9725b0f337ebdc7523973b50afecbf4caf2b7e6c - Build workflows into Airflow image
14:36:03 <b-str> https://github.com/openstack/airship-shipyard/commit/a11e962eef5a5aa8f8fc15c4a324dfa6b2465061 - Move Airflow web container into Shipyard pod
14:36:16 <b-str> Questions?
14:37:02 <mattmceuen> " An airflow-scheduler pod is left in service for the upgrade process" -- that's the one that packs the scheduler and worker into the same pod, right?
14:37:32 <b-str> actually this is the old scheduler configuration, left in place to be the scheduler during the first upgrade
14:38:09 <b-str> after the end of the workflow, the statefulsets will be deleted and recreated, and will contain the new scheduler going forward
14:38:18 <mattmceuen> gotcha
14:38:29 <b-str> a follow on action would be to scale the independent scheduler to 0
14:38:52 <b-str> and/or otherwise eliminate it
14:40:30 <mattmceuen> so at some point in the future (maybe in the stable airship version that follows the one that includes your change) we'll take out the old scheduler, since it serves to bridge the gap between the old scheme and the new scheme - right?
14:40:47 * b-str yes, that's its only purpose
14:41:39 <mattmceuen> awesome - THANKS for all this explanation and detail, I'll make sure a note gets sent out on the ML as this is quite valuable
14:42:07 <mattmceuen> Upgrades are hard, you make them easy for the rest of us :)
14:42:19 <b-str> This has been running in Treasuremap for a few days/weeks, so it can be observed there.
14:42:44 <mattmceuen> Any other questions for b-str on this topic?
14:43:13 <mattmceuen> Thanks Bryan
14:43:17 <mattmceuen> #topic Weekly Meeting Time
14:43:41 <mattmceuen> Several people have approached me over the last few weeks asking if it would be possible to adjust our weekly meeting time
14:44:02 <mattmceuen> Not all of them are here (there is a correlation) but I wanted to gauge the general temperature for meeting times here
14:44:04 * b-str earlier?
14:44:14 * mattmceuen b-str :-|
14:44:58 <mattmceuen> Would folks be amenable to this?  Maybe a couple hours later, maybe several hours later?  Or prefer a different day?  Or... ?
14:45:11 <mattmceuen> I'll send something out on the ML, just looking for opinions
14:45:16 <michael-beaver> I wouldn't mind it being later in the day
14:45:23 <levmorgan> Sure, later works for me.
14:46:22 <mattmceuen> Any concerns with that?  We're spread across time zones and I don't want to make it unduly hard for anyone who wants to come regularly, if possible
14:47:01 <mattmceuen> I'll propose some candidate times on the ML and see how it goes over
14:47:06 <sthussey> Anytime is fine with me as long as it is within business hours
14:47:35 <mattmceuen> Cool beans - more to come then
14:47:53 <mattmceuen> #topic Host next week
14:48:18 <mattmceuen> I'll be in a meeting during this time next week; I hope to join and multitask, but dwalt has volunteered to drive discussion
14:48:34 <mattmceuen> #topic State of Airship governance docs
14:48:54 <mattmceuen> ty for bringing this up georgk, this is something we need to get on top of
14:49:30 <mattmceuen> alanmeadows is planning to drive this in advance of the Denver summit, so we have something to discuss there
14:49:58 <georgk> ok, thanks
14:50:24 <georgk> I was digging through a few etherpads, but wasn't sure which one reflects the latest state
14:50:24 <mattmceuen> I don't think he's here today, but I'll circle back with him to make sure his proposal is written down and shared soon
14:50:39 <mattmceuen> the joys of etherpads :)
14:50:42 <georgk> yeah
14:50:50 <georgk> thanks
14:51:25 <roman_g> that's not what we have in airshipit readthedocs, right?
14:51:39 <roman_g> airshipit.readthedocs.io
14:52:06 <mattmceuen> I think what georgk is talking about is more a proposal for formal governance.  Let me elaborate:
14:52:51 <mattmceuen> Airship is one of the OSF pilot projects, and one of the things it needs to develop and get in order prior to becoming a full-fledged project is documentation of its governance -- its formal operating principles
14:53:38 <mattmceuen> Things like following the Four Opens, the structure around cores and their responsibilities, the way that the project is steered, that kind of thing
14:53:53 <roman_g> got it, thanks
14:53:58 <mattmceuen> It's the kind of thing that we should get ideas / draft out for discussion way in advance
14:54:00 <georgk> right
14:54:32 <mattmceuen> #topic Requests for Review
14:54:41 <mattmceuen> A few PS that nishant_ added:
14:54:45 <mattmceuen> Helm-toolkit for DB initialization
14:54:45 <mattmceuen> https://review.openstack.org/#/c/634625/
14:54:45 <mattmceuen> https://review.openstack.org/#/c/635348/
14:54:45 <mattmceuen> https://review.openstack.org/#/c/634981/
14:55:24 <mattmceuen> I take it those are helm-toolkit functions for making postgres operations easier across airship (and anyone else who uses the osh postgres chart) right nishant_?
14:55:24 <nishant_> So i have been working on multiple Airship components to validate if password rotation works fine.
14:55:53 <nishant_> During validation I found that Airship components using Postgres need some modification for password rotation to work effectively
14:56:20 <nishant_> and with some discussion i was advised to in turn create a helm-toolkit for DB initialization
14:56:51 <nishant_> which makes it easier for components to perform postgres operations
14:57:17 <nishant_> so you are right mattmceuen
14:57:24 <mattmceuen> awesome - thanks for making your work reusable, effective password rotation is critical to real life ops
14:57:36 <mattmceuen> reviews please :D
14:57:41 <nishant_> I am looking for any feedback and reviews on it. Thanks :)
14:57:54 <levmorgan> I have a couple as well:
14:58:00 <levmorgan> https://review.openstack.org/#/c/634821/ and https://review.openstack.org/#/c/613392/
14:58:02 <mattmceuen> go for it levmorgan
14:58:35 <levmorgan> The first one is an improvement for passphrase generation, the second one is for genesis bundle generation.
14:58:48 <levmorgan> Thanks!
14:59:36 <mattmceuen> great - I've added them to the agenda for anyone who couldn't make it
14:59:44 <mattmceuen> with that I think we're out of time!
14:59:57 <mattmceuen> Thanks everyone for coming, appreciate your discussion
15:00:02 <mattmceuen> Have a great day & week all
15:00:12 <mattmceuen> #endmeeting