14:00:16 #startmeeting Airship
14:00:17 Meeting started Tue Feb 12 14:00:16 2019 UTC and is due to finish in 60 minutes. The chair is mattmceuen. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:19 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:20 #topic rollcall
14:00:21 The meeting name has been set to 'airship'
14:00:37 Here's our agenda for today folks: https://etherpad.openstack.org/p/airship-meeting-2019-02-12
14:01:02 Please go ahead and add anything you'd like to discuss as folks trickle in
14:01:52 o/
14:01:56 hi
14:01:58 o/
14:02:24 o/
14:02:31 GM all :)
14:03:00 #topic Treasuremap Snapshot
14:03:07 https://github.com/openstack/airship-treasuremap/releases
14:03:12 o/ Having some connection problems, hopefully don't get cut out in the middle of the meeting
14:03:26 good luck michael-beaver
14:03:28 o/
14:03:45 we have a new cut of our approximately-monthly Airship stable release in the airship-treasuremap project
14:04:10 this is recommended for new deployments. Please let us know if any issues are discovered in it; I know issues have been fixed in it
14:05:08 #topic Armada/Tiller locking
14:05:21 Next -- michael-beaver has been working on https://review.openstack.org/632483
14:05:39 Thanks mattmceuen, this has been a change I've been working on for a little while
14:05:43 Michael, can you give us an overview of this work, its purpose & status?
14:05:55 The locking change is a new feature that was made mainly to keep multiple instances of armada from attempting to take conflicting actions against the cluster at the same time.
14:06:04 Although the functionality can be used to lock down any function
14:06:19 It uses the K8s API to create a custom resource defining the lock, and continuously updates it while the function being locked is still running.
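A minimal sketch of the locking idea described above, not the code under review. It keeps the lock record in memory instead of calling the K8s API. `LockRecord`, `InMemoryLockStore`, the `holder` field and the expiry rule are all assumptions made for illustration; the log only confirms that the lock resource carries a last-updated field that is refreshed while the locked function runs.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LockRecord:
    """Stand-in for the lock custom resource (field names are hypothetical)."""
    holder: str
    last_updated: float


class InMemoryLockStore:
    """Hypothetical stand-in for the Kubernetes API server holding lock resources."""

    def __init__(self, expiry_seconds: float) -> None:
        # Assumption: a lock that has not been refreshed for this long is stale.
        self.expiry_seconds = expiry_seconds
        self._locks: Dict[str, LockRecord] = {}

    def _is_fresh(self, record: LockRecord, now: float) -> bool:
        return now - record.last_updated < self.expiry_seconds

    def acquire(self, name: str, holder: str, now: Optional[float] = None) -> bool:
        """Take the lock unless a fresh record already exists (not reentrant)."""
        now = time.time() if now is None else now
        current = self._locks.get(name)
        if current is not None and self._is_fresh(current, now):
            return False
        self._locks[name] = LockRecord(holder=holder, last_updated=now)
        return True

    def heartbeat(self, name: str, holder: str, now: Optional[float] = None) -> bool:
        """Refresh last_updated; only the current holder may do this."""
        now = time.time() if now is None else now
        current = self._locks.get(name)
        if current is None or current.holder != holder:
            return False
        current.last_updated = now
        return True

    def release(self, name: str, holder: str) -> bool:
        """Delete the lock record if it belongs to the caller."""
        current = self._locks.get(name)
        if current is None or current.holder != holder:
            return False
        del self._locks[name]
        return True
```

In this sketch a second instance is refused while the first keeps refreshing the record, and can take over only once the record goes stale or is released. A real implementation would also have to handle concurrent writes to the resource, which this single-process model ignores.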
14:06:54 Currently I am waiting on reviews for it as I believe it is pretty close to being finished
14:07:04 To put you on the spot - do you happen to have an example document for the CRD handy? :)
14:07:59 Haha, sorry I don't have my VM running at the moment, but for now it is pretty basic; it just has a field to keep track of when it was last updated
14:08:08 Although I tried to design it so if we wanted to put more data there
14:08:16 No worries, maybe you can share it in the channel later?
14:08:26 It could easily be used to store metadata about what is currently holding the lock
14:08:32 Sure, I'll try to pull one up in the meantime
14:09:01 I was going through the PS yesterday Michael, will continue to review it today. Good work on it. Would love to see any documentation added for this feature. :)
14:09:33 Thanks nishant_
14:09:58 I think another good piece of doc would be, in the commit message too, why do we want multiple Armadas in the cluster!
14:10:41 I think the main reason is that the single armada pod is a single point of failure, but that's not explicitly stated anywhere, and might not be obvious to someone coming in w/o the background
14:10:57 Sure mattmceuen, I can work on wording up something like that
14:11:24 Really nice work michael-beaver -- thanks for the work on this, very cool!
14:11:37 any other questions for michael-beaver on this, team?
14:12:01 Yes I think it was also developed from a security point of view. So would be good to have it mentioned in the commit message.
14:12:43 That's a good point, the other main idea of it was so we could have tiller live in the armada pod so it does not communicate over insecure channels
14:13:18 michael-beaver yup
14:14:16 Any other questions or comments?
14:14:35 Hello, I'm here. Matt, do you still want me to go over the additions to Pegleg?
14:14:44 Hey levmorgan, perfect timing :)
14:14:48 #topic Recent Pegleg Enhancements
14:15:14 Yup, you and some other folks have been doing several interlocking pieces of work in Pegleg recently, would you mind giving us a bit of overview & status?
14:15:52 Hey, great! So, there have been three main additions to Pegleg. Certificate generation, passphrase generation, and genesis bundle generation (which is still in review).
14:16:44 Certificate generation uses cfssl to generate certs and keypairs, which are self-signed and encrypted by default.
14:18:27 Passphrase generation has two parts. It can generate a master passphrase, which can be used to encrypt documents, or it can generate passphrase documents as specified in a passphrase catalog.
14:19:16 A primary use case for those certs is for Kubernetes components & etcd -- currently those certs are generated by Promenade, but it makes sense to move that functionality into Pegleg as the external yaml management CLI
14:19:53 Yep. As for genesis bundle generation, we've pulled in the functionality from Promenade as well.
14:19:54 and for the passphrases, a primary use case is for service passwords e.g. for databases and openstack service users
14:20:10 Yep, exactly.
14:20:47 That's awesome levmorgan - nice work and looking forward to using it
14:21:09 If folks want some more info on Pegleg secret management, more deets can be found in this spec: https://github.com/openstack/airship-specs/blob/master/specs/approved/pegleg-secrets.rst
14:21:58 The primary use case for the certificate generation is for "cluster internal" certificates, but for a lab, pegleg could generate the externally-accessible ingress certificates as well, right levmorgan?
14:22:06 with all the caveats that come with using self-signed certificates
14:23:18 mattmceuen: if I may hook in here
14:23:23 Yep
14:23:28 Basically, the pegleg passphrase catalog doc and pki catalog doc are general-purpose descriptions that tell pegleg whatever secrets you want it to generate (and encrypt), and after they're generated, they can be added to your SCM repo
14:23:31 along with other yaml
14:23:35 Go for it georgk
14:24:00 well, regarding self-signing. our folks are trying to deploy airship and are struggling with the certs
14:24:09 what kind of issues are you seeing?
14:24:52 currently, the deployment fails when bringing up barbican because of an SSL handshake mismatch
14:25:02 we assume it is due to incorrect certs
14:25:31 so, we are wondering if we can work with self-signed certs or need a full ca chain
14:25:47 is there a "lab" switch?
14:26:14 hmm that's odd - barbican is not something we expose externally in airship so it seems like it might be a problem with the internal certificates that are generated -- today by promenade, soon by pegleg
14:26:29 self-signed certs should be fine for that
14:27:09 are you using the promenade pkicatalog to generate the certs -- I believe this is how airship-seaworthy is set up today, or are you substituting in custom certificates that you generate out-of-band?
14:27:53 after running into this issue, we have started to generate certs out-of-band
14:28:09 https://hastebin.com/iriloweqif.sql
14:29:52 Here's a bit of documentation on the normal certificate generation process that would be done as part of site definition: https://airship-treasuremap.readthedocs.io/en/latest/authoring_and_deployment.html#building-site-documents
14:29:53 I don't want to hijack the entire meeting - we can follow this up later
14:30:31 Cool - thanks for the heads up on the issue, definitely looks cert-related but I don't see any smoking gun.
We'll need to dig in a little more post-meeting I think
14:31:06 ok, catch you later
14:31:18 #topic New Shipyard pod structure
14:31:29 b-str this is you -- enlighten us!
14:31:33 Shipyard's k8s pod configuration has changed to combine several items and eliminate a few others.
14:31:41 Previously, the following pods existed:
14:31:42 shipyard-api (1 container, Shipyard api)
14:31:42 airflow-web (1 container, Airflow api and gui)
14:31:42 airflow-scheduler (1 container, Airflow celery instance)
14:31:42 airflow-flower (1 container, Airflow celery flower instance)
14:31:42 airflow-worker (StatefulSet, 2 containers, Airflow worker; logrotate container)
14:32:07 The first issue being resolved was that the airflow-scheduler and airflow-worker are both required to keep a workflow running.
14:32:16 If the scheduler was updated, the workers would no longer receive new steps to process until the scheduler was running again.
14:32:35 Worse, if the scheduler was updated such that it was incompatible with the version of Airflow running in the worker, everything was effectively stopped.
14:32:48 A manual deletion of the Workers would rectify the situation, but the running workflow failed.
14:33:04 Invocation of an update_site or update_software action in Shipyard deletes/recreates the Airflow Worker after the workflow process has concluded, but this did not occur if the scheduler and worker were out of sync.
14:33:30 The second issue was the question of whether Airflow needs to be exposed outside the Airship deployment at all, since doing so requires further configuration of the GUI components.
14:33:48 Since Shipyard provides the entrypoint for Airship, it was decided to eliminate this exposure of the Airflow GUI components.
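The first issue above can be sketched as a small model of why independently managed scheduler and worker pods are fragile. This is only an illustration: `Component`, `workflow_can_progress` and the version strings are hypothetical, and "incompatible" is simplified here to "versions differ".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One Airflow component as deployed in a pod (hypothetical model)."""
    name: str
    airflow_version: str
    running: bool = True


def workflow_can_progress(scheduler: Component, worker: Component) -> bool:
    """A workflow advances only if both components are up and compatible.

    Assumption: compatibility is modelled as an exact version match.
    """
    return (
        scheduler.running
        and worker.running
        and scheduler.airflow_version == worker.airflow_version
    )


# Separate lifecycles: the scheduler is updated while the worker keeps running.
worker = Component("airflow-worker", "old")
scheduler_restarting = Component("airflow-scheduler", "new", running=False)
scheduler_updated = Component("airflow-scheduler", "new")

print(workflow_can_progress(scheduler_restarting, worker))  # scheduler down
print(workflow_can_progress(scheduler_updated, worker))     # versions differ
```

Both calls evaluate to `False` in this model: the workflow stalls while the scheduler restarts, and stays stalled if the two end up on different versions. Tying the two components to one lifecycle removes the second case.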
14:34:08 The new pod configuration is:
14:34:08 shipyard-api (2 containers, Shipyard api; Airflow api and gui)
14:34:08 airflow-worker (StatefulSet, 3 containers, Airflow worker; logrotate container; Airflow scheduler)
14:34:27 Such that:
14:34:44 - The airflow api and gui container is bound to localhost, such that it is only accessible to Shipyard's api
14:35:01 - The airflow worker and scheduler are now in the same StatefulSet, so they share the same lifecycle.
14:35:10 - The airflow-flower pod has been removed from deployment with Shipyard, as it was superfluous to the deployment.
14:35:20 - An airflow-scheduler pod is left in service for the upgrade process, and is removable after being on the new version.
14:35:38 Finally, another somewhat related change is to the way the Airflow image is built.
14:35:48 It now builds the workflows from Shipyard directly into the image, as opposed to transferring them in at deployment time.
14:36:03 Commits related to Shipyard and Airflow pod restructuring:
14:36:03 https://github.com/openstack/airship-shipyard/commit/6b75c7119a850ee7f0c0651021076dfcf220e0f7 - Move airflow scheduler to worker statefulset
14:36:03 https://github.com/openstack/airship-shipyard/commit/9725b0f337ebdc7523973b50afecbf4caf2b7e6c - Build workflows into Airflow image
14:36:03 https://github.com/openstack/airship-shipyard/commit/a11e962eef5a5aa8f8fc15c4a324dfa6b2465061 - Move Airflow web container into Shipyard pod
14:36:16 Questions?
14:37:02 "An airflow-scheduler pod is left in service for the upgrade process" -- that's the one that packs the scheduler and worker into the same pod, right?
14:37:32 actually this is the old scheduler configuration, left in place to be the scheduler during the first upgrade
14:38:09 after the end of the workflow, the statefulsets will be deleted/recreated, and will contain the new scheduler going forward
14:38:18 gotcha
14:38:29 a follow on action would be to scale the independent scheduler to 0
14:38:52 and/or otherwise eliminate it
14:40:30 so at some point in the future (maybe in the stable airship version that follows the one that includes your change) we'll take out the old scheduler, since it serves to bridge the gap between the old scheme and the new scheme - right?
14:40:47 * b-str yes, that's its only purpose
14:41:39 awesome - THANKS for all this explanation and detail, I'll make sure a note gets sent out on the ML as this is quite valuable
14:42:07 Upgrades are hard, you make them easy for the rest of us :)
14:42:19 This has been running in Treasuremap for a few days/weeks, so it can be observed there.
14:42:44 Any other questions for b-str on this topic?
14:43:13 Thanks Bryan
14:43:17 #topic Weekly Meeting Time
14:43:41 Several people have approached me over the last few weeks asking if it would be possible to adjust our weekly meeting time
14:44:02 Not all of them are here (there is a correlation) but I wanted to gauge the general temperature for meeting times here
14:44:04 * b-str earlier?
14:44:14 * mattmceuen b-str :-|
14:44:58 Would folks be amenable to this? Maybe a couple hours later, maybe several hours later? Or prefer a different day? Or... ?
14:45:11 I'll send something out on the ML, just looking for opinions
14:45:16 I wouldn't mind it being later in the day
14:45:23 Sure, later works for me.
14:46:22 Any concerns with that?
We're spread across time zones and I don't want to make it unduly hard for anyone who wants to come regularly, if possible
14:47:01 I'll propose some candidate times on the ML and see how it goes over
14:47:06 Anytime is fine with me as long as it is within business hours
14:47:35 Cool beans - more to come then
14:47:53 #topic Host next week
14:48:18 I'll be in a meeting during this time next week; I hope to join and multitask, but dwalt has volunteered to drive discussion
14:48:34 #topic State of Airship governance docs
14:48:54 ty for bringing this up georgk, this is something we need to get on top of
14:49:30 alanmeadows is planning to drive this in advance of the Denver summit, so we have something to discuss there
14:49:58 ok, thanks
14:50:24 I was digging through a few etherpads, but wasn't sure which one reflects the latest state
14:50:24 I don't think he's here today, but I'll circle back with him to make sure his proposal is written down and shared soon
14:50:39 the joys of etherpads :)
14:50:42 yeah
14:50:50 thanks
14:51:25 that's not what we have in airshipit readthedocs, right?
14:51:39 airshipit.readthedocs.io
14:52:06 I think what georgk is talking about is more a proposal for formal governance.
Let me elaborate:
14:52:51 Airship is one of the OSF pilot projects, and one of the things it needs to develop and get in order prior to becoming a full-fledged project is documentation of its governance -- its formal operating principles
14:53:38 Things like following the Four Opens, the structure around cores and their responsibilities, the way that the project is steered, that kind of thing
14:53:53 got it, thanks
14:53:58 It's the kind of thing that we should get ideas / draft out for discussion way in advance
14:54:00 right
14:54:32 #topic Requests for Review
14:54:41 A few PS that nishant_ added:
14:54:45 Helm-toolkit for DB initialization
14:54:45 https://review.openstack.org/#/c/634625/
14:54:45 https://review.openstack.org/#/c/635348/
14:54:45 https://review.openstack.org/#/c/634981/
14:55:24 I take it those are helm-toolkit functions for making postgres operations easier across airship (and anyone else who uses the osh postgres chart) right nishant_?
14:55:24 So I have been working on multiple Airship components to validate if password rotation works fine.
14:55:53 During validation I found that Airship components using postgres need some modification for password rotation to work effectively
14:56:20 and with some discussion I was advised to in turn create a helm-toolkit for DB initialization
14:56:51 which makes it easier for components to perform postgres operations
14:57:17 so you are right mattmceuen
14:57:24 awesome - thanks for making your work reusable, effective password rotation is critical to real life ops
14:57:36 reviews please :D
14:57:41 I am looking for any feedback and reviews on it. Thanks :)
14:57:54 I have a couple as well:
14:58:00 https://review.openstack.org/#/c/634821/ and https://review.openstack.org/#/c/613392/
14:58:02 go for it levmorgan
14:58:35 The first one is an improvement for passphrase generation, the second one is for genesis bundle generation.
14:58:48 Thanks!
14:59:36 great - I've added them to the agenda for anyone who couldn't make it
14:59:44 with that I think we're out of time!
14:59:57 Thanks everyone for coming, appreciate your discussion
15:00:02 Have a great day & week all
15:00:12 #endmeeting