15:59:23 #startmeeting kolla
15:59:23 Meeting started Wed Mar 15 15:59:23 2017 UTC and is due to finish in 60 minutes. The chair is inc0. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:59:24 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:59:27 The meeting name has been set to 'kolla'
15:59:37 #topic rollcall, w00t
15:59:40 o/
15:59:43 you know what to do
15:59:48 o/ bonjour
15:59:50 o/
15:59:51 WOOT
15:59:57 oops
16:00:01 o/
16:00:28 o/
16:00:39 woot
16:00:44 woot o/
16:00:54 o/
16:01:03 o/
16:01:10 woot
16:01:15 woot
16:01:21 woot
16:01:22 o/
16:01:28 o/ w00t w00t
16:02:19 O/
16:02:23 #topic announcements
16:02:30 1. we released Ocata!
16:02:32 so many people today ;)
16:02:39 congrats everyone
16:03:18 gratz
16:03:19 Jeffrey4l: for DST
16:03:27 yay! more reviewers!
16:03:49 aha
16:03:55 2. One more week of voting for duonghq to become core; if anyone from the core team missed it, please vote
16:04:19 yep, will vote
16:04:20 thanks inc0
16:04:35 duonghq: congrats
16:04:43 so last week we cannibalized the regular agenda for the release discussion, so now let's get back to it
16:05:03 #topic Need to formalize policy around pushing to dockerhub
16:05:10 agree ^
16:05:14 formalize and automate
16:05:37 #link https://bugs.launchpad.net/kolla-ansible/+bug/1669075/comments/4
16:05:37 Launchpad bug 1669075 in kolla-ansible "kolla-ansible pull with kolla_ansible-4.0.0.0rc1 fails, because of missing tag in docker registry" [Low,Invalid]
16:05:42 regarding automation: i can add this to our jenkins instance
16:05:59 #link https://wiki.openstack.org/wiki/Meetings/Kolla#Agenda_for_next_meeting_.28Mar_8th_2017.29
16:06:18 RCs are unstable; pushing them will cause lots of issues, imo.
16:06:35 but won't they technically never be pulled
16:06:41 especially for hub.docker.com.
16:06:45 unless you're running an rc release of kolla-ansible/kubernetes?
16:06:58 but pushing them to tarballs.openstack.org is OK, i think.
16:07:01 Jeffrey4l: when you visit our docs, the master documents were published including the tag 4.0.0
16:07:08 alternatively, instead of rc
16:07:13 because of this david opened this bug
16:07:15 keep pushing stable/ocata
16:07:19 is there a reason we can't push to dockerhub alongside tarballs.o.o
16:07:22 with some meaningful tag
16:07:31 like 4.0.0-latest
16:07:44 the master branch is only usable when building your own images
16:08:00 i think what inc0 said makes a lot of sense; that means backports can make their way out much faster
16:08:06 pbourke, i want to know how to keep hub.docker.com credentials in ci.
16:08:26 yeah, and it also fixes what egonzalez mentioned on the main channel - some other project ships a critical fix
16:08:36 inc0, 4.0.0-latest is a good idea.
16:08:37 we have it upstream immediately
16:08:51 and :latest for master
16:09:04 berendt: my question is... what jenkins instance? :)
16:09:07 inc0: well technically, you wouldn't, unless you manually trigger stable/ (i could be wrong)?
16:09:19 inc0: the company one
16:09:30 not sure if we have to add it to the openstack jenkins
16:09:32 right
16:09:35 berendt, re docs, sorry, i do not get your point ;(
16:10:03 Jeffrey4l: david opened the bug because the kolla-ansible repository on the master branch is not usable without building your own images
16:10:07 so how about we create crontab entries and keep them in our repo
16:10:14 mnaser, we can, there is a periodic pipeline in zuul.
16:10:32 oh cool!
16:10:52 Jeffrey4l: really? so we can run a gate daily?
16:10:57 inc0, yep.
16:11:00 or rather, a job to build+push?
16:11:01 pretty sure.
16:11:01 cool
16:11:10 yes i recall now the periodic pipeline
16:11:29 https://docs.openstack.org/infra/system-config/zuul.html > periodic
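For context, a minimal sketch of what the periodic retag-and-push step under discussion could look like, using the stock Ansible docker_image module. The image name, the source tag, and the 4.0.0-latest scheme follow the discussion above; the job layout itself is hypothetical, and registry credentials would still have to come from infra's secret storage:

    # Hypothetical periodic task: retag a freshly built stable/ocata image and
    # push it to Docker Hub under the rolling 4.0.0-latest tag.
    - hosts: builder
      tasks:
        - name: Tag and push the keystone image as 4.0.0-latest
          docker_image:
            name: kolla/centos-binary-keystone:4.0.0.0rc2   # source tag is illustrative
            repository: kolla/centos-binary-keystone
            tag: 4.0.0-latest
            push: yes

The same shape would cover master with a :master (or :trunk) tag, as discussed just below.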
16:11:50 do we agree that we create the tag :4.0.0-latest for daily stable/ocata and :latest for daily master?
16:11:57 #link https://docs.openstack.org/infra/system-config/zuul.html
16:11:58 or maybe not latest
16:12:04 let's call it master or trunk
16:12:11 as latest is the default tag
16:12:12 would it be a lot more work to add newton? :X
16:12:21 no it wouldn't
16:12:25 we can do neutron too
16:12:36 neutron?
16:12:38 it would be quite beneficial (as ocata is still "fresh")
16:12:40 newton
16:12:40 i think he means newton :-P
16:12:42 sorry
16:12:54 I'm still waking up ;)
16:13:23 so i guess pushing per-branch tags is acceptable to everyone, right?
16:13:25 #action inc0: write bp for daily gerrit jobs
16:13:31 the tag name is not a big deal.
16:13:35 yeah
16:13:56 we can continue the discussion in the bp as usual
16:14:00 inc0: how are we going to get credentials into these jobs
16:14:03 another thing related to this: auto-bumping the service tag in source.
16:14:10 pbourke, good point.
16:14:23 pbourke: that's a good question, I'll check with infra about secret storage
16:14:31 inc0: cool
16:14:47 I think they have hiera (they need to ;))
16:14:55 maybe we can somehow tap into it
16:15:00 they do have hiera
16:15:23 pypi credentials are stored in there for example
16:15:29 cool.
16:15:41 mnaser, you know lots of things about ci?
16:16:05 * mnaser has been in openstack since 2011
16:16:13 wow
16:16:37 our cloud is running newton (but it started its life off as bexar actually) -- looking to get more involved but we can get into that later :)
16:16:48 mnaser: perfect person for '10y of openstack experience' offers
16:16:54 lol
16:16:59 mnaser: so from Bexar? :0
16:17:00 lol
16:17:09 lol
16:17:09 http://jeffrose.wpengine.netdna-cdn.com/wp-content/uploads/2011/12/dr.-evil-million-dollar-term-policy-300x241.jpg
16:17:20 ok let's move on
16:17:24 we started our first environment with bexar too, funny times
16:17:36 #topic drop root
16:17:43 duonghq: you're up
16:17:50 thanks inc0
16:18:05 I see we have 2 bugs related to the drop-root topic:
16:18:26 #info keystone https://bugs.launchpad.net/kolla/+bug/1576794
16:18:26 Launchpad bug 1576794 in kolla "drop root for keystone" [Critical,In progress] - Assigned to Surya Prakash Singh (confisurya)
16:18:39 #info crontab https://bugs.launchpad.net/kolla/+bug/1560744
16:18:39 Launchpad bug 1560744 in kolla "drop root for crontab" [Critical,Confirmed]
16:19:10 for crontab, I see that sdake commented it cannot be dropped on centos; for keystone, I'm not sure
16:19:21 inc0: first to check is whether this is a valid bug that needs to be fixed?
16:19:25 so if we can confirm the crontab one, I think we can close the bug
16:19:50 we have pbourke's comment too on the keystone one
16:19:57 that root can't be dropped
16:20:29 well, for keystone and other apache-based apis, it can't be dropped
16:20:40 afair
16:20:52 pbourke, what do you think?
16:20:57 would be interested in what the keystone guys have to say on this
16:21:20 suddenly forcing root on operators is a strange decision
16:21:28 regardless of the benefits brought by running behind apache
16:21:37 if it can run on a port >1024 then it should be doable without root
16:21:38 copied from the net: "Apache has to run as root initially in order to bind to port 80. If you don't run it as root initially then you cannot bind to port 80. If you want to bind to some port above 1024 then yes, you can."
16:21:38 pbourke: +1
16:21:42 https://superuser.com/questions/316705/running-apache-as-a-different-user
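To make the port question concrete, a hedged sketch of the two options being discussed; the image names, tags, and users are illustrative, and this uses the generic Ansible docker_container module rather than kolla's own tooling:

    # Option 1: apache-based APIs that only bind ports >1024 (keystone's
    # 5000/35357) can simply run as a non-root user.
    - name: Start keystone without root
      docker_container:
        name: keystone
        image: kolla/centos-binary-keystone:4.0.0   # illustrative tag
        user: keystone
        network_mode: host

    # Option 2: for a service keeping a port <1024 (horizon on 80),
    # docker's --cap-add NET_BIND_SERVICE is the relevant capability; note
    # though that with a non-root user the httpd binary would also need file
    # capabilities set, so this alone is not a complete answer.
    - name: Start horizon with only the bind capability added
      docker_container:
        name: horizon
        image: kolla/centos-binary-horizon:4.0.0    # illustrative tag
        user: horizon
        capabilities:
          - NET_BIND_SERVICE
        network_mode: host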
16:21:59 all the ports we are using now are >1024
16:22:13 Jeffrey4l: horizon is still 80/443
16:22:17 well, 80
16:22:27 oh, right. horizon is.
16:22:29 so, we can move it to a higher port, and drop root?
16:22:31 haproxy as well?
16:22:32 for the horizon backends
16:22:50 but technically we could run horizon on a port >1024 and bind 80 on haproxy
16:23:02 it's just not a backwards-compatible change, so let's not do it
16:23:02 seems like >1024 would be ok for dropping
16:23:07 inc0, haproxy is optional.
16:23:11 aio deployments might become a bit weird though ^
16:23:20 everything is optional ;)
16:23:27 but yeah, it can break stuff
16:23:28 kolla supports running without haproxy.
16:23:32 mnaser, in the default setting, AIO still uses haproxy
16:23:48 it seems like the root requirement is there regardless
16:23:49 yeah, and keepalived ;)
16:24:03 there's quite a few components which will need root at the end of the day
16:24:04 well, either way
16:24:08 we can still bind the port from the docker side
16:24:11 keystone shouldn't need it because of apache
16:24:11 so we can drop root for apache with ports >1024, right?
16:24:32 yes Jeffrey4l
16:24:53 duonghq: that's a good alternative, but we would need to drop net=host for the apis
16:24:59 which I wouldn't be opposed to
16:25:04 at linaro we only deploy nova/neutron/cinder/glance/horizon/keystone + openvswitch + ceph iirc
16:25:24 hrw, so?
16:25:41 inc0, hmm, forgot that, one of our goals
16:25:47 i like net=host being there. it makes life simple. once you get out of it, you have to start playing around with overlay networks and you start adding a lot of complexity (imho)
16:25:58 there is another parameter that may be helpful to drop root: docker run --cap-add
16:26:03 but i am not sure.
16:26:11 actually that's a really good suggestion
16:26:11 yeah, and also there were performance issues
16:26:15 how many people see this as a high priority?
16:26:29 pbourke, drop root, or?
16:26:31 as a deployer, i don't really care about the keystone container running as root (honestly)
16:26:42 breaking out of a container is a theoretical exploit... meanwhile we have world-readable passwords on all target nodes
16:26:58 btw, even though the keystone container runs as root, the keystone wsgi processes run as the keystone user.
16:27:03 httpd is going to be running as root in most other deployment methods in the first place, and the keystone processes fork under keystone
16:27:08 and getting into the container is arguably harder than rooting the host
16:27:18 as we don't run any services there besides the one we need
16:27:54 can i say: drop root is not a critical issue, but nice to have?
16:28:00 i would agree with that ^
16:28:07 I think so
16:28:16 but regardless, can we examine drop root for ks, as there doesn't seem to be a compelling reason why not?
16:28:30 it's still better to remove it
16:28:37 so if drop-root is possible for any container, and anyone is interested in this, please implement it :)
16:28:39 just not critical
16:28:40 Jeffrey4l: we can set its importance to medium?
16:28:43 sure
16:28:50 someone should investigate and update the docs if it's not currently feasible
16:28:56 yeah, let's make all drop-root bugs medium
16:29:08 medium, agree.
16:29:09 so we drop its "importance"?
16:29:15 lol
16:29:19 lol
16:29:22 duonghq: lol
16:29:34 I'll ask sdake later when I see him
16:29:38 about crontab
16:29:38 (and I bet *nobody* actually laughed out loud)
16:29:48 although we need to fix it anyway
16:30:04 inc0: +1
16:30:11 ya, alright
16:30:12 right, let's move on
16:30:19 yes, nobody
16:30:26 #topic canonical k8s deployment
16:31:02 so I think we don't have our canonical guys around
16:31:11 (do we have kolla-k8s people?)
16:31:22 Canonical the company? interesting
16:31:38 kfox1111 around?
16:32:11 ok, it seems we don't have quorum for that, pushing to next meeting
16:32:20 ;)
16:32:25 #topic open discussion
16:32:42 since we ran out of agenda items, anything needing our immediate attention?
16:32:44 I'm still deploying kolla-k8s and will update docs as needed.
16:32:45 can I?
16:32:52 duonghq: go ahead
16:33:12 forgot to add this to the agenda, I drafted a bp last week
16:33:14 #link https://blueprints.launchpad.net/kolla/+spec/unix-signals-handling
16:33:22 can you give me some comments?
16:33:47 hmm, where is the bot, I think the bot'll put the title
16:33:52 Unix signals handling in Kolla images
16:33:59 duonghq: first, we need to figure out which services allow sighup
16:34:10 second, that won't work with COPY_ONCE
16:34:16 duonghq: i think he doesn't because of the leading #link
16:34:31 berendt, roger
16:34:52 inc0, ya, but with COPY_ALWAYS it'll be a nice feature to reload settings w/o downtime
16:34:54 duonghq, have u tried sighup? it should work with dumb-init.
16:34:54 also, i think this is a bit of a weird situation because not all config values are reloaded
16:34:57 w/o restarting the container
16:35:17 so for example oslo_log might notice the change, but some other part of another component won't
16:35:33 Jeffrey4l, I'm not sure w/ dumb-init; with just a plain service, it's ok
16:35:48 so i think it's important to keep in mind the possible complexity that might be introduced in knowing which config values will reload and which ones won't
16:35:55 mnaser, ya, and we also have some services that support graceful shutdown by signal
16:36:14 i think graceful shutdown is miles more important, especially for cases like nova-compute for example
16:36:18 sighup should be handled properly, as long as the real service can handle it.
16:36:55 currently, we use sighup for haproxy config reload.
16:37:38 so i think this bp is already done ;)
16:37:39 yeah, sigkill is more important
16:37:41 but i think on reconfigure we should be sending a signal instead of just killing the container (unless docker already does that?)
16:38:06 mnaser, it depends on the argument we pass to docker
16:38:11 docker sends the stop signal and then waits out a timeout (30s I believe) before force termination
16:38:15 the signal indeed
16:38:26 inc0, 10s
16:38:31 gotcha inc0, that's good for nova-compute
16:38:44 but 10 seconds might be a bit too short, but i think that's another discussion
16:38:52 it is configurable.
16:39:00 for each container.
16:39:13 that's good to know, thanks Jeffrey4l
16:39:15 docker stop -t
16:39:33 but I don't believe we use this config
16:39:42 maybe that's a good bug for kolla_docker?
16:39:52 Jeffrey4l, should we figure out which services support SIGHUP to reload the whole service config, then pass the signal through to that service?
16:39:53 sorry to burst in late - we can also control the signals in k8s - so would be great to get kfox and sbezverk to have some input on that
16:39:57 kolla-ansible does not support this parameter so far.
16:40:24 portdirect: yeah, k8s is better in this space
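For reference, a sketch of the two mechanisms just mentioned, via the generic Ansible docker_container module and the docker CLI; the container names, image tags, and the 120s value are illustrative, and kolla_docker would need an equivalent parameter added:

    # Extend the grace period between the stop signal and SIGKILL beyond
    # docker's 10s default, which matters for slow-to-drain services.
    - name: Start nova_compute with a longer stop grace period
      docker_container:
        name: nova_compute
        image: kolla/centos-binary-nova-compute:4.0.0   # illustrative tag
        stop_timeout: 120   # equivalent of `docker stop -t 120 nova_compute`

    # Reload a config without a restart; dumb-init (PID 1) forwards the
    # signal to the real service, as is done today for haproxy.
    - name: Send SIGHUP to haproxy to reload its configuration
      command: docker kill --signal=HUP haproxy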
16:40:33 duonghq, not all parameters support SIGHUP, just part of them, iirc.
16:40:56 Jeffrey4l, is it a docker-py issue, a docker issue, or our issue?
16:41:05 our issue
16:41:21 wait 1 min. which issue are you talking about?
16:41:22 well, we don't allow overriding the 10s
16:41:31 that's it
16:41:52 i think a summary of what inc0 is saying is overriding the docker kill timeout for containers
16:42:11 (aka the time period from when it sends a signal to stop until it forcibly terminates the container)
16:42:12 1. kolla containers support sighup, it's passed to the real process; 2. the container is killed after 10s if it hasn't stopped.
16:42:50 and for 2 - let's add this config so we can extend the period for services like n-cpu or heat
16:43:04 inc0, ++
16:43:11 Jeffrey4l, just to be sure, we already support passing SIGHUP to the container?
16:43:27 as you're using dumb-init i believe it should happen automagically
16:43:31 inc0, +1
16:43:32 duonghq, yep. with dumb-init, SIGHUP is handled properly.
16:43:39 mnaser, Jeffrey4l roger
16:43:46 i have a few things to bring up if we're done with this
16:43:47 you can try it simply.
16:43:53 iirc, we're planning to move to another init
16:44:00 yeah, but correct me if I'm wrong, we don't *really* use sighup during reconfigure
16:44:01 tini?
16:44:03 but another thing is: not all parameters in nova.conf support SIGHUP.
16:44:08 inc0, yup
16:44:14 Jeffrey4l, of course
16:44:22 inc0, for haproxy, yes. others no.
16:44:29 it's as mnaser said: it makes things go weird
16:44:39 question is, is it a big deal really
16:44:53 i.e. all oslo log options support it,
16:44:54 it is impossible, imo.
16:45:21 at least very hard
16:45:28 we do not know which parameter has changed, so we cannot know whether we should restart or sighup.
16:45:33 so it is impossible.
16:45:44 right
16:45:48 i think if you want to revise the bp duonghq you would maybe look into merge_configs to notice what changed
16:45:50 but glance supports SIGHUP for all config
16:45:51 safer to do a full restart
16:46:11 I mean, over time, maybe more services will support this kind of reconfiguration
16:46:14 and then maybe if SIGHUP becomes "the way to go" long term, you'd easily be able to do that
16:46:45 if a service announces it supports SIGHUP for all config, i think we can implement this.
16:46:46 so, for services that don't support it yet, we can ignore that,
16:46:58 we can have some kind of fully-supported list
16:47:00 just from a deployer perspective
16:47:03 duonghq: but if we introduce 2 different modes of reload
16:47:06 that's complexity
16:47:14 i would much rather have a full restart
16:47:22 i doubt SIGHUP reloads have undergone heavy testing
16:47:29 COPY_ONCE is another big concern when using SIGHUP.
16:47:38 mnaser, ++
16:47:40 inc0, sure,
16:47:44 deploy X change, send SIGHUP, make sure everything is working - that's probably not something that's tested
16:47:54 in most cases, a restart is not a big deal.
16:48:19 ok
16:48:25 another question
16:48:31 if it is a big deal then you have multiple controllers, and serial will do controlled restarts, so you should be okay
16:48:32 different topic
16:48:39 draining of connections on haproxy
16:48:44 during upgrade
16:48:49 restart means: kill the process and start it again; reload/sighup means recreate the inner classes/objects again.
16:49:22 inc0, at that point, we should support rolling upgrade first.
16:49:37 right...
16:49:40 instead of draining connections, i think shutting all services down and letting haproxy return 502 is an acceptable thing
16:49:42 any ideas about that btw
16:49:44 ?
16:49:46 otherwise the remaining connections won't work.
16:49:49 about draining connections on haproxy, iirc, egonzalez has a solution
16:50:00 mnaser, i like you.
16:50:03 lolo
16:50:08 inc0, yep, ansible supports setting a haproxy backend as maintenance mode
16:50:28 yay.. but it doesn't support the serial way we need it ;)
16:50:29 we can drain connections, then upgrade the node, so there appears to be no downtime at that point
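A rough sketch of the maintenance-mode approach egonzalez describes, using the stock Ansible haproxy module; the backend name, socket path, and group name are hypothetical:

    # Drain one node's glance-api backend before upgrading it, on every
    # haproxy host, so no load balancer keeps sending it traffic.
    - name: Put this node's glance-api backend into maintenance mode
      haproxy:
        state: disabled
        backend: glance_api                            # hypothetical backend name
        host: "{{ inventory_hostname }}"
        socket: /var/lib/kolla/haproxy/haproxy.sock    # hypothetical path
        wait: yes
      delegate_to: "{{ item }}"
      with_items: "{{ groups['haproxy'] }}"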
16:50:43 serial is not rolling upgrade. we talked about this
16:50:59 ok, anyway, rolling upgrade
16:51:05 that's what I meant
16:51:28 i would: pull all new images, shut down all $service containers, run db syncs, start all $service containers. naturally, while this is happening, haproxy will be giving back 502s
16:51:29 about rolling upgrade, graceful shutdown is important for achieving that
16:51:49 for rolling upgrades, here's what i'd throw on the table: add multiple steps to it (or maybe even multiple kolla-ansible steps)
16:51:59 step #1, upgrade control plane (this happens with no serial)
16:52:01 mnaser, shutting down all services means shutting down haproxy.
16:52:11 nope, shut down a specific service, ex: glance
16:52:21 got it.
16:52:34 step #2, upgrade data plane (this happens with, say, 20% serial or whatever)
16:52:35 duonghq, graceful shutdown meaning?
16:52:46 as part of step #1, you'd set upgrade_levels on the controllers too
16:52:51 yeah, we thought of 2 different plays
16:52:58 and then the final step would be: remove all upgrade_levels and restart $service
16:53:11 glance (for example) has glance-control to coordinate its microservices, we have not supported that
16:53:21 hmm, the upgrade playbook can call 3 plays with serial
16:53:29 Jeffrey4l, we send some signal to the container, it'll drain connections by itself
16:53:35 1 - upgrade control, no serial, set upgrade_levels
16:53:42 2 - upgrade compute, with serial
16:53:52 3 - remove controller upgrade_levels
16:54:16 ideally i'd like to see those split (and one that is combined). we usually prefer to upgrade the control plane and make sure everything is a-ok
16:54:25 do all services support upgrade_levels?
16:54:34 the large scale ones do (aka neutron+nova)
16:54:58 the rest i don't really know, but they're so lightweight that it's not as big of a deal
16:55:04 most people don't have 300 heat-engine instances for example
16:55:14 yep.
16:56:07 separating the upgrade into multiple plays - I really like that
16:56:08 draining connections is about reducing the downtime in #1
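Structurally, the three plays inc0 lists above would be shaped roughly like this; the role and variable names are hypothetical, only the serial/upgrade_levels pattern is the point:

    # Play 1: all controllers at once, pinning RPC versions for old computes
    - name: Upgrade control plane
      hosts: control
      roles:
        - role: nova-upgrade              # hypothetical role
          vars:
            set_upgrade_levels: true      # writes [upgrade_levels] compute=auto

    # Play 2: computes in small batches, so most hypervisors stay up
    - name: Upgrade compute nodes
      hosts: compute
      serial: "20%"
      roles:
        - role: nova-upgrade

    # Play 3: drop the pins and restart the control plane services
    - name: Remove upgrade_levels
      hosts: control
      roles:
        - role: nova-upgrade
          vars:
            set_upgrade_levels: false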
16:56:15 i have a few things
16:56:19 before the end, if people don't mind
16:56:19 I'd do it after we make the upgrade gates real
16:56:23 two topics we are having.
16:56:34 Jeffrey4l, minute, does dumb-init support passing SIGKILL to the process? in general, every signal?
16:56:36 mnaser, please.
16:56:41 https://review.openstack.org/#/c/445690/ keystone-ssh is broken
16:56:43 inc0, +1 for your 3 plays
16:56:43 multinode rotation of fernet tokens doesn't work
16:56:47 duonghq, yep.
16:56:49 Jeffrey4l, cool
16:56:57 if people can give some love to that review, it would be wonderful
16:57:05 i'll backport afterwards
16:57:39 duonghq, dumb-init works like systemd.
16:57:46 and as a closer for next time maybe, i want to float the idea of using bindep when installing from source, to avoid problems like this - https://review.openstack.org/#/c/446032/
16:58:01 +2'ed
16:58:08 Jeffrey4l, ok, I'll experiment with that, thanks
16:58:28 ok, we're running out of time
16:58:35 thank you all for coming
16:58:41 thanks
16:58:41 #endmeeting kolla