Thursday, 2018-06-07

00:39 *** caoyuan has quit IRC
01:12 *** felipemonteiro has joined #openstack-helm
01:14 *** felipemonteiro_ has joined #openstack-helm
01:18 *** felipemonteiro has quit IRC
01:24 *** felipemonteiro_ has quit IRC
01:33 *** caoyuan has joined #openstack-helm
01:52 *** mdih has joined #openstack-helm
02:03 *** jgu has quit IRC
02:08 *** unicell has quit IRC
02:49 *** roman_g has quit IRC
03:37 *** zhangfei has joined #openstack-helm
04:00 *** yamamoto has joined #openstack-helm
04:06 <openstackgerrit> Pete Birley proposed openstack/openstack-helm-infra master: WIP: Dex: Chart  https://review.openstack.org/572003
04:40 *** mdih has quit IRC
05:08 *** p__arch has joined #openstack-helm
05:12 *** p_arch has quit IRC
05:22 *** mdih has joined #openstack-helm
05:29 *** jgu has joined #openstack-helm
06:10 *** sar has joined #openstack-helm
06:21 *** unicell has joined #openstack-helm
07:02 *** unicell has quit IRC
07:07 *** gkadam has joined #openstack-helm
07:12 *** cNilesh has joined #openstack-helm
07:14 *** radek__ has joined #openstack-helm
07:22 *** jistr|mtgs is now known as jistr
07:23 *** jgu_ has joined #openstack-helm
07:26 *** jgu has quit IRC
07:28 *** sar has quit IRC
07:29 *** jgu_ has quit IRC
07:51 *** zhangfei has quit IRC
08:02 *** zhangfei has joined #openstack-helm
08:11 *** roman_g has joined #openstack-helm
09:04 *** MarkBaker has quit IRC
09:05 *** MarkBaker has joined #openstack-helm
09:11 <openstackgerrit> Bertrand Lallau proposed openstack/openstack-helm master: Update developer references 'networking' section  https://review.openstack.org/573144
09:30 <openstackgerrit> Bertrand Lallau proposed openstack/openstack-helm master: Update developer references 'networking' section  https://review.openstack.org/573144
09:33 *** caoyuan_ has joined #openstack-helm
09:34 *** caoyuan has quit IRC
10:07 *** caoyuan_ has quit IRC
10:43 *** MarkBaker has quit IRC
10:58 *** julim has joined #openstack-helm
10:59 *** caoyuan has joined #openstack-helm
11:14 *** yamamoto has quit IRC
11:28 *** radek__ has quit IRC
11:29 *** MarkBaker has joined #openstack-helm
11:43 *** zhangfei has quit IRC
11:56 <openstackgerrit> Bertrand Lallau proposed openstack/openstack-helm master: Update developer references 'upgrade' section  https://review.openstack.org/573205
12:10 *** yamamoto has joined #openstack-helm
12:15 *** MarkBaker has quit IRC
12:21 *** yamamoto has quit IRC
12:49 *** yamamoto has joined #openstack-helm
13:24 *** eeiden has joined #openstack-helm
13:30 *** sthussey has joined #openstack-helm
13:34 *** openstackgerrit has quit IRC
13:54 *** zhangfei has joined #openstack-helm
13:57 *** openstackgerrit has joined #openstack-helm
13:57 <openstackgerrit> Steve Wilkerson proposed openstack/openstack-helm-infra master: WIP: Use Elasticsearch to gather logs in post-run jobs  https://review.openstack.org/573007
14:03 *** gkadam has quit IRC
14:04 *** gkadam has joined #openstack-helm
14:15 *** felipemonteiro has joined #openstack-helm
14:15 *** felipemonteiro_ has joined #openstack-helm
14:20 *** felipemonteiro has quit IRC
14:20 *** radek__ has joined #openstack-helm
14:36 *** cfriesen has quit IRC
14:36 *** cfriesen has joined #openstack-helm
14:52 *** roman_g has quit IRC
15:12 *** zhangfei has quit IRC
15:15 *** MarkBaker has joined #openstack-helm
15:26 *** MarkBaker has quit IRC
16:14 *** yamamoto has quit IRC
16:18 *** MarkBaker has joined #openstack-helm
16:21 *** yamamoto has joined #openstack-helm
16:24 *** ericho has joined #openstack-helm
16:26 *** yamamoto has quit IRC
16:31 *** caoyuan has quit IRC
16:42 *** felipemonteiro_ has quit IRC
16:42 *** felipemonteiro_ has joined #openstack-helm
16:48 *** caoyuan has joined #openstack-helm
16:53 *** caoyuan has quit IRC
16:54 *** MarkBaker has quit IRC
16:55 *** radek__ has quit IRC
17:10 *** cNilesh has quit IRC
17:14 *** gkadam has quit IRC
17:16 *** mdih has quit IRC
17:23 *** yamamoto has joined #openstack-helm
17:30 <openstackgerrit> Steve Wilkerson proposed openstack/openstack-helm-infra master: WIP: Use Elasticsearch to gather logs in post-run jobs  https://review.openstack.org/573007
17:31 *** yamamoto has quit IRC
17:34 *** openstackgerrit has quit IRC
17:37 *** patrickeast has joined #openstack-helm
17:48 *** felipemonteiro__ has joined #openstack-helm
17:49 *** unicell has joined #openstack-helm
17:52 *** felipemonteiro_ has quit IRC
18:34 *** MarkBaker has joined #openstack-helm
18:38 *** openstackgerrit has joined #openstack-helm
18:38 <openstackgerrit> Steve Wilkerson proposed openstack/openstack-helm master: Ceph-config-helper: Add rgwadmin python package  https://review.openstack.org/573405
18:38 <openstackgerrit> Steve Wilkerson proposed openstack/openstack-helm master: Ceph-config-helper: Add rgwadmin python package  https://review.openstack.org/573405
18:43 <openstackgerrit> Steve Wilkerson proposed openstack/openstack-helm master: Ceph-config-helper: Add rgwadmin python package  https://review.openstack.org/573405
18:51 *** gkadam has joined #openstack-helm
18:52 *** ericho has quit IRC
18:53 <openstackgerrit> Steve Wilkerson proposed openstack/openstack-helm master: Ceph-config-helper: Add rgwadmin python package  https://review.openstack.org/573405
18:58 *** ericho has joined #openstack-helm
19:08 <openstackgerrit> Chris Wedgwood proposed openstack/openstack-helm-addons master: artifactory: Update to 5.11.1  https://review.openstack.org/570526
19:26 *** felipemonteiro__ has quit IRC
19:36 *** Talion has joined #openstack-helm
19:37 <Talion> Hello, good evening.
19:38 *** ericho has quit IRC
19:39 <Talion> What are the advantages/disadvantages of installing OpenStack with kolla-kubernetes vs openstack-helm? Can anyone explain, please? Experienced input would be appreciated.
19:41 <portdirect> Talion: kolla-kubernetes was a great project, with some interesting design features
19:41 *** ericho has joined #openstack-helm
19:41 <portdirect> unfortunately it's not maintained anymore: https://github.com/openstack/kolla-kubernetes
19:41 *** gmmaha has quit IRC
19:42 <Talion> I also noticed it, but got confused when I saw it in the OpenStack documentation
19:42 <Talion> couldn't be sure
19:42 <portdirect> yeah - I think there is some cleanup still ongoing
19:42 <Talion> so installing OpenStack with Kubernetes is only possible with openstack-helm atm?
19:43 <portdirect> it's the path I would recommend
19:44 <Talion> One thing I am curious about is its management, since we are using/adding many tools: Docker + Kubernetes/Helm, and I guess Ansible etc. too
19:45 <Talion> Could we have a hard time after we go to production?
19:46 <Talion> We have "keep it simple" in mind :)
19:47 <Talion> Our team is also considering openstack-ansible, but using containers seems the better option to me.
19:47 *** ericho has quit IRC
19:48 *** ericho has joined #openstack-helm
19:52 *** ericho has quit IRC
19:58 *** ericho has joined #openstack-helm
20:02 *** ericho has quit IRC
20:02 *** ericho has joined #openstack-helm
20:04 *** ericho has quit IRC
20:04 *** ericho has joined #openstack-helm
20:09 *** felipemonteiro has joined #openstack-helm
20:10 <p__arch> anticw: not sure if you can help with this, but worth asking. After finally getting osh running, I rebooted one of the VMs and now ceph is continually going into an error state. Basically ceph-mon is catching a signal and aborting. Have you seen that before?
20:11 <p__arch> portdirect: feel free to chime in too :) basically everything else is breaking because ceph is down :(
20:11 *** gkadam has quit IRC
20:15 <openstackgerrit> Steve Wilkerson proposed openstack/openstack-helm master: Add Elasticsearch image with s3 repository plugin  https://review.openstack.org/573425
20:20 *** eeiden has quit IRC
20:22 <p__arch> apparently asserting in AuthMonitor::update_from_paxos() due to "int ret = get_version(keys_ver+1, bl);" returning 0
20:22 *** felipemonteiro_ has joined #openstack-helm
20:23 <p__arch> actually: "int ret = get_version(keys_ver+1, bl);" not returning 0
20:25 *** felipemonteiro has quit IRC
20:40 *** julim has quit IRC
20:42 <anticw> p__arch: I don't know if I've seen that exact error; is it possible the storage for ceph got corrupted/lost?
20:43 <p__arch> great question. not sure how I would figure that out. unfortunately ceph isn't really my area
20:43 <anticw> I would expect if you are using block-logical it would create and mount an fs on said devices
20:43 <anticw> so you should check that is the case
20:45 <anticw> p__arch: the mons crashloop so it's difficult to exec in?
20:47 <p__arch> I can get into the pod for a while at least, but don't know what I'm looking for
20:48 <anticw> tbqh I'm not 100% sure either; can you show (1) get pods (2) logs from said crashing pod ... pastebin somewhere perhaps?
20:54 <p__arch> sure, give me a sec
20:56 <p__arch> https://pastebin.com/dcFz42WY
21:00 <anticw> is it possible this is slow to come up and the healthchecks are failing?
21:00 <anticw> describe po ...
21:00 <anticw> would show if that's perhaps the case
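anticw's triage sequence (get the pods, pull logs from the crashing pod, then describe it to check events and healthchecks) can be sketched as a dry run; the "ceph" namespace and the pod name below are assumptions, not values taken from the log:

```shell
# Dry-run sketch of the triage steps above; NS and POD are hypothetical
# placeholders - substitute your own, then run the printed commands
# against your cluster.
NS=ceph
POD=ceph-mon-example
for cmd in \
  "kubectl -n $NS get pods -o wide" \
  "kubectl -n $NS logs $POD --previous" \
  "kubectl -n $NS describe pod $POD"; do
  echo "$cmd"   # printed rather than executed, since this needs a live cluster
done
```

`logs --previous` matters here: with a crashlooping container, the current instance may die before logging anything useful, while the previous instance holds the stack trace.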
21:00 <p__arch> all 5 ceph-mon pods are crashing btw :(
21:00 <openstackgerrit> Steve Wilkerson proposed openstack/openstack-helm-infra master: Update start.sh to reflect osh-infra deployment changes  https://review.openstack.org/573440
21:00 <p__arch> yup, one sec
21:01 <p__arch> the last events are:
21:01 <p__arch>   Normal   Created                46m (x3 over 47m)   kubelet, kube-5.novalocal  Created container
21:01 <p__arch>   Warning  BackOff                2m (x188 over 46m)  kubelet, kube-5.novalocal  Back-off restarting failed container
21:02 <anticw> yeah, I see line 315
21:02 <anticw> /build/ceph-12.2.3/src/mon/AuthMonitor.cc: 157: FAILED assert(ret == 0)
21:03 <p__arch> yup, that's the one
21:03 <p__arch> get_version is failing, apparently
21:04 <anticw> googling for that, I find the internet is awash with bug reports in that function
21:05 <p__arch> latest code is at https://github.com/ceph/ceph/blob/master/src/mon/AuthMonitor.cc line 233
21:05 <anticw> need to look at the branch that represents the version you have there
21:06 <anticw> I'm wondering if the auth files are missing/broken
21:08 <anticw> https://github.com/ceph/ceph/blob/luminous/src/mon/AuthMonitor.cc#L157
21:09 <anticw> p__arch: if you look inside sdb (from the host should be possible) ... what do you see?
21:09 <p__arch> https://github.com/ceph/ceph/blob/luminous/src/mon/AuthMonitor.cc
21:10 <p__arch> anticw: "look inside sdb" - clarify please?
21:11 <anticw> you mentioned the other day you were going to use 'sdb' for ceph on your nodes
21:11 <anticw> if that's the case, on a node ... what does sdb have ... and if you mount it (probably already mounted), does it have a plausible on-disk structure for ceph?
21:12 <p__arch> not me. I'm just using the default osh chart for helm
21:12 <anticw> p__arch: it's using storage on the hosts then?
21:12 <p__arch> yes
21:12 <anticw> so I think by default you have /var/lib/openstack-helm/ceph/ perhaps?
21:13 <p__arch> anticw: correct
21:14 <anticw> inside that?
21:15 <anticw> maybe tree /var/lib/openstack-helm/ceph/ gives a suitable summary?
21:15 *** gmmaha has joined #openstack-helm
21:15 <anticw> sorry, my cluster right now isn't accessible or I would check
21:18 <p__arch> https://pastebin.com/gAHk03h4
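Since the chart is using host-path storage, the inspection anticw describes can be done from the node itself; this is a sketch assuming the /var/lib/openstack-helm/ceph/ default mentioned above:

```shell
# Sketch: look for the ceph-mon key/value store on the host. A mon data
# directory normally contains a store.db subdirectory (the mon's on-disk
# database); if it is empty or missing, the mon store was likely lost or
# corrupted by the hard reboot.
MON_PATH=/var/lib/openstack-helm/ceph/mon
if [ -d "$MON_PATH" ]; then
  find "$MON_PATH" -maxdepth 4 -type d -name store.db
else
  echo "no mon data at $MON_PATH"
fi
```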
21:25 *** Talion has quit IRC
21:28 <anticw> p__arch: ok, plausible ... here's a thought ... did you install, then reinstall without purging those directories?
21:28 *** felipemonteiro_ has quit IRC
21:29 *** felipemonteiro_ has joined #openstack-helm
21:29 <p__arch> nope - was a fresh set of VMs. I did have to hard-reboot 2 of them, though.
21:29 <p__arch> anticw: so there's a definite chance of corruption
21:30 <p__arch> do I really have to start again from scratch? grrrrr
21:32 <anticw> p__arch: I rarely see issues with this; when I do, I have a script which nukes and reinstalls, since it's quick
21:32 <anticw> that's a terrible answer, I'm sorry
21:33 <p__arch> nukes and re-installs the k8s cluster? or the openstack/ceph deployments?
21:33 <anticw> just ceph
21:33 <anticw> k8s is pretty rock-solid for me
21:33 <anticw> ceph ... less so
21:33 <p__arch> is it possible to redo ceph without breaking the openstack stuff on top of it?
21:33 <anticw> for openstack testing/development I don't always use ceph
21:34 <anticw> no, openstack puts a lot of stuff in the db, which is on ceph
21:34 <p__arch> yeah, that's what I thought.
21:34 <anticw> if you are quite sure it's not fully corrupted, then ceph people can perhaps tell you how to fix this
21:34 <p__arch> there's no multi-node guide for using nfs, just aio
21:34 <anticw> but when it explodes with stack traces ... I'm not a good resource; it takes me way too long to work things out
21:35 <p__arch> is nfs a possibility for multi-node, or will I be fighting with it to get it running?
21:35 <anticw> p__arch: nfs requires that you have robust nfs storage, but it's not hard otherwise
21:35 <anticw> do you want to use nfs from inside the cluster or outside?
21:35 <p__arch> inside
21:36 <p__arch> actually, I'm really not sure
21:36 <anticw> so ... aio will do that for you ... but it's not going to be resilient if the host serving nfs goes down
21:36 <anticw> if you have an external nfs option, that would be one approach
21:36 <anticw> like a netapp or similar
21:37 <p__arch> no - this is a development cloud, no good storage solution at all. which is why my deployments take 10+ hours, I think
21:37 <portdirect> nfs would be a good choice then
21:38 <p__arch> btw, I tried setting up lvmcache and almost got there, but the volume group the partition is on doesn't have enough extents remaining to add the ramdisk cache volume, so I'm SOL there
21:38 <anticw> agree, nfs often ends up being faster as well
21:38 <portdirect> if you use the nfs provisioner from osh, for it to work you'll need to label a node and pin it to that
21:39 <anticw> p__arch: do you have a machine with enough storage you could use as an nfs-sever?
21:39 <anticw> server, even
21:39 <anticw> sever is a bit dark
21:40 <p__arch> yes, I have an extra drive on each compute node that's dark right now. I would want to use just one of those, right?
21:40 <p__arch> portdirect: thanks
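portdirect's point about pinning the osh nfs-provisioner to a labelled node could look roughly like the values override below; the label key/value and the exact values layout are assumptions, so check the chart's values.yaml for the real schema:

```yaml
# Hypothetical override for the nfs-provisioner chart: run it only on the
# node carrying the backing storage (label key/value are assumptions).
labels:
  node_selector_key: openstack-nfs-node
  node_selector_value: enabled
```

The node would be labelled first with something like `kubectl label node <node-name> openstack-nfs-node=enabled`.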
21:41 <anticw> p__arch: so long as it's large enough
21:41 <p__arch> yup, it's 2 TB
21:41 <anticw> you could also possibly shuffle disks about so that you can raid/lvm/whatever the spare disks
21:41 <anticw> then from that host export the result via nfs
21:41 <anticw> and use the nfs-provisioner
21:42 <p__arch> physical hosts are in a hosting env, so I'd need to plan that
21:42 <p__arch> but I should be able to mount the second drive on one of the (4) compute hosts and export that via nfs
21:43 <anticw> that works
21:43 <p__arch> okay, I'll give that a go. thanks for your help.
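The plan p__arch settles on (mount the spare drive on one compute host, export it over NFS) reduces to a single exports entry; the export path and client subnet below are illustrative assumptions:

```
# /etc/exports on the host donating its second drive; adjust the path and
# the client subnet to your environment.
/export/osh  10.0.0.0/24(rw,sync,no_root_squash)
```

After formatting and mounting the drive at the export path, `exportfs -ra` re-reads /etc/exports and publishes the entry.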
21:44 <anticw> fwiw, I do something a bit like that here most of the time; that is, have a storage class on k8s that's nfs ... I then tell osh to use that sc
21:44 <anticw> it also lets me snapshot the pvc's should I need to
21:44 <anticw> PVs, even
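anticw's pattern (an NFS-backed StorageClass that the osh charts consume) is typically expressed as a values override; the class name and the exact values path are assumptions that vary per chart:

```yaml
# Hypothetical per-chart override: have PVCs request an existing
# NFS-backed StorageClass instead of the default.
storage:
  pvc:
    class_name: nfs-client   # name of the NFS StorageClass (assumption)
```

The appeal of this indirection is that the charts only name a StorageClass, so the backing storage can be swapped (ceph rbd, NFS, local) without touching the charts themselves.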
22:06 <openstackgerrit> Steve Wilkerson proposed openstack/openstack-helm-infra master: WIP/DNM: Gate fix  https://review.openstack.org/573452
22:11 <openstackgerrit> Steve Wilkerson proposed openstack/openstack-helm-infra master: WIP/DNM: Gate fix  https://review.openstack.org/573452
22:11 <openstackgerrit> Steve Wilkerson proposed openstack/openstack-helm-infra master: WIP/DNM: Gate fix  https://review.openstack.org/573452
22:15 <openstackgerrit> Steve Wilkerson proposed openstack/openstack-helm-infra master: WIP/DNM: Gate fix  https://review.openstack.org/573452
22:17 <p__arch> anticw: as an alternative, would it make sense for me to set up ceph on the underlying hosts to use the second physical disk on each compute node, and then connect to that from inside the k8s VMs?
22:48 *** felipemonteiro_ has quit IRC
22:49 *** felipemonteiro has joined #openstack-helm
23:13 <anticw> p__arch: in some sense that's what the ceph chart is doing ... it's just using containers to keep all the ceph stuff together
23:13 <anticw> if you can get ceph working robustly outside of k8s, that's fine, but I'm not sure it's going to be much easier
23:21 *** felipemonteiro_ has joined #openstack-helm
23:22 <openstackgerrit> Steve Wilkerson proposed openstack/openstack-helm-infra master: WIP/DNM: Gate fix  https://review.openstack.org/573452
23:22 *** felipemonteiro has quit IRC
23:38 *** ericho has quit IRC
23:48 *** sthussey has quit IRC

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!