Friday, 2016-05-27

00:01 <pabelanger> #status log logstash-worker17.openstack.org now running ubuntu-trusty and processing requests
00:01 <clarkb> ok I am out now, will try to sort out why puppet ansible things are broken in the morning
00:03 <openstackstatus> pabelanger: finished logging
00:03 <pabelanger> wow, that was slow
00:05 <pabelanger> clarkb: Oh, I know the issue
00:05 <pabelanger> clarkb: it is because puppetmaster.o.o is in emergency file
00:05 <clarkb> oh?
00:05 <clarkb> ahahahahahaha
00:05 <clarkb> RIP
00:05 <pabelanger> Yup
00:05 <pabelanger> so, we can remove it
00:05 <pabelanger> since osic is back online
00:06 <pabelanger> jeblair: you okay with us removing puppetmaster.o.o from emergency?
00:06 <pabelanger> since osic reverted their SSL cert
00:09 <clarkb> makes sense to me
00:09 <pabelanger> okay, removing
00:10 <pabelanger> #status log puppetmaster.o.o remove from emergency file since OSIC is now back online
00:10 <openstackstatus> pabelanger: finished logging
00:18 <pabelanger> #status log logstash-worker18.openstack.org now running ubuntu-trusty and processing requests
00:18 <openstackstatus> pabelanger: finished logging
00:22 *** baoli has quit IRC
00:23 *** baoli has joined #openstack-sprint
00:25 <pabelanger> clarkb: booyah: http://logstash.openstack.org/#/dashboard
00:32 <pabelanger> #status log logstash-worker19.openstack.org now running ubuntu-trusty and processing requests
00:33 <openstackstatus> pabelanger: finished logging
00:48 <pabelanger> #status log logstash-worker20.openstack.org now running ubuntu-trusty and processing requests
00:48 <openstackstatus> pabelanger: finished logging
00:48 <pabelanger> \o/
00:48 <pabelanger> very happy how well that went
01:29 *** baoli has quit IRC
01:30 <fungi> excellent job!
01:31 *** baoli has joined #openstack-sprint
02:43 <jhesketh> I'm going to take the old apps.openstack.org offline after snapshotting fyi
02:56 *** rfolco has quit IRC
03:02 <jhesketh> deleted
03:02 -openstackstatus- NOTICE: Gerrit is going offline briefly to check possible filesystem corruption
03:02 *** ChanServ changes topic to "Gerrit is going offline briefly to check possible filesystem corruption"
03:07 <anteaya> jhesketh: thank you
03:08 *** anteaya has quit IRC
03:22 *** ChanServ changes topic to "Taking Infra servers running Precise and upgrading them to Trusty | https://wiki.openstack.org/wiki/VirtualSprints#Infra_Trusty_Upgrade"
03:22 -openstackstatus- NOTICE: after a quick check, gerrit and its filesystem have been brought back online and should be working again
03:44 *** baoli has quit IRC
03:45 *** baoli has joined #openstack-sprint
03:50 *** baoli has quit IRC
03:51 *** baoli has joined #openstack-sprint
03:51 *** yuikotakadamori has joined #openstack-sprint
05:29 *** baoli has quit IRC
07:14 -openstackstatus- NOTICE: zuul required a restart due to network outages. If your change is not listed on http://status.openstack.org/zuul/ and is missing results, please issue a 'recheck'.
10:01 *** yuikotakadamori has quit IRC
11:18 *** rfolco has joined #openstack-sprint
12:04 *** yolanda has quit IRC
12:06 *** yolanda has joined #openstack-sprint
12:41 *** yolanda has quit IRC
12:54 *** baoli has joined #openstack-sprint
12:56 *** baoli_ has joined #openstack-sprint
12:59 *** baoli has quit IRC
13:20 *** yolanda has joined #openstack-sprint
14:00 <pabelanger> now that yak shaving is out of the way :)
14:00 <pabelanger> going to prep eavesdrop.o.o for launch
14:00 <pabelanger> should be able to start at 1600UTC
14:13 <pabelanger> clarkb: going to start looking at ES on ubuntu-trusty
14:14 <pabelanger> setting cluster.routing.allocation.enable to none
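
For reference, disabling allocation is a single call to the cluster settings API; a minimal sketch, assuming the ES HTTP API is reachable on localhost:9200 (for example through the SSH tunnel mentioned below):

    # stop the cluster from reassigning shards while nodes go up and down
    curl -XPUT localhost:9200/_cluster/settings -d '
    {
      "transient": { "cluster.routing.allocation.enable": "none" }
    }'
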
14:14 <clarkb> ok, I am not fully here yet but it's basically the upgrade process
14:14 <clarkb> pabelanger: those instances do use cinder volumes for the es data
14:15 <pabelanger> clarkb: Ya, that is what I am looking at now
14:24 <pabelanger> I've stopped elasticsearch on ES02
14:25 <fungi> reminder: double-check any servers you've replaced to make sure you remembered to add reverse dns for both ipv4 and ipv6 addresses on each
14:26 <pabelanger> ++
14:26 <fungi> (in other words, don't be a chump like me!)
14:27 <pabelanger> okay, dropping dns ttl on elasticsearch hosts to 5mins
14:29 <fungi> the paste-precise-backup snapshot completed overnight, so i'm going to delete the old halted instance now
14:29 <pabelanger> sorry, I've stopped elasticsearch on ES07
14:29 <pabelanger> not ES02
14:30 <fungi> 2 and 7 look a lot alike
14:30 <pabelanger> /facepalm
14:30 <pabelanger> no, I stopped ES02
14:30 <pabelanger> I'm using ES07 as my SSH tunnel
14:30 <pabelanger> okay, need to step away for 5mins to let my brain recover
14:30 <pabelanger> and fetch some coffee
14:31 <fungi> just soak the brain in caffeine
14:43 <pabelanger> could use some help landing https://review.openstack.org/#/c/320642/ for elasticsearch migrations
14:45 <clarkb> approved
14:45 <pabelanger> danke
14:46 <pabelanger> Think I am going to put ES02 into shutdown, so I don't run into the detach issue again
14:59 <pabelanger> fungi: safe to start work on ES02?  Want to make sure you were able to check the volume before I shutdown
14:59 <pabelanger> A quick poke on the server didn't show any errors
15:01 <fungi> pabelanger: if the volume is still mounted read/write and dmesg -T doesn't show any filesystem/block device errors from overnight, go ahead with it
15:01 <fungi> i haven't gotten that far down the list yet
15:01 <pabelanger> fungi: Yup, last logs are from Apr 4
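
The check fungi describes boils down to something like the following sketch, assuming the cinder volume is mounted at /var/lib/elasticsearch:

    # confirm the volume is still mounted read/write (look for "rw" in the options)
    grep /var/lib/elasticsearch /proc/mounts
    # scan the kernel ring buffer for filesystem/block device errors
    dmesg -T | egrep -i 'ext4|i/o error|blk_update'
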
15:02 <pabelanger> okay, placing ES02 into shutdown
15:04 <pabelanger> Hmm, looks like we're hitting our quota issue
15:04 <pabelanger> need to launch 60 GB Performance (performance2-60)
15:07 <pabelanger> clarkb: fungi: Are we okay with deleting each elasticsearch host first to recover quota? Then standing up the replacement server
15:07 <fungi> pabelanger: that may be the only way to go about it
15:07 <clarkb> pabelanger: yes I think it's our only sane option
15:08 <pabelanger> okay, let me do that
15:08 <fungi> unless there are still some instances we need to clean up from other replacements
15:08 <pabelanger> I've already detached the volume
15:08 <fungi> i think i've deleted all the old instances i've replaced so far though
15:08 <pabelanger> Same
15:08 <pabelanger> okay, going to delete elasticsearch02.o.o
15:10 <clarkb> I have one 2gb instance that needs deletion but that's not enough for an es host
15:14 <clarkb> just make sure the replacement is big like the original. Java and ES use all the memory
15:27 <pabelanger> ack
15:37 <pabelanger> elasticsearch02.o.o online, I've enabled shard allocation again
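
Re-enabling is the mirror image of the earlier settings call, after which cluster health can be polled until it reports green; again a sketch against the tunneled localhost:9200 endpoint:

    # allow the cluster to place shards again
    curl -XPUT localhost:9200/_cluster/settings -d '
    {
      "transient": { "cluster.routing.allocation.enable": "all" }
    }'
    # then poll until "status" goes yellow -> green
    curl -s localhost:9200/_cluster/health?pretty
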
15:39 <clarkb> pabelanger: were you able to have the launch machinery attach the existing volume or did you do that by hand?
15:39 <pabelanger> clarkb: I did it by hand this time
15:39 <pabelanger> I can try using launch-node.py for the next one
15:40 <pabelanger> I found 1 issue
15:40 <pabelanger> after I mounted the cinder volume, I had to chown -R elasticsearch: /var/lib/elasticsearch because the uid was not correct
15:41 <pabelanger> going to see if we can have puppet manage that
15:41 <clarkb> ah ya likely due to how we reserve a chunk of uids now
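
The by-hand sequence was presumably along these lines (a sketch: the block device name is hypothetical, and it assumes the trusty elasticsearch package's init service):

    service elasticsearch stop
    mount /dev/xvdb /var/lib/elasticsearch    # hypothetical device name for the attached cinder volume
    # fix ownership: the uid recorded on the old volume no longer matches the new host's elasticsearch user
    chown -R elasticsearch:elasticsearch /var/lib/elasticsearch
    service elasticsearch start
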
15:53 <pabelanger> running daughter down to school. Waiting for cluster to go green ATM
16:20 <pabelanger> #status log elasticsearch02.o.o upgraded to ubuntu-trusty and cluster is green
16:20 <openstackstatus> pabelanger: finished logging
16:24 <pabelanger> okay, moving on to ES03
16:24 <clarkb> the others shouldn't require an apache restart for the proxy since we only proxy to 02 (that needs fixing but has been low priority)
16:25 <pabelanger> agreed
16:27 <pabelanger> clarkb: So, just confirming, I need to pass --volume and --mount-path to launch-node.py it seems
16:28 <clarkb> pabelanger: ya, I need to double check that the scripts do the right thing if the volume already has an fs
16:28 <pabelanger> good call
16:29 <clarkb> looks like they won't, if an fs already exists they noop
16:29 <clarkb> so that's not super useful
16:29 <pabelanger> okay
16:30 <pabelanger> I'll attach by hand again
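
When launch-node.py does handle it, the invocation being discussed would look roughly like this (a sketch only: the two flags and the flavor name come from the conversation, but the volume id is a placeholder and any other required options are omitted):

    ./launch-node.py elasticsearch03.openstack.org \
        --flavor performance2-60 \
        --volume <existing-cinder-volume-id> \
        --mount-path /var/lib/elasticsearch
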
16:30 <pabelanger> anybody want to review: https://review.openstack.org/#/c/322242/
16:31 <clarkb> so the scripts work if you attach a brand new volume but not for migrating volumes between instances, I can look at addressing that
16:31 <pabelanger> okay
16:32 <clarkb> pabelanger: do you know if puppet will touch the file mode on that dir and its children?
16:32 <pabelanger> clarkb: I believe just that directory
16:32 <clarkb> it uses the file mode of the file source if you don't explicitly add one to the file resource, not sure what it does with dirs on recurse
16:32 <pabelanger> but, I can update it to present to be safe
16:34 <pabelanger> TIL
16:34 <pabelanger> https://docs.puppet.com/puppet/latest/reference/type.html#file-attribute-recurse
16:34 <pabelanger> we need ensure => directory
16:38 <clarkb> pabelanger: ya ensure => directory is ok I think, I just don't want it to change the file modes on all those files due to some default behavior
16:38 <clarkb> (which it does do on proper files without modes set)
16:38 <pabelanger> Oh right, I misread what you were asking
16:39 <pabelanger> Ya, that's the reason I left mode off
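
The behavior under discussion can be tried in isolation with a one-off manifest; a sketch (not the actual change under review) that recursively manages ownership while leaving mode unmanaged, so puppet noops on the children's permissions:

    puppet apply -e "file { '/var/lib/elasticsearch':
      ensure  => directory,
      owner   => 'elasticsearch',
      group   => 'elasticsearch',
      recurse => true,
      # no mode attribute: puppet leaves existing permissions alone
    }"
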
16:43 <pabelanger> #status log elasticsearch03.o.o upgraded to ubuntu-trusty and cluster is green
16:43 <openstackstatus> pabelanger: finished logging
17:00 <clarkb> pabelanger: so puppet will not touch file modes in this case?
17:00 <pabelanger> clarkb: right, it will just noop on them
17:01 <pabelanger> that's how I've always understood it
17:01 <clarkb> ok +2
17:07 <pabelanger> clarkb: So, I have a shard that is still unassigned. Is there any way to kick it to a host?
17:07 <pabelanger> "reason": "NODE_LEFT",
17:07 <clarkb> pabelanger: no, is it the only outstanding shard or are others recovering?
17:08 <clarkb> ES will only process a small number at a time (2 I think)
17:08 <pabelanger> clarkb: only outstanding
17:08 <pabelanger> others have settled
17:08 <clarkb> let me get a proxy running and will look
17:08 <pabelanger> actually,
17:08 <pabelanger> there are 2 purple ATM
17:09 <clarkb> ya it may be rebalancing before assigning that one shard
17:09 <pabelanger> okay
17:09 <clarkb> let's wait until those purple ones are done
17:09 <pabelanger> I'll hold off on moving to ES05 until it gets assigned
17:10 <clarkb> pabelanger: that unassigned shard does have its master copy on es07 so as long as you don't turn off es07 before it gets its replica up we should be ok
17:10 <clarkb> but would be nice to see it recover it
17:10 <pabelanger> sure, I don't mind waiting for a few minutes
17:10 <pabelanger> the upgrades are going smooth
17:11 <clarkb> yay
17:12 <clarkb> if we turn off es07 in this state the cluster will go red because one shard is completely unavailable, it should go back to yellow once es07 is back up again though
17:19 <pabelanger> clarkb: woot
17:19 <pabelanger> here we go
17:19 <pabelanger> moved to recovering
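
Shard placement like this can be watched from the command line with the _cat APIs available on ES versions of that era; a sketch against the same tunneled endpoint:

    # list shards not yet in the STARTED state
    curl -s localhost:9200/_cat/shards | grep -E 'UNASSIGNED|INITIALIZING|RELOCATING'
    # or follow ongoing recoveries index by index
    curl -s localhost:9200/_cat/recovery
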
17:24 <pabelanger> #status log elasticsearch04.o.o upgraded to ubuntu-trusty and cluster is green
17:24 <openstackstatus> pabelanger: finished logging
17:47 <pabelanger> #status log elasticsearch05.o.o upgraded to ubuntu-trusty and cluster is green
17:47 <openstackstatus> pabelanger: finished logging
18:51 <pabelanger> #status log elasticsearch06.o.o upgraded to ubuntu-trusty and cluster is green
18:51 <openstackstatus> pabelanger: finished logging
18:52 <clarkb> pabelanger: \o/ just one more to go?
18:52 <pabelanger> clarkb: indeed!
18:52 <clarkb> still waiting for puppet to update logstash.o.o
18:53 <pabelanger> need to pick up my daughter in 5mins, but should be able to hammer out ES07 once I get back
18:53 <clarkb> woot
18:59 <clarkb> #status log logstash.openstack.org upgraded to ubuntu trusty
18:59 <openstackstatus> clarkb: finished logging
19:02 <clarkb> that took entirely too much time but it is done now :)
19:11 <clarkb> looks like lists, zuul, static, wiki, and planet are the remaining "hard" upgrades
19:12 <clarkb> and eavesdrop, puppetdb, and es07 are the remainder that are possible today (maybe)
19:33 <pabelanger> #status log elasticsearch07.o.o upgraded to ubuntu-trusty and cluster is green
19:33 <openstackstatus> pabelanger: finished logging
19:33 <pabelanger> \o/
19:34 <pabelanger> clarkb: Ya, we likely need to schedule lists / zuul / static.  wiki is last on my list, I haven't looked at planet
19:35 <pabelanger> going to look at eavesdrop.o.o now
19:35 <clarkb> pleia2 was looking at planet and apparently trusty's version of the software is broken
19:35 <clarkb> pleia2 mentioned possibly doing the jump straight to xenial
19:38 <pabelanger> Ah
19:39 <pabelanger> I can poke at the puppet manifests next week, see what would be needed
19:39 <pabelanger> I suspect we'd want to move to puppetlabs-apache to be safe
19:42 <pabelanger> okay, shutting down eavesdrop.o.o to detach volume
21:44 *** openstack has joined #openstack-sprint
21:48 <pabelanger> okay, I have no idea how irc-meetings are updated on eavesdrop.o.o
21:49 <pabelanger> also, spacex sticks it again
21:49 <pabelanger> \o/
21:49 <pabelanger> I can only think some crontab that was manually installed
21:50 <clarkb> pabelanger: I think it may be a jenkins job
21:50 <pabelanger> Oh
21:50 <pabelanger> that sounds right
21:51 <pabelanger> guess we need to trigger a job
