Friday, 2016-05-27

00:01 <pabelanger> #status log logstash-worker17.openstack.org now running ubuntu-trusty and processing requests
00:01 <clarkb> ok I am out now, will try to sort out why puppet ansible things are broken in the morning
00:03 <openstackstatus> pabelanger: finished logging
00:03 <pabelanger> wow, that was slow
00:05 <pabelanger> clarkb: Oh, I know the issue
00:05 <pabelanger> clarkb: it is because puppetmaster.o.o is in emergency file
00:05 <clarkb> oh?
00:05 <clarkb> ahahahahahaha
00:05 <clarkb> RIP
00:05 <pabelanger> Yup
00:05 <pabelanger> so, we can remove it
00:05 <pabelanger> since osic is back online
00:06 <pabelanger> jeblair: you okay with us removing puppetmaster.o.o from emergency?
00:06 <pabelanger> since osic reverted their SSL cert
00:09 <clarkb> makes sense to me
00:09 <pabelanger> okay, removing
00:10 <pabelanger> #status log puppetmaster.o.o remove from emergency file since OSIC is now back online
00:10 <openstackstatus> pabelanger: finished logging
00:18 <pabelanger> #status log logstash-worker18.openstack.org now running ubuntu-trusty and processing requests
00:18 <openstackstatus> pabelanger: finished logging
00:22 *** baoli has quit IRC
00:23 *** baoli has joined #openstack-sprint
00:25 <pabelanger> clarkb: booyah: http://logstash.openstack.org/#/dashboard
00:32 <pabelanger> #status log logstash-worker19.openstack.org now running ubuntu-trusty and processing requests
00:33 <openstackstatus> pabelanger: finished logging
00:48 <pabelanger> #status log logstash-worker20.openstack.org now running ubuntu-trusty and processing requests
00:48 <openstackstatus> pabelanger: finished logging
00:48 <pabelanger> \o/
00:48 <pabelanger> very happy how well that went
01:29 *** baoli has quit IRC
01:30 <fungi> excellent job!
01:31 *** baoli has joined #openstack-sprint
02:43 <jhesketh> I'm going to take the old apps.openstack.org offline after snapshotting fyi
02:56 *** rfolco has quit IRC
03:02 <jhesketh> deleted
03:02 -openstackstatus- NOTICE: Gerrit is going offline briefly to check possible filesystem corruption
03:02 *** ChanServ changes topic to "Gerrit is going offline briefly to check possible filesystem corruption"
03:07 <anteaya> jhesketh: thank you
03:08 *** anteaya has quit IRC
03:22 *** ChanServ changes topic to "Taking Infra servers running Precise and upgrading them to Trusty | https://wiki.openstack.org/wiki/VirtualSprints#Infra_Trusty_Upgrade"
03:22 -openstackstatus- NOTICE: after a quick check, gerrit and its filesystem have been brought back online and should be working again
03:44 *** baoli has quit IRC
03:45 *** baoli has joined #openstack-sprint
03:50 *** baoli has quit IRC
03:51 *** baoli has joined #openstack-sprint
03:51 *** yuikotakadamori has joined #openstack-sprint
05:29 *** baoli has quit IRC
07:14 -openstackstatus- NOTICE: zuul required a restart due to network outages. If your change is not listed on http://status.openstack.org/zuul/ and is missing results, please issue a 'recheck'.
10:01 *** yuikotakadamori has quit IRC
11:18 *** rfolco has joined #openstack-sprint
12:04 *** yolanda has quit IRC
12:06 *** yolanda has joined #openstack-sprint
12:41 *** yolanda has quit IRC
12:54 *** baoli has joined #openstack-sprint
12:56 *** baoli_ has joined #openstack-sprint
12:59 *** baoli has quit IRC
13:20 *** yolanda has joined #openstack-sprint
14:00 <pabelanger> now that yak shaving is out of the way :)
14:00 <pabelanger> going to prep eavesdrop.o.o for launch
14:00 <pabelanger> should be able to start at 1600UTC
14:13 <pabelanger> clarkb: going to start looking at ES on ubuntu-trusty
14:14 <pabelanger> setting cluster.routing.allocation.enable to none
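
For reference, disabling allocation is a single call to the cluster settings API; a minimal sketch, assuming the ES HTTP API is reachable on localhost:9200 (for example through the SSH tunnel mentioned below):

    # stop the cluster from reassigning shards while nodes go up and down
    curl -XPUT localhost:9200/_cluster/settings -d '
    {
      "transient": { "cluster.routing.allocation.enable": "none" }
    }'
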
14:14 <clarkb> ok, I am not fully here yet but it's basically the upgrade process
14:14 <clarkb> pabelanger: those instances do use cinder volumes for the es data
14:15 <pabelanger> clarkb: Ya, that is what I am looking at now
14:24 <pabelanger> I've stopped elasticsearch on ES02
14:25 <fungi> reminder: double-check any servers you've replaced to make sure you remembered to add reverse dns for both ipv4 and ipv6 addresses on each
14:26 <pabelanger> ++
14:26 <fungi> (in other words, don't be a chump like me!)
14:27 <pabelanger> okay, dropping dns ttl on elasticsearch hosts to 5mins
14:29 <fungi> the paste-precise-backup snapshot completed overnight, so i'm going to delete the old halted instance now
14:29 <pabelanger> sorry, I've stopped elasticsearch on ES07
14:29 <pabelanger> not ES02
14:30 <fungi> 2 and 7 look a lot alike
14:30 <pabelanger> /facepalm
14:30 <pabelanger> no, I stopped ES02
14:30 <pabelanger> I'm using ES07 as my SSH tunnel
14:30 <pabelanger> okay, need to step away for 5mins to let my brain recover
14:30 <pabelanger> and fetch some coffee
14:31 <fungi> just soak the brain in caffeine
14:43 <pabelanger> could use some help landing https://review.openstack.org/#/c/320642/ for elasticsearch migrations
14:45 <clarkb> approved
14:45 <pabelanger> danke
14:46 <pabelanger> Think I am going to put ES02 into shutdown, so I don't run into the detach issue again
14:59 <pabelanger> fungi: safe to start work on ES02?  Want to make sure you were able to check the volume before I shutdown
14:59 <pabelanger> A quick poke on the server didn't show any errors
15:01 <fungi> pabelanger: if the volume is still mounted read/write and dmesg -T doesn't show any filesystem/block device errors from overnight, go ahead with it
15:01 <fungi> i haven't gotten that far down the list yet
15:01 <pabelanger> fungi: Yup, last logs are from Apr 4
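
The check fungi describes boils down to something like the following sketch, assuming the cinder volume is mounted at /var/lib/elasticsearch:

    # confirm the volume is still mounted read/write (look for "rw" in the options)
    grep /var/lib/elasticsearch /proc/mounts
    # scan the kernel ring buffer for filesystem/block device errors
    dmesg -T | egrep -i 'ext4|i/o error|blk_update'
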
15:02 <pabelanger> okay, placing ES02 into shutdown
15:04 <pabelanger> Hmm, looks like we're hitting our quota issue
15:04 <pabelanger> need to launch 60 GB Performance (performance2-60)
15:07 <pabelanger> clarkb: fungi: Are we okay with deleting each elasticsearch host first to recover quota? Then standing up the replacement server
15:07 <fungi> pabelanger: that may be the only way to go about it
15:07 <clarkb> pabelanger: yes I think it's our only sane option
15:08 <pabelanger> okay, let me do that
15:08 <fungi> unless there are still some instances we need to clean up from other replacements
15:08 <pabelanger> I've already detached the volume
15:08 <fungi> i think i've deleted all the old instances i've replaced so far though
15:08 <pabelanger> Same
15:08 <pabelanger> okay, going to delete elasticsearch02.o.o
15:10 <clarkb> I have one 2gb instance that needs deletion but that's not enough for an es host
15:14 <clarkb> just make sure the replacement is big like the original. Java and ES use all the memory
15:27 <pabelanger> ack
15:37 <pabelanger> elasticsearch02.o.o online, I've enabled shard allocation again
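
Re-enabling is the mirror image of the earlier settings call, after which cluster health can be polled until it reports green; again a sketch against the tunneled localhost:9200 endpoint:

    # allow the cluster to place shards again
    curl -XPUT localhost:9200/_cluster/settings -d '
    {
      "transient": { "cluster.routing.allocation.enable": "all" }
    }'
    # then poll until "status" goes yellow -> green
    curl -s localhost:9200/_cluster/health?pretty
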
15:39 <clarkb> pabelanger: were you able to have the launch machinery attach the existing volume or did you do that by hand?
15:39 <pabelanger> clarkb: I did it by hand this time
15:39 <pabelanger> I can try using launch-node.py for the next one
15:40 <pabelanger> I found 1 issue
15:40 <pabelanger> after I mounted the cinder volume, I had to chown -R elasticsearch: /var/lib/elasticsearch because the uid was not correct
15:41 <pabelanger> going to see if we can have puppet manage that
15:41 <clarkb> ah ya likely due to how we reserve a chunk of uids now
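
The by-hand sequence was presumably along these lines (a sketch: the block device name is hypothetical, and it assumes the trusty elasticsearch package's init service):

    service elasticsearch stop
    mount /dev/xvdb /var/lib/elasticsearch    # hypothetical device name for the attached cinder volume
    # fix ownership: the uid recorded on the old volume no longer matches the new host's elasticsearch user
    chown -R elasticsearch:elasticsearch /var/lib/elasticsearch
    service elasticsearch start
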
15:53 <pabelanger> running daughter down to school. Waiting for cluster to go green ATM
16:20 <pabelanger> #status log elasticsearch02.o.o upgraded to ubuntu-trusty and cluster is green
16:20 <openstackstatus> pabelanger: finished logging
16:24 <pabelanger> okay, moving on to ES03
16:24 <clarkb> the others shouldn't require an apache restart for the proxy since we only proxy to 02 (that needs fixing but has been low priority)
16:25 <pabelanger> agreed
16:27 <pabelanger> clarkb: So, just confirming, I need to pass --volume and --mount-path to launch-node.py it seems
16:28 <clarkb> pabelanger: ya, I need to double check that the scripts do the right thing if the volume already has an fs
16:28 <pabelanger> good call
16:29 <clarkb> looks like they won't, if an fs already exists they noop
16:29 <clarkb> so that's not super useful
16:29 <pabelanger> okay
16:30 <pabelanger> I'll attach by hand again
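
When launch-node.py does handle it, the invocation being discussed would look roughly like this (a sketch only: the two flags and the flavor name come from the conversation, but the volume id is a placeholder and any other required options are omitted):

    ./launch-node.py elasticsearch03.openstack.org \
        --flavor performance2-60 \
        --volume <existing-cinder-volume-id> \
        --mount-path /var/lib/elasticsearch
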
16:30 <pabelanger> anybody want to review: https://review.openstack.org/#/c/322242/
16:31 <clarkb> so the scripts work if you attach a brand new volume but not for migrating volumes between instances, I can look at addressing that
16:31 <pabelanger> okay
16:32 <clarkb> pabelanger: do you know if puppet will touch the file mode on that dir and its children?
16:32 <pabelanger> clarkb: I believe just that directory
16:32 <clarkb> it uses the file mode of the file source if you don't explicitly add one to the file resource, not sure what it does with dirs on recurse
16:32 <pabelanger> but, I can update it to present to be safe
16:34 <pabelanger> TIL
16:34 <pabelanger> https://docs.puppet.com/puppet/latest/reference/type.html#file-attribute-recurse
16:34 <pabelanger> we need ensure => directory
16:38 <clarkb> pabelanger: ya ensure => directory is ok I think, I just don't want it to change the file modes on all those files due to some default behavior
16:38 <clarkb> (which it does do on proper files without modes set)
16:38 <pabelanger> Oh right, I misread what you were asking
16:39 <pabelanger> Ya, that's the reason I left mode off
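
The behavior under discussion can be tried in isolation with a one-off manifest; a sketch (not the actual change under review) that recursively manages ownership while leaving mode unmanaged, so puppet noops on the children's permissions:

    puppet apply -e "file { '/var/lib/elasticsearch':
      ensure  => directory,
      owner   => 'elasticsearch',
      group   => 'elasticsearch',
      recurse => true,
      # no mode attribute: puppet leaves existing permissions alone
    }"
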
16:43 <pabelanger> #status log elasticsearch03.o.o upgraded to ubuntu-trusty and cluster is green
16:43 <openstackstatus> pabelanger: finished logging
17:00 <clarkb> pabelanger: so puppet will not touch file modes in this case?
17:00 <pabelanger> clarkb: right, it will just noop on them
17:01 <pabelanger> that's how I've always understood it
17:01 <clarkb> ok +2
17:07 <pabelanger> clarkb: So, I have a shard that is still unassigned. Is there any way to kick it to a host?
17:07 <pabelanger> "reason": "NODE_LEFT",
17:07 <clarkb> pabelanger: no, is it the only outstanding shard or are others recovering?
17:08 <clarkb> ES will only process a small number at a time (2 I think)
17:08 <pabelanger> clarkb: only outstanding
17:08 <pabelanger> others have settled
17:08 <clarkb> let me get a proxy running and will look
17:08 <pabelanger> actually,
17:08 <pabelanger> there are 2 purple ATM
17:09 <clarkb> ya it may be rebalancing before assigning that one shard
17:09 <pabelanger> okay
17:09 <clarkb> let's wait until those purple ones are done
17:09 <pabelanger> I'll hold off on moving to ES05 until it gets assigned
17:10 <clarkb> pabelanger: that unassigned shard does have its master copy on es07 so as long as you don't turn off es07 before it gets its replica up we should be ok
17:10 <clarkb> but would be nice to see it recover it
17:10 <pabelanger> sure, I don't mind waiting for a few minutes
17:10 <pabelanger> the upgrades are going smooth
17:11 <clarkb> yay
17:12 <clarkb> if we turn off es07 in this state the cluster will go red because one shard is completely unavailable, it should go back to yellow once es07 is back up again though
17:19 <pabelanger> clarkb: woot
17:19 <pabelanger> here we go
17:19 <pabelanger> moved to recovering
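
Shard placement like this can be watched from the command line with the _cat APIs available on ES versions of that era; a sketch against the same tunneled endpoint:

    # list shards not yet in the STARTED state
    curl -s localhost:9200/_cat/shards | grep -E 'UNASSIGNED|INITIALIZING|RELOCATING'
    # or follow ongoing recoveries index by index
    curl -s localhost:9200/_cat/recovery
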
17:24 <pabelanger> #status log elasticsearch04.o.o upgraded to ubuntu-trusty and cluster is green
17:24 <openstackstatus> pabelanger: finished logging
17:47 <pabelanger> #status log elasticsearch05.o.o upgraded to ubuntu-trusty and cluster is green
17:47 <openstackstatus> pabelanger: finished logging
18:51 <pabelanger> #status log elasticsearch06.o.o upgraded to ubuntu-trusty and cluster is green
18:51 <openstackstatus> pabelanger: finished logging
18:52 <clarkb> pabelanger: \o/ just one more to go?
18:52 <pabelanger> clarkb: indeed!
18:52 <clarkb> still waiting for puppet to update logstash.o.o
18:53 <pabelanger> need to pick up my daughter in 5mins, but should be able to hammer out ES07 once I get back
18:53 <clarkb> woot
18:59 <clarkb> #status log logstash.openstack.org upgraded to ubuntu trusty
18:59 <openstackstatus> clarkb: finished logging
19:02 <clarkb> that took entirely too much time but it is done now :)
19:11 <clarkb> looks like lists, zuul, static, wiki, and planet are the remaining "hard" upgrades
19:12 <clarkb> and eavesdrop, puppetdb, and es07 are the remainder that are possible today (maybe)
19:33 <pabelanger> #status log elasticsearch07.o.o upgraded to ubuntu-trusty and cluster is green
19:33 <openstackstatus> pabelanger: finished logging
19:33 <pabelanger> \o/
19:34 <pabelanger> clarkb: Ya, we likely need to schedule lists / zuul / static.  wiki is last on my list, I haven't looked at planet
19:35 <pabelanger> going to look at eavesdrop.o.o now
19:35 <clarkb> pleia2 was looking at planet and apparently trusty's version of the software is broken
19:35 <clarkb> pleia2 mentioned possibly doing the jump straight to xenial
19:38 <pabelanger> Ah
19:39 <pabelanger> I can poke at the puppet manifests next week, see what would be needed
19:39 <pabelanger> I suspect we'd want to move to puppetlabs-apache to be safe
19:42 <pabelanger> okay, shutting down eavesdrop.o.o to detach volume
21:44 *** openstack has joined #openstack-sprint
21:48 <pabelanger> okay, I have no idea how irc-meetings are updated on eavesdrop.o.o
21:49 <pabelanger> also, spacex sticks it again
21:49 <pabelanger> \o/
21:49 <pabelanger> I can only think some crontab that was manually installed
21:50 <clarkb> pabelanger: I think it may be a jenkins job
21:50 <pabelanger> Oh
21:50 <pabelanger> that sounds right
21:51 <pabelanger> guess we need to trigger a job
