Tuesday, 2016-01-26

10:31 *** Zara is now known as Zara__
15:53 <jeblair> infra-root: ping
15:53 *** yolanda has joined #openstack-infra-incident
15:53 <mordred> jeblair: o/
15:53 * anteaya stands by
15:53 <jeblair> i'm moving discussion of filesystem corruption over here
15:54 <jeblair> i don't quite understand what i'm seeing on nodepool yet
15:54 <yolanda> elasticsearch nodes look fine
15:54 <jeblair> [313589.385320] EXT4-fs (loop0p1): mounted filesystem with ordered data mode. Opts: (null)
15:54 <jeblair> there were lots of errors regarding loop devices
15:54 <jeblair> so i assume something went wrong while dib was running...
15:54 <jeblair> but this is also in there:
15:54 <jeblair> [318745.450555] EXT4-fs (xvde2): mounted filesystem with ordered data mode. Opts: (null)
15:55 <jeblair> i don't think we're using xvde2 for anything
15:55 <mordred> yah. I agree with that assumption
15:55 <jeblair> cause we are using the cinder volume for /opt
15:55 <jeblair> xvde2 is commented out in the fstab
15:56 <jeblair> it does not show up in /proc/mounts
15:58 <jeblair> i unmounted /opt (the cinder volume) and fsck'd; it seems fine
15:58 <jeblair> so i think perhaps we did not actually see an error with the cinder volume here... just some other weirdness
15:58 <jeblair> but i'm still getting loop device errors
16:01 <jeblair> losetup /dev/loop0p1
16:01 <jeblair> loop: can't get info on device /dev/loop0p1: No such device or address
16:01 <jeblair> losetup /dev/loop0
16:01 <jeblair> loop: can't get info on device /dev/loop0: No such device or address
16:01 <jeblair> i don't know what it thinks its attached to
16:02 <jeblair> hrm... i wonder if we did see an interruption to the cinder volume, but since most of the work was happening through loopback devices on very large files on the volume, it was the filesystem in the file as exposed through the loopback device that failed, rather than the filesystem on the real volume
16:03 <jeblair> (because we weren't performing ext4 filesystem operations on the volume's filesystem)
16:03 <yolanda> jeblair, have you tried dmsetup? i used to rely on that to clean stuff when i was testing dib and volumes
16:03 <jeblair> yolanda: will that show info about loop devices?
16:04 <anteaya> context for other folks reading logs: https://status.rackspace.com/index/viewincidents?group=21&start=1453784400
16:04 <jeblair> dmsetup status only shows:
16:04 <jeblair> main-opt: 0 1073733632 linear
16:05 <jeblair> i think something is messed up with the loopback module, and i'd like to reboot nodepool to try to correct it
16:06 <anteaya> I support rebooting nodepool
16:06 <yolanda> ++
16:06 <jeblair> okay, doing that now
16:08 <jeblair> the host is back up and everything looks okay with loop and /opt
16:09 <anteaya> woooot
16:09 <jeblair> clarkb: how did you start nodepool?
16:09 <jeblair> i don't see "no-builder" in the init script
16:10 <jeblair> ah
16:10 <jeblair> it's in /etc/default/nodepool
16:10 <anteaya> so bit of lag from hearing from clarkb expected this week, yeah?
16:10 <jeblair> so i will start it as normal
16:11 <jeblair> anteaya: yes, but have to ask anyway -- he deployed some changes manually that haven't landed yet
16:11 <anteaya> ah understood
16:11 <anteaya> figured you had remembered
16:11 <anteaya> would they be in his user history?
16:11 <jeblair> anteaya: but i believe they are documented in the pending change at https://review.openstack.org/271541 (which still needs an update before merging; though nobody wrote a comment on the change describing it)
16:11 <jeblair> anteaya: i figured it out
16:11 * anteaya nods
16:13 <jeblair> nodepool is running, but i have not started the builder
16:13 <anteaya> I understand better, thanks for sharing the link to that patch
16:13 <jeblair> i'm going to see if it looks like there's some un-cleaned-up dib stuff laying around
16:13 <jeblair> greghaynes: ^
16:14 <jeblair> greghaynes: you may be interested in my theories about filesystem/device corruption in scrollback
16:14 <jeblair> i'm removing some stuff from dib_tmp
16:16 <jeblair> i'll start the builder now
16:16 <jeblair> it's running and idle; i think that's everything for now then
16:17 <anteaya> yay
17:21 <mordred> jeblair: back online - looks like we're back up and going now?
17:25 <jeblair> mordred: yeah
18:50 *** AJaeger has joined #openstack-infra-incident
21:00 *** AJaeger has left #openstack-infra-incident
