Tuesday, 2016-01-19

*** ChanServ changes topic to "+CVE-2016_0728."17:52
*** ChanServ changes topic to "CVE-2016-0728 http://www.openwall.com/lists/oss-security/2016/01/19/2"17:53
fungihere's trusty: http://www.ubuntu.com/usn/usn-2870-1/ (so we're looking for linux-image-3.13.0-76-generic_3.13.0-76.120)18:00
fungiprecise is unaffected since it's on too early of a kernel (the bug was introduced in 3.8, precise is 3.2)18:01
clarkbwe should be able to do a mass ansible apt-get update && apt-get install linux-image18:02
clarkbthen check the results and reboot18:02
fungiyep, and hopefully soonish if my sources.list change has propagated18:03
fungiwhich it doesn't seem to have done yet18:03
fungipuppetboard does not look so well18:04
clarkbfungi: it's possible mordred's ansible work has broken something18:04
fungi111 population, 111 nodes unreported in the last 1.5 hours18:04
clarkbthere is also linux-image-virtual18:04
clarkbwe should double check on what our VMs actually use18:04
fungiahh, i was going by uname -a18:05
mordredo/18:05
mordredwhat?18:05
mordredclarkb: oh - in puppetboard?18:05
clarkbmordred: ya no reports for an hour and a half18:05
mordredk. on it18:05
mordredo - yes - that's because of the two patches that are trying to land18:06
fungion a rackspace instance, which isn't entirely representative of the state of the world granted since we have pypi mirrors elsewhere18:06
mordredthat haven't landed due to node starvation18:06
mordredbut I just saw their last set of tests start running in zuul status18:06
fungimordred: well, also it's not puppeting. my 267778 change to manage sources.list merged an hour ago and hasn't propagated18:07
mordredright18:07
mordredthat's why they haven't reported18:07
mordredbecause they aren't puppeting18:07
mordredwill be fixed as soon as that patch lands18:07
fungiahh, okay, just confirming it's not only missing reports, but actually not running18:07
fungiokay, confirmed my sources.list change has propagated to review.o.o now18:37
clarkbfungi: you should be able to do something like sudo ansible all -m shell 'apt-get update && apt-get install $package'18:41
clarkbbut check my ansible :)18:42
fungiokay, just tested and confirmed that review and zuul (representatives of trusty and precise respectively) apt-get update successfully after the sources.list update got applied by puppet just now18:43
fungiclarkb: we need to break that up by release though, right? or just accept that it will fail on not-trusty?18:44
clarkbI was thinking accept it will fail or noop on not trusty18:44
clarkbif you write a proper playbook I think you can restrict it to where the fact gathering reports trusty18:45
clarkbmight be able to do that on the command line too18:45
fungii have about 10 minutes to prep for the weekly meeting, so i'll pick this back up in about 70 minutes ;)18:48
clarkbhttp://docs.ansible.com/ansible/playbooks_conditionals.html#the-when-statement18:51
clarkbso we could do something like when os_distro_release == "trusty"18:51
clarkb(those values may not be what ansible uses)18:52
fungiokay, checking up on where we're at with this now21:03
fungiapt-cache show on review.o.o indicates we have a linux-generic 3.13.0.76.82 pending21:05
fungibut we apparently need 3.13.0-76.12021:06
fungii installed 3.13.0.76.82 on etherpad-dev just to check it out, and /usr/share/doc/linux-image-3.13.0-76-generic/changelog.Debian.gz indicates that it is indeed the update for CVE-2016-0728/LP: #1534887 "KEYS: Fix keyring ref leak in join_session_keyring()"22:13
openstackLaunchpad bug 1534887 in linux (Ubuntu Xenial) "CVE-2016-0728" [High,Incomplete] https://launchpad.net/bugs/153488722:13
fungithanks openstack22:13
fungi3.13.0-76.120 may be the corresponding source package version22:13
clarkbfungi: so that's just a package versioning confusion?22:13
fungihard to tell since the packages.u.c update pulse for today hasn't happened yet22:14
fungibut yes, looks that way22:14
fungioh!22:15
fungishould have done apt-cache show linux-image-3.13.0-76-generic not linux-generic22:16
fungithat does actually indicate that 3.13.0-76.120 is the current version in the package list22:17
clarkbif possible leave the nodepool restart to me and I can do it when I upgrade the service22:17
fungiso i think that means we're ready to start kernel upgrades22:17
fungiwell, we need to make sure the new packages get installed everywhere before we even start planning what order we reboot them in, i think22:17
clarkb++22:18
clarkbfungi: did you see my ansible link? something like that should work great for ansibling the update and install22:18
fungiyep, looks good22:19
fungialso we have 33 nodes running trusty according to http://puppetboard.openstack.org/fact/operatingsystemrelease/14.0422:19
fungii'll go ahead and give the package updates a shot22:20
clarkbok let me know if I can help with that22:20
fungialso need to circle back around and see if red hat has posted rhel 7 package details for this yet22:21
clarkbI can do that22:21
fungiand whether the equivalents have made it into centos22:21
fungii couldn't find any earlier, but that was hours ago so hopefully they've updated the bug now22:21
clarkbmy initial check is coming up empty22:22
clarkbI usually start at https://lists.centos.org/pipermail/centos-announce/2016-January/thread.html22:22
clarkbbut will check yum too just because22:22
fungimordred: if you have any input on ansibleisms for this, it's appreciated. i'm just going to start swinging wildly over here22:23
clarkbI appreciate that the package name is "kernel"22:23
fungidebian used to call it kernel-image until they started having hurd, kfreebsd, et cetera kernels in addition to linux22:23
fungithen it became a little confusing so they did a package name transition to linux-image22:24
clarkbwe are currently running a few versions behind, going to check if the latest has our patch22:25
clarkbnope latest is from the 6th22:25
clarkbso I think we are still waiting for package update on centos22:25
fungiclarkb: from the example you linked, looks like we want when: ansible_distribution == "Ubuntu" and ansible_distribution_major_version == "14.04"22:26
clarkbfungi: that sounds about right22:26
clarkbfungi: then just run a shell task to do apt-get update && apt-get install $package or dist-upgrade22:26
clarkbor you can use the apt module and have it do those things22:26
clarkb(though I find it easier to just shell for one offs like this)22:27
fungido i create a temporary role file to put this stuff in, or can it be fed as cli arguments?22:27
clarkbI think you need a temp playbook22:27
clarkbit may be possible to feed it all as cli arguments but the docs tend to be lacking on how to do that22:28
clarkbso I just cave and make a yaml file so I can follow docs22:28
clarkba general playbook for "forcefully update packages now" might make sense though22:28
clarkbthen we can just run it next time22:28
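A reusable playbook of the kind clarkb describes might look roughly like the sketch below. It is only an illustration, not something that existed in the channel: the file name and the `package_name` variable are hypothetical, it uses the `apt` module mentioned above instead of a shell task, and it assumes the play already connects with root privileges.

```yaml
# force_update_package.yaml (hypothetical file name)
- hosts: all
  vars:
    # Hypothetical variable; override with -e package_name=<other package>.
    package_name: linux-image-3.13.0-76-generic
  tasks:
    - name: Refresh the package index and install the newest version of the package
      apt:
        name: "{{ package_name }}"
        state: latest
        update_cache: yes
      # ansible_distribution_release is the fact holding the Ubuntu codename.
      when: ansible_distribution_release == "trusty"
```

Matching on the release codename sidesteps any question about how a given Ansible version formats its version-number facts.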
fungiNo manual entry for ansible-playbook; See 'man 7 undocumented' for help when manual pages are not available.22:29
* fungi shakes cane again22:29
clarkbfungi: because we pip install I think22:29
fungiand ansible-playbook --help doesn't work without sudo, wants to write to /var/log/ansible.log/var/log/ansible.log22:30
fungithat's sort of scary22:30
* fungi needs to stop thinking like a sysadmin and use web searches for stack sexchange articles from people he has no reason to trust22:30
clarkbthat may be a misconfiguration on our part22:30
clarkbfungi: I like to google man whatever22:30
clarkbgoogle is the best mandb22:31
clarkbya log_path=/var/log/ansible.log is set in /etc/ansible/ansible.cfg22:31
fungihowever, why does it even try to open its log when invoked with only --help? that's the part i found scary22:32
clarkbalso why is it appending /var/log/ansible to our path :)22:35
fungiclarkb: oh, that was my bouncing paste button. i need to fix that22:35
fungithe message just said IOError: [Errno 13] Permission denied: '/var/log/ansible.log'22:35
clarkbaha22:36
fungibut sometimes my middle button registers two clicks when i press once at the moment22:36
fungithe example i had used a -f 10 (from our workspace cleanup) so i was trying to figure out what that's doing, but found my answer in /usr/local/lib/python2.7/dist-packages/ansible/cli/playbook.py22:37
fungiuse the source, luke22:37
fungithought maybe it was a timeout or something, but no it's the short form of --forks22:37
fungiso parallelism count22:37
clarkbyup22:39
clarkbdefault is 5 I think22:39
fungiokay, so i have a file in my homedir called upgrade_kernel.yaml with this content: http://paste.openstack.org/show/48434822:41
clarkblooking22:41
clarkbfungi: that looks about right, then you would do `ansible-playbook $pathtoyamlfile`22:42
clarkbfungi: er you need a host spec first22:42
clarkbfungi: ansible-playbook $host yamlfile22:42
clarkbfungi: where all is an alias for all known hosts22:42
fungisudo ansible-playbook -f 10 all upgrade_kernel.yaml22:42
clarkbya22:42
fungitrying that now22:43
clarkbyou can replace all with a specific fqdn if you want to test first22:43
fungii'll do it under screen22:43
fungioh, good idea22:43
mordredheya - sorry - I was on the phone - do you still need help?22:46
clarkbmordred: we should know shortly :)22:46
fungilooks like ansible-playbook doesn't take a hostname parameter22:46
mordredsoooo22:46
fungiit wants ANSIBLE_HOSTS envvar passed?22:46
mordredthat's not a playbook you have there22:46
mordredone sec22:47
fungido i comma-separate multiples or...22:47
fungioh22:47
clarkboh can we ansible it instead22:47
fungidid i mention swinging wildly over here? ;)22:47
clarkb(I really hate the ansible terminology fwiw)22:47
clarkb(granted I am sure puppet is no better, I have just gotten used to it)22:47
clarkbfungi: you should just pass the name on the command line in place of all iirc22:48
mordredhttp://paste.openstack.org/show/484349/22:48
mordredfungi, clarkb: ^^ try that22:48
clarkboh right this is the difference between playbook and the other command22:48
mordredansible-playbook name-of-yaml.yml22:48
clarkb(which is part of my confusion I think, and ansible should just rm one of them)22:48
fungimmm black magic22:48
mordreda playbook is a collection of plays - a play is the combination of host specifications with one or more tasks22:49
mordredso a playbook by design is a thing that associates tasks with where you want to run them22:49
fungiwhat's the strategy parameter?22:49
mordredactually - that's a waste, you can remove it - there is only one task here22:49
mordredbut in general it says "don't wait for other hosts to finish the task you're on before proceeding to the next task"22:50
fungiahh, got it22:50
mordredwhich in a case like this, is correct - you do not care that all hosts finish task one then move on to task two22:50
fungiso i ansible-playbook this file still?22:50
fungiand pass an envvar to indicate which host i want to test it on?22:50
fungiat least that's what the manpage seems to imply22:51
mordredfungi: yes22:51
mordredno22:52
mordreddo --limit=$hostname22:52
mordredfungi: ansible-playbook --limit=review-dev.openstack.org that-file.yaml22:52
fungisudo ansible-playbook -f 10 --limit=ask-staging.openstack.org upgrade_kernel.yaml22:52
mordredyes22:52
fungithanks22:52
fungiit's not liking the hosts: * line22:53
clarkbfungi: use hosts all22:54
clarkbI think22:54
fungiyep, that seems to have worked22:54
mordredoops. sorry22:54
fungiwow, it claims to have run, but completed very quickly22:54
fungifar too quickly given that this should have taken a minute or so22:55
fungioh, it says TASK [command]... skipping: [ask-staging.openstack.org]22:55
clarkbfungi: the when may have failed22:55
mordredyes.22:56
mordredhttp://paste.openstack.org/show/484351/22:57
fungiindeed, it looks like it is actually running once i remove the when condition22:57
mordredfungi: ^^22:57
mordredthat way you can check to see what the values of the variables are there22:57
fungithanks22:58
mordredyou can also run "ansible ask-staging.openstack.org -m setup" and it will print all of the variables22:58
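The contents of mordred's paste are not preserved in this log. A minimal sketch of a play that prints the facts used in the `when` condition, assuming the standard `debug` module, could look like this:

```yaml
- hosts: all
  tasks:
    - name: Show the distribution facts that the when condition compares against
      debug:
        msg: "{{ ansible_distribution }} {{ ansible_distribution_major_version }}"
```

Run with `--limit` against a single host, this prints the exact strings the condition has to match.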
mordredfungi: are you screening?22:58
fungi"ansible_distribution_major_version": "14"22:58
fungimordred: yep22:59
mordredwell, there we go!22:59
mordred(that's probably good enough for matching for this)22:59
fungiso, er, i guess that's not the same as what facter does for major version22:59
clarkbsince we never did unicorn22:59
fungiyeah, good enough22:59
fungi`sudo ssh ask-staging.openstack.org dpkg -l linux-image-3.13.0-76-generic` indicates it's installed now23:00
mordredwoot!23:01
fungiand rerunning the playbook with == "14" seems to not skip the host23:01
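Neither paste survives here, so the following is only a reconstruction of what the working playbook plausibly contained, pieced together from the discussion: `hosts: all`, a single shell task, and the corrected `"14"` comparison. The `-y` flag is an assumption added so that apt-get does not wait for confirmation.

```yaml
# upgrade_kernel.yaml (reconstructed sketch, not the original paste)
- hosts: all
  tasks:
    - name: Install the kernel package carrying the CVE-2016-0728 fix
      # -y is an assumption; it keeps apt-get from prompting.
      shell: apt-get update && apt-get install -y linux-image-3.13.0-76-generic
      # The major version fact is "14" on trusty, not "14.04".
      when: ansible_distribution == "Ubuntu" and ansible_distribution_major_version == "14"
```

It would be invoked as in the log, for example `sudo ansible-playbook -f 10 --limit=ask-staging.openstack.org upgrade_kernel.yaml` for a single-host test, then without `--limit` for the full run.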
mordredthat's excellent23:01
fungiso i think we're set to open it wide?23:01
clarkbcool time to run it everywhere I think23:01
mordred++23:01
fungiso i expect etherpad-dev (which i tested manually earlier) and ask-staging to fail possibly as they've already got the new package23:02
fungialso possibly some hosts (review-dev and some of our pypi mirrors) with puppet disabled23:02
fungianyway, it's running now23:02
fungiseems to be working so far. i'll do some more spot checks after it finishes23:03
fungilooks like it's skipping the right hosts23:06
fungialso some errors which look like hosts where puppet has been disabled for a while so probably fon23:06
fungigah23:07
fungiprobably don't have my sources.list updates yet23:07
clarkbhrm, maybe we just manually update sources.list if list is small and rerun?23:08
clarkbor we figure out if we can enable puppet again23:08
fungiyeah, for example i have no idea when/why it was disabled on review-dev23:10
clarkbI think that was part of the gerrit upgrade prep23:11
clarkbwe should get it puppeting again but that may be more than a 5 minute change23:11
*** ianw has joined #openstack-infra-incident23:12
fungiit was reenabled at some point after the gerrit upgrade23:12
fungiand disabled again since23:12
fungimordred: what's the default timeout on ansible? it looks like it's just hanging for a few minutes now, and i suspect it's having trouble connecting to or hearing back from something23:14
mordredfungi: it's going to look like it's hanging if it's running something - output is buffered til the end of task execution23:14
mordredfungi: I have not noticed overly-long hang/timeouts when doing testing of puppet runs ...23:15
fungisome of these are probably just taking a while23:15
mordredyah23:15
Clinti think it's like 5 or 10 minutes23:15
mordredI wish the output was streamed - but output is a json blob, so I think streaming would be tricky23:15
fungilike update-initrd churning through dozens of old kernel packages we never autoremoved23:15
mordredfungi: oh. yeah. that'll take a minute23:16
clarkbdid the change to autoremove not merge?23:16
clarkbmaybe I should dig that up23:16
fungiit's still sitting. no new output for about 10 minutes23:22
fungimordred: ps claims that my ansible-playbook call has 10 defunct zombie children23:29
fungino, sorry, i mis-counted. just 923:30
fungibut i worry they're never going to terminate and are actually indefinitely hung23:37
mordredhrm23:39
mordredfungi: well, worst-case with bailing is that you'll abort 10 apt-get install processes23:39
fungiany clue whether i should just shoot the parent in the head and try again? keyboard interrupt? sigkill? or are there more graceful options?23:39
mordredfungi: and have to apt-get -f install somewhere23:39
mordredfungi: I'd just ctrl-c if it were me23:39
fungii actually looked at ps hoping it would mention which hosts it was communicating with for the currently lingering forks, but no such luck23:40
mordredyou could maybe kill the forst23:40
mordredand see if you get good error from the parent23:40
mordredfungi: forks. not forst23:40
mordredfungi: don't kill the forst23:40
fungiyeah, i'll kill the immediate parent of the forks but not the parent's parent23:41
fungioh, i missed, it's right there in the ssh command-line23:42
fungi15.126.140.7 (which seems to have no reverse dns?)23:42
fungithat's pypi.region-b.geo-1.openstack.org23:42
mordredno reverse dns in hpcloud23:43
fungiright, which is why i checked that one first23:43
fungiand also because it looked like an hpcloud ip address23:43
mordredyay for 15.23:43
fungiso the recap says these failed: afs01.dfw, afs01.ord, afsdb01, afsdb02, odsreg-test-corvus, pypi.region-b.geo-1, review-dev, test-mordred-config-drive23:45
jeblairi think we can delete odsreg-test-corvus23:45
clarkbthose all sound like VMs that may not have up to date sources.list23:45
fungier, test-mordred-config-drive was unreachable, not failed23:45
fungialso there's a ab78618b-a1f4-4d0a-8aeb-56f7b688bcf4 which was unreachable23:46
fungithe hostlist is dynamically generated from nova list such that i can just nova delete the trash instances, or does something else need updating in between?23:47
fungino idea what that uuid is, it's not in openstackci rax-dfw though23:47
funginevermind, i was looking in the openstack tenant not openstackci23:48
fungiit's the old release.slave. i'll clean that one up too23:48
fungiit was unreachable because it's in shutdown state since after i replaced it23:49
fungijeblair: i've deleted odsreg-test-corvus.openstack.org (from rax-ord) too, thanks23:50
fungishould we ignore the afs servers for now or reenable puppet on them or update them manually?23:51
fungii've already got review-dev sorted with zaro and reenabled23:51
jeblairfungi: you're saying the afs fileservers are disabled in puppet?23:58
jeblairfungi, mordred: i don't know why that would be23:58
fungijeblair: actually, they're not according to puppetboard23:59
mordredfungi: test-mordred-config-drive can die23:59
fungiso the update presumably failed on them for some other reason i'll debug in a bit23:59
mordredfungi: if you delete instances, you want to delete the inventory cache too23:59
fungimordred: thanks, doing23:59
mordredfungi: /var/cache/ansible-inventory/<tab>23:59
