Wednesday, 2016-08-17

01:47 *** jeblair has joined #openstack-infra-incident
01:47 *** anteaya has joined #openstack-infra-incident
01:47 <fungi> quieter here
01:48 *** ChanServ changes topic to "wiki compromise"
01:48 <anteaya> http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=355&rra_id=all
01:50 <fungi> so anyway, combining jeblair's and my theories, best guess is that i missed that the trusty upgrade moved our v4 rules file somewhere incompatible with the way it was previously symlinked, so a reboot ended with no v4 rules loaded, exposing the elasticsearch api to the internet
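A minimal sketch of the kind of external reachability check that would catch this sort of exposure (the hostname is a placeholder and 9200 is just the elasticsearch default; run it from a machine outside the firewall):

```python
#!/usr/bin/env python3
"""Check that a host's elasticsearch API is NOT reachable from here.

Illustration only: host and port are assumptions, not taken from the log.
"""
import socket
import sys

HOST = "elasticsearch.example.org"  # hypothetical: the server to test
PORT = 9200                         # default elasticsearch HTTP port


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if port_open(HOST, PORT):
        print(f"WARNING: {HOST}:{PORT} is reachable -- firewall rules may be missing")
        sys.exit(1)
    print(f"OK: {HOST}:{PORT} is filtered or closed from here")
```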
01:51 <jeblair> fungi: bup claims to have daily backups up to and including 8-16.  they appear at 05:37
01:52 <jeblair> they are append-only backups and can therefore be used for forensics as well as recovery
01:52 <jeblair> having said that, i don't think we've done a restore test on wiki
01:53 <jeblair> so i don't believe anyone has verified whether there's anything actually in the backup.
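As a sketch of what a first verification pass might look like (the repository path and branch name are assumptions about how the backup server is laid out, not what it actually uses):

```python
#!/usr/bin/env python3
"""List the saves bup has for a backup branch, as a basic sanity check.

`bup ls` and `bup restore` are real bup subcommands, but BUP_DIR and the
branch name below are placeholders; verify them before relying on this.
"""
import os
import subprocess

BUP_DIR = "/opt/backups/bup-wiki"  # hypothetical repository location
BRANCH = "root"                    # hypothetical branch the backup cronjob saves to

env = dict(os.environ, BUP_DIR=BUP_DIR)

# Show the dated saves under the branch; each entry is one backup run.
result = subprocess.run(
    ["bup", "ls", BRANCH],
    env=env, capture_output=True, text=True, check=True,
)
print(result.stdout)

# An actual restore test would be something along the lines of:
#   bup restore -C /tmp/restore-test root/latest/.
# followed by inspecting the restored tree.
```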
01:53 <fungi> with file uploads disabled, we're probably fine rolling back to the upgrade and using the current db from trove
01:53 <fungi> i'll see if the snapshot i saved comes up in a sane state
01:53 <jeblair> fungi: aren't there db migrations?
01:53 <fungi> yes, i mean reupgrade the snapshot and use the migrated db
01:54 <jeblair> fungi: okay, so create new server from snapshot, upgrade mediawiki, check firewall, then online?
01:54 <fungi> right
01:54 <fungi> no idea how long it'll take this to build from the snapshot though
01:56 <fungi> though actually, we _do_ have one already booted and configured
01:57 <fungi> with the db in trove, wiki-upgrade-test.openstack.org can simply have apache reenabled and started, and dns pointed at it
01:57 <anteaya> that sounds faster
01:58 <jeblair> fungi: was it similarly exposed?
01:58 <jeblair> (i worry about whether we can trust it)
01:59 <anteaya> I'm grateful for what you are worried about
01:59 <jeblair> i worry a lot so i hope your gratitude is boundless :)
01:59 <fungi> jeblair: not so far as i could tell. after the upgrade puppet ended up blocking everything because the server wasn't in system-config at all, so it got our default ssh-only rules
02:00 <anteaya> jeblair: my gratitude for your worry is boundless :)
02:00 <fungi> we didn't turn off puppet for wiki-upgrade-test until after the upgrade, and then i had to manually copy the iptables rules back over from wiki.o.o
02:00 <jeblair> (i also worry a little about whether someone could have elevated wiki permissions by modifying the database, though i confess that might be somewhat low risk)
02:01 <anteaya> I think that is a fair worry
02:01 <anteaya> can we check if anyone was granted admin wiki status since friday?
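One way to answer that would be to query MediaWiki's rights log directly; a rough sketch, with placeholder connection details and the assumption that the standard `logging` table layout (log_type='rights' for group changes) applies to this schema version:

```python
#!/usr/bin/env python3
"""Look for user-rights changes in a MediaWiki database since a given date.

Sketch only: host, credentials, and database name are placeholders, and the
column set should be checked against the actual MediaWiki schema in use.
"""
import pymysql

SINCE = "20160812000000"  # MediaWiki timestamps are YYYYMMDDHHMMSS strings

conn = pymysql.connect(
    host="trove.example.org",  # hypothetical trove endpoint
    user="wikiro",             # hypothetical read-only account
    password="...",
    database="wiki",
)
try:
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT log_timestamp, log_user_text, log_title, log_params
            FROM logging
            WHERE log_type = 'rights' AND log_timestamp >= %s
            ORDER BY log_timestamp
            """,
            (SINCE,),
        )
        for ts, actor, target, params in cur.fetchall():
            # log_params records the old and new group memberships
            print(ts, actor, "->", target, params)
finally:
    conn.close()
```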
02:02 <anteaya> from what I'm guessing cacti is telling me, whoever was doing what they were doing was doing it between 13:00 and 19:00 today
02:03 <jeblair> fungi: there must have been days when the clone and the real server were both online?
02:03 <fungi> yeah, it's possible, though it's also an unlikely target. once most scripted compromises get access to a shell their main interest is in using the server to run other stuff. it's possible we got compromised by someone who knew it was running mediawiki and wanted to leverage it to leave some sort of backdoor in the db, but it's on the unlikely end of things
02:03 <fungi> jeblair: by clone you mean wiki-upgrade-test.o.o?
02:03 <fungi> only one was pointed at the trove db at a time
02:03 <jeblair> fungi: i'm trying to figure out how this affects backups
02:04 <fungi> oh...
02:04 <anteaya> oh you are talking about server permissions, not wiki permissions, sorry
02:04 <jeblair> i imagine if we had two copies of a server online at a time, they would both run backups.  i only see one backup per day, so it might be that bup only allowed the first one through, and which one that was was random...
02:05 <jeblair> anteaya: well, could be either
02:05 <fungi> yeah, both of them have the same backup cronjob
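A possible mitigation for that failure mode, purely as a sketch under assumptions (the expected hostname and the real backup command are placeholders): wrap the cronjob so it refuses to run on any machine other than the host it was configured for.

```python
#!/usr/bin/env python3
"""Guard a backup cronjob so clones of a server don't silently push backups.

Hypothetical illustration: EXPECTED_FQDN and BACKUP_CMD are not taken from
the actual cron configuration discussed above.
"""
import socket
import subprocess
import sys

EXPECTED_FQDN = "wiki.openstack.org"        # assumption: the production name
BACKUP_CMD = ["/usr/local/bin/bup-backup"]  # placeholder for the real command

actual = socket.getfqdn()
if actual != EXPECTED_FQDN:
    print(f"refusing to back up: this host is {actual}, not {EXPECTED_FQDN}",
          file=sys.stderr)
    sys.exit(1)

# Only the intended host gets this far.
sys.exit(subprocess.run(BACKUP_CMD).returncode)
```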
02:05 <anteaya> ah so we might not be able to trust the backups :(
02:06 <jeblair> anteaya: i'd rather say that they may not contain the data we expect
02:06 <fungi> it hadn't dawned on me that the backups are initiated from the servers being backed up, so clones are certainly a danger for servers with backup configuration in this case
02:06 <anteaya> jeblair: okay a better way of putting it
02:06 <anteaya> so is fungi's snapshot the most recent backup that would contain the data we expect?
02:07 <anteaya> that we have confidence in?
02:07 <jeblair> anteaya: with a key of when the clones were active, we can figure the last date that was certainly the old server.  backups after that could be one or the other, and it may be possible to determine which by clues on their filesystems.
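That classification is simple to sketch: split the backup timestamps at the clone-creation date. The save names below are hypothetical (in real use they would come from `bup ls`), and the cutover date is the July 21 clone-creation date mentioned later in the log.

```python
#!/usr/bin/env python3
"""Split bup save timestamps into "certainly pre-clone" and "ambiguous"."""
from datetime import datetime

CLONE_CREATED = datetime(2016, 7, 21)

# Hypothetical save names in bup's YYYY-MM-DD-HHMMSS naming scheme.
saves = [
    "2016-07-19-053700",
    "2016-07-20-053700",
    "2016-07-22-053700",
    "2016-08-16-053700",
]

for name in saves:
    when = datetime.strptime(name, "%Y-%m-%d-%H%M%S")
    label = "old server for sure" if when < CLONE_CREATED else "could be either host"
    print(f"{name}: {label}")
```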
02:08 <jeblair> also, we need to rotate the backup key for wiki.o.o after this, because *it* has been compromised
02:08 <anteaya> ack
02:09 <jeblair> (the append-only nature of the backups makes them still reliable though [modulo the clone issue])
02:10 <jeblair> i have moved the authorized_keys file for wiki on the backup server out of the way so that key can no longer be used to access the server
02:10 <fungi> yeah, the wiki-upgrade-test clone was booted a few weeks ago, so i wouldn't trust it to necessarily be production data past july 21 (that's what rackspace says is when i created it)
02:11 <jeblair> so we won't get a backup again until we fix that (by creating a new key, ideally)
02:12 <jeblair> okay, well if we need the backups, we can poke at that, but i'm not going to restore 26 copies of the wiki server out of curiosity.  :)
02:12 <anteaya> ha ha ha
02:12 <jeblair> that would probably take 26 days.
02:13 <anteaya> I have better things planned for the next 26 days
02:13 <jeblair> dinner is here, i have to run
02:13 <anteaya> enjoy dinner
02:13 <anteaya> thank you
02:13 <fungi> thanks jeblair!
02:14 <anteaya> fungi: happy to help or listen
02:15 <anteaya> or help by listening
02:15 <fungi> i'm moving forward pointing wiki.o.o dns at wiki-upgrade-test.o.o and taking the compromised wiki.o.o offline
02:15 <anteaya> ack
02:16 <fungi> the boot from snapshot is still spinning at "10%" so no idea how long that'll take
02:16 <anteaya> woooo
02:16 <anteaya> the watching paint dry routine
02:16 <anteaya> your wife loves this one
02:34 <fungi> okay, dns has propagated, configs have been adjusted on wiki-upgrade-test for the correct vhost name, and i've confirmed the firewall rules there are and were still sane
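A minimal sketch of that propagation check (the expected address is a placeholder for wiki-upgrade-test's real IP, and this only reflects what the local resolver sees; other resolvers may keep the old record cached until the TTL expires):

```python
#!/usr/bin/env python3
"""Confirm that a DNS name now resolves to the replacement server's address."""
import socket

NAME = "wiki.openstack.org"
EXPECTED = {"203.0.113.10"}  # hypothetical: the wiki-upgrade-test address

# Collect every address the local resolver returns (may include IPv6).
resolved = {info[4][0] for info in
            socket.getaddrinfo(NAME, 443, proto=socket.IPPROTO_TCP)}

if resolved == EXPECTED:
    print(f"{NAME} now points at the new server: {sorted(resolved)}")
else:
    print(f"{NAME} resolves to {sorted(resolved)}, expected {sorted(EXPECTED)}")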
*** ChanServ changes topic to "situation normal"02:59
*** openstack has joined #openstack-infra-incident10:16
*** pabelanger has quit IRC16:54
*** pabelanger has joined #openstack-infra-incident16:54
19:45 -openstackstatus- NOTICE: The volume for logs.openstack.org filled up rather suddenly, causing a number of jobs to fail with a POST_FAILURE result and no logs; we're manually expiring some logs now to buy breathing room, but any changes which hit that in the past few minutes will need to be rechecked and/or approved again
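For the "expire some logs to buy breathing room" step, a rough sketch of how one might pick candidates: list the oldest top-level log directories by modification time and report how much space each holds, without deleting anything. The log root path and the retention count are assumptions, not the values actually used on logs.openstack.org.

```python
#!/usr/bin/env python3
"""Report the oldest top-level log directories and their sizes."""
import os
from pathlib import Path

LOG_ROOT = Path("/srv/static/logs")  # assumed mount point for the logs volume
KEEP_NEWEST = 100                    # arbitrary: everything but the newest N is a candidate


def tree_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file disappeared or is unreadable; skip it
    return total


dirs = sorted((p for p in LOG_ROOT.iterdir() if p.is_dir()),
              key=lambda p: p.stat().st_mtime)
candidates = dirs[:-KEEP_NEWEST] if len(dirs) > KEEP_NEWEST else []
for candidate in candidates:
    print(f"{candidate}  {tree_size(candidate) / 1e9:.1f} GB")
```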
