Monday, 2018-04-09

*** AJaeger has quit IRC06:10
*** AJaeger has joined #openstack-infra-incident06:31
*** rosmaita has joined #openstack-infra-incident11:15
*** rlandy has joined #openstack-infra-incident12:14
dmsimardohai, etherpad.openstack.org is running Ubuntu 14.04 with an outdated version of etherpad. There was a release today which fixes arbitrary code execution: http://blog.etherpad.org/2018/04/07/important-release-1-6-4/13:53
dmsimardThe file timestamps are up to date in /opt/etherpad but the git log dates back to 2015 (as do some of the settings and configuration files)13:54
dmsimardI'll try and see if I can get 1.6.4 to work off of a new 16.04 VM13:55
pabelangerhttp://git.openstack.org/cgit/openstack-infra/puppet-etherpad_lite/tree/manifests/init.pp has the ability to run develop version, but unsure why we are pinned to cc9f88e7ed4858b72feb64c99beb3e13445ab6d913:57
fungiwell that's fun. i used to be the one to receive security@etherpad.org e-mail and work with the author on embargoed disclosure with some of their downstream stakeholders14:07
dmsimardetherpad01.o.o spinning up on 16.04, we'll see how it goes14:07
fungii guess i'm not any longer (it's been a couple years since they had any security fixes though so maybe not entirely surprising)14:07
fungiis etherpad-dev.o.o not running latest?14:08
dmsimardI haven't looked, let me see14:08
dmsimardfungi: oh, yeah, etherpad-dev runs the latest version14:08
fungi/opt/etherpad-lite/etherpad-lite has 1fdb01fd759133b4da001dc5e233420a14cd8d59 checked out14:09
fungifrom today14:09
fungion etherpad-dev14:09
dmsimardwell, that's good -- we know 1.6.4 works with our current deployment setup, that's something14:09
fungilooks like node has been running since january, so i'm going to do a service restart14:10
fungijust to make sure we're testing the latest version14:10
pabelangerweb is down for me on etherpad-dev14:11
fungiyeah, it's restarting14:11
fungitakes a few minutes, if memory serves14:11
pabelangercool14:11
corvuslooks like the version was pinned just because that was the latest develop version at the time.  so i think it was just us being conservative with versions on the production server.  if develop works on -dev, it should be fine to upgrade.14:11
pabelangergreat14:12
fungiyeah, that's been our pattern in the past. test latest version on e-dev.o.o, if it's good then roll to that version on e.o.o14:12
fungithough it's taking a while to start, so it _may_ not be happy14:12
fungigonna check logs here in a sec14:12
dmsimardnp, etherpad01.o.o is still spinning up on 16.04  (I screwed up and had to restart)14:13
dmsimardI wonder if there's any SQL migrations ?14:13
fungiinit: etherpad-lite pre-start process (10176) terminated with status 114:13
dmsimardI got this when running launch, didn't seem fatal: http://paste.openstack.org/raw/718739/14:15
fungi/var/log/eplite/error.log says "Ensure that all dependencies are up to date...  If this is the first time you have run Etherpad please be patient."14:15
dmsimardnot sure of the impact14:15
dmsimardfungi: maybe we update etherpad but not its deps ?14:17
fungii'm checking to see what deps those might be14:17
dmsimardfungi: I think this only ever ran once: https://github.com/openstack-infra/puppet-etherpad_lite/blob/master/manifests/init.pp#L93-L10614:17
dmsimardbecause of the "creates"14:17
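
[Editor's sketch] The "creates" behaviour dmsimard refers to is Puppet's exec guard: the command is skipped whenever the named path already exists, so a dependency install guarded this way runs only on the first apply. Below is a minimal Python model of that guard. The helper names and the marker path are hypothetical and are not taken from the puppet-etherpad_lite manifest.

```python
import os
import tempfile


def run_unless_created(creates_path, action):
    """Run action only if creates_path is absent; return True if it ran."""
    if os.path.exists(creates_path):
        return False
    action()
    return True


def demo():
    """Simulate two consecutive applies against a temporary marker path."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical marker; the real manifest's `creates` path may differ.
        marker = os.path.join(tmp, "node_modules")

        def install():
            # Stand-in for the dependency install step.
            os.mkdir(marker)

        first = run_unless_created(marker, install)
        second = run_unless_created(marker, install)
        return first, second
```

Under this model, checking out newer etherpad code does not re-trigger the install, which would be consistent with the stale dependencies suspected above.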
dmsimardah doh  | Apr  9 14:16:48 etherpad01 puppet-user[20797]: Could not find data item etherpad_ssl_cert_file_contents in any Hiera data file and no default supplied at /opt/system-config/production/manifests/site.pp:454 on node etherpad01.openstack.org14:18
fungidmsimard: yeah, i expect we should rerun installDeps.sh14:19
fungiin /opt/etherpad-lite/etherpad-lite on etherpad-dev.o.o i'm running `sudo -u eplite env HOME=/var/log/eplite ./bin/installDeps.sh`14:22
fungiit's churning through pulling in the deps now14:23
dmsimardfungi: lgtm14:23
fungiit eventually spewed this error: http://paste.openstack.org/show/71874014:24
fungii think it may simply not like that the cert we're using there is self-signed14:24
dmsimardfungi: wait14:25
dmsimardHOME=/var/log/eplite ?14:25
dmsimardshould that not be like /opt/etherpad-lite or something ?14:25
fungihttps://git.openstack.org/cgit/openstack-infra/puppet-etherpad_lite/tree/manifests/init.pp#n9814:25
fungii think it just wants somewhere writeable by the eplite user14:26
dmsimardhuh14:26
dmsimardokay14:26
dmsimardre-spinning etherpad01 on 16.04 with fixed hiera things14:27
fungiseeing if maybe that error was non-fatal, trying again to start etherpad-lite service14:27
fungitrying to manually start the service here's what i'm getting: http://paste.openstack.org/show/718741/14:32
fungii wonder if it should instead be looking in /opt/etherpad-lite/etherpad-lite/src/node_modules/14:33
fungithough i don't see a ep_etherpad-lite subdir under there either14:33
dmsimardthere's one in14:34
dmsimardoh, huh14:34
dmsimardetherpad.o.o is different14:34
dmsimardetherpad.o.o has a node_modules at the root of /opt/etherpad-lite/etherpad-lite14:35
dmsimardand then you have /opt/etherpad-lite/etherpad-lite/node_modules/ep_etherpad-lite -> ../src14:35
dmsimardI don't see that on etherpad-dev14:35
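
[Editor's sketch] The layout difference described above (a node_modules directory at the checkout root containing ep_etherpad-lite as a symlink to ../src) can be checked mechanically. This is a minimal sketch assuming only the paths quoted in the log; the function name is hypothetical.

```python
import os


def has_ep_symlink(root):
    """Return True if root/node_modules/ep_etherpad-lite is a symlink to ../src."""
    link = os.path.join(root, "node_modules", "ep_etherpad-lite")
    return os.path.islink(link) and os.readlink(link) == "../src"
```

Calling has_ep_symlink("/opt/etherpad-lite/etherpad-lite") on each host would show which of them matches the layout dmsimard saw on etherpad.o.o.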
fungii wonder if it wouldn't be simpler to just blow away /opt/etherpad-lite entirely and re-kick puppet14:36
dmsimardon etherpad-dev ? worth a shot. I'd simply rename the /opt/etherpad-lite directory before we're sure it's the good way though14:36
fungisure, can do that too. in case we want to compare content14:36
corvusqq, why are we upgrading the server?14:37
fungimoved it to /opt/etherpad-lite.old_2018-04-0914:37
corvusthe operating system, that is14:37
fungii don't know, nor do i expect it to be necessary14:38
fungiit was the rabbit hole i found people in when i got here14:38
fungii'm focusing on just getting latest etherpad code to deploy safely on existing servers14:38
fungire-kicking puppet on etherpad-dev now14:39
dmsimardcorvus: no particular reason, I figured I might as well take the opportunity to upgrade it, someone had already added xenial in site.pp14:40
fungii'd prefer if we could focus on fixing the security vulnerability in the fastest way possible, but i don't object to people looking into upgrading the operating system once this is behind us14:41
fungifatal: [etherpad-dev.openstack.org]: FAILED! => {"changed": false, "failed": true, "msg": "/usr/bin/timeout -s 9 30m /usr/bin/puppet apply /opt/system-config/production/manifests/site.pp --logdest syslog --environment 'production' --no-noop --detailed-exitcodes failed with return code: 6", "rc": 6, "stderr": "", "stdout": "", "stdout_lines": []}14:42
fungiyeah, it's hitting that same "npm ERR! Error: CERT_UNTRUSTED"14:43
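
[Editor's sketch] The "return code: 6" in the failure above comes from puppet apply being run with --detailed-exitcodes, where 2 means resources changed, 4 means resources failed, and 6 means both. A small lookup makes that explicit; the function name is hypothetical.

```python
def describe_puppet_exit(rc):
    """Map a `puppet apply --detailed-exitcodes` return code to its meaning."""
    meanings = {
        0: "no changes, no failures",
        1: "run failed",
        2: "changes applied, no failures",
        4: "failures during the run, no changes",
        6: "changes applied and failures during the run",
    }
    return meanings.get(rc, "unexpected exit code")
```

So rc 6 here says that some resources were applied and at least one failed, which fits the npm CERT_UNTRUSTED error during dependency installation.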
fricklernot sure whether that has been mentioned yet, there are two patches still in progress related to etherpad on xenial14:43
fricklerhttps://review.openstack.org/528156 and https://review.openstack.org/52813014:43
dmsimardfrickler: ok thanks, let's leave xenial for later then14:43
fungilooking for a quick workaround now for what i expect is related to the self-signed cert on the dev server14:44
dmsimardfungi: we could generate a letsencrypt cert ?14:44
fungiwe could. will that be faster?14:44
dmsimardDo we generate/manage letsencrypt certs anywhere right now ?14:45
fungimay not actually be the server's cert it's complaining about14:45
dmsimardcan either pattern off of that or do a manual certbot challenge14:45
fungilooks like this is more likely due to outdated trust set in older node.js releases14:46
fungihttps://github.com/npm/npm/issues/2019114:46
dmsimardah so we need the patches that frickler mentioned14:46
fricklerthe notes in https://etherpad.openstack.org/p/infra-sprint-xenial-upgrades seem to indicate that one needs at least node 6.x for current etherpad versions14:46
fungiyeah, ubuntu bug 176084014:47
openstackUbuntu bug 1760840 in npm (Ubuntu) "npm contains hardcoded certificate, so npm is not working anymore.." [Undecided,New] https://launchpad.net/bugs/176084014:47
fricklersee also the "in progress" section there for ethercalc14:47
fungilooks like it was reported less than a week ago14:47
fungii'm trying the patch from that bug as a temporary workaround, mostly to make sure it's the actual (and only) problem we're facing14:50
fungicleared out /opt/etherpad-lite and am re-kicking puppet now14:51
mnaserfungi: apt-get install ssl-cert and you will find /etc/ssl/{certs,private}/ssl-cert-snakeoil.pem14:51
mnaserlazy fool proof way of getting self signed certs D:14:52
dmsimardWow that's a lot of hardcoded certificates14:53
fungithat seems to have worked14:54
fungimnaser: yep, that's what we already do in the puppet-etherpad_lite module if no certs are provided14:54
fungipuppet applied without error after applying the patch to config-defs.js14:55
fungietherpad-lite start/running, process 1890914:55
dmsimardfungi: where is that config-defs file running ?14:55
dmsimardor located, rather14:56
fungihowever it looks like it's probably crashing immediately14:56
fungidmsimard: /usr/share/npm/node_modules/npmconf/config-defs.js14:56
dmsimardthanks14:56
fungilooks like it's gone into a classic etherpad spawn->crash->respawn->crash->respawn->... loop14:57
fungithis is what i find repeating in the error log for each time it starts: http://paste.openstack.org/show/718742/14:59
dmsimardI have to step away for dentist appointment, be back in a bit14:59
fungii'm going to try manually starting the service again in the foreground15:00
fungithis looks to be the next problem: http://paste.openstack.org/show/718744/15:01
fungiand `/usr/local/bin/node --version` does indeed report v0.10.2515:02
fungiyeah, that's a symlink to /usr/bin/nodejs because we tell it to use the system package per http://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/manifests/etherpad_dev.pp#n915:05
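
[Editor's sketch] The reported v0.10.25 can be compared against the "at least node 6.x" requirement from the sprint notes frickler mentioned. This is a minimal sketch; the helper names are hypothetical and the default minimum of (6, 0, 0) is an assumption drawn from that note, not from etherpad's own documentation.

```python
def parse_node_version(text):
    """Turn the output of `node --version` (e.g. 'v0.10.25') into an int tuple."""
    parts = text.strip().lstrip("v").split(".")
    return tuple(int(p) for p in parts)


def meets_minimum(text, minimum=(6, 0, 0)):
    """Return True if the reported version is at least `minimum`."""
    return parse_node_version(text) >= minimum
```

With the value from the log, meets_minimum("v0.10.25") is False, which matches the need to move etherpad-dev off the system nodejs package.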
fungiso system-config change 526978 and puppet-etherpad_lite change 528156 try to address that though we'll also need to patch the etherpad and etherpad_dev classes in system-config to use something other than "system" for the $nodejs_version parameter15:12
fungi528156 also has the other change frickler linked (528130) as a parent15:13
fungiso i'm reviewing those now15:13
fungioh, nevermind, 528156 is the parent of 528130 not the other way around15:17
corvusfungi: it looks like the stack is blocked by 52862515:19
fungiyeah, i just finished reviewing that one15:19
corvuslgtm too15:19
fungiother than the line setting the homedir in that exec resource being somewhat redundant now, it seems fine15:19
fungithough ultimately it's 526978 we need and then another system-config change on top of it15:20
fungii've approved that one and am writing the change to openstack_project::etherpad_dev now15:21
fungihttps://review.openstack.org/559767 Use nodejs 6.x on etherpad-dev.o.o15:24
corvusclarkb, pabelanger: ^15:31
* clarkb catches up15:33
pabelangersame15:35
clarkbI've approved nodejs update change15:36
fungithanks, i'll clean up etherpad-dev so it'll get retried15:40
fungibacking out the patch from bug 1760840 as well15:41
openstackbug 1760840 in npm (Ubuntu) "npm contains hardcoded certificate, so npm is not working anymore.." [Undecided,New] https://launchpad.net/bugs/176084015:41
fungiremoved /opt/etherpad-lite and /var/log/eplite/.npm too15:42
clarkbfungi: so I make sure I'm up to speed, we are updating nodejs on -dev so that we can deploy latest etherpad-lite there to fix a bug. Once we show that is working we'll do similar with production?15:48
fungiclarkb: 100% correct15:48
fungiwow, jobs for system-config changes really do seem to take a while16:09
corvuslooks like it depends on the provider16:13
corvussometimes they take 50% longer16:13
fungii'm going to need to disappear in ~30 minutes for an appointment, but expect this shouldn't be hard to iterate on16:17
fungilooks like they just merged16:17
fungiclarkb just pointed out to me that our production deployment is probably only actually impacted by the third bullet on http://blog.etherpad.org/2018/04/07/important-release-1-6-4/16:22
fungisince it's on a random commit somewhere after 1.5.0 but prior to 1.6.016:23
fungiwhich rules out the first bullet16:23
fungiand using mysql for the pad store16:23
fungiwhich rules out the second bullet16:23
fungiso depending on how worried we are about people being able to extract content from pads whose names they don't know, this is likely not super urgent?16:24
corvusindeed.  the obscurity of random etherpad data is nice (to share private drafts, etc).  but hopefully folks have all used that with a grain of salt and not relied on it not being discovered.16:24
corvusmight be nice to finish upgrading it today, but probably not stop-the-world urgent16:25
fungiright, as in i'm happy to continue poking at it, but not so worried about lunch-appointment-induced delays16:25
fungiand we can probably move remaining discussion back to #openstack-infra16:25
clarkbya I'm going to approve the dib change for bionic dns now that this is less urgent (I didn't want dib distracting us but I think we don't have to ensure this is done immediately before lunch as fungi puts it )16:27
fungiclarkb: there's a pending zuul scheduler restart16:28
clarkbthat should be fine, can just recheck or reapprove if it doesn't make it in before that16:28
fungiyeah16:29
dmsimardsorry about the commotion, didn't realize our etherpad was so out of date :p16:29
clarkbI've become somewhat allergic to updating it because every update meant bugfixes at the summit in the middle of summiting16:29
clarkbbut we have updated for bug fixes as necessary so fine to continue updating16:29
fungidmsimard: well, it's still semi-critical for etherpad-dev, ironically16:30
fungibut as long as we keep it offline there until we're running the latest version, shouldn't be a huge concern16:30
dmsimardI was looking at the couple of patches we needed.. looks like we have an unrelated failure in https://review.openstack.org/#/c/528626/ "ArgumentError: Could not find declared class ::drush::git::drush at /etc/puppet/modules/drupal/manifests/drush.pp:32 on node"16:33
dmsimardfor groups.o.o16:33
dmsimardThe parent patch dates back to december, I'll try a rebase.16:34
dmsimardoh, wait, that's in puppet-etherpad, not system config... it's already up to date.16:34
dmsimardmeh, "fatal: unable to access 'https://git.drupal.org/project/puppet-drush/': Failed to connect to git.drupal.org port 443: Connection timed out" let's try that again.16:37
clarkbdmsimard: what is the relationship between etherpad and drupal?16:37
clarkbI'm not confused16:37
clarkb*now16:37
dmsimardclarkb: https://review.openstack.org/#/c/528626/16:37
dmsimardclarkb: was failing a puppet apply job on a drupal thing because the puppet module installation failed for drush16:38
clarkbah it's the noop apply job running against all the nodes in site.pp16:38
dmsimardyeah..16:38
dmsimardI was confused too :)16:38
clarkbdmsimard: https://git.drupal.org/project/puppet-drush/ is a 404 I'm guessing it moved16:43
dmsimardclarkb: the http is just not set up right, git clone works16:44
dmsimard(I tested it)16:44
clarkbah16:44
clarkbya confirmed works via git clone16:47
-openstackstatus- NOTICE: zuul was restarted to update to the latest code; please recheck any changes uploaded within the past 10 minutes16:51
*** rlandy has quit IRC21:55
