Thursday, 2021-08-12

Unit193Has anyone noticed that when the pastebin was moved, https://paste.opendev.org/json/?method=pastes.newPaste started becoming an error?01:12
fungiUnit193: i hadn't noticed, but what did that method do previously?01:15
Unit193fungi: That's the API to post a paste, specifically how pastebinit pastes stuff.01:16
fungihuh, i thought i'd used it since then... checking01:16
fungiindeed, pastebinit is giving me "Failed to contact the server: HTTP Error 502: Proxy Error"01:17
fungithat's no good... i wonder if something changed in our proxying config01:17
Unit193Yep, openstack has been my default for years in pastebinit. :3  Also, if you wonder why  `pastebinit file.ext` doesn't work, it's literally a one line fix.  At about line 353 change 'with open(filename, "rb")' to 'with open(filename, "r")' :P01:20
fungiyeah, i've similarly debugged and locally patched my copy of pastebinit long ago when switching to python301:21
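Note: the one-line change above swaps a binary read for a text read. A minimal sketch of why that can matter under Python 3, assuming the client serializes the file content with json.dumps (the log does not say how pastebinit actually fails); read_for_paste and build_payload are hypothetical names, and only the 'language' and 'code' fields come from the log:

```python
import json
import os
import tempfile


def read_for_paste(filename, mode):
    """Read a file the way a paste client might before building a request."""
    with open(filename, mode) as handle:
        return handle.read()


def build_payload(content):
    # Hypothetical payload shape, for illustration only.
    return json.dumps({"language": "text", "code": content})


# Write a small sample file, then read it back in both modes.
with tempfile.NamedTemporaryFile("w", suffix=".ext", delete=False) as tmp:
    tmp.write("hello paste\n")
    sample_path = tmp.name

text_content = read_for_paste(sample_path, "r")     # str under Python 3
binary_content = read_for_paste(sample_path, "rb")  # bytes under Python 3
os.unlink(sample_path)

text_payload = build_payload(text_content)  # str serializes cleanly

try:
    build_payload(binary_content)
    binary_error = None
except TypeError as exc:
    binary_error = exc  # json.dumps refuses bytes values
```

Text mode also applies the locale's default encoding, so a file that is not valid in that encoding would raise on read instead.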
fungiianw added an explicit filter for the getRecent method last week in https://review.opendev.org/803418 but we're generally proxying everything else01:21
ianwit is likely something to do with lodgeit on the backend01:22
fungimaybe01:23
fungiwe're proxying basically everything via https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/lodgeit/templates/paste.vhost.j2#L37-L4601:23
ianwyeah, the proxy error would be when lodgeit returns an exception i bet01:23
Unit193BTW, thank you very much for hosting a great lodgeit instance.01:23
fungiyep, i'm digging in logs now01:23
fungiAH00898: Error reading from remote server returned by /json/01:25
fungiso as ianw guessed, apache is getting a bad response from the backend01:25
ianwyeah, check in /var/log/containers/ for the lodgeit container logs01:25
* ianw lays a bet on unicode strings being involved01:26
fungidocker-compose says it should log to syslog, but it doesn't seem to that i can tell (or it's not logging the api error)01:28
fungiahh, it's streaming to stdout not syslog01:28
fungii should have told it to only show me recent log entries :/01:28
ianw /var/log/containers/docker-lodgeit.log should be capturing it ...01:29
ianwAug 12 01:27:41 paste01 docker-lodgeit[647]:   File "/usr/local/lib/python3.7/site-packages/lodgeit/lib/json.py", line 45, in handle_request01:29
ianwAug 12 01:27:41 paste01 docker-lodgeit[647]:     response = {'data': None, 'error': str(e).decode('utf-8')}01:29
ianwAug 12 01:27:41 paste01 docker-lodgeit[647]: AttributeError: 'str' object has no attribute 'decode'01:29
fungiFile "/usr/local/lib/python3.7/site-packages/lodgeit/lib/json.py", line 45, in handle_request01:30
fungiyeah, you beat me to it01:30
fungiyou were close!~01:30
fungistring handling/decoding definitely involved anyway01:30
fungii agree that's entirely wrong. should we just drop the .decode() or drop the str() recast?01:35
fungiexcept Exception as e should already have e as a string01:36
fungii think we drop both the str() and the .decode()01:36
fungioh!01:37
fungiit was supposed to be .encode()01:37
fungithis is getting stuffed back into the json response01:37
fungiianw: oh, i should have read back farther in the log. that was a secondary exception. the real issue is "TypeError: pastes_new_paste() missing 2 required positional arguments: 'language' and 'code'"01:44
fungiso we have two bugs. one to do with how *args is being passed (i think?) and one with how exceptions are being serialized into the response01:46
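Note: a minimal sketch of the exception-serialization half, assuming the response is a {'data': ..., 'error': ...} dict dumped to JSON as in the traceback above. This handle_request and pastes_new_paste are simplified stand-ins, not lodgeit's actual code. Under Python 3, str(e) is already text, so neither .decode() nor .encode() is needed before it goes into the response:

```python
import json


def handle_request(method, args):
    """Call method(*args) and wrap the outcome in a data/error envelope."""
    try:
        result = method(*args)
        response = {"data": result, "error": None}
    except Exception as e:
        # str has no .decode() in Python 3, and .encode() would produce
        # bytes, which json.dumps cannot serialize. Plain str(e) is enough.
        response = {"data": None, "error": str(e)}
    return json.dumps(response)


def pastes_new_paste(language, code):
    # Stand-in for the real method; returns a fake paste id.
    return 1


# An empty argument list reproduces the primary TypeError as a JSON error.
error_body = json.loads(handle_request(pastes_new_paste, []))
ok_body = json.loads(handle_request(pastes_new_paste, ["text", "hello"]))
```

With this shape the client would see the original TypeError text in the error field instead of a 502 from the proxy.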
fungithe method handler is a bit spaghetti-like, i don't think i'm going to manage to debug it this late at night02:10
fungigoing to need to dump local.request.data to examine i think02:11
*** ykarel|away is now known as ykarel04:57
*** rpittau|afk is now known as rpittau06:22
*** dpawlik6 is now known as dpawlik07:07
*** jpena|off is now known as jpena07:33
*** elodilles_pto is now known as elodilles08:12
*** ykarel is now known as ykarel|lunch08:27
opendevreviewAnanya proposed opendev/elastic-recheck rdo: Make elastic recheck compatible with rdo elasticsearch  https://review.opendev.org/c/opendev/elastic-recheck/+/80389708:50
*** sshnaidm|afk is now known as sshnaidm08:55
*** ykarel|lunch is now known as ykarel09:46
opendevreviewAnanya proposed opendev/elastic-recheck rdo: Make elastic recheck compatible with rdo elasticsearch  https://review.opendev.org/c/opendev/elastic-recheck/+/80389709:49
*** dviroel|ruck|out is now known as dviroel|ruck11:24
*** jpena is now known as jpena|lunch11:33
fungireturning to the lodgeit situation, so far i've been able to determine that when sending via pastebinit, local.request.data is coming through as an empty string12:12
fungiwhich is basically an alias for werkzeug.LocalProxy(ctx, 'request').data12:14
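Note: a small helper of the kind that could be dropped into a handler to dump the request body while debugging. describe_body is a hypothetical name and nothing here is lodgeit or werkzeug API; it only formats whatever bytes or str value it is given:

```python
def describe_body(data, limit=80):
    """Summarise a request body for a log line: type, length, truncated repr."""
    preview = repr(data[:limit])
    suffix = "..." if len(data) > limit else ""
    return "type=%s len=%d body=%s%s" % (
        type(data).__name__, len(data), preview, suffix)


sample_body = b'{"language": "text", "code": "hello"}'
empty_summary = describe_body(b"")
sample_summary = describe_body(sample_body)
```

Logging the type alongside the length makes it easier to tell an empty bytestring from an empty str or a missing value.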
fungiunfortunately, my familiarity with werkzeug is effectively nonexistent12:15
funginoonedeadpunk: any chance you've run into this and/or figured out the cause?12:15
*** jpena|lunch is now known as jpena12:24
*** jpena is now known as jpena|off12:33
*** ysandeep|PTO is now known as ysandeep13:26
opendevreviewAnanya proposed opendev/elastic-recheck rdo: Make elastic recheck compatible with rdo elasticsearch  https://review.opendev.org/c/opendev/elastic-recheck/+/80389713:27
opendevreviewAnanya proposed opendev/elastic-recheck rdo: Make elastic recheck compatible with rdo elasticsearch  https://review.opendev.org/c/opendev/elastic-recheck/+/80389713:37
opendevreviewAnanya proposed opendev/elastic-recheck rdo: Make elastic recheck compatible with rdo elasticsearch  https://review.opendev.org/c/opendev/elastic-recheck/+/80389713:43
fungistarting to wonder if werkzeug doesn't werk with python 3.714:17
fungirather, the ancient werkzeug 0.15 we pin to14:17
fungi0.15.6 (2019-09-04) does claim it supports 3.7: https://pypi.org/project/Werkzeug/0.15.6/14:20
*** dmellado_ is now known as dmellado14:24
fungilooks like the old paste server got deleted, so i'm not sure if i can figure out what versions of deps were installed on it14:33
*** jpena|off is now known as jpena14:43
*** diablo_rojo__ is now known as diablo_rojo14:54
clarkbfungi: did we convert to python3 on the old server? I thought we had but maybe that never happened15:14
clarkbalso re etherpad, sleeping on it my brain woke up telling me to check the settings template and there are a few settings I want to check on the test instance. I'll restart an update there, then we can make the updates if necessary to the change and hold a new test instance to double check it15:15
fungii don't think we did, but if we did that'll be reflected in puppet-lodgeit i guess15:15
*** rpittau is now known as rpittau|afk15:31
opendevreviewClark Boylan proposed opendev/system-config master: Upgrade etherpad to 1.8.14  https://review.opendev.org/c/opendev/system-config/+/80413615:32
opendevreviewClark Boylan proposed opendev/system-config master: DNM force etherpad failure to hold the test nodes  https://review.opendev.org/c/opendev/system-config/+/80413815:32
clarkbI'll delete the current hold and set up a hold for ^ then if that still looks good I think we can consider landing this change15:32
clarkbmnaser: do you still need your loci keystone autohold? Doesn't look like it ever caught a failure15:33
*** jpena is now known as jpena|off15:34
clarkbwhile waiting for those jobs to run I'm going to find breakfast and water the garden. Back in a bit15:34
clarkbfungi: if you review that update ^ you may want to read through the upstream example settings.json to make sure we aren't missing anything you think is important15:35
fungisure, and good call, this bit us in the past around the default "theme" change which completely redid how the interface works15:37
*** marios is now known as marios|out16:08
clarkbfungi: 23.253.160.131 etherpad.opendev.org and https://etherpad.opendev.org/p/clarkb-test if you want to test16:26
clarkbstill looks good to me16:26
*** ykarel is now known as ykarel|away16:31
roman_gGood morning, clarkb fungi. City Network has resolved Glance issues on KNA1. Please, revert https://review.opendev.org/c/openstack/project-config/+/804281 and proceed with image uploads and what else is needed.16:37
clarkbroman_g: reverting that change should be all we need to do to start using it again. I'll push that up momentarily16:38
roman_gGreat. Thank you.16:38
fungiroman_g: so they concluded that their problem was not us after all? ;)16:38
roman_gfungi of course.16:40
opendevreviewClark Boylan proposed openstack/project-config master: Revert "Disable airship citylcoud nodepool provider"  https://review.opendev.org/c/openstack/project-config/+/80441716:40
roman_gYour Zuul is one of the best canary services public cloud provider could use for free.16:41
fungito be fair, it's very resource-intensive for a canary ;)16:41
roman_gIt's more useful than fully artificial tempest workloads, for example. Doing good for community and testing infra reliability at the same time - couldn't imagine better options.16:44
roman_gParticipating in SETI@Home or something similar does not produce this type of workloads.16:45
roman_gAll right, I'm going home. Have a good day/evening, see you later. Big warm Thank you, fungi & clarkb.16:47
clarkbyup, it is one of the things we try to explain as a benefit for contributing resources. We're a flexible user that can shake out problems and potentially help fix them if we need to tie back into the upstream problems16:47
opendevreviewMerged openstack/project-config master: Revert "Disable airship citylcoud nodepool provider"  https://review.opendev.org/c/openstack/project-config/+/80441717:01
clarkbfungi: let me know if you find anything with the etherpad test server. But if not I think we should probably try and land that change today and upgrade?17:03
clarkbinfra-root ^ if you think that is a bad idea for some reason let us know. I think the risk here is largely that we might break the jitsi meet integration which is harder to test pre merge17:04
fungiyeah, sorry, getting distracted by other discussions but testing now17:04
clarkbwe have until october for the ptg which will hopefully be enough time to correct any of those issues17:04
fungiclarkb: seems to be working well for me17:07
fungithe other thing we'll want to test is meetpad, but i expect we need to upgrade production etherpad to really do that properly17:07
clarkbyes unfortunately due to how the proxying there works17:07
clarkbI don't think we can hijack it the way we can with a local /etc/hosts update17:08
clarkbsince we run a proxy to etherpad on the meetpad host that jitsi meet talks to17:08
clarkbif we talked directly to etherpad then the local /etc/hosts override would work17:08
clarkbif anyone has good ideas for testing that prior to landing the change I'm open to ideas :)17:10
fungiclarkb: those patchfiles are reapplied to the corresponding upstream files, i take it? hard to know what i'm reviewing in those diffs17:10
clarkbI guess we could run a test jitsi meet and then hack it up?17:10
clarkbfungi: the patchfiles are all deleted because newer etherpad fixed the problem they addressed17:11
clarkbfungi: previously that was basically us backporting a fix because upstream hadn't done a release yet17:11
clarkbthey came from upstream but we didn't have a release17:11
clarkbspecifically it fixed the overlapping of colors with text iirc17:12
clarkbyup git log seems to confirm17:13
clarkb"We also make a small change to the css in the editor so that each line has its text fully rendered. See https://github.com/ether/etherpad-lite/issues/4106 for more details."17:13
clarkboh and another fix was for "Patch etherpad console logging to fix cross origin error"17:14
fungioh, indeed, i missed that the patchfiles were all deletions17:14
corvusclarkb, fungi: can you allow through https://review.opendev.org/804255 (and https://review.opendev.org/804256 ) so that, aiui, tristanC can confirm that the matrix gerritbot disconnection problem is resolved?17:14
fungibecause i was looking at diffs, in a diff view, and my brain read the diffs which were being removed as the actual git diffs17:14
clarkbcorvus: why not remove the line in https://review.opendev.org/c/opendev/system-config/+/804255/2/inventory/service/groups.yaml ?17:16
clarkbI'll approve it as is with a note to consider deleting that line in a followup if there wasn't an explicit reason for that17:16
fungiclarkb: it did remove the line, and replaced it with a comment explaining why we don't need to put eavesdrop in that group17:18
clarkbfor https://review.opendev.org/c/opendev/system-config/+/804256 the iptables role has an explicit entry in rules.v4.j2 and rules.v6.j2 for port 22. That is what we mean by hard coded ya? And as such there is no concern with approving that change?17:18
fungii saw that as a reminder to our future selves that we shouldn't think it's missing unintentionally and re-add it17:18
clarkbfungi: gotcha17:18
corvusclarkb: i left that as a comment so that people ... yes what fungi said17:18
corvusi could see us thinking "but eavesdrop is a webserver!"17:19
clarkbI want to make sure we're all in agreement on the port 22 thing because I approve it and potentially remove ssh access17:20
clarkbs/because/before/17:20
corvusclarkb: that is my understanding and reasoning for the change and i agree extra caution is warranted.17:20
corvus(this is exactly why i wish it hadn't been added in the first place and think we should nip it in the bud)17:21
clarkb++ just want to double check first17:21
clarkbfungi: ^ is that your understanding? and if so should I approve it now?17:21
fungiyes, that is what i understood when i reviewed it17:21
corvusi also believe that testinfra checks this17:22
fungiwe always add ssh to all servers, so removing it from the webservers group should have no adverse impact17:22
fungiwe have other server groups which do not include ssh in them, and we still have ssh access to those servers17:22
clarkbalright approving now17:23
clarkband ya those extra ports get iterated over a few lines below the hard coded port 22 entry17:24
clarkbso should be fully redundant17:24
clarkband doing an iptables -L -n on eavesdrop01 shows two entries for port 2217:25
corvusclarkb: maybe an extra safety would be to do an ad-hoc iptables -L -n on the webservers group17:27
clarkbcorvus: ya that is what I did roughly on one member of that group and it does show two entries17:28
clarkbremoving one should still leave us with the other17:28
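Note: a minimal sketch of the redundancy being checked here, assuming a template that emits one hard-coded SSH rule and then one rule per port in a group-supplied list. render_rules and the rule text are illustrative only, not copied from rules.v4.j2:

```python
# Illustrative rule format; the real templates may use different chains/options.
RULE = "-A INPUT -p tcp --dport %d -j ACCEPT"


def render_rules(extra_ports):
    """Render accept rules: a hard-coded SSH rule, then one per extra port."""
    lines = [RULE % 22]  # always present, independent of group variables
    lines.extend(RULE % port for port in extra_ports)
    return lines


before = render_rules([22, 80, 443])  # group list still names port 22
after = render_rules([80, 443])       # port 22 dropped from the group list

ssh_rule = RULE % 22
```

In this model the "before" output carries the SSH rule twice, matching the paired entries seen with iptables -L -n, and the "after" output still carries it once.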
corvusclarkb: i just did it on the whole group, and they always appear in pairs17:29
corvuseavesdrop01.openstack.org and wiki-dev01.openstack.org are unreachable17:29
clarkbI think that is expected for eavesdrop01.openstack.org, not sure about wiki-dev0117:29
clarkbfungi: ^17:29
fungieavesdrop01.opendev.org is the server in the inventory17:30
opendevreviewJames E. Blair proposed opendev/system-config master: Remove 2 servers that don't exist  https://review.opendev.org/c/opendev/system-config/+/80442017:30
fungithe current wiki-dev is wiki-dev03.openstack.org17:30
corvusclarkb, fungi: ^ then it sounds like that change is desirable17:30
fungithanks17:30
clarkbI think eavesdrop01.openstack.org may still exist in a shutdown state as part of queued up cleanup? ianw would know what the state is on those17:31
fungibut yeah, removal from the inventory is fine, it may be that there's a change out there to do that for eavesdrop already, but i clearly forgot to do one for wiki-dev0117:31
opendevreviewMerged opendev/system-config master: Test port 9001 on eavesdrop  https://review.opendev.org/c/opendev/system-config/+/80425517:52
opendevreviewMerged opendev/system-config master: Remove port 22 from webservers extra ports  https://review.opendev.org/c/opendev/system-config/+/80425617:52
clarkbcorvus: ^17:53
*** sshnaidm is now known as sshnaidm|afk17:55
clarkbinfra-root do we want to proceed with https://review.opendev.org/c/opendev/system-config/+/804136 knowing that meetpad testing is a gap we have? or try to figure out testing for meetpad first?18:04
fungii'm good with going ahead, but hoped we could get at least one more core reviewer to weigh in18:04
clarkbwfm18:05
fungiclarkb: maybe unless ianw thinks he has time to review the etherpad upgrade (804136) during his morning, we can just go ahead and approve/deploy it and then test meetpad before it gets too late in your day19:54
clarkbfungi: wfm19:54
clarkbI think it may still be quite early in ianw's part of the world so may have to wait another couple of hours before we hear back. I guess worst case we can land the change tomorrow?20:00
fungii'm also around for that20:00
clarkbfungi: I'm trying to think of other tools that might be hit by sqlalchemy 2.0 in opendev land20:41
clarkbjeepyb dropped its sql stuff20:41
clarkbstoryboard?20:41
clarkbI suppose for tools like that pinning isn't the end of the world either20:42
clarkbkeeping up to date is likely far more important for zuul20:43
opendevreviewClark Boylan proposed opendev/system-config master: Update our project rename docs  https://review.opendev.org/c/opendev/system-config/+/80446020:53
clarkbThat is a first pass at updating the rename docs. I think we can land that pretty safely. It doesn't include steps about splitting up the changes because that is more complicated and can happen in a followup20:54
clarkbI think we've leaked ansible processes on bridge again20:57
clarkbI noticed because https://review.opendev.org/c/opendev/system-config/+/804255 timed out a couple of jobs. I don't think those timeouts will impact what it is trying to accomplish though20:57
clarkbOnce bridge is more idle we should be able to safely clean up the processes there20:58
fungiyeah, i expect to just pin sb and/or jeepyb and/or lodgeit if sqlav2 becomes a problem21:48
clarkbcorvus: the infra-prod-service-eavesdrop job is running now21:48
clarkbfor the first change21:49
corvuswow that seems like yesterday21:50
*** dviroel|ruck is now known as dviroel|ruck|out21:51
fungiwait, that wasn't yesterday?21:54
* fungi has lost his perception of time21:54
ianwsorry, hi22:05
fungisorry for what? also, hiii!22:06
ianwi fiddled with the etherpad, just reading the changes now22:07
clarkbI should apologize for thinking of new things to check on etherpad but only after having slept on it :)22:07
clarkbianw: in addition to the new settings changes the other thing to think about is if we are worried enough about meetpad to not just land that change and go for it22:08
ianwonly things of interest seem to be a warning22:17
ianw[2021-08-12 16:08:59.867] [WARN] console - Declaring the sessionKey in the settings.json is deprecated.22:17
fungimmm22:18
ianwnot sure if that's important.  and it doesn't seem to wait for mariadb to start; we could use the wait-for-it script if we cared22:18
ianwseems like it doesn't have issues retrying22:18
ianw(unlike lodgeit ...)22:19
ianwspeaking of ... python3 was new to the container, it was definitely python2 on the old host22:19
ianwi did try updating werkzeug but it's a dependency loop22:20
ianwthey kicked some session thing out into a new library; that library seems abandoned and so now doesn't work with the latest werkzeug22:20
ianwthat was where i left it.  22:21
clarkbhttps://github.com/ether/etherpad-lite/tree/1.8.14#tweak-the-settings looks like it wants a SESSIONKEY.txt?22:25
opendevreviewIan Wienand proposed opendev/system-config master: etherpad: remove session key  https://review.opendev.org/c/opendev/system-config/+/80446622:25
clarkbalso I feel like I look this up every time I interact with it but APIKEY is generated on startup for us and random?22:26
fungiianw: do you have any gut feeling for why the request data obtained from werkzeug would be an empty bytestring in those api calls?22:28
clarkbya that seems to be the case for apikey22:28
clarkbianw: your change lgtm and I think it is safe for old or new etherpad22:29
clarkbha nice the hourly deploy jobs have the semaphore so the extra port 22 cleanup is another hour away from running :/22:30
fungithe deploy for 804255 timed out on manage-projects and puppet-else builds22:30
clarkbianw: fungi: I guess we'll land the etherpad upgrade change first thing tomorrow so that it applies early enough in the day for testing22:30
clarkbfungi: yes I made note of that above we seem to have leaked ansible processes on bridge which I think is making it slowish22:30
clarkbbut I don't want to clean those up until bridge is idle22:31
fungiahh, thanks22:31
fungiand yeah, tomorrow for etherpad wfm22:31
ianwhttps://github.com/ether/etherpad-lite/commit/a3ed82ad0622aaed0777e63280a2487322c49bad is the original change22:31
clarkbout of fear for stopping actual ansible processes that are doing work22:31
clarkbianw: oh actually one issue with your change is that if we down then up the containers we'll get a new session key value22:31
* ianw puts this in my list of "terrible commit messages that tell you the obvious thing but are useless on why they are making the change"22:31
clarkband will that cause everyone to need to reconnect?22:31
clarkbmaybe we are ok with that. Since it would only be when we restart the service. If the client side disconnects the random value is fine22:32
fungispeaking of reconnecting, i noticed the error messages improved when you open the same pad in a second browser window/tab22:33
fungiit now actually pops up an error in the first one saying "you've opened a new copy" or something to that effect22:33
clarkboh nice22:33
clarkbfungi: re the upgrade I'll try to get an early start so that I can approve the change then find breakfast while zuul chews on it22:33
fungihopefully fewer confused reports from users now22:33
fungii have no other scheduled obligations tomorrow so will be on hand as soon as you're ready22:34
ianwlooking at that code, it seems unlikely the override would ever go away; only if they removed that if (!exports.sessionKey) 22:34
fungii can even approve it earlier if you like so there's a head start on gating/deploy jobs22:34
clarkbfungi: up to you on that. I've pretty regularly been around by 1500 UTC but if I'm late you may have to deal with things a bit until I show up. Not that I have any plans tonight that would impact my ability to show up on time :)22:36
clarkbianw: looking at that try except I think that may make bind mounting the file in tricky. We'd have to supply the secret in the file directly22:37
clarkbbecause giving it an empty file will not cause an exception I don't think22:37
clarkband we can't bind mount the entire dir because that dir contains the installation :/22:37
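Note: a Python sketch of the key-loading behaviour as described in the discussion above (use the configured key if set, otherwise read a key file, otherwise generate one). It is loosely modelled on that description and is not a port of etherpad's code; load_session_key and the file names are hypothetical:

```python
import os
import secrets
import tempfile


def load_session_key(configured_key, key_path):
    """Pick a key: configured value, else file contents, else a new random one."""
    if configured_key:
        return configured_key  # the settings.json override
    try:
        with open(key_path, "r", encoding="utf-8") as handle:
            return handle.read()  # an empty file yields "" without raising
    except OSError:
        # File missing or unreadable: generate a key and persist it.
        key = secrets.token_hex(16)
        with open(key_path, "w", encoding="utf-8") as handle:
            handle.write(key)
        return key


workdir = tempfile.mkdtemp()
missing_path = os.path.join(workdir, "SESSIONKEY.txt")
generated = load_session_key(None, missing_path)  # absent: generated and saved
reloaded = load_session_key(None, missing_path)   # present: same key reused

empty_path = os.path.join(workdir, "EMPTY.txt")
open(empty_path, "w").close()
from_empty = load_session_key(None, empty_path)   # empty mounted file: ""
```

The empty-file case is the one raised above: the read succeeds, so the fallback never runs, which is why a bind-mounted file would need the secret written into it beforehand.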
ianwhrm, docker-compose could write out a static value?22:40
clarkbianw: ya we could have ansible write it out first and then bind mount that. Not too different than what we do today just in another file22:41
clarkbit's just frustrating because they seem to have done this for simplicity but then made it not any simply22:41
clarkbs/simply/simpler/22:41
ianwi guess from their POV it's complicated to get the right length of key in the config file, and you have to explain and handle people putting in the wrong thing22:42
clarkbhrm ya that could be. And it could also be that 99% of the time we're fine with the generated value that changes each restart22:42
clarkbsince clients tend to have to reconnect in those cases anyway I think22:42
ianwit's maybe even good to require that, to avoid any weirdness 22:43
ianwfungi: to your prior question; i didn't have any insights to the argument passing on lodgeit.  i couldn't figure out a simple debug strategy; with limnoria i could jump in the container and edit the python files, and send a sighup and it would reload22:45
ianwbut with this one it seemed to kill the container.  so it was difficult to just edit in a quick debug statement to help trace things22:45
clarkbianw: as a general heads up https://review.opendev.org/804256 was merged and it hasn't run its deploy jobs yet. fungi and corvus and I all tried to double check it and we're confident that won't accidentally break ssh. But I may be eating dinner when it finally starts to apply on systems22:45
clarkbfungi: https://review.opendev.org/c/opendev/system-config/+/804460 is the project renaming doc update22:47
ianwheh, ok :)  not sure what the rollback strategy will be, other than to scream loudly :)22:47
clarkbfungi: it tries to handle everything but the split change for testing situation which I figure we can land in a followup if we manage to come up with a better process for that22:48
clarkbianw: I think it is do recovery mounts and edit those files and reboot22:48
clarkbianw: but we confirmed there were double port 22 rules on those webserver hosts and the rules templates hardcode in a port 22 rule so we don't rely on the lists of ports for ssh22:48
mordredianw: I find screaming loudly to be the best strategy for most situations23:04
fungiianw: oh, i have no problem editing the python scripts in the lodgeit container and stopping/starting the container23:15
fungii'm just not sure what else to debug23:15
fungi(but that's how i confirmed the args list is an empty bytestring)23:16
ianwfungi: ahh, ok, was this local or a held node?23:16
fungiianw: neither, i poked at the production server since i was initially trying to figure out if there was something strange about how it was running23:34
ianwok, if i get a sec this afternoon i might try and put a test server on hold where i think we can fiddle as we want23:34
fungiyeah, at this point i'm fairly certain i can do exactly the same on a held node23:37
