Wednesday, 2021-02-10

clarkbI've started trying to figure out how the inmotion cloud vip is working. I set up a nc -l 54321 < file on each of the three hosts and the content of the files was the last octet for the actual ip of those hosts00:15
clarkbthen I requested port 54321 against the VIP from home, review-test, and a third host in ovh00:15
clarkbeach time I got back a consistent IP address. That makes me think that the VIP is doing a 1:1 mapping currently00:16
clarkband they aren't doing higher level proxying or load balancing00:16
clarkbas a sanity check I don't see the vip directly on an interface on that host either00:17
*** gmann_afk is now known as gmann00:19
clarkbok cool this is a kolla managed vip00:19
clarkbI see it in the kolla config for that host00:20
clarkbI still have no idea how it is functionally working but that is a start00:20
clarkbaha, apparently ifconfig isn't showing me all the addresses on the interface kolla is using but ip addr does00:25
clarkbcool so now I can see that ip addr is present on one of the three hosts00:25
clarkbthat confirms it is effectively 1:100:25
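The probe clarkb describes (one listener per host, each returning a host-specific marker, then repeated requests against the VIP) can be sketched roughly as below. This is only an illustration under assumptions: the function names are hypothetical, and it assumes each listener sends its marker and then closes or goes quiet. It is not the tooling that was actually used.

```python
import socket
from collections import Counter


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Connect once and return whatever the listener sends (hypothetical helper)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            try:
                data = conn.recv(1024)
            except socket.timeout:
                # Listener went quiet without closing; keep what we have.
                break
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode().strip()


def summarize(responses):
    """Count how often each backend marker came back."""
    return Counter(responses)


def looks_one_to_one(responses) -> bool:
    """True when every probe reached the same backend."""
    return len(set(responses)) == 1
```

If probes from several different source networks all return the same marker, that is consistent with the VIP living on a single host rather than being balanced across the three.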
clarkband then if I exec into the haproxy container and look at /etc/haproxy/services.d/ contents i see haproxy listening on the vip00:26
clarkbreading the kolla docs I think we can set some config options (specifically kolla_enable_tls_external, and kolla_external_fqdn_cert) then rerun kolla and that should update the haproxy configs with a cert?00:28
clarkbI'll ask them if rerunning kolla is something they expect people to do00:29
clarkbthe upside to having kolla do that for us is it can be sure to get all the necessary ports in haproxy00:33
clarkbbut we could put another proxy in front of the haproxy from kolla and do it ourselves00:33
openstackgerritMerged opendev/system-config master: refstack: move non-private variables to public  https://review.opendev.org/c/opendev/system-config/+/77458700:37
openstackgerritMerged opendev/system-config master: Setup OpenInfra-Board Channel  https://review.opendev.org/c/opendev/system-config/+/77470600:40
clarkbI think I have a general idea of how to rerun kolla with an updated config. I expect that to take some time to simply execute and dinner is happening momentarily. I'll see if my questions to inmotion have been answered tomorrow and take it from there00:55
*** rchurch has quit IRC01:05
*** mlavalle has quit IRC01:07
*** rchurch has joined #opendev01:07
openstackgerritIan Wienand proposed opendev/system-config master: borg-backup-server: run a weekly backup verification  https://review.opendev.org/c/opendev/system-config/+/77475301:27
openstackgerritMerged opendev/system-config master: refstack: add production image and deployment jobs  https://review.opendev.org/c/opendev/system-config/+/77458601:28
openstackgerritMerged opendev/system-config master: borg-backup-server: add script for pruning borg backups  https://review.opendev.org/c/opendev/system-config/+/77456101:28
openstackgerritMerged opendev/system-config master: borg-backup-server: volume space monitor  https://review.opendev.org/c/opendev/system-config/+/77456401:28
openstackgerritMerged opendev/system-config master: doc: update backup instructions  https://review.opendev.org/c/opendev/system-config/+/77457001:29
openstackgerritMerged opendev/system-config master: borg testing: catch stdout and stderr from test prune correctly  https://review.opendev.org/c/opendev/system-config/+/77474501:33
openstackgerritIan Wienand proposed opendev/system-config master: refstack: trigger image upload  https://review.opendev.org/c/opendev/system-config/+/77475602:13
*** artom has quit IRC02:13
openstackgerritIan Wienand proposed opendev/system-config master: borg-backup-server: run a weekly backup verification  https://review.opendev.org/c/opendev/system-config/+/77475302:39
openstackgerritIan Wienand proposed opendev/system-config master: openafs-<db|file>-server: fix role name  https://review.opendev.org/c/opendev/system-config/+/77476102:50
openstackgerritMerged opendev/system-config master: borg-backup: save PIPESTATUS before referencing  https://review.opendev.org/c/opendev/system-config/+/77458803:01
*** rchurch has quit IRC03:14
*** rchurch has joined #opendev03:15
*** artom has joined #opendev03:19
*** hemanth_n has joined #opendev03:25
openstackgerritMerged opendev/system-config master: refstack: trigger image upload  https://review.opendev.org/c/opendev/system-config/+/77475603:30
*** diablo_rojo has quit IRC03:41
*** dviroel has quit IRC04:07
*** lamt has quit IRC04:25
*** mrunge has quit IRC04:37
*** dmellado has quit IRC04:37
*** JohnnyRainbow has quit IRC04:37
*** ykarel has joined #opendev04:38
*** mrunge has joined #opendev04:42
*** dmellado has joined #opendev04:42
*** JohnnyRainbow has joined #opendev04:42
*** Eighth_Doctor has quit IRC04:47
*** ysandeep|away is now known as ysandeep|rover04:49
*** mordred has quit IRC04:50
*** whoami-rajat__ has joined #opendev04:57
*** openstackstatus has quit IRC04:58
*** openstack has joined #opendev04:59
*** ChanServ sets mode: +o openstack04:59
*** ysandeep|rover is now known as ysandeep|brb05:13
*** mordred has joined #opendev05:19
*** Eighth_Doctor has joined #opendev05:22
ianwclarkb / kopecmartin : i have run a mysqldump of the refstack db and imported it into https://refstack01.openstack.org05:26
ianwclarkb / kopecmartin : to me, it looks like things are not working.05:30
ianwSQL connection failed. 10 attempts left.: oslo_db.exception.DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'localhost' ([Errno 111] Connection refused)")05:32
ianwthat's in the container05:32
*** redrobot4 has joined #opendev05:34
*** ysandeep|brb is now known as ysandeep|rover05:35
*** redrobot has quit IRC05:37
*** redrobot4 is now known as redrobot05:37
*** ykarel has quit IRC05:55
*** marios has joined #opendev05:55
*** ykarel has joined #opendev06:12
ianwi think it probably needs something to wait for the mysql container to be alive06:14
ianwbut, i've hacked in something like that and it still doesn't work06:14
openstackgerritIan Wienand proposed opendev/system-config master: refstack: create database storage area  https://review.opendev.org/c/opendev/system-config/+/77477306:35
ianwclarkb / kopecmartin : ^ that's a start i guess ... out of time for today06:35
*** levalicious has joined #opendev07:19
*** eolivare has joined #opendev07:32
*** rpittau|afk is now known as rpittau07:51
*** ysandeep|rover is now known as ysandeep|lunch07:54
*** hashar has joined #opendev07:54
*** ralonsoh has joined #opendev07:55
*** sboyron has joined #opendev08:02
*** andrewbonney has joined #opendev08:21
*** slaweq|away is now known as slaweq08:29
*** zbr|pto is now known as zbr08:35
*** ysandeep|lunch is now known as ysandeep|rover08:52
*** jpena|off is now known as jpena08:57
*** tosky has joined #opendev09:12
*** DSpider has joined #opendev09:15
*** ykarel is now known as ykarel|lunch09:34
*** dtantsur|afk is now known as dtantsur10:38
*** hashar is now known as hasharLunch10:45
*** ykarel|lunch is now known as ykarel10:53
*** dviroel has joined #opendev11:02
*** hasharLunch has quit IRC11:19
*** hasharLunch has joined #opendev11:42
openstackgerritOleksandr Kozachenko proposed zuul/zuul-jobs master: Update upload-logs-swift and upload-logs-gcs  https://review.opendev.org/c/zuul/zuul-jobs/+/77465011:44
openstackgerritOleksandr Kozachenko proposed openstack/project-config master: Add zuul-storage-proxy in zuul namespace  https://review.opendev.org/c/openstack/project-config/+/77236411:48
*** hemanth_n has quit IRC11:56
*** cloudnull has quit IRC12:02
*** cloudnull has joined #opendev12:05
*** eolivare_ has joined #opendev12:23
*** eolivare has quit IRC12:25
*** hasharLunch is now known as hashar12:29
*** ysandeep|rover is now known as ysandeep|call12:31
*** jpena is now known as jpena|lunch12:36
*** hashar is now known as hasharAway12:39
*** eolivare_ has quit IRC12:46
*** iurygregory has quit IRC12:51
*** ysandeep|call is now known as ysandeep|rover13:16
*** eolivare_ has joined #opendev13:23
*** ykarel_ has joined #opendev13:24
*** ykarel has quit IRC13:27
*** jpena|lunch is now known as jpena13:33
*** ykarel_ is now known as ykarel13:40
ttxHi all, I'm working to move the openstackptg bot to #openinfra-events and was taking the opportunity to rename it to "openinfraptg". But to do that it looks like someone will have to manually log in to Nickserv with the openstackptg account and associate an additional nick to it. Someone with access to the ptgbot password in hiera... Also after that the ptgbot_nick entry will have to be changed in hiera. I'm13:52
ttxa bit unclear on the process to follow to do hiera things, so any guidance would be appreciated.13:52
fungittx: i can take care of it shortly, just need to wire up a separate irc client13:55
ttxfungi: ok, no urgency at all13:55
openstackgerritThierry Carrez proposed opendev/system-config master: PTGBot is now openinfraptg on #openinfra-events  https://review.opendev.org/c/opendev/system-config/+/77486213:56
*** cloudnull has quit IRC13:56
*** cloudnull has joined #opendev13:59
fungiconfig-core: diablo_rojo is volunteering to help with irc channel management, and is working on some foundation channel moves to the #openinfra channel namespace, simple change to add her to our default channel operators list here: https://review.opendev.org/77455514:00
*** iurygregory has joined #opendev14:13
*** hasharAway has quit IRC14:15
openstackgerritMerged openstack/project-config master: Add diablo_rojo to AccessBot Operators  https://review.opendev.org/c/openstack/project-config/+/77455514:18
*** hasharAway has joined #opendev14:46
openstackgerritGomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Create a template for ssh-key and size  https://review.opendev.org/c/zuul/zuul-jobs/+/77347414:49
*** hasharAway is now known as hashar14:54
mordredttx: my eyes were reading your ptg bot change and misparsed the new bot name as "open in fraptg" and I was like "what's a fraptg?" I'm clearly not fully awake14:57
fungithat's open in frap tg14:59
mordredexactly15:00
mordredsee - I knew I needed more coffee15:00
fungiit's all about the frappuccino in here15:01
openstackgerritGomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Create a template for ssh-key and size  https://review.opendev.org/c/zuul/zuul-jobs/+/77347415:08
*** ysandeep|rover is now known as ysandeep|dinner15:09
*** fressi has quit IRC15:23
*** ysandeep|dinner is now known as ysandeep|rover15:29
openstackgerritSorin Sbârnea proposed zuul/zuul-jobs master: Upgrade ansible-lint to 5.0  https://review.opendev.org/c/zuul/zuul-jobs/+/77324515:38
*** hashar is now known as hasharAway15:42
*** ykarel is now known as ykarel|away15:54
clarkbianw: kopecmartin: I'll take a look after breakfast15:57
openstackgerritOleksandr Kozachenko proposed zuul/zuul-jobs master: Update upload-logs-swift and upload-logs-gcs  https://review.opendev.org/c/zuul/zuul-jobs/+/77465016:08
*** mlavalle has joined #opendev16:16
*** mlavalle has quit IRC16:16
*** mlavalle has joined #opendev16:17
openstackgerritOleksandr Kozachenko proposed openstack/project-config master: Add zuul-storage-proxy in zuul namespace  https://review.opendev.org/c/openstack/project-config/+/77236416:17
*** ykarel|away has quit IRC16:17
fungittx: i've grouped openinfraptg into the nickserv registration for the existing openstackptg account, looking at what we need to update in hiera next16:23
ttxfungi: probably just the $ptgbot_nick16:24
ttxhiera('ptgbot_nick', 'username')16:25
*** marios has quit IRC16:31
fungi#status log Grouped openinfraptg nick to existing openstackptg account in Freenode and updated ptgbot_nick in our private group_vars accordingly16:32
openstackstatusfungi: finished logging16:32
*** ianw has quit IRC16:39
*** ianw has joined #opendev16:39
openstackgerritMerged openstack/project-config master: Add zuul-storage-proxy in zuul namespace  https://review.opendev.org/c/openstack/project-config/+/77236416:40
*** hasharAway has quit IRC16:47
*** hasharAway has joined #opendev16:49
*** ysandeep|rover is now known as ysandeep|away16:54
clarkbianw: the refstack change lgtm. I did leave a couple of thoughts/questions though would be great if you can check those before we merge it16:55
openstackgerritJeremy Stanley proposed opendev/puppet-pip master: Pin get-pip.py to last Python 3.5 version  https://review.opendev.org/c/opendev/puppet-pip/+/77490016:58
fungiinfra-root: ^ more fallout from pip 2116:58
clarkb+217:00
openstackgerritOleksandr Kozachenko proposed zuul/zuul-jobs master: Update upload-logs-swift and upload-logs-gcs  https://review.opendev.org/c/zuul/zuul-jobs/+/77465017:15
tobiashfungi: is the only problem with pip 21 the drop of py 3.5 or should I expect more issues?17:25
openstackgerritLuigi Toscano proposed openstack/project-config master: test-release-openstack: use focal  https://review.opendev.org/c/openstack/project-config/+/77490617:26
openstackgerritOleksandr Kozachenko proposed zuul/zuul-jobs master: Update upload-logs-swift and upload-logs-gcs  https://review.opendev.org/c/zuul/zuul-jobs/+/77465017:35
*** d34dh0r53 has quit IRC17:43
*** diablo_rojo has joined #opendev17:44
*** d34dh0r53 has joined #opendev17:45
diablo_rojofungi, since all those patches for the irc channel admin have landed, are we good to go on with the next set of commands then?17:47
diablo_rojoAnd hypothetically I will be able to run them given I am now a part of that group?17:47
clarkbdiablo_rojo: you can check your perms with chanserv first to confirm too17:48
clarkbdiablo_rojo: /query chanserv access #your-channel list17:49
clarkbtobiash: yes pretty much. They just made new pip >=python3.6 only17:50
clarkbI don't think much else about it has changed17:50
clarkb(previously they added the dependency resolution which was a major change but that happened with 3.5 support)17:50
tobiashclarkb: thanks, so probably no problem for us :)17:50
diablo_rojoLooks like I am good to go clarkb! Thanks for the direction.17:52
*** hasharAway is now known as hashar18:00
*** jpena is now known as jpena|off18:01
stephenfinHit a weird bug18:06
stephenfinI tried to edit a patch's commit message and clicked save18:06
stephenfinwaait, ignore that18:06
* stephenfin had to refresh to get the Publish Edit button to appear18:07
stephenfinand as confused by the "Go to latest patch set" bar that was appearing18:07
stephenfin*Was18:07
clarkbI think the go to latest patchset set button may imply you are editing an older patchset18:08
clarkbbut ya the in browser editor has always been a bit weird18:08
clarkb(I think its better now than it was on 2.13 though)18:08
*** Alex_Gaynor has joined #opendev18:10
Alex_Gaynor👋 I'm seeing arm64 builds hanging out in queue'd status for an extended period (>1 hour) https://zuul.opendev.org/t/pyca/status/ I don't see anything obvious in grafana that explains this.18:10
clarkbhttps://grafana.opendev.org/d/pwrNXt2Mk/nodepool-linaro?orgId=1 shows that we were recently using at or near capacity, but right now it does look idle18:11
Alex_GaynorAnd has 0 in building. I'd expect things to be building if I'm queued :-)18:12
clarkbme too18:12
clarkbthere are some errored launch attempts there, I wonder if it is failing early to build so we don't trigger the building report to graphite /me goes to look at launcher logs18:13
Alex_GaynorThe queue also appears to be very deep, though I obviously have no idea if that's related.18:15
clarkba deep queue can also cause jobs to wait before building just due to lack of available resources to process everything at once, but in this case it seems to be that it isn't using any of the available resources for some reason18:16
clarkbnodepool just logged 504 Gateway Time-out: The server didn't respond in time. for linaro server deletions18:17
clarkbI wonder if the api services just went away /me digs more18:17
Alex_Gaynorif it can't handle deletions, seems like you might end up with a "phantom" pool of resources that exist, but are unusable, and also prevent spinning up new ones.18:18
clarkbya its also affecting the listing of resources and services on different ports.18:18
*** eolivare_ has quit IRC18:18
mordredthat reminds me of the old HP Public Cloud bug18:18
clarkbmy preliminary analysis is that the cloud apis just went away18:19
clarkbkevinz: ^ fyi if you happen to be awake18:19
clarkbI'll try interacting with it manually to see if I can observe any other useful behaviors18:19
mordredwith the missing database index that caused both creates and deletes to timeout at the LB but continue running/blocking on the backend - and of course since the LB timed it out, client code would retry the operation just putting more in the backend queue ...18:20
clarkbmordred: ya image lists and server shows work18:21
clarkbif run manually so I'm suspecting the issues are more narrow (similar to what you describe)18:21
* clarkb tries to manually boot and delete a server18:22
*** rpittau is now known as rpittau|afk18:23
clarkbmy test node failed with 'No valid host was found. There are not enough hosts available.'18:25
clarkbhowever, if nodepool was hitting ^ I would've expected node failures to bubble up to zuul18:26
clarkbthis is an interesting situation18:26
mordredyeah - why was nodepool getting gateway timeouts - no valid host is a real error18:30
mordredunless no valid host is causing nodepool to retry loop and the loadbalancer is rate-limiting nodepool now but not you18:30
mordreddid you do that manual launch from a nodepool node?18:31
clarkbno I did it from bridge18:31
clarkbso that could be it, the proxy telling us to go away after tight looping due to failures18:31
mordredyeah18:31
mordredso could be a double failure sitch18:31
clarkbya grepping on not enough hosts I see a bunch of those errors in a small period of time then it stops which would be inline with your hypothesis18:32
clarkbhrm except the gateway failures happened first and now its going through and finding no valid host18:35
clarkbI expect that what is happening is something broke at a network level and caused the cloud to have a sad. It has since recovered enough to fail node boots with no valid host but not recovered enough to actually boot them18:36
clarkband now jobs are going to start getting node failures18:36
clarkbbut I'll keep poking and see if I can come up with a more concrete idea of what is going on18:36
clarkbjust caught it doing another round of attempts and then getting back no valid hosts18:40
clarkbdo we have a backoff on node relaunch attempts?18:40
clarkbthat may explain why we aren't seeing this in a tight loop18:41
clarkbcorvus: ^18:41
*** diablo_rojo has quit IRC18:44
*** dtantsur is now known as dtantsur|afk18:45
fungitobiash: that's the main change i'm aware of in pip 21, it drops support for python <3.6 (including dropping 2.7 support)18:45
fungioh, i see clarkb also answered you18:46
fungidiablo_rojo seems to have dropped again18:46
corvusclarkb: i don't think there's an explicit backoff, just a complex system of loops and timeouts18:52
clarkbhrm, it is definitely not progressing through the requests as quickly as I would expect if there is no backoff. I think this is a "good" thing in that it means we may end up with a fixed cloud before everything NODE_FAILUREs though18:53
*** klonn has joined #opendev18:56
*** whoami-rajat__ has quit IRC18:57
openstackgerritMerged openstack/project-config master: test-release-openstack: use focal  https://review.opendev.org/c/openstack/project-config/+/77490619:02
clarkbfwiw still seeing bursts of no valid host found19:15
*** ralonsoh has quit IRC19:17
*** rchurch has quit IRC19:17
Alex_GaynorFWIW, I'm now seeing clear "node_failure" statuses, so progress?19:18
fungicorvus: so the manage-projects run took 1.5 hours19:19
corvusinfra-root: i'm looking at a manage-projects log, and it output a lot of errors on gitea0319:19
corvusand 0219:19
fungimaybe we're getting slammed again19:20
*** rchurch has joined #opendev19:20
fungichecking graphs19:20
clarkbAlex_Gaynor: yes I think what is happening is the no valid host errors we are seeing more recently are going to start bubbling up as NODE_FAILURES19:20
tobiashclarkb: I'm wondering if we should treat no valid host found errors in nodepool like non-fatal quota issues19:20
corvusand 05... let's just say several giteas for now since it's hard to read these logs19:20
fungicorvus: oh yeah, massive swap thrash and eventual oom in that timeframe19:20
fungiso yay, our mystery load generator has returned? and now we have improved logging to investigate with19:21
clarkbtobiash: in this case there is only one cloud provider for these node types so that would just cause all jobs to sit and wait until the cloud fixed itself19:21
clarkbfungi: fwiw I think its been about exactly one week since last time19:21
fungineat19:21
clarkbfungi: a fun cron maybe?19:21
fungilast time our suspect was rdo's ci servers, right?19:21
clarkbbut ya the improved logs will hopefully allow us to identify the source19:21
tobiashclarkb: we were getting no valid host found mostly when the cloud is short on resources due to potential too high over provisioning19:22
clarkbfungi: yes, they had an order of magnitude more requests on some of the servers (and theory was they tripped that one over which caused a chain reaction as haproxy rebalanced the pool)19:22
corvusfungi: gitea03 starting at 2021-02-10T17:05:30.15691219:22
clarkbAlex_Gaynor: unfortunately I don't think there is much more we can do without the cloud intervening.19:22
Alex_Gaynor👍19:22
clarkbkevinz: when your day starts can you sync up with us and see if we can help with further debugging?19:22
clarkbcorvus: fungi: the rough debugging process with improved logs is look at apache2 access logs on affected hosts during the time frame and note the source port for large or out of place requests. Then go to haproxy server syslog and grep for that port and gitea backend19:23
fungiclarkb: also sometimes it's helped to e-mail kevinz since he may see that sooner than irc19:24
clarkbbecause there are only 65k possible ports you also typically have to match timestamp ranges too19:24
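The correlation clarkb outlines (match a suspicious Apache request to an haproxy log entry by port, then narrow by timestamp because ports get reused) might look something like the sketch below. The record types and field names are assumptions made up for illustration; it takes already-parsed records and does not attempt to parse either real log format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ApacheHit:
    # Hypothetical parsed access-log entry from a gitea backend.
    when: datetime
    source_port: int
    request: str


@dataclass
class HaproxyConn:
    # Hypothetical parsed haproxy syslog entry.
    when: datetime
    client_ip: str
    port: int
    backend: str


def correlate(hits, conns, backend, window=timedelta(seconds=5)):
    """Pair each Apache hit with haproxy entries on the same port and backend.

    Ports are reused (only ~65k exist), so a match also has to fall
    within a small time window around the Apache timestamp.
    """
    matches = []
    for hit in hits:
        for conn in conns:
            if (
                conn.backend == backend
                and conn.port == hit.source_port
                and abs(conn.when - hit.when) <= window
            ):
                matches.append((hit.request, conn.client_ip))
    return matches
```

The five-second default window is an arbitrary choice; a busier proxy would likely need a tighter one to avoid false matches.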
clarkbfungi: good idea. I'll write an email then see if I can help with gitea19:24
corvustobiash, clarkb: i agree, that's usually what that error means.  i think i'd be in favor of treating it as a non-fatal error; though since it's not actually reflected in quota, i don't think we'll be able to handle it intelligently.  i think we ought to decline the request if we are not the last possible launcher.19:24
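The handling corvus proposes can be sketched as a small decision function. Everything here is hypothetical: the function name, the string return values, and in particular the choice to fail the request when this is the last possible launcher are assumptions, not nodepool's actual code or API.

```python
# Marker text taken from the error seen in the channel; matching on it is an assumption.
NO_VALID_HOST = "No valid host was found"


def handle_launch_failure(error_message: str, is_last_launcher: bool) -> str:
    """Return "decline" or "fail" for a failed node launch (illustrative only)."""
    if NO_VALID_HOST not in error_message:
        # Any other launch error keeps the existing fatal behaviour.
        return "fail"
    if is_last_launcher:
        # Assumed: with nobody left to hand the request to, report failure.
        return "fail"
    # Capacity-style error and another launcher may still succeed.
    return "decline"
```

With a single provider for a node type, as clarkb notes for the arm64 case, the launcher is always the last one, so this sketch would still surface a failure rather than leave jobs waiting.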
fungiyeah, i'm doing the thing with gitea02, but someone independently doing the same for another impacted backend would help correlation19:24
corvusfungi: i think i need to leave the gitea debugging to you, sorry19:25
fungicorvus: no worries, thanks for spotting it!19:26
fungii'll be semi-focused on this for the next little while, but also need to do some cooking shortly19:27
*** hashar has quit IRC19:33
clarkbok email sent19:34
clarkbtried to accurately describe the transition from 504 gateway errors to no valid host found with accurate timestamps19:35
fungiinterestingly, the greatest number of connections i see to gitea02 during the 17z hour was from codesearch.o.o19:36
*** andrewbonney has quit IRC19:39
clarkbis it possible that creating new projects is doing it?19:41
clarkb16:40:05  openstackgerrit | Merged openstack/project-config master: Add zuul-storage-proxy in zuul namespace  https://review.opendev.org/c/openstack/project-config/+/77236419:41
clarkbor is that just getting caught in the fallout? Our gitea testing does actually create all the projects in our project list and it does that successfully19:42
fungiyeah, it's mostly come to our attention when new project creation fails, but we create new projects at other times without issue19:43
fungii don't think it's codesearch, because it's also far and away the largest source of connections to gitea02 at other times where this wasn't going on19:44
clarkbautomated email response has reminded me that it is the chinese new year19:46
fungid'oh!19:47
fungithat doesn't bode well19:47
*** slaweq has quit IRC19:49
*** zimmerry has quit IRC20:11
*** zimmerry has joined #opendev20:13
*** sboyron_ has joined #opendev20:55
*** sboyron has quit IRC20:58
*** sboyron_ has quit IRC21:09
ianwclarkb: how strongly do you feel about the /var/refstack v /var/lib/refstack?  enough to respin?21:29
fungieven with /var/refstack being non-fhs-compliant, i wouldn't want anyone to redo work21:30
clarkbianw: not super. I tend to always look at the docker compose file and work back from there anyway (and that has the /var/lib/refstack pointers)21:31
ianwfungi: well the change is moving everything to /var/lib/refstack so i guess we're good from that pov21:31
clarkbjust calling it out as a difference to gitea if others care more strongly21:31
ianwok i might just go with it, and see if having a persistent db makes things work.  i'm not sure though, i did a "mysqldump <trove-details> | mysql" to try and populate it and it didn't seem to work, but i don't know21:32
ianwi have to just do school run but can help with gitea things if i can be useful21:33
fungiin what way did it not work? i think i've used mysql -e for such things in the past21:34
fungior source the path to the dumpfile in the interactive mysqlclient prompt21:34
clarkbre gitea I'm beginning to wonder if it could be our own project description updates that does it21:34
clarkbpossible that when there is background load on gitea that doing the management stuff all at once like that can make things sad21:35
clarkbhowever, not 100% sure of that yet21:35
fungiyeah, i need to find a minute to try and match up the manage-projects ansible log to see if i can tell when it started hitting different backends with when the memory on each of them started to skyrocket21:37
fungiugh, perhaps unsurprisingly, we have bitrot in our puppet-pip module testing21:41
fungilooks like it could be a problem with beaker-hiera21:42
fungireading a bit, we may need to pin beaker-hiera<0.2 in our spec helper repo21:44
clarkbthat seems to be the classic case of bit rot in the puppet space for us21:44
openstackgerritJeremy Stanley proposed opendev/puppet-openstack_infra_spec_helper master: Pin beaker-hiera<0.2.0  https://review.opendev.org/c/opendev/puppet-openstack_infra_spec_helper/+/77503021:49
openstackgerritJeremy Stanley proposed opendev/puppet-pip master: Pin get-pip.py to last Python 3.5 version  https://review.opendev.org/c/opendev/puppet-pip/+/77490021:50
clarkbfungi: I seem to recall that depends on won't work for that for some reason? we may just have to land the infra spec helper change (which I am reviewing)21:51
fungiyeah, maybe21:52
openstackgerritMerged opendev/system-config master: refstack: create database storage area  https://review.opendev.org/c/opendev/system-config/+/77477321:54
clarkbianw: fwiw I would've expected the mariadb to work without the persistent mount but if docker-compose down then up -d was run you'd lose the db21:55
clarkbits definitely a good and correct improvement21:56
clarkbfungi: just thinking out loud here about the gitea thing. Maybe we can measure it in our test job and see if that exhibits high load during the project description update pass?22:12
clarkbI don't know how easy our existing test tooling makes that though22:12
fungiwonder if we could add roles to start dstat and collect its record?22:16
ianwdevstack already has a background service that does similar22:17
fungiyeah, just didn't know if devstack's implementation was easily reused or tightly coupled to devstack's overall design22:19
ianwprobably jumping on a running host and copying the .service file would be enough22:21
*** diablo_rojo has joined #opendev22:21
diablo_rojofungi, been trying to run the renaming commands in the docs, but chanserv says I am not authorized?22:23
clarkbdiablo_rojo: you may have to explicitly op yourself in the channel first22:23
clarkbthe ability to op and actually being op are separate22:23
fungidiablo_rojo: /msg chanserv op #openstack-board22:24
fungii think that's the syntax22:24
fungiand then the same but deop instead of op when you're finished22:24
clarkbalso this could be behavior change between gitea 1.13 and 1.14?22:28
clarkbassuming the project description updates are related.22:28
clarkbJust thinking out loud here: we could also remove the description updates for now and monitor the next project creation22:29
diablo_rojoI did actually explicitly op myself in the channel first.22:33
diablo_rojoBut let me try again22:33
diablo_rojoYeah. It still says I am not authorized..22:34
clarkbdo you need it on both sides maybe?22:35
diablo_rojoIn both #openinfra-events and #openstack-ptg I have op22:35
diablo_rojobut I can't set the guard on #openstack-ptg22:35
clarkbhrm22:36
diablo_rojoRight?22:36
fungii'll need to re-check the freenode mode reference222:38
fungimost of us are +Aeforstv but diablo_rojo is only +Aefortv22:38
diablo_rojoWeird.22:38
fungifounder perms are +AFRefiorstv22:39
fungihttps://freenode.net/kb/answer/channelmodes22:39
fungiahh, nope, i wanted perms22:40
*** levalicious has quit IRC22:40
diablo_rojoI would guess since I am missing the 's' that's why I can't 'SET' things.22:40
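The comparison behind that guess is just a set difference over the flag letters quoted above. A minimal sketch, with a hypothetical helper name:

```python
def missing_flags(reference: str, actual: str) -> set:
    """Return the access flags present in reference but absent from actual."""
    return set(reference.lstrip("+")) - set(actual.lstrip("+"))


# Flag strings as quoted in the channel.
print(missing_flags("+Aeforstv", "+Aefortv"))  # {'s'}
```

Running the same comparison against the founder string `+AFRefiorstv` shows which extra flags founders hold beyond the regular operator set.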
fungioh, though that says "An operator can use MLOCK with +f only if they have access flag +s in both channels, or if the channel to be forwarded to is +F and they have +s in the original channel."22:43
fungiso, yep22:43
fungithat looks likely22:43
fungii wonder why we don't normally grant +s to our operators list?22:43
fungidiablo_rojo: i've added +s to your perms for #openinfra-events and #openstack-ptg, see if that worked?22:45
fungiif so i can add you to the others you're working on while we figure out why we're not setting +s on everyone in the access list22:45
diablo_rojofungi, no dice.22:47
fungi+s is "Enables use of the set command." according to `/msg chanserv help FLAGS`22:47
fungidiablo_rojo: my fault, syntax error. should actually be added now22:49
fungii didn't spot the error response on my first try. probably doing too many things at the same time22:49
*** klonn has quit IRC22:51
fungiianw: "This message is to inform you that the host your cloud server, ianw-klog-collector, resides on alerted our monitoring systems at 2021-02-09T12:39:27.515190."22:59
fungii can close that ticket out, just wanted to make sure you knew23:00
ianwfungi: oh, we can delete that.  that was the server i was using to collect logs from the linaro hosts that kept disappearing23:01
fungiianw: also are the "backup inconsistency" e-mails a test, or false negative?23:01
openstackgerritMerged opendev/puppet-openstack_infra_spec_helper master: Pin beaker-hiera<0.2.0  https://review.opendev.org/c/opendev/puppet-openstack_infra_spec_helper/+/77503023:01
clarkbianw: re linaro do you know if there is someone else we should send email about the no valid host found errors there? kevinz's returned that it is the chinese new year23:01
fungi"Inconsistency found in backup /opt/backups/borg-gitea01/backup on backup01 at Wed Feb 10 01:13:47 UTC 2021"23:02
fungiet cetera23:02
ianwsorry just pulling up my mail client23:02
clarkbfungi: I'm yet to receive that one it seems23:02
fungiclarkb: they went to the root inbox23:02
ianwclarkb: yeah, i don't have another contact unfortunately ... not sure what else to do :(23:02
diablo_rojofungi, good to go now.23:02
diablo_rojoI can set stuff.23:03
ianwhuh, those backup inconsistency ones i would not expect23:03
fungiclarkb: ianw: could hrw know who to contact?23:03
diablo_rojoIf you want to give me that perm on the other channels I can move forward.23:03
fungidiablo_rojo: awesome, yeah doing that now, just a sec23:03
diablo_rojofungi, thank you!23:03
clarkbfungi: ya not finding it (I did a global search too to rule out filing into an unexpected dir)23:04
ianwfungi: maybe, he is usually in here but from #linaro we might have just missed him.  worth a try23:05
fungiclarkb: no, i mean our shared root inbox23:05
clarkboh I see23:05
fungidiablo_rojo: i think i got them all23:06
ianwoh, hrm we didn't actually approve the backup verification yet @ https://review.opendev.org/c/opendev/system-config/+/77475323:06
fungiclarkb: also inmotion has been sending setup complete notifications to that address, do you need those or can i file them into a subfolder?23:06
ianwfungi: ok, they are definitely false positives from when i was testing it and pressed ctrl-c, killing the verification process that the script then warned about23:07
diablo_rojofungi, sweet! Thank you :)23:07
fungicool23:08
fungidiablo_rojo: you're welcome, lmk if you run into more problems23:08
clarkbfungi: they can be filed away23:09
diablo_rojofungi, will do!23:11
fungiclarkb: thanks, done23:11
fungiworking on closing out the rackspace ticket about ianw-klog-collector as well23:12
openstackgerritClark Boylan proposed opendev/system-config master: Build Gerrit 3.3 images  https://review.opendev.org/c/opendev/system-config/+/76502123:12
openstackgerritClark Boylan proposed opendev/system-config master: Run gerrit 3.2 and 3.3 functional tests  https://review.opendev.org/c/opendev/system-config/+/77380723:12
openstackgerritClark Boylan proposed opendev/system-config master: Cleanup refstack job dependencies  https://review.opendev.org/c/opendev/system-config/+/77504123:12
ianwfungi: if you're in the control panel can you just delete it?23:12
ianw(the server, otherwise i'll do it later)23:12
fungiianw: oh, yep happy to do that too23:12
clarkbianw: ^ I stuck that refstack cleanup behind my gerrit 3.3 jobs addition because there were merge conflicts23:12
diablo_rojofungi, missing #openstack-summit23:13
clarkbI don't think either is urgent but wanted to point that out as I noticed it when fixing the conflicts23:13
ianwclarkb: oh, sorry, i owed a review on the gerrit 3.3 things, looking23:13
fungidiablo_rojo: oh, hah, i'm not allowed to do that23:13
diablo_rojoHa ha ha23:14
diablo_rojoAlright, will circle back to that one then.23:14
fungidiablo_rojo: apparently not actually an official channel, only access is for the founder "spy" who created that channel >10 years ago23:14
fungi>10.5 years ago in fact23:15
fungi"modified 10y 31w 3d ago"23:16
fungicorvus: mordred: does the irc nick "spy" ring a 10.5-year-old bell for you?23:16
clarkbthat must've been the first summit23:16
fungiindeed23:16
clarkb(if I've done math right)23:16
fungijbryce: ^ you might remember too, i suppose?23:18
corvusyes spy was an OG23:20
fungiianw: i've closed out the ticket and deleted the instance now23:20
corvusfungi: need me to ask freenode for it?23:20
ianwthankyou!23:20
fungicorvus: if you have a moment, that would be much appreciated!23:20
fungiat this point we're just trying to set a forward on it anyway23:21
fungithat reminds me i still need to start the org application for the #openinfra channel namespace, i found freenode's documentation on the process at least23:24
corvusfungi: better sooner than later23:26
fungiyup23:26
fungiwe're only just starting to forward to those, and i didn't want to jump the gun asking for that namespace until we'd given the former occupants of the base channel some time23:27
corvusfungi: done; Flags +AFRefiorstv were set on openstackinfra in #openstack-summit.23:28
fungithanks corvus!23:28
fungidiablo_rojo: as soon as our next accessbot run fires, we should be all set23:29
mordredfungi: wow. spy is old23:29
diablo_rojoseems I can't set the MLOCK for the openstack-foundation to openinfra redirect23:30
ianwclarkb: did you ever look at making gitea pause until the db container was active?  you used to be able to set a "healthcheck" on the mariadb instance and make other containers wait on that with a condition, but for some reason they removed that apparently23:31
corvusdiablo_rojo: there's an existing forward for the unregistered channel; maybe that needs to be removed first?23:32
corvusinfo #openstack-foundation23:33
corvusderp23:33
diablo_rojocorvus, ohh that makes sense. Oh nailed it.23:33
diablo_rojoThat's already in place then23:33
clarkbianw: I think docker-compose doesn't have that ability to wait. You have to do it within the container with like an init script23:33
clarkbianw: I want to say once I couldn't do it with docker-compose I gave up because I didn't want to have a super complicated container image23:34
clarkb(but maybe complicated container image is a good idea?)23:34
corvusdiablo_rojo: the current status is: Mode lock  : +ntcrf #openstack-unregistered23:34
diablo_rojooh so not redirected to the right place23:35
*** DSpider has quit IRC23:35
ianwclarkb: i got it working with http://paste.openstack.org/show/lCL5sfUhtkXLtvmHwMmV/ using version 2.1, but then i read that apparently that was considered too useful and so they removed it in version 3 :/23:36
ianwi don't actually know if it matters; i'm assuming refstack retries until it connects anyway23:36
ianwwe didn't deploy because of a typo23:36
clarkbgitea retries23:36
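The "retry until the db is reachable" behaviour discussed above can be sketched roughly as follows. This is a minimal illustration only, not how gitea or refstack actually implement it; `wait_for_port` and its parameters are hypothetical names, and a successful TCP connect is assumed to be a good enough readiness signal.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, False on timeout.

    Hypothetical helper: polls with a plain TCP connect, which only shows
    that something is listening, not that the database is ready for queries.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is bounded by `interval` seconds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: give up after the deadline, else retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

An init script inside the container could call something like this before starting the service, which is the "do it within the container" approach mentioned above.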
clarkbianw: what was the typo?23:37
openstackgerritIan Wienand proposed opendev/system-config master: refstack: fix typo in role matcher  https://review.opendev.org/c/opendev/system-config/+/77504423:37
ianwclarkb: ^ :)23:37
fungiso looking at our accessbot config, we say to set +Aeforstv on everyone in the operators list, so i'm not sure why it added diablo_rojo to them without +s23:38
fungiactually there are channels it didn't add her to at all, i'll check the accessbot output23:39
clarkbianw: doh23:41
ianwi'll just do a manual run to get the new files on23:41
fungi2021-02-10 08:05:41,556 [INFO] setaccess - access #openinfra-board add diablo_rojo -FRis23:45
ianwok, i've started the refstack mariadb container and /var/lib/refstack/db/ is populated.  i'm going to run the mysqldump import from the old trove23:45
funginow to figure out where/why accessbot is setting -FRis23:46
ianwwell https://refstack01.openstack.org/#/community_results still seems to not be happy23:48
clarkbfungi: operators don't have FRi (but do have s)23:49
fungiyeah, that's what i find weird23:49
clarkb-FRi seems like what I would've expected23:49
fungibut also she wouldn't have had FRi anyway23:49
clarkbas a sanity check other operators do have +s (but no FRi)23:51
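To make the flag arithmetic above concrete, here is a small sketch that diffs two access flag strings. It is an assumption-laden illustration and not accessbot's actual logic; `flag_changes` is a hypothetical helper, and the two flag strings are the ones quoted in this log (+AFRefiorstv as set by corvus, +Aeforstv from the operators config).

```python
def flag_changes(current, desired):
    """Return a change string such as '-FRi' that turns `current` into `desired`.

    Hypothetical helper: treats each flag string as a set of single letters.
    """
    cur = set(current.lstrip("+"))
    want = set(desired.lstrip("+"))
    add = "".join(sorted(want - cur))       # flags that are missing
    remove = "".join(sorted(cur - want))    # flags that should be dropped
    return ("+" + add if add else "") + ("-" + remove if remove else "")


# Full flag set vs. the operator set from the config:
print(flag_changes("+AFRefiorstv", "+Aeforstv"))  # -FRi
```

Under these assumptions the expected difference is -FRi, so the extra "s" in the logged -FRis is the part that still needs explaining.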
fungiand it seems like accessbot isn't processing the whole list either, though it's not immediately apparent to me from the log why that is23:51
*** CeeMac has quit IRC23:51
openstackgerritIan Wienand proposed opendev/system-config master: refstack: capture container logs to disk  https://review.opendev.org/c/opendev/system-config/+/77504623:52
fungi2021-02-10 14:33:57,877 [DEBUG] irc.client - TO SERVER: QUIT :Connection reset by peer23:52
fungioh maybe we're getting disconnected23:53
clarkbis it being rate limited?23:53
fungiyeah, could be something like that, though the server doesn't seem to explain23:54
ianw"Blocked loading mixed active content “http://refstack01.openstack.org:8000/v1/results?page=1”23:55
fungithat's being logged by the apache layer?23:55
clarkbianw: fwiw it gives me json back23:56
clarkbbut I have to switch it to port 44323:56
ianwyeah, i think the errors might be on the front end and the db is ok, it's just all confused between https/http and its hostname...23:57
clarkbianw: I think the port 8000 stuff isn't meant to be publicly exposed fwiw23:57
clarkbbut I guess if that is apache complaining then you aren't hitting that23:57
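The browser message quoted above is a mixed content block: a page loaded over https requesting a resource over plain http. A simplified sketch of that check, using the two URLs from this log, might look like the following. `is_mixed_content` is a hypothetical name, and real browsers apply more rules (for example exemptions for localhost and different handling of passive content).

```python
from urllib.parse import urlsplit


def is_mixed_content(page_url, resource_url):
    """Return True if an https page requests a plain-http resource.

    Simplified, hypothetical check: only the URL schemes are compared.
    """
    return (urlsplit(page_url).scheme == "https"
            and urlsplit(resource_url).scheme == "http")


page = "https://refstack01.openstack.org/#/community_results"
api = "http://refstack01.openstack.org:8000/v1/results?page=1"
print(is_mixed_content(page, api))  # True
```

This matches the symptom described: the front end is served over https but is pointing at an http API URL on port 8000.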
fungilooks like tools/apply-test.sh needs some help to deal with latest cryptography now23:58
clarkbfungi: that uses ansible to run puppet with ansible and probably uncapped cryptography with old pip being the problem there?23:58
clarkbfungi: can probably upgrade pip first or cap cryptography in the ansible install23:58
fungiyeah, it's happening when we pip install ansible23:59
