Wednesday, 2021-12-15

*** rlandy|ruck is now known as rlandy|out00:57
corvuswe are at > T+2h since the rolling restart, and everything seems nominal01:10
fungiyeah, things are looking clean to me01:18
wxy-xiyuanhi ianw, the openEuler label is there https://zuul.opendev.org/t/openstack/labels but there are no ready nodes. I assume there is some build/launch error in nodepool? Could you please take a look, or tell me how I can debug it? Thanks. 02:00
ianwwxy-xiyuan: ahh, sorry i meant to check back on that : you can see the build @ https://nb03.opendev.org/openEuler-20.03-LTS-SP2-arm64-0000000023.log02:01
ianwit's failing in our project-config elements02:01
ianwi have to admit that totally slipped my mind02:02
wxy-xiyuanNice, this log is what I need. Checking. Big thanks02:02
ianwwxy-xiyuan: the elements in https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements will need updating02:05
wxy-xiyuanxinliang https://nb03.opendev.org/openEuler-20.03-LTS-SP2-arm64-0000000023.log02:05
wxy-xiyuandib-run-parts Running /tmp/in_target.d/install.d/20-iptables02:05
wxy-xiyuanecho 'Unsupported operating system openeuler'02:06
wxy-xiyuanianw ++02:06
wxy-xiyuanxinliang: https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements/nodepool-base/install.d/20-iptables  as ianw said, maybe not only iptables but also other base elements need updating02:07
xinliangwxy-xiyuan: thanks, looking at it02:07
xinliangwxy-xiyuan: these elements haven't been tested before02:09
fungisystem-config-run-mirror-update seems like it may have started consistently timing out in the run phase02:31
opendevreviewwangxiyuan proposed openstack/project-config master: Add openEuler disto support for elements  https://review.opendev.org/c/openstack/project-config/+/82179402:33
opendevreviewIan Wienand proposed openstack/diskimage-builder master: [dnm] add vm element to 9-stream image test to test bootloader  https://review.opendev.org/c/openstack/diskimage-builder/+/82177202:50
ianwfungi: hrm -- i think that does mount afs as part of it.  that's the only part that i think would cause problems02:53
ianw2021-12-15 02:02:59.852337 | bridge.openstack.org | 2021-12-15 01:50:38,158 INFO ansible: changed: [mirror-update01.opendev.org] => {03:02
ianw2021-12-15 02:02:59.867179 | bridge.openstack.org | 2021-12-15 01:58:35,723 INFO ansible: changed: [mirror-update01.opendev.org] => {03:03
ianwit took about 8 minutes to build openafs03:03
ianwthat ... seems about normal, i guess03:03
ianwhttps://03eb8bb46d4e2a6a232a-dc3e65ccae23bb6c49297bc4ac109b91.ssl.cf5.rackcdn.com/820899/6/check/system-config-run-mirror-update/3e8178e/job-output.txt03:03
ianw2021-12-15 02:03:00.611982 | bridge.openstack.org | 2021-12-15 02:02:43,597 INFO ansible: mirror-update01.opendev.org : ok=233  changed=121  unreachable=0    failed=0    skipped=28   rescued=0    ignored=1203:05
ianwthe whole thing finished 4-ish minutes after that, so nothing blowing out there03:05
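The two `changed:` timestamps ianw quotes above can be differenced to check the "about 8 minutes" for the openafs build and the "4-ish minutes" for the rest of the run. A minimal Python sketch, assuming only the `YYYY-MM-DD HH:MM:SS,mmm` format visible in those ansible log lines:

```python
from datetime import datetime

# Format of the ansible log timestamps quoted above (comma before milliseconds).
FMT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two ansible log timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


# First and second "changed: [mirror-update01.opendev.org]" lines (the openafs build).
openafs_build = elapsed_seconds("2021-12-15 01:50:38,158", "2021-12-15 01:58:35,723")
# Second "changed" line to the final PLAY RECAP line.
remainder = elapsed_seconds("2021-12-15 01:58:35,723", "2021-12-15 02:02:43,597")
print(f"openafs build: {openafs_build / 60:.1f} min, rest of run: {remainder / 60:.1f} min")
```

Run as-is it prints `openafs build: 8.0 min, rest of run: 4.1 min`, consistent with the estimates in the conversation.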
opendevreviewIan Wienand proposed openstack/diskimage-builder master: [dnm] add vm element to 9-stream image test to test bootloader  https://review.opendev.org/c/openstack/diskimage-builder/+/82177203:16
opendevreviewIan Wienand proposed openstack/diskimage-builder master: [dnm] add vm element to 9-stream image test to test bootloader  https://review.opendev.org/c/openstack/diskimage-builder/+/82177203:35
opendevreviewchzhang8 proposed openstack/project-config master: register and bring back tricircle under x namespaces  https://review.opendev.org/c/openstack/project-config/+/80044203:42
opendevreviewIan Wienand proposed openstack/diskimage-builder master: [dnm] add vm element to 9-stream image test to test bootloader  https://review.opendev.org/c/openstack/diskimage-builder/+/82177203:51
fungisecond recheck seems to have made it back into the gate too, so i guess it's not 100% failing04:28
*** ysandeep|out is now known as ysandeep04:50
*** pojadhav- is now known as pojadhav05:12
ykarelrax-iad and rax-dfw are still affected by pypi issues05:13
ykarelianw, you around?05:13
ykarelor anyone else from infra who can help clear this05:14
ykarelone option i see is to try a revert of https://review.opendev.org/c/openstack/project-config/+/76049505:16
ykarelsince the public mirrors for these look good, only the internal ones are impacted05:16
ykarelor we can try -XPURGE against those internal mirrors to see if that resolves it, but that needs to be done from somewhere they are reachable05:17
ianwykarel: I am for just a bit06:02
ykarelianw, ack np, i am trying to clear it with a hack https://review.opendev.org/c/openstack/neutron/+/82179806:03
ykarelcleared rax-iad06:03
ykarelnow waiting for a node on rax-dfw06:03
ianwi'm not sure what that internal mirror would have to do with it?  that is just using the rax local network to access the mirror, but it's the same node06:04
ianwi.e. mirror-int.iad.rax.opendev.org == mirror.iad.rax.opendev.org, just one is the internal interface06:05
ianwas mentioned with pypi, we are only a proxy...06:05
frickleralso, although the nodes are called "mirror", they are actually just caching proxies06:06
ykarelianw, but i can see failures with mirror-int.iad.rax.opendev.org, while installation via mirror.iad.rax.opendev.org works fine06:06
ianwhrm, that is *probably* a red-herring issue -- they both go to exactly the same apache process06:06
fricklerykarel: do you have links to the failures?06:06
ianw(i mean, i will never say never, but I'd highly doubt that is actually the issue)06:07
ykarelfrickler, https://86804121d4d0f7ba6424-61662cfb64be48a1e2663c2773bf553c.ssl.cf2.rackcdn.com/821414/2/check/neutron-tempest-plugin-api/c4e6f3a/job-output.txt06:08
ykarelthere were many failures http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22The%20user%20requested%20(constraint)%5C%2206:09
ykarelyesterday other providers were also impacted, but fungi cleared them by running -XPURGE against them06:10
ianw2021-12-15 03:53:43.468529 | controller | ERROR: Cannot install neutron==19.1.0.dev278 because these package versions have conflicting dependencies.06:10
ykarelbut today i'm only seeing failures in rax-iad and rax-dfw06:10
ykareland what those have in common is that they use mirror-int06:10
ianw2021-12-15 03:53:43.468804 | controller |     neutron 19.1.0.dev278 depends on pecan>=1.3.206:10
ykarelso likely yesterday those were not cleared06:11
ianw2021-12-15 03:53:43.468828 | controller |     The user requested (constraint) pecan===1.4.106:11
ykarelianw, yes, rax-iad is affected by ^ and rax-dfw by pyjwt06:11
fricklerlikely yet another CDN hiccup for pypi. give me a bit to set up some local testing06:11
ykarelfrickler, try with rax-dfw06:12
ykarelrax-iad seems to be fixed after i ran -XPURGE06:12
ianwso the real problem is that pip gets an error fetching pecan?  but outputs that error?06:13
ykarelianw, yes right06:13
ykarel1.4.1 is not available06:13
ykarelolder versions can be installed06:13
fricklerthe error that pip doesn't log usually is that it doesn't find any version at all in the index06:17
ykarelfrickler, i found a running job on rax-dfw, is it possible to log into it?06:22
ykarelip 104.130.132.10706:22
fricklerykarel: both iad and dfw seem to be working for me. either my testing is wrong or the XPURGE is indeed global as I assumed06:22
ykarelfrickler, you used mirror-int?06:23
fricklerykarel: no, I agree with ianw that there is no plausible explanation for it to behave differently06:24
ykarelfrickler, maybe you can check on the 104.130.132.107 node to confirm?06:24
ykarelin a venv with the latest pip, you just need to run MIRROR=mirror-int.dfw.rax.opendev.org06:24
ykarelpip install  --index-url="https://${MIRROR}/pypi/simple" --extra-index-url="https://$MIRROR/wheel/ubuntu-20.04-x86_64"  --trusted-host=$MIRROR -c https://raw.githubusercontent.com/openstack/requirements/master/upper-constraints.txt pyjwt06:25
ykarelif ^ fails, try with the public one06:25
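ykarel's check boils down to comparing what the internal and public hostnames return for the same project index. A rough Python sketch of that comparison, assuming the `/pypi/simple/<project>/` path used in the pip command above; `listed_versions` and `fetch_index` are hypothetical helpers, and the filename parsing is simplified (sdists and wheels only), not a full implementation of the packaging filename rules:

```python
import re
import urllib.request

# Simplified: "<name>-<version>.tar.gz" or "<name>-<version>-<tags>.whl".
FILENAME_RE = re.compile(r"^(?P<name>.+?)-(?P<version>[^-]+?)(?:\.tar\.gz|-.+\.whl)$")


def normalize(name: str) -> str:
    """PEP 503 project-name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


def listed_versions(index_html: str, project: str) -> set:
    """Versions of `project` found in the link texts of a simple-index page."""
    versions = set()
    for text in re.findall(r">([^<]+)</a>", index_html):
        m = FILENAME_RE.match(text.strip())
        if m and normalize(m.group("name")) == normalize(project):
            versions.add(m.group("version"))
    return versions


def fetch_index(mirror: str, project: str) -> str:
    """Fetch the simple-index page for `project` through a proxy hostname."""
    url = f"https://{mirror}/pypi/simple/{normalize(project)}/"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


# Example (needs network access; mirror-int is only reachable from inside the provider):
# for name in ("mirror-int.dfw.rax.opendev.org", "mirror.dfw.rax.opendev.org"):
#     print(name, "2.3.0" in listed_versions(fetch_index(name, "pyjwt"), "pyjwt"))
```

An index that only lists pyjwt 2.2.0, as frickler later observed for `mirror-int`, would make the upper-constraints pin on 2.3.0 unsatisfiable even though older versions still install.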
fricklerworking just fine for me06:27
ykarelfrickler, ok, then it likely got fixed; you ran it on 104.130.132.107, right?06:28
frickleryep06:28
ykarelack Thanks for checking, will keep an eye if it happens again06:29
frickleroh, wait, I tested without u-c. it only finds pyjwt==2.2.0, 2.3.0 is missing06:31
ykarelahh, then it's affected; can you try the same command as above06:31
ykarelfirst with mirror-int and then with the public one06:31
fricklerykarel: o.k., indeed it works without the -int, this is very weird, need to look into what is happening on the proxy06:38
ykarelfrickler, okk good06:39
ykarelfor now maybe we can just run PURGE against mirror-int.dfw.rax.opendev.org to clear CI06:39
fricklerykarel: it doesn't work like that, the purge is against pypi, not the proxy.06:42
ykarelfrickler, but if you fire the request at the proxy, doesn't it go through to pypi?06:42
fricklerbut maybe the proxy has different local caching for -int and external URLs06:42
ykareli ran it for rax-iad and it seemed to work06:42
ykareli ran purge there and after some time the module installed fine06:43
fricklero.k., indeed the proxy caches by URL, so mirror and mirror-int are different06:53
fricklerrunning "htcacheclean  -A -v   -p /var/cache/apache2/proxy/ 'https://mirror-int.dfw.rax.opendev.org:443/pypi/simple/pyjwt/?'" has resolved the issue for that pkg06:53
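frickler's finding is that the proxy keys its cache by the full target URL, hostname included, which is why the `mirror-int` and `mirror` entries for the same path are cached separately even though one Apache process serves both. A small Python model of that, assuming the key shape visible in the `htcacheclean` command above (`scheme://host:port/path?query`); the shape is inferred from that one key, not taken from mod_cache's source, and `cache_key` is a hypothetical name:

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def cache_key(url: str) -> str:
    """Approximate the cache key format seen in the htcacheclean command (assumption)."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return f"{parts.scheme}://{parts.hostname}:{port}{parts.path}?{parts.query}"


internal = cache_key("https://mirror-int.dfw.rax.opendev.org/pypi/simple/pyjwt/")
public = cache_key("https://mirror.dfw.rax.opendev.org/pypi/simple/pyjwt/")
# Same server, same path, but two distinct cache entries.
print(internal)
print(public)
```

Under this model, cleaning the entry for one hostname leaves whatever is cached under the other hostname untouched.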
ykarelfrickler, Thanks06:54
fricklerinfra-root: for reference, this is what I did in detail, still need to look into decoding the timestamps https://paste.opendev.org/show/811679/06:57
fricklerykarel: thanks for being so persistent, I'll see if we can better tune the proxy cache06:58
ykarelfrickler, Thanks, btw what timezone are you in?07:02
fricklerykarel: nominally UTC+1 currently, but I don't always stick to that ;)07:04
ykarelyes seems so as it's too early for you now :)07:05
elodillesfungi corvus : ack, thanks!07:09
dulekHey folks! I see another set of dependency issues in the OpenStack jobs, this time Keystone installation fails on pyjwt. 07:16
dulekDoes it make sense to recheck these jobs now?07:16
fricklerdulek: if the errors were happening on rax-dfw or rax-iad, yes. otherwise please link to a failure07:28
fricklerinfra-root: it seems that sometimes we have "stuck" cache entries with an expiry of 24h instead of the expected 5m, see the first timestamps in my above paste for the mirror-int entry07:29
dulekHere's one on rax-iad: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_0a9/821442/7/check/openstack-tox-pep8/0a92d7c/job-output.txt07:35
dulekAnd that's it, rest failed on rax-dfw.07:37
*** ysandeep is now known as ysandeep|brb08:12
*** ysandeep|brb is now known as ysandeep08:22
*** sshnaidm is now known as sshnaidm|afk09:33
*** ysandeep is now known as ysandeep|afk09:57
*** redrobot6 is now known as redrobot10:34
*** ysandeep|afk is now known as ysandeep11:08
*** rlandy is now known as rlandy|ruck11:13
*** sshnaidm|afk is now known as sshnaidm11:26
anbanerj|ruckHi,11:39
anbanerj|ruckWe have a gate blocker. Patches 816991,16 and 821778, which fix bugs, need to go first to unblock the rest. Can someone please get these two patches to the top of the queue?11:39
anbanerj|ruckhttps://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/821699/11:39
anbanerj|ruckhttps://review.opendev.org/c/openstack/tripleo-ci/+/821778/11:39
anbanerj|ruckfungi, clarkb ^ when you get some time11:40
anbanerj|ruckthanks11:40
*** pojadhav is now known as pojadhav|afk11:46
anbanerj|ruckAlso 821538 (https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/821538) pls. thanks11:52
*** tkajinam is now known as Guest851711:58
anbanerj|ruckHey clarkb, fungi:  Sorry pls ignore the previous patches. The correct patches that have to go first to unblock the gate are: 821538, 821778, 821699. in order. Could you pls put these at the top of the tripleo gate queue? Thanks!12:07
*** pojadhav|afk is now known as pojadhav12:41
*** kopecmartin_ is now known as kopecmartin13:41
*** lbragstad8 is now known as lbragstad14:09
fungifrickler: i suspect some fastly cdn endpoints occasionally serving pypi's fallback mirror could explain that if their fallback sets different cache parameters on responses14:12
fricklerfungi: indeed, I made some further tests and the cache timeout (default seems to be mostly 10m) is being sent from pypi and not specified on our side14:14
fricklerfungi: while the broken index responses seem to be sent with a broken or long timeout as well. we should consider setting something like a maximum timeout of maybe 1h or less to reduce the impact of those14:15
fricklerhttps://httpd.apache.org/docs/2.4/mod/mod_cache.html#cachemaxexpire would be the option to set14:17
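The effect frickler is after is a cap: whatever lifetime the upstream asks for, the proxy keeps the response fresh for at most N seconds. A simplified Python model of such a cap, using Apache's documented defaults (`CacheMaxExpire` 86400 s, `CacheDefaultExpire` 3600 s) and ignoring other inputs such as the `Last-Modified` heuristic; `capped_ttl` is a hypothetical name, and this is not mod_cache's exact algorithm:

```python
from typing import Optional

APACHE_DEFAULT_MAX_EXPIRE = 86400  # documented CacheMaxExpire default (1 day), seconds
APACHE_DEFAULT_EXPIRE = 3600       # documented CacheDefaultExpire default (1 hour), seconds


def capped_ttl(upstream_ttl: Optional[int], max_expire: int = APACHE_DEFAULT_MAX_EXPIRE,
               default_expire: int = APACHE_DEFAULT_EXPIRE) -> int:
    """Lifetime a proxy would use if it capped the upstream lifetime at max_expire.

    upstream_ttl is the lifetime the upstream response asked for (None if it gave none).
    """
    ttl = default_expire if upstream_ttl is None else upstream_ttl
    return min(ttl, max_expire)


# The normal ~10 minute index lifetime passes through a 1h cap unchanged...
print(capped_ttl(600, max_expire=3600))    # 600
# ...while a 24 hour lifetime on a bad index response is cut down to the cap.
print(capped_ttl(86400, max_expire=3600))  # 3600
```

In Apache itself the cap is the single directive `CacheMaxExpire <seconds>`, e.g. `CacheMaxExpire 600` for a 10 minute limit. The Apache documentation says the limit is enforced even when the document supplies an expiry date; whether it also overrides a `Cache-Control: max-age` lifetime is worth confirming before relying on it.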
fungiso basically when a fastly endpoint decides it can't reach the real pypi backend and serves something from pypi's backup mirror instead, it's also including a much longer cache timeout which results in us serving that stale data from our proxies for even longer14:17
fungianbanerj|ruck: i've put 821538,2 821778,1 821699,2 (in that order) as the first three items in the tripleo shared gate queue now14:25
anbanerj|ruckfungi, thank you!14:25
fungino problem14:26
*** pojadhav is now known as pojadhav|afk14:32
fricklerfungi: that's just my current assumption based on what I saw and the data from the cache I posted. might need further watching.14:43
fricklerfungi: otoh enforcing a limit of 1h when the default we see is 10m might be agreeable already14:43
fricklerfungi: it's also not certain whether that timeout is set by fastly or comes from the backends14:44
fungiyeah, i'd be up for making it 10min even14:44
frickleranother thing I noticed: we run an htcacheclean daemon for /var/cache/apache2/mod_cache_disk but that dir is empty; the proxy cache is in /var/cache/apache2/proxy, which we don't seem to actively clean at all14:49
*** ysandeep is now known as ysandeep|dinner14:55
*** simondodsley_ is now known as simondodsley14:58
*** mnaser_ is now known as mnaser14:58
*** ildikov_ is now known as ildikov15:00
*** johnsom_ is now known as johnsom15:00
*** clayg_ is now known as clayg15:00
*** bbezak_ is now known as bbezak15:00
*** erbarr_ is now known as erbarr15:01
*** parallax_ is now known as parallax`15:01
*** walshh__ is now known as walshh_15:01
*** davidlenwell_ is now known as davidlenwell15:01
*** JpMaxMan_ is now known as JpMaxMan15:02
*** parallax` is now known as parallax15:02
*** parallax is now known as Guest853515:03
Clark[m]One of those cache paths and htcacheclean is the default you get with Ubuntu packaging. The other is our path necessary due to cinder volumes in use on some hosts. Both should get cache cleaning via Cron jobs.15:09
Clark[m]Note a 10m expiry won't be very effective due to how often htcacheclean runs for keeping disk use down. But I think apache will refresh data it sees as stale, it just won't delete it as quickly as we might expect15:10
*** clarkb is now known as Guest853615:11
fricklerClark[m]: the 10m would be mostly to reduce the impact of caching broken indices, not to reduce disk usage15:13
fricklermight be interesting to check whether we could actually make that specific only for indices and have longer timeouts for wheels/tars15:14
fungiyeah, we want to turn over indices fairly quickly, but the packages don't ever change so can be cached for as long as we have space15:22
fungihowever, the packages are technically proxied from a different site name entirely, so we can probably leverage that difference?15:22
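fungi's distinction can be written down as a per-upstream policy: short lifetimes for index pages, long ones for package files. On PyPI the simple index comes from `pypi.org` while package files are served from `files.pythonhosted.org`, and uploaded files cannot be replaced under the same name. A Python sketch of the policy only; the TTL values, `PACKAGE_HOSTS` and `max_ttl_for` are hypothetical, and the sketch says nothing about how Apache would be configured to apply it:

```python
from urllib.parse import urlsplit

# Hypothetical caps, in seconds.
INDEX_MAX_TTL = 600                  # indices change often; keep bad ones short-lived
PACKAGE_MAX_TTL = 30 * 24 * 60 * 60  # released files never change, so cache them long

PACKAGE_HOSTS = {"files.pythonhosted.org"}


def max_ttl_for(upstream_url: str) -> int:
    """Pick a cache lifetime cap based on which upstream host a URL is proxied to."""
    host = urlsplit(upstream_url).hostname
    return PACKAGE_MAX_TTL if host in PACKAGE_HOSTS else INDEX_MAX_TTL


print(max_ttl_for("https://pypi.org/simple/pyjwt/"))
```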
*** ysandeep|dinner is now known as ysandeep15:25
*** parallax_ is now known as parallax15:30
fricklerContext: server config, virtual host, directory, .htaccess15:31
fricklerdoesn't seem to work by location :(15:31
fricklerthe last thing to note, which both ianw and I were wrong about: the cache works per target URL, not per source, so for rax the cache for the -int hostnames is indeed distinct from the one used for the public hostnames15:33
*** dviroel|rover is now known as dviroel|rover|lunch15:47
elodillesfungi corvus : i'm about to run the branch delete script now. i'll let you know when i reach the part where multiple branches will be deleted in a short time15:49
fungielodilles: sounds great, i'm around and can keep an eye on things as well15:49
elodillesack, let's see15:50
*** rlandy|ruck is now known as rlandy|ruck|brb15:50
*** rlandy|ruck|brb is now known as rlandy|ruck16:12
*** Guest8536 is now known as clarkb16:18
opendevreviewMerged opendev/system-config master: Copy Exim logs in system-config-run jobs  https://review.opendev.org/c/opendev/system-config/+/82089916:32
*** dviroel|rover|lunch is now known as dviroel|rover16:32
*** marios is now known as marios|out16:33
fungithat took far more rechecks than i would have expected16:37
*** ysandeep is now known as ysandeep|out16:54
clarkbit just occurred to me that we should update the limnoria bot when meetings aren't happening17:00
clarkbfor that reason my next bullseye update will be the matrix eavesdrop bot instead of limnoria17:00
clarkbcorvus: ^ fyi I'm approving that update. I don't expect trouble since that bot doesn't rely on debian user space for much17:00
clarkbfor limnoria if we land that today we'll want to do it after the swift meeting at 2100 UTC17:02
opendevreviewClark Boylan proposed opendev/system-config master: Add firewall behavior assertions to test_bridge  https://review.opendev.org/c/opendev/system-config/+/82178017:07
clarkbI think ^ might actually end up doing what we want, but I'm doing another forced failure to be sure17:11
*** sshnaidm is now known as sshnaidm|afk17:14
fungilooks like the other test-related change in topic:mailman-lists will merge shortly, and then i'll start approving the ones which might (though highly unlikely) have production impact17:19
clarkbcool. I'll be around if I can help17:20
corvusi'm not quite sure why i'm not seeing a login button on zuul... i'll try to look into that today17:22
opendevreviewJeremy Stanley proposed opendev/system-config master: Restart mailman services when testing  https://review.opendev.org/c/opendev/system-config/+/82114417:23
opendevreviewJeremy Stanley proposed opendev/system-config master: Use newlist's automate option  https://review.opendev.org/c/opendev/system-config/+/82039717:23
elodillesfungi: 10 branches have been deleted in the last ~3 minutes17:28
elodillescurrently i see 10 management events17:29
clarkbnow to see how long it takes them to exit the queue17:29
fungiyeah, in theory it should only spend time on the first and last ones, right?17:30
corvusseeing 10 events in the queue is expected; they should all be processed together (or at least 1 and then 9 more together)17:30
corvus2021-12-15 17:28:42,098 INFO zuul.Scheduler: Tenant reconfiguration beginning for openstack due to projects {('opendev.org/openstack/openstack-ansible-os_barbican', 'stable/ocata'), ('opendev.org/openstack/openstack-ansible-os_ceilometer', 'stable/ocata'), ('opendev.org/openstack/openstack-ansible-openstack_openrc', 'stable/ocata'), ('opendev.org/openstack/openstack-ansible-os_gnocchi', 'stable/ocata'),17:31
corvus('opendev.org/openstack/openstack-ansible-os_horizon', 'stable/ocata'), ('opendev.org/openstack/openstack-ansible-os_designate', 'stable/ocata'), ('opendev.org/openstack/openstack-ansible-os_aodh', 'stable/ocata'), ('opendev.org/openstack/openstack-ansible-os_cinder', 'stable/ocata'), ('opendev.org/openstack/openstack-ansible-os_heat', 'stable/ocata'), ('opendev.org/openstack/openstack-ansible-os_glance', 'stable/ocata')}17:31
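The log line above shows a single tenant reconfiguration covering all ten deleted `stable/ocata` branches at once, rather than one reconfiguration per queued event. A toy Python model of that batching, assuming events are plain `(tenant, project, branch)` tuples; it illustrates the idea only and is not Zuul's implementation:

```python
from collections import defaultdict


def coalesce(events):
    """Merge queued (tenant, project, branch) events into one set of changes per tenant."""
    merged = defaultdict(set)
    for tenant, project, branch in events:
        merged[tenant].add((project, branch))
    return dict(merged)


REPOS = ("os_barbican", "os_ceilometer", "openstack_openrc", "os_gnocchi", "os_horizon",
         "os_designate", "os_aodh", "os_cinder", "os_heat", "os_glance")
queued = [("openstack", f"opendev.org/openstack/openstack-ansible-{repo}", "stable/ocata")
          for repo in REPOS]

batches = coalesce(queued)
# One reconfiguration for the "openstack" tenant covering all ten branches.
print(len(batches), len(batches["openstack"]))
```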
corvusthat is very promising though :)17:31
elodilles:]17:31
fungiso either it will drop from 10 to 9 to 0, or from 10 to 1 to 017:31
corvusor maybe 10 to 0 if we're lucky17:31
fungipresumably the latter17:31
fungiahh, okay17:32
elodilles0 \o/17:32
fungimuch better!17:33
fungithanks again corvus!!!17:33
elodillesyepp, thanks for the fix!17:33
corvusno prob, thanks for finding the bug :)17:33
elodillesi'll queue now some more branch to delete17:33
fungialso i'm enjoying the cute icons for the pipeline manager types now17:33
corvusthat's mhu's handiwork17:34
fungiit's lovely17:34
elodillesanother 16 branches deleted, and now: Queue lengths: 4 events, 14 management events.17:40
opendevreviewJeremy Stanley proposed opendev/system-config master: Restart mailman services when testing  https://review.opendev.org/c/opendev/system-config/+/82114417:43
opendevreviewJeremy Stanley proposed opendev/system-config master: Use newlist's automate option  https://review.opendev.org/c/opendev/system-config/+/82039717:43
opendevreviewMerged opendev/system-config master: Update matrix-eavesdrop image to bullseye  https://review.opendev.org/c/opendev/system-config/+/82133217:46
opendevreviewMerged opendev/system-config master: Collect mailman logs in deployment testing  https://review.opendev.org/c/opendev/system-config/+/82111217:46
elodillesQueue lengths: 6 events, 7 management events.17:46
elodilles(there were 2 management events for a while, but now it's 0 \o/)17:59
elodilles(and that was all, every EOL'd branch has been deleted now, i think)18:00
fungithat's awesome, thanks for working through that with us elodilles!18:03
elodilles:]18:05
opendevreviewClark Boylan proposed opendev/system-config master: Add firewall behavior assertions to testinfra testing  https://review.opendev.org/c/opendev/system-config/+/82178018:06
clarkbfungi: ^ I think that is mergeable now and addresses the confusion around editing those extra files with extra commas18:07
clarkbI've approved the zp01 dns record update change18:17
fungithanks, i agree putting the new file in the list of those triggering the jobs we're interested in is a better way to go about it18:17
opendevreviewMerged opendev/zone-opendev.org master: Try to make zuul-preview records more clear  https://review.opendev.org/c/opendev/zone-opendev.org/+/82174318:24
opendevreviewMerged opendev/system-config master: Make sure /usr/bin/python is present for mailman  https://review.opendev.org/c/opendev/system-config/+/82109518:27
opendevreviewMerged opendev/system-config master: Add "mailman" meta-list to lists.katacontainers.io  https://review.opendev.org/c/opendev/system-config/+/82177518:31
opendevreviewClark Boylan proposed opendev/system-config master: Add firewall behavior assertions to testinfra testing  https://review.opendev.org/c/opendev/system-config/+/82178018:31
clarkbfungi: ^ the point you made about inventory changing is a good one. That attempts to avoid problems18:31
fungiclarkb: inspection of the logs indicates 821144 is working now as written. if you're okay with it let's get it and the one after it merged and then we're caught up to working for the current mailman deployment and i can start on migrating foundation mailing lists to the new site18:32
clarkbfungi: +2'd if you want to approve18:33
fungithanks! will do18:34
opendevreviewMerged opendev/system-config master: Restart mailman services when testing  https://review.opendev.org/c/opendev/system-config/+/82114419:05
opendevreviewMerged opendev/system-config master: Use newlist's automate option  https://review.opendev.org/c/opendev/system-config/+/82039719:09
clarkbThe meeting about openstack health is happening tomorrow at 16:30 utc. I'll try to attend that19:10
clarkbShould be able to as that is just late enough to not interfere with my morning tasks19:10
fungiqa team meeting? is it irc?19:12
gmannhttp://lists.openstack.org/pipermail/openstack-discuss/2021-December/026250.html19:13
gmannfungi: clarkb ^^19:13
clarkbit is on google meet19:13
fungiahh, okay19:13
gmannwe discussed it in the QA IRC meeting and set up the scheduled time with the tripleo team on google meet19:13
fungiit overlaps with the cd foundation interop sig call i usually join19:14
fungibut if clarkb's on then i'm redundant anyway19:14
opendevreviewGhanshyam Mann proposed openstack/project-config master: Add openstack-venus irc channel in access an gerrit bot  https://review.opendev.org/c/openstack/project-config/+/82187520:08
ianwfrickler: thanks for looking in on the cache stuff, sorry i had to disappear.  the -int thing is interesting -- told you i never say never :)20:35
ianwis the end result that we should be talking to pypi about their backend having different timeouts to fastly?20:37
ianwmaybe it is something they do, so that if the cdn is down, they don't get hit so hard on the backend server?20:37
ianwbut if the backend is serving bad data (again) ...20:37
Clark[m]We brought the general issue up to them when it was last hitting us frequently. The plan at the time was to try and make the backup far more up to date aiui. But the data here seems to indicate that hasn't happened yet20:39
Clark[m]Constraints is the main reason it affects openstack. Most other installs will just use an old package version. But that represents security risks. Bringing up that angle might be productive20:40
ianwok, the last time i remember we found the backend was actually out of disk and was seriously out of date.  but that was a while ago20:43
ianwhttps://github.com/pypa/warehouse/issues/8568 16-sep-2020 to be exact :)20:44
Clark[m]Ya I think they realized they had to address it in general but seems to not have happened due to occurrence rate for us. But really openstack using constraints is what exposes this so they may not be aware it has gotten bad again20:44
Clark[m]And maybe arguing that this exposes their users to pulling old insecure packages is a good angle to approach for why this isn't just an openstack is weird issue20:45
clarkbI think what I'm getting at is it would be better for them to fail than to fallback20:47
fungiwell, the last time we brought it up with them the biggest issue was that the fallback lacked python_requires metadata in the indices, and the plan was to get that added (which i think they did?)20:49
clarkbfungi: right, there were a number of issues. One was that; the other was that it hadn't updated in months20:49
fungior maybe that was the time before last20:49
clarkbIn this case it seems the backup doesn't update for weeks at least based on some of the errors we have observed20:49
clarkbbut regardless it seems that returning a 400/500 error to the client when the cdn can't find the data is a better response due to the security issues this potentially presents20:50
fungithe debate raging over pep 665 "lockfiles" might present an opportunity to point out that this will become increasingly painful for users20:50
clarkbOpenStack has essentially opted into those errors via constraints, but I'm saying everyone should see them instead20:50
clarkbIf pypi cannot serve a correct and up to date index that may include important security updates then they should fail and pass that to the user20:52
clarkbthat is my tldr20:52
clarkbUnrelated: the swift meeting is about to start at 21:00 UTC. I need to pick up kids from school at about 22:15UTC and won't be back until close to 23:00UTC. If I approve the limnoria bullseye update at ~22:00UTC and can't debug until ~23:00 UTC is that a problem for anyone?20:53
funginot a problem for me. but also i plan to be around and am happy to help debug20:53
clarkbok I'll plan to approve it at 22:00 UTC or when I notice the swift meeting is over20:54
fungilooks like in order to redirect foo@lists.bar to foo@lists.baz we may need to add a new kind of aliases router which can match on whole addresses with the domain part rather than just the local part21:00
fungiluckily i just realized i already have one written for my personal mailserver21:09
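fungi's point is that redirecting `foo@lists.bar` to `foo@lists.baz` needs a lookup on the whole address, since a router that keys only on the local part cannot tell the two `foo` lists apart. A toy Python model of whole-address lookup; the addresses are the placeholders from the log, and `route` and `ALIASES` are hypothetical, not the Exim router fungi has written:

```python
def route(address: str, aliases: dict) -> str:
    """Return the redirect target for a full address, or the address unchanged."""
    local, sep, domain = address.rpartition("@")
    # Domain names are case-insensitive; local parts are left as-is.
    key = f"{local}@{domain.lower()}" if sep else address
    return aliases.get(key, address)


ALIASES = {"foo@lists.bar": "foo@lists.baz"}  # hypothetical domain-qualified alias table

print(route("foo@lists.bar", ALIASES))  # redirected
print(route("foo@lists.baz", ALIASES))  # same local part, different domain: unchanged
```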
*** dviroel|rover is now known as dviroel|out21:11
clarkbianw: https://review.opendev.org/c/opendev/system-config/+/821780 is what I ended up with for testing firewall rules externally. I don't think it is perfect (there is a todo in there) but it seems to function and do the checking we want to have21:18
opendevreviewGhanshyam Mann proposed openstack/project-config master: Add openstack-venus irc channel in access an gerrit bot  https://review.opendev.org/c/openstack/project-config/+/82187521:35
opendevreviewGhanshyam Mann proposed opendev/system-config master: Add openstack-venus channel in statusbot  https://review.opendev.org/c/opendev/system-config/+/82188221:35
ianwclarkb: I think you can just match testinfra_hosts on the zk hosts, and then anything you run is running on bridge21:42
ianwsimilar to how the screenshots work; selenium is running on bridge -- we just use things in "host.X()" context to run on the remote host?21:42
ianwdoes that make sense?21:42
clarkbhrm isn't the host passed in the remote testinfra_host entries so would be zk instead?21:43
clarkbI guess I could implement my own checker for tcp connectivity is what you are saying and not use the host argument?21:44
clarkbthen the actual test case is running from bridge so it would always be external connectivity. Just need to implement our own checks21:45
clarkbthe swift meeting has ended. I'm approving the limnoria bullseye change now21:45
clarkbianw: ya so I think what would work is set testinfra_hosts to zk or whatever and move the test case into test_zookeeper.py. Then ignore the host var that is passed to the testcase except for getting its IP address. Then implement our own checker?21:49
timburkethanks for waiting for us :-)21:51
opendevreviewGhanshyam Mann proposed openstack/project-config master: Mark openstack-placement IRC channel as retired  https://review.opendev.org/c/openstack/project-config/+/82188921:53
ianwclarkb: yep -- basically if you have a test_zookeeper.py as "usual", if you run, say, requests.* in there it's running on bridge.  it's only if you use like "host.cmd()" that it's actually running on the remote server22:03
clarkbright. I'll take a look at doing that refactor later today22:04
opendevreviewGhanshyam Mann proposed opendev/system-config master: Fix command for setting the entry message for IRC channel  https://review.opendev.org/c/opendev/system-config/+/82191322:07
opendevreviewJeremy Stanley proposed opendev/system-config master: Add a domain aliases mechanism to lists.o.o  https://review.opendev.org/c/opendev/system-config/+/82191422:10
opendevreviewJeremy Stanley proposed opendev/system-config master: Create an OpenInfra Foundation staff ML  https://review.opendev.org/c/opendev/system-config/+/82191522:10
opendevreviewJeremy Stanley proposed opendev/system-config master: Forward messages for OpenInfra Foundation staff ML  https://review.opendev.org/c/opendev/system-config/+/82191622:10
opendevreviewMerged opendev/system-config master: Update limboria ircbot to bullseye  https://review.opendev.org/c/opendev/system-config/+/82133022:27
opendevreviewMerged opendev/system-config master: Fix command for setting the entry message for IRC channel  https://review.opendev.org/c/opendev/system-config/+/82191322:27
clarkbNeed to generate some text here to check if limnoria is working22:54
clarkbthat last message shows up in the text log.22:55
clarkbI'll try a test meeting momentarily22:55
clarkbdo we have a test meeting entry?22:55
clarkbhttps://meetings.opendev.org/meetings/test/ looks like yes22:56
opendevreviewJeremy Stanley proposed opendev/system-config master: Add a domain aliases mechanism to lists.o.o  https://review.opendev.org/c/opendev/system-config/+/82191422:56
opendevreviewJeremy Stanley proposed opendev/system-config master: Create an OpenInfra Foundation staff ML  https://review.opendev.org/c/opendev/system-config/+/82191522:56
opendevreviewJeremy Stanley proposed opendev/system-config master: Forward messages for OpenInfra Foundation staff ML  https://review.opendev.org/c/opendev/system-config/+/82191622:56
clarkbhttps://meetings.opendev.org/meetings/test/2021/test.2021-12-15-22.56.txt I think the new image is happy. Note I didn't land the install from upstream update yet as that one seems more scary. I was going to check if they have a stable branch or release tags or something as an alternative22:57
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Switch docs theme to RTD  https://review.opendev.org/c/zuul/zuul-jobs/+/82191822:59
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Switch docs theme to RTD  https://review.opendev.org/c/zuul/zuul-jobs/+/82191823:08
corvuszuul changes merged... i could do a rolling restart, but i have to head out in <2 hours, so i'll plan on doing it tomorrow (unless other folks plan on being around today and would rather i do it now)23:12
opendevreviewClark Boylan proposed opendev/system-config master: Add firewall behavior assertions to testinfra testing  https://review.opendev.org/c/opendev/system-config/+/82178023:17
clarkbcorvus: I've got family stuff this evening so I'd prefer tomorrow, but no objections if fungi and/or ianw would like to do it23:17
clarkbianw: ^ I think that is what you were suggesting for the connectivity testing23:17
ianwshould we merge the log format update too before restart https://review.opendev.org/c/opendev/system-config/+/821508 ?23:18
ianwis it scheduler or complete restart?23:18
clarkbianw: I've approved the log formatter change. I believe this restart is a rolling restart of schedulers (and maybe web?)23:19
ianwclarkb: yep, that's almost exactly what i was thinking :)23:20
fungii can be around for a restart if that's preferable23:21
ianwi'll be happy to do it, just wait for the log format changes to apply?23:22
*** rlandy|ruck is now known as rlandy|ruck|bbl23:23
corvusit would be a rolling scheduler+web restart23:26
ianw++23:27
opendevreviewIan Wienand proposed openstack/diskimage-builder master: centos: work around 9-stream BLS issues  https://review.opendev.org/c/openstack/diskimage-builder/+/82177223:27
corvusokay, i'll go ahead and do the restart under the assumption that fungi/ianw will be around to check on it later.23:27
corvusi'm still around for the next hour or so23:28
corvuspulling images now23:28
corvuskilling 0223:31
fungiyeah, i'm still around and will keep an eye on things23:38
corvusokay 02 is fully up; going to restart 01 now23:42
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Switch docs theme to RTD  https://review.opendev.org/c/zuul/zuul-jobs/+/82191823:59
corvusit's up; i don't see any concerning errors in the scheduler logs23:59
corvuswill restart web now23:59

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!