Friday, 2021-06-04

00:03 <corvus> ianw: no i'm trying to say it's an incompatible option in zuul
00:03 <corvus> ianw: zuul either uses the app auth or the webhook auth
00:05 <corvus> ianw: a scheduler restart will bring in significant zuul changes; i was planning one tomorrow; i would not recommend it unless you're prepared to monitor for fallout.  also, i would not recommend it because i don't think it'll fix the problem.
00:11 <corvus> ianw: i read https://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/githubconnection.py#L1116 as meaning it's not going to use the api token as a fallback
00:15 <ianw> corvus: yeah, i agree; also, as i was reworking https://review.opendev.org/c/zuul/zuul/+/794371 that became clearer
00:17 <ianw> i think if we're not installed for the project, we should fall back to api authentication
00:18 <ianw> i feel like i'm doing this very inefficiently compared to someone like tobiash[m] who might have thought about this a lot more.  i might just update the story and give others a chance to weigh in
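
As a rough illustration of the check ianw describes (plain shell against the public GitHub API, not Zuul's actual code; APP_JWT, ORG and REPO are placeholder assumptions), one way to ask whether a GitHub App is installed on a given repository is:

    # returns 200 if the app is installed on ORG/REPO, 404 if it is not;
    # APP_JWT is assumed to be a JWT signed with the app's private key
    curl -s -o /dev/null -w '%{http_code}\n' \
        -H "Authorization: Bearer $APP_JWT" \
        -H "Accept: application/vnd.github.v3+json" \
        https://api.github.com/repos/ORG/REPO/installation
    # 200 -> app auth is usable for this repo
    # 404 -> the app is not installed, so the proposed api-token fallback would apply
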
00:31 *** bhagyashris_ has joined #opendev
00:38 *** bhagyashris has quit IRC
01:19 *** ysandeep|away has joined #opendev
01:19 *** ysandeep|away is now known as ysandeep
01:19 *** ysandeep is now known as ysandeep|ruck
02:11 *** boistordu_old has joined #opendev
02:17 *** boistordu has quit IRC
03:00 *** brtknr has quit IRC
03:00 *** brtknr has joined #opendev
03:49 *** ysandeep|ruck is now known as ysandeep|afk
04:01 <opendevreview> Sandeep Yadav proposed openstack/diskimage-builder master: [DNM] Lock NetworkManager in DIB  https://review.opendev.org/c/openstack/diskimage-builder/+/794704
04:37 *** ykarel|away has joined #opendev
04:38 *** ysandeep|afk has quit IRC
04:51 *** ysandeep|afk has joined #opendev
05:03 *** ysandeep|afk is now known as ysandeep|ruck
05:14 *** ykarel|away is now known as ykarel
05:18 *** marios has joined #opendev
05:30 *** ralonsoh has joined #opendev
05:31 <opendevreview> Sandeep Yadav proposed openstack/diskimage-builder master: [DNM] Lock NetworkManager in DIB  https://review.opendev.org/c/openstack/diskimage-builder/+/794704
06:00 *** slaweq_ has joined #opendev
06:02 *** slaweq has left #opendev
06:03 *** slaweq_ has quit IRC
06:03 *** slaweq_ has joined #opendev
06:16 *** dklyle has quit IRC
06:33 *** bhagyashris_ is now known as bhagyashris
06:35 *** amoralej|off has joined #opendev
06:38 *** amoralej|off is now known as amoralej
06:39 <opendevreview> Slawek Kaplonski proposed opendev/irc-meetings master: Move neutron meetings to the openstack-neutron channel  https://review.opendev.org/c/opendev/irc-meetings/+/794711
06:54 <frickler> slaweq_: do you want to have some time for neutron ppl to review ^^ or do you think it's enough that it has been discussed in the meeting? then I'd just merge it
06:56 <slaweq_> frickler: yes, I want at least Liu and Brian to check it
06:58 <frickler> slaweq_: o.k., waiting for that, then
07:04 <slaweq_> frickler: thx
07:05 *** rpittau|afk is now known as rpittau
07:23 *** tosky has joined #opendev
07:32 *** andrewbonney has joined #opendev
07:35 *** slaweq has joined #opendev
07:35 *** ysandeep|ruck is now known as ysandeep|lunch
07:41 *** slaweq_ has quit IRC
07:46 *** hashar has joined #opendev
07:54 *** jpena|off is now known as jpena
08:02 *** open10k8s has quit IRC
08:04 *** open10k8s has joined #opendev
08:04 *** lucasagomes has joined #opendev
08:05 *** slaweq_ has joined #opendev
08:11 *** slaweq_ has quit IRC
08:15 *** CeeMac has joined #opendev
08:19 *** lucasagomes has quit IRC
08:26 *** lucasagomes has joined #opendev
08:32 <opendevreview> Pierre Riteau proposed opendev/irc-meetings master: Move Blazar meeting to #openstack-blazar  https://review.opendev.org/c/opendev/irc-meetings/+/794740
08:41 *** WeSteeve has joined #opendev
08:48 *** lucasagomes has quit IRC
08:49 *** lucasagomes has joined #opendev
09:00 *** WeSteeve has quit IRC
09:05 *** lucasagomes has quit IRC
09:16 *** hashar is now known as Guest833
09:16 *** hashar has joined #opendev
09:16 *** ysandeep|lunch is now known as ysandeep
09:16 *** ysandeep is now known as ysandeep|ruck
09:21 *** Guest833 has quit IRC
09:25 *** lucasagomes has joined #opendev
09:37 *** lucasagomes has quit IRC
09:43 *** lucasagomes has joined #opendev
09:48 <opendevreview> Merged opendev/irc-meetings master: Move Blazar meeting to #openstack-blazar  https://review.opendev.org/c/opendev/irc-meetings/+/794740
09:54 *** lucasagomes has quit IRC
09:59 *** lucasagomes has joined #opendev
10:41 *** hashar_ has joined #opendev
10:41 *** hashar is now known as Guest844
10:41 *** hashar_ is now known as hashar
10:47 *** Guest844 has quit IRC
10:53 *** lucasagomes has quit IRC
10:53 *** hashar has quit IRC
10:54 *** hashar has joined #opendev
10:55 *** amoralej is now known as amoralej|afk
10:58 *** lucasagomes has joined #opendev
11:32 *** jpena is now known as jpena|lunch
11:38 *** osmanlicilegi has quit IRC
11:59 *** cgoncalves has quit IRC
12:00 *** osmanlicilegi has joined #opendev
12:01 *** cgoncalves has joined #opendev
12:11 *** osmanlicilegi has quit IRC
12:20 *** jpena|lunch is now known as jpena
12:33 *** ysandeep|ruck is now known as ysandeep|mtg
12:35 *** osmanlicilegi has joined #opendev
12:38 *** lucasagomes has quit IRC
12:41 *** kapoios_allos has joined #opendev
12:42 *** kapoios_allos has quit IRC
12:46 *** osmanlicilegi has quit IRC
12:50 *** bhagyashris_ has joined #opendev
12:55 *** osmanlicilegi has joined #opendev
12:56 *** lucasagomes has joined #opendev
12:56 *** bhagyashris has quit IRC
12:58 *** bhagyashris_ is now known as bhagyashris
13:00 *** amoralej|afk is now known as amoralej
13:00 *** fultonj has joined #opendev
13:14 *** cgoncalves has quit IRC
13:15 *** cgoncalves has joined #opendev
13:24 *** hashar has quit IRC
13:26 *** fultonj has quit IRC
13:30 <corvus> infra-root: i'd like to restart zuul to pick up some recent changes
13:33 <corvus> starting that now
13:37 *** rpittau is now known as rpittau|afk
13:39 <corvus> as expected, zk data size is growing significantly (we're caching config data there now)
13:42 <corvus> looks like node count increased from 12k -> 35k, and data size increased from 10MiB -> 20MiB
13:42 <corvus> re-enqueuing
13:43 <corvus> it didn't seem like startup took any more or less time, which is good.
13:43 <corvus> jobs are running
13:44 *** ysandeep|mtg is now known as ysandeep
13:44 *** ysandeep is now known as ysandeep|ruck
13:47 <corvus> #status log restarted zuul at commit 85e69c8eb04b2e059e4deaa4805978f6c0665c03 which caches unparsed config in zk. observed expected increase in zk usage after restart: 3x zk node count and 2x zk data size
13:47 <opendevstatus> corvus: finished logging
13:48 <corvus> looks like the final numbers may be a bit bigger, as we're still adding back in the operational baseline from before now that jobs are starting
13:49 <corvus> so far none of the performance metrics look different
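
For anyone wanting to confirm those numbers directly, ZooKeeper's standard `mntr` four-letter command reports the znode count and approximate data size; a minimal sketch, assuming the ensemble members are zk01-03.opendev.org on the default client port 2181:

    # mntr may need to be whitelisted via 4lw.commands.whitelist on newer ZooKeeper releases
    for host in zk01.opendev.org zk02.opendev.org zk03.opendev.org; do
        echo "== $host =="
        echo mntr | nc "$host" 2181 | grep -E 'zk_znode_count|zk_approximate_data_size'
    done
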
13:56 *** lowercase has joined #opendev
14:06 <fungi> seems good so far, yeah
14:18 *** lucasagomes has quit IRC
14:21 *** lucasagomes has joined #opendev
14:34 *** lucasagomes has quit IRC
14:35 *** lucasagomes has joined #opendev
14:40 *** gmann is now known as gmann_afk
14:46 *** lucasagomes has quit IRC
14:46 *** dklyle has joined #opendev
14:50 *** lucasagomes has joined #opendev
14:53 <corvus> everything still looks nominal
15:08 *** open10k8s has quit IRC
15:09 <clarkb> one thing I wonder about is how the zookeeper memory use changes as the data set increases. Currently those servers are relatively small, but the existing resource utilization is also low, so there's lots of headroom to grow into
15:10 *** open10k8s has joined #opendev
15:10 *** open10k8s has quit IRC
15:12 <fungi> yeah, we'll probably know more as we get well into monday or tuesday
15:13 *** gmann_afk is now known as gmann
15:22 *** engine_ has joined #opendev
15:25 *** engine_ has quit IRC
15:27 *** ykarel is now known as ykarel|away
15:33 <corvus> there's like zero change in memory usage on them despite the 2x change in data size
15:33 <corvus> a minuscule amount of additional cpu
15:34 <corvus> (like, it's currently at 95% idle)
15:35 <corvus> but that could actually just be due to a change in connection distribution
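
The connection-distribution question can be checked the same way with the `stat` four-letter command, which reports each server's role and current client connection count (same assumed hostnames and port as above):

    for host in zk01.opendev.org zk02.opendev.org zk03.opendev.org; do
        echo "== $host =="
        echo stat | nc "$host" 2181 | grep -E '^(Mode|Connections):'
    done
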
15:45 *** lucasagomes has quit IRC
15:48 *** lucasagomes has joined #opendev
15:49 *** ysandeep|ruck is now known as ysandeep|away
15:53 <opendevreview> Merged opendev/base-jobs master: Set a fallback VERSION_ID in the mirror-info role  https://review.opendev.org/c/opendev/base-jobs/+/791177
16:00 <fungi> jrosser: ^ probably worth rechecking your bullseye addition change now
16:01 <fungi> in theory it should work without your workaround at this point
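
For context, pre-release Debian bullseye images shipped an /etc/os-release without VERSION_ID, which breaks anything that builds mirror paths from it. A minimal shell sketch of the fallback idea only — the real change lives in the Ansible mirror-info role, and the "11" default here is an assumption:

    # illustrative only; the actual role logic and fallback value may differ
    . /etc/os-release
    VERSION_ID=${VERSION_ID:-11}
    echo "distro release used for mirror paths: ${ID} ${VERSION_ID}"
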
16:04 *** marios is now known as marios|out
16:04 *** lucasagomes has quit IRC
16:10 <opendevreview> Monty Taylor proposed zuul/zuul-jobs master: Add a job for publishing a site to netlify  https://review.opendev.org/c/zuul/zuul-jobs/+/739047
16:18 *** ykarel|away has quit IRC
16:18 *** marios|out has quit IRC
16:33 *** ysandeep|away has quit IRC
16:38 *** jpena is now known as jpena|off
16:47 *** ralonsoh has quit IRC
16:53 <clarkb> as a reminder I was planning on dropping out of freenode channels entirely next week. Any reason not to do that given the way the transition has gone?
16:53 <clarkb> fungi: should we maybe roll the dice on topic updates first?
16:59 <fungi> i'm happy to stick around in there for months watching for stragglers, but sure maybe we update the topic for #opendev (as an example) to "Former channel for the OpenDev Collaboratory, see http://lists.opendev.org/pipermail/service-discuss/2021-May/000249.html"
17:00 *** amoralej is now known as amoralej|off
17:00 <clarkb> ++ I like that. Clearly points to the docs people need without tripping any of the known keywords that are a problem
17:04 <mordred> ++
17:05 <mordred> I have also dropped out of freenode fwiw
17:42 *** amoralej|off has quit IRC
17:46 *** andrewbonney has quit IRC
18:51 <mnaser> i am also planning to continue to be there to send people over
18:51 <mnaser> but it is getting increasingly quiet :)
18:51 *** lowercase has quit IRC
18:55 *** tinwood has quit IRC
18:56 *** amoralej|off has joined #opendev
18:58 *** tinwood has joined #opendev
19:07 <fungi> only a few people have spoken up in the channels i'm watching who didn't seem to know; the rest are either helpful lurkers like us or (i think the majority) zombie bouncer processes nobody's connected to in months/years
19:08 <mordred> fungi: yeah - I did an active disconnect - otherwise I figured I'd become a new zombie bouncer :)
19:08 <mordred> felt a little weird
19:41 *** slittle1 has joined #opendev
19:42 <slittle1> Please set me (Scott Little) up as first core for the new repo starlingx-audit-armada-app. I'll add the rest of the cores
19:45 *** amoralej|off has quit IRC
19:46 <fungi> looks like that repo was created on may 8 when https://review.opendev.org/790250 merged and added a starlingx-audit-armada-app-core group
19:47 <fungi> and yeah, it has no members yet
19:47 <fungi> slittle1: done!
20:03 <slittle1> thanks
20:05 <fungi> any time!
20:08 <noonedeadpunk> fungi: any known activity on centos8 repos? Like dropping them?
20:09 <noonedeadpunk> as they started filing weirdly today at ~10am utc
20:28 <fungi> filing what?
20:43 <fungi> noonedeadpunk: if you meant failing, a link to an example build result would be helpful
20:49 <johnsom> I wonder if we have another ansible upgrade going on. Cloning is slow at the moment.
20:54 <fungi> johnsom: i've checked resource utilization for all the backends and the load balancer, everything is fairly quiet... can you check the ssl cert for the backend you're hitting to see what hostnames it lists? i'll dive deeper on the one you're hitting
20:55 <johnsom> It's a stack (devstack) so let me try in another window. Just odd to see a clone of horizon taking minutes
20:57 <fungi> and it's the cloning phase specifically, not installing, which is going slowly?
20:57 <johnsom> ~400 KiB/s
20:58 <johnsom> What is the trick to grab the TLS info? Do I need to tcpdump it?
20:58 <johnsom> Yeah, even a direct git clone is super slow.
20:59 <fungi> i do `echo|openssl s_client -connect opendev.org:https|openssl x509 -text|grep CN` but there are lots of ways
21:00 <johnsom> Yeah, ok, a separate s_client.
21:00 <johnsom> CN = gitea01.opendev.org
21:00 <fungi> thanks
21:01 <fungi> you can also get away with just a simple `openssl s_client -connect opendev.org:https` and then scroll up to the beginning of the verification chain info where it mentions the CN
21:01 <fungi> but you end up with a lot of output
21:01 <johnsom> Yeah, I know s_client all too well. lol
21:01 <fungi> heh
21:06 <fungi> currently testing cloning nova from gitea01 with another server on the same network just to get a baseline
21:06 <fungi> Receiving objects: 100% (595113/595113), 155.44 MiB | 14.64 MiB/s, done.
21:07 <johnsom> This is the IP it's hitting: 2604:e100:3:0:f816:3eff:fe6b:ad62
21:07 <fungi> yep, that's an haproxy load balancer
21:07 <fungi> you can test cloning directly from https://gitea01.opendev.org:3081/openstack/nova to bypass it
21:07 <fungi> see if you get similar speeds
21:08 <johnsom> That looks faster, 2.6MiB/s
21:09 <fungi> so that suggests one of two things: either the lb is slowing things down or (more likely) ipv6 performance for you is worse than ipv4 at the moment
21:09 <fungi> maybe try `git clone -4 https://opendev.org/openstack/nova1 to rule out the latter
21:09 <fungi> er, `git clone -4 https://opendev.org/openstack/nova`
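
Pulling the diagnostic together, the comparison being run here looks roughly like the following; gitea01 and port 3081 are the backend and port already named above, and the /tmp target paths are arbitrary:

    # clone via the load balancer (opendev.org), letting git pick IPv6 or IPv4
    git clone https://opendev.org/openstack/nova /tmp/nova-lb
    # the same clone forced over IPv4, to rule out a v6-specific path problem
    git clone -4 https://opendev.org/openstack/nova /tmp/nova-lb-v4
    # clone straight from a backend, bypassing haproxy entirely
    git clone https://gitea01.opendev.org:3081/openstack/nova /tmp/nova-direct
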
21:09 <johnsom> Yeah, I'm still getting 1gbps to Portland. Let
21:10 <johnsom> me try the v4
21:10 <fungi> i can confirm cloning via the load balancer's ipv6 address is very slow for me as well
21:10 <fungi> even though i'm hitting a different backend entirely
21:11 <johnsom> About the same for ipv4
21:11 <johnsom> Well, maybe it's just Friday afternoon people streaming stuff. lol
21:11 <fungi> heh
21:12 <fungi> yep, i'm seeing even worse performance to the lb over ipv4 than over ipv6, yikes
21:12 <fungi> network traffic graph for it seems reasonable though: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=66621&rra_id=all
21:13 <johnsom> Yeah, must be something upstream.
21:13 <fungi> cpu is virtually idle, so it's not like it's handling an interrupt storm or anything
21:14 <fungi> however, given that ipv4 to the load balancer is slow but ipv6 to the backends isn't, even though they're in the same provider, suggests there could be something else going on
21:14 <fungi> er, ipv4 to both i mean
21:15 <fungi> ipv4 to backends is fast (they're ipv4-only in fact)
21:15 <johnsom> Well, that isn't enough traffic to wake haproxy up from an afternoon nap.
21:16 <fungi> indeed
21:16 <clarkb> mtr can often point out locations with problems
21:16 <fungi> seems like it might be a local network issue impacting the segment the lb is on but not the segment the backends are on
21:19 <clarkb> though the local router appears to be shared and the ip addresses are on the same /25
21:20 <clarkb> ah and according to the network interfaces that range is part of a larger /24 segment
21:24 <fungi> mtr --tcp is showing some pretty substantial packet loss to the lb
21:24 <fungi> but not to the backends
21:24 <fungi> anyone else observing the same?
21:25 <fungi> seems to come and go in bursts
21:25 <fungi> now i'm not seeing it
21:25 <clarkb> I've got a couple `mtr --tcp` running now but no loss so far
21:26 <clarkb> maybe sad lacp link or similar
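
For reference, an mtr invocation along the lines being discussed — --tcp makes the probes TCP SYNs so the measured path matches what the git/https traffic actually takes; the cycle count is arbitrary:

    # probe the load balancer the way https traffic would, then print a report
    mtr --tcp --port 443 --report --report-cycles 100 opendev.org
    # compare against a backend on the same network
    mtr --tcp --port 3081 --report --report-cycles 100 gitea01.opendev.org
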
21:26 <johnsom> fungi Are you bouncing through seattle with your mtr?
21:26 <fungi> one frame for you, one frame for the bit bucket, one frame for you, ...
21:27 <johnsom> I'm seeing some congestion in Seattle that comes and goes
21:27 <fungi> looks like my cable provider peers with zayo and then i go through atlanta to dallas to los angeles to san jose
21:28 <fungi> though i wouldn't be surprised if the return route is asymmetric. lemme see if i can check the other direction
21:28 <clarkb> I'm going over cogent via pdx and sfo
21:28 <johnsom> Yeah, I bounce through Seattle, get on zayo straight to San Jose
21:29 <fungi> but i would be surprised if my routes to and from the lb differ significantly vs those for a backend server in the same cloud
21:30 <clarkb> I have not seen any loss over my path
21:30 <fungi> can't even make it through one pass with mtr before it crashes on "address in use"
21:31 <fungi> but a traditional traceroute shows the return path to me is actually cogent not zayo
21:31 <fungi> so my connections are arriving at vexxhost via zayo but responses go over cogent (san jose straight to atlanta)
21:35 <fungi> anyway, for me the routes are the same to/from gitea01.opendev.org as well, yet i can clone from it far faster
21:37 <fungi> even the last hop for me is the same in both traceroutes
21:37 <fungi> let's see if the first hops in the other direction line up
21:39 <fungi> first hops on the return path are also the same for both servers, so maybe it's a layer 2 issue, host level even?
21:39 <johnsom> Sorry to derail the end of Friday. I did get my clones finished, so I'm good to go at this point.
21:39 <fungi> no, i appreciate the heads up, it's looking like we might want to let mnaser in on the fun
21:40 <mnaser> hi
21:41 <fungi> mnaser: are you aware of any internet network disruptions in sjc1?
21:41 <fungi> er, i mean internal
21:42 <mnaser> nothing that i'm aware of
21:42 <mnaser> i haven't been able to digest the messages though
21:42 <fungi> we're seeing very slow network performance from multiple locations communicating over tcp with (both ipv4 and ipv6) addresses for gitea-lb01.opendev.org
21:42 <mnaser> ipv6 is fast but ipv4 is not, or?
21:42 <fungi> both slow, v4 is actually slower for me than v6 even
21:43 <fungi> however other servers on the same network, like gitea01.opendev.org are fairly snappy
21:43 <fungi> resource graphs for gitea01.opendev.org all look basically idle
21:43 <mnaser> oh so going direct to gitea backends is ok, but the load balancer is not?
21:43 <fungi> mtr --tcp is showing me a lot of packet loss for gitea01.opendev.org as well and not for other hosts in that network
21:43 <fungi> correct
21:44 <fungi> er. meant to say resource graphs for gitea-lb01.opendev.org all look basically idle
21:44 <fungi> wondering if there could be something happening at layer 2 but only impacting gitea-lb01.opendev.org, maybe at the host level?
21:45 <fungi> traceroutes to/from both the lb and backends look identical
21:46 <fungi> the server instance is showing no obvious signs of distress, not even breaking a sweat
21:46 <mnaser> fungi: are you inside gitea's vm?
21:47 <fungi> the haproxy lb vm is the one we're seeing weird network performance for, not the backend gitea servers
21:47 <fungi> "opendev.org" (a.k.a. gitea-lb01.opendev.org)
21:48 <mnaser> right, sorry, i meant gitea-lb01 :p
21:48 <mnaser> my 'shortcutting' failed
21:48 <mnaser> `curl -s http://169.254.169.254/openstack/latest/meta_data.json | python3 -mjson.tool | grep uuid`
21:50 <fungi> mnaser: curl can't seem to reach that url from the server, but server show reports the instance uuid is e65dc9f4-b1d4-4e18-bf26-13af30dc3dd6
21:51 <fungi> for the record, the curl response is "curl: (7) Failed to connect to 169.254.169.254 port 80: No route to host"
21:51 *** gmann is now known as gmann_afk
21:51 <fungi> so we're probably missing a static route for that
21:52 <clarkb> you can get the instance uuid from the api `openstack server show gitea-lb01.opendev.org`
21:52 <fungi> yeah, that's where i got the one i pasted above
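
The API-side lookup clarkb mentions can be narrowed to just the uuid with python-openstackclient's output options:

    # print only the instance uuid for the load balancer
    openstack server show gitea-lb01.opendev.org -f value -c id
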
21:53 * mnaser looks
21:56 <mnaser> hr
21:57 <fungi> performance seems to rise at times as high as 1.5MiB/s and then fall as low as 400KiB/s according to git clone... the same sort of cadence at which i see mtr --tcp report packet loss coming and going for it
21:57 <mnaser> im seeing peaks of like
21:57 <mnaser> 400-500Mbps on the public interface
21:57 <mnaser> but i guess that's because it's using the same interface for in/out traffic
21:58 <fungi> that doesn't match at all what we're seeing with snmp polls though: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=66621&rra_id=all
21:58 <fungi> but we're aggregating at 5-minute samples, so maybe it's far more bursty
21:59 <fungi> as far as the traffic graphs we have are concerned though, our network utilization on that interface is basically what we always have there, but the network performance we're seeing is not typical
22:01 <mnaser> fungi: would it be possible to set up iftop on the lb node and see if you see anything odd
22:01 *** tosky has quit IRC
22:01 <fungi> installed, checking the manpage for it now
22:02 <clarkb> fwiw I don't see the same issues that fungi sees
22:02 <fungi> ahh, reminds me of nftop and pftop on *bsd
22:02 <fungi> clarkb: what speeds do you get cloning nova?
22:02 <mnaser> fungi: p. much, nice little real time thing to see :)
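
A typical way to run it for this kind of check; the interface name here is an assumption, and iftop generally needs root:

    # live per-connection bandwidth on the lb's public interface
    # (eth0 is assumed; substitute the actual interface name)
    # -n/-N skip DNS and service-name lookups, -P shows per-port flows
    iftop -i eth0 -n -N -P
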
22:02 <clarkb> checking now
22:03 <clarkb> just under 2MiB/s
22:03 <fungi> and it's steady?
22:03 <clarkb> which is typical for me iirc
22:03 <fungi> what about to one of the backends?
22:03 <clarkb> yup bounces between about 1.75 to 2.00 MiB/s but seems steady
22:04 <clarkb> will try a backend when this completes
22:04 <clarkb> 155.55 MiB | 1.74 MiB/s, done. <- was aggregate
22:06 <fungi> mnaser: the averaged rates are lower than i would expect but i do see the 2sec average occasionally around 150Mbps
22:07 <clarkb> 155.46 MiB | 1.70 MiB/s, done. <- to gitea01 all my data is via ipv4 as I don't have v6 here
22:07 <fungi> i just was 2sec average go a hair over 200Mbps
22:07 <fungi> er, just saw
22:07 <clarkb> that is pretty consistent with what i recall getting via gitea in the past
22:11 <fungi> actually now i'm getting fairly poor performance directly to gitea04 so it's possible there is a backend issue
22:13 <fungi> mnaser: yeah this may not be as cut and dried as it seemed at first, and if clarkb's not seeing performance issues then it could be just impacting me and johnsom not everyone
22:13 <clarkb> 04 has plenty of memory available and cpu isn't spinning
22:13 <fungi> i'm going to do some clone tests to the other backends as well for comparison
22:15 <fungi> my clone from the 04 backend averaged 643.00 KiB/s
22:22 <fungi> i'm getting much the same from the 01 backend now... i was seeing far better performance before. may need to test this from somewhere out on the 'net which doesn't share an uplink with lots of tourists watching netflix on a rainy friday evening
22:26 <fungi> okay, so if there was a more general problem i'm not able to reproduce it now
22:27 <fungi> getting 7.7 MiB/s from poland cloning via ipv6 at both the lb and directly from a backend
22:27 <fungi> (in the ovh warsaw pop)
22:33 <fungi> think i'm going to blame tourists and call it an afternoon, just as soon as i finish ironing out this nagging negative lookahead regex
23:40 *** whoami-rajat has quit IRC
