Thursday, 2021-03-18

openstackgerritMerged opendev/system-config master: Add zookeeper-statsd  https://review.opendev.org/c/opendev/system-config/+/78116000:08
*** gothicserpent has joined #opendev00:14
*** tosky has quit IRC00:22
corvusi'm going to restart zuul again to get that metrics fix00:24
*** mlavalle has quit IRC00:25
corvus#status log restarted zuul at commit 4bb45bf2a0223c1c624dbd8f44efff207e6b409700:31
openstackstatuscorvus: finished logging00:31
*** tbarron has joined #opendev00:39
ianwi don't think i realised we were setting up mirror nodes as kerberos clients00:53
ianwif they're not authenticating, i don't think they need client setup?00:54
openstackgerritIan Wienand proposed opendev/system-config master: Add kerberos-client group  https://review.opendev.org/c/opendev/system-config/+/78117301:00
*** stevebaker has quit IRC01:03
fungiianw: yeah, doesn't seem like it would be needed for read-only use01:07
corvusftr i'm manually running the service-zookeeper playbook because i don't want to wait forever01:34
fungilooks like we're getting node_failure from amd64 jobs01:34
corvuswe're in the middle of a deployment run with a bunch of non-zk related playbooks, so i don't think there's a conflict01:34
corvusbtw, here's the stat from zuul: https://graphite.opendev.org/?width=586&height=308&target=stats.timers.zuul.tenant.openstack.event_enqueue_processing_time.mean01:36
corvuslooks like the zk stats are showing up, but they're all zero, likely because the mntr whitelist needs a zk restart to take effect01:39
corvusi'm going to start doing rolling zk server restarts now01:39
corvushttps://graphite.opendev.org/?width=586&height=308&from=-1hours&target=stats.gauges.zk.*.zk_avg_latency01:43
corvusthat's looking good -- data from all 3 now01:44
corvushttps://graphite.opendev.org/?width=586&height=308&from=-1hours&target=stats.gauges.zk.*.zk_followers01:44
corvusthat one is interesting -- it tells us that zk02 is the leader01:44
corvus(it's the only one with followers, and there are 2 of them, so all good)01:45
corvuswe apparently have about 10500 znodes01:45
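
The leader check described above (only the leader reports zk_followers) can be sketched in Python. This is an illustrative sketch, not the zookeeper-statsd code: it assumes the tab-separated key/value layout of ZooKeeper's `mntr` output, and the helper names and sample values are made up for the example.

```python
def parse_mntr(text):
    """Parse tab-separated key/value lines as printed by ZooKeeper's `mntr`."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if not sep:
            continue  # skip blank or malformed lines
        stats[key.strip()] = value.strip()
    return stats


def find_leaders(mntr_by_host):
    """Return the hosts whose stats include zk_followers (leader-only key)."""
    return [
        host
        for host, text in mntr_by_host.items()
        if "zk_followers" in parse_mntr(text)
    ]


# Hypothetical sample data shaped like the three-server ensemble in the log.
SAMPLE = {
    "zk01": "zk_server_state\tfollower\nzk_avg_latency\t0\nzk_znode_count\t10500\n",
    "zk02": "zk_server_state\tleader\nzk_avg_latency\t0\nzk_znode_count\t10500\nzk_followers\t2\n",
    "zk03": "zk_server_state\tfollower\nzk_avg_latency\t0\nzk_znode_count\t10500\n",
}

if __name__ == "__main__":
    print(find_leaders(SAMPLE))
```

A healthy ensemble should yield exactly one host here; zero or several would be worth a closer look.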
*** brinzhang has joined #opendev01:59
*** brinzhang_ has quit IRC02:00
openstackgerritJames E. Blair proposed openstack/project-config master: Add ZooKeeper stats to Zuul dashboard  https://review.opendev.org/c/openstack/project-config/+/78118202:07
corvusinfra-root: looks like we're collecting all the data now; that should let us see it02:07
*** hamalq has quit IRC02:12
*** hemanth_n has joined #opendev02:13
openstackgerritMerged opendev/system-config master: Add kerberos-client group  https://review.opendev.org/c/opendev/system-config/+/78117302:44
*** ykarel has joined #opendev04:07
*** bhagyashri|rover is now known as bhagyashris04:53
*** chandankumar is now known as chkumar|ruck04:57
*** brinzhang_ has joined #opendev04:57
*** whoami-rajat_ has joined #opendev04:58
*** brinzhang has quit IRC05:01
openstackgerritIan Wienand proposed opendev/system-config master: kerberos-kdc: add realm value  https://review.opendev.org/c/opendev/system-config/+/78119205:05
*** roman_g has joined #opendev05:30
ianwkevinz: it seems we're seeing some node failures in the arm cloud again ...05:42
openstackgerritMerged opendev/system-config master: kerberos-kdc: add realm value  https://review.opendev.org/c/opendev/system-config/+/78119205:45
*** roman_g has quit IRC05:47
*** ykarel_ has joined #opendev06:03
*** ysandeep|away is now known as ysandeep06:03
*** ykarel has quit IRC06:05
*** marios has joined #opendev06:09
openstackgerritOpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml  https://review.opendev.org/c/openstack/project-config/+/78101906:13
*** ykarel_ has quit IRC06:20
*** jaicaa has quit IRC06:31
*** calcmandan has joined #opendev06:32
*** jaicaa has joined #opendev06:33
*** calcmandan has quit IRC06:38
*** calcmandan has joined #opendev06:48
*** sboyron has joined #opendev06:56
*** ykarel has joined #opendev07:05
*** ralonsoh has joined #opendev07:18
kevinzianw: let me check07:26
*** eolivare has joined #opendev07:41
*** rpittau|afk is now known as rpittau07:51
kevinzianw: re-trigger to see if it's better?07:51
*** lpetrut has joined #opendev07:53
*** jpena|off is now known as jpena08:02
ianwkevinz: i can see a bunch of queued jobs @ https://zuul.openstack.org/status (search -arm64)08:08
*** amoralej|off is now known as amoralej08:08
ianwkevinz: none running though08:09
kevinzianw: well, double check08:09
kevinzianw: any clue from Zuul side?08:12
openstackgerritMerged openstack/project-config master: Add ZooKeeper stats to Zuul dashboard  https://review.opendev.org/c/openstack/project-config/+/78118208:13
kevinzianw: actually I can easily create the instance from the dashboard08:16
openstackgerritRico Lin proposed openstack/project-config master: Add ubuntu-focal-arm64-xxlarge for linaro-us  https://review.opendev.org/c/openstack/project-config/+/78121908:17
*** tosky has joined #opendev08:37
*** ykarel is now known as ykarel|away08:46
ianw2021-03-17 19:58:56,455 ERROR nodepool.driver.openstack.OpenStackProvider: Couldn't consider invalid node08:51
ianwTraceback (most recent call last):08:53
ianw  File "/usr/local/lib/python3.7/site-packages/nodepool/driver/utils.py", line 280, in estimatedNodepoolQuotaUsed08:53
ianw    if node.type[0] not in provider_pool.labels:08:53
ianwIndexError: list index out of range08:53
ianwkevinz, clarkb: ^ i think there's something else going on08:53
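
The IndexError in the traceback above comes from indexing an empty `node.type` list. A minimal sketch of a guard follows; it is not nodepool's code or its eventual fix, and the function name and arguments are hypothetical stand-ins for the objects in the traceback.

```python
def first_label_in_pool(node_type, pool_labels):
    """Return True when the node's first label is one the pool provides.

    node_type stands in for node.type (a list of label names) and
    pool_labels for provider_pool.labels; both are assumptions made
    for this example.
    """
    if not node_type:
        # An empty list here is what made node.type[0] raise IndexError.
        return False
    return node_type[0] in pool_labels


if __name__ == "__main__":
    labels = {"ubuntu-focal-arm64": {}}
    print(first_label_in_pool(["ubuntu-focal-arm64"], labels))
    print(first_label_in_pool([], labels))
```

Whether a node with no type should be skipped or counted against quota is a policy question this sketch does not answer.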
kevinzianw: wow...08:54
kevinznode.type lost?08:54
ianwkevinz, 449f31dc-8207-420a-a182-da7c58d962bd 402bcb9d-f40f-4b71-81e9-3aba5b4b432c 74485f01-22e3-4df5-bc23-d408af99cd6f08:55
ianware three id's in error state, maybe you can grep logs and see what happened?08:55
ianwthen we have a bunch of ready nodes08:58
ianwbut nothing seems to be starting08:58
kevinzAha08:58
kevinzI see. those servers are deleted but still listed in the DB08:59
kevinzI will check08:59
ianw2021-03-18 08:25:28,121 DEBUG nodepool.driver.NodeRequestHandler[nl03.opendev.org-PoolWorker.linaro-us-main-42c43c589f774838b16d1aa77a51216d]: [e: 3a9eeb1213d948cfb88e7f2199e1aec8] [node_request: 300-0013320208] Declining node request because pool is disabled by max_servers09:05
ianwclarkb: ^ i'm not 100% sure what state the nl openstack/opendev servers are in.  i get the feeling somehow arm64 requests are being rejected by one or the other09:05
ianwi'm afraid i'm out of time for now to keep poking at this.  but kevinz I do think this is mostly something on our side at this point09:06
*** andrewbonney has joined #opendev09:08
*** hashar has joined #opendev09:12
*** ykarel|away has quit IRC09:25
*** ysandeep is now known as ysandeep|afk09:26
zbrianw: do you happen to know why we get NODE_FAILURE on arm64? https://zuul.opendev.org/t/zuul/builds?job_name=zuul-jobs-test-ensure-docker-ubuntu-bionic-arm6409:40
openstackgerritSorin Sbârnea proposed zuul/zuul-jobs master: Make ensure-docker ubuntu arm64 non voting  https://review.opendev.org/c/zuul/zuul-jobs/+/78123409:47
kevinzianw: thanks, I will check. No worries09:53
zbrthere is something really weird around those NODE_FAILUREs, as I did not see them as failing. I wonder if this happens only when lots of jobs are created for a single change, as we maybe reach a limit of nodes to be spawned because we have a buildset that requires more than we can allocate.09:53
*** whoami-rajat_ is now known as whoami-rajat09:53
openstackgerritGuillaume Chauvel proposed opendev/gear master: Update testing to Python 3.9 and linters  https://review.opendev.org/c/opendev/gear/+/78010310:10
openstackgerritGuillaume Chauvel proposed opendev/gear master: [DNM] gear debug log  https://review.opendev.org/c/opendev/gear/+/78042210:10
openstackgerritGuillaume Chauvel proposed opendev/gear master: WIP: trying to solve gear+zuul+ssl(tls1.3)  https://review.opendev.org/c/opendev/gear/+/78123810:10
openstackgerritMerged opendev/irc-meetings master: Remove usused Zaqar meeting slot  https://review.opendev.org/c/opendev/irc-meetings/+/77719410:24
openstackgerritGuillaume Chauvel proposed opendev/gear master: WIP: trying to solve gear+zuul+ssl(tls1.3)  https://review.opendev.org/c/opendev/gear/+/78123810:34
openstackgerritGuillaume Chauvel proposed opendev/gear master: [DNM] gear debug log  https://review.opendev.org/c/opendev/gear/+/78042210:34
openstackgerritGuillaume Chauvel proposed opendev/gear master: WIP: trying to solve gear+zuul+ssl(tls1.3)  https://review.opendev.org/c/opendev/gear/+/78123810:48
openstackgerritGuillaume Chauvel proposed opendev/gear master: [DNM] gear debug log  https://review.opendev.org/c/opendev/gear/+/78042210:48
*** dtantsur|afk is now known as dtantsur11:02
*** hashar has quit IRC11:11
*** ysandeep|afk is now known as ysandeep11:19
*** frigo has joined #opendev11:58
zbrkevinz ianw: let me know if there is something i can do to help with arm64 nodes issue.12:05
*** slaweq has quit IRC12:22
*** slaweq has joined #opendev12:25
*** jpena is now known as jpena|lunch12:30
*** ricolin has joined #opendev12:44
*** hemanth_n has quit IRC12:49
*** hashar has joined #opendev13:00
openstackgerritDmitry Tantsur proposed opendev/glean master: NM: add an early service to configure networking  https://review.opendev.org/c/opendev/glean/+/78146013:02
*** artom has quit IRC13:24
*** jpena|lunch is now known as jpena13:26
*** marios is now known as marios|call13:32
dtantsurclarkb, ianw, another glean awesomeness: it may happen that glean starts before the configdrive device is initialized..13:45
dtantsuror to put it another way: I managed to start glean pretty early, now it starts too early :)13:46
dtantsurI honestly have no ideas better than After=systemd-udev-settle.service :(14:04
*** artom has joined #opendev14:14
openstackgerritDmitry Tantsur proposed opendev/glean master: NM: add an optional early service to configure networking  https://review.opendev.org/c/opendev/glean/+/78146014:23
openstackgerritDmitry Tantsur proposed openstack/diskimage-builder master: simple-init: allow passing additional arguments to glean install  https://review.opendev.org/c/openstack/diskimage-builder/+/78149114:26
*** fressi has quit IRC14:29
*** stand has joined #opendev14:45
*** marios|call is now known as marios14:48
*** mgoddard has quit IRC14:49
*** lpetrut has quit IRC14:53
clarkbianw: kevinz: nl03.opendev.org should be processing linaro cloud node requests now and nl03.openstack.org has been turned off. I'll check on them now though15:08
fungiif you can catch an api error response or something, that would help15:10
clarkbnl03.openstack.org is still off as expected and nl03.opendev.org is running. nodepool list shows linaro nodes in use with timestamps as recent as 3 minutes ago15:12
clarkbI've found a server it is building now that I'll keep an eye on before proceeding further with the old nodepool server cleanups15:15
fungiso maybe the node_failure results are intermittent (repeated boot failures/timeouts exceeding nodepool's retry count?)15:16
fungior maybe they're solved now15:16
clarkbya it sounded like kevinz had found a db sync problem15:19
clarkbmaybe that was the root cause and it just happened to coincide with replacing the launchers15:19
clarkb2021-03-18 15:15:58,673 INFO nodepool.NodeLauncher: [e: 7b4feb53efac4fada316a7ed5839871e] [node_request: 900-0013327236] [node: 0023563053] Node is ready15:19
clarkbit's looking happy from here15:20
openstackgerritDmitry Tantsur proposed opendev/glean master: Allow disabling DHCP fallback  https://review.opendev.org/c/opendev/glean/+/78150015:20
fungiprobably skim for any third try boot failures/timeouts, but otherwise it's probably fine now15:20
*** hashar has quit IRC15:21
clarkbfungi: I've responded to your question at https://review.opendev.org/c/opendev/system-config/+/78115415:22
fungithanks15:23
clarkbalso I think I'm ready to approve that change if you are15:23
fungiand yeah, that explains my confusion. so the /etc/letsencrypt is actually unused. i couldn't find where we were referencing anything under /config but maybe that's baked into part of the nginx config i didn't see15:24
clarkbyup skimming for failed launch attempts does show some are failing with 'Detailed node error: No valid host was found. There are not enough hosts available.' in linaro15:24
fungiand yes, i'll approve it now15:24
fungiif there's a host shortage, we might need to drop our max-servers?15:24
clarkbfungi: yes, the upstream image configs for nginx's ssl.conf assume the cert/key will be in the location we put them15:25
clarkbfungi: I suspect that this may be more due to the db issues kevinz identified. In theory our quota and max-servers are double covering that aspect for us15:25
clarkbbut ya we could reduce max-servers too15:25
clarkbimportantly, I'm not seeing anything that says the problem is on the new nodepool launcher side of things15:26
clarkbit seems to be cloud side or config15:26
fungiright, that's reassuring15:26
clarkbfungi: do you want to do your own double checking or do you think we should be ok to land https://review.opendev.org/c/opendev/system-config/+/781171 too?15:32
clarkbonce ^ is landed we can land the project-config nodepool yaml cleanups as well as delete the servers15:32
*** hamalq has joined #opendev15:33
clarkbopenstack.exceptions.ResourceTimeout: Timeout waiting for the server to come up. <- we are also seeing this occasionally in linaro15:34
*** mlavalle has joined #opendev15:37
clarkbdtantsur: why is glean-early used in addition to glean-nm?15:37
clarkbdtantsur: is it because all of the interfaces may not be plugged yet? If so I think we should probably write that down15:38
clarkbit feels like the sort of thing that will be forgotten15:38
fungiclarkb: i'll refresh my memory on which change that is in a sec. in two meetings simultaneously right now15:42
*** lpetrut has joined #opendev15:43
dtantsurclarkb: you mean, as a comment in the code?15:44
clarkbdtantsur: ya, I'm worried that could easily get optimized away similar to my optimization of reducing two glean calls to one15:45
clarkbdtantsur: also similarly I think we should capture motivation on https://review.opendev.org/c/opendev/glean/+/781500 because it isn't clear to me why you'd be doing that15:45
dtantsurclarkb: agreed. please leave a comment in gerrit, I'll get to it when I finish my testing around15:45
dtantsur* testing round15:46
clarkbdtantsur: done comments on both changes. Basically I think we've done a bad job capturing motivations and the why of fixes in glean. Early boot is complicated and can vary across platforms etc and writing down more hints will likely help us15:48
dtantsurtotally15:48
*** hashar has joined #opendev15:49
dtantsurI wrote a comment, but will also update both patches15:50
*** ysandeep is now known as ysandeep|away15:51
clarkbdtantsur: hrm my concern with that is dhcp-all-interfaces and simple-init are not intended to be used together15:52
clarkbthey are supposed to be used XOR each other iirc15:52
dtantsurclarkb: this is true, but they actually do play well together15:52
dtantsursimply because one is Before=network-pre and the other is After=network15:52
dtantsurand both check for an existing configuration file15:53
clarkbright but glean will already configure dhcp and ipv615:53
clarkbeven without a config drive15:53
dtantsurthis is a grey area for me. I know a lot of fixes have been done by our folks to it.15:54
clarkbit just feels like we're trying to address an explicitly undesired state15:54
openstackgerritMerged opendev/system-config master: Manage jitsi-meet meet.conf as a template input for the container  https://review.opendev.org/c/opendev/system-config/+/78115415:54
dtantsuralso yes, the "no, never use DHCP because it's always wrong" aspect is of value15:54
*** lpetrut has quit IRC15:55
dtantsurdoes glean, for example, do all this: https://opendev.org/openstack/diskimage-builder/commit/f94508d537432817619932074a5f98ea08d93055 https://opendev.org/openstack/diskimage-builder/commit/5d23d8e6b09c623cb910159dee40df6af2844856 ?15:55
* dtantsur is sad we even started with dhcp-all-interfaces, not glean, but the ship has sailed15:56
clarkbright I feel like first of all that is a completely unfair question15:57
clarkbyou arguably added features to the simple tool that should have gone into the complicated tool15:57
clarkbbut yes I believe glean supports vlans (and config on top of them) and ipv6 via RAs15:57
clarkbwe certainly use glean + ipv6 relying on router advertisements on our nodes in clouds like limestone15:57
clarkbwe don't use vlan features, but they were added for ironic users so I would hope they support ironic's use cases15:58
fungiand had to work around some nasty nm behaviors relative to that too15:58
fungiipv6 slaac related i mean15:58
dtantsurI personally have little control of dhcp-all-interfaces.. I'm trying to somehow navigate this space without telling folks "hey, I'm switching you over to a new DHCP helper, which I hope works the same as before" :)15:58
clarkbdtantsur: right and I'm trying to avoid adding so many special cases to glean that it becomes unwieldy15:58
clarkbto me "don't try to fall back to dhcp" risks becoming that because falling back to something seems sane in almost every situation15:59
* dtantsur -> meeting, brb16:00
clarkbdtantsur: looking at the dhcp-all-interfaces change for ipv6 more closely I think I see how that is different than what glean does16:00
clarkbdtantsur: in that change you are baking the behavior into the image. With glean you provide that config at boot time with a config drive16:00
clarkbbut otherwise I would expect similar behaviors at runtime16:00
dtantsuryeah16:01
dtantsurhonestly, to me a switch whether to use fallback or not seems quite logical, but I can understand why you see it otherwise16:01
dtantsurbut otherwise I'll need to take some completely different direction, like calling glean from IPA manually or something like that.. and do nm restart probably?16:02
clarkbdtantsur: fwiw I'm not saying no. At this point I'm mostly saying we need to capture the use case because I don't understand it :)16:03
clarkbbut I did want to point out that it is my understanding we explicitly say do not use those two elements together16:03
*** mgoddard has joined #opendev16:04
clarkbnot having a fallback likely makes sense without dhcp-all-interfaces, I'm just struggling to understand it. I thought of another potential reason, which may be to speed up time to boot failure so you can retry more quickly16:04
clarkbbut I don't know if that would help in practice16:04
dtantsurthe security aspect too (not rogue DHCP on a remote edge node)16:05
clarkbya, though you could subvert config drive in that case too16:05
clarkb(plug a usb device in with a fat32 config-2 partition)16:06
clarkbbut maybe the risk of ^ is lower than the risk of plugging into a switch. As mentioned I'm not against it but think we should capture the why and what better as we seem to struggle with that in glean then have trouble making changes later16:07
dtantsuryep, I will16:07
clarkbwoot etherpads appear to be working again on meetpad though you have to explicitly open the shared document. The flag to open it by default is in a change that I suppose we can keep iterating through now16:15
clarkbI've approved the next meetpad fix16:16
clarkbthis one disables p2p and mutes video on connection16:16
clarkbthe one after that should open the shared doc by default16:17
*** frigo has quit IRC16:21
*** Tengu has quit IRC16:23
fungiclarkb: sorry for the delay, i've approved 781171 now since the servers are confirmed idle16:30
clarkbthanks16:32
*** roman_g has joined #opendev16:39
*** mlavalle has quit IRC16:56
openstackgerritDmitry Tantsur proposed opendev/glean master: NM: add an optional early service to configure networking  https://review.opendev.org/c/opendev/glean/+/78146016:58
dtantsurokay, the first explanation inserted16:58
openstackgerritMerged opendev/system-config master: Restore some meetpad settings we had previously set  https://review.opendev.org/c/opendev/system-config/+/78115616:59
openstackgerritMerged opendev/system-config master: Clean up the old openstack.org nodepool launchers.  https://review.opendev.org/c/opendev/system-config/+/78117116:59
*** mlavalle has joined #opendev16:59
roman_gclarkb thank you for the logs! I've tried to filter kna1 on logstash, but seems that it does not get collected. http://logstash.openstack.org/#/dashboard/file/logstash.json query is node_provider:(\"airship-citycloud-kna1\") . What am I missing here?17:00
*** marios is now known as marios|out17:01
clarkbdtantsur: thanks that really helps. Though I noticed one thing I asked a question about inline17:02
dtantsurclarkb: clean-nm@ is not enabled explicitly because it's enabled by udev17:02
dtantsurglean-nm of course17:02
clarkbdtantsur: aha17:02
dtantsurglean-early is a normal service and has to be enabled17:03
clarkbroman_g: try node_provider:"airship-kna1"17:03
openstackgerritDmitry Tantsur proposed opendev/glean master: Allow disabling DHCP fallback  https://review.opendev.org/c/opendev/glean/+/78150017:04
dtantsurtried explaining this one as clearly as possible ^^17:04
clarkbdtantsur: ya that helps, thanks17:04
*** marios|out has quit IRC17:08
*** rpittau is now known as rpittau|afk17:11
*** eolivare has quit IRC17:14
*** jpena is now known as jpena|off17:16
clarkbfungi: https://review.opendev.org/c/opendev/system-config/+/781159 is the last meetpad change. You've already reviewed it so I guess it's a question of whether or not we want to get a second reviewer on it before approving17:16
clarkbOn the nodepool side of things https://review.opendev.org/c/openstack/project-config/+/780984 has been marked active. Its child should be ready for review now too17:17
*** amoralej is now known as amoralej|off17:18
fungiclarkb: oh, yep, i forgot i reviewed that. i'll go ahead and approve it17:22
clarkbfwiw I discovered why my android jitsi client didn't break when my browser one did due to the xmpp websocket issue17:22
fungilooks like corvus was already +2 on an earlier patchset anyway17:22
clarkbthe apps don't use websockets so continued to use the old config happily17:22
roman_gclarkb works! Thank you.17:25
clarkblooking at the prosody and web images config templates I'm fairly certain that if we toggle ENABLE_XMPP_WEBSOCKETS over to 1 that we should get a working setup17:26
clarkbhttps://github.com/jitsi/docker-jitsi-meet/commit/d747bfbe6b72dd86789f61a97372b94c2c19677f is the commit that added all that17:28
*** hashar has quit IRC17:43
openstackgerritMerged opendev/git-review master: Add missing -h to manpage and remove -c from it  https://review.opendev.org/c/opendev/git-review/+/78105317:50
*** dtantsur is now known as dtantsur|afk17:55
openstackgerritMerged opendev/system-config master: Restore meetpad etherpad settings.  https://review.opendev.org/c/opendev/system-config/+/78115918:05
openstackgerritElod Illes proposed openstack/project-config master: Use nodejs8-publish-to-npm for monasca-grafana-datasource  https://review.opendev.org/c/openstack/project-config/+/78153618:16
openstackgerritClark Boylan proposed opendev/system-config master: More jitsi meet config cleanups  https://review.opendev.org/c/opendev/system-config/+/78154419:01
openstackgerritClark Boylan proposed opendev/system-config master: Enable jitsi-meet xmpp websockets  https://review.opendev.org/c/opendev/system-config/+/78154519:01
clarkbfungi: ^ the first change there was inspired by your questions asking :) I tried to clean up any bits that may cause confusion in the config management or on the server (because ansible and container will fight over the content of configs)19:02
clarkbthe second thing is a bit scarier, but has a straightforward revert path and apparently it performs better19:02
*** andrewbonney has quit IRC19:03
fungiyeah, i'd love to test something like that asap, and then *if* we discover it's causing problems before or during the ptg, we can revert quickly19:03
*** priteau has quit IRC19:08
fungibut hopefully it's an improvement, and leads to a better experience for users than previously19:14
*** openstackgerrit has quit IRC19:34
fungiaha! ^19:35
fungiif you have weechat smart filters on then you probably don't see it19:35
fungi<-- openstackgerrit (~openstack@eavesdrop01.openstack.org) has quit (Quit: Changing servers)19:35
fungithat happened the *exact* moment i pushed https://review.opendev.org/78155119:36
fungii'll restart the bot and then look into possible fixes19:36
clarkbwow nice sleuthing19:37
fungi#status log Restarted the gerritbot container after it disconnected from Freenode after trying to handle a very long commit message subject19:38
openstackstatusfungi: finished logging19:38
fungiyeah, the traceback in the log doesn't give any hint it's related, there is no disconnect or anything back from the irc server side about it19:38
fungibut if you match the exit time up with the logs, you find a traceback for a very long commit message subject each time we get the "Quit: Changing servers" reason19:39
fungi"irc.client.MessageTooLong: Messages limited to 512 bytes including CR/LF"19:43
fungithat's the exception we hit19:43
fungibut it's wrapped in a try with except Exception that does a self.connection.reconnect()19:44
fungii'm guessing the reconnect just never happens19:44
fungiadded by https://review.openstack.org/257174 in 201519:45
fungiso it's not new19:45
fungii need to switch to dinner prep, but will ponder this19:45
*** openstackgerrit has joined #opendev19:46
openstackgerritMerged opendev/system-config master: More jitsi meet config cleanups  https://review.opendev.org/c/opendev/system-config/+/78154419:46
fungidebugging may be deeper in the irc module, but since we can reproduce the failure we should be able to suss it out19:46
clarkbfor the nodepool launchers I was going to double check with ianw when ianw's day starts then delete the old servers19:46
fungisounds good to me19:46
*** roman_g has quit IRC19:53
fungipondering while i cook, i guess we have to intertwined bugs in gerritbot. one is that the reconnect method seems to only disconnect, the other is that we shouldn't bother reconnecting on a irc.client.MessageTooLong exception20:10
fungitwo intertwined bugs20:10
clarkbwith message too long the server will just chomp the end of the message right? so ya shouldn't need to reconnect20:11
fungithe server discards it20:12
fungior rather the server never gets it because the irc module isn't sending it20:12
clarkbah20:12
fungithe irc module is raising an exception on send20:12
funginot actually sending20:12
fungiwe could act on that to chomp the message down to some safe length20:13
fungior we could just ignore irc.client.MessageTooLong exceptions and be satisfied that they don't get sent20:13
clarkbignoring it seems like a better behavior than the current one20:13
fungibetter than trying to reconnect, yes. irc.client.MessageTooLong shouldn't be taken as an indication that the connection is broken20:15
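
The first option fungi mentions, trimming a message to a safe length before sending, can be sketched as follows. This is not gerritbot's code: the function name is hypothetical, and the only number taken from the log is the 512-byte limit (including CR/LF) quoted in the exception text. A relaying server also prepends the sender's prefix, so a real bot would probably want a smaller budget than this.

```python
IRC_LINE_LIMIT = 512  # bytes, including the trailing CR/LF


def fit_privmsg(target, text, limit=IRC_LINE_LIMIT):
    """Trim text so that 'PRIVMSG <target> :<text>\\r\\n' fits in limit bytes."""
    overhead = len(f"PRIVMSG {target} :\r\n".encode("utf-8"))
    budget = limit - overhead
    encoded = text.encode("utf-8")
    if len(encoded) <= budget:
        return text
    # errors="ignore" drops a multi-byte character cut in half by the slice.
    return encoded[:budget].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    subject = "x" * 1000  # stand-in for a very long commit message subject
    line = f"PRIVMSG #opendev :{fit_privmsg('#opendev', subject)}\r\n"
    print(len(line.encode("utf-8")))
```

The other option from the discussion, catching irc.client.MessageTooLong and simply not sending, avoids trimming entirely and needs no byte arithmetic.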
funginow, troubleshooting why reconnection is broken will be the harder bit20:16
*** roman_g has joined #opendev20:17
*** stevebaker has joined #opendev20:18
clarkbmeetpad should get the last two updates applied to it in the next little bit. I'll double check on it afterwards20:20
ianwclarkb: o/ ... reading20:35
clarkbianw: tldr is it seems the new launchers are fine. But linaro was struggling due to no valid host found and launch timeouts20:37
clarkbI don't think that is related to the opendev server swap out20:37
clarkbif you agree we've landed the change to pull the old servers from inventory and my next step is to delete the servers then land https://review.opendev.org/c/openstack/project-config/+/780984 and its child20:37
ianwok, i couldn't find what i thought was a smoking gun log message for why nodes weren't coming up, but if they are now, that's good :)20:38
ianwlooks like plenty are running so LGTM20:39
clarkbya most seem to launch, but have the occasional cloud side error still20:39
clarkbthanks for confirming. I'll look at cleaning up the old servers shortly20:39
clarkbfungi: ianw  I think we can go ahead and land https://review.opendev.org/c/openstack/project-config/+/780984 and its child now too fwiw20:41
ianwit would be super to get a second arm64 cloud for backup; if anyone is reading this with access to such a thing :)20:41
*** Tengu has joined #opendev20:42
clarkbhrm I think I added a buggy config.js to meetpad: it used a : instead of a =20:42
openstackgerritClark Boylan proposed opendev/system-config master: Fix jitsi config.js  https://review.opendev.org/c/opendev/system-config/+/78156520:44
clarkbfungi: ianw  ^ I have not tested that yet but I suspect that is the fix20:44
clarkbthere is one more infra-prod-run-meetpad queued up and when that is done I'll manually do ^ and confirm20:45
*** sboyron has quit IRC20:46
ianweverything else config.blah is an =20:46
clarkbok manually applied that fix and restarted and it is happy now. Sorry for that miss20:51
clarkbthey converted from using json like data structure to var assignments20:51
clarkband when I copy pasta'd I failed to convert20:51
ianwclarkb / dtantsur : ok, i sort of buy the theory "udev hasn't activated glean@INTERFACE, so network-pre doesn't know it has anything to wait for, and goes ahead and starts NM anyway"20:52
ianwand i can see that NM doesn't want to, by default, wait for udev-settle20:52
clarkbchanges from https://review.opendev.org/c/opendev/system-config/+/781544 appear to have applied cleanly (redirect is present etc)20:53
ianwi wonder if instead of running glean in that early service -- does it just need to be some dummy thing that makes network-pre depend on udev-settle20:53
clarkbianw: oh that is an interesting idea20:53
openstackgerritMerged openstack/project-config master: Remove unused nl0X.openstack.org config files  https://review.opendev.org/c/openstack/project-config/+/78098420:53
openstackgerritMerged openstack/project-config master: Update nodepool zk configs to be a bit less confusing  https://review.opendev.org/c/openstack/project-config/+/78098520:53
ianwit feels like that should not be an option -- if this theory is the case, it's only because things are generally faster on hardware that we're getting things in order20:55
clarkbI'm deleting old nodepool launchers now20:56
clarkb#status log Replaced nl01-04.openstack.org with new Focal nl01-04.opendev.org hosts.20:58
openstackstatusclarkb: finished logging20:58
clarkbianw: re glean, normally I really like to do boot tests of changes like this locally to build confidence in them. However I think our integration testing covers things pretty well at this point. Do you think we should go ahead and start landing some of these?21:03
clarkbnormally == $yearsago when I did more glean work :)21:03
ianwclarkb: thinking even more, i think a better way might even be to drop an override file for NetworkManager to make it depend on udev-settle21:04
clarkbya I guess simple-init could do that21:06
clarkbsince it is modifying the image already21:06
clarkband is probably the cleanest way to express the ordering desired21:06
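
The override-file idea above could look roughly like the sketch below, which writes a systemd drop-in ordering NetworkManager after udev-settle. It is an assumption-laden illustration, not the simple-init element: the drop-in file name, the helper name, and the choice to add Wants= alongside After= are all made up for the example.

```python
from pathlib import Path

# Drop-in contents: order the unit after udev-settle and pull that unit in.
DROPIN_TEXT = """\
[Unit]
Wants=systemd-udev-settle.service
After=systemd-udev-settle.service
"""


def write_dropin(root, unit="NetworkManager.service", name="udev-settle.conf"):
    """Write the drop-in under <root>/etc/systemd/system/<unit>.d/ and return its path."""
    dropin_dir = Path(root) / "etc/systemd/system" / f"{unit}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / name
    path.write_text(DROPIN_TEXT)
    return path


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as image_root:
        print(write_dropin(image_root).read_text())
```

Passing the image root as an argument keeps the sketch usable at image build time, which matches the point that simple-init already modifies the image.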
ianwi just have to run morning errands, but yeah i think we can merge the more obvious things as they are gate tested by the boot tests21:07
ianwperhaps there are now other ways to find active interfaces that aren't udev too, that wouldn't surprise me21:08
fungiwell, udev just reacts to kernel events21:08
clarkbianw: I doubt it since udev is part of systemd now21:08
fungiudevd does, i mean21:08
fungianything can listen for those21:08
clarkbor at least the "proper" systemd method is likely udev21:09
clarkbI'd like to fully restart meetpad after the above fix lands and is applied. Just to double check the state of config mgmt produces a happy state. Then I think we should go ahead and land the xmpp websockets change and revert if it causes problems21:11
fungisounds good. i'm transitioning to evening relaxation but am happy to help test that when it deploys21:12
clarkbgreat and thanks21:12
*** roman_g has quit IRC21:18
*** ralonsoh has quit IRC21:20
*** sshnaidm is now known as sshnaidmoff21:21
*** sshnaidmoff is now known as sshnaidm|off21:21
*** whoami-rajat has quit IRC21:26
*** openstackstatus has quit IRC21:26
openstackgerritMerged opendev/system-config master: Fix jitsi config.js  https://review.opendev.org/c/opendev/system-config/+/78156521:30
*** openstackstatus has joined #opendev21:31
*** ChanServ sets mode: +v openstackstatus21:31
*** openstack has joined #opendev21:33
*** ChanServ sets mode: +o openstack21:33
*** openstack has quit IRC21:35
*** openstack has joined #opendev21:36
*** ChanServ sets mode: +o openstack21:36
fungii've joined the isitbroken room21:38
clarkbyup I'm there too don't hear you though21:38
*** openstack has joined #opendev21:45
*** ChanServ sets mode: +o openstack21:45
*** openstack has quit IRC21:48
*** openstack has joined #opendev21:58
*** irclogbot_3 has joined #opendev21:58
*** openstack has quit IRC21:59
*** openstack has joined #opendev22:01
*** ChanServ sets mode: +o openstack22:01
openstackgerritMerged opendev/system-config master: Enable jitsi-meet xmpp websockets  https://review.opendev.org/c/opendev/system-config/+/78154522:09
*** iurygregory has quit IRC22:09
*** iurygregory has joined #opendev22:09
fungiany feel for when we should be looking at a gerrit 3.3 upgrade? i guess the dstat metrics are to help us decide that22:16
clarkbya that and gatling git, but I got distracted (if anyone else wants to poke at gatling git feel free)22:18
clarkbthere were at least 2 people on the repo discuss ml who ended up reverting and it might be good to see if we can find out what caused them to do that22:18
fungiyeah, i don't have fond memories of rolling back gerrit upgrades. it's pain, pure and simple22:26
clarkbok xmpp change has deployed. It only restarted 2 of the 4 containers. I think I'll do another full restart to ensure that if/when that happens we are still good22:29
clarkband I get the "you have been disconnected" message so something must be missing22:31
clarkbI'll get a revert going and then manually fix it22:31
openstackgerritClark Boylan proposed opendev/system-config master: Revert "Enable jitsi-meet xmpp websockets"  https://review.opendev.org/c/opendev/system-config/+/78157722:33
clarkbI've manually applied ^ and will go ahead and approve that22:34
fungithanks for checking it22:47
openstackgerritIan Wienand proposed opendev/glean master: Override NetworkManager to wait for udev-settle  https://review.opendev.org/c/opendev/glean/+/78158022:48
ianwclarkb: ^ see if that tests ok or introduces more problems ... not sure i've even convinced myself on that one :)22:48
ianwi'm assuming that udev starting a service "rebuilds" the dependency tree in systemd22:51
clarkbdtantsur|afk: ^ fyi since you've been testing your end22:57
clarkbianw: ya I think so22:58
clarkbre rebuilding the tree22:58
clarkbianw: I'm not sure I grok the After declaration there though23:01
clarkbianw: does it need a Before=NetworkManager.service too?23:01
ianwumm, the override i installed was for NetworkManager, to make it wait for udev settle23:02
ianwat least, that's what i was trying to do :)23:02
funginotification says rackspace opened a ticket to warn us about network maintenance in dfw. i'm not where i can log into their dashboard to read the ticket right now, but i'll try to remember to take a look tomorrow if nobody has time before then23:02
clarkbianw: oh I see the target file dest is what is important there23:02
clarkbianw: sorry I missed it because the source file name is arbitrary23:02
clarkbianw: ya I think that is correct23:03
clarkbso ya if that passes our testing and dtantsur|afk reports it fixes things then I think that may be the cleanest option23:03
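
[Editor's sketch, not part of the original conversation] The exchange above hinges on where the override file is installed: a systemd drop-in placed under `NetworkManager.service.d/` modifies NetworkManager itself, so an `After=` line there makes NetworkManager wait, and the source file name is arbitrary. The sketch below shows that idea only. The drop-in file name, the helper names, and the use of `Wants=` alongside `After=` are assumptions, not the contents of the glean change under review.

```python
from pathlib import Path

UNIT = "NetworkManager.service"
# Hypothetical file name; any *.conf name inside the .d directory works.
DROPIN_NAME = "10-wait-udev-settle.conf"
SETTLE_UNIT = "systemd-udev-settle.service"


def render_dropin(after: str = SETTLE_UNIT) -> str:
    """Return drop-in text that orders the overridden unit after `after`.

    After= only sets ordering; Wants= (an assumption here) also pulls the
    settle unit into the boot transaction so there is something to wait for.
    """
    return "\n".join(["[Unit]", f"Wants={after}", f"After={after}", ""])


def dropin_path(root: str = "/etc/systemd/system",
                unit: str = UNIT,
                name: str = DROPIN_NAME) -> Path:
    """The destination directory, not the file name, selects the unit."""
    return Path(root) / f"{unit}.d" / name


def install_dropin(root: str) -> Path:
    """Write the drop-in beneath `root` (e.g. an image build directory)."""
    path = dropin_path(root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_dropin())
    return path
```

Because the path ends in `NetworkManager.service.d/`, no `Before=NetworkManager.service` is needed on the udev side; the ordering is expressed entirely from NetworkManager's point of view.
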
ianwfungi: that's a weird account23:06
fungiit's got the right project id though23:07
ianwthe credentials listed for the account that ticket was filed against don't appear to allow me to log in23:07
clarkbis this the one that we had shared with the foundation that got split?23:09
fungioh, i think that's a swift-only account i created for testing authenticated write access to a container for story attachments on storyboard-dev23:09
clarkbah23:09
fungiit's the same tenant, the user just has limited access to only be able to write to a single swift container23:09
fungiwonder why they picked that on23:09
fungithat one23:10
ianwi can't see the ticket logging in as the "main" user23:10
fungibizarre23:10
clarkbmaybe it only affects swift?23:10
fungithe creds for that account should be in our usual list and at least in hiera, but anyway i can look at it tomorrow, i doubt it's important in that case23:11
openstackgerritMerged opendev/system-config master: Revert "Enable jitsi-meet xmpp websockets"  https://review.opendev.org/c/opendev/system-config/+/78157723:17
ianwfungi: yeah, i pulled up the creds but i guess that user can't log into the UI to see the tickets23:19
fungiright, probably not23:19
openstackgerritIan Wienand proposed opendev/system-config master: kerberos-kdc: quote some integers to avoid string/int confusion  https://review.opendev.org/c/opendev/system-config/+/78158523:38
ianwok, did a couple of runs on the kdc servers and all looking good; replication working and everything listening23:44
clarkbnice23:45
clarkbI just rechecked the meetpad server and it seems happy so I think we can call that done for now23:45
ianw#status log all afs and kerberos servers migrated to focal, under ansible control23:48
openstackstatusianw: finished logging23:48
fungiyay! excellent23:52
clarkbianw: were the kdcs done in place too?23:52
clarkband yes yay23:53
fungithose are new servers, right23:53
fungi?23:53
ianwclarkb: yeah, i just left them; since they're serving OPENSTACK.ORG domain it didn't seem logical to move to opendev.org and new servers23:53
fungioh, right23:53
fungithey were already 03 and 0423:53
clarkbfungi: ya last round we did new servers and then afterwards discussed maybe not doing that again :)23:53
clarkbiirc something didn't catch on that things had moved until we had more forcefully made them check again23:54
ianwhowever, if we wanted a couple of new servers to do an OPENDEV.ORG domain ... that would largely be trivial and it would even bootstrap itself with no intervention23:55
clarkbya the hard part would be migrating to it23:56
clarkbunless we also just started over on the target (and copied files in afs as necessary)23:57
