Monday, 2022-08-22

opendevreviewIan Wienand proposed openstack/project-config master: nodepool: Add Fedora 36  https://review.opendev.org/c/openstack/project-config/+/85389301:26
opendevreviewIan Wienand proposed openstack/project-config master: nodepool: add rocky-9 stub  https://review.opendev.org/c/openstack/project-config/+/85389401:26
ianwi requested a fresh rocky-9 build01:51
opendevreviewMerged openstack/project-config master: nodepool: Add Fedora 36  https://review.opendev.org/c/openstack/project-config/+/85389302:02
*** soniya29 is now known as soniya29|ruck03:39
*** soniya29 is now known as soniya29|ruck04:29
opendevreviewIan Wienand proposed zuul/zuul-jobs master: Fedora : update to Fedora 36 release nodes  https://review.opendev.org/c/zuul/zuul-jobs/+/85390904:37
*** ysandeep|PTO is now known as ysandeep04:38
*** soniya29|ruck is now known as soniya29|ruck|afk05:06
opendevreviewIan Wienand proposed zuul/zuul-jobs master: zuul-jobs-test-ensure-python-pyenv: update matchers  https://review.opendev.org/c/zuul/zuul-jobs/+/85391305:16
*** ysandeep is now known as ysandeep|afk05:18
opendevreviewIan Wienand proposed zuul/zuul-jobs master: zuul-jobs-test-ensure-python-pyenv: update matchers  https://review.opendev.org/c/zuul/zuul-jobs/+/85391305:27
opendevreviewIan Wienand proposed zuul/zuul-jobs master: Fedora : update to Fedora 36 release nodes  https://review.opendev.org/c/zuul/zuul-jobs/+/85390905:27
opendevreviewIan Wienand proposed zuul/zuul-jobs master: [wip] zuul-jobs-test-registry-buildset-registry-openshift-docker: use 9-stream  https://review.opendev.org/c/zuul/zuul-jobs/+/85391605:27
opendevreviewIan Wienand proposed zuul/zuul-jobs master: [wip] zuul-jobs-test-registry-buildset-registry-openshift-docker: use 9-stream  https://review.opendev.org/c/zuul/zuul-jobs/+/85391605:45
opendevreviewIan Wienand proposed zuul/zuul-jobs master: zuul-jobs-test-registry-buildset-registry : update matcher  https://review.opendev.org/c/zuul/zuul-jobs/+/85391805:45
*** soniya is now known as soniya29|ruck05:57
mnasiadkaianw: I can confirm rockylinux-9 nodes work, thanks for fixing that :)06:45
*** soniya29|ruck is now known as soniya29|ruck|lunch07:31
*** jpena|off is now known as jpena07:36
*** soniya29|ruck|lunch is now known as soniya29|ruck08:38
*** ysandeep|afk is now known as ysandeep08:55
*** soniya is now known as soniya|ruck09:08
*** soniya|ruck is now known as soniya|ruck|afk09:24
*** efoley_ is now known as efoley09:35
opendevreviewTobias Rydberg proposed opendev/irc-meetings master: Restart of the Public Cloud SIG  https://review.opendev.org/c/opendev/irc-meetings/+/85393809:54
*** soniya|ruck|afk is now known as soniya|ruck10:18
*** rlandy__ is now known as rlandy10:22
ianwmnasiadka: oh good, thanks for confirming :)10:50
*** soniya is now known as soniya|ruck11:11
*** dviroel|out is now known as dviroel|rover11:28
*** ysandeep is now known as ysandeep|afk12:22
priteauHello. Do you know when https://review.opendev.org/c/openstack/project-config/+/853003 will get deployed to Zuul? It doesn't seem to be in place as https://review.opendev.org/c/openstack/kayobe/+/850903 still doesn't run CI jobs.12:48
fungipriteau: that change rolled out 5 days ago13:12
frickler2022-08-22 12:46:06,340 ERROR zuul.GithubConnection.GithubClientManager: No installation ID available for project stackhpc/ansible-role-os-images13:13
fricklerfungi: priteau: does the zuul app need to be added to that repo?13:13
fungioh, good question. i don't actually know much about how the github driver works13:14
fungithere were other repos from the same org already in the config, so i guess i just assumed that was an org-wide thing13:15
fricklerah, no:13:15
frickler2022-08-22 12:46:17,717 ERROR zuul.source.GithubSource: Failed to retrieve dependency https://github.com/stackhpc/ansible-role-os-images/pull/63. Retrying13:15
fricklerpriteau: that PR has been merged, iiuc it can no longer be referenced as a dependency then13:15
frickleradmittedly it would also be helpful if zuul reported that error in gerrit instead of only in its log13:18
fungii also wasn't aware zuul didn't allow merged pull requests as cross-project dependencies, but as i said i'm not all that familiar with the github driver so i guess that could be the case13:18
fricklerfungi: I think the root cause for the issue might be that github deletes the source branch after the merge, so zuul fails at cloning it13:20
fricklernot sure if we'd want to teach the zuul github driver to just ignore the dependency for merged PRs and use the target branch instead13:21
fungiyeah, i did see the source branch deletion message in the pr comments, but as far as internals of the driver didn't realize that's what it pulled rather than the discrete commits from the pr13:21
priteauOK, so we need to update the change and remove the depends-on13:22
fungii guess the problem there is that the merged state of the pr may not actually include the parent commits if gh is auto-rebasing instead of creating merge commits13:23
fungiperhaps the driver could be made to try to git am the patch from e.g. https://patch-diff.githubusercontent.com/raw/stackhpc/ansible-role-os-images/pull/63.patch13:25
priteauThanks, the updated change is running CI jobs now.13:26
fungipriteau: it would probably still be good to find out if a dependency on an open pr for one of those added projects works13:27
fungijust to make sure that was the issue13:27
priteauI've pushed another change to test13:30
priteauI don't see Zuul running jobs for it13:32
priteauhttps://review.opendev.org/c/openstack/kayobe/+/85397913:33
fricklerpriteau: yep, still the same pair of errors. together with the surrounding logs, the missing zuul app seems to be the culprit13:42
priteauOK, I will check what can be done on our side13:43
priteaumgoddard: is this something you are managing?13:43
Clark[m]The GitHub app shouldn't be required from depends on from our side. The app is only needed to have GitHub events drive zuul13:53
Clark[m]Deleting the branch that hosted the PR does break things. GitHub prunes the refs and shas when you do that so zuul can't fetch the information. I don't know why the updated change with the updated depends on isn't working13:55
priteauIs there an error in logs like there was for the previous patch?13:59
Clark[m]I think frickler said it was the same error. I can't check myself for another hour or so. Just wanted to call out the app should only be required to drive zuul from the GitHub side which isn't what is being done here14:00
*** ysandeep|afk is now known as ysandeep14:01
*** dasm|off is now known as dasm14:02
priteauOK. But the branch still exists in this case.14:05
fricklerClark[m]: priteau: this is the whole log I found https://paste.opendev.org/show/bhgxMh429jLp4NzTU4X1/ , this then repeats a couple of times14:06
Clark[m]Did the zuul GitHub driver update recently to assume an app is always installed?14:10
Clark[m]I think the flow is: zuul finds depends on, maps to GitHub driver, looks for app install and fails to find one, then manually constructs the information needed to satisfy the depends on. I'm not sure those errors are fatal.14:21
Clark[m]Line 43 of the paste indicates it got the PR as well.14:22
fricklermight well be a bug in zuul triggered by those PRs. like maybe the "Merging is blocked - Merging can be performed automatically with 1 approving review." which is rather non-standard14:37
fricklerso whatever triggers this, IMO there should be a check for commit==None here https://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/graphql/__init__.py#L126-L12914:51
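The kind of guard frickler suggests can be sketched as follows. This is a minimal illustration and not the actual Zuul code: the function name `parse_canmerge` and the response layout (`data`, `repository`, `object`) are assumptions invented for the example.

```python
def parse_canmerge(response):
    """Pull the commit node out of a GraphQL-style response dict.

    The response layout used here is hypothetical; it only illustrates
    checking for a missing commit before touching its attributes.
    """
    # A failed request (for example a 401) may carry no "data" at all.
    data = response.get("data") or {}
    repository = data.get("repository") or {}
    commit = repository.get("object")

    if commit is None:
        # Report the problem instead of failing later on a None value.
        reason = (response.get("errors")
                  or response.get("message")
                  or "no commit returned")
        return {"ok": False, "reason": reason}

    return {"ok": True, "oid": commit.get("oid")}
```

With a guard like this, a rejected query becomes an explicit, reportable error rather than an attribute lookup on `None`.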
clarkbis there a traceback failing on a None type not having attributes/methods?14:54
*** dviroel|rover is now known as dviroel|rover|lunch14:58
*** soniya|ruck is now known as soniya|out15:05
clarkboh I see the graphql request is a 40115:12
clarkbI agree I think this is exposing a bug in zuul15:13
* clarkb works on an updated paste15:13
*** ysandeep is now known as ysandeep|dinner15:13
clarkbI've posted details to the zuul matrix room15:16
clarkbas I don't think this is an opendev problem but a zuul one15:16
clarkbinfra-root landing  https://review.opendev.org/c/opendev/system-config/+/853528 is on my todo list. Any objections to me approving that now and then trying to restart gerrit sometime later today? I know the openstack release is approaching so trying to be extra careful but I expect a change like this should be quite safe?15:40
clarkbGitea 1.17.1 is out now too. https://review.opendev.org/c/opendev/system-config/+/847204 may be worth reviewing now then we can figure out when a good time to land and upgrade gitea is15:41
corvusclarkb: ++15:42
fungiclarkb: gerrit restart today sounds great, thanks!15:51
clarkbgreat change approved15:52
fungithanks!15:53
*** dviroel|rover|lunch is now known as dviroel|rover16:12
clarkbfungi: also do you think you'll have time this week to poke at mm3 further? I've got a node held currently for the latest patchset (198.72.124.71)16:22
clarkbtuesdays tend to be full of meetings and I'm out wednesday. I can probably dig into that with you on thursday if you want me to help (I've definitely learned a few things about mm3 at this point)16:22
fungiyeah, hopefully wednesday and thursday we can test some production-like config/archive migrations. today i'm still trying to catch up from vacation, tomorrow is solid meetings as you noted, and friday's a bust since i've got to take christine to an appointment on the mainland16:26
*** tkajinam is now known as tkajinam|off16:35
*** jpena is now known as jpena|off16:38
clarkbfrickler: priteau: one thing I notice browsing that PR while logged in is that the stackhpc/ansible code owner isn't clickable like jovial is16:39
clarkbI wonder if we don't have perms to see that user and the graphql query does seem to try and list that sort of info?16:39
clarkbIf I look at a random ansible proper change with a reviewer listed the reviewer is clickable there16:40
clarkbalso the user appears to be 'stackhpc/ansible' which is also a valid repo path. Maybe graphql is getting confused due to collisions?16:41
priteauThat's because it's a private team in our org16:42
priteauI can update the change to use an older, unmerged PR with users as reviewers16:42
clarkbthat seems like the likely issue then16:42
clarkbzuul is trying to list code review requirements and getting a not authorized error because stackhpc/ansible is hidden away is my hunch16:42
clarkbI think zuul could report this issue better, but I'm not sure zuul can work around it in a reasonable way. It would still be an error I think16:43
priteauI've updated the change to depend on https://github.com/stackhpc/ansible-role-os-images/pull/44, but it still doesn't run jobs16:44
priteauUnless it is still getting our new code review requirement from the API16:44
clarkblooks like it is still getting a 401 from github16:46
clarkb[e: f023a8f7b3c740658dd1e01f89d21e4c] POST https://api.github.com/graphql result: 401, size: 177, duration: 55, zuul_query: canmerge, owner: stackhpc, repo: ansible-role-os-images, pull: 44, head_sha: 33ce0bc07293900073c887b135706863488d46d216:47
clarkbmaybe that rules out the code reviewer isn't visible idea. Something else is tripping authorization issues?16:47
clarkbhttps://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/graphql/canmerge.graphql is the query executed. I don't know enough about github's permission model to say why this does/doesn't work16:48
clarkbit does seem to be repo specific though. And whatever the issue is I'm not sure zuul will be able to work around it, just get better at reporting the error (will depend on whether or not zuul can function without the resulting data if we can update the query to dodge problematic access)16:49
priteauAnd adding the zuul app to the repo, as was suggested. Could it help?16:49
clarkbIt may help if that gives zuul the necessary permissions in your repo to complete that query16:49
clarkbmy understanding is that the app really should only be required if taking action on github events and/or trying to merge changes with zuul.16:50
clarkbOtherwise the expectation (and what seems to work with say ansible/ansible and a few other repos) is that we can fetch sufficient info via the api to integrate with those upstreams via downstream events16:50
priteauWe can try and see if that makes a difference.16:50
priteauThanks for investigating.16:51
clarkbIf I open the PR in an unauthenticated context then the checks info goes away. I thought zuul was making these requests with a token so it is authenticated. But maybe it isn't when there isn't an app and checks info is hidden?16:53
clarkbcorvus: ^ do you know who might understand the permissions model of all this better? perhaps someone at bmw?16:54
*** ysandeep|dinner is now known as ysandeep|out16:57
opendevreviewMerged opendev/system-config master: Increase the number of Gerrit threads for http requests  https://review.opendev.org/c/opendev/system-config/+/85352816:58
corvusclarkb: maybe try asking in #zuul again in a few weeks?  lots of folks on vacation.17:01
clarkbI'm reading the githubconnection.py file in particular how it handles clients and now I'm beginning to wonder if I've misunderstood when the app is necessary. We've told people they don't need the app for including the repo in their jobs as a required project (and I believe tripleo uses this successfully for a number of ansible roles/collections)17:11
clarkbBut in the getGithubClient() method which is used by fetch_canmerge() to get a client (and that is used to satisfy a depends on) it seems to think if there is an app id then the app should be installed to the project and we use that for auth17:12
clarkbso the fix here may be to add zuul to the project repo and annotate in my head that only required projects but not depends on function without the app17:12
corvusi think we would like depends-on to work without app installation, so i think changes that help achieve that would be welcome17:13
clarkbthat method also has a comment that seems to say we will fallback to anonymous17:14
clarkbanonymous PR GETs seem to have all the info there except for the checks info. I wonder if we can better support depends on here if we only request checks info when zuul needs it then maybe we can anonymously fetch the PR info?17:15
clarkbI'll try to bring this up with the BMW folks when vacations end17:15
fungiclarkb: anonymous api calls get slotted into a different rate limit, so could be problematic for that reason17:17
fungior at least that was my recollection for why it got updated to prefer authenticated calls whenever possible17:17
clarkbfungi: thats true, but getting rate limited is better than a 401  since it won't happen all the time17:17
clarkbin cases like this I suspect depends on to github are infrequent enough to limit the impact of rate limits. And when we do hit them we can suggest using a different code hosting platform :)17:18
fungii assumed there were a lot more that wouldn't need checks info than just depends-on, but maybe not17:19
clarkbfungi: I think if zuul is acting as the CI system for a github repo then it always needs the checks info since it is updating it17:20
fungithough it could also be a fallback path for 401 results17:20
clarkband if zuul isn't acting as the CI system I think required projects and depends on are the integration points? separately it is crazy to me that you must authenticate to see checks info17:21
fungibut at any rate, more on-topic for the zuul matrix channel i guess17:21
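The fallback idea raised above (prefer authenticated calls, retry anonymously only on a 401) can be sketched like this. It is a hedged illustration, not Zuul's implementation: `fetch_pr_info` and both callables are hypothetical, and each is assumed to return a `(status, body)` pair.

```python
def fetch_pr_info(authenticated_fetch, anonymous_fetch):
    """Try an authenticated request first; on a 401, retry anonymously.

    Both arguments are hypothetical callables returning (status, body).
    """
    status, body = authenticated_fetch()
    if status == 401:
        # Anonymous calls fall under a different rate limit, so they are
        # used only when the authenticated request is rejected.
        return anonymous_fetch()
    return status, body
```

Keeping the anonymous path as a fallback limits how often the anonymous rate limit is consumed, which is the concern fungi raises.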
*** rcastillo|rover is now known as rcastillo18:22
*** pojadhav is now known as pojadhav|out18:23
danieloGreetings. Since about Wednesday of last week, all traffic to the opendev/openstack gerrit from just one of my external IP addresses appears to be blocked. What's the best way to see about seeing if that is the case, and if so, clearing the block?19:27
clarkbdanielo: a couple (what I think anyway) jenkins servers were blocked because they appeared to be improperly configured to request a gerrit event log19:41
clarkbdanielo: I'm guessing one or both of those are yours?19:41
clarkbif we can address the noisy requests that 404 indefinitely (and will continue to do so) then I think we can open that back up again19:42
danieloPossibly; I'm just the network engineer, but I believe it is a Jenkins server.19:42
clarkbthe requests are to /plugins/events-log/ and are performed by apache httpclient running on java 819:43
danieloOK, thanks. What can I tell the admins they need to do to get back in good graces?19:43
danieloGreat. I'll check with the admins and see if that is something they're doing.19:44
clarkbI think we can ask them to update their jenkins configuration to stop perpetually making requests that will 404. We're happy to have them use the ssh event stream.19:44
clarkbWe may have felt more confident blocking the IP due to the failed requests (without double checking there weren't also valid requests). It was done in response to some load issues and we were trying to mitigate where we could19:47
danieloMakes sense19:48
clarkbinfra-root anything else to add to tomorrow's meeting agenda?20:46
fungii have nothing20:46
corvusclarkb: 1 sec20:48
corvusclarkb: i added an item to https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting thxn for the ping20:52
clarkbyou're welcome I see it too so it'll get included20:52
clarkbThe gerrit config update applied (I just double checked the server). In the past ianw has often volunteered to do them when things are quiet. I'll see if ianw wants to do that this time otherwise I'll plan to do it before my day ends21:27
fungii'll probably be around fairly late in my day too21:40
danieloclarkb: The admins claim they are using the Gerrit Trigger plugin with the recommended configuration at https://docs.opendev.org/opendev/system-config/latest/third_party.html#the-jenkins-gerrit-trigger-plugin-way . That said, according to the plugin's docs it has a feature that checks that URL to see if the server has the events-log plugin installed (see https://plugins.jenkins.io/gerrit-trigger/#plugin-content-missed-events-playback )21:51
*** dasm is now known as dasm|off21:51
clarkbdanielo: any chance they can communicate with us? I don't know that we need to play telephone21:51
danieloclarkb: I was thinking the same thing. What's the best way to get them in touch? IRC?21:52
clarkbya synchronous comms here on IRC. Or they can subscribe to our mailing list service-discuss@lists.opendev.org here https://lists.opendev.org/cgi-bin/mailman/listinfo/service-discuss and use email asynchronously21:53
*** dviroel|rover is now known as dviroel|out21:53
clarkbI guess https://plugins.jenkins.io/gerrit-trigger/#plugin-content-gerrit-server-events-log-plugin is the part of the plugin that is giving us trouble. Unfortunately that doesn't indicate if we can disable it21:54
clarkbI think we can probably go ahead and remove the blockage, but it would be good to better understand why jenkins is so spammy now21:55
danieloLooks like you can if you disable the REST API... but that might have other ramifications.21:55
danieloThat is, disable the REST API from the gerrit trigger plugin21:56
danieloDo you still want me to have the admins reach out to you?21:56
clarkbdanielo: I think it would be good for them to know how if they are going to talk to these services. I'm not sure it is urgent here. More that game of telephone is inefficient21:57
clarkbfungi: do you see where in the iptables rules frickler made these drops? I'm not seeing them myself21:58
clarkbI guess it must be the very first rule21:59
clarkbI'll just restart the netfilter persistent service which should reapply our normal ruleset22:00
clarkbdanielo: thats been done if you want to have them double check things are happy again22:02
danieloclarkb: checking...22:02
clarkbinfra-root: side note restarting netfilter-persistent has a side effect of blowing away the docker chains. This doesn't (currently) affect us because we use host networking22:03
clarkbinfra-root: but that is something we should be aware of should we ever stop using host networking exclusively for containers22:03
danieloclarkb: still no responses to Gerrit SSH (29418/tcp), HTTPS (443/tcp) or ping from that source address. If it helps, I can give you my last octet.22:10
clarkbdanielo: ya we should probably double check the addrs. But I reset our entire set of iptables rules so there shouldn't be anything on our side blocking you now22:11
clarkband I've just double checked that the port is generally open from my location (so there isn't some widespread issue)22:13
ianwi'm happy to do a gerrit restart later22:16
clarkbianw: thanks22:16
clarkbdanielo: manual testing can be done via something like `ssh -p 29418 yourusername@review.opendev.org gerrit ls-projects` this should print a couple thousand lines of text containing the repo names in the gerrit server22:17
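Before trying the ssh command, plain TCP reachability of the Gerrit SSH port can be checked with a short script. This is only a sketch: `port_open` is a hypothetical helper, and a successful connection says nothing about authentication or about Gerrit itself being healthy.

```python
import socket


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and resolution failures.
        return False
```

For example, `port_open("review.opendev.org", 29418)` would be expected to return False from a blocked source address.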
danieloclarkb: last octet is 21; it works from other addresses in the same prefix22:18
clarkbok the one in the rule we had ended in a 5. However, in our logs discussing this back on wednesday there was a .21 mentioned. So now I'm wondering how frickler updated the fules22:20
clarkbs/fules/rules/22:21
clarkbfungi: ianw ^ ideas?22:21
* clarkb attempts to check security group rules out of completeness22:21
ianwhrm, i imagine it would only be iptables, confirmed it looks clear to me22:23
clarkbsecurity groups appear clear too if I'm reading things correctly22:24
clarkbunfortunately frickler didn't specify how things were blocked22:25
ianwtcp dpt:29418 flags:FIN,SYN,RST,ACK/SYN #conn src/32  is the connection limit rule right?22:25
clarkbianw: yes22:25
ianwseems to be the only thing it might have hit22:26
clarkbapache isn't at play here since it is port 29418 and ansible would've reset that anyway22:27
fungiand the connection limit overflow is only for too many concurrent connections (like 100)22:28
clarkbneutron's firewall as a service doesn't seem present in that cloud so no rules there22:29
corvusclarkb: `ip route`22:31
fungioh!22:32
fungiindeed, there are a ton of blackhole entries in the routing tabke22:32
fungitable22:32
clarkboh wow22:32
clarkbfrickler: ^ can we please use the firewall to drop packets22:32
*** rlandy is now known as rlandy|out22:32
clarkbI think for me at least thats far more intuitive since that is the job of a firewall22:32
fungiso anyway, we should be able to ip ro del those22:33
clarkbfungi: thanks /me is pulling up manpages now22:34
corvusi think if the pattern holds, probably `ip route del blackhole A.B.C.D` or `ip route del blackhole A.B.C.0/24` depending on the rule22:35
clarkbfungi: looks like `ip route delete blackhole ipspecifier` ?22:35
clarkbcorvus: yup thats what I just concluded reading the manpage too22:36
clarkbfungi: ^ if you agree I can run that for the ip in question22:36
fungiyeah, or may be able to just get away with ip ro del a.b.c.d/e22:36
clarkbcool trying that now22:36
fungiusually the destination is enough to identify the route22:36
clarkbok thats done seems to have removed it22:37
clarkbdanielo: ^ if you want to try again22:37
danieloclarkb: It's alive! Thank you for your help today.22:37
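The cleanup discussed above (find the blackhole entries in `ip route` output, then delete them) can be sketched as follows. The helper names are hypothetical, the sample addresses are documentation ranges rather than the real ones, and the sketch assumes each blackhole entry prints as `blackhole <destination>`. It only builds the command lists and does not run anything.

```python
def blackhole_routes(ip_route_output):
    """Return the destinations of blackhole entries in `ip route` output."""
    routes = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        # Assumed format: "blackhole <destination> ..."
        if len(fields) >= 2 and fields[0] == "blackhole":
            routes.append(fields[1])
    return routes


def delete_commands(routes):
    """Build one `ip route del blackhole <dest>` argv list per route."""
    return [["ip", "route", "del", "blackhole", dest] for dest in routes]


sample = """default via 203.0.113.1 dev eth0
blackhole 192.0.2.21
blackhole 198.51.100.0/24
203.0.113.0/24 dev eth0 proto kernel scope link src 203.0.113.10
"""
commands = delete_commands(blackhole_routes(sample))
```

Each entry in `commands` could be passed to `subprocess.run` by an operator with root privileges after reviewing the list.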
clarkbwe can followup with frickler on cleaning up the other routes tomorrow. In general I have a preference myself for using the firewall to drop packets since in my head that is the job of a firewall and not a router. But I'm open to using the route table for this and learning to check it in the future if people prefer it22:38
clarkbI suspect it is slightly cheaper to use the route table which may be one reason to prefer it (but linux network is complicated enough this may not be true)22:39
corvusthe routing table is counter-intuitive for me as well; i would have expected iptables.22:39
fungithe problem with blackhole routing return paths is that i don't think iptables/netfilter is smart enough to factor urpf into its filtering decisions, so lets the initiating packets in and starts socket negotiation and adds state table entries22:42
clarkbfungi: but if we block with iptables then iptables is aware of the whole decision making chain and can manage the state properly?22:43
fungiyes, well telling packet filtering to block the first incoming packet means it never gets far enough to create state table entries nor start socket negotiations which will never complete22:44
corvus++22:44
fungiblackhole routing is analogous to egress filtering22:45
fungiso in that sense, for these sorts of situations, ingress filtering seems preferable22:45
fungiwe could set up source route blackholing too i think, which would be more akin to ingress filtering, problem there is order of operations. iptables is hooking a bpf (i think? this may have changed more recently in kernel land) so sees the packets before they reach the routing table to get discarded by any source routing rules22:47
fungitypically, ingress filters are the first thing to see incoming packets, and egress filters are the last thing to see outgoing packets. routing happens after ingress filtering and before egress filtering22:48
clarkbseparately https://issues.jenkins.io/browse/JENKINS-69338? appears to be the issue tracker for the gerrit trigger plugin22:50
clarkber https://issues.jenkins.io/browse/JENKINS-69338?jql=resolution%20is%20EMPTY%20and%20component%3D1573122:50
clarkbI think I have an account on this server22:53
clarkboh looks like they also use the github repo issue tracker22:54
clarkboh wait no thats a sub library that they have upstream of them22:57
clarkbdanielo: I've discovered that you can modify the check period in the plugin: Long.getLong("com.sonyericsson.hudson.plugins.gerrit.trigger.playback.checkEnabledPeriod"23:00
clarkbdanielo: the default is 2ms (I don't know why 2ms was chosen) but maybe ya'll could set that to something like once a week? It is very unlikely we'll add that plugin any time soon23:01
clarkbthough digging around through the source I'm not sure where you are supposed to set that value. It may just always fail to find it and use the default? That would be unfortunate23:05
clarkbI'm assuming there must be some configuration field that you can edit that into. Maybe it has to go directly into some xml and isn't exposed via the web? It's been a long time since I used jenkins and not sure what the current method for doing that is23:08
clarkboh have I misread that and it is actually 2 seconds but carried as a millisecond value? In any case checking every 2 seconds if a plugin has been installed to a gerrit server isn't much better than every 2ms23:14
clarkbit should check on the order of days by default imo. Its not like plugins are installed and removed from gerrit servers frequently23:14
danieloclarkb: That is excessive. I'll ask the admins to see if they can figure out a way to reduce the frequency. If they do, I'll ask them to share it with you so you can update the Sphinx docs.23:15
clarkbdanielo: that would be excellent, thank you23:15
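To put the check period discussed above in numbers, here is a small calculation of how many checks a single client makes per day. It assumes, following clarkb's reading, that the period is expressed in milliseconds. `checks_per_day` is a hypothetical helper, not part of the plugin.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def checks_per_day(period_ms):
    """Number of checks made in one day for a period given in milliseconds."""
    return SECONDS_PER_DAY * 1000 // period_ms


every_two_seconds = checks_per_day(2_000)         # 43200 checks per day
every_day = checks_per_day(SECONDS_PER_DAY * 1000)  # 1 check per day
```

A two-second period therefore produces 43,200 requests per day from one Jenkins server, all of which 404 on a Gerrit without the events-log plugin.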
