Tuesday, 2020-12-01

*** sboyron has quit IRC00:23
*** hamalq has quit IRC06:21
*** hamalq has joined #opendev-meeting06:22
*** zbr has quit IRC06:52
*** sboyron has joined #opendev-meeting06:52
*** zbr has joined #opendev-meeting06:53
*** hashar has joined #opendev-meeting08:10
*** zbr has quit IRC09:51
*** zbr has joined #opendev-meeting09:56
*** zbr has quit IRC10:58
*** zbr has joined #opendev-meeting10:59
*** zbr has quit IRC11:00
*** zbr has joined #opendev-meeting11:01
*** zbr has quit IRC11:03
*** zbr has joined #opendev-meeting11:05
*** zbr has quit IRC11:07
*** zbr has joined #opendev-meeting11:17
*** hashar is now known as hasharAway15:20
clarkbanyone else here for the meeting?19:00
diablo_rojoo/19:00
ianwo/19:00
clarkbcool we shall get started momentarily19:00
clarkb#startmeeting infra19:01
openstackMeeting started Tue Dec  1 19:01:13 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
*** openstack changes topic to " (Meeting topic: infra)"19:01
openstackThe meeting name has been set to 'infra'19:01
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2020-November/000137.html Our Agenda19:01
clarkbWe have an agenda, trying to get things back to normal after an eventful few weeks19:01
clarkb#topic Announcements19:01
*** openstack changes topic to "Announcements (Meeting topic: infra)"19:01
clarkbWallaby cycle signing key has been activated https://review.opendev.org/76036419:01
clarkbPlease sign if you haven't yet https://docs.opendev.org/opendev/system-config/latest/signing.html19:01
clarkbat this point this is there mostly as a reminder for myself as I have failed to sign it so far :(19:02
clarkb#topic Actions from last meeting19:02
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)"19:02
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-11-24-19.01.txt minutes from last meeting19:02
clarkbLast meeting we didn't have a formal agenda and instead went through gerrit upgrade related items.19:02
clarkbThere are still a few of those to talk through which we will get to shortly19:03
clarkb#topic Priority Efforts19:03
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)"19:03
clarkb#topic OpenDev19:03
*** openstack changes topic to "OpenDev (Meeting topic: infra)"19:03
clarkbWe've been working through the debugging of system load on Gerrit. We've had a few good leads so far but nothing that has made it go completely away19:03
fungiohai19:03
clarkbIn particular someone else on the Gerrit mailing list was struggling with similar on Gerrit 2.16 and the discussion there pointed to caches19:04
corvuso/19:04
clarkbfungi and I have since been trying to tune our cache sizes based on the info that ssh review gerrit show-caches gives us19:04
clarkbI think this has helped but it hasn't completely made things happy yet19:04
fungiworth noting, we think there is correlation between the "missing tree" errors people get on push and the elevated system load19:05
clarkbWe also noticed that there is a jgit receive.autogc setting that runs git gc when code is pushed. We set that but literally just 5 minutes ago I realized we set it in the wrong config file19:05
clarkbthere is not a jgit config file so I imagine next up is getting that moved into the correct config file19:05
fungithough it's so far only been observed on large repos, generally while or immediately after pushing change series19:05
clarkbwhich could be related to the autogc thing maybe? the gerrit docs note that disabling it is recommended (despite being enabled by default) due to the load impact it has19:05
fungiyeah, i'll work on adding the jgit.config after this meeting19:06
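A minimal sketch of the change being discussed, assuming the setting lives in a git-config style `jgit.config` file under the Gerrit site's `etc/` directory and that the goal is to turn `receive.autogc` off. The helper name `render_jgit_config` is hypothetical and this is not the change that was actually pushed:

```python
def render_jgit_config(autogc: bool = False) -> str:
    """Render a git-config style fragment for the receive.autogc setting.

    Assumption: the file uses standard git config syntax, i.e. a
    [section] header followed by tab-indented "key = value" lines.
    """
    value = "true" if autogc else "false"
    return "[receive]\n\tautogc = " + value + "\n"


if __name__ == "__main__":
    # Print the fragment so it can be reviewed before it is deployed.
    print(render_jgit_config(), end="")
```

Rendering the fragment from a function keeps the value reviewable in one place; where the file is installed is left to the deployment tooling.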
clarkb3.3.0 release notes imply that it will now be disabled by default though so maybe they decided making things bad by default was not recommended19:06
fungithe release note on that is a little vague/confusing, to be perfectly honest19:06
clarkbIn conversation with Luca on the Gerrit slack he says that Java 11 is likely also to have some performance benefits. Gerrithub has been running 3.2 on java 11 since the beginning of this release19:06
clarkbfungi: yup19:06
clarkbI think this means we should also look at landing java 11 support in our images then switch over prod to java 11. Fungi switched review-test over to java 11 this morning19:07
fungiyeah, if folks want to beat on review-test at all, that's helpful19:07
clarkbAnd I'm hoping that this afternoon I'll have time to update the image jobs to build a 3.3 as well19:07
fungii feel like we should land the openjdk 11 patch before trying the upgrade to gerrit 3.3, fwiw19:08
clarkbI agree19:08
fungithat way if we see new issues we have a better idea of what brought them in19:08
ianw++19:08
ianwnot to derail, but how important do we think upgrading the host from xenial is too?19:08
fungialso we need to do openjdk 11 before we upgrade to (not yet existent) gerrit 3.419:09
fungisince they're planning to drop support for <11 at that release19:09
clarkbianw: I think that is reasonably important, but not urgent. eg we should be able to schedule that and warn people of the upcoming new IP address19:09
clarkbif someone wants to start looking at what that would require I would be grateful :)19:09
corvusmy understanding is it should only be important for OS support reasons19:10
corvusnot for java version/performance reasons19:10
corvusis that correct or do we think there's a perf benefit?19:10
clarkbcorvus: generally linux benchmarking gets worse as you get newer kernels19:10
clarkbI would actually expect a performance impact (if I had to guess without testing)19:10
fungiyeah, i think the os upgrade would just be more because xenial reaches eol in a few months19:11
clarkbphoronix does generic benchmarking of linux over time if people want to see why I would assume that19:11
ianwyep, and also if you're spending time debugging things and it does get down to the kernel/container-ish layer better to be debugging something current19:11
clarkbianw: ya thats true19:12
fungiand on that note, sometime soon we should also talk out a plan for how we would actually do the upgrading to focal... options are to build a new vm and then we have new ip addresses to warn folks about (given how many we know are stuck behind corporate firewalls with special rules allowing 29418/tcp to our server's current address) or do in-place upgrades19:12
clarkbI think I still strongly prefer the new host method19:12
ianwi feel like last time we went with in-place19:12
corvusit sounds like it's a wildcard and could go either way, so i'd lean towards deferring os upgrade until we've stabilized or run out of other things19:12
fungii do too, but in that case we need to decide on a communication schedule19:12
clarkbcorvus: ++19:12
fungicorvus: yes, i agree we should hold off the os upgrade until we have known performance for the container on the current os version19:13
corvusdo we want to see about putting together an http-only recommendation for third-party ci before host replacement?19:13
clarkbThe other thing I wanted to bring up is tristanC has done some plugin work to do zuul results table rendering. I've been too distracted by other things, but do others think that is in a place that we should consume it? I think if I had any concerns it's that it is written in another esoteric language that compiles to js/java aiui19:13
clarkbcorvus: based on some of the responses I've gotten so far I think a lot of third party CIs would struggle with that19:14
clarkba non zero number are still stuck on zuul v219:14
corvusthey would have a choice about what kind of struggle19:14
fungialso how would http-only work? are we planning to add the checks plugin?19:15
corvusfight internal network rules or upgrade software to supported versions19:15
clarkbfungi: that is a good question19:15
corvusfungi: that's the question; i'm not sure checks has a long-term future, but it does exist and has no limitations for the third-party ci use-case (it does for a full gating system); an alternative may be webhooks.19:15
fungiright now we're not offering them an alternative for the stream-events cli19:15
clarkbcorvus: is webhooks another plugin option?19:16
corvusyep19:16
fungiso while i think http-only sounds great, we'd probably need to decide what that looks like and get it available first19:16
corvusafaik, its supporters do have a long-term interest19:16
clarkbfungi: ya sounds like something to do more investigating for19:16
ianwthere's also now the "findings" tab?  if i've understood, you're supposed to put "autogenerated" on your review comment to be in there?19:17
corvusfungi: agreed (is why i raised it -- do we want to look into setting that as a goal?)19:17
clarkbianw: I think zuul is doing that?19:17
corvusyes has been for some time19:17
corvusi believe findings are different (at least, last time i was exposed to the design doc)19:17
fungiianw: robot comments are toggleable, zuul has done that by default for ~ a year19:18
fungiand yes, robot comments and findings are separate things19:18
ianwi haven't yet managed to find the documentation on how to get anything into "findings"19:18
corvusianw: are you suggesting findings tab as alternative to results table rendering?19:18
fungithe checks plugin puts things in findings19:18
ianwcorvus: not really as i don't understand it, but i mean it does seem like a summary of the latest zuul results is a "finding"19:20
corvusclarkb: i haven't seen tristanC's table; is there a ml message or other link or something?19:20
corvusianw: have a link to an example?19:20
fungiianw: what "robot comments" (autogenerated) do is hide things when you switch the "only comments" slider in the "change log" section of the change view19:20
ianwcorvus: yes, let me did, it was rolled out on a test instance19:21
ianwdig19:21
clarkbcorvus: https://review.opendev.org/c/opendev/system-config/+/763891 is the change19:21
clarkband ya the job that test gerrit installation on ^ was held aiui for people to test it19:21
corvusfungi: (at some point i understood "robot comments" to be a new type of comment associated with checks plugin vs the "regular old comments" which may or may not have the 'autogenerated' tag19:22
ianwhttps://104.130.172.52/c/openstack/diskimage-builder/+/55400219:22
ianware we onto talking about the table?  because i'd like to run some things about gerrit gate testing by the peanut gallery19:23
fungicorvus: oh, interesting, it's possible i've confused them but i kept seeing them mentioned as the same thing19:23
corvusclarkb, tristanC: there are 2 zuul plugins for gerrit19:23
corvusclarkb, tristanC: is there any way maybe we could contribute to one or more of those?19:23
clarkbcorvus: yes I strongly encouraged tristanC to do so, but was told there is no interest in learning java or js19:24
corvusi believe tristanC knows js19:24
corvusunless tristanC forgot js?19:24
clarkbwhich is one of my concerns with using the sf thing, its in a random language that tristanC finds acceptable rather than the upstream tooling19:24
clarkbcorvus: I dunno that is just what I was told last week when it came up19:24
clarkbI believe this particular plugin is written in some language that compiles to js19:25
ianwyeah, but "javascript" these days is similar to assembly language really19:25
fungithe main thing i've wondered about scope-wise is whether a pg plugin for displaying a summary table of arbitrary third-party ci comments/votes is relevant to the zuul plug-in, but maybe if zuul is the reference for the comment format then it could be19:25
corvusianw: i think that's a bit of a stretch19:25
corvusfungi: displaying zuul results is absolutely relevant19:25
fungiyep, and so if other ci systems leave comments which look like zuul results, then supporting that as part of the zuul plugin seems sane enough19:26
corvushttps://gerrit.googlesource.com/plugins/zuul-status/19:26
ianwcorvus: maybe, but i mean https://104.130.172.52/plugins/zuul-results/static/zuul-results.js19:26
corvusDisplays zuul status on PolyGerrit change19:26
clarkbcorvus: http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-11-23.log.html#t2020-11-23T15:10:5519:26
corvusianw: i'm not sure what the point you're making is19:27
corvusianw: that is clearly a minimized and obfuscated file; i don't deny the existence of such things19:27
corvusi only say that plenty of people write javascript as the input to creating such files19:27
corvusthe fact that there are minimized js files doesn't mean we need to learn new languages; the upstream polygerrit plugins are written in something resembling js, right?  so collaboration with others could be done that way, and since we've managed to teach some zuul devs how to do some basic js, they may be able to contribute too19:28
clarkbcorvus: yup agreed. Maybe the best thing here is to hold out and see if we can upstream support for this into an existing plugin first19:29
ianwright, anyway i guess the exact point at hand is that this concrete proposal for adding this table is written in https://reasonml.github.io/ and we probably have to decide if we want to incorporate that19:29
corvusclarkb: based on that convo, it seems like we're saying "someone needs to learn polygerrit" vs "someone needs to learn reasonml"19:29
ianwin terms of the bigger picture, of testing plugins, i think we should do some work there too.  fungi suggested on-list that we should hold a node to test the plugins, which sort of works19:30
clarkbright which I still think would be better if that is the toolchain gerrit has attached to19:30
corvusianw: i've already -2d one change to add reasonml to zuul based on the lack of support for our last experiment with an esoteric language19:30
clarkbbecause then we're collaborating in that ecosystem rather than setting off on our own and being different19:30
ianwhowever, getting reviews into that held gerrit that look useful enough to test the plugin is a bit of a pain19:30
fungiianw: yeah, what i didn't consider at the time was that we also need to get some representative content into the held gerrit somehow19:31
clarkbianw: fungi could we autogenerate some content?19:31
fungiwe could instead demo things on review-test for now, i suppose, and hold off deleting it19:31
corvusi mean, i like playing with esoteric functional languages, don't get me wrong, but as a group we don't have the best track record there, whereas i think there's a bigger chance we can get more long-term collaboration/support by sticking with how upstream does plugins19:31
clarkbmake a project, push some changes, merge a change or two, etc19:31
corvusclarkb: ++ 'collaborating in that ecosystem'19:31
ianwclarkb: yes, i think so ... but we need to figure out adding the first admin user automatically19:31
clarkbianw: the zuul all in one stuff does that, I bet we can reuse it19:32
fungiianw: i have that figured out19:32
corvusyou just need to leave a comment to test this, right?19:32
clarkbcorvus: ya a zuul formatted comment I think (maybe the username matters too? I'm not sure)19:32
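A rough sketch of what turning a Zuul-formatted comment into table rows could look like, assuming each job line has the shape `- <job-name> <url> : <RESULT> in <duration>`. That line shape, the `JobResult` type, and the `parse_results` helper are assumptions made for illustration, not the logic of any existing plugin:

```python
import re
from typing import List, NamedTuple


class JobResult(NamedTuple):
    job: str
    url: str
    result: str


# Assumed job line shape: "- <job-name> <url> : <RESULT> in <duration>"
LINE_RE = re.compile(r"^- (?P<job>\S+) (?P<url>\S+) : (?P<result>[A-Z_]+)")


def parse_results(comment: str) -> List[JobResult]:
    """Collect one row per line that matches the assumed job line shape."""
    rows = []
    for line in comment.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows.append(JobResult(match["job"], match["url"], match["result"]))
    return rows


if __name__ == "__main__":
    example = (
        "Build failed (check pipeline).\n\n"
        "- tox-pep8 https://zuul.example.org/build/1 : SUCCESS in 3m 02s\n"
        "- tox-py38 https://zuul.example.org/build/2 : FAILURE in 5m 10s\n"
    )
    for row in parse_results(example):
        print(row.job, row.result, row.url)
```

Lines that do not match the pattern, such as the summary sentence, are simply skipped, so a comment from another CI system only produces rows if it follows the same layout.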
fungithere are probably multiple ways to create an initial admin account, but one is to use the gerrit cli with the built-in "gerrit code review" user19:32
ianwfungi: ok, i think we should go through together out of meeting maybe, and see if we can get the test job doing it19:32
corvusfor hideci, yes; but hopefully we can omit that in the future -- comment tags are a thing :)19:32
fungii think the zuul quickstart just uses become auth right?19:33
fungibeen a while since i looked at that bit19:33
ianwat that point, it seems like it would also be easy to use a headless browser to take a screenshot of a review, which would make it easy to have an artifact confirming plugins working19:33
ianwand we can also hold the node for manual fiddling19:33
fungithat also sounds really awesome19:33
ianwthere's some flag, DEVELOPMENT_BECOME_ANY_ACCOUNT which i didn't fully get to understanding last week19:34
fungiianw: the alternative is the mechanism i describe in the gerrit admins section of our system-config docs. that works even on a gerrit with no existing accounts19:35
corvusalso, ftr, i suspect it's perfectly fine to make a new plugin if this doesn't fit with zuul-status; i don't get the impression that lots of small plugins are necessarily bad.19:35
clarkbcorvus: that is a good point. It seems the more important bit is using the toolchains upstream is using then they may get involved and help us19:35
ianwfungi: ok, that was what i was trying but wasn't getting an admin account.  i think we should try again19:35
clarkbI think the gerrit maintainers do actually do a reasonable amount of plugin work to keep them working as things change in gerrit19:35
clarkbsupporting that work would be a good idea imo19:36
ianwi've already engaged on the thread; i can write a summary to respond if we like19:36
clarkbthat sounds like a good way to recap this discussion for those who may not be hear19:36
fungithat reminds me, paladox contributed an opendev theme override (with light and dark mode support) as what i think is a pg plugin, but it's just an sgml/html blob in a paste. i was going to try to learn how to integrate that19:36
clarkbs/hear/here/19:36
ianwit sounds like basically a) we're not currently convinced on the separate project, especially in a language that doesn't have a lot of exposure, and would like to investigate integrating with upstream more19:37
ianwand b) we'd like to expand the overall plugin testing environment to make it easier19:37
clarkbianw: ++19:37
ianwi'll draft something and loop people back19:37
clarkbthank you19:37
fungialso if that discussion thread wasn't on service-discuss, could it be redirected there?19:38
fungii have a feeling it might have ended up on openstack-discuss19:38
ianwi can cc, i think it was openstack discuss only19:38
corvusyeah, fwiw i have no idea what thread is being discussed :(19:38
corvusalso, friendly reminder that there is a zuul running for the purposes of testing plugins in the upstream gerrit; i have no idea what testing means for polygerrit plugins; that may be interesting to learn19:39
ianw#link http://lists.openstack.org/pipermail/openstack-discuss/2020-November/019051.html19:39
fungii do recall replying on it, but in retrospect i should have asked people to follow up to service-discuss19:39
ianwfor reference19:39
corvus(it's mostly testing java plugins)19:39
fungithanks ianw19:39
clarkbalright anything else on Gerrit before we move on?19:40
fungipart of the problem is i subscribe to lots of mailing lists and dump them into the same folder, so sometimes it's not immediately apparent to me if people have started discussions in the wrong ml19:40
fungimaybe we should agree to move forward with the jdk update asap?19:41
fungiother than that, no i think we've got things pretty well covered19:41
clarkbI'm on board, its being tested on review-test. If others can give that a quick check then we're probably good to proceed on that19:41
clarkbthinking out loud here: do the jgit autogc config first maybe? then do java 11 next?19:41
clarkbjust to do one thing at a time and autogc fix seems simpler19:41
fungiyeah, i'll push that change up after the meeting19:42
clarkbthanks19:42
clarkb#topic Update Config Management19:42
*** openstack changes topic to "Update Config Management (Meeting topic: infra)"19:42
clarkbIs there anything new on this effort to call out? I don't think so but I'm double checking19:43
fungithe codesearch rebuild maybe?19:44
clarkboh ya ianw ^ is that complete at this point?19:44
fungiwe have two servers at the moment still, right?19:44
fungioh, actually it's a cname now19:44
ianwno i cleaned the old one up, that should be all finished now19:44
fungiawesome, thanks!19:44
ianwnobody has complained so i assume it's working perfectly :)19:44
clarkbexcellent19:45
fungiyes, i was making a point to use the opendev one so i would test it19:45
fungiand have had no problems19:45
clarkb#topic General topics19:45
*** openstack changes topic to "General topics (Meeting topic: infra)"19:45
clarkb#topic Bup and Borg Backups19:45
corvusianw: ++ thanks!19:45
*** openstack changes topic to "Bup and Borg Backups (Meeting topic: infra)"19:45
clarkbI think we're getting more and more comfortable with borg? I've unfortunately had little time to interact with it more recently19:46
clarkbianw: I know at some point you wanted to do verification then drop bup?19:46
fungii should practice with restoring something i guess19:46
clarkbmaybe a good thing to try and do before dropping bup is having other admins do things like ^19:46
ianwyeah, i was thinking what i'll do is a config change to remove the bup cron jobs; people can audit the borg changes and approve that when happy19:46
clarkbianw: that sounds like a reasonable plan19:47
fungii'm down19:47
clarkband that is important for the focal upgrades we were talking about earlier too19:47
ianwthen we can kill all the puppet bits and maybe just attach the old backup volumes to the new server for a bit19:47
clarkbsince bup and python3 don't mix19:47
clarkbthank you for getting this moving and doing all that work, really appreciated19:48
clarkb#topic Docker Rate Limits are Being Seen in CI19:48
*** openstack changes topic to "Docker Rate Limits are Being Seen in CI (Meeting topic: infra)"19:48
clarkbThis is mostly a heads up/fyi19:49
ianwmostly in NAT environments?19:49
clarkbjobs particularly those running on limestone seem to hit this19:49
clarkbianw: ya, though I would've expected it to hit all environments fairly equally due to our use of mirrors? But maybe we aren't using the mirrors the way I thought we were19:49
fungiyeah, we're not seeing it so much on our proxies as on limestone nat for jobs not using the proxy19:49
clarkba few weeks back I pushed up changes to switch our zuul mirror config for docker over to just using the host addrs rather than the mirror. I don't think we need to land those yet since its NAT getting us19:50
clarkbbut something to be aware of and maybe we need to bring that conversation for getting our images open source specialled again19:50
clarkbjbryce was going to look at the agreement in more detail and get back to us but I think like us has been busy19:50
clarkbanother option is to use quay which does not rate limit19:51
clarkbbut does have outages when aws east goes down19:51
clarkbI don't have answers, just info for people to digest :)19:51
corvusor make a new kind of pass-through proxy/mirror19:51
clarkbya one that understands it needs to be a sort of lru cache19:52
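To illustrate the "sort of lru cache" idea, here is a small in-memory sketch of a pass-through cache that only goes upstream on a miss and evicts the least recently used entry when full. The `LRUCache` class and its `fetch` callback are hypothetical; a real registry proxy would also need persistence, expiry, and the auth handling discussed in this meeting:

```python
from collections import OrderedDict
from typing import Callable, Hashable


class LRUCache:
    """Pass-through cache: serve hits locally, fetch misses from upstream."""

    def __init__(self, capacity: int, fetch: Callable[[Hashable], bytes]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.fetch = fetch  # hypothetical upstream lookup, e.g. a registry GET
        self._items = OrderedDict()
        self.upstream_requests = 0

    def get(self, key: Hashable) -> bytes:
        if key in self._items:
            # Hit: mark the entry as most recently used.
            self._items.move_to_end(key)
            return self._items[key]
        # Miss: one upstream request, then remember the result.
        self.upstream_requests += 1
        value = self.fetch(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # Evict the least recently used entry (front of the ordering).
            self._items.popitem(last=False)
        return value


if __name__ == "__main__":
    cache = LRUCache(2, lambda name: ("manifest for " + name).encode())
    for name in ["a", "b", "a", "c", "a", "b"]:
        cache.get(name)
    print("upstream requests:", cache.upstream_requests)
```

Counting upstream requests is the part that matters for rate limits: every hit is one request that never reaches the upstream registry.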
corvusyup; i believe that's doable and much of the code in zuul-registry can be repurposed for that19:52
corvus(but still, it's not a trivial project, so one that we should deliberately choose)19:52
fungialso possibly a more useful effort than trying to bend something like squid to cache "authenticated" requests19:52
ianwso that would authenticate with a higher-limited key, and transparently pass through all our requests?19:53
clarkbthough possibly squid would be better for our http caching we do on those hosts in general19:53
clarkbsince in theory it can be more flexible than what apache is currently doing19:53
corvusianw: or even anonymously but just stay under the limit?19:53
clarkbya docker hub sends the required cache control headers to cache publicly those manifests19:53
clarkbthe issue is that apache will not cache any authenticated request even with those headers19:53
fungithere's no "anonymously" really though, right?19:54
clarkbwe believe squid can be convinced to do so though19:54
corvusclarkb: do you think the squid approach will work with all the weird auth stuff?19:54
clarkbcorvus: I think so if we can make it respect the cache-control: public or whatever header it is that is sent back by docker hub19:54
corvusfungi: in the way i intended to use it, yes (an auth credential obtained with no identifying information)19:54
ianwwhat we have not traditionally done is limit our mirrors to only be connectable from their respective clouds; we might want to think about that if we're using an opendev specific key19:54
corvusfungi: (authz without authn i guess?)19:54
fungiit's been years since i've done esoteric things with squid (including trivially patching it to ignore some things which would cause it not to cache but that it lacked configuration for), so it would need a poc regardless19:55
clarkbI think the major issue with apache as we use it for this problem space is that it will never cache a request that had an authorization header even if cache control says it is ok to do so19:55
clarkbif apache could be convinced to do ^ it would probably be fine too. Since it is now the manifest data that we need to cache19:55
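The behaviour being asked for matches the HTTP caching rule (RFC 7234, section 3.2) that a shared cache may reuse a response to a request carrying an Authorization header only when the response includes `public`, `must-revalidate`, or `s-maxage`. A simplified sketch of that decision follows; the function name is hypothetical, freshness and other storage rules are ignored, and this is not how Apache or Squid are actually configured:

```python
from typing import Mapping


def shared_cache_may_store(
    request_headers: Mapping[str, str],
    response_headers: Mapping[str, str],
) -> bool:
    """Simplified check for whether a shared cache may store a response."""
    req = {k.lower(): v for k, v in request_headers.items()}
    resp = {k.lower(): v for k, v in response_headers.items()}
    # Reduce "max-age=60, public" to the directive names {"max-age", "public"}.
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in resp.get("cache-control", "").split(",")
        if part.strip()
    }
    if "no-store" in directives or "private" in directives:
        return False
    if "authorization" in req:
        # Authenticated requests need an explicit opt-in from the origin.
        return bool(directives & {"public", "must-revalidate", "s-maxage"})
    # Simplification: treat everything else as storable.
    return True


if __name__ == "__main__":
    request = {"Authorization": "Bearer example-token"}
    print(shared_cache_may_store(request, {"Cache-Control": "public, max-age=60"}))
    print(shared_cache_may_store(request, {"Cache-Control": "max-age=60"}))
```

Under this rule a manifest response marked public could be cached even though the request carried a token, which is the case described above.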
corvusbased on my estimation of effort, it sounds like spending a couple of days attempting to get squid to work should take precedence over a couple of weeks to implement a smart registry proxy19:56
corvus(or, you know, convince everyone to use quay.io :)19:56
fungiyeah, like i said, there have been times when i had to patch and recompile squid to get it to cache some stuff too, so i don't want to say it's necessarily better than apache mod_proxy, and i don't personally think being stuck maintaining our own patched build of either of those is particularly wise19:57
fungithe first thing it really needs is exploration19:58
clarkb++19:58
clarkbpart of the issue in the past is the info from docker was a bit vague19:58
clarkbbut now we've got a bit more real world data and we should be able to work with that to find a reasonable solution19:58
clarkbalright we are just about at time so I'll call it here19:59
clarkbthanks everyone19:59
fungithe other part of the issue was that it was an advance warning about stuff they weren't actually doing yet, yeah19:59
funginow it's observable and testable at least19:59
clarkbfeel free to continue any/all of these conversations in #opendev19:59
clarkb#endmeeting19:59
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev"19:59
openstackMeeting ended Tue Dec  1 19:59:39 2020 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:59
openstackMinutes:        http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-12-01-19.01.html19:59
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-12-01-19.01.txt19:59
openstackLog:            http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-12-01-19.01.log.html19:59
fungithanks clarkb!19:59
*** hasharAway is now known as hashar20:14
*** hashar has quit IRC22:16
