Tuesday, 2020-12-08

*** hamalq_ has quit IRC03:45
*** hamalq has joined #opendev-meeting06:44
*** sboyron has joined #opendev-meeting07:51
*** hashar has joined #opendev-meeting08:07
*** hamalq has quit IRC09:10
*** hamalq has joined #opendev-meeting09:14
*** hamalq_ has joined #opendev-meeting09:16
*** hamalq has quit IRC09:18
*** hamalq_ has quit IRC09:20
*** hamalq has joined #opendev-meeting10:00
*** hamalq has quit IRC10:04
*** hashar is now known as hasharLunch11:26
*** hamalq has joined #opendev-meeting12:01
*** hamalq has quit IRC12:05
*** hasharLunch is now known as hashar12:37
*** hamalq has joined #opendev-meeting14:02
*** hamalq has quit IRC14:07
*** hamalq has joined #opendev-meeting14:17
*** hamalq has quit IRC14:22
*** hashar has quit IRC15:31
*** hashar has joined #opendev-meeting16:00
*** hamalq has joined #opendev-meeting16:18
*** hamalq has quit IRC16:23
*** hamalq has joined #opendev-meeting16:56
*** hamalq has quit IRC17:01
*** hamalq has joined #opendev-meeting17:06
*** guillaumec has joined #opendev-meeting17:07
*** hashar is now known as hasharDinner18:02
fungiahoy!19:00
clarkbhello19:00
*** frickler has joined #opendev-meeting19:00
clarkbwe'll get started shortly (if you are looking for the opendev infra meeting you are in the right place)19:00
clarkb#startmeeting infra19:01
openstackMeeting started Tue Dec  8 19:01:14 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
*** openstack changes topic to " (Meeting topic: infra)"19:01
openstackThe meeting name has been set to 'infra'19:01
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2020-December/000151.html Our Agenda19:01
clarkb#topic Announcements19:01
*** openstack changes topic to "Announcements (Meeting topic: infra)"19:01
ianwo/19:02
clarkbI intend to be far away from keyboards next week. I'll also be doing school duties to give my wife a break from that so will be distracted either way. This means that we will need a meeting chair volunteer or we can cancel the next meeting19:02
clarkbthen for the 22nd and 29th I figured we'd play it more by ear as others are also likely taking time?19:02
corvusclarkb: are you away all next week?19:03
fungiwith things getting quiet, having fewer meetings might also just be nice19:03
clarkbcorvus: ya sorry, trying to take the week off and get some rest/reset19:03
corvusdon't be sorry :)19:03
fungii heard he's taking a week-long trip to oregon19:03
corvus(just wanted to be clear if it was a day or a week)19:03
ianw++ sounds good :)19:03
corvusi'll be around through the 23rd, then not around19:04
clarkbif you will be around and want to chair either let me know or maybe just send out a meeting agenda email on monday19:04
clarkb#topic Actions from last meeting19:05
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)"19:05
fungii'm in favor of just having fewer meetings and handling things as they come up19:05
clarkbfungi: that wfm19:05
clarkb#undo19:05
openstackRemoving item from minutes: #topic Actions from last meeting19:05
corvusyep19:05
clarkbin that case why don't we consider the meeting cancelled and we can schedule meetings as necessary instead with those who happen to be around19:05
fungiseconded19:05
clarkband apply similar logic to the 22nd and 29th19:05
clarkb#topic Actions from last meeting19:06
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)"19:06
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-12-01-19.01.txt minutes from last meeting19:06
clarkbthere were no actions recorded so let's just dive in19:06
clarkb#topic Priority Efforts19:06
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)"19:06
clarkb#topic OpenDev19:07
*** openstack changes topic to "OpenDev (Meeting topic: infra)"19:07
clarkbOn the gerrit side of things I listed out a few items for further tuning consideration19:07
clarkblast night (relative to me) ianw ended up restarting gerrit as it became non responsive. I think that possibly the lack of memory headroom with java 11 may be related to that? I've pushed up a change to reduce allowed heap size to 44g from 48g19:07
clarkbI think java 11's non heap space is larger than java 8's19:08
clarkb#link https://review.opendev.org/c/opendev/system-config/+/766020 reduce java heap size on review.o.o19:08
clarkbthat should give us more room for things like apache, git gc, backups, and so on19:08
fungithis all seems reasonable. i've +2'd but not approved the changes you recommended for the next restart19:08
clarkbeven if the memory wasn't at fault for the issue last night I think we're seeing swapping and should avoid it if possible19:08
clarkb#link https://review.opendev.org/c/opendev/system-config/+/765867 Put more jgit configs in jgit.config19:09
clarkbThis change is the other one that I think we should consider for the next restart. Reading the gerrit documentation it is completely confusing as to whether our preexisting jgit tunables in gerrit.config apply anymore or if they need to be in jgit.config19:09
clarkbI tried poking around the source to figure it out but failed doing that as well. Instead I'm thinking lets put the config options in both files and see if we get a difference in behavior19:10
ianwyeah, the host was responsive, but gerrit was not.  my debugging was not extensive unfortunately19:10
clarkbThere are two children changes of ^ this change which are worth consideration too, but likely need more eyeballs and should go in on later restarts so we can tell what helps and what doesn't19:10
clarkbspecifically using packedGitUseStrongRefs and enabling git protocol 219:10
clarkbfor the strong refs the idea there reading stuff from matthias upstream is that when garbage collection happens it has a tendency to flush out jgit caches which jgit then immediately refills and this thrashing can lead to a sad gerrit19:11
clarkbthe strong refs makes the garbage collector stop doing that. My concern with this change is that I'm not sure if the garbage collector can ever clean strong refs if it needed to?19:11
clarkb(strong refs are not eligible for garbage collection)19:12
clarkbI think if we land this particular change we should do so when we can monitor it over a long period of time just to keep an eye on memory use19:12
clarkbfor git protocol v2, the idea there is it is much more efficient for git client operations when dealing with repos that have a lot of refs (like our gerrit repos)19:12
clarkbthe client must also support it but current git clients default to v2 aiui so as systems update we would get more and more use out of that?19:13
clarkbanyway those first two changes should be much safer than the latter two. And if we can get the first two in and restart with them that would probably be good19:13
fungisounds great19:14
clarkbThe other tunable that I discovered is that gerrit allows you to split its thread resources into batch and interactive user sets. The idea here is that things like CI systems could have dedicated thread resources. I'm not sure if this would help us or not but I noticed it was something called out in tuning discussions19:14
fungii can do another gerrit restart later in my evening for the initial changes you mentioned19:15
clarkbif others have time to look into ^ that would probably be good (even if it is to say "no we don't want this as it will starve regular users")19:15
clarkbfungi: thanks!19:15
clarkbthat was all I had for tunables. ianw want to update us on the ci results table progress?19:15
ianwfor a quick look at what i've got see https://104.130.172.52/c/openstack/diskimage-builder/+/55400219:16
ianwthere's a tab19:16
clarkbooh I like that19:17
corvus++ that looks great19:17
ianwthis is just all very very simple plain javascript @ https://github.com/ianw/gerrit-zuul-summary-status/blob/main/gr-zuul-summary-status/gr-zuul-summary-status-view.js#L8719:17
corvusare comment tags available to the js?19:18
corvus(so we could act on that instead of author name?)19:18
fungii guess we call it a "zuul summary" because it's based on parsing zuul's standard comment format, even though it may include results from other non-zuul ci systems reporting in a similar format?19:18
ianwcorvus: hrm, whatever is in https://gerrit-review.googlesource.com/Documentation/rest-api-changes.html#change-info i guess19:19
ianwfungi: yeah, i mean that's up for debate i guess19:19
ianwi think i can probably get it down to be simple enough to be a single file19:19
ianwi am starting to wonder about pushing it upstream, it might feel more at home even as a contrib/ in zuul19:21
* diablo_rojo sneaks in late19:21
fungibut it installs as a pg plugin?19:21
clarkbthe big upside to pushing it upstream is that we know there are other zuul users out there with gerrit and they may be more likely to find these things on the gerrit side (as it is a gerrit modification)?19:21
corvusianw: https://gerrit-review.googlesource.com/Documentation/rest-api-changes.html#change-message-info 'tag' field19:21
corvusyeah, and we might be able to take advantage of the gerrit plugin ecosystem19:22
clarkbin fact a zuul user I didn't recognize (sorry if I should've) caught the stream events thing on gerrit 3.3.0 (which we will talk about in a bit) and sent mail about it to the repo discuss list19:22
corvus(like i think there's a pluginmanager plugin or something where you can click-to-install gerrit plugins)19:22
clarkbya there is19:22
clarkb(we don't have it enabled on our setup)19:23
corvusso i'd be in favor of putting that in the upstream gerrit for that and community relations purposes :)19:23
ianwcorvus: i'll look into it.  basically the plugin gets called with a changeinfo object for the current change.  my debug method is "console.log" so that's how i inspect what's going on :)19:23
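[Editor's sketch, not part of the log: a minimal illustration of corvus's suggestion to key on the ChangeMessageInfo 'tag' field rather than the author name. It is written in Python for brevity and is not ianw's plugin code. The tag prefix "autogenerated:zuul:", the result-line layout, the oldest-first ordering of messages, and the function name zuul_results are assumptions made for illustration.]

```python
import re
from typing import Dict, List

# Assumption: Zuul tags its Gerrit comments "autogenerated:zuul:<pipeline>".
ZUUL_TAG_PREFIX = "autogenerated:zuul:"

# Assumption: each result line looks like
#   "- <job> <url> : <RESULT> in <duration>"
RESULT_LINE = re.compile(r"^- (?P<job>\S+) (?P<url>\S+) : (?P<result>[A-Z_]+)")


def zuul_results(messages: List[dict]) -> Dict[str, List[dict]]:
    """Map pipeline name to the job results of its most recent tagged comment."""
    results: Dict[str, List[dict]] = {}
    # Assumption: the messages list is ordered oldest first.
    for msg in messages:
        tag = msg.get("tag") or ""
        if not tag.startswith(ZUUL_TAG_PREFIX):
            continue  # human comment or some other bot
        pipeline = tag[len(ZUUL_TAG_PREFIX):]
        jobs = []
        for line in msg.get("message", "").splitlines():
            match = RESULT_LINE.match(line)
            if match:
                jobs.append(match.groupdict())
        results[pipeline] = jobs  # a later comment replaces an earlier one
    return results
```

[Selecting on the tag means a third-party CI that reports in the same format but under a different tag would be skipped, which bears on fungi's naming question above.]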
ianwyeah, i do have it building via bazel ATM19:24
ianwand there's testing frameworks for polymer19:24
corvusand a zuul to run those tests :)19:24
clarkbanything else to add on this?19:25
ianwnope, i'll just keep plugging away on it19:25
corvusianw: if you're okay with pushing that to gerrit's gerrit, i think the next step is to send an email to repo-discuss requesting the repo creation; i can help with that if you want19:25
ianwcorvus: thanks.  i will clean it up a bit more and get back to you19:26
corvusfwiw, i think it looks good enough to start iterating on things in parallel :)19:27
clarkbNext up is the built in WIP status for changes on newer gerrit. We had hacked in WIP support by adding a -1 approval category that change owners and cores could toggle, but now gerrit supports it directly (for change owners at least). People have started asking about using the actual WIP status instead of the approval category19:27
clarkbbut I think just now we have discovered that zuul doesn't yet know about the built in wip status and should be updated before we recommend our users use the built in wip status19:28
clarkbcorvus: fungi zbr any other specifics to call out on that? sounds like work will start soon on addressing that in zuul19:28
corvuszbr volunteered to work on a change tomorrow19:29
fungii had nothing to add19:29
clarkbok, I figure once zuul is updated we'll do more testing then we can decide if we want to clean up the old approval hack or not (or at least offer that as an option to users)19:29
fungijust be aware it will cause top-of-queue gate resets for now if people accidentally approve wip state changes19:30
clarkboh ya because submit will fail which zuul will think is a merge failure19:30
fungizuul will get as far as trying to submit, right19:30
clarkbthat is a good point particularly since we have seen deep gate queues in some projects recently (there has been a lot of python trouble with pip lately)19:31
fungipython comics, issue #473: the trouble with pip19:31
clarkbLast up on the Gerrit OpenDev topic was calling out that Gerrit 3.3.0's event stream implementation breaks zuul's ability to take action on comment contents (think recheck comments)19:31
clarkbcorvus: ^ are any other zuul interactions with gerrit known to be affected ?19:32
clarkbcalling this out because upstream is aware of the issue and is working on addressing it, but we should avoid upgrading to 3.3 until it is fixed19:33
fungican we insert after this subtopic the jeepyb lp bug/bp hook scripts? i wanted to know if anyone has made progress on those or if i should try to pick them up next myself tomorrow-ish19:33
clarkbsure I think that was all I had on it (basically upgrade to 3.3.0 has found a blocker)19:34
corvusclarkb: that's all i'm aware of19:34
clarkbfungi: I am not aware of anyone working on them yet19:34
corvuslatest on the stream-events thing is luca is going to rage code a bunch of tests :)19:34
fungicool, mostly just trying to prioritize the stuff we've been accumulating on the post-upgrade etherpad19:34
clarkbfungi: ianw had looked at them briefly pre upgrade iirc, but that was the last I heard19:34
clarkbfungi: ++ and thank you19:34
corvushe said something about "if it's not tested it's broken"19:34
fungiand bug/bp integration seems to be next on the painpoints after/alongside ci results table19:35
fungicorvus: i feel like i've heard that somewhere before19:35
ianwyeah, i hadn't really got that far with them, but now we have the actual REST API to play against i think we can iterate on it faster19:35
clarkbAnything else on the subject of gerrit and or opendev?19:36
ianwone quick thing on the system-config gate test for gerrit/review ... what does the review-dev node test over just the review node?19:36
ianwi'm wondering if we can prune that to just the one node?19:36
fungiit's been pointed out that we may be invalidating gerrit logins more quickly than (we think) we've configured, so i'll also test whether my restart later today invalidates my webui session19:37
clarkbianw: ya I think the idea before we realized that we really need something like a prod alike is that we might have -dev and prod in different stages of upgrades19:37
clarkbianw: since we're doing pre merge testing anyway I think we can probably have a single node that just does the thing we want prod to look like and use it that way19:37
clarkbianw: mordred may remember if there was any better reason than that though19:37
clarkbfungi: oh good idea19:38
mordredwhat did I do?19:38
clarkbmordred: basically in the system-config job for gerrit we have a review.o.o and review-dev.o.o fake tests nodes separated I think19:38
fungii feel like we should rip out review-dev at this point (and keep in mind when we're ready to also tear down review-test in favor of held job nodes)19:38
clarkbfungi: ++19:38
ianwfungi: re the logout, i was not logged out when i restarted it last night my time19:39
mordredyeah- I think it was just because they were a bit different19:39
clarkbat this point it's an artifact of how we didn't have great testing for gerrit and now we can make that better with testing that looks like prod19:39
mordredso I think re-collapsing those at this point is ... yup19:39
fungiianw: thanks, that's also a useful data point19:39
ianwok, i will propose that.  it makes it a bit simpler doing a full gerrit initialisation and pushing changes in the job19:40
clarkb#topic Update Configuration Management19:40
*** openstack changes topic to "Update Configuration Management (Meeting topic: infra)"19:40
clarkbHas there been any movement on this topic in the last week (sorry gerrit has been overly consuming)19:41
fungithis might be the place to remind folks we're running into dockerhub rate limits on our containerized service test jobs19:42
fungino easy answers at this point though19:42
corvusis it enough we're ready to decide we want to do something about it?19:42
clarkbcorvus: probably not? It is just infrequent enough that I haven't rage fixed it :)19:42
fungiit hasn't been particularly crippling yet19:43
fungibut worth keeping an eye on in case it escalates quickly19:43
clarkbit may be worth setting up a job to publish to quay just to see if that works?19:43
corvusiirc, we're thinking if it is annoying enough, we should start by looking into squid, and if that fails, we could look at a smart proxy based on zuul-registry but that's high-effort.  that still a decent summary?19:43
clarkbsince that may be an easy out19:43
clarkbcorvus: yup I think as far as proper fixing goes that is a good summary19:44
clarkb(my quay comment is more that "maybe this is an easy half measure to consider alongside ^ and we'd still want to cache for quay anyway")19:44
fungiyeah, that's still the latest thinking as far as i'm aware19:44
clarkb#topic General Topics19:46
*** openstack changes topic to "General Topics (Meeting topic: infra)"19:46
clarkb#topic Bup and Borg Backups19:46
*** openstack changes topic to "Bup and Borg Backups (Meeting topic: infra)"19:46
clarkbianw: this is still on the agenda mostly as a reminder that we should look out for your bup removal change and +2 that after we've verified borg backups?19:46
clarkbianw: is that change up yet?19:46
fungithanks, that reminds me to actually add restoration docs exercising to my to do list19:48
ianwno, it is not.  i'll get to it so we can hopefully sort it by year end19:48
clarkbthanks19:48
fungithere's no rush19:48
clarkb#topic OpenStackID hosting19:48
*** openstack changes topic to "OpenStackID hosting (Meeting topic: infra)"19:48
clarkbThis I failed to add to the agenda but the foundation sysadmins have started to think about what a more ideal hosting situation looks like (ignoring who is hosting it) which I think is a good first step in figuring out how we collaborate (if at all) in hosting it19:49
clarkbbasically taking another look at service needs and requirements and work out how to deploy it well19:50
clarkb(this hasn't been forgotten)19:50
clarkbThen for remaining topics I may have to declare bankruptcy on ptg followups or at least in the way I've done them before. Meetpad testing with users in china has not happened yet (I'd still like to coordinate that though), and puppet job splitting hasn't happened as far as I can tell19:51
clarkb#topic Open Discussion19:51
*** openstack changes topic to "Open Discussion (Meeting topic: infra)"19:51
fungithinking back, the reason we insisted on hosting it in opendev previously was that we were tying the gerrit contact store api to it, so new contributors at the time couldn't agree to the (then mandatory for basically all projects) osf icla if openstackid was down, but also we were looking at depending on it for authenticating users to various services19:51
clarkb#undo19:51
openstackRemoving item from minutes: #topic Open Discussion19:51
fungigerrit has since removed the contact store feature entirely so that's no longer a concern19:52
clarkbthat is a good point, the requirements/needs on our end have shifted too19:53
fungiand the only services we set up authenticating against it were translate (openstack-only abandonware which needs to be replaced soonish), refstack (also openstack-only, tied to foundation trademark programs), and survey (beta which never really gained traction)19:53
clarkbalright I'll open it up now as we only have a few minutes left19:54
clarkb#topic Open Discussion19:54
*** openstack changes topic to "Open Discussion (Meeting topic: infra)"19:54
clarkbAnything else to call out really quickly?19:54
diablo_rojoNothing from me.19:55
fungii've promised to spend less time on the computer this month. i expect to still be around some of the time but will also be taking more time away as i can to work on some projects around the house. also probably for the last week-ish of the month i may not be around much at all19:56
clarkbya I'll be trying to take it easy around the holidays though in and out19:57
fungifor me that probably translates to fixing emergency fires but maybe not much progress on longer term efforts19:57
fungis/fixing/fueling/ ? ;)19:59
clarkbheh19:59
clarkbanyway we are about at time now. Thanks everyone!19:59
clarkb#endmeeting19:59
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev"19:59
openstackMeeting ended Tue Dec  8 19:59:38 2020 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:59
openstackMinutes:        http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-12-08-19.01.html19:59
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-12-08-19.01.txt19:59
openstackLog:            http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-12-08-19.01.log.html19:59
fungithanks clarkb!19:59
*** hasharDinner has quit IRC21:34
*** sboyron has quit IRC22:20
