Monday, 2021-03-08

openstackgerritxinliang proposed openstack/diskimage-builder master: Add aarch64 support for rhel
openstackgerritSteve Baker proposed openstack/diskimage-builder master: Don't run grub2-install for efi block devices
openstackgerritdaniel.pawlik proposed openstack/diskimage-builder master: Changed get-pip url
openstackgerritdaniel.pawlik proposed openstack/diskimage-builder master: Changed get-pip url
openstackgerritdaniel.pawlik proposed openstack/diskimage-builder master: Change get-pip url
dpawlikfungi, ianw, clarkb: hey, pls take a look on . The pypa community has changed the url...08:30
openstackgerritSorin Sbârnea proposed zuul/zuul-jobs master: Fixes all tasks should be named rule
openstackgerritLee Yarwood proposed openstack/project-config master: Revert "Add custom cirros image with ahci module enabled to cache"
openstackgerritLee Yarwood proposed openstack/project-config master: Add Cirros 0.5.2 to cache
openstackgerritSorin Sbârnea proposed zuul/zuul-jobs master: Fixes all tasks should be named rule
*** ysandeep is now known as ysandeep|afk12:00
*** ysandeep|afk is now known as ysandeep12:30
openstackgerritSorin Sbârnea proposed zuul/zuul-jobs master: Fixes all tasks should be named rule
fungidpawlik: thanks, i emergency approved that13:32
fungicodesearch says the now broken url is also used in master branches of  openstack/openstack-helm-images, opendev/system-config, openstack/mistral, openstack/openstack-ansible-ops, opendev/puppet-pip13:35
fungii'll work on fixes for opendev/system-config and opendev/puppet-pip in a moment13:35
dpawlikthanks fungi13:48
fungidpawlik: i have a feeling there are also lots of stable branches which will need updates for that (at least devstack)13:53
dpawlikfungi: they could even have created a redirect in such an update, rather than only setting a message with "We are sorry"13:56
dpawlikfungi: checking...13:57
fungidpawlik: yeah, when i first saw the pr, i thought maybe they were creating a wrapper around to emit a warning on stderr or something before invoking the original script13:59
fungibut yeah, a redirect seems like it would have solved the problem too13:59
fungireplacing a script at a well-known url with a different script which just spits out an error, while certainly visible, is not so swell for automated systems people may not be watching nonstop14:01
dpawlikfungi: exactly14:02
openstackgerritdaniel.pawlik proposed opendev/system-config master: Change get-pip url
openstackgerritMerged zuul/zuul-jobs master: Fixes all tasks should be named rule
dpawlikfungi: oops, I also proposed a patch for system-config with just a simple find + sed. Feel free to abandon if it's bad14:10
fungioh, thanks! you beat me to it14:18
openstackgerritJeremy Stanley proposed opendev/puppet-pip master: Change get-pip url
fungidpawlik: and that's ^ the equivalent for our puppet-pip module14:28
fungilooks like we've got arm64 node problems again14:29
openstackgerritSorin Sbârnea proposed zuul/zuul-jobs master: Bits to keep ansible-lint happy
zbrDo we have any progress on gerrit's ability to autocomplete reviewers? It is becoming impossible to pick someone.14:50
zbr26s to get an autocomplete for "Kevin", and results are often returned in random order (not sure if this impacts the display of results)14:53
openstackgerritMerged openstack/diskimage-builder master: Change get-pip url
fungizbr: what do you mean progress?15:26
fungii expect that 26s depends on when you try and how much load the server is under15:27
fungisounds like you may be describing a symptom of high load rather than an actual inefficiency with user typeahead autocompletion15:28
fungihave you tried to time it when the system is mostly idle?15:28
fungior on a separate idle server like review-test?15:28
zbryeah, since the v3 upgrade, it was very slow most of the time during normal working hours (the fact that it is fast when nobody is using it is not of much practical use)15:29
zbrduring the weekend it is easy to see how performance goes to normal values15:30
zbrmy impression is that there was not a sudden explosion in the number of patches being created/updated since the upgrade, and I do not remember ever observing these kinds of delays with gerrit v2. I hope clarkb does not take this as a complaint about the upgrade ;)15:32
fungizbr: sure, but my point was that gerrit itself has been very slow/loaded in general since the upgrade, i'm not sure why user typeahead is necessarily a separate problem15:32
fungiso are you asking about problems with typeahead, or are you asking about general gerrit slowness?15:32
zbrthe typeahead feature is a very special one as user cannot really freeze while trying to select a reviewer.15:33
fungii haven't seen gerrit freeze while i'm typing usernames15:33
zbri have no problem waiting 10-30 more seconds when uploading a change using git-review, but when adding reviewers inside the browser my expectations are very different.15:33
zbrnothing happens, if you switch window to do something else while the server is processing a text search, when you go back the popup disappears.15:34
zbrthe current implementation forces you to have to wait inside the browser for a magic answer.15:35
fungisometimes it doesn't provide suggestions right away, but it doesn't stop me from continuing to type into the field. maybe i'm misunderstanding what you're saying15:35
fungijudging from if you tried in the past hour, yes i expect gerrit's api would have been quite slow to respond15:36
zbr"right away" is for me <5s, not 36s. I do not happen to know the exact usernames of all reviewers i need to pick.15:36
zbrShould I build a local copy paste list of usernames and paste them?15:37
fungii think if you want faster response on typeahead while under heavy load, you'll likely need to work upstream to implement query optimizations. we're probably going to continue trying to find ways to reduce the load on our gerrit deployment so that everything is generally faster rather than focusing on typeahead autocompletion responsiveness15:38
zbri wonder if we can do something particularly for these lookups.15:38
fungiyep, the source code of gerrit is freely available, and its maintainers are reasonably resoinsive folks15:38
fungier, responsive15:39
fungiand if you can work with them on improvements to typeahead completion then i expect we can backport whatever improvement they approve15:39
*** ysandeep is now known as ysandeep|dinner15:40
fungii expect with the variety of different identity types those fields match on, even the cache lookups for them are very expensive15:41
fungibut maybe something like a separate trigram cache could help15:42
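The trigram cache idea floated above could look roughly like the following. This is only a minimal sketch of the general technique, not how Gerrit implements reviewer suggestions; the function names and the sample reviewer names are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of lowercase 3-character substrings of text."""
    s = text.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def build_index(names):
    """Map each trigram to the set of names that contain it."""
    index = defaultdict(set)
    for name in names:
        for gram in trigrams(name):
            index[gram].add(name)
    return index


def suggest(index, query):
    """Return names containing query, narrowed first via the trigram index."""
    grams = trigrams(query)
    if not grams:
        # Queries shorter than three characters cannot use the index.
        return []
    # Only names that contain every trigram of the query can match.
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    q = query.lower()
    # Confirm with a real substring check, since trigram overlap alone
    # does not guarantee the query appears contiguously.
    return sorted(n for n in candidates if q in n.lower())


# Hypothetical account names, not real reviewers.
names = ["Kevin Example", "Kevin Sample", "Devin Placeholder"]
index = build_index(names)
print(suggest(index, "kevin"))
```

The index is built once, so each lookup only intersects a few small sets instead of scanning every account. Whether that would help under the load discussed here is an open question.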
clarkbzbr: my frustration is that some of us (like me) are actively trying to continue to profile and improve these problems. We know the performance isn't great, but I don't just go upstream and yell at them about it15:45
clarkbzbr: if you'd like to help with that that would be excellent but the semi periodic complaints really don't help imo15:45
clarkbthe most recent things I have been working on include dstat system metric recording for gerrit (and other services) in CI as well as a gerrit specific gatling git job which does load testing15:46
clarkbthe gatling-git stuff could definitely use some work as it's very naive right now and doesn't even succeed reliably15:46
clarkbnone of this work requires special access or privileges as it relies on CI to generate the data15:47
clarkbspecifically, I'm hoping we can measure some of this stuff between gerrit 3.2 and 3.3 to see if 3.3 improves things before we upgrade. Preliminary data indicates that gerrit 3.3 may consume a bit more memory but perform slightly better for rtt on pushing changes and the like15:49
zbrbased on I am afraid it could be much worse.15:49
clarkbit is also worth noting that interacting with upstream's gerrit is also often slow15:50
clarkbI expect there is likely more we can do to improve things, but that we should expect some sort of floor considering upstream seems to struggle similarly15:51
zbrdid we reach the upper hw limits for the machine on which we can host gerrit? can't we throw more ram/cores/ssd?15:52
clarkbthe disk is already an ssd (and doesn't seem to be the bottleneck, we see higher disk activity when doing other system activities like backups and git gc), we are running on an instance with 60GB of memory and using far too much of that, and it has 16 cpu cores15:53
clarkbit is theoretically possible to use a bigger instance, but we're pretty much maxing out what is there15:53
fungifor the most part. it's a 60gb ram/16 vcpu flavor, and we put the data on ssd-backed cinder, yeah15:53
clarkbno, we don't have access to any baremetal15:54
fungithere also aren't great options for scaling gerrit "horizontally" (you can do some offload via replication, but we more or less already do that anyway by using a separate farm of git servers)15:55
clarkbit appears that the zuul graceful shutdown resulted in zuul-executors restarting for some reason15:55
clarkband they cannot connect to zk so are idling15:55
clarkb(on the old servers I mean)15:55
clarkbI'll down zuul on them once morning meetings are done15:56
fungiwell, at least they're not doing anything15:56
fungibut yeah i wonder why they started back up15:56
zbri don't know if that helps, but I can share my scaling experience I had with corporate Citrix Jira instance (JVM) too, which was slow too (same patterns slow response times).15:57
clarkbzbr: as mentioned, feel free to poke at it in CI and leverage things like dstat and gatling git to help measure the results15:57
zbrit was ~6-7 years ago and I got two Dell servers with 48 cores and 96GB RAM. It ran well on them but when I decided to ditch the hypervisor and run baremetal, I got a 3.5x increase in performance (measured with a full database reindex, something that used to take hours). Because I had two identical machines, I was able to compare differences.15:59
openstackgerritMartin Kopec proposed opendev/system-config master: refstack: trigger image upload
*** ysandeep|dinner is now known as ysandeep16:13
openstackgerritMerged openstack/project-config master: Revert "Add custom cirros image with ahci module enabled to cache"
clarkbzuul-executor has been stopped on ze05-08.openstack.org16:32
clarkbI'm going to eat some breakfast then will delete those servers and start spinning up 09-1216:32
clarkb#status log Deleted as new servers have taken over17:01
openstackstatusclarkb: finished logging17:01
fungilooks like neutron and nova added py38 and py39 unit test jobs to the check-arm64 pipeline at some point. interesting, though that's creating quite the backlog in the lenovo cloud17:15
clarkblinaro you mean?17:15
fungilinaro, yes sorry17:15
fungiseveral of the same letters17:16
openstackgerritClark Boylan proposed opendev/ master: Adding to DNS
openstackgerritClark Boylan proposed opendev/system-config master: Replace with
clarkbinfra-root if we can get both of those approved I'll happily monitor the results today and get servers swapped out17:24
corvusclarkb: ++ and nice catch on removing ze12 from testing17:26
clarkbI'm hoping I'll be able to observe the results of the graceful stop in real time this time around to see if I can tell why the service was restarted17:27
clarkbfungi: did you want to review those too ^ or should I go ahead and approve them?17:35
*** jpena|brb is now known as jpena17:37
clarkbI've gone ahead and approved them. I'll watch them17:43
fungiyep, sorry, got sidetracked dealing with other tasks, but i'm good with it17:45
openstackgerritMerged opendev/ master: Adding to DNS
openstackgerritClark Boylan proposed opendev/system-config master: Replace with
openstackgerritClark Boylan proposed opendev/system-config master: Fix url for python3.5 compat
clarkbfungi: corvus  ^ that probably deserves a bit more careful review now. I'm also going to quickly check if we use the old url anywhere18:08
fungiclarkb: the same fix has already been approved18:09
fungiand does the same for our puppet-pip module18:09
fungi is the system-config one18:10
*** toomer has quit IRC18:10
openstackgerritClark Boylan proposed opendev/puppet-pip master: Update the url for python3.5
fungiclarkb: see above18:11
clarkboh heh I should look first I guess18:11
clarkbI'll abandon and rebase accordingly18:12
fungisystem-config fix is ~40 minutes eta on merging18:12
fungii single-core approved it a little while ago18:13
fungiapproval of my puppet-pip version would be appreciated though18:13
fungiand i see you just did--thanks!18:13
openstackgerritClark Boylan proposed opendev/system-config master: Replace with
clarkbfungi: ^ rebased on top of the existing change already in the gate. I have abandoned the changes i created.18:14
fungiclarkb: also, before you propose any others, the similar fix for dib already merged earlier today18:14
fungiit was 77917318:15
clarkbfungi: codesearch only showed system-config and puppet-pip remaining, so I wasn't planning on doing more. Though I suppose some stable branches may have problems?18:15
fungiyeah, there's fixes going in for devstack already too18:15
fungidpawlik was on the ball18:15
fungimany thanks to him18:15
clarkbthank you dpawlik !18:16
fungivarious pins for old python versions. thankfully most of the urls were for unversioned get-pip (and a few for easy_install)18:19
clarkbfungi: were these old urls deprecated for more than a week? it's hard to tell based on that commit18:25
clarkb(I can see why they appeared to do it, they wanted to namespace the script paths which makes sense)18:25
fungiat first i thought they were saying they'd add a wrapper at the old urls to emit a warning and call the original script, but i didn't actually find evidence of it18:27
fungifirst we noticed was when they switched it to a script which just prints an error and exits 118:28
openstackgerritMerged opendev/system-config master: Change get-pip url
openstackgerritMerged opendev/puppet-pip master: Change get-pip url
fungiclarkb: ^ those are in now18:45
clarkbyup, and my change is based on it so it should +1 now18:45
clarkband now my change is in the gate so that was all that was holding it up19:02
clarkbthanks again!19:02
* fungi is clearly having problems with his input history today19:12
*** gmann_afk is now known as gmann19:32
openstackgerritMerged opendev/system-config master: Replace with
fungii guess that means the pipes are lubricated and thinks are moving again19:39
fungier, things19:39
clarkbindeed, now we wait a bit for zuul to catch up with managing itself19:41
*** whoami-rajat_ is now known as whoami-rajat19:44
openstackgerritEmma Foley proposed openstack/project-config master: Add infrawatch/{collectd,qdr}-config to available repos
openstackgerritMerged openstack/project-config master: Add infrawatch/{collectd,qdr}-config to available repos
*** klonn has quit IRC21:12
ianwforgot to mention yesterday was a public holiday here.  back and catching up now21:57
fungiianw: no worries, hope it was refreshing!21:58
fungithere was not too much excitement21:58
fungipypa decided to reorganize a bit but there weren't too many fixes needed21:58
fungiclarkb is almost through replacing executors21:59
ianwexcellent.  i hope to knock off our kerberos servers in ansible today, there's hopefully not much to it22:00
fungijust waiting for 779309 to get through infra-prod-service-zuul in the deploy pipeline, it's up next22:01
fungithen we should be able to bring up the new executor services and gracefully stop their old counterparts22:02
funginow the job is running22:02
fungii need to do a few chores but will try to keep an eye on it and take a break to start/graceful-stop services once it's passed22:03
mordredianw: for some reason my eyes read your line about public holiday as having been something fungi said. so then I thought "yesterday was a public holiday? shoot - I missed it. wait ... yesterday was sunday ... I'm confused! OH - that was ianw" :)22:18
fungimordred: there's nothing stopping you from observing australian holidays22:21
fungior maybe there is, but you shouldn't... let that stop you22:22
fungiinfra-prod-service-zuul: success22:24
fungii'll check the new servers to make sure they're configured now22:24
fungiyep, looks like it deployed22:27
ianwmordred: it was the Queen's birthday public holiday.  It was not actually the Queen's birthday.  It is not celebrated in all states.  but if you want to raise a gin and tonic to Her Majesty's theoretical birthday, feel free :)22:27
* fungi hums a few bars of the sex pistols' "god save the queen"22:30
fungiokay, executor containers have been upped on the new ze09-1222:30
fungii've now issued a `zuul-executor graceful` in the containers on the old ze09-1222:34
mordredianw: I feel like raising a gin and tonic to HM's theoretical birthday is a great idea22:41
fungiit's true that boston only dumped tea in the harbor, they held onto the gin22:44
openstackgerritMerged openstack/project-config master: Add Cirros 0.5.2 to cache
clarkbfungi: thanks for doing that, I'm back now and will continue to monitor the replacement process.23:19
clarkbon my bike ride I started thinking about the launcher replacements, can I basically set max-servers: to 0 on's config and max-servers: real value on
clarkbI think that may work23:20
clarkbcorvus: ^ if you're still around does that raise any alarms for you? I think I may also need to switch over the min servers value in the process23:23
corvusclarkb: if they have the same provider name, they may think they have leaked each others nodes; better to stop/start processes23:25
clarkbin that case I think the process looks more like removing the old server from inventory when we add the new one but also stop the launcher on the old one before we start the launcher on the new one23:28
clarkbze10 is waiting on 2 jobs to complete before it gracefully stops23:32
clarkbI wonder if we are exiting non zero and that is why the restart: on-failure is tripping23:32
clarkbanyone know if docker records the exit status of historical containers somewhere23:33
ianwclarkb: docker ps -a might ...23:42
clarkbI'll try that once at least one of them flips over23:43
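For context on the `docker ps -a` suggestion above: its STATUS column shows stopped containers in the form `Exited (1) 5 minutes ago`, and `docker inspect --format '{{.State.ExitCode}}' <container>` reports the code of a container's last exit. A small sketch of pulling the code out of a STATUS string follows. The helper name and the sample strings are hypothetical, and this was not run against the executors discussed here.

```python
import re

# Matches the STATUS text docker prints for a stopped container,
# for example "Exited (1) 5 minutes ago".
EXITED = re.compile(r"^Exited \((\d+)\)")


def exit_code_from_status(status):
    """Return the exit code from a `docker ps -a` STATUS string, or None."""
    match = EXITED.match(status)
    return int(match.group(1)) if match else None


# Sample STATUS strings, made up for illustration.
for status in ("Exited (0) 2 hours ago", "Exited (137) 5 minutes ago", "Up 3 days"):
    print(status, "->", exit_code_from_status(status))
```

With `restart: on-failure`, any non-zero code here would be enough to explain the executors coming back up after a graceful stop.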
openstackgerritMerged openstack/diskimage-builder master: Fix installation of proliant tools
openstackgerritMerged openstack/diskimage-builder master: Fix hooks order for CentOS/Fedora when mirror used
fungiclarkb: actually what you suggested will probably work... you're changing domain names and we fixed nodepool some time back to use the fqdn instead of the short hostname for its identifier23:51
fungicorvus: ^ yeah?23:52
clarkbfungi: ya, but they all use the same provider name which I think means they will fight over ownership?23:52
fungioh, provider name not launcher id. okay got it23:52
clarkbI think both come into play23:52
clarkbbasically if launcher foo is operating on provider bar if it sees nodes from launcher baz in the same provider in zk it will see them as leaked?23:53
clarkbsomething like that23:53
openstackgerritMerged openstack/diskimage-builder master: Remove fedora-31 testing

