Saturday, 2020-04-11

clarkbfungi: ^ fyi00:01
*** mlavalle has quit IRC00:01
openstackgerritClark Boylan proposed opendev/system-config master: Revert "Set env vars pointing to correct file locations"  https://review.opendev.org/71912400:07
clarkbI'm not 100% sure we need to do that yet, but I haven't found anything that would indicate the new paths work with non container gerrit00:08
clarkbmordred: fungi ^ I'll leave that up and if someone else thinks it is also necessary we can land it00:11
fungiexporting unused environment variables shouldn't break anything00:17
fungithe jeepyb side of that hasn't landed yet anyway, right?00:17
clarkbonly the git dir is new00:18
clarkbI believe the other 4 vars are existing00:18
clarkband will point to invalid paths on non container gerrit00:18
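A minimal sketch of one way a script could tolerate the situation clarkb describes, where an exported variable names a path that only exists inside the container. It prefers the variable when its path exists on this host and otherwise falls back to a default. The variable name `GERRIT_CONFIG` and the function are hypothetical, not jeepyb's actual interface; the default is the pre-container site path that shows up in a hook traceback later in this log.

```python
import os

# Pre-container site path (seen in a hook traceback in this log).
DEFAULT_GERRIT_CONFIG = "/home/gerrit2/review_site/etc/gerrit.config"


def resolve_config_path(env_var="GERRIT_CONFIG", default=DEFAULT_GERRIT_CONFIG, environ=None):
    """Return the path named by env_var if it exists on this host.

    Falls back to `default` when the variable is unset or points at a path
    that does not exist here (e.g. a container path on a non-container host).
    """
    environ = os.environ if environ is None else environ
    candidate = environ.get(env_var)
    if candidate and os.path.exists(candidate):
        return candidate
    return default
```

Falling back silently can hide a real misconfiguration, so logging a warning on the fallback path may be the safer variant.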
fungiis the current running non-container gerrit deployed by that playbook also, or still by puppet?00:23
clarkbpuppet is gone aiui00:24
clarkbthat playbook affects the currently running gerrit I think, as the host isn't in the emergency file00:24
clarkbbut it's friday and I could be wrong00:25
fungino, i think you're right01:03
fungimordred: ^01:03
fungii'll go ahead and approve the revert for now01:04
*** Eighth_Doctor has joined #opendev05:35
Eighth_Doctor👋05:35
*** moppy has quit IRC07:16
*** moppy has joined #opendev07:31
*** DSpider has joined #opendev07:41
*** tosky has joined #opendev08:29
zbrare there any active/in-progress plans to upgrade opendev gerrit?09:13
*** sgw has quit IRC11:52
*** ChanServ has quit IRC12:55
*** ChanServ has joined #opendev13:03
*** tepper.freenode.net sets mode: +o ChanServ13:03
*** ChanServ has quit IRC13:08
*** ChanServ has joined #opendev13:10
*** tepper.freenode.net sets mode: +o ChanServ13:10
mordredzbr: yup13:14
fungiall of the recent gerrit maintenances have been mainly in service of getting us to a point where we can easily upgrade13:18
mordredclarkb: good catch13:19
mordredso we should land the re-revert and do another restart around the same time13:20
mordredfungi, clarkb: the revert didn't land - might be the same amount of waiting/time to just land https://review.opendev.org/#/c/719052/ and do a quick restart13:26
mordredzbr: yeah - what fungi said. once we're done with this current maint (finishing switching deployment from puppet to ansible/docker) we'll be working on the upgrade plan13:28
fungimordred: ahh, i'll take a quick look13:30
fungioh, yeah, i'm already +2 on that one13:31
fungiif we want to give it another quick try i'm up for that13:31
fungii've gone ahead and approved it13:32
mordredfungi: cool. I think on a holiday saturday morning it should be fairly low impact - and shouldn't be _worse_ than waiting on the revert to land13:37
openstackgerritMonty Taylor proposed opendev/system-config master: Install ep_headings module  https://review.opendev.org/71912313:39
mordredfungi: there's the ansible for the hack from yesterday btw ^^13:40
openstackgerritMonty Taylor proposed opendev/system-config master: Run cloud_launcher from zuul  https://review.opendev.org/71879814:00
openstackgerritMonty Taylor proposed opendev/system-config master: Stop removing cloud-launcher cron  https://review.opendev.org/71879914:03
corvusmordred: +2 on the npm thing, but honestly, i think building the image is going to be the better long-term solution -- mostly because if we're running npm on the host, suddenly we care about what version of node/npm is on the host, which is the main thing we want to avoid with all the container stuff.14:16
openstackgerritMerged opendev/jeepyb master: Fix issues from rolling out containers  https://review.opendev.org/71905214:25
mordredcorvus: yeah - I think you're probably right14:25
mordredI'll work on some patches to do that a little later14:25
corvusmordred: you had some earlier right?14:26
corvusdid we merge those and revert, or did we just revise them before merging?14:26
mordredcorvus: we revised before merging14:27
mordredbut I can go cherry-pick the changes out14:27
mordredcorvus, fungi : the jeepyb promote job for 2.13 has succeeded, so we should have new gerrit images, and I think the new scripts are already in place on gerrit - should we try another restart?14:28
mordredI confirm - the new version of the scripts have been applied14:29
corvusmordred: wow so fast :)14:30
mordred:)14:30
corvusmordred: sure, let me blink the sleep out of my eyes and let's go for it14:30
mordredkk. I'm in the root screen on review14:30
corvusi have joined14:31
mordredstatus notice Restarting gerrit to fix an issue from yesterday's mataintenance14:31
mordredyeah?14:31
mordredwow. except that's horrible spelling14:31
mordredstatus notice Restarting gerrit to fix an issue from yesterday's maintenance14:31
corvuslgtm14:31
mordred#status notice Restarting gerrit to fix an issue from yesterday's maintenance14:31
openstackstatusmordred: sending notice14:31
-openstackstatus- NOTICE: Restarting gerrit to fix an issue from yesterday's maintenance14:32
mordredwow, openstackstatus is taking its time14:34
openstackstatusmordred: finished sending notice14:35
mordredcorvus: ok. shall we?14:36
corvusmordred: ++14:36
corvusthere's like a constant stream of hangups from stackalytics-bot-2 in the error log...14:36
mordredcorvus: "neat"14:36
fungiokay, back14:37
mordredI suppose I could have pulled before stopping :)14:37
mordredfungi: we're in root screen on gerrit14:37
corvuslive and learn14:37
corvusmordred: the screen has stopped updating for me14:38
corvusit's on extracting 208407758d73:14:38
fungiyep, joining14:38
corvusmordred: but it looks like gerrit is running14:38
corvuswhat's going on?14:38
mordredcorvus: weird. yeah- it seems fine?14:38
corvusmordred: did it finish and did you restart it?14:39
mordredyes14:39
fungii saw control return to a shell prompt14:39
mordredI'm now tailing logs14:39
mordred[2020-04-11 14:38:54,813] [main] INFO  com.google.gerrit.pgm.Daemon : Gerrit Code Review 2.13.12-11-g1707fec ready14:39
corvusmy screen caught up14:39
mordredlet me push up a patch to trigger some scripts14:40
corvus[2020-04-11 14:40:25,990] [HookQueue-1] INFO  com.googlesource.gerrit.plugins.hooks.HookTask : hook[patchset-created] output: FileNotFoundError: [Errno 2] No such file or directory: '/home/gerrit2/review_site/etc/gerrit.config'14:40
mordredoh - somebody did14:40
mordredreally?14:41
mordredwhy is patchset-created not updated/14:41
mordred?14:41
mordredI'm going to manually fix that real quick to make sure it fixes the issue14:42
mordredit's bind-mounted in so it should fix without a restart14:42
openstackgerritMonty Taylor proposed opendev/system-config master: WIP Update install-ansible away from /opt/system-config  https://review.opendev.org/71918614:43
mordreddid that patchset created trigger the error?14:43
corvus[2020-04-11 14:43:17,782] [HookQueue-1] INFO  com.googlesource.gerrit.plugins.hooks.HookTask : hook[patchset-created] output: TypeError: cannot use a string pattern on a bytes-like object14:43
corvushttp://paste.openstack.org/show/791957/14:43
openstackgerritMonty Taylor proposed opendev/system-config master: Actually install patchset-created hook  https://review.opendev.org/71918714:44
fungianother missed python3 fix i suppose14:44
mordredcorvus: STELLAR14:44
mordredwell - there's the hook fix14:44
fungid'oh14:44
mordredah - it's because subprocess.Popen14:46
fungii guess we need to .decode('utf-8') the fd from it?14:47
openstackgerritMonty Taylor proposed opendev/jeepyb master: Decode utf-8 from subprocess.Popen  https://review.opendev.org/71918814:49
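A minimal sketch of the failure and the fix being discussed: on Python 3, `subprocess.Popen` pipes return `bytes`, and matching a `str` regex against them raises `TypeError: cannot use a string pattern on a bytes-like object`. Decoding the output before matching avoids it. The Change-Id regex and helper names are illustrative, not the actual jeepyb code.

```python
import re
import subprocess
import sys

# A str pattern, as hook code typically writes it.
CHANGE_ID_RE = re.compile(r"Change-Id: (I[0-9a-f]{40})")


def run_and_decode(cmd):
    """Run cmd and return its stdout as text rather than bytes."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return out.decode("utf-8")


def find_change_ids(text):
    return CHANGE_ID_RE.findall(text)


if __name__ == "__main__":
    # Stand-in for the git command a hook would run.
    cmd = [sys.executable, "-c", "print('Change-Id: I' + 'a' * 40)"]
    print(find_change_ids(run_and_decode(cmd)))
```

Passing `universal_newlines=True` to `Popen` is an alternative that makes the pipe return text directly.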
mordredcorvus, fungi: ^^14:49
mordredI could exec into the container and apply that same fix live to double check it (and keep things going until that patch lands)14:50
corvusmordred: sgtm to keep the loop going14:51
* mordred is a little worried that the slow version of whack-a-mole here might take an age14:51
mordredyeah14:51
fungiwell, it's not overly-broken, code review is working some hooks aren't successfully running, and another restart or several for new images ought to address it, right?14:53
mordredk. done14:53
corvusfungi: yeah, but we might be able to get that down to just one restart with all the fixes at once14:53
mordredyeah - but I _think_ we're close enough that we might be able to get by with only one more restart14:53
mordredyeah14:53
mordredand then be actually done with this mess14:53
mordredand remind ourselves to never write a completely untested program like jeepyb ever again14:54
fungiwell, yes, hopefully only one restart. granted each time we've restarted so far we thought we had all the fixes in ;)14:54
* mordred looks forward to reworking these hooks as zuul jobs14:54
mordredfungi: indeed :)14:55
corvusi pushed a patchset14:56
corvusi watched the gerrit queue, there are no more patchset-created hook entries14:57
corvusso i think that means success14:57
mordred\o/14:58
mordredI've got an update on the jeepyb patch - pep8 gods are unhappy14:58
openstackgerritMonty Taylor proposed opendev/jeepyb master: Decode utf-8 from subprocess.Popen  https://review.opendev.org/71918814:58
mordredcorvus, fungi : ^^14:58
corvusi will prepare breakfast while those land15:02
mordredcorvus: one of the promote jobs failed on the previous jeepyb patch (not important, it was 2.15)15:03
mordredcorvus: https://zuul.opendev.org/t/openstack/build/800a7224cf0143158e86ede8a9a35bdd/log/job-output.txt#8915:03
mordredcorvus: we might want to put in some retries15:03
mordredcorvus: although... that's a little weird... why does it say tag=change_719052_2.13 - that's the 2.15 job15:04
mordredall the vars seem to match in the jobs fwiw15:05
openstackgerritMonty Taylor proposed opendev/system-config master: Update install-ansible away from /opt/system-config  https://review.opendev.org/71918615:16
openstackgerritDavlet Panech proposed openstack/project-config master: Add kernel to StarlingX  https://review.opendev.org/71877215:16
mordredcorvus: ^^ step one in "run ansible from zuul checkout" - I believe that's an ok and self-standing change15:16
corvusmordred: that was a 'list tags' call15:21
corvusmordred: it looks kind of like a dockerhub internal consistency error15:22
corvusmordred: that might explain why an unrelated tag was mentioned15:22
mordredcorvus: ah - nod15:25
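A minimal sketch of the retries mordred suggests for flaky registry calls such as the failed "list tags" request. `with_retries` is a hypothetical helper, not part of Zuul or the promote job; `list_tags` in the usage comment is likewise hypothetical.

```python
import time


def with_retries(func, attempts=3, delay=1.0, backoff=2.0, retry_on=(Exception,)):
    """Call func(), retrying on retry_on exceptions with exponential backoff.

    Sleeps delay, delay*backoff, delay*backoff**2, ... between attempts and
    re-raises the last exception once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise
            time.sleep(delay * (backoff ** attempt))


# Usage, with a hypothetical registry call:
# tags = with_retries(lambda: list_tags("opendevorg/gerrit"), retry_on=(ConnectionError,))
```

Since the promote jobs are Ansible playbooks, the equivalent there would be the `until`/`retries`/`delay` task keywords.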
Eighth_Doctorhey folks!15:27
Eighth_Doctornice to see that this channel isn't dead :D15:27
corvusmordred: left an idea on that change15:29
openstackgerritMonty Taylor proposed opendev/system-config master: Update install-ansible away from /opt/system-config  https://review.opendev.org/71918615:33
openstackgerritMonty Taylor proposed opendev/system-config master: Run playbooks out of zuul checkout  https://review.opendev.org/71919015:33
mordredcorvus: cool - there's the followup to finish it15:33
mordredcorvus: yes - I think that's a good idea15:33
mordredis it possible to pass explicit vars to template: ?15:34
corvusmordred: i think you can do that for any task?15:34
mordredcorvus: so you can just add a vars: block to it?15:35
fungiEighth_Doctor: why would it be dead? ;)15:35
Eighth_Doctorwell, when I joined last night, I was the only person here :)15:36
Eighth_Doctorand I've been in other openstack channels that looked empty before...15:36
fungiahh, i think a lot of us were drained by a long friday after a long week15:36
Eighth_Doctorwas something particularly bad happening?15:43
funginah, just getting lots done!15:43
fungialso weekends tend to be quieter15:48
Eighth_Doctorso there was something I was curious about15:52
mordredyeah - we had some maintenances to do and wanted to take advantage of the slow holiday friday as a good time to do that15:52
mordredthem15:53
Eighth_Doctorwhy did opendev select gitea over other options?15:53
mordredit was visually nice  - and it allowed us to completely disable the features we don't use (like pull requests)15:53
mordredwe used cgit before - but our users weren't super thrilled with it as a code browser and so more consistently fell back to mirrors on github15:54
Eighth_Doctorwas pagure ever considered?15:54
corvusEighth_Doctor: there's some documentation about that decision: https://docs.opendev.org/opendev/infra-specs/latest/specs/opendev-gerrit.html15:54
corvusalso https://review.opendev.org/#/c/623033/15:56
mordredEighth_Doctor: I believe I did look at it - iirc one of the issues was the inability to fully disable things like the pull request interface. I feel like there was another reason as well but I sadly don't remember what it was15:56
Eighth_Doctormordred: so we added the ability to disable damn near everything instance-wide last year15:57
corvuswas it related to search?15:57
corvusgitea does have code searching (though we aren't able to use it yet, we still plan to enable it)15:57
fungiEighth_Doctor: however, we were making this decision in 21815:57
fungi201815:57
Eighth_Doctorah15:57
Eighth_Doctorpagure 5.0 was released at the end of 2018, and our zuul integration was completed in mid 201915:58
Eighth_Doctorso that explains it... poor timing15:58
mordredyeah - might have just been timing15:58
mordredyeah15:58
fungiwell, we also didn't need zuul integration for our use case15:58
fungiwe definitely didn't want to replace our choice of code review system15:59
fungijust needed a source browser15:59
Eighth_Doctorfungi: having zuul status report back into commits is nice though :)15:59
fungiwhy?15:59
fungii mean, if you're proposing changes to that system then yes, but we're not15:59
Eighth_Doctorbecause it makes it very easy for people to reference back and forth between tested commits and such15:59
Eighth_Doctorit's something I personally find handy, even if you're not using PRs15:59
fungiwe're just using it as a read-only code browsing frontend, not to do change review16:00
fungiwe do pretty much all our testing pre-merge16:00
Eighth_Doctorright, that's a zuul feature16:00
fungiso still not clear what we'd be reporting from zuul into the code browsing system16:01
Eighth_Doctorwell, whatever the merged commit was, it would have a status link back to zuul that people can click to see the test results16:01
fungizuul in our case is being triggered by activity in the code review system16:01
Eighth_Doctorpresumably also would have a link to gerrit, so you can see the reviews16:01
fungiso zuul doesn't/wouldn't know about the code browser16:01
corvusthat's a good point, i wonder if we can link change-id footers in gitea back to gerrit16:02
Eighth_Doctorit'd also be trivial to customize the template so that instead of showing a PR tab or issues tab, it'd give a link to gerrit for the project16:02
Eighth_Doctoror storyboard for issues16:02
fungiand yeah, we're working on getting the gerrit links displaying. they're in git notes, we just need to turn on displaying git notes in gitea now that it (i think?) added capability to display arbitrary notes refs16:02
Eighth_DoctorFedora's pagure instance does this to replace issues with a link to rhbz16:02
corvuswe did do that for gitea -- the "Proposed changes" tab links to gerrit16:02
Eighth_Doctorhttps://src.fedoraproject.org/rpms/pagure16:02
fungiwe do have gitea configured to link to gerrit and storyboard or launchpad already16:02
Eighth_Doctoroh nice, I guess I missed that piece16:03
Eighth_DoctorI mainly looked at the zuul projects, since that's my main opendev interest atm :)16:03
Eighth_Doctorbut yeah, I see you already did that16:03
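A minimal sketch of corvus's idea of linking Change-Id footers back to Gerrit: extract each `Change-Id:` footer from a commit message and build a Gerrit search URL for it. It assumes Gerrit's search accepts a bare Change-Id as a query and reuses the `#/q/` URL form that appears elsewhere in this log; the function name is hypothetical. It also simplifies the footer rule by matching such a line anywhere, not only in the final paragraph.

```python
import re

# "Change-Id: I" followed by 40 hex digits, on its own line.
CHANGE_ID_FOOTER = re.compile(r"^Change-Id: (I[0-9a-f]{40})[ \t]*$", re.MULTILINE)


def gerrit_links(commit_message, base="https://review.opendev.org/#/q/"):
    """Return a Gerrit search URL for each Change-Id footer in commit_message."""
    return [base + change_id for change_id in CHANGE_ID_FOOTER.findall(commit_message)]
```

The git notes approach fungi mentions would avoid parsing messages altogether, since the notes already record the review link.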
mordredanother nice thing about pagure - it's in python :)16:03
Eighth_Doctoryeah :D16:03
Eighth_Doctoralso, another thing about pagure, docs are stored as a git repo :)16:04
fungiyep, at https://opendev.org/zuul/zuul issues links off to https://storyboard.openstack.org/#!/project/zuul/zuul and proposed changes links to https://review.opendev.org/#/q/status:open+project:zuul/zuul16:04
Eighth_Doctor(technically, same goes for issues and PR metadata, but you don't care about those)16:04
fungiwhat docs are you talking about?16:04
Eighth_Doctorproject documentation (e.g. gh-pages, readthedocs, etc. stuff)16:05
fungiahh, well we already develop our documentation in git repos through code review anyway16:05
Eighth_Doctorah okay16:05
fungiand use zuul to render/publish them16:05
openstackgerritMerged opendev/system-config master: Actually install patchset-created hook  https://review.opendev.org/71918716:05
Eighth_Doctorwell, then I guess the only thing left I have is pagure scales?16:05
Eighth_Doctorit handles ~30K repos with ~10K concurrent users accessing performantly from one server (src.fedoraproject.org)16:06
Eighth_Doctorand has means for scaling beyond that16:06
fungiwe're running 8 gitea servers behind a load balancer right now, but better clustering (especially for the code search functionality) would be nice, yes16:06
Eighth_Doctorholy crap, 8?!16:07
Eighth_DoctorI knew gitea wasn't great for scaling, but that's awful16:07
fungipartly so that we can handle bursts of cloning activity better16:07
Eighth_Doctorsure, makes sense16:07
fungithey're usually under-utilized16:07
mordredsince you say that - I'm curious if pagure would be better at browsing the nova repo16:07
fungialso gives us the ability to take some of them offline without impacting performance16:08
fungifor upgrades et cetera16:08
Eighth_Doctoris openstack/nova usually the problem child?16:08
mordredwell - it's the best example of a problem child16:08
fungiyeah, that repo is large, has ~10 years of history, et cetera16:09
mordredit is a large repo and gitea has had some issues with doing the right things caching its refs in the past16:09
Eighth_Doctorwell, let's see if I can even download it! :P16:09
Eighth_Doctorwe've hosted mirrors of the linux kernel reasonably well on pagure.io (which has less resources than src.fedoraproject.org) and I think I have a copy of mongodb pre sspl there16:10
mordredEighth_Doctor: does pagure handle operating in a cluster decently? like - if we wanted to run 8 pagures in a k8s but treat them as a single server?16:10
Eighth_Doctormordred: this is of comparable size: https://pagure.io/mongodb-agplv316:10
Eighth_Doctormordred: I personally do not know because I don't run pagure that way, but I know of users who are running it in OpenShift or Kubernetes and scaling the backend workers accordingly to handle the load well16:11
Eighth_Doctorso far, I haven't heard any complaints16:11
Eighth_Doctorthere's a WIP helm chart PR for pagure, but neither I nor the other developers have experience with k8s enough to be able to do anything meaningful with it16:12
mordrednod. I mean - the k8s part isn't as important as the being able to scale it horizontally part16:12
fungimaster branch of nova is nearly 60k commits at this point, looks like16:12
fungiEighth_Doctor: what's the typical server size for pagure, do you think? part of why we're running 8 backends for gitea is that they're each small virtual machines with like 8gb ram16:13
fungibut we're also not nearly the repository count of fedora, only a little over 2k repositories at the moment16:14
Eighth_DoctorI don't have the exact details, but I think the existing src.fedoraproject.org server is basically a VM with 4GB of RAM16:14
fungineat16:14
Eighth_Doctorit might be 8GB of RAM now, but I know it's not a huge machine16:14
corvushere's utilization of the individual gitea backends: http://cacti.openstack.org/cacti/graph_view.php16:15
corvusclick the 'gitea farm' on the left16:15
Eighth_Doctorthat's not too bad16:16
Eighth_Doctorstorage I can ignore, since those are synced16:16
corvuslooks like a median load average might be about 0.25, peaking at 216:16
Eighth_DoctorI'm pretty sure the utilization levels are similar on src.fp.o16:16
mordredif I'm reading https://docs.pagure.org/pagure/overview.html#pagure-workers right - in general there is expected to be one copy of the git repos on disk and pushing to those would be via a gitolite instance. then the pagure web interface is going to read from that filesystem copy via async worker tasks16:17
fungiour typical activity levels would probably be handled with only a couple backends, but with some frequency people point high-volume ci systems at our git refs and start cloning hundreds of copies of repositories at the same time16:17
Eighth_Doctoryep16:17
mordredso if the filesystem were shared amongst workers, the read traffic looks like it would be pretty scalable16:17
corvusi bet we could halve the cluster (to 4 8gb vms) with no significant impact to performance.  more than that we'd probably have peak memory usage issues.16:17
Eighth_Doctorthis is essentially the characteristic for fedora16:18
Eighth_Doctorwe also have things like koschei, zuul, etc. constantly checking out and interacting with pagure API16:18
mordredvia scale out - but writes might still have a spof?16:18
Eighth_Doctorand it's doing very well with just one server16:18
Eighth_Doctorthe only bottleneck is if you need to scale storage... but if you're operating in k8s, this is abstracted for you16:19
mordredoh - we're not :)16:19
mordredbut - that's been a thing we've been looking at doing if we could get to a clustered solution for the git browsing16:19
Eighth_Doctor... then I'm confused about k8s?16:19
mordredright now we replicate to all 8 machines independently16:19
Eighth_Doctoroh... ouch16:20
mordredwe'd LIKE to have a single clustered system that we replicate to once16:20
mordredbut so far that's problematic16:20
Eighth_Doctorthat means you're inducing state sync load16:20
Eighth_DoctorI've usually seen this solved with either shared nfs or gluster16:20
mordredwith cgit it was just impossible. with gitea there are some indexes that made single-machine assumptions that are in process of being fixed16:20
Eighth_Doctorthat's not to say other solutions aren't valid, but those are the two I usually see16:21
mordredyeah- that was/is the gitea design - run the gitea cluster on top of a cephfs16:21
Eighth_Doctorthere is an option for sharding git storage in pagure16:21
mordredbut there were 2 things it was doing that were storing index files in the filesystem which needed to be abstracted out into plugin interfaces so they could store in a service16:21
Eighth_Doctorbut we don't use it in fedora right now and it needs some love16:22
Eighth_Doctorhttps://github.com/repoSpanner/repoSpanner16:22
Eighth_Doctorthis does work with pagure, but the issue is that the sync penalty is too high in some cases16:22
mordredthat would be a cost in push right?16:23
Eighth_Doctorthere was some in-progress work to improve performance, but interest died off on completing it16:23
Eighth_Doctoryes16:23
mordredI'd LOVE to be able to scale without needing to run a shared filesystem16:23
Eighth_DoctorrepoSpanner was designed to avoid the shared filesystem requirement16:23
clarkbyes memory is the major thing. You need about a gig of memory for each git operation on several of our repos16:23
Eighth_Doctorbecause we don't use one in Fedora16:23
clarkbas long as git is used regardless of frontend I don't expect that changes dramatically16:24
clarkbthen you add N operations and suddenly you need quite a bit of memory16:24
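A back-of-the-envelope sketch of the memory math above, using clarkb's figure of about 1 GB per heavy git operation and the 8 GB backends mentioned earlier. The 2 GB reserved per backend for the OS and web frontend is an assumed number, not a measured one.

```python
def max_concurrent_git_ops(backends, ram_gb_per_backend, gb_per_op=1.0, reserved_gb_per_backend=2.0):
    """Rough upper bound on simultaneous heavy git operations a farm can hold.

    Assumes each operation needs gb_per_op of RAM and that
    reserved_gb_per_backend is already taken on every backend.
    """
    usable = max(ram_gb_per_backend - reserved_gb_per_backend, 0.0)
    per_backend = int(usable // gb_per_op)
    return backends * per_backend


# 8 backends x 8 GB -> 48 concurrent operations; halving to 4 backends -> 24.
```

Under source-IP load balancing, the per-backend figure (6 here) is what bounds a burst from a single client address, which is why total farm memory overstates usable headroom.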
openstackgerritMerged opendev/jeepyb master: Decode utf-8 from subprocess.Popen  https://review.opendev.org/71918816:24
clarkbalso note the split git repos aren't an issue as long as this is a read only frontend16:24
clarkbit's going to be eventually consistent regardless due to how gerrit replication works16:25
mordredinfra-root: ok the jeepyb change landed - I'm about ready to try another restart16:25
Eighth_Doctorso perhaps pagure + repospanner would work in your specific scenario16:25
clarkb(so overcomplicating that to sync isnt worth much imo, using a fa that syncs for us is nice and simple16:25
clarkb*using a fs16:25
mordredclarkb: yah - but ... it's possible running repoSpanner might be easier than running ceph16:26
Eighth_Doctormordred: _that_ I can say is true :)16:26
mordred(if we got a ceph magically from someone already running one, using a ceph would be easier)16:26
Eighth_Doctorisn't that how that always works? :)16:27
corvusmordred: i have to run; i can check back in in a few hours, but i support you restarting if you're comfortable16:27
mordredI'm comfortable16:27
fungithanks corvus16:27
fungiand yeah, i'm around again16:28
mordredcorvus, fungi, clarkb: images have been pushed, ansible changes have applied16:28
mordredI'm going to try another restart16:28
mordredclarkb: we're in root screen on review if you wanna watch16:28
clarkbI'm half around. Drinking tea and eating cornbread16:28
mordredalthough the root screen itself isn't super exciting16:28
Eighth_DoctorI don't know if you guys use ansible or something else for config management, but you can see Fedora's ansible role for pagure here: https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/roles/pagure16:29
Eighth_Doctor(the ansible repo hasn't yet moved to pagure.io)16:30
mordredclarkb, fungi : gerrit looks like it's back u16:31
mordredip16:31
mordredUP16:31
Eighth_DoctorCentOS also runs an instance and has an Ansible role: https://github.com/CentOS/ansible-role-pagure16:31
Eighth_Doctorpagure uses MySQL on CentOS and PostgreSQL on Fedora ;)16:32
fungibased on our present direction for evolution of our deployment methodology, we'd presumably consume docker images for the service components and then deploy those images with ansible16:32
mordredyeah16:32
Eighth_Doctorthat's fine too :)16:33
fungieither consume upstream-provided docker images, or (re)build our own with our ci system and then consume those16:33
Eighth_Doctorwe have Dockerfiles for pagure that we use primarily for dev and CI, but we don't currently publish any containers for prod16:34
Eighth_Doctorso the latter would probably be the way for you to go16:34
mordredyeah - that's what we do for gitea too - their docker images aren't structured for what we'd want in prod - are more focused on the AIO "I want to run it quickly on my laptop" use case16:34
mordredwhich is an important use case16:35
mordredbut not what we're doing :)16:35
fungiyep, that's how we're deploying gerrit in production, as of, well, today i suppose (if we don't have to roll back again)16:35
mordredfungi: I'm going to roll this forward today if it kills me16:35
fungihow about let's just not stick to deployment models which leave dead sysadmins in their wake16:35
mordredok fair16:36
Eighth_Doctorpagure is packaged for Fedora, RHEL/CentOS via EPEL, Mageia, and openSUSE by me16:36
Eighth_Doctorso if you want to play with it in a VM or a container, it's pretty easy to do ;)16:36
clarkb"Here lies Mordred. A java program eventually got the best of him"16:36
Eighth_DoctorRIP16:36
mordredHOOKS HAVE RUN WITH NO TRACEBACKS16:38
mordredI declare victory16:38
clarkbmordred: fungi if there is a list of things to review I can help with that but probably not get into it beyond that16:38
* Eighth_Doctor sees mordred fall over in a heap16:39
fungiclarkb: i think we've got them all in now? can probably abandon the revert of the hooks update which didn't merge16:39
clarkbcool will do that16:39
fungiwe can resurrect it if we decide we do have to roll back for reasons we can't correct immediately16:40
mordredfungi: while I've got you - would you mind reviewing https://review.opendev.org/#/c/719088/ ?16:42
mordredfungi: if you're ok with that - I'll delete the old ones and land it16:42
Eighth_Doctorfungi, mordred: if you were interested in the k8s based approach: https://pagure.io/pagure/pull-request/448316:44
fungimordred: yep, cool will do16:45
Eighth_Doctorand since we've been talking about performance, here's the info I gave the FSF for helping them set up a performant system for their forge based on pagure: https://lists.pagure.io/archives/list/pagure-devel@lists.pagure.io/message/SZ7GJ5P65Q76FRZIDNYFP3HI4RD4H6LT/16:47
clarkboh thats the other performance related issue we do have. We have to use source ip based load balancing due to unshared git repos16:50
clarkbbecause a fetch executing across different repos of the same logical entity can fail16:50
clarkb(it depends on how objects are packed iirc)16:51
clarkband that has issues when large companies funnel through a single NAT IP16:52
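A minimal sketch of why source-IP affinity concentrates NAT'd traffic: every client behind one address hashes to the same backend. Real load balancers (for example HAProxy's `balance source`) use their own hashing; this SHA-256 scheme and the backend names are illustrative assumptions. The IPs are from documentation ranges.

```python
import hashlib
from collections import Counter

# Hypothetical backend names for an 8-node farm.
BACKENDS = ["gitea%02d" % n for n in range(1, 9)]


def pick_backend(source_ip, backends):
    """Map a client IP to a backend by hashing it (source-IP affinity)."""
    digest = hashlib.sha256(source_ip.encode("ascii")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


if __name__ == "__main__":
    # 100 CI clients behind one corporate NAT address all land on one backend.
    nat_load = Counter(pick_backend("203.0.113.7", BACKENDS) for _ in range(100))
    print(nat_load)
    # 100 clients with distinct addresses spread across the farm.
    spread = Counter(pick_backend("198.51.100.%d" % i, BACKENDS) for i in range(100))
    print(len(spread), "backends used")
```

Affinity is what keeps a single fetch from straddling backends whose repos are packed differently; shared or synced storage would let the balancer drop it.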
Eighth_Doctoryup16:52
Eighth_Doctorthat might be where repoSpanner helps here16:55
openstackgerritMonty Taylor proposed opendev/system-config master: Write out db config for root user  https://review.opendev.org/71919216:56
Eighth_Doctorassuming you want to have multiple storage replicas16:56
Eighth_Doctorclarkb: I'm not sure, given your usage model, that repoSpanner would be necessary, but it would avoid the load balancing problem16:59
Eighth_Doctoryou could run one frontend app with some number of workers, and then have a repoSpanner cluster that handles the git storage16:59
openstackgerritMerged opendev/system-config master: Install ep_headings module  https://review.opendev.org/71912316:59
clarkbEighth_Doctor: ya any shared repo content or synced content would fix that I think16:59
fungias long as the shared backend guaranteed all frontends were serving exact same copies of the content at the same times17:02
mordredclarkb, fungi: corvus suggested earlier that we should build etherpad image instead of doing that ep_headings hack above and i agree17:05
mordredI'll resurrect the child-image-building code in a bit17:05
fungimordred: i take it cron running track-upstream outside the container is fine?17:08
mordredfungi: it actually runs it in a container :)17:09
fungihuh... looking closer17:10
mordredfungi: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/gerrit/templates/track-upstream.j217:10
mordredfungi: that gets installed into /usr/local/bin17:10
fungioh! so it does ;)17:10
fungialso... has gerritbot config update gotten solved yet? i want to say we're still not seeing infra-manual changes in here since the namespace move17:11
mordredit has not - I've got the first change up17:12
mordredhttps://review.opendev.org/#/c/715635/17:13
openstackgerritMerged opendev/system-config master: Add review and etherpad to backup group  https://review.opendev.org/71903617:13
openstackgerritMerged opendev/system-config master: Run ansible on the backup server  https://review.opendev.org/71907617:13
fungioh, cool17:16
fungiright, we were containering it, i forgot17:17
fungi+217:17
mordredfungi: so many containers17:18
fungiokay, christine's got a lengthy list of things i need to repair around the house, but i'll be in and out to keep tabs on gerrit in case we run into any more unforeseen problems17:18
mordredfungi: cool. I think we're good though - it seems like we've finally finished this phase!17:29
openstackgerritMerged opendev/system-config master: Add root cron jobs to gerrit  https://review.opendev.org/71908817:46
fungihere's hoping!17:48
mordredfungi, clarkb, corvus: the backup playbook is not working18:27
mordredmy brain can't quite process it at the moment18:27
mordredbut we should fix it :)18:28
fungii'll see if i can figure it out in a bit18:37
fungionce this leftover curry is gone ;)18:37
fungimordred: when you say "the backup playbook is not working" you mean the periodic pipeline job running the playbooks/service-backup.yaml playbook?19:40
mordredfungi: I mean the playbook itself - if you look in /var/log/ansible/service-backup.yaml.log on bridge19:44
mordredfungi: looking at the ansible it looks like there's maybe a mismatch in variable name - but I'm not 100% sure and I'm not 100% sure of the intent19:44
mordredso the job is running the playbook fine -but the playbook itself is bombing out :)19:46
openstackgerritSean McGinnis proposed openstack/project-config master: Make job template update best effort  https://review.opendev.org/71930819:47
mordredfungi: oh! I think I might know what the issue is19:50
mordredfungi: etherpad01.opendev.org is in the disabled list19:50
mordredso it's not being run in the backup role - so it's not setting the bup_user variable19:51
fungid'oh19:51
mordredBUT - we do with_inventory_hostnames in backup-server19:51
mordredon 'backup'19:51
mordredwhich does not subtract hosts in the disabled group19:51
mordredooh - it supports exclusion patterns19:52
openstackgerritMonty Taylor proposed opendev/system-config master: Exclude disabled group from backup-server loop  https://review.opendev.org/71930919:54
mordredfungi, corvus : ^^19:54
mordredthis is an issue that will only arise if we have a server we backup disabled at a time when we have backup-server enabled19:54
mordredlike now19:54
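A simplified model of the exclusion pattern mordred mentions: looping over `backup` alone still yields hosts in `disabled`, while a pattern such as `backup:!disabled` drops them. The exact pattern used in change 719309 isn't shown here, and this model handles only `:` unions and `!` exclusions (no `&`, wildcards, or regexes). The group membership is illustrative, apart from etherpad01 being in the disabled list as stated above.

```python
def resolve_pattern(pattern, groups):
    """Simplified model of an Ansible host pattern.

    Supports ':'-separated unions and '!'-prefixed exclusions only;
    in this model exclusions are removed after all unions are collected.
    """
    include, exclude = set(), set()
    for term in pattern.split(":"):
        if term.startswith("!"):
            exclude.update(groups.get(term[1:], ()))
        else:
            include.update(groups.get(term, ()))
    return sorted(include - exclude)


# Illustrative inventory, not OpenDev's real group membership.
GROUPS = {
    "backup": ["etherpad01.opendev.org", "review.opendev.org"],
    "disabled": ["etherpad01.opendev.org"],
}
```

With this inventory, `resolve_pattern("backup", GROUPS)` includes etherpad01, whose `bup_user` was never set, while `resolve_pattern("backup:!disabled", GROUPS)` leaves only review.opendev.org.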
mordredalso - I think we can unemergency etherpad19:55
mordredbut why don't I leave it in emergency so we can check that backup runs correctly in this scenario19:55
fungigood call20:07
fungimordred: so... we don't want to backup servers which have config management disabled?20:09
mordredfungi: well - we do - but we probably don't want to set up new backup info on them if they're disabled20:31
mordred(or we can't, since we won't have run the corresponding stuff on the server themselves - so there's potentially no user to connect to yet - which would be true in the case of etherpad)20:32
mordredbackups _themselves_ are via cron - but attempting to set up new backups while disabled == sad panda20:32
fungiokay, so the bup_users set would only be used for initial configuration, not to decide which to run the backups for when already set up, got it20:33
fungithe job name system-config-run-backup was mildly misleading20:34
funginow realizing it's infra-prod-service-backup i meant to be looking at20:35
fungiand yeah, now i see in playbooks/roles/backup/tasks/main.yaml we're still configuring a cronjob, not triggering backups directly20:36
fungimakes sense, thanks20:36
*** tosky has quit IRC23:24
*** DSpider has quit IRC23:48
