Thursday, 2021-01-14

clarkbthere are 31 neutron changes in the queue too00:05
clarkbwhich is about 20% of the queue size but likely significantly more when measured in node time00:07
clarkbdoes anyone know why the tripleo-ci-centos-8-standalone jobs in the tripleo gate queue all seem to be waiting? I wonder if they are waiting on resources by a parent job?00:09
clarkbthat seems to be a big reason for why that queue isn't flushing its contents quickly00:09
clarkbwell that and the general slowness of the jobs that are running but that is normal00:09
corvusinfra-root: it looks like perhaps afs release jobs are stuck or slow; is this known?00:15
ianwcorvus: i believe due to a prior failure of afs02 everything is doing a full release00:16
corvusneat00:16
corvusit came to my attention because docs volumes aren't updating; from the process list, i'm guessing they have completed their release, but due to the way the cron works, they won't get another release until everything else is done00:17
clarkbI think the tripleo jobs that aren't running depend on tripleo-ci-centos-8-content-provider which zuul says has a > 3 hour estimated runtime00:17
clarkbso that is likely another place that can be optimized00:17
clarkboh I wonder if that job pauses though so the long runtime is related to waiting for the other jobs to run00:18
corvuslet me revise that: it looks like the docs release script is waiting only on the release of project.tarballs00:20
fungicorvus: yep, i mentioned in #zuul earlier, still keeping an eye on it00:20
clarkbhttps://opendev.org/openstack/tripleo-ci/src/branch/master/zuul.d/standalone-jobs.yaml#L1092 that seems to be a job that builds container images. I wonder if every tripleo content-provider job rebuilds all tripleo images which isn't quick, then other jobs sit around waiting for that to complete00:21
corvusfungi: oh, sorry missed that00:21
fungiall the mirror volumes are trying to do full releases to afs02.dfw which has immensely slowed the tarballs volume release, and the zuul site release is in a serialized script behind the tarballs full rerelease00:21
fungirecovery has been underway since roughly 17:20 utc00:22
clarkbthe neutron changes also run that tripleo content-provider job as a dep for the tripleo jobs run against neutron changes00:23
clarkbI'm running out of steam to do a zuul throughput debug but we may need to talk to the openstack TC tomorrow if this persists or gets worse00:24
clarkbit does seem like we've got at least a couple of setups that are making it bad00:24
clarkbI expect the build the images step is a response to docker hub doing rate limits00:25
corvusfungi, ianw: are we sure afs01.dfw is okay?00:25
clarkbquay.io says they won't do image download rate limits (though if you make too many looks they rate limit those) ? I wonder if we need to work with tripleo to look at quay again00:25
corvusi'm seeing 'vos examine' hang on volumes that are on that server00:26
clarkbcorvus: I think it stopped complaining about its cinder volume after rebooting, but I'm not sure if further verification of health was made00:26
fungicorvus: not entirely sure since the server itself hung last week and had to be rebooted. it looked like things had recovered but possible i missed something00:26
fungiclarkb: afs01 not 0200:26
clarkbfungi: oh got it00:27
fungifor afs01 the server hung last week and had to be hard rebooted to recover (likely something related to live migration in rax)00:27
fungifor afs02 one of its cinder volumes was on a host which became unresponsive, so the /vicepa partition remained in a read-only state until i rebooted the server00:28
fungicorvus: where did you see hung vos examine processes? possible those were somewhere i missed after the reboot last week00:29
ianwafs01 seems up and no bad messages as step 100:29
corvusfungi: in my terminal00:29
fungiahh00:29
corvusvos examine test.fedora00:29
fungiyeah, it takes a while for me to get a response on that as well (maybe forever, hasn't returned yet)00:30
fungicorvus: looks like there's a hung `vos examine project.tarballs` from 10 minutes ago on mirror-update.opendev.org too, is that you as root?00:31
ianwnothing bad in the afs logs, but agree it's not looking healthy00:32
corvusyep00:32
fungicould it be that having so many volumes doing full releases in parallel has overwhelmed it?00:32
corvuscould be?  maybe the volserver has a limited queue for answering requests and it's full of release txns?00:34
fungiproject.tarballs, mirror.yum-puppetlabs, mirror.debian, mirror.opensuse, mirror.debian-security, mirror.ubuntu-ports, mirror.epel, mirror.centos, mirror.fedora, mirror.deb-octopus, mirror.deb-docker, mirror.ubuntu-cloud, mirror.apt-puppetlabs, mirror.deb-nautilus, mirror.ubuntu00:34
fungithose are all running simultaneously00:34
corvussince we're not seeing any errors, probably makes sense to just leave it be until the backlog clears a bit00:34
fungibasically the static volume releases kicked off and those went fine until it got up to project.tarballs which needed a full release and is something like 175GB of data00:35
fungiroughly an hour into that, mirror volumes started getting vos release runs and i expect many if not most of them decided they needed a full release too00:35
fungiand they piled up before i noticed00:35
corvuswould be cool if we could shard that by date :)00:36
corvusnext time we design a tarballs archive....00:36
fungiindeed00:36
*** tosky has quit IRC00:36
fungiin theory at the date of data transfer i'm seeing the project.tarballs full release would have only needed two hours to complete00:37
fungiexcept about halfway into that it started having to compete with full releases of mirror volumes00:37
fungier, rate of data transfer i'm seeing00:37
fungiwe're clocking ~50Mbps into eth0 on afs02.dfw00:38
ianwiotop on afs01 shows a lot of reads, and a lot of writes on afs02 ... it does appear to be doing *something*, although i agree none of the status commands return00:39
fungier, actually bad math on my part. more like 9 hours to complete a full release of 175GiB at 45Mbps (closer to what the graph indicates)00:41
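The corrected estimate above can be checked with a short calculation. This is a minimal sketch, assuming 175 GiB means 175 × 2^30 bytes and Mbps means 10^6 bits per second; the function name `transfer_hours` is illustrative and not part of any tooling mentioned in the channel.

```python
def transfer_hours(size_gib: float, rate_mbps: float) -> float:
    """Hours needed to move size_gib gibibytes at rate_mbps megabits/second."""
    bits = size_gib * 2**30 * 8          # GiB -> bytes -> bits
    seconds = bits / (rate_mbps * 1e6)   # assumes Mbps = 10^6 bits per second
    return seconds / 3600


if __name__ == "__main__":
    # Figures quoted in the discussion: 175 GiB at roughly 45 Mbps.
    print(f"{transfer_hours(175, 45):.1f} hours")
```

With those inputs the result is a little over nine hours, in line with the revised figure, and it assumes the volume has the link to itself.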
fungii'm increasingly tempted to kill the mirror volume releases and hold locks for them, then remove the locks in an orderly fashion once static site volumes are done00:42
fungiwhen i thought the tarballs volume could finish at any time i was less concerned about it, but at this point i'm quite sure the mirror volumes are going to need faaar longer and drag out the tarballs site finishing by a lot more than i originally estimated00:43
ianwfungi: i once had a change out to put a global stop to mirror releases ...00:44
ianwhttps://review.opendev.org/c/opendev/system-config/+/68058600:45
fungithinking through this, the main risk is if i kill an in progress vos release, the volume it was releasing might need a full release instead of an incremental one. the problem volumes are likely ones doing full releases anyway and i bet they're not very far along... so maybe if i check the logs and only kill the mirror volumes doing a full release, then not much progress is lost?00:45
fungiany of the ones where their last loglines are currently something like "Starting ForwardMulti from .* to .* on afs02.dfw.openstack.org (full release)."00:47
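The log check described above could be sketched roughly as follows. The pattern is the one quoted in the channel with its literal dots and parentheses escaped; the helper name and the idea of passing the log text as a string are assumptions for illustration, not how the mirror-update scripts actually work.

```python
import re

# Pattern from the log line quoted in the discussion. Literal dots and
# parentheses are escaped; the two ".*" wildcards are kept as written.
FULL_RELEASE = re.compile(
    r"Starting ForwardMulti from .* to .* on "
    r"afs02\.dfw\.openstack\.org \(full release\)\."
)


def last_line_is_full_release(log_text: str) -> bool:
    """Return True if the final non-empty log line reports a full release."""
    lines = [line for line in log_text.splitlines() if line.strip()]
    return bool(lines) and FULL_RELEASE.search(lines[-1]) is not None
```

Only the last non-empty line is inspected, matching the "last loglines" criterion; a volume whose log has moved past that message would not be selected.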
auristorfungi: the most likely reason that "vos status afs01.dfw.openstack.org" is failing to return is that there are no available threads to process the request.00:48
fungiauristor: sounds likely, thanks!00:49
fungiyeah, currently 14 mirror volumes all in the midst of a full release to afs02.dfw according to their respective logs00:49
auristoralthough rxdebug reports that there are currently 10 calls waiting for a thread and two threads are idle00:49
fungiplus the project.tarballs volume00:49
*** d34dh0r53 has quit IRC00:50
ianwthe other thing is, we'd be in a much better position to put in a sequential lock for the mirror volume releases now that they're all done from the same server00:50
fungigreat point00:51
fungiwell, except for the wheel volumes i think?00:52
fungibut those aren't a huge problem00:52
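One way to picture a sequential lock when every release runs from the same server is an exclusive advisory lock on a shared file. This is only a sketch under stated assumptions: the lock path is hypothetical, `exclusive_lock` is an illustrative helper, and it is not the implementation proposed in system-config. It relies on `fcntl.flock`, so it is Unix-only.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path: str, blocking: bool = True):
    """Hold an exclusive flock on path for the duration of the with-block.

    With blocking=False an OSError is raised if another holder has the lock.
    """
    with open(path, "w") as handle:
        flags = fcntl.LOCK_EX | (0 if blocking else fcntl.LOCK_NB)
        fcntl.flock(handle, flags)
        try:
            yield handle
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


if __name__ == "__main__":
    # Hypothetical lock file; each release job would wrap its work like this
    # so that only one runs at a time.
    with exclusive_lock("/tmp/example-release.lock"):
        print("release would run here")
```

Jobs that take the same lock file queue up behind each other, and holding the lock manually (for example from a long-lived shell) keeps scheduled runs from starting.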
*** d34dh0r53 has joined #opendev00:54
*** auristor has quit IRC00:56
openstackgerritmelanie witt proposed opendev/elastic-recheck master: Add query for bug 1911574  https://review.opendev.org/c/opendev/elastic-recheck/+/77068800:59
openstackbug 1911574 in OpenStack-Gate "SSH to guest sometimes fails publickey authentication: AuthenticationException: Authentication failed." [Undecided,New] https://launchpad.net/bugs/191157400:59
fungiactually a more careful analysis of the mirror update logs indicates there are only 8 with a full release in progress so i'll stop those first and put locks in place, then see which others queue up and whether they want full releases too01:00
*** auristor has joined #opendev01:07
*** auristor has quit IRC01:19
*** auristor has joined #opendev01:20
fungiokay, i've stopped releases underway for the following and held locks in a root screen session on mirror-update.o.o: centos, epel, fedora, opensuse, yum-puppetlabs, debian-security, debian, ubuntu-ports01:21
auristorwhat do you mean by "stopped releases"?   do you mean you killed the vos processes?   that isn't going to cancel the transfers between afs01.dfw and afs02.dfw.   it will result in the transferred data being discarded after the transfer completes.01:25
fungiyeah, i still need to do something about the transactions, presumably01:25
fungijust trying to make sure our scripts don't restart them once i have01:25
auristoruntil there are available threads to process incoming rpcs there is nothing that can be done.01:26
fungiahh, even the vos endtrans calls will queue up i guess01:26
fungiso need to wait for at least one of them to complete01:26
auristorone of the design flaws in openafs that was addressed by auristorfs is that you cannot terminate the transfer between the volservers01:27
auristoralso, I strongly recommend that no more than 5 volume operations be permitted in flight at a time.01:27
fungithanks, sounds like our plan to start setting up a semaphore of some sort with these is a good one in that case01:28
fungiusually it's not a problem, but if something happens to the server and most of the volumes have to get full releases they'll pile up01:29
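The cap of five volume operations in flight maps naturally onto a counting semaphore. The sketch below is an illustration only: `do_release` is a placeholder callable (in practice it might shell out to `vos release`), and the function names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 5  # the limit recommended in the discussion
_slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)


def release_volume(volume, do_release):
    """Run do_release(volume) while holding one of the limited slots."""
    with _slots:
        do_release(volume)
    return volume


def run_releases(volumes, do_release, workers=15):
    """Start every release at once; the semaphore caps how many proceed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda v: release_volume(v, do_release), volumes))
```

Even if all fifteen releases are triggered together, at most five call `do_release` at any moment and the rest wait for a free slot. A semaphore like this only coordinates threads within one process; separate cron-launched processes would need a shared mechanism instead.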
*** cloudnull has quit IRC01:29
openstackgerritxinliang proposed openstack/diskimage-builder master: Fix building error with element dracut-regenerate  https://review.opendev.org/c/openstack/diskimage-builder/+/77024101:53
*** chateaulav has joined #opendev02:29
*** chateaulav has quit IRC02:33
*** mlavalle has quit IRC02:34
openstackgerritxinliang proposed openstack/diskimage-builder master: Fix centos 8.3 partition image building error with element iscsi-boot  https://review.opendev.org/c/openstack/diskimage-builder/+/77070103:00
openstackgerritJeremy Stanley proposed opendev/engagement master: Initial commit  https://review.opendev.org/c/opendev/engagement/+/72929303:01
openstackgerritJeremy Stanley proposed opendev/engagement master: Initial commit  https://review.opendev.org/c/opendev/engagement/+/72929303:04
openstackgerritxinliang proposed openstack/diskimage-builder master: Add rhel support for iscsi-boot  https://review.opendev.org/c/openstack/diskimage-builder/+/77070203:25
*** brinzhang0 has joined #opendev03:27
*** brinzhang0 has quit IRC03:29
*** brinzhang0 has joined #opendev03:29
*** brinzhang_ has quit IRC03:30
*** brinzhang0 has quit IRC03:30
*** brinzhang0 has joined #opendev03:31
openstackgerritxinliang proposed openstack/diskimage-builder master: Add aarch64 support for rhel  https://review.opendev.org/c/openstack/diskimage-builder/+/77070303:37
openstackgerritMerged opendev/system-config master: Publish structured data listing our ML archives  https://review.opendev.org/c/opendev/system-config/+/75112503:38
*** whoami-rajat__ has joined #opendev04:22
*** artom has quit IRC04:24
*** amotoki has quit IRC04:44
openstackgerritIan Wienand proposed opendev/system-config master: vos-release: implement sequential release lock  https://review.opendev.org/c/opendev/system-config/+/77070504:44
openstackgerritIan Wienand proposed opendev/system-config master: mirror-update: create common timeout set  https://review.opendev.org/c/opendev/system-config/+/77070604:44
*** amotoki has joined #opendev04:44
ianwfungi: ^04:45
ianwmnaser: so we have two vexxhost backups; the old bup one @ backup01.sjc-1.vexxhost.opendev.org and the new borg one @ backup02.sjc-1.vexxhost.opendev.org04:53
ianwmnaser: the old one has 3tb attached, 2.4 taken, so i can't take any drives out of that.  the new one only has 1tb attached and is at 98%04:54
ianwmnaser: i think ideally we'd like to keep the old bup 3tb of backups for a while, just as ... well ... a backup.  but we're out of quota to add more disk to the new server04:55
ianwmnaser: let me know what you think.  if adding extra is an issue, we could probably drop the old bup backups on vexxhost and just leave one copy in RAX; i.e. effectively free up the 3tb of the old server and allocate to new04:56
*** ykarel has joined #opendev05:22
*** ykarel has quit IRC05:39
*** ykarel has joined #opendev05:41
*** ysandeep|out is now known as ysandeep|afk05:43
*** ykarel_ has joined #opendev05:54
*** ykarel has quit IRC05:57
ianwfungi: ooohhhh ... hrm 770705 & 770706 actually have a little dependency on production in testing.  it does a no-op test of the vos-release script in the testinfra06:03
ianwfungi: the problem being that it does a vos examine (or whatever) on the volumes to determine if they need to have a release run on them.  that is timing out due to the prior discussion, failing the test06:04
ianwi dunno what to do about that.  probably just leaving it till everything settles down is the best course of action at this point06:04
*** ykarel__ has joined #opendev06:05
*** marios has joined #opendev06:07
*** ykarel__ is now known as ykarel06:07
*** ykarel_ has quit IRC06:08
*** brinzhang_ has joined #opendev06:25
*** brinzhang0 has quit IRC06:28
*** auristor has quit IRC06:43
*** ysandeep|afk is now known as ysandeep07:21
*** eolivare has joined #opendev07:42
*** jaicaa has quit IRC07:44
*** jaicaa has joined #opendev07:45
*** rpittau|afk is now known as rpittau07:47
*** ralonsoh has joined #opendev07:51
*** jpena|off is now known as jpena07:52
*** ralonsoh_ has joined #opendev07:56
*** ralonsoh has quit IRC07:59
*** slaweq has joined #opendev08:03
*** fressi has joined #opendev08:05
danpawlikianw, fungi: Hey, do you have some issue with AFS mirror?08:05
danpawlikseems that the release dates have not been updated for 1 day for all distros08:05
*** sgw has quit IRC08:11
*** zoharm has joined #opendev08:14
*** andrewbonney has joined #opendev08:17
*** hashar has joined #opendev08:22
*** ralonsoh has joined #opendev08:28
*** ralonsoh_ has quit IRC08:28
zoharmHi all, would like to ask here for some pointers regarding setting up Cinder volume backend driver 3rd party CI.08:31
zoharmWe currently have devstack streamlined to run with our storage backend and are able to launch successful tempest runs. My question/request is, what are some useful resources documenting the integration points needed for Gerrit, initiating assigned CI runs, and publishing results?08:31
zoharmAnd any recommendations for the setup architecture would be greatly appreciated! Thank you!08:32
*** fp4 has joined #opendev08:35
*** fp4 has quit IRC08:39
*** fp4 has joined #opendev08:41
*** brinzhang0 has joined #opendev08:45
*** brinzhang_ has quit IRC08:48
*** tosky has joined #opendev08:49
openstackgerritMerged zuul/zuul-jobs master: upload-artifactory: no_log upload task  https://review.opendev.org/c/zuul/zuul-jobs/+/76811108:51
*** sgw has joined #opendev08:51
*** slaweq has quit IRC08:55
fricklerdanpawlik: yes, one afs node crashed yesterday and we are still trying to get things back into sync09:00
*** slaweq has joined #opendev09:00
zbrAm i the only one that quite often finds gerrit stuck forever with "Loading..."?09:08
zbri have to reload the page again to fully load it, as when it happens it almost never finishes.09:09
zbrit did happen in the past but during the last week it has become very common.09:09
*** ykarel_ has joined #opendev09:13
zbrfrickler: low hanging, https://review.opendev.org/c/opendev/git-review/+/770556 thanks.09:14
*** ykarel has quit IRC09:16
openstackgerritMartin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker  https://review.opendev.org/c/opendev/system-config/+/70525809:29
*** fressi has quit IRC09:39
*** fressi has joined #opendev09:40
*** ykarel_ is now known as ykarel10:06
openstackgerritAndy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present  https://review.opendev.org/c/zuul/zuul-jobs/+/77065610:14
*** sgw1 has joined #opendev10:15
*** sgw has quit IRC10:16
*** lpetrut has joined #opendev10:33
*** sshnaidm|afk is now known as sshnaidm|ruck10:40
openstackgerritAndy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present  https://review.opendev.org/c/zuul/zuul-jobs/+/77065610:56
*** dtantsur|afk is now known as dtantsur10:56
*** brinzhang has joined #opendev10:57
*** brinzhang has quit IRC10:58
*** brinzhang has joined #opendev10:58
*** brinzhang0 has quit IRC10:59
mnaserianw: how long would you keep it up?11:03
openstackgerritAndy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present  https://review.opendev.org/c/zuul/zuul-jobs/+/77065611:06
openstackgerritAndy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present  https://review.opendev.org/c/zuul/zuul-jobs/+/77065611:07
*** ysandeep is now known as ysandeep|afk11:17
*** DSpider has joined #opendev11:19
*** brinzhang has quit IRC11:25
*** brinzhang has joined #opendev11:25
*** brinzhang_ has joined #opendev11:26
*** brinzhang_ has quit IRC11:28
*** brinzhang_ has joined #opendev11:29
*** brinzhang has quit IRC11:30
*** fressi has quit IRC11:32
*** fressi has joined #opendev11:39
*** hashar is now known as hasharLunch11:54
*** fressi has quit IRC11:54
*** brinzhang0 has joined #opendev12:03
*** brinzhang_ has quit IRC12:05
openstackgerritAndy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present  https://review.opendev.org/c/zuul/zuul-jobs/+/77065612:21
*** hasharLunch is now known as hashar12:22
*** jpena is now known as jpena|lunch12:33
*** auristor has joined #opendev12:39
*** ysandeep|afk is now known as ysandeep12:52
*** michael-mcaleer has joined #opendev12:58
michael-mcaleerHi OpenDev team, I have a question around editing watchlists in review.opendev.org. The docs say to go through Settings > Watched Projects but since the recent changes to gerrit I seem to have lost the ability to add/remove watched tags or projects. I am trying to unsubscribe from tags from cinder after moving teams13:01
michael-mcaleerCan you help point me in the right direction here? Thanks!13:01
fricklermichael-mcaleer: I think https://review.opendev.org/settings/#Notifications should be what you are looking for13:08
openstackgerritMartin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker  https://review.opendev.org/c/opendev/system-config/+/70525813:10
michael-mcaleerThanks frickler, I was able to find what I was looking for13:13
michael-mcaleerI also needed to remove it from launchpad, that was where I was going wrong13:13
*** brinzhang0 has quit IRC13:13
*** brinzhang0 has joined #opendev13:14
*** chateaulav has joined #opendev13:16
*** sboyron has joined #opendev13:26
*** jpena|lunch is now known as jpena13:29
*** artom has joined #opendev13:30
*** zul has quit IRC13:49
*** ykarel has quit IRC14:02
slaweqhi fungi and other infra-root guys, can You take a look at14:04
slaweqhttps://review.opendev.org/c/zuul/zuul-jobs/+/762650?14:04
slaweqthx in advance14:04
*** ysandeep is now known as ysandeep|cinder_14:14
*** ysandeep|cinder_ is now known as ysandeep|session14:14
openstackgerritAlfredo Moralejo proposed zuul/zuul-jobs master: Rename config repos file config for CentOS Stream  https://review.opendev.org/c/zuul/zuul-jobs/+/77081514:16
*** fressi has joined #opendev14:16
openstackgerritBenedikt Löffler proposed zuul/zuul-jobs master: Pass environment variables to 'tox envlist config' task  https://review.opendev.org/c/zuul/zuul-jobs/+/77081914:25
*** fressi has quit IRC14:32
*** fbo has quit IRC14:45
*** fbo has joined #opendev14:47
fungidanpawlik: yes, there was a catastrophic storage failure for afs02.dfw due to a cinder outage in the provider, it's being rewritten very slowly but volume releases are delayed due to limited bandwidth to complete that14:50
guillaumeczuul-promote-docs doesn't seem to update Zuul Documentation https://zuul-ci.org/docs/zuul/index.html, recent https://review.opendev.org/644927 and https://review.opendev.org/732066 doc updates aren't online14:50
fungioh, i see frickler replied further down in my scrollback14:50
danpawlikfungi, frickler: thanks for the information14:51
fungiguillaumec: yes, we're roughly a day backlogged on replicating afs volumes due to a catastrophic cinder failure in rackspace14:51
guillaumecfungi, ok14:51
fungiand that site is served out of a read-only afs volume replica14:51
fungiit will update once all the replication catches back up14:52
fricklerslaweq: can you consider ianw's remark? would it be possible to just rerun this task when you need it instead of using the new script?14:54
*** cloudnull has joined #opendev14:55
slaweqfrickler: but can I run this ansible task from devstack directly?15:00
slaweqcurrently the problem is that:15:00
slaweq1. zuul installs ovs and configures bridges for infra connectivity15:01
*** sgw1 has left #opendev15:01
slaweq2. devstack runs and if ovn module is used there, it removes ovs installed previously from packages and installs everything from source15:01
*** ykarel has joined #opendev15:02
slaweq3. and then all those settings made by zuul and this role are gone15:02
slaweqfrickler: that's why I wanted to have a simple script which I can call from the devstack plugin15:02
openstackgerritMartin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker  https://review.opendev.org/c/opendev/system-config/+/70525815:07
*** michael-mcaleer has quit IRC15:17
*** sboyron has quit IRC15:31
*** hashar has quit IRC15:33
*** lpetrut has quit IRC15:33
*** mtreinish has joined #opendev15:41
*** JayF has joined #opendev15:50
openstackgerritMatthew Thode proposed openstack/project-config master: update gentoo from python 3.6 to python 3.8  https://review.opendev.org/c/openstack/project-config/+/77082815:55
*** ysandeep|session is now known as ysandeep15:58
*** cloudnull has quit IRC15:59
clarkbslaweq: are ovn and ovs not able to coexist because they share the same kernel configuration?16:01
clarkbor maybe the same module in the kernel16:01
slaweqclarkb: it's not that they can't coexist, ovn requires ovs to be running16:02
slaweqclarkb: but the problem in our case is that ovs installed from packages is using different ovsdb file, different sockets, etc.16:02
clarkbslaweq: right I understand that, I think I'm trying to understand why you must flush our existing config16:02
clarkbwe've intentionally tried to set it up such that it uses very high vxlan numbers and is otherwise out of the way16:02
openstackgerritMatthew Thode proposed openstack/project-config master: update gentoo from python 3.6 to python 3.8  https://review.opendev.org/c/openstack/project-config/+/77082816:03
slaweqso when we are installing new ovs from source, it doesn't see anything that was created earlier when "packaged ovs" was running16:03
clarkbI see it completely ignores the preexisting config16:03
slaweqclarkb: config is flushed by reinstallation of ovs from source16:03
clarkbmy concern with adding scripts there is that they won't be tested 99% of the time. I think if we are going to go that route then the role should run those scripts as its setup too16:03
clarkbI know others will object to replacing ansible with bash (though I'm personally less concerned about that)16:04
slaweqclarkb: I can do that of course but as You said, will others be happy with that?16:04
*** cloudnull has joined #opendev16:05
slaweqI just need a simple way to "reconfigure" that br-infra bridge again, after ovs is installed from source by ovn devstack plugin16:05
openstackgerritMartin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker  https://review.opendev.org/c/opendev/system-config/+/70525816:05
clarkbslaweq: I think if we are going to supply a script to do this but don't run it as part of the role  it is very likely to regress the next time we make a change to the role16:05
clarkbwhich is why I think we should run the scripts in the role too. Maybe ask in #zuul and see if others have strong opinions about replacing ansible with bash16:06
slaweqclarkb: ok, I will ask16:06
slaweqbut not today as I'm almost done for today16:06
fungiright, it's really a zuul project, not an opendev project16:06
clarkbslaweq: yes enjoy your evening16:06
slaweqthx for checking that16:06
fungi(zuul-jobs is the zuul standard library in essence)16:06
clarkbfungi: did the backlog of changes in zuul come up in the tc meeting?16:07
clarkblooks like it has grown significantly since I last looked :/16:08
clarkblooks like pylint caused neutron to reset the gate recently too16:08
clarkbI didn't realize any openstack projects used pylint (for this very reason)16:09
fungiit did not come up, no16:09
fungiwasn't on the agenda i don't think16:09
*** ykarel is now known as ykarel|away16:24
clarkbfungi: for the afs release is the list of volumes to release shrinking ?16:28
*** mlavalle has joined #opendev16:29
*** hashar has joined #opendev16:31
*** zbr3 has joined #opendev16:38
*** zbr3 has quit IRC16:39
*** zbr9 has joined #opendev16:40
*** zbr has quit IRC16:40
*** zbr9 is now known as zbr16:40
*** ykarel|away has quit IRC16:46
openstackgerritMartin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker  https://review.opendev.org/c/opendev/system-config/+/70525816:49
fungiclarkb: not really, no, because they're all volumes which would require multiple days to do a full release and all got triggered on top of each other16:49
clarkbgot it16:50
fungiand as was subsequently pointed out, killing the vos release processes doesn't terminate the data transfer transactions, just makes them get discarded when they complete, so there's still no open rpc slots to actually tell afs to do anything else16:50
clarkboh TIL16:51
*** marios has quit IRC17:03
*** diablo_rojo has joined #opendev17:04
JayFhttps://lists.openafs.org/pipermail/openafs-info/2021-January/043013.html in case you were not aware; IDK what version of OpenAFS you all run, but this is particularly nasty.17:06
*** jpena is now known as jpena|off17:07
*** eolivare has quit IRC17:07
clarkbfungi: ^ I think our restarts were prior to that date. But I assume subsequent restarts will completely break us17:09
clarkbthats a joy17:09
fungilooks like we're using 1.8.6 on focal17:09
clarkbfungi: our servers likely pose a bigger issue they are still 1.6 iirc17:09
JayFfolks were talking about it in #lopsa -- apparently almost any change, client or server side, can trigger it17:10
fungialso using 1.8.6 on bionic17:10
clarkboh our servers are 1.8 ? I guess that is "good"17:11
fungiat least some of our mirrors are bionic and focal, not sure which ones might be xenial still17:11
clarkbI didn't think we had upgraded the servers17:11
clarkbfungi: the afs servers themselves17:11
fungiahh17:11
fungiyeah, afs01.dfw at least is still using 1.6.1517:12
clarkbI see that email says <1.8 might not be affected in the same way "further research needed"17:13
fungiall three fileservers are 1.6.1517:13
clarkbalso specifically calls out unauthenticated there17:13
clarkbbut I think our servers talk to each other authenticated?17:13
fungizuul executors are also using 1.8.6 (but on xenial)17:14
clarkbfungi: I think the 1.8 packages we install are out of our own ppa17:15
clarkbso in theory we can apply those patches and rebuild in the ppa17:15
fungiyeah17:15
fungithat's a great point17:15
clarkbJayF: fungi ya reading the earlier messages it sounds like any action like a vos * starts a new rx stack and can hit this17:16
clarkbso it isn't just if we restart services or reload the kernel module17:16
*** chateaulav has quit IRC17:16
clarkblooking at the change in gerrit it says authenticated calls will fail because it detects the mismatch between the id and the auth state17:18
*** rjcv has joined #opendev17:22
*** rjcv has quit IRC17:25
fungiunrelated to the bug fix, but...17:28
fungiinfra-root: probably the biggest impact to users while we're still waiting to get vos releases back on track is the static site content volumes... what if we temporarily patched the vhost configs on static.o.o to hit the read-write volume path instead of the read-only path? is that a terrible idea? keep in mind we're probably looking at some time next week having everything back to normal?17:28
clarkbI think we have done it in the past17:29
clarkbfungi: I don't know if doing that will trip over that bug though17:29
clarkbits possible that changing the target will cause a new rx stack to be created then we'll break there too?17:29
clarkbfungi: I'm not opposed to the idea, I think the biggest risk with it is if it trips on the afs bug. Maybe we can attempt to test things somehow with different hosts?17:33
clarkbthough this may not be a problem until there is >1 new connection since they won't share the ids until there is >117:34
* JayF feels like he played the role of messenger of pain this morning17:34
fungiJayF: nah, appreciate the heads up17:35
*** ysandeep is now known as ysandeep|out17:35
clarkbhttps://launchpad.net/~openstack-ci-core/+archive/ubuntu/openafs is what we use to install 1.8 on the clients17:37
clarkbso I guess we can put those two patches into that then do a forced update on openafs on the openafs clients?17:38
clarkbthen figure out what it means for the servers?17:38
fungiwell, once it builds17:38
clarkbJayF: you haven't come across any more concrete info for 1.6 have you?17:39
clarkbI'm kind of reading it like it doesn't have the specific issue but lacks randomness so can hit this in other ways17:39
JayFThe only reason I even know about this is a couple of folks (billings and ENOMAD) were talking about the problem in #lopsa17:39
clarkbJayF: thanks17:39
JayFMight not hurt to drop in there and ask billings? but I wouldn't have high expectations he has an answer17:39
clarkbya I think we can work off of what is published now then go digging when we get to that point17:40
clarkb(to avoid bugging people who are likely triaging their stuff)17:40
JayFit does sound to me like having 1.6 server doesn't solve it though; 1.8 clients can trigger it just on their own17:40
JayFbut IMBW, and I haven't worked with AFS in ~10 years17:40
fungiright, i think we need to push the fixes into our lp forks for now and then trigger ppa rebuilds17:41
clarkbfungi: ya I think that is step 017:41
fungimight be worth considering updating our servers to 1.8.6 from the ppa as well i guess17:41
clarkbreading the comments on the fixes the first fix only makes this work 50% of the time. Need the second fix to get it much better than that17:41
clarkbfungi: the reason we haven't done that is the 1.6 -> 1.8 migration requires an outage aiui17:42
clarkbbut maybe if we're gonna have an outage anyway this is our opportunity17:42
clarkbbut ya ++ to updating the ppa17:42
*** rpittau is now known as rpittau|afk17:42
clarkbfungi: the first fix has merged but the second has not yet17:43
clarkbbut we likely need both (not sure if we want to wait for them to merge the second)17:43
*** dtantsur is now known as dtantsur|afk18:10
*** andrewbonney has quit IRC18:17
*** hashar is now known as hasharAway18:29
*** diablo_rojo has quit IRC18:34
*** hasharAway has quit IRC18:34
*** slaweq has quit IRC18:38
*** ralonsoh has quit IRC18:41
*** diablo_rojo has joined #opendev18:43
fungiokay, lemme hydrate real quick and then i can look at ppa updates for realz18:53
fungialso ianw might have suggestions once he's around18:53
fungihe seems to have done the most recent uploads for it18:54
clarkbfwiw it looks like the second change to fix things has merged so should be good to grab that version18:54
fungioh, awesome, that was fast18:54
clarkbI've also been poking at figuring out build stuff in a container but it is slow going as I haven't done this in like 15 years18:55
clarkblearning a lot about various things like quilt :) right now I'm just trying to do a local build of the existing package source though18:55
clarkbthen will try to figure out quilt for applying the patches then rebuild. I'm not in a good spot to test this before uploading though because my container is running on suse with different kernels and all that18:55
clarkball that to say fungi please don't stop looking at it too :)18:55
fungii'll get some pbuilder chroots set up for xenial, bionic and focal on my workstation but that'll need a few minutes18:56
fungicreating ubuntu chroots is its own rabbit hole19:07
clarkbI used docker which may or may not present problems too :)19:09
clarkbI'm sure if you do debian packaging this all makes immediate sense via looking at the debian/ dir in the source but what is the best way to say I want this git diff to be a quilt patch?19:10
clarkbmy local build succeeds but tests fail. I'm now going to fiddle with patches and see if I can make it do a thing19:12
fungiif you do debian packaging regularly you already have clean chroots on hand with the dev tools installed into them and your gpg keys mapped in19:18
fungimmdebstrap was throwing weird gpg errors trying to create ubuntu chroots (and it's not that i'm missing the ubuntu archive keys, those are definitely already installed) so i'm updating the packages on my workstation to rule out something being stale/behind19:19
*** whoami-rajat__ has quit IRC19:20
fungiyeesh, our node request backlog is up over 4k now19:27
clarkbya nova neutron glance etc are backed up >24 hours now19:29
clarkbI brought it up with the tc and called out a couple of the things I noticed and sounds like people are looking at it now19:29
fungiclarkb: probably easiest way to get those patches now: https://git.openafs.org/?p=openafs.git;a=patch;h=a3bc7ff1501d51ceb3b39d9caed62c530a804473 https://git.openafs.org/?p=openafs.git;a=patch;h=2c0a3901cbfcb231b7b67eb0899a3133516f33c819:30
clarkbfungi: thanks, I did manage to get them from gerrit directly eventually19:30
clarkbdumping them into the patches dir and updating the series file by hand seems to work when I run quilt push -a19:31
clarkbnow i'm looking to verify the diff between that state and the old state (but I didn't record the old state first so now learning how to revert with quilt)19:31
fungiright, that's easier than learning to use quilt push/pop19:31
fungiand apply19:31
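The manual step clarkb describes (drop the patch files into debian/patches and list them in the series file so that quilt push -a picks them up) can be sketched in Python. This is only an illustrative helper, not part of anyone's actual workflow here; the function name add_quilt_patches and the file names in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def add_quilt_patches(source_tree, patch_files):
    """Copy patches into debian/patches and list any new ones in the series file.

    Returns the patch names that were newly appended to the series file.
    """
    patches_dir = Path(source_tree) / "debian" / "patches"
    patches_dir.mkdir(parents=True, exist_ok=True)
    series = patches_dir / "series"

    lines = series.read_text().splitlines() if series.exists() else []
    # The first token of each non-comment line is the patch name.
    listed = {
        line.split()[0]
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    }

    added = []
    for patch in map(Path, patch_files):
        target = patches_dir / patch.name
        if patch.resolve() != target.resolve():
            shutil.copyfile(patch, target)
        if patch.name not in listed:
            # quilt applies patches in series order, so new ones go last.
            lines.append(patch.name)
            listed.add(patch.name)
            added.append(patch.name)

    series.write_text("\n".join(lines) + "\n")
    return added
```

Calling add_quilt_patches("openafs-1.8.6", ["14491.patch", "14492.patch"]) would leave the tree in the state where quilt push -a applies both patches on top of the existing ones, and quilt pop reverts them again.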
fungiclarkb: the debdiff tool can compare two source packages19:32
fungioh, though that's only for the filenames themselves19:32
clarkbquilt pop worked for reverting19:33
fungidebdiff manpage recommends diffoscope tool for deeper comparison of packages19:33
clarkband the diff looks good. I guess now I rerun the build and see if tests pass (if they don't I'm not sure what I can do next as I suspect it may be cranky about my kernel?)19:33
fungiyeah, can't hurt, but you also might just try building the source package and pushing to lp and letting the ppa builder complain if it doesn't really build19:34
clarkbya but if it builds we'll start pulling it automatically in like 10 hours?19:35
clarkbI'm just paranoid I will accidentally push an unhappy package (also I have to figure out signing still)19:36
fungiahh19:36
clarkbdo you know if lp will run those same tests on build?19:36
clarkbits rebuilding with the patches applied now so should find out if the test failures are related to that soon I guess19:37
fungii believe it will run any autotests defined in the package build19:38
clarkbcool. It also ran them automatically when I ran debuild19:39
clarkbseems likely it would run them in lp too19:39
mordredyah- the ppa builds should run the in-package tests19:40
clarkboh I bet I need to bump the package version too otherwise it won't upgrade?19:41
fungiyeah, you can use the dch command or just add an entry to the debian/changelog file19:42
clarkbalso I'm told lunch will be ready shortly so I need to pop out for that (please don't rely on me to figure this out quickest, really going through it as a learning exercise and expecting people like fungi that understand debs better to get to the end first)19:42
fungiyou may still beat me to it, my chroot creation rabbit hole has turned up what i think may be a corrupt gnupg trustdb for apt-key19:43
fungican't create any ubuntu chroots until i solve why i'm getting "gpg: [don't know]: invalid packet (ctb=2d)"19:44
fungiwhich is a marvellously clear error message, lemme tell you19:44
clarkbalright that builds successfully now and creates a ton of .debs19:48
clarkbthe versioning stuff is really confusing to me. The changelogs for the ppa package are different for xenial and focal19:52
clarkbbut only in their versions19:52
clarkbdoes this mean ianw uploaded version specific source packages for each one or is lp doing something smart?19:52
funginot sure if you can tell a ppa to backport a source package to another release, searching now19:54
mordredclarkb: usually version specific source packages19:54
mordredfungi: you can also do that in the LP user interface but there are times when it doesn't work awesomely19:55
clarkbgot it19:55
mordredwhen I've done this before I've always created release-specific source packages - usually only different in the version19:55
clarkb1.8.6-1ubuntu2 is the version number I should use ya?19:55
clarkbactually no the current version is 1.8.6-1ubuntu1~focal1 so it would be 1.8.6-1ubuntu1~focal2 ?19:55
fungiclarkb: yeah, or you could make it 1.8.6-1ubuntu2~focal119:56
fungisince you're adding patches which are not focal-specific that's probably technically more correct, but either should work19:56
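Why either version string works comes down to Debian's version ordering, where "~" sorts before everything, including the end of the string. The sketch below is a simplified reimplementation of that comparison, written for illustration; dpkg --compare-versions is the authoritative tool, and the function names here are made up.

```python
import re


def _char_order(ch):
    # "~" sorts before everything, letters before other punctuation.
    if ch == "~":
        return -1
    if ch.isalpha():
        return ord(ch)
    return ord(ch) + 256


def _compare_part(a, b):
    """Compare two upstream-version or revision strings; returns -1, 0 or 1."""
    runs_a = re.findall(r"(\D*)(\d*)", a)
    runs_b = re.findall(r"(\D*)(\d*)", b)
    for k in range(max(len(runs_a), len(runs_b))):
        text_a, num_a = runs_a[k] if k < len(runs_a) else ("", "")
        text_b, num_b = runs_b[k] if k < len(runs_b) else ("", "")
        # Non-digit run: character by character, end of run counts as 0.
        for i in range(max(len(text_a), len(text_b))):
            oa = _char_order(text_a[i]) if i < len(text_a) else 0
            ob = _char_order(text_b[i]) if i < len(text_b) else 0
            if oa != ob:
                return -1 if oa < ob else 1
        # Digit run: numeric comparison, an empty run counts as 0.
        na, nb = int(num_a or 0), int(num_b or 0)
        if na != nb:
            return -1 if na < nb else 1
    return 0


def _split(version):
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    """Return -1, 0 or 1 as version a sorts before, equal to, or after b."""
    epoch_a, upstream_a, revision_a = _split(a)
    epoch_b, upstream_b, revision_b = _split(b)
    if epoch_a != epoch_b:
        return -1 if epoch_a < epoch_b else 1
    return _compare_part(upstream_a, upstream_b) or _compare_part(revision_a, revision_b)
```

Under these rules both 1.8.6-1ubuntu1~focal2 and 1.8.6-1ubuntu2~focal1 sort after the current 1.8.6-1ubuntu1~focal1, so apt would treat either as an upgrade, and both still sort before Debian's 1.8.6-4.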
ianwo/19:56
clarkbfungi: thanks19:57
clarkbianw: ohai https://lists.openafs.org/pipermail/openafs-info/2021-January/043013.html is what we are poking at19:57
fungithere's also the fact that these are built from 1.8.6-1 in debian but now debian has a 1.8.6-4 which ubuntu has imported into its archives, may be worth taking a look at the changelog for that too19:57
clarkbianw: I'm learning me a debian right now the hard way but also need to pop out to lunch19:57
ianwso we need new afs packages post haste?19:57
fungiianw: with upstream patches applied19:57
clarkbianw: yes with two upstream gerrit patches applied. Then we need to figure out how/if 1.6 on our openafs fileservers is affected19:57
funginot-yet-released19:57
ianwhrm; clients not too hard as we pull from launchpad.  the servers don't19:58
clarkbianw: ya and 1.6 doesn't have any patches yet I think. In that email I linked auristor says that investigation is necessary. One (potentially crazy) idea I threw out was the reason we haven't gone to 1.8 on the fileservers yet is that it requires a downtime iirc19:59
clarkbbut if this ends up forcing downtime maybe we do that with the patched version19:59
clarkbianw: also totally feel free to sort this out more quickly than me. I'm doing this as a good exercise to learn but doubt I'll be quickest19:59
ianw:/19:59
fungiunrelated, but also weighing the possibility of switching static.o.o vhost configs to serve from the read-write path as we're unlikely to get the read-only replicas updating regularly again before some time next week20:00
clarkbianw: fwiw it almost sounds like unauthenticated client connections on 1.6 are expected to be affected but authenticated may not be? it is possible that our servers will be fine because they talk to each other auth'd?20:00
ianwfungi: that seems very sane, iirc we've done that before in recovery situations20:01
clarkband its only our leaf nodes like the mirrors etc that are unauth'd20:01
fungiexecutors use auth too as they're primarily write not read20:01
fungiso static.o.o and the mirror servers are the main unauthed systems i guess20:01
clarkblooks like there may be a third patch https://gerrit.openafs.org/1449520:02
clarkbwhich has been abandoned in favor of https://gerrit.openafs.org/1449620:02
ianwfungi: that one looks abandoned20:03
ianwoh, what clarkb said :)20:03
clarkbalso people on the mailing list are reporting that 1.8.6 clients talking to 1.6 servers are failing even after being patched20:03
ianwthere's also centos to consider, but that's only for wheels20:03
clarkbbut they indicate 1.6 clients seem to be ok?20:03
clarkbit sort of does seem like 1.6 might be less affected20:04
ianwalthough we're all 1.8 clients -> 1.6 servers20:04
clarkbcorrect20:04
clarkbbut the 1.6 servers also talk to each other aiui20:04
clarkbjust pointing out that the comms between servers may end up being ok (though I have no concrete assertion for that)20:04
ianwahh, although i guess that's kind of moot if no client can talk correctly to the servers :)20:05
clarkbhttps://lists.openafs.org/pipermail/openafs-info/2021-January/043015.html20:06
fungiturn back all the clocks to 2020? ;)20:06
fungii guess that won't work unless you can globally turn back the world20:06
fungiand i don't think the world wants another 202020:06
ianwmaybe i should turn the clock back an hour and go back to bed :)20:06
clarkbcan I do that but for 5 hours?20:07
fungiianw: or forward 36 hours and start your weekend?20:07
clarkbI do think we should try and be careful about testing this if we can. Like maybe we upload to another ppa or just copy a deb around and install it?20:07
clarkbbefore we push to the normal ppa that servers will automatically update from20:07
clarkbsince that may restart services and such and then actually break us if the fix is not sufficient?20:08
clarkbof course if things properly break then it will be moot at that point anyway20:08
clarkbI have been told food is waiting for me. back in a bit20:08
auristorthere will not be patches for openafs 1.620:08
ianwclarkb: should we spin up a common ubuntu/debian vm?20:08
clarkbauristor: I didn't necessarily expect them, but it wasn't clear reading the mailing list stuff if it is suffering from the same problems20:08
clarkbauristor: as at least one person indicates a 1.6 client can talk to 1.6 servers just fine?20:09
clarkbianw: ya maybe that is a good next step20:09
ianwwe want to be a bit careful with launchpad as it likes to only keep one version of the package20:09
auristorits a related but different problem with all pre-1.8 openafs and non-AuriStorFS and non-Linux rxrpc clients20:09
clarkbauristor: and does that related but different problem affect us if only the server side of afs is 1.6 ?20:10
auristorThe 1.6 issue is not as simple as "if you restart today it won't work"20:10
ianwmnaser: i'd have to run that by infra-root, not sure we have a strict policy on keeping the old backups.  maybe 6 months or so i guess?20:10
*** lpetrut has joined #opendev20:10
auristorthe problems are always in the rx initiator.  so client connections to fileserver / vlserver; fileserver to vlserver and cache managers; volserver to volserver, etc20:11
clarkband volserver to volserver etc is where our 1.6 side of things would be using the buggy rx initiator20:12
*** zoharm has quit IRC20:12
clarkbauristor: is https://gerrit.openafs.org/#/c/14496 expected to fix https://lists.openafs.org/pipermail/openafs-info/2021-January/043015.html ?20:14
clarkbif so I guess we continue to focus on getting patched packages built, then limp along with 1.6 with its not 100% failure while we update all our 1.8 clients. Then sort out a 1.6 -> 1.8 upgrade20:14
openstackgerritJeremy Stanley proposed opendev/system-config master: Temporarily serve static sites from AFS R+W vols  https://review.opendev.org/c/opendev/system-config/+/77085620:15
openstackgerritJeremy Stanley proposed opendev/system-config master: Revert "Temporarily serve static sites from AFS R+W vols"  https://review.opendev.org/c/opendev/system-config/+/77085720:15
ianwi'm just spinning up a common server to work on now20:17
clarkbthanks. I need to pop out before my lunch gets cold20:17
clarkbtrying to start a new local build with that third patch just to get to see it passes testing20:18
fungithe good news is the fixes stack on the existing quilt patches in the source package without conflict and pass the autopkgtests20:18
ianwnp, do that :)20:18
fungiat least the first two merged fixes do anyway20:18
clarkbyup the third applies cleanly too20:19
*** slaweq has joined #opendev20:19
clarkbI just started a build with the third and will go eat now20:20
clarkb(that will tell us if the tests pass)20:20
auristorpretty much try not to restart any openafs clients or servers (except for patching 1.8) from now to the end of the month20:21
ianwroot@10.209.39.22620:21
ianwwe don't tend to restart the afs servers ... but when it does happen it's usually not our choice and something wrong on the cloud provider side20:22
fungiyup, so far it's been last week the provider decided they needed to live-migrate one server and it hung the kernel such that a hard reboot was the only option, then earlier this week they had a problem with a system serving one of the iscsi volumes which make up the lvm underlying our /vicepa on another afs server and i had to restart it to fsck the fs mounted writeable again20:26
fungiit's not been a good couple of weeks stability-wise for that segment of our services20:27
*** slaweq has quit IRC20:27
fungiinfra-root: anybody else feel strongly in favor or against switching to serving static site content from the rw path temporarily? https://review.opendev.org/77085620:29
clarkbI guess not20:34
*** sgw has joined #opendev20:35
clarkbianw: thats a 10/8 address :)20:36
ianwok, how about 104.239.144.149 :)20:37
funginow that i can reach20:37
clarkbI can hit it and need to reload my keys. Will do that once lunch is more properly finished20:38
ianwi'm just putting 14491/2 in the patches; so far this is nothing unique over what anyone else has done right?20:38
ianwi.e. we know that works20:38
clarkbianw: correct. Looks like my 14496 build succeeded locally20:39
clarkbrunning the build without these patches failed for me locally20:39
clarkbmy rough setup was pull the source and deps, build clean, that failed. Apply patches, rebuild that succeeded20:40
fungiianw: in debian/patches and list them in debian/patches/series20:45
fungito make sure quilt will apply them20:45
ianwyep, i'm running a build on that host in a screen too, just for sanity20:45
fungiand then dch to add a new build version to the debian/changelog20:45
ianwdpkg-source: info: applying 0011-14491.patch20:45
ianwdpkg-source: info: applying 0012-14492.patch20:45
ianwagree they apply clean20:45
ianwfungi: there we differ, i tend to use the emacs mode :)20:46
fungiianw: sure, whatever works20:47
fungithough if you set VISUAL or EDITOR to emacs then dch will respect that too20:48
ianwso yeah, the only trick is the source needs to be signed by a key accepted by launchpad, and you do need to upload for each release20:48
ianwthere's no clicky button to build for different releases20:48
*** hashar has joined #opendev20:54
ianwlooks like kaduk has pushed 1.8.6-5 to debian20:56
ianwand there's talk of making 1.8.7 release20:56
clarkbianw: fungi before we push to lp we can install our built package on that server and then check it works?20:58
clarkbthen if it works figure out signing and push to lp?20:58
ianwand jsbillings has already done rpm's @ https://copr.fedorainfracloud.org/coprs/jsbillings/openafs/package/openafs/ which we can import20:59
clarkbianw: we should make sure that they include the third fix too if we start using the upstream packaging stuff21:00
clarkbat least I had completely missed the third change until very recently21:00
ianwthat being 14496?21:02
clarkbyes21:02
ianwhttps://gerrit.openafs.org/#/q/status:merged+project:openafs+branch:openafs-stable-1_8_x now has the backports of those 3 too21:02
ianwi didn't have 14496 on the server, rebuilding now21:05
ianw(the server being 104.239.144.149)21:06
*** lpetrut has quit IRC21:07
fungiclarkb: if you built binary packages from your patched source package, then yes you should be able to `apt install ./somepackage.deb ./otherpackage.deb` or use dpkg -i to do it for that matter21:09
clarkbfungi: yes, except I'm in a ubuntu container on suse and my kernel is too new I think21:09
fungiahh21:09
clarkbbut on ianw's server we should be able to do that21:09
fungiyeah, that will also exercise the dkms bits21:10
*** Alex_Gaynor has left #opendev21:10
ianwreally need to add --parallel to the rules :/21:10
clarkbianw: slight nit, I think fungi said the more correct package version was 1.8.6-1ubuntu2~focal1 not 1.8.6-1ubuntu1~focal221:10
clarkbbut both should work for now21:11
ianwyeah i'm fairly tempted to take the 1.8.6-5 packages21:13
fungiagreed, we could in theory just nab them from debian/sid21:14
fungionce they land in the archive anyway21:14
ianwwe've only kept our own ppa because a) for a while there packages were way behind and we had much more bespoke work and b) iirc for whatever reason they weren't building for arm6421:14
fungi1.8.6-5 is unlikely to end up in xenial at all, and may see a lengthy delay getting to bionic21:15
clarkbwhen you say take the 1.8.6-5 you mean backport them to xenial and bionic from whatever is at 1.8.6-5?21:15
clarkbfungi: ok that answers some of my question21:15
fungibut we can grab the source packages and stuff them into our ppa, right21:15
ianwclarkb: yeah, basically stuff those packages into our ppa21:15
fungieven landing in focal proper will take some time because ubuntu is trickling those packages in from debian and probably not with significant urgency21:16
ianwi imagine it just has a few more of those patches from the 1.8 stable branch21:16
fungiyes, it will be just more stuff in the debian/patches dir21:16
ianw... which also all might be moot if there's a 1.8.7 release with all of them too21:17
clarkbhttps://salsa.debian.org/debian/openafs/-/blob/master/debian/patches/0014-Remove-overflow-check-from-update_nextCid.patch seems to show that debian's package has all three fixes in 1.8.6-5. The 0013 and 0012 patches are the other two21:17
clarkbso ya that would probably work too assuming it also compiles for xenial and bionic and focal21:17
clarkb(and I guess arm64)21:18
ianwit *should* if no new build-deps, since we've been ok with all the prior versions21:18
ianwok, the .debs on 104.239.144.149 have all three patches.  we can install them and try if we like21:19
ianwwhat i'm still not clear on is if our 1.6 servers are just now broken and can't be fixed21:19
clarkbianw: should we install openafs-client on 104.239.144.149 first and see if we can navigate /afs/ ?21:19
fungisince they're keeping the packaging in git on salsa, you can probably just look at the history there to see if anything about the package itself has changed other than the quilt patches21:20
clarkbianw: it sounds like 1.6 servers are broken in a similar way but not exactly the same and that means they aren't 100% broken like 1.8 is21:20
clarkbianw: and if we don't restart 1.6 servers we'll maybe be ok?21:20
clarkbfungi: https://salsa.debian.org/debian/openafs/-/commit/be72605900a4820ce613a3c3b2bce372a203d2c6 no thats it21:20
clarkbat least in the last 2 months21:21
fungitesting against our servers at the moment may be intractable because they're overrun with vos release activity in progress from yesterday and not accepting new rpc calls21:21
clarkbthere are other updates in the debian package we don't have in our ppa21:21
clarkbfungi: well they should be able to do reads right? or are those slots also full up?21:21
fungiyou should be able to test that you can reach files, right21:21
fungii assumed you meant testing vos commands21:22
ianwbuilding modules now21:22
clarkbya I guess I don't actually know enough about afs to know how to sufficiently test this21:22
*** hamalq has joined #opendev21:22
clarkbthe examples on the mailing list seem to be primarily vos * commands21:23
clarkbianw: oh also at least for 1.8 it seems to be based on server start time21:24
clarkbianw: so 1.6 may not actually exhibit any problems until we restart21:24
clarkband then we'd be completely hosed potentially21:24
ianwi think that's fine, but yeah, the restart may not be our choice and recent stability hasn't been reassuring in that regard21:25
fungilooks like the other major changes in the newer package version are support for more recent linux kernels (5.8, 5.9)21:26
*** hashar has quit IRC21:26
fungiso yeah, honestly i would just clone https://salsa.debian.org/debian/openafs/ and debuild from that21:29
*** jrosser has quit IRC21:29
*** ildikov has quit IRC21:30
clarkblsmod shows openafs on ianw's server now21:30
ianw# ls /afs/openstack.org/21:30
ianwdeveloper-docs  docs  docs-old  mirror  project  service  user21:30
ianwis promising21:30
*** ildikov has joined #opendev21:31
*** jrosser has joined #opendev21:32
ianwa "find . -type f -exec md5sum {} \;" stress test is seeming good21:32
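The find/md5sum run above simply forces the client to read every byte of every file under the mount. A rough Python equivalent, for anyone who wants to keep the digests and compare two runs, might look like this; checksum_tree is a hypothetical helper name and nothing here is specific to AFS.

```python
import hashlib
from pathlib import Path


def checksum_tree(root):
    """Read every regular file under root; return {relative path: md5 hex digest}."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        # Like "find -type f": regular files only, symlinks skipped.
        if path.is_symlink() or not path.is_file():
            continue
        md5 = hashlib.md5()
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so large mirror files do not sit in memory.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                md5.update(chunk)
        digests[path.relative_to(root).as_posix()] = md5.hexdigest()
    return digests
```

Running it twice over the same read-only path and comparing the two dictionaries would show whether reads are stable, which is all the stress test is checking.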
clarkbdo we try to do some more admin type commands next?21:32
clarkbthough as noted before those might hang because of full up slots for all those releases?21:32
ianwalso i haven't setup the kerberosy things21:33
clarkbI think we document how to do that via command line switches without setting up default domains and all that21:33
clarkbbut ya it seems like at least for reads this is working (which is going to be what the mirrors and static are all doing, zuul will do writes too)21:34
*** fp4 has quit IRC21:37
ianwok, i have 1.8.6-5 focal/bionic/xenial packages i can upload to the ppa to build if we like21:40
clarkbianw: do we want to build the focal one on your test serverfirst? or treat the testing of the three patches we've already done as good enough?21:41
clarkbbut I agree using 1.8.6-5 seems like a good option rather than maintaining an older fork21:41
ianwthe only two extra patches that has over what we just tested are for 5.8 & 5.9 kernels21:42
clarkbprobably pretty safe to push as is then21:42
ianwsorry there is also one other for afsmonitor, a tool i don't think we use21:43
ianwif nobody has any other opinions, i'm happy to dput these and get the ppa building them21:47
clarkbI think that sounds like a reasonable next step. fungi ^21:47
*** diablo_rojo has quit IRC21:48
clarkblooks like jproulx was hit by this too22:00
clarkbmaybe jproulx has hints on the 1.6 -> 1.8 upgrade22:00
*** fp4 has joined #opendev22:01
mnaserdoes anyone know if there's some sort of setting i'm missing, our ci is not reporting to zuul-jobs, i just generated a new set of http credentials and im getting a 403 when its reporting22:21
mnaserhttps://www.irccloud.com/pastebin/n3ZCIyP5/22:22
mnaserssh streaming works cause it does get enqueued22:22
clarkbmnaser: basic auth was the only thing we changed for opendev zuul after the gerrit upgrade22:23
mnaseryeah thats why i changed auth_type=basic22:23
clarkbmnaser: however we just dropped the digest setting and let it default to basic22:23
clarkbmaybe explicitly setting it to basic doesn't work?22:23
mnaserok, dropping the basic and trying again22:24
fungisorry, dinner pulled me away, catching back up22:33
fungiianw: clarkb: dput them, yes please22:33
clarkbfungi: I think ianw just did that22:34
ianwyep, hitting some backport package errors with debhelper-compat packages22:37
clarkbianw: did you only push xenial? I wonder if the other two have the right compat level22:37
ianwyeah, i've pushed bionic and we'll see if that works22:38
ianwsbuild-build-depends-openafs-dummy : Depends: debhelper-compat (= 12)22:39
ianwnope on bionic.  i feel like i just dropped this previously and they built ok anyway22:39
clarkbianw: ya looking at the ppa debian/control files there doesn't seem to be a debhelper-compat listed but the debian/control file in the upstream salsa repo has it22:41
clarkb(so I assume that means you dropped it in those other ones)22:41
*** DSpider has quit IRC22:42
clarkbhttps://manpages.debian.org/testing/debhelper/debhelper.7.en.html#Supported_compatibility_levels has more info22:42
fungigood point, dh-compat needs to list a dh version available in the target distro release22:43
ianwi think i hacked it back to 9 and it "just worked"22:43
clarkbv9 seems to be what xenial has if you want to just switch it to 922:43
ianwi also think i very stupidly didn't note that in the changelog22:43
fungiit probably will if the package isn't relying on newer dh features22:43
clarkbianw: debhelper (>= 9.20160114~) is what I see in the xenial and focal control files from our ppa22:44
ianwyep, that is what i put in22:44
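The backport tweak being discussed (swapping the debhelper-compat (= 12) build dependency for the older debhelper (>= 9.20160114~) form that xenial and bionic can satisfy) amounts to a one-line substitution in debian/control. A minimal sketch, assuming the dependency appears in the usual "debhelper-compat (= N)" spelling; the function name is hypothetical and this is not how ianw actually made the change.

```python
import re

# Matches the modern build dependency, e.g. "debhelper-compat (= 12)".
_COMPAT_DEP = re.compile(r"debhelper-compat\s*\(=\s*\d+\)")


def relax_debhelper_dependency(control_text, replacement="debhelper (>= 9.20160114~)"):
    """Return debian/control text with debhelper-compat swapped for plain debhelper."""
    return _COMPAT_DEP.sub(replacement, control_text)
```

As far as I know, once debhelper-compat is gone the compat level has to be declared some other way, normally a debian/compat file containing the level (9 here), and the change is worth recording in debian/changelog so the next backport does not have to rediscover it.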
clarkbfungi: it must because it worked before and sounds like the delta between our current package and the upstream package is all in the patching side?22:45
clarkbor maybe people were only looking at the patch dir?22:45
fungiclarkb: yes, i agree with that logic22:45
ianwok, focal looks like it's building ok22:46
*** fp4 has quit IRC22:46
fungigit diff --stat debian/1.8.6-1..HEAD22:47
fungiStandards-Version was increased from 4.1.3 to 4.5.0 in debian/control22:48
fungisome make var assignment methods were tweaked in debian/rules22:49
fungithe debian/watch file was updated to track upstream source via https instead of http22:50
fungiotherwise, just more quilt patches22:50
fungii don't see any changes which would have impacted debhelper use22:50
clarkbcool22:51
ianwi just have to wait a bit for the old failed ones to delete22:51
ianwwell that's brilliant22:54
ianwdeleting the failed builds appears to have also removed the prior good xenial and bionic builds22:54
ianwno sorry, i see.  the old builds have been superseded by the failed builds22:55
fungigood news! i got a response from a vos status call finally22:56
clarkbianw: I don't see xenial and bionic at https://launchpad.net/~openstack-ci-core/+archive/ubuntu/openafs/+packages anymore fwiw22:56
clarkbthey are still at http://ppa.launchpad.net/openstack-ci-core/openafs/ubuntu/pool/main/o/openafs/ though22:56
clarkbso maybe a weird UI thing?22:56
ianwclarkb: yeah, they're not listed as their status is "Superseded"22:57
ianwi should probably upload xenial/bionic as "2" with the debhelper revert explicitly22:58
clarkbwfm22:58
*** redrobot1 has joined #opendev22:58
*** redrobot has quit IRC23:02
*** redrobot1 is now known as redrobot23:02
clarkbianw: looks like bionic builds just started23:05
ianwyep, so that 2 package just has the debhelper manually reverted to 9, no other changes23:06
fungisounds fine to me23:09
ianwgosh i will be happy to have xenial gone23:12
ianwbtw we have 96gb of ram available in the rax ci tenant23:12
fungigood to know23:12
fungisounds like enough for a gerrit replacement in that case23:13
ianwyeah, enough for a 60g replacement for a little23:13
ianwwe've got ticks on the bionic x86-64/i386 packages; that's good23:25
ianwfungi: is it reliably returning23:26
ianwfungi: thinking about upgrades for server -- i think we're in a pickle enough to manually get to 1.8.6 and then prioritise actual ansible etc for replacement non-xenial servers23:27
fungiianw: no, i got a call in between some vos release completing and another kicking off, i think23:27
ianwmy thought was the best thing to do is to probably vos dump the important volumes before attempting such a thing23:27
openstackgerritMartin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker  https://review.opendev.org/c/opendev/system-config/+/70525823:28
fungiso i went and held locks for all the remaining mirror updates now to hopefully prevent any additional vos release calls which are waiting from actually getting serviced, not sure if it will help23:28
ianwthe other options seem to be a openstack-side snapshot of the volumes attached to the server, or posix-level rsync-type copies of vicepa somewhere23:28
fungiwe could add another cinder volume as a pv in that vg and then make an lvm snapshot23:29
fungiwe probably don't have sufficient available extents on the current pvs to comfortably snapshot the volume23:30
fungisince lvm snapshots are essentially cow we shouldn't need enough room for a full extra copy that way23:30
ianwyeah, i think we waste a lot of time though backing up the mirrors23:30
clarkbya the mirrors are like 95% of the disk use23:31
fungilvm snapshot is instantaneous23:31
fungi(effectively)23:31
clarkbah til23:31
ianwoh good point, yeah cow23:31
ianwi mentioned this in openafs23:32
clarkbthose servers appear to still be puppeted fwiw23:32
ianwi've been pointed at akeyconvert23:32
fungisee manpage for lvcreate if you're interested in details23:32
ianwyes, we'll need to ansiblise this.  it feels like we should do that and recreate them as focal nodes23:32
clarkband puppet == xenial for us23:32
clarkbianw: note I believe we did the last server upgrades in place via do-upgrade or whatever the ubuntu command is because afs doesn't like ip addresses changing23:33
clarkbthat said I agree getting all the way up to focal for a lot of things sounds like a great idea :)23:33
fungiour kerberos servers also probably fall into the category of related infrastructure we'll want to upgrade around the same time23:34
ianwhttp://manpages.ubuntu.com/manpages/bionic/man8/akeyconvert.8.html23:34
ianwThe akeyconvert command is used when upgrading an AFS cell from the 1.6.x release series23:34
ianw       to the 1.8.x release series.23:34
clarkbone thing to keep in mind is that bionic is supposedly getting 10 years of support. I don't know if that is true for focal too23:37
clarkbbut if not it may make sense to consider if letting bionic sit in place for the remaining 7 years or whatever it is is a better option23:38
openstackgerritMartin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker  https://review.opendev.org/c/opendev/system-config/+/70525823:39
ianwyeah, if we're going to replace i'd say focal -- i mean already the packages don't build on bionic23:39
clarkbthats a good point23:39
clarkblooks like all LTS are 10 years now23:41
clarkbsorry not all23:41
clarkbbionic and newer23:41
clarkbianw: fungi I think the packages are all done now except for arm6423:44
clarkbdo we want to install the new package on a node and see if it works?23:44
ianwwe can upgrade a mirror node, yep23:45
clarkbnote the ml seems to indicate that some sort of forceful restart of things may be required (lots of people were just doing reboots)23:46
ianwstarting to track things in https://etherpad.opendev.org/p/infra-openafs-1.823:47
*** sshnaidm|ruck is now known as sshnaidm|afk23:49
