Wednesday, 2020-09-30

fungithe coverage tomorrow will be cringeworthy enough i'm certain, the real thing might contort my face into a permanent cringe00:00
ianwi won't do too much effort on automatic queue saving/restore etc because i imagine zuul v4 will invalidate all that00:07
ianwthere's still a few things coming in to the old graphite server00:13
ianwhaproxy; i think we can just restart the statsd thing there00:13
fungiclarkb: on the review-test plan... what's the reasoning there for populating the db before copying git repos? you mention making sure the repos have the expected refs, but i don't follow00:18
fungiif gerrit isn't started yet, then the ordering of sourcing the db and copying the git repos should be irrelevant00:19
fungiit's not like git is going to talk to mysql on its own, or vice versa, so i feel like i'm misunderstanding something there00:19
clarkbfungi: its the order we copy from prod00:20
clarkbsince we dont want to shutdown prod00:20
fungioh, making sure the git repos we copy are at least as recent as the database snapshot00:20
fungieasy solution there, grab the most recent db backup dump00:20
fungibut yeah, so it's not the order in which we import them which matters, but the order in which we retrieve them00:21|g00:21
ianwinteresting ... i guess someone has copied something and is trying to send us their stats00:22
ianwok restarted logstash and firehose.  other than a couple of random things trying to send us haproxy and zuul stats, i'm not seeing anything else come into the old server00:29
openstackgerritIan Wienand proposed opendev/system-config master: Cleanup graphite01
openstackgerritIan Wienand proposed opendev/system-config master: doc: update Zuul restart instructions
openstackgerritIan Wienand proposed opendev/system-config master: Cleanup graphite01
openstackgerritIan Wienand proposed opendev/system-config master: doc: update Zuul restart instructions
openstackgerritIan Wienand proposed opendev/project-config master: [dnm] scripts to cleanup
openstackgerritIan Wienand proposed opendev/system-config master: tarballs: Add rewrite rules for tenant moves
openstackgerritwu.shiming proposed openstack/diskimage-builder master: Use unittest.mock instead of mock
openstackgerritAndreas Jaeger proposed openstack/project-config master: Some fixes for zuul tenant
openstackgerritzbr proposed zuul/zuul-jobs master: tox: allow tox to be upgraded
openstackgerritzbr proposed zuul/zuul-jobs master: tox: allow tox to be upgraded
openstackgerritzbr proposed zuul/zuul-jobs master: tox: allow tox to be upgraded
*** ysandeep is now known as ysandeep|afk11:11
*** dtantsur|afk is now known as dtantsur11:11
openstackgerritzbr proposed zuul/zuul-jobs master: tox: allow tox to be upgraded
*** ysandeep|afk is now known as ysandeep11:51
openstackgerritzbr proposed zuul/zuul-jobs master: tox: allow tox to be upgraded
*** roman_g has joined #opendev12:56
AJaegerconfig-core, please review
openstackgerritMerged openstack/project-config master: Complete retirement of x/osops-* repos
AJaegerclarkb: for 742732, see corvus comment on #zuul - let's not merge that one yet15:28
clarkbyou mean my question?15:29
AJaegerclarkb: oh, that was yours - sorry.15:29
* AJaeger should not multitask ;(15:29
clarkbno worries. Mostly just trying to do enough review that chances of us breaking all our jobs when that merges are low :)15:30
clarkbI think I understand the ansible properly but would be good for others to double check that15:30
openstackgerritMerged zuul/zuul-jobs master: shake-build: add shake_target variable
AJaegerinfra-root,  infra-prod-manage-projects failed, see
clarkbgitea07 failed with a 500 error15:52
clarkbwe seem to still be running the plays against review.o.o though15:53
clarkbAJaeger: is set to read only so I think the only place we are out of sync is gitea0715:54
clarkbgitea07 looks generally healthy15:55
clarkb127.0.0.1:45468 - root [30/Sep/2020:15:37:34 +0000] "PATCH /api/v1/repos/openstack/pbr HTTP/1.1" 500 12238 "\" \"python-requests/2.23.0" <- it was trying to update the description15:57
clarkbwe can suppress http errors on description updates probably15:58
clarkbthe next pass will get them15:58
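A minimal sketch of the "suppress errors on description updates" idea, assuming a per-repo update callable that raises on an HTTP error. The names `sync_descriptions`, `update`, and `DescriptionUpdateError` are hypothetical and are not taken from the actual manage-projects code; the point is only that one failed update is logged and skipped so the next pass can retry it.

```python
import logging

log = logging.getLogger("manage-projects-sketch")


class DescriptionUpdateError(Exception):
    """Hypothetical error raised when the description update gets an HTTP error."""


def sync_descriptions(repos, update):
    """Try to update every (name, description) pair; return the names that failed.

    A failure is logged and skipped rather than aborting the whole run,
    on the assumption that the next periodic pass will retry it.
    """
    failed = []
    for name, description in repos:
        try:
            update(name, description)
        except DescriptionUpdateError as exc:
            log.warning("description update failed for %s: %s", name, exc)
            failed.append(name)
    return failed
```

The caller can still inspect the returned list to decide whether a rerun is worth triggering by hand.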
AJaegerGood - thanks15:58
clarkbI'm trying to get gitea server logs now15:58
clarkb2020-09-30T15:37:34.801986920Z 2020/09/30 15:37:34 ...ules/context/repo.go:792:func1() [E] GetCommitsCount: Unsupported cached value type: <nil>15:59
clarkbI think that points to a gitea bug16:00
openstackgerritDrew Walters proposed opendev/irc-meetings master: Update Airship meeting details
clarkbI need to eat something but I'll see if I can find anything like that on the gitea bug tracker and if not file an issue16:00
clarkbwe can also rerun manage-projects for gitea07 to sync up the descriptions on those osops repos or just let the next pass do it for us16:01
clarkbseparately if anyone can take a look at the manage-projects playbook and figure out why when gitea fails we still run on gerrit that would be great16:01
clarkbI don't think we expect that behavior16:01
AJaegerlet the next run - and check that it passes?16:02
clarkbAJaeger: ya16:02
AJaegerclarkb: enjoy your meal - I'm out for a bit now as well...16:02
clarkbhrm also any idea why that isn't a full traceback in the log?16:03
clarkb seems to be the full log there but I would've expected a traceback too16:04
clarkbSTACKTRACE_LEVEL maybe16:05
clarkbanyway I need food16:05
clarkbdtantsur: no, you can watch its event stream though16:42
clarkbvia ssh16:42
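A rough sketch of consuming that event stream, assuming the usual `gerrit stream-events` behaviour of emitting one JSON object per line over ssh. The function name `iter_events` is made up for illustration, and the code only filters lines it is handed; how the ssh connection is opened (host, port, account) is left to the caller.

```python
import json


def iter_events(lines, wanted_type):
    """Yield decoded events of one type from an iterable of JSON lines.

    Blank lines and lines that are not valid JSON objects are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and event.get("type") == wanted_type:
            yield event
```

In practice `lines` could be the stdout of an `ssh ... gerrit stream-events` subprocess, and `wanted_type` something like `"patchset-created"`.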
dtantsurokay, got it. I'll pass that on.16:42
clarkbthe now deprecated checks api plugin which was added to newer gerrit operated on a poll system over http but even that wasnt web hooks16:43
dtantsurcontext: I'd love metal3 (BM provisioning for k8s) and/or OpenShift to run a 3rd party CI on ironic16:43
fungiwe do also have a daemon you can run to convert the gerrit event stream into an mqtt publisher, if you have such a bus you want to inject the events into16:46
fungiand the underlying gerritlib library it relies on can be used from any python program too:
dtantsurI'm afraid we're talking about go programs :)16:47
dtantsurokay, so how terrible does this sound: define a normal zuul job that triggers $something on that side and waits for results, presenting them in a nice format?16:48
fungioh, then in that case i bet there's something... go and gerrit are both google projects after all16:48
dtantsurI'm a bit cautious about assuming inter-project compatibility in large companies :)16:48
clarkbdtantsur: having a zuul job trigger something is doable16:55
clarkbanother option would be to just test stuff on our side16:55
clarkbits all open source with openshift right?16:55
clarkbthough last I looked at their install docs its potentially painful to set up16:56
dtantsurclarkb: yep, it's open source and yep, it may be a big pain to reproduce and then maintain16:56
fungiyeah, it looked to me like openshift is the operating system in recent incarnations16:56
fungiyou no longer run openshift *on* some operating system, you run openshift *as* the operating system16:57
*** eolivare has quit IRC16:57
fungiwhich implies at best a reboot/kexec after overwriting the filesystem on test nodes16:57
dtantsurironic integration works by writing an RHCOS whole-disk image16:58
dtantsurwhich is what openshift is based on16:58
dtantsurI'm quite sure no kexec is involved (although.. we in ironic have been thinking of it for a while)16:58
fungiahh, so you could use a centos kernel but put rhcos in a chroot?16:59
fungino need to reboot onto an rhcos kernel16:59
dtantsurno, we write the rhcos image right away, no centos or traditional rhel involved17:00
fungithat's what i meant though, if you're starting *from* an operating system which is not rhcos, you need to replace the operating system17:00
dtantsurah, well, okay. yeah, as usual with ironic: whatever was there before gets wiped17:00
fungithinking in terms of a server into which you only have ssh access17:00
fungino ipmi17:00
dtantsuror another potential source of confusion: there must be a bootstrap node with a traditional OS17:01
clarkbright fungi is talking from the side of if we wanted to test this ourselves17:01
clarkband that either involves very slow nested VMs or overwriting an existing VM17:01
dtantsurif we wanted to test this ourselves, we would need something like TripleO's OVB17:01
fungishort of having nodepool boot rhcos images and enroll those into zuul as job nodes17:01
dtantsurnested VMs are out of question. once I've tried a 64G VM with 8 vCPUs, and it wasn't enough.17:02
fungiyeah, we're talking about double-nesting there essentially. that would be ugly17:02
fungialmost but maybe not quite as painful as devstack-on-devstack-on-a-vm17:03
dtantsurit's 100% the same problem TripleO has17:04
dtantsurthey also need to install heavy software into something provisioned by ironic. hence OVB.17:04
openstackgerritClark Boylan proposed opendev/system-config master: Record stacktraces when logging errors in gitea
clarkbinfra-root ^ I think we want that to catch one of these gitea http 500s in the log with a stack trace before filing a bug17:06
clarkbthere just isn't much info to file a bug with current logs17:06
clarkbdtantsur: another approach with problems like this is to think about the interfaces between software17:06
clarkbdtantsur: if the idea is to test that ironic can deploy a coreos image we can do that with nested VMs without too much difficulty17:07
clarkbthe problem becomes if you also want to ensure that openshift runs on top of coreos but that is decoupled from ironic17:07
clarkb(because openshift in qemu will be slow)17:07
dtantsurto be more precise, we want to check that ironic is not breaking metal3.17:08
dtantsuras step #1 we could do it without installing the actual openshift17:08
dtantsurbut this is only how master nodes are installed. worker nodes are installed by ironic that is inside master nodes and hence works slightly different.17:08
clarkbdtantsur: in that case my suggestion would be to test it with k8s instead as I think that simplifies things a lot17:10
dtantsurnot sure what exactly you mean by "test with k8s"17:11
clarkbdtantsur: metal3 is not openshift specific. I would test the integration between ironic and metal3 with kubernetes because it eliminates a lot of the problems with nested virt17:12
clarkbbut then you can test that ironic bootstraps openshift/k8s via a "can we deploy coreos" job/test, and test that metal3 running in an existing k8s can deploy baremetal, using faked out baremetal that doesn't need to be performant, in another job/test17:14
clarkbwithout ever needing to properly solve the openshift deployment problem17:14
*** andrewbonney has quit IRC17:14
dtantsurI don't think the problem boils down to "can we deploy coreos", but rather to cover the exact usage scenario metal3 has.17:15
dtantsuranyway, it's becoming a bit later here17:15
dtantsurthank you for your input, much appreciated. I'll think about it and likely come with more questions next week17:16
openstackgerritClark Boylan proposed opendev/system-config master: Make gitea description update failures nonfatal
clarkbdtantsur: I'm sure I have the details wrong, I just want to point out that with testing we can often work around limitations by limiting the problem space. OpenStack in particular has historically been very bad at this relying on massive integration testing to catch bugs (where they are hard to debug)17:17
clarkbAJaeger: infra-root are the two followups to the failed management project run that I think we should get in17:18
dtantsuryep, I get your point. I just need to think how exactly it could be applied to this problem.17:18
*** hamalq has joined #opendev17:52
clarkbfungi: I've attached a 256GB SSD Volume to review-test to match prod18:15
clarkbI think it is /dev/xvdb but blkid (and other /dev structures) are clueless about it18:16
clarkbfungi: is this a case where I have to force a rescan of devices by udev or similar?18:16
clarkb(figure you may know whats up after your volume rotations that happened recently)18:16
clarkbmostly I don't want to overwrite xvdb if it is something else18:16
fungii'll take a look18:17
fungi[Wed Sep 30 18:14:18 2020] blkfront: xvdb: barrier: enabled; persistent grants: disabled; indirect descriptors: disabled;18:18
fungithat's at the end of dmesg output18:18
fungialso `sudo fdisk -l /dev/xvdb` describes it as "Disk /dev/xvdb: 256 GiB, 274877906944 bytes, 536870912 sectors" with no partition table18:19
fungiif you attached it roughly 5-6 minutes ago, then it's safe to say that's it18:19
openstackgerritClark Boylan proposed opendev/system-config master: Remove review-test from our inventory
clarkbyup that should be it thanks18:20
clarkbalso I think we'll want ^ to be in before we start the movement of data and upgrades in order to avoid ansible fighting us over config contents (docker-compose or even review_site content)18:20
fungiyou should be able to use parted to partition it, then pvcreate the primary partition block device, vgcreate main with it, then lvcreate a volume in the main vg and format that18:21
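The sequence fungi lists can be sketched as a dry-run plan that only builds the command lines and runs nothing. This is an illustration, not the project's bootstrapping script: the helper name `lvm_plan`, the msdos label, the `100%FREE` sizing, and the ext4 format are assumptions, and the default volume name `test-gerrit` is inferred from the `/dev/mapper/main-test--gerrit` path that shows up later in the log.

```python
import shlex


def lvm_plan(disk, vg="main", lv="test-gerrit", mountpoint="/mnt"):
    """Return the commands (as argv lists) to turn a blank disk into a mounted LV."""
    part = disk + "1"  # first partition, e.g. /dev/xvdb1
    lv_path = "/dev/{}/{}".format(vg, lv)
    return [
        # One primary partition spanning the disk, flagged for LVM use.
        ["parted", "--script", disk, "mklabel", "msdos",
         "mkpart", "primary", "0%", "100%", "set", "1", "lvm", "on"],
        ["pvcreate", part],
        ["vgcreate", vg, part],
        ["lvcreate", "-l", "100%FREE", "-n", lv, vg],
        ["mkfs.ext4", lv_path],
        ["mount", lv_path, mountpoint],
    ]


if __name__ == "__main__":
    # Print the plan for review instead of executing it.
    for argv in lvm_plan("/dev/xvdb"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the plan first makes it easy to confirm the target device before anything destructive is run by hand.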
clarkbfungi: ya I tend to go off of what our bootstrapping scripts do18:21
clarkbI'll work on that next but before we move any data around I think we want to remove it from the inventory18:21
fungiwe've also documented the rough process at with cut-n-paste command examples18:22
clarkbalso thinking out loud here it has occurred to me that mordred may have already done the db and git stuff but I guess getting an up to date copy ensures that we aren't missing something newer that would pose problems18:24
clarkbreally getting the review_site on the ssd cinder volume seems like the big thing for reliable timing data18:25
clarkbfungi: ok I've done the lvm dance, created an ext4 fs and mounted it manually to /mnt18:37
clarkbfungi: if that looks good to you my next step is to wait for to land then move /home/gerrit2's contents into /mnt, update fstab, remount (reboot too probably), and go from there18:38
clarkbhrm looking at sync-to-review-test.yaml maybe we don't want to remove review-test from inventory? except its in the gerrit group so it will get things done to it :?18:40
clarkbmordred: if you're around at all was sync-to-review-test ever run? I'm curious if that is known to work yet18:41
mordredclarkb: no - it was not run yet - I think it stalled out at "let's look at it really closely and make sure it's not going to nuke anything before running it"18:41
mordredI'm PRETTY sure it won't delete things - but, you know, deleting things is bad18:42
clarkbfwiw it does seem like something was synced. I'm guessing that was manual at some point?18:43
fungi/dev/mapper/main-test--gerrit  252G   60M  252G   1% /mnt18:43
fungiyeah, lgtm18:43
clarkbmy concern with the host being in inventory right now is I want to mv /home/gerrit2 contents onto another fs and don't want an ansible run to recreate things in the old fs while that is running18:44
clarkbfor that maybe just putting it in emergency for now is fine?18:44
clarkbthen similarly when we start doing the upgrades we don't want the docker-compose file to be overwritten under us and attempt a downgrade18:45
fungiyes, adding it to the emergency list while you're copying/swapping should do what you need18:47
openstackgerritDrew Walters proposed opendev/irc-meetings master: Update Airship meeting details
clarkbok added to the emergency file18:48
clarkbok I think fs and mount surgery is all done now18:55
clarkbfungi: ^ can you take a look and double check that it looks correct to you?18:55
clarkb(I didn't reboot, just did a mount -a)18:55
fungiyeah, looking19:08
fungiclarkb: looks like the stuff i would expect is in there and the rootfs utilization is small, so presumably you moved it all. however you'll want to update the ownership of the homedir itself19:10
fungiownership and permissions are taken from the root of that filesystem rather than the directory inode onto which they were mounted19:11
clarkbcool thats one thing on the gerrit todo list done. Going to do zuul zk reviews now since I keep intending to help with those. Next step on the gerrit side is to checkout the db setup and restore a db backup there19:23
fungiyep, lgtm now19:23
clarkbalso ianw's find in zuul stack19:26
*** tosky has joined #opendev19:32
*** roman_g has joined #opendev20:40
clarkbdoing more digging on review-test, its db does indeed appear to be distinct, you can tell because there are no tables in its two gerrit databases21:18
fungishould also be able to compare the db server hostname?21:22
clarkbfungi: yup they were different, I was just also checking if it had a previous sync (and it does not)21:24
clarkbmaybe we can do a meetpad call and a block of time for shared root screen and sync the db and the git repos?21:25
clarkb(not great right now as we're trying to get kids to sleep after mega meltdowns)21:25
clarkband mostly that will be to ease the scary factor :)21:27
ianwfungi: was the redirects I came up with22:12
ianwfungi: that would run after
cmurphydevstack-single-node-opensuse-15 nodeset?22:41
clarkbcmurphy: we don't have opensuse 15.1 anymore at all22:48
clarkbthe decision was made with the 15.0 -> 15.1 transition to stop trying to carry each point release (I think because like centos you're expected to roll forward for continued support?) and instead we just do "15" on whatever is current22:49
ianwit looks like it's probably just the match there, it's unlikely to actually be broken?22:49
cmurphyum okay, so my only choice is to get devstack to run on 15.2 for all stable branches?22:50
clarkband yup confirmed 15.1 is only supported until november (longer than I expected but not as long as 15.current)22:50
cmurphyianw: i hope so i guess?22:50
ianwwe should probably drop the point-check and make it a "15" match22:50
ianwthat would be the same as centos22:50
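The difference between a point-release check and a "15" match can be sketched as follows. devstack itself is shell, so this is only an illustration of the matching logic in Python; the function name `matches_major` is hypothetical and does not correspond to anything in devstack.

```python
import re


def matches_major(release, supported_major):
    """Return True if a release string such as '15.2' has the given major version.

    Only the leading number is compared, so a new point release
    (15.1 -> 15.2) keeps matching without a code change.
    """
    match = re.match(r"(\d+)(?:\.\d+)*$", release)
    return bool(match) and match.group(1) == supported_major
```

With this kind of check, `matches_major("15.2", "15")` and `matches_major("15.1", "15")` both pass, while an older major such as `"42.3"` does not.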
clarkbfwiw I got devstack master running on 15.222:50
clarkblooking at the failure now maybe we just need to backport that change22:50
clarkbya I think that is what we should do22:51
ianwyeah we can either backport or put in a new 15 only match, and backport that22:53
clarkbianw: since there was a small code change too probably just the 15 match isn't ideal?22:53
clarkbthough we only match centos-7 I guess22:54
ianwyeah how often is a point release?  it just means we need to do the same thing in the future22:54
clarkblooks like about once a year22:54
cmurphyabout once a year22:55
clarkb"A Leap Minor Release (42.1, 42.2, etc.) is expected to be released annually. Users are expected to upgrade to the latest minor release within 6 months of its availability, leading to a maintenance life cycle of 18 months. " <- is the support message fwiw and that is why we dropped the differentiation on the label aiui22:55
clarkbI'll push up a backport to ussuri now22:55
clarkbany idea how far back I should push that for?22:55
ianwi mean ... all the way i guess?22:56
cmurphywe're using it through stein22:57
ianwlgtm, we can just approve that.  hopefully it means cmurphy doesn't have to do anything (but wait) :)23:00
cmurphytrying it out with depends-on23:01
clarkbbackporting is fun because things change in lvm I guess23:02
clarkbI'm working on train now and will keep going backward23:02
clarkband confirmed that rocky specified 42.3 support so I think thats it23:08
cmurphythanks clarkb23:08
openstackgerritsean mooney proposed openstack/diskimage-builder master: [WIP] add alpine element
openstackgerritsean mooney proposed openstack/diskimage-builder master: [WIP] add apk element
openstackgerritsean mooney proposed openstack/diskimage-builder master: [WIP] bootloader-alpine-support
openstackgerritsean mooney proposed openstack/diskimage-builder master: [WIP] openssh-server-alpine-support
openstackgerritsean mooney proposed openstack/diskimage-builder master: [WIP] simple-init-alpine-support

