Wednesday, 2020-06-10

openstackgerritMerged openstack/project-config master: Wheel publish jobs: include system-config roles  https://review.opendev.org/73470400:03
openstackgerritMerged openstack/project-config master: Be more explicit about using python3 to run tools/  https://review.opendev.org/73439300:05
openstackgerritMerged openstack/project-config master: Add Backport-Candidate label for Kolla deliverables  https://review.opendev.org/73324300:05
*** cloudnull has quit IRC00:23
*** cloudnull has joined #opendev00:24
*** Meiyan has joined #opendev01:02
*** xiaolin has joined #opendev01:07
ianwclarkb: hrm, i think i'm hitting "If a child job inherits from a parent which defines a pre and post playbook, then the pre and post playbooks it inherits from the parent job will run only with the roles that were defined on the parent."01:14
ianwi guess i need to add the system-config roles to https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/jobs.yaml#L1160 ; even though they're only used by the publish job01:15
*** xiaolin has quit IRC01:22
*** xiaolin has joined #opendev01:28
*** xiaolin has quit IRC01:33
*** mlavalle has quit IRC02:01
*** xiaolin has joined #opendev03:16
*** xiaolin has quit IRC03:20
openstackgerritIan Wienand proposed openstack/project-config master: Revert "Wheel publish jobs: include system-config roles"  https://review.opendev.org/73473903:40
*** ykarel|away is now known as ykarel04:11
ianw /afs/.openstack.org/mirror/wheel/debian-10-x86_64  : Connection timed out04:25
ianwgosh darn it ...04:25
ianwi dunno what the heck is up with those volumes http://paste.openstack.org/show/794549/04:33
*** sgw has quit IRC04:34
ianwclarkb/fungi/corvus: ^ maybe you could take a bit to have a look at why these volumes appear corrupt on the executors?  i'm a bit brain-dead on it now05:08
openstackgerritMerged openstack/project-config master: Revert "Wheel publish jobs: include system-config roles"  https://review.opendev.org/73473905:26
AJaegerianw: could you review https://review.opendev.org/732490 for dib - this should be fine now IMHO05:32
AJaeger, please?05:32
*** xiaolin has joined #opendev06:02
*** factor has quit IRC06:17
*** factor has joined #opendev06:17
*** Dmitrii-Sh has quit IRC06:17
*** Dmitrii-Sh has joined #opendev06:18
*** hashar has joined #opendev07:04
*** xiaolin has quit IRC07:05
*** iurygregory has quit IRC07:11
*** xiaolin has joined #opendev07:17
*** rpittau|afk is now known as rpittau07:21
xiaolinhello, opendev, we want to donate computing resources, do we need to build our own cloud to meet the minimum requirements: support for 100 concurrent VM instances, each with 8GB RAM, 8 vCPUs, and 80GB storage?07:24
*** tosky has joined #opendev07:29
*** iurygregory has joined #opendev07:33
fricklerxiaolin: our experience with operating a cloud ourselves hasn't been too positive, so from our side the best solution would be if you could operate a cloud yourself. if that isn't possible, we might consider some other option, but that would require some more discussion07:33
fricklerxiaolin: the size of the cloud isn't a hard limit, in particular if you are talking about mips-based resources (iirc) instead of x8607:34
*** xiaolin has quit IRC07:36
fricklerxiaolin: most of the team members are US based, so it would be great if you could continue the discussion during their business hours. if that's too inconvenient, maybe sending a mail describing what your plans look like would be easier07:36
fricklersee http://lists.opendev.org/cgi-bin/mailman/listinfo/service-discuss07:37
openstackgerritMerged opendev/irc-meetings master: Update QA office hour time  https://review.opendev.org/73461207:46
*** moppy has quit IRC08:01
*** moppy has joined #opendev08:01
*** ravsingh has joined #opendev08:24
*** DSpider has joined #opendev08:26
*** ykarel is now known as ykarel|lunch09:13
*** hashar has quit IRC09:25
*** xiaolin has joined #opendev09:45
*** ysandeep is now known as ysandeep|lunch09:48
*** xiaolin has quit IRC09:51
*** ykarel|lunch is now known as ykarel10:04
*** xiaolin has joined #opendev10:04
openstackgerritChandan Kumar (raukadah) proposed openstack/diskimage-builder master: [DNM] missing file  /etc/pki/tls/private  https://review.opendev.org/73478210:11
*** sshnaidm|afk is now known as sshnaidm10:14
*** rpittau is now known as rpittau|bbl10:19
*** ysandeep|lunch is now known as ysandeep10:20
*** xiaolin has quit IRC10:23
openstackgerritCarlos Goncalves proposed zuul/zuul-jobs master: configure-mirrors: add CentOS 8 Stream  https://review.opendev.org/73478710:27
openstackgerritCarlos Goncalves proposed opendev/base-jobs master: Add centos-8-stream nodeset  https://review.opendev.org/73478810:29
*** Meiyan has quit IRC10:30
openstackgerritCarlos Goncalves proposed openstack/project-config master: CentOS 8 Stream initial deployment  https://review.opendev.org/73479110:40
openstackgerritLuigi Toscano proposed openstack/project-config master: gerritbot: more notifications in the cinder channel  https://review.opendev.org/73479210:47
*** tkajinam has quit IRC10:53
*** lpetrut has joined #opendev10:59
mordredfrickler: it occurs to me - one of the options we might want to consider (and consider putting on that document) - is if someone wants to donate compute resources but is not already a cloud operator, it might be easier for them to ship some computers to one of our existing cloud providers11:22
fricklermordred: I've been thinking that too, but gathered we should ask the affected operators first, donnyd and mnaser would first come to my mind11:24
fricklerthere are likely also complications like how to handle hardware replacements and how the hardware integrates into the existing environment. if I were a cloud operator, I'd rather get a financial donation and order the same hardware I use everywhere else11:28
mordredfrickler: yeah - donnyd and mnaser are who I was thinking about11:29
mordredfrickler: and yes to financial - unless there is some sort of specific hardware requirement, such as mips11:30
openstackgerritCarlos Goncalves proposed openstack/diskimage-builder master: Add support for CentOS 8 Stream  https://review.opendev.org/73408311:35
fungiagreed, often the folks interested in donating very specific hardware have ties to the manufacturer of said hardware, and so can provide it at much lower overall cost than normal market prices11:41
fungiso an actual hardware donation could go a lot further than a purely financial one11:42
mordredyeah - without needing to learn how to be a cloud operator12:02
mordredjust for what it's worth - I am beset upon by kittens atm. one has decided my trackpad is a pillow and the other has decided my forearm is a perch. so -- computering is currently under duress12:03
fungiyes, christine showed me a photo. remember, they *are* carnivores and *you* are made of meat12:04
mordredyes, this is rather true12:04
fungithey could just be trying to lull their breakfast into a false sense of security12:04
mordredI am 100% certain they only see me as breakfast12:05
*** rpittau|bbl is now known as rpittau12:05
mordredI think they are employing the correct tactics12:05
fungione of ours still hasn't stopped trying to eat us, after four years of abject failure12:06
mordredone day success may be forthcoming12:06
fungihe'll eventually wear us down, yep12:07
*** ravsingh has quit IRC12:20
*** olaph has quit IRC12:28
fungiso, looking at the mirror volumes ianw mentioned, it appears all the recently-created wheel volumes in afs are showing issues while the older 3 are behaving normally12:31
fungii see the same behavior locally from my workstation even12:31
AJaegerttx, config-core, looking at https://review.opendev.org/#/c/734640/ - I think we want to keep the official-openstack-repo-jobs for this step of retirement so that the repo gets emptied and visible on github. Do you agree? ttx, or are you doing something in github so that this is not needed?12:33
fungiAJaeger: i believe retired repos have been getting deleted from github12:34
AJaegerfungi: ok, in that case we don't need it.12:35
fungithey'll still be available on opendev, but the github mirror for openstack is now only active/maintained projects12:35
AJaegerhttps://github.com/openstack/syntribos redirects to https://github.com/openstack-archive/syntribos - without the final change to delete the repo12:36
fungiahh, yeah, i guess they've been getting transferred to the openstack-archive org rather than just deleted12:36
fungimaybe ttx knows how the repos in that archive org should look12:37
AJaegerSo, it's not available anymore in /openstack/ - but it's missing the final deletion change. So, ttx, what's your preference? Continue like with syntribos or do you want the final change as well?12:37
AJaegerthanks fungi - let's wait for ttx12:38
openstackgerritSagi Shnaidman proposed zuul/zuul-jobs master: Add jobs for testing ensure-ansible  https://review.opendev.org/73458412:40
fungiianw: clarkb: corvus: i'm mildly suspicious of afs01.ord.o.o, dmesg has entries from friday which look like a xen domu suspend/restore12:40
fungistill trying to work out where the actual afs timeouts are coming from though, not sure if that server is involved12:40
fungiyeah, nevermind, vos listvol says it doesn't host copies of those volumes anyway12:41
fungiand afs02.dfw.o.o is probably not the source of the problem as it's only hosting readonly replicas while the timeouts are for interactions with the rw volumes12:42
*** hashar has joined #opendev12:46
fungifor some reason `fs checkservers -cell openstack.org` doesn't seem to work the way the manpage implies (it tells me the local machine is unavailable)12:50
fungithis is weird... for some reason all the problem volumes have an extra "server afs01.dfw.openstack.org partition /vicepa RO Site" listed by `vos listvldb`12:57
fungi(so one rw replica on afs01.dfw and two ro replicas on afs01.dfw as well as one ro replica on afs02.dfw)12:58
*** rajinir has quit IRC13:01
fungino, wait! that's because i was querying from afs01.dfw13:03
*** sgw has joined #opendev13:03
fungiif i query from my workstation it lists an rw and ro site *on* my workstation?!?13:04
*** rajinir has joined #opendev13:04
funginow the fs checkservers error is starting to make sense13:04
fungithis is _bizarre_13:04
fungihttp://paste.openstack.org/show/794566/13:05
openstackgerritHervĂ© Beraud proposed openstack/project-config master: gerritbot: more notifications in the oslo channel  https://review.opendev.org/73482713:05
openstackgerritSagi Shnaidman proposed zuul/zuul-jobs master: Add jobs for testing ensure-ansible  https://review.opendev.org/73458413:05
*** ravsingh has joined #opendev13:07
fungisimilarly if i run vos listvldb on the mirror-update instance, it reports "server mirror-update01.opendev.org partition /vicepa RW Site" and similar for RO13:08
fungiso something in the record is saying that the rw volume and one ro replica are on the client's local system13:09
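A rough sketch of the kind of VLDB inspection fungi is describing here, assuming a machine with the openafs client tools and the openstack.org cell configured (the volume name is one of the affected wheel volumes mentioned above):

    # show the location database entry for the volume; -noresolve prints raw
    # addresses, so a bogus 127.0.1.1 registration is visible directly
    vos listvldb -name mirror.wheel.focala64 -noresolve
    # examine the volume header itself; this is the call that was timing out
    vos examine mirror.wheel.focala64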
*** olaph has joined #opendev13:09
ttxlooking13:15
AJaegerttx, compare https://opendev.org/openstack/syntribos and https://github.com/openstack-archive/syntribos13:16
AJaegergithub missed the last change that we have in opendev. Is that correct or should those be the same?13:17
ttxAJaeger: in openstack-archive we have both forms. Some have an archiving commit and some do not.13:17
ttxI think I have a slight preference for those that have the archiving commit13:17
AJaegerttx, ok - then we need to keep the official-openstack-repo-jobs until the last minute - I know what to do.13:18
AJaegerthanks, ttx13:18
ttxas it may not be super-obvious this is archived content otherwise13:18
ttx(you have to notice the change in org name)13:18
ttxAJaeger: if that's not too much of a hassle, i think that's better yes13:18
AJaegerttx, it's no problem13:20
AJaegerttx, https://review.opendev.org/734835 updates the docs13:24
openstackgerritEmilien Macchi proposed openstack/project-config master: Retire Paunch  https://review.opendev.org/73464013:27
*** ykarel is now known as ykarel|afk13:31
fungii need to disappear for a grocery pickup appointment, but can resume banging my head against these weird afs volumes in a bit13:46
openstackgerritEmilien Macchi proposed openstack/project-config master: Retire Paunch  https://review.opendev.org/73464013:59
*** ykarel|afk is now known as ykarel14:02
openstackgerritSagi Shnaidman proposed zuul/zuul-jobs master: Add jobs for testing ensure-ansible  https://review.opendev.org/73458414:04
mordredinfra-root: I've got to run an errand and will be out for a couple of hours.14:12
*** sshnaidm is now known as sshnaidm|bbl14:23
*** hashar has quit IRC14:39
openstackgerritOleksandr Kozachenko proposed openstack/project-config master: Add magnum and magnum-tempest-plugin in required-projects  https://review.opendev.org/73486314:43
*** mlavalle has joined #opendev14:47
clarkbfungi: ianw: I wonder if that is an afs version mismatch problem? Like perhaps using your local openafs system to talk to older fileservers to create the volumes is a problem (this assumes we didn't create the volumes on the servers themselves)14:58
*** priteau has joined #opendev14:58
clarkbmaybe we should try to create a new test volume on afs01 and see if it exhibits the same behavior?14:58
*** ykarel is now known as ykarel|away15:01
*** lpetrut has quit IRC15:09
openstackgerritDrew Walters proposed openstack/project-config master: Add missing project to Airship doc job  https://review.opendev.org/73487415:14
fungiclarkb: i saw the same behavior on mirror-update01.openstack.org which is xenial, same as afs01.dfw, and also saw it when running locally *on* afs01.dfw15:18
clarkbfungi: right but where did we create the volumes?15:18
clarkbI'm just wondering if it could be a creation problem with new openafs talking to old openafs to create the volume15:19
fungiahh, so an issue on creation... maybe? i created some from my workstation, but i think ianw may have needed to delete and recreate them later (now i don't remember why, i'll look up the earlier discussions)15:20
clarkbI think it was a similar situation with extra volume replicas?15:20
clarkbbut I'm not 100% sure on that15:20
fungii'll try to find that earlier conversation after lunch15:20
*** ysandeep is now known as ysandeep|away15:30
clarkbfungi: has https://review.opendev.org/#/c/729029/1 been tested to check that the ep_headings plugin continues to work?15:33
*** olaph has quit IRC15:33
clarkbfungi: if not I think what we can do is push a followup that forces the system-config-run etherpad job to fail, add a hold for that node then use /etc/hosts to talk to the test node as if it were production and check it?15:33
*** yoctozepto has quit IRC15:34
*** yoctozepto has joined #opendev15:35
fungii have not tested it, no15:42
openstackgerritClark Boylan proposed opendev/system-config master: Test etherpad with testinfra  https://review.opendev.org/73488015:50
clarkbk, I've put a hold on ^ and when we are done checking things that way we can drop the assert False and have a bit more checking done automatically too15:51
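For reference, the hold clarkb mentions is typically placed with the zuul client's autohold command; a minimal sketch, where the tenant and job name are assumptions based on the discussion above:

    # run on the zuul scheduler as a zuul admin; holds the next failing build
    # of the etherpad deployment test job so the node survives for debugging
    zuul autohold --tenant openstack --project opendev/system-config \
        --job system-config-run-etherpad \
        --reason "debug etherpad 1.8.4 upgrade" --count 1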
*** rpittau is now known as rpittau|afk16:07
openstackgerritMerged openstack/diskimage-builder master: Fix yumdownloader cache dir  https://review.opendev.org/69878816:24
openstackgerritCarlos Goncalves proposed openstack/diskimage-builder master: Add support for CentOS 8 Stream  https://review.opendev.org/73408316:27
*** ravsingh has quit IRC16:36
clarkbfungi: 213.32.76.138 in /etc/hosts as etherpad.opendev.org and load https://etherpad.opendev.org/p/clarkb-test - it works but it's not perfect16:41
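A minimal sketch of that /etc/hosts trick, using the held node's address from the line above (the -k flag is an assumption that the test node's certificate won't validate for the production hostname):

    # point the production hostname at the held test node, locally only
    echo "213.32.76.138 etherpad.opendev.org" | sudo tee -a /etc/hosts
    # quick sanity check before poking at it in a browser
    curl -kI https://etherpad.opendev.org/p/clarkb-test
    # remember to remove the /etc/hosts line again afterwards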
fungimmm16:42
clarkbfungi: I think next step may be disabling ep_headings and checking if it renders properly?16:42
fungiprobably. it's just one line in the config16:43
clarkbtrying to figure out how to do that on the running instance16:45
clarkbwe bake it into the image, might be easiest to just push another update without that in the image16:48
clarkbbefore I do that I'll try rebuilding the image on the test node and restart with docker-compose16:49
clarkbheh I've just realized the ep_headings thing is the only thing we change so I can just switch to upstream image to test this16:53
clarkbI'll do that if the image rebuild fails for some reason16:53
clarkbfungi: ya without ep_headings it's a bit better. What i notice though is that we're running with the new format, not the old ui16:54
clarkbwhich may be related16:54
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Allow upload-docker-image role to be used outside of promote  https://review.opendev.org/73489016:56
openstackgerritOleksandr Kozachenko proposed openstack/project-config master: Add openstack/magnum and openstack/magnum-tempest-plugin in required-projects  https://review.opendev.org/73486317:00
*** priteau has quit IRC17:01
clarkbfungi: ok test it now. I think the problem is 1.8.3 switched to colibris skin by default17:06
clarkbeven though the 1.7.x series docs said this wouldn't happen until etherpad 2.017:06
clarkbforcing skinName to no-skin in the settings seems to fix this17:06
clarkbI'll get a change up that does that and we can recapture the host and double check it is happy with config management (and not my manual fiddling)17:07
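A sketch of the manual tweak clarkb describes, as one might apply it on the held node; the settings.json location and the compose service name are assumptions about the image layout rather than opendev's exact configuration:

    # force the pre-1.8.3 default skin instead of colibris
    sudo sed -i 's/"skinName": *"colibris"/"skinName": "no-skin"/' settings.json
    # restart the container so etherpad picks up the change
    sudo docker-compose restart etherpad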
openstackgerritClark Boylan proposed opendev/system-config master: Upgrade Etherpad to 1.8.4  https://review.opendev.org/72902917:10
openstackgerritClark Boylan proposed opendev/system-config master: Test etherpad with testinfra  https://review.opendev.org/73488017:10
clarkbI'm putting a hold on that again17:10
clarkbfwiw using the test node as a test system seems to be working reasonably well. And better yet I can recycle them easily17:11
clarkb*test node as dev system17:13
fungiclarkb: we set the "noTheme" (or whatever it's called) skin explicitly in our configs, so should override the colibris default17:14
clarkbfungi: I couldn't find that fwiw17:15
clarkbfungi: it wasn't until 1.8.3 that the default changed from no-skin to colibris though17:15
clarkbwhich is why I think we were fine (I looked for a place we were explicitly overriding it and couldn't find it so I added it to your change)17:15
*** hashar has joined #opendev17:16
fungiit used to be in our config, i even removed the line so we could test the colibris default on etherpad-dev17:16
fungii wonder if we lost that when we containerized17:16
clarkbfungi: that could be17:17
fungithough yes, you're right that was when i was testing a commit from the devel branch while they were working toward 1.8.317:17
fungibut it was explicitly set to no theme in our config at that point17:18
fungiwhich prevented us from following the default change to colibris17:18
clarkbI personally don't care for colibris, it feels more like an official document than an ethereal note pad17:19
clarkbbut I think we should update to colibris as a separate step if we want to go that route17:19
clarkb(for anyone wondering why not jump to colibris)17:19
fungiyes, i mean, we tried it out on the old review-dev and none of us seemed impressed with it17:23
fungiit's trying too hard to be google docs i think17:23
fungii don't want a collaborative word processor, i want a collaborative text editor17:24
fungiit's possible folks who spend a lot of time in wysiwyg environments prefer the word processor feel, while folks who spend a lot of time in terminal emulators and shell environments prefer the text editor feel17:25
clarkbthat could be17:25
clarkbit is rare that i start soffice17:26
clarkb(and yes libreoffice still installs that binary)17:26
fungii do everything i can to avoid starting a word processor17:26
fungiincluding command-line converters which turn word processing documents into plain text17:26
fungi(antiword, for example)17:27
fungii suppose giving users a display toggle to switch between themes individually/locally would allow for the best of both worlds17:27
fungibut it doesn't appear that they've designed for such a case17:28
clarkbfungi: ya we'd need headings and other potential plugins to work with both skins in that case17:28
clarkbthe previous test instance that I fixed manually continues to lgtm17:31
clarkbif the new test instance which should be up in about 20 minutes or so looks good to others then I think we can go ahead and land the upgrade17:31
fungiawesome, thanks for picking that up!17:38
fungii've been buried under ml discussions and reviews17:38
*** sshnaidm|bbl is now known as sshnaidm17:40
clarkbwell I wanted to approve it and then realized I should double check it was ready :)17:49
openstackgerritEmilien Macchi proposed openstack/project-config master: Deprecate Paunch  https://review.opendev.org/73464017:49
*** hashar has quit IRC17:57
fungiclarkb: ianw: back to the afs volumes, this actually looks like the same thing we saw previously with those volumes: http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-05-26.log.html#t2020-05-26T23:45:0117:58
fungiit's odd, both the volumes i created and those ianw created are exhibiting the issue18:00
clarkbinfra-root 104.239.168.111 in /etc/hosts as etherpad.opendev.org seems to be working now with my update to set no-skin on fungi's upgrade change18:01
clarkbhttps://etherpad.opendev.org/p/clarkb-test is the etherpad I used really quickly there18:02
openstackgerritClark Boylan proposed opendev/system-config master: Test etherpad with testinfra  https://review.opendev.org/73488018:02
clarkbthat change should pass now that I don't need it to fail for node holding. I've also cleaned up my earlier holds18:04
fungiclarkb: seems to work, though now the weird background overlap we were seeing through meetpad appears in this test deploy of etherpad18:05
clarkbfungi: it's not as bad this time, my 'g' renders with my color properly18:06
clarkbbut it breaks into your color which is weird18:06
clarkbthe meetpad situation was the g had no tail18:06
clarkboh wait I had to add more text :)18:06
clarkb:/18:06
clarkbhard refresh doesn't change that18:07
clarkbI wonder if this is a new etherpad bug and we were simply noticing it with meetpad because we were doing some testing18:07
fungientirely possible18:07
fungiit's like they added more top and bottom padding within the authorship color container or something18:08
fungialso the ability to independently dock or float the authors and chat boxes has changed18:09
funginow you can alter the behavior through the config modal, but there are no buttons to switch them between docked and floating independently18:10
fungialso the author colors toggle no longer temporarily fixes the background overlaps like we saw with meetpad, so maybe this is a slightly different problem18:11
clarkbI can transition the chat box between docked, float and closed using the little buttons for it18:14
clarkband clicking the authors thing toggles it too18:14
clarkbdo you mean some other behavior?18:14
fungilooking again18:16
fungioh, yep, i missed the buttons for chat18:17
clarkbI do also find toggling authorship colors doesn't change the overlap18:17
fungii guess it's that the floating user list is now independent of your chat mode18:17
clarkbmaybe we should try without ep_headings again and see if that color overlap behavior changes18:18
fungiso author list can float over the chat column unless you toggle the "show chat and users" config option18:18
clarkbfungi: to do that you can edit /etc/etherpad-docker/docker-compose.yaml on the server to change the image for etherpad from our etherpad image to the upstream 1.8.4 image since the only thing we change in our image is the addition of the ep_headings plugin18:19
clarkb(not sure if you are interested in doing that or I should go for it)18:19
clarkbthough I'm about to pop out for a bike ride18:19
fungii can't seem to ssh into 104.239.168.111 as root18:19
fungioh! it has my user on it18:20
clarkbfungi: yes, it's been converted to one of our production nodes (at least as far as behavior goes)18:20
fungii guess that's an artifact of our production-like testing18:20
clarkbyup18:20
fungineat18:20
fungiyeah, i can do it, just need to find the name of their dockerhub org18:21
fungilooking now18:21
clarkbfungi: it's in our dockerfile18:21
clarkbI think you want to change docker.io/opendevorg/etherpad to docker.io/etherpad/etherpad:1.8.418:22
*** mlavalle has quit IRC18:22
clarkbthen sudo docker-compose down && sudo docker-compose up -d and refresh browser18:22
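Putting the steps from the last few lines together, the manual swap to the upstream image looks roughly like this (image names and tag are the ones discussed above; treat the rest as a sketch):

    cd /etc/etherpad-docker
    # swap our image (which only adds ep_headings) for the upstream 1.8.4 build
    sudo sed -i 's|docker.io/opendevorg/etherpad|docker.io/etherpad/etherpad:1.8.4|' docker-compose.yaml
    sudo docker-compose down && sudo docker-compose up -d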
fungiyep, i concur18:22
clarkbfungi: looks like you used 1.8.0? we want 1.8.4 (unless you want to check the old and new behavior in comparison)18:23
fungiaha, yep18:23
fungiand after `cd /etc/etherpad-docker/`18:23
clarkbwe should check it with the older version anyway18:23
fungiwell, i switched to .4 just now18:24
clarkbya we can do it after18:24
fungibut yes, we can do both18:24
fungistill seems to do it with upstream image18:24
clarkbproblem continues after removing ep_headings18:24
clarkb(which makes me think it is a bug in etherpad with the no-skin skin)18:24
fungii'll try dropping back to .0 again using upstream18:24
clarkb1.8.0 looks fine18:25
fungidowngrading to 1.8.0 makes the overlap go away18:25
clarkbI also notice that the font sizes changed18:25
fungiyeah18:25
fungithey did indeed18:25
clarkbI wonder if those things are related18:25
fungilikely18:26
clarkbon a positive note, testing things in a production-like setup in CI is remarkably easy18:26
fungiamazingly18:26
clarkbfungi: maybe we should file a bug with etherpad about it (and take screenshots using this test instance?) and see what they say before upgrading?18:27
clarkbI'm worried that we'll be told no-skin isn't supported anymore18:27
fungithat makes sense as a next step, sure18:27
fungiand yes, i have the same expectation18:27
clarkbk, I can work on that after a bike ride and lunch if you don't want to bother.18:28
clarkband with that I'm popping out now for a bit18:28
*** sshnaidm is now known as sshnaidm|afk19:05
Open10K8SHi team19:19
Open10K8Shttps://review.opendev.org/#/c/734863/19:19
Open10K8SPlease check this PS19:19
Open10K8SI updated the commit message19:19
Open10K8SRegards19:19
Open10K8Shttps://review.opendev.org/#/c/734891/ is waiting19:20
clarkbOpen10K8S: +220:30
corvusit looks like https://review.opendev.org/733409 ran for the first time in a cloud with differing public and private ip addresses in the gate and failed20:37
corvusi guess we should actually map the nodepool private_ipv4 to our ansible inventory public_v4 in the gate20:39
corvussince it seems like the private addresses are what we write to /etc/hosts20:39
openstackgerritJames E. Blair proposed opendev/system-config master: Stop using backend hostname in zuul testinfra tests  https://review.opendev.org/73340920:42
clarkbcorvus: hrm, we put the private IP there to avoid traversing NAT which has been problematic in the past20:43
clarkb(for things like vxlan tunnels)20:43
clarkbfungi: I think 5fd6aeeea62674cecf997421546a675d91cf45ef may be the commit that broke things20:43
corvusclarkb: yeah, i think just using the private ip should be fine20:44
clarkbfungi: I don't understand why yet, but the commit message in etherpad-lite makes it seem likely20:44
clarkbfungi: I'll file a bug now with a couple screenshots and a pointer to that commit and see if they say anything20:47
fungiclarkb: thanks! i could do it, but not until after i'm done prepping dinner20:48
openstackgerritJames E. Blair proposed opendev/system-config master: Fake zuul_connections for gate  https://review.opendev.org/73092920:49
corvusand i think the public/private ipv4 issue also broke the last run of that, so that's a rebase20:49
openstackgerritJames E. Blair proposed opendev/system-config master: WIP: add Zookeeper TLS support  https://review.opendev.org/72030220:50
clarkbno worries, I'm going to update etherpad on our test node to get a screenshot of the broken stuff20:50
fungicool, that's how i was considering doing it too, since it's just a one-liner edit and down/up the container20:52
openstackgerritMerged zuul/zuul-jobs master: Allow upload-docker-image role to be used outside of promote  https://review.opendev.org/73489020:55
clarkbhttps://github.com/ether/etherpad-lite/issues/4106 has been filed21:03
clarkbcorvus: on the ip address switch, do we use public_ipv4 for anything? (iptables?)21:04
clarkbI know ansible itself is going to use the ansible_host value21:04
corvusclarkb: yes that's exactly it21:04
clarkbthanks21:05
clarkbchange lgtm then21:05
corvusthat's the value we (just recently) started using in the iptables rules; that replaced a dns lookup -- it's now an ansible inventory lookup so that we can do iptables by ansible group21:05
clarkbrgr21:05
clarkbcorvus: btw not sure if you saw but the "use zuul test node as -dev server standin" worked really well earlier today21:06
clarkbmordred: ^ you too21:06
corvusclarkb: oh nice, sorry i missed a bunch earlier.  but that's pretty cool.  might make some kind of self-service hold thing worthwhile21:07
clarkbI changed my vote on the etherpad upgrade change from +2 to -W with a link to the issue I filed21:10
clarkbwe'll see where that takes us I think21:10
fungiawesome, thanks again!21:28
*** mlavalle has joined #opendev21:29
corvuswoohoo!  zuul started in system-config-run-zuul: https://zuul.opendev.org/t/openstack/build/ef6229a9233f4206a1d24e0724839f83/log/zuul01.openstack.org/debug.log21:47
corvusi'm going to do one more rebase of that stack21:47
openstackgerritJames E. Blair proposed opendev/system-config master: Stop using backend hostname in zuul testinfra tests  https://review.opendev.org/73340921:48
openstackgerritJames E. Blair proposed opendev/system-config master: Fake zuul_connections for gate  https://review.opendev.org/73092921:48
openstackgerritJames E. Blair proposed opendev/system-config master: WIP: add Zookeeper TLS support  https://review.opendev.org/72030221:48
*** DSpider has quit IRC21:58
Open10K8Sclarkb: thank you22:00
clarkbfungi: going back to the openafs oddity, is your local openafs a 1.8 version? I wonder if that could be part of it and we should try a create on the fileserver itself?22:01
fungiwell, a bunch of those volumes (possibly all of them?) were created by ianw, so it may make more sense to check how and from where he created them22:02
clarkbah22:02
fungior at least double-check whether i'm misreading the discussion here from may 2622:03
fungibut yes, i've currently got openafs 1.8.6~pre1-322:04
fungifrom debian unstable22:04
* clarkb looks at python2 things again22:09
openstackgerritClark Boylan proposed openstack/project-config master: Install git-review under python3 for proposed updates  https://review.opendev.org/73501922:13
clarkbthat was an easy one to address so I went for it22:13
ianwclarkb/fungi: i'm guessing the afs volumes still aren't happy?22:13
clarkbianw: ya22:13
clarkbianw: one thing I was wondering about is if a newer openafs was used to create them which potentially caused problems22:14
ianwthe thing is they *were* happy, for a bit22:14
clarkbI've only ever created them on the fileserver and while it's been a while those have always been happy as far as I know22:14
ianwvos examine mirror.wheel.focala6422:18
ianwCould not fetch the information about volume 536871131 from the server22:18
ianwPossible communication failure22:18
ianwthat's on miror-update22:18
clarkbcorvus: comment on https://review.opendev.org/#/c/730929/622:18
ianwi wonder if tcpdumping again we'll see this icmp stuff22:18
ianw       server mirror-update01.opendev.org partition /vicepa RW Site22:19
ianwthis seems familiar.  somehow it has decided that mirror-update01 is in the vldb, or at least is showing it as such22:19
clarkbianw: was the volume created from mirror-update01.opendev.org or something like that22:19
clarkbianw: and ya fungi found that this was the same problem we had earlier (a week or two back) when we hit problems around these volumes22:20
clarkbthat was based on irc logs22:20
ianwclarkb: i think it was, but this is similar to what i saw when looking at fungi's volumes; that's why i deleted them and recreated them, which worked when i left it, but appears to have gone back to the same thing now22:20
ianwrx data vldb reply get-entry-by-name-n "mirror.wheel.focala64" numservers 4 servers 127.0.1.1 127.0.1.1 23.253.73.143 104.130.138.161 partitions a a a a rwvol 536871131 rovol 536871132 backup 536871133 (504)22:26
ianwthat's from a tcpdump to afsdb0122:26
ianwthat sure looks like it's saying the servers for wheel.focala64 include 127.0.1.122:26
fungiyep22:26
fungithat's exactly what i found looking into it today too22:26
fungiso the vldb records have somehow replaced the afs01.dfw.openstack.org record for the rw volume with 127.0.1.1, and added a 127.0.1.1 ro replica22:28
ianw127.0.1.1 and openafs has some google hits22:28
fungicould it be because /etc/hosts on afs01.dfw has "127.0.1.1 afs01.dfw.openstack.org afs01"22:28
fungiand so it's resolving its ip address based on that?22:29
ianw"This seems to bite everyone who installs the Debian or Ubuntu packages on22:29
ianwa non-modified server which has"22:29
fungilast modified date on that file is more than a year ago though22:29
clarkbfungi: it's part of our normal setup to do that22:29
fungiright22:29
fungiwhich is why i'm wondering what has caused that to suddenly become an issue22:30
clarkbya thats why I wondered if openafs version used to do the create is important22:30
fungiunless maybe these are the first volumes we've added since the hosts file was "normalized" to match our other servers?22:30
clarkbbeacuse we've left these things alone for a long time and they've been fine22:30
clarkbfungi: that's possible, I don't know if we've added new volumes since the xenial upgrade22:30
ianwwell i think we have a smoking gun ... first thing is how to get rid of it22:31
ianwhttps://lists.openafs.org/pipermail/openafs-info/2013-December/040285.html22:31
fungiyeah, occam's razor says we had it set up correctly on trusty, the xenial upgrade undid the hosts file back to "normal" and these are the only volumes we've added since22:33
clarkbianw: I'll admit I don't quite understand any of what that email is trying to say22:34
clarkblike the 127.0.1.1 problem is because it is already in /etc/hosts ? why do we need to update it?22:34
ianwclarkb: heh, me either yet :)22:34
ianwbut i think it's our current best clue :)22:34
clarkbya I agree it seems to be the thread to pull on22:34
clarkbhttps://docs.openafs.org/Reference/5/NetRestrict.html is the other thing I didn't recognize22:35
clarkbI think that means we can add netrestrict files on the servers to exclude 127.0.1.1 and whatever else22:35
clarkbwhich is probably a reasonable enough workaround for us as we can stick all of that into config management22:35
ianwi think if we do the vos remsite on afs01 it will remove appropriately22:36
clarkbianw: maybe we need both things? remsite to fix the existing volumes and netrestrict to avoid this in the future?22:37
ianwyeah, remove the 127.0.1.1 entries, then make sure they don't come back22:37
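A rough sketch of the NetRestrict half of that plan; the exact location of the server-side NetRestrict file depends on how the Debian/Ubuntu packages lay things out (upstream docs use /usr/afs/local/NetRestrict), so the path here is an assumption:

    # on each fileserver: tell the server processes never to register the
    # loopback alias from /etc/hosts with the VLDB
    echo "127.0.1.1" | sudo tee /usr/afs/local/NetRestrict
    # the fileserver/volserver processes need a restart to pick this up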
mordredclarkb: yay! (re: -dev server standin)22:38
mordredalso - sorry, my "I'll be gone for a couple of hours for errand" - I forgot to translate that to nola time22:39
clarkbmordred: I just assume the rain flooded the streets so everyone raided the bars for to-go cups full of $drink_of_choice22:39
ianw$ vos remsite -server afs01.dfw.openstack.org -partition a -id mirror.wheel.focala6422:39
ianwDeleting the replication site for volume 536871131 ...Removed replication site afs01.dfw.openstack.org /vicepa for volume mirror.wheel.focala6422:39
mordredclarkb: that's an excellent assumption with very accurate details22:40
ianw22:39:08.878673 IP 104.130.136.20.7003 > 104.130.137.130.56972:  rx data vldb reply get-entry-by-name-n "mirror.wheel.focala64" numservers 3 servers 127.0.1.1 23.253.73.143 104.130.138.161 partitions a a a rwvol 536871131 rovol 536871132 backup 536871133 (504)22:40
clarkbianw: so its still reporting 127.0.1.1?22:40
mordredclarkb: the real issue is that we're still only at 30% capacity for bars that don't serve food - so the lines for go-cups of $drink are extra long22:41
ianwclarkb: before (top) and after (bottom) : http://paste.openstack.org/show/794603/22:41
ianwi think there's two entries ... and it removed the "real one"22:42
clarkbianw: ya your tcpdump shows 127.0.1.1 so I assume it removed the real one22:42
mordredclarkb, ianw: so - just to catch up - our entries in /etc/hosts with 127.0.1.1 hostname are bad and break things. didn't we spend a bunch of time at one point to make sure those entries existed?22:42
clarkbmordred: yes, those entries make other things happy iirc22:43
clarkbsomething to do with unbound maybe22:43
clarkb(because it listens on that addr?)22:43
clarkbianw: remsite says you can provide the ip address or the name22:43
clarkbianw: so maybe add the proper site back then use 127.0.1.1 to remsite the wrong one?22:44
ianwok, "vos examine" with "-noresolve" shows :   server 127.0.1.1 partition /vicepa RW Site22:44
mordredclarkb: nod. well - fwiw, we only do that in set-hostname which we only do in launch-node22:44
ianwso the *rw* volume is on 127.0.1.122:44
clarkbi'm looking at http://docs.openafs.org/Reference/1/vos_remsite.html fwiw22:44
clarkbmordred: ah so we could maybe make an exception for afs servers22:44
mordredso if we wanted it to be different on some of the hosts, it shouldn't be hard, nor should it break anything22:44
mordredyeah22:44
mordredI think we could just write an /etc/hosts file in the afs role22:45
mordrednothing should fight22:45
*** tkajinam has joined #opendev22:45
ianwmaybe vos move is the way to update the rw volume?22:49
ianwsomething happened ... maybe ... http://paste.openstack.org/show/794604/22:53
clarkbfwiw vos move looks correct for RW sites reading manpages22:53
clarkbaddsite is RO only22:53
clarkbianw: does vos examine -noresolve look happier now too?22:54
ianwno ;)22:58
ianw$ vos examine -noresolve mirror.wheel.focala6422:58
ianwCould not fetch the information about volume 536871131 from the server22:58
ianw: No such device22:58
*** aannuusshhkkaa has joined #opendev22:59
*** shtepanie has joined #opendev23:00
clarkbdoesn't seem like there is a way to convert a RO site to a RW site?23:00
rm_workHey, trying to walk some folks through registering for a new openstack account, and it's failing to create accounts right after clicking register on https://openstackid.org/auth/register23:00
rm_workIs this a known issue?23:00
clarkbotherwise I'd say addsite with the correct IP, this gives us a RO volume. Then switch it to the RW volume23:00
rm_workHTTP 500: openstackid.org is currently unable to handle this request.23:00
clarkbrm_work: I'll ping the foundation sysadmins and see if the server logs say anything obvious to me23:01
rm_workok, both of them are getting it, and I was able to replicate as well.23:01
clarkbit looks like PHP is running out of memory but the service itself has plenty23:02
clarkb*server itself23:02
clarkbrm_work: foundation sysadmin is looking at it now. Will let you know what they find23:04
rm_workalright, thanks!23:08
ianwclarkb: the move seems to have failed, i'm not sure what to do now23:10
clarkbianw: ya I don't know either. Maybe ignore that one for now (and we'll rebuild it), but try the remsite using the IP on another one and see if that fixes it?23:11
ianwi tried on bionica64 and it did the same thing23:13
ianwFailed to create a transaction on the source volume 53687112523:13
ianw   VOLSER: volume is busy23:13
clarkbianw: using the IP it removed the other IP?23:13
clarkbwith remsite I mean23:13
*** tosky has quit IRC23:15
clarkbrm_work: I've been asked to confirm you are ticking the captcha box23:21
rm_workyes.23:21
ianwclarkb: no, it seemed to fail.  i'm going to try just removing all the broken volumes23:21
clarkbianw: ok, before we create new ones should we add the netrestriction?23:21
clarkbI guess we can do that manually then put it in config management later if that is easier23:22
rm_workyou can fairly easily replicate, they should be able to throw a junk name and email in and replicate themselves in like 10 seconds23:22
ianwclarkb: yeah, let me clean what's there then we can try making them again :)23:22
clarkbrm_work: the main person says they haven't been able to reproduce though someone else has reproduced it23:22
rm_workhuh. all three of us trying here got the same thing. so maybe if we just hammer it? :D23:22
clarkbrm_work: we are looking at the memory issue as a possible cause though23:22
ianwoh, i wonder if it's the mount in the r/w partition?23:22
clarkbianw: oh, we need to remove it from the fs side, then move, then remount? that would make sense23:23
ianwclarkb: i can't do an rmmount because the "file doesn't exist"23:25
clarkbhrm23:26
clarkbmaybe we have to rmmount before remsite?23:26
clarkb(I claim no expertise though, maybe mordred or corvus know better?)23:29
mordredI didn't do it23:29
ianwvos remove -noresolve -server afs01.dfw.openstack.org -partition a -id mirror.wheel.bionica6423:31
ianwVolume 536871125 does not exist on server and partition23:31
ianw   VOLSER: no such volume - location specified incorrectly or volume does not exist23:31
ianwthis is really starting to annoy me :)23:31
*** mlavalle has quit IRC23:32
clarkbrm_work: latest word is that it may be input specific23:32
clarkbrm_work: debugging continues, but I'm just playing messenger right now so don't have all the details23:32
rm_workO_o23:32
rm_workso on our side, we had three separate people doing it and it failing in the same way. I wonder what all three of us could have independently done to get that error23:33
fungisounds like it was input-dependent23:40
fungiso that all of them were hitting it isn't too surprising23:41
openstackgerritsebastian marcet proposed opendev/system-config master: OpenstackId v3.0.10  https://review.opendev.org/73502223:43
rm_work^^ related?23:44
rm_workI mean what input would three different people have shared? different names, different email domains...23:44
rm_workunless being in the USA is an input problem :D23:44
openstackgerritsebastian marcet proposed opendev/system-config master: OpenstackId v3.0.11  https://review.opendev.org/73502223:44
clarkbrm_work: field size I think23:45
clarkbrm_work: not the exact data, but its width23:45
clarkb(which may also explain memory issues if the allocated memory is dependent on expected input sizes?)23:45
rm_workO_o23:45
clarkbbut again I don't know specifics23:45
rm_workMy name and email address were both pretty short23:45
rm_workbut yeah23:46
* rm_work shrugs23:46
rm_workas long as it works23:46
fungi*shrug* php ;)23:47
ianwclarkb: through a series of vos removes on afs01.dfw i think i have cleared the bad volumes23:51
ianwi've removed the mounts23:55
ianwi think we can try recreating23:55
ianwi need a cup of tea first :)23:55
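For completeness, recreating one of the removed wheel volumes would look roughly like the following; the volume name matches the ones discussed above, but the mount path and replica placement are illustrative assumptions rather than opendev's exact procedure:

    # create the RW volume on afs01.dfw and mount it in the RW path of the cell
    vos create -server afs01.dfw.openstack.org -partition a -name mirror.wheel.focala64
    fs mkmount -dir /afs/.openstack.org/mirror/wheel/focal-x86_64 -vol mirror.wheel.focala64
    # add RO replica sites and push the first RO copies
    vos addsite -server afs01.dfw.openstack.org -partition a -id mirror.wheel.focala64
    vos addsite -server afs02.dfw.openstack.org -partition a -id mirror.wheel.focala64
    vos release mirror.wheel.focala64
    # the parent mirror.wheel volume (name assumed) also needs a release so the
    # new mountpoint shows up in the read-only tree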
