Wednesday, 2020-06-10

openstackgerritMerged openstack/project-config master: Wheel publish jobs: include system-config roles  https://review.opendev.org/73470400:03
openstackgerritMerged openstack/project-config master: Be more explicit about using python3 to run tools/  https://review.opendev.org/73439300:05
openstackgerritMerged openstack/project-config master: Add Backport-Candidate label for Kolla deliverables  https://review.opendev.org/73324300:05
*** cloudnull has quit IRC00:23
*** cloudnull has joined #opendev00:24
*** Meiyan has joined #opendev01:02
*** xiaolin has joined #opendev01:07
ianwclarkb: hrm, i think i'm hitting "If a child job inherits from a parent which defines a pre and post playbook, then the pre and post playbooks it inherits from the parent job will run only with the roles that were defined on the parent."01:14
ianwi guess i need to add the system-config roles to https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/jobs.yaml#L1160 ; even though they're only used by the publish job01:15
*** xiaolin has quit IRC01:22
*** xiaolin has joined #opendev01:28
*** xiaolin has quit IRC01:33
*** mlavalle has quit IRC02:01
*** xiaolin has joined #opendev03:16
*** xiaolin has quit IRC03:20
openstackgerritIan Wienand proposed openstack/project-config master: Revert "Wheel publish jobs: include system-config roles"  https://review.opendev.org/73473903:40
*** ykarel|away is now known as ykarel04:11
ianw /afs/.openstack.org/mirror/wheel/debian-10-x86_64  : Connection timed out04:25
ianwgosh darn it ...04:25
ianwi dunno what the heck is up with those volumes http://paste.openstack.org/show/794549/04:33
*** sgw has quit IRC04:34
ianwclarkb/fungi/corvus: ^ maybe you could take a bit to have a look at why these volumes appear corrupt on the executors?  i'm a bit brain-dead on it now05:08
openstackgerritMerged openstack/project-config master: Revert "Wheel publish jobs: include system-config roles"  https://review.opendev.org/73473905:26
AJaegerianw: could you review https://review.opendev.org/732490 for dib - this should be fine now IMHO05:32
AJaeger, please?05:32
*** xiaolin has joined #opendev06:02
*** factor has quit IRC06:17
*** factor has joined #opendev06:17
*** Dmitrii-Sh has quit IRC06:17
*** Dmitrii-Sh has joined #opendev06:18
*** hashar has joined #opendev07:04
*** xiaolin has quit IRC07:05
*** iurygregory has quit IRC07:11
*** xiaolin has joined #opendev07:17
*** rpittau|afk is now known as rpittau07:21
xiaolinhello, opendev, we want to donate computing resources, do we need to build our own cloud to meet the minimum requirements: support for 100 concurrent VM instances, each with 8GB RAM, 8 vCPUs, and 80GB storage?07:24
*** tosky has joined #opendev07:29
*** iurygregory has joined #opendev07:33
fricklerxiaolin: our experience with operating a cloud ourselves hasn't been too positive, so from our side the best solution would be if you could operate a cloud yourself. if that isn't possible, we might consider some other option, but that would require some more discussion07:33
fricklerxiaolin: the size of the cloud isn't a hard limit, in particular if you are talking about mips-based resources (iirc) instead of x8607:34
*** xiaolin has quit IRC07:36
fricklerxiaolin: most of the team members are US based, so it would be great if you could continue the discussion during their business hours. if that's too inconvenient, maybe sending a mail describing what your plans look like would be easier07:36
fricklersee http://lists.opendev.org/cgi-bin/mailman/listinfo/service-discuss07:37
openstackgerritMerged opendev/irc-meetings master: Update QA office hour time  https://review.opendev.org/73461207:46
*** moppy has quit IRC08:01
*** moppy has joined #opendev08:01
*** ravsingh has joined #opendev08:24
*** DSpider has joined #opendev08:26
*** ykarel is now known as ykarel|lunch09:13
*** hashar has quit IRC09:25
*** xiaolin has joined #opendev09:45
*** ysandeep is now known as ysandeep|lunch09:48
*** xiaolin has quit IRC09:51
*** ykarel|lunch is now known as ykarel10:04
*** xiaolin has joined #opendev10:04
openstackgerritChandan Kumar (raukadah) proposed openstack/diskimage-builder master: [DNM] missing file  /etc/pki/tls/private  https://review.opendev.org/73478210:11
*** sshnaidm|afk is now known as sshnaidm10:14
*** rpittau is now known as rpittau|bbl10:19
*** ysandeep|lunch is now known as ysandeep10:20
*** xiaolin has quit IRC10:23
openstackgerritCarlos Goncalves proposed zuul/zuul-jobs master: configure-mirrors: add CentOS 8 Stream  https://review.opendev.org/73478710:27
openstackgerritCarlos Goncalves proposed opendev/base-jobs master: Add centos-8-stream nodeset  https://review.opendev.org/73478810:29
*** Meiyan has quit IRC10:30
openstackgerritCarlos Goncalves proposed openstack/project-config master: CentOS 8 Stream initial deployment  https://review.opendev.org/73479110:40
openstackgerritLuigi Toscano proposed openstack/project-config master: gerritbot: more notifications in the cinder channel  https://review.opendev.org/73479210:47
*** tkajinam has quit IRC10:53
*** lpetrut has joined #opendev10:59
mordredfrickler: it occurs to me - one of the options we might want to consider (and consider putting on that document) - is if someone wants to donate compute resources but is not already a cloud operator, it might be easier for them to ship some computers to one of our existing cloud providers11:22
fricklermordred: I've been thinking that too, but gathered we should ask the affected operators first, donnyd and mnaser would first come to my mind11:24
fricklerthere are likely also complications like how to handle hardware replacements and how the hardware integrates into the existing environment. if I were a cloud operator, I'd rather get a financial donation and order the same hardware I use everywhere else11:28
mordredfrickler: yeah - donnyd and mnaser are who I was thinking about11:29
mordredfrickler: and yes to financial - unless there is some sort of specific hardware requirement, such as mips11:30
openstackgerritCarlos Goncalves proposed openstack/diskimage-builder master: Add support for CentOS 8 Stream  https://review.opendev.org/73408311:35
fungiagreed, often the folks interested in donating very specific hardware have ties to the manufacturer of said hardware, and so can provide it at much lower overall cost than normal market prices11:41
fungiso an actual hardware donation could go a lot further than a purely financial one11:42
mordredyeah - without needing to learn how to be a cloud operator12:02
mordredjust for what it's worth - I am beset upon by kittens atm. one has decided my trackpad is a pillow and the other has decided my forearm is a perch. so -- computering is currently under duress12:03
fungiyes, christine showed me a photo. remember, they *are* carnivores and *you* are made of meat12:04
mordredyes, this is rather true12:04
fungithey could just be trying to lull their breakfast into a false sense of security12:04
mordredI am 100% certain they only see me as breakfast12:05
*** rpittau|bbl is now known as rpittau12:05
mordredI think they are employing the correct tactics12:05
fungione of ours still hasn't stopped trying to eat us, after four years of abject failure12:06
mordredone day success may be forthcoming12:06
fungihe'll eventually wear us down, yep12:07
*** ravsingh has quit IRC12:20
*** olaph has quit IRC12:28
fungiso, looking at the mirror volumes ianw mentioned, it appears all the recently-created wheel volumes in afs are showing issues while the older 3 are behaving normally12:31
fungii see the same behavior locally from my workstation even12:31
AJaegerttx, config-core, looking at https://review.opendev.org/#/c/734640/ - I think we want to keep the official-openstack-repo-jobs for this step of retirement so that the repo gets emptied and visible on github. Do you agree? ttx, or are you doing something in github so that this is not needed?12:33
fungiAJaeger: i believe retired repos have been getting deleted from github12:34
AJaegerfungi: ok, in that case we don't need it.12:35
fungithey'll still be available on opendev, but the github mirror for openstack is now only active/maintained projects12:35
AJaegerhttps://github.com/openstack/syntribos redirects to https://github.com/openstack-archive/syntribos - without the final change to delete the repo12:36
fungiahh, yeah, i guess they've been getting transferred to the openstack-archive org rather than just deleted12:36
fungimaybe ttx knows how the repos in that archive org should look12:37
AJaegerSo, it's not available anymore in /openstack/ - but it's missing the final deletion change. So, ttx, what's your preference? Continue like with syntribos or do you want the final change as well?12:37
AJaegerthanks fungi - let's wait for ttx12:38
openstackgerritSagi Shnaidman proposed zuul/zuul-jobs master: Add jobs for testing ensure-ansible  https://review.opendev.org/73458412:40
fungiianw: clarkb: corvus: i'm mildly suspicious of afs01.ord.o.o, dmesg has entries from friday which look like a xen domu suspend/restore12:40
fungistill trying to work out where the actual afs timeouts are coming from though, not sure if that server is involved12:40
fungiyeah, nevermind, vos listvol says it doesn't host copies of those volumes anyway12:41
fungiand afs02.dfw.o.o is probably not the source of the problem as it's only hosting readonly replicas while the timeouts are for interactions with the rw volumes12:42
*** hashar has joined #opendev12:46
fungifor some reason `fs checkservers -cell openstack.org` doesn't seem to work the way the manpage implies (it tells me the local machine is unavailable)12:50
fungithis is weird... for some reason all the problem volumes have an extra "server afs01.dfw.openstack.org partition /vicepa RO Site" listed by `vos listvldb`12:57
fungi(so one rw replica on afs01.dfw and two ro replicas on afs01.dfw as well as one ro replica on afs02.dfw)12:58
*** rajinir has quit IRC13:01
fungino, wait! that's because i was querying from afs01.dfw13:03
*** sgw has joined #opendev13:03
fungiif i query from my workstation it lists an rw and ro site *on* my workstation?!?13:04
*** rajinir has joined #opendev13:04
funginow the fs checkservers error is starting to make sense13:04
fungithis is _bizarre_13:04
fungihttp://paste.openstack.org/show/794566/13:05
openstackgerritHervĂ© Beraud proposed openstack/project-config master: gerritbot: more notifications in the oslo channel  https://review.opendev.org/73482713:05
openstackgerritSagi Shnaidman proposed zuul/zuul-jobs master: Add jobs for testing ensure-ansible  https://review.opendev.org/73458413:05
*** ravsingh has joined #opendev13:07
fungisimilarly if i run vos listvldb on the mirror-update instance, it reports "server mirror-update01.opendev.org partition /vicepa RW Site" and similar for RO13:08
fungiso something in the record is saying that the rw volume and one ro replica are on the client's local system13:09
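A rough sketch of the kind of VLDB inspection fungi is describing here, assuming a machine with the openafs client tools and the openstack.org cell configured (the volume name is one of the affected wheel volumes mentioned above):

    # show the location database entry for the volume; -noresolve prints raw
    # addresses, so a bogus 127.0.1.1 registration is visible directly
    vos listvldb -name mirror.wheel.focala64 -noresolve
    # examine the volume header itself; this is the call that was timing out
    vos examine mirror.wheel.focala64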
*** olaph has joined #opendev13:09
ttxlooking13:15
AJaegerttx, compare https://opendev.org/openstack/syntribos and https://github.com/openstack-archive/syntribos13:16
AJaegergithub missed the last change that we have in opendev. Is that correct or should those be the same?13:17
ttxAJaeger: in openstack-archive we have both forms. Some have an archiving commit and some do not.13:17
ttxI think I have a slight preference for those that have the archiving commit13:17
AJaegerttx, ok - then we need to keep the official-openstack-repo-jobs until the last minute - I know what to do.13:18
AJaegerthanks, ttx13:18
ttxas it may not be super-obvious this is archived content otherwise13:18
ttx(you have to notice the change in org name)13:18
ttxAJaeger: if that's not too much of a hassle, i think that's better yes13:18
AJaegerttx, it's no problem13:20
AJaegerttx, https://review.opendev.org/734835 updates the docs13:24
openstackgerritEmilien Macchi proposed openstack/project-config master: Retire Paunch  https://review.opendev.org/73464013:27
*** ykarel is now known as ykarel|afk13:31
fungii need to disappear for a grocery pickup appointment, but can resume banging my head against these weird afs volumes in a bit13:46
openstackgerritEmilien Macchi proposed openstack/project-config master: Retire Paunch  https://review.opendev.org/73464013:59
*** ykarel|afk is now known as ykarel14:02
openstackgerritSagi Shnaidman proposed zuul/zuul-jobs master: Add jobs for testing ensure-ansible  https://review.opendev.org/73458414:04
mordredinfra-root: I've got to run an errand and will be out for a couple of hours.14:12
*** sshnaidm is now known as sshnaidm|bbl14:23
*** hashar has quit IRC14:39
openstackgerritOleksandr Kozachenko proposed openstack/project-config master: Add magnum and magnum-tempest-plugin in required-projects  https://review.opendev.org/73486314:43
*** mlavalle has joined #opendev14:47
clarkbfungi: ianw: I wonder if that is an afs version mismatch problem? Like perhaps using your local openafs system to talk to older fileservers to create the volumes is a problem (this assumes we didn't create the volumes on the servers themselves)14:58
*** priteau has joined #opendev14:58
clarkbmaybe we should try to create a new test volume on afs01 and see if it exhibits the same behavior?14:58
*** ykarel is now known as ykarel|away15:01
*** lpetrut has quit IRC15:09
openstackgerritDrew Walters proposed openstack/project-config master: Add missing project to Airship doc job  https://review.opendev.org/73487415:14
fungiclarkb: i saw the same behavior on mirror-update01.openstack.org which is xenial, same as afs01.dfw, and also saw it when running locally *on* afs01.dfw15:18
clarkbfungi: right but where did we create the volumes?15:18
clarkbI'm just wondering if it could be a creation problem with new openafs talking to old openafs to create the volume15:19
fungiahh, so an issue on creation... maybe? i created some from my workstation, but i think ianw may have needed to delete and recreate them later (now i don't remember why, i'll look up the earlier discussions)15:20
clarkbI think it was a similar situation with extra volume replicas?15:20
clarkbbut I'm not 100% sure on that15:20
fungii'll try to find that earlier conversation after lunch15:20
*** ysandeep is now known as ysandeep|away15:30
clarkbfungi: has https://review.opendev.org/#/c/729029/1 been tested to check that the ep_headings plugin continues to work?15:33
*** olaph has quit IRC15:33
clarkbfungi: if not I think what we can do is push a followup that forces the system-config-run etherpad job to fail, add a hold for that node then use /etc/hosts to talk to the test node as if it were production and check it?15:33
*** yoctozepto has quit IRC15:34
*** yoctozepto has joined #opendev15:35
fungii have not tested it, no15:42
openstackgerritClark Boylan proposed opendev/system-config master: Test etherpad with testinfra  https://review.opendev.org/73488015:50
clarkbk, I've put a hold on ^ and when we are done checking things that way we can drop the assert False and have a bit more checking done automatically too15:51
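For reference, the hold clarkb mentions is typically placed with the zuul client's autohold command; a minimal sketch, where the tenant and job name are assumptions based on the discussion above:

    # run on the zuul scheduler as a zuul admin; holds the next failing build
    # of the etherpad deployment test job so the node survives for debugging
    zuul autohold --tenant openstack --project opendev/system-config \
        --job system-config-run-etherpad \
        --reason "debug etherpad 1.8.4 upgrade" --count 1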
*** rpittau is now known as rpittau|afk16:07
openstackgerritMerged openstack/diskimage-builder master: Fix yumdownloader cache dir  https://review.opendev.org/69878816:24
openstackgerritCarlos Goncalves proposed openstack/diskimage-builder master: Add support for CentOS 8 Stream  https://review.opendev.org/73408316:27
*** ravsingh has quit IRC16:36
clarkbfungi: 213.32.76.138 in /etc/hosts as etherpad.opendev.org and load https://etherpad.opendev.org/p/clarkb-test - it works but it's not perfect16:41
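A minimal sketch of that /etc/hosts trick, using the held node's address from the line above (the -k flag is an assumption that the test node's certificate won't validate for the production hostname):

    # point the production hostname at the held test node, locally only
    echo "213.32.76.138 etherpad.opendev.org" | sudo tee -a /etc/hosts
    # quick sanity check before poking at it in a browser
    curl -kI https://etherpad.opendev.org/p/clarkb-test
    # remember to remove the /etc/hosts line again afterwards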
fungimmm16:42
clarkbfungi: I think next step may be disabling ep_headings and checking if it renders properly?16:42
fungiprobably. it's just one line in the config16:43
clarkbtrying to figure out how to do that on the running instance16:45
clarkbwe bake it into the image, might be easiest to just push another update without that in the image16:48
clarkbbefore I do that I'll try rebuilding the image on the test node and restart with docker-compose16:49
clarkbheh I've just realized the ep_headings thing is the only thing we change so I can just switch to upstream image to test this16:53
clarkbI'll do that if the image rebuild fails for some reason16:53
clarkbfungi: ya without ep_headings it's a bit better. What i notice though is that we're running with the new format, not the old ui16:54
clarkbwhich may be related16:54
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Allow upload-docker-image role to be used outside of promote  https://review.opendev.org/73489016:56
openstackgerritOleksandr Kozachenko proposed openstack/project-config master: Add openstack/magnum and openstack/magnum-tempest-plugin in required-projects  https://review.opendev.org/73486317:00
*** priteau has quit IRC17:01
clarkbfungi: ok test it now. I think the problem is 1.8.3 switched to colibris skin by default17:06
clarkbeven though the 1.7.x series docs said this wouldn't happen until etherpad 2.017:06
clarkbforcing skinName to no-skin in the settings seems to fix this17:06
clarkbI'll get a change up that does that and we can recapture the host and double check it is happy with config management (and not my manual fiddling)17:07
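A sketch of the manual tweak clarkb describes, as one might apply it on the held node; the settings.json location and the compose service name are assumptions about the image layout rather than opendev's exact configuration:

    # force the pre-1.8.3 default skin instead of colibris
    sudo sed -i 's/"skinName": *"colibris"/"skinName": "no-skin"/' settings.json
    # restart the container so etherpad picks up the change
    sudo docker-compose restart etherpad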
openstackgerritClark Boylan proposed opendev/system-config master: Upgrade Etherpad to 1.8.4  https://review.opendev.org/72902917:10
openstackgerritClark Boylan proposed opendev/system-config master: Test etherpad with testinfra  https://review.opendev.org/73488017:10
clarkbI'm putting a hold on that again17:10
clarkbfwiw using the test node as a test system seems to be working reasonably well. And better yet I can recycle them easily17:11
clarkb*test node as dev system17:13
fungiclarkb: we set the "noTheme" (or whatever it's called) skin explicitly in our configs, so should override the colibris default17:14
clarkbfungi: I couldn't find that fwiw17:15
clarkbfungi: it wasn't until 1.8.3 that the default changed from no-skin to colibris though17:15
clarkbwhich is why I think we were fine (I looked for a place we were explicitly overriding it and couldn't find it so I added it to your change)17:15
*** hashar has joined #opendev17:16
fungiit used to be in our config, i even removed the line so we could test the colibris default on etherpad-dev17:16
fungii wonder if we lost that when we containerized17:16
clarkbfungi: that could be17:17
fungithough yes, you're right that was when i was testing a commit from the devel branch while they were working toward 1.8.317:17
fungibut it was explicitly set to no theme in our config at that point17:18
fungiwhich prevented us from following the default change to colibris17:18
clarkbI personally don't care for colibris, it feels more like an official document than an ethereal note pad17:19
clarkbbut I think we should update to colibris as a separate step if we want to go that route17:19
clarkb(for anyone wondering why not jump to colibris)17:19
fungiyes, i mean, we tried it out on the old review-dev and none of us seemed impressed with it17:23
fungiit's trying too hard to be google docs i think17:23
fungii don't want a collaborative word processor, i want a collaborative text editor17:24
fungiit's possible folks who spend a lot of time in wysiwyg environments prefer the word processor feel, while folks who spend a lot of time in terminal emulators and shell environments prefer the text editor feel17:25
clarkbthat could be17:25
clarkbit is rare that i start soffice17:26
clarkb(and yes libreoffice still installs that binary)17:26
fungii do everything i can to avoid starting a word processor17:26
fungiincluding command-line converters which turn word processing documents into plain text17:26
fungi(antiword, for example)17:27
fungii suppose giving users a display toggle to switch between themes individually/locally would allow for the best of both worlds17:27
fungibut it doesn't appear that they've designed for such a case17:28
clarkbfungi: ya we'd need headings and other potential plugins to work with both skins in that case17:28
clarkbthe previous test instance that I fixed manually continues to lgtm17:31
clarkbif the new test instance which should be up in about 20 minutes or so looks good to others then I think we can go ahead and land the upgrade17:31
fungiawesome, thanks for picking that up!17:38
fungii've been buried under ml discussions and reviews17:38
*** sshnaidm|bbl is now known as sshnaidm17:40
clarkbwell I wanted to approve it and then realized I should double check it was ready :)17:49
openstackgerritEmilien Macchi proposed openstack/project-config master: Deprecate Paunch  https://review.opendev.org/73464017:49
*** hashar has quit IRC17:57
fungiclarkb: ianw: back to the afs volumes, this actually looks like the same thing we saw previously with those volumes: http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-05-26.log.html#t2020-05-26T23:45:0117:58
fungiit's odd, both the volumes i created and those ianw created are exhibiting the issue18:00
clarkbinfra-root 104.239.168.111 in /etc/hosts as etherpad.opendev.org seems to be working now with my update to set no-skin on fungi's upgrade change18:01
clarkbhttps://etherpad.opendev.org/p/clarkb-test is the etherpad I used really quickly there18:02
openstackgerritClark Boylan proposed opendev/system-config master: Test etherpad with testinfra  https://review.opendev.org/73488018:02
clarkbthat change should pass now that I don't need it to fail for node holding. I've also cleaned up my earlier holds18:04
fungiclarkb: seems to work, though now the weird background overlap we were seeing through meetpad appears in this test deploy of etherpad18:05
clarkbfungi: it's not as bad this time, my 'g' renders with my color properly18:06
clarkbbut it breaks into your color which is weird18:06
clarkbthe meetpad situation was the g had no tail18:06
clarkboh wait I had to add more text :)18:06
clarkb:/18:06
clarkbhard refresh doesn't change that18:07
clarkbI wonder if this is a new etherpad bug and we were simply noticing it with meetpad because we were doing some testing18:07
fungientirely possible18:07
fungiit's like they added more top and bottom padding within the authorship color container or something18:08
fungialso the ability to independently dock or float the authors and chat boxes has changed18:09
funginow you can alter the behavior through the config modal, but there are no buttons to switch them between docked and floating independently18:10
fungialso the author colors toggle no longer temporarily fixes the background overlaps like we saw with meetpad, so maybe this is a slightly different problem18:11
clarkbI can transition the chat box between docked, float and closed using the little buttons for it18:14
clarkband clicking the authors thing toggles it too18:14
clarkbdo you mean some other behavior?18:14
fungilooking again18:16
fungioh, yep, i missed the buttons for chat18:17
clarkbI do also find toggling authorship colors doesn't change the overlap18:17
fungii guess it's that the floating user list is now independent of your chat mode18:17
clarkbmaybe we should try without ep_headings again and see if that color overlap behavior changes18:18
fungiso author list can float over the chat column unless you toggle the "show chat and users" config option18:18
clarkbfungi: to do that you can edit /etc/etherpad-docker/docker-compose.yaml on the server to change the image for etherpad from our etherpad image to the upstream 1.8.4 image since the only thing we change in our image is the addition of the ep_headings plugin18:19
clarkb(not sure if you are interested in doing that or I should go for it)18:19
clarkbthough I'm about to pop out for a bike ride18:19
fungii can't seem to ssh into 104.239.168.111 as root18:19
fungioh! it has my user on it18:20
clarkbfungi: yes, it's been converted to one of our production nodes (at least as far as behavior goes)18:20
fungii guess that's an artifact of our production-like testing18:20
clarkbyup18:20
fungineat18:20
fungiyeah, i can do it, just need to find the name of their dockerhub org18:21
fungilooking now18:21
clarkbfungi: it's in our dockerfile18:21
clarkbI think you want to change docker.io/opendevorg/etherpad to docker.io/etherpad/etherpad:1.8.418:22
*** mlavalle has quit IRC18:22
clarkbthen sudo docker-compose down && sudo docker-compose up -d and refresh browser18:22
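Putting the steps from the last few lines together, the manual swap to the upstream image looks roughly like this (image names and tag are the ones discussed above; treat the rest as a sketch):

    cd /etc/etherpad-docker
    # swap our image (which only adds ep_headings) for the upstream 1.8.4 build
    sudo sed -i 's|docker.io/opendevorg/etherpad|docker.io/etherpad/etherpad:1.8.4|' docker-compose.yaml
    sudo docker-compose down && sudo docker-compose up -d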
fungiyep, i concur18:22
clarkbfungi: looks like you used 1.8.0? we want 1.8.4 (unless you want to check the old and new behavior in comparison)18:23
fungiaha, yep18:23
fungiand after `cd /etc/etherpad-docker/`18:23
clarkbwe should check it with the older version anyway18:23
fungiwell, i switched to .4 just now18:24
clarkbya we can do it after18:24
fungibut yes, we can do both18:24
fungistill seems to do it with upstream image18:24
clarkbproblem continues after removing ep_headings18:24
clarkb(which makes me think it is a bug in etherpad with the no-skin skin)18:24
fungii'll try dropping back to .0 again using upstream18:24
clarkb1.8.0 looks fine18:25
fungidowngrading to 1.8.0 makes the overlap go away18:25
clarkbI also notice that the font sizes changed18:25
fungiyeah18:25
fungithey did indeed18:25
clarkbI wonder if those things are related18:25
fungilikely18:26
clarkbon a positive note, testing things in a production-like setup in CI is remarkably easy18:26
fungiamazingly18:26
clarkbfungi: maybe we should file a bug with etherpad about it (and take screenshots using this test instance?) and see what they say before upgrading?18:27
clarkbI'm worried that we'll be told no-skin isn't supported anymore18:27
fungithat makes sense as a next step, sure18:27
fungiand yes, i have the same expectation18:27
clarkbk, I can work on that after a bike ride and lunch if you don't want to bother.18:28
clarkband with that I'm popping out now for a bit18:28
*** sshnaidm is now known as sshnaidm|afk19:05
Open10K8SHi team19:19
Open10K8Shttps://review.opendev.org/#/c/734863/19:19
Open10K8SPlease check this PS19:19
Open10K8SI updated the commit message19:19
Open10K8SRegards19:19
Open10K8Shttps://review.opendev.org/#/c/734891/ is waiting19:20
clarkbOpen10K8S: +220:30
corvusit looks like https://review.opendev.org/733409 ran for the first time in a cloud with differing public and private ip addresses in the gate and failed20:37
corvusi guess we should actually map the nodepool private_ipv4 to our ansible inventory public_v4 in the gate20:39
corvussince it seems like the private addresses are what we write to /etc/hosts20:39
openstackgerritJames E. Blair proposed opendev/system-config master: Stop using backend hostname in zuul testinfra tests  https://review.opendev.org/73340920:42
clarkbcorvus: hrm, we put the private IP there to avoid traversing NAT which has been problematic in the past20:43
clarkb(for things like vxlan tunnels)20:43
clarkbfungi: I think 5fd6aeeea62674cecf997421546a675d91cf45ef may be the commit that broke things20:43
corvusclarkb: yeah, i think just using the private ip should be fine20:44
clarkbfungi: I don't understand why yet, but the commit message in etherpad-lite makes it seem likely20:44
clarkbfungi: I'll file a bug now with a couple screenshots and a pointer to that commit and see if they say anything20:47
fungiclarkb: thanks! i could do it, but not until after i'm done prepping dinner20:48
openstackgerritJames E. Blair proposed opendev/system-config master: Fake zuul_connections for gate  https://review.opendev.org/73092920:49
corvusand i think the public/private ipv4 issue also broke the last run of that, so that's a rebase20:49
openstackgerritJames E. Blair proposed opendev/system-config master: WIP: add Zookeeper TLS support  https://review.opendev.org/72030220:50
clarkbno worries, I'm going to update etherpad on our test node to get a screenshot of the broken stuff20:50
fungicool, that's how i was considering doing it too, since it's just a one-liner edit and down/up the container20:52
openstackgerritMerged zuul/zuul-jobs master: Allow upload-docker-image role to be used outside of promote  https://review.opendev.org/73489020:55
clarkbhttps://github.com/ether/etherpad-lite/issues/4106 has been filed21:03
clarkbcorvus: on the ip address switch, do we use public_ipv4 for anything? (iptables?)21:04
clarkbI know ansible itself is going to use the ansible_host value21:04
corvusclarkb: yes that's exactly it21:04
clarkbthanks21:05
clarkbchange lgtm then21:05
corvusthat's the value we (just recently) started using in the iptables rules; that replaced a dns lookup -- it's now an ansible inventory lookup so that we can do iptables by ansible group21:05
clarkbrgr21:05
clarkbcorvus: btw not sure if you saw but the "use zuul test node as -dev server standin" worked really well earlier today21:06
clarkbmordred: ^ you too21:06
corvusclarkb: oh nice, sorry i missed a bunch earlier.  but that's pretty cool.  might make some kind of self-service hold thing worthwhile21:07
clarkbI changed my vote on the etherpad upgrade change from +2 to -W with a link to the issue I filed21:10
clarkbwe'll see where that takes us I think21:10
fungiawesome, thanks again!21:28
*** mlavalle has joined #opendev21:29
corvuswoohoo!  zuul started in system-config-run-zuul: https://zuul.opendev.org/t/openstack/build/ef6229a9233f4206a1d24e0724839f83/log/zuul01.openstack.org/debug.log21:47
corvusi'm going to do one more rebase of that stack21:47
openstackgerritJames E. Blair proposed opendev/system-config master: Stop using backend hostname in zuul testinfra tests  https://review.opendev.org/73340921:48
openstackgerritJames E. Blair proposed opendev/system-config master: Fake zuul_connections for gate  https://review.opendev.org/73092921:48
openstackgerritJames E. Blair proposed opendev/system-config master: WIP: add Zookeeper TLS support  https://review.opendev.org/72030221:48
*** DSpider has quit IRC21:58
Open10K8Sclarkb: thank you22:00
clarkbfungi: going back to the openafs oddity, is your local openafs a 1.8 version? I wonder if that could be part of it and we should try a create on the fileserver itself?22:01
fungiwell, a bunch of those volumes (possibly all of them?) were created by ianw, so it may make more sense to check how and from where he created them22:02
clarkbah22:02
fungior at least double-check whether i'm misreading the discussion here from may 2622:03
fungibut yes, i've currently got openafs 1.8.6~pre1-322:04
fungifrom debian unstable22:04
* clarkb looks at python2 things again22:09
openstackgerritClark Boylan proposed openstack/project-config master: Install git-review under python3 for proposed updates  https://review.opendev.org/73501922:13
clarkbthat was an easy one to address so I went for it22:13
ianwclarkb/fungi: i'm guessing the afs volumes still aren't happy?22:13
clarkbianw: ya22:13
clarkbianw: one thing I was wondering about is if a newer openafs was used to create them which potentially caused problems22:14
ianwthe thing is they *were* happy, for a bit22:14
clarkbI've only ever created them on the fileserver and while it's been a while those have always been happy as far as I know22:14
ianwvos examine mirror.wheel.focala6422:18
ianwCould not fetch the information about volume 536871131 from the server22:18
ianwPossible communication failure22:18
ianwthat's on miror-update22:18
clarkbcorvus: comment on https://review.opendev.org/#/c/730929/622:18
ianwi wonder if tcpdumping again we'll see this icmp stuff22:18
ianw       server mirror-update01.opendev.org partition /vicepa RW Site22:19
ianwthis seems familiar.  somehow it has decided that mirror-update01 is in the vldb, or at least is showing it as such22:19
clarkbianw: was the volume created from mirror-update01.opendev.org or something like that22:19
clarkbianw: and ya fungi found that this was the same problem we had earlier (a week or two back) when we hit problems around these volumes22:20
clarkbthat was based on irc logs22:20
ianwclarkb: i think it was, but this is similar to what i saw when looking at fungi's volumes; that's why i deleted them and recreated them, which worked when i left it, but appears to have gone back to the same thing now22:20
ianwrx data vldb reply get-entry-by-name-n "mirror.wheel.focala64" numservers 4 servers 127.0.1.1 127.0.1.1 23.253.73.143 104.130.138.161 partitions a a a a rwvol 536871131 rovol 536871132 backup 536871133 (504)22:26
ianwthat's from a tcpdump to afsdb0122:26
ianwthat sure looks like it's saying the servers for wheel.focala64 include 127.0.1.122:26
fungiyep22:26
fungithat's exactly what i found looking into it today too22:26
fungiso the vldb records have somehow replaced the afs01.dfw.openstack.org record for the rw volume with 127.0.1.1, and added a 127.0.1.1 ro replica22:28
ianw127.0.1.1 and openafs has some google hits22:28
fungicould it be because /etc/hosts on afs01.dfw has "127.0.1.1 afs01.dfw.openstack.org afs01"22:28
fungiand so it's resolving its ip address based on that?22:29
ianw"This seems to bite everyone who installs the Debian or Ubuntu packages on22:29
ianwa non-modified server which has"22:29
fungilast modified date on that file is more than a year ago though22:29
clarkbfungi: it's part of our normal setup to do that22:29
fungiright22:29
fungiwhich is why i'm wondering what has caused that to suddenly become an issue22:30
clarkbya thats why I wondered if openafs version used to do the create is important22:30
fungiunless maybe these are the first volumes we've added since the hosts file was "normalized" to match our other servers?22:30
clarkbbeacuse we've left these things alone for a long time and they've been fine22:30
clarkbfungi: that's possible, I don't know if we've added new volumes since the xenial upgrade22:30
ianwwell i think we have a smoking gun ... first thing is how to get rid of it22:31
ianwhttps://lists.openafs.org/pipermail/openafs-info/2013-December/040285.html22:31
fungiyeah, occam's razor says we had it set up correctly on trusty, the xenial upgrade undid the hosts file back to "normal" and these are the only volumes we've added since22:33
clarkbianw: I'll admit I don't quite understand any of what that email is trying to say22:34
clarkblike the 127.0.1.1 problem is because it is already in /etc/hosts ? why do we need to update it?22:34
ianwclarkb: heh, me either yet :)22:34
ianwbut i think it's our current best clue :)22:34
clarkbya I agree it seems to be the thread to pull on22:34
clarkbhttps://docs.openafs.org/Reference/5/NetRestrict.html is the other thing I didn't recognize22:35
clarkbI think that means we can add netrestrict files on the servers to exclude 127.0.1.1 and whatever else22:35
clarkbwhich is probably a reasonable enough workaround for us as we can stick all of that into config management22:35
ianwi think if we do the vos remsite on afs01 it will remove appropriately22:36
clarkbianw: maybe we need both things? remsite to fix the existing volumes and netrestrict to avoid this in the future?22:37
ianwyeah, remove the 127.0.1.1 entries, then make sure they don't come back22:37
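A rough sketch of the NetRestrict half of that plan; the exact location of the server-side NetRestrict file depends on how the Debian/Ubuntu packages lay things out (upstream docs use /usr/afs/local/NetRestrict), so the path here is an assumption:

    # on each fileserver: tell the server processes never to register the
    # loopback alias from /etc/hosts with the VLDB
    echo "127.0.1.1" | sudo tee /usr/afs/local/NetRestrict
    # the fileserver/volserver processes need a restart to pick this up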
mordredclarkb: yay! (re: -dev server standin)22:38
mordredalso - sorry, my "I'll be gone for a couple of hours for errand" - I forgot to translate that to nola time22:39
clarkbmordred: I just assume the rain flooded the streets so everyone raided the bars for to-go cups full of $drink_of_choice22:39
ianw$ vos remsite -server afs01.dfw.openstack.org -partition a -id mirror.wheel.focala6422:39
ianwDeleting the replication site for volume 536871131 ...Removed replication site afs01.dfw.openstack.org /vicepa for volume mirror.wheel.focala6422:39
mordredclarkb: that's an excellent assumption with very accurate details22:40
ianw22:39:08.878673 IP 104.130.136.20.7003 > 104.130.137.130.56972:  rx data vldb reply get-entry-by-name-n "mirror.wheel.focala64" numservers 3 servers 127.0.1.1 23.253.73.143 104.130.138.161 partitions a a a rwvol 536871131 rovol 536871132 backup 536871133 (504)22:40
clarkbianw: so its still reporting 127.0.1.1?22:40
mordredclarkb: the real issue is that we're still only at 30% capacity for bars that don't serve food - so the lines for go-cups of $drink are extra long22:41
ianwclarkb: before (top) and after (bottom) : http://paste.openstack.org/show/794603/22:41
ianwi think there's two entries ... and it removed the "real one"22:42
clarkbianw: ya your tcpdump shows 127.0.1.1 so I assume it removed the real one22:42
mordredclarkb, ianw: so - just to catch up - our entries in /etc/hosts with 127.0.1.1 hostname are bad and break things. didn't we spend a bunch of time at one point to make sure those entries existed?22:42
clarkbmordred: yes, those entries make other things happy iirc22:43
clarkbsomething to do with unbound maybe22:43
clarkb(because it listens on that addr?)22:43
clarkbianw: remsite says you can provide the ip address or the name22:43
clarkbianw: so maybe add the proper site back then use 127.0.1.1 to remsite the wrong one?22:44
ianwok, "vos examine" with "-noresolve" shows :   server 127.0.1.1 partition /vicepa RW Site22:44
mordredclarkb: nod. well - fwiw, we only do that in set-hostname which we only do in launch-node22:44
ianwso the *rw* volume is on 127.0.1.122:44
clarkbi'm looking at http://docs.openafs.org/Reference/1/vos_remsite.html fwiw22:44
clarkbmordred: ah so we could maybe make an exception for afs servers22:44
mordredso if we wanted it to be different on some of the hosts, it shouldn't be hard, nor should it break anything22:44
mordredyeah22:44
mordredI think we could just write an /etc/hosts file in the afs role22:45
mordrednothing should fight22:45
*** tkajinam has joined #opendev22:45
ianwmaybe vos move is the way to update the rw volume?22:49
ianwsomething happened ... maybe ... http://paste.openstack.org/show/794604/22:53
clarkbfwiw vos move looks correct for RW sites reading manpages22:53
clarkbaddsite is RO only22:53
clarkbianw: does vos examine -noresolve look happier now too?22:54
ianwno ;)22:58
ianw$ vos examine -noresolve mirror.wheel.focala6422:58
ianwCould not fetch the information about volume 536871131 from the server22:58
ianw: No such device22:58
*** aannuusshhkkaa has joined #opendev22:59
*** shtepanie has joined #opendev23:00
clarkbdoesn't seem like there is a way to convert a RO site to a RW site?23:00
rm_workHey, trying to walk some folks through registering for a new openstack account, and it's failing to create accounts right after clicking register on https://openstackid.org/auth/register23:00
rm_workIs this a known issue?23:00
clarkbotherwise I'd say addsite with the correct IP, this gives us a RO volume. Then switch it to the RW volume23:00
rm_workHTTP 500: openstackid.org is currently unable to handle this request.23:00
clarkbrm_work: I'll ping the foundation sysadmins and see if the server logs say anything obvious to me23:01
rm_workok, both of them are getting it, and I was able to replicate as well.23:01
clarkbit looks like PHP is running out of memory but the service itself has plenty23:02
clarkb*server itself23:02
clarkbrm_work: foundation sysadmin is looking at it now. Will let you know what they find23:04
rm_workalright, thanks!23:08
ianwclarkb: the move seems to have failed, i'm not sure what to do now23:10
clarkbianw: ya I don't know either. Maybe ignore that one for now (and we'll rebuild it), but try the remsite using the IP on another one and see if that fixes it?23:11
ianwi tried on bionica64 and it did the same thing23:13
ianwFailed to create a transaction on the source volume 53687112523:13
ianw   VOLSER: volume is busy23:13
clarkbianw: using the IP it removed the other IP?23:13
clarkbwith remsite I mean23:13
*** tosky has quit IRC23:15
clarkbrm_work: I've been asked to confirm you are ticking the captcha box23:21
rm_workyes.23:21
ianwclarkb: no, it seemed to fail.  i'm going to try just removing all the broken volumes23:21
clarkbianw: ok, before we create new ones should we add the netrestriction?23:21
clarkbI guess we can do that manually then put it in config management later if that is easier23:22
rm_workyou can fairly easily replicate, they should be able to throw a junk name and email in and replicate themselves in like 10 seconds23:22
ianwclarkb: yeah, let me clean what's there then we can try making them again :)23:22
clarkbrm_work: the main person says they haven't been able to reproduce though someone else has reproduced it23:22
rm_workhuh. all three of us trying here got the same thing. so maybe if we just hammer it? :D23:22
clarkbrm_work: we are looking at the memory issue as a possible cause though23:22
ianwoh, i wonder if it's the mount in the r/w partition?23:22
clarkbianw: oh, we need to remove it from the fs side, then move, then remount? that would make sense23:23
ianwclarkb: i can't do an rmmount because the "file doesn't exist"23:25
clarkbhrm23:26
clarkbmaybe we have to rmmount before remsite?23:26
clarkb(I claim no expertise though, maybe mordred or corvus know better?)23:29
mordredI didn't do it23:29
ianwvos remove -noresolve -server afs01.dfw.openstack.org -partition a -id mirror.wheel.bionica6423:31
ianwVolume 536871125 does not exist on server and partition23:31
ianw   VOLSER: no such volume - location specified incorrectly or volume does not exist23:31
ianwthis is really starting to annoy me :)23:31
*** mlavalle has quit IRC23:32
clarkbrm_work: latest word is that it may be input specific23:32
clarkbrm_work: debugging continues, but I'm just playing messenger right now so don't have all the details23:32
rm_workO_o23:32
rm_workso on our side, we had three separate people doing it and it failing in the same way. I wonder what all three of us could have independently done to get that error23:33
fungisounds like it was input-dependent23:40
fungiso that all of them were hitting it isn't too surprising23:41
openstackgerritsebastian marcet proposed opendev/system-config master: OpenstackId v3.0.10  https://review.opendev.org/73502223:43
rm_work^^ related?23:44
rm_workI mean what input would three different people have shared? different names, different email domains...23:44
rm_workunless being in the USA is an input problem :D23:44
openstackgerritsebastian marcet proposed opendev/system-config master: OpenstackId v3.0.11  https://review.opendev.org/73502223:44
clarkbrm_work: field size I think23:45
clarkbrm_work: not the exact data, but its width23:45
clarkb(which may also explain memory issues if the allocated memory is dependent on expected input sizes?)23:45
rm_workO_o23:45
clarkbbut again I don't know specifics23:45
rm_workMy name and email address were both pretty short23:45
rm_workbut yeah23:46
* rm_work shrugs23:46
rm_workas long as it works23:46
fungi*shrug* php ;)23:47
ianwclarkb: through a series of vos removes on afs01.dfw i think i have cleared the bad volumes23:51
ianwi've removed the mounts23:55
ianwi think we can try recreating23:55
ianwi need a cup of tea first :)23:55
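For completeness, recreating one of the removed wheel volumes would look roughly like the following; the volume name matches the ones discussed above, but the mount path and replica placement are illustrative assumptions rather than opendev's exact procedure:

    # create the RW volume on afs01.dfw and mount it in the RW path of the cell
    vos create -server afs01.dfw.openstack.org -partition a -name mirror.wheel.focala64
    fs mkmount -dir /afs/.openstack.org/mirror/wheel/focal-x86_64 -vol mirror.wheel.focala64
    # add RO replica sites and push the first RO copies
    vos addsite -server afs01.dfw.openstack.org -partition a -id mirror.wheel.focala64
    vos addsite -server afs02.dfw.openstack.org -partition a -id mirror.wheel.focala64
    vos release mirror.wheel.focala64
    # the parent mirror.wheel volume (name assumed) also needs a release so the
    # new mountpoint shows up in the read-only tree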
