Monday, 2020-08-17

openstackgerritIan Wienand proposed zuul/zuul-jobs master: Add ensure-rust role
ianwseems it's just out of sync
ianwdoesn't line up00:16
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Add ensure-rust role
ianwmnaser: fyi the POST_FAILURES for vexxhost seem to be back with ^00:23
ianwalso seems like we've lost the log streamer on some executors00:26
ianwalthough it's saying00:27
ianwBuild ID 575534f236fd445498760c09dad0c525 not found00:27
ianw... rather than just end of stream ... that's odd i think00:27
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Add ensure-rust role
ianwmnaser: update, it worked now.  so ... intermittent maybe?00:53
openstackgerritMatthew Thode proposed openstack/diskimage-builder master: update gentoo to allow building arm64 images
prometheanfireianw: ^ :D03:42
prometheanfirefungi: added arm64 systemd support now too03:42
openstackgerritMatthew Thode proposed openstack/diskimage-builder master: update gentoo to allow building arm64 images
openstackgerritOpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml
openstackgerritMerged openstack/project-config master: Normalize projects.yaml
openstackgerritIan Wienand proposed openstack/diskimage-builder master: [wip] Fedora 32 support
openstackgerritIan Wienand proposed openstack/diskimage-builder master: [wip] Fedora 32 support
openstackgerritIan Wienand proposed openstack/diskimage-builder master: Fedora 32 support
openstackgerritIan Wienand proposed openstack/diskimage-builder master: Fedora 32 support
openstackgerritIan Wienand proposed openstack/diskimage-builder master: Fedora 32 support
openstackgerritThierry Carrez proposed opendev/system-config master: Make read-only
openstackgerritMerged openstack/project-config master: Retire networking-l2gw and networking-l2gw-tempest-plugin
openstackgerritBenjamin Schanzel proposed zuul/zuul-jobs master: Multi Node/Context support for mirroring Git workspaces via kubectl
mnaserinfra-root: i think maybe a gitea instance is unhappy14:16
corvusmnaser: i tried "git clone" for all 8 instances and the clone at least started14:21
corvusi didn't let them all run to completion14:21
mnasercorvus: ok, perhaps it is something at their side then :\14:21
mnaseri would have done that but i don't remember the port numbers heh14:22
corvusi had to check in the system-config repo :)14:22
clarkbone trick for identifying the backend node you are talking to is to inspect the ssl cert and look at the altnames14:28
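clarkb's trick can be scripted. Below is a minimal sketch using Python's standard `ssl` module. It assumes each backend serves a certificate that lists its own hostname among the altnames. `san_dns_names` and `fetch_san_dns_names` are hypothetical helper names, not part of any OpenDev tooling.

```python
import socket
import ssl
from typing import List


def san_dns_names(cert: dict) -> List[str]:
    """Return the DNS subjectAltName entries from an ssl getpeercert() dict."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def fetch_san_dns_names(host: str, port: int = 443, timeout: float = 10.0) -> List[str]:
    """Open one TLS connection and list the altnames of the certificate served.

    Assumption: each backend's certificate names that backend, so the result
    identifies which node answered this particular connection.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return san_dns_names(tls.getpeercert())
```

For example, `fetch_san_dns_names("example.org")` lists the DNS altnames of whichever backend answered. If the load balancer spreads connections, repeated calls may report different backends.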
fungiyeah, i only just approved that message through list moderation, haven't had time to look into it yet14:29
clarkbcorvus: if you get a chance I put together after doing more testing Friday. Also makes javamelody not throw exceptions and the url is functional on 2.13, 2.14, 2.15, and 2.16 in my testing and the codemirror-editor error pop up doesn't happen on 2.1614:47
clarkbcorvus: for the email I'll send that out this morning if we're happy with it14:47
AJaegerconfig-core, please review - new project addition...14:50
openstackgerritMerged openstack/project-config master: New Project Request: hostconfig-operator
*** mlavalle has joined #opendev15:15
openstackgerritRiccardo Pittau proposed openstack/project-config master: Do not update upper constraints for intermediate branches
fungiianw: revisiting the f31 mirror issues, we're still seeing failures from that. i notice that the source we're mirroring from is out of date compared to the one you linked:18:05
fungino idea if/when that will be corrected. maybe we need to pick a different rsync source?18:06
fungilooks like prior to we used but both seem to be in roughly the same state18:17
fungiand before that we used but they don't seem to mirror anything later than f2918:17
fungi has both fedora and epel with current content (and aarch64 too)18:22
fungiand is listed as having rsync available18:22
fungisame for mirror.math.princeton.edu18:26
fungihowever neither mirrors fedora atomic that i can find18:34
openstackgerritJeremy Stanley proposed opendev/system-config master: Switch Fedora mirror to
fungiinfra-root: ^ that's my attempt at fixing broken f31 jobs. if it merges soon i'm happy to watch the mirror logs and make sure things get back on track18:52
clarkbfungi: +2d but didn't approve19:29
clarkbI'll let you approve when ready19:29
corvusis there a more upstream place we can mirror from?19:30
fungialso tibbs responded in #fedora-admin and acknowledged those servers are struggling to catch up and probably still need a few days19:30
corvusjust wondering what the policy on mirroring from a more canonical location might be; considering the amount of redhat product that goes through our system, mirroring more directly might be in everyone's interest?19:32
clarkbcorvus: if you get a chance to look at and for gerrit things today that would be most excellent19:32
corvusclarkb: will do19:32
fungicorvus: i concur. i wonder where the best place is to find out/ask19:32
fungi seems to indicate (unsurprisingly) that to be able to pull from the "master" servers, your mirror has to appear in the official download mirrors list19:37
corvusok, that might be a bit much for us to bite off19:37
fungiso if we don't want to be operating an official mirror, we might need to have a discussion with someone about an exception19:37
openstackgerritMerged opendev/system-config master: Switch Fedora mirror to
corvusfungi: ^19:59
clarkbthat should all happen pretty automatically since we do the localhost vos releases for the fedora mirror20:00
clarkbthough if you want to speed it up you can run before the next cron job I guess20:01
fungiyeah, i was just going to wait, and then check logs. juggling plenty of other stuff at the same time20:02
fungibut i have a change which is an easy reproducer of the problem, so can recheck that once i see that the mirror log indicates we pulled with no errors20:03
clarkbinfra-root is another gerrit related change bug fix I discovered when doing local testing (it's a bug in our ansible for gerrit)20:15
clarkbprobably not urgent but would be good to land eventually so that we don't have to debug that if it pops up on a rebuild20:15
corvusclarkb: msg to luca looks good; only thing i might consider adding is being more explicit that we know not to use notedb in 2.15 (though you did link directly to my saying that in footnote 0)20:19
clarkbcorvus: how does that little edit look?20:21
corvus(just so we don't get a treatise back about how we shouldn't do that; lgtm)20:21
clarkbcool I'll get that sent out. I'll cc people too so that they are included in the thread20:22
corvusclarkb: docker change +320:22
clarkbnow updating the infra meeting agenda. Anything to add to that?20:35
fungiptg/forum maybe20:39
clarkbthanks for the reminder20:40
clarkbhrm the docker images change failed on an arm64 thing /me looks into that20:47
clarkbbase | W: Failed to fetch  Connection failed [IP: 443]20:48
clarkbiirc we expect ipv6 in that cloud20:48
clarkbso it's odd that we would try ipv4, maybe ipv6 failed and we are falling back?20:48
clarkbwe do have an AAAA record for that mirror20:49
clarkbthe server pings but I'm having trouble getting https to respond20:50
clarkbhrm it finally loaded20:50
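One way to tell whether IPv6 is failing and a client is falling back to IPv4 is to resolve both record types and try each address on its own. Below is a minimal sketch built on Python's `socket.getaddrinfo`. `probe_families` is a hypothetical helper. Port 443 is assumed because the failing fetch was HTTPS.

```python
import socket
from typing import Dict, List, Tuple


def probe_families(host: str, port: int = 443, timeout: float = 5.0) -> Dict[str, List[Tuple[str, bool]]]:
    """Resolve host, then try a TCP connect to every address, grouped by family."""
    results: Dict[str, List[Tuple[str, bool]]] = {"ipv4": [], "ipv6": []}
    seen = set()
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        addr = sockaddr[0]
        if (family, addr) in seen:  # getaddrinfo can repeat an address
            continue
        seen.add((family, addr))
        key = "ipv6" if family == socket.AF_INET6 else "ipv4"
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            ok = True
        except OSError:  # refused, unreachable, or timed out
            ok = False
        results[key].append((addr, ok))
    return results
```

If the AAAA addresses show `False` while the A addresses show `True`, that is consistent with the fallback clarkb suspects.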
clarkbafs: Lost contact with file server in cell (code -1) (all multi-homed ip addresses down for the server)20:53
clarkblots of stuff like that in dmesg and I'm guessing that is the cause20:54
clarkbit's all three afs file servers that end up in the log that way20:57
clarkbthat makes me think the problem is on the linaro cloud side of things20:57
clarkbat the current moment I can ping all three of them from the mirror with no lost packets20:58
clarkbfungi: I'll put on the agenda too21:00
fungigreat idea21:03
corvusclarkb: btw luca is generally in london timezone if you wanted to do a voice thing21:07
clarkbthats good to know, thanks21:08
* clarkb has had a lot of early mornings lately21:08
clarkbcorvus: reading luca's response I think we should probably strongly consider a 2.16 upgrade without notedb, then regroup, then do notedb21:10
corvus+1 to that21:11
clarkbalso looks like I should test an upgrade through to 3.0 next in order to check the db cleanups (mostly I just want to see that we aren't accidentally relying on the db when we expect not to)21:14
clarkbbut that's good, it gives me a few things to push on on our side21:14
clarkbI kept some pings running on the linaro mirror and they lost no packets but dmesg shows that afs lost contact again21:40
clarkbit seems that things generally work but then afs doesn't for some reason?21:40
clarkbfungi: ^ any ideas on what may be going on there? I guess it could be port specific? or we're relying on some stateful firewall treatment of udp packets?21:41
clarkbmaybe I should switch to a udp ping21:41
fungiyeah, also make sure it's ipv421:42
clarkbyup was ipv421:42
clarkbI pinged the v4 addrs directly, didn't use names21:42
clarkbudp pings may need mtr? Doesn't seem like ping does them21:44
fungihping can, i think21:46
fungihping3 package probably21:47
fungi"hping3 is a network tool able to send custom ICMP/UDP/TCP packets and to display target replies like ping does with ICMP replies."21:47
clarkbI'm installing mtr since I'm familiar with it21:50
fungiahh, i'd never really tried with mtr21:50
clarkbmtr's udp pings don't seem to reliably get to the fileservers21:54
fungiiptables rules?21:55
clarkbmaybe, I can try setting the port.21:55
fungiwe don't seem to set the source ports, so it just needs to be one of the allowed destination ports21:56
fungilike afs3-fileserver/udp (7000/udp)21:56
* clarkb tries port 700021:57
clarkbya still not helping21:58
clarkb`mtr --udp -P 7000 $fileserverip` fwiw21:58
fungianother problem you're going to run into is that udp datagrams are, by their very nature, typically unacknowledged by the recipient21:59
fungiprobably best would be to add a temporary rule allowing access to a random unused udp port and then try pinging that21:59
clarkbI wonder how afs decides things are unreachable? it must be doing its own ping pongs?22:00
fungithough the answer you get back is typically going to be icmp port unreachable in that case, so you'd need to test in both directions as it's only udp outbound not inbound22:00
clarkbmaybe a better approach would be to tcpdump for those on both sides22:00
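fungi's caveat can be made concrete. With a *connected* UDP socket, Linux reports a returned ICMP port-unreachable as `ECONNREFUSED` on a later socket call. Silence, by contrast, is ambiguous. Below is a minimal sketch that assumes an IPv4 address literal, matching the ipv4 pings above. `udp_probe` is a hypothetical helper. It only tells you something when the far end sends either a reply or an ICMP error, so it is worth running in both directions.

```python
import socket


def udp_probe(host: str, port: int, payload: bytes = b"probe", timeout: float = 2.0) -> str:
    """Send one UDP datagram to an IPv4 host and classify what comes back.

    "reply"   - the peer sent a datagram back
    "refused" - an ICMP port-unreachable came back (ECONNREFUSED on Linux)
    "silent"  - nothing arrived; dropped, filtered, or simply ignored
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # connect() on UDP sends nothing, but lets the kernel report ICMP errors
        sock.connect((host, port))
        try:
            sock.send(payload)
            sock.recv(4096)
            return "reply"
        except ConnectionRefusedError:
            return "refused"
        except socket.timeout:
            return "silent"
```

Against a port that is allowed through the firewall but has no listener, "refused" shows that the path works. "silent" from only one side points at filtering in that direction.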
ianwfungi: thanks for looking, did the fedora change roll out?22:09
ianwyeah, it looks like it's timed out22:10
ianwi'll restart in a screen with no timeout22:10
fungiahh, the mirror pulse timed out you mean?22:11
fungimakes sense, there's several days of backlog22:11
fungithanks, i was about to go checking the log for that22:11
ianwok it's running in a screen22:12
ianwi get the feeling this is going to rewrite the whole mirror :/22:14
fungiis it blowing away/mirroring the wrong files?22:15
ianwnot sure if y'all saw the latest on the pyca wheels; we discovered a problem -- centos and ubuntu use different page sizes22:15
fungifor arm64/aarch64?22:16
ianwthe manylinux aarch64 spec doesn't specify pagesize, which is a problem it's going to have to solve22:16
clarkbthats a thing?22:16
fungiwow, that's a blocker22:16
ianwon arm, yep22:16
clarkbianw: do you have any idea why the linaro mirror loses connectivity to the afs fileservers?22:17
clarkbI'm sort of poking at that but starting somewhat from scratch22:17
ianwit's going to have to specify a 64k page size to be compatible everywhere it would seem22:17
ianwso it's unfortunately a bit of a blocker for generic wheels22:18
ianwclarkb: yeah ... been through that before and no particular answer.  even spent a while with auristor looking at it, let me see if i can find log reference22:18
ianwfungi: so far it's sitting at "receiving incremental file list" and being very slow about it :/22:20
clarkbis the issue alignments?22:21
clarkbso a 4k system would be ok with our "package" built with 64k alignments?22:21
ianwyeah, the issue is alignment22:22
ianwso choosing the biggest should be compatible.  but given the wide variety of ways things get built in python ... urgh22:23
ianwi think this problem will be fairly constant22:23
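ianw's point comes down to arithmetic. 65536 is a multiple of 4096, so a segment aligned for 64 KiB pages is also aligned for 4 KiB pages, but the reverse does not hold. Below is a minimal sketch for checking a built shared object. It assumes a little-endian ELF64 file, which is what aarch64 Linux builds produce. `load_alignments` and `loadable_on` are hypothetical helpers. The check only looks at `p_align` of `PT_LOAD` segments, which is a simplification: it relies on the linker keeping `p_offset` and `p_vaddr` congruent modulo `p_align`, as the ELF format requires.

```python
import struct
from typing import List

PT_LOAD = 1


def load_alignments(data: bytes) -> List[int]:
    """Return p_align for each PT_LOAD segment of a little-endian ELF64 image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    # e_phoff, e_phentsize and e_phnum sit at fixed offsets in the ELF64 header
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    aligns = []
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", data, off)
        (p_align,) = struct.unpack_from("<Q", data, off + 0x30)
        if p_type == PT_LOAD:
            aligns.append(p_align)
    return aligns


def loadable_on(data: bytes, page_size: int) -> bool:
    """True if every PT_LOAD segment's alignment is a multiple of page_size."""
    return all(align % page_size == 0 for align in load_alignments(data))
```

For example, `loadable_on(open(path, "rb").read(), 65536)`. `readelf -lW` shows the same information in its Align column.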
diablo_rojo_phonYou probably are aware already, but it seems ethercalc is down?22:23
fungii was not aware... checking22:24
diablo_rojo_phonOh well then. In trying to make a new one I get a Service Unavailable message.22:24
diablo_rojo_phonAnd in trying to access ones already created it has a disconnected from server..reconnecting message22:25
ianwsomebody should write a thesis on how the cost of tlb reloads starts to outweigh the benefits of reduced miss rates ... oh :)22:25
auristorianw clarkb: I looked quickly through my logs of openstack-infra and I couldn't find any details.22:25
fungiapache proxies to the ethercalc service listening on 8000/tcp on the loopback, and nothing is currently listening on that port. checking the logs to see what killed it22:26
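The first check fungi describes is whether anything is accepting connections on the loopback port that apache proxies to. It can be sketched in a few lines of Python. `is_listening` is a hypothetical helper. `ss -tlnp` is the usual command-line equivalent.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused or timed out: nothing usable is listening
        return False
```

On the ethercalc host this would be `is_listening("127.0.0.1", 8000)`. A `False` there together with a Service Unavailable from apache matches what fungi found.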
auristorclarkb: any openafs messages in dmesg?   what does "rxdebug localhost 7001" output show?22:27
clarkbauristor: dmesg just shows lost contact then file server is back up22:28
clarkbauristor: do I run rxdebug on the client or server?22:28
fungilooks like ethercalc tripped over some js exception and terminated at 17:58:42 utc22:30
ianwclarkb/auristor: this is the last time i remember looking into it fyi
clarkbauristor: running that on the client it says:
clarkbwhich seems happy22:31
fungi#status restarted ethercalc service on ethercalc02.o.o following unexplained crash at 17:58:42 utc22:31
openstackstatusfungi: unknown command22:31
fungi#status log restarted ethercalc service on ethercalc02.o.o following unexplained crash at 17:58:42 utc22:31
openstackstatusfungi: finished logging22:31
fungidiablo_rojo_phon: ^ give it another try?22:32
ianwclarkb: iirc that was paired with a lot of failures to github, a notoriously ipv4 only service too22:32
fungii'm digging deeper in the logs of its death throes now22:32
clarkbianw: oh ya I remember the github side of things22:33
diablo_rojo_phonTrying now.22:33
diablo_rojo_phonAll good.22:34
diablo_rojo_phonThanks fungi !22:34
ianwrsync: opendir "/linux/development/rawhide/Everything/x86_64/os/Packages/b/.~tmp~" (in fedora) failed: Permission denied (13) ... i don't know what that's about (fedora mirror process)22:35
fungiinfra-root: looks like maybe a csv export attempt may have crashed the ethercalc process somehow:
fungiworth keeping an eye on, possible that's been fixed in a newer version than we're running22:35
clarkbianw: that looks like rsync writing to a tmpdir so it can do atomic moves22:35
clarkbianw: why that would get a permission denied I don't know. We don't have a competing rsync do we?22:35
fungiwe could use --inplace instead with rsync, we don't point anyone at the r/w volume while it's being written to22:36
fungibut yeah, i agree, check for other colliding rsync runs for the same tree22:36
ianwyeah, it's not on our side ...22:37
fungioh, or maybe that's rsync failing to read?22:39
fungimaybe we should specify an exclude for that pattern22:39
ianwWell the "Massachvsetts Institvte of Technology" mirror is living up to its olde-timey name and about as fast as a horse drawn carriage22:39
clarkboh ya it could be upstream having that tmp dir because it too is syncing22:39
clarkbthen we try to read it and fail22:39
fungicould indeed be rsync writing to a tempfile on the remote system22:39
fungiand not making it world-readable22:39
fungiand then we try to mirror it22:40
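Below is a sketch of where the two suggestions (`--inplace`, and an exclude for the temp directory) would go in an rsync invocation. The real mirror-update script and its flags are not shown here, so the base flags, paths and the `rsync_mirror_argv` helper are all hypothetical. The `.~tmp~` name appears to be rsync's default holding directory for `--delay-updates`, which fits the idea that upstream is itself mid-sync.

```python
import shlex
from typing import List, Sequence

# Directory-only pattern (trailing slash) for rsync's default --delay-updates
# holding directory, so we do not try to read another mirror's staging area.
DEFAULT_EXCLUDES = (".~tmp~/",)


def rsync_mirror_argv(source: str, dest: str, inplace: bool = False,
                      excludes: Sequence[str] = DEFAULT_EXCLUDES) -> List[str]:
    """Build an rsync command line for a hypothetical mirror update."""
    argv = ["rsync", "-av", "--delete"]  # assumed base flags, not the real script's
    if inplace:
        # Write changed files directly rather than temp file plus rename;
        # acceptable only because nobody reads the r/w volume during the sync.
        argv.append("--inplace")
    argv.extend(f"--exclude={pattern}" for pattern in excludes)
    argv.extend([source, dest])
    return argv


if __name__ == "__main__":
    print(shlex.join(rsync_mirror_argv(
        "rsync://mirror.example.org/fedora/", "/srv/mirror/fedora/", inplace=True)))
```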
openstackgerritMerged opendev/system-config master: Gerrit image cleanups/fixes
fungiianw: speed wise, yeah, no clue what bandwidth sipb has for those mirrors. was convenient in that it was in the same state as mirror-update and an 8ms rtt away22:40
ianwthere's a rax ord mirror too22:41
fungimit is thousands of miles from rax-dfw22:41
fungihaving seen issues in the past with rackspace's mirror, i was hesitant to try22:41
fungialso rax-ord (chicago) is nearly as far geographically as mit (boston) from rax-dfw (dallas)22:42
ianwyeah i figured they'd at least have dedicated links ... but i agree, i can't imagine it gets much attention22:43
funginot that geographic distance necessarily has much bearing on it these days, i've seen traceroutes trombone across the continent and back again to go just down the street22:43
ianw... there's always that facebook mirror yum picked the other day ...22:43
fungidoes facebook's mirror have rsync open?22:44
fungi(he asked, half expecting he already knew the answer)22:44
ianw"Due to lack of available bandwidth, we currently don't offer rsync to download/synchronize content from for debuginfo pkgs. Also worth knowing that this mirror is running on limited available bandwidth too.22:45
ianwohh poor facebook mirror ... if only it had somewhere it could get bandwidth22:45
* fungi sighs. so close22:45
fungiyou had one job, internet. ONE job22:45
ianwactually, maybe egg on my face(book) ... i think that file is synced to the mirror from the upstream and is saying to *use* the fb mirror22:47
fungibut yeah, if the rackspace mirror has rsync open and it's current, then worth a try22:47
ianwthe FB mirror does seem to have rsync open22:48
ianw... this may not be a bad option ... if it's behind whatever cdn they use22:48
fungi looks reasonable22:49
fungiianw: note that rsync is unlikely to be going through a "cdn"22:50
fungiat least not a traditional http(s) cdn at any rate22:50
ianwfungi: did you kill anything in the screen : "sync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at io.c(513) [receiver=3.1.3]" ?22:52
funginope, i haven't even attached to it22:52
ianwdamn it maybe i forgot to set the no timeout flag22:53
fungithat'd do it, the timeout wrapper uses kill22:53
ianwi did22:53
ianwmaybe i'll restart it against this fb mirror and see if it's faster, you can pretty much tell by just watching it22:53
fungiyeah, i should have just hacked up the script and tested in place22:54
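The rsync message ianw saw (code 20, "received SIGINT, SIGTERM, or SIGHUP") is what rsync reports when it catches a termination signal. A timeout wrapper such as coreutils `timeout` sends SIGTERM by default. Below is a minimal Python sketch of that wrapper behaviour. `run_with_deadline` is a hypothetical helper, and it is not necessarily how the mirror script's wrapper works.

```python
import signal
import subprocess
import sys
from typing import Optional, Sequence


def run_with_deadline(argv: Sequence[str], seconds: Optional[float]) -> int:
    """Run argv, sending SIGTERM if it is still running after `seconds`.

    seconds=None waits indefinitely, i.e. the "no timeout" case.
    Returns the exit status, or a negative signal number if the child was
    killed by a signal it did not handle.
    """
    proc = subprocess.Popen(list(argv))
    try:
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGTERM)  # same default signal as timeout(1)
        return proc.wait()


if __name__ == "__main__":
    # A child with default SIGTERM handling dies from it: returns -15 on POSIX.
    print(run_with_deadline([sys.executable, "-c", "import time; time.sleep(5)"], 0.5))
```

A child that does not handle SIGTERM shows up as `-15`. rsync catches the signal and exits with status 20 instead, which is the message in the screen session.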
ianwit's going, honestly about the same speed23:08
fungimaybe check the cacti graphs for bandwidth utilization on mirror-update?23:37
fungicould be the slowness is at our end courtesy of rackspace's per-flavor bandwidth caps23:37
fungior don't. looks like we didn't think to add it to cacti. working on a patch now23:38
openstackgerritJeremy Stanley proposed opendev/system-config master: Add to Cacti
ianwi should probably dig out the credentials and delete some of the old servers there too23:45
ianwlike the old git ones are confusing23:45
fungii concur23:47
ianwok i'm into cacti admin interface23:56
ianwfrom my notes, last time i cleared out hosts was 2019-08-2823:56
ianwi.e. i have completely forgotten what to do23:57

