Monday, 2020-07-20

00:00 *** ryohayakawa has joined #opendev
00:27 *** bolg has quit IRC
00:28 *** ryohayakawa has quit IRC
00:29 *** ryohayakawa has joined #opendev
00:36 <openstackgerrit> Ian Wienand proposed openstack/diskimage-builder master: Support non-x86_64 DIB_DISTRIBUTION_MIRROR variable for CentOS 7  https://review.opendev.org/740183
01:55 <kevinz> ianw: ping
01:55 <ianw> kevinz: hey
01:57 <kevinz> hi ianw, morning! Is it possible to have a local mirror that caches etcd packages in Linaro US?
01:57 <kevinz> recently I find that it usually has network connection problems when downloading etcd: https://zuul.opendev.org/t/openstack/build/d6a61543aefb4e919644b2fa6949535f/log/job-output.txt#4948
01:59 <ianw> kevinz: yeah, i've noticed a few weird things from the cloud too, like tls errors connecting to github
02:00 <ianw> right, that's the same thing, but we see it in infra downloading ... something from github we do
02:00 <kevinz> yes, connections to github usually fail
02:01 <ianw> "fatal: unable to access 'https://github.com/infraly/k8s-on-openstack/': gnutls_handshake() failed: Error in the pull function."
02:01 <ianw> is what i'm thinking of
02:03 <ianw> we can add reverse proxy things ... if connections to github fail so consistently it feels like we're fighting against the network
02:04 <clarkb> github is ipv4 only
02:05 <clarkb> could it be a problem with the NAT?
02:06 <kevinz> clarkb: ianw: yes, maybe a problem with the NAT rules. I will double check
02:15 <fungi> or just the nat table has too small of a pool of available source ports
02:15 <fungi> that's a common cause for such behavior
03:04 <kevinz> fungi: could you help clarify? I'm not quite sure about this case. Thanks :-D
03:05 <kevinz> Initiating SSL handshake.
03:05 <kevinz> SSL handshake failed.
03:05 <kevinz> Closed fd 6
03:05 <kevinz> Unable to establish SSL connection.
03:07 <ianw> kevinz: something you'd see on the NAT host i guess ... if it has run out of ports to map back to 443 (or whatever) from its external address
03:08 <ianw> it might be worth a tcpdump if you're on a host exhibiting it, to see what, if anything, is coming back
03:09 <ianw> but it seems likely the packets aren't getting back to the host
03:10 <kevinz> ianw: OK, thanks for the pointer, I will try to debug it
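A minimal sketch of the kind of reproduction one could run from a suspect node behind the NAT: repeatedly attempt TLS handshakes to github.com and count failures. This assumes Python 3 on the node; the attempt count is illustrative.

    # Sketch: probe TLS handshakes to github.com and report the failure rate.
    # Assumes Python 3 on the node behind the NAT; the attempt count is illustrative.
    import socket
    import ssl

    HOST = "github.com"
    ATTEMPTS = 50
    failures = 0
    context = ssl.create_default_context()

    for i in range(ATTEMPTS):
        try:
            with socket.create_connection((HOST, 443), timeout=10) as sock:
                with context.wrap_socket(sock, server_hostname=HOST):
                    pass  # handshake completed
        except (OSError, ssl.SSLError) as exc:
            failures += 1
            print(f"attempt {i}: {exc}")

    print(f"{failures}/{ATTEMPTS} handshakes failed")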
03:12 <auristor> ianw: fyi, https://repology.org/project/kafs-client/versions and with the 5.8rc4 kernel, kafs passes all of the xfstests suite tests that could be expected to pass
03:14 <ianw> auristor: awesome!  was that with fscache?  iirc that was actually where we saw the most problems
03:14 <auristor> the fscache that you had problems with has been disabled in 5.8.
03:15 <auristor> I believe the new fscache is targeted for the 5.9 merge window
03:17 <ianw> the other thing is that with all that rsync work i guess we've stopped invalidating everything for the cache constantly
03:19 <auristor> not entirely. for openafs, it is true that the data isn't invalidated but each release does invalidate all of the cached metadata, the callback state for each cached object, and the cached volume location information and server list.
03:21 <auristor> even with the prior rsync behavior the file data versions weren't changing, it was only the metadata.
03:24 <ianw> it would be great for us not to have to maintain the openafs package builds we do
03:25 <ianw> sigh, speaking of, now i check and the wheels haven't released for a few days.  something broken i guess
03:33 <ianw> https://zuul.openstack.org/builds?job_name=release-wheel-cache ... you can't really tell *why* it skipped from that :/
03:35 <ianw> 2020-07-19 08:11:06.417530 | wheel-cache-debian-buster-arm64-python2 | mkdir: cannot create directory ‘/afs/.openstack.org/mirror/wheel/debian-10-aarch64//l’: Connection timed out
03:35 <ianw> https://zuul.openstack.org/build/8c919da098fb42d1a79308e2e129816d/log/job-output.txt
03:37 <ianw> 2020-07-19 06:55:39.121593 | wheel-cache-ubuntu-bionic-arm64-python3 |   "msg": "Warning: apt-key output should not be parsed (stdout is not a terminal)\ngpg: keyserver receive failed: End of file"
03:37 <ianw> https://zuul.openstack.org/build/5c9c5c6bdfd84485aa91bfc2865011ad/log/job-output.txt
03:37 <auristor> Unfortunately, openafs uses 110 for VBUSY, which is ETIMEDOUT on Linux. A "connection timed out" error is often a volume issue and not a connection issue.
03:38 <ianw> auristor: yeah, in the context of prior discussion with kevinz though, we're seeing some network issues, particularly ipv4 issues in our arm64 cloud (ipv4 goes via NAT there)
03:39 <auristor> VBUSY is also the error the fileserver returns when it cannot establish a reverse connection to the client's callback service port.
03:40 <auristor> port mappings and ipv4 address mappings combined with short timeouts on dynamic firewall rules can easily cause problems.
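A quick illustration of the numeric collision auristor describes, assuming a Linux host with Python 3: error code 110 is both OpenAFS's VBUSY and the kernel's ETIMEDOUT, so a VBUSY from the fileserver surfaces as "Connection timed out".

    # Sketch: show that error code 110 renders as "Connection timed out" on Linux,
    # which is why an OpenAFS VBUSY (also 110) can masquerade as a network timeout.
    import errno
    import os

    assert errno.ETIMEDOUT == 110   # true on Linux
    print(os.strerror(110))         # -> "Connection timed out"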
03:40 <ianw> i wonder if it is a bug or a feature to not release the volumes if any wheel build fails.  on one hand, it keeps the wheel caches consistent.  on the other, arm64 issues like this stop publishing.  i'm not sure the consistency matters...
03:41 <ianw> auristor: yeah, these same nodes are having pretty constant issues talking to github, another ipv4-only service.  it does seem to be suggesting something at that nat layer is causing problems
03:41 <ianw> afaik we've not had issues with things like cloning from opendev at all, which should all be ipv6
03:42 <auristor> I don't remember what openafs writes to the FileLog but I believe it does write something when there are timeouts during attempts to connect to the callback service.
03:44 <kevinz> ianw: looks like the network issue is due to the virtual router. One of the router netns was not reconstructed after the last upgrade. Now I've restarted the l3 agent to trigger the re-creation process and re-tested; looks like the issue has disappeared
03:44 <kevinz> I will trigger the CI jobs to see if it worked
03:45 <ianw> kevinz: oh good :)  well it's nice to have a suspect anyway
03:46 <ianw> is it 139.178.85.147 ?  that comes up in the logs a lot
03:47 <kevinz> ianw: yes, it is the OS-jobs router
03:48 <auristor> where does the line get drawn between an infrastructure service such as might be provided by rackspace or another hosting provider and a software product that must be "open source" in order for opendev to use it?
03:48 <ianw> Fri Jul 17 08:02:28 2020 CheckHost_r: Probing all interfaces of host 139.178.85.147:23199 failed, code -1
03:48 <ianw> Fri Jul 17 08:13:11 2020 CB: ProbeUuid for host 00007F9A08553D58 (139.178.85.147:12123) failed -1
03:48 <auristor> I would be happy to work with rackspace or others so that auristorfs is provided as a service for opendev to make use of.
03:48 <ianw> that's two messages in the openafs server logs around the failure i posted before (08:11)
03:50 <auristor> the probing code -1 is RX_CALL_DEAD which means the fileserver didn't receive a response to any DATA or ACK PING packet sent to the callback service port, which will be a public port on the NAT address.
03:50 <auristor> Note 23199 is not 7001.
03:53 <auristor> Then the probeuuid to port 12123 is an attempt to see if that client has the same UUID as a previously known host that might have switched endpoints.  Again, no reply.
03:53 <auristor> If the NAT has a VOIP configuration, it should be used for openafs.
03:54 <auristor> like afs, voip connections require longer-lived port mappings and tolerance for longer periods of idleness.
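A minimal sketch of how one might inspect (and, cautiously, raise) the conntrack UDP timeouts on the NAT host that decide how long an idle AFS callback mapping survives. The sysctl paths assume a Linux router using netfilter conntrack; the 30-minute value is only an illustration.

    # Sketch: read the netfilter conntrack UDP timeouts that govern how long an idle
    # UDP "connection" (e.g. an AFS callback mapping through the NAT) is kept alive.
    # Assumes a Linux NAT host with nf_conntrack loaded; writing the value needs root.
    from pathlib import Path

    SYSCTLS = [
        "/proc/sys/net/netfilter/nf_conntrack_udp_timeout",
        "/proc/sys/net/netfilter/nf_conntrack_udp_timeout_stream",
    ]

    for path in SYSCTLS:
        p = Path(path)
        if p.exists():
            print(f"{path} = {p.read_text().strip()} seconds")
        else:
            print(f"{path} not present (conntrack module not loaded?)")

    # To extend the idle timeout to e.g. 30 minutes (illustrative value, run as root):
    #   Path(SYSCTLS[0]).write_text("1800\n")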
03:55 <ianw> http://paste.openstack.org/show/796095/
03:55 <ianw> that's all the error messages for this ip from 10th july
03:56 <ianw> ~0800 comes up a lot, i guess because that's the wheel build jobs which are afs users
03:58 <auristor> looking at the pattern it appears that the port mapping is good for about 60s and a client retries for approximately five minutes before giving up and marking the fileserver dead.
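A minimal sketch of the pattern analysis being done here by eye: parse the FileLog timestamps for one client IP and print the gaps between consecutive failures. The timestamp format follows the two lines quoted above; the file path and IP are illustrative.

    # Sketch: compute the gaps between consecutive probe failures for one client IP
    # in an OpenAFS FileLog.  Timestamp format follows the lines quoted above
    # ("Fri Jul 17 08:02:28 2020 ..."); file path and IP are illustrative.
    from datetime import datetime
    import re

    LOG = "FileLog"                 # e.g. the fileserver's FileLog
    CLIENT = "139.178.85.147"
    STAMP = re.compile(r"^(\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4})")

    times = []
    with open(LOG) as fh:
        for line in fh:
            if CLIENT in line and "failed" in line:
                m = STAMP.match(line)
                if m:
                    times.append(datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y"))

    for earlier, later in zip(times, times[1:]):
        print(f"{later}  (+{(later - earlier).total_seconds():.0f}s)")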
03:59 <auristor> I'm done for the night.
04:00 <ianw> auristor: thanks ... i'll keep an eye on things after kevinz's changes and hopefully things just start to work :)
04:04 *** raukadah is now known as chandankumar
04:05 <openstackgerrit> Ian Wienand proposed openstack/diskimage-builder master: Pre-install python3 for CentOS 7  https://review.opendev.org/741868
04:16 <fungi> auristor: the service providers in question are running services which are themselves open source software (at least in theory, but backed up by legal agreements over use of trademarks). we avoid hard dependencies on "freemium" hosted proprietary services (the sort often advertising themselves as "free for use by open source communities" and the like)
04:17 <fungi> basically we expect the source code for the services we're relying on to be free/libre open source software, whether we run it or someone else does
04:19 <fungi> we don't feel like we can legitimately represent open source development and yet rely on proprietary services to produce it, anything less would be hypocrisy on our part
04:59 *** ysandeep|away is now known as ysandeep
04:59 *** ysandeep is now known as ysandeep|rover
05:31 *** DSpider has joined #opendev
05:58 <openstackgerrit> Ian Wienand proposed openstack/diskimage-builder master: Pre-install python3 for CentOS 7  https://review.opendev.org/741868
06:05 <openstackgerrit> Ian Wienand proposed opendev/system-config master: gitea-git-repos: update deprecated API path  https://review.opendev.org/741562
06:13 <openstackgerrit> LIU Yulong proposed opendev/irc-meetings master: Change Neutron L3 Sub-team Meeting frequency  https://review.opendev.org/741876
06:16 <openstackgerrit> Ian Wienand proposed openstack/diskimage-builder master: [wip] Drop dib-python requirement from several elements  https://review.opendev.org/741877
06:28 *** marios has joined #opendev
06:41 <openstackgerrit> Merged zuul/zuul-jobs master: Remove copy paste from upload-logs-swift  https://review.opendev.org/741840
06:51 *** qchris has quit IRC
07:04 *** qchris has joined #opendev
07:27 *** dtantsur|afk is now known as dtantsur
07:34 *** tosky has joined #opendev
07:46 *** dougsz has joined #opendev
07:56 *** xiaolin has joined #opendev
08:01 *** moppy has quit IRC
08:03 *** moppy has joined #opendev
08:10 *** fressi has joined #opendev
08:24 <openstackgerrit> Ian Wienand proposed openstack/diskimage-builder master: Pre-install python3 for CentOS 7  https://review.opendev.org/741868
08:24 <openstackgerrit> Ian Wienand proposed openstack/diskimage-builder master: [wip] Drop dib-python requirement from several elements  https://review.opendev.org/741877
08:42 *** bolg has joined #opendev
08:48 *** ysandeep|rover is now known as ysandeep|lunch
09:27 <openstackgerrit> Sorin Sbarnea (zbr) proposed zuul/zuul-jobs master: Bump ansible-lint to speed it up  https://review.opendev.org/741897
09:27 *** sshnaidm|off is now known as sshnaidm
09:30 *** ysandeep|lunch is now known as ysandeep
09:31 *** ysandeep is now known as ysandeep|rover
09:55 <openstackgerrit> Merged opendev/irc-meetings master: Update neutron team meeting time  https://review.opendev.org/739780
10:08 *** tosky has quit IRC
10:09 *** fdegir has quit IRC
10:10 *** tosky has joined #opendev
10:10 *** fdegir has joined #opendev
10:29 *** tkajinam has quit IRC
10:31 *** dougsz has quit IRC
10:46 *** dougsz has joined #opendev
11:04 *** ysandeep|rover is now known as ysandeep|afk
11:31 *** ysandeep|afk is now known as ysandeep|rover
11:34 *** weshay_pto is now known as weshay_
12:23 *** ryohayakawa has quit IRC
13:26 *** xiaolin has quit IRC
13:54 <fungi> cacti says etherpad01 has over 900 connections right now (which i guess is around 450 clients)
13:54 <clarkb> I think it may be more than 2 connections per client now but I'm not sure of that
13:54 <clarkb> but ya it's a number of connections
13:55 <clarkb> the etherpad itself claims to have 87
13:55 <clarkb> (there are likely other pads in use at any given time though)
14:00 <fungi> yeah
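A minimal sketch of how one could cross-check the cacti number directly on etherpad01: count established connections to the HTTPS port and divide by an assumed two sockets per client. The port and the two-per-client ratio are assumptions taken from the conversation above; Linux only.

    # Sketch: count ESTABLISHED TCP connections to the local HTTPS port by reading
    # /proc/net/tcp{,6}, then estimate clients assuming ~2 sockets per client (an
    # assumption from the discussion, not a measured ratio).  Linux only.
    PORT = 443
    SOCKETS_PER_CLIENT = 2

    def established(path):
        count = 0
        with open(path) as fh:
            next(fh)                                        # skip header line
            for line in fh:
                fields = line.split()
                local_port = int(fields[1].split(":")[1], 16)
                state = fields[3]
                if local_port == PORT and state == "01":    # 01 == ESTABLISHED
                    count += 1
        return count

    total = sum(established(p) for p in ("/proc/net/tcp", "/proc/net/tcp6"))
    print(f"{total} established connections, ~{total // SOCKETS_PER_CLIENT} clients")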
14:21 <openstackgerrit> Rafael Folco proposed openstack/diskimage-builder master: Pre-install python3 for CentOS 7  https://review.opendev.org/741868
14:38 <clarkb> fungi: well even if a cloud isn't multiarch you could qemu a build as nodepool is doing
14:38 <clarkb> and having the resources available "locally" simplifies things
14:38 <clarkb> also looks like those packages may not be populated with content?
14:39 <fungi> clarkb: yep
14:39 <fungi> in reference to emulation
14:40 <fungi> i think the directory index there may be generated incorrectly, probably easier to browse via afs
14:40 <clarkb> do you see actual content in afs?
14:40 <fungi> i'm in the process of checking
14:41 <clarkb> step 0 confirm it is actually working (eg packages in the mirror) then add indexes for it to all mirrors then potentially consume it in nodepool jobs then maybe start adding in generic lists of packages to build
14:42 <fungi> ahh, yeah, the index is wrong, the tree is sharded by first letter
14:42 <clarkb> oh ya this is where we have the apache rewrite rules in the actual mirrors
14:42 <fungi> https://static.opendev.org/mirror/wheel/debian-10-aarch64/c/cryptography/
14:42 *** fressi has quit IRC
14:43 <fungi> so just browsing from static.o.o breaks, but right, our mirror vhosts would dtrt
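A minimal sketch of the first-letter sharding being described, i.e. the path a mirror's rewrite rule has to produce for a given wheel. The base URL is taken from the link above; everything else is illustrative, and the real rewrite lives in the mirror's Apache vhost config.

    # Sketch: map a requested package name onto the first-letter-sharded layout
    # used by the wheel mirror (e.g. cryptography -> c/cryptography/).
    BASE = "https://static.opendev.org/mirror/wheel/debian-10-aarch64"

    def sharded_url(package: str) -> str:
        name = package.lower().replace("_", "-")
        return f"{BASE}/{name[0]}/{name}/"

    print(sharded_url("cryptography"))
    # -> https://static.opendev.org/mirror/wheel/debian-10-aarch64/c/cryptography/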
14:43 <clarkb> cool so step 0 is done, we can probably add step 1 now?
14:43 *** mlavalle has joined #opendev
14:43 <fungi> i think so, yeah. i haven't looked at why it's not exposed on our other mirror servers
14:43 <clarkb> 2.9.2 is what nodepool is trying to use but on py38 not py37
14:43 *** fressi has joined #opendev
14:44 <clarkb> we can fairly easily switch nodepool to python3.7 though
14:44 <clarkb> except this is for debian python not python on debian
14:44 <clarkb> ugh
14:44 <clarkb> I wonder if it will still work
14:47 *** ysandeep|rover is now known as ysandeep|away
14:53 <fungi> it probably would
14:53 <fungi> though yes we're lacking py38 builds of those wheels it seems
14:53 <clarkb> fungi: ya because we're targeting the distro defaults but our containers are python built on top of debian
14:54 <clarkb> slightly different expectations between the two
14:54 *** ysandeep|away is now known as ysandeep
14:55 <fungi> yeah, and we'd need debian-bullseye nodes to have python3.8 packages for it
14:55 <fungi> or the stow-based ensure-python role
14:56 <fungi> or use the ubuntu-bionic wheels, they'd probably work on buster
14:56 <fungi> (bionic and focal both have python3.8 packages)
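A minimal sketch of the compatibility point being made here: a wheel tagged for CPython 3.7 is not accepted by an interpreter running CPython 3.8, which can be checked with the `packaging` library (assuming it is installed; the cp37/cp37m tag below stands in for the existing arm64 wheels).

    # Sketch: check whether cp37 wheel tags are accepted by the running interpreter.
    # On CPython 3.7 this prints True; on CPython 3.8 it prints False, which is why
    # the existing py37 wheels don't help a python3.8-based container.
    from packaging.tags import sys_tags

    supported = {(t.interpreter, t.abi) for t in sys_tags()}
    print(("cp37", "cp37m") in supported)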
15:12 *** weshay_ is now known as weshay|ruck
15:13 *** zbr|ruck is now known as zbr|rover
15:17 *** ysandeep is now known as ysandeep|away
15:19 *** jgwentworth is now known as melwitt
15:32 *** xiaolin has joined #opendev
15:48 *** marios is now known as marios|out
15:49 *** marios|out has quit IRC
15:57 <openstackgerrit> Merged zuul/zuul-jobs master: add-build-sshkey: Ensure .ssh exists, enable admin authorized_keys  https://review.opendev.org/740350
15:57 *** sshnaidm is now known as sshnaidm|afk
16:00 <fungi> looks like we almost reached 1k established tcp connections on the etherpad server
16:01 <fungi> cacti says the highest it got was 977
16:12 *** yoctozepto has quit IRC
16:13 *** yoctozepto has joined #opendev
16:21 *** xiaolin has quit IRC
16:22 *** xiaolin has joined #opendev
16:29 *** dtantsur is now known as dtantsur|afk
16:40 *** xiaolin has quit IRC
17:01 <weshay|ruck> what do folks think of py3 on centos7 https://review.opendev.org/#/c/741868/
17:02 <clarkb> weshay|ruck: I don't think we should preinstall it via dib in the yum element, but I think jobs can install it if they want it
17:03 <clarkb> we've finally gotten to a point where we don't preinstall a bunch of extra stuff which causes confusion and problems and I think adding python3 to centos7 in dib would get us back into that situation for some things
17:03 <clarkb> but in the jobs you can definitely install it
17:03 <weshay|ruck> ah.. I see
17:03 <weshay|ruck> fair point
17:12 *** dougsz has quit IRC
18:37 *** qchris has quit IRC
18:40 *** qchris has joined #opendev
18:57 *** tosky has quit IRC
19:56 <clarkb> ok going to send https://etherpad.opendev.org/p/E6m-M-3fTLwse2RkrDQL in a moment
19:57 <fungi> thanks
19:59 <fungi> for a moment i thought we might still be supporting git:// protocol, but a quick check confirms 9418/tcp isn't open
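A minimal sketch of the kind of quick check being mentioned, confirming whether the git protocol port is reachable on a host; the hostname is an assumption for illustration.

    # Sketch: test whether the git:// daemon port (9418/tcp) is open on a host.
    # The hostname is illustrative; a refused or timed-out connection means the
    # plain git protocol is not being served.
    import socket

    HOST = "opendev.org"   # assumption: the host being checked
    try:
        socket.create_connection((HOST, 9418), timeout=5).close()
        print("9418/tcp is open")
    except OSError as exc:
        print(f"9418/tcp not reachable: {exc}")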
20:16 <clarkb> and follow-up thread on advisory board sent
20:20 <clarkb> infra-root, if anyone else is able (thanks fungi for the early review), landing https://review.opendev.org/#/c/741277/ will enable us to make a gerritlib release to support https://review.opendev.org/#/c/741279/2. That second change also tests the first one via depends-on and it seems to work
20:31 *** shtepanie has joined #opendev
20:49 <openstackgerrit> Merged zuul/zuul-jobs master: Enable tls-proxy in ensure-devstack  https://review.opendev.org/741820
22:21 <ianw> clarkb: the problem is that we've started assuming python3 in some of the in-chroot tools; centos7 is the only distro that doesn't have a /usr/bin/python3
22:21 <ianw> so since it's part of the distro now, it seems reasonable to include it and thus allow dib to be python3-only without constraints
22:22 <clarkb> hrm
22:22 <clarkb> I see it's the actual build env that hits it
22:22 <clarkb> ianw: should we maybe have a cleanup that removes it after?
22:22 <clarkb> then we don't pollute the final result but dib chroot things can run?
22:23 <ianw> tbh i don't see using the packaged python3 really as pollution at this point
22:24 <clarkb> ya I guess if tripleo doesn't mind then it's probably ok, they are the group I would expect to have issues with it.
22:24 <ianw> i did push back on installing things like pyyaml with pip on the base image -- *that* i consider pollution after we went to a lot of effort to remove non-packaged components
22:24 <clarkb> python, pip, etc are all "namespaced" properly with python3 pip3 etc under centos7 ya?
22:24 <clarkb> so we won't accidentally flip those over to python3 with people being surprised
22:24 <fungi> it is, insofar as if that platform doesn't normally ship python3 but something in a job runs expecting python3, the job can pass without explicitly installing the package
22:25 <fungi> making it slightly less portable
22:25 <clarkb> fungi: ya but it won't run a thing expecting python2 and get python3 and test the wrong thing
22:25 <fungi> "this job works on centos 7 images, but oh yeah only if you remember to make sure you preinstall python3"
22:26 <clarkb> ianw: ^ do you think that case would be sufficient to have dib clean up after itself?
22:27 <ianw> this is true; the problem with cleanup is always that if there's another element that installs it, we go and remove things out from under it
22:27 <clarkb> gotcha
22:27 <fungi> also a great point
22:29 <ianw> i feel like practically, there's not new development going on in centos 7, it's more a situation of maintaining old branches.  so as mentioned, it's like a xenial situation where "python" is python2
22:31 <clarkb> ya I looked at the change and just didn't make the connection that it was the in-chroot scripts themselves that needed it
22:31 <clarkb> I think given that, the simplest thing to do is likely what you have proposed; then when centos7 is something we can stop caring about, it goes away
22:33 <ianw> it doesn't hit in the gate because i think it was svc-map or something that doesn't get called that got updated
22:33 <ianw> in theory it allows us to drop dib-python (https://review.opendev.org/#/c/741877/2) but i need to look into that
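For context, a minimal sketch of the kind of interpreter fallback that dib-python gives in-chroot tools: prefer /usr/bin/python3 and fall back to python2 on platforms (like centos7 today) that don't ship it. This is an illustration only, not the actual element.

    #!/usr/bin/env python
    # Sketch: pick the interpreter an in-chroot tool should use, preferring
    # python3 and falling back to python2.  Illustrative, not the real dib-python.
    import os
    import sys

    CANDIDATES = ["/usr/bin/python3", "/usr/bin/python2", "/usr/bin/python"]

    def pick_interpreter():
        for path in CANDIDATES:
            if os.access(path, os.X_OK):
                return path
        sys.exit("no usable python interpreter found in chroot")

    if __name__ == "__main__":
        interpreter = pick_interpreter()
        # Re-exec the requested script under the chosen interpreter.
        os.execv(interpreter, [interpreter] + sys.argv[1:])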
22:41 *** shtepanie has quit IRC
22:47 <ianw> clarkb: if you have a sec, the borg backup @ https://review.opendev.org/#/c/741366/ is ready for review
22:48 <ianw> to the dib/centos7 thing, it's more compelling if the cleanup that follows is working.  i'll look into that today
22:53 *** tkajinam has joined #opendev
22:53 <clarkb> ianw: ya I'll try to take a look though I'm fading fast. I had a very early morning
22:54 <ianw> no probs
22:59 *** mlavalle has quit IRC
23:11 <clarkb> left some notes, overall looks good but there are some minor things here and there
23:15 *** DSpider has quit IRC
23:16 *** sgw1 has quit IRC
23:25 *** sgw1 has joined #opendev
23:31 <ianw> thanks, will loop back

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!