21:00:21 #startmeeting scientific-sig
21:00:22 Meeting started Tue Mar 2 21:00:21 2021 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:23 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:26 The meeting name has been set to 'scientific_sig'
21:00:33 Hello Stig
21:00:40 Hi martial
21:00:45 #chair martial
21:00:46 Current chairs: martial oneswig
21:00:49 How's things?
21:01:17 P2302 is that the NIST federation work?
21:01:29 not bad, just a little crazy :)
21:02:29 Hopefully crazy in a good way
21:02:29 IEEE actually, the NIST work got published in NIST SP500-332
21:02:38 #link https://www.nist.gov/publications/nist-cloud-federation-reference-architecture
21:03:12 g'day oneswig martial o/
21:03:18 how is it going?
21:03:38 Hi janders, good thanks
21:03:55 doing well, thanks janders
21:04:35 Busy :-)
21:05:43 busy good? :)
21:05:51 Hi all. Bookmarked that link martial. Time to learn more about cloud federation.
21:05:54 I'm only 2 years late on this but I saw a really neat talk on large-scale Ceph administration: https://www.youtube.com/watch?v=niFNZN5EKvE
21:06:44 Well worth a look, it presents a very nice way of visualising the spread of utilisation across nodes.
21:11:01 It came up after a group we work with were adding larger drives to an existing Ceph cluster which pushed it to hitting hard limits of PGs/OSD. Sounds quite painful.
21:13:13 julianp: you were asking a while back about infrastructure. I think we are getting much closer now to having guests on the system.
21:13:37 Eeeexcellent. Thanks for thinking of me oneswig.
21:13:39 I wonder if we can ask Rion to have another Minio conversation
21:14:00 Rion having more fun with MinIO?
21:15:40 julianp: will be in touch soon I hope!
21:16:12 well we are heavy with Ceph on SSDs but if you remember we had a small video chat with Rion about why Minio was useful for deployments
21:16:25 Much obliged oneswig.
21:17:09 martial do you remember why Minio was considered useful?
21:18:14 ease of deployment seemed to be a core reason
21:18:34 Gotcha.
21:21:16 martial: ever compared it to Portworx?
21:22:40 I have a very small minio equivalent setup for testing but never tried portworx
21:25:28 any new setup for you Stig?
21:26:04 I've been banging my head on a real puzzler for the last few days. I have a set of hosts that take ~3s to run "time ssh centos@host hostname"
21:26:19 It's not DNS before you ask, pretty sure on that now :-)
21:27:22 Something auth-related, PAM?
21:27:26 what OS is the ssh connection originating from?
21:27:34 I'm still uncertain on the root cause. There are some smoking guns relating to SELinux blocking access to /etc/ld.so.cache that look suspicious.
21:27:37 CentOS 8.3
21:28:24 Hi priteau :-) auth is ssh keys - although there's plenty of pam modules involved in that login.
21:28:25 does 'setenforce 0' make any difference?
21:28:41 let me see what my NVMe cleaning lab is on
21:28:44 janders: disabling selinux and rebooting the node is not apparently helping...
21:29:11 It's bizarre because I have other nodes in the same environment for which the same test takes a more sensible 0.2s
21:29:26 oneswig disabling SEL both client and server side?
21:29:26 same hardware/kernel version?
21:29:37 oneswig melding servers would come in handy :)
21:29:38 Different hardware, same kernel
21:29:57 melding?
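
[Editor's aside: a minimal sketch of the timing comparison discussed here - running the same login against localhost, a known-good node and a slow node to separate network effects from host-local causes. The host names "goodnode" and "badnode" are placeholders, not machines from this log.]

  # Compare login latency: local loopback vs. a healthy peer vs. a slow node.
  for target in localhost goodnode badnode; do
      echo "== $target =="
      time ssh -o BatchMode=yes centos@"$target" hostname
  done
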
21:30:22 janders: making the client == the server, ie ssh localhost, has the same delay
21:30:48 ohhhh I have had this happen, it was a network device driver for me
21:31:11 Does `ssh -vvvv` show you where it gets stuck?
21:31:26 ssh localhost shouldn't be slowed down by a NIC issue though
21:31:35 oneswig does the problem seem to stick to either the piece of hardware in question being a client or a server?
21:31:41 priteau good point
21:32:09 pierre, agreeing with you but localhost also was slow
21:32:25 julianp: a bit. The ssh debug output isn't timestamped, alas. There was a message, I'll see if I can dig it out.
21:32:32 I bet it was the butler in the library with a candlestick.
21:32:46 now I can not remember if there was something else related to it
21:34:09 I've been running strace on client and server to try to spot something, that's my current effort.
21:34:18 Interesting though!
21:34:36 brb
21:34:45 Have you tried changing various other settings in sshd_config? GSSAPI maybe?
21:36:16 I was checking in our slack to see if we documented this one
21:36:39 I removed the GSSAPI auth method and that bought some time, a fraction of a second
21:36:48 no luck
21:38:08 silly question because that was part of our checklist: IPv6 disabled?
21:38:14 uninstalling cockpit also gained me about 0.1s. Small things.
21:38:37 IPv6 I haven't tried - worth a shot!
21:39:42 MTU? (although I've seen it cause hangs, not slowdowns)
21:40:33 (and it would probably not affect ssh localhost)
21:41:05 martial: just tried it, no joy alas
21:41:14 files listed first in your nsswitch.conf? so it uses /etc/hosts ?
21:41:50 oneswig a bit brute-force, but maybe worthwhile copying /etc of a "good" and a "bad" machine and doing a recursive diff?
21:42:11 priteau: I don't think it's MTU.
21:42:43 similar idea as above, you could try "UseDNS no" in your sshd_config
21:43:03 oneswig is console login normal (making sure it's ssh only)?
21:43:07 If I run "ssh-keygen" on a dodgy node, there's a long delay before it prompts me for the filename. That might be connected
21:43:35 martial: UseDNS no is set - been bitten by that one before :-)
21:44:04 okay not DNS, not IPv6 (use -4 :) )
21:44:04 As for the timestamps not being in the ssh output, you can add them using `ts` found in `moreutils`. TIL: ssh -vvvv some-host hostname 2>&1 | ts
21:44:32 janders: I'd need to get onto the BMC and it's one of those HPE boxes where you have to buy a license to use the console after the node boots...
21:45:12 julianp: that is new to me, neat trick!
21:46:00 The slow ssh-keygen is very odd, I don't think it should do any I/O
21:46:07 Network I/O I mean
21:46:12 Something related to OpenSSL then?
21:46:27 oneswig that licensing model is ridiculous!
21:46:53 oneswig: what does "cat /proc/sys/kernel/random/entropy_avail" say?
21:47:14 Oh that's a good idea priteau. We've run into that.
21:47:38 entropy_avail = 3443 - is that enough?
21:47:47 Should be
21:48:04 oneswig the two machines I'm currently working on have ~3400 and ~3800
21:48:10 priteau nice trick!
21:48:13 Doesn't look like I can get moreutils on CentOS - perhaps it's in EPEL?
21:49:02 I believe you can replicate the moreutils ts functionality using some awk.
21:49:28 julianp: now we are talking :-)
21:49:49 xD
21:52:19 ssh -vvvv -C condo hostname 2>&1 | awk '{ print strftime(),$0 }'
21:52:28 I'm clearly going to have to follow up to the SIG on this if I ever figure it out...
21:52:49 Yes please! My curiosity is piqued.
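
[Editor's aside: a consolidated sketch of the checks suggested in the discussion above - timestamped verbose output, GSSAPI, IPv4-only, SELinux, entropy and sshd_config settings. "slow-host" is a placeholder name; the strftime format string is added here, the log's one-liner used the default.]

  # Timestamp each line of ssh debug output (gawk's strftime stands in for moreutils' ts)
  ssh -vvvv centos@slow-host hostname 2>&1 | awk '{ print strftime("%H:%M:%S"), $0 }'

  # Rule out GSSAPI negotiation delays for a single connection
  ssh -o GSSAPIAuthentication=no centos@slow-host hostname

  # Force IPv4 to rule out IPv6 fallback
  ssh -4 centos@slow-host hostname

  # Host-side checks discussed in the meeting
  getenforce                                     # SELinux mode (setenforce 0 to test permissive)
  cat /proc/sys/kernel/random/entropy_avail      # low values can stall key operations
  grep -E 'UseDNS|GSSAPI' /etc/ssh/sshd_config   # confirm UseDNS no / GSSAPI settings
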
21:53:17 Anything can be curious, until you *have* to solve it
21:54:06 for AOB: couple of small points
21:54:15 nice trick julian
21:55:29 PTG (virtual) dates were announced - 19-23 April - https://www.openstack.org/ptg/
21:56:47 Next week (Wednesday 1100 UTC) I'm hoping to have a session on Jupyterhub and OpenStack. I think there's a good deal to cover on how to provide user-friendly integrations
21:57:09 Ooh. I'm interested in that.
21:57:35 julianp: hopefully not too early a start for you?
21:57:53 oneswig from my side, I completed the initial work on support for NVMe-native disk cleaning in Ironic
21:58:02 if you are interested, I can give a preso on that in a couple weeks
21:58:21 janders: That would be great, I'd love to see it.
21:58:46 initial = works best on all-NVMe nodes (it doesn't make much of a difference in hybrid HDD-NVMe configs or on SSDs)
21:58:58 Random but related question: Do you know, with software RAID in Ironic, can I label the RAID block devices? They come up in an arbitrary order.
21:59:29 oneswig not sure. Might be worth asking on #openstack-ironic
21:59:31 janders: would be good to hear more.
21:59:41 thanks janders, will do
21:59:50 OK, nearly time - final comments?
22:00:17 #endmeeting
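
[Editor's aside: for context on the NVMe-native cleaning item, this style of cleaning typically drives the drive's own secure-erase support through nvme-cli rather than overwriting blocks. A rough sketch of the underlying device-level operations; the device name is a placeholder and this is not necessarily how the Ironic change itself is implemented.]

  # Check what the controller advertises for format/erase support (FNA field)
  nvme id-ctrl /dev/nvme0n1 | grep -i fna

  # User-data secure erase (--ses=1); --ses=2 requests a cryptographic erase
  # where supported. Destroys all data on the namespace.
  nvme format /dev/nvme0n1 --ses=1
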