21:01:06 #startmeeting scientific-wg
21:01:07 Meeting started Tue Nov 29 21:01:06 2016 UTC and is due to finish in 60 minutes. The chair is b1airo. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:08 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:10 The meeting name has been set to 'scientific_wg'
21:01:14 Greetings all
21:01:15 #chair oneswig
21:01:15 Current chairs: b1airo oneswig
21:01:20 hello
21:01:21 hello
21:01:30 gday
21:01:45 hi all
21:02:21 how's things?
21:02:33 #link Agenda items https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_November_29th_2016
21:02:38 good thanksgiving break jmlowe, trandles ?
21:02:43 getting back to normal
21:02:47 yessir, thank you
21:02:47 All well here, had the research councils UK cloud workshop today
21:03:00 Good *
21:03:01 is everyone done with the mad travel schedule for the rest of 2016?
21:03:04 ah cool - did you chair / present something?
21:03:19 I was home for a day and took the family to my inlaws in TX
21:03:23 I presented, kind of chaired, but that aspect was fairly minimal
21:03:33 well attended oneswig ?
21:03:35 trandles, yes - though i already have 5 o/s trips in my calendar for 2017
21:03:54 ouch b1airo
21:03:54 trandles: 140-odd I believe - not a sell-out but a good crowd
21:04:06 b1airo: coming to the UK?
21:04:34 oneswig, hopefully! but that isn't one of the firm ones yet
21:04:53 (sorry I am late, meeting running over)
21:04:53 What do we need to do to win the Bethwaited account, I wonder...
21:05:02 Hi Martial
21:05:06 #chair martial
21:05:07 Current chairs: b1airo martial oneswig
21:05:11 hi martial
21:05:18 Hi Stig, Blair
21:05:26 Time to tee off?
21:05:45 yep, want to tell us about your vxlan results?
21:05:52 I'll poke Bob and see if he's available
21:05:58 OK, lets do that...
21:05:58 o/
21:06:12 #topic TCP bandwidth over VXLAN and OVS
21:06:14 hi rocky_g
21:06:26 Hi rocky_g
21:06:57 So, this has been a bit of a saga ever since we realised how much our shiny new deployment with its 50G/100G network sucked in this respect
21:07:20 Hey! This meeting time works great for me...I just stay on after the TC
21:07:27 oneswig: I take it you get near line rate when doing bare metal?
21:07:36 We were seeing instance to instance bandwidth around 1.2gbits/s measured using iperf
21:07:49 jmlowe: bare metal, 1 stream TCP, 46 gbits/s
21:07:57 close enough
21:08:04 I was satisfied :-)
21:08:07 hello
21:08:11 Hi rbudden
21:08:15 sorry, the time change got me!
21:08:16 are your sysctls posted somewhere?
21:08:26 oneswig, 1 stream?! e5-2680v3 ?
21:08:34 what window size?
21:08:35 Not sysctls for that, didn't do any...
21:08:42 wow
21:08:53 jmlowe, +1
21:09:00 I know, I was quite pleased.
21:09:21 think there might be an element of VW tuning in play?? ;-)
21:09:27 Yeah, bare metal looks good, but the vm stuff really *sucks*
21:09:35 Anyway, we got VM-to-VM bandwidth up after much tuning
21:09:55 my sysctls: https://raw.githubusercontent.com/jetstream-cloud/Jetstream-Salt-States/master/states/10gige.sls
21:10:07 use them everywhere
21:10:31 First move was to turn off all those power-saving states and some things Mellanox support recommended disabling
21:10:40 That got us to around 2-3 gbits/s
21:10:43 oh.
21:11:16 Tried a new kernel after that - 4.7.10 - apparently there is better handling of offloading of encapsulated traffic further upstream
21:11:23 Got us to 11gbits/s
21:11:29 nice
21:11:58 Then we turned off hyperthreading - all those wimpy cores are fine for production but no good for hero numbers
21:12:08 Then we were up around 18gbits/s
21:12:25 Then I did VCPU-to-PCPU core pinning
21:12:30 that hit 21 gbits/s
21:12:31 oneswig, so where do Mellanox's shiny graphs (e.g. advertising CX-3 Pro 2+ years ago) come from...?
21:13:01 ahhh!
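[The sysctl tuning and iperf methodology mentioned above can be sketched as follows. These are commonly used high-bandwidth TCP settings, given here purely for illustration -- they are NOT the actual contents of jmlowe's 10gige.sls, and the buffer sizes are hypothetical.]

```shell
# Illustrative TCP tuning for 10G+ NICs (values are examples, not the
# Jetstream settings):
sysctl -w net.core.rmem_max=67108864
sysctl -w net.core.wmem_max=67108864
sysctl -w net.ipv4.tcp_rmem="4096 87380 33554432"
sysctl -w net.ipv4.tcp_wmem="4096 65536 33554432"
sysctl -w net.core.netdev_max_backlog=250000

# Measuring instance-to-instance bandwidth with a single TCP stream,
# as in the numbers quoted above:
#   on the server VM:  iperf3 -s
#   on the client VM:  iperf3 -c <server-ip> -P 1 -t 30
```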
i assumed you were pinning to begin with
21:13:14 That's CX-3 - and apparently the driver had a lot of capabilities back-ported to it which don't translate to mlx4 - that backporting process must be redone
21:13:40 b1airo: no - actually this was a proper nuisance
21:13:46 meanwhile CX-6 has been announced...
21:14:10 Certainly the next bit - NUMA passthrough - I had to build a new QEMU-KVM - what you get with CentOS is nobbled
21:14:35 the amount of fixes/improvements going into upstream kernels for cx-4 and cx-5 is staggering
21:14:35 even the qemu 2.3 from the ev repo?
21:14:36 ha yeah, 1990s
21:14:41 b1airo: announced for when?
21:15:07 jmlowe: Does that work for CentOS? Not sure if it does?
21:15:10 oneswig, same time as HDR presumably
21:15:30 b1airo: wait a minute, I thought that was announced at SC ... :-)
21:16:26 Anyhow, I built and packaged QEMU-KVM 2.4.1, set isolcpus to give the host OS + OVS 4 cores, and pinned the VM to the NUMA node with the NIC attached to it.
21:16:36 That got me to 24.3 gbits/s
21:16:39 indeed - don't know whether there are even any eng. samples yet though
21:16:44 this works just fine for us, has 2.3 in it, now if they could just get a non-antique libvirt http://mirror.centos.org/centos/7/virt/x86_64/kvm-common/
21:17:17 jmlowe: ooh, that's good - thanks, makes my life easier
21:17:32 jmlowe: there were a few posts about this on the ML recently if my memory is correct
21:17:50 jmlowe, you're talking about the rhev repos for centos?
21:17:50 I should read the ml more carefully
21:17:59 yeah, that's the one
21:18:27 oneswig, something of a saga then
21:18:34 I had to find an EFI boot ROM RPM from somewhere as well - the package I rebuilt from fedora had a load of broken links in it
21:18:45 and still only 50% of your bare-metal performance
21:18:46 Looks like they may even have QEMU 2.6 soon: https://cbs.centos.org/koji/buildinfo?buildID=13884
21:18:53 That's where I am now - but the joy of it is that a rising tide lifts all boats.
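[The isolcpus + CPU-pinning + NUMA-locality steps described above can be sketched roughly like this. The core numbers, NUMA layout, and vCPU count are hypothetical -- the right values depend on the host topology, which `lstopo` or `numactl --hardware` will show.]

```shell
# 1) Reserve 4 cores for the host OS + OVS at boot (kernel cmdline),
#    assuming a hypothetical 16-core node:
#      isolcpus=4-15
#
# 2) Pin guest vCPUs to the socket local to the NIC in the libvirt
#    domain XML (example for a 4-vCPU guest on NUMA node 0):
#      <cputune>
#        <vcpupin vcpu='0' cpuset='4'/>
#        <vcpupin vcpu='1' cpuset='5'/>
#        <vcpupin vcpu='2' cpuset='6'/>
#        <vcpupin vcpu='3' cpuset='7'/>
#      </cputune>
#      <numatune>
#        <memory mode='strict' nodeset='0'/>
#      </numatune>
#
# 3) Check which NUMA node owns the NIC:
cat /sys/class/net/eth0/device/numa_node
```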
21:18:56 what are mellanox doing to make this right?
21:19:21 From doing this tinkering, I got SR-IOV bandwidth raised from 33gbits/s->42gbits/s and bare metal to 46gbits/s
21:19:42 And have you considered writing this up as a superuser blog?
21:20:05 ha!
21:20:09 rocky_g: that would be fab. I am writing it up next time I get on a train - tomorrow - will share
21:20:19 Fantastic!
21:20:32 any thoughts as to how much of a difference if any there would be with linuxbridge vs ovs?
21:20:47 jmlowe: Ah, what a question
21:21:07 being probably the only linuxbridge guy here, I had to ask
21:21:16 I think there would be a positive uplift from ditching OVS, but tripleO has no means to deploy without it
21:22:15 There was more on this: the kernel capabilities gained in 4.x are being backported by RH to the 7.3 kernel
21:22:28 I haven't been following triple o, the latest install guides have dropped ovs? wondering if triple o would follow
21:22:38 Which makes the whole process much more attainable.
21:23:15 jmlowe: I'd need to check, I may be out of date but I hadn't seen that
21:23:18 so oneswig, had mellanox never looked at vxlan performance with CX-4 ?
21:23:58 b1airo: who knows? I'm sure they are busy people :-)
21:24:16 I can't fault their efforts to get a solution once the issue was clear
21:25:02 yes, us either - just think their testing leaves something to be desired
21:25:06 o/
21:25:10 #action oneswig to write up a report and share the details on reproducing
21:25:13 Hi leong
21:25:47 OK next topic?
21:26:15 #topic telemetry and monitoring - research computing use cases
21:26:44 OK this is one of our activity areas for this cycle, I wanted to get some WG thoughts down
21:27:08 I've got two use cases
21:27:14 so there was a good conversation on the ML recently
21:27:18 I think there are specific use cases we like that others don't need
21:27:40 #link etherpad for brainstorming
21:27:43 #link https://etherpad.openstack.org/p/scientific-wg-telemetry-and-monitoring
21:28:08 martial: got a link to the mailing list thread?
21:28:18 they mentioned Collectd
21:28:23 let me find it out
21:28:35 martial: thanks
21:29:08 for the telemetry and monitoring? are we looking at "enhancing" existing related openstack projects to support those features/use-cases of scientific?
21:30:15 leong: interesting question. Lots of problems with existing projects
21:30:21 I've added my 3 current active use cases to the etherpad
21:30:42 jmlowe: what do you mean by the first?
21:31:42 FYI, we are developing a solution in house for telemetry aggregation
21:32:04 Great one b1airo
21:32:15 ok.. back step a bit.. are we aiming to create a user story in PWG and then perform gap analysis, then decide the best solution from there?
21:32:41 I have to find it in the archive, but it was an email: [Openstack-operators] VM monitoring suggestions
21:32:52 for every hour there is an event generated in the ceilometer (now new project that I can't remember) database of instance exists with a start and end time up to the next hour, the cpu count, project, user and that can relatively easily be adapted to look like a hpc job and can be fed into existing hpc job reporting systems
21:32:53 leong: that's a likely path I'd guess. We already have a federation user story in review
21:33:05 great oneswig!
21:33:34 gnocchi
21:34:06 cloudkitty
21:34:16 having initial discussion on etherpad is a good start.
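[jmlowe's idea above -- adapting hourly "instance exists" audit events into HPC-style job records for existing reporting systems -- could look roughly like this. This is a sketch only: the field names are illustrative, not the exact ceilometer event payload schema.]

```python
# Map one hourly instance-exists audit event onto an HPC-accounting-style
# job record (jobid/user/account/ncpus/walltime).
from datetime import datetime

def exists_event_to_job(event):
    """event: dict with audit-period timestamps (ISO 8601), instance,
    user, project, and vCPU count. Returns a job-record dict."""
    start = datetime.fromisoformat(event["audit_period_beginning"])
    end = datetime.fromisoformat(event["audit_period_ending"])
    return {
        "jobid": event["instance_id"],
        "user": event["user_id"],
        "account": event["project_id"],
        "ncpus": event["vcpus"],
        "walltime_s": int((end - start).total_seconds()),
    }
```

A stream of these records, one per instance per hour, can then be fed into whatever job-reporting pipeline the site already runs.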
Moving that discussion towards a PWG user story will be able to keep track of the historical viewpoint/discussion on gerrit
21:34:42 our solution relies on a python library using psutil and we are also using ganglia
21:34:47 leong: this is how the other story first took shape
21:34:57 martial: got any documentation online for it?
21:35:23 there are new projects evolving, including gnocchi, aodh which might be able to meet the needs.. however, without a formal description of the problem/use-case, it is hard to comment
21:35:44 oneswig: giving a link to my team member to the etherpad to describe it
21:36:24 I'm looking at doing something with some civil engineers who have a stream of time series data coming from traffic cameras, they need to have continuous ingestion and aggregation, I'm going to try them with gnocchi and get them off of their 5TB ms sql db
21:36:25 it is also worthwhile to mention/document what existing solutions are adopted by existing scientific users in the User Story.
21:36:33 #link previous user story on federation - please review https://review.openstack.org/#/c/400738/
21:36:54 jmlowe: I need to get you in touch with a colleague of mine
21:37:22 jmlowe: and our Data Science team that does work on traffic camera data :)
21:37:23 jmlowe, o_0
21:37:30 leong: so one relevant shortcoming is I don't believe there is any way to transiently enable high-resolution telemetry - just as an example
21:37:48 I suspect there are a lot of civil engineers grabbing data from their state's DOT and we could probably make a thing that several of them could use
21:38:39 jmlowe, i think you will be happy with gnocchi - we have it deployed in nectar and it seems to work well
21:39:07 jmlowe: 5TB of SQL...
21:39:13 I've been using gnocchi since the 1.x series
21:39:32 each major release has been an order of magnitude improvement
21:39:43 We were using Influx as backend but are stuck now, what does Gnocchi use nowadays?
21:40:05 oneswig, stuck why?
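[Gnocchi's approach -- pre-aggregating raw points into fixed granularities at write time, so query cost does not grow with retention -- can be illustrated with a toy model. This is a rough sketch of the idea only, not gnocchi's actual API or storage format.]

```python
# Toy model of write-time pre-aggregation: bucket raw (timestamp, value)
# points by a fixed granularity and keep one aggregate per bucket.
def aggregate(points, granularity_s, method=max):
    """points: iterable of (unix_ts, value) pairs.
    Returns {bucket_start_ts: aggregated_value}."""
    buckets = {}
    for ts, value in points:
        buckets.setdefault(ts - ts % granularity_s, []).append(value)
    return {start: method(vals) for start, vals in sorted(buckets.items())}
```

In the real system, an archive policy defines several such granularities (and retention per granularity) up front, so the choice of aggregation resolution is made at write time rather than query time.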
because they ripped influx out?
21:40:09 file, and ceph, and one other
21:40:26 if you use ceph make sure you are on the 3.x series
21:40:34 fluent maybe?
21:40:40 http://gnocchi.xyz/architecture.html#how-to-choose-back-ends
21:41:10 oneswig, we (well really sorrison) put influx support back and redesigned the driver
21:42:08 Others have said the Ceph backend kind of consumes the ceph - needs its own dedicated cluster - I wonder what rate of time-series metrics is attainable
21:42:12 not sure where the reviews are at - but pretty sure it is all going upstream
21:42:32 I have very nice things to say about Gordon Chang, one of the gnocchi devs, he's been immensely helpful
21:42:40 b1airo: that's good, but I don't think we'd pay Influx for the clustered backend
21:43:12 don't need to
21:43:38 i could get sorrison to join next week and give us the low-down
21:43:50 the ceph backend in the 1.x series relied too heavily on xattrs which didn't scale, the 2.x series created too many new objects which led to lock contention and slow ops warnings, the 3.x series has been problem free
21:43:55 b1airo: which timezone works best for sorrison?
21:44:18 probably next week - he rows in the mornings
21:44:59 b1airo: sounds good to me
21:45:50 oneswig, this might(?) be the tree he's been working on: https://github.com/NeCTAR-RC/gnocchi/tree/nectar/2.2
21:47:20 What do people use for monitoring the health of OpenStack itself? We have used Monasca, the agent gathers useful data out of the box
21:48:05 oneswig, how did you find the setup ?
21:48:15 we've been using zabbix, because it was there, I'd love to use something better
21:48:21 found the ML thread: http://lists.openstack.org/pipermail/openstack-operators/2016-November/012129.html
21:48:41 oneswig: You mean the OpenStack control-plane?
21:48:49 nagios is one
21:48:57 My team mate took quite a few unhappy days on it - Monasca seems to have no concept of a lightweight solution :-)
21:48:59 we use naemon at PSC for most of our monitoring
21:49:14 oneswig: we use Nagios with plugins from https://github.com/cirrax/openstack-nagios-plugins
21:49:36 rbudden: got a link to that?
21:50:02 oneswig: http://www.naemon.org
21:50:03 leong: yes - for example we get historical data of per-service CPU & RAM consumption
21:50:07 i believe it's a fork from Nagios
21:50:10 has anyone looked at the Vitrage project? I know it's supposed to be root cause analysis, but what does that project use to capture and store their info?
21:50:44 Wasn't Zabbix also an evolution of Nagios? I sense a bake-off coming
21:50:47 the attractive thing about monasca is that it understands openstack - nagios is easy for monitoring process and service state, but what about all the stuff flying around on the MQ
21:51:03 we use LDMS/OVIS https://ovis.ca.sandia.gov/mediawiki/index.php/Main_Page
21:51:05 rocky_g: saw the keynote - was blown away - not seen anything since
21:51:20 how heavy is heavy when it comes to monasca
21:51:54 we use nagios and ganglia at the host level, elk for api data
21:52:09 jmlowe: It uses a lot of Apache stack - Kafka, Storm, etc.
21:52:36 Glued together with data-moving services of its own
21:52:38 This is all great info to capture on the etherpad....
21:52:53 rocky_g: I'm going to print the whole thing out and put it on my wall :-)
21:52:58 must keep from making an ouroboros
21:53:13 rocky_g: doing what I can to make that happen :)
21:53:38 jmlowe: ouroboros - I have learned something today
21:54:34 Final issues to lob in: who monitors OpenStack event notifications, and how?
21:54:57 as in the queues oneswig ?
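[The Nagios-family tools discussed above (Nagios, naemon, and the linked cirrax plugins) all share the same plugin convention: print one status line, exit 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN. A minimal sketch of a check in that style, applied to the queue-depth alerting mentioned later -- the thresholds here are hypothetical:]

```python
# Classify a message-queue "ready" count using the standard Nagios
# plugin exit-code convention: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def check_queue_depth(ready, warn=100, crit=1000):
    """Return (exit_code, status_line) for a ready-message count."""
    if ready >= crit:
        return CRITICAL, "CRITICAL - %d messages ready" % ready
    if ready >= warn:
        return WARNING, "WARNING - %d messages ready" % ready
    return OK, "OK - %d messages ready" % ready
```

A real plugin would fetch `ready` from the broker (e.g. the RabbitMQ management API), print the status line, and `sys.exit(code)` so the scheduler can act on it.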
21:54:59 oneswig: I've played with it a tiny bit using splunk
21:55:00 One aspect of Monasca I quite like is that it hoovers up everything and anything
21:55:29 b1airo: json blobs that get thrown out whenever nova/cinder/etc does anything useful
21:55:35 those things
21:55:48 oslo.notification?
21:56:29 oh, I don't touch the json, I was feeding the normal logs to splunk and building searches
21:56:52 OK, does anyone have infrastructure that triggers alerts based on log messages? - trandles looks like you're up to this
21:56:58 yeah i was just wondering if you meant notification.error in particular, or everything
21:56:58 There are config options to capture those or not.
21:57:12 oneswig: exactly
21:57:30 we have nagios alerts on certain queues
21:57:33 b1airo: everything - want to reconstruct into timelines (os-profiler?)
21:57:39 just based on ready message count
21:58:16 we're almost out of time
21:58:18 our operations folks use zenoss to actually trigger alerts but I've never talked to them about feeding openstack data into it
21:58:59 priteau, we didn't talk about workload traces
21:59:06 next time?
21:59:17 Next time...
21:59:19 priteau: sounds good to me, fits in well
21:59:20 b1airo: Let's put it on the agenda for next time
21:59:23 does next week's TZ work for you?
21:59:28 or week after?
21:59:35 I join both meetings :-)
21:59:52 i thought you did - just a little hungover this morning...
22:00:07 (xmas parties have started already)
22:00:15 thanks all!
22:00:21 #endmeeting