21:00:35 #startmeeting scientific-sig
21:00:36 Meeting started Tue Jan 9 21:00:35 2018 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:37 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:39 The meeting name has been set to 'scientific_sig'
21:00:49 Hello hello hello
21:01:07 #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_January_9th_2018
21:02:05 Andrey tells me he's stuck in a taxi somewhere in Delhi, hoping to join us shortly
21:03:22 I had a discussion with some colleagues from Cambridge University earlier.
21:03:39 They have been benchmarking the effect of the Spectre/Meltdown fixes
21:04:13 On a Lustre router, apparently there is a 40% hit!
21:04:59 Are there other people testing their platforms?
21:05:07 filesystems/IO is bad post-patch
21:05:13 too much context switching :(
21:05:23 Hi Tim, so it seems - the worst case
21:05:33 we saw 30% IO loss on VMs, 2% CPU loss. one sec and I'll paste in my query earlier today on #scientific-wg
21:05:44 Hello Stig. David Daniel says hello. Had a meeting with him this morning. :)
21:05:44 Please do
21:05:45 hello
21:06:05 DDD - fantastic! You've made my day
21:06:19 we started testing as well. Our compute workloads don't seem affected in the initial benchmarks. I will have more info next week
21:07:04 we've started baremetal testing, but I don't have any results at the moment
21:07:20 There's been some discussion around whether there is an impact for RDMA, and in which modes of usage
21:08:14 Hi everybody
21:08:23 #link Here's an intriguing coincidence from a couple of months ago https://www.fool.com/investing/2017/12/19/intels-ceo-just-sold-a-lot-of-stock.aspx
21:08:41 Hi Mike, Bob, Stig
21:08:48 Hi jmlowe rbudden belmoreira et al
21:08:52 Hey martial_
21:08:55 #chair martial_
21:08:56 Current chairs: martial_ oneswig
21:09:51 So, it does appear that the worst case scenarios can be readily hit.
21:09:57 humm... not a coincidence :)
21:10:09 We have Broadwell architecture (compute: E5-2680v4, storage: E5-2630v4), Ceph Luminous, OpenStack Newton, qemu+kvm virtualization. First, after patching our hypervisors we saw a 2% CPU perf loss in the HPL benchmark running inside an unpatched CentOS 6.5 VM, plus a 30% I/O perf loss in the FIO benchmark inside the same VM. No further loss from patching the VMs. Finally, we patched the storage servers and again saw no further degradation. Better than I expected,
21:10:10 I'm curious if it plays out that way for others. We have NVMe in our storage, which might be amortizing the cost of I/O operations at the storage servers.
21:11:22 The numbers were very close to Red Hat's predictions
21:12:11 bollig: I'd guess, if it's a performance penalty incurred on every context switch, then it'll be more painful for NVMe than for other devices simply because they achieve more context switches, but the penalty is constant for each. Perhaps
21:13:48 What was the extent to which an unpatched guest could read data from the hypervisor?
21:15:09 This is one of those rare circumstances where bare metal looks like the security-conscious option
21:15:41 (too used to Slack, I want to +1 Stig's last comment)
21:16:10 ha, Slack is too easy!
21:17:40 OK, well interesting to hear people's experiences. I'm sure it's just early days.
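Several sites above are comparing HPL and FIO numbers before and after patching. For anyone repeating those runs, here is a minimal sketch (assuming a Linux kernel that exposes mitigation state under /sys/devices/system/cpu/vulnerabilities/, as patched upstream and RHEL/CentOS kernels do) for recording which mitigations were active on the host alongside each benchmark result:

    import os

    # Kernels with the Spectre/Meltdown patches report mitigation state here.
    VULN_DIR = "/sys/devices/system/cpu/vulnerabilities"

    def mitigation_status():
        """Return a mapping of vulnerability name -> kernel-reported status string."""
        status = {}
        if not os.path.isdir(VULN_DIR):
            return status  # pre-mitigation kernel: nothing is reported
        for name in sorted(os.listdir(VULN_DIR)):
            with open(os.path.join(VULN_DIR, name)) as f:
                status[name] = f.read().strip()
        return status

    if __name__ == "__main__":
        for name, state in mitigation_status().items():
            print(f"{name}: {state}")

Capturing this per host makes it easier to tell whether an I/O regression came from the hypervisor, the guest image, or the storage servers being patched.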
21:17:47 I really want to know the definitive answer to that question as well: do I need to make sure my guests are patched, or is qemu and hypervisor patching sufficient to ensure guests can't read more than their own memory?
21:18:45 that I don't know. We're rebuilding all of our base images, and looking for that same answer for existing VMs
21:18:47 I had some LINPACK numbers from Jonathan Mills at NASA, worst case was 50%, best case was 5%, seemed to vary linearly with N
21:19:04 jmlowe: if you find out, will you let us know - sure it'll be widely applicable to just about everyone in OpenStack
21:19:09 we are also patching all of our images as per usual
21:20:38 Andrey is ready, or thereabouts. He's sent ahead a presentation to share
21:20:52 #link SGX and OpenStack https://drive.google.com/file/d/1wBXVrd9v8GjyreFLET5nW7IhROaOf7A6/view
21:22:18 I have a more urgent need to patch, it seems that either my 2.1.26 i40e driver or the RHEL/CentOS 3.10.0-693.5 kernel is leaking about 20GB/month, and it's starting to trigger the OOM killer on my instances
21:22:19 Hi everyone, I was offline on a plane and had much less time than I expected, so please, consider it a first discussion
21:23:18 I will then come back and give more details, including on how we are using it with OpenStack (Ironic, LXD, KVM, and Magnum+Kubernetes)
21:24:20 Hi Andrey, thanks for joining us today
21:24:41 #topic SGX on OpenStack
21:25:29 Is this specific to Skylake? I've seen previous articles on it that appear to date from 2013
21:26:22 yes, it is specific to Skylake
21:26:31 the previous discussion was based on simulations
21:27:25 How much of a limitation is it that the code in the enclave can't make system calls?
21:30:14 there are tools to help circumvent this
21:30:43 abrito: Are there uses for this as protection for bare metal infrastructure from malicious users?
21:30:45 for example, SCONE is a tool that places one thread inside the enclave and another outside the enclave; the one outside does the syscalls
21:31:12 the one inside takes care that there is no leak (e.g., it encrypts data going to disk operations)
21:31:36 It seems to be targeted as an application-level tool rather than a system-level tool. Is that accurate?
21:31:51 one application on bare metal would be to store certificates and other secrets in the enclave
21:32:02 exactly
21:32:21 Intel has just released a PoC for doing that
21:32:38 Oooh, got a link?
21:32:45 just a sec
21:33:19 this one is from a project partner: https://github.com/lsds/TaLoS
21:34:36 https://github.com/cloud-security-research/sgx-ra-tls
21:36:02 Is there a performance penalty for accessing memory within the enclave, or executing code within the enclave?
21:38:01 in the graph on slide 8, you can see something about this
21:38:12 if the memory footprint is small
21:38:30 you see no penalty
21:38:56 this would be the case if you are, for example, streaming the data through the protected application
21:39:26 Ah, the y axis is relative slowdown of running in an enclave?
21:39:27 if you exceed the EPC size (e.g., the 128 MB) then it needs to decrypt and re-encrypt the data
21:39:39 adding a huge overhead
21:40:10 yes, the Y axis is the overhead compared to regular C code running outside an enclave
21:40:47 What is the difference in the code generated?
21:41:13 Have you found it easy to work with?
21:41:41 it allocates a piece of the "enclave memory", and the secure functions and their data are allocated inside it
21:42:17 Just curious, what's a secure process for loading code into the enclave?
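Before scheduling enclave workloads at all, the first question on a given hypervisor or bare metal node is whether the CPU advertises SGX. A minimal sketch follows, with the assumption that the kernel is new enough to report the 'sgx' flag in /proc/cpuinfo; setups using the out-of-tree SGX driver may not see the flag there and would need to query CPUID leaf 0x7 directly instead:

    def cpu_flags(path="/proc/cpuinfo"):
        """Collect the set of CPU feature flags reported by the kernel."""
        flags = set()
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    flags.update(line.split(":", 1)[1].split())
        return flags

    if __name__ == "__main__":
        # Assumption: the running kernel exposes the 'sgx' feature flag.
        print("SGX advertised:", "sgx" in cpu_flags())

A check like this could feed a host aggregate or Ironic trait so that only enclave-capable nodes receive SGX workloads.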
21:42:32 there is some learning curve if you are using the Intel SDK directly, but if you do not need syscalls for the confidential algorithms/transformations, then it is mostly boilerplate code
21:43:00 can you rephrase that last question?
21:43:28 Just wondering how we trust the code as it is transferred in. I guess there is some code signing or similar?
21:43:38 yes
21:43:50 Once the code is executing you can do a remote attestation
21:44:22 the remote attestation starts with an external participant asking the application to get a "quote" of itself
21:44:40 the quote is produced by the processor the application is running on
21:45:17 then the application gives you the signed quote, and if you have never trusted that processor before
21:45:49 then you go to the Intel Attestation Service (IAS) for it to confirm that the quote was emitted by an SGX-supporting processor
21:45:54 using the current firmware
21:46:16 that has not been blacklisted and that is running in the correct mode (i.e., not debug or simulated mode)
21:46:45 if you already trusted the processor, you do not need to go to the Intel service again
21:47:10 interesting - so if you trust Intel then you can also trust the cpu
21:47:40 :-)
21:48:09 did I understand properly: it creates a hardware memory map in the enclave?
21:48:15 yes, for this version of SGX you have to trust Intel to tell you that the code is actually running in the correct mode and processor
21:48:44 martial_: during boot it sets aside a piece of memory to be used by the enclaves
21:49:11 I can see it being useful in apps where secrets are held. Do you think it will succeed for future cpu generations?
21:49:35 that piece of memory cannot be accessed by code other than the code from the enclave that allocated it on creation
21:49:52 oneswig: yes, I am optimistic
21:50:20 good to hear it.
21:50:26 one thing is that recently, Azure and IBM have mentioned that they are making test services available that use SGX
21:50:27 What will you do next with it?
21:50:38 e.g.: SGX-capable VMs
21:51:05 I also heard that enclave memory is likely to become larger in the short term
21:51:19 my next step is to run kubernetes jobs on it
21:51:41 using code in python running inside the enclaves
21:51:46 with the enclave holding something for the containerised app, or something for kubernetes itself?
21:52:15 there is not much to be done with kubernetes itself
21:52:25 monitoring needs to be done differently
21:52:31 so that you consider the EPC usage
21:52:51 otherwise you can hit the 128 MB limit and suffer the performance hit
21:53:15 but once you have the code running, it's mostly a matter of configuring the right tools
21:53:44 you also want, for example, that the tasks in the task queues are encrypted
21:53:58 and only workers that have been attested hold the keys
21:54:44 we (not only UFCG, but the SecureCloud consortium) are also working on monitoring and scheduling tools
21:55:22 so how different is it from an HSM?
21:55:25 I'd be interested to know where you take it
21:55:49 it is an HSM; the advantage is that you already have it on your table
21:56:11 it is not a separate hardware piece
21:56:26 the downside is that not many Xeons have it
21:56:36 thank you, that helps
21:56:43 OK - anything more for Andrey - we are close to time
21:57:22 there are people also looking at SGX for barbican
21:57:35 I was wondering about that...
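To make the attestation flow just described more concrete, here is a rough Python sketch of the trust decision. The quote retrieval and the Intel Attestation Service (IAS) check are hypothetical stubs (fetch_quote, ias_verify); a real implementation would use the SGX SDK and the IAS API, or an approach like the sgx-ra-tls project linked earlier.

    from dataclasses import dataclass

    @dataclass
    class Quote:
        """A processor-signed measurement of the running enclave."""
        processor_id: str
        body: bytes

    trusted_processors = set()  # processors we have already attested

    def fetch_quote(enclave_app):
        # Hypothetical helper: ask the running enclave application for a quote
        # signed by the processor it runs on.
        raise NotImplementedError("replace with a real SGX quote request")

    def ias_verify(quote):
        # Hypothetical helper: submit the quote to IAS, which confirms it came
        # from a genuine, non-blacklisted SGX processor on current firmware,
        # running in the correct (non-debug, non-simulated) mode.
        raise NotImplementedError("replace with a real IAS verification call")

    def attest(enclave_app):
        quote = fetch_quote(enclave_app)
        if quote.processor_id in trusted_processors:
            return True  # already trusted: no need to contact Intel again
        ok = ias_verify(quote)
        if ok:
            trusted_processors.add(quote.processor_id)
        return ok

This mirrors the verbal description above: only the first contact with a given processor requires a round trip to Intel; subsequent checks can trust the cached result.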
21:57:35 exactly, because of its easier availability
21:57:54 would be great to use it for holding secrets 'at rest'
21:58:04 yes
21:58:27 abrito: I was wondering about this, in 2016 the Barbican team did a hands-on during the Barcelona summit and they had an HSM setup
21:58:39 (cannot remember the hardware now)
21:58:42 OK, we must press on
21:58:43 thank you Andrey - really interesting to hear about your work
21:58:53 So, I would like to thank you for the invitation
21:59:06 and apologize for the terrible slides
21:59:20 really cool indeed, thank you for explaining this to us
21:59:24 I should have been more pessimistic about the time
21:59:41 #topic AOB
21:59:42 I had one item to raise - PTG
21:59:42 The Scientific SIG have been invited to have a slot at the PTG in Dublin, and I'm planning to go as there'll be at least 5-6 members present
21:59:56 I will do a second round, and explain details
21:59:59 The possibility of using a $200 NUC as a backing store for barbican is really exciting
22:00:18 oneswig: thanks for carrying the torch for configurable deployment steps in Ironic at the Dublin PTG
22:00:29 if you need anything documenting our use case, let me know offline
22:00:42 Mike: yes, sounds interesting indeed :)
22:01:39 Anything development-centric on people's wish lists, let's have it before then
22:01:39 Ironic deployment steps - check
22:01:39 We'll also aim to cover some of the CERN/SKA subject areas
22:01:39 but anything else to cover - have a think and do follow up before the PTG, which is in late February
22:01:40 we are out of time - anything else to raise?
22:02:56 good for me
22:02:59 thanks Stig
22:03:25 thx for the topic today, very interesting
22:04:00 thanks everyone!
22:04:43 #endmeeting
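On the Barbican thread above: whatever ends up backing the secret store (a network HSM, a NUC, or eventually something SGX-based), the client-side flow stays the same. A minimal sketch, assuming python-barbicanclient and keystoneauth1, with placeholder endpoint and credentials:

    from keystoneauth1 import session
    from keystoneauth1.identity import v3
    from barbicanclient import client

    # Placeholder credentials and endpoint - substitute your own cloud's values.
    auth = v3.Password(auth_url="https://keystone.example.com:5000/v3",
                       username="demo", password="secret",
                       project_name="demo",
                       user_domain_name="Default",
                       project_domain_name="Default")
    barbican = client.Client(session=session.Session(auth=auth))

    # Store a secret 'at rest' in Barbican and read it back by reference.
    secret = barbican.secrets.create(name="app-signing-key",
                                     payload="-----BEGIN PRIVATE KEY-----...")
    secret_ref = secret.store()
    print("stored:", secret_ref)
    print("payload:", barbican.secrets.get(secret_ref).payload)

The point of an SGX (or HSM) backend is that only this API path ever sees the plaintext; the key material itself stays inside the protected store.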