11:00:29 #startmeeting scientific-sig
11:00:30 Meeting started Wed May 22 11:00:29 2019 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:31 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:33 The meeting name has been set to 'scientific_sig'
11:00:40 up up and away
11:00:57 #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_May_22nd_2019
11:01:06 Hi all
11:01:21 Good morning ;)
11:01:23 hi All!
11:01:53 Let's get started ...
11:02:02 SDN time!
11:02:07 #topic coherent management of SDN fabrics
11:02:13 janders: what's been going wrong?
11:02:44 as discussed in the Scientific SIG at the PTG, I took the challenge of SDN consistency issues to the Neutron folks
11:03:00 so - to recap - what is the challenge in the first place?
11:03:30 I will use SuperCloud's Mellanox SDN as an example (but I believe this challenge is platform agnostic)
11:04:45 in my deployment we occasionally hit a scenario where for whatever reason (packet drops, system overload, ...) Neutron requests a change on the SDN
11:05:00 but that change never completes, despite Neutron thinking it has completed
11:05:32 First question - if this is a request via TCP, how can packet drops be a factor?
11:05:52 there are several layers of APIs on the SDN side so this might happen between them
11:06:09 good question oneswig, unfortunately I don't have the answer
11:06:32 OK, carry on...
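The failure mode described above - the controller returns success but the change never lands on the fabric - suggests a write-then-verify pattern rather than trusting the status code. A minimal sketch, assuming a hypothetical `SdnClient`; this is not the real NEO/UFM API:

```python
# Hypothetical sketch: instead of trusting a 200 OK from the SDN controller,
# re-read the resource and confirm the requested state actually landed.
# SdnClient and its methods are illustrative, not a real vendor API.

class SdnClient:
    """Toy in-memory stand-in for an SDN controller REST API."""
    def __init__(self):
        self._ports = {}

    def set_port(self, port_id, config):
        # A real controller may return 200 OK before lower layers converge.
        self._ports[port_id] = dict(config)
        return 200

    def get_port(self, port_id):
        return self._ports.get(port_id)


def apply_and_verify(client, port_id, config, retries=3):
    """Apply a port config, then read it back until it matches."""
    status = client.set_port(port_id, config)
    if status != 200:
        return False
    for _ in range(retries):
        if client.get_port(port_id) == config:
            return True
    return False
```

In a real driver the read-back would go to the lowest layer of the stack that is observable (ideally the switch itself), since the inconsistency here arises between the API layers.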
11:06:34 I'd think NEO<>UFM traffic is TCP as well so this should not be happening - but it does
11:06:56 it's rare but when it happens it's a massive PITA to reconcile all the sources of truth
11:07:28 so basically we're looking at a scenario where Neutron requests something from the SDN, SDN returns 200 OK but for some reason things aren't right
11:07:43 (or another one where things were right but for some reason stopped being right)
11:08:16 I believe in stock-standard Neutron/OVS, the L2 agent is polling the port status and if it finds any mismatches it will rectify those
11:08:22 That sounds like it could be frustrating indeed
11:08:38 this is missing from nearly all SDN solutions that I know of - either API or ansible driven
11:09:08 Neutron configures ports once - and when it gets a 200 it never verifies the config again (let alone fixing anything)
11:09:46 The 200 OK response when a change has not been applied might be a specific issue with this driver.
11:09:56 this is not cool so as we discussed I've spoken to Miguel about this at the PTG. He was supportive and promised to have a look at this with the team - and asked me to open the bug:
11:10:07 https://bugs.launchpad.net/neutron/+bug/1829449
11:10:08 Launchpad bug 1829449 in neutron "Implement consistency check and self-healing for SDN-managed fabrics" [Undecided,New]
11:10:19 good bug, great read.
11:10:39 the Neutron guys pointed me to this blueprint:
11:10:40 https://review.opendev.org/#/c/565463/12/specs/stein/sbi-database-consistency.rst
11:10:54 it's interesting - it seems people have been looking at similar issues for a while
11:11:40 This spec covers similar areas and I'm aware that the mlnx SDN driver needed better concurrency handling on the southbound API to it.
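The OVS L2-agent behaviour described above - poll port status and rectify mismatches - is the reconciliation loop most SDN mechanism drivers lack. A minimal sketch of the idea; all names are illustrative, not Neutron code:

```python
# Sketch of an OVS-agent-style reconciliation loop: compare Neutron's desired
# port state with the fabric's actual state and repair any drift, rather than
# configuring once and never verifying again.

def reconcile_ports(desired, actual, fix_port):
    """Return the IDs of ports whose fabric state diverged and were repaired.

    desired/actual map port_id -> config dict; fix_port pushes a config
    back down to the fabric (hypothetical callback).
    """
    repaired = []
    for port_id, want in desired.items():
        if actual.get(port_id) != want:
            fix_port(port_id, want)   # push the desired config back down
            actual[port_id] = want
            repaired.append(port_id)
    return repaired
```

Run periodically, this makes the fabric self-healing for both the "200 OK but nothing happened" case and the "was right, stopped being right" case.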
11:11:52 oneswig: I agree there is room for improvement on the Mellanox SDN side, but even if the SDN stops returning 200 OK before the fabric is configured, I bet we'll still run into an issue or two where for some reason state is lost somewhere
11:12:42 I think you and mgoddard both hit issues with races on this interface, right?
11:12:44 plus it would be good if it can fix itself when an operator makes a mistake
11:12:55 I think so
11:13:04 I think what was killing us was garbage in the journal
11:13:40 neutron would try replaying the journal, (D)DoSing the SDN API
11:13:49 with requests that were no longer valid
11:14:18 from my understanding this spec touches on both things - consistency checking/enforcement (which I like)
11:14:46 and also on the journal which was supposed to help with some of these challenges - but in my experience it does not always help, sometimes it makes things worse
11:14:59 but overall yeah I think there's value in this blueprint
11:15:23 what I wanted to get out of discussing this here is to find out if you think implementing this blueprint would fix the issues you're seeing
11:15:58 personally I would like to see generic functionality in neutron which checks the state of the fabric for inconsistencies and resolves these
11:15:59 PLUS
11:16:18 It makes reference to Open Daylight's journal but mostly is retrofitted from OVN's implementation.
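The "garbage in the journal" problem above - replay hammering the SDN API with requests for resources that no longer exist - can be mitigated by validating entries before replay. A sketch under an assumed journal-entry format (the real journal schema differs per driver):

```python
# Hypothetical journal pruning: before replaying, drop entries that refer to
# Neutron resources which no longer exist, so replay doesn't (D)DoS the SDN
# API with requests that can never succeed. Entry format is illustrative.

def prune_journal(journal, live_resource_ids):
    """Keep only journal entries that still refer to live Neutron resources."""
    return [entry for entry in journal
            if entry["resource_id"] in live_resource_ids]
```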
11:16:24 I would like the vendors to 1) work on general resiliency of their solutions but 2) also provide the functionality that the above would plug into
11:16:42 If there's an effort to make something cross-SDN, it would be good to see more consideration of that I think
11:16:58 janders: agreed
11:17:22 I know for sure Juniper users suffer with the same (neutron and the SDN getting out of sync + having to merge manually which is a nightmare)
11:17:31 Blair isn't here but I think he's had some of this with Cumulus too
11:18:25 when I was chatting to Miguel with Adrian from Mellanox, an Ericsson developer came and joined us, saying their stuff is suffering from this too
11:19:07 I was trying to get some feedback from Mellanox on this but my contacts are OOO - so we're probably looking at next week the earliest
11:19:18 Way back (Paris summit) I recall discussing this kind of issue with the team from Big Switch - part of their solution was to checksum current state and check against the state in the SDN controller. All port state cheaply compared in one move - only costly if they don't tally
11:22:46 get_inconsistent_resources: "Get a list of inconsistent resources for which the revision number from the aforementioned table differs from the standardattributes table."
11:22:57 (line 332)
11:24:01 sync_resource: "The method provides a way to handle cases in which the SBI needs to be called and bump the revision number once there has been success on syncing with the SBI backend. In most cases this will be called in the ``*_postcommit()`` methods."
11:24:09 (line 226)
11:24:31 if I'm reading this right these two functions should be able to resolve issues in the most common scenarios I'm seeing
11:24:49 I was interested by the treatment on existing implementations here: https://review.opendev.org/#/c/565463/12/specs/stein/sbi-database-consistency.rst@451
11:26:36 this is what Neutron is doing today, right?
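The Big Switch approach mentioned above - checksum all port state on both sides so the common case is one cheap comparison, with a per-port walk only when the digests disagree - can be sketched like this (illustrative code, not theirs):

```python
import hashlib
import json

# Cheap whole-fabric consistency check: hash the entire port table on each
# side; only diff per port when the digests don't tally.

def state_digest(ports):
    """Canonical JSON digest of an entire port table (port_id -> config)."""
    canonical = json.dumps(ports, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def find_drift(neutron_ports, fabric_ports):
    """Empty list when digests tally; otherwise the IDs of mismatched ports."""
    if state_digest(neutron_ports) == state_digest(fabric_ports):
        return []
    all_ids = set(neutron_ports) | set(fabric_ports)
    return sorted(pid for pid in all_ids
                  if neutron_ports.get(pid) != fabric_ports.get(pid))
```

Canonicalising with `sort_keys=True` matters: both sides must serialise identically or the digests will never match even when the state does.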
11:26:55 It's what OpenDaylight's Neutron driver is doing today, apparently
11:27:14 if that's the case it would work if it can't talk to the SDN, but if the SDN accepts the request, it's deleted from the journal and that's it
11:27:45 if something goes sideways afterwards, or the port "unconfigures itself" somehow, the journal won't help
11:27:54 I believe the Mellanox mechanism driver does exactly the same
11:29:04 I think get_inconsistent_resources and sync_resource could provide functionality to deal with these issues
11:29:37 That's something I think at the PTG we compared with a deep scrub in Ceph, right?
11:29:43 yes
11:32:02 so - that's where things are at in regards to this challenge at this point in time
11:32:11 I'm thinking what should we do next
11:32:26 The prerequisites (line 104) - do you think Neo could track a revision sequence number like this?
11:32:38 I'll try to run the blueprint by the Mellanox guys when they're back
11:33:07 good question! My guess is - the current version might not have that field, but I suppose it shouldn't be a big deal to add it..
11:34:03 It might be exposing an assumption that (for example) the OVN controller doesn't expect there to be other users making changes, where Neo is designed as a multi-user tool for which Neutron is one user.
11:34:05 I also believe there's a fair bit of activity around SDN at Mellanox so perhaps there are some new avenues that could be used to get this functionality
11:34:35 janders: biggest hope would be to roll Neo and UFM into one ...
11:34:51 agreed and I don't think that's completely off the table
11:35:23 although it won't happen overnight..
11:35:43 No indeed. And I wouldn't want to use it if it did!
11:35:51 haha! :)
11:36:08 do you think you would be able to run this blueprint by John and/or Mark for extra feedback?
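The revision-number mechanism under discussion can be illustrated with a small sketch: Neutron bumps a per-resource revision number, the driver records the revision it last pushed, and inconsistent resources are those whose synced revision lags. Names are illustrative, not the spec's actual code:

```python
# Sketch of the spec's revision-number scheme. neutron_revisions maps
# resource_id -> current revision (as in standardattributes); synced_revisions
# maps resource_id -> revision last successfully pushed to the SBI backend.

def get_inconsistent_resources(neutron_revisions, synced_revisions):
    """Resource IDs whose driver-side revision is missing or stale."""
    return sorted(rid for rid, rev in neutron_revisions.items()
                  if synced_revisions.get(rid, -1) < rev)

def sync_resource(resource_id, neutron_revisions, synced_revisions, push):
    """Call the SBI (hypothetical push callback), then bump the recorded
    revision only on success - failed pushes stay inconsistent and get
    retried on the next scrub."""
    if push(resource_id):
        synced_revisions[resource_id] = neutron_revisions[resource_id]
```

This is what makes the "deep scrub" analogy work: the check is a cheap integer comparison per resource, and only lagging resources cost an SBI call.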
11:36:52 I'm thinking after we've all looked through it, perhaps let's have one more discussion here (I will see if we can get the Mellanoxes here) and maybe then let's ask the Neutron folks if we could bring this up in their meeting?
11:36:57 I think there are questions to ask of this spec but it's pretty close. My main concern is whether it can be implemented elsewhere and to what extent it reinvents existing work in each implementation
11:37:15 all valid concerns
11:37:27 the approach I was thinking of taking is to get it right for one vendor plugin
11:37:39 and abstract the implementation away so that the others can use it too
11:37:44 janders: would definitely like to see this go further though.
11:38:04 yeah - I think this will literally make or break SDN-based deployments
11:38:06 you mean apart from the reference implementation in OVN?
11:38:09 people are struggling at scale
11:38:23 right, perhaps OVN could be that reference
11:38:41 when I made that statement first I wasn't aware to what extent this had already been investigated for OVN
11:38:46 I think it needs one in addition to OVN - otherwise it will include OVN-centric assumptions
11:39:04 and I suppose being the open implementation it's well positioned as a reference / good practice
11:40:51 Tungsten or ODL would be good to see as examples of how this proposal would interoperate with a proven system.
11:41:49 It would not be a trivial change for them. The Mellanox driver is smaller and simpler.
11:42:23 indeed
11:42:48 Anyway, I'll circulate it here and see what people think.
11:43:17 what was the system that you guys built that used dual eth+ib interconnect? Was that used for ASKAP?
11:43:22 I wonder about the scale of it
11:43:29 It would certainly be worthwhile if you can canvass your Mellanox contact on it too
11:43:47 janders: not for ASKAP. It's ALaSKA at Cambridge University.
11:43:57 Getting a Rocky upgrade today.
11:44:04 oh wow
11:44:20 Scale is 2 racks, 2 IB switches.
11:44:24 yeah I meant ALaSKA - when I think of SKA I automatically type ASKAP :)
11:44:29 right!
11:44:45 yeah I suspect that we can be quite successful building small-to-medium scale systems with the current stack
11:44:53 ASKAP's far more impressive as a prototype :-)
11:45:00 but before we go to thousands we need to improve this, otherwise operations might get tricky
11:45:26 especially if it's a highly dynamic system
11:45:44 I very much agree.
11:45:55 ok! I think we have a very good plan
11:46:03 The part that isn't covered here is any means of requesting a resync from SDN to Neutron
11:46:05 I suggest we touch base on this again next week
11:46:19 true!
11:46:25 I'm out next week, alas. 2 weeks' time?
11:46:41 I haven't thought about how that consistency check function is called...
11:46:56 I wouldn't mind having a CLI command to run manually to start with
11:47:08 for 1) consistency check and 2) reconcile
11:47:32 OK! sounds like a plan. That should be the right amount of time to engage the key individuals
11:47:41 Any path to complete the feedback loop would be good, no matter how slow.
11:48:02 OK janders, remind me to put it on the agenda for follow-up in 2 weeks.
11:48:10 ok! will do
11:48:20 #topic AOB
11:48:34 if you have any comments / questions in the meantime drop me a line - or just update the bug directly
11:48:41 the guys are quite responsive
11:48:42 will do, thanks
11:48:49 Anything else new?
11:49:19 nothing I'm allowed to talk about - made some good progress with procurement..
11:49:31 that's been my life for the last week mostly :(
11:49:40 how about you?
11:49:42 Ah, shopping, what a pleasant distraction on a winter's night :-)
11:49:55 Rocky upgrade sounds exciting! :)
11:50:24 Proceeding through the update to latest Queens so far... nearly ready for the jump (but I'm not driving it)
11:50:51 I'm working on my slides for CERN next week - 2 presentations on SKA and the Swiss Personalized Health Network...
11:51:18 great! are the presentations recorded?
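The manually-run CLI floated above, with the two subcommands discussed (a read-only consistency check and a reconcile that repairs drift), might look like this. The command name and flags are hypothetical:

```python
import argparse

# Hypothetical operator CLI for the two actions discussed: "check" reports
# Neutron/fabric inconsistencies without changing anything; "reconcile"
# repairs them, optionally as a dry run.

def build_parser():
    parser = argparse.ArgumentParser(
        prog="sdn-sync",
        description="Check or repair Neutron/SDN fabric consistency")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("check", help="report inconsistencies, change nothing")
    reconcile = sub.add_parser("reconcile", help="repair inconsistencies")
    reconcile.add_argument("--dry-run", action="store_true",
                           help="show what would change without applying it")
    return parser
```

Starting with an explicit CLI keeps the operator in the loop until the consistency check has earned enough trust to run automatically.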
11:51:35 I know a few folks from the SIG will be there so I'm hoping to make it worthwhile :-)
11:51:41 Don't know about recording, I expect so.
11:51:44 I'm sure it will be
11:51:54 that would be great! :)
11:52:29 Better get back to the preparation then, no pressure.
11:53:06 I'd better leave you to it :)
11:53:07 BTW do you have a Cumulus deployment? I wonder if they might provide another view
11:53:21 I don't - I believe Blair used to run one
11:53:34 I think they are closer to the networking-ansible driver
11:53:48 they do have an API that's like a proxy which in turn sshes into the switches
11:54:19 Right, let's copy him in too, see if there's a known issue and if they'd be interested.
11:54:20 and - sometimes the state change doesn't propagate all the way to the port and manual intervention is needed - at least that's what I heard some time back
11:54:28 that's a great idea
11:54:39 OK janders, any more to cover?
11:54:47 no, I think we're good
11:54:54 enjoy Geneva!
11:54:58 Great, let's close, it must be late at your end.
11:54:59 Thanks!
11:55:02 till next time
11:55:03 #endmeeting