14:06:56 #startmeeting rpm_packaging
14:06:57 Meeting started Wed Oct 31 14:06:56 2018 UTC and is due to finish in 60 minutes. The chair is dirk. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:06:58 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:07:00 The meeting name has been set to 'rpm_packaging'
14:07:23 ping toabctl, dirk, apevec, aplanas, IgorYozhikov, jpena, jruzicka, number80, kaslcrof, ykarel
14:07:27 #topic roll call
14:08:27 * dirk struggles with etherpad
14:11:19 o/
14:16:50 o/
14:18:49 #chair jpena cmurphy
14:18:50 Current chairs: cmurphy dirk jpena
14:19:02 does anyone have topics? I can't access etherpad for some reason
14:19:16 nothing special for me
14:19:24 jpena: are you coming to the berlin summit?
14:19:36 dirk: no, not this time
14:22:34 anyone else coming that is worth talking to?
14:22:46 I mean, sorry, in the context of contributions for the openstack rpm packaging
14:23:21 from the Red Hat side I'm not aware of many people going to Berlin
14:30:03 k, thanks
14:30:07 #topic reviews
14:30:11 do we have reviews to talk about?
14:30:40 https://review.openstack.org/#/c/613652/ looks like an easy merge
14:31:06 ah, I see cmurphy and jpena would need to look at https://review.openstack.org/#/c/610011/ again - I would like to close out this topic
14:31:13 any strong opinions on how to get this merged?
14:31:17 or if it should be merged?
14:32:43 I don't have a strong opinion on the limit increase. I saw cmurphy didn't like the increase in TasksMax
14:32:58 I'm still unclear on why raising both nproc and tasksmax is okay
14:33:00 dirk: how did you come up with those numbers?
14:33:23 jpena: I copied them from ceph-osd.service
14:33:39 when I happened to check this on a customer's production site, the number of tasks was at 22 for cinder-volume
14:33:44 the issue we had was with a user that was running cinder-volume with rbd at scale.
14:34:16 and librbd just asserts() (which crashes all of cinder-volume) when it can't start a new thread
14:34:40 so TasksMax=infinity basically removes the systemd cgroup pid controller
14:34:54 and the rest is just bumping the limits beyond sanity imho
14:35:32 cmurphy: so you considered TasksMax=infinity the unsafe part? or the NPROC?
14:38:20 dirk: the combination of raising both is what seems unsafe to me
14:38:23 but i'm not an expert
14:39:35 but i noted that ubuntu doesn't bother with these limits http://paste.openstack.org/show/732149/
14:39:43 so LimitNPROC=1234 plus TasksMax=500 means "this service and its children are allowed to use 500 pids (threads, processes), while all processes of the same uid are allowed 1234"
14:40:14 LimitNPROC=1234 plus TasksMax=infinity means "this service is allowed to use up to the user rlimit (1234)"
14:40:36 oh, so nproc still applies
14:40:43 yes
14:40:53 nproc is the number of processes for this *user* (not this process tree)
14:41:02 so e.g. cinder-api and cinder-volume share limits
14:41:07 (as they run as the same user)
14:41:20 TasksMax is a systemd feature to limit cinder-volume so that it's not "eating" the quota of cinder-api
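
For reference, a minimal sketch of the two directives being discussed, written out as a systemd drop-in; the drop-in path and unit name here are assumptions for illustration, not necessarily what the rpm-packaging change ships:

    # /etc/systemd/system/openstack-cinder-volume.service.d/limits.conf
    # (assumed path and unit name, for illustration only)
    [Service]
    # rlimit, counted per user: every process of the cinder uid shares this
    # budget, so cinder-api and cinder-volume draw from the same 1234.
    LimitNPROC=1234
    # cgroup pids controller, counted per unit: only this service and its
    # children count, capped at 500 tasks (threads + processes).
    TasksMax=500
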
14:41:41 but exactly this feature caused the issue for the customer (he had 8191 threads in ceph and then it asserted)
14:41:52 when it tried to spawn 8192
14:42:29 given the tradeoff between "all of cinder-volume is down for every user" and "well, ceph goes slightly beyond sanity, but if the hardware is powerful enough we allow it", this change moves it towards "we keep cinder-volume up at all cost"
14:43:21 is librbd supposed to be using that many threads?
14:43:26 cmurphy: I have to admit I ignored the ubuntu argument, because it just means they either have some other way of overriding the defaults (e.g. via a limits drop-in from their orchestration) or they never hit the issue so far (for example because they use an older systemd or have DefaultTasksMax=infinity configured)
14:43:59 so if they don't set it, the global default applies, and if that one is infinity then they never have cgroup pid limits
14:44:09 cmurphy: well.. that's a whole other debate.. :-)
14:44:23 IMHO 8192 is plenty, but then again we saw that this limit is being reached
14:45:40 but is it reached because it's a runaway, or because it's functioning normally? if it's a runaway and malfunctioning then raising the limits for it is just going to delay the inevitable crash
14:45:52 so, to draw a conclusion here, a simple TasksMax=infinity change without raising LimitNOFILE and LimitNPROC would be acceptable?
14:46:19 cmurphy: there wasn't a leak, it's just a load spike, it is back to normal a few seconds later
14:46:32 okay
14:46:40 e.g. the user is spawning a heat stack with a few dozen volumes inside
14:47:33 and while I agree librbd could be less aggressive about spawning threads, any decent hardware should be capable of handling 8192 threads, so it's not that the machine would be crawling to death with that many threads
14:48:03 (although it will eat like 64GB of RAM just for stack space, at the default 8 MiB of stack per thread)
14:48:17 okay, tasksmax=infinity without the other limit changes sounds fine to me
14:53:01 #topic open floor
14:54:29 Dirk Mueller proposed openstack/rpm-packaging master: cinder-volume: Raise limits and disable cgroup limits https://review.openstack.org/610011
14:54:34 cmurphy: done ^^ :)
14:54:43 anything else?
14:54:49 T-2 min before ending ;)
14:54:54 (happy halloween)
14:54:57 * cmurphy nothing
14:56:03 nope
14:56:58 thanks :)
14:57:00 #endmeeting
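
The direction agreed at 14:48:17, sketched as a drop-in for clarity: lift only the cgroup pids limit and leave the rlimits at their defaults. The unit name and path are assumptions; see https://review.openstack.org/610011 for the actual change:

    # /etc/systemd/system/openstack-cinder-volume.service.d/tasksmax.conf
    # (assumed path and unit name; sketch of the agreed outcome, not the
    # literal content of the review)
    [Service]
    # disable the per-unit cgroup pids limit; LimitNOFILE/LimitNPROC
    # stay at their defaults, so the per-user rlimit still applies
    TasksMax=infinity
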
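Relatedly, the task count mentioned at 14:33:39 (22 tasks for cinder-volume) can be checked against the pids limit at runtime; the unit name below is an assumption:

    # query the pids-controller accounting systemd keeps for a unit
    systemctl show openstack-cinder-volume.service -p TasksCurrent,TasksMax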