Monday, 2021-02-01

*** baojg has joined #openstack-swift02:03
*** rcernin has quit IRC02:21
*** rcernin has joined #openstack-swift02:36
alenaiHello everyone. I'm here to ask for feedback about this patch: https://review.opendev.org/c/openstack/swift/+/727876. Has anyone running a production-scale deployment (>10mln objects per container, db files on SSD, and ~50+ puts/s with ~50+ deletes/s on hot containers) seen any improvement in 500 response codes? Did you get any reduction in 500s?02:41
*** rcernin has quit IRC02:45
*** rcernin has joined #openstack-swift02:45
alenaiI'm testing Swift 2.25.1 before deploying it to production (upgrading from 2.25.0) and, for some reason, I see only minor improvements. Test setup: db file > 5.3GB, 5+mln objects, db on SSD, 3 replicas, replicator with a 5-minute interval to accumulate reclaims, 3-day reclaim_age.02:48
alenaihttps://paste.pics/e048c562d7795c3228aadde12c39b0a7 for latency metrics (each spike is a replicator pass arriving) and https://paste.pics/8a4ae6f744d3cb5a8f6598569570b5fe (each dip is a replicator pass arriving).02:52
alenaiI tried inserting time.sleep(0.05) between batch deletes and it helped a lot, but it introduced another problem: replication run times skyrocketed. I can't allow that in production, because it would take days to complete one replication run.02:55
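A minimal sketch of the trade-off described above, assuming a simplified SQLite table loosely modeled on the `object` rows of a Swift container db (the real schema has more columns, and this is not Swift's actual reclaim code). The hypothetical `reclaim_in_batches` deletes old tombstones `batch_size` rows at a time and commits after each batch so other writers can take the lock; `pause` stands in for the `time.sleep(0.05)` experiment. Inside Swift's eventlet-based servers the analogous yield would be a cooperative `eventlet.sleep(0)`; plain `time.sleep` is used here only to keep the sketch dependency-free.

```python
import sqlite3
import time


def make_demo_db():
    """In-memory table loosely modeled on container db object rows.

    Hypothetical, simplified schema: name, created_at, deleted.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE object (name TEXT PRIMARY KEY,"
        " created_at REAL, deleted INTEGER)")
    return conn


def reclaim_in_batches(conn, age_timestamp, batch_size=1000, pause=0.0):
    """Delete tombstones (deleted = 1) older than age_timestamp in batches.

    Commits after every batch so other writers (object updates from
    PUTs/DELETEs) get a chance at the write lock, then sleeps `pause`
    seconds. Returns (rows_deleted, batches).
    """
    rows_deleted = 0
    batches = 0
    while True:
        cur = conn.execute(
            "DELETE FROM object WHERE rowid IN ("
            " SELECT rowid FROM object"
            " WHERE deleted = 1 AND created_at < ?"
            " LIMIT ?)",
            (age_timestamp, batch_size))
        conn.commit()  # release the write lock after each batch
        if cur.rowcount == 0:
            break
        rows_deleted += cur.rowcount
        batches += 1
        # Larger pauses leave more room for client requests but stretch
        # the run by exactly batches * pause seconds.
        time.sleep(pause)
    return rows_deleted, batches
```

With `pause=0.05`, every batch adds 50 ms to the run, so total replication time grows linearly with the number of batches, which matches the run-time blowup reported above.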
alenaiMaybe I'm naive to rely on this patch with my load profile? Is the window between batch deletes only enough for a single-digit number of requests? So I can rely only on the 30-40 quantile latency reduction (seen on dashboards during some replication runs)...02:59
alenaiP.S.: the proxy server's timeout for container server HTTP requests here is 10 seconds. A replication run currently takes about 1 hour in production (300k+ containers on 6 nodes with 8 SSDs each)03:01
*** rcernin has quit IRC04:00
*** fingo has quit IRC04:02
*** rcernin has joined #openstack-swift04:02
*** rcernin has quit IRC04:27
*** rcernin has joined #openstack-swift04:35
*** evrardjp has quit IRC05:33
*** evrardjp has joined #openstack-swift05:33
*** m75abrams has joined #openstack-swift05:46
openstackgerritMerged openstack/swift master: relinker: Improve logging  https://review.opendev.org/c/openstack/swift/+/76963206:31
*** alenai has quit IRC06:36
*** gmann has quit IRC06:45
*** gmann has joined #openstack-swift06:47
*** rcernin has quit IRC07:25
*** rcernin has joined #openstack-swift08:07
*** rpittau|afk is now known as rpittau08:11
*** rcernin has quit IRC08:24
*** rcernin has joined #openstack-swift08:26
*** rcernin has quit IRC08:31
*** alenai has joined #openstack-swift08:32
*** rcernin has joined #openstack-swift08:37
*** cschwede has joined #openstack-swift08:57
*** ChanServ sets mode: +v cschwede08:57
*** rcernin has quit IRC09:27
*** benj_ has quit IRC09:28
*** benj_ has joined #openstack-swift09:30
*** rcernin has joined #openstack-swift09:39
*** rcernin has quit IRC10:06
*** baojg has quit IRC10:09
*** baojg has joined #openstack-swift10:10
*** baojg has quit IRC10:10
*** baojg has joined #openstack-swift10:10
*** alenai has quit IRC10:11
*** baojg has quit IRC10:11
*** baojg has joined #openstack-swift10:11
*** baojg has quit IRC10:11
*** baojg has joined #openstack-swift10:12
*** baojg has quit IRC10:12
*** baojg has joined #openstack-swift10:13
*** baojg has quit IRC10:13
*** baojg has joined #openstack-swift10:13
*** baojg has quit IRC10:14
*** baojg has joined #openstack-swift10:14
*** baojg has quit IRC10:15
*** baojg has joined #openstack-swift10:15
*** baojg has quit IRC10:15
*** baojg has joined #openstack-swift10:16
*** baojg has quit IRC10:16
*** cschwede has quit IRC10:47
*** rcernin has joined #openstack-swift11:16
*** rcernin has quit IRC12:33
*** rcernin has joined #openstack-swift12:47
*** rcernin has quit IRC13:38
openstackgerritClay Gerrard proposed openstack/swift master: relinker: Add option to drop privileges  https://review.opendev.org/c/openstack/swift/+/77241914:59
claygalenai: yes, we saw a reduction in 500s from container servers when we rolled out that patch - large dbs took a long time to reclaim, and the patch seemed to allow more index updates with fewer timeouts.15:01
claygalenai: "Window between batch deletes are only for single digit requests?" - I'm not sure what you mean by this specifically15:05
claygalenai: are the containers using the default 3x replication?  A 1hr db replication cycle in a large cluster is great; a replication cycle <=24hrs is probably manageable.15:07
clayg10M *rows* per container should still be manageable - but if you have 10M objects and 800M tombstone rows (deleted = 1), reclaim can be very challenging.  You'll want to consider https://docs.openstack.org/swift/latest/overview_container_sharding.html15:09
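Back-of-the-envelope arithmetic for the point that tombstone rows, not live objects, drive reclaim cost. The helper `reclaim_sleep_overhead` is hypothetical, and the batch sizes used with it are illustrative assumptions, not Swift's actual reclaim page size; it counts only the time spent in a fixed pause between batches, such as the `time.sleep(0.05)` tried earlier.

```python
def reclaim_sleep_overhead(tombstone_rows, batch_size, pause_s):
    """Return (batches, seconds spent sleeping) for reclaiming one db.

    Assumes one pause per batch, including a final partial batch.
    """
    batches = -(-tombstone_rows // batch_size)  # ceiling division
    return batches, batches * pause_s


# 800M tombstones at two hypothetical batch sizes, 0.05 s pause each.
for size in (10_000, 1_000):
    n, secs = reclaim_sleep_overhead(800_000_000, size, 0.05)
    print(f"batch_size={size}: {n} batches, {secs / 3600:.1f} h asleep")
```

For 800M tombstone rows, 10,000-row batches with a 0.05 s pause add 80,000 × 0.05 = 4,000 s (about 1.1 hours) of pure sleep to a single db's reclaim; 1,000-row batches would add 40,000 s (about 11 hours), before counting any actual delete work.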
openstackgerritClay Gerrard proposed openstack/swift master: WIP: s3api: Make multi-deletes async  https://review.opendev.org/c/openstack/swift/+/64826315:38
*** zaitcev has joined #openstack-swift15:57
*** ChanServ sets mode: +v zaitcev15:57
*** alenai has joined #openstack-swift16:11
*** m75abrams has quit IRC16:12
alenaiclayg: "- I'm not sure what you mean by this specifically" - I'm trying to say that with a container that has nonstop 50+ puts/sec and 50+ deletes/sec, 24/7/365, and a 1-hour replication run on average, there are always many reclaimable objects, and the window between batch deletes is too small to squeeze all those requests (50 puts and 50 deletes) into the db file.16:24
alenaiSo, we still see 500s and http timeouts.16:24
alenaiI'm still scared to use sharding in production. (Anyway, there are only a couple of containers that have 10mln+ tombstone rows.)16:24
clayg"you have always many reclaimable objects" makes sense - a single db is really only good for about 100 req/s - if you're sustaining that and also managing to get your replication and reclaims in... that's probably about as good as it's going to get without sharding I think16:26
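A rough utilization check for the estimate that a single db handles about 100 req/s. `db_utilization` is a hypothetical helper; treating every PUT and DELETE as one container update and assuming load divides evenly across shards are both simplifying assumptions (shard ranges split containers by object name, so a workload with sequential names could still concentrate new writes on one shard).

```python
def db_utilization(put_rps, delete_rps, ceiling_rps=100.0, shards=1):
    """Fraction of one db's rough request ceiling consumed.

    Assumes each PUT/DELETE is one container update and that load is
    spread evenly across `shards` dbs (a simplification).
    """
    per_db = (put_rps + delete_rps) / shards
    return per_db / ceiling_rps


print(db_utilization(50, 50))            # one unsharded container db
print(db_utilization(50, 50, shards=4))  # same load over four shards
```

The hot containers described above, at 50 puts/s plus 50 deletes/s, come out at 1.0: the whole ceiling, with no headroom left for replication and reclaim. Spread evenly over four shards, each db would sit near 0.25.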
claygbut I imagine patch 727876 still would have HELPED, no?  That was basically the issue - I think we could try adding an eventlet.sleep(0) or making the reclaim batch size configurable (more, smaller batches may be better for your use case)16:27
claygbut I think long term the solution is "for busy containers we want to scale them with sharding" 🤷‍♂️16:28
alenaiyeah, you are right. Maybe I should start testing/staging sharding.... I hoped to postpone this as long as possible... hehe16:30
claygit's going to be SO GREAT!  we *LOVE* sharding!16:32
claygshrinking OTOH 😡16:32
claygbut it's getting better 🤞16:32
alenai"Note16:35
alenaiContainer sharding is currently an experimental feature."16:35
alenaiyou know... "production" and "experimental" in one sentence... Maybe there should be an update to the documentation to cheer up conservative guys like me.16:35
DHEit's my understanding that automatic sharding is experimental, but using sharding manually is OK16:49
DHEI hope so. my biggest container is using it16:49
claygyes, we've had a todo to update the docs to clarify the status of sharding for a while - maybe we can do that before the next major release - but we've been using it reliably for a long time (a year?)16:55
claygall of our big db's are sharded and it's working well for our use-cases16:55
*** rpittau is now known as rpittau|afk17:21
*** alenai has quit IRC17:44
*** alenai has joined #openstack-swift19:41
*** alenai has quit IRC19:58
*** clayg_ has joined #openstack-swift20:17
*** ChanServ sets mode: +v clayg_20:17
*** jrosser_ has joined #openstack-swift20:18
*** fyx_ has joined #openstack-swift20:18
*** f0o|away has joined #openstack-swift20:25
*** jrosser has quit IRC20:26
*** clayg has quit IRC20:26
*** fyx has quit IRC20:26
*** f0o has quit IRC20:26
*** zigo has quit IRC20:26
*** sorrison has quit IRC20:26
*** clayg_ is now known as clayg20:26
*** f0o|away is now known as f0o20:26
*** jrosser_ is now known as jrosser20:26
*** fyx_ is now known as fyx20:26
*** zigo has joined #openstack-swift20:33
*** gyee has joined #openstack-swift21:16
openstackgerritTim Burke proposed openstack/swift master: Run flake8 on bin/ files  https://review.opendev.org/c/openstack/swift/+/77348521:27
*** zaitcev has quit IRC22:12
*** rcernin has joined #openstack-swift22:20
*** Underknowledge has joined #openstack-swift22:47
*** zaitcev has joined #openstack-swift23:13
*** ChanServ sets mode: +v zaitcev23:13

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!