Wednesday, 2018-07-11

openstackgerritMatthew Oliver proposed openstack/python-swiftclient master: Add bash_completion to swiftclient  https://review.openstack.org/579037  00:53
mattoliverau^ rebase00:53
*** mahatic has quit IRC01:25
*** mahatic has joined #openstack-swift01:29
openstackgerritNguyen Hai proposed openstack/swift master: add lower-constraints job  https://review.openstack.org/556255  01:33
*** armaan has quit IRC02:31
*** armaan has joined #openstack-swift02:32
*** bkopilov has quit IRC02:35
*** cshastri has joined #openstack-swift03:40
*** bkopilov has joined #openstack-swift03:55
*** nguyenhai_ has joined #openstack-swift04:42
*** nguyenhai has quit IRC04:45
*** links has joined #openstack-swift05:03
*** hseipp has joined #openstack-swift06:42
*** armaan has quit IRC06:49
*** armaan has joined #openstack-swift06:50
*** ccamacho has joined #openstack-swift07:02
*** tesseract has joined #openstack-swift07:12
*** bharath1234 has joined #openstack-swift07:12
bharath1234torgomatic, I am studying the unique-as-possible placement algorithm. I'm reading the code in the get_more_nodes function, which I believe is used to get the handoff nodes. I didn't get why you hashed the partition number and shifted it by the partition shift. The number of parts in my cluster is 1024, and when we hash the partition number and shift, I get 192. Could you elaborate on why that was done? Thank you07:15
*** bharath1234 has quit IRC07:16
*** gkadam has joined #openstack-swift07:20
openstackgerritVu Cong Tuan proposed openstack/python-swiftclient master: Switch to stestr  https://review.openstack.org/581610  07:31
*** mikecmpbll has joined #openstack-swift07:41
*** mikecmpbll has quit IRC07:42
*** mikecmpbll has joined #openstack-swift07:50
*** bharath1234 has joined #openstack-swift07:51
*** rcernin has quit IRC08:03
*** bharath1234 has quit IRC08:03
*** itlinux has joined #openstack-swift08:05
*** bharath1234 has joined #openstack-swift08:11
*** bharath1234 has quit IRC08:13
*** armaan has quit IRC08:15
*** itlinux has quit IRC08:37
*** bkopilov has quit IRC08:44
*** armaan has joined #openstack-swift08:44
openstackgerritChristian Schwede proposed openstack/swift master: Fix misleading error msg if swift.conf unreadable  https://review.openstack.org/581280  08:47
*** kei_yama has quit IRC08:51
*** itlinux has joined #openstack-swift08:55
mattoliveraubharath1234: In case you read this via the logs on eavesdrop because you're gone: first, we want a consistent way of looking for handoff nodes (when we can't find something on the primaries and so look at a few handoffs), so it needs to be repeatable. Second, we need to hash it because we always need something big enough to shift on; the raw partition number may be too small for the given part power (or part_shift). Hashing it09:01
mattoliverau will always return a correctly sized value to shift on.09:01
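
A minimal sketch of the idea mattoliverau describes (an illustration, not the exact swift source): md5 the partition number to get a full-width 32-bit value, then shift it by part_shift so the starting point for the handoff walk is always a valid partition and always the same for a given partition. With 1024 partitions, part_power is 10 and part_shift is 22, which is how a shifted hash ends up at a value like bharath1234's 192.

    import struct
    from hashlib import md5

    def handoff_start(part, part_power):
        # number of bits to drop from a 32-bit value so the result
        # lands in [0, 2 ** part_power)
        part_shift = 32 - part_power
        part_hash = md5(str(part).encode('ascii')).digest()
        # first 4 bytes of the digest as a big-endian unsigned int,
        # shifted down to a partition index; same input, same output
        return struct.unpack_from('>I', part_hash)[0] >> part_shift

    # e.g. handoff_start(7, 10) is deterministic and always < 1024
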
*** bharath1234 has joined #openstack-swift09:28
*** ianychoi_ has joined #openstack-swift09:38
*** ianychoi has quit IRC09:41
*** ianychoi_ has quit IRC10:05
*** spsurya_ has joined #openstack-swift10:08
*** ianychoi has joined #openstack-swift10:30
*** bkopilov has joined #openstack-swift10:31
*** zaitcev_ has joined #openstack-swift10:32
*** ChanServ sets mode: +v zaitcev_10:32
*** psachin has joined #openstack-swift10:35
*** zaitcev has quit IRC10:35
*** hseipp has quit IRC12:00
*** armaan has quit IRC12:45
*** openstack has joined #openstack-swift13:04
*** ChanServ sets mode: +o openstack13:04
*** lifeless has quit IRC13:04
*** bharath12345 has joined #openstack-swift13:11
*** bharath12345 has quit IRC13:11
*** itlinux has quit IRC13:14
*** mikecmpb_ has joined #openstack-swift13:16
*** mikecmpbll has quit IRC13:17
*** spsurya_ has quit IRC13:18
*** armaan has joined #openstack-swift13:23
*** lifeless has joined #openstack-swift13:47
*** psachin has quit IRC13:51
*** jistr is now known as jistr|mtg13:56
*** armaan has quit IRC14:07
*** armaan has joined #openstack-swift14:07
*** linkmark has joined #openstack-swift14:28
*** hseipp has joined #openstack-swift14:30
*** armaan has quit IRC14:31
*** armaan has joined #openstack-swift14:31
notmynamemattoliverau: unfortunately, I get kickbanned from the -meeting channel if I forget and leave patchbot in there.14:32
notmynamefrom what I can tell, everyone loves patchbot except for the -infra team ;-)14:33
*** spsurya_ has joined #openstack-swift14:37
*** jistr|mtg is now known as jistr14:54
*** cshastri has quit IRC14:58
*** links has quit IRC15:16
*** ray_ has quit IRC15:23
*** tesseract has quit IRC16:01
*** mikecmpb_ has quit IRC16:16
*** hseipp has quit IRC16:20
*** armaan has quit IRC16:21
*** armaan has joined #openstack-swift16:21
*** armaan has quit IRC16:25
*** hseipp has joined #openstack-swift16:37
*** hseipp has quit IRC16:37
notmynamegood morning16:41
wermorning.  I managed to get my busy 4-year-old cluster into better shape....16:50
timburkegood morning16:52
DHEI think I asked this before but.... the database replicator works based on saved "checkpoints" between two databases. in the event one of these databases rolled back after a checkpoint for whatever reason (VM load-state, backup restored, etc), would swift be able to handle that?16:53
notmynamewer:16:53
notmynamewer: nice!16:53
werlol16:53
werI have comments.16:54
notmynameDHE: yes16:54
DHEokay cool...16:55
notmynameDHE: it's a really good idea to *not* intentionally try that, and even if you do, don't restore data that's more than "reclaim age" old16:55
notmynamebut with those caveats, sure. it'll be fine16:55
DHEZFS has a feature that's a bit like libeatmydata but it does guarantee the database won't be corrupted should the worst happen.16:56
DHEbut it will rollback an uncertain amount of time (a few seconds typically)16:57
notmynameso imagine that you've got a drive that's happy, then it gets unmounted for a few days, while it's unmounted a DELETE comes in, then it gets remounted with the old (undeleted) data. this is a normal failure mode we think about16:57
DHEright, object servers use tombstones for a certain period so that the DELETE command gets replicated16:57
notmynameto handle this, we keep tombstone markers around when deleting stuff so that this scenario doesn't resurrect old data16:57
notmynameyeah16:57
notmynameDB rows (and DBs themselves) do the same thing16:58
notmynameso operationally, make sure you handle failures within reclaim age settings. alternatively, set the reclaim timers to just longer than your window for doing ops tasks16:59
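
The "reclaim age" knob notmyname is talking about is the replicators' reclaim_age option (default 604800 seconds, i.e. 7 days, in swift's sample configs); the section names below follow the sample configs, so check the conf samples shipped with your version:

    # object-server.conf
    [object-replicator]
    reclaim_age = 604800

    # container-server.conf
    [container-replicator]
    reclaim_age = 604800

    # account-server.conf
    [account-replicator]
    reclaim_age = 604800

Tombstones and deleted DB rows older than this get reclaimed, so any restore-from-backup or remount of stale data needs to land well inside that window.
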
notmynamewer: such as... ?17:02
DHEnotmyname: okay.. I was worried about it not being handled well, or some state being lost and a full database replication being necessary.  I'm projecting containers with 100+ million objects in them17:05
DHEI also have this crazy idea where at least one account server is rigged to hold 100% of all accounts as a sort of centralized bookkeeping machine since there doesn't seem to be a good "list all accounts" command17:06
wernotmyname: so.  Operationally, I've had processes that die or are hung.  It's not uncommon.  However, swift-recon-cron left a stale lock file.  This hid two failed disks on a node from my alerting.17:07
werWhen I found and corrected this, and added new disks, IO brought performance to its knees.17:07
werI write about 20g continuous, and have other things that are read-heavy and delete-heavy at times.17:08
wergbps  little b.17:08
werI'm thinking that many of my containers were rather fragmented on XFS, and a few were creating bottlenecks.17:09
notmynameDHE: yeah, there's no "list all accounts in the cluster" functionality. your idea of a central DB for it is something we've considered before. it's not a bad idea17:10
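
There is no client API for listing every account, so the usual workaround is to walk the account DBs on the account servers themselves. A rough sketch, not a supported tool: it assumes the default /srv/node device root and reads the account name from the account_stat table, so adjust paths for your layout:

    import glob
    import sqlite3

    def list_accounts(device_root='/srv/node'):
        # account DBs live at <device>/accounts/<part>/<suffix>/<hash>/<hash>.db
        accounts = set()
        for db_path in glob.glob('%s/*/accounts/*/*/*/*.db' % device_root):
            conn = sqlite3.connect(db_path)
            try:
                row = conn.execute('SELECT account FROM account_stat').fetchone()
                if row and row[0]:
                    accounts.add(row[0])
            finally:
                conn.close()
        return sorted(accounts)

This only sees accounts with a DB replica on the node it runs on, which is why DHE's idea of one account server holding 100% of the account partitions makes the bookkeeping simple.
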
wer... Ultimately I had to reduce the concurrency on the container replicator and object replicator in order to survive.  Disk replacement was taking a long time and was not linear, which is unusual.17:10
werPerformance is back to normal after a couple of days.  But userland is still weird across the entire cluster.  Most of the IO bottlenecks are completely gone, but listing /objects on any disk takes a long time the first time.17:12
werAnd this is cluster wide :/17:12
DHEnotmyname: the use of zfs' quasi-eatmydata is a performance hack for the database. makes it cushion the brunt of synchronous database updates. but ZFS at least guarantees the order of writes, even if the exact point in time it rolls back to is uncertain.17:12
notmynamewer: that ... doesn't sound terribly strange to me. it's a similar anecdote as what I've heard from my company's customers (via our support team). specifically the need to reduce replicator concurrency17:13
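
The concurrency wer had to reduce is the replicators' concurrency option; sample-config section names shown, and the values here are only examples of turning it down:

    # object-server.conf
    [object-replicator]
    concurrency = 1

    # container-server.conf
    [container-replicator]
    concurrency = 2

Lower concurrency slows rebuilds and disk replacement, but keeps replication from starving client IO on a busy cluster.
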
werso I'm starting to wonder about buffer bloat in Linux's cache, or something non-swift-related, as I can no longer point to anything.17:13
weryeah I've never had to do it notmyname.  It's never been an issue.17:13
notmynamewer: my first guess on performance issues would be related to page cache. yeah, memory buffers. all the FS metadata (inodes, etc)17:14
werright17:14
notmynameit's also likely a function of drive fullness. not just bytes, but inodes17:14
werI read you guys changed your tmp strategy for some things related to XFS.  But I think this is an uptime thing now.17:14
notmynameoh?17:14
werdrive fullness is like 69%, and IO is fine across the board. But also across the board, userland is slow, on all nodes :/17:15
notmynamewhat version of swift?17:16
werso swift appears to be suffering from what I am suffering from at this point.  And the only thing I can point to is the extremely full Linux buffer/cache.  It's old.  You'll yell at me..17:16
wer1.817:16
notmynamelol17:16
notmynameRAHRAHRAH yell yel yell17:17
notmyname(you should upgrade) ;-)17:17
notmyname`Date:   Thu Apr 4 15:07:22 2013 +0200`17:17
werlol look I was an early adopter ;)17:17
notmyname$ git shortlog 1.8.0..master | wc -l17:18
notmyname    584617:18
notmynamejust sayin' ;-)17:18
weralso I had to hack the crap out of it for my needs at the time.  But I dunno, this is strange.17:18
werI'm out of ideas at this point.  But the problem is cluster-wide now.  Just slowish.  And I can't blame swift at this point.17:18
notmynameok, let's stop here and be really happy that you're running a 5-year-old version, and IT'S STILL WORKING! (more most definitions)17:19
notmynames/more/by/17:20
weryeah honestly, it's been really good for us.  Much performance, and I've mostly not had to deal with it other than disks.  And I made that component really easy so...17:20
notmynameso *something* is making it slow. just a matter of finding the right tool that measures the right thing17:20
weryep.17:20
*** gyee has joined #openstack-swift17:21
werI'm likely going to reboot one of the nodes this week, to see if what I can measure disappears.... I've already done some defrag on containers that were hot, and my biggest container is 277k objects and heavy on reads.  All others are 127k and heavy on updates....  But all the IO is fixed now and not hotspotting.  So I'm just waiting for things to completely settle before testing my cache17:24
werhypothesis.17:24
notmynamecool17:24
werI bet all you guys that update don't see high uptimes.... I'm hoping buffer bloat is my issue, 'cause I can't point at anything else now.  I dunno.  Almost out of the woods I guess.17:25
werbut the problem went cluster-wide after finding those two disks :/17:26
*** armaan has joined #openstack-swift17:27
werat any rate.  Keep an eye on those internal lock files for swift-recon-cron..... That's what screwed me I think.17:27
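
One way to catch a wedged swift-recon-cron (stale lock, hung process) without depending on where the lock file lives: alert when the recon cache it is supposed to refresh stops being updated. A sketch, assuming the default recon_cache_path of /var/cache/swift and its object.recon file; adjust both for your config:

    import os
    import time

    def recon_cache_is_stale(path='/var/cache/swift/object.recon', max_age=3600):
        # swift-recon-cron rewrites this file on every run; if it hasn't been
        # touched in max_age seconds, the cron job is probably stuck
        try:
            age = time.time() - os.path.getmtime(path)
        except OSError:
            return True  # a missing cache file is also worth an alert
        return age > max_age

    if __name__ == '__main__':
        if recon_cache_is_stale():
            print('swift-recon-cron output looks stale -- check for a leftover lock or hung process')
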
*** armaan has quit IRC17:31
weranyway.  That's all I got.  I'm likely just an edge case for you guys.  But it's been a week of wtf's. /done with rant17:32
*** armaan has joined #openstack-swift17:32
werhey notmyname, thanks for listening though lol17:40
*** armaan has quit IRC17:54
notmynamewer: thanks for sharing (sorry, had to step out for a meeting)17:57
notmynamewer: and you're not "just an edge case". one thing I've learned over the years with swift is that nobody has an exclusive view on problems. the same things are seen by everyone, eventually. if you're seeing an issue we can trace to something in swift, it *will* affect someone else. it's a matter of when18:08
*** itlinux has joined #openstack-swift18:49
*** itlinux has quit IRC19:02
*** gkadam has quit IRC19:54
*** itlinux has joined #openstack-swift19:55
*** itlinux has quit IRC19:57
openstackgerritTim Burke proposed openstack/python-swiftclient master: Back out some version bumps  https://review.openstack.org/568914  20:09
*** spsurya_ has quit IRC20:36
*** itlinux has joined #openstack-swift20:38
timburkenotmyname: fwiw, i think ^^^ might be a nice compromise now20:46
*** itlinux has quit IRC20:50
kota_morning20:59
notmynamehello world20:59
notmynamemeeting time in #openstack-meeting20:59
mattoliverauMorning20:59
*** ccamacho has quit IRC21:19
notmynamehttps://bugs.launchpad.net/swift/+bug/1781291  22:14
openstackLaunchpad bug 1781291 in OpenStack Object Storage (swift) "sharding: container GETs to root container get slow" [Medium,New]22:14
*** rcernin has joined #openstack-swift22:15
notmynamehttps://bugs.launchpad.net/swift/+bug/1781292  22:18
openstackLaunchpad bug 1781292 in OpenStack Object Storage (swift) "sharding: object reads may return 404s" [Medium,New]22:18
openstackgerritPete Zaitcev proposed openstack/swift master: py3: Adapt db.py  https://review.openstack.org/581905  22:59
*** zaitcev_ has quit IRC23:01
openstackgerritTim Burke proposed openstack/python-swiftclient master: Add more validation for ip_range args  https://review.openstack.org/581906  23:07
*** kei_yama has joined #openstack-swift23:14
paladoxnotmyname hi, is there any way to reduce swift load usage?23:18
paladoxnotmyname how would i get all uploads to go to another server? ie swift2?23:20
notmynamepaladox: you've got 2 servers, both running proxy servers and both with the same rings. so you can send requests to either. *how* you do that is left to how you want to set up networking. a VIP with a load balancer? round robin dns? something else?23:25
paladoxoh, we are just experiencing low storage on swift123:26
paladoxand want to balance it with swift223:26
paladoxbut also the load too23:26
notmynameI thought you already had that set up already23:26
paladoxyeh i have the replicator up23:26
paladoxbut apparently it seems to not have deleted what it copied23:27
paladoxswift2 is 33gb now23:27
*** labster has joined #openstack-swift23:28
notmynametimburke: do we have something in swift to hash sensitive info? like https://github.com/openstack/swift/blob/master/swift/common/middleware/proxy_logging.py#L136-L139 but that does something different?23:46
notmynameit sounds familiar, but I don't recall23:46
notmynameand "hash" doesn't really provide useful results in swift's codebase ;-)23:47
*** gyee has quit IRC23:48
timburkenotmyname: not that i can remember... https://github.com/openstack/python-swiftclient/blob/master/swiftclient/client.py#L121-L139 is similar... maybe you're thinking of https://review.openstack.org/#/c/548948/ ?23:48
patchbotpatch 548948 - swift - Add template in proxy to create custom and anonymo...23:48
notmynameyep. thanks23:49
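
For the record, a generic sketch of the "hash sensitive info for logging" idea notmyname is after -- this is not an existing swift helper, and the salt and truncation length are arbitrary choices; the point is that equal inputs stay correlatable across log lines without the raw value ever being logged:

    import hashlib

    def obscure(value, salt=''):
        # one-way hash so the raw value (token, account name, ...) never
        # lands in the log, while equal inputs still hash to equal output
        if value is None:
            return None
        return hashlib.sha256((salt + value).encode('utf-8')).hexdigest()[:16]

    # e.g. logger.info('token=%s', obscure(token, salt='per-cluster-secret'))
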
openstackgerritTim Burke proposed openstack/swift master: Include s3api schemas in sdists  https://review.openstack.org/581913  23:57
