Wednesday, 2018-07-11

openstackgerritMatthew Oliver proposed openstack/python-swiftclient master: Add bash_completion to swiftclient  https://review.openstack.org/579037  00:53
mattoliverau^ rebase00:53
*** mahatic has quit IRC01:25
*** mahatic has joined #openstack-swift01:29
openstackgerritNguyen Hai proposed openstack/swift master: add lower-constraints job  https://review.openstack.org/556255  01:33
*** armaan has quit IRC02:31
*** armaan has joined #openstack-swift02:32
*** bkopilov has quit IRC02:35
*** cshastri has joined #openstack-swift03:40
*** bkopilov has joined #openstack-swift03:55
*** nguyenhai_ has joined #openstack-swift04:42
*** nguyenhai has quit IRC04:45
*** links has joined #openstack-swift05:03
*** hseipp has joined #openstack-swift06:42
*** armaan has quit IRC06:49
*** armaan has joined #openstack-swift06:50
*** ccamacho has joined #openstack-swift07:02
*** tesseract has joined #openstack-swift07:12
*** bharath1234 has joined #openstack-swift07:12
bharath1234torgomatic, I am studying the unique-as-possible placement algorithm. I'm reading the code in the get_more_nodes function, which I believe is used to get the handoff nodes. I didn't get why you hashed the partition number and shifted it by the partition shift. The number of parts in my cluster is 1024, and when we hash the partition number and shift, I get 192. Could you elaborate on why that was done? Thank you07:15
*** bharath1234 has quit IRC07:16
*** gkadam has joined #openstack-swift07:20
openstackgerritVu Cong Tuan proposed openstack/python-swiftclient master: Switch to stestr  https://review.openstack.org/581610  07:31
*** mikecmpbll has joined #openstack-swift07:41
*** mikecmpbll has quit IRC07:42
*** mikecmpbll has joined #openstack-swift07:50
*** bharath1234 has joined #openstack-swift07:51
*** rcernin has quit IRC08:03
*** bharath1234 has quit IRC08:03
*** itlinux has joined #openstack-swift08:05
*** bharath1234 has joined #openstack-swift08:11
*** bharath1234 has quit IRC08:13
*** armaan has quit IRC08:15
*** itlinux has quit IRC08:37
*** bkopilov has quit IRC08:44
*** armaan has joined #openstack-swift08:44
openstackgerritChristian Schwede proposed openstack/swift master: Fix misleading error msg if swift.conf unreadable  https://review.openstack.org/581280  08:47
*** kei_yama has quit IRC08:51
*** itlinux has joined #openstack-swift08:55
mattoliveraubharath1234: In case you read this via the logs on eavesdrop because you're gone: first, we want a consistent way of looking for handoff nodes (when we can't find something on the primaries and so look at a few handoffs), so it needs to be repeatable. Second, we need to hash it because we always need something big enough to shift on; the raw partition number may be too small for the given part power (or part_shift). Hashing it09:01
mattoliverau will always return a correctly sized value to shift on.09:01
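
A minimal sketch of the idea mattoliverau describes (an illustration, not the exact swift source): md5 the partition number to get a full-width 32-bit value, then shift it by part_shift so the starting point for the handoff walk is always a valid partition and always the same for a given partition. With 1024 partitions, part_power is 10 and part_shift is 22, which is how a shifted hash ends up at a value like bharath1234's 192.

    import struct
    from hashlib import md5

    def handoff_start(part, part_power):
        # number of bits to drop from a 32-bit value so the result
        # lands in [0, 2 ** part_power)
        part_shift = 32 - part_power
        part_hash = md5(str(part).encode('ascii')).digest()
        # first 4 bytes of the digest as a big-endian unsigned int,
        # shifted down to a partition index; same input, same output
        return struct.unpack_from('>I', part_hash)[0] >> part_shift

    # e.g. handoff_start(7, 10) is deterministic and always < 1024
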
*** bharath1234 has joined #openstack-swift09:28
*** ianychoi_ has joined #openstack-swift09:38
*** ianychoi has quit IRC09:41
*** ianychoi_ has quit IRC10:05
*** spsurya_ has joined #openstack-swift10:08
*** ianychoi has joined #openstack-swift10:30
*** bkopilov has joined #openstack-swift10:31
*** zaitcev_ has joined #openstack-swift10:32
*** ChanServ sets mode: +v zaitcev_10:32
*** psachin has joined #openstack-swift10:35
*** zaitcev has quit IRC10:35
*** hseipp has quit IRC12:00
*** armaan has quit IRC12:45
*** openstack has joined #openstack-swift13:04
*** ChanServ sets mode: +o openstack13:04
*** lifeless has quit IRC13:04
*** bharath12345 has joined #openstack-swift13:11
*** bharath12345 has quit IRC13:11
*** itlinux has quit IRC13:14
*** mikecmpb_ has joined #openstack-swift13:16
*** mikecmpbll has quit IRC13:17
*** spsurya_ has quit IRC13:18
*** armaan has joined #openstack-swift13:23
*** lifeless has joined #openstack-swift13:47
*** psachin has quit IRC13:51
*** jistr is now known as jistr|mtg13:56
*** armaan has quit IRC14:07
*** armaan has joined #openstack-swift14:07
*** linkmark has joined #openstack-swift14:28
*** hseipp has joined #openstack-swift14:30
*** armaan has quit IRC14:31
*** armaan has joined #openstack-swift14:31
notmynamemattoliverau: unfortunately, I get kickbanned from the -meeting channel if I forget and leave patchbot in there.14:32
notmynamefrom what I can tell, everyone loves patchbot except for the -infra team ;-)14:33
*** spsurya_ has joined #openstack-swift14:37
*** jistr|mtg is now known as jistr14:54
*** cshastri has quit IRC14:58
*** links has quit IRC15:16
*** ray_ has quit IRC15:23
*** tesseract has quit IRC16:01
*** mikecmpb_ has quit IRC16:16
*** hseipp has quit IRC16:20
*** armaan has quit IRC16:21
*** armaan has joined #openstack-swift16:21
*** armaan has quit IRC16:25
*** hseipp has joined #openstack-swift16:37
*** hseipp has quit IRC16:37
notmynamegood morning16:41
wermorning.  I managed to get my busy 4-year-old cluster into better shape....16:50
timburkegood morning16:52
DHEI think I asked this before but.... the database replicator works based on saved "checkpoints" between two databases. in the event one of these databases rolled back after a checkpoint for whatever reason (VM load-state, backup restored, etc), would swift be able to handle that?16:53
notmynamewer:16:53
notmynamewer: nice!16:53
werlol16:53
werI have comments.16:54
notmynameDHE: yes16:54
DHEokay cool...16:55
notmynameDHE: it's a really good idea to *not* intentionally try that, and even if you do, don't restore data that's more than "reclaim age" old16:55
notmynamebut with those caveats, sure. it'll be fine16:55
DHEZFS has a feature that's a bit like libeatmydata but it does guarantee the database won't be corrupted should the worst happen.16:56
DHEbut it will rollback an uncertain amount of time (a few seconds typically)16:57
notmynameso imagine that you've got a drive that's happy, then it gets unmounted for a few days, while it's unmounted a DELETE comes in, then it gets remounted with the old (undeleted) data. this is a normal failure mode we think about16:57
DHEright, object servers use tombstones for a certain period so that the DELETE command gets replicated16:57
notmynameto handle this, we keep tombstone markers around when deleting stuff so that this scenario doesn't resurrect old data16:57
notmynameyeah16:57
notmynameDB rows (and DBs themselves) do the same thing16:58
notmynameso operationally, make sure you handle failures within reclaim age settings. alternatively, set the reclaim timers to just longer than your window for doing ops tasks16:59
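
The "reclaim age" knob notmyname is talking about is the replicators' reclaim_age option (default 604800 seconds, i.e. 7 days, in swift's sample configs); the section names below follow the sample configs, so check the conf samples shipped with your version:

    # object-server.conf
    [object-replicator]
    reclaim_age = 604800

    # container-server.conf
    [container-replicator]
    reclaim_age = 604800

    # account-server.conf
    [account-replicator]
    reclaim_age = 604800

Tombstones and deleted DB rows older than this get reclaimed, so any restore-from-backup or remount of stale data needs to land well inside that window.
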
notmynamewer: such as... ?17:02
DHEnotmyname: okay.. I was worried about it not being handled well, or some state being lost and a full database replication being necessary.  I'm projecting containers with 100+ million objects in them17:05
DHEI also have this crazy idea where at least one account server is rigged to hold 100% of all accounts as a sort of centralized bookkeeping machine since there doesn't seem to be a good "list all accounts" command17:06
wernotmyname: so.  Operationally, I've had processes that die or are hung.  It's not uncommon.  However, swift-recon-cron left a stale lock file.  This hid two failed disks on a node from my alerting.17:07
werWhen I found and corrected this, and added new disks, IO brought performance to its knees.17:07
werI write about 20g continuous, and have other things that are read-heavy and delete-heavy at times.17:08
wergbps  little b.17:08
werI'm thinking that many of my containers were rather fragmented on XFS, and a few were creating bottlenecks.17:09
notmynameDHE: yeah, there's no "list all accounts in the cluster" functionality. your idea of a central DB for it is something we've considered before. it's not a bad idea17:10
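
There is no client API for listing every account, so the usual workaround is to walk the account DBs on the account servers themselves. A rough sketch, not a supported tool: it assumes the default /srv/node device root and reads the account name from the account_stat table, so adjust paths for your layout:

    import glob
    import sqlite3

    def list_accounts(device_root='/srv/node'):
        # account DBs live at <device>/accounts/<part>/<suffix>/<hash>/<hash>.db
        accounts = set()
        for db_path in glob.glob('%s/*/accounts/*/*/*/*.db' % device_root):
            conn = sqlite3.connect(db_path)
            try:
                row = conn.execute('SELECT account FROM account_stat').fetchone()
                if row and row[0]:
                    accounts.add(row[0])
            finally:
                conn.close()
        return sorted(accounts)

This only sees accounts with a DB replica on the node it runs on, which is why DHE's idea of one account server holding 100% of the account partitions makes the bookkeeping simple.
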
wer... Ultimately I had to reduce the concurrency on the container replicator and object replicator in order to survive.  Disk replacement was taking a long time and was not linear, which is unusual.17:10
werPerformance is back to normal after a couple of days.  But userland is still weird across the entire cluster.  Most of the IO bottlenecks are completely gone, but listing /objects on any disk takes a long time the first time.17:12
werAnd this is cluster wide :/17:12
DHEnotmyname: the use of zfs' quasi-eatmydata is a performance hack for the database. makes it cushion the brunt of synchronous database updates. but ZFS at least guarantees the order of writes, even if the exact point in time it rolls back to is uncertain.17:12
notmynamewer: that ... doesn't sound terribly strange to me. it's a similar anecdote as what I've heard from my company's customers (via our support team). specifically the need to reduce replicator concurrency17:13
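
The concurrency wer had to reduce is the replicators' concurrency option; sample-config section names shown, and the values here are only examples of turning it down:

    # object-server.conf
    [object-replicator]
    concurrency = 1

    # container-server.conf
    [container-replicator]
    concurrency = 2

Lower concurrency slows rebuilds and disk replacement, but keeps replication from starving client IO on a busy cluster.
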
werso I'm starting to wonder about buffer bloat in Linux's cache, or something non-swift-related, as I can no longer point to anything.17:13
weryeah I've never had to do it notmyname.  It's never been an issue.17:13
notmynamewer: my first guess on performance issues would be related to page cache. yeah, memory buffers. all the FS metadata (inodes, etc)17:14
werright17:14
notmynameit's also likely a function of drive fullness. not just bytes, but inodes17:14
werI read you guys changed your tmp strategy for some things related to XFS.  But I think this is an uptime thing now.17:14
notmynameoh?17:14
werdrive fullness is like 69%, and IO is fine across the board. But also across the board, userland is slow, on all nodes :/17:15
notmynamewhat version of swift?17:16
werso swift appears to be suffering from what I am suffering from at this point.  And the only thing I can point to is the extremely full Linux buffer/cache.  It's old.  You'll yell at me..17:16
wer1.817:16
notmynamelol17:16
notmynameRAHRAHRAH yell yel yell17:17
notmyname(you should upgrade) ;-)17:17
notmyname`Date:   Thu Apr 4 15:07:22 2013 +0200`17:17
werlol look I was an early adopter ;)17:17
notmyname$ git shortlog 1.8.0..master | wc -l17:18
notmyname    584617:18
notmynamejust sayin' ;-)17:18
weralso I had to hack the crap out of it for my needs at the time.  But I dunno, this is strange.17:18
werI'm out of ideas at this point.  But the problem is cluster-wide now.  Just slowish.  And I can't blame swift at this point.17:18
notmynameok, let's stop here and be really happy that you're running a 5-year-old version, and IT'S STILL WORKING! (more most definitions)17:19
notmynames/more/by/17:20
weryeah honestly, it's been really good for us.  Much performance, and I've mostly not had to deal with it other than disks.  And I made that component really easy so...17:20
notmynameso *something* is making it slow. just a matter of finding the right tool that measures the right thing17:20
weryep.17:20
*** gyee has joined #openstack-swift17:21
werI'm likely going to reboot one of the nodes this week, to see if what I can measure disappears.... I've already done some defrag on containers that were hot, and my biggest container is 277k objects and heavy on reads.  All others are 127k and heavy on updates....  But all the IO is fixed now and not hotspotting.  So I'm just waiting for things to completely settle before testing my cache17:24
werhypothesis.17:24
notmynamecool17:24
werI bet all you guys that update don't see high uptimes.... I'm hoping buffer bloat is my issue, 'cause I can't point at anything else now.  I dunno.  Almost out of the woods I guess.17:25
werbut the problem went cluster-wide after finding those two disks :/17:26
*** armaan has joined #openstack-swift17:27
werat any rate.  Keep an eye on those internal lock files for swift-recon-cron..... That's what screwed me I think.17:27
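
One way to catch a wedged swift-recon-cron (stale lock, hung process) without depending on where the lock file lives: alert when the recon cache it is supposed to refresh stops being updated. A sketch, assuming the default recon_cache_path of /var/cache/swift and its object.recon file; adjust both for your config:

    import os
    import time

    def recon_cache_is_stale(path='/var/cache/swift/object.recon', max_age=3600):
        # swift-recon-cron rewrites this file on every run; if it hasn't been
        # touched in max_age seconds, the cron job is probably stuck
        try:
            age = time.time() - os.path.getmtime(path)
        except OSError:
            return True  # a missing cache file is also worth an alert
        return age > max_age

    if __name__ == '__main__':
        if recon_cache_is_stale():
            print('swift-recon-cron output looks stale -- check for a leftover lock or hung process')
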
*** armaan has quit IRC17:31
weranyway.  That's all I got.  I'm likely just an edge case for you guys.  But it's been a week of wtf's. /done with rant17:32
*** armaan has joined #openstack-swift17:32
werhey notmyname, thanks for listening though lol17:40
*** armaan has quit IRC17:54
notmynamewer: thanks for sharing (sorry, had to step out for a meeting)17:57
notmynamewer: and you're not "just an edge case". one thing I've learned over the years with swift is that nobody has an exclusive view on problems. the same things are seen by everyone, eventually. if you're seeing an issue we can trace to something in swift, it *will* affect someone else. it's a matter of when18:08
*** itlinux has joined #openstack-swift18:49
*** itlinux has quit IRC19:02
*** gkadam has quit IRC19:54
*** itlinux has joined #openstack-swift19:55
*** itlinux has quit IRC19:57
openstackgerritTim Burke proposed openstack/python-swiftclient master: Back out some version bumps  https://review.openstack.org/568914  20:09
*** spsurya_ has quit IRC20:36
*** itlinux has joined #openstack-swift20:38
timburkenotmyname: fwiw, i think ^^^ might be a nice compromise now20:46
*** itlinux has quit IRC20:50
kota_morning20:59
notmynamehello world20:59
notmynamemeeting time in #openstack-meeting20:59
mattoliverauMorning20:59
*** ccamacho has quit IRC21:19
notmynamehttps://bugs.launchpad.net/swift/+bug/1781291  22:14
openstackLaunchpad bug 1781291 in OpenStack Object Storage (swift) "sharding: container GETs to root container get slow" [Medium,New]22:14
*** rcernin has joined #openstack-swift22:15
notmynamehttps://bugs.launchpad.net/swift/+bug/1781292  22:18
openstackLaunchpad bug 1781292 in OpenStack Object Storage (swift) "sharding: object reads may return 404s" [Medium,New]22:18
openstackgerritPete Zaitcev proposed openstack/swift master: py3: Adapt db.py  https://review.openstack.org/581905  22:59
*** zaitcev_ has quit IRC23:01
openstackgerritTim Burke proposed openstack/python-swiftclient master: Add more validation for ip_range args  https://review.openstack.org/581906  23:07
*** kei_yama has joined #openstack-swift23:14
paladoxnotmyname hi, is there any way to reduce swift load usage?23:18
paladoxnotmyname how would i get all uploads to go to another server? ie swift2?23:20
notmynamepaladox: you've got 2 servers, both running proxy servers and both with the same rings. so you can send requests to either. *how* you do that is left to how you want to set up networking. a VIP with a load balancer? round robin dns? something else?23:25
paladoxoh, we are just experiencing low storage on swift123:26
paladoxand want to balance it with swift223:26
paladoxbut also the load too23:26
notmynameI thought you already had that set up already23:26
paladoxyeh i have the replicator up23:26
paladoxbut apparently it seems to not have deleted what it copied23:27
paladoxswift2 is 33gb now23:27
*** labster has joined #openstack-swift23:28
notmynametimburke: do we have something in swift to hash sensitive info? like https://github.com/openstack/swift/blob/master/swift/common/middleware/proxy_logging.py#L136-L139 but that does something different?23:46
notmynameit sounds familiar, but I don't recall23:46
notmynameand "hash" doesn't really provide useful results in swift's codebase ;-)23:47
*** gyee has quit IRC23:48
timburkenotmyname: not that i can remember... https://github.com/openstack/python-swiftclient/blob/master/swiftclient/client.py#L121-L139 is similar... maybe you're thinking of https://review.openstack.org/#/c/548948/ ?23:48
patchbotpatch 548948 - swift - Add template in proxy to create custom and anonymo...23:48
notmynameyep. thanks23:49
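
For the record, a generic sketch of the "hash sensitive info for logging" idea notmyname is after -- this is not an existing swift helper, and the salt and truncation length are arbitrary choices; the point is that equal inputs stay correlatable across log lines without the raw value ever being logged:

    import hashlib

    def obscure(value, salt=''):
        # one-way hash so the raw value (token, account name, ...) never
        # lands in the log, while equal inputs still hash to equal output
        if value is None:
            return None
        return hashlib.sha256((salt + value).encode('utf-8')).hexdigest()[:16]

    # e.g. logger.info('token=%s', obscure(token, salt='per-cluster-secret'))
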
openstackgerritTim Burke proposed openstack/swift master: Include s3api schemas in sdists  https://review.openstack.org/581913  23:57
