Tuesday, 2017-12-05

*** aagrawal has joined #openstack-swift  00:22
*** abhinavtechie has quit IRC  00:22
*** abhinavtechie has joined #openstack-swift  00:27
*** aagrawal has quit IRC  00:31
*** tovin07__ has joined #openstack-swift  00:43
*** gyee has quit IRC  00:56
*** JimCheung has quit IRC  01:08
*** psachin has joined #openstack-swift  01:10
kota_: good morning  01:21
*** mat128 has joined #openstack-swift  01:21
kota_: notmyname: great news  01:23
*** cshastri has joined #openstack-swift  01:26
timburke: kota_: o/  01:26
kota_: timburke: o/  01:26
acoles: kota_: good morning  01:27
kota_: acoles: morning, oh you're at SFO?  01:27
acoles: kota_: yes for this week  01:28
kota_: make sense  01:28
*** mvk has quit IRC  01:57
*** mvk has joined #openstack-swift  01:59
openstackgerrit: Merged openstack/swift feature/deep: Merge remote-tracking branch 'origin/master' into feature/deep  https://review.openstack.org/525269  02:52
mattoliverau: acoles: o/  02:57
mattoliverau: kota_: o/  02:57
*** links has joined #openstack-swift  03:07
*** threestrands has joined #openstack-swift  03:27
kota_: mattoliverau: the symlink patch is making progress. I'm almost done with the functest cleanup and am waiting on m_kazuhiro's review. I think that will be squashed into the main patch in 1-2 days.  03:32
kota_: mattoliverau: and it shows no significant issues there, so you can continue the symlink patch reviews except for the func tests, imo.  03:33
*** kei_yama has quit IRC  03:49
*** abhinavtechie has quit IRC  03:53
mattoliverau: kota_: great, thanks :)  03:55
*** kei_yama has joined #openstack-swift  04:01
*** psachin has quit IRC  04:15
*** abhitechie has joined #openstack-swift  04:20
*** chsc has joined #openstack-swift  04:28
*** psachin has joined #openstack-swift  04:30
*** psachin has quit IRC  04:32
*** chsc has quit IRC  04:35
*** ianychoi has joined #openstack-swift  05:07
*** psachin has joined #openstack-swift  05:10
*** klrmn has quit IRC  05:21
*** threestrands has quit IRC  05:25
*** two_tired has quit IRC  05:41
*** mat128 has quit IRC  05:47
openstackgerrit: Matthew Oliver proposed openstack/swift master: Add a -L or --list to recon to list all results  https://review.openstack.org/525039  05:49
mattoliverau: ^^ just a few fixes. Just needs some tests for the new helper methods and maybe some -L print tests.  05:49
*** vinsh_ has quit IRC  06:39
*** armaan has joined #openstack-swift  06:40
*** armaan has quit IRC  06:54
*** vinsh has joined #openstack-swift  07:11
*** pcaruana has joined #openstack-swift  07:32
*** hseipp has joined #openstack-swift  07:44
*** rcernin has quit IRC  07:48
*** neonpastor has quit IRC  08:00
*** neonpastor has joined #openstack-swift  08:02
openstackgerrit: Van Hung Pham proposed openstack/swift master: Replace assertTrue(isinstance()) with assertIsInstance()  https://review.openstack.org/475639  08:17
*** armaan has joined #openstack-swift  08:22
*** tesseract has joined #openstack-swift  08:24
*** rcernin has joined #openstack-swift  08:33
*** gkadam has joined #openstack-swift  08:38
*** pcaruana has quit IRC  08:40
*** geaaru has joined #openstack-swift  08:52
*** cbartz has joined #openstack-swift  09:07
*** armaan has quit IRC  09:10
*** armaan has joined #openstack-swift  09:20
*** gleblanc has quit IRC  09:35
*** abhitechie has quit IRC  09:54
*** mvk has quit IRC  09:54
*** amito has joined #openstack-swift  10:05
amito: Hi, our cinder CI has been failing for the last couple of days. I looked in the logs and it seems glance is failing consistently with "BackendException: Cannot find swift service endpoint : The request you have made requires authentication. (HTTP 401)". Any ideas?  10:05
*** HCLTech-SSW has joined #openstack-swift  10:09
*** armaan has quit IRC  10:09
*** mvk has joined #openstack-swift  10:22
*** kei_yama has quit IRC  10:26
*** rcernin has quit IRC  10:30
*** tovin07__ has quit IRC  10:30
*** ianychoi has quit IRC  10:55
*** ianychoi has joined #openstack-swift  10:55
*** SkyRocknRoll has joined #openstack-swift  10:57
*** HCLTech-SSW has quit IRC  11:03
*** cshastri has quit IRC  11:05
*** kukacz has joined #openstack-swift  11:22
*** cshastri has joined #openstack-swift  11:47
*** silor has joined #openstack-swift  11:50
openstackgerrit: Kazuhiro MIYAHARA proposed openstack/swift master: Cleanup Symlink Functional Tests  https://review.openstack.org/524203  11:59
*** armaan has joined #openstack-swift  12:03
*** cshastri has quit IRC  12:03
*** ^andrea^ has quit IRC  12:07
*** oshritf has quit IRC  12:15
*** zhurong has joined #openstack-swift  12:45
*** zhurong has quit IRC  13:02
*** zhurong has joined #openstack-swift  13:03
*** cshastri has joined #openstack-swift  13:06
*** links has quit IRC  13:22
*** SkyRocknRoll has quit IRC  13:23
*** zhurong has quit IRC  13:27
*** cshastri has quit IRC  13:27
*** psachin has quit IRC  13:33
*** silor1 has joined #openstack-swift  13:47
*** silor has quit IRC  13:47
*** silor1 is now known as silor  13:48
*** mat128 has joined #openstack-swift  14:03
*** geaaru has quit IRC  14:24
*** armaan has quit IRC  14:29
*** armaan has joined #openstack-swift  14:31
*** _ix has joined #openstack-swift  14:42
_ix: Good morning, folks. I've got a Mitaka cluster with some eight nodes running, and it's largely been pretty great.  14:43
*** geaaru has joined #openstack-swift  14:43
_ix: Unfortunately, we had some hardware issues that made us reconfigure the rings, and a colleague made the silly mistake of turning down the services without making adjustments to the rings for about 6 weeks. We ended up reformatting those drives on the problem node and re-introducing it to the cluster.  14:44
_ix: Complete replication took about 5 days after some adjustments to the rsync configurations, several re-adjustments to the rings, and anxiety, but it completed on Sunday night.  14:45
_ix: Anyway, the outstanding issues appear to be related to that six-week period of the problem node being offline. The majority of the cluster at normal weights is sitting at some 50-60% disk utilization, while the problem node is above 90% across its disks.  14:47
_ix: There are a number of objects with X-Delete-At values set at 1504*, but there don't appear to be any matching containers in the .expiring_objects account that relate to those timestamps. Essentially, the swift architecture appears to be unaware of these files.  14:48
_ix: My question is: if I understand this correctly and there are a number of .data files lingering on my problem node, is there a way to ensure these files are removed in a cleanup (I guess the Mitaka auditor doesn't take care of this)?  14:50
rledisez: _ix: what could have happened is that the objects were deleted while your node was offline, and the tombstones were reclaimed before your node came back online, so they are now "dark data"; they should not be there anymore, but swift can't know it must delete them  14:50
_ix: rledisez: So, I'm on the right track?  14:51
_ix: Are you aware of any safe ways to remove the dark data outside of swift?  14:52
tdasilva: but if the node that was down for 6 weeks had its drives reformatted, how would you get dark data there?  14:56
_ix: tdasilva: That's another question that I had.  14:56
_ix: We had some serious problems getting the balance correct when we re-introduced this node with empty disks, and I asked my colleagues to drop objects they didn't need any longer to potentially alleviate the replication times; my assumption is that some of this stuff was tombstoned and the reclaim age ran down before we achieved stability.  14:58
_ix: That's merely conjecture, though. I'll double check the reclaim age and see if it's any less than the default.  14:59
_ix: I'm assuming the default was 7 days in Mitaka, too. If that's the case, this value hasn't been explicitly set, and it should be 7 days.  15:01
_ix: I am interested in the *why* of these circumstances, but I'm also interested in the *how*, that is, how do I clean up after our mistakes?  15:03
_ix: object manifests/segments with X-Delete-At values but no associated .expiring_objects container means it's dark data. How do we clean up dark data?  15:04
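[editor's note: an aside on the X-Delete-At/.expiring_objects relationship _ix describes. Expiry timestamps are bucketed into .expiring_objects containers by expiring_objects_container_divisor (86400 seconds by default). The sketch below is a simplified, assumed reading of that scheme; Swift's real helper also shifts each bucket by a per-object hash offset, which this omits, so real container names won't match exactly.]

```python
DEFAULT_DIVISOR = 86400  # expiring_objects_container_divisor: one bucket per day


def expirer_container(x_delete_at, divisor=DEFAULT_DIVISOR):
    """Round an X-Delete-At timestamp down to its .expiring_objects bucket.

    Simplified sketch only: the actual Swift helper also applies a
    per-object hash offset to spread containers out.
    """
    return str(int(x_delete_at) // divisor * divisor)


# A 1504* timestamp like the ones _ix mentions lands in the 1503964800 bucket.
print(expirer_container(1504000000))
```

[so a 1504* X-Delete-At with no container near that bucket in .expiring_objects is consistent with the expirer never having learned about the object.]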
*** silor has quit IRC  15:14
*** silor has joined #openstack-swift  15:14
*** klrmn has joined #openstack-swift  15:14
*** klrmn has quit IRC  15:16
_ix: From my reading of others' experiences, this doesn't appear to be a very easy question to answer.  15:17
tdasilva: _ix: clayg might be a good person to ask about dark data, but he is in the PST time zone  15:21
_ix: tdasilva: Thanks! I'm actually reading over some conversations on eavesdrop, and that sounds like a safe bet. Do you happen to know what time zone redbo is in?  15:23
tdasilva: redbo is in CST, I believe  15:26
*** armaan has quit IRC  15:30
*** armaan has joined #openstack-swift  15:30
*** openstackgerrit has quit IRC  15:48
*** klrmn has joined #openstack-swift  16:00
*** armaan has quit IRC  16:04
*** oshritf has joined #openstack-swift  16:38
*** mvk has quit IRC  16:42
*** oshritf has quit IRC  16:47
*** abhitechie has joined #openstack-swift  16:47
*** chsc has joined #openstack-swift  16:50
*** chsc has joined #openstack-swift  16:50
frankkahle: I have done another install of openstack-swift. I have made sure all of the requirements are met and I have compiled version 1.5 of liberasurecode. I assume I should see only .... (dots) and no "E"s when I run the unit tests, correct?  17:03
*** kallenp has joined #openstack-swift  17:09
notmyname: good morning  17:13
notmyname: frankkahle: correct  17:13
*** cbartz has quit IRC  17:13
*** mvk has joined #openstack-swift  17:14
frankkahle: ok, so I got an E, hit Ctrl-C to abort, and saw the error "ERROR: test_real_config (test.unit.common.middleware.test_memcache.TestCacheMiddleware)". How do I debug this?  17:15
*** kallenp has left #openstack-swift  17:19
*** pcaruana has joined #openstack-swift  17:22
notmyname: frankkahle: unit tests should all pass on any server where you install swift, but the functests are what's really interesting/important for validating a production cluster.  17:24
*** hseipp has quit IRC  17:25
notmyname: the module path in the error is the Python import path for it  17:25
notmyname: e.g. https://github.com/openstack/swift/blob/master/test/unit/common/middleware/test_memcache.py#L99  17:25
*** pcaruana has quit IRC  17:26
*** pcaruana has joined #openstack-swift  17:27
*** ukaynar has joined #openstack-swift  17:30
*** gkadam has quit IRC  17:30
_ix: clayg: Are you around this morning?  17:31
timburke: don't see him at his desk yet, but iirc he's planning on coming in  17:32
_ix: Thanks, Tim. I hope he's in the mood to talk about dark data.  17:38
_ix: I feel like one must whisper when saying... dark data.  17:38
*** pcaruana has quit IRC  17:58
*** itlinux has joined #openstack-swift  17:58
clayg: I'm on the train. I'm not sure what to do about expired manifests and unexpired segments. It's not really dark data if it's in the container listing. It's just orphaned segments.  18:03
_ix: clayg: Thanks for weighing in. That gives me something more to consider.  18:09
*** oshritf has joined #openstack-swift  18:13
*** oshritf has quit IRC  18:15
*** silor has quit IRC  18:16
*** tesseract has quit IRC  18:21
_ix: I've looked again, and the data is not in the container listings. If it were, I suppose it would be trivial to delete.  18:26
_ix: So, does that in turn qualify this as dark data?  18:27
*** oshritf has joined #openstack-swift  18:31
_ix: The data is in a state that we coined...  18:33
_ix: the expired-but-not-yet-deleted-and-not-in-the-.expiring_objects-containers-to-be-deleted-properly state.  18:33
*** dcourtoi has quit IRC  18:38
*** openstackgerrit has joined #openstack-swift  18:39
openstackgerrit: Alistair Coles proposed openstack/swift master: Refactor proxy-server conf loading to a utils function  https://review.openstack.org/525728  18:39
acoles: I'd almost forgotten how to do an upstream patch! :)  18:39
*** oshritf has quit IRC  18:42
*** dcourtoi has joined #openstack-swift  18:53
*** armaan has joined #openstack-swift  18:55
*** armaan has quit IRC  19:02
*** armaan has joined #openstack-swift  19:03
frankkahle: I'm about ready to give up... I have now built 6 VMs in total, ranging from Ubuntu 14, 16, 17 to multiple CentOS versions, based on the instructions here (https://docs.openstack.org/swift/latest/development_saio.html#common-dev-section), carefully done by the book. I built my own version of liberasurecode from git, and yet I still cannot get the unit tests to run...  19:15
tdasilva: frankkahle: what's the unit test error you are seeing? May I also suggest trying one of these: https://docs.openstack.org/swift/latest/associated_projects.html#developer-tools ?  19:17
tdasilva: frankkahle: I maintain this: https://github.com/thiagodasilva/ansible-saio and it works pretty well for me... although I'll be honest and say that RHEL 7.4 has a regression and some unit tests will fail, but if you run with a RHEL 7.3 image it should work fine  19:19
frankkahle: well, I started the unit tests and got a bunch of dots, then saw some 'E's, hit Ctrl-C, and saw this error: (ERROR: test_real_config (test.unit.common.middleware.test_memcache.TestCacheMiddleware)  19:19
tdasilva: what's the error?  19:21
* tdasilva wonders if it has to do with the tmpdir issue  19:21
frankkahle: oh, maybe something to do with cryptography not being a high enough version???  19:21
*** klrmn has quit IRC  19:22
tdasilva: frankkahle: let the unit tests run and post the errors to http://paste.openstack.org/  19:22
frankkahle: this is the bottom of the error... ContextualVersionConflict: (cryptography 1.2.3 (/usr/lib/python2.7/dist-packages), Requirement.parse('cryptography!=2.0,>=1.6'), set(['swift']))  19:23
frankkahle: should I upgrade that somehow?  19:24
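[editor's note: the conflict is that the distro's cryptography 1.2.3 doesn't satisfy swift's pin 'cryptography!=2.0,>=1.6'; upgrading (e.g. with pip) to any version that does should clear it. Below is a much-simplified sketch of that pin check: it compares only major.minor, so unlike the real pkg_resources parser it would also reject 2.0.x point releases.]

```python
def meets_crypto_pin(version):
    """Roughly check a version string against 'cryptography!=2.0,>=1.6'.

    Simplified: only the first two components are compared, so this is a
    sketch of the constraint, not a replacement for pkg_resources.
    """
    major_minor = tuple(int(x) for x in version.split('.')[:2])
    return major_minor >= (1, 6) and major_minor != (2, 0)


# The distro package that triggered the error fails the pin:
print(meets_crypto_pin('1.2.3'))  # -> False
```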
acoles: frankkahle: have you tried running the tests using `tox -e py27 -r`?  19:24
frankkahle: and what is tox?  19:25
frankkahle: BTW running on Ubuntu 16.04.3 LTS  19:26
acoles: frankkahle: https://docs.openstack.org/swift/latest/development_guidelines.html  19:26
frankkahle: hmm, lol, says it cannot find tox.ini  19:29
acoles: cd to the root dir of your swift repo  19:31
frankkahle: lol, ok, that is running  19:31
*** klrmn has joined #openstack-swift  19:34
*** joeljwright has joined #openstack-swift  19:36
*** ChanServ sets mode: +v joeljwright  19:36
_ix: clayg: any additional thoughts on expired-but-not-yet-deleted-and-not-in-the-.expiring_objects-containers-to-be-deleted-properly files?  19:37
clayg: _ix: sounds like my original classification may have been incorrect  19:38
frankkahle: it's still running; question: should the unit tests be run as the sudo user?  19:39
clayg: an object .data file on-disk (expired or otherwise) that does have a row in it's containers listing is exactly the definition of "dark data"  19:39
clayg: an expired but not yet reaped object would have a row in it's container until the expirer deletes it. so it's probably somewhat inconsequential that the dark data you're finding is expired...  19:40
acoles: clayg: did you mean to type 'does NOT have a row' above?  19:43
clayg: _ix: dark data happens when somehow an object .data file persists on a non-primary node longer than the configured reclaim_age (i.e. if you disconnect a node from the primaries for some time, issue a DELETE for some data, wait for the tombstones to get reclaimed, then somehow reconnect the orphaned data to the rest of the cluster - which results in the orphaned stale data being repaired in the object tier w/o any record of the earlier, now reclaimed, tombstone/DELETE)  19:44
*** klrmn has quit IRC  19:44
acoles: frankkahle: you shouldn't need to sudo the unit tests  19:46
clayg: acoles: i'm trying to read it again; I think I meant what I said... an expired .data file (i.e. a .data that exists after the x-delete-at metadata) WOULD have a row in the container - until the object-expirer reaps it... unless it's dark data  19:46
acoles: clayg: yeah, but the line before: 'an object .data file on-disk (expired or otherwise) that does NOT have a row in it's containers listing is exactly the definition of "dark data"'  19:47
frankkahle: I had to sudo to get the tox command running... and it's showing a lot of OKs so far  19:47
clayg: acoles: yup, thank you  19:48
*** klrmn has joined #openstack-swift  19:48
clayg: _ix: an object .data file on-disk (expired or otherwise) that does NOT have a row in it's containers listing is exactly the definition of "dark data"  19:48
acoles: clayg: teamwork!  19:49
clayg: anyway, the fix is basically to just issue a DELETE request through the proxy for any objects you find that need to be delete'd - if the containers are deleted you might need to recreate them to get the storage-policy correct, or use a script that will let you set x-backend-storage-policy-override  19:49
clayg: enumeration of dark data is the hardest part  19:50
clayg: i don't have a great example of either...  19:52
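[editor's note: a sketch of the cleanup clayg describes above: issue a DELETE through the proxy for each dark object, optionally carrying the X-Backend-Storage-Policy-Override header he mentions when a recreated container's policy doesn't match. The helper below only builds the request; the function name, and pairing it with any HTTP client, are our assumptions, not a Swift API.]

```python
def build_dark_data_delete(storage_url, container, obj, token,
                           policy_index=None):
    """Return (url, headers) for a proxy DELETE of one dark-data object.

    policy_index, if given, is sent as X-Backend-Storage-Policy-Override,
    the header clayg mentions for targeting the original storage policy.
    Hypothetical helper: issue the DELETE with the HTTP client of your choice.
    """
    headers = {'X-Auth-Token': token}
    if policy_index is not None:
        headers['X-Backend-Storage-Policy-Override'] = str(policy_index)
    url = '%s/%s/%s' % (storage_url.rstrip('/'), container, obj)
    return url, headers
```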
*** joeljwright has quit IRC  19:56
*** joeljwright has joined #openstack-swift  20:02
*** ChanServ sets mode: +v joeljwright  20:02
*** chsc has quit IRC  20:03
*** gkadam has joined #openstack-swift  20:05
*** armaan has quit IRC  20:07
_ix: clayg: Thanks for the advice.  20:08
clayg: one intensive audit I've done in the past involved getting a list of all names of all .data files on disk from object metadata (similar to swift-object-info), then doing container listings on all the accounts discovered and digging into any files on disk that didn't show up in the container listings...  20:11
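[editor's note: the cross-check clayg outlines reduces to a set difference once both sides are enumerated. A minimal sketch; the data shapes are assumptions, and gathering the on-disk names and container listings is the expensive part he alludes to.]

```python
def find_dark_data(on_disk, in_listings):
    """Return names that exist on disk but not in any container listing.

    on_disk / in_listings: dicts mapping account -> set of
    'container/object' names, gathered from .data file metadata and from
    container listings respectively.
    """
    dark = {}
    for account, disk_names in on_disk.items():
        missing = disk_names - in_listings.get(account, set())
        if missing:
            dark[account] = missing
    return dark
```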
_ix: It seems like a few things would have to go wrong in order for dark data to be created, however...  20:14
clayg: yes, generally - there are no known *open* bugs that cause dark data  20:15
_ix: The sequence of events that you mentioned here is difficult to compare to our own. Indeed, the node was taken offline. But before rejoining the cluster, the drives were wiped.  20:15
_ix: clayg: Well, we're running Mitaka, but I think the bugs that we saw annotated were fixed as of... Newton or Ocata.  20:16
clayg: that's a possibility...  20:16
_ix: Like I said above, I read through some chat logs where an xfs bulkstat or something similar was discussed with redbo. Does that ring any bells for you?  20:18
clayg: afaik nothing ever came of that investigation - and people have made do just walking the object trees like the auditor already does  20:18
clayg: auditor hooks that @torgomatic was working on might have been an option...  20:18
redbo: Did you remove the node from the ring right away? Any handoffs that take longer than a week to clear have the same problem.  20:20
_ix: No, and that's probably where the first major mistake was made.  20:20
_ix: Another engineer just took it offline without making adjustments to the ring.  20:21
_ix: It sat for some six weeks, with rsync logs stacking up with errors reaching that node. I'm trying to forget this blunder.  20:22
_ix: And indeed, your week figure is pretty accurate. I think it took about six days to bring the node back into the cluster.  20:23
redbo: So yeah, we bulkstat to get a list of all the objects that actually exist in the system, and then dump all container listings to get a list of everything that's in containers. Then we cross-collate them to find out what objects aren't in listings. But it's not wrapped up all nice and pretty.  20:24
redbo: Because we have too many objects to just put all of that data in a database and do a set comparison.  20:26
redbo: I say we; I'm not working on that anymore.  20:26
*** geaaru has quit IRC  20:26
_ix: redbo: I don't mind doing some work. I think the disk utilization pressure has been relieved after we began managing the ring definitions appropriately and rebalancing the cluster.  20:27
_ix: Can I assume this is what we're talking about: https://github.com/redbo/python-xfs ?  20:28
_ix: I think we can even do without the node that's at issue... if we're patient and adjust the weights to 0 on all of the drives attached to this node, can I assume that eventually we'll bottom out at a point > 0, and wipe those disks once more?  20:30
redbo: Isn't this a dark data thing? If you drop the weights of those drives, it'll just replicate that dark data out.  20:32
* _ix thinks  20:33
_ix: I'm not sure how the replicator works. I assumed that only non-expired data gets replicated.  20:34
redbo: It's not that smart. So if all of your dark data is expired, you're lucky and can probably clear it with a custom audit type thing.  20:37
_ix: Can you say more about a custom audit type thing?  20:38
_ix: The .data files we've come across so far in our investigations have all been expired.  20:38
redbo: I don't know; just where the auditor pulls the metadata, you could check to see if it's expired and throw it away.  20:38
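[editor's note: redbo's suggestion, checking expiry at the point where the auditor already reads object metadata, comes down to comparing X-Delete-At with the clock. A sketch of just that predicate; the hook wiring and the quarantine/unlink step that would follow are omitted, and the helper name is ours.]

```python
import time


def is_expired(metadata, now=None):
    """True if this object's X-Delete-At has passed.

    metadata: the headers dict an auditor reads from a .data file;
    objects without X-Delete-At never expire.
    """
    x_delete_at = metadata.get('X-Delete-At')
    if x_delete_at is None:
        return False
    if now is None:
        now = time.time()
    return int(x_delete_at) <= now
```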
redbo: Like clayg said, torgomatic worked on making pluggable auditor modules there, but I don't know what happened with that.  20:39
_ix: OK. Well, thanks very much for the discussion. I really appreciate your taking the time.  20:41
redbo: I had to appear; my name was said 3 times.  20:43
*** gyee has joined #openstack-swift  20:44
*** klrmn_ has joined #openstack-swift  20:51
openstackgerrit: John Dickinson proposed openstack/swift master: Added swift version to recon cli  https://review.openstack.org/413991  21:18
*** mat128 has quit IRC  21:34
*** gkadam has quit IRC  21:38
*** itlinux has quit IRC  21:39
*** itlinux has joined #openstack-swift  21:44
*** joeljwright has quit IRC  21:47
*** nadeem_ has joined #openstack-swift  21:53
*** nadeem_ has quit IRC  21:53
openstackgerrit: Tim Burke proposed openstack/swift master: tempurl: Make the digest algorithm configurable  https://review.openstack.org/525770  21:55
openstackgerrit: Tim Burke proposed openstack/swift master: tempurl: Deprecate sha1 signatures  https://review.openstack.org/525771  21:55
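[editor's note: for context on those two patches: a tempurl signature is an HMAC over 'METHOD\nEXPIRES\nPATH', historically always SHA-1; the first patch makes the digest selectable and the second deprecates SHA-1. A sketch of the signing step, with illustrative key and path values:]

```python
import hmac
from hashlib import sha1, sha256


def tempurl_signature(key, method, expires, path, digestmod=sha1):
    """Compute a Swift tempurl signature over 'METHOD\\nEXPIRES\\nPATH'."""
    body = '%s\n%s\n%s' % (method, expires, path)
    return hmac.new(key.encode(), body.encode(), digestmod).hexdigest()


# Same inputs, different digests: 40 hex chars for sha1, 64 for sha256.
sig1 = tempurl_signature('secret', 'GET', 1512345600, '/v1/AUTH_test/c/o')
sig256 = tempurl_signature('secret', 'GET', 1512345600, '/v1/AUTH_test/c/o',
                           digestmod=sha256)
```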
*** flwang has quit IRC  21:56
*** flwang has joined #openstack-swift  22:01
*** threestrands has joined #openstack-swift  22:05
*** threestrands has quit IRC  22:05
*** threestrands has joined #openstack-swift  22:05
*** rcernin has joined #openstack-swift  22:05
mattoliverau: morning  22:23
*** klrmn has quit IRC  22:28
*** klrmn_ has quit IRC  23:07
*** kei_yama has joined #openstack-swift  23:22
*** manous_ has joined #openstack-swift  23:29
*** _ix has quit IRC  23:36
*** manous_ has quit IRC  23:40

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!