Wednesday, 2015-05-27

mattoliveraunotmyname: lol, of course! We need a noymydictionary bot :P00:00
notmynamegetting the list of objects in a container. so I guess "object listings" is probably better00:00
notmynamepatchbot: what does notmyname really mean?00:00
claygstandardize GET a/c as "object listing" - but use context clues when someone says it wrong00:00
claygnotmyname: FWIW I realized I call them both "container listings" - I never say "account listings" :P00:01
notmynamethe code (so far) is _really_ short. and it's kinda cool because I can put a/c/o then kill the container servers then put /a/c/o2 and it works and is really fast. no 10+ sec timeout00:01
notmynameclayg: listing a container vs list of containers. both are container listings. I think your way is best ;-)00:02
claygnotmyname:  ;)00:02
claygjust parse it as "what I mean" - simple00:02
notmyname:-)00:02
notmynameI got the TODOs all written down in one place from the summit etherpads. tomorrow I'll work on getting those in some sort of organized form. If I'm incredibly lucky (probably not) I'll have something ready by the meeting tomorrow00:03
notmynameand remember that the meeting is at 2100UTC00:03
notmynamemattoliverau: ho: dmorita: kota_: ^00:03
notmynameacoles_away: ^ cschwede: ^00:04
mattoliveraunotmyname: yup, looking forward to the sleep in :)00:04
*** annegentle has quit IRC00:04
mattoliverauthanks for the reminder, and soon the ical stuff in openstack will be managed via yaml00:04
notmynameis there anything in openstack that can't be solved with a yaml file?00:05
notmynamegit@github.com:openstack-infra/irc-meetings.git  <-- the new meeting stuff, I'm told00:05
notmynamenew openstack shirt idea: I replaced your ops team with a small yaml file00:05
notmynameok, I'm headed home00:06
notmynamelater00:06
mattoliveraunope nothing, when in doubt, yaml it!00:07
mattoliveraunotmyname: night00:07
*** annegentle has joined #openstack-swift00:09
*** david-lyle has quit IRC00:20
*** annegentle has quit IRC00:21
*** annegentle has joined #openstack-swift00:27
*** annegentle has quit IRC00:36
*** swdweeb has joined #openstack-swift00:40
*** jrichli has joined #openstack-swift00:42
*** swdweeb has left #openstack-swift00:50
*** minwoob has quit IRC00:50
*** zhill_ has quit IRC01:12
openstackgerritOpenStack Proposal Bot proposed openstack/python-swiftclient: Updated from global requirements  https://review.openstack.org/8925001:13
openstackgerritOpenStack Proposal Bot proposed openstack/swift: Updated from global requirements  https://review.openstack.org/8873601:14
*** jamielennox|away is now known as jamielennox01:23
*** kota_ has joined #openstack-swift01:26
*** annegentle has joined #openstack-swift01:27
*** annegentle has quit IRC01:28
openstackgerritVictor Stinner proposed openstack/swift: Use six to fix imports on Python 3  https://review.openstack.org/18545301:29
*** setmason has quit IRC01:35
*** david-lyle has joined #openstack-swift01:35
openstackgerritMauricio Lima proposed openstack/swift: Uncomment [filter: keystoneauth] and [filter: authtoken] sessions  https://review.openstack.org/18575501:43
*** annegentle has joined #openstack-swift01:49
*** kota_ has quit IRC02:06
*** kota_ has joined #openstack-swift02:08
*** kota_ has quit IRC02:15
*** david-lyle has quit IRC02:33
*** setmason has joined #openstack-swift02:33
*** gyee has quit IRC02:36
*** annegentle has quit IRC02:38
*** annegentle has joined #openstack-swift02:40
*** jrichli has quit IRC03:00
*** david-lyle has joined #openstack-swift03:11
*** annegentle has quit IRC03:32
hugokuofew photos of Swift 5th anniversary party in Vancouver http://s217.photobucket.com/user/tonytkdk/library/OpenStack%20Swift%205th%20anniversary03:35
mattoliverauhugokuo: nice :)03:37
*** annegentle has joined #openstack-swift03:37
hugokuo:)03:39
*** links has joined #openstack-swift04:15
*** proteusguy has quit IRC04:16
*** annegentle has quit IRC04:26
*** kota_ has joined #openstack-swift04:27
*** joeljwright has joined #openstack-swift04:37
*** bkopilov is now known as bkopilov_wfh04:52
*** leopoldj has joined #openstack-swift05:18
*** ppai has joined #openstack-swift05:22
*** cloudm2 has quit IRC05:32
*** setmason has quit IRC05:34
*** setmason has joined #openstack-swift05:35
*** SkyRocknRoll has joined #openstack-swift05:40
hohugokuo: great!05:41
hugokuoho: :)05:46
*** zaitcev has quit IRC05:55
*** mmcardle has joined #openstack-swift05:56
*** kota_ has quit IRC06:01
*** setmason has quit IRC07:06
*** setmason has joined #openstack-swift07:12
*** jordanP has joined #openstack-swift07:14
*** bkopilov_wfh has quit IRC07:20
*** bkopilov has joined #openstack-swift07:25
*** hseipp has joined #openstack-swift07:25
*** hseipp has quit IRC07:26
*** hseipp has joined #openstack-swift07:26
*** annegentle has joined #openstack-swift07:27
*** krykowski has joined #openstack-swift07:30
*** annegentle has quit IRC07:32
*** silor has joined #openstack-swift07:33
*** silor has quit IRC07:34
*** setmason has quit IRC07:41
*** chlong has quit IRC07:45
*** geaaru has joined #openstack-swift07:50
*** jistr has joined #openstack-swift07:52
*** kota_ has joined #openstack-swift08:39
*** mariusv has quit IRC08:42
*** mariusv has joined #openstack-swift08:47
*** mariusv has quit IRC08:48
*** mariusv has joined #openstack-swift08:49
*** mariusv has quit IRC08:54
*** kei_yama has quit IRC09:09
*** km has quit IRC09:10
eikkeis there any existing way to get policies for ObjectControllerRouter and DiskFileRouter registered from within an out-of-tree package?09:16
*** ekarlso has quit IRC09:16
*** ekarlso has joined #openstack-swift09:22
*** haypo has joined #openstack-swift09:25
*** acoles_away is now known as acoles09:31
*** theanalyst has quit IRC09:34
*** joeljwright has quit IRC09:36
*** dosaboy_ has quit IRC09:36
*** dosaboy has joined #openstack-swift09:36
*** theanalyst has joined #openstack-swift09:37
*** joeljwright has joined #openstack-swift09:39
*** elmo has quit IRC09:54
*** elmo has joined #openstack-swift09:57
*** theanalyst has quit IRC10:02
*** theanalyst has joined #openstack-swift10:04
*** theanalyst has quit IRC10:08
*** theanalyst has joined #openstack-swift10:12
*** proteusguy has joined #openstack-swift10:15
*** ho has quit IRC10:48
*** bhanu has joined #openstack-swift10:58
*** bhanu has quit IRC11:04
*** hseipp has quit IRC11:07
*** slavisa has joined #openstack-swift11:14
*** slavisa has quit IRC11:19
*** slavisa has joined #openstack-swift11:19
*** hseipp has joined #openstack-swift11:24
*** hseipp has quit IRC11:25
*** hseipp has joined #openstack-swift11:25
*** openstack has joined #openstack-swift11:38
-cameron.freenode.net- [freenode-info] if you're at a conference and other people are having trouble connecting, please mention it to staff: http://freenode.net/faq.shtml#gettinghelp11:38
*** aix has joined #openstack-swift11:47
*** SkyRocknRoll has quit IRC11:49
openstackgerritKota Tsuyuzaki proposed openstack/swift: Fix FakeSwift to simulate SLO  https://review.openstack.org/18594011:50
*** SkyRocknRoll has joined #openstack-swift11:50
*** ppai has quit IRC11:54
*** hseipp has quit IRC12:06
*** hseipp has joined #openstack-swift12:07
*** bkopilov is now known as bkopilov_wfh12:08
*** ppai has joined #openstack-swift12:08
*** kota_ has quit IRC12:13
*** acoles is now known as acoles_away12:18
openstackgerritAlistair Coles proposed openstack/swift: Make SSYNC receiver return a reponse when initial checks fail  https://review.openstack.org/17783612:23
openstackgerritAlistair Coles proposed openstack/swift: Remove _ensure_flush() from SSYNC receiver  https://review.openstack.org/17783712:23
*** ppai has quit IRC12:36
*** annegentle has joined #openstack-swift12:41
*** ekarlso has quit IRC12:53
*** ekarlso has joined #openstack-swift12:53
cschwedeclayg: did you recently see errors running the vagrant saio? today i always get an "AttributeError: 'VersionInfo' object has no attribute 'semantic_version'", i think it is somehow related to pbr and/or pip :/12:56
openstackgerritMerged openstack/swift: Exclude local_dev from sync partners on failure  https://review.openstack.org/17507612:59
*** annegentle has quit IRC13:06
openstackgerritMerged openstack/swift: fixup!Patch of "parse_content_disposition" method to meet RFC2183  https://review.openstack.org/18538913:15
*** jkugel has joined #openstack-swift13:17
*** mwheckmann has joined #openstack-swift13:21
*** acoles_away is now known as acoles13:25
*** annegentle has joined #openstack-swift13:28
*** robefran_ has joined #openstack-swift13:30
cschwedeclayg: nevermind, i somehow got the vagrant saio manually working. it’s a testtools issue, needed some updates (with more dependencies) and then it worked again. going to submit a patch later13:30
*** wbhuber has joined #openstack-swift13:30
*** annegentle has quit IRC13:30
*** annegentle has joined #openstack-swift13:33
*** mwheckmann has quit IRC13:33
*** krykowski has quit IRC13:35
*** krykowski_ has joined #openstack-swift13:35
openstackgerritChristian Schwede proposed openstack/swift: Allow SLO PUTs to forgo per-segment integrity checks  https://review.openstack.org/18447913:38
*** links has quit IRC13:41
cschwedetimburke: ^^ i only wanted to fix a minor typo and avoid nitpicking, but gerrit didn’t detect this trivial change. sorry for that13:42
*** cloudm2 has joined #openstack-swift13:42
*** slavisa has quit IRC13:44
*** bhakta has left #openstack-swift13:47
*** annegentle has quit IRC13:52
*** annegentle has joined #openstack-swift13:53
*** gsilvis has quit IRC13:53
*** gsilvis has joined #openstack-swift13:54
openstackgerritAlistair Coles proposed openstack/swift: Filter Etag key from ssync replication-headers  https://review.openstack.org/17397313:54
*** mcnully has joined #openstack-swift14:01
openstackgerritAlistair Coles proposed openstack/swift: Make SSYNC receiver return a reponse when initial checks fail  https://review.openstack.org/17783614:04
openstackgerritAlistair Coles proposed openstack/swift: Remove _ensure_flush() from SSYNC receiver  https://review.openstack.org/17783714:04
*** chlong has joined #openstack-swift14:04
*** jrichli has joined #openstack-swift14:09
acoles^^ how come clayg gets to land first and I get to fix merge conflicts :/14:10
acolesclayg zaitcev : https://review.openstack.org/177836 needs your +2's again please due to a trivial import conflict14:11
*** mcnully has quit IRC14:16
*** annegentle has quit IRC14:17
*** bsdkurt has quit IRC14:17
*** leopoldj has quit IRC14:17
*** breitz has quit IRC14:19
*** breitz has joined #openstack-swift14:20
mordredwho's a good person to poke with questions about swift at rackspace? specifically, Infra are uploading images to swift and we're getting ClientException: Object POST failed returned a 504 Gateway Time-out14:22
mordredI'm wondering if there is something we're doing wrong on our side? or if there is an issue we need to report to someone14:22
*** mwheckmann has joined #openstack-swift14:24
*** minwoob has joined #openstack-swift14:32
*** annegentle has joined #openstack-swift14:32
*** esker has joined #openstack-swift14:33
*** krykowski_ has quit IRC14:33
*** krykowski has joined #openstack-swift14:37
*** proteusguy has quit IRC14:47
MooingLemuroh, great.. it looks like on my EC system, there are a bunch of objects that lost most of their fragments after I shuffled the devices around between machines14:50
*** acampbell has joined #openstack-swift14:57
*** zaitcev has joined #openstack-swift14:58
*** ChanServ sets mode: +v zaitcev14:58
*** acampbell has quit IRC14:58
*** acampbell has joined #openstack-swift14:59
*** mwheckmann has quit IRC14:59
*** proteusguy has joined #openstack-swift14:59
*** acampbel11 has joined #openstack-swift14:59
*** bhakta has joined #openstack-swift15:01
*** bhakta has left #openstack-swift15:01
*** ChanServ sets mode: +v cschwede15:02
*** mwheckmann has joined #openstack-swift15:04
*** krykowski has quit IRC15:08
*** slavisa has joined #openstack-swift15:11
*** SkyRocknRoll has quit IRC15:14
*** acampbel11 has joined #openstack-swift15:16
*** acampbel11 has quit IRC15:16
*** slavisa has quit IRC15:17
notmynamegood morning15:18
notmynamemordred: if ahale (ops) or hurricanerix_ (dev) are around, they might be able to look. otherwise a support ticket is probably faster15:19
mordrednotmyname: cool. thanks!15:19
mordrednotmyname, ahale, hurricanerix_: we've found a weirdness on our side we're investigating for the moment, if I clear that up and still get 504's, I'll come pinging15:19
notmynameeikke: I'd love to have those exposed via a python entry_point and settable via a config15:19
*** acampbell has quit IRC15:20
*** silor has joined #openstack-swift15:21
*** slavisa has joined #openstack-swift15:22
*** rbrooker_ has joined #openstack-swift15:25
*** nadeem has joined #openstack-swift15:25
*** nadeem has quit IRC15:25
tdasilvanotmyname, eikke: I was thinking of doing that here: https://review.openstack.org/#/c/159285/15:27
tdasilvaeven thou this code is to leave in the swift tree, it could serve as an example15:28
*** nadeem has joined #openstack-swift15:28
*** setmason has joined #openstack-swift15:31
acolesnotmyname: ack meeting time change, calendar updated!15:32
notmynameacoles: thanks15:32
openstackgerritVictor Stinner proposed openstack/swift: Replace StringIO.StringIO with six.BytesIO  https://review.openstack.org/18604215:32
acolesgood news is swift meetings no longer conflict with champions league matches (when they restart...)15:32
notmynamethat is good news15:33
openstackgerritVictor Stinner proposed openstack/swift: Get StringIO and cStringIO from six.moves  https://review.openstack.org/18545715:34
tdasilvaacoles: haha, good point15:34
*** zhill has joined #openstack-swift15:36
*** mwheckmann has quit IRC15:38
*** bhakta has joined #openstack-swift15:38
mordrednotmyname: hey - so ... checking in to my infra problem ...15:41
mordrednotmyname: I see that the create call worked just fine, but in trying to update two pieces of metadata, it's sitting there for _quite_ a while15:41
mordredis updating metadata more expensive than I would have thought?15:41
mordredyup15:42
notmynamemordred: yes15:42
mordredthat's where I'm getting the 50415:42
mordredAHA15:42
mordredawesome15:42
mordredsee, I'm learning things15:42
mordrednotmyname: should I just set the metadata as part of the create in the first place then?15:42
notmynamemordred: it's much better to upload the metadata with the original PUT if at all possible15:42
notmynameyes15:42
mordredcool15:42
mordredI will do that15:42
mordredthanks! bug found and solved15:42
notmynamerackspace, like most swift deployers, has post-as-copy turned on (for good reasons). it means that updating object metadata results in a whole server-side copy of the object15:43
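(Aside: a minimal python-swiftclient sketch of the advice above. The endpoint, token, container/object names, and metadata headers are placeholders; the point is that metadata sent with the original PUT costs nothing extra, while a later POST on a post_as_copy cluster re-copies the whole object server-side.)

    from swiftclient import client as swift

    url = 'https://storage.example.com/v1/AUTH_infra'   # assumed endpoint
    token = 'AUTH_tk_example'                           # assumed token

    # Preferred: attach the metadata to the upload itself.
    swift.put_object(url, token, 'images', 'image.qcow2',
                     contents=open('image.qcow2', 'rb'),
                     headers={'X-Object-Meta-Sha256': '...',
                              'X-Object-Meta-Builder': 'nodepool'})

    # The slow path mordred hit: a separate metadata update (HTTP POST), which
    # post_as_copy turns into a full server-side copy of the object.
    swift.post_object(url, token, 'images', 'image.qcow2',
                      headers={'X-Object-Meta-Builder': 'nodepool'})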
eikkenotmyname: cool, might work on that15:43
notmynameeikke: looks like tdasilva has some ideas too (as I'd expect)15:43
eikketdasilva: assuming 159285 goes in, it won't need that, right15:43
*** mwheckmann has joined #openstack-swift15:44
*** nadeem_ has joined #openstack-swift15:44
*** nadeem has quit IRC15:44
notmynamepatch 15928515:45
patchbotnotmyname: https://review.openstack.org/#/c/159285/15:45
tdasilvaeikke: well, that's just one storage policy (which I think would cover most third-party SP), but it won't cover all15:45
mordrednotmyname: the headers param for put_object should be the same form as the ones for post_object, yeah?15:45
notmynametdasilva: I'd like to see it independently of single process swift patches. ie just something that makes life easier for anyone writing/deploying alternate DiskFile implementations15:45
notmynamemordred: correct15:45
mordrednotmyname: see... here you go having consistent parameters15:46
mordredit's confusing15:46
notmynameswift meeting time today is http://www.timeanddate.com/worldclock/fixedtime.html?hour=21&min=00&sec=015:46
mordredare you sure I dont need to append liek a unicode bunnyrabbit for one of them?15:46
eikketdasilva: agree. the reason I asked for this was to allow us to write an SP implementation similar to the single-process one even though the single-process one wouldn't be merged yet. might also allow to put some diskfile-specific config in the storage policy definition15:46
notmynamemordred: sorry. I'll try to make sure we can update our API by 1.5 versions every release and change the whole sdk every time ;-)15:46
eikkenotmyname: for diskfiles one can already use a custom entrypoint due to paste15:47
notmynameeikke: sort of. you have to have your own (basically copied) object server that instantiates it's own DiskFile. I want to see you use the upstream object server directly and reference the DiskFile that is used15:47
*** annegentle has quit IRC15:47
eikkewe only inherit from Swift ObjectController and indeed construct custom DiskFile stuff15:49
tdasilvaeikke, notmyname: yeah, SoF is basically the same15:51
notmynameright. and I'd prefer to see you use the upstream object server code. makes it so that the DiskFile interface is what you have to implement instead of the obejct server interface15:53
*** barra204 has quit IRC15:54
*** slavisa has quit IRC16:06
openstackgerritMerged openstack/swift: Allow SLO PUTs to forgo per-segment integrity checks  https://review.openstack.org/18447916:07
*** jistr has quit IRC16:07
eikkenotmyname: then how do you handle different policies?16:09
eikke(and more specifiically, their configuration)16:09
*** Fin1te has joined #openstack-swift16:09
notmynameeikke: good question, and I don't know yet :-)16:09
notmynameeikke: one idea would be to have an entry point per policy per server. that might be simplified with the current single-object-server-per-drive that swifterdarrell has been looking at16:10
eikkenext to that, there's also the compatibility question... I really don't want to end up in a situation where our backend needs diverging codebases in order to support different swift versions (until now it's feasible with some minor hacks)16:10
notmynameeikke: but also, the DiskFile, IMO, should actually know anything about policies (other than the index or some other unique identifier). the DiskFile needs to persist data to storage media (or in your case some other system). so as long as a policy results in a deterministic read/write, then it shoudl work, right?16:12
notmynames/should/shouldn't/16:12
notmynameat least, that's the ideal in my mind :-)16:12
tdasilvaeikke, notmyname: just to clarify, that's two different problems right? one is the python-entry point for the DiskFile so the object-server can be reused. The other, is the ability to add more storage policy types and new ObjectControllers in the proxy16:12
notmynametdasilva: correct. those are 2 separate things, and I'm mostly interested in the entry point for disk file implementations on the object server16:13
eikkenotmyname: mostly, I guess (it's +- what we do, all policy management is done in the objectcontroller, which instantiates diskfiles which read/write data to some location passed in their constructor, don't care about policies themself)16:14
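(Aside: a hedged sketch of the entry_point idea notmyname mentions above. Nothing like this exists in Swift at this point; the group name 'swift.diskfile', the "diskfile =" config option, and the loader below are all hypothetical illustrations of the shape such a hook could take.)

    # Hypothetical setup.py fragment for an out-of-tree DiskFile package:
    #
    #     entry_points={
    #         'swift.diskfile': [
    #             'mybackend = mybackendpkg.diskfile:MyDiskFileManager',
    #         ],
    #     }
    #
    # ...and roughly how the stock object server could resolve a
    # "diskfile = mybackend" option from object-server.conf:
    import pkg_resources

    def load_diskfile_manager(name):
        for ep in pkg_resources.iter_entry_points('swift.diskfile', name):
            return ep.load()
        raise LookupError('no DiskFile implementation registered as %r' % name)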
*** rbrooker_ has quit IRC16:17
*** bsdkurt has joined #openstack-swift16:21
openstackgerritVictor Stinner proposed openstack/swift: Get StringIO and cStringIO from six.moves  https://review.openstack.org/18545716:23
openstackgerritVictor Stinner proposed openstack/swift: Replace StringIO with BytesIO for WSGI input  https://review.openstack.org/18607116:23
openstackgerritVictor Stinner proposed openstack/swift: Replace StringIO with BytesIO for file  https://review.openstack.org/18607216:23
openstackgerritVictor Stinner proposed openstack/swift: Replace StringIO with BytesIO in ssync  https://review.openstack.org/18607316:23
*** cdelatte has joined #openstack-swift16:29
*** haypo has quit IRC16:31
*** bsdkurt has left #openstack-swift16:31
MooingLemurclayg: Just wanted to sound the alarm a bit with my experiences with EC so far, but I don't have an understanding of what happened yet.  I decided to recreate my ring (with the same part power of course).  And along with physically moving some drives around, I ended up with a lot of partitions that have only a small collection of fragments of objects and the partitions are not on other devices.  The fragments are simply gone.16:37
notmynamehttp://lists.openstack.org/pipermail/openstack-operators/2015-May/007132.html  <-- torgomatic16:37
MooingLemurclayg: I've been trying to understand the replication code path, and I really wish the reverts were somehow logged, since I think that's where the errant deletions happened.16:38
MooingLemurs/replication/reconstruction/16:38
claygMooingLemur: quarantine?16:39
MooingLemurnope, no quarantine dirs on any devices in my cluster16:39
claygMooingLemur: which patches are you running?16:40
MooingLemurclayg: just the one that forces the primary to eat its own fragment index16:40
claygMooingLemur: so same part power... so we expected all the parts to be on the wrong devices - but still totally usable to the new cluster/ring16:40
claygMooingLemur: and it only worked for *some* objects?  Or it worked like not at all.16:41
acolescd16:41
acolesoops!16:41
claygacoles:~/$16:41
acolesclayg: something like that yeah :D16:41
acolesrm -rf16:42
acolesoops ;)16:42
MooingLemurclayg: Lemme see what I can gather.  Good question.  I've just been doing things like: for i in `seq -w 01 05`; do ssh swift-storage-$i ls -l /srv/node/\*/objects/100214/???/*/*\#*; done16:42
MooingLemurbut for lines that have appeared in my logs saying only 4/9 fragments were found, etc16:42
claygPermission denied16:43
acolesphew16:43
acolesclayg: thx for all your reviews btw. i'm all over the place with jet lag but catching up slowly16:43
claygMooingLemur: how many nodes do you have?  you may have to go trolling around on all the disks - it could be that they're out there but the proxy can't find them and the reconstructor isn't making progress for some reason16:44
claygMooingLemur: I've seen pyeclib randomly segfault - there's a couple of patches about for some of those issues - but I don't think we've squared it all yet16:44
claygacoles: you must be jet lagged if you think I'm doing reviews :\16:45
*** hseipp has left #openstack-swift16:45
claygacoles: although i do technically have one of your ssync changes checked out at the moment16:45
acolesheh16:45
MooingLemurclayg: I have 5 nodes, with 4 devices each.  My ssh should be able to find them.16:45
claygacoles: the *idea* is that will result in a score16:45
claygMooingLemur: good hunting!16:45
MooingLemuralso that ssh should be objects-1, not objects... but anyway, the idea's the same16:45
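(Aside: a rough local equivalent of that sweep, run on a storage node. It assumes the usual on-disk layout of objects-1/<partition>/<suffix>/<hash>/<timestamp>#<frag-index>.data; the partition number is the one being chased above.)

    import glob

    partition = '100214'
    pattern = '/srv/node/*/objects-1/%s/*/*/*#*.data' % partition
    for path in sorted(glob.glob(pattern)):
        print(path)   # each surviving fragment archive for that partition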
acolesclayg: i think i am in timezone clay-4days16:46
MooingLemurclayg: it's pretty clear many of the objects have all of their fragments... maybe I should just sample my container listing and count the 200s and the others.16:47
MooingLemurclayg: when reconstruction reverts, does the .durable file tag along (or a new one created on the receiver)?16:48
*** annegentle has joined #openstack-swift16:48
claygMooingLemur: like I said a non-200 is something to dig into, but we need to know if the fragments are "out there somewhere"16:48
claygMooingLemur: maybe depending on your scheme 2 * replica might acctually already be hitting every disk for you - dunno16:49
claygMooingLemur: oh, could be duplicates as well - if some of the responding nodes have multiple frags the proxy can't really tease that out16:49
MooingLemurclayg: https://bpaste.net/show/68b504bc7a6216:49
claygMooingLemur: receiver will create a new one16:49
MooingLemurI only have 5 storage nodes, and /srv/node/* should cover all the mounts16:50
MooingLemurit's a 12-replica policy (9+3)16:50
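(Aside: for readers following along, a hedged swift.conf sketch of a 9+3 policy like the one described; the policy index, name, and ec_type here are assumptions, not MooingLemur's actual settings.)

    [storage-policy:1]
    name = ec-9-3
    policy_type = erasure_coding
    ec_type = jerasure_rs_vand
    ec_num_data_fragments = 9
    ec_num_parity_fragments = 3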
*** nadeem_ has quit IRC16:50
MooingLemurthat one just happens to be a small object16:50
claygMooingLemur: yeah looks sketchy - like maybe auditor or some other jackhole swept over the hashdir and used the wrong hash_cleanup_listdir :\16:51
MooingLemurclayg: https://bpaste.net/show/5d593cb6a95016:51
MooingLemuryeah, stuff got destroyed :P16:52
*** annegentle has quit IRC16:53
notmynameclayg: https://imgflip.com/i/m2l0l16:53
MooingLemurclayg: it looks like it's way worse than an occasional lost race though16:53
*** nadeem has joined #openstack-swift16:54
claygnotmyname: good one!16:54
clayg12-frag on... what'd you say 20 devices16:55
MooingLemuryeah16:55
claygMooingLemur: yeah I'm pretty sure the auditor ate them :\16:58
MooingLemurclayg: I found some instances in the logs where the same host kept removing the same partition on the same devices an hour after it did it the first time17:00
claygMooingLemur: that's *probably* nothing, partition reaping (especially with mixed fragments) can be a long term effort.17:01
MooingLemurit's as if something was eating the suffixes, reconstructor was removing the empty partition dir, but then some other host kept putting it back17:01
claygMooingLemur: I'm pretty sure what happens is that you get a bunch of frags in hashdir, then the auditor spins up a replicated diskfile and calls hash_cleanup_listdir on it which eats all the frags but the last lexicographically sorted one17:02
MooingLemurohhh.. the non-EC semantics17:03
claygI'm guessing we didn't notice it because a) it ignores durables b) it only happens with multi-frag c) we didn't allow multi-frag d) my patch allows it - but wasn't tested17:03
claygwell... *YOU* tested it - and it sucked for you.  so boo-berries on me for that; kudos to you tho17:04
claygassuming I'm right you saved someone a ton of heart-ache17:04
MooingLemur:D17:04
claygwell... I mean there's a non-zero chance we would have caught it testing/reviewing - but this is sorta better almost17:05
MooingLemurauditor doesn't seem to log how many of those "purge obsolete object" ops it did17:06
MooingLemura pretty common thing on most clusters and probably not generally useful, but it would have been in this case because my cluster has had very few object replacements17:07
*** mwheckmann has quit IRC17:07
*** nadeem has quit IRC17:08
*** nadeem has joined #openstack-swift17:08
*** annegentle has joined #openstack-swift17:10
MooingLemurclayg: I think you're on to something though, because for the objects that are bereft of all their fragments, it always seems to have fragment 11.17:10
MooingLemurbut I'm surprised it's not 9.  I thought lexically 11 comes between 1 and 2.17:11
*** annegentle has quit IRC17:13
*** petertr7 is now known as petertr7_away17:16
*** kutija has joined #openstack-swift17:16
claygMooingLemur: so in my situation the auditor did eat them - but they just got moved to /srv/node1/sdb5/quarantined/objects-1/5b22a3128fa5035650369eac48c2858917:16
claygMooingLemur: fwiw i like the throw out the ring and create a new one approach to finding edge cases in revert handling - nice17:18
MooingLemurI don't have /srv/node/*/quarantined on any host17:18
claygMooingLemur: because BALEETED!  (?)17:19
MooingLemurI guess so.  Why did your test quarantine them I wonder17:20
claygMooingLemur: good question... if it was hash_cleanup_listdir it would have just dropped them...17:20
claygMooingLemur: it says "Exception reading metadata"17:21
MooingLemurhmm, I didn't have any of those.. the xattrs should have always been good17:22
MooingLemurI'm running on ext4, but I doubt that would make that particular difference17:22
claygMooingLemur: well I don't expect the issue was xattrs - the other disks audit just fine - it was only the one hashdir with the multiple frags17:23
*** mwheckmann has joined #openstack-swift17:23
MooingLemurclayg: btw, throwing out the ring was out of wanting to know whether the overloading would be improved by setting the weights of all the target devices directly rather than the incremental raising of the weights from flat.  (The answer was not by much) :)17:30
MooingLemurthe cluster has a mixture of drive sizes, so I was finding the best weighting to use most of their drives proportionally to their capacity.  I just had to underweight the single largest drive.17:32
*** aix has quit IRC17:35
*** acoles is now known as acoles_away17:36
*** acoles_away is now known as acoles17:43
*** acoles is now known as acoles_away17:43
*** jordanP has quit IRC17:46
*** chlong has quit IRC17:53
claygwow, so how did I not know that mv will preserve xattrs by default and cp will not17:57
*** mlima has joined #openstack-swift17:57
claygdid I like know it once and then forgot it?  do i normally just use mv and assumed cp would work?  do i type cp -a out of muscle memory?17:57
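(Aside: the metadata the auditor choked on lives in the user.swift.metadata xattr of each .data file (continued in numbered keys when large), which a plain cp drops and mv keeps. A quick way to check whether it survived a copy, using the xattr module Swift already depends on:)

    import pickle
    import xattr

    def read_swift_metadata(datafile):
        raw, key = b'', 'user.swift.metadata'
        for suffix in ('', '1', '2', '3'):   # first few continuation keys is plenty here
            try:
                raw += xattr.getxattr(datafile, key + suffix)
            except (IOError, OSError):
                break
        return pickle.loads(raw)   # raises if the xattrs were lost in the copy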
claygMooingLemur: so with xattrs copied over correctly the auditor seems indifferent to the ec hashdir with multiple fragments18:00
MooingLemurclayg: indifferent?  As in it ignores it and leaves the fragments there?18:03
claygMooingLemur: yeah18:03
mlimaI did the deploy of swift following the kilo manual and only managed to make it work after restarting the RabbitMQ service. Does that make sense?18:04
claygmlima: lol, no18:06
claygmlima: swift doesn't use rabbit - neither does keystone actually (that I know of)18:06
claygmlima: so basically rabbit shouldn't have affected anything swift related - ceilometer maybe?18:07
claygacoles_away: and this is how I end up not reviewing your patch :'(18:07
mlimamy deploy use only swift and keystone18:07
mlimahowever, I had to restart the RabbitMQ service to it works18:08
MooingLemurclayg: so, as I understand the flow18:10
mlimaI think the problem is not the swift or the keystone, but communication between them. the RabbitMQ has some interaction with them?18:10
MooingLemuroops, mishit enter18:10
mlima+clayg: I think the problem is not the swift or the keystone, but communication between them. the RabbitMQ has some interaction with them?18:10
*** setmason has quit IRC18:11
MooingLemurclayg: so, as I understand the flow of things that would reach hash_cleanup_listdir, even reconstructor.py calling it own get_hashes would reach it.  I don't think it's necessarily auditor.  Even reconstructor could trigger this cleanup.18:11
MooingLemuron its own18:11
claygmlima: neither swift nor keystone communicate via rabbit - and definitely not to each other - maybe the metering stuff - does either project have anything ceilometer related in the pipeline?18:11
claygMooingLemur: yeah but in all of those cases it's policy aware and they route to the *correct* hash_cleanup_listdir - i thought maybe the auditor was just too stupid - but so far it doesn't seem to be causing problems18:12
*** setmason has joined #openstack-swift18:13
claygnotmyname: whats the magic git am syntax that lets me apply a patch file with errors like I was cleaning up a merge conflict?18:13
*** mmcardle1 has quit IRC18:13
mlima+clayg: I use it [pipeline:main] pipeline = catch_errors gatekeeper healthcheck proxy-logging cache container_sync bulk ratelimit authtoken keystoneauth container-quotas account-quotas slo dlo proxy-logging proxy-server18:14
claygmlima: seems pretty reasonable - nothing in there gunna talk to amqp18:15
claygnotmyname: git apply -3 <patch> worked just fine - why did I have to say the -3 ?  there weren't any conflicts or anything... just maybe line numbers moved around?18:17
mlima+clayg: I opened a bug and fix on manual (https://review.openstack.org/#/c/185783/), however this was not well accepted18:17
openstackgerritOpenStack Proposal Bot proposed openstack/swift: Updated from global requirements  https://review.openstack.org/8873618:18
*** geaaru has quit IRC18:20
*** setmason has quit IRC18:21
claygmlima: yup pretty strange, my devstack setup doesn't have rabbit running, when I point my standalone dev setups at a standalone keystone, i don't have rabbit running - it's just not related18:21
claygmlima: so I'm basically in the same camp as everyone else on that bug :\18:22
claygmlima: it may be trivial to reproduce what you're seeing by following the guide (or maybe not, I'm not familiar with those docs) - but it doesn't make sense - there must be something else going on - seems like folks would rather leave the possible issue in docs than cargo cult something that makes no sense18:23
claygmlima: maybe you can create a smaller reproducible example of how keystone and swift are somehow giving the appearance of interacting with rabbit?18:24
*** fthiagogv has joined #openstack-swift18:26
tdasilvaclayg: do you guys use swift-bench at all? or mostly ssbench?18:30
claygtdasilva: i use swift-bench some - ssbench doesn't support direct-to-object tests (that I know of)18:31
*** setmason has joined #openstack-swift18:31
claygMooingLemur: I tried doing the blow away and recreate rings trick18:32
claygMooingLemur: all of my frag parts were basically in the wrong place after that - but with my primaries-must-eat-primary-frags patch applied, after a service restart to pick up the new code, basically everything slammed back to where it belonged on the first pass18:33
*** setmason has quit IRC18:34
*** setmason has joined #openstack-swift18:34
*** acampbell has joined #openstack-swift18:35
*** acampbel11 has joined #openstack-swift18:36
*** Fin1te has quit IRC18:37
claygMooingLemur: so I just ran the experiment again - after a complete ring rebuild, during the subsequent reconstruction revert I definitely observed a node holding multiple fragments - but then it pushed off the misplaced one shortly after - no harm no foul18:45
*** Fin1te has joined #openstack-swift18:47
claygauditor, updater, replicator - nothing seems to produce the observation of fragment loss18:52
dmsimardI'm seeing swift-object-replicator eating a ton and a half of CPU on my storage nodes. Any way to tell if it's relevant for it to be taking so much resources ? Should I tone down the amount of workers or something ?18:53
claygdmsimard: how many cores?18:54
dmsimardclayg: 16 cores18:54
*** silor1 has joined #openstack-swift18:54
claygdmsimard: and like *eight* of them are pegged at 100%?18:55
*** silor has quit IRC18:56
claygdmsimard: yeah concurrency is the only tunable I see - maybe run_pause18:56
notmynamereminder that the meeting is NOT in 4 minutes. it's in 124 minutes (~2hours)18:56
dmsimardclayg: The current concurrency is set to 16, it's eating around 500% CPU total18:56
claygdmsimard: that makes very little sense :D18:57
claygdmsimard: aside from the subprocess calls to rsync - the replicator isn't even multiprocess - it should all be on one core, greenthreaded18:58
swifterdarrelldmsimard: clayg: how many partitions do you have per disk, on avg? maybe your part power's really high compared to your disk count?18:58
swifterdarrellclayg: dmsimard: maybe the %CPU is including forked-off children (e.g. the rsyncs or something?)18:59
dmsimardclayg: What would be a sane concurrency for replicator ?18:59
*** cutforth has joined #openstack-swift19:01
claygdmsimard: whatever doesn't use all your cpu?19:03
dmsimardswifterdarrell: I'm looking at 3800ish partitions on avg, the cluster is indeed rather small19:03
dmsimard200 disks or so19:03
dmsimardclayg: Any downsides to reducing the amount of concurrency ?19:04
claygdmsimard: less partitions replicated at once means longer replication cycle time19:06
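(Aside: the knobs being discussed live in the [object-replicator] section of object-server.conf; the values below just illustrate "halve the concurrency", they are not a recommendation.)

    [object-replicator]
    concurrency = 8     # was 16; fewer partitions replicated at once
    run_pause = 30      # seconds to sleep between replication passes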
*** mmcardle has joined #openstack-swift19:07
claygdmsimard: that part count looks reasonable - you're at what part power 17-18?19:08
claygmaybe - is some of that cpu like disk wait?19:09
dmsimardclayg: yeah, 1819:10
dmsimardclayg: Not getting much i/o wait, less than 5% on avg19:11
dmsimardI just halved the concurrency of replicator, I'll monitor the impacts and see what happens19:12
dmsimardOn another note, I saw some interesting stuff at the summit in the HP Talk - putting the container and account databases right on the proxy nodes is a good idea :)19:13
claygMooingLemur: well... I can't reproduce and my theory about the auditor turned out to be false19:13
*** gyee has joined #openstack-swift19:16
dmsimardThe replication time returned by swift-recon, I'm assuming the unit is seconds ?19:25
*** theanalyst has quit IRC19:27
openstackgerritJohn Dickinson proposed openstack/swift: drop Python 2.6 testing support  https://review.openstack.org/18613719:29
*** tab__ has joined #openstack-swift19:29
*** theanalyst has joined #openstack-swift19:30
openstackgerritMerged openstack/swift: go: log 499 on client early disconnect  https://review.openstack.org/18357719:32
MooingLemurclayg: makes me think it's a rarer race than I thought.  I'm gonna audit all my objects in the EC containers.19:35
claygdmsimard: i'm not so sure - looks like units might be... minutes?19:35
MooingLemurthis will take a while :)19:35
MooingLemur5TB or so19:35
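(Aside: a minimal sketch of that kind of audit with python-swiftclient — endpoint, token, and container are placeholders. It just HEADs every object in the listing and tallies the ones the proxy can no longer serve, i.e. counts "the 200s and the others" as discussed earlier.)

    from swiftclient import client as swift

    def audit_container(url, token, container):
        bad, marker = [], ''
        while True:
            _headers, listing = swift.get_container(url, token, container, marker=marker)
            if not listing:
                break
            for obj in listing:
                try:
                    swift.head_object(url, token, container, obj['name'])
                except swift.ClientException as err:
                    bad.append((obj['name'], err.http_status))
            marker = listing[-1]['name']
        return bad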
claygMooingLemur: is there any possibility in the process of redoing the rings that services got started with a swift.conf that might have thought the ec datadir was a replicated type policy?19:37
MooingLemurclayg: I don't think so.  The swift.conf is the same on all nodes, same md5, all modified May 14, and all servers have been restarted multiple times19:39
MooingLemurI just checked19:39
claygMooingLemur: sigh19:40
MooingLemur(restarted multiple times before the whole reshuffle that happened starting last Saturday)19:41
claygMooingLemur: do you have any other patches applied to 2.3.0?19:41
*** lpabon has joined #openstack-swift19:41
MooingLemurclayg: no, just that one, and only on the storage nodes.  It was applied by user_patches as part of gentoo portage, and the patch applied cleanly, otherwise the installation would have bailed out.19:43
MooingLemurclayg: I'm certainly willing to try again and reshuffle the entire ring to see what happens (after I either re-upload or remove the orphaned object fragments)19:47
MooingLemurclayg: especially if you have an idea where I could put in some debug logging19:47
claygMooingLemur: I don't really have any good ideas :\19:49
MooingLemurclayg: I mean, I'd like to log the revert itself19:50
MooingLemurso perhaps we can get some information from what doesn't get logged, more than what is19:50
swifterdarrelldmsimard: at least one value is dealt with raw in minutes but you'd have to check the code to see if it's normalized to seconds somewhere (and which one it is--I don't remember off the top of my head)19:51
claygswifterdarrell: dmsimard: yeah sorry ment to say that I think the replication time in swift recon is minutes19:54
claygswifterdarrell: thanks19:54
*** james_li has joined #openstack-swift20:03
james_liHi All, a quick question: is delete_object a sync call, or async?20:04
claygjames_li: from the HTTP api perspective - the majority of servers will have written the tombstone and unlinked the older .data file before the client gets a response20:06
james_liclayg: ok thanks. is the delay related to the object size, i.e. deleting a larger object will cost longer time than smaller objects?20:08
claygjames_li: meh, maybe... it's probably more like how full your disk is20:10
zaitcevwith the exception of a cluster with object versioning enabled, in which case deleting bigger objects definitely takes longer20:15
james_liclayg: yeah. I am implementing a feature in Solum which includes deleting large images from swift, I was not sure if I can do delete_object in the Solum API layer because I don't want our API to get blocked for a long time. So from your explanation I can see it's probably fine to do deletion in the Solum API.20:18
*** esker has quit IRC20:19
openstackgerritSamuel Merritt proposed openstack/swift: Allow SAIO to answer is_local_device() better  https://review.openstack.org/18339520:19
openstackgerritSamuel Merritt proposed openstack/swift: Allow one object-server per disk deployment  https://review.openstack.org/18418920:19
*** silor1 has quit IRC20:23
torgomaticjames_li: when you say "delete_object", to what function, exactly, are you referring?20:24
james_litorgomatic: swiftclient.client.delete_object20:25
torgomaticjames_li: okay, good... that looks as though it simply issues a single HTTP DELETE request, which means everything everyone said holds true20:26
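(Aside: the call in question, for reference — a single synchronous HTTP DELETE; the endpoint, token, and names below are placeholders. Per clayg above, it returns once a majority of object servers have written the tombstone and unlinked the old .data file.)

    from swiftclient import client as swift

    swift.delete_object('https://storage.example.com/v1/AUTH_solum',
                        token='AUTH_tk_example',
                        container='images',
                        name='large-image.qcow2')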
james_litorgomatic: thanks for clarification :)20:28
*** mlima has quit IRC20:32
*** lpabon has quit IRC20:32
*** zhill has quit IRC20:33
*** zhill has joined #openstack-swift20:33
*** robefran_ has quit IRC20:34
*** barra204 has joined #openstack-swift20:35
*** annegentle has joined #openstack-swift20:37
*** acampbel11 has quit IRC20:37
*** rdaly2 has joined #openstack-swift20:37
*** acampbell has quit IRC20:37
peluseMooingLemur, I see a looong conversation with clayg wrt EC.  Any chance you can post a quick summary of problem and discussions to date?20:43
MooingLemurpeluse: sure.20:43
*** mcnully has joined #openstack-swift20:45
*** redbo has quit IRC20:45
*** redbo has joined #openstack-swift20:45
*** ChanServ sets mode: +v redbo20:45
*** esker has joined #openstack-swift20:46
MooingLemurpeluse: Last week, I had uploaded a large amount of data to a stable cluster (5 hosts, 20 devices) with a couple of EC policy (9 data, 3 parity) containers.  I had adjusted weights and ran into issues with fragments unable to be pushed to primary nodes which was solved by clayg's patch that forced primaries to always accept their own fragments even if a different fragment of the same object existed there.  Over the weekend, I ...20:47
MooingLemur... replaced some devices (that were zero-weight), and then physically moved others around between hosts.  I also ended up recreating the ring with the same part power in an effort to determine whether the balance/dispersion would be better on a fresh ring than an organically grown/changed one.  Things seemed to revert okay at first, but after a couple days (this morning) I noticed it was still having trouble finding some ...20:47
MooingLemur... fragments.20:47
MooingLemurpeluse: I did an ls for the data on all devices in the cluster, and found that many fragments were missing on the objects that the logging was complaining about.  Something ate the fragments of some of the objects.20:48
*** ho has joined #openstack-swift20:50
*** barra204 has quit IRC20:50
pelusewow, OK that's a few steps allright :)20:50
hogood morning!20:50
MooingLemurpeluse: clayg thought it may have been the auditor using non-EC semantics which ended up reaching hash_cleanup_listdir without a fragment index, and lexically pruning out everything but the highest, but he found auditor wasn't to blame on his tests, and he was unable to reproduce.20:51
peluseMooingLemur, so it sounds like you applied clayg's fix after some failures had already occurred, is that the case?20:51
MooingLemurpeluse: I applied his patch after replication got stuck, on a test I was doing last week after moving some weights.  Everything ended up happy after a day or so there.20:51
MooingLemurs/replication/reconstruction/20:52
*** mmcardle has quit IRC20:53
MooingLemurpeluse: at this point, I've added logging to the delete after revert, and after I audit and re-upload the broken objects, I'm going to scramble the ring again and see if it happens again.20:53
*** kota_ has joined #openstack-swift20:53
peluseMooingLemur, yeah, I'm just trying to think about how to break it down into a test that we can repro step by step20:53
peluseMooingLemur, that would be good if you can be 100% sure that everything is fine and then it sounds like you're saying a simple rebalance causes from frags to go missing?20:54
peluses/from/some20:55
MooingLemurpeluse: the simple rebalance didn't seem to make frags go missing.  I think it was the ring re-creation20:55
MooingLemurwhere perhaps none of the frags were in the right place20:55
pelusemaybe you can define 'scramlbe the ring' more for me in super clear terms?20:55
MooingLemurI had organically changed the original ring at first, raising weights, adding devices, rebalancing, modifying weights.20:56
MooingLemurbut then I renamed that old ring, re-created it from scratch with the final weights where they were supposed to be, then rebalanced20:56
peluseahh, scrambling :)20:56
*** slavisa has joined #openstack-swift20:57
MooingLemurso I suspect it was a completely different layout, so basically all data would end up being moved20:57
*** rdaly2 has quit IRC20:57
peluseif you can narrow it down, like it sounds like you're working on, to a ring change and what exact actions were taken between the two states of the ring, that would obviously be very good data20:58
pelusebut I didn't read the whole backlog so maybe you guys already came  to that conclusion...20:58
mattoliverauMorning20:58
openstackgerritMerged openstack/swift: drop Python 2.6 testing support  https://review.openstack.org/18613720:59
notmynameswift meeting in 1 minute20:59
*** ryshah has joined #openstack-swift20:59
peluseyo mattoliverau20:59
notmynamemattoliverau: not too early I hope ;-)20:59
mattoliveraumuch better!20:59
peluseMooingLemur, I have to run after the swift mtg to a sixth grade graduation but would love to help with this later if you'll be around early evening?20:59
MooingLemurpeluse: I think I'll be available.  I'm in UTC-7, and other than commute, I'll be on21:00
MooingLemurthanks :)21:01
*** acoles_away is now known as acoles21:01
*** esker has quit IRC21:01
slavisahow do I participate in the swift meeting, is it only irc or is there another/additional way of communication?21:02
notmynameslavisa: it's in irc in #openstack-meeting21:03
notmynameweekly at 2100utc on wednesdays in that channel21:03
slavisathx21:03
*** slavisa has quit IRC21:04
*** cutforth has quit IRC21:04
*** slavisa has joined #openstack-swift21:04
openstackgerritSamuel Merritt proposed openstack/swift: Remove simplejson from swift-recon  https://review.openstack.org/18616921:06
openstackgerritSamuel Merritt proposed openstack/swift: Remove simplejson from staticweb  https://review.openstack.org/18617021:06
*** bkopilov_wfh has quit IRC21:13
*** Fin1te has quit IRC21:14
*** tab__ has quit IRC21:18
*** mandarine has quit IRC21:18
*** bkopilov has joined #openstack-swift21:23
*** bkopilov has quit IRC21:31
*** ryshah has quit IRC21:33
openstackgerritKota Tsuyuzaki proposed openstack/swift: Fix FakeSwift to simulate SLO  https://review.openstack.org/18594021:39
*** fthiagogv has quit IRC21:43
*** jkugel has left #openstack-swift21:47
*** bkopilov has joined #openstack-swift21:49
claygnice21:55
notmynameas i get the wiki pages/LP updated, I'll share those21:55
openstackgerritKota Tsuyuzaki proposed openstack/swift: Fix FakeSwift to simulate SLO  https://review.openstack.org/18594021:55
notmynameacoles: jrichli: also, later today or tomorrow, I'll get the crypto branch onto the review dashboard21:55
notmynameok, gotta step out for a few minutes21:56
MooingLemurI vaguely remember reading something to the effect that for EC policies, X wasn't yet implemented.  But I cannot find nor remember what this X was.  Maybe it was recovering from object node failures mid-download?21:56
*** kota_ has quit IRC21:57
acolesnotmyname: thx21:57
jrichlinotmyname: thx!21:57
*** slavisa has left #openstack-swift21:59
torgomaticMooingLemur: multi-range GET requests22:01
*** acoles is now known as acoles_away22:01
torgomaticpossibly also failures mid-download, but I think that one's in there already22:01
MooingLemurtorgomatic:  aha, that's what it was..22:03
*** joeljwright has left #openstack-swift22:03
MooingLemurmulti-range22:03
MooingLemurthanks :)22:04
torgomaticMooingLemur: it's at https://review.openstack.org/#/c/173497/ if you feel like banging on the code a bit22:04
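(Aside: the request shape in question — one GET asking for several byte ranges. Endpoint and token are placeholders; per torgomatic above, replicated policies answer this with a 206 multipart/byteranges response while EC policies can't serve it yet, which is what patch 173497 adds.)

    import requests

    resp = requests.get(
        'https://storage.example.com/v1/AUTH_test/movies/big-object.bin',
        headers={'X-Auth-Token': 'AUTH_tk_example',
                 'Range': 'bytes=0-99,1000-1099'})   # two ranges in one request
    print(resp.status_code, resp.headers.get('Content-Type'))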
hotorgomatic: is the patch #173497 successor of #166576?22:10
torgomaticho: yes, it's the same one but proposed to master instead of feature/ec22:10
torgomaticprobably also the odd update here and there, but nothing major22:10
hotorgomatic: i see. I can continue my review for it :-)22:12
openstackgerritMerged openstack/swift: go: check error returns part 1  https://review.openstack.org/18360522:16
*** annegentle has quit IRC22:20
openstackgerritOpenStack Proposal Bot proposed openstack/swift: Updated from global requirements  https://review.openstack.org/8873622:22
*** annegentle has joined #openstack-swift22:24
* notmyname back22:28
notmynamemattoliverau: clayg: torgomatic: acoles_away: cschwede: when you open the gerrit dashboard in your web browser, is that normally on a large screen or on your laptop screen?22:41
claygnotmyname: normally on my laptop i guess22:42
notmynameok22:43
mattoliverauAbout 50/50. If at my desk its a large screen, if cafe hacking on the laptop. Or may 60/40 big screen to laptop.22:43
torgomaticnotmyname: yep, same here22:43
torgomaticmy large screen is vertical anyway, so my laptop screen is the widest one I have22:43
notmynameok22:45
notmynameI'm actually wondering about vertical space22:45
notmynamefor the review dashboards. ie what will you see and what will you not see because of needing to scroll22:45
*** james_li has quit IRC22:45
*** annegentle has quit IRC22:46
*** kutija has quit IRC22:49
*** km has joined #openstack-swift22:50
notmynametdasilva: sorry, I didn't get to the py3 patches in the meeting. I just realized that22:51
notmynameI do want to bring it up next week22:51
* notmyname goes and adds it to the agenda22:51
tdasilvanotmyname: no worries :)22:51
tdasilvanotmyname: wondering if it would be possible to add obj. versioning and copy middleware to priority reviews22:52
tdasilvawould like to get those done and out of the way22:52
notmynametdasilva: yes, that needs to be added back again. they dropped off with the ec work22:52
notmynametdasilva: ok, I starred them. can you verify?22:53
tdasilvagot it, thanks!22:54
*** torgomatic has quit IRC22:54
tdasilvanoticed in the etherpad that some other features have a dependency on copy middleware, so that should help too :)22:54
tdasilvasummit etherpad22:54
*** jrichli has quit IRC23:03
*** chlong has joined #openstack-swift23:05
notmynameok, I've got some new dashboards. working on the shortlinks and I'll get the channel topic and wiki updated23:06
*** ChanServ changes topic to "New meeting time 2100UTC Wednesdays: https://wiki.openstack.org/wiki/Meetings/Swift | Review Dashboard: https://goo.gl/ktob5x | Project Overview: https://goo.gl/jTYWgo | Logs: http://eavesdrop.openstack.org/irclogs/%23openstack-swift/"23:08
notmynameok, channel topic and wiki page updated. new dashboard, including crypto work, is up. also slightly updated the "recently proposed" and "older open patches" sections. should include more now, I think23:13
*** wbhuber has quit IRC23:18
*** kei_yama has joined #openstack-swift23:24
notmynameas always, please let me know what's broken about the review dashboards and what you'd like to see made better23:27
sweepernotmyname: how about a newrelic plugin? :323:58
notmynameI've never used newrelic. what would that give us?23:59
sweeperI could mail you some homebrew apple cider?23:59
