Wednesday, 2016-10-19

kota_	good morning	00:49
mattoliverau	kota_: morning	00:54
kota_	mattoliverau: morning	00:54
*** tqtran has quit IRC		01:00
charz	kota_: mattoliverau morning	01:06
kota_	charz: o/	01:06
zhengyin	good morning	01:06
mattoliverau	charz, zhengyin: o/	01:10
clayg	hehe - good morning everyone!	01:11
openstackgerrit	Clay Gerrard proposed openstack/swift: WIP: Make ECDiskFileReader check fragment metadata https://review.openstack.org/387655	01:12
*** ntata_ has joined #openstack-swift		01:13
clayg	i think overall the test failures are trending down - managed to get an implementation for the object server quarantine that i'm satisfied with	01:14
clayg	but I'm sort of worrying/wondering if for backports we should just have the full read quarantine for the audtior - but we'll see what happens when we start to cherry pick it i 'spose	01:14
*** ntata_ has quit IRC		01:21
*** blair has joined #openstack-swift		01:28
kota_	clayg: thanks for updating that, will look at	01:28
*** clu_ has quit IRC		01:39
openstackgerrit	Kazuhiro MIYAHARA proposed openstack/swift: Remove 'X-Static-Large-Object' from .meta files https://review.openstack.org/385412	02:12
*** chlong has joined #openstack-swift		02:16
openstackgerrit	Kota Tsuyuzaki proposed openstack/liberasurecode: Fix liberasurecode skipping a bunch of invalid_args tests https://review.openstack.org/387879	02:23
*** lcurtis has quit IRC		02:53
*** klrmn has quit IRC		03:10
*** kei_yama has quit IRC		03:16
*** rjaiswal has quit IRC		03:41
*** klrmn has joined #openstack-swift		03:42
*** tqtran has joined #openstack-swift		03:45
*** links has joined #openstack-swift		03:46
*** cshastri has joined #openstack-swift		03:50
*** tqtran has quit IRC		03:50
*** Guest29440 has quit IRC		03:53
*** klrmn has quit IRC		04:01
*** trananhkma has joined #openstack-swift		04:19
*** ppai has joined #openstack-swift		04:34
openstackgerrit	Tuan Luong-Anh proposed openstack/swift: Add prefix "$" for command examples https://review.openstack.org/388355	04:36
*** cshastri has quit IRC		04:51
*** klrmn has joined #openstack-swift		04:54
*** sure has joined #openstack-swift		04:56
*** sure is now known as Guest89668		04:56
Guest89668	hii all, I am doing "container syncronization" in same cluster for that i created my "container-sync-realms.conf" file like this http://paste.openstack.org/show/586313/	04:58
Guest89668	i created two containers and uploaded objects to one but those objects are not copied to other container	04:59
Guest89668	please some one help	04:59
*** klrmn has quit IRC		05:08
*** itlinux has quit IRC		05:09
*** raginbaj- has joined #openstack-swift		05:11
openstackgerrit	Bryan Keller proposed openstack/swift: WIP: Add notification policy and transport middleware https://review.openstack.org/388393	05:12
mattoliverau	Guest89668: so your container sync realms file is in /etc/swift/	05:16
*** SkyRocknRoll has joined #openstack-swift		05:16
Guest89668	mattoliverau: yes	05:17
mattoliverau	Guest89668: also you can remove the clustername2 line, you only need to define each cluster once (and you are only using 1 cluster) but that shouldn't be stopping anything	05:17
Guest89668	mattoliverau: here is error log http://paste.openstack.org/show/586314/	05:18
mattoliverau	hmm, so it's timing out and then on the retry its saying method not allowed. And it's a DELETE	05:25
mattoliverau	Guest89668: you have the same secret key on both containers in the sync?	05:27
Guest89668	mattoliverau: yes	05:27
Guest89668	here is my http://paste.openstack.org/show/586315/	05:28
Guest89668	container stats	05:28
mattoliverau	and just to make sure, your proxy or loadbalancer (or whatever your ip in your realms config is pointing at) is listening on port 80?	05:28
mattoliverau	cause thats what your realms config says	05:29
Guest89668	yes it is listening at port 80	05:29
*** qwertyco has joined #openstack-swift		05:34
mattoliverau	Guest89668: is the endpoint to your cluster (that's listening on port 80) as swift proxy? a load balancer. Just trying to figure out why the request is 405'ed	05:43
mattoliverau	and where is container_sync on the proxy pipeline?	05:43
Guest89668	my swift endpoint is " http://192.168.2.187:8080/v1/AUTH_%(tenant_id)s"	05:44
mattoliverau	oh so theyre listening on port 8080 not port 80 or do you have a load balancer listening on 80?	05:45
*** ChubYann has quit IRC		05:45
Guest89668	mattoliverau: no	05:45
mattoliverau	Guest89668: if not try changeing your end points in the realm to: http://192.168.2.187:8080/v1/	05:45
mattoliverau	Guest89668: looks like the container sync daemon is trying to update whatever is listening on port 80, maybe a webserver	05:46
Guest89668	mattoliverau: just now i changed and tried again	05:46
Guest89668	but now also same result but in log that ERROR was gone	05:47
mattoliverau	also I mentioned before you only need to specify a single cluster if you have a single cluster, so if you remove the second you'll have to update container metadata that points to cluser2 to point to cluster 1	05:47
mattoliverau	same result as in no objects?	05:47
mattoliverau	have you waited or reran the container-sync?	05:48
Guest89668	yes i reran container-sync	05:49
mattoliverau	if your on the container server in question (a container server that serves ars a primary for the container in question) you can stop container-sync and force it to run manually with: swift-init container-sync once	05:49
Guest89668	here is my new relam file http://paste.openstack.org/show/586317/	05:49
mattoliverau	if your using swift-init	05:50
mattoliverau	Guest89668: that looks right (the :8080)	05:51
Guest89668	and my proxy-server.conf http://paste.openstack.org/show/586316/	05:51
mattoliverau	Guest89668: cool, container sync is before auth	05:52
mattoliverau	Guest89668: is the container-sync logging anything? it should log something, even if its just saying its running or warning about internal client using default	05:53
Guest89668	here is that log http://paste.openstack.org/show/586318/	05:54
*** klrmn has joined #openstack-swift		05:54
*** klrmn has quit IRC		05:56
mattoliverau	hmm, yeah ok, thats a normal message, but means container sync is running.	05:57
mattoliverau	Guest89668: now that we have the ports right, how about you put another object in a container.. just in case container sync thinks it's uptodate	05:58
mattoliverau	cause it isn't erroring	05:58
Guest89668	mattoliverau: i deleted both the containers and created again but still same result	05:59
mattoliverau	Guest89668: whats your container sync interval? have you set one in the config? if not by default its 300 seconds	05:59
mattoliverau	or 5 mins	06:00
mattoliverau	Guest89668: and your container server can access your proxy servers (via the IP you specified) because thats where container sync is running from	06:01
Guest89668	mattoliverau: i am using single node swift (proxy+storage in same node)	06:03
mattoliverau	oh ok	06:03
Guest89668	and how to set container sync intervel	06:03
mattoliverau	Guest89668: in your container-server config(s) there should be a section for container-sync. Under that heading you can specify an interval by adding:	06:04
mattoliverau	interval = <number>	06:05
mattoliverau	while your in there, you can turn up the logging verbosity for just the container sync daemon, but adding to the same container-sync section:	06:05
mattoliverau	log_level = DEBUG	06:05
mattoliverau	then restart the container sync daemon	06:06
mattoliverau	and hopefully it'll log more and it might tell us what's going on	06:06
*** trananhkma has quit IRC		06:07
*** trananhkma has joined #openstack-swift		06:07
*** trananhkma has quit IRC		06:08
*** trananhkma has joined #openstack-swift		06:08
*** rcernin has joined #openstack-swift		06:11
Guest89668	mattoliverau: i added both parameters but same result	06:12
Guest89668	i didnt find any extra log	06:13
mattoliverau	Guest89668: its seems that container-sync isn't finding objects to sync.	06:18
mattoliverau	hmm weird, what could we be missing	06:22
clayg	busy busy	06:31
Guest89668	mattoliverau: then how to debug this	06:33
mattoliverau	Guest89668: any logs matching the time the container sync ran in the proxy server logs (the other side of the container sync transaction)>	06:35
mattoliverau	?	06:35
Guest89668	mattoliverau: just now i uploaded one object to the container1 here is the log http://paste.openstack.org/show/586321/	06:38
*** eranrom has joined #openstack-swift		06:40
mattoliverau	Guest89668: line 12 says your are getting a container-sync log error and your getting a 404 not found	06:40
mattoliverau	so double check your container sync paths, and make sure you can access the proxy at the ip you specify in the realms config	06:41
Guest89668	the file "openstack" what i deleted	06:41
*** winggundamth has quit IRC		06:42
Guest89668	at first time creation of containers i uploaded object called openstck	06:42
mattoliverau	it doesn't seem the log level change has taken effect because you should see alot more.	06:42
Guest89668	mattoliverau: after that i deleted two containers and again i created those	06:43
Guest89668	i have given log_level = DEBUG	06:44
Guest89668	it is correct	06:44
Guest89668	?	06:44
mattoliverau	yeah, and did you restart container-sync? Also I don't see your proxy log as apart of that.	06:45
mattoliverau	clayg: your still up!	06:45
*** winggundamth has joined #openstack-swift		06:45
Guest89668	mattoliverau: yes i restarted	06:50
*** silor has joined #openstack-swift		06:51
*** hseipp has joined #openstack-swift		06:52
onovy	clayg: no. i shutdowned it, spiked up. after power on, spiked down (and some time for sync of missing data). it was off for ~1 hour	07:00
onovy	"down" = value before shutdown, but still higher than before upgrade	07:02
*** tesseract has joined #openstack-swift		07:08
*** tesseract is now known as Guest90211		07:09
*** qwertyco has quit IRC		07:11
clayg	handoffs first?	07:12
clayg	i think there's a warning emitted if you have it turned out - but the behavior changed at some point	07:13
clayg	onovy: 01410129dac6903ce7f486997a48e36072fa0401 first appeared in 2.7 tag	07:14
*** silor has quit IRC		07:18
*** rledisez has joined #openstack-swift		07:24
*** joeljwright has joined #openstack-swift		07:34
*** ChanServ sets mode: +v joeljwright		07:34
*** trananhkma has quit IRC		07:37
*** _JZ_ has quit IRC		07:40
*** geaaru has joined #openstack-swift		07:46
*** tqtran has joined #openstack-swift		07:49
*** amoralej\|off is now known as amoralej		07:52
onovy	clayg: # handoffs_first = False	07:53
onovy	so commented out default	07:53
*** tqtran has quit IRC		07:53
onovy	clayg: btw: s/and some time for sync/after some time for sync/	07:56
onovy	don't understand why rsync metric jumped up after one node shutdown. no reason to sync anything, because handoff are used only when there is disk failure/umount, not whole server failure, right?	07:57
clayg	onovy: incoming writes/deletes will go to handoff while node is down - and i think handoffs_first would spin while waiting for the node to come back up - so it could have explained the change - oh well	08:06
clayg	onovy: I still don't understand what sort of magic you're applying to make that recon drop show up in a graph like that - that metric - and all of rsync metrics are dropped at the end of a cycle - and overwritten by the next cycle	08:07
clayg	IME the cycle while there is real part movement going on (rsync's) is much longer then the cycle of a few suffix rehashes	08:08
clayg	i kept loosing my interesting numbers because I didn't want to spin in a tight loop collection the same number over and over just to find an interesting edge	08:09
clayg	not to mention that the number only got spit out after the fact - so it gave me no insight into what was going on right now	08:09
onovy	clayg: but i don't have enabled handoffs_first, so i don't think it explains it	08:10
clayg	so I only track the statsd stuff from the repliator and the finish_time	08:10
onovy	at graph. Every 5 minutes i GET all stores and process that json replies	08:10
onovy	so i don't see "edges" when number is reset, but only numbers "every 5 minutes"	08:10
clayg	and doesn't that give you the same number lots of times? like even in a stable cluster with enough nodes i report my cycle time ~20 mins for the whole cluster (it could be smaller if a reporter gets bad timing cycle takes 15 he reports at 14 5 mins later says it finished in 20, etc)	08:12
clayg	when i have a rebalance going and real weight needs to move the cycle time is ... much longer ;)	08:12
onovy	yep, it does	08:12
onovy	it's not perfect metric, i know :)	08:12
clayg	i can't really see that come through in the graphs your sending? is it just too scaled out?	08:12
onovy	graph4?	08:13
clayg	of the spike?	08:13
onovy	jop, that's scaled out	08:13
onovy	mmnt	08:13
onovy	jop=yes :)	08:13
onovy	https://s15.postimg.org/559djmgxn/graph5.png zoom in	08:14
onovy	"max" zoom https://s17.postimg.org/5e0m1ji7j/graph6.png	08:15
*** rcernin has quit IRC		08:16
*** rcernin has joined #openstack-swift		08:16
*** rcernin has quit IRC		08:18
*** rcernin has joined #openstack-swift		08:19
*** rcernin has quit IRC		08:19
*** rcernin has joined #openstack-swift		08:19
clayg	ok, so maybe it's not an imperfect proxy - i use a statsd metric suffix.syncs which happens around the same time as rsync getting incremented but only for primary partitions in update	08:21
clayg	it'd be great if those rsync's were broken out by primary sync with peer or handoff sync to delete	08:22
clayg	onovy: it doesn't make much sense to me that it would climb like that - even if a bunch of suffixes were invalid and also out of sync - why wouldn't one pass fix most of them?	08:23
clayg	any chance some of the rsync's are failing? max connections limit or something? I think a "failure" number comes our gith next to rsyncs?	08:23
onovy	yep, many failures	08:26
onovy	i have connection limits in rsync to prevent overload drivers	08:27
onovy	https://s12.postimg.org/floivqq59/graph_failure.png	08:27
onovy	btw: we are going to change this to statsd, but i just "joined" our really old monitoring with swift using few lines of python code :)	08:28
onovy	rsyncd.conf: max connections = 64 for objects	08:32
onovy	for 24 disks per server	08:32
onovy	and "concurrency: 4" for object-replicator	08:33
*** x1fhh9zh has joined #openstack-swift		08:33
onovy	s/drivers/of drives/	08:33
*** x1fhh9zh has quit IRC		08:48
*** x1fhh9zh has joined #openstack-swift		08:50
Guest89668	mattoliverau: my problem resolved and it is syncing objects without any issue	08:51
mattoliverau	Great what did we miss? Sorry was called away	09:02
Guest89668	mattoliverau: the actual issue with the swift endpoint what i mentioned in the relams file after your intimation i changed and restarted services but i didnt checked the other container whether the objects are copied or not	09:04
Guest89668	now it is working fine and syncing the objects with the time intervel what i mentioned in container-server.conf	09:05
mattoliverau	\o/ nice work!	09:06
Guest89668	mattoliverau: you helped a lot to debug this issue	09:06
Guest89668	thanks again...!!	09:07
*** links has quit IRC		09:09
clayg	ok, well at least that explains how it's able to cycle so fast	09:15
clayg	onovy: i'm really lovin' on the rsync module per disk configuration - my rsync.conf has a few more lines in it - but the per drive rsync connection limits are really nice	09:17
clayg	so do that, and the statsd, and 10 million other things, and ... ;)	09:17
clayg	wtg mattoliverau and Guest89668 !!!!	09:18
clayg	wooooo!!!	09:18
onovy	clayg: we are using salt for deploy, so i can generate rsyncd.conf automagically	09:23
onovy	but need to fix this first use and finish upgrade first :]	09:24
patchbot	Error: Spurious "]". You may want to quote your arguments with double quotes in order to prevent extra brackets from being evaluated as nested commands.	09:24
onovy	*first issue	09:26
onovy	clayg: do you think it's safe to downgrade that one swift store node?	09:29
clayg	i'd have to look over the change log 2.5 -> 2.7	09:35
clayg	notmyname tries to highlight stuff that can't be backed out of - and we try to avoid stuff that can't be backed out of	09:36
clayg	... but it's still not something folks do very much - I don't personally have a lot of experience with it	09:37
clayg	maybe we ~broke something with suffix hashing between versions	09:38
clayg	i'm not sure that would explain the reboot tho - unless that many suffixes really got invalidated	09:39
clayg	the rsyncs could be firing and doing nothing - not really sending data (some probably still do) but - maybe - the majority of the delta is rsyncs that are finding the directories already have the same files	09:40
clayg	i sorta remember something with fast post because of ssync and ctype timestamps - we had fix something in suffix hashing	09:41
clayg	but I thought we decided it was backwards compatible	09:41
clayg	onovy: do you use fast-past? do you have .meta files in your cluster?	09:42
onovy	# object_post_as_copy = true	10:03
onovy	and no meta files	10:04
onovy	btw: do you have 2.7.0 in production already?	10:04
*** mvk has quit IRC		10:14
openstackgerrit	Stefan Majewsky proposed openstack/swift: swift-recon-cron: do not get confused by files in /srv/node https://review.openstack.org/388029	10:14
*** zhengyin has quit IRC		10:39
*** mvk has joined #openstack-swift		10:43
*** Guest89668 has quit IRC		10:56
onovy	clayg: https://github.com/openstack/swift/commit/2d55960a221c9934680053873bf1355c4690bb19 this is that patch about 'ssync' vs. suffix hashing?	11:00
onovy	cite: in most	11:00
onovy	'normal' situations the result of the hashing is the same	11:00
onovy	as before this patch. That avoids a storm of hash mismatches	11:00
onovy	when this patch is deployed in an existing cluster.	11:00
*** hseipp has quit IRC		11:02
*** ppai has quit IRC		11:04
*** x1fhh9zh has quit IRC		11:06
onovy	+ https://github.com/openstack/swift/commit/9db7391e55e069d82f780c4372ffa32ef4e79c35 this patch makes downgrades harder	11:07
*** cdelatte has joined #openstack-swift		11:23
*** x1fhh9zh has joined #openstack-swift		11:48
*** tqtran has joined #openstack-swift		11:50
*** tqtran has quit IRC		11:55
*** klamath has joined #openstack-swift		12:02
openstackgerrit	Kota Tsuyuzaki proposed openstack/swift: Items to consider for ECObjectAuditor https://review.openstack.org/388648	12:03
*** links has joined #openstack-swift		12:20
*** SkyRocknRoll has quit IRC		12:25
openstackgerrit	Shashi proposed openstack/python-swiftclient: Enable code coverage report in console output https://review.openstack.org/388669	12:30
kota_	acoles: I updated my thought to patch 387655.	12:51
patchbot	https://review.openstack.org/#/c/387655/ - swift - WIP: Make ECDiskFileReader check fragment metadata	12:51
kota_	and clayg:^^	12:51
kota_	basically the way we are going with patch 387655 seems ok.	12:51
patchbot	https://review.openstack.org/#/c/387655/ - swift - WIP: Make ECDiskFileReader check fragment metadata	12:51
*** amoralej is now known as amoralej\|lunch		12:52
kota_	that one works to detect all frag archives given from admin6 as corrupted (this is awesome!)	12:52
kota_	but i found some cornar cases we cannot detect or ability to quarantine a good frag archives.	12:53
acoles	kota_: ack	12:53
kota_	acoles, clayg: hopefully, i was just a worrier but i think it can happen so I'd like to hear your opinions for that.	12:53
kota_	acoles:!!	12:54
acoles	kota_: worriers make good reviewers !	12:54
kota_	sorry, i have to leave my office asap	12:54
kota_	that is going to be closed.	12:54
acoles	kota_: just looking at yours and clayg changes	12:54
admin6	kota_: that sounds good :-)	12:54
acoles	kota_: ok have a good night leave it with me	12:54
kota_	acoles: thanks man, and if you make comments (either gerrit, irc, etc...), i will take a look wherenever.	12:55
* acoles worries kota may be locked in office all night		12:55
onovy	clayg: upgraded second node	12:56
*** links has quit IRC		12:59
*** Jeffrey4l has quit IRC		12:59
*** remix_tj has quit IRC		13:01
*** remix_tj has joined #openstack-swift		13:01
*** jordanP has joined #openstack-swift		13:04
*** StevenK has quit IRC		13:09
*** StevenK has joined #openstack-swift		13:15
*** mvk has quit IRC		13:50
*** mvk has joined #openstack-swift		13:53
*** amoralej\|lunch is now known as amoralej		13:57
*** vinsh has quit IRC		14:07
*** silor has joined #openstack-swift		14:11
onovy	clayg: so new info: after second node upgrade, rsync metrics bumped up again. and i done test in our test env. If I have 1/2 nodes of 2.5.0 and 1/2 of 2.7.0, rsync metrics is higher than if i have all nodes on same version	14:11
onovy	so i think there is hashes compare incompatibility between 2.5.0 and 2.7.0	14:12
onovy	and it's not only about rsync metrics, rsync cmd is really called much more	14:12
onovy	https://s14.postimg.org/tboppflq9/graph_2_nodes.png // rsync metrics graph	14:14
*** jordanP has quit IRC		14:15
*** x1fhh9zh has quit IRC		14:18
*** hseipp has joined #openstack-swift		14:24
*** sgundur has joined #openstack-swift		14:28
*** jistr is now known as jistr\|call		14:28
*** silor1 has joined #openstack-swift		14:31
*** silor has quit IRC		14:32
*** silor1 is now known as silor		14:32
tdasilva	rledisez, acoles, onovy: what's the best practice for your clouds re object-expirer? do you typically run on storage nodes or proxy nodes. doesn't seem like there's good consensus, so I proposed patch 388185	14:37
patchbot	https://review.openstack.org/#/c/388185/ - swift - added expirer service to list	14:37
*** sgundur has quit IRC		14:39
tdasilva	ahale: ^^^	14:39
*** sgundur has joined #openstack-swift		14:43
*** vinsh has joined #openstack-swift		14:43
rledisez	tdasilva: for now, we run on proxy node because we don't have real scaling issues with the expirer. The rare situation were we had problem, we just increased concurrency and it was enough for us. but i guess it depends on how much object you have to expire. we expire between 1M and 1.5M objects every day and have no negative feebacks	14:51
rledisez	would be nice to have some metrics about how many expired objects are waiting to be effectively expired	14:51
rledisez	querying the containers of the special account .expired-objects (or whatever is its name)	14:52
rledisez	tdasilva: what are you calling storage node on your patch? object or account/container?	14:53
rledisez	i'm affraid that if it runs on object servers there will be too much requests on the container servers, taking down the entire clusters (it already happend to us with a homemade process that was querying containers from object servers)	14:54
rledisez	memcache would be a requirement then	14:55
*** vinsh has quit IRC		14:55
*** vinsh_ has joined #openstack-swift		14:55
*** vinsh has joined #openstack-swift		14:56
*** klrmn has joined #openstack-swift		14:58
*** vinsh_ has quit IRC		15:00
*** sgundur has quit IRC		15:00
*** hseipp has quit IRC		15:00
hurricanerix	tdasilva I am going to try and get this updated over the ocata cycle: https://review.openstack.org/#/c/252085/	15:10
patchbot	patch 252085 - swift - Refactoring the expiring objects feature	15:10
*** jistr\|call is now known as jistr		15:12
*** Guest90211 has quit IRC		15:13
*** pcaruana has quit IRC		15:14
tdasilva	rledisez: honestly i was calling storage node anything but proxy. typically we don't separate aco nodes, but i understand if you guys do	15:15
tdasilva	hurricanerix: cool, are you planning to do that on the golang code?	15:16
*** rcernin has quit IRC		15:16
hurricanerix	tdasilva not sure yet, since there is already a POC mostly done, i was just going to rebase it to get it up to master and verify that it does not break anything.	15:17
tdasilva	hurricanerix: got it	15:19
hurricanerix	tdasilva i think it also needs some more documentation, like a deployment/rollback strategy, since this will likely need to be done in phases.	15:19
glange	tdasilva: the object expirier stuff as written can cause problems with heavy usage	15:19
glange	tdasilva: besides getting behind, it can result in a large number of asyncs	15:19
tdasilva	glange: yeah, i remember dfg talking about that in tokyo	15:19
glange	tdasilva: for really heavy usage, we need a rewrite either like the one alan did or something similar	15:20
tdasilva	glange: do you guys also currently run on the proxy nodes?	15:20
glange	tdasilva: we are only keeping up in some of our clusters because we run a hacked up version of the code	15:20
glange	each of our clusters have a few extra systems that are used for various things	15:21
glange	we run the expirer there	15:21
tdasilva	glange: oh, interesting, neat	15:21
glange	these extra boxes do log processing and some other stuff	15:21
glange	we have a few customers that heavily use that feature :/	15:22
glange	it doesn't scale very well as written :)	15:22
glange	and we give the developer who wrote that feature (he sits nearby) crap about it from time to time :)	15:23
tdasilva	glange: hehehe	15:26
acoles	clayg: fyi I am working on fixing the ssync tests in patch 387655	15:28
patchbot	https://review.openstack.org/#/c/387655/ - swift - WIP: Make ECDiskFileReader check fragment metadata	15:28
acoles	clayg: back later	15:29
*** hoonetorg has quit IRC		15:29
*** acoles is now known as acoles_		15:29
*** sgundur has joined #openstack-swift		15:36
*** jistr is now known as jistr\|biab		15:39
onovy	tdasilva: hi. we run expirer on 1-4 nodes in every region	15:39
onovy	i mean 1. - 4. storage nodes	15:39
onovy	and every dones 1/4 of expiring	15:40
onovy	*does	15:40
onovy	so: processes=4, process=0 on first storage node, =1 on second, etc.	15:41
onovy	same in both region. so if one region if off, we still expire objects	15:42
onovy	in first version we had expirer on all nodes which processes=0, but there was many errors in long. expirer was trying to delete object which was just deleted i few seconds before by another one expirer	15:42
onovy	*one region is off	15:43
*** hseipp has joined #openstack-swift		15:43
onovy	tdasilva: we have aco on same servers => storage nodes. p is separated	15:43
onovy	and a+c is on SSD, o on rotational disk	15:43
notmyname	good morning	15:45
onovy	+ we have ~ x0-x00 expiring per seconds and x000 of async in queue :)	15:45
*** links has joined #openstack-swift		15:46
rledisez	tdasilva: fyi, we used to do pac / o, we are now mmoving to p / ac / o	15:48
onovy	rledisez: hi. what's your reason for separating ac from o pls?	15:49
*** tqtran has joined #openstack-swift		15:51
rledisez	performance. o are slow rotational devices while ac are fast SSD. and also number	15:52
rledisez	onovy: ^	15:52
onovy	ah. we have 1 SSD per storage node and 23 rotation disk	15:53
onovy	ac are on one SSD, o are on 23 rotation disk	15:53
rledisez	onovy: makes sense, but it would cost too much for us. we have thousands of object servers, we only need 100 or 200 SSD for ac servers	15:54
onovy	ah, right. we have ~16 stores per region now :)	15:54
notmyname	rledisez: onovy: I'd definitely appreciate it if you can help update https://etherpad.openstack.org/p/BCN-ops-swift for next week	15:56
onovy	notmyname: but i'm not op :)	15:56
onovy	i will forward it to our ops	15:56
*** tqtran has quit IRC		15:56
rledisez	notmyname: thx for the reminder, i wrote down some topics I had in mind, will try to think more :)	16:00
*** jistr\|biab is now known as jistr		16:05
*** admin6_ has joined #openstack-swift		16:07
notmyname	thanks :-)	16:08
*** klrmn has quit IRC		16:08
onovy	notmyname: is there any deadline for that etherpad?	16:09
*** admin6 has quit IRC		16:10
*** admin6_ is now known as admin6		16:10
notmyname	onovy: i put a link to the agenda item in there. that's the deadline. when the session starts	16:10
*** ChubYann has joined #openstack-swift		16:11
notmyname	cschwede: around?	16:12
openstackgerrit	John Dickinson proposed openstack/swift: use the new upper constraints infra features https://review.openstack.org/354291	16:19
*** sgundur has quit IRC		16:30
*** rledisez has quit IRC		16:31
*** links has quit IRC		16:31
*** sgundur has joined #openstack-swift		16:36
onovy	notmyname: ok, thanks, forwarded :]	16:37
patchbot	Error: Spurious "]". You may want to quote your arguments with double quotes in order to prevent extra brackets from being evaluated as nested commands.	16:37
onovy	clayg: https://bugs.launchpad.net/swift/+bug/1634967	16:41
openstack	Launchpad bug 1634967 in OpenStack Object Storage (swift) "2.5.0 -> 2.7.0 upgrade problem with object-replicator" [Undecided,New]	16:41
*** pcaruana has joined #openstack-swift		16:51
clayg	onovy: sigh (on fast-post suffix hashing change) - i'm running out of ideas!	16:51
onovy	i think it must be related to suffix hashing change	16:52
onovy	i read whole git log 2.5.0..2.7.0 and only this seems related	16:52
onovy	good news is i can reproduce it in lab	16:53
onovy	and i think everybody can :)	16:53
clayg	onovy: but - i'm still not sure that the spike isn't just because of 2.5 <=> 2.7	16:54
onovy	i'm sure it's problem with "version hybrid cloud"	16:55
clayg	it's not like all our >= 2.7 clusters got a 10x increase in rsync traffic and no one noticed	16:55
onovy	if whole cluster have same version (2.7 or 2.5) problem disappear	16:55
clayg	maybe we saw the same bumps while upgrading but didn't notice	16:55
onovy	look to bug :)	16:55
onovy	if i have 2x 2.5.0 + 2x 2.7.0 in lab, i have big spike	16:56
clayg	ok, so ... it probably was something in suffix hashing between 2.5 and 2.7	16:56
onovy	when i downgrade or upgrade whole cluster to same version, spike disappear	16:56
onovy	(after few tens of minutes)	16:56
onovy	yep	16:56
onovy	i think so	16:56
onovy	maybe it's just "feature", but we should document it than	16:56
onovy	and maybe recommend to shutdown replicator during upgrade process	16:57
onovy	because it can overload cluster imho	16:57
*** Jeffrey4l has joined #openstack-swift		16:58
clayg	I'm looking @ https://review.openstack.org/#/c/267788/ - but i made a note in the review that when I had it all loaded in my head I thought the hashes would always be the same	16:59
patchbot	patch 267788 - swift - Fix inconsistent suffix hashes after ssync of tomb... (MERGED)	16:59
clayg	maybe you could poke at the REPLICATE api with curl or do some debug logging to find out if one of your parts on 2.7 code has a different result in hashes.pkl than a 2.5 node for the same part?	17:00
onovy	can you try in your lab (with your config) reproduce it?	17:01
onovy	just install few 2.5.0 nodes and upgrade few of them to 2.7.0	17:02
onovy	we can confirm it's not "my setup" problem	17:02
clayg	onovy: not this week I can't! ;)	17:02
onovy	:]	17:02
patchbot	Error: Spurious "]". You may want to quote your arguments with double quotes in order to prevent extra brackets from being evaluated as nested commands.	17:02
clayg	trying to get ready for barca and fix some bugs :)	17:02
onovy	clayg: do you have 2.7.0 in production already btw?	17:02
clayg	onovy: this is our latest tag -> https://github.com/swiftstack/swift/tree/ss-release-2.9.0.2	17:03
clayg	we have lots of folks that have upgraded to 2.9, some are still on ... much older releases	17:03
*** amoralej is now known as amoralej\|off		17:04
onovy	ok	17:04
onovy	clayg: what about: https://review.openstack.org/#/c/387591/ ?	17:04
patchbot	patch 387591 - swift - Set owner of drive-audit recon cache to swift user	17:04
*** klrmn has joined #openstack-swift		17:05
onovy	zaitcev: torgomatic: ^ can you look too pls?	17:06
*** tqtran has joined #openstack-swift		17:09
onovy	clayg: thanks	17:09
*** joeljwright has quit IRC		17:10
clayg	onovy: do you still have a mixed environment in play - or is everything upgraded to 2.7 now?	17:12
onovy	clayg: in dev i have anything. in production i have 2 nodes on 2.7.0, and other on 2.5.0	17:13
clayg	onovy: well would you confirm/deny my supsicion about mis-mashed suffix hashing? https://bugs.launchpad.net/swift/+bug/1550563	17:13
openstack	Launchpad bug 1550563 in OpenStack Object Storage (swift) "need a devops tool for inspecting object server hashes" [Wishlist,New]	17:13
zaitcev	What about "patch add(s)" :-)	17:14
clayg	zaitcev: fix it	17:15
onovy	clayg: so i should run this? https://gist.github.com/clayg/035dc3b722b7f89cce66520dde285c9a	17:15
onovy	on 2.7.0 or 2.5.0 node?	17:15
clayg	it uses the ring to talk to primary nodes about parts - so ideally you would find a partition that is on a 2.5 and 2.7 node	17:16
clayg	hopefully you could identify such a part from the logs on the node with the high volume rsync's	17:16
openstackgerrit	Pete Zaitcev proposed openstack/swift: Set owner of drive-audit recon cache to swift user https://review.openstack.org/387591	17:16
zaitcev	your wish is my command	17:17
onovy	clayg: i have 4 nodes, 3 replicas and 2 nodes on 2.5.0 and 2 nodes on 2.7.0	17:17
onovy	every partition is on 2.5.0 and 2.7.0 node	17:17
onovy	clayg: really looong output	17:18
onovy	sdn-swift-store1.test 6000 hd7-500G	17:18
onovy	{'9fd': '282d14b6c9f3ccc447ac1f387d9c9c60', '9fe': 'bcf1431d13d69ba1123d7504216787bb',	17:18
onovy	something like this	17:18
onovy	sdn-swift-store3.test 6000 hd3-500G '9fd': '282d14b6c9f3ccc447ac1f387d9c9c60'	17:20
onovy	sdn-swift-store1.test 6000 hd7-500G '9fd': '282d14b6c9f3ccc447ac1f387d9c9c60'	17:20
onovy	so same hash... :/	17:20
*** acoles_ is now known as acoles		17:22
acoles	clayg: onovy IDK if its relevant or helpful but we do have a direct client method to get hashes from an object server https://github.com/openstack/swift/blob/0d41b2326009c470f41f365c508e473ebdacb11c/swift/common/direct_client.py#L484-L484	17:30
*** mvk has quit IRC		17:31
onovy	i'm trying to edit clay's script to compare hashes across servers, almost done	17:31
onovy	running over all partitions now...	17:32
acoles	k, i was just scan-reading backlog, ignore me ;)	17:32
onovy	:)	17:33
*** sgundur has quit IRC		17:39
*** sgundur has joined #openstack-swift		17:40
onovy	zaitcev: thanks for review and fix	17:41
*** mvk has joined #openstack-swift		18:02
*** klrmn1 has joined #openstack-swift		18:07
*** klrmn has quit IRC		18:07
openstackgerrit	Ondřej Nový proposed openstack/swift: Fixed rysnc -> rsync typo https://review.openstack.org/388843	18:17
*** sgundur has quit IRC		18:18
*** geaaru has quit IRC		18:19
onovy	tdasilva: we are from Czech republic, not Canada :P	18:35
tdasilva	onovy: i knew that, did i mis-spell something? :(	18:36
tdasilva	oops, seznam.ca sorry	18:36
onovy	:)))	18:36
tdasilva	i meant cz	18:36
onovy	clayg: thanks for pointing to rsync_module	18:40
onovy	onovy@jupiter~/tmp/salt-state (rsync_module) $ git show \| wc -l	18:40
onovy	124	18:40
onovy	i love salt => ready to deploy :)	18:40
*** vinsh has quit IRC		19:00
*** charz has quit IRC		19:08
*** mlanner has quit IRC		19:09
*** hugokuo has quit IRC		19:09
*** sgundur has joined #openstack-swift		19:09
*** alpha_ori has quit IRC		19:09
*** treyd has quit IRC		19:10
*** ctennis has quit IRC		19:11
*** zackmdavis has quit IRC		19:12
*** charz has joined #openstack-swift		19:12
*** acorwin has quit IRC		19:12
*** swifterdarrell has quit IRC		19:12
*** bobby2_ has quit IRC		19:12
*** hugokuo has joined #openstack-swift		19:12
*** timburke has quit IRC		19:14
*** sgundur has quit IRC		19:15
*** balajir has quit IRC		19:16
*** charz has quit IRC		19:17
*** treyd has joined #openstack-swift		19:17
*** mlanner has joined #openstack-swift		19:18
*** bobby2 has joined #openstack-swift		19:18
*** swifterdarrell has joined #openstack-swift		19:19
*** ChanServ sets mode: +v swifterdarrell		19:19
acoles	notmyname: are we meeting today?	19:19
*** balajir has joined #openstack-swift		19:19
*** alpha_ori has joined #openstack-swift		19:20
*** acorwin has joined #openstack-swift		19:20
*** zackmdavis has joined #openstack-swift		19:21
*** ctennis has joined #openstack-swift		19:22
*** timburke has joined #openstack-swift		19:22
*** ChanServ sets mode: +v timburke		19:22
*** charz has joined #openstack-swift		19:23
*** sgundur has joined #openstack-swift		19:25
notmyname	acoles: yes. need to go over backports and big bugs and any questions about the summit. I should have the work sessions scheduled by then	19:27
acoles	notmyname: k, thanks	19:28
*** joeljwright has joined #openstack-swift		19:31
*** ChanServ sets mode: +v joeljwright		19:32
openstackgerrit	Alistair Coles proposed openstack/swift: WIP: Make ECDiskFileReader check fragment metadata https://review.openstack.org/387655	19:35
*** hseipp has quit IRC		19:35
acoles	clayg: ^^ kota_ fixed failing ssync tests, proxy tests still to do plus kota's suggestion in dependent patch	19:35
acoles	back for meeting	19:36
*** joeljwright has quit IRC		19:37
*** acoles is now known as acoles_		19:39
clayg	yay!	19:48
*** pcaruana has quit IRC		19:50
clayg	i think i understand the ssync test fixes sort of?	19:52
*** joeljwright has joined #openstack-swift		19:59
*** ChanServ sets mode: +v joeljwright		19:59
*** silor has quit IRC		20:04
*** nikivi has joined #openstack-swift		20:04
*** sn0v has joined #openstack-swift		20:06
*** sn0v has left #openstack-swift		20:06
*** joeljwright has quit IRC		20:11
*** joeljwright has joined #openstack-swift		20:11
*** ChanServ sets mode: +v joeljwright		20:11
*** hoonetorg has joined #openstack-swift		20:19
*** nikivi has quit IRC		20:21
*** chsc has joined #openstack-swift		20:33
*** chsc has joined #openstack-swift		20:33
openstackgerrit	Shashirekha Gundur proposed openstack/swift: Invalidate cached tokens api https://review.openstack.org/370319	20:34
mattoliverau	Morning	20:36
joeljwright	morning	20:37
joeljwright	:)	20:37
*** sgundur has quit IRC		20:52
kota_	good morning	20:55
kota_	acoles: thanks for working that. I'm getting another thought on a part of my concerns in the last night, will update my comment.	20:59
*** acoles_ is now known as acoles		20:59
notmyname	meeting time in #openstack-meeting	20:59
*** sgundur has joined #openstack-swift		20:59
*** mmotiani_ has joined #openstack-swift		20:59
acoles	kota_: we definitely need to change the exceptions as you suggested, I just didn't get time to do that	21:00
*** vint_bra has joined #openstack-swift		21:12
*** vint_bra has left #openstack-swift		21:12
*** m_kazuhiro has joined #openstack-swift		21:21
*** Jeffrey4l has quit IRC		21:34
acoles	tdasilva: thanks for +2 on the reconstructor patch!	21:42
*** m_kazuhiro has quit IRC		21:44
*** mmotiani_ has quit IRC		21:52
*** nikivi has joined #openstack-swift		21:54
*** sgundur has quit IRC		21:54
*** acoles is now known as acoles_		21:57
*** nikivi has quit IRC		22:14
*** klamath has quit IRC		22:32
*** _JZ_ has joined #openstack-swift		22:37
*** jmunsch has joined #openstack-swift		22:40
*** vint_bra has joined #openstack-swift		22:47
*** joeljwright has quit IRC		22:48
*** vint_bra has left #openstack-swift		22:48
jmunsch	anyone able to verify my previous messages exist?	22:50
jmunsch	hello. How to view the X-Delete-After and X-Delete-At meta data from an object, or where in the code should i look more specifically. i have been looking through the http://git.openstack.org/cgit/openstack/deb-swift/tree/swift/obj/server.py trying to figure out how the .expiring_objects gets set, and looking to see how it gets read for GET responses. I have looked at these related links:	22:55
jmunsch	http://docs.openstack.org/developer/swift/overview_expiring_objects.html http://developer.openstack.org/api-ref/object-storage/ http://www.gossamer-threads.com/lists/openstack/dev/31872 https://blog.rackspace.com/rackspace-cloud-files-how-to-use-expiring-objects-api-functionality http://git.openstack.org/cgit/openstack/deb-swift/tree/api-ref/source/storage-object-services.inc http://git.openstack.org/cgit/openstack/deb-swif	22:55
notmyname	jmunsch: you're wanting to view the data on an existing object?	22:56
jmunsch	notmyname: the meta data	22:56
jmunsch	For example I have done something like this:	22:57
*** vint_bra has joined #openstack-swift		22:57
jmunsch	object_headers.update({'X-Delete-After': '2592000'}) # seconds in 90 days	22:57
*** gyee has joined #openstack-swift		22:57
jmunsch	On a PUT	22:58
notmyname	ok	22:59
zaitcev	guys guys guys. Where is PyECLib's upstream nowadays, https://github.com/openstack/pyeclib/ ?	23:00
mattoliverau	zaitcev: yup it's apart of OpenStack namespace now	23:00
zaitcev	mattoliverau: that explains why I could not find 1.3.1	23:01
notmyname	jmunsch: ok, so what do you want to find now that you've done the PUT?	23:01
jmunsch	notmyname: a key value response with a GET or `swift stat\|list` indicating that the created object has had the expiry set	23:09
notmyname	jmunsch: ok, so `swift stat <container> <object>` will show that, as will a direct HEAD or GET request to the object	23:10
notmyname	x-delete-after gets translated into x-delete-at an an absolute time	23:11
notmyname	jmunsch: eg https://gist.github.com/notmyname/3aa5f7f6d6b6e6c76e4499061df7fcc0	23:12
mattoliverau	jmunsch: the updating of the expiring objects container is done in the proxy on a put or post. As notmyname mentioned this is also where x-delete-after is tralated into x-delete-at to be stored as metadata in the object server	23:16
mathiasb	notmyname: sorry I fell asleep and missed the meeting :/	23:29
mathiasb	any chance of moving the topics from working session 5 from friday to thursday, since neither me nor kota_ will be around on friday?	23:29
notmyname	mathiasb: yeah, I do need to adjust that. will do it over the next 24 hours	23:30
mathiasb	..just going over the meeting logs and saw that the issue was raised there already	23:30
mathiasb	thanks!	23:30
*** Jeffrey4l has joined #openstack-swift		23:31
mathiasb	do you know anything more about the meeting room facilities, e.g., if they have projectors to show slides?	23:32
notmyname	mathiasb: I don't, for sure. but I expect them to have something like that. we have had it in the past	23:34
jmunsch	notmyname mattoliverau : thanks so much for the help	23:39

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!