Wednesday, 2019-03-06

openstackgerritPete Zaitcev proposed openstack/swift master: py3: port bulk middleware  https://review.openstack.org/61930300:12
*** hoonetorg has quit IRC00:24
*** hoonetorg has joined #openstack-swift00:26
*** itlinux has joined #openstack-swift00:58
*** gyee has quit IRC01:21
*** itlinux_ has joined #openstack-swift01:38
*** itlinux_ has quit IRC01:40
*** itlinux has quit IRC01:41
*** psachin has joined #openstack-swift02:43
notmynamekota_: rledisez: mattoliverau: looking at the meeting schedule for tomorrow (because i've got a potential conflict, or at least time crunch)...04:43
notmynameI wanted to follow up on a couple of things that were mentioned last week04:43
kota_notmyname: o/04:44
notmyname(1) for kota_ and rledisez, what's the status of gate jobs on feature/losf?04:44
notmyname(2) for rledisez, did you do anything about the docs that are "encouraging" device names in lieu of labels or uuids? (at least file a bug, if not a patch)04:44
notmynameIMO, these questions can have an async answer, and if there is no follow-up needed, then we don't need a meeting tomorrow04:45
kota_notmyname: not much progress yet. I briefly looked at the dsvm gate job and the zuul docs, but I haven't figured out why we dropped the dsvm gate on the losf branch.04:45
notmynameok04:45
kota_I'll ask the infra team this week.04:45
notmynamekota_: if you need help with getting that fixed, please ask04:45
notmynamesounds good04:45
*** ianychoi_ has joined #openstack-swift05:24
*** ianychoi has quit IRC05:28
*** hoonetorg has quit IRC05:33
*** hoonetorg has joined #openstack-swift05:50
*** e0ne has joined #openstack-swift06:31
*** zaitcev has quit IRC07:13
*** ccamacho has joined #openstack-swift07:41
*** e0ne has quit IRC07:47
*** e0ne has joined #openstack-swift07:54
*** admin6_ has joined #openstack-swift07:58
*** admin6 has quit IRC08:00
*** admin6_ is now known as admin608:00
*** hseipp has joined #openstack-swift08:01
*** tkajinam has quit IRC08:09
*** e0ne has quit IRC08:16
*** pcaruana has joined #openstack-swift08:29
rlediseznotmyname: about the device name vs label/uuid, i filed a bug ( https://bugs.launchpad.net/swift/+bug/1817966 ) and I answered to the mail. no patch yet, still on my todo list08:52
openstackLaunchpad bug 1817966 in OpenStack Object Storage (swift) "Encourage the use of static device names in fstab" [Undecided,New]08:52
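
(For context on the bug above: the concern is fstab entries that mount drives by kernel device name, which can change across reboots. A label- or UUID-based entry pins the mount to the filesystem instead; the device names, labels and mount options below are illustrative placeholders, not taken from the bug report.)

    # fragile: sdb can become sdc after a reboot or hardware change
    /dev/sdb        /srv/node/sdb  xfs  noatime,nodiratime,logbufs=8  0 0
    # stable: identified by filesystem label or UUID (placeholder values)
    LABEL=swift-d1  /srv/node/d1   xfs  noatime,nodiratime,logbufs=8  0 0
    UUID=<fs-uuid>  /srv/node/d2   xfs  noatime,nodiratime,logbufs=8  0 0
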
rledisezclayg: I've thought many times about removing the REPLICATE call after data transmission. I'm pretty sure I did it once in a specific situation, but nothing permanent. I wasn't totally sure of all the consequences. But it seems reasonable to do it with SSYNC08:59
admin6hi team, do you have any idea why one disk, with the same size as the other disks of the node and the same relative weight in the ring, could suddenly start to be filled up by swift? Even after reducing its weight it continues to grow up to 100% full.  And furthermore, I now have the same behavior on all the disks of this server…09:30
rledisezadmin6: can you check the folders other than objects (tmp, quarantines, etc…) to see where it is growing?09:44
rledisezwe already had a case where an object was replicated, then quarantined, then replicated, etc...09:45
admin6rledisez: it didn't seem to be that. I've checked the size of the quarantined dir on these disks and the values are standard, about the same as on another server.09:57
admin6rledisez: I've also checked the async_pending and tmp dirs.10:01
rledisezadmin6: then it is probably dispersion10:06
admin6rledisez: could you be a bit more precise? I know I have really bad dispersion currently, because I'm trying to reduce the number of zones in my ring from 6 to 4 (or maybe 5 zones), but I've paused this project for a while. Here is the dispersion of this ring: Dispersion is 24.258486, Balance is 21.684194, Overload is 10.00% Required overload is 358.386218% Worst tier is 43.564049 (r1z2-10.10.1.52)10:11
rledisezi'm talking about dispersion on disk, not in the ring. it means the replicator/reconstructor still needs to work to put all the data back in place. are you using ssync or rsync?10:33
rledisezthere is no real tool to check that easily10:33
rledisezyou basically need to ask the ring for the partitions placed on a given device10:34
rledisezthen compare with what's on disk (ls /srv/node/…/objects/)10:34
*** e0ne has joined #openstack-swift10:35
admin6rledisez: I'm using ssync with swift 2.17. how can I get from the ring the list of partitions placed on a given drive?10:53
*** FlorianFa has quit IRC11:01
*** FlorianFa has joined #openstack-swift11:02
*** pcaruana has quit IRC11:04
*** ianychoi_ is now known as ianychoi11:20
*** pcaruana has joined #openstack-swift11:32
admin6Do you know how I can query a ring to get the list of partitions placed on a given drive?11:38
admin6sorry, swift-ring-builder list_parts seems to be a good candidate for my previous question ;-)12:01
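
(For context: a rough, unofficial sketch of the on-disk dispersion check rledisez describes -- ask the ring which partitions a device should hold, then compare with the partition directories actually present on disk; swift-ring-builder list_parts against the builder file gives the same assignment list. The ring path, device name and objects-1 policy suffix are assumptions to adjust for the cluster in question.)

    # compare ring-assigned partitions vs. partition dirs present on a device
    import os
    from swift.common.ring import Ring

    RING_PATH = '/etc/swift/object-1.ring.gz'   # EC policy ring (assumed name)
    DEVICE = 'sdb'                              # device to inspect (placeholder)
    DATADIR = '/srv/node/sdb/objects-1'         # objects-<policy-index> for EC

    ring = Ring(RING_PATH)
    assigned = set()
    for part in range(ring.partition_count):
        if any(dev['device'] == DEVICE for dev in ring.get_part_nodes(part)):
            assigned.add(str(part))

    on_disk = set(d for d in os.listdir(DATADIR) if d.isdigit())
    print('primary partitions per the ring: %d' % len(assigned))
    print('partition dirs on disk: %d' % len(on_disk))
    print('on disk but not primary (handoffs/stale): %d' % len(on_disk - assigned))
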
*** henriqueof has joined #openstack-swift12:02
*** mvkr has joined #openstack-swift12:13
*** henriqueof has quit IRC12:16
*** ybunker has joined #openstack-swift12:28
ybunkerhi all, question: i just added two new storage nodes to an existing cluster (queens), and I noticed that the account/container files are not replicating at all.. obj files are working fine, but acct and cont are not. any ideas? in the swift_container_server.error file the only thing I'm seeing is the following err msg:12:29
ybunkercontainer-replicator: Can't find itself 127.0.0.1, ::1, 1x.xx.xx.xx, fxxx::xxxx:xxxx:xxxx:xxxx, 1x.xx.xx.xx, fxxx::xxxx:xxxx:xxxx:xxxx, 1x.xx.xx.xx, xxxx::xxxx:xxxx:xxxx:xxxx with port 5103 in ring file, not replicating12:29
ybunkerand the swift_container_server.log file is full of 404 errors on PUT operations12:30
ybunkerany ideas?12:35
ybunkeranyone?12:45
ybunkerthe container ring file (http://pasted.co/3314af99)12:54
ybunkernodes 10.1.1.19 and 10.1.1.20 are the new ones...12:54
ybunker??13:19
ybunkerin the rsyncd log I'm seeing this error msg (unknown module 'container' tried)13:31
admin6ybunker: maybe you forgot to declare the account and container sections in rsyncd.conf on the new server?13:39
ybunkeradmin6: I did it already (http://pasted.co/e62c72a9)13:43
ybunkeri don't know what else to look for... account and container are not replicating at all; I checked the files inside /srv/node/ for acct and cont and there are no files at all either13:44
ybunkeris there a way to "manually" copy from other nodes, at least to lower the 404 errors a little bit?13:51
*** pcaruana has quit IRC13:51
ybunker?13:57
*** pcaruana has joined #openstack-swift14:01
ybunkercould anyone give a hint on this? i really need to get this thing running... :S14:07
DHEthe replication service basically does the equivalent of "ip addr show" to see what IPs are on this host, then looks for itself (all entries for all disks/devices) in the container ring file on the port(s) it listens on. it failed to find itself.14:08
DHEthere is no 10.1.1.19 in your container ringbuilder14:09
DHEoh wait, there it is but out of order. n/m that14:09
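
(For context: a rough approximation, not Swift's exact code, of the "find itself" check DHE describes -- the replicator matches the local IPs and its bind port against the devices in the ring; if nothing matches, it logs the "Can't find itself ... with port ... in ring file" error quoted earlier.)

    from swift.common.ring import Ring
    from swift.common.utils import whataremyips

    ring = Ring('/etc/swift/container.ring.gz')
    my_ips = whataremyips()
    bind_port = 5103   # replication/bind port from the container-server config

    # devices in the ring whose replication ip/port belong to this host
    local_devs = [dev for dev in ring.devs if dev and
                  dev['replication_ip'] in my_ips and
                  dev['replication_port'] == bind_port]
    print(local_devs or "no match -- this node would log \"Can't find itself\"")
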
*** FlorianFa has quit IRC14:09
ybunkerDHE: mm what do you mean by out of order?14:18
DHEyour paste doesn't list devices in IP address order. but that's my fault for not paying enough attention14:21
ybunkerDHE: oh I see, yeah i don't know why it put server 19 (10.1.1.19) with that ID. anyway, besides that i can't find any misconfig or anything different from the other servers.14:23
ybunkerany idea?14:49
ybunkeri compared the rsyncd file with all the other nodes in the cluster and they are the same, except of course for the address, which changes on each node14:49
ybunkeralso checked the container-server.conf file, it's the same on all the nodes14:50
ybunkeri also verified that rsync is accepting connections for all servers http://pasted.co/21ab29c814:59
ybunkermmm the container logs are now showing: container-sync: Skipping tmp as it is not mounted15:09
ybunkercontainer-sync: Skipping containers as it is not mounted15:09
ybunkercontainer-sync: Skipping accounts as it is not mounted15:09
admin6Hi all. I'm still working on my "disk full" problem for an erasure coding ring. As rledisez suggested, I had a look at the "disk dispersion" and I found a big delta between the partitions list_parts declares for this disk in the ring (14127 declared) and the existing directories in the object folder (57000+ directories). Looking into some of the 43000 additional dirs that are not listed by list_parts, I see a lot of directories15:21
admin6filled with real data (valid fragments of objects) that have been accessed recently, but which have nothing to do there, as they are neither among the primary fragment placements nor in the first 12 handoffs. It looks like they are old handoffs that are parsed by some swift process but never get cleaned up. Might I have missed something in my reconstructor config that prevents cleaning up these files?15:21
notmynamerledisez: alecuyer: how do you feel about skipping today's meeting? but I see you added something about grpc to the agenda...15:30
rlediseznotmyname: it can be skipped, nothing urgent15:32
notmynamerledisez: ok15:32
ybunkerany ideas? im kind of stuck in here15:47
*** pcaruana has quit IRC15:53
*** pcaruana has joined #openstack-swift16:06
*** ccamacho has quit IRC16:27
ybunker...16:36
claygadmin6: handoffs_only + reconstructor_workers = #_of_disks should get you back on track16:51
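
(For reference, clayg's suggestion maps to something like the following in the object-server config; the option names exist in recent Swift releases, the worker count of 12 is just a stand-in for "number of disks", and handoffs_only should be turned back off once the backlog drains.)

    [object-reconstructor]
    # one worker per disk on this node (value is a placeholder)
    reconstructor_workers = 12
    # only revert handoff partitions until the node is back in shape
    handoffs_only = True
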
claygybunker: for some reason it sounds like the devices config option is off a directory - all account/container/object config should be /srv/node16:53
ybunkerclayg: acct and cont are inside /srv/node/1, /srv/node/2 and /srv/node/316:55
*** pcaruana has quit IRC16:55
claygadmin6: there's a couple of open bugs I know about that can cause EC handoff partitions not to be removed cleanly -> https://bugs.launchpad.net/swift/+bug/1816501 is one...16:55
openstackLaunchpad bug 1816501 in OpenStack Object Storage (swift) "reconstructor doesn't remove empty handoff dirs with reclaimed tombstones" [Undecided,New]16:55
claygadmin6: also https://bugs.launchpad.net/swift/+bug/177800216:56
openstackLaunchpad bug 1778002 in OpenStack Object Storage (swift) "EC non-durable fragment won't be deleted by reconstructor. " [Medium,Confirmed]16:56
admin6clayg: thanks, but I've already been running with those options set since yesterday, and it hasn't had any real effect on my disk usage. :-(16:56
claygybunker: that's a little different than normal - most of the time a single node has /srv/node/<device>/[accounts|containers|objects]16:57
claygso the accounts|containers|objects dirs are all parallel and the node's devices option is /srv/node16:57
claygthis is useful esp if your a&c and o share disks - you just mount everything into /srv/node/<device>16:57
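
(The conventional layout clayg is describing, with placeholder device names -- one mount point per disk under /srv/node, the service directories sitting side by side on each disk, and every *-server config pointing at the same root.)

    /srv/node/sdb1/accounts
    /srv/node/sdb1/containers
    /srv/node/sdb1/objects        # plus objects-<N> per extra storage policy
    /srv/node/sdc1/accounts
    /srv/node/sdc1/containers
    /srv/node/sdc1/objects

    # in account-server.conf, container-server.conf and object-server.conf:
    devices = /srv/node
    mount_check = true
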
ybunkerclayg: i've /etc/swift/account-server/1.conf... 2.conf... 3.conf... r_1.conf...r_2.conf and r_3.conf on all the data nodes16:57
claygybunker: you CAN set it up the way you described - but I bet you have to be extra careful with your rsync.conf16:58
ybunkerclayg: yeah... the thing is that rsync conf is the same for all the nodes.. and actually objects are being replicated.. but acct and cont no :(16:58
claygseparate configs for front-end and backend services is not uncommon, you certainly CAN manage all account services in a single config - or weird setups like the SAIO have multiple configs to simulate multiple "nodes" on a single machine16:59
claygybunker: so maybe the rsync config is correct for object (you said /srv/node/1?) and so then it's wrong for a&c?17:00
ybunkerclayg: objects are on /srv/node/4 .../5... /12, and acct cont on /srv/node/1... /2... /317:00
claygadmin6: for the referenced bugs something like https://gist.github.com/clayg/7a975ef3b34828c5ac7db05a519b6e8a might help 🤷‍♂️17:00
ybunkeri check permissions and are the same for all the directories17:01
claygadmin6: but if you have DATA in the disks really you should just need to crank up the reconstructor handoff_only workers and let her run17:01
claygwhat is the reconstructor *doing* - are the disks "busy" (i.e. iostat -dmx 2)17:01
*** ccamacho has joined #openstack-swift17:02
claygybunker: maybe paste your `account-replicator` config, your `rsync.conf`, and the output of `swift-account-replicator /replicator.conf once verbose`17:03
claygybunker: it seems like you have a slightly unorthodox configuration, and probably something where the default is normally fine needs to be changed so all the pieces are wired up correctly17:03
claygthere's probably a hint in the logs - you just need to cut through the noise and zoom in on what's really broken (the error msg may be only kind of "indirectly" related)17:04
claygybunker: I'd guess it's something to do with rsync_module maybe - that stuff is tricky17:06
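
(Background on the rsync_module wiring clayg mentions: the replicators build their rsync target from the rsync_module option, and rsyncd.conf on the receiving node must expose a module with a matching name whose path lines up with the devices option. A stock-style example follows; the values are illustrative defaults, not taken from ybunker's pastes.)

    # account-/container-replicator config (documented defaults; per-device
    # modules such as {replication_ip}::account_{device} also work if
    # rsyncd.conf defines matching [account_<device>] modules)
    rsync_module = {replication_ip}::account

    # rsyncd.conf on the storage node -- module names must match
    [account]
    path = /srv/node
    read only = false
    lock file = /var/lock/account.lock

    [container]
    path = /srv/node
    read only = false
    lock file = /var/lock/container.lock
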
ybunkerclayg: let me post the config files, hopefully it's something like that17:06
*** gyee has joined #openstack-swift17:06
claygadmin6: I need to bounce off for a bit - good luck!  the reconstructor is a beast - you can get it tuned - don't forget to check your incoming concurrency settings (reconstructor talks to object-server - there's a replication related concurrency setting you may need to open up)17:07
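
(The "replication related concurrency setting" clayg alludes to is presumably the object-server side limit on incoming REPLICATE/SSYNC requests; the option names below come from the object-server sample config, and the values are only illustrative.)

    [object-server]
    # cap on concurrent incoming replication/SSYNC requests (0 = unlimited)
    replication_concurrency = 0
    # per-device cap on concurrent incoming replication requests
    replication_concurrency_per_device = 1
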
admin6clayg: thanks a lot17:07
claygadmin6: if you've never done a rebalance before there's lots of misconfigs that could be in effect (see ybunker's current crisis) - maybe https://bugs.launchpad.net/swift/+bug/1446873 ???17:08
openstackLaunchpad bug 1446873 in OpenStack Object Storage (swift) "ssync doesn't work with replication_server = true" [Medium,Confirmed]17:08
claygybunker: admin6: either of you all going to come see us in Denver for the Summit!?  Swift party time!  I'll buy you a beer 🍻17:09
admin6clayg: no chance of seeing me there unfortunately, but i'd really love to have a beer with you at another swift party sometime.17:11
ybunkerclayg: find the config detail (http://pasted.co/34e6f214)17:15
*** ccamacho has quit IRC17:15
guimalufhi all, when I run swift-ring-builder dispersion on my object.builder I'm getting 0.58% dispersion, which means 189 partitions within the same region. How can I fix this? Should I increase the weight on one node in r1? or decrease weight in r2? any suggestion would be appreciated17:16
*** ccamacho has joined #openstack-swift17:19
ybunkerclayg: it seems that container-replicator has nothing to replicate at all...  http://pasted.co/15f955ef17:24
*** hseipp has quit IRC17:27
*** e0ne has quit IRC17:34
ybunkerclayg: did you find anything wrong on the configs?17:53
ybunkercan I push the acct and cont data from another one of the nodes?17:54
*** ccamacho has quit IRC18:00
*** psachin has quit IRC18:05
*** zaitcev has joined #openstack-swift18:13
*** ChanServ sets mode: +v zaitcev18:13
ybunker?18:18
ybunkerok.. now I'm getting:18:19
ybunker object-replicator: rsync error: error starting client-server protocol (code 5) at main.c(1653) [sender=3.1.1]18:19
ybunkerclayg: found that on some proxy nodes:18:37
ybunkerRing file account.ring.gz is obsolete18:37
*** ybunker has quit IRC19:33
*** e0ne has joined #openstack-swift20:03
*** e0ne has quit IRC20:21
claygthat whole "early quorum" stuff maybe isn't so great when the early part of the quorum is an error20:21
mattoliverauSo are we meeting today? it seems both of notmyname's things were answered20:32
claygthat post_quorum_timeout is a heck of a setting20:53
kota_morning21:00
* kota_ is scrolling back to know if the meeting happens21:01
mattoliveraukota_: I don't know, I asked the question21:02
kota_mattoliverau: o/21:03
mattoliveraukota_: notmyname asked rledisez if he's ok with the meeting being skipped, and he seemed to be ok with it.. maybe that means it is being skipped (reading scrollback)21:05
mattoliverauSo /me might go and eat breakfast :)21:06
kota_mattoliverau: got it, thanks :)21:07
kota_my wife and kids are still asleep so let's go back to bed :P21:08
kota_oic, i found the line about skipping the meeting.21:10
mattoliveraukota_: yeah go sleep while you can ;) see you a little later here :)21:11
*** e0ne has joined #openstack-swift21:13
notmynamekota_: mattoliverau: yeah, thanks for being flexible :-)21:16
zaitcevtimburke: do you think Request.path_info returns WSGI string or native string? Seems like returning WSGI in the current code, but what would you do if you had a clean slate?21:35
timburkezaitcev, i think i'd keep it as wsgi, just since we're proxying straight through to env[PATH_INFO]21:44
timburkewe *do* at least have Request.swift_entity_path to get us native strings...21:45
zaitcevtimburke: okay. That decision means a ton of wsgi_to_str, is all.21:45
timburke*maybe* Request.path should do that, too?21:45
timburkewhat about introducing a Request.path_info_str property or something?21:46
zaitcevI hit it in slo, and I'm going to roll out those extra wsgi_to_str calls in other modules. Tests didn't cover it adequately.21:46
zaitcevHmm.21:46
zaitcevLet's start with adding wsgi_to_str everywhere and then maybe path_info_str if it gets to be too much. The annoying part really is self.split_path, because that one has the v, a, c, o = self.split_path() pattern and inserting conversions there is awkward.21:47
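
(To illustrate the pattern zaitcev is describing -- a hypothetical handler, not a quote from slo.py: on py3 the WSGI path is a "WSGI string", so each component coming out of split_path gets wrapped in wsgi_to_str before it can be used as a native string.)

    from swift.common.swob import wsgi_to_str

    def native_path_parts(req):
        # req is a swob.Request; split_path() hands back WSGI strings
        vrs, acct, cont, obj = req.split_path(4, 4, rest_with_last=True)
        # convert each piece before comparing or logging as a native string
        return wsgi_to_str(acct), wsgi_to_str(cont), wsgi_to_str(obj)
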
*** e0ne has quit IRC21:58
*** e0ne has joined #openstack-swift22:01
*** e0ne has quit IRC22:01
*** rcernin has joined #openstack-swift22:52
claygoh how wise he is... https://bugs.launchpad.net/swift/+bug/1503161/comments/1722:53
openstackLaunchpad bug 1503161 in OpenStack Object Storage (swift) "[Re-open in 2015 Oct] DELETE operation not write affinity aware" [Medium,Fix released] - Assigned to Lingxian Kong (kong)22:53
*** tkajinam has joined #openstack-swift23:01
timburkei don't even remember what that was about23:04

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!