Monday, 2019-03-11

*** mikecmpbll has quit IRC01:57
*** gkadam has joined #openstack-swift02:45
kota_alecuyer: when I noticed the patch to remove gRPC, I felt the same as clayg. Moving off gRPC because of a technical issue is better than staying stuck on an older version, just IMO.03:47
*** gkadam has quit IRC03:47
* kota_ is leaving the office and will be back tomorrow morning.05:23
*** axblueblader has joined #openstack-swift05:30
openstackgerritzhufl proposed openstack/swift master: Do not use self in classmethod  https://review.openstack.org/64235906:28
*** ianychoi has quit IRC06:32
*** ianychoi has joined #openstack-swift06:32
*** ianychoi has quit IRC06:35
*** ianychoi has joined #openstack-swift06:36
axbluebladerhello guys, I'm new to swift and currently researching its object versioning functionality for large objects, is it ok to ask questions on this channel?06:40
*** e0ne has joined #openstack-swift06:51
*** pcaruana has joined #openstack-swift07:00
*** e0ne has quit IRC07:02
*** rcernin has quit IRC07:03
*** axblueblader has quit IRC07:53
*** axblueblader has joined #openstack-swift08:09
*** tkajinam has quit IRC08:17
alecuyerkota_: OK, then I will go ahead and work from that first patch08:22
*** axblueblader has quit IRC08:25
*** e0ne has joined #openstack-swift08:27
*** mikecmpbll has joined #openstack-swift08:56
*** hseipp has joined #openstack-swift08:59
*** e0ne has quit IRC09:22
*** mikecmpbll has quit IRC09:33
*** mikecmpbll has joined #openstack-swift09:34
*** axblueblader has joined #openstack-swift09:36
*** e0ne has joined #openstack-swift09:42
*** axblueblader has quit IRC09:50
*** e0ne has quit IRC10:43
*** e0ne has joined #openstack-swift10:47
*** e0ne has quit IRC11:37
*** ybunker has joined #openstack-swift11:47
ybunkerhi all, unfortunately I'm still facing problems with the account & container replication on the new nodes. I really don't know what else to look for, so if someone can give me a hand or a hint on this I would really appreciate it :) Here are the configuration files and some output of the logs:   http://pasted.co/f931a29a12:04
ybunkerif more info is needed to do some troubleshooting on this please don't hesitate to ask12:04
ybunkerobjects are replicating fine12:07
*** [diablo] has quit IRC12:26
*** [diablo] has joined #openstack-swift12:29
*** e0ne has joined #openstack-swift12:30
ybunkeranyone?12:39
*** [diablo] has quit IRC12:43
*** hseipp has quit IRC13:03
*** ianychoi has quit IRC13:28
*** ianychoi has joined #openstack-swift13:29
*** e0ne has quit IRC14:14
zaitcevybunker: sorry, they are mostly in California14:16
*** e0ne has joined #openstack-swift14:23
ybunkeroh i see :(14:24
*** e0ne has quit IRC14:36
*** e0ne has joined #openstack-swift14:38
*** mrjk has quit IRC14:43
zaitcevIn theory I should be able to help, but in practice... BTW, those configs look very strange for a multinode. The rsync.conf looks like someone adapted a SAIO for multinode.14:43
zaitcevhere, take a look - http://www.zaitcev.us/things/swift/rhev-24c-01.etc.rsyncd.conf14:46
zaitcevOh, wait, n/m14:47
zaitcevThey are SAIOs with 1.conf, 2.conf etc14:48
*** mrjk has joined #openstack-swift14:48
zaitcevybunker: you must make sure that account ring includes devices with port 410314:50
*** FlorianFa has joined #openstack-swift14:50
ybunkerzaitcev: let me send you the account ring just to double check14:50
zaitcevybunker: please don't :-)  Run swift-ring-builder account.builder  without any more arguments and double-check that devices in the ring correspond to nodes and ports where account servers listen.14:51
*** FlorianFa has quit IRC14:52
*** FlorianFa has joined #openstack-swift14:52
ybunkerzaitcev: yes, they are ok (if you wanna take a look :) -> http://pasted.co/70b963c5)14:54
zaitcevybunker: make sure the replicator uses the correct ring, then. Maybe you have some docker thrown in or whatever. I cannot really tell.15:05
zaitcevI see that 10.2.1.19:4103 is in the builder file at least.15:06
zaitcevThe "rebalance" stage writes out account.ring.gz15:06
zaitcevThen you scp it from admin workstation to swift-node09:15:07
zaitcevWell, I'm sure you know all that.15:07
*** mrjk has quit IRC15:31
*** mrjk has joined #openstack-swift15:32
*** mrjk has quit IRC15:32
*** mrjk has joined #openstack-swift15:32
ybunkeryes, and then i restart the services on the nodes15:48
ybunkerzaitcev: i have the following permissions on the rings:   -rw-r--r-- 1 swift swift15:49
zaitcevybunker: Permissions are not a factor, because the replicator would've bailed if it could not read the ring. Look, it's a very simple process. When the replicator starts, it gets a list of IPs available for it to listen on (unless bind_ip is set). Then it reads the ring and searches for itself in it. It's as simple as pie!16:12
notmynamegood morning16:13
zaitcevybunker: You have a very convoluted, strange configuration. I cannot diagnose it for you using just hearsay, sorry.16:13
ybunkerzaitcev: i know.. but the problem is that every config file looks good, so I don't know where to look16:14
ybunkerzaitcev: no problem, thanks for the tips :)16:14
zaitcevybunker: oh just strace the replicator then and look at what it opens and reads. Then run md5sum on the file. It's a heavy-weight method, but then you'll see that it reads from a place where you didn't copy that ring.gz and it's reading something obsolete. Or you forgot the rebalance. Or heck, I saw people run rebalance, it errors out because min_hours is not reached, then they blindly copy the old ring.gz to node...16:15
tdasilvaybunker: are you able to get recon info from the account servers on those ports?16:16
ybunkerzaitcev: will check on that and get back16:16
*** mrjk has quit IRC16:17
zaitcevOh, actually, what tdasilva says. That gives the md5sum of the ring the server sees (well, hopefully it's the same as the one the replicator sees, because in the docker world you can't even be sure of that much, le sigh).16:17
tdasilvasomething like: curl `http://10.1.1.11:4101/recon/devices` or something like that...16:19
tdasilvaand then try for the replication ip:port also16:19
zaitcev10.2.1.19:4103 in his case16:19
tdasilvaI always like to add the healthcheck middleware to the pipeline too, cause it's a quick way to just get a heartbeat on the service....16:20
openstackgerritTim Burke proposed openstack/swift master: Stop monkey-patching mimetools  https://review.openstack.org/64055216:20
*** pcaruana has quit IRC16:23
*** pcaruana has joined #openstack-swift16:23
ybunkeralso I noticed that some object disks are more used in terms of % than others, so maybe something with the replication ring is going on16:30
*** e0ne has quit IRC16:41
*** gyee has joined #openstack-swift16:46
zaitcevybunker: maybe, but let's focus on the problem at hand, which is replicator not finding itself in the ring, according to your pastebin. Once you got them running solid on all nodes, then you can look at utilization.16:57
ybunkerzaitcev: yep16:57
*** e0ne has joined #openstack-swift16:57
*** mrjk has joined #openstack-swift17:04
openstackgerritTim Burke proposed openstack/swift master: Get functional/tests.py running under py3  https://review.openstack.org/64252017:05
*** e0ne has quit IRC17:13
*** patchbot has quit IRC17:14
*** patchbot has joined #openstack-swift17:14
*** e0ne has joined #openstack-swift17:14
zaitcevtimburke: I'm very sorry but I'm very confused! I went here... then grepped BaseMessage https://git.openstack.org/cgit/openstack/swift/tree/swift/common/wsgi.py?id=fac7d743db49858c17228f3ebb470948dae7cc23#n42917:17
timburkebah! i meant to switch *all* the BaseMessage stuff to wsgi.HttpProtocol.MessageClass...17:18
zaitcevoh17:18
zaitcevI thought it was some magic way to see some class methods or whatever17:19
openstackgerritTim Burke proposed openstack/swift master: Stop monkey-patching mimetools  https://review.openstack.org/64055217:22
openstackgerritTim Burke proposed openstack/swift master: Get functional/tests.py running under py3  https://review.openstack.org/64252017:22
zaitcevtimburke: Why not use __super__? Just asking.17:29
timburkezaitcev, py217:29
timburkeor, were you thinking super(..., self).blahblahblah? mimetools.Message doesn't inherit from object iirc17:30
zaitcevOh god17:31
*** e0ne has quit IRC17:31
*** mikecmpbll has quit IRC17:32
timburkeyeah, basically. :-(17:34
timburkehttps://github.com/python/cpython/blob/v2.7.15/Lib/rfc822.py#L8517:34
ybunkermmm when i query the account.builder ring:  account.builder, build version 260, id (not assigned)17:41
ybunkerthat "id (not assigned)" is ok?17:41
ybunkerzaitcev: also I noticed that the swift_container_server.log file is getting all 404 on PUTs on all the nodes18:01
ybunkertdasilva: i have run the curl to recon devices and im getting the full list of disks:     {"/srv/node": ["3", "1", "7", "6", "9", "4", "10", "11", "5", "2", "12", "8"]}18:08
tdasilvaybunker: and you did that for all account servers  ip:port combos? and replication combos?18:14
zaitcevI'd start with checking 10.2.1.19:4301/recon/ringmd518:20
*** e0ne has joined #openstack-swift18:23
*** e0ne has quit IRC18:25
openstackgerritTim Burke proposed openstack/swift master: Get functional/tests.py running under py3  https://review.openstack.org/64252018:25
ybunkerzaitcev: would it be 10.2.1.19:4103 instead of 4301?18:28
zaitcevybunker: possibly.18:28
ybunkerzaitcev: ok, its the same for all the nodes18:30
ybunkerzaitcev: the md5 in the master node is the same on all the nodes18:36
ybunkerhttp://pasted.co/545a2c7a18:39
zaitcevybunker: do you still get the "swift-node09 account-replicator: Can't find itself" or has it stopped?18:56
ybunkerzaitcev: still on18:57
*** e0ne has joined #openstack-swift18:57
ybunkerzaitcev: what if i configure the account-replicator for each of the disks inside /etc/swift/account-server/ and add rsync_module = 10.2.1.19::account4201 ... then account4202 and so on for the rest ?18:58
zaitcevybunker: okay, does that md5 match the one you get with md5sum /etc/swift/account.ring.gz (or what is the right path on the master or admin system)?18:58
ybunkerzaitcev: yes, its the same18:59
zaitcevybunker: and the timestamp of it is newer than the builder? Just with ls -lt /etc/swift/account*19:00
zaitcev.ring.gz has to be on top of .builder19:00
ybunkeryes19:04
ybunkerzaitcev: if i look at the rsync logs I find:   unknown module 'container' tried from swift-node01 (10.2.1.11)19:04
zaitcevYes, you have a ton of problems there. But they do not matter unless you get replicators actually working.19:05
ybunkeryep19:05
zaitcevUsing 1.conf 2.conf etc. on a multi-node setup is absurd. When such a SAIO-like setup is in place, rsync must add port number to that thing, forgot what it calls it, maybe "module". There's an obscure setting for it that SAIO sets. It's called something like "vm_mode". BUT19:07
zaitcevBUT it's much better to junk those crazy 1.conf and 2.conf and just use a normal setup in production. Then you never have those "unknown module" things happening in rsync.19:07
zaitcevAnyway19:08
ybunkeryeah, once i get this thing working i will then move and take some corrective actions on the configurations, but first i need to get this thing to work :(19:09
zaitcevRight19:09
zaitcevoh19:14
zaitcevI think I see it now19:14
zaitcevIn http://pasted.co/f931a29a, replicator says that it wants port 4103, right? But that port is only used for the access network, not for replication network. Look into http://pasted.co/70b963c5. All instances of 4103 are in the left column.19:16
zaitcevAnd you have two sets of listeners, 3.conf listens on 4103 and r_3.conf listens on 4203.19:17
zaitcevI can only conclude that your replicator uses 3.conf. I don't know how you start it, but it just does. Then, it sees it binding to 4103, then tries to find that among replication ports in the ring, and fails.19:18
zaitcevYou need to make sure that your replicator uses r_3.conf or whatever is the conf that its listener does. Edit some Systemd unit files or whatever.19:19
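The failure described above can be modelled with a small Python sketch. It is a simplified, hypothetical model and not Swift's actual code: the function name, the device dict layout, and the 192.0.2.x / 198.51.100.x addresses are illustrative assumptions, and only ports 4103 and 4203 come from the discussion. The idea is that the replicator matches its own bind port against each device's replication endpoint, not the access endpoint.

```python
# Hypothetical, simplified model of a replicator looking for itself in the ring.
# Addresses are documentation-range placeholders, not taken from the real ring.

def find_own_devices(ring_devs, local_ips, bind_port):
    """Return ring devices whose replication endpoint matches this process."""
    return [
        dev for dev in ring_devs
        if dev["replication_ip"] in local_ips
        and dev["replication_port"] == bind_port
    ]


ring_devs = [
    {"device": "3",
     "ip": "192.0.2.19", "port": 4103,  # access network
     "replication_ip": "198.51.100.19", "replication_port": 4203},
]
local_ips = {"192.0.2.19", "198.51.100.19"}

# A replicator started from 3.conf binds to 4103 and matches nothing.
print(find_own_devices(ring_devs, local_ips, 4103))
# A replicator started from r_3.conf binds to 4203 and finds device "3".
print(find_own_devices(ring_devs, local_ips, 4203))
```

Under this model, a replicator reading 3.conf would report that it cannot find itself even though the ring and its md5 are correct on every node.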
zaitcevOr here's an even better suggestion to start19:20
zaitcevJust don't use the replication network at first19:20
ybunkermmm and how can i change that in order to work? can i try to put it all in one config file instead of having 3 ? (it's an inherited cluster :-( )19:20
zaitcevGet it all running with just one network, and have rings with port and replication_port set to same value at first. Then, as you got that debugged, add a distinct replication network.19:21
zaitcevOooh...19:21
zaitcevSo someone invented this mad config and dumped it on you.19:21
ybunkeryes!!! exactly :-( and its a pain in the... hahah19:22
zaitcevSo, how is the account replicator started? Is it swift-init, SystemD, or something else?19:23
ybunkerswift-init19:24
zaitcevYou have to find that out. There, it gets its argument... it's a path. That path must be such that it ends up reading r_3.conf instead of 3.conf.19:24
zaitcevHmm, wait. That one may be computing the paths.19:24
ybunkerand i see processes for 1.conf 2.conf 3.conf and r_1.conf r_2.conf and r_3.conf19:25
zaitcevBut it's mistaken when you have both maybe. Because it never expected that to happen.19:25
ybunkeris there a way to get rid of those r_x.conf files and just use one config file for acct, another for cont and one for obj ?19:26
ybunkerat least on the new server, and then i will try to change the rest of the cluster19:27
zaitcevYes, there is, but that effectively switches off the replication network and uses the front-to-back network for replication traffic.19:27
zaitcevI'm afraid to suggest that outright, because who knows how much network capacity you're using right now. Merging them may bring the whole thing to its knees.19:28
zaitcevHold on, let me check if you can force replicator's port19:29
ybunkerthanks a lot zaitcev19:29
zaitcevrunning vi etc/account-server.conf-sample, let's see..19:29
zaitcevnope, it's not possible19:31
ybunker:-(19:31
zaitcevybunker: when you said "i see processes for ... r_3.conf", what does it actually include? Could it be that you have _two_ account replicators running: one for 3.conf, which does nothing, and one for r_3.conf?19:32
zaitcevIf so, then all you need is prevent the unnecessary and confusing extra replicator from starting.19:34
ybunkerzaitcev: http://pasted.co/d36d311019:34
zaitcevAh, yes, a full set :-)19:35
openstackgerritTim Burke proposed openstack/swift master: Get functional/tests.py running under py3  https://review.openstack.org/64252019:36
zaitcevSo, try to comment out [account-replicator] from 3.conf and do swift-init account-replicator stop, then start. Check if there aren't any bad messages.19:36
ybunkerok, well errors seem to have gone away :-)19:39
ybunkerbut... still not replicating :(19:39
zaitcevHow do you know?19:40
zaitcevnumber of passes remains at zero in recon?19:40
*** e0ne has quit IRC19:40
zaitcevOr some other method?19:40
*** e0ne has joined #openstack-swift19:40
ybunkerlooking at the log files for swift_account_server.log and listing the content of /srv/node/{1|2|3}, no account directory appears19:40
ybunkerin the [account-replicator] i only have vm_test_mode = yes19:43
zaitcevRight, that's how r_3.conf is set up. That setting appends "4203" to "account".19:43
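A rough Python sketch of the naming rule described here, assuming the rsync module is built from the server type with the replication port appended when vm_test_mode is on. The function name is hypothetical and this is not presented as Swift's implementation; it only shows why a replicator without vm_test_mode would ask rsync for a bare "account" or "container" module that a per-port rsyncd.conf does not define.

```python
def rsync_module(replication_ip, server_type, replication_port, vm_test_mode):
    """Build the rsync module target a replicator would ask for.

    Assumption: with vm_test_mode the port is appended to the module name,
    otherwise the bare server type is used.
    """
    if vm_test_mode:
        return "%s::%s%s" % (replication_ip, server_type, replication_port)
    return "%s::%s" % (replication_ip, server_type)


# With vm_test_mode on, the module carries the port, e.g. "account4203".
print(rsync_module("10.2.1.19", "account", 4203, True))
# Without it, the bare "container" module is requested.
print(rsync_module("10.2.1.19", "container", 4203, False))
```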
zaitcevSo, actually, if you walk through all of ?.conf and comment out replicators, you should get rid of rsync complaining about the naked "container" module.19:44
zaitcevAlthough19:44
zaitcevThat actually means that one of those extra container replicators managed to find itself in the ring and tried to replicate.19:45
ybunkeroh ok i see19:47
ybunkerand on the main account-replicator.conf19:48
zaitcevYeah, it's a little surprising that you have both that and account/1.conf19:48
ybunkerdo i need to comment out also the [account-replicator] ? or leave it with log_facility, concurrency = 1 and vm_test_mode = yes ?19:48
zaitcevI don't know for sure. You have to check if it owns any devices in the ring...19:50
zaitcevIf you do need it, then it must keep vm_test_mode, because that's how all those rsync listeners are set up.19:51
ybunkerthe "unknown module 'container' tried from" errors continue inside the rsyncd.log19:52
zaitcevYou still have a container replicator without vm_test_mode somewhere in the cluster. Rsync should also tell you the IP of the offending node.19:53
ybunkerzaitcev: will review all the cluster and get back tomorrow hopefully with some news :), thanks a lot zaitcev for all the help today, really appreciated!19:54
*** ybunker has quit IRC19:56
*** e0ne has quit IRC20:11
*** e0ne has joined #openstack-swift20:12
*** e0ne has quit IRC20:15
*** itlinux has joined #openstack-swift20:30
*** itlinux has quit IRC20:34
openstackgerritTim Burke proposed openstack/swift master: Get functional/tests.py running under py3  https://review.openstack.org/64252020:38
*** e0ne has joined #openstack-swift20:56
*** itlinux has joined #openstack-swift20:56
*** e0ne has quit IRC21:04
*** e0ne has joined #openstack-swift21:12
*** e0ne has quit IRC21:38
*** pcaruana has quit IRC21:45
timburkeclayg, notmyname: you guys expressed some opinions on https://review.openstack.org/#/c/640552/ -- got some review bandwidth to second zaitcev's +2?21:49
patchbotpatch 640552 - swift - Stop monkey-patching mimetools - 5 patch sets21:49
*** itlinux has quit IRC21:55
notmynametimburke: so if I understand p 640552 correctly, we used to patch mimetools to set None so we can detect when the client doesn't set it. and now we do it with the `protocol_class` that's passed in to the wsgi server22:42
patchbothttps://review.openstack.org/#/c/640552/ - swift - Stop monkey-patching mimetools - 5 patch sets22:42
notmynamealthough I'm not sure where the `protocol_class` thing gets used. eventlet maybe?22:43
notmynamealso a google search for "eventlet wsgi 'protocol_class'" only shows 6 results, 5 from swift's codebase22:46
*** threestrands has joined #openstack-swift22:50
*** tkajinam has joined #openstack-swift22:56
mattoliveraumorning22:58
zaitcevI went directly to "git clone eventlet" to see how Tim chose to wedge this stuff in, seemed agreeable.23:07
notmynameya, he just explained it to me too. looks good when you finally see where it's all being called from23:10
timburkenotmyname, +1?? are you a core or not!? :P23:17
*** threestrands has quit IRC23:45
