Friday, 2019-05-17

00:34 *** Guest28904 has quit IRC
01:12 *** mmethot has quit IRC
01:55 *** mmethot has joined #openstack-oslo
02:00 *** ianychoi has joined #openstack-oslo
03:49 *** ianychoi has quit IRC
03:49 *** ianychoi has joined #openstack-oslo
03:51 *** ianychoi has quit IRC
03:52 *** ianychoi has joined #openstack-oslo
03:53 *** lbragstad has quit IRC
05:20 *** pcaruana has joined #openstack-oslo
05:35 *** larainema has quit IRC
05:52 *** Luzi has joined #openstack-oslo
06:05 *** lpetrut has joined #openstack-oslo
06:47 *** hberaud|gone is now known as hberaud
06:58 *** starborn has joined #openstack-oslo
07:08 *** tesseract has joined #openstack-oslo
07:19 *** rcernin has quit IRC
07:19 *** jaosorior has quit IRC
07:46 <openstackgerrit> Hervé Beraud proposed openstack/tooz master: Update Sphinx requirement Cap grpcio  https://review.opendev.org/659590
08:09 *** lpetrut has quit IRC
08:12 <openstackgerrit> Natal Ngétal proposed openstack/oslo.cache stable/stein: Avoid tox_install.sh for constraints support  https://review.opendev.org/659174
08:15 *** tosky has joined #openstack-oslo
08:22 *** hberaud has quit IRC
08:23 *** hberaud has joined #openstack-oslo
08:29 *** lpetrut has joined #openstack-oslo
08:56 <openstackgerrit> Natal Ngétal proposed openstack/oslo.log stable/stein: Cap bandit below 1.6.0 version and update sphinx and limit monotonic.  https://review.opendev.org/659755
09:17 *** e0ne has joined #openstack-oslo
09:50 *** rcernin has joined #openstack-oslo
09:56 *** hberaud is now known as hberaud|school-r
10:02 *** hberaud|school-r is now known as hberaud|lunch
11:23 *** hberaud|lunch is now known as hberaud
11:28 <openstackgerrit> Merged openstack/stevedore master: Cap Bandit below 1.6.0 and update Sphinx requirement  https://review.opendev.org/659551
11:34 *** raildo has joined #openstack-oslo
11:36 *** rcernin has quit IRC
11:44 <openstackgerrit> Hervé Beraud proposed openstack/stevedore master: Uncap bandit.  https://review.opendev.org/659554
11:48 *** yan0s has joined #openstack-oslo
12:52 *** kgiusti has joined #openstack-oslo
12:52 *** kgiusti has left #openstack-oslo
13:11 *** jaosorior has joined #openstack-oslo
13:28 *** lbragstad has joined #openstack-oslo
13:50 *** mmethot has quit IRC
13:52 *** ansmith has joined #openstack-oslo
13:53 <bnemec> #success Capped all the bandits in Oslo to unblock CI
13:53 <openstackstatus> bnemec: Added success to Success page (https://wiki.openstack.org/wiki/Successes)
14:00 <stephenfin> Good riddance
14:06 *** lpetrut has quit IRC
14:08 *** ansmith has quit IRC
14:11 *** kgiusti has joined #openstack-oslo
14:13 *** mmethot has joined #openstack-oslo
14:25 *** Luzi has quit IRC
14:36 <gsantomaggio> > So if we take a step back this feature will have to be implemented as a new parameter to the __init__ of the RPCClient class.
14:36 <gsantomaggio> @kgiusti totally agree!
14:36 <gsantomaggio> it is easier!
14:39 <kgiusti> gsantomaggio: sorry man - I should've realized that at the start :(
14:39 <gsantomaggio> @kgiusti don't worry! it was a good opportunity to study the code
14:41 *** jaosorior has quit IRC
14:41 <kgiusti> gsantomaggio: I like you, Gabriele - you look on the bright side of life :)
14:42 <gsantomaggio> @kgiusti :) thank you!
14:42 <kgiusti> gsantomaggio: I think folks will like this feature.
14:43 <kgiusti> gsantomaggio: the RPC via a queue currently hides the possibility that there's no one listening
14:44 <kgiusti> gsantomaggio: which forces the caller to wait until the timeout hits, where the user would probably want to fail immediately instead.
14:44 <gsantomaggio> @kgiusti yes, exactly - and it is also useful for troubleshooting
14:44 <kgiusti> gsantomaggio: +1
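
The feature discussed above would let an RPC caller fail immediately when nobody is consuming from the target queue, rather than blocking until MessagingTimeout fires. A minimal sketch of how a caller might opt in, assuming a hypothetical constructor flag (the flag name and behaviour are illustrative, not an existing oslo.messaging option):

    # Sketch only: "fail_fast_if_no_consumers" is a hypothetical stand-in
    # for the new RPCClient.__init__ parameter discussed above.
    import oslo_messaging
    from oslo_config import cfg

    conf = cfg.CONF
    transport = oslo_messaging.get_rpc_transport(conf)
    target = oslo_messaging.Target(topic='my_topic', version='1.0')

    client = oslo_messaging.RPCClient(
        transport, target,
        timeout=30,
        # hypothetical: raise at call time if the topic queue has no
        # consumers, instead of waiting out the full timeout.
        # fail_fast_if_no_consumers=True,
    )

    # With the flag commented out, this behaves like today's client:
    # client.call({}, 'do_something', arg=42) blocks for up to `timeout`
    # seconds if nothing is listening on 'my_topic'.
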
14:47 *** kmalloc is now known as kmalloc_away
14:55 *** yan0s has quit IRC
15:11 *** starborn has quit IRC
15:40 *** ansmith has joined #openstack-oslo
15:44 <openstackgerrit> Merged openstack/tooz master: Unblock tooz gate  https://review.opendev.org/658204
15:49 *** shardy has quit IRC
15:58 *** ansmith has quit IRC
16:09 *** ansmith has joined #openstack-oslo
16:14 *** hberaud is now known as hberaud|gone
16:15 <openstackgerrit> Rodolfo Alonso Hernandez proposed openstack/oslo.privsep master: Pass correct arguments to six.reraise  https://review.opendev.org/659831
16:17 *** tesseract has quit IRC
16:18 *** tesseract has joined #openstack-oslo
16:29 *** ansmith has quit IRC
16:46 *** ralonsoh has joined #openstack-oslo
16:46 *** njohnston has joined #openstack-oslo
16:47 <ralonsoh> bnemec, hi, one question about privsep: is it possible to call a function with a privsep decorator and have some kind of endless loop there? Something like a monitor
16:47 <ralonsoh> bnemec, or is privsep only for "atomic" functions?
16:52 <bnemec> ralonsoh: Now that privsep can run things in parallel you could theoretically launch a long-running privileged function.
16:52 <bnemec> You would need to be careful not to do it too many times though. There's a limit to how many parallel calls privsep will allow.
16:53 <ralonsoh> bnemec, yes, I saw there is a limit on the number of threads, thanks!
16:53 <bnemec> And it's user-configurable, so someone could set a really low limit and block all of your other privileged calls.
16:54 <ralonsoh> bnemec, yes, that could be a serious problem
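
For reference, the pattern being discussed looks roughly like the sketch below, using oslo.privsep's PrivContext/entrypoint API; the context name, config section, capability set, and monitor body are illustrative assumptions. bnemec's caveat applies to the loop: each such call occupies one of the privsep daemon's worker threads, and the pool size is operator-configurable, so a long-running monitor can starve other privileged calls.

    # Minimal sketch of a long-running privileged "monitor" entrypoint.
    # Context name, config section, capability set and loop body are
    # illustrative only, not taken from any real project.
    import time

    from oslo_privsep import capabilities
    from oslo_privsep import priv_context

    monitor_context = priv_context.PrivContext(
        __name__,
        cfg_section='my_service_privileged',    # hypothetical config section
        pypath=__name__ + '.monitor_context',
        capabilities=[capabilities.CAP_NET_ADMIN],
    )

    @monitor_context.entrypoint
    def watch_file(path, interval=5):
        """Loop forever inside the privsep daemon.

        This ties up one of the daemon's worker threads for as long as it
        runs, so only a bounded number of these can be outstanding before
        other privileged calls start queueing behind them.
        """
        while True:
            with open(path) as f:      # privileged read, as an example
                f.read()
            time.sleep(interval)
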
16:58 *** tesseract has quit IRC
16:59 *** tesseract has joined #openstack-oslo
17:12 *** lbragstad has quit IRC
17:12 *** lbragstad has joined #openstack-oslo
17:17 *** tesseract has quit IRC
17:17 *** e0ne has quit IRC
17:24 *** ralonsoh has quit IRC
17:35 *** ansmith has joined #openstack-oslo
17:45 *** ansmith has quit IRC
17:49 *** ansmith has joined #openstack-oslo
18:18 *** imacdonn has joined #openstack-oslo
18:26 *** ansmith has quit IRC
18:44 <imacdonn> any messaging / AMQP expert available for a consult? Trying to diagnose an issue where an RPC call (by nova-api) is attempted on a dead TCP connection. A "Connection reset by peer" error is received, a new connection is established, and the message gets resent, but then there's a timeout waiting for the reply, for some reason....
18:47 <imacdonn> ^ kgiusti maybe?
18:48 <kgiusti> imacdonn: why are the TCP links down - was there a network fault?
18:49 <gsantomaggio> I'd also check the called module - maybe the problem is simply on the callee's side
18:50 <imacdonn> kgiusti: "it's complicated" ... that's a whole other facet of the problem (it has to do with eventlet monkey patching and threading in WSGI containers not getting along, resulting in the AMQP heartbeat breaking, so rabbitmq-server drops the connection) ... trying to focus on the failure to utilise the reestablished connection at the moment
18:52 <gsantomaggio> @imacdonn are you using SSL connections?
18:52 <imacdonn> I've confirmed that the request is received on the target host ... the issue seems to be handling the reply, somehow .. and it only fails in the case where the connection had just been reestablished and the request retried
18:52 <imacdonn> negative on SSL
18:52 <gsantomaggio> is there anything in the rabbitmq logs?
18:53 <gsantomaggio> did you check them?
18:53 <kgiusti> imacdonn: ok - so it's the path back from the server. Any chance you can look at the reply queues - trying to see if the reply gets to rabbitmq or not
18:53 <imacdonn> rabbitmq-server logs show it dropped the connection after observing no heartbeats for 60s
18:54 <kgiusti> imacdonn: just to clarify: is the client <--> rabbitmq connection the only one that is failing? or is the rabbitmq <---> server connection also dropping?
18:54 <imacdonn> kgiusti: if "client" means the host that is sending the RPC request, then that is where the problem lies
18:55 <kgiusti> imacdonn: sorry - yes, client=RPC request, server=RPC reply
18:55 <imacdonn> log from nova-api with some debug turned on: http://paste.openstack.org/show/AZQET5IOgntK785RHtxh/
18:56 <imacdonn> 23:48:19.113 is where the retried publish happens .. then nothing for 60 seconds until the timeout
18:57 <imacdonn> is there a way I can log AMQP packets in oslo.messaging?
19:01 <kgiusti> imacdonn: I'm going through that log - that's from the RPC request client side, correct?
19:01 <kgiusti> imacdonn: I'll need a sec
19:01 <imacdonn> kgiusti: correct, and thanks
19:02 <openstackgerrit> Merged openstack/oslo-specs master: Remove policy-merge policy  https://review.opendev.org/648766
19:03 <kgiusti> imacdonn: do you know which oslo.messaging version you are using?
19:04 <imacdonn> kgiusti: it's RDO Stein .. RPM version is python2-oslo-messaging-9.5.0-1.el7.noarch
19:04 <kgiusti> imacdonn: regarding logging: --log_levels "oslo.messaging=DEBUG,oslo_messaging=DEBUG" should give us everything - I don't think we dump the entire msg tho...
19:05 <kgiusti> imacdonn: kk thanks
19:05 <imacdonn> yeah, I have those, and amqp=DEBUG,amqplib=DEBUG, and a couple of env vars to make kombu talk too
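
For anyone reproducing this kind of trace: the logger names mentioned above can be raised to DEBUG either through the service's default_log_levels option (oslo.log) or in code via oslo.log's defaults helper. A sketch of the in-code form, with the logger list taken from the conversation above:

    # Sketch: raise the messaging-related loggers to DEBUG with oslo.log's
    # defaults helper. In a real deployment you would normally set the
    # equivalent 'default_log_levels' option in the service's config file.
    from oslo_log import log as logging

    extra_debug = [
        'oslo.messaging=DEBUG',
        'oslo_messaging=DEBUG',
        'amqp=DEBUG',
        'amqplib=DEBUG',
    ]

    logging.set_defaults(
        default_log_levels=logging.get_default_log_levels() + extra_debug)
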
kgiustiimacdonn: on the "server side" - is there anything in the logs around that time with the text "sending reply msg id:" ?19:29
imacdonnkgiusti: well, I don't have the uber-debugging enabled on the server side, but I do see that it gets the request, and I have no reason to doubt that it is replying, since it always works, except in the case where the client had to reestablish its connection asnd try publishing the request19:30
kgiustiimacdonn: ok - that log message prints the msg_id and reply_q - I wanted to see if it matched the logs you sent19:32
imacdonnkgiusti: OK, I can try to add debug on the server side19:33
imacdonntrying to reproduce the problem now .. it's a bit tricky, with the timing ... working on it19:42
kgiustiimacdonn: np19:42
*** imacdonn has quit IRC19:45
*** imacdonn has joined #openstack-oslo19:52
imacdonnkgiusti: reproduced the problem .. client said:19:53
imacdonn2019-05-17 19:50:32.041 25492 ERROR nova.api.openstack.wsgi [req-02b27aed-9f3e-448b-a26b-eb689ad2d456 d451459e393247d2a571ea2ec6914b7f bc885705e450495ca3c5b5f5a54f7355 - default default] Unexpected exception in API method: MessagingTimeout: Timed out waiting for a reply to message ID bd16e129e7e544689ab7997fc8e3e27019:53
imacdonndespite server saying:19:53
imacdonn2019-05-17 19:49:32.082 17993 DEBUG oslo_messaging._drivers.amqpdriver [req-02b27aed-9f3e-448b-a26b-eb689ad2d456 d451459e393247d2a571ea2ec6914b7f bc885705e450495ca3c5b5f5a54f7355 - default default] sending reply msg_id: bd16e129e7e544689ab7997fc8e3e270 reply queue: reply_6579029399d848e7a4a481970fc0d3af time elapsed: 0.0391101880086s _send_reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:12519:53
kgiustiimacdonn: can you do a "sudo rabbitmqctl list_queues" against your broker?19:54
imacdonnkgiusti: anything in particular to look for in that ?19:55
kgiustiimacdonn: sorry - use the "-n <domain>" option to point the broker19:55
kgiustiimacdonn: yeah I wanted to see if there's a queue called reply_6579<etc> and if there's a message stuck on it19:56
19:56 <imacdonn> appears so: reply_6579029399d848e7a4a481970fc0d3af  1
19:57 <kgiusti> imacdonn: taa-daa!
19:57 <kgiusti> imacdonn: we've narrowed the problem down to the requester side.
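
The check kgiusti asked for can also be scripted; a small sketch that shells out to rabbitmqctl and filters for the oslo.messaging reply queues (it assumes rabbitmqctl is on PATH and runnable by the current user - add sudo and/or the "-n <node>" option mentioned above as needed):

    # Sketch: list RPC reply queues with their message backlog and consumer
    # count. A queue with messages > 0 and consumers == 0 matches the
    # "reply stuck on the queue" symptom discussed above.
    import subprocess

    def reply_queue_backlog():
        out = subprocess.run(
            ['rabbitmqctl', 'list_queues', 'name', 'messages', 'consumers'],
            check=True, capture_output=True, text=True).stdout
        for line in out.splitlines():
            fields = line.split()
            if len(fields) == 3 and fields[0].startswith('reply_'):
                yield fields[0], int(fields[1]), int(fields[2])

    if __name__ == '__main__':
        for name, messages, consumers in reply_queue_backlog():
            print(name, messages, consumers)
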
19:57 <imacdonn> cool .. I think ;)
19:58 <kgiusti> imacdonn: I'm running a test locally where I restart rabbitmq in the middle of an RPC call (while the server is processing, before the reply is sent)
19:58 <kgiusti> imacdonn: I'm walking through the log output here - stay tuned.
19:59 <imacdonn> kgiusti: you may have to defeat heartbeats .... normally the heartbeat gets the connection reset and kills the old connection before anyone tries to use it
19:59 <imacdonn> ... or at least some of the time, that happens
20:00 <kgiusti> imacdonn: can you grep through the requester-side logs for "reply_657902<etc>"?
20:01 <imacdonn> quite a few matches for that
20:01 <imacdonn> I can paste a new client-side log .. after a min to scrub it a little
20:02 *** pcaruana has quit IRC
20:03 <kgiusti> imacdonn: kk
20:03 <imacdonn> kgiusti: http://paste.openstack.org/show/s5tTpEgBhjagvmQ7MK2F/
20:07 <imacdonn> I guess that reply queue gets reused .. most of the matches were from earlier
20:08 <kgiusti> imacdonn: hrm - it shouldn't get reused - it's a randomly generated uuid4 string
20:08 <kgiusti> imacdonn: it should survive reconnects tho
20:09 <imacdonn> kgiusti: new one for each RPC request? or...
20:09 <kgiusti> imacdonn: nope - one for each RPC client (transport, actually)
20:10 <imacdonn> kgiusti: k, so I just meant that there were matches in the log for my earlier attempts to reproduce the problem
20:10 <kgiusti> imacdonn: they are somewhat long-lived
20:11 <imacdonn> I guess "reused" was the wrong term ... but that's what I meant
20:12 <kgiusti> imacdonn: what I would *expect* to see in those logs is a "Queue.declare: reply_xxx" right before the "Reconnected to AMQP Server..." line
20:13 * kgiusti thinks...
20:14 <imacdonn> hmm, I see Queue.declare in the case where the connection reset didn't reproduce
20:15 <imacdonn> in that case, it's right *after* a connection was established, it appears:
20:15 <imacdonn> 2019-05-17 19:44:33.121 25492 DEBUG oslo.messaging._drivers.impl_rabbit [req-4a9ae3a4-5d3a-4a30-af4b-917a755797af d451459e393247d2a571ea2ec6914b7f bc885705e450495ca3c5b5f5a54f7355 - default default] [13325785-dcff-4bcf-ac48-580b78372f51] Connected to AMQP server on imot03mq.dcilab.oraclecorp.com:5672 via [amqp] client with port 33266. __init__ /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:543
20:15 <imacdonn> 2019-05-17 19:44:33.122 25492 DEBUG oslo.messaging._drivers.impl_rabbit [req-4a9ae3a4-5d3a-4a30-af4b-917a755797af d451459e393247d2a571ea2ec6914b7f bc885705e450495ca3c5b5f5a54f7355 - default default] [13325785-dcff-4bcf-ac48-580b78372f51] Queue.declare: reply_6579029399d848e7a4a481970fc0d3af declare /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:253
20:16 <kgiusti> imacdonn: I would expect it to be logged once when the connection is first established, then again each time the connection recovers...
20:17 <kgiusti> imacdonn: can I send you a patched impl_rabbit.py file with extra debug logs?
20:18 <imacdonn> kgiusti: sure .. iain.macdonnell -at- oracle.com
20:18 <kgiusti> imacdonn: I want to determine if the channel is not being established. kk
20:38 <imacdonn> kgiusti: not pressuring, but just checking in case it fell in a hole ... let me know when you've sent that
20:39 <kgiusti> imacdonn: np - no, just running tests to verify I'm tracing the right stuff...
20:39 <imacdonn> cool
20:45 *** ansmith has joined #openstack-oslo
20:49 <kgiusti> imacdonn: email sent
20:49 <imacdonn> kgiusti: OK, watching for it...
20:51 <imacdonn> nothing so far ... must be stuck in a queue ..... ;)
20:52 <kgiusti> imacdonn: try https://paste.fedoraproject.org/paste/vOyS5seUjWSma~emIanwXw
20:54 <imacdonn> kgiusti: k, patch applied .. will try to repro
21:00 <imacdonn> kgiusti: http://paste.openstack.org/show/rpS2mcKyIESyG4JehLxd/
21:00 <imacdonn> kgiusti: wait, paste didn't work
21:02 <imacdonn> grumble .. it says the paste contains spam, and makes me do a captcha, then cuts off most of the content
21:03 *** raildo has quit IRC
21:03 <imacdonn> kgiusti: this has the timeout part: http://paste.openstack.org/show/K5Ut8teApN4In7mYcPyG/
21:03 <imacdonn> I was trying to paste the entire log, including when the connection was first established earlier
21:05 <kgiusti> imacdonn: how well does it compress? you can try emailing it to kgiusti at gmail.com
21:06 <imacdonn> kgiusti: here's the earlier part, where the first request established the connection and worked OK: http://paste.openstack.org/show/7wW4RtRW4wUKdvfrRG6i/
21:09 *** ansmith has quit IRC
21:17 <kgiusti> imacdonn: sorry, I gotta drop off. I can feel Wifey's burning glare on the back of my neck....
21:17 <imacdonn> lol
21:18 <kgiusti> imacdonn: ttyl
21:18 <imacdonn> kgiusti: np ... I'll lurk here for whenever you want to resume
21:18 *** kgiusti has left #openstack-oslo
21:19 <imacdonn> melwitt sean-k-mooney: above may be interesting "light reading" ... "to be continued"
21:46 <melwitt> imacdonn: excellent, looking forward to learning more. good call bringing in the oslo.messaging experts
21:47 <imacdonn> melwitt: :) thanks for your help yesterday, and have a good weekend!
21:48 <melwitt> imacdonn: likewise, thank you for digging into this with us. enjoy your weekend :)
22:42 *** tosky has quit IRC
22:50 *** dave-mccowan has quit IRC
23:53 *** mmethot has quit IRC
