Friday, 2020-02-14

*** yamamoto has joined #openstack-lbaas00:15
*** armax has joined #openstack-lbaas00:35
*** abaindur has joined #openstack-lbaas00:58
*** yamamoto has quit IRC01:19
johnsomFYI, I have found that nova anti-affinity appears to be broken: https://bugs.launchpad.net/nova/+bug/186319001:33
openstackLaunchpad bug 1863190 in OpenStack Compute (nova) "Server group anti-affinity no longer works" [Undecided,New]01:33
johnsomI was testing that failover puts the amp back in the server group correctly (which works BTW).01:34
johnsomOn that fine note, catch you all tomorrow.01:35
johnsom#canary01:35
*** vishalmanchanda has joined #openstack-lbaas01:54
*** armax has quit IRC02:54
*** abaindur has quit IRC03:02
*** abaindur has joined #openstack-lbaas03:03
*** armax has joined #openstack-lbaas03:07
*** abaindur has quit IRC03:08
*** yamamoto has joined #openstack-lbaas03:25
*** armax has quit IRC03:55
*** andy_ has quit IRC05:07
*** andy_ has joined #openstack-lbaas05:08
*** nicolasbock has quit IRC05:09
*** psachin has joined #openstack-lbaas05:37
*** ramishra has joined #openstack-lbaas05:46
*** abaindur has joined #openstack-lbaas05:59
*** goldyfruit has quit IRC06:00
*** goldyfruit has joined #openstack-lbaas06:00
*** ramishra has quit IRC06:30
*** abaindur has quit IRC06:32
*** abaindur has joined #openstack-lbaas06:32
*** ivve has joined #openstack-lbaas07:56
*** maciejjozefczyk has joined #openstack-lbaas07:58
*** abaindur_ has joined #openstack-lbaas08:05
*** abaindur has quit IRC08:08
*** abaindur_ has quit IRC08:11
*** abaindur has joined #openstack-lbaas08:12
*** gcheresh_ has joined #openstack-lbaas08:14
cgoncalveshah!08:17
cgoncalves#canary neutron-dhcp-agent seems to be broken. fails to spawn dnsmasq08:18
cgoncalvesthe bot should start collecting these messages like it does for #success08:18
cgoncalves#success Octavia is the canary08:18
openstackstatuscgoncalves: Added success to Success page (https://wiki.openstack.org/wiki/Successes)08:18
*** ccamposr has quit IRC08:19
*** ccamposr has joined #openstack-lbaas08:20
*** ramishra has joined #openstack-lbaas08:21
*** tesseract has joined #openstack-lbaas08:21
*** ccamposr__ has joined #openstack-lbaas08:39
*** ccamposr has quit IRC08:41
*** tkajinam has quit IRC08:42
ivvequestion: wouldn't it be great if octavia timed out on the "pending update" immutable state after some (configurable) time and moved it back to its last state, with an error on the last request, rather than being stuck in the immutable state forever (and forcing an admin to hack it out of existence with db commands)08:48
cgoncalvesivve, lucky us there is a timeout setting ;)08:49
ivvegreat!08:49
ivvewhats the default and what is the actual timeout?08:49
*** gcheresh_ has quit IRC08:49
ivvebecause currently everything just hangs indefinitely08:50
ivve(with default value)08:50
cgoncalvesdefault is 25 minutes IIRC08:50
ivveoh then either a) it is not OR b) it doesn't work OR c) it is refreshed if someone tries a delete command again on the entire loadbalancer08:51
cgoncalvesivve, it hangs indefinitely in PENDING_* if you restart the controller service while it was still processing the request08:51
ivvei see08:51
ivveor it lost network connection during that time, i suppose?08:51
cgoncalvesI don't think that would have an impact on the revert08:52
cgoncalvesdo you have service logs from the start of the request?08:53
ivvei have a debug command from the last request only08:54
ivveit was in available state, everything was ok. tried to delete it08:54
ivvewent into delete, then update pending08:54
ivvethen stuck08:54
ivvefor a day08:54
cgoncalvescan you confirm the octavia services were not restarted during that period?08:55
ivveyea08:55
cgoncalvesno octavia worker logs we could check?08:55
ivveusers create k8s clusters and then create out-of-stack loadbalancer items for them (ingress objects etc)08:56
ivvewhen they're done testing, they lazily remove the stack08:56
ivvethis same thing happens with cinder objects08:56
ivvei guess08:57
ivvecould be long, but now i have an appointment. i will check when i come back!08:57
cgoncalvesis Heat involved in creating/deleting Octavia resources?08:58
cgoncalvesok. talk to you later08:58
*** xakaitetoia has joined #openstack-lbaas08:59
*** rcernin has quit IRC09:12
openstackgerritCarlos Goncalves proposed openstack/octavia-tempest-plugin master: Add tests for allowed CIDRs in listeners  https://review.opendev.org/70262909:19
*** baffle has joined #openstack-lbaas09:32
*** ccamposr__ has quit IRC09:33
*** ccamposr__ has joined #openstack-lbaas09:34
*** yamamoto has quit IRC09:57
*** rcernin has joined #openstack-lbaas09:59
*** yamamoto has joined #openstack-lbaas10:04
*** rcernin has quit IRC10:17
*** abaindur has quit IRC10:18
*** abaindur has joined #openstack-lbaas10:18
*** abaindur has quit IRC10:23
*** psachin has quit IRC10:25
*** yamamoto has quit IRC10:28
*** vishalmanchanda has quit IRC10:28
*** psachin has joined #openstack-lbaas10:33
*** yamamoto has joined #openstack-lbaas10:39
*** gcheresh_ has joined #openstack-lbaas10:52
*** abaindur has joined #openstack-lbaas11:05
*** abaindur has quit IRC11:10
*** psachin has quit IRC11:10
*** psachin has joined #openstack-lbaas11:12
openstackgerritCarlos Goncalves proposed openstack/octavia-tempest-plugin master: Add tests for allowed CIDRs in listeners  https://review.opendev.org/70262911:15
*** gcheresh_ has quit IRC11:16
*** yamamoto has quit IRC11:21
ivvecgoncalves: back again, so heat created the initial lb, but then the ingress controller creates a bunch of other stuff and probably more loadbalancers. then when the stack is removed (some of those resources are not included in the stack)11:41
ivveand they got stuck11:41
ivvek8s ingress controller11:42
ivveso in order to get rid of them i set them as active/online in the db and quickly remove them11:43
cgoncalvesivve, I can't think of anything else how to troubleshoot than checking the logs :/11:44
ivveyea i get it11:45
ivveits just that this issue spans over multiple scenarios11:45
cgoncalveswe've also had customers reporting that their LB resources got stuck while deleting a k8s cluster on top of openstack via Heat11:45
ivverestarting controllers, even when doing it one by one11:45
cgoncalvesso your case seems similar11:45
ivveif a network failure occurs in parts of the datacenter/infrastructure11:46
ivvethen this happens again11:46
ivveand then manual db labour to fix tens upon tens of loadbalancers11:46
ivvewhat im saying is the general state recovery11:47
ivvecould it be improved? even if the service gets a restart11:47
ivvethis is the easiest way to fail it11:48
ivvestop the octavia mgmt net11:48
ivveand then hell breaks loose11:49
ivveand the api calls in existence don't help to reset the loadbalancers when they are in the immutable state, there is no way other than DB hacking and restarting things manually11:50
cgoncalvesthere's work ongoing now in master that will mitigate resources getting stuck in PENDING_*11:51
cgoncalveshttps://review.opendev.org/#/c/647406/11:51
ivveoh okay cool11:51
*** yamamoto has joined #openstack-lbaas11:54
ivvecgoncalves: one last question, if i have a loadbalancer that ends up in error mode or an immutable state, what is the best approach to recreate it (even if it takes some db hacking / sending api commands as admin)11:56
ivvelike for example this one, it's still working but no matter my approach octavia tries to solve the issue but is unable to, and it now looks like this:11:57
cgoncalvesivve, if in PENDING_*, set it to ERROR rather than ACTIVE. then, issue a loadbalancer failver11:57
ivvehttps://hastebin.com/esoyitefaj.rb11:57
cgoncalves$ openstack loadbalancer failover $LB_ID11:57
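A minimal sketch of that workaround, assuming a MySQL backend, a database named octavia, and the default schema (table load_balancer, column provisioning_status); the LB ID is a placeholder:

  # flip the stuck record from PENDING_* to ERROR directly in the database
  mysql octavia -e "UPDATE load_balancer SET provisioning_status='ERROR' WHERE id='$LB_ID';"
  # then ask Octavia to rebuild the amphorae for that load balancer
  openstack loadbalancer failover $LB_ID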
ivvefailover the error one im assuming?11:58
*** tkajinam has joined #openstack-lbaas11:58
ivvesometimes they come back, but assume role: standalone11:58
cgoncalvescorrect, failover the LB in ERROR provisioning_status11:58
ivvenot amphora failover?11:59
*** yamamoto has quit IRC11:59
cgoncalvesbetter failover LB11:59
ivve:(12:01
ivveit got worse12:01
ivvehttps://hastebin.com/pulupayuha.rb12:02
ivvekeeps trying to create that new one12:02
ivvethen fails, then gives up12:03
cgoncalvesshow me the logs .... :)12:03
ivveill grab logs12:03
cgoncalvesheh12:03
ivveLB bb5fe733-82c3-4156-b26d-7735b9a8c7dc failover exception: port not found (port id: a9b9ce28-bf7f-4d81-b08a-cf7ab554149e).: PortNotFound: port not found (port id: a9b9ce28-bf7f-4d81-b08a-cf7ab554149e).12:04
ivveim guessing this is the problem12:04
cgoncalvesoh, either the vrrp or the vip port got deleted :/12:05
ivveaye12:05
cgoncalvesjohnsom helped a couple of users with this problem by recreating the ports manually12:06
cgoncalvesI'm not 100% sure I'd know the whole process to do so12:06
cgoncalveshe may be able to help you better than me once he's online12:07
cgoncalvesalso, johnsom has been working on fixing the failover flow which if I recall correctly will also address this scenario, i.e. recreate the port if missing12:07
ivveyeah, i guess im just looking to be taught how to fish as this happens from time to time in a prod env12:07
cgoncalveshttps://review.opendev.org/#/c/705317/12:08
cgoncalvesif you're in a hurry and that LB can be re-created from scratch, you can do openstack loadbalancer delete12:09
ivveyea so this is where the next issue comes up12:12
ivveit's not possible12:12
ivve:)12:12
ivvethe only way i can delete it is to fake it to be all healthy and active in the db before octavia notices and then delete it before it is checked for health12:12
*** nicolasbock has joined #openstack-lbaas12:13
cgoncalvessetting it to ERROR and deleting immediately after should work12:13
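A sketch of that sequence, reusing the assumed schema above; --cascade also removes the listeners and pools owned by the LB:

  # mark the LB as ERROR, then delete it right away
  mysql octavia -e "UPDATE load_balancer SET provisioning_status='ERROR' WHERE id='$LB_ID';"
  openstack loadbalancer delete --cascade $LB_ID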
ivveyeah or that12:13
ivveit would rock if we could get "openstack loadbalancer set --state <states> <lb>" like cinder12:14
*** psachin has quit IRC12:15
cgoncalveswe lost count at how many people have asked for that :)12:15
ivvehahah12:15
ivve:D12:15
ivve"another one" :P12:15
ivvethe issue arises when i have tons of users which in turn have tons of loadbalancers12:16
ivveand we have some kind of outtage12:16
ivveand as admin you do that dreadful openstack loadbalancer list and see all those errors :(12:17
ivveand then go through them one by one and fix them12:18
cgoncalvesthe refactor failover patch will help you in failing over broken LBs, including when ports got deleted like is your case now12:18
ivvealso maybe a openstack loadbalancer re-create <lb>12:18
cgoncalvesthe jobboard patch will help when controller handling the CUD request gets killed halfway through the flow12:18
ivvejust delete the whole thing and recreate it with identical uuids/ips. that would probably lead to even more issues in the end i guess if resources half-exist or can't be removed12:19
cgoncalveshow would recreate be different than failover?12:19
ivveit would not mend a lb, it would delete the resource and recreate it i guess12:20
ivvewell in my case (and i guess in a lot of cases) i want a active/passive topology12:21
*** maciejjozefczyk has quit IRC12:21
ivvein a standalone topology i guess the failover is exactly that, and im guessing that the failover procedure for a standalone works more often than not compared to the active/passive12:21
cgoncalvesivve, that is what failover does. it recreates the amphora (delete + create)12:21
ivveyea but not if a part of them is functioning, right?12:22
ivveso i have an active and a backup, the backup is in error but the active is fine. it will only fix the backup if i failover, correct or not?12:22
cgoncalvesfailover in active-standby topology will recreate the amps one at a time. this is to avoid data plane outages12:22
ivveoh, so it does recreate both? (should?)12:22
ivvei haven't seen that12:23
cgoncalvesivve, in that case you could do "openstack loadbalancer amphora failover $AMP_ID"12:23
ivveyea thats what i have been using most of the time12:23
cgoncalvesif you failover the loadbalancer, it will failover all amps associated to it12:23
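For reference, the two failover scopes being discussed, using commands from python-octaviaclient (IDs are placeholders):

  # fail over the whole LB: every amphora attached to it gets recycled, one at a time
  openstack loadbalancer failover $LB_ID
  # or list the amphorae and fail over only the broken one
  openstack loadbalancer amphora list --loadbalancer $LB_ID
  openstack loadbalancer amphora failover $AMP_ID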
ivvegot it12:23
ivvelike here is a result of an attempted fix, eventually it started working but now it looks like an abomination and nobody dares touch it for fear of halting production https://hastebin.com/ucesumaduy.rb12:30
ivvethis is probably from setting it to error state & using amphora failover after a network outage between controller nodes12:32
*** maciejjozefczyk has joined #openstack-lbaas12:37
ivvecgoncalves: while im at it asking questions, will there ever be support for using multiple images at the same time? today we use haproxy with some OS (tagging image with amphora)12:38
ivvei mean flavor is one thing, but what about images :)12:40
cgoncalvesivve, support to set amphora image in flavor is in the to-do list. just need someone to go there and do it. should be trivial, I guess12:41
ivvecool12:42
ivveits a request from my users, myself im just thinking of testing new images before putting them in "prod"12:42
*** yamamoto has joined #openstack-lbaas12:43
ivvebut they want multiple types of OS in the background (don't ask me why)12:43
cgoncalvesyeah, I understand12:44
ivveokay well, i will handle this and wait for coming updates12:45
ivvethanks a bunch for answering my questions, much appreciated12:45
*** yamamoto has quit IRC12:45
ivvei will also try to do the full loadbalancer failover next time, setting it to error first if it isnt already, not sure if i have done exactly that before12:46
cgoncalvessorry about the trouble. the team is working hard to fix these issues12:47
cgoncalveshaving users reporting issues and helping us troubleshooting is great12:47
*** yamamoto has joined #openstack-lbaas13:06
*** tkajinam has quit IRC13:17
ataradayjohnsom, Hi! Sorry for bothering you, but could you take a look at points 7-9 in https://etherpad.openstack.org/p/octavia-worker-v2-issue-tracker and leave some comments on what you think13:23
*** ivve has quit IRC13:28
*** psachin has joined #openstack-lbaas13:37
*** TrevorV has joined #openstack-lbaas14:18
*** psachin has quit IRC14:48
johnsomJust a comment on the above thread. Restart of the controller will not hang a flow, but kill -9 will. You must use graceful shutdown until our jobboard work is done.14:56
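A hedged illustration of the difference, assuming a systemd-based deployment where the controller runs as a unit named octavia-worker (unit names vary between distributions and devstack):

  # graceful: systemd sends SIGTERM and waits, so the running flow can finish or revert cleanly
  systemctl stop octavia-worker
  # abrupt: SIGKILL gives the flow no chance to revert and can strand resources in PENDING_*
  kill -9 $(pgrep -f octavia-worker)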
johnsomataraday: will do14:56
johnsomSounds like the root cause of ivve's problem was nova or not being able to reach the database from the controllers.15:05
*** abaindur has joined #openstack-lbaas15:16
*** abaindur has quit IRC15:21
xakaitetoiai've seen a lot of communication issues and generally a good thing is to always check rabbit.15:35
cgoncalvesjohnsom, why do you say that? one problem was the vrrp port was gone in neutron15:35
johnsomcgoncalves exactly that and looking at the logs and what he said.15:36
johnsomVRRP port gone either means nova didn't release it, or we couldn't access the database. Also, stuck in PENDING_ could mean that the controllers gave up trying to write "ERROR" or "ACTIVE" to the database. It only retries so long15:37
cgoncalvesone can also delete a port even if it is attached, I tried that the other day15:38
johnsomYes, but not when nova is hung15:39
johnsomI.e. it is having DB issues, or the compute instance is not reachable to nova15:39
cgoncalvesah, right. k, didn't try that15:39
johnsomThis is the open nova bug (that also affects cinder as he mentioned)15:40
*** maciejjozefczyk has quit IRC15:42
johnsomcgoncalves Just to clarify, heat stack deletes cannot cause a stuck PENDING_* state. That is not an RCA reason.15:42
*** armax has joined #openstack-lbaas15:53
*** TrevorV has quit IRC16:09
*** ramishra has quit IRC16:36
*** yamamoto has quit IRC17:05
*** yamamoto has joined #openstack-lbaas17:06
*** yamamoto has quit IRC17:06
*** yamamoto has joined #openstack-lbaas17:06
*** yamamoto has quit IRC17:11
*** xakaitetoia has quit IRC17:11
*** yamamoto has joined #openstack-lbaas17:26
*** tesseract has quit IRC17:29
*** gmann is now known as gmann_afk17:42
openstackgerritMerged openstack/octavia stable/queens: Use stable upper-constraints.txt in Amphora builds  https://review.opendev.org/70605217:47
johnsomWahooo! Thanks cgoncalves for your persistence. lol17:47
cgoncalveshappy to help. 23 rechecks17:54
johnsom🤦17:54
*** gcheresh_ has joined #openstack-lbaas18:14
*** gcheresh_ has quit IRC18:32
*** gcheresh_ has joined #openstack-lbaas18:42
openstackgerritBrian Haley proposed openstack/octavia master: Allow multiple VIPs per LB  https://review.opendev.org/66023919:16
*** gcheresh_ has quit IRC19:22
openstackgerritMichael Johnson proposed openstack/octavia master: WIP - Refactor the failover flows  https://review.opendev.org/70531719:26
cgoncalvescores: gate fix https://review.opendev.org/#/c/706051/19:43
johnsom+219:45
cgoncalvesthanks19:52
*** gmann_afk is now known as gmann20:12
haleybjohnsom: since you're approving gate fixes, https://review.opendev.org/#/c/707687/ :)20:22
*** abaindur has joined #openstack-lbaas21:08
*** abaindur has quit IRC21:08
*** abaindur has joined #openstack-lbaas21:09
johnsomhaleyb +W21:38
*** nicolasbock has quit IRC21:54
*** yamamoto has quit IRC22:01
*** ccamposr has joined #openstack-lbaas22:02
*** ccamposr__ has quit IRC22:03
*** yamamoto has joined #openstack-lbaas22:37
*** yamamoto has quit IRC22:45
openstackgerritMerged openstack/octavia stable/stein: Fix pep8 failures on stable/stein branch  https://review.opendev.org/70768723:20
*** armax has quit IRC23:33
*** armax has joined #openstack-lbaas23:33
*** abaindur has quit IRC23:52
*** spatel has joined #openstack-lbaas23:58
