Friday, 2015-11-13

00:21 <harlowja_> boris-42 whats up dawg
00:47 <SpamapS> klindgren: I'm taking a look at how much better it might get with msgpack
00:51 <klindgren> kk - from the server we just added to handle conductor loads: http://paste.ubuntu.com/13243560/
00:54 <klindgren> ^^ All day everyday
00:55 <SpamapS> klindgren: I'd be interested to see what the rate of messages going through rabbitmq is.
00:55 <SpamapS> klindgren: guessing it is very high.
00:55 <klindgren> 500 - 1k messages/s
00:56 <klindgren> and that's just for nova since it's just the nova-cell
00:59 <SpamapS> right, so, 1000/s, if each json blob must be deserialized, turned into an object, dealt with, answer serialized..
00:59 <SpamapS> guessing kilo added something that makes each message deserialized twice or using a different, less performant deserializer.
01:08 <harlowja_> ya, i guess it becomes a question of which part of 'json blob must be deserialized, turned into an object, dealt with, answer serialized' is the problem
01:09 <harlowja_> SpamapS might be easy just to plug in https://github.com/openstack/oslo.serialization/blob/master/oslo_serialization/msgpackutils.py#L336
01:09 <harlowja_> ^ which is msgpack
01:09 <harlowja_> + some nice python extensions to handle more python types
01:09 <SpamapS> actually
01:09 <SpamapS> before that
01:09 <SpamapS> https://gist.github.com/lightcatcher/1136415
01:09 <SpamapS> this suggests that we're doing it wrong
01:10 <SpamapS> favoring built in python json
01:10 <SpamapS> instead of going to faster external libs
01:10 <SpamapS> especially for serializing
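[editor's note: the serialization-cost argument above can be sanity-checked with a quick stdlib-only timing sketch; the payload shape below is made up for illustration, not an actual oslo.messaging envelope:]

```python
import json
import time

# Hypothetical payload roughly the shape of an RPC message envelope;
# NOT an actual nova/oslo.messaging message, just illustrative.
msg = {
    "oslo.version": "2.0",
    "oslo.message": {
        "method": "object_class_action",
        "args": {
            "objname": "Instance",
            "uuid": "97c0189e-a28f-4989-9c88-5274e1661e6b",
            "metadata": {str(i): "x" * 32 for i in range(50)},
        },
    },
}

N = 10000
start = time.perf_counter()
for _ in range(N):
    blob = json.dumps(msg)  # serialize
    json.loads(blob)        # deserialize
elapsed = time.perf_counter() - start

# At the 500-1k messages/s quoted above, every message pays this
# round-trip cost in CPU per hop (and conductor is an extra hop).
per_msg_ms = elapsed / N * 1000
print("round-trip: %.3f ms/message" % per_msg_ms)
```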
01:11 <SpamapS> harlowja_: plugging in msgpack is what I'm going to do if I can get devstack to actually work
01:11 * SpamapS starts on a fresh vm
01:11 <harlowja_> ya, SpamapS just plug in https://github.com/openstack/oslo.serialization/blob/master/oslo_serialization/msgpackutils.py
01:11 <harlowja_> i made that just for u
01:11 <klindgren> what version of oslo.messaging is msgpack available on?
01:11 <harlowja_> lol
01:12 <klindgren> or oslo.serialization
01:12 <harlowja_> oslo.serialization, ummm, i forget when i made that
01:12 <harlowja_> some version ago, lol
01:12 <SpamapS> looks like it's been there since what, january?
01:12 <harlowja_> SpamapS it handles https://github.com/openstack/oslo.serialization/blob/master/oslo_serialization/msgpackutils.py#L288 those native types
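[editor's note: the "extra python types" point matters because a bare serializer rejects things like datetimes; a minimal stdlib-json illustration of the handler idea (hypothetical tag names, not the oslo.serialization code, which does the equivalent for msgpack):]

```python
import datetime
import json

# Sketch of type-handler registration: stdlib json can't encode a
# datetime, so we tag it on the way out and untag it on the way in.
# The "__datetime__" tag is invented here for illustration.

def default(obj):
    """Encode types stdlib json doesn't know, tagging them for decode."""
    if isinstance(obj, datetime.datetime):
        return {"__datetime__": obj.isoformat()}
    raise TypeError("unserializable: %r" % (obj,))

def object_hook(d):
    """Undo the tagging applied by default()."""
    if "__datetime__" in d:
        return datetime.datetime.fromisoformat(d["__datetime__"])
    return d

record = {"created_at": datetime.datetime(2015, 11, 13, 0, 47)}
blob = json.dumps(record, default=default)
restored = json.loads(blob, object_hook=object_hook)
print(restored == record)  # True
```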
01:12 <klindgren> running: oslo.serialization==1.4.0
01:12 <harlowja_> perhaps
01:12 <harlowja_> oslo.messaging using msgpack though will require more work
01:12 <harlowja_> Mehdi was trying to add that in, but didn't think it got in
01:13 <harlowja_> https://review.openstack.org/#/c/151300/ ----> abandoned :(
01:13 <SpamapS> oops, time to go fetch kids
01:13 * SpamapS disappears
01:13 <klindgren> kk
01:13 <klindgren> either way 1.4.0 doesn't have it :-D
01:14 <klindgren> it is in 1.5
01:14 <harlowja_> k
18:41 <SpamapS> klindgren_: so I'm playing with nova-conductor by slamming it with nova boot/list commands
18:41 <SpamapS> klindgren_: in a small scale, nova-api ends up chewing up all of the CPU
18:42 <SpamapS> klindgren_: I suspect you have _many_ API's compared to your few conductors. Yes?
18:42 <klindgren_> SpamapS, k I would say most of our stuff is requests for metadata
18:42 <klindgren_> its roughly the same ratio honestly
18:42 <klindgren_> 3 physical servers running about 40 api services
18:42 <SpamapS> but the nova-api's aren't egging 32 CPU's?
18:42 <SpamapS> pegging
18:43 <klindgren_> but our conductor load isn't coming from boots
18:43 <SpamapS> klindgren_: have you considered configdrive... ;)
18:43 <klindgren_> IE we only boot like 10 vm's an hour or so
18:43 <klindgren_> we are using config drive
18:43 <SpamapS> oh, where are the metadata reqs coming from?
18:43 <klindgren_> well people run puppet
18:43 <SpamapS> You mean like, other apps pulling it out?
18:44 <klindgren_> puppet uses facter
18:44 <klindgren_> with the ec2 metadata turned on by default
18:44 <SpamapS> ah so you use configdrive, but you allow metadata service
18:44 <SpamapS> ACK
18:44 <klindgren_> yea
18:44 <SpamapS> ok let me cook up a rally scenario for metadata gets then
18:45 <klindgren_> we also may have done some of this to ourselves, we have a cronjob running in the vm's that polls metadata every 10 minutes or so
18:46 <klindgren_> but we put a random offset on the cronjob and we turned on memcache for the metadata services
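[editor's note: the random-offset trick described above, so a fleet of VMs started together doesn't hit the metadata service in lockstep, can be sketched like this (hypothetical poller, stdlib only; the URL is the standard EC2-style metadata address):]

```python
import random
import time
import urllib.request

POLL_INTERVAL = 600  # poll every 10 minutes, as described above
METADATA_URL = "http://169.254.169.254/latest/meta-data/"

def startup_jitter(interval=POLL_INTERVAL):
    """One-time random delay in [0, interval] to spread the fleet out."""
    return random.uniform(0, interval)

def poll_forever():
    # Without this jitter, VMs booted from the same image at the same
    # time would all poll in the same second, every cycle, forever.
    time.sleep(startup_jitter())
    while True:
        try:
            with urllib.request.urlopen(METADATA_URL, timeout=10) as resp:
                resp.read()
        except OSError:
            pass  # metadata service unreachable; retry next cycle
        time.sleep(POLL_INTERVAL)
```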
18:47 <SpamapS> honestly, metadata should be really fast
18:47 <SpamapS> kind of surprised neutron-metadata-agent doesn't cache it (and the lookup by MAC)
18:48 <SpamapS> klindgren_: oh so wait, memcache _is_ caching it, weird
18:48 <klindgren_> but I would say the people running puppet set for puppet to reach out to the puppetmaster every 2 minutes is a bigger load.  As puppet will run facter each time it runs
18:48 <klindgren_> we aren't running neutron-metadata-agent
18:48 <SpamapS> oh?
18:48 <klindgren_> we are running nova-metadata on every compute node
18:48 <SpamapS> that is interesting
18:49 <SpamapS> I like that better honestly. :)
18:49 <klindgren_> we run with flat networking
18:49 <SpamapS> of course you do. :)
18:49 <klindgren_> I should say flat provider networks
18:49 <klindgren_> kiss :-D
18:49 <SpamapS> Right I understand, that's, IMO, the only sane way to play.
18:49 <SpamapS> (We're building infra-cloud the same)
18:50 <klindgren_> cool - yea we have been happy with it
18:50 <SpamapS> let the tenants fend for themselves! ;)
18:50 <SpamapS> its the internet yo
18:50 <klindgren_> most of our tenants don't want to know about networking, or they dont care
18:51 <SpamapS> interesting, rally has no built in metadata scenario
18:51 <SpamapS> this might explain something. ;)
18:51 <harlowja_at_home> why are u guys kissing
18:51 <harlowja_at_home> thats weird
18:51 <harlowja_at_home> ha
18:51 <klindgren_> K.I.S.S**
18:51 <harlowja_at_home> :-p
18:51 <SpamapS> harlowja_at_home: its the internet. Get used to seeing stuff you can't explain.
18:51 <harlowja_at_home> :)
18:52 <klindgren_> Those that do want to create their own networks/routers think that somehow that will give them resource isolation from other people because its not "shared"
18:52 * klindgren_ sighs
18:53 <SpamapS> klindgren_: do we make them wear hats, or signs?
18:53 <harlowja_at_home> lol
18:53 <SpamapS> "I booted my server on an isolated tenant network and all I got was pwned and then they sent me this t-shirt I don't know how they got my address"
18:54 <SpamapS> might be a tad long for a t-shirt
18:54 <klindgren_> Like, if I created my own network, and router, no one else uses it and I will not have performance problems, because all of this stuff is dedicated solely to me
18:54 <SpamapS> maybe sweatpants and they can write that on the butt
18:54 <SpamapS> klindgren_: riight.. its a real, dedicated imaginary overlayed network!
18:54 <klindgren_> Even though in this case the issue was the firewall ran out of outbound nat connections because someone was being an idiot
18:55 <klindgren_> and it impacted the entire internal network
18:55 <SpamapS> ok, so it looks like rally has no benchmark anywhere for metadata
18:55 <SpamapS> I sense an opportunity. ;)
18:57 <SpamapS> klindgren_: when you say you run 'nova-metadata' on all computes, do you mean you just run nova-api on them (and redirect packets for link-local to it) ?
18:57 <SpamapS> oh nova-api-metadata is what you meant
18:58 <klindgren_> yea - sorry nova-api-metadata on all compute nodes
18:58 <klindgren_> with 169.254.169.254 bound to loopback
18:58 <SpamapS> np :)
18:58 <klindgren_> and the iptables rule to redirect traffic from that to the local metadata host
18:59 <klindgren_> we used to run metadata centralized on a few servers per flat network
18:59 <klindgren_> but that was a physical resource waste
19:00 <SpamapS> indeed
19:00 <klindgren_> we actually used to run nova-api-metadata, neutron-dhcp and glance on dedicated servers per flat, but moved to running some centralized glance servers, with metadata/dhcp getting moved onto the computes
19:01 <klindgren_> dhcp only runs on a few hosts, since neutron tips over if you run it on every host
19:01 <harlowja_at_home> SpamapS, when are u (ibm) building out that megacloud of yours??
19:02 <harlowja_at_home> in progress?
19:03 <SpamapS> harlowja_at_home: always
19:03 <harlowja_at_home> whats the node count so far ;)
19:04 <harlowja_at_home> do share, haha
22:26 <SpamapS> klindgren: ok, with fake driver you can easily reproduce conductor-slapping with just showing/listing instances
22:26 <SpamapS> I only have 2 cores in my VM, and 2 nova-conductors, and they're eating up all the CPU that nova-api and nova-compute don't...
22:27 <klindgren> Side note - neutron metadata calls *really* hammer the neutron api as well
22:27 <SpamapS> I'll try the stupid thing, and just see if I can get oslo.serialization to use one of the faster json things
22:27 <klindgren> seems like every request for a metadata value grabs information on the fixed port from neutron
22:28 <klindgren> IE if I query for hostname - a call is still made to neutron for the port of the VM
22:28 <klindgren> or Availability zone
22:29 <SpamapS> klindgren: thats probably just making it worse. I'm running with nova-net (trying to isolate from neutron issues)
22:30 <klindgren> at least as I have read the metadata code
22:30 * klindgren is not a python dev
22:30 <SpamapS> also, I'm doing a rally test where I create 10000 instances then list after each one..
22:30 <SpamapS> so it's getting worse, and worse, and worse
22:32 <SpamapS> oh actually no it isn't, it's doing create, list, delete
22:34 <SpamapS> so I should do create(10000), and then test showing all of them
22:46 <SpamapS> oh doh, it was creating them and listing longer and longer lists
22:46 * SpamapS forgot --all-tenants
22:47 <SpamapS> | auth_url http://192.168.122.60:5000/v2.0 | 0.00113987922668 |
22:47 <SpamapS> | GET /servers/detail?all_tenants=1        | 5.68457818031    |
22:53 <notmorgan> ooooooh i see a SpamapS and a harlowja_
22:53 <SpamapS> notmorgan: o/
22:53 <notmorgan> SpamapS: seriously?! 5.6?
22:53 <notmorgan> ugh.
22:53 <SpamapS> notmorgan: thats with 347 servers
22:53 <notmorgan> that seems.. annoying
22:54 <SpamapS> with 45 it was 2.5s
22:54 <notmorgan> still, we *should* do better than that
22:54 <SpamapS> loads better
22:54 <SpamapS> I'm profiling without changing anything now
22:54 <SpamapS> just profiling conductor
22:54 <SpamapS> though api could probably use a profile too
22:54 <notmorgan> oh. ok. so it sucks but the suck is mostly upfront
22:54 <notmorgan> if 45 is 2.5 and approx 350 is 5.6, it's a lot of upfront suck
22:54 <SpamapS> well conductor is effectively a proxy
22:54 <notmorgan> yeah
22:54 <SpamapS> a REALLY heavy proxy
22:54 <notmorgan> i need to spend some time digging back into conductor.
22:55 <SpamapS> notmorgan: 10 servers was 1.5
22:55 <SpamapS> notmorgan: so I think there's some log(N) scaling problems too
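[editor's note: the three data points quoted above (10 servers / 1.5s, 45 / 2.5s, 347 / 5.6s) can be checked quickly; growth is clearly sublinear, though with only three samples it's rough:]

```python
import math

# Listing-latency samples quoted above: (server count, seconds).
samples = [(10, 1.5), (45, 2.5), (347, 5.6)]

# Going from 10 to 347 servers is a ~35x increase in N...
n_ratio = samples[-1][0] / samples[0][0]
# ...but only a ~3.7x increase in latency: sublinear in N,
# consistent with a big fixed per-request overhead plus a
# smaller per-server cost.
t_ratio = samples[-1][1] / samples[0][1]
print("N grew %.1fx, latency grew %.1fx" % (n_ratio, t_ratio))

# If latency were exactly a + b*log(N), the slope in log-space
# would be constant between consecutive pairs. It isn't, quite:
slopes = [
    (t2 - t1) / (math.log(n2) - math.log(n1))
    for (n1, t1), (n2, t2) in zip(samples, samples[1:])
]
print(slopes)  # increasing slope: worse than purely logarithmic
```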
22:55 <SpamapS> good job keystone responding crazy fast. ;)
22:56 <notmorgan> SpamapS: yay Keystone isn't the suck point
22:56 <notmorgan> SpamapS: in this case...
22:57 <notmorgan> SpamapS: though tomorrow, i'm sure it will be
22:57 <notmorgan> for another thing
22:57 <SpamapS> wow, this is interesting
22:57 <SpamapS> boot and show seems to be _MORE_ painful than boot and list
22:57 <notmorgan> wait, what?
22:57 <notmorgan> how... how is ... how is that a thing?
22:57 <SpamapS> notmorgan: likely show shows more
22:57 <SpamapS> moar
22:58 <SpamapS> and more json, more packets, more messages...
22:58 <notmorgan> oh, wait it's a really nasssssty set of joins
22:58 <notmorgan> too
22:58 <notmorgan> not in SQL, but effectively
22:58 <SpamapS> yeah, so even though they're single key reads
22:58 <notmorgan> yah. icky
22:58 <klindgren> just saying nova list --all-tenants on your cloud takes over a minute
22:58 <klindgren> s/your/our
22:59 <SpamapS> klindgren: yeah not surprised. ;)
22:59 <SpamapS> and to be clear, thats probably not a great idea anyway
22:59 <klindgren> so we got pretty good at either directing people to give us the uuid of the vm - to troubleshoot issues
22:59 <klindgren> or
22:59 <klindgren> list --all-tenants --<other modifier>
23:00 <klindgren> like name or ip
23:00 <SpamapS> name lookups are pretty fast
23:00 <notmorgan> klindgren: that doesn't really surprise me
23:00 <notmorgan> klindgren: that is a massive set of cross record lookups
23:01 <notmorgan> SpamapS: ++ on it not being a great idea
23:01 <klindgren> not sure how rackspace would ever be able to do a nova list --all-tenants
23:01 <notmorgan> klindgren: they don't.
23:01 <klindgren> without waiting a few hours
23:01 <notmorgan> klindgren: remember they are heavily cell based too
23:01 <SpamapS> Thats one of those things where everything should timeout at 5s no matter what
23:01 <SpamapS> "you are doing something stupid"
23:01 <SpamapS> "or something is broken"
23:02 <notmorgan> SpamapS: "E_STUPID_QUESTION_TO_ASK_AN_API"
23:02 <klindgren> SpamapS, except when you need to find orphaned resources
23:02 <SpamapS> hm.. cProfile didn't write me a report
23:02 <klindgren> because deletion is what - yolo
23:03 <SpamapS> klindgren: yeah, I'm not saying we _can_ do it
23:03 <SpamapS> just that we should :)
23:03 <klindgren> IE give me a list of all vm's and tenants and compare the list of tenants to keystone and show me which ones are no longer around
23:03 <klindgren> true
23:03 <notmorgan> klindgren: so, in the case of orphans, honestly, this is a case where direct DB access is better [today]
23:03 <SpamapS> Also thats the kind of thing that works well against a readonly slave.
23:03 <notmorgan> klindgren: and i'm really ok with side-band management of things like that
23:03 <notmorgan> SpamapS: ++
23:04 <SpamapS> so you have the admin-helper-api instance that only has access to RO slaves.
23:04 <notmorgan> SpamapS: that could be useful too
23:04 <notmorgan> i'm also ok with the API being for acting on available resources, orphans are not an end-user concern in most cases.
23:05 <notmorgan> just from a pure philosophical standpoint
23:05 <klindgren> yep yep re: stand-alone admin helper api
23:05 <klindgren> also lets you do upgrades easier, IE you can test before allowing people back in
23:06 <SpamapS> interesting
23:06 <SpamapS> I restarted my conductors and got this on a few instances
23:06 <SpamapS> | fault                                | {"message": "Timed out waiting for a reply to message ID 97c0189ea28f49899c885274e1661e6b", "code": 500, "details": "  File \"/opt/stack/nova/nova/compute/manager.py\", line 366, in decorated_function |
23:06 <notmorgan> SpamapS: thats an interesting error
23:06 <klindgren> we see that all the time when we restart services
23:07 <SpamapS> perhaps that timeout is a bit too low?
23:07 <klindgren> its because conductor either isn't listening on its channel yet, or the compute nodes hasn't created the channel yet
23:11 <notmorgan> SpamapS: don't think it's actually a timeout. it just acts as though it was.
23:12 <SpamapS> ossum
23:15 <SpamapS> hrm.. so far can't get the profiler to display its results
23:33 <SpamapS> aha
23:33 <SpamapS> adding cProfile to guru meditation works
23:33 <SpamapS> I may have to figure out a way to make that permanent... as it is QUITE handy to be able to turn profiling on and off
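[editor's note: guru meditation reports in OpenStack services are triggered by a signal; the toggle-profiling-on-and-off idea above can be sketched the same way with stdlib cProfile (a hypothetical hook, not the actual nova/oslo code):]

```python
import cProfile
import io
import pstats
import signal

# Hypothetical sketch: toggle a cProfile session from a signal handler
# so profiling can be switched on and off in a long-running service,
# dumping a report when switched off. Not the actual guru meditation hook.
_profiler = None

def toggle_profiler(signum=None, frame=None):
    """Start profiling if stopped; stop and return a text report if running."""
    global _profiler
    if _profiler is None:
        _profiler = cProfile.Profile()
        _profiler.enable()
        return None
    _profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(_profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)  # top 10 by cumulative time
    _profiler = None
    return out.getvalue()

# In a real service this would be registered once at startup, e.g.:
# signal.signal(signal.SIGUSR2, toggle_profiler)

if __name__ == "__main__":
    toggle_profiler()                  # on
    sum(i * i for i in range(100000))  # some work to profile
    report = toggle_profiler()         # off; returns the report text
    print("function calls" in report)
```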
23:37 <SpamapS> seems to spend a lot of time in str.join
23:40 <SpamapS> http://paste.openstack.org/show/478854/
23:41 <SpamapS> hm
23:41 <SpamapS> I think I'm only profiling the parent
23:41 * SpamapS tries the workers
23:45 <SpamapS> http://paste.openstack.org/show/478855/
23:45 <SpamapS> there's a worker
23:47 <SpamapS> not super helpful :-P
23:53 <notmorgan> SpamapS: well, it is better than nothing...but yeah not super interesting

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!