Wednesday, 2017-05-24

*** dave-mccowan has quit IRC00:17
*** zhangguoqing has joined #openstack-telemetry00:41
*** dave-mccowan has joined #openstack-telemetry00:51
*** tovin07_ has joined #openstack-telemetry00:54
*** dave-mccowan has quit IRC00:55
*** lhx__ has joined #openstack-telemetry01:47
*** zhurong has joined #openstack-telemetry02:11
*** thorst_afk has joined #openstack-telemetry02:27
*** thorst_afk has quit IRC02:35
*** thorst_afk has joined #openstack-telemetry02:37
*** masber has joined #openstack-telemetry02:45
*** thorst_afk has quit IRC02:52
*** lhx__ has quit IRC03:36
*** lhx__ has joined #openstack-telemetry03:37
*** boris-42_ has quit IRC03:39
*** lhx__ has quit IRC04:02
*** lhx__ has joined #openstack-telemetry04:03
*** ChanServ changes topic to "#openstack-telemetry is OpenStack Telemetry | http://wiki.openstack.org/Telemetry"04:09
-openstackstatus- NOTICE: Sufficient free space has been reclaimed that jobs are passing again; any POST_FAILURE results can now be rechecked.04:09
*** zhurong has quit IRC04:11
*** zhurong has joined #openstack-telemetry04:15
*** Gautam has joined #openstack-telemetry04:47
*** links has joined #openstack-telemetry05:20
*** yprokule has joined #openstack-telemetry05:34
*** lhx__ has quit IRC05:43
*** lhx__ has joined #openstack-telemetry05:43
*** rcernin has quit IRC06:00
*** Gautam_ has joined #openstack-telemetry06:01
*** Gautam has quit IRC06:04
*** rcernin has joined #openstack-telemetry06:17
*** hoonetorg has quit IRC06:25
*** hoonetorg has joined #openstack-telemetry06:37
*** Gautam has joined #openstack-telemetry06:47
*** Gautam_ has quit IRC06:49
*** lhx__ has quit IRC06:54
*** lhx__ has joined #openstack-telemetry06:54
*** toddnni has quit IRC07:10
*** jroll has quit IRC07:12
*** shardy has joined #openstack-telemetry07:14
*** pcaruana has joined #openstack-telemetry07:16
*** zhangguoqing has quit IRC07:32
*** Gautam has quit IRC07:36
*** Gautam has joined #openstack-telemetry07:37
*** zhangguoqing has joined #openstack-telemetry08:19
*** daidv has joined #openstack-telemetry09:02
*** nhanvu has joined #openstack-telemetry09:40
*** ricolin has joined #openstack-telemetry09:40
*** nhanvu has left #openstack-telemetry09:42
ricolinHi guys I got a combination alarm question09:47
ricolinwill it still work if I use the alarm_id (which I got when creating the combination alarm) and delete it through ceilometer.alarms()?09:48
ricolinceilometer.alarms.delete09:50
ricolinjd_, ^^^09:51
ricolinThis is about what we can do with the combination alarm resource in heat09:52
ricolinso some help will be nice:)09:52
ricolinhttps://review.openstack.org/#/c/439433/1009:53
*** links has quit IRC09:55
*** adriant has quit IRC09:58
*** tovin07_ has quit IRC10:03
*** zhurong has quit IRC10:04
*** links has joined #openstack-telemetry10:08
*** jroll has joined #openstack-telemetry10:25
*** daidv has quit IRC10:35
*** aks__ has joined #openstack-telemetry10:36
aks__hi all10:36
aks__do we have cassandra driver in gnocchi ?10:37
*** cdent has joined #openstack-telemetry10:44
jd_aks__: no10:59
aks__ok11:01
*** Gautam has quit IRC11:06
*** aks__ has quit IRC11:52
*** zhurong has joined #openstack-telemetry12:35
*** zhurong has quit IRC12:36
*** dave-mccowan has joined #openstack-telemetry12:41
openstackgerritMerged openstack/python-pankoclient master: use extras  https://review.openstack.org/46514112:52
*** fguillot has joined #openstack-telemetry12:54
*** efoley has joined #openstack-telemetry13:00
*** lhx__ has quit IRC13:10
*** pradk has quit IRC13:11
*** thorst_afk has joined #openstack-telemetry13:12
*** donghao has joined #openstack-telemetry13:23
Anticimexmmm, postgres planner analysis of ceilometer-newton with postgres db dispatcher backend13:29
Anticimexwith minor tweaks 10x-100x improvement in query speed, and i suppose with a better handcrafted schema 1000x-10000x doable13:30
* Anticimex planning to go improved psql route rather than gnocchi, seems wrong solution to original problem statement 13:30
silehtlol13:32
silehtstoring all metadata for all samples doesn't make any sense...13:32
*** ddyer has quit IRC13:53
*** ddyer has joined #openstack-telemetry13:57
*** zhangguoqing has quit IRC14:06
*** lhx__ has joined #openstack-telemetry14:16
*** pradk has joined #openstack-telemetry14:23
*** vint_bra has joined #openstack-telemetry14:23
*** thorst_afk has quit IRC14:27
*** gordc has joined #openstack-telemetry14:32
*** vint_bra has quit IRC14:35
Anticimexsileht: nod, a better schema can improve a lot14:37
*** chlong has joined #openstack-telemetry14:38
Anticimexa metrics query cross joins sample against sample... which at least the 9.6 planner converts into two sequential scans14:38
Anticimexrelevant indexes exist, but the planner can be guided a bit.  "people on the internet" facing similar challenges have been able to improve by O(1000) - O(10000) through query tweaking14:39
gordcAnticimex: the better schema was gnocchi ;)14:39
Anticimexi saw a ceilometer->gnocchi performance tweaking presentation in BOS and would have to disagree i think :)14:40
gordci think i opened a bug to improve sql backend a few years back. my sql skills weren't enough to overcome that metadata hurdle.14:40
Anticimex(politely)14:40
gordcif you can modify sql and still maintain same api agreements then all the power to you.14:41
Anticimexwell, ceilometer-api is deprecated14:41
gordcyep :) would've been good two years ago14:41
Anticimexso there's officially no api to maintain compatibility with, but our rating engine could gain another O(10k+) from simply doing 1-3 psycopg2 queries rather than a zillion http queries14:42
Anticimexso i'm thinking a new psycopg2 dispatcher against improved schema14:42
Anticimexa dispatcher with performance tweaks on insert as well of course14:43
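[Editor's sketch] The idea above of replacing many per-resource lookups with one or a few SQL queries can be illustrated roughly as follows. This is only a sketch under stated assumptions: it uses the standard-library sqlite3 module as a stand-in for PostgreSQL and psycopg2, and the `sample` table, its columns, and the function names are hypothetical rather than ceilometer's actual schema or dispatcher code.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical, simplified sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sample (resource_id TEXT, meter TEXT, volume REAL)"
    )
    rows = [
        ("vm-1", "cpu_util", 10.0),
        ("vm-1", "cpu_util", 30.0),
        ("vm-2", "cpu_util", 50.0),
        ("vm-2", "cpu_util", 70.0),
    ]
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", rows)
    return conn


def averages_query_per_resource(conn, meter):
    """One round trip per resource: the pattern a chatty client ends up with."""
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT resource_id FROM sample WHERE meter = ?", (meter,)
        )
    ]
    result = {}
    for resource_id in ids:
        (avg,) = conn.execute(
            "SELECT AVG(volume) FROM sample WHERE meter = ? AND resource_id = ?",
            (meter, resource_id),
        ).fetchone()
        result[resource_id] = avg
    return result


def averages_single_query(conn, meter):
    """One round trip in total: let the database group and aggregate."""
    cursor = conn.execute(
        "SELECT resource_id, AVG(volume) FROM sample "
        "WHERE meter = ? GROUP BY resource_id",
        (meter,),
    )
    return dict(cursor.fetchall())
```

Both functions return the same mapping; the second issues a single statement regardless of how many resources exist. With psycopg2 the placeholders would be `%s` instead of `?`.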
gordcsure. i thought you were originally trying to build something to match ceilometer api but was performant.14:43
Anticimexright, that's where i started looking today14:43
gordcno argument if you change api, you can make sql work14:43
Anticimexat the queries14:43
Anticimexand there's room to improve on sqla/pg-default here too14:44
Anticimexour ceilometer-pg vm has 4 cpu cores and 32G RAM with ~10GB data for the month, and a monthly rating run takes on the order of 10 hours, with full cpu usage on the postgres node14:45
Anticimexand that's for a 16 compute instance deployment which is like 25% utilized14:46
gordcwhat's ceilometer-pg? your own custom schema?14:46
Anticimexoh sorry, the postgres database node our ceilometer dispatches to14:47
gordcor existing api+postgres14:47
Anticimexthat14:47
Anticimexon newton14:47
gordcoh. yeah. i'm not surprised... jd_ benched it a few years back https://julien.danjou.info/blog/2015/gnocchi-benchmarks14:48
gordcthat's roughly when we stopped working on ceilometer storage.14:48
Anticimexyeah, and i see a first 10x blowup due to pg losing a bitmap scan to a sequential scan14:48
Anticimexon a cross join (sample x sample)14:49
*** vint_bra has joined #openstack-telemetry14:49
*** vint_bra has quit IRC14:50
gordcyep... 2+ years ago we realised that :P14:50
Anticimexusing resource queries to get summary data out, GET /v2/resources/2ce4315b-6b8a-4c6d-bd61-7bd9cf002ebd , so it's the resulting pg select that blows up on those14:50
Anticimexright14:51
Anticimexand a little bit of psql tweaking and it can be reduced 10x :)14:51
Anticimexi'll get back to chan later today with a quick example of what i mean14:51
gordccool. would be good to learn (although this won't merge because we've deprecated... and openstack is pushing a mysql agenda)14:52
Anticimexyeah, will happily keep it outside openstack if i go forward with it. only have to maintain compatibility with the ceilometer dispatcher interface14:53
Anticimexseems to me mostly to be monty who's pushing a mysql agenda :-)14:53
gordcthe pgsql clique is not strong in openstack :P14:54
Anticimexclearly14:54
gordcif you do your pgsql wizardry and analyze gnocchi, that'd be great too :)14:54
Anticimexsomeone else already did gnocchi analysis, don't need to redo his work14:55
gordcyou have a link?14:55
Anticimexyeah trying to find, forgot its name, 1 sec14:56
gordccool cool, no rush.14:58
Anticimexhttps://www.youtube.com/watch?v=aHaGipVcIJ415:01
Anticimexwas a good presentation, job well done by the guy IMO15:03
gordcoh. that's jd_... and akrzos... agreed. really helping us with how to improve things.15:05
Anticimexoh are they around in channel?15:06
akrzoshere15:06
akrzosthank Anticimex15:06
akrzosthanks*15:06
Anticimexideally if i try to hack together a psycopg2 dispatcher as per my thoughts above for ourselves, i could try to replicate the benchmarks against that15:06
Anticimexand would save lots of time if the triple-o recipes were shared i think :)15:06
*** vint_bra has joined #openstack-telemetry15:07
*** rcernin has quit IRC15:18
*** shardy has quit IRC15:23
*** Tamayo has joined #openstack-telemetry15:28
*** sbadia has left #openstack-telemetry15:30
jd_Anticimex: o/15:40
*** r-daneel has joined #openstack-telemetry16:06
*** thorst has joined #openstack-telemetry16:09
*** donghao has quit IRC16:10
*** yprokule has quit IRC16:17
*** thorst has quit IRC16:23
*** pcaruana has quit IRC16:24
*** thorst has joined #openstack-telemetry16:26
*** efoley_ has joined #openstack-telemetry16:33
*** efoley has quit IRC16:37
*** efoley__ has joined #openstack-telemetry16:52
*** efoley_ has quit IRC16:55
openstackgerritgordon chung proposed openstack/python-pankoclient master: switch to use non-legacy SessionClient  https://review.openstack.org/46515717:05
openstackgerritgordon chung proposed openstack/python-pankoclient master: move shell out of osc  https://review.openstack.org/46572517:05
openstackgerritgordon chung proposed openstack/python-pankoclient master: add panko shell  https://review.openstack.org/46573617:05
*** nicodemus_ has joined #openstack-telemetry17:08
nicodemus_hello17:08
*** thorst has quit IRC17:09
nicodemus_I'm observing an unusual behavior using gnocchi 3.1.117:10
nicodemus_when using CEPH backend, metricd starts processing quite fast but after a while each new computed metric takes longer17:10
*** donghao has joined #openstack-telemetry17:11
nicodemus_http://paste.openstack.org/show/610585/ this paste shows processing time before and after the restart17:11
*** ricolin has quit IRC17:12
gordcnicodemus_: play with your filestore settings. that's what helped in my environment.17:14
gordcdisclaimer: i'm not a ceph expert... (not a ceph anything really)17:15
nicodemus_gordc, you mean CEPH settings?17:15
*** donghao has quit IRC17:15
*** links has quit IRC17:16
gordcnicodemus_: yep17:17
gordcnicodemus_: https://www.slideshare.net/GordonChung/gnocchi-profiling-v2/#17 those are configurations i tried... the bottom right is what i settled on17:18
nicodemus_gordc, Interesting... I'll give it a try and see what happens17:20
nicodemus_thanks!17:20
gordcnicodemus_: kk, i don't know if there's public data from real ceph operators... that was just me googling and testing random values.17:21
*** efoley__ has quit IRC17:22
*** thorst has joined #openstack-telemetry17:29
*** toddnni has joined #openstack-telemetry17:33
*** thorst has quit IRC17:33
*** thorst has joined #openstack-telemetry17:33
*** lhx__ has quit IRC17:59
*** thorst has quit IRC18:00
*** thorst has joined #openstack-telemetry18:19
openstackgerritgordon chung proposed openstack/python-pankoclient master: switch to use non-legacy SessionClient  https://review.openstack.org/46515718:40
openstackgerritgordon chung proposed openstack/python-pankoclient master: add panko shell  https://review.openstack.org/46573618:40
openstackgerritgordon chung proposed openstack/python-pankoclient master: move shell out of osc  https://review.openstack.org/46572518:40
AnticimexHere's a .sql file write-up of how to improve ceilometer postgresql backend GET /v2/resources/<resource_id> by 31000x: https://gist.github.com/Millnert/53f471bdd7b173d09e15a60882082a7818:42
Anticimexpg <318:43
Anticimexit becomes flat in time essentially, vs scaling with number of samples put in18:46
Anticimexnow.. to find where to put in this new query code in ceilometer-api backend driver thingaling18:47
gordccool, nice analysis.18:54
*** thorst has quit IRC19:02
*** donghao has joined #openstack-telemetry19:13
*** donghao has quit IRC19:17
nicodemus_gordc, let me ask you one question... what exactly does the metric_processing_delay parameter do? Is it an interval for each metricd worker to wake up and see if there are any measures to process?19:19
nicodemus_The log is complaining about 'Metric processing lagging scheduling rate', but if I increase the number of workers, the load on the ceph cluster grows to a point in which the performance is even worse19:20
nicodemus_but then again, if I configure each worker to wake up every 60 seconds, the incoming measures will outpace metric processing19:22
gordcnicodemus_: yeah the scheduling is kind of sketchy in gnocchi3.19:22
gordcbasically that option tells metricd: every x seconds, grab a bunch of metrics for the workers to work on.19:24
gordcthe scheduling is not smart enough to hold back if there is still work to be done by the next scheduling cycle19:25
gordcfor the most part, you can ignore that warning (especially for ceph since it actually has the worst scheduling logic).19:26
nicodemus_so, a high value would do more harm than good19:26
*** thorst has joined #openstack-telemetry19:26
gordchigh value for metric_processing_delay?19:27
nicodemus_that is, if metricd is falling behind the amount of measures incoming19:27
nicodemus_yes, suppose I configure it for 60 seconds, I may end up with N workers idle19:27
gordcyes, for ceph, if the measurements to metric ratio is high, the scheduling suffers in v319:27
nicodemus_until the next time the scheduler grabs metrics19:28
nicodemus_I see19:28
nicodemus_Is there a silver lining in gnocchi 4?19:28
gordcyeah :) gnocchi4 has each worker figure out its own tasks rather than have a central scheduler trying to guess when to schedule more tasks for workers19:29
gordcif you want a preview: https://www.slideshare.net/GordonChung/gnocchi-v4-preview19:30
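[Editor's sketch] The contrast described above, workers working out their own tasks instead of waiting for a central scheduler, can be pictured with a toy partitioning scheme. This is an illustration only and makes no claim about how gnocchi 4 actually implements it: the bucket count, the CRC-based hashing, and both function names are assumptions made for the example.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical number of work buckets


def bucket_for_metric(metric_id, num_buckets=NUM_BUCKETS):
    """Map a metric id to a bucket with a stable hash (CRC32 chosen for the demo)."""
    return zlib.crc32(metric_id.encode("utf-8")) % num_buckets


def buckets_for_worker(worker_id, num_workers, num_buckets=NUM_BUCKETS):
    """Each worker derives its own share of buckets; no scheduler hands out work."""
    return [b for b in range(num_buckets) if b % num_workers == worker_id]
```

With 3 workers, `buckets_for_worker(0, 3)` yields buckets 0, 3 and 6, and the three workers together cover every bucket exactly once, so no worker sits idle waiting for the next scheduling cycle as long as its buckets hold work.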
*** thorst has quit IRC19:31
nicodemus_gordc, nice!19:31
nicodemus_another reason to consider migrating to gnocchi 4 :)19:32
gordcalthough i think you need to look at your ceph driver. i noticed how all your processing times are all >1s... that should be <10ms  i think unless your archive policy is verbose.19:33
gordcor i guess your ceph storage is not local.19:33
gordcnicodemus_: if you want to hack gnocchi3, you can change: https://github.com/gnocchixyz/gnocchi/blob/stable/3.1/gnocchi/cli.py#L154 to something much larger.19:33
nicodemus_it's all on AWS, it's not the same performance as when I had my dedicated & shiny CEPH cluster19:34
gordcah i see. makes sense.19:34
nicodemus_I'll try to hack cli.py and see what happens19:36
nicodemus_thanks a lot!19:36
gordcnp19:38
nicodemus_gordc, just one more thing... is there an ETA for gnocchi v4?19:40
nicodemus_(I'm anxious now :P)19:40
gordcnicodemus_: ask jd_  :)19:45
gordci think the features are in... just testing and fixes now i guess19:45
gordcsome time in june seems safe to say19:46
nicodemus_great! can't wait19:46
gordcyou can test it at github.com/gnocchixyz/gnocchi. was going to send a note to ML asking people to try it19:48
jd_Anticimex: is this 31s for the query?19:48
*** openstackgerrit has quit IRC19:48
Anticimex31000 ms -> 1ms yes19:48
Anticimexbut more correct probably to say it's from O(n)  (n = #samples) to O(0.00001*n) or O(1), depending on the btrees etc19:49
gordc... some reason i read that as 31min19:49
*** aagate has joined #openstack-telemetry19:49
jd_nicodemus_: good question, I'm on PTO this week (I know, you can't tell) but i'll need to chat with gordc and sileht if we're ready or not; seems to me we are now, I hope they fix all the bugs by the time I come back19:50
jd_Anticimex: ok, I thought 31s was your optimization result :) good then!19:50
gordcjd_: i hope they do too.19:50
Anticimexjd_: aha, no.. :)19:50
jd_gordc: /nick they19:50
gordcyep. they is a very good worker.19:51
*** openstackgerrit has joined #openstack-telemetry20:03
openstackgerritgordon chung proposed openstack/python-pankoclient master: clean up utils  https://review.openstack.org/46774520:03
Anticimexjd_: more accurately, O(n) -> O(log n)  (had to refresh): http://bigocheatsheet.com/20:16
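[Editor's sketch] The O(n) versus O(log n) point can be made concrete by counting comparison steps for a full scan against a binary search over sorted keys, which is loosely what a sequential scan versus a btree index lookup amounts to. This is a simplified analogy, not a model of the PostgreSQL planner; the function names are made up for the example.

```python
def linear_lookup(keys, target):
    """Scan every key in order; the step count grows linearly with len(keys)."""
    steps = 0
    for index, key in enumerate(keys):
        steps += 1
        if key == target:
            return index, steps
    return -1, steps


def binary_lookup(sorted_keys, target):
    """Halve the search range on each step; steps grow with log2(len(keys))."""
    low, high, steps = 0, len(sorted_keys), 0
    while low < high:
        steps += 1
        mid = (low + high) // 2
        if sorted_keys[mid] < target:
            low = mid + 1
        else:
            high = mid
    if low < len(sorted_keys) and sorted_keys[low] == target:
        return low, steps
    return -1, steps
```

For one million sorted keys, looking up the last key takes one million steps with `linear_lookup` and at most 20 with `binary_lookup`, which is why the indexed query stays nearly flat as samples accumulate.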
openstackgerritgordon chung proposed openstack/python-pankoclient master: move shell out of osc  https://review.openstack.org/46572520:24
*** fguillot has quit IRC21:11
*** donghao has joined #openstack-telemetry21:15
*** dave-mccowan has quit IRC21:20
*** donghao has quit IRC21:20
*** pradk has quit IRC21:29
*** pradk has joined #openstack-telemetry21:31
*** thorst has joined #openstack-telemetry21:47
*** rwsu has quit IRC22:15
*** vint_bra has quit IRC22:18
*** cdent has quit IRC22:18
*** adriant has joined #openstack-telemetry22:20
*** thorst has quit IRC22:23
openstackgerritgordon chung proposed openstack/panko master: support uwsgi  https://review.openstack.org/46779622:47
openstackgerritMerged openstack/ceilometer stable/newton: Add support of refereshing the resource info in local cache  https://review.openstack.org/46701222:48
*** gordc has quit IRC22:52
*** pradk has quit IRC22:57
*** nicodemus_ has quit IRC23:01
