Monday, 2016-12-19

*** thorst has quit IRC00:06
*** Jack_Iv has joined #openstack-telemetry00:35
*** Jack_Iv has quit IRC00:40
*** liusheng has quit IRC00:51
*** thorst has joined #openstack-telemetry01:03
*** thorst has quit IRC01:11
*** liusheng has joined #openstack-telemetry01:15
*** Jack_Iv has joined #openstack-telemetry01:29
*** Jack_Iv has quit IRC01:33
*** thorst has joined #openstack-telemetry01:57
*** thorst has quit IRC01:58
*** lhx_ has joined #openstack-telemetry02:04
*** gongysh has joined #openstack-telemetry02:07
*** zhangguoqing has joined #openstack-telemetry02:10
*** Jack_Iv has joined #openstack-telemetry02:20
*** Jack_Iv has quit IRC02:24
*** catintheroof has quit IRC02:28
*** zhangguoqing has quit IRC02:29
*** catintheroof has joined #openstack-telemetry02:30
*** catintheroof has quit IRC02:34
*** nokes has joined #openstack-telemetry03:01
*** noshankus has quit IRC03:02
*** nokes is now known as noshankus03:02
<llu> seems we're hitting the vine.five import error again with new kombu 4.0.2 version. pls see https://bugs.launchpad.net/oslo.messaging/+bug/1638263/comments/15  03:03
<openstack> Launchpad bug 1638263 in OpenStack Global Requirements "Unit tests failing on vine.five import" [Medium,Confirmed]  03:03
*** thorst has joined #openstack-telemetry03:03
*** thorst has quit IRC03:12
*** frickler_ has joined #openstack-telemetry03:20
*** frickler has quit IRC03:21
*** Jack_Iv has joined #openstack-telemetry03:28
*** Jack_Iv has quit IRC03:32
*** r-mibu has quit IRC04:02
*** thorst has joined #openstack-telemetry04:09
*** r-mibu has joined #openstack-telemetry04:12
*** thorst has quit IRC04:18
*** donghao has quit IRC04:40
*** donghao has joined #openstack-telemetry04:41
*** adriant has quit IRC04:43
*** lhx_ has quit IRC05:08
*** thorst has joined #openstack-telemetry05:15
*** thorst has quit IRC05:22
*** hfu has joined #openstack-telemetry05:24
*** Jack_Iv has joined #openstack-telemetry05:46
*** Jack_Iv has quit IRC05:50
*** lhx_ has joined #openstack-telemetry05:52
*** Jack_Iv has joined #openstack-telemetry05:53
*** nadya has joined #openstack-telemetry06:01
*** thorst has joined #openstack-telemetry06:19
*** nadya has quit IRC06:20
*** thorst has quit IRC06:28
*** Jack_Iv has quit IRC06:31
*** Jack_Iv has joined #openstack-telemetry06:32
*** Jack_Iv has quit IRC06:32
*** Jack_Iv has joined #openstack-telemetry06:33
<openstackgerrit> Hanxi Liu proposed openstack/ceilometer: Change YAML file directory  https://review.openstack.org/412309  06:47
*** Jack_Iv has quit IRC07:03
*** Jack_Iv has joined #openstack-telemetry07:04
*** tesseract has joined #openstack-telemetry07:04
*** tesseract is now known as Guest3325407:05
*** Jack_Iv has quit IRC07:07
*** nadya has joined #openstack-telemetry07:22
*** thorst has joined #openstack-telemetry07:25
<lhx_> sileht, if you aren't going to restore and change this, I would propose another one to copy the method and apply it to the direct publisher  07:26
<lhx_> https://review.openstack.org/#/c/397930/2  07:26
*** thorst has quit IRC07:33
*** pcaruana has joined #openstack-telemetry07:33
<openstackgerrit> Hanxi Liu proposed openstack/ceilometer: Fix publisher comment  https://review.openstack.org/412336  07:48
*** shardy has joined #openstack-telemetry08:02
*** gongysh has quit IRC08:10
*** yprokule has joined #openstack-telemetry08:14
*** Jack_Iv_ has joined #openstack-telemetry08:27
*** thorst has joined #openstack-telemetry08:29
*** thorst has quit IRC08:38
*** masber has joined #openstack-telemetry08:39
<lhx_> jd__, sileht, why do we only handle mongodb and es here?  09:16
<lhx_> https://github.com/openstack/ceilometer/blob/master/devstack/plugin.sh#L204  09:16
<jd__> lhx_: because SQL is handled by devstack itself, IIRC  09:18
<lhx_> jd__, oh, devstack defaults to SQL  09:20
<jd__> yeah, and it knows how to handle that  09:21
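
For context, a hedged sketch of how this surfaces to users: the backend is chosen in devstack's local.conf, and only the non-SQL backends need extra setup from the ceilometer plugin. The variable name below is recalled from that era's plugin and should be treated as an assumption:

    [[local|localrc]]
    enable_plugin ceilometer https://git.openstack.org/openstack/ceilometer
    # mysql is handled by devstack's own database service; mongodb/es are
    # the cases plugin.sh has to install and configure itself
    CEILOMETER_BACKEND=mongodb
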
*** yassine has joined #openstack-telemetry09:23
*** yassine is now known as Guest2420509:23
*** gongysh has joined #openstack-telemetry09:31
*** thorst has joined #openstack-telemetry09:35
*** thorst has quit IRC09:42
*** frickler_ is now known as frickler09:44
*** Adri2000 has quit IRC09:54
*** gongysh has quit IRC10:07
*** hfu has quit IRC10:15
*** lhx_ has quit IRC10:16
<openstackgerrit> Julien Danjou proposed openstack/gnocchi: Merge project and user id in a creator field  https://review.openstack.org/408740  10:20
<openstackgerrit> Julien Danjou proposed openstack/gnocchi: rest: add auth_mode to pick authentication mode  https://review.openstack.org/402068  10:20
<openstackgerrit> Julien Danjou proposed openstack/gnocchi: rest: introduce auth_helper to filter resources  https://review.openstack.org/402069  10:20
<openstackgerrit> Julien Danjou proposed openstack/gnocchi: Introduce "basic" authentication mechanism  https://review.openstack.org/412387  10:20
*** lhx_ has joined #openstack-telemetry10:25
*** Jack_Iv_ has quit IRC10:39
*** thorst has joined #openstack-telemetry10:40
*** thorst has quit IRC10:48
*** Jack_Iv_ has joined #openstack-telemetry10:49
*** Jack_Iv_ has quit IRC10:53
*** amoralej is now known as amoralej|brb11:05
*** cdent has joined #openstack-telemetry11:11
*** dave-mccowan has joined #openstack-telemetry11:26
*** jefrite has joined #openstack-telemetry11:32
*** Jack_Iv_ has joined #openstack-telemetry11:33
*** lhx_ has quit IRC11:41
*** thorst has joined #openstack-telemetry11:45
*** dave-mcc_ has joined #openstack-telemetry11:45
*** dave-mccowan has quit IRC11:48
*** Guest24205 has quit IRC11:52
*** thorst has quit IRC11:52
*** hfu has joined #openstack-telemetry12:00
*** lhx_ has joined #openstack-telemetry12:06
*** shardy is now known as shardy_lunch12:15
*** thorst has joined #openstack-telemetry12:15
*** amoralej|brb is now known as amoralej12:15
*** catintheroof has joined #openstack-telemetry12:18
*** dave-mcc_ has quit IRC12:18
<EmilienM> sileht: do you know if ceilo gate is broken? https://review.openstack.org/#/c/411393/  12:29
*** hfu has quit IRC12:31
*** lhx_ has quit IRC12:32
*** hfu has joined #openstack-telemetry12:33
*** vint_bra has joined #openstack-telemetry12:39
*** hfu has quit IRC12:40
*** lhx_ has joined #openstack-telemetry12:49
*** Jack_Iv_ has quit IRC12:50
*** gordc has joined #openstack-telemetry12:59
*** lhx_ has quit IRC12:59
*** llu has quit IRC13:01
*** yassine has joined #openstack-telemetry13:05
*** yassine is now known as Guest9516713:06
*** hfu has joined #openstack-telemetry13:08
<gordc> sileht: how come we run tempest tests in integration gate?  13:12
*** nadya has quit IRC13:13
*** Jack_Iv_ has joined #openstack-telemetry13:16
*** Jack_Iv_ has quit IRC13:16
*** lionel has quit IRC13:22
*** jwcroppe_ has quit IRC13:26
*** pradk has joined #openstack-telemetry13:31
*** leitan has joined #openstack-telemetry13:31
*** zaneb has quit IRC13:36
*** lhx_ has joined #openstack-telemetry13:46
*** jwcroppe has joined #openstack-telemetry13:51
*** hfu has quit IRC14:06
*** amoralej is now known as amoralej|lunch14:10
*** Jack_Iv_ has joined #openstack-telemetry14:13
*** chlong has joined #openstack-telemetry14:14
*** cdent has quit IRC14:16
*** shardy_lunch is now known as shardy14:16
*** yprokule has quit IRC14:21
*** pradk has quit IRC14:25
*** ryanpetrello has joined #openstack-telemetry14:26
*** fguillot has joined #openstack-telemetry14:29
<ryanpetrello> jd__ ceilometer gabbi tests seem to be failing for me again; `$ tox -e py27 -- gabbi`  14:30
<ryanpetrello> https://travis-ci.org/pecan/pecan/jobs/185146311  14:30
<ryanpetrello> ...  14:32
<ryanpetrello>       File "/home/travis/build/pecan/pecan/.tox/ceilometer-tip/src/ceilometer/.tox/gabbi/lib/python2.7/site-packages/kombu/five.py", line 6, in <module>  14:32
<ryanpetrello>         import vine.five  14:32
<ryanpetrello>     ImportError: No module named vine.five  14:32
<gordc> ryanpetrello: we need a new oslo.messaging release  14:33
<gordc> same reason as last week... the new kombu release breaks oslo.messaging and requirements doesn't let us put an upper cap on things  14:34
<ryanpetrello> gotcha  14:38
<ryanpetrello> just saw that this had been broken for some time  14:38
<ryanpetrello> and wasn't sure if I was missing something  14:38
<ryanpetrello> but it sounds like you all are already aware and working on it  14:39
<gordc> yeah, well, we fixed it, but in the amount of time it took us to update requirements, create a release, pick up the release, etc... there was another kombu release that broke us again.  14:41
<gordc> fun stuff  14:41
<gordc> if it's an internal CI, i imagine you could just cap kombu <4.0.0, as that's what it should be.  14:42
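
A minimal sketch of the cap gordc suggests for an internal CI, assuming dependencies are installed with pip and that a constraints file (the filename here is illustrative) is passed to the install step; it simply keeps kombu on the pre-4.0 series so the vine.five import path is never hit:

    # constraints-kombu.txt (illustrative name)
    kombu<4.0.0

    pip install -c constraints-kombu.txt -r requirements.txt -r test-requirements.txt
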
<ryanpetrello> yea, not a big deal  14:44
<ryanpetrello> I just gate pecan against upstream OpenStack projects that use it  14:44
<ryanpetrello> and I noticed that ceilometer has been failing for some time  14:44
<ryanpetrello> wanted to make sure I wasn't doing something wrong  14:44
<gordc> cool cool.  14:45
<jd__> gordc: this time I hope they will  14:50
<jd__> fix it once and for all  14:50
<jd__> ffs  14:50
*** cdent has joined #openstack-telemetry14:55
*** amoralej|lunch is now known as amoralej14:56
*** sudipto has joined #openstack-telemetry15:01
*** sudipto_ has joined #openstack-telemetry15:01
<sudipto_> Hi, is it possible to do h/w metric collection via Ceilometer today?  15:02
*** pradk has joined #openstack-telemetry15:04
*** Jack_Iv_ has quit IRC15:05
*** Jack_Iv_ has joined #openstack-telemetry15:06
<yarkot> have you looked at http://docs.openstack.org/admin-guide/telemetry-measurements.html  15:15
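
To make yarkot's pointer concrete: ceilometer of that era could poll hardware meters over SNMP, driven by entries in the polling/pipeline YAML. A hedged sketch, with the file name, host address, and community string purely illustrative, and the exact meter names to be checked against the measurements page linked above:

    ---
    # illustrative polling source for hardware meters
    sources:
        - name: hardware_source
          interval: 600
          meters:
              - hardware.cpu.load.1min
              - hardware.memory.used
          resources:
              - snmp://public@203.0.113.10
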
<openstackgerrit> Merged openstack/gnocchi: rest: catch create_metric duplicate  https://review.openstack.org/411792  15:16
*** Jack_Iv_ has quit IRC15:18
*** Jack_Iv_ has joined #openstack-telemetry15:19
<openstackgerrit> Hanxi Liu proposed openstack/ceilometer: Change YAML file directory  https://review.openstack.org/412309  15:32
*** Jack_Iv_ has quit IRC15:39
*** Guest33254 has quit IRC16:02
<openstackgerrit> Hanxi Liu proposed openstack/ceilometer: devstack configure when no collector  https://review.openstack.org/412508  16:05
*** pcaruana has quit IRC16:15
*** cdent has quit IRC16:26
*** Guest95167 has quit IRC16:50
*** yassine has joined #openstack-telemetry16:50
*** yassine is now known as Guest4586716:51
*** nadya has joined #openstack-telemetry16:54
*** cdent has joined #openstack-telemetry16:56
*** nadya has quit IRC17:02
*** Guest45867 has quit IRC17:08
*** rwsu has joined #openstack-telemetry17:22
*** rwsu has quit IRC17:23
*** sudipto has quit IRC17:25
*** sudipto_ has quit IRC17:25
*** sudipto_ has joined #openstack-telemetry17:25
*** sudipto has joined #openstack-telemetry17:25
*** lhx_ has quit IRC17:27
*** rwsu has joined #openstack-telemetry17:36
*** Jack_Iv__ has joined #openstack-telemetry17:44
*** nadya has joined #openstack-telemetry17:49
*** sudipto_ has quit IRC17:51
*** sudipto has quit IRC17:51
*** nadya has quit IRC18:04
*** shardy has quit IRC18:08
*** Jack_Iv__ has quit IRC18:34
*** Jack_Iv_ has joined #openstack-telemetry18:43
*** Jack_Iv_ has quit IRC18:48
*** nadya has joined #openstack-telemetry18:53
*** Jack_Iv_ has joined #openstack-telemetry19:06
*** Jack_Iv_ has quit IRC19:07
*** david-lyle_ has joined #openstack-telemetry19:13
*** openstackstatus has quit IRC19:13
*** david-lyle has quit IRC19:13
*** openstack has joined #openstack-telemetry19:15
*** Jack_Iv_ has joined #openstack-telemetry19:17
*** Jack_Iv_ has quit IRC19:21
*** nadya has quit IRC19:27
*** chlong has quit IRC19:34
*** chlong has joined #openstack-telemetry19:36
*** lionel has joined #openstack-telemetry19:42
*** rcernin has joined #openstack-telemetry19:56
<openstackgerrit> Merged openstack/gnocchi: rest: remove user_id and project_id from metric schema  https://review.openstack.org/407103  20:04
<akrzos> ok  20:11
<akrzos> I'm seeing some of these messages in my metricd.log  20:11
<akrzos> "Metric processing lagging scheduling rate. ..."  20:11
<akrzos> "increase the number of workers or to lengthen processing interval."  20:11
<akrzos> by "lengthen processing interval", does that mean decrease the "metric_processing_delay"?  20:12
<gordc> akrzos: increase it  20:13
<akrzos> but then metricd fires up less often to actually process measures  20:15
<akrzos> before the next polling interval  20:15
<gordc> right  20:15
<gordc> we haven't started work on improving that part of scheduling just yet  20:15
<akrzos> ok  20:15
<gordc> akrzos: you can also just use the 'refresh' option, which basically forces aggregation at query time  20:16
<akrzos> understood  20:16
<akrzos> but if my measures backlog just continues to grow  20:16
<akrzos> everything will need --refresh  20:16
*** amoralej is now known as amoralej|off  20:17
<gordc> right. in theory only --refresh guarantees you're working against all known data  20:17
<gordc> if you don't use it, the assumption is there may be some datapoints that are unaggregated and ignored (at time of query)  20:18
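
A quick illustration of the 'refresh' option gordc mentions, assuming the gnocchi CLI and REST API of the 3.x era (the resource id and metric name are placeholders); it forces any unprocessed measures for that metric to be aggregated before the query returns:

    gnocchi measures show --refresh --resource-id <resource-uuid> cpu_util
    # or, against the REST API (illustrative):
    # GET /v1/metric/<metric-uuid>/measures?refresh=true
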
<akrzos> so let me explain what i'm doing and maybe it will be easier to understand why i'm thinking of reducing the delay  20:18
<akrzos> i'm trying to get gnocchi to saturate my ceph backend  20:18
<akrzos> basic scale test with an openstack cloud  20:18
<akrzos> with gnocchi configured with the ceph driver  20:19
<akrzos> i have 4 ceph nodes  20:19
<akrzos> 3 controller nodes  20:19
<akrzos> 10 computes to host instances to drive a workload for telemetry  20:19
<akrzos> i booted 1k instances  20:19
<akrzos> the measure backlog was growing  20:20
<gordc> i see, reducing that delay won't necessarily help you test load.  20:20
<gordc> what that delay does is basically set the rate at which it schedules unaggregated data to be processed.  20:20
<akrzos> this is with 48 metricd processing workers per controller (144 workers total)  20:20
<akrzos> 60s delay  20:21
<gordc> you can make it as small as you want, but if you don't have enough workers processing the scheduled items it won't improve anything  20:21
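
A hedged sketch of the two knobs being discussed, as they appear in gnocchi.conf around the 3.x series (the values mirror the numbers akrzos quotes and are illustrative, not recommendations):

    [metricd]
    # number of metricd worker processes per node
    workers = 48
    # seconds between passes that schedule unprocessed measures for aggregation;
    # the section this option lives in has moved between releases ([storage] in
    # some configs, [metricd] later), so check gnocchi.conf.sample for your version
    metric_processing_delay = 60
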
<akrzos> the measures are still backing up  20:21
<akrzos> low archival policy  20:21
<akrzos> as soon as i reduced the delay to 30s  20:21
<akrzos> the entire backlog was consumed in like 40 minutes (100-88K measures)  20:22
<akrzos> and no more metrics lagging :)  20:22
<gordc> hmm. 144 workers is a lot. lol  20:22
<akrzos> ceph osds are not saturated either  20:22
<akrzos> let me pull the gnocchi version  20:22
<akrzos> i know you added a bunch of improvements  20:23
<akrzos> 3.0.2  20:23
<akrzos> cpus on the metricd machines (OpenStack controllers) are not saturated either, they still have idle % available  20:24
<gordc> yeah, that's the latest.  20:24
<gordc> so is the issue that you want to make the scheduling rate quicker?  20:24
<akrzos> so i wanted to get more instances cause well everyone loves scale and 1k across only 10 computes seems low  20:25
<gordc> basically, that warning means you're scheduling much faster than the metricd workers are processing  20:25
<akrzos> so i booted more until i saw the backlog grow again  20:25
<gordc> it might happen sometimes... if it happens a lot then it's because scheduling is too frequent  20:26
<gordc> regarding the growing backlog: it's not necessarily a bad thing. basically, if you want to save IO/CPU, letting your backlog grow lets you process in batches and thus more efficiently, but you have to accept the delay  20:28
<akrzos> ok  20:28
<akrzos> agreed  20:29
<gordc> but yeah, in your saturation test, i imagine you just want to see the 'max' scenario  20:29
<akrzos> the concern here is that consumers of telemetry are going to want to understand the scale limits of gnocchi  20:29
<akrzos> so i'm trying to best determine that  20:29
<openstackgerrit> Merged openstack/aodh: Remove notes about MongoDB  https://review.openstack.org/411333  20:29
<gordc> fair enough.  20:30
<akrzos> the selling point as i understood it was: hey, we are gnocchi and we compute your metrics before you ask for them  20:30
<gordc> akrzos: just curious, how many osds do you have?  20:30
<akrzos> 36 osds  20:30
<akrzos> so the current scale i have right now is  20:31
<akrzos> 1.8k instances  20:31
<akrzos> 48 metricd workers on each controller  20:31
<akrzos> 15s processing delay  20:31
<akrzos> 4 ceph nodes with 9 osds each  20:31
<akrzos> osds are @ ~60% disk io util  20:32
<akrzos> and the archival policy is "low" for all resources  20:32
<akrzos> i guess my concern is that if we adjust that to high  20:32
<akrzos> the number of instances we could handle  20:32
<akrzos> would be less  20:32
<akrzos> and this is applying a pretty heavy utilization on the controllers where i am running the metricd processing workers  20:33
<akrzos> also the os_workers default configures you with a pretty low worker count relative to the cpus available  20:33
<akrzos> granted it's easy for me to override that default  20:33
<gordc> yeah, that's a larger environment than what i tested with  20:33
<akrzos> so the measures backlog doesn't appear to be growing much right now  20:34
<akrzos> seems after adjusting to a 15s delay it sank a lot  20:34
<gordc> i had ~1/3 of your workers.  20:34
<akrzos> btw i ended up just using collectd to query for the measures backlog  20:34
<akrzos> your osds were ssds though  20:34
<gordc> but i had basically 100% CPU.  20:34
<akrzos> mine are 10k sas drives  20:34
<akrzos> also i'm seeing this in my metricd.log: https://bugs.launchpad.net/gnocchi/+bug/1557593  20:35
<openstack> Launchpad bug 1557593 in tooz "Unable to extend Redis lock in certain cases" [Medium,Fix released] - Assigned to Julien Danjou (jdanjou)  20:35
<gordc> i had an ssd journal. the storage was not.  20:35
<akrzos> not sure that fix made it into what i'm running  20:35
<akrzos> oh ok  20:35
<gordc> i think it's still a bug  20:35
<akrzos> so my journal is co-located  20:35
<akrzos> i don't have ssds right now for the journal  20:36
<gordc> i know i still see it when i put heavy load on  20:36
*** adriant has joined #openstack-telemetry  20:36
<gordc> akrzos: i see... i didn't actually test how much it improves with an ssd journal (or i don't recall)  20:36
*** cdent has quit IRC  20:36
<gordc> regarding the archive policy, it actually might be better at high  20:37
<akrzos> interesting  20:37
<akrzos> fyi here is what i'm using to monitor the measures backlog - https://review.openstack.org/#/c/411030/4/ansible/install/roles/collectd-openstack/files/collectd_gnocchi_status.py  20:37
<gordc> weird, but the reasoning is that the low policy has a 1 day granularity, so it constantly has to aggregate (up to) a day's worth of data  20:37
<gordc> the high policy maxes out at 1 hr granularity.  20:37
<akrzos> obviously that will introduce some additional load, but i query this every 30s, not my standard 10s like the rest of my collectd metrics  20:38
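
A minimal sketch of that kind of backlog probe, assuming Gnocchi 3.x's /v1/status endpoint and a pre-obtained Keystone token (endpoint, token, and field names should be checked against the deployed version; this is not the linked collectd script itself):

    import requests

    GNOCCHI = "http://controller:8041"          # illustrative API endpoint
    HEADERS = {"X-Auth-Token": "<keystone-token>"}

    def measures_backlog():
        # /v1/status reports how many measures and metrics are still waiting
        # to be processed by metricd (field layout per the Gnocchi 3.x API)
        resp = requests.get(GNOCCHI + "/v1/status", headers=HEADERS)
        resp.raise_for_status()
        summary = resp.json()["storage"]["summary"]
        return summary["measures"], summary["metrics"]

    if __name__ == "__main__":
        measures, metrics = measures_backlog()
        print("backlog: %d measures across %d metrics" % (measures, metrics))
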
<akrzos> OH  20:39
<akrzos> i see  20:39
<akrzos> so less re-calculating of what the metric is across each aggregation  20:39
<gordc> that's more elegant than my solution. i just put random log messages in and do calculations based on that.  20:39
<gordc> akrzos: right  20:39
<gordc> it just has a higher storage footprint  20:39
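
To illustrate the trade-off gordc describes, a hedged sketch of creating a custom archive policy over the REST API (Gnocchi 3.x era); the policy name, granularities, and retention below are made up for the example, and the built-in low/high definitions should be taken from the gnocchi docs rather than from this sketch:

    import requests

    GNOCCHI = "http://controller:8041"          # illustrative API endpoint
    HEADERS = {"X-Auth-Token": "<keystone-token>"}

    # coarser granularity keeps fewer points, but each incoming measure may force
    # re-aggregation of a longer window; finer granularity costs more storage
    policy = {
        "name": "scale-test",
        "back_window": 0,
        "aggregation_methods": ["mean", "max"],
        "definition": [
            {"granularity": "5 minutes", "timespan": "7 days"},
            {"granularity": "1 hour", "timespan": "30 days"},
        ],
    }

    resp = requests.post(GNOCCHI + "/v1/archive_policy",
                         headers=HEADERS, json=policy)
    resp.raise_for_status()
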
<akrzos> also i noticed you suggested aggregation worker threads  20:39
<akrzos> the default is 1 i believe  20:40
<akrzos> wondering if we should make that equal to the number of aggregations in the default archival policy (8 aggregations)  20:40
<gordc> right. we defaulted to 1 because the code wasn't very cpu efficient before, so threading didn't help... but yeah, it might be a good idea to change the default  20:41
<akrzos> *nod* python gil  20:41
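
For reference, a hedged sketch of that knob; the option name and section below are recalled from the Gnocchi 3.x sample config and should be verified against gnocchi.conf.sample before use:

    [storage]
    # threads used per metricd worker to compute a metric's different
    # aggregates in parallel; 1 is the historical default
    aggregation_workers_number = 8
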
<gordc> did enabling aggregation_workers make things better for you?  20:41
<gordc> not sure if it was just my test scenario  20:42
<akrzos> i need to try it again  20:42
<akrzos> so first, my worker count was just entirely too low for metricd  20:42
<akrzos> from 6 workers per controller to 48  20:43
<akrzos> which is surprising since these are 24 logical cpu cores (12 cores, 24 hyperthreads)  20:43
<gordc> i was going to ask how many cpu cores you had. i have 24 physical cores and only ran 32 because metricd+ceph osd maxed everything out  20:45
<akrzos> so the ceph osd processes are on the ceph machines  20:45
<akrzos> so they aren't competing for cpu  20:45
<gordc> ah i see. good call. :)  20:46
<gordc> i had to cram everything onto 3 machines.  20:46
<akrzos> if i only had actual ssds for the journal, though, i'd have "more proper" ceph nodes  20:47
<akrzos> the other thing missing for a "proper" ceph setup  20:47
<akrzos> is a separate 10g storage network for storage traffic  20:48
<gordc> good luck getting approval for hardware. lol  20:48
<akrzos> I'll need it  20:49
<akrzos> (the luck)  20:49
<akrzos> (and the hardware too)  20:49
<gordc> :)  20:49
<akrzos> so maybe instead of dropping the delay interval  20:50
<akrzos> i should just continue to increase metricd workers  20:50
<akrzos> until i saturate the ceph disks  20:50
<akrzos> then i can retry again with a higher archival policy  20:50
<akrzos> won't 1s samples cause a lot of writes too? or i suppose that will be batched at the scheduling interval?  20:50
<akrzos> also curious about the 16 tasks per worker i saw in the code  20:51
<gordc> it'll only generate aggregates based on incoming points. so if you're using ceilometer and polling every 10s, it'll only have a point every 10s... it won't backfill the other 9s  20:52
<gordc> (iiuc)  20:53
<gordc> the 16 tasks per worker is basically because we don't have a good solution to scheduling unprocessed measures yet  20:53
<gordc> it relates to: http://lists.openstack.org/pipermail/openstack-dev/2016-November/107284.html  20:54
<akrzos> okay, so i'd have to decrease the polling interval to get anything with a granularity lower than 10 minutes, since our default is 600s  20:54
<gordc> akrzos: right  20:54
<akrzos> does that include samples from the notification agents?  20:54
<gordc> samples from notifications are irregular... based on when events happen  20:55
<gordc> nova and some other projects have a periodic notification but that's every hour i think (at most)  20:55
<akrzos> ok, so anything i would absolutely expect to be measured regularly occurs via the polling then? (cpu_util, memory usage, disk writes/reads)  20:59
<gordc> disk read/writes, network in/out, cpu, cpu_util... i don't know which memory ones are periodic  21:00
<gordc> memory.usage apparently, according to the code  21:01
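
A hedged sketch of where that 600s default lives on the ceilometer side: a polling source definition in the polling/pipeline YAML of that era (the source name and meter list are illustrative). Shrinking the interval is what would allow finer-grained points in gnocchi, at the cost of more polling and more incoming measures:

    ---
    sources:
        - name: some_pollsters
          interval: 60        # default is 600; smaller = finer granularity, more load
          meters:
              - cpu
              - cpu_util
              - memory.usage
              - disk.read.bytes
              - disk.write.bytes
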
*** tlian has joined #openstack-telemetry21:01
<akrzos> ok, well, i think i have a few actions i can take based on this  21:01
<akrzos> i intend on posting results from this into the performance docs here (http://docs.openstack.org/developer/performance-docs/)  21:02
<akrzos> thanks for your help gordc!  21:02
<gordc> that'd be great. we might bug you for more details once you're done so we can improve stuff  21:02
<gordc> thanks to you as well.  21:02
<akrzos> no problem!!  21:06
*** Jack_Iv_ has joined #openstack-telemetry21:44
*** thorst has quit IRC21:46
*** thorst has joined #openstack-telemetry21:46
*** Jack_Iv_ has quit IRC21:49
*** thorst has quit IRC21:55
*** fguillot has quit IRC21:56
*** Jack_Iv_ has joined #openstack-telemetry22:07
*** Jack_Iv_ has quit IRC22:08
*** Jack_Iv_ has joined #openstack-telemetry22:09
*** Jack_Iv_ has quit IRC22:09
*** Jack_Iv_ has joined #openstack-telemetry22:09
*** dave-mccowan has joined #openstack-telemetry22:10
*** Jack_Iv_ has quit IRC22:14
*** Jack_Iv_ has joined #openstack-telemetry22:15
*** Jack_Iv_ has quit IRC22:16
*** Jack_Iv_ has joined #openstack-telemetry22:18
*** Jack_Iv_ has quit IRC22:20
*** Jack_Iv has joined #openstack-telemetry22:35
*** Jack_Iv has quit IRC22:38
*** Jack_Iv has joined #openstack-telemetry22:38
*** thorst has joined #openstack-telemetry22:52
*** vint_bra has quit IRC22:56
*** thorst has quit IRC23:00
*** pradk has quit IRC23:01
*** Jack_Iv has quit IRC23:11
<openstackgerrit> Merged openstack/ceilometer: Fix publisher comment  https://review.openstack.org/412336  23:27
*** tlian has quit IRC23:35
*** gordc has quit IRC23:37
*** dave-mccowan has quit IRC23:50
*** thorst has joined #openstack-telemetry23:57
*** jwcroppe has quit IRC23:58
*** jwcroppe has joined #openstack-telemetry23:59
*** tlian has joined #openstack-telemetry23:59
