14:10:45 <tobberydberg> #startmeeting publiccloud_wg
14:10:46 <openstack> Meeting started Thu Jun  6 14:10:45 2019 UTC and is due to finish in 60 minutes.  The chair is tobberydberg. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:10:47 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:10:50 <openstack> The meeting name has been set to 'publiccloud_wg'
14:11:22 <tobberydberg> Simple agenda this time
14:11:26 <ncastele> Yep
14:11:29 <tobberydberg> #topic 1. Joint development effort billing
14:12:04 <tobberydberg> notes from last meeting is here: #link http://eavesdrop.openstack.org/meetings/publiccloud_wg/2019/publiccloud_wg.2019-05-28-14.05.log.html
14:12:39 <ncastele> homework has been done in the etherpad, and I'm quite happy we both wrote the same stuff
14:12:47 <ncastele> https://etherpad.openstack.org/p/publiccloud-sig-billing-implementation-proposal
14:13:40 <tobberydberg> Yea, some parts are there for sure ... might be more there ... would be good if more people fill in their needs as well so we can get a full list of the needs
14:13:49 <ncastele> +1
14:13:50 <tobberydberg> But yes, pretty much the same
14:14:41 <tobberydberg> I guess we both feel that we need something that fires events as well as continuous reports .... potentially a scraping method
14:16:17 <ncastele> Scraping is a way to go, but it can have an impact on control plane performance
14:17:00 <tobberydberg> What I was thinking about is that we basically need a tool or a set of tools that can handle all forms of metrics, not only the metrics for the resources we bill for, but for things that are for general information/statistics purposes as well
14:17:25 <tobberydberg> well ... depends on how that scraping is done I would assume
14:17:31 <ncastele> what do you mean by things for general information ?
14:18:01 <tobberydberg> could be security groups and rules for instance
14:18:31 <tobberydberg> SOME might bill for that (we do not), but it would still be of interest for me to see the usage of that
14:19:05 <tobberydberg> Hard to track such things today since neutron doesn't store anything deleted in the database
14:19:13 <ncastele> Yes
14:19:21 <tobberydberg> barbican stuff is another thing
14:19:23 <wondra> What do you mean by scraping?
14:19:29 <wondra> API requests?
14:19:33 <ncastele> there is no retention at all in the neutron db?
14:20:04 <ncastele> API requests, or nice SQL queries on database
14:20:19 <wondra> Nice? :-)
14:20:28 <tobberydberg> not that I know of, no (on the retention)
14:20:41 <wondra> What about the usage of ceilometer events as I suggested? That is a log of the history in itself.
14:21:29 <tobberydberg> wondra I guess you can do scraping in different forms ... but one can be db queries, you can also scrape each compute node for its resources ...
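
(For illustration only, a minimal sketch of the db-query flavour of scraping mentioned above, assuming the usual nova schema; the table and column names can differ between releases, and pymysql is just one way to connect.)

    import pymysql

    def fetch_instance_lifetimes(host, user, password):
        # Pull instance lifetimes straight out of the nova database. nova
        # soft-deletes rows, so deleted instances still show up here (unlike
        # neutron, which keeps no history of deleted resources).
        conn = pymysql.connect(host=host, user=user, password=password, database="nova")
        try:
            with conn.cursor() as cur:
                cur.execute(
                    "SELECT uuid, project_id, vcpus, memory_mb, created_at, deleted_at "
                    "FROM instances"
                )
                return cur.fetchall()
        finally:
            conn.close()
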
14:21:57 <ncastele> I'm not enough into ceilometer to ensure it can answer all our needs
14:22:37 <tobberydberg> What I hear from people, the ceilometer agents are pretty stable in that sense, but still, people do not fully rely on it anyway
14:22:48 <wondra> Our billing is based on that. We did it back in Kilo and will have to rework it due to an API change for Ocata.
14:23:05 <tobberydberg> usually having something else as well and then comparing the results
14:23:07 <wondra> You get events about every entity along with details, like the owner, size, etc.
14:23:32 <tobberydberg> yes, which is good
14:23:44 <ncastele> Is it easy to enhance the content of an event with custom information depending on the service?
14:24:04 <tobberydberg> the storage of that data and possibility to query it is hard to work with though
14:24:26 <wondra> Dunno. But you can choose what you store from the notifications sent by the particular OpenStack project, in the event pipeline.yaml
14:24:29 <wondra> https://docs.openstack.org/ceilometer/ocata/events.html
14:25:45 <wondra> Querying it would be done with the Panko API, which we haven't learned yet. Still on the old Ceilometer one. We basically query it every day at midnight for every tenant and compute the usage.
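
(A rough sketch of that nightly job, assuming Panko still serves the classic /v2/events resource and a valid Keystone token is at hand; the endpoint URL and trait names are placeholders, and paging and error handling are left out.)

    import requests

    PANKO_URL = "http://panko.example.com:8977"   # hypothetical endpoint

    def project_events(token, project_id, start_iso, end_iso):
        # Fetch events and filter client-side; which traits exist depends
        # entirely on the deployment's event definitions.
        resp = requests.get(PANKO_URL + "/v2/events",
                            headers={"X-Auth-Token": token}, timeout=30)
        resp.raise_for_status()

        def trait(ev, name):
            return next((t["value"] for t in ev.get("traits", []) if t["name"] == name), None)

        return [ev for ev in resp.json()
                if trait(ev, "project_id") == project_id
                and start_iso <= ev["generated"] <= end_iso]
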
14:26:24 <tobberydberg> It's been a little bit too long since I looked into all the side projects around ceilometer, but listening to people it seems not to work very well
14:26:56 <wondra> How does CloudKitty work anyway? Does it query Panko or Gnocchi?
14:27:04 <tobberydberg> gnocchi being outside of openstack makes it worse
14:27:19 <tobberydberg> think it uses gnocchi
14:27:27 <witek> they don't rely on a specific DB from what I know
14:27:44 <ncastele> Cloudkitty offers drivers for prometheus/ceilometer/gnocchi
14:27:45 <witek> they can use Gnocchi, Monasca, Prometheus
14:28:09 <tobberydberg> ok ok
14:28:16 <witek> :)
14:29:45 <tobberydberg> are you folks basing your billing today on all native openstack services?
14:30:29 <tobberydberg> we do not; nowadays we are not at all using openstack telemetry services for that (unfortunately)
14:30:32 <ncastele> for the collecting part, yes, we are relying on ceilometer (in a heartbeat way, not in an event way)
14:30:45 <ncastele> for instances
14:31:11 <ncastele> For volumes, we have some sql queries that are pushing usage data into ceilometer
14:31:38 <tobberydberg> ok, so ceilometer doesn't work for that?
14:31:45 <ncastele> but it's a scraping approach (scraping hypervisors, scraping our ceph clusters, etc.)
14:32:26 <ncastele> ceilometer does the work because we scrape every 5 minutes, but we bill by hour, meaning that we accept we can lose some points
14:32:59 <mnaser> oh i didn't know cloudkitty can use prometheus TIL
14:33:21 <tobberydberg> ok. I would say that we need something that is down at the seconds level of usage for all types of resources
14:33:49 <mnaser> i think if we forget scraping and bring evented monitoring, we don't have to track things per second anymore, because you can easily introspect things: "what time did it start and end"
14:33:54 <ncastele> but when we come to seconds billing, it's not possible for us to use the current system, because for that precision we need to rely on events / on precise dates and times
14:34:09 <ncastele> +1 mnaser
14:34:13 <tobberydberg> mnaser totally agree
14:34:30 <mnaser> then it's $bill_in_whatever_increment_you_want
14:34:41 <tobberydberg> yes
14:35:05 <witek> seconds meaning 1s, 10s or 30s ?
14:35:10 <tobberydberg> do we need the scraping for other purposes?
14:35:15 <tobberydberg> 1s
14:35:36 <ncastele> seconds meaning 1s :D
14:35:43 <tobberydberg> or even less than a second, which event based will give you
14:36:09 <wondra> I believe that we do it for Floating IPs. Cinder is fine with cinder-volume-audit in cron, but Neutron does not have that.
14:36:12 <witek> the actual question is if such fine resolution is really useful for billing?
14:36:14 <ncastele> it depends, if we trust our event based collecting at 100%, then we do not need scraping. But scraping can consolidate the event based process
14:36:31 <tobberydberg> I would say CPU load etc etc will need scraping if we would like to cover those bits as well
14:37:25 <wondra> To clarify - most openstack components issue the .exists event, which allows you to find entities that you missed the .start notifications for. Floating IPs do not have it.
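
(A tiny illustration of why the periodic *.exists audit notifications matter for billing; the compute.instance.* naming follows nova, other services use different event types.)

    known_since = {}   # resource_id -> timestamp we first learned about it

    def handle_notification(event_type, resource_id, timestamp):
        if event_type.endswith(".create.end"):
            known_since.setdefault(resource_id, timestamp)
        elif event_type.endswith(".exists") and resource_id not in known_since:
            # we missed the create notification (agent restart, dropped
            # message, ...): start billing from the audit point instead
            known_since[resource_id] = timestamp
        elif event_type.endswith(".delete.end"):
            known_since.pop(resource_id, None)   # or mark ended for a final bill
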
14:38:59 <tobberydberg> so same question again, if events are enough, is ceilometer reliable to use in its current shape and form?
14:39:02 <wondra> Billing by CPU load - the white unicorn of public clouds?
14:39:12 <tobberydberg> (in just collecting the "metrics")
14:39:28 <wondra> Dunno. I'm 4 releases behind.
14:39:34 <ncastele> Dunno either
14:39:46 <tobberydberg> mnaser what is your feeling there?
14:39:55 <wondra> Eh, actually 6. Damn.
14:40:46 <tobberydberg> wondra I was thinking just to have the same source of data for stuff that is interesting to visualise for users ... not necessarily bill for it =)
14:41:25 <tobberydberg> wondra just go directly to train this fall ;-)
14:41:42 <wondra> Having visualisations in our customer portal would be nice. I've got example code from a bachelor's thesis for Gnocchi.
14:42:24 <tobberydberg> +1
14:42:50 <wondra> With our main product being VPS, we don't need it in Horizon.
14:43:10 <mnaser> i think it's probably easier to start building the structure for the billable stuff
14:43:11 <tobberydberg> The biggest issue that I see here moving forward will be the storage and real time query bits
14:43:13 <mnaser> imho
14:43:45 <mnaser> well, we think of things as a resource, that's what ceilometer did at the time, resource X (could be a cinder uuid) which has created_at $x and deleted_at $y (or none)
14:44:05 <mnaser> and then we have a tool that says "query how many seconds has resource X been used in the period A => B" and it can do all the math
14:44:09 <mnaser> i have a lot of that code
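
(Something like the minimal helper below is presumably what that tool boils down to; a sketch, not mnaser's actual code: clamp the resource lifetime to the billing window and count the seconds, then bill in whatever increment you want on top of that.)

    from datetime import datetime

    def billable_seconds(created_at, deleted_at, window_start, window_end):
        # clamp the resource lifetime to the billing window [A, B]
        start = max(created_at, window_start)
        end = min(deleted_at or window_end, window_end)
        return max((end - start).total_seconds(), 0.0)

    # e.g. a volume created mid-window and still alive at the end of it
    secs = billable_seconds(created_at=datetime(2019, 6, 1, 12, 30),
                            deleted_at=None,
                            window_start=datetime(2019, 6, 1, 0, 0),
                            window_end=datetime(2019, 6, 2, 0, 0))
    hours = secs / 3600   # 11.5 -- rounded up or not depending on billing policy
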
14:45:38 <tobberydberg> event driven only, and I think that will cover the billing bits
14:46:10 <tobberydberg> so, using ceilometer for that mnaser? Reading off the rabbit queue?
14:46:27 <mnaser> nope, reading the db :\
14:46:38 <tobberydberg> neutron?
14:46:46 <mnaser> not billing anything for that
14:47:34 <tobberydberg> ok. I totally agree that reading off the database is very easy for a lot of the resources, but not for all
14:47:53 <tobberydberg> object storage is another thing that will be hard to track that way
14:48:26 <tobberydberg> dynamic size stuff
14:48:36 <mnaser> well i was thinking for that sort of thing, parsing logs might be the way to go
14:49:03 <mnaser> logs are pretty much events of download/upload i guess
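
(A rough sketch of that idea; the log format assumed here, with method, account and bytes as the first three whitespace-separated fields, is purely hypothetical, so the field indices would need adjusting to a real Swift proxy access-log layout.)

    from collections import defaultdict

    def bytes_downloaded_per_account(log_lines):
        usage = defaultdict(int)
        for line in log_lines:
            fields = line.split()
            if len(fields) < 3:
                continue
            method, account, bytes_sent = fields[0], fields[1], fields[2]
            if method == "GET":              # count downloads only
                usage[account] += int(bytes_sent)
        return usage

    # usage = bytes_downloaded_per_account(open("/var/log/swift/proxy-access.log"))
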
14:49:20 <wondra> Reading the DB is bad. There's no clear contract. Any OpenStack project can change it and break the code.
14:49:36 <tobberydberg> true
14:52:17 <ncastele> agree with wondra, the db contract is not stable. But it's the easiest and least impacting way to read data
14:52:34 <tobberydberg> So, just a few more minutes left of today's meeting, would be good to sum it up and find action points until next meeting
14:52:50 <ncastele> I tried a PoC on reading from Nova APIs as an admin, and I quickly reached some limits
14:53:06 <wondra> But the deployment tools are trailing the OpenStack releases. Maybe a billing product that reads the database could only exist for 1 year old releases...
14:53:43 <tobberydberg> #action All - continue to define what we really need to track
14:55:08 <tobberydberg> what else? clearer suggestions on the collection method for events? DB queries? whatnot?
14:55:39 <tobberydberg> Should we try to book a meeting for next week as well to keep up the pace before all vacations?
14:56:54 <tobberydberg> I'm happy to take a meeting same time next week
15:00:49 <wondra> Not against. I have said most of what I know, though.
15:01:03 <tobberydberg> well, time is up for today ... any opinions on above before we close down?
15:01:38 <tobberydberg> Thanks wondra ... have you added your needs of metrics to the etherpad as well?
15:01:49 <witek> good understanding of what exactly has to be collected is important
15:02:23 <wondra> yes, I have.
15:02:29 <tobberydberg> yes, I think so too and that is what we need to base tooling etc on to be successful
15:02:37 <tobberydberg> +1 wondra
15:03:47 <tobberydberg> Ok. So I'll end today's meeting. I'll send out a reminder for a meeting next week, and we continue with once per week for a few more weeks and see what we can get out of that
15:03:52 <tobberydberg> Thanks for today folks!
15:04:20 <witek> thanks, I'm on vacation, see you in 3 weeks
15:05:17 <tobberydberg> #endmeeting