21:03:11 <rockyg> #startmeeting #log_wg
21:03:12 <openstack> Meeting started Wed Mar  2 21:03:11 2016 UTC and is due to finish in 60 minutes.  The chair is rockyg. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:03:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:03:16 <openstack> The meeting name has been set to '_log_wg'
21:04:35 <rockyg> #topic RequestID in log messages
21:05:44 <rockyg> Abhishek has mostly finished getting requestIDs into the projects and is moving now to get them into logs.
21:06:02 <rockyg> #link http://lists.openstack.org/pipermail/openstack-dev/2016-March/087878.html is where he starts the discusssion
21:06:09 * dhellmann slides in a bit late
21:06:22 <rockyg> Hey dhellmann !
21:06:39 <rockyg> Talking requestIDs into log messages.  Perfect day to be here.
21:07:49 <jokke_> dhellmann: o/
21:07:50 <dhellmann> hi, rockyg
21:07:53 <dhellmann> hi, johnthetubaguy
21:07:55 <dhellmann> oops, jokke_
21:08:12 * dhellmann had a tab-completion-failure
21:08:32 <rbradfor> sorry, was out
21:08:33 <jokke_> I'm bit concerned about that req ID list to be dumped to the logs
21:08:57 * jokke_ needs to subscribe back to the mailing lists
21:09:21 <rockyg> I agree the log format sucks, but we can work on that first, does adding request_utils module to oslo.log make sense?
21:09:52 * dhellmann goes to look at what's in request_utils
21:10:03 <rockyg> This is where dhellmann and rbradfor have the best info.  Let's give them time to look into this.
21:11:08 <rbradfor> while we are talking about request-ids (and this is likely a larger cross project discussion), a.) is "req-" needed, b) can you shorten the display of ids (e.g. like we do for git commits)
21:11:31 <rockyg> I've been doing a lousy job of actually putting stuff together this cycle.  Involved in too many projects/work groups  I apologize
21:11:57 <rockyg> So, how to differentiate req- from uuid?
21:12:32 <dhellmann> I don't even see that module any more
21:12:46 <rbradfor> rockyg why do you have to?
21:12:49 <rockyg> dhellmann,  could it be in incubator?
21:12:56 <dhellmann> rockyg : not my copy
21:13:16 <rbradfor> and looking at the email, duplication: INFO tempest.tests [req-a0df655b-18a2-4510-9e85-b9435dh8ye4 admin admin] Request ID Link: request.link 'req-a0df655b-18a2-4510-9e85-b9435dh8ye4' -> Target='nova' TargetId=req-f0fb885b-18a2-4510-9e85-b9066b410ee4
21:13:31 <dhellmann> we deleted all the remaining code in the incubator earlier this cycle, so we'll have to recover it from an older git commit, I guess
21:13:45 <rbradfor> so it seems one could use like my idea of 7-8 digits for first in the list (as it's part of the [context] already)
21:13:45 <rockyg> oh, sorry dhellmann .  He wants to resurrect it.
21:13:53 <jokke_> rbradfor: that req- prefix was specific request from ops over multiple conversations ... reasoning, can't remember
21:14:05 <rbradfor> jokke_, ok, happy with explanation.
21:14:17 <rbradfor> there was somebody wanting to or proposed a review to remove it recently
21:14:30 <rockyg> So, I don't think Ops cares about how we specify the ids, just so they can tell what they are.
21:15:10 * rbradfor wishes some things like ids were shorter then they are, and they contained structure, not just randomized
21:16:01 <rockyg> This is where the syslog key-value section of the its header can be valuable.  originator: xxx, caller: xxx, called: xxx
21:16:09 <jokke_> yeah iirc ops were pretty open for the actual ID type itself during the conversations as long as it was unique and there was some indication that it's the request id (other than just position where it shows up)
21:17:24 <jokke_> rbradfor: I totally agree having to log multiple ID's just makes log lines ridiculously long and difficult for human to parse
21:17:41 <rockyg> rbradfor, well, reqid *could* use a smaller UUID size.  Or, that could be a config option (might be best -- default as is but let ops set it based on their cloud size as appropriate)
21:19:56 <rbradfor> I do not know the varied details of uuid lengths in python, except it does exist.  I guess some things need to potentially be more unique then others, so is the config of length project wide or note. It's probably way to difficult to implement, and a discussion for elsewhere
21:20:20 <rbradfor> anyway, I digress
21:20:23 <dhellmann> this appears to be the request_utils code? https://review.openstack.org/#/c/170074/
21:20:45 <rockyg> The ability to define size is already in oslo.  Saw it somewhere.  You can pick the number of bits
21:21:14 <dhellmann> as far as I know, we're using uuid4 everywhere, which are random numbers. All UUID values have the same length, unless you format them without the - in the middle.
21:21:20 <rbradfor> rockyg, not using oslo.context http://git.openstack.org/cgit/openstack/oslo.context/tree/oslo_context/context.py#n40
21:21:34 <rockyg> Ah.  OK.  Thanks.
21:21:40 <rbradfor> but some projects could be using their own id generation,
21:22:35 <dhellmann> some do, and they end up with longer request ids iirc
21:22:44 <dhellmann> I think heat is the canonical example of that
21:23:09 <dhellmann> is the idea to include the id chain in every line, or log it once when the request starts?
21:23:43 <rockyg> Idea is to log every time a handoff from client to project or project to project happens.
21:24:09 <dhellmann> ok. I'm not sure the best place to put that, to be honest
21:24:29 <dhellmann> possibly oslo.context?
21:25:12 <dhellmann> frankly, we might want to implement it in one service first and see what the results look like before we pick a library to act as home for the code
21:25:19 <rockyg> We could do it every message, but *very* redundant over time.  but Ops might like it.  Or prefer the other.  Have to ask on that one
21:25:33 <dhellmann> oh, no, I don't think we want that, it just wasn't clear what the proposal was
21:26:43 <rockyg> Here is the way I thought we could include it in the log message:  STRUCTURED-DATA = NILVALUE / 1*SD-ELEMENT
21:26:43 <rockyg> SD-ELEMENT      = "[" SD-ID *(SP SD-PARAM) "]"
21:26:43 <rockyg> SD-PARAM        = PARAM-NAME "=" %d34 PARAM-VALUE %d34
21:26:43 <rockyg> SD-ID           = SD-NAME
21:26:43 <rockyg> PARAM-NAME      = SD-NAME    PARAM-VALUE     = UTF-8-STRING ; characters '"', '\' and ; ']' MUST be escaped.
21:26:46 <rockyg> SD-NAME         = 1*32PRINTUSASCII
21:26:48 <rockyg> ; except '=', SP, ']', %d34 (")
21:26:58 <jokke_> well the hand off is one thing, but there is no much of joy about it in the logs unless it gets logged in the actual log lines as well
21:28:12 <dhellmann> the question is whether we need it in every line for it to be useful
21:28:17 <rockyg> This is the last header field for syslog message format.  We could do SD-ID = ReqId  then originating:   dispatching:  received:  As the three
21:28:44 <rockyg> Every handoff would need to be logged.
21:29:36 <rockyg> I am not positive, but I think there is enough redundancy in the rest of the log message that instance id would be trackable in between the handoffs.
21:30:01 <dhellmann> logging just the hand-offs is relatively straightforward, we just have to find all of the places to do that
21:30:48 <rockyg> right.  And I believe Abhi took care of that when he instituted the requestID stuff in the API calls/returns
21:32:23 <rockyg> Using the syslog format, we could do one project as POC and the field would be [-] for where it's not implemented
21:34:20 <dhellmann> I'm not sure what you mean? If we're logging discrete events, when would we have an empty field?
21:34:32 <jokke_> dhellmann: the whole point of req ID was to enable tracking requests on flailure
21:35:04 <jokke_> dhellmann: if we do have busy system and no request id in logs we have no idea what error is related to what request
21:35:21 <dhellmann> it is already possible to always log the *current* request id
21:35:42 <jokke_> dhellmann: yes
21:35:56 <dhellmann> I thought this was about logging "I just called from nova into glance from request X to request Y"
21:36:21 <dhellmann> and that's one line, and if you don't have both X ad Y then why are you even logging it?
21:36:45 <dhellmann> or am I misunderstanding something more fundamental? sorry, I've been out of touch with this stuff for a bit so it may have changed from when I last looked.
21:36:53 <rockyg> dhellmann, if the SD-ELEMENT has not been added to the the log message call, syslog format would automatically put in the [-].  Once the sd element is added, it would appear in the message properly
21:37:14 <jokke_> a-ha! sorry my bad ... stuck on the previous proposal of this having list of req- IDs that has gone through system
21:37:30 <dhellmann> rockyg : I'm afraid I don't know what those abbreviations mean. What is "SD"?
21:38:07 <rockyg> from the exerpt of the RFC 5424 "SD" is structured data.  It's the last field in the syslog header format.
21:38:19 <jokke_> but that was what the client were proposed returning, not what gets logged it seems
21:39:54 <rockyg> Right now, each project creates its own req-ID when it starts a task.  Abhi's just finished work returns those req-ids in the API response.  Now he/we want to collect them in a log message.
21:41:45 <dhellmann> rockyg : our syslog API has a log level and a message field. The message is, as far as I can tell, formatted using the same configuration option that is used to specify the log line format for regular formatters. I haven't used that in production with syslog, so I'm not sure if there is extra punctuation added or noth
21:41:47 <dhellmann> *not
21:41:54 <rockyg> Every time an api call results in a new req-id, we would put the new one in the "received" or "called" keypair, put the reqid of the process that made the api call in the "caller" or "dispatched" keypair and the original client call stays the same in the originator keypair
21:42:39 <dhellmann> rockyg : if service A is calling service B, do I do the logging in A or in B?
21:42:45 <dhellmann> or both?
21:44:12 <rockyg> I originally thought A but that doesn't make all that much sense.  I think B.  The ops guys are talking about using logstash + elastic search or other agregators to get all related log messages
21:45:39 <rockyg> A and B would be more robust, but I am not sure we need that.  The key is ops want to see where the line breaks if something doesn't happen.
21:46:47 <jokke_> well we can't log A's req ID in B because that req ID doesn't get sent to B
21:46:48 <rockyg> A andB would allow quick determination that handoff didn't happen but B would show the breakage point.
21:47:15 <rockyg> jokke_, that's how it doesn't make sense ;-)
21:47:33 <jokke_> so only place to log that is in A after B has returned
21:47:39 <dhellmann> ok, so we log requests in A after B returns, since the response from B includes the request id
21:47:54 <jokke_> dhellmann: that's the only way to do that
21:48:05 <rockyg> Oh, good.  The mistake in my logic is found.  Thank you!
21:48:06 <dhellmann> so A is logging "I just called B with request id Y" and A's request id X is automatically already in the log output
21:48:33 <jokke_> so you need to parse ahead to seek the request ID that has been logged, then roll back in time to look the service B logs what happened
21:48:49 <rockyg> Yeah.  Hopefully. If not, we have to insert it.
21:49:58 <dhellmann> so I guess the next step is to either update everywhere we have a client call, or just update the clients, to log the request ids
21:50:33 <dhellmann> if we do it in the clients, using python's standard logging, the overrides from the apps that use oslo.log will give them the current request as part of the format string
21:51:01 <dhellmann> so the question is, do we actually want to do it in the clients, or do we want to do it where the clients are used?
21:51:39 <dhellmann> the latter is a lot more changes
21:51:46 <jokke_> dhellmann: based on the previous discussions around the topic I doubt people want's that logging to happen by the client
21:51:53 <dhellmann> the former would mean always logging a given call at the same level, though
21:52:24 <rockyg> Here's another question: client says: give me a vm.  Sends to Nova.  Nova send to Neutron: give me a network    Does "give me a network" call come from a client?
21:52:26 <dhellmann> ok, so we need to find all outgoing client calls in all apps and wrap them with try:finally and log calls I guess
21:52:44 <dhellmann> rockyg : yes, nova uses neutron client to talk to neutron
21:52:49 <rockyg> Cool.
21:53:09 <rockyg> I believe the amount of work necessary is moot as Abhi and his team plan on doing it.
21:53:31 <dhellmann> well, it's not, because that's a *lot* of code churn and reviews
21:53:51 <rockyg> That's true.  Sorry.  Wasn't thinking it through.
21:54:12 <jokke_> yes that brings lots of of big reviews or huge amount of small reviews
21:54:36 <dhellmann> it's much less work all around to log from within the clients, so I'm curious about why that was rejected
21:55:04 <rockyg> So, Abhi's email proposed three options:  putting the utils in oslo.log, putting it into oslo.utils or putting it in the individual projects.  I believe 3 equates to the every client?
21:55:17 <dhellmann> yes, at least
21:55:54 <dhellmann> it's not clear how the request_utils.py code applies to this, if I've found the right version
21:55:59 <dhellmann> https://review.openstack.org/#/c/170074/2/openstack/common/request_utils.py
21:57:21 <dhellmann> it looks like it just builds a very verbose log message?
21:57:46 <dhellmann> well, and there's the notifier bit at the end
21:57:59 <jokke_> rockyg: if it's not done by client there is no way it's done anywhere without logging specifically after each client call
21:58:17 <rockyg> Yeah.  That's not gonna work.  I see now.  His email just restates the code implementation.
21:58:39 <dhellmann> jokke_, rockyg : I guess my point is, why do we even need a function for this? it's just a call to the logger.
21:58:55 <rockyg> So I believe the answer is, that's not the way to do this, and we suggest what we just discussed.
21:58:56 <jokke_> dhellmann: I totally agree
21:59:21 <dhellmann> logging.getLogger(__name__).info('%(current_function)s used request ids [%(request_id_chain)s]')
21:59:46 <rockyg> It goes in clients.  It utilizes the syslog structured data field in the message header.  Point him to this discussion
22:00:00 <dhellmann> rockyg : we do not, by default, use syslog, so we have to leave that part out
22:00:00 <jokke_> rockyg: as said I highly doubt community will agree clients doing that logging
22:00:48 <jokke_> as people tends to be really touchy libs logging stuff
22:01:40 <rockyg> Cool.  So, it's just adding the extra formatter field, plus the related code. So, how do we do the logging in B rather than A?  How do we get A's request ID and the originating ID?
22:01:45 <jokke_> we're out of time
22:01:59 <jokke_> rockyg: we don't
22:02:23 <rockyg> Argh.
22:02:35 <jokke_> rockyg: and that extra formatter field won't help as someone needs to initiate that log event
22:02:50 <rockyg> Well, great discussion.  We point him to this and continue the ML discussion.
22:03:11 <jokke_> thanks everyone
22:03:17 <rockyg> Is that reasonable for the next week worth of design?
22:03:34 <rockyg> And Yes.  Gotta go.  docs has this channel next.
22:03:36 <dhellmann> yeah, let's continue on the mailing list
22:03:41 <rbradfor> ok, later
22:03:47 <rockyg> #endmeeting