20:00:47 <Rockyg> #startmeeting log_wg
20:00:47 <openstack> Meeting started Wed May  6 20:00:47 2015 UTC and is due to finish in 60 minutes.  The chair is Rockyg. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:48 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:51 <openstack> The meeting name has been set to 'log_wg'
20:00:59 <Rockyg> Roll call!
20:01:07 <dhellmann> o/
20:01:27 <Rockyg> jokke_, nkrinner
20:01:50 <jokke_> o/
20:02:02 <Rockyg> bknudson:
20:02:36 <Rockyg> Am I missing anyone?
20:02:54 <bknudson> hi
20:02:58 <Rockyg> eugeniya is not on, at least not on this channel
20:03:01 <Rockyg> Hey!
20:03:34 <Rockyg> And dhellmann, thanks for showing up.  We'll be going for a better time after the summit.
20:04:01 <Rockyg> #topic log sessions at the summit
20:04:16 <dhellmann> ok, this meeting time isn't terrible for me but if we can find a time that's better for others that would be good, too -- we need more members :-)
20:04:33 <Rockyg> We have two ops sessions.  A general session and a working session
20:05:05 <Rockyg> We do not have a dev session, but a conversation with doug leads me to believe we aren't ready for that yet.
20:05:30 <dhellmann> here's one: https://libertydesignsummit.sched.org/event/764d77baafe13caaad8ff1badabb9b9a#.VUpz_s5hW9Y
20:05:30 <Rockyg> The ops sessions will get us much closer to that point if we do our jobs.
20:05:31 <dhellmann> where is the second?
20:05:57 <Rockyg> 4:30pm Wednesday
20:06:34 <dhellmann> https://libertydesignsummit.sched.org/event/fbfceb17bc4927136c0aa778d38586d1#.VUp0Rs5hW9Y
20:06:57 <Rockyg> Thanks
20:07:28 <Rockyg> I've started the etherpad but we need to flesh it out some.  We will use it for both the general and the working session.
20:07:51 <Rockyg> https://etherpad.openstack.org/p/YVR-ops-logging
20:08:38 <Rockyg> I was trying to edit it yesterday, though, and had problems.  Might have been my connection to etherpad.openstack.org, but it seemed the site might have been overloaded then
20:09:18 <Rockyg> Is there anything any of you want specifically addressed and discussed at the summit?
20:09:46 <dhellmann> Rockyg: that happens once in a while; you have to force the page to reload to make it work again, unfortunately
20:10:38 <Rockyg> Yeah.  I did that and kept getting an error.  Other pages weren't having the problem.  That page was in a weird state, so I figured I'd come back and try today.
20:10:49 <bknudson> will be interesting to hear from ops if they have any specific complaints
20:11:04 <dhellmann> Rockyg: hmm, weird
20:11:06 <Rockyg> If I still have an issue, it's off to openstack-infra to discuss after the meeting.
20:11:28 <bknudson> e.g., I can't tell what happened when scheduler fails.
20:11:41 <dhellmann> Rockyg: I'm able to edit the etherpad (I added a link to an oslo.log spec)
20:12:02 <Rockyg> The three big issues from Paris were: 1) tracing the error back to its true origin
20:12:34 <bknudson> luckily we made lots of progress on those 3 issues!
20:12:47 <Rockyg> 2) error messages that were useless (noops, misleading, worse)
20:13:32 <Rockyg> 3) and now I can't remember.
20:14:25 <Rockyg> Another one from before the summit, from ops folks I know: stuff happens in eventlets that never makes it to the logs
20:14:44 <Rockyg> So, propagating important notifications to logs
20:15:43 <Rockyg> Oh, yeah.  Consistency.  Same format, same number of fields.  If an optional field in the message is not used, put a "-" in its place
20:16:22 <Rockyg> Thank you dhellmann for adding that spec link.  that was what I was trying to add yesterday
20:17:04 <Rockyg> and yeah, bknudson, we've made progress ;-)
20:17:16 <Rockyg> Actually, we have, thanks to dhellmann and the oslo team
20:17:24 <Rockyg> We have better docs
20:17:38 <Rockyg> we have better guidelines for devs
20:18:06 <bknudson> keystone logs didn't improve... still need the time to work on it.
20:18:09 <Rockyg> And we have improvements on config and a few other things that ops wanted.
20:18:40 <Rockyg> 3) global config setting of log format (syslog)
20:18:51 <Rockyg> 3) is done.
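The consistency point above (same fields every time, with "-" standing in for an unused optional field) can be illustrated with a plain Python standard-library logging filter. This is a minimal sketch, not oslo.log's own formatter: the field names `request_id`, `user_id` and `project_id`, the logger name `demo.consistent`, and the `DefaultFieldsFilter` class are hypothetical choices made only for illustration.

```python
import io
import logging

# Hypothetical optional fields, chosen for illustration only.
OPTIONAL_FIELDS = ("request_id", "user_id", "project_id")


class DefaultFieldsFilter(logging.Filter):
    """Ensure every record carries every optional field, using "-" when unset."""

    def filter(self, record):
        for field in OPTIONAL_FIELDS:
            if not getattr(record, field, None):
                setattr(record, field, "-")
        return True  # annotate the record, never drop it


buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.addFilter(DefaultFieldsFilter())
# Fixed layout: every line has the same fields in the same order.
handler.setFormatter(logging.Formatter(
    "%(levelname)s %(name)s [%(request_id)s %(user_id)s %(project_id)s] %(message)s"))

log = logging.getLogger("demo.consistent")
log.setLevel(logging.INFO)
log.propagate = False
log.addHandler(handler)

log.info("instance booted", extra={"request_id": "req-1"})
log.info("periodic task ran")
print(buf.getvalue(), end="")
# INFO demo.consistent [req-1 - -] instance booted
# INFO demo.consistent [- - -] periodic task ran
```

Because the filter runs on the handler before formatting, a message that never sets an optional field still produces a line with the same number of fields, which is what makes the output easy for log parsers to split.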
20:19:00 <dhellmann> Rockyg: the ops session on logging is at the same time as the oslo session on logging, so we should talk to dims and whoever is scheduling the ops track to see if we can move one of them
20:19:20 <dhellmann> http://libertydesignsummit.sched.org/event/35475d6e34ad1b4c2ffc5a2ff8cc68ed#.VUp3Rc5hW9Y
20:19:26 <dhellmann> it would be good to have the ops session before ^^
20:19:58 <Rockyg> dhellmann:   Definitely.  And agreed.  If we can get actionable items out of the ops working session before the oslo session, that would be great.
20:20:11 <dhellmann> that ops session is also up against a talk on logging, which is unfortunate: http://libertydesignsummit.sched.org/event/407b15645cef2cfb4248a28a0c96c9fe#.VUp3a85hW9Y
20:20:18 <dhellmann> we may want to move both of them
20:20:45 <dhellmann> both the ops session and the oslo session, that is
20:20:48 <Rockyg> #action connect with fifeld and dims about reschedule of one of the log sessions currently Wednesday at 4:30
20:21:53 <Rockyg> #link http://libertydesignsummit.sched.org/event/407b15645cef2cfb4248a28a0c96c9fe#.VUp3a85hW9Y
20:22:29 <Rockyg> #link https://libertydesignsummit.sched.org/event/fbfceb17bc4927136c0aa778d38586d1#.VUp0Rs5hW9Y
20:23:39 <Rockyg> Ok.  For the first meeting, I would like to present the specs and/or reviews that have made or will make logging better, and the docs
20:24:04 <Rockyg> dhellmann: are you aware of any user docs that talk about logging?
20:24:24 <Rockyg> Config, admin, etc?
20:24:26 <dhellmann> I haven't looked for anything like that, but I would expect the ops manual to cover it somewhere?
20:24:44 <dhellmann> I don't know if they still call the manual "ops guide" or whatever
20:25:02 <Rockyg> I think there is less than a paragraph somewhere.
20:25:10 <Rockyg> I think it's config.
20:25:22 <Rockyg> At least that's what it was like in the Paris timeframe.
20:26:06 <bknudson> keystone logging: http://docs.openstack.org/admin-guide-cloud/content/keystone-logging.html
20:26:35 <bknudson> we should be able to have keystone logging docs just point to a general logging config doc that all the others point to
20:26:41 <Rockyg> I think that's another reason there are so many complaints.  Ops has to dig into dev docs or code to find stuff.
20:27:24 <bknudson> http://docs.openstack.org/admin-guide-cloud/content/section_manage-logs.html
20:27:28 <bknudson> there's the compute one... nice!
20:28:41 <Rockyg> I would like to get ops volunteers to document what they do and have dev review the results.  And yeah, bknudson, that's much better.
20:30:05 <Rockyg> I think a target for Liberty is to have a chapter on logging (configuring, managing, etc.) that covers the full stack
20:30:34 <Rockyg> I think that is an attainable goal.
20:30:43 <jokke_> yeap
20:31:10 <Rockyg> So, uh, topic drift.  do we have anything else for summit planning?
20:32:36 <Rockyg> #topic general discussion
20:34:12 <Rockyg> I am thinking that a good chunk of the dissatisfaction with logging comes from a lack of understanding of the underlying Python logging machinery, along with a lack of specific documentation of what oslo.log adds or modifies.
20:35:11 <Rockyg> The more I dig and learn about it, the more I see that the naive approach to the logs makes managing them much more difficult for ops.
20:35:17 <jokke_> Looking at the logs, it's just not that... lots of the logging we do is for developers, not for ops. What we log, the way we log it, etc.
20:36:20 <Rockyg> jokke_: That's also very true.  We need to educate developers on how large systems use logs and what they expect from them
20:36:41 <Rockyg> So at a minimum, better docs on both sides
20:37:26 <jokke_> having that would be good start
20:37:39 * dims checks in
20:37:49 <jokke_> then having people actually reading those docs and acting by that would be awesome ;)
20:38:04 <Rockyg> Hey dims, we have a summit scheduling conflict
20:38:34 <Rockyg> the ops log working session and the oslo.log session are at the same time
20:39:00 <dhellmann> dims: during the 4:30 period on wed
20:39:16 <Rockyg> we really need the ops session to happen before oslo.log session
20:39:16 <dhellmann> we have an oslo logging session, an ops logging session, and a conference talk on logging
20:39:51 <dims> dhellmann: ack, will try to move oslo.log
20:39:58 <dims> to thu
20:40:08 <Rockyg> Thanks, dims
20:40:44 <dhellmann> dims: welcome to summit scheduling :-)
20:42:23 <Rockyg> folks, do you think we could get a youtube talk on good logging algorithms (when to log) and message style?  Would that help for developers?
20:43:35 <dhellmann> we should probably write down what we would want to say, first
20:44:21 <Rockyg> Definitely.  But if it would help, we could spend a meeting on planning the talk.  Either at the summit, or after.
20:45:46 <dhellmann> ok. I'm not sure if it would be useful or not, so I'll leave that up to others
20:46:37 <Rockyg> Agreed.  If we could get an ATC who is also ops to do it, that might help.  Something to think about.
20:46:46 <Rockyg> Anything else?
20:48:09 <Rockyg> One thing I was thinking about, after looking at the log files generated from a tempest run: I want to start a discussion with ops about whether they would like some of the logs consolidated through a collector
20:48:50 <jokke_> Rockyg: care to elaborate a bit?
20:50:07 <Rockyg> So, a description of how to do it.  From my ops experience, getting all the nova logs (api, etc.) into a single log would at least put them all in one file, one place to look for nova issues
20:50:46 <Rockyg> https://docs.google.com/spreadsheets/d/1XTncfK_droY8E-Uy2icVuU-z9ya38ZBK_ZIRvGfPOXc/edit#gid=0
20:51:27 <Rockyg> is my start on listing all the files and their types.  Sorry, bknudson, keystone logs aren't in there yet because I haven't gotten to the apache logs
20:51:30 <jokke_> in my experience / by my knowledge OS is impossible to run without something like that already in place ;)
20:52:15 <Rockyg> for instance, nova has:  api, cond, cpu, crt, obj, sch, agt logs all separate
20:52:30 <jokke_> I'd be very surprised if someone ran production without having some kind of centralized logging where they pull all that together
20:52:43 <Rockyg> Wow.  Major freenode delay!
20:53:34 <Rockyg> right now, they use logstash or splunk to pull them together.  but, if they were already more consolidated, setup would be a lot easier.  And it might make life easier for smaller cloud installations.
20:54:36 <Rockyg> All the logs and their locations should also be documented in the various project chapters of the admin guides
20:55:03 <bknudson> you can configure the logs to go anywhere
20:55:15 <jokke_> do you mean something like instructions how to set all services logging to syslog and managing the split to files from there?
20:55:47 <Rockyg> bknudson: yes, but you can also configure them to write related logs to a single "local" log file first.
20:57:54 <Rockyg> jokke_ bknudson: more like have oslo.log send all nova.xxx logs to /etc/logs/nova.syslog.  Then the third-party aggregator only has to grab one nova log (well, likely a couple, because of other types of logs you'd want from that system) from each nova instance
20:59:27 <Rockyg> right now, nova.conductor messages go to nova-cond and nova.api messages go to nova-api.  Why wouldn't you want the api messages in the same file as the conductor messages?  They are both part of the nova "system?"
20:59:33 <jokke_> doubt anyone wants to have their logs in /etc/ but I see your point ;)
21:00:37 <Rockyg> sorry :-(  but, yeah.  I want to discuss this with ops first, but if a slightly modified default setup for the files themselves makes sense, we can do that with either a sample config file, or ....
21:01:00 <Rockyg> /var/logs ...that better?
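Within a single Python process, the standard logging module already groups records this way through its dotted logger-name hierarchy: a handler attached to a parent logger such as "nova" receives records from child loggers such as "nova.api" and "nova.conductor". The sketch below shows only that propagation; the logger names and messages are illustrative, not taken from nova. It does not merge separate services: nova-api and nova-conductor normally run as separate processes, each with its own logger tree and handlers.

```python
import io
import logging

buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

# Attach a single handler to the parent "nova" logger only.
parent = logging.getLogger("nova")
parent.setLevel(logging.INFO)
parent.addHandler(handler)

# Child loggers have no handlers of their own; their records propagate
# up the dotted-name hierarchy to "nova" and land in the same stream.
logging.getLogger("nova.api").info("GET /servers")
logging.getLogger("nova.conductor").warning("retrying build")

print(buf.getvalue(), end="")
# nova.api INFO GET /servers
# nova.conductor WARNING retrying build
```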
21:01:01 <dhellmann> having multiple processes write to the same file will introduce race conditions and corrupt the logs
21:01:15 <jokke_> yup
21:01:30 <dhellmann> that's why we write to separate files, and deployers can use something like syslog if they want a single file
21:01:43 <Rockyg> dhellmann: what if we use the collector and parent setup from Python's logging module, modelled on syslog?
21:01:47 <jokke_> thus you need something like the syslog if you want to have them all in one
21:02:01 <dhellmann> Rockyg: then that's another service we have to maintain, instead of using one that already exists
21:02:34 <Rockyg> I think it's already in Python's logging module.
21:05:12 <Rockyg> I will look more closely and see what I find.  I think it's the syslog handler or socket handler.
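Python's standard `logging.handlers` module does provide both a `SysLogHandler` and a `SocketHandler`. The Python logging documentation also notes that logging to a single file from multiple processes is not supported, which is the race dhellmann describes; a `SocketHandler` setup would need a separate receiving process, which is the extra service he is wary of. The syslog route sends each service's records to an existing daemon that can write them to one file. In this minimal sketch a local UDP socket stands in for the syslog daemon so it runs anywhere; a real deployment would use `/dev/log` or a remote syslog host instead. The `configure_service` helper and the "nova-api: " tag are hypothetical.

```python
import logging
import logging.handlers
import socket

# Stand-in for a syslog daemon: a UDP socket on an ephemeral local port.
# A real deployment would point at /dev/log or a remote syslog collector.
collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
collector.bind(("127.0.0.1", 0))
collector.settimeout(2)
host, port = collector.getsockname()


def configure_service(service_name):
    """Hypothetical helper: route this process's "nova" logs to syslog."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),  # tuple address -> UDP datagrams by default
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
    )
    # Prefix each line with the service name so one shared file stays readable.
    handler.setFormatter(logging.Formatter(service_name + ": %(name)s %(message)s"))
    logger = logging.getLogger("nova")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return handler


handler = configure_service("nova-api")
logging.getLogger("nova.api").info("GET /servers")

# The datagram starts with a syslog priority (facility local0, level info).
data, _ = collector.recvfrom(4096)
print(data)
handler.close()
collector.close()
```

With syslog doing the writing, each service process only sends datagrams, and the daemon's own configuration decides whether nova-api and nova-conductor records end up in one file or several.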
21:05:21 <Rockyg> But, it is TIME.
21:05:24 <dhellmann> Rockyg: ok, I'm not sure what you mean but I'll look for it too
21:05:30 <Rockyg> #endmeeting