22:02:49 <devananda> #startmeeting db
22:02:50 <openstack> Meeting started Thu Feb  7 22:02:49 2013 UTC.  The chair is devananda. Information about MeetBot at http://wiki.debian.org/MeetBot.
22:02:51 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
22:02:53 <openstack> The meeting name has been set to 'db'
22:03:30 <devananda> (kinda lost my train of thought with the break, heh)
22:03:51 <comstud> i have 2 separate issues
22:03:56 <comstud> though somewhat related
22:04:12 <comstud> but I dunno what the agenda here is..
22:04:34 <devananda> so i didn't have an agenda for today
22:04:40 <comstud> ok
22:04:49 <devananda> last meeting, we went over the status of all the db BPs, which was good, but i don't think we need to rehash that
22:05:10 <devananda> the problems you're seeing with pooling are pretty big IMO
22:05:17 <devananda> especially how that will affect conductor
22:05:33 <comstud> they've uncovered at least one problem for sure, one you were already aware of, I think
22:05:36 <comstud> yeah
22:05:53 <comstud> i'll recap
22:06:01 <comstud> starting with this first issue
22:06:57 <comstud> well, first...
22:07:20 <comstud> With the move to nova-conductor... we're now pushing DB queries to a single node (or set of nodes if you run multiple nova-conductors)...
22:07:31 <comstud> but in a large deployment, you likely have far fewer conductors than computes.
22:07:35 <comstud> You don't want to have 1 per compute :)
22:08:04 <comstud> This means that DB pooling becomes important... as without it, DB calls block the whole python process.
22:08:40 <comstud> The RPC queue will back up because the service cannot keep up with all of the DB requests if it has to block on every single one without any sort of parallelism
22:09:00 <comstud> Good so far?
22:09:05 <devananda> yep
22:09:23 <comstud> Cools... so I've been doing testing with db pooling
22:09:45 <comstud> With the trunk code, under load, you can get some connect timeouts, etc.  This patch is needed:
22:09:47 <comstud> https://review.openstack.org/#/c/21461/
22:09:57 <comstud> which changes how the thread pooling is done
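For context, a minimal sketch of the general idea behind eventlet-based DB thread pooling; the function names are illustrative and this is not the patch itself:

    import time

    import eventlet
    from eventlet import tpool


    def blocking_query(n):
        # Stand-in for a synchronous MySQLdb call; run directly in a
        # greenthread it would block the whole python process while it runs.
        time.sleep(1)
        return n


    def pooled_query(n):
        # tpool.execute() hands the blocking call to a native worker thread
        # and yields, so other greenthreads keep draining the RPC queue.
        return tpool.execute(blocking_query, n)


    pool = eventlet.GreenPool()
    # With the thread pool the five "queries" overlap; without it they would
    # run back to back and stall everything else in the process.
    print(list(pool.imap(pooled_query, range(5))))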
22:10:15 <comstud> After applying that...
22:10:34 <comstud> issue #1) innodb reporting Deadlock
22:11:07 <boris-42> Seems that DB pooling is starting to work =)
22:11:08 <comstud> when you get 2 bw_usage_update() requests in parallel for the same instance uuid (can be different mac addresses/networks), this happens
22:11:19 <comstud> This is due to doing UPDATE+INSERT in a single transaction
22:11:53 <devananda> iirc, it should also deadlock the first time any instance is provisioned for a new tenant, if that happens in parallel.
22:12:05 <devananda> and in a few places around fixed / floating IP allocation, if they happen fast enough in parallel
22:12:21 <comstud> Yeah, you had a list that reported a number of other spots where this could happen
22:12:31 <comstud> besides bw_usage_update()
22:12:40 <devananda> right. there are also some open bugs on LP about these now
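For reference, a minimal sketch of the update-then-insert shape being described; the BandwidthUsage model and column names are illustrative stand-ins, not the actual bw_usage_update() code:

    from sqlalchemy import Column, DateTime, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()


    class BandwidthUsage(Base):
        # Illustrative model loosely following bw_usage_cache.
        __tablename__ = 'bw_usage_cache'
        id = Column(Integer, primary_key=True)
        instance_uuid = Column(String(36))
        mac = Column(String(255))
        start_period = Column(DateTime)
        bw_in = Column(Integer)
        bw_out = Column(Integer)


    def bw_usage_update(session, instance_uuid, mac, start_period,
                        bw_in, bw_out):
        # Two transactions running this concurrently for the same instance
        # can deadlock: the UPDATE takes gap/next-key locks on the scanned
        # range, and each transaction's INSERT then conflicts with the gap
        # locks held by the other.
        with session.begin():
            rows = session.query(BandwidthUsage).\
                filter_by(instance_uuid=instance_uuid, mac=mac,
                          start_period=start_period).\
                update({'bw_in': bw_in, 'bw_out': bw_out})
            if not rows:
                # Nothing matched, so INSERT inside the same transaction --
                # the racy half of the pattern.
                session.add(BandwidthUsage(instance_uuid=instance_uuid,
                                           mac=mac, start_period=start_period,
                                           bw_in=bw_in, bw_out=bw_out))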
22:13:13 <comstud> In any case... We've found that the bw_usage_cache table should have a unique index on instance_uuid+mac+start_period, which it doesn't have right now.
22:13:29 <comstud> And that would allow us to do things like INSERT ON DUP KEY UPDATE
22:13:40 <comstud> if we can find a portable way to do it with sqlalchemy :)
22:14:37 <dripton> sounds like we should add the index regardless, while we hunt for the portable upsert
22:14:42 <comstud> I tested a variation on this that removed the Deadlock
22:14:49 <comstud> yes
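A sketch of the two pieces just discussed, assuming a sqlalchemy-migrate style migration; the index name is made up to follow nova's naming convention, and the raw ON DUPLICATE KEY UPDATE statement is exactly the MySQL-only, non-portable part:

    from sqlalchemy import Index, MetaData, Table, text


    def upgrade(migrate_engine):
        # Add the missing unique index on (instance_uuid, mac, start_period).
        meta = MetaData(bind=migrate_engine)
        bw_usage_cache = Table('bw_usage_cache', meta, autoload=True)
        Index('uniq_bw_usage_cache0uuid0mac0start_period',
              bw_usage_cache.c.instance_uuid,
              bw_usage_cache.c.mac,
              bw_usage_cache.c.start_period,
              unique=True).create()


    # Upsert that the unique index would enable; sqlite and postgres need
    # different syntax, which is the portability problem with sqlalchemy.
    MYSQL_UPSERT = text(
        "INSERT INTO bw_usage_cache "
        "(instance_uuid, mac, start_period, bw_in, bw_out) "
        "VALUES (:uuid, :mac, :start, :bw_in, :bw_out) "
        "ON DUPLICATE KEY UPDATE bw_in = :bw_in, bw_out = :bw_out")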
22:15:11 <comstud> I think we can solve that issue without *too* much trouble, however
22:15:13 <comstud> issue #2:
22:15:53 <comstud> Even after getting innodb Deadlocks to go away, I'm seeing queries getting stuck and timing out due to waiting on innodb locks.
22:16:23 <comstud> The only conclusion is that sqlalchemy is not committing a transaction... it's holding it somewhere
22:16:26 <devananda> i dug through my notes and remembered that i found this issue a while back, too
22:16:34 <devananda> https://bugs.launchpad.net/nova/+bug/1007038/comments/4
22:16:36 <uvirtbot> Launchpad bug 1007038 in nova "Nova is issuing unnecessary ROLLBACK statements to MySQL" [Low,Confirmed]
22:16:40 <comstud> but I can only see this when thread pooling is on
22:16:58 <comstud> devananda: I was able to cause an extra rollback very easily..
22:17:05 <comstud> kind of a side topic
22:17:08 <devananda> comstud: it was happening in august last year. was thread pooling in at that time?
22:17:12 <comstud> no
22:17:28 <comstud> it seems to be our wrapping of DB exceptions
22:17:30 <devananda> rather. it was happening when I disabled pool_reset_on_return
22:17:34 <comstud> from what I could tell
22:17:48 <devananda> hmmm
22:17:50 <comstud> i had code such as this:
22:17:55 <comstud> with session.begin():
22:18:07 <comstud>     try:
22:18:09 <comstud>         do a query
22:18:18 <comstud>     except DBDuplicateEntry:
22:18:21 <comstud>         pass
22:18:34 <comstud> and I noticed that 2 rollbacks would happen if there was an exception
22:18:43 <comstud> and there'd be an error logged about it
22:18:48 <devananda> kinda makes sense
22:18:59 <devananda> sqla is probably doing a rollback before the exception bubbles up to your code
22:19:04 <comstud> it seems like a rollback would happen when the exception occurred
22:19:13 <comstud> and then also when we exited the context manager.
22:19:22 <devananda> and then after you leave the with session context, another implicit rollback happens (because of pool_reset_on_return)
22:19:25 <devananda> right
22:19:25 <comstud> yep!
22:19:27 <comstud> exactly
22:19:40 <comstud> that's a side issue, I think
22:19:57 <comstud> i'm concerned about where sqlalchemy seems to be holding a transaction
22:20:04 <devananda> yea. me too
22:20:07 <comstud> causing these lock wait timeouts
22:20:18 <comstud> I can cause the whole process to stall by slamming DB requests
22:20:33 <comstud> after meetings are done today, I was going to debug with eventlet backdoor
22:21:17 <comstud> anyway, that's where I am with 'bugs' that show up with db pooling.
22:21:45 <comstud> separate issue, somewhat related:
22:22:08 <comstud> sqlalchemy is extremely slow compared to raw queries with MySQLdb.
22:22:14 <comstud> And there are no lock wait timeouts, either
22:22:15 <comstud> :)
22:22:35 <devananda> :)
22:22:37 <dripton> comstud: I think that's only with the ORM.  Low-level sqla is pretty fast IMX
22:22:48 <comstud> it very well might be, I just haven't tested it yet
22:22:57 <comstud> belliott and I have been working on this together
22:23:02 <devananda> by raw queries, you mean model.insert? or session.execute("INSERT ... INTO table")
22:23:27 <devananda> just want to be clear
22:23:31 <comstud> mysqldb .cursor(....)
22:23:35 <dripton> the fast ones are session.execute, but it could be the Python insert(), not necessarily the string insert
22:24:01 <dripton> comstud: please confirm for yourself, but I want to make sure we don't throw out the baby with the bathwater
22:24:23 <comstud> cursor.execute('UPDATE bw...')
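For clarity, a minimal sketch of the raw MySQLdb side of that comparison; connection parameters and values are made up for illustration:

    import MySQLdb

    # Plain DB-API usage, no ORM layer: connect, execute, commit.
    conn = MySQLdb.connect(host='127.0.0.1', user='nova',
                           passwd='secret', db='nova')
    cursor = conn.cursor()
    cursor.execute(
        "UPDATE bw_usage_cache SET bw_in = %s, bw_out = %s "
        "WHERE instance_uuid = %s AND mac = %s AND start_period = %s",
        (1024, 2048, 'some-uuid', 'aa:bb:cc:dd:ee:ff', '2013-02-01 00:00:00'))
    conn.commit()
    cursor.close()
    conn.close()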
22:24:28 <dripton> my experience is that low-level SQLAlchemy is an excellent portability shim, and high-level sqla ORM doesn't scale to big projects
22:24:38 <comstud> dripton: I'm not proposing we do.  I'm just stating what I've seen so far
22:24:42 <dripton> ok
22:24:53 <devananda> "doesn't scale" sums up my exp with all ORMs
22:24:56 <dripton> +1
22:25:01 <comstud> it was just a quick hack to test something
22:25:04 <dripton> but I haven't tried all of them so I'm being nice
22:25:24 <comstud> wanted to eliminate innodb just being stupid somehow
22:25:30 <dripton> yep
22:25:33 <dripton> one layer at a time
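And a minimal sketch of the "low-level sqla" alternative dripton is referring to: SQLAlchemy Core expressions executed on a connection, skipping the ORM session and unit-of-work machinery (table definition and connection URL are illustrative):

    from sqlalchemy import (Column, DateTime, Integer, MetaData, String,
                            Table, create_engine)

    metadata = MetaData()
    bw_usage_cache = Table(
        'bw_usage_cache', metadata,
        Column('id', Integer, primary_key=True),
        Column('instance_uuid', String(36)),
        Column('mac', String(255)),
        Column('start_period', DateTime),
        Column('bw_in', Integer),
        Column('bw_out', Integer))

    engine = create_engine('mysql://nova:secret@127.0.0.1/nova')

    # Core keeps the dialect portability but compiles straight to SQL, with
    # no identity map, no autoflush and no ORM-managed transaction state.
    stmt = bw_usage_cache.update().\
        where(bw_usage_cache.c.instance_uuid == 'some-uuid').\
        values(bw_in=1024, bw_out=2048)
    with engine.begin() as conn:
        conn.execute(stmt)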
22:26:00 <devananda> would it be viable to replace just the areas that are likely to deadlock with raw sql?
22:26:21 <dripton> I think we'd end up replacing entire api calls that have problems
22:26:26 <comstud> i was thinking about that yes
22:26:33 <devananda> ie, anywhere that currently uses with_lockmode or select-then-update-or-insert
22:26:55 <devananda> and yea, that does mean several api calls get entirely rewritten to avoid locking issues
22:26:56 <comstud> but I think all of the calls potentially have this 'lock wait timeout' issue
22:27:13 <comstud> because it's a sqlalchemy orm layer problem
22:27:14 <boris-42> comstud +1
22:27:23 <comstud> with db pooling anyway
22:27:25 <devananda> well
22:27:29 <dripton> We can start with a couple that are known bad then start attacking en masse if it works well
22:27:31 <devananda> while i agree with that
22:27:40 <comstud> but in particular, I've seen it with 3 or 4 calls
22:27:45 <devananda> if we replaced all the ORM code that touches table X with raw sql that didn't have locking problems
22:27:48 <comstud> they just happen to be the most common calls that cells uses.
22:27:59 <comstud> so that doesn't eliminate any of the others
22:28:15 <devananda> true. but lock waits and deadlocks are table specific
22:28:25 <devananda> it's not like holding a lock on table_a causes a problem for table_b
22:28:33 <comstud> sure sure
22:28:34 <comstud> i know
22:28:42 <comstud> but i'd expect that it could happen on any table
22:28:42 <devananda> granted, there _is_ a larger problem with sqla here, which i'd love to see fixed :)
22:28:53 <comstud> yea
22:28:54 <devananda> ah. so I wouldn't
22:29:07 <devananda> afaict it will only happen when the ORM is trying to pre-emptively lock rows
22:29:20 <devananda> ie, with_lockmode or SELECT-then-INSERT
22:29:26 <devananda> there aren't that many places which do that
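A minimal sketch of the two shapes devananda is pointing at, using the with_lockmode() API of that SQLAlchemy era; QuotaUsage is a hypothetical stand-in for the models in the provisioning and IP-allocation paths:

    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()


    class QuotaUsage(Base):
        __tablename__ = 'quota_usages'
        id = Column(Integer, primary_key=True)
        project_id = Column(String(255))
        in_use = Column(Integer, default=0)


    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    # Shape 1: explicit row locking -- SELECT ... FOR UPDATE on MySQL
    # (with_lockmode() was later replaced by with_for_update()).
    usage = session.query(QuotaUsage).\
        filter_by(project_id='demo').\
        with_lockmode('update').\
        first()

    # Shape 2: SELECT-then-INSERT in the same transaction -- the same
    # pattern that bites bw_usage_update() above.
    if usage is None:
        session.add(QuotaUsage(project_id='demo', in_use=0))
    session.commit()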
22:29:35 <comstud> but if there's somehow an implicit transaction for all of our DB api calls, does that not mean there's locking of some sort on every table?
22:30:00 <devananda> well, yes, but no :)
22:30:08 <comstud> i was able to see this problem even after removing session.begin()
22:30:12 <comstud> for instance
22:30:21 <devananda> select 1; begin; update table ... where ..; commit; rollback;
22:30:23 <comstud> (the lock wait timeouts, not the Deadlock)
22:30:28 <devananda> that's the usual sqla query pattern for a single update
22:30:34 <dripton> But if we change an API call to not use the ORM then that should go away, right?
22:31:36 <dripton> We can put sqla in trace mode to see all the commands it sends to the DB, and verify that simple stuff doesn't do any extra transaction stuff.
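The trace mode dripton mentions can simply be SQLAlchemy's engine logging; a minimal sketch (in-memory sqlite purely for illustration) that prints every statement sent to the DB, including the implicit BEGIN/COMMIT/ROLLBACK bookkeeping:

    import logging

    from sqlalchemy import create_engine, text

    # echo=True or turning up the sqlalchemy.engine logger both work.
    logging.basicConfig()
    logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)

    engine = create_engine('sqlite://', echo=True)
    with engine.begin() as conn:
        conn.execute(text("CREATE TABLE t (id INTEGER PRIMARY KEY, x INTEGER)"))
        conn.execute(text("UPDATE t SET x = 1 WHERE id = 1"))
    # The log shows whether each transaction actually ends in a COMMIT or
    # ROLLBACK, which is exactly what needs checking in the suspect calls.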
22:31:36 <devananda> if sqla somehow forgets to do the final "commit; rollback;" then yea, the lock wait problem could happen anywhere
22:31:53 <comstud> nod
22:32:03 <comstud> atm, that's what I suspect somewhere.. but I don't know for sure
22:32:09 <devananda> k
22:32:14 <comstud> possible i'm wrong and it's only happening in certain calls
22:32:35 <comstud> dunno!
22:32:53 <devananda> i wonder whether it's related to eventlet?
22:33:02 <comstud> but I know we're approaching the limits here at RAX in global-cells
22:33:07 <comstud> with all of the DB calls it has to do
22:33:13 <comstud> so I need DB pooling RSN
22:33:20 <dripton> devananda: me too
22:33:36 <comstud> I may have to resort to raw queries for things at least internally if I can't figure this out quickly
22:33:53 <comstud> and I know this same problem will show up in conductor under load
22:34:05 <comstud> (we're still using local conductor only)
22:34:37 <comstud> i'm going to attempt to find the sqlalchemy issue first
22:34:44 <dripton> sounds good
22:34:44 <comstud> and if that takes too long, look at low level sqlalchemy calls
22:34:49 <devananda> ++
22:35:33 <devananda> i wouldn't mind seeing low level sqla calls upstream, esp considering it's better for performance
22:35:39 <dripton> I will have a work-in-progress patch containing low-level sqla up soon, if you need a reference.  It doesn't quite work yet.
22:35:55 <comstud> i could def use a reference
22:36:06 <dripton> I'll ping you when I upload it.
22:36:10 <comstud> or.. it'd save some googling at least :)
22:36:15 <comstud> hehe ty
22:36:59 <dripton> Here's my fundamental problem with db-archiving: my current api call is to move *all* deleted rows to shadow tables in one call.  That could take a long time.  I don't know a clever way to subdivide the job.
22:37:26 <devananda> comstud: would it be possible to disable eventlet in conductor, while still enabling tpool?
22:37:44 <devananda> probably a crazy idea ...
22:38:02 <devananda> dripton: ORDER BY id LIMIT X;
22:38:05 <comstud> tpool is implemented in eventlet
22:38:10 <comstud> eventlet.tpool
22:38:13 <devananda> hah. yep, crazy idea
22:38:43 <comstud> we could do our own threading, i suppose.. but you lose the pseudo-parallelism of pulling off the rabbit queue
22:39:02 <comstud> maybe it's a win in the end, though, i dunno
22:39:14 <comstud> i suspect kinda not
22:39:17 <comstud> but i dunno!
22:39:25 <dripton> devananda: yes, that works within one table.  I'm worried about cross-table issues due to foreign keys, but we don't have many so maybe it's okay.
22:39:42 <devananda> dripton: there shouldn't be any FKs in production IMNSHO ....
22:40:00 <devananda> dripton: but even so, i'm not sure how that would matter
22:40:11 <devananda> dripton: loop on one table, in small chunks, until it's finished. then move to next table
22:40:23 <devananda> make sure each chunk is a separate transaction, too
22:40:27 <boris-42> devananda +1
22:40:44 <devananda> otherwise you can blow away the innodb undo space, and it will stall replication, etc...
22:40:47 <dripton> devananda: sure, FKs would just impose an ordering on how to move things.  But, yeah, I'll do a bunch of little transactions and have a row limit per call to the api
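A minimal sketch of the chunked approach being settled on: move soft-deleted rows a small batch at a time, one transaction per batch, one table at a time (connection URL, shadow_ table prefix, row limit and table list are illustrative, not the final API):

    from sqlalchemy import create_engine, text

    engine = create_engine('mysql://nova:secret@127.0.0.1/nova')


    def archive_deleted_rows(table, max_rows=1000):
        # Keep each transaction small so it cannot blow out the innodb
        # undo space or stall replication.
        shadow = 'shadow_' + table
        while True:
            with engine.begin() as conn:
                rows = conn.execute(
                    text("SELECT id FROM %s WHERE deleted != 0 "
                         "ORDER BY id LIMIT :lim" % table),
                    {"lim": max_rows}).fetchall()
                if not rows:
                    break
                last_id = rows[-1][0]
                conn.execute(
                    text("INSERT INTO %s SELECT * FROM %s "
                         "WHERE deleted != 0 AND id <= :last"
                         % (shadow, table)),
                    {"last": last_id})
                conn.execute(
                    text("DELETE FROM %s WHERE deleted != 0 AND id <= :last"
                         % table),
                    {"last": last_id})


    # One table at a time, in whatever order the FKs (if any) require.
    for table in ('bw_usage_cache', 'instances'):
        archive_deleted_rows(table)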
22:41:45 <comstud> i'm concerned about downtime for large deployments
22:41:53 <comstud> with the changes to soft delete
22:41:55 <comstud> the deleted column
22:42:18 <comstud> maybe we just prune the tables first if we don't care about archiving
22:42:38 <dripton> comstud: you mean downtime when running the archiving operation?
22:42:45 <comstud> DB migration
22:42:47 <comstud> in general
22:42:48 <dripton> comstud: or when migrating the deleted column?
22:42:50 <dripton> ah
22:42:52 <comstud> yeah that one
22:42:55 <comstud> we ran a test...
22:43:13 <comstud> we took the proposed migration and ran it against a copy of the DB
22:43:18 <comstud> it took about 45 minutes IIRC
22:43:18 <comstud> :)
22:43:23 <boris-42> =)))
22:43:26 <devananda> i would assume that deployers are not going to just run all the migrations blindly, but we should include a note about the larger migrations
22:43:36 <dripton> +1
22:43:36 <devananda> comstud: 45min is nothing for a big ALTER TABLE ;)
22:43:52 <comstud> yeah, but unfortunately due to the use of the sqlalchemy orm...
22:43:53 <boris-42> alter table for all tables=)
22:43:55 <dripton> comstud: I was planning to include another API call to nuke deleted rows
22:43:57 <comstud> it means shit is broken during the whole migration
22:44:16 <devananda> sure. which is why you have two db's & HA
22:44:17 <devananda> :)
22:44:17 <comstud> i think we just prune our tables first
22:44:32 <devananda> or that, hehe
22:45:55 <comstud> i gotta drop off.. anything else for me?
22:46:00 <devananda> anyone have other topics to bring up? it's seeming like we're about done
22:46:06 * comstud waits
22:46:18 <dripton> I'm done
22:46:22 <devananda> cool
22:46:26 <devananda> thanks guys :)
22:46:30 <dripton> thanks for the tip devananda
22:46:34 <devananda> #endmeeting