The writings of Merlin Moncure, professional database developer, about work, life, family, and everything else.

Monday, December 17, 2007

Advisory Locks, an Update

Update: more testing has shown this to not work as well as I thought...details will follow
Update 2: this doesn't work at all. I need to give it some more thought...M

I've made no secret that 'advisory locks' is one one of the coolest features in PostgreSQL (see my posts here and here). As a general purpose database mutex, you can do all sorts of wonderful things with them...long term locks, cooperative locks, sub transaction locks, etc.

But what about sub statement locks?

Consider the gapless sequence as described here (normal sequences produce 'holes' in the number line when rollbacks or database crashes occur). In the example, an external record is used to track the counter and provide concurrency protection. Well, how about this:

create table t(id int primary key);

insert into t select coalesce(id, 1) from
(
select pg_advisory_lock(1),
id + 1 as id,
pg_advisory_unlock(1) from t
order by id desc limit 1
) q;

The above statement (ab)uses the left to right order of execution of the fields of the statement. No external object is locked, and the outer select statement could do a bunch of other time consuming things without tying up the lock and killing concurrency. There is no chance of two transactions grabbing the same value of id in read committed mode (there may be a small chance of a race to grab the first value of id). This may not be the best or even a good way to do this, but it's an interesting demonstration of the technique.

Sunday, December 16, 2007

12 Days of Christmas, PostgreSQL style

In case you have trouble remembering the lyrics...

select v || case
when v=1 then ' partridge in a pear tree'
when v=2 then ' turtle doves'
when v=3 then ' french hens'
when v=4 then ' calling birds'
when v=5 then ' GO-O-OLDEN RINGS!'
when v=6 then ' swans a swimming'
when v=7 then ' geese a laying'
when v=8 then ' maids a milking'
when v=9 then ' ladies dancing'
when v=10 then ' lords a leaping'
when v=11 then ' pipers piping'
when v=12 then ' drummers drumming'
end from
(select generate_series(generate_series(1,12), 1, -1) as v) q;

Happy Holidays!

[update: now the lyrics are in the right order for day #2 and beyond...also decided to correctly spell 'pear' against my better judgement]

Managing Trees with Arrays, Part 2

Einhverfr writes:
Out of curiosity, have you looked at contrib/tablefunc and its connectby() function? We use it and the parent_id solution in LSMB and it works pretty well. It also allows for a cleaner semantics because we don't have to store the full path in the record (this and "level" are both generated by connectby!).

Thanks for your interest.

I have looked at connectby, and I have a lot of issues with it. It's a good approach and probably better if your data is highly volatile (updates are much cheaper), but less flexible. If it meets your requirements perfectly I'd use it, but I've tried a few times and end up bumping into limitations. The materialized path approach as advertised scales extremely well, except for the insert problem or if your trees are extremely deep because the index gets huge.

First, the planner knows zero about the connectby function. This can lead to planning problems for complicated joins that it interfaces with. The materialized path approach is mostly wide open to the planner. I find myself often writing complex queries, and absolutely detest pushing data issues into the application layer. This is why I favor views over functions generally when there is a choice. The view abstracts the interface a bit for application without hobbing the planner. The view abstraction approach I used is what really sells the idea, btw.

Secondly, connectby takes a single key as a parameter only. This can lead to very awkward situations if you need to generalize the input over a range of records (give me all the families for items in range of (a,b)), and the query will be slow because the table is unordered. Since the materialized path approach orders the table over the key, the query is simple and fast.

Third, the indexing strategies are different...all operations traversing the tree with the parent/child index relationship require recursion. The connectby function hides the recursion from you but the recursion is still there. This means that for many operations the net result will be less efficient than when the path is handed to you.

Fourth, while it's kind of a pain because you have to create an opclass (and other reasons), the array approach at least opens the door to composite keys.

Fifth, connectby gives you a delimited text string, which has to be parsed and wrapped in a view (my examples above do this already). You also need to wrap if you want to be able to traverse up the tree in a single operation. I would advise doing this...it's probably not a good idea to have the application invoke the connectby function directly.

In fact, sacrificing a little bit of cpu overhead, I could write the parent child relationship abstracted into a few recursive sql functions that traverse the tree. While this can get dangerous as you only want to do really heavy recursion in C as iteration, it's probably good enough for most circumstances and sidesteps some of the land mines the planner throws you with connectby.

Thanks again for commenting. I really should write some benchmarks demonstrating the various techniques. Maybe, if there is a enough interest, I'll take the time to do that.

merlin