Several Postgres partitioning questions (hierarchical partitioning, HASH, PK order)

All we need is an easy explanation of the problem, so here it is.

I’m pretty new to Postgres and to RDBMS partitioning. I have a few related questions about the performance benefits of partitioning.

Background: I am trying to fix slow batch queries on a 4 column table with over 300 million rows. The PK is on all columns and the 3 columns that aren’t at the beginning of the PK also have single column indexes.

The indexes are growing out of control and are collectively larger than all RAM, and 3x the size of the underlying data. Batches regularly read and write a couple hundred million rows and they are slow.

What I have already done/understand: I’ve analyzed the code and determined that only the first and last columns are used in SELECT WHERE clauses. Other than INSERTs, no other queries hit this table (besides a DELETE that will be replaced with partition dropping). So I already know I can drop two of the three indexes since they are unused, which will lead to reduced index size, fewer indexes, and hopefully improved INSERT/SELECT performance.

In addition to dropping the unused indexes, I am going to use partitioning to replace a lengthy DELETE statement (which has a WHERE on the two middle columns) with, instead, LIST partitioning on those columns so I can DROP partitions.

Where I have questions: My proposed LIST partition also breaks up the table somewhat, so the partitions are no more than 1/4 the size of the original table. However, even that size is very large. I am considering adding a second hierarchical partition layer with a HASH partition on the 4th column, which could further considerably reduce partition size. My hope is that this would further reduce RAM consumption and/or improve performance in other ways.

However, I have a few questions about this:

  1. While I would be reducing partition size, my queries actually select on nearly every value in the column where I would be using HASH partitioning. I don’t know the distribution of those selected values, so there might be hot and cold partitions, but few totally cold partitions. In this case, would HASH partitioning even help, or would it be no improvement over the single level of partitioning? It’s kind of confusing to me, because since hashing is random by design, I would expect this to be a problem whenever HASH is used.

The only reason I think it might help is by reducing the height of the index trees, since the indexes exist within partitions. But all indexes would still be used. Basically what I’m wondering is, is it still an improvement to reduce index height even if all the indexes may be used, versus one huge index? Is there a best practice as to when HASH might help?

  1. The upper level of the partition hierarchy is a LIST on the two middle columns because that’s how my DELETE (that I am replacing) is defined. But those columns are never used in WHERE clauses. Is it a problem if the top partition hierarchy level isn’t even used in WHERE clauses? I could reverse the order so the HASH is the parent partition, since I actually select by that, but then I’d need to drop multiple child partitions when I DROP instead of one parent partition. Should I just bite the bullet and switch the order?

  2. Is it best for the order of the partition hierarchy to match the order of the PK? In other words, if my parent partitioning is LIST(col2, col3) and my child partitioning is HASH(col4), should I change my PK from 1,2,3,4 to 2,3,4,1 to match, or does it not matter?

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

I don’t see any point in the HASH partitioning layer here. You might reduce the index tree height by one level, but only by adding another level to the partitioning "tree" height, which is not going to be a positive trade off.

You could get a benefit for doing index maintenance, if you sorted the bulk rows by their future hash partition before inserting them. But PostgreSQL offers no facility for doing this pre-sort, nor even one for making it easy for you do it yourself.

Indeed I don’t really see the point of HASH partitioning in general. I think the best case for it is when you have master table and detail table, and both are partitioned by the same key, which is the join key. Then you can do partition-wise joins. But this doesn’t seem to apply to you.

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from or, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply