Inefficient handling of LO-restore + Patch 
Author Message
 Inefficient handling of LO-restore + Patch

I've detected that the restoring of large objects may consume huge amounts of
diskspace when using unusual blocksizes (e.g. 32KB). My setup is Postgresql-7.2.1
+ 32KB blocks + LOBLKSIZE 16KB, a unusual combination I think, , because this setup gave
the very best performance. I wanted to restore a  database containing 2 gigabytes of
large objects, and noticed that it took around 6 gigabytes of diskspace to finish.
After it finished I ran "VACUUM FULL VERBOSE pg_largeobject",
and had around 140000 of live tuples, and around 480000 of dead tuples (I don't remember the exact
values, but I think there were 3 times dead tuples to live tuples).

I checked the pg_dump sources and found out that data is writen in 4KB chunks to the large object.
Since in my database the LO tuples are 16KB each, that would mean:
1. write 4KB -> have 1 live 4KB tuple
2. write 4KB -> 1 live 8KB tuple and 1 dead 4KB tuple
3. write 4KB -> 1 live 12KB tuple and 2 dead tuples
3. write 4KB -> 1 live 16KB tuple and 3 dead tuples

So creating a 16KB chunk took 16+12+8+4 => 40KB of diskspace, so recovering 2GB large objects
takes around 40/16 * 2 => 5GB diskspace and leaves 3 times the number of dead tuples (supposing
all LO's have sizes which are multples of 16KB).

I've written a patch which buffers LO's in 32KB blocks and tested again, and had 140000 live tuples
and nearly no dead tuples (around 10000, I'm still not sure where they're coming from).

Is there a better way to fix this? Can I post the patch to this list (~150 lines).And I did not find out how I can detect the large object chunksize, either from getting it from the headers (include "storage/large_object.h" did not work) or how to get it from the database I restore to. Any hints?

Best regards,
        Mario Weilguni

---------------------------(end of broadcast)---------------------------



Mon, 27 Sep 2004 16:57:15 GMT
 
 [ 1 post ] 

 Relevant Pages 

1. Inefficient handling of LO-restore + Patch

2. Inefficient handling of LO-restore + Patch

3. pgsql/contrib/lo (lo.c lo.sql.in)

4. pgsql/contrib/lo (lo.c lo.sql.in)

5. pgsql/contrib/lo lo.c

6. pgsql-server/contrib/lo lo.sql.in

7. [PATCH] One more (was Re: [PATCH] One line patch for

8. Patch to make pg_hba.conf handle virtualhost

9. 7.4 patch for cube to handle extra_float_digits

10. Wanted: patch for Oracle 6 (DOS) to handle startup problem with >16M RAM

11. ODBC Patch to deal with poor boolean handling.

12. Patch to make pg_hba.conf handle virtualhost access control and samehost keyword


 
Powered by phpBB® Forum Software