COPY BINARY file format proposal 
 COPY BINARY file format proposal

Quote:

> Given all that, here is a proposed spec for the header:
>  ...
> Comments?

I've been thinking about this.  

I'd like to see a timestamp for when the image was created, and a
128-byte comment field to allow annotations, even after the fact.
(I don't think we're pressed for space, right?)  The more chances
that you don't have to actually load the file to find out what's
in it, the better.

(I have also suggested, in private mail, that the "header length"
field should be the length of the whole header, not just whatever
was added on in versions 2..n.  Tom didn't agree.)

Nathan Myers



Fri, 30 May 2003 08:38:35 GMT
 COPY BINARY file format proposal

Quote:


> > Given all that, here is a proposed spec for the header:
> >  ...
> > Comments?

> (I have also suggested, in private mail, that the "header length"
> field should be the length of the whole header, not just whatever
> was added on in versions 2..n.  Tom didn't agree.)

I had the same thought, but didn't get around to posting it.

Ross



Fri, 30 May 2003 09:10:12 GMT
 COPY BINARY file format proposal

Quote:

> I'd like to see a timestamp for when the image was created, and a
> 128-byte comment field to allow annotations, even after the fact.

Both seem like reasonable options.  If you don't mind, however,
I'd suggest that they be left for inclusion as chunks in the header
extension area, rather than nailing them down in the fixed header.

The advantage of handling a comment that way is obvious: it needn't
be fixed-length.  As for the timestamp, handling it as an optional
chunk would allow graceful substitution of a different timestamp
format, which we'll need when 2038 begins to loom.

Basically what I want to do at the moment is get a minimal format
spec nailed down for 7.1.  There'll be time for neat extras later
as long as we get it right now --- but there's not a lot of time
for extras before 7.1.

                        regards, tom lane



Fri, 30 May 2003 09:55:40 GMT
 COPY BINARY file format proposal

Quote:

> > I'd like to see a timestamp for when the image was created, and a
> > 128-byte comment field to allow annotations, even after the fact.

> Both seem like reasonable options.  If you don't mind, however,
> I'd suggest that they be left for inclusion as chunks in the header
> extension area, rather than nailing them down in the fixed header.

> The advantage of handling a comment that way is obvious: it needn't
> be fixed-length.  As for the timestamp, handling it as an optional
> chunk would allow graceful substitution of a different timestamp
> format, which we'll need when 2038 begins to loom.

> Basically what I want to do at the moment is get a minimal format
> spec nailed down for 7.1.  There'll be time for neat extras later
> as long as we get it right now --- but there's not a lot of time
> for extras before 7.1.

These have the look of creeping-featurism to me.

--
  Bruce Momjian                        |  http://candle.pha.pa.us

  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026



Fri, 30 May 2003 10:10:50 GMT
 COPY BINARY file format proposal

Quote:
>Recovering the data on a machine
>of different endianness is a project for future data archeologists.

It's frightening to think that in 1000 years time people will be deducing
things about our society from the way we stored data.

Quote:

>Tell you the truth, I don't believe in file-format version numbers at
>all...
>(RFC 2083, esp section 12.13) --- the versioning philosophy described
>there is largely yours truly's.

Seems to be a much better approach; (non)critical chunks & chunk types are
much more portable.

----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \

Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/



Sat, 31 May 2003 23:08:56 GMT
 COPY BINARY file format proposal

Quote:

>> How about a CRC? ;-P

>I take it from the smiley that you're not serious, but actually it seems
>like it might not be a bad idea.  I could see appending a CRC to each
>tuple record.  Comments anyone?

More a matter of not thinking it was important enough to worry about, and
not really wanting to drag the MD5/MD4/CRC64/etc debate into this one.
Having said that, I think it would be a nice-to-have, like CRCs on db pages
- in the latter case I'd really like VACUUM (or another utility) to be able
to report 'invalid pages' on a nightly basis (or, better still, not report
them).

Quote:
>Attached is the current state of the proposal.  I haven't added a CRC
>field but am willing to do so if that's the consensus.

Sounds good to me. I'm not sure you need it on a per-tuple basis - but it
can't hurt, assuming it's cheap to generate. Does the backend send tuples
or blocks of tuples? If the latter, and if CRC is expensive, then maybe 1
CRC for each group of tuples.

Also having a CRC on a per-tuple basis will prevent getting out of sync
with the data, and make partial data recovery

Quote:
>Next 4 bytes: length of remainder of header, not including self.  In
>the initial version this will be zero, and the first tuple follows
>immediately.  Future changes to the format might allow additional data
>to be present in the header.  A reader should silently ignore any header
>extension data it does not know what to do with.

Don't you need to at least define how to specify non-essential chunks,
since the flags are not to be used to describe the header extensions. Or
are we going to make the initial version barf when it encounters any header
extension?

Quote:
>Tuples
>------

>Each tuple begins with an int16 count of the number of fields in the
>tuple.  (Presently, all tuples in a table will have the same count, but
>that might not always be true.)

Another option would be to:

- dump the field sizes in the header somewhere (they will all be the same),
- for each row output a bitmap of non-null fields, followed by the data.
- varlena would have a -1 length in the header, and an int32 length in the row.

This is harder to read and to write, but saves space, if that is desirable.
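
A minimal sketch of what such a row encoding might look like, assuming the per-column sizes have already been read from the header. The function name `encode_row`, the bit order within the bitmap (column 0 in the least significant bit), and the choice that the int32 length counts only the data bytes are all assumptions made for illustration; nothing in the thread fixes them.

```python
import struct

def encode_row(values, sizes, byte_order="<"):
    """Encode one row as a non-null bitmap followed by the field data.

    values: list of bytes objects, or None for NULL.
    sizes:  per-column sizes as they would appear in the header
            (a positive byte count, or -1 for varlena).
    """
    bitmap = bytearray((len(values) + 7) // 8)
    body = []
    for i, (value, size) in enumerate(zip(values, sizes)):
        if value is None:
            continue  # NULL: bit stays 0, no data emitted
        bitmap[i // 8] |= 1 << (i % 8)
        if size == -1:
            # varlena: the row carries an int32 length before the data
            body.append(struct.pack(byte_order + "i", len(value)))
        elif len(value) != size:
            raise ValueError(f"column {i}: expected {size} bytes")
        body.append(value)
    return bytes(bitmap) + b"".join(body)

row = encode_row([b"\x01\x00\x00\x00", None, b"hi"], [4, 4, -1])
```

Compared with a per-field length word, this saves two bytes per fixed-length field at the cost of needing the header before any row can be decoded.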

Quote:

>For non-NULL fields, the reader can check that the typlen matches the
>expected typlen for the destination column.  This provides a simple
>but very useful check that the data is as expected.

CRC seems like the go here...

----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \

Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/



Sun, 01 Jun 2003 00:12:44 GMT
 COPY BINARY file format proposal

Quote:

> More a matter of not thinking it was important enough to worry about, and
> not really wanting to drag the MD5/MD4/CRC64/etc debate into this one.

I'd just as soon not drag that debate in here either ;-) ... but once we
settle on an appropriate CRC method for WAL it's easy enough to call the
same routine for this code.

Quote:
> Sounds good to me. I'm not sure you need it on a per-tuple basis - but it
> can't hurt, assuming it's cheap to generate. Does the backend send tuples
> or blocks of tuples? If the latter, and if CRC is expensive, then maybe 1
> CRC for each group of tuples.

Extending the CRC over multiple tuples would just complicate life,
I think.  The per-byte cost is the biggest factor, so you don't really
save all that much.
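
A minimal sketch of a per-tuple CRC, assuming CRC-32 as computed by Python's `zlib.crc32` stands in for whatever routine is eventually chosen for WAL. The helper names `append_crc` and `verify_crc`, and the placement of the checksum as a trailing 4-byte word in the source's byte order, are hypothetical.

```python
import struct
import zlib

def append_crc(record: bytes, byte_order: str = "<") -> bytes:
    """Return the tuple record followed by a 4-byte CRC-32 of its bytes."""
    crc = zlib.crc32(record) & 0xFFFFFFFF
    return record + struct.pack(byte_order + "I", crc)

def verify_crc(framed: bytes, byte_order: str = "<") -> bytes:
    """Check the trailing CRC and return the record without it."""
    record, tail = framed[:-4], framed[-4:]
    (stored,) = struct.unpack(byte_order + "I", tail)
    if zlib.crc32(record) & 0xFFFFFFFF != stored:
        raise ValueError("CRC mismatch in tuple record")
    return record

framed = append_crc(b"tuple-bytes")
```

The checksum is computed over every byte of the record either way, so grouping several tuples under one CRC saves only the 4-byte trailer per tuple, not the per-byte work.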

Quote:
>> Next 4 bytes: length of remainder of header, not including self.  In
>> the initial version this will be zero, and the first tuple follows
>> immediately.  Future changes to the format might allow additional data
>> to be present in the header.  A reader should silently ignore any header
>> extension data it does not know what to do with.
> Don't you need to at least define how to specify non-essential chunks,
> since the flags are not to be used to describe the header extensions. Or
> are we going to make the initial version barf when it encounters any header
> extension?

No, the initial version will just silently skip the whole header
extension; it's defined so that that's a legal behavior (everything
in the header extension is inessential).  We can come back and define
a format for the entries in the header extension area when we need some.

Quote:
> Another option would be to:
> - dump the field sizes in the header somewhere (they will all be the same),
> - for each row output a bitmap of non-null fields, followed by the data.
> - varlena would have a -1 length in the header, and an int32 length in the row.

That would work if you are willing to assume that all the tuples indeed
always have the same set of fields --- you're not, for example, doing an
inheritance-tree-walk "COPY FROM foo*".  But Chris Bitmead still has a
gleam in his eye about that sort of thing, so we might want it someday.
I think it's worth a small amount of extra space to avoid that
assumption, especially since it simplifies the code too.

                        regards, tom lane



Sun, 01 Jun 2003 00:13:59 GMT
 COPY BINARY file format proposal

Quote:

> How about a CRC? ;-P

I take it from the smiley that you're not serious, but actually it seems
like it might not be a bad idea.  I could see appending a CRC to each
tuple record.  Comments anyone?

You seemed to like the PNG philosophy of using feature flags rather than
a version number.  Accordingly, I propose dropping the version number
field in favor of a flags word.  (Which was needed anyway, because I had
*again* forgotten about COPY WITH OIDS :-(.)

Attached is the current state of the proposal.  I haven't added a CRC
field but am willing to do so if that's the consensus.

                        regards, tom lane

COPY BINARY file format proposal

The objectives of this change are:

1. Get rid of the tuple count at the front of the file.  This requires
an extra pass over the relation, which is a lot more trouble than the
count is worth.  Use an explicit EOF marker instead.
2. Send fields of a tuple individually, instead of dumping out raw tuples
(complete with alignment padding and so forth) as is currently done.
This is mainly to simplify TOAST-related processing.
3. Make the format somewhat self-identifying, so that the reader has at
least some chance of detecting it when the data doesn't match the table
it's supposed to be loaded into.

The proposed format consists of a file header, zero or more tuples, and a
file trailer.

File Header
-----------

The proposed file header consists of 24 bytes of fixed fields, followed
by a variable-length header extension area.

Signature: 12-byte sequence "PGBCOPY\n\377\r\n\0" --- note that the null
is a required part of the signature.  (The signature is designed to allow
easy identification of files that have been munged by a non-8-bit-clean
transfer.  The proposed signature will be changed by newline-translation
filters, dropped nulls, dropped high bits, or parity changes.)
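
A minimal sketch of the signature check, together with simulations of the kinds of damage the signature is meant to expose. The names `check_signature`, `newline_translated`, `high_bits_dropped`, and `nulls_dropped` are hypothetical helpers, not part of the proposal.

```python
# The 12-byte signature from the proposal: "PGBCOPY\n\377\r\n\0"
SIGNATURE = b"PGBCOPY\n\xff\r\n\x00"

def check_signature(data: bytes) -> bool:
    """Return True if data begins with the intact 12-byte signature."""
    return data[:len(SIGNATURE)] == SIGNATURE

# Simulated damage from transfers that are not 8-bit clean.
def newline_translated(data: bytes) -> bytes:
    return data.replace(b"\r\n", b"\n")      # CRLF -> LF filter

def high_bits_dropped(data: bytes) -> bytes:
    return bytes(c & 0x7F for c in data)     # 7-bit channel

def nulls_dropped(data: bytes) -> bytes:
    return data.replace(b"\x00", b"")        # NUL-stripping transport

header = SIGNATURE + b"...rest of file..."
damaged = [f(header) for f in (newline_translated, high_bits_dropped, nulls_dropped)]
```

Each simulated transfer alters at least one of the 12 bytes, so a plain byte comparison is enough to reject the file.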

Integer layout field: int32 constant 0x01020304 in source's byte order.
Potentially, a reader could engage in byte-flipping of subsequent fields
if the wrong byte order is detected here.
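
A minimal sketch of how a reader might use this field, assuming Python's `struct` module for decoding. The function name `detect_byte_order` is hypothetical; it returns the `struct` prefix to use for all later integer fields.

```python
import struct

LAYOUT = 0x01020304  # constant written in the source's byte order

def detect_byte_order(field: bytes) -> str:
    """Return '>' (big-endian) or '<' (little-endian) for the writer."""
    if struct.unpack(">I", field)[0] == LAYOUT:
        return ">"
    if struct.unpack("<I", field)[0] == LAYOUT:
        return "<"
    # Neither plain order matches (for example a mixed-endian layout).
    raise ValueError("unrecognized integer layout field")

order = detect_byte_order(b"\x04\x03\x02\x01")  # bytes as a little-endian writer stores them
```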

Flags field: a 4-byte bit mask to denote important aspects of the file
format.  Bits are numbered from 0 (LSB) to 31 (MSB) --- note that this
field is stored with source's endianness, as are all subsequent integer
fields.  Bits 16-31 are reserved to denote critical file format issues;
a reader should abort if it finds an unexpected bit set in this range.
Bits 0-15 are reserved to signal backwards-compatible format issues;
a reader should simply ignore any unexpected bits set in this range.
Currently only one flag bit is defined, and the rest must be zero:
        Bit 16: if 1, OIDs are included in the dump; if 0, not
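
A minimal sketch of the flag handling described above. The constant and function names (`FLAG_HAS_OIDS`, `CRITICAL_MASK`, `check_flags`) are hypothetical; only the bit assignments come from the proposal.

```python
FLAG_HAS_OIDS = 1 << 16        # bit 16: OIDs are included in the dump
CRITICAL_MASK = 0xFFFF0000     # bits 16-31: critical format issues
KNOWN_CRITICAL = FLAG_HAS_OIDS # critical bits this reader understands

def check_flags(flags: int) -> bool:
    """Return True if OIDs are included; abort on unknown critical bits.

    Unknown bits in the low half (0-15) are ignored on purpose.
    """
    unknown = flags & CRITICAL_MASK & ~KNOWN_CRITICAL
    if unknown:
        raise ValueError(f"unsupported critical flag bits: {unknown:#010x}")
    return bool(flags & FLAG_HAS_OIDS)

has_oids = check_flags(FLAG_HAS_OIDS | 0x0001)  # an unknown low bit is tolerated
```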

Next 4 bytes: length of remainder of header, not including self.  In
the initial version this will be zero, and the first tuple follows
immediately.  Future changes to the format might allow additional data
to be present in the header.  A reader should silently ignore any header
extension data it does not know what to do with.
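
A minimal sketch of a reader skipping the extension area. The function name `skip_header_extension` is hypothetical, and treating the length as unsigned is an assumption; the proposal only says the field is 4 bytes.

```python
import io
import struct

def skip_header_extension(stream, byte_order: str = "<") -> int:
    """Read the 4-byte extension length, skip that many bytes, return the length."""
    raw = stream.read(4)
    if len(raw) != 4:
        raise EOFError("file ends inside fixed header")
    (ext_len,) = struct.unpack(byte_order + "I", raw)
    skipped = stream.read(ext_len)
    if len(skipped) != ext_len:
        raise EOFError("file ends inside header extension")
    return ext_len

# A stream positioned just after the flags field, with a 3-byte extension.
stream = io.BytesIO(struct.pack("<I", 3) + b"abc" + b"TUPLES")
skipped_len = skip_header_extension(stream)
remaining = stream.read()
```

After the call the stream is positioned at the first tuple, whether or not the reader understood anything in the extension.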

Note that I envision the content of the header extension area as being a
sequence of self-identifying chunks (but the specific design of same is
postponed until we need 'em).  The flags field is not intended to tell
readers what is in the extension area.

This design allows for both backwards-compatible header additions (add
header extension chunks, or set low-order flag bits) and non-backwards-
compatible changes (set high-order flag bits to signal such changes,
and add supporting data to the extension area if needed).

Tuples
------

Each tuple begins with an int16 count of the number of fields in the
tuple.  (Presently, all tuples in a table will have the same count, but
that might not always be true.)  Then, repeated for each field in the
tuple, there is an int16 typlen word possibly followed by field data.
The typlen field is interpreted thus:

        Zero            Field is NULL.  No data follows.

        > 0             Field is a fixed-length datatype.  Exactly N
                        bytes of data follow the typlen word.

        -1              Field is a varlena datatype.  The next four
                        bytes are the varlena header, which contains
                        the total value length including itself.

        < -1            Reserved for future use.

For non-NULL fields, the reader can check that the typlen matches the
expected typlen for the destination column.  This provides a simple
but very useful check that the data is as expected.
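
A minimal sketch of reading one field according to the typlen rules above, including the optional check against the destination column. The names `read_exact` and `read_field` are hypothetical, and returning the varlena payload without its 4-byte header is a choice made for this example.

```python
import io
import struct

def read_exact(stream, n: int) -> bytes:
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("unexpected end of data")
    return data

def read_field(stream, byte_order: str = "<", expected_typlen=None):
    """Return None for a NULL field, otherwise the field's data bytes."""
    (typlen,) = struct.unpack(byte_order + "h", read_exact(stream, 2))
    if typlen == 0:
        return None                      # NULL, no data follows
    if expected_typlen is not None and typlen != expected_typlen:
        raise ValueError(f"typlen {typlen} != expected {expected_typlen}")
    if typlen > 0:
        return read_exact(stream, typlen)  # fixed-length datatype
    if typlen == -1:
        # varlena header holds the total length, including the header itself
        (total,) = struct.unpack(byte_order + "i", read_exact(stream, 4))
        if total < 4:
            raise ValueError("varlena length smaller than its own header")
        return read_exact(stream, total - 4)
    raise ValueError(f"reserved typlen {typlen}")

data = (struct.pack("<h", 4) + b"\x2a\x00\x00\x00"      # fixed 4-byte field
        + struct.pack("<h", 0)                           # NULL
        + struct.pack("<hi", -1, 9) + b"hello")          # varlena, 4 + 5 bytes
stream = io.BytesIO(data)
fields = [read_field(stream, "<", t) for t in (4, 4, -1)]
```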

There is no alignment padding or any other extra data between fields.
Note also that the format does not distinguish whether a datatype is
pass-by-reference or pass-by-value.  Both of these provisions are
deliberate: they might help improve portability of the files (although
of course endianness and floating-point-format issues can still keep
you from moving a binary file across machines).

If OIDs are included in the dump, the OID field immediately follows the
field-count word.  It is a normal field except that it's not included
in the field-count.  In particular it has a typlen --- this will allow
handling of 4-byte vs 8-byte OIDs without too much pain, and will allow
OIDs to be shown as NULL if we someday allow OIDs to be optional.
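
A minimal sketch of the writer side, showing where the OID field lands and that it is left out of the field count. The names `encode_field` and `encode_tuple` are hypothetical, and the 4-byte unsigned OID is an assumption for the example.

```python
import struct

def encode_field(value, typlen: int, byte_order: str = "<") -> bytes:
    """value is None (NULL) or bytes; typlen is the column's typlen (> 0 or -1)."""
    if value is None:
        return struct.pack(byte_order + "h", 0)
    if typlen == -1:
        # varlena header counts its own 4 bytes
        return struct.pack(byte_order + "hi", -1, len(value) + 4) + value
    if len(value) != typlen:
        raise ValueError(f"expected {typlen} bytes, got {len(value)}")
    return struct.pack(byte_order + "h", typlen) + value

def encode_tuple(fields, oid=None, byte_order: str = "<") -> bytes:
    """fields: list of (value, typlen) pairs. The OID is not counted."""
    out = [struct.pack(byte_order + "h", len(fields))]
    if oid is not None:
        out.append(encode_field(struct.pack(byte_order + "I", oid), 4, byte_order))
    out.extend(encode_field(v, t, byte_order) for v, t in fields)
    return b"".join(out)

record = encode_tuple([(b"\x01\x00\x00\x00", 4), (None, 4), (b"hi", -1)], oid=7)
```

Because the OID carries its own typlen, a later switch to 8-byte OIDs would change only the `encode_field` call for the OID, not the layout of the rest of the tuple.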

File Trailer
------------

The file trailer consists of an int16 word containing -1.  This is
easily distinguished from a tuple's field-count word.

A reader should report an error if a field-count word is neither -1
nor the expected number of columns.  This provides a pretty strong
check against somehow getting out of sync with the data.
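
A minimal sketch of the read loop implied by the trailer and the field-count check. The generator name `iter_tuples` is hypothetical, and field decoding is delegated to a caller-supplied `read_field(stream, byte_order)` callable so that the example stays focused on the count handling.

```python
import io
import struct

def iter_tuples(stream, ncols, read_field, byte_order="<", has_oids=False):
    """Yield (oid, fields) per tuple until the -1 trailer is seen."""
    while True:
        raw = stream.read(2)
        if len(raw) != 2:
            raise EOFError("file ended without a trailer")
        (count,) = struct.unpack(byte_order + "h", raw)
        if count == -1:
            return                       # file trailer
        if count != ncols:
            raise ValueError(f"field count {count}, expected {ncols}")
        # The OID field, when present, follows the count but is not counted.
        oid = read_field(stream, byte_order) if has_oids else None
        yield oid, [read_field(stream, byte_order) for _ in range(count)]

# Stand-in field reader for the demo: every field is a 4-byte fixed-length value.
def read_fixed4(stream, byte_order):
    stream.read(2)                       # skip the typlen word
    return stream.read(4)

def demo_tuple(a, b):
    return struct.pack("<h", 2) + struct.pack("<h", 4) + a + struct.pack("<h", 4) + b

data = demo_tuple(b"aaaa", b"bbbb") + demo_tuple(b"cccc", b"dddd") + struct.pack("<h", -1)
rows = list(iter_tuples(io.BytesIO(data), 2, read_fixed4))
```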



Sun, 01 Jun 2003 00:19:07 GMT
 COPY BINARY file format proposal

Quote:
> I take it from the smiley that you're not serious, but actually it seems
> like it might not be a bad idea.  I could see appending a CRC to each
> tuple record.  Comments anyone?

Let's not get paranoid. If you compress the output the file will get checksummed
anyway. I am against a CRC in binary copy output :-)

Andreas



Sun, 01 Jun 2003 01:35:09 GMT
 COPY BINARY file format proposal

Quote:


> > I'd like to see a timestamp for when the image was created, and a
> > 128-byte comment field to allow annotations, even after the fact.

> Both seem like reasonable options.  If you don't mind, however,
> I'd suggest that they be left for inclusion as chunks in the header
> extension area, rather than nailing them down in the fixed header.

> The advantage of handling a comment that way is obvious: it needn't
> be fixed-length.  As for the timestamp, handling it as an optional
> chunk would allow graceful substitution of a different timestamp
> format, which we'll need when 2038 begins to loom.

I don't know if you get the point of the fixed-size comment field.  
The idea was that a comment could be poked into an existing COPY
image, after it was written.  A variable-size comment field in an
already-written image might leave no space to poke in anything.  A
variable-size comment field with a required minimum size would
satisfy both needs, at some cost in complexity.  

Quote:
> Basically what I want to do at the moment is get a minimal format
> spec nailed down for 7.1.  There'll be time for neat extras later
> as long as we get it right now --- but there's not a lot of time
> for extras before 7.1.

I understand.

Nathan Myers



Sun, 01 Jun 2003 07:06:52 GMT
 COPY BINARY file format proposal

Quote:

> I don't know if you get the point of the fixed-size comment field.  
> The idea was that a comment could be poked into an existing COPY
> image, after it was written.

Yes, I did get the point ...

Quote:
> A variable-size comment field in an
> already-written image might leave no space to poke in anything.  A
> variable-size comment field with a required minimum size would
> satisfy both needs, at some cost in complexity.  

This strikes me as a perfect argument for a variable-size field.
If you want to leave N bytes for a future poked-in comment, you do that.
If you don't, then not.  Leaving 128 bytes (or any other frozen-by-the-
file-format number) is guaranteed to satisfy nobody.
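
A minimal sketch of the idea, assuming a comment area whose reserved size is chosen by whoever writes the file. The chunk layout is not defined anywhere in the proposal, so the helper names `make_comment_area` and `poke_comment`, the NUL padding, and the way the offset is located are all hypothetical.

```python
def make_comment_area(reserved: int, text: bytes = b"") -> bytes:
    """Build a comment area of exactly `reserved` bytes, NUL-padded."""
    if len(text) > reserved:
        raise ValueError("comment does not fit in reserved space")
    return text + b"\x00" * (reserved - len(text))

def poke_comment(image: bytearray, offset: int, reserved: int, text: bytes) -> None:
    """Overwrite the reserved area in place; the file size does not change."""
    image[offset:offset + reserved] = make_comment_area(reserved, text)

# The writer chose to reserve 8 bytes; a later tool fills them in.
image = bytearray(b"HDR" + make_comment_area(8) + b"DATA")
poke_comment(image, 3, 8, b"note")
```

A writer that wants no room for later annotation reserves zero bytes, and one that wants a long comment reserves more, which is the flexibility a fixed 128-byte field cannot offer.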

                        regards, tom lane



Sun, 01 Jun 2003 13:26:55 GMT
 