Understanding Record Size and Record Delimiters

The seemingly simple concept of record size can cause confusion when records are converted from one computer to another.  This discussion is about the differences in fixed-length records between mainframes and PCs.  Variable-length records are not considered here.
 

Fixed-Length Records Defined

A file that has "Fixed-Length Records" is a file where the records are of a fixed (unchanging) size.  All records in the file are the same size.  Such a file will also have fixed-length fields.  Each field will have a predetermined and unchanging size, set when the record layout is designed, and the sum of the field sizes will add up to the record size.  If the data stored in a given field contains fewer characters than the defined size, the rest of the field will be filled with spaces, or some other character.  For example, if the "LASTNAME" field in a file is set to 15 characters, and the last name is "Smith" there will be 10 spaces after the name, to fill out the field.
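The padding rule can be illustrated with a short Python sketch.  The layout here is hypothetical: only the 15 character LASTNAME field comes from the example above, and the FIRSTNAME and ZIP fields, their sizes, and the function name build_record are assumptions made for illustration.

```python
# Hypothetical layout for illustration: (field name, size in characters).
# Only LASTNAME (15) comes from the article; the other fields are assumed.
LAYOUT = [("LASTNAME", 15), ("FIRSTNAME", 10), ("ZIP", 5)]

# The record size is the sum of the field sizes.
RECORD_SIZE = sum(width for _, width in LAYOUT)


def build_record(values, layout=LAYOUT, pad=" "):
    """Pad each value to its field size and join the fields into one record."""
    parts = []
    for name, width in layout:
        value = values.get(name, "")
        if len(value) > width:
            raise ValueError(f"{name} value is longer than {width} characters")
        parts.append(value.ljust(width, pad))  # fill the rest of the field
    return "".join(parts)


record = build_record({"LASTNAME": "Smith", "FIRSTNAME": "Ann", "ZIP": "01886"})
print(repr(record[:15]))  # "Smith" followed by 10 spaces
print(len(record))        # always RECORD_SIZE, whatever the data
```

Every record built this way has the same length, no matter how short the values are.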

Field Delimiters

Fixed-length records don't have field delimiters.  This is true on both mainframe and PC platforms.  Since the fields are always the same size, they are always in the same location in the record, and no delimiter is needed to locate any field.
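As a rough sketch of locating fields by position alone, the Python below slices a record using offsets derived from the field sizes.  The layout and the function names field_offsets and parse_record are hypothetical, chosen only for this example.

```python
# Hypothetical layout: (field name, size in characters).
LAYOUT = [("LASTNAME", 15), ("FIRSTNAME", 10), ("ZIP", 5)]


def field_offsets(layout):
    """Return {name: (start, end)} with 0-based start and exclusive end."""
    offsets = {}
    position = 0
    for name, width in layout:
        offsets[name] = (position, position + width)
        position += width
    return offsets


def parse_record(record, layout=LAYOUT):
    """Slice each field out of the record and drop the trailing pad spaces."""
    return {
        name: record[start:end].rstrip(" ")
        for name, (start, end) in field_offsets(layout).items()
    }


sample = "Smith".ljust(15) + "Ann".ljust(10) + "01886"
print(field_offsets(LAYOUT)["ZIP"])  # ZIP always occupies the same positions
print(parse_record(sample))
```

No character in the record is ever inspected to find a field boundary; the layout alone determines where each field starts and ends.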

Record Delimiters

As we will see below, fixed-length records sometimes have record delimiters.  A "record delimiter" is a character or set of characters that is used to mark the end of a record.  If records vary in size, as they do in a variable-length file, this is necessary to be able to separate records.  A record delimiter must be a code that is not found in the data; the only place it will be found is at the end of a record, so every time the computer finds that code, it knows it has reached the end of the record.  The most common record delimiters are the carriage-return (CR), the line-feed (LF), and the carriage-return line-feed (CR-LF) combination.

Accessing Fixed-Length Records

Because the size of each fixed-length record is known in advance, you don't really need a record delimiter to locate any record in a fixed-length file.  For example, if your records are 100 bytes in size, then the first record begins at byte 1 of the file, record 2 begins at 101, record 3 begins at 201, etc., and each record is always 100 bytes in size.
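The arithmetic above can be written as a one-line formula.  This is a minimal Python sketch; the function name record_start is an assumption, and it counts bytes from 1 to match the numbering used in the example.

```python
def record_start(record_number, record_size):
    """Return the 1-based byte position where a 1-based record begins."""
    if record_number < 1:
        raise ValueError("record numbers start at 1")
    return (record_number - 1) * record_size + 1


for number in (1, 2, 3):
    print(number, record_start(number, 100))  # 1 -> 1, 2 -> 101, 3 -> 201
```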

Technically, fixed-length records can be accessed perfectly well without delimiters.  However, many PC programs expect record delimiters, even on fixed-length records, and won't work properly without them.

Fixed-length records take more disk space than variable-length records, but they have advantages.  For example, there is no need to read the record byte-by-byte, searching for a record delimiter, and there is no danger of a rogue CR falsely indicating the end of the record.  But more importantly, you can locate any record in the file by a simple calculation, which makes random access possible and efficient, as you can jump to any record in the file without reading through the previous records.  You cannot do that with variable-length records unless you maintain a separate index of record positions.
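Random access of this kind might look like the following Python sketch.  It assumes 100 byte records with no delimiters, and uses an in-memory stream in place of a real file; the function name read_record and the demonstration data are made up for the example.

```python
import io

RECORD_SIZE = 100  # assumed: bytes per record, no delimiter


def read_record(stream, record_number, record_size=RECORD_SIZE):
    """Return one 1-based record without reading the records before it."""
    stream.seek((record_number - 1) * record_size)  # seek offsets are 0-based
    data = stream.read(record_size)
    if len(data) != record_size:
        raise EOFError(f"record {record_number} is missing or incomplete")
    return data


# Build a small in-memory file of five space-padded records.
records = [f"RECORD {n}".encode("ascii").ljust(RECORD_SIZE) for n in range(1, 6)]
demo_file = io.BytesIO(b"".join(records))

fourth = read_record(demo_file, 4)
print(fourth.rstrip())
```

The same function works on a file opened in binary mode, since only seek and read are used.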

Fixed-Length Records on Different Computer Systems

Fixed-length records are stored differently on mainframe computers and PCs.

Fixed-Length Mainframe Records

Data on a mainframe computer is almost always stored as fixed-length records with no record or field delimiters.  (This article will not deal with indexed or other database files.)  When these records are written to tape, the same is true: there are no record delimiters on the tape.

Fixed-Length PC (MSDOS and Windows) Records

Although fixed-length records can be accessed perfectly well without delimiters, many PC applications require record delimiters, even on fixed-length records.  So record delimiters are standard practice for fixed-length PC files.  The standard record delimiter is the two byte carriage-return and line-feed (CR-LF) pair, 0D, 0A hex.

Fixed-Length UNIX Records

Many UNIX applications work with fixed-length records with no delimiters.  Those that use a delimiter usually use the UNIX "newline", which is the LF character, 0A hex.

Fixed-Length Macintosh Records

The classic Macintosh record delimiter is a single CR, which is 0D hex.  (Mac OS X and later are UNIX-based and normally use LF.)  You are more likely to find variable-length records on Macintosh than fixed-length, but when fixed-length records are used they are usually delimited with a CR.

Converting Between Mainframe and PC Records

Now that we have the necessary background, let's discuss converting fixed-length records between a mainframe and a PC.

When fixed-length mainframe records are written to a tape, they are written as fixed-length with no delimiters, just like they are stored on disk.  Since most PC applications have trouble with that type of file, DISC normally adds a record delimiter to the end of each record during the conversion process.  For Microsoft operating systems we add a carriage-return and a line-feed, CR-LF.  For UNIX applications we add a "newline", which is the LF character, and for Macintosh we add a carriage-return, CR.  The CR is 0D hex and the LF is 0A hex.
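The delimiter step of such a conversion could be sketched in Python as below.  This is not DISC's actual conversion code; the platform names, the DELIMITERS table, and the function add_delimiters are assumptions for illustration, and the sketch handles only the delimiters, not any other part of a conversion.

```python
# Assumed platform names mapped to the delimiters described above.
DELIMITERS = {
    "windows": b"\r\n",   # CR-LF, 0D 0A hex
    "unix": b"\n",        # LF, 0A hex
    "macintosh": b"\r",   # CR, 0D hex
}


def add_delimiters(data, record_size, platform):
    """Append the platform's delimiter to every fixed-length record in data."""
    if record_size <= 0:
        raise ValueError("record size must be positive")
    if len(data) % record_size != 0:
        raise ValueError("data length is not a multiple of the record size")
    delimiter = DELIMITERS[platform]
    out = bytearray()
    for start in range(0, len(data), record_size):
        out += data[start:start + record_size]
        out += delimiter
    return bytes(out)


# Two 4 byte records, as they would arrive with no delimiters.
print(add_delimiters(b"AAAABBBB", 4, "windows"))
```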

When we convert a file from a PC to a mainframe tape, we remove the record delimiter, as the mainframe neither needs nor wants one.  Writing the PC record delimiter to a mainframe tape would cause the mainframe programmer quite a bit of grief.  Mainframe languages generally have no provision for handling a record delimiter automatically, so the programmer would have to treat it as junk data at the end of the record: define a 2 byte filler field to hold it, and increase the record length defined in the JCL accordingly.
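Removing the delimiters can be sketched the same way.  The Python below is an illustration only, with an assumed function name strip_delimiters; it finds each delimiter by position, not by searching for it, so a stray CR inside the data does not split a record.

```python
def strip_delimiters(data, record_size, delimiter=b"\r\n"):
    """Remove the delimiter after each record, checking it is where expected."""
    physical_size = record_size + len(delimiter)
    if len(data) % physical_size != 0:
        raise ValueError("file length is not a multiple of the physical record size")
    out = bytearray()
    for start in range(0, len(data), physical_size):
        record = data[start:start + record_size]
        found = data[start + record_size:start + physical_size]
        if found != delimiter:
            raise ValueError(
                f"unexpected bytes {found!r} at offset {start + record_size}"
            )
        out += record
    return bytes(out)


# Two 4 byte records with CR-LF delimiters; the first contains a stray CR.
print(strip_delimiters(b"A\rAA\r\nBBBB\r\n", 4))
```

Checking that each delimiter is present at the expected position also catches a file whose record size was stated incorrectly.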

Measuring Record Length

If a mainframe record is 100 bytes long, then it's clear the size is 100 bytes, period.  There is no ambiguity to the size.  But that same record, when transferred to a PC, is 100 bytes of data plus a CR-LF, for a total of 102 bytes.  And on a Macintosh or a UNIX system, it's 100 bytes plus a 1 byte delimiter.  So is it a 100 byte record, a 101 byte record, or a 102 byte record?  In each case the record contains 100 bytes of actual data, and such a record is commonly referred to as a "100 byte record" on all the computers.  For common references to record length, the record size is considered to be the amount of data the record holds, and any record delimiters are part of the file structure, not the data.  This makes for consistency between systems.

But clearly the physical space occupied by the PC record is 102 bytes, not 100.  Many times you will have to use the physical size, such as when calculating where in a file the 89th record starts, or the disk space required to store a million records.  So it becomes necessary to use both values in different situations.

When you need to make the distinction between the two numbers, you usually call the sum of the data (100) the "logical record size", and the data plus delimiter(s) the "physical record size".
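The two calculations mentioned above, the start of the 89th record and the space needed for a million records, can be worked through in a few lines of Python.  The function names are assumptions for this sketch, and the offset here is counted from 0, as most programming languages count file positions.

```python
def physical_record_size(logical_size, delimiter_length):
    """Data bytes plus delimiter bytes for one record."""
    return logical_size + delimiter_length


def record_offset(record_number, logical_size, delimiter_length):
    """0-based byte offset at which a 1-based record starts."""
    return (record_number - 1) * physical_record_size(logical_size, delimiter_length)


def file_size(record_count, logical_size, delimiter_length):
    """Total bytes needed to store record_count records."""
    return record_count * physical_record_size(logical_size, delimiter_length)


# 100 byte logical records with a 2 byte CR-LF delimiter (PC file).
print(record_offset(89, 100, 2))    # 88 * 102 = 8976
print(file_size(1_000_000, 100, 2)) # 102,000,000 bytes

# The same records with no delimiter (mainframe file).
print(record_offset(89, 100, 0))    # 88 * 100 = 8800
print(file_size(1_000_000, 100, 0)) # 100,000,000 bytes
```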

Summary

Mainframe computers do not use record delimiters.  Many MSDOS and Windows applications require record delimiters, and the standard delimiter is the two byte CR-LF (carriage return - line feed) pair.  The standard Macintosh delimiter is a single CR, and the standard UNIX delimiter is a LF, called "newline" in UNIX.  The CR is 0D hex and the LF is 0A hex.

When converting files from a mainframe to a PC, DISC adds a delimiter, and when converting PC files to a mainframe, we remove the delimiter.

The "record size" is the number of bytes of data a record holds.  The number of bytes occupied by a record on a PC disk will be two bytes greater than the record size, to account for the CR-LF record delimiter.

If you need to make the distinction between these two values, the number of characters in the record is called the "logical record size", and the number of bytes including any delimiters is called the "physical record size".

 

Additional Information

For more articles on data conversion, see our TechTalk Index.

Disc Interchange Service Company, Inc.
15 Stony Brook Road
Westford, MA 01886
