This page discusses how data is stored in COBOL "comp-3", or "packed" fields.
(See note 1 about terminology)
COBOL Comp-3 is a binary field type that puts ("packs") two digits into each byte, using a notation called Binary Coded Decimal, or BCD. This halves the storage requirements compared to a character, or COBOL "display", field. Comp-3 is a common data type, even outside of COBOL, and is fairly standard across platforms -- that is, it is not dependent upon the operating system, language, or CPU, as the COBOL "comp" is. (See COBOL Computational Fields for information on the comp data type). However, comp-3 is not commonly found in PC languages.
The Binary Coded Decimal (BCD) data type is just as its name suggests -- it is a value stored in decimal (base ten) notation, and each digit is binary coded. Since a digit only has ten possible values (0-9), it can be represented in binary form with only 4 bits. Four bits is called a "nybble", and each nybble contains one digit of the value. Therefore, you can get two digits in each 8 bit byte. (There's an example below). Normal EBCDIC or ASCII character representation (COBOL "Display") only stores one character (digit) per byte, so packed data only requires half the storage of unpacked (character) data. (See Character, BCD, and Binary Fields if this description is not clear.)
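As a small illustration of the nybble idea, here is a minimal Python sketch that splits one byte into its two 4-bit halves and puts two digits back into one byte. The function names are made up for this example and are not part of any COBOL tool.

```python
def split_nybbles(byte):
    """Return the (high, low) 4-bit halves of one 8-bit byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in 0..255")
    return byte >> 4, byte & 0x0F


def pack_two_digits(high, low):
    """Store two decimal digits (0-9 each) in a single byte."""
    if not (0 <= high <= 9 and 0 <= low <= 9):
        raise ValueError("each digit must be 0..9")
    return (high << 4) | low


# The byte 0x12 holds the digits 1 and 2.
print(split_nybbles(0x12))
print(hex(pack_two_digits(1, 2)))
```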
The value in a comp-3 field is stored high-order to low-order. That is, the upper nybble of the first byte encountered in the file is the most significant digit of the value, the lower nybble of that byte is the next digit, etc., etc. The last nybble -- the low nybble of the least significant byte -- is used to store the sign for the number. Unlike IBM Signed fields (see EBCDIC to ASCII Conversion of Signed Fields), this nybble stores only the sign, not a digit. "C" hex is positive, "D" hex is negative, and "F" hex is unsigned. In COBOL comp-3 fields (and in most other languages) this nybble is reserved for the sign whether or not the field is denoted as a signed field in the COBOL PIC.
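The layout just described can be sketched as a small Python decoder. This is only an illustration under stated assumptions: the name unpack_comp3 is hypothetical, and the sketch accepts only the three sign codes named above (C, D, F), treating anything else as an error.

```python
def unpack_comp3(raw):
    """Decode a comp-3 field (bytes) into a Python int.

    Assumes the layout described in the text: digits high-order first,
    with the sign in the low nybble of the last byte.
    """
    if not raw:
        raise ValueError("empty field")
    nybbles = []
    for b in raw:
        nybbles.append(b >> 4)     # upper nybble comes first
        nybbles.append(b & 0x0F)   # then the lower nybble
    sign = nybbles.pop()           # last nybble is the sign, not a digit
    if sign not in (0xC, 0xD, 0xF):
        raise ValueError("unexpected sign nybble: %X" % sign)
    value = 0
    for digit in nybbles:
        if digit > 9:
            raise ValueError("invalid BCD digit: %X" % digit)
        value = value * 10 + digit
    return -value if sign == 0xD else value


print(unpack_comp3(bytes.fromhex("01234D")))
```

For the three bytes 01 23 4D, the digits are 0, 1, 2, 3, 4 and the sign nybble is D, so the result is negative 1234.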
Comp-3 packed fields are aligned on byte boundaries, and the field is always a whole number of bytes. The sign nybble is always the low nybble of the least significant (last) byte. Since the sign takes one nybble, and because there are always an even number of nybbles in any number of bytes, an odd number of digits will fully fill a comp-3 field. (An odd number of digits plus a sign nybble makes an even number of nybbles, or fully-filled bytes). If the size of the field is specified as an even number of digits, as in "PIC S9(6) comp-3.", the upper nybble of the first byte is ignored and is usually, but not always, set to zero.
Comp-3 fields are denoted in COBOL with the "usage is" clause after the PIC, like this:
PIC S9(5) usage is computational-3.
However, the "usage is" is not required and seldom used, and "computational-3" is usually abbreviated "comp-3", so you more commonly see:
PIC S9(5) comp-3.
The COBOL PIC, or picture, for a comp-3 packed field specifies the number of digits after unpacking. The actual number of bytes occupied in the file is about half that. To calculate the number of bytes from the PIC, add 1 (for the sign) to the total number of digits, divide by 2, and round up if necessary. For example:
PIC S9(7) COMP-3.      Byte size = (7 + 1) / 2 = 4
PIC S9(5)V99 COMP-3.   Byte size = (5 + 2 + 1) / 2 = 4
PIC S9(6) COMP-3.      Byte size = (6 + 1) / 2 = 3.5, rounded to 4
Comp-3 fields reserve a nybble for the sign, even for "unsigned" values, so the following fields are still 4 bytes:
PIC 9(7) COMP-3.       Byte size = (7 + 1) / 2 = 4
PIC 9(6) COMP-3.       Byte size = (6 + 1) / 2 = 3.5, rounded to 4
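The byte-size rule above (add 1 for the sign, divide by 2, round up) can be written as a one-line calculation. The function name comp3_byte_size is invented for this sketch; the digit count is the total from the PIC, including any digits after the V.

```python
def comp3_byte_size(digits):
    """Bytes occupied by a comp-3 field with the given number of PIC digits."""
    if digits < 1:
        raise ValueError("a field needs at least one digit")
    # (digits + 1) nybbles, divided by 2 and rounded up, using integer math
    return (digits + 1 + 1) // 2


for digits in (7, 6, 5):
    print(digits, "digits ->", comp3_byte_size(digits), "bytes")
```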
Let's look at some examples of how comp-3 data is stored. The left column in the table below is the decimal value being stored, and the right column is the hexadecimal value you will see in the file:
Value     Comp-3, hex
+0        0C
+1        1C
+12       01 2C
+123      12 3C
+1234     01 23 4C
-1        1D
-1234     01 23 4D
Each pair of hex digits above represents one byte, in hexadecimal (hex) form. We have only used as many bytes as needed to store the value shown on the left.
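The table above can be reproduced with a short Python sketch that packs an integer into the fewest bytes that will hold it. The name pack_comp3 and its signed parameter are assumptions made for this example; a real field would be padded with leading zeros to the size given by its PIC.

```python
def pack_comp3(value, signed=True):
    """Pack an int into the smallest comp-3 byte string that holds it."""
    if value < 0 and not signed:
        raise ValueError("negative value in an unsigned field")
    # C = positive, D = negative, F = unsigned
    sign = "D" if value < 0 else ("C" if signed else "F")
    text = str(abs(value)) + sign
    if len(text) % 2:
        text = "0" + text          # pad so the nybbles fill whole bytes
    return bytes.fromhex(text)


for value in (0, 1, 12, 123, 1234, -1, -1234):
    print(value, pack_comp3(value).hex().upper())
```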
When you "unpack" a packed, or comp-3, field, the size of the field will roughly double. This will cause all fields following it to shift down. If the field is in a redefined area, it will likely no longer fit in the allocated space, and the original field it redefined will have to be modified, or filler will have to be added, to accommodate the larger unpacked field. Just a few comp-3 fields can make for a messy situation, affecting many other fields, and even other records if the file contains multiple record types.
A comp-3 field may also have an implied decimal.
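The decimal point is not stored in the data; its position comes from the V in the PIC, as in the PIC S9(5)V99 example earlier. A minimal sketch of applying it after the digits have been decoded, using Python's standard decimal module (the function name is invented for this example):

```python
from decimal import Decimal


def apply_implied_decimal(unscaled, decimal_places):
    """Shift the decimal point left by the number of digits after the V."""
    if decimal_places < 0:
        raise ValueError("decimal_places must not be negative")
    return Decimal(unscaled).scaleb(-decimal_places)


# PIC S9(5)V99: the stored digits 1234567 represent 12345.67
print(apply_implied_decimal(1234567, 2))
```

Decimal is used here rather than float so that values such as currency amounts keep their exact decimal digits.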
For a more verbose explanation of BCD and binary fields, see Character, BCD, and Binary Fields.
For further discussion of other COBOL data types, see COBOL Computational Fields, and EBCDIC to ASCII Conversion of Signed Fields.
And for more complete information on COBOL files, see Reading COBOL Layouts.
For more articles on data conversion, see our TechTalk Index.
Note 1:
There is considerable ambiguity and confusion over the terms "packed" and "comp-3". Although the terms are pretty standardized in COBOL, they may mean something different in another language.
Although a vendor could use "comp-3" to describe a different type of field, "comp-3" almost always means the definition given on this page, in all languages, and on all platforms, even PCs. "Packed", however, when used by other languages often means something different than (although similar to) the COBOL definition. For example, "packed" sometimes describes a BCD field with no sign.
The bottom line is this: If you are reading a COBOL layout, these fields are always called "comp-3" or "computational-3". If you are reading a non-COBOL layout, the term "packed" frequently means comp-3, but is occasionally used to denote other kinds of packed fields.