This is a discussion of COBOL computational fields. Several numeric data types are discussed, including the common "packed" (comp-3) fields.
By default, numeric values in COBOL files are stored in "display", or character, format, in the same way the letters of the alphabet are stored. That is, the value is stored as a base-ten number, with each digit represented by the corresponding EBCDIC (or ASCII) character. For example, the value 1234 is stored in four bytes which contain "1", "2", "3", and "4" (F1, F2, F3, F4 hex in EBCDIC, or 31, 32, 33, 34 hex in ASCII).
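To make those byte values concrete, here is a small Python sketch (Python is our choice for illustration only; the function name display_bytes is hypothetical). It handles unsigned values only and ignores signs and leading-zero padding to the PIC length. It relies on Python's built-in cp037 codec, which is one common EBCDIC code page.

```python
def display_bytes(value: int, encoding: str) -> bytes:
    """Encode a non-negative integer as unsigned display digits (no sign, no padding)."""
    return str(value).encode(encoding)

# "cp037" is Python's codec for the IBM037 (US/Canada) EBCDIC code page.
print(display_bytes(1234, "cp037").hex(" ").upper())  # F1 F2 F3 F4
print(display_bytes(1234, "ascii").hex(" ").upper())  # 31 32 33 34
```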
But because computers perform computations with binary numbers, it is more efficient to store values in their native binary form than in human-readable base ten. If the number is stored in its native binary format, it can be read from the file and used directly. If it's stored in a base-ten format, it must be converted to binary before any calculation is performed on it, then converted back to base ten for storage. Binary is faster (typically about 8 times faster, though the gain varies by machine and compiler) and usually requires less storage space.
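The following Python sketch illustrates the difference in work and space, assuming a 2-byte, big-endian binary field. The helpers display_to_int and binary_to_int are hypothetical, and interpreted Python can't demonstrate the speed difference itself. What it does show is that display digits need a per-digit conversion step, while binary bytes already are the number and take half the space here.

```python
import struct

def display_to_int(field: bytes, encoding: str = "cp037") -> int:
    """Convert unsigned display digits to an integer, one digit at a time."""
    value = 0
    for ch in field.decode(encoding):
        if not "0" <= ch <= "9":
            raise ValueError("not a display digit: %r" % ch)
        value = value * 10 + (ord(ch) - ord("0"))  # multiply-and-add per digit
    return value

def binary_to_int(field: bytes) -> int:
    """Read a 2-byte, big-endian, two's-complement field: no digit conversion needed."""
    return struct.unpack(">h", field)[0]

display_field = "1234".encode("cp037")    # F1 F2 F3 F4, 4 bytes
binary_field = struct.pack(">h", 1234)    # 04 D2, 2 bytes
print(display_to_int(display_field), len(display_field))  # 1234 4
print(binary_to_int(binary_field), len(binary_field))     # 1234 2
```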
In addition to display format, COBOL defines several binary data types. We will explain some here, but before we start, there is one very important point to understand: the COBOL standard leaves the actual implementation of most data types up to the vendor who writes the COBOL compiler. This is because different computers (CPUs) use different binary representations internally, and each works best with its own type of binary numbers. This approach results in better and faster compilers and programs, but it also causes confusion, because a "comp" data type on one machine is not necessarily the same as "comp" on another machine.
Some of the differences between platforms that create this situation are:
• The register size of the CPU: This is typically some binary multiple of 8 bits (8, 16, 32, or 64 bits). The computer is most efficient when working in its native register size.
• The byte order, big endian or little endian: Some CPUs store the most significant byte of a value first (big endian), while others store the least significant byte first (little endian).
• The smallest binary data size: On some machines the minimum unit of computation is 16 bits, so the smallest binary value there is 2 bytes. Others can use 1 byte. Sometimes this is a compiler option.
• The increment in binary size: Most binary values are stored in 2, 4, or 8 bytes. Some COBOL compilers permit only these sizes, while others permit one-byte increments: 1, 2, 3, 4, 5, 6, 7, or 8 bytes.
• Floating-point representation: Many vendors use the IEEE floating-point standard, but others, notably IBM, don't.
• The representation of the sign: Most pure binary integers use 2's complement, but each vendor is free to choose its own method for every type.
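Here is a minimal Python sketch of two items from the list above, byte order and 2's-complement signs, using Python's built-in int.to_bytes and int.from_bytes. The helper encode_int is hypothetical. The last line shows why byte order matters when a file moves between machines.

```python
def encode_int(value: int, size: int, byteorder: str) -> bytes:
    """Store a signed integer in `size` bytes as two's complement, in the given byte order."""
    return value.to_bytes(size, byteorder, signed=True)

print(encode_int(1234, 4, "big").hex(" "))     # 00 00 04 d2
print(encode_int(1234, 4, "little").hex(" "))  # d2 04 00 00
print(encode_int(-1234, 4, "big").hex(" "))    # ff ff fb 2e

# Reading big-endian bytes as if they were little endian gives a wrong value.
print(int.from_bytes(encode_int(1234, 4, "big"), "little", signed=True))  # -771489792
```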
The following common COBOL data types are discussed below, under Data Types: binary, comp, comp-1, comp-2, comp-3, and packed-decimal.
Some compilers have comp-4 and comp-5 data types, usually to emulate a comp type of another system, like an IBM mainframe. There are other COBOL binary data types, such as index and pointer, but they are used internally in the program, and are not found in files, so are not of concern to us. Comp-3 is so common and so uniform across platforms that we have written a separate Tech Talk brief for it. See COBOL Comp-3 Packed Fields.
Which data type a field uses for storage is determined by the "usage is" clause in the field definition. For example,
05 BALANCE-DUE PIC S9(6)V99 USAGE IS COMPUTATIONAL-3.
says to store the field in the computational-3 format. The "usage is" part is optional and generally left off, and "computational" can be abbreviated "COMP", so you will more commonly see this written:
05 BALANCE-DUE PIC S9(6)V99 COMP-3.
The number of bits, bytes, or words that are stored for any given field usually depends on the number of digits given in the COBOL PIC. For binary numbers, 8 bits, or 1 byte, will store unsigned values from 0 to 255 or signed values from -128 to +127. This is enough to store values up to two digits (99), but not up to three digits (999). So a PIC 9 or PIC 99 would require 1 byte, but a PIC 999 would require 2 bytes.
In addition, most compilers have some minimum requirements for comp storage. For example, the smallest unit of storage may be 2 bytes, so even if you specify PIC 9 (only 1 digit), the compiler will reserve two bytes. Also see Synchronization and Alignment below.
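The sizing rule above can be sketched in Python as follows. The function binary_field_bytes and its min_bytes and allowed_sizes parameters are hypothetical, chosen to model the compiler limits just described; the actual sizes for a given field come from your compiler's documentation.

```python
from typing import Optional, Tuple

def binary_field_bytes(digits: int, signed: bool = True, min_bytes: int = 1,
                       allowed_sizes: Optional[Tuple[int, ...]] = None) -> int:
    """Smallest byte count whose binary range holds every PIC 9(digits) value.

    min_bytes and allowed_sizes are hypothetical knobs modelling compiler
    limits such as a 2-byte minimum or a 2/4/8-byte scheme.
    """
    largest = 10 ** digits - 1              # e.g. PIC 999 -> 999
    size = 1
    while True:
        bits = 8 * size
        limit = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
        if largest <= limit:                # signed range -limit-1..limit also covers -largest
            break
        size += 1
    size = max(size, min_bytes)
    if allowed_sizes is not None:
        # Round up to the next permitted size (ValueError if none is large enough).
        size = min(s for s in allowed_sizes if s >= size)
    return size

print(binary_field_bytes(2))                                        # 1  (PIC S99)
print(binary_field_bytes(3))                                        # 2  (PIC S999)
print(binary_field_bytes(1, min_bytes=2))                           # 2
print(binary_field_bytes(5, min_bytes=2, allowed_sizes=(2, 4, 8)))  # 4
```

With min_bytes=2 and allowed_sizes=(2, 4, 8), the results follow the 2, 4, or 8 byte scheme mentioned earlier; with the defaults, they follow the one-byte-increment scheme.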
Floating-point numbers, however, follow fixed binary formats, so their sizes are not determined by a PIC, and no PIC is used in the field definition.
Comp-3 stores two digits per byte, in BCD form, plus one half-byte for the sign, so a field of n digits occupies (n + 1) / 2 bytes, rounded up. See comp-3, below.
Typical storage for common data types is given below.
For usage binary, the COBOL standard specifies a binary data type, but the exact implementation is up to the vendor. Negative numbers are typically stored in 2's complement.
Comp (with no suffix) leaves the choice of the data type to the compiler writer. The intent of this data type is to make it the most efficient format on any given machine, which is usually some binary format. Because of this, comp varies greatly between platforms, more than most other types.
Comp-1 is usually a single-precision floating-point value, stored in 4 bytes. Many vendors follow the IEEE floating-point standard, but IBM mainframes traditionally use IBM's own hexadecimal floating-point format.
Comp-2 is usually a double-precision floating-point value, stored in 8 bytes. Again, many vendors follow the IEEE standard, but IBM does not.
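To show why IEEE and IBM floating-point values can't be mixed, here is a Python sketch that decodes the same four bytes both ways. The IBM decoder assumes the traditional IBM hexadecimal floating-point layout (a sign bit, a 7-bit excess-64 base-16 exponent, then a 24- or 56-bit fraction); the function names are hypothetical.

```python
import struct

def ieee_single(data: bytes) -> float:
    """Decode 4 big-endian bytes as an IEEE 754 single-precision value."""
    return struct.unpack(">f", data)[0]

def ibm_hfp(data: bytes) -> float:
    """Decode 4 or 8 big-endian bytes as IBM hexadecimal floating point.

    Layout: 1 sign bit, 7-bit exponent (excess 64, base 16), then a
    24-bit (single) or 56-bit (double) fraction. 56-bit fractions are
    rounded to Python's 53-bit float.
    """
    if len(data) not in (4, 8):
        raise ValueError("expected 4 or 8 bytes")
    nbits = 8 * len(data)
    bits = int.from_bytes(data, "big")
    sign = -1.0 if bits >> (nbits - 1) else 1.0
    exponent = (bits >> (nbits - 8)) & 0x7F          # 7-bit excess-64 exponent
    fraction = bits & ((1 << (nbits - 8)) - 1)       # 24 or 56 fraction bits
    return sign * (fraction / (1 << (nbits - 8))) * 16.0 ** (exponent - 64)

raw = bytes.fromhex("41100000")
print(ibm_hfp(raw))       # 1.0
print(ieee_single(raw))   # 9.0
```

The same bytes mean 1.0 to an IBM mainframe and 9.0 to an IEEE machine, which is why floating-point fields have to be converted, not just copied.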
Although comp-3 is "vendor specific", the format of comp-3 fields is almost universal across platforms, even on ASCII machines. Comp-3 stores data in a BCD -- binary coded decimal -- format with the sign after the least significant digit. Comp-3 is so common that we have written a separate Tech Talk brief about it. See COBOL Comp-3 Packed Fields.
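Here is a minimal Python sketch of comp-3 packing and unpacking, assuming the IBM sign convention (C positive, D negative, F unsigned; A and E are also accepted as positive and B as negative on input). The function names are hypothetical, and the implied decimal point (the V in the PIC) is left to the caller, here handled with Python's decimal module.

```python
from decimal import Decimal

def pack_comp3(value: int, digits: int, signed: bool = True) -> bytes:
    """Pack an integer (already scaled for any implied decimals) into comp-3."""
    if not signed and value < 0:
        raise ValueError("negative value in an unsigned field")
    text = str(abs(value))
    if len(text) > digits:
        raise ValueError("value has more digits than the PIC allows")
    sign = ("D" if value < 0 else "C") if signed else "F"  # IBM sign nibbles
    nibbles = text.zfill(digits)
    if len(nibbles) % 2 == 0:   # digits + sign nibble must fill whole bytes
        nibbles = "0" + nibbles
    return bytes.fromhex(nibbles + sign)

def unpack_comp3(data: bytes) -> int:
    """Unpack comp-3 bytes into an integer (the implied decimal point is not applied)."""
    text = data.hex().upper()
    digits, sign = text[:-1], text[-1]
    if not digits.isdigit():
        raise ValueError("invalid packed digit")
    if sign in "BD":            # negative sign nibbles
        return -int(digits)
    if sign in "ACEF":          # positive or unsigned sign nibbles
        return int(digits)
    raise ValueError("invalid sign nibble")

# BALANCE-DUE PIC S9(6)V99 COMP-3 holding 1234.56: 8 digits -> 5 bytes.
packed = pack_comp3(123456, 8)
print(packed.hex(" ").upper())                    # 00 01 23 45 6C
print(Decimal(unpack_comp3(packed)).scaleb(-2))   # 1234.56
```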
Packed-decimal is the "official" BCD packed format of the COBOL standard. It is implemented as "comp-3". See comp-3, above.
Synchronization and alignment is a bit involved for this tutorial, but you should be aware of it. When using binary storage (binary and comp), some compilers on some machines may require that a numeric field start on a particular boundary, for example when the field is declared with the SYNCHRONIZED (SYNC) clause. On a 32-bit machine, a comp field may have to start on a 32-bit boundary.
If you specify a comp field in the middle of a record, and it doesn't happen to begin on a 32-bit (4-byte) boundary, the compiler will "align" it to a 32-bit boundary to "synchronize" it, inserting unused "slack" bytes in front of it. The record actually stored in the file then no longer matches the byte positions you would calculate from the PICs on the layout. This is not a very common problem, partly because binary and comp fields are not very common in files, but you should be aware of it.
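Here is a hypothetical Python sketch of how slack bytes shift field positions. It assumes a single alignment boundary of 4 bytes, measured from the start of the record; real compilers have their own rules (for example, different boundaries for 2-, 4-, and 8-byte fields), so treat it as an illustration only. The function layout and the field names are made up for the example.

```python
from typing import List, Tuple

def layout(fields: List[Tuple[str, int, bool]], boundary: int = 4):
    """Compute (name, offset, slack) for each field, plus the record length.

    fields: (name, size in bytes, synchronized?) in record order.
    A synchronized field is pushed forward to the next multiple of `boundary`.
    """
    entries = []
    offset = 0
    for name, size, sync in fields:
        slack = (-offset) % boundary if sync else 0   # unused bytes inserted
        offset += slack
        entries.append((name, offset, slack))
        offset += size
    return entries, offset

# Hypothetical record: PIC X(3), then a 4-byte comp field, then PIC X.
entries, length = layout([("CUST-CODE", 3, False),
                          ("BALANCE", 4, True),
                          ("STATUS", 1, False)])
for name, offset, slack in entries:
    print(name, offset, slack)
print(length)   # 9, although the PICs add up to 8
```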
Signed display fields were not discussed here because they are not binary data types. See EBCDIC to ASCII Conversion of Signed Fields for a discussion.
For more articles on data conversion, see our TechTalk Index.
Our COBOL Conversion Services
DISC can convert most COBOL numeric data types, including all the IBM mainframe EBCDIC data types. Our library of conversion routines permits us to handle those difficult jobs that standard COBOL compilers can't convert. With over 26 years of experience with thousands of files, we have the knowledge to catch problems with the data before they cause you grief.
Disc Interchange Service Company, Inc.
15 Stony Brook Road
Westford, MA 01886