There is a numeric data type commonly used in COBOL files on IBM mainframes which is called "Signed" (also called "IBM Signed", or "Zoned"). This data type originated in the days of punch-cards. COBOL represents this type of field by an "S" in the picture clause of a DISPLAY field (i.e. a field with no "comp" or "comp-3" modifier). An example is "PIC S9(6).". The value stored in this field is a mix of character and binary codes, and cannot be properly converted from EBCDIC to ASCII with a simple translation table. This page is a brief description of the signed data type, and what happens if someone improperly converts it from EBCDIC to ASCII as character data. (See note 1 about other representations of signed fields.)
A Signed field is composed of regular EBCDIC numeric characters, one per byte, for every digit except the one that holds the sign. The sign is carried by either the most-significant digit (sign leading) or the least-significant digit (sign trailing), usually the least-significant. The sign is combined with, or "overpunched" onto, that digit (see note 2), which saves the byte that a separate sign would otherwise occupy. The value of that digit is stored as a binary value and OR'd with the sign code: D0 hex for negative values, C0 hex for positive values, and F0 hex for "unsigned" values. (See note 3 below for an alternative viewpoint.)
Let's look at two examples. This is how the value +123 would be stored in an EBCDIC Signed field with the sign in the LSD (least significant digit):
The resulting field in hex is: F1, F2, C3. The proper way to interpret the C3 is as two entities: the sign of C0 and the value of 03. However, it also happens to be the code for the EBCDIC letter "C", so if you view the field in EBCDIC character mode, you will see "12C".
The value of +120 would likewise become F1, F2, C0 (00 OR'd with C0). Since the EBCDIC character assigned to the value C0 is a left brace, "{", when this field is viewed in EBCDIC character mode, you will see "12{".
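The two examples above can be reproduced with a short Python sketch. It is only an illustration of the OR operation described here, assuming a sign-trailing field; the names `SIGN_CODE` and `encode_zoned` are hypothetical and not part of any COBOL tool.

```python
# Sign codes as described in the text above.
SIGN_CODE = {"+": 0xC0, "-": 0xD0, "unsigned": 0xF0}

def encode_zoned(value, width, sign=None):
    """Return the EBCDIC bytes of a sign-trailing Signed field (sketch)."""
    if sign is None:
        sign = "-" if value < 0 else "+"
    digits = str(abs(value)).zfill(width)
    if len(digits) > width:
        raise ValueError("value does not fit in the field")
    # Every digit except the last is a plain EBCDIC numeral (F0-F9 hex).
    body = [0xF0 | int(d) for d in digits[:-1]]
    # The last digit is stored as a binary value OR'd with the sign code.
    last = SIGN_CODE[sign] | int(digits[-1])
    return bytes(body + [last])

print(encode_zoned(123, 3).hex(" "))
print(encode_zoned(120, 3).hex(" "))
```

Passing `sign="unsigned"` uses the F0 code, which yields ordinary EBCDIC numerals for every digit.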
Most languages that run on ASCII-based computers, and most PC applications, require a separate sign, usually a leading sign, like "-123". To properly convert from an IBM Signed field to a leading-sign field, you must define the layout of the EBCDIC record containing the signed fields, then create a different layout for the converted file, using a separate sign, and then write a COBOL program (or use some other utility) to convert between them. This requires programming, and is generally a lot more expensive than a global EBCDIC to ASCII character translation with a translation table. Just a single Signed field means you now have to write a program, and that can make a big difference in the cost of a conversion.
In addition to the programming cost, because converting from a Signed field to a leading sign field causes the field size to expand by 1 byte (to accommodate the sign), there are many potential complications created by this conversion. All the following fields in the record shift down, the record will increase in size, and if there are redefined fields and/or multiple record types, they may all need to be changed. Converting a single Signed field to leading sign could cause hundreds of other fields to shift position, causing cascading problems.
Converting an EBCDIC Signed field to ASCII as if it were a character field, using a translation table, results in an improper data type, as described in detail below. The resulting ASCII field still has the LSD and the sign combined in one byte, and the EBCDIC to ASCII translation has further scrambled the field. No standard PC languages or applications can directly read this field. Although this is not a proper way to convert a signed field, it happens frequently, usually because the operator who performed the conversion did not know what a Signed field is or how to convert it, or was not aware that there were signed fields in the data. Disc Interchange routinely scans the files we convert to detect signed fields. If we detect signed fields in your data, we will inform you before returning your files, and give you options for converting them.
If you receive a file that originally contained EBCDIC signed fields and was converted to ASCII with a translation table prior to being sent to you, you may have to deal with this situation. Additionally, some of our customers have realized they can deal with signed fields themselves, and avoid paying us to write a program, and avoid potential complications from altering the layout. Below is a description of how to deal with signed fields after a character conversion to ASCII. Although we don't recommend treating signed fields this way, it is possible to recover the value of a Signed field after conversion from EBCDIC to ASCII, without loss of data, if the following translation is clearly understood.
Let's look at what happens when an EBCDIC Signed field is converted to ASCII with a translation table. If a global EBCDIC to ASCII character conversion is performed on a signed field, all bytes are converted as if they were characters. This results in the correct ASCII characters for all digits of the converted field except the one containing the sign (remember, that byte is not a simple character, but a binary combination of the value and the sign). So it is the digit containing the sign overpunch that you have to contend with. One reason this is not trivial is that the EBCDIC and ASCII collating sequences do not follow the same patterns, especially for the "{" and "}" characters.
When the byte containing the sign is converted as if it were a character, a value of +1 (which is C1 hex) is treated as an EBCDIC "A" and is converted to an ASCII "A" (41 hex). Likewise, +2 is treated as an EBCDIC "B" and is converted to an ASCII "B", +3 is treated as an EBCDIC "C" and is converted to an ASCII "C", and so on. When the values of +0 and -0 are encountered, they are treated as the EBCDIC characters "{" and "}" and are converted to the same characters in ASCII.
Notice, however, that the "{" and "}" in the ASCII code set are not adjacent to the same letters that they were adjacent to in the EBCDIC code set. In EBCDIC the "{" (C0 hex) is immediately before the letter "A" (C1 hex); but in ASCII the "{" (7B hex) is not even close to the letter "A" (41 hex). Hence the ASCII plus-zero and minus-zero codes are quite different hex values, and are "out of sequence" compared to the hex values for +1 to +9 and -1 to -9. (See EBCDIC and ASCII Character Tables.)
Let's clarify this with an example:
Value   EBCDIC hex    Characters   ASCII hex
+120    F1, F2, C0    "12{"        31, 32, 7B
+121    F1, F2, C1    "12A"        31, 32, 41
+122    F1, F2, C2    "12B"        31, 32, 42
+123    F1, F2, C3    "12C"        31, 32, 43
The left column of the table above denotes the decimal value of the field to be converted.
The "EBCDIC hex" column gives the hex values stored in the EBCDIC file, before conversion.
The "Characters" column is what you would see if you viewed the field in character mode.
The "ASCII hex" column is what the field would contain after passing it through an EBCDIC to ASCII translation table.
Notice how the LSD of these four values in the EBCDIC file increments from C0 to C1, C2, and C3 hex, representing the values +0 to +3. But now notice that the hex values of the LSD in the ASCII field do not follow the sequence of 40, 41, 42, 43 hex as you might expect. This is because the "{" character is assigned a value of 7B hex in the ASCII code set, and the translation table converted an EBCDIC "{" to an ASCII "{". Because of this you cannot simply AND the field with 0F hex (or subtract 40 hex) to get the binary value, and you cannot AND the field with F0 hex to recover the sign, as you can with the EBCDIC Signed field. A similar situation exists for the value of -0.
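The effect can be checked with a small Python sketch. It assumes that Python's built-in `cp037` codec behaves like the translation table discussed here; real translation tables vary between sites and products. The helper names `translate` and `split_last_byte` are illustrative only.

```python
def translate(field: bytes) -> bytes:
    """Translate every byte as if it were an EBCDIC character.

    Assumption: the cp037 codec stands in for the translation table.
    """
    return field.decode("cp037").encode("ascii")

def split_last_byte(field: bytes):
    """Apply the F0 and 0F masks to the last byte: (zone, digit)."""
    return field[-1] & 0xF0, field[-1] & 0x0F

for label, hex_in in [("+120", "f1f2c0"), ("+123", "f1f2c3"), ("-121", "f1f2d1")]:
    ebcdic = bytes.fromhex(hex_in)
    ascii_field = translate(ebcdic)
    print(label,
          ebcdic.hex(" "), split_last_byte(ebcdic),
          ascii_field.hex(" "), split_last_byte(ascii_field))
```

On the EBCDIC bytes the masks return the sign code and the digit. On the translated bytes, +120 ends in 7B hex, so the masks give 70 hex and 0B hex, and -121 ends in 4A hex, which shares the 40 hex upper half with the positive letters "A" to "I".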
Below is a table listing the EBCDIC hex codes for signed values from +0 to +9 and -0 to -9, plus the character representation, and the hex value each becomes after a character translation from EBCDIC to ASCII.
Value   Character   EBCDIC Hex code   ASCII Hex code
+0      {           C0                7B
+1      A           C1                41
+2      B           C2                42
+3      C           C3                43
+4      D           C4                44
+5      E           C5                45
+6      F           C6                46
+7      G           C7                47
+8      H           C8                48
+9      I           C9                49
-0      }           D0                7D
-1      J           D1                4A
-2      K           D2                4B
-3      L           D3                4C
-4      M           D4                4D
-5      N           D5                4E
-6      O           D6                4F
-7      P           D7                50
-8      Q           D8                51
-9      R           D9                52
To convert the zoned ASCII field that results from an EBCDIC to ASCII character translation into a leading-sign numeric field, inspect the last character of the field. If it is a "{", replace it with a 0 and make the number positive. If it is an "A", replace it with a 1 and make the number positive; if it is a "B", replace it with a 2 and make the number positive; and so on through "I" for 9. If the last character is a "}", replace it with a 0 and make the number negative. If it is a "J", replace it with a 1 and make the number negative; if it is a "K", replace it with a 2 and make the number negative; and so on through "R" for 9. Follow these rules for all possible values, as listed in the table above. You could do this with a look-up table or with IF or CASE statements; use whatever method is best suited to the language you are using. In most cases you should put the sign immediately before the first digit in the field. This is called a floating sign, and is what most PC programs expect. For example, if your field is 6 bytes, the value -123 should read "  -123", not "-  123".
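As a minimal sketch of the look-up approach, the Python function below applies the rules above to a sign-trailing field that has already been translated to ASCII. The function name `to_leading_sign` and its `width` parameter are hypothetical. Suppressing leading zeros, omitting the plus sign, and writing negative zero as a plain 0 are assumptions made for this illustration, not requirements stated in this article.

```python
POSITIVE = "{ABCDEFGHI"   # position in the string = digit value, sign +
NEGATIVE = "}JKLMNOPQR"   # position in the string = digit value, sign -

def to_leading_sign(field: str, width: int) -> str:
    """Convert a translated sign-trailing field to a floating leading sign."""
    body, last = field[:-1], field[-1]
    if any(c not in "0123456789" for c in body):
        raise ValueError("non-digit before the sign position: %r" % field)
    if last in POSITIVE:
        sign, digit = "", POSITIVE.index(last)
    elif last in NEGATIVE:
        sign, digit = "-", NEGATIVE.index(last)
    else:
        raise ValueError("not a signed overpunch character: %r" % last)
    # Assumption: leading zeros are suppressed so the sign can float.
    digits = (body + str(digit)).lstrip("0") or "0"
    if digits == "0":
        sign = ""          # assumption: write -0 as a plain 0
    # Right-justify so the sign sits immediately before the first digit.
    return (sign + digits).rjust(width)

print(repr(to_leading_sign("00012L", 6)))
```

A field holding only the unsigned digits 0-9 is rejected by this sketch; handling that case depends on whether your data mixes both representations.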
We mentioned above that there is also an "unsigned" overpunch, but we have not addressed it yet. The overpunch for an EBCDIC "unsigned" value is F0 hex, which, when OR'd with the binary values 0-9, results in F0-F9. Notice that F0-F9 are simply the EBCDIC characters 0-9. When converted to ASCII with a translation table these will become the ASCII characters 0-9. In this case, and this case only, the field can be treated as if it were an EBCDIC unsigned numeric field (all characters), and no further conversion is needed after translation to ASCII, as it is with the signed overpunch.
If you write your own program to recover signed fields after an EBCDIC to ASCII translation, be aware that there could be a mix of signed and unsigned overpunches in a field designated as a signed field. In that case positive numbers are overpunched with C0 hex in some records of the file and with F0 hex in others. Your program should be able to deal with both representations.
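One way to accept both representations is to put the plain digits into the same look-up table as the overpunch characters. The sketch below does this for a sign-trailing field that has already been translated to ASCII; the names `OVERPUNCH` and `zoned_ascii_to_int` are illustrative.

```python
# Build one table covering the signed and unsigned overpunches.
OVERPUNCH = {}
for d in range(10):
    OVERPUNCH["{ABCDEFGHI"[d]] = (1, d)    # C0-C9 hex in EBCDIC: positive
    OVERPUNCH["}JKLMNOPQR"[d]] = (-1, d)   # D0-D9 hex in EBCDIC: negative
    OVERPUNCH[str(d)] = (1, d)             # F0-F9 hex in EBCDIC: unsigned

def zoned_ascii_to_int(field: str) -> int:
    """Return the integer value of a translated sign-trailing field."""
    sign, last_digit = OVERPUNCH[field[-1]]   # KeyError on any other character
    return sign * int(field[:-1] + str(last_digit))

for sample in ("12C", "123", "12L", "12}"):
    print(sample, zoned_ascii_to_int(sample))
```

Here "12C" and "123" both yield 123, so records written with either positive representation are read the same way.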
COBOL compilers that run on ASCII platforms have a "signed" data type that operates in a similar manner to the EBCDIC Signed field: they overpunch the sign on the LSD or MSD. However, this is not standardized in ASCII, and different compilers use different overpunch codes. For example, Computer Associates' Realia compiler uses 30 hex for positive values and 20 hex for negative values, but Micro Focus and Microsoft use 30 hex for positive values and 70 hex for negative values.
Furthermore, in all cases, the representation is not the same as the result of converting an EBCDIC Signed field to ASCII with a translation table. Therefore, you cannot perform a character conversion of an EBCDIC file to ASCII and then use an ASCII COBOL compiler to process the signed fields.
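The mismatch can be illustrated with a short Python sketch. It uses only the sign codes quoted above, which have not been checked here against the compilers' own documentation, and it assumes the digit is OR'd with the sign code in the same way as in the EBCDIC field. The names `ASCII_SIGN_CODES`, `TRANSLATED`, and `native_last_byte` are hypothetical.

```python
# Sign codes as quoted in the text above (assumption: digit is OR'd in).
ASCII_SIGN_CODES = {
    "Realia": {"+": 0x30, "-": 0x20},
    "Micro Focus / Microsoft": {"+": 0x30, "-": 0x70},
}
# Result of translating an EBCDIC overpunch with a character table.
TRANSLATED = {"+": "{ABCDEFGHI", "-": "}JKLMNOPQR"}

def native_last_byte(compiler: str, sign: str, digit: int) -> int:
    """Last byte an ASCII compiler would store, under the stated assumption."""
    return ASCII_SIGN_CODES[compiler][sign] | digit

for sign in "+-":
    translated = ord(TRANSLATED[sign][3])
    for compiler in ASCII_SIGN_CODES:
        native = native_last_byte(compiler, sign, 3)
        print("%s3  %-24s native %02X  translated %02X"
              % (sign, compiler, native, translated))
```

For a last digit of 3, neither compiler's byte matches the "C" or "L" that a character translation produces.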
For more articles on data conversion, see our TechTalk Index.
Our COBOL Conversion Services
If you get stuck with a file which was improperly converted from EBCDIC to ASCII with a simple translation table, and don't want to hassle with the methods explained above, we can untangle the mess for you.
Note 1:
The COBOL standard leaves the implementation of signed fields up to the vendor, so there can be different types of signed fields. What we are talking about here is the common IBM mainframe COBOL Signed field with "usage display" (i.e., not a comp field). If the "sign is separate" clause is applied to the field or group specification then the sign is stored in a separate byte, not as described here.
Note 2:
The term "overpunch" originated in the days of punch-cards to describe a "hole over the digit" that was used to hold the sign. Today it's more commonly thought of in terms of bits and bytes.
Note 3:
Another way of looking at Signed fields is that all characters of the field are "zoned", and that the overpunch for all the characters except the one that contains the sign is F0 hex, and the overpunch for the sign is either C0 or D0 hex. We have found it easier to explain signed fields using the first description, and only present this view to be technically complete.
Disc Interchange Service Company, Inc.
15 Stony Brook Road
Westford, MA 01886