Converting IBM Mainframe Tape Files to PC Files

Disc Interchange converts thousands of mainframe tapes each year, and the vast majority -- well over 90% -- are written in a few standard formats, as described below.  If you will be converting a mainframe tape we encourage you to read this entire section, but if you don't have the time, please at least read Information You Should Get.

Mainframe Tape and File Issues

Mainframe computers most commonly use 9-track round tapes or IBM 3480, 3490, 3490E, 3570, 3590, or 3592 cartridge tapes, or third-party tapes such as StorageTek 9840 and 9940.  It's also possible to write mainframe tape formats to other media such as DLT, LTO, and QIC in the same manner as 9-track and IBM cartridge tapes.  Additionally, 4mm and 8mm helical scan tapes can accept mainframe file formats, although internally they record very differently from the linear tapes.
Some tapes can be recorded in different densities (for example 9-track can be recorded at 800, 1600, 3200, or 6250 BPI), and others (for example 3480, 3490, 3490E) can be written either uncompressed or compressed.  DLT, LTO, and QIC have several different models and capacities, using different tape formulations.   If you are unsure what type of media you have, see Identifying Media.
Mainframe computers use the EBCDIC code set, so most tapes are written in EBCDIC.  ASCII is perfectly valid, too.  If the data is binary, such as geographic image maps, the tape may be neither EBCDIC nor ASCII, but binary.  DISC has provided an EBCDIC and ASCII code table in our Tech-Talk section.
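To illustrate the difference between the two code sets, the short Python sketch below decodes a few EBCDIC bytes into readable text.  It assumes the data uses EBCDIC code page 037 (Python's built-in "cp037" codec); a given tape may use a different EBCDIC code page, and the function name is only illustrative.  A character translation like this is suitable for text fields only, never for binary data.

```python
# Minimal sketch: translate EBCDIC character data to a Python string.
# Assumption: the tape uses EBCDIC code page 037; other code pages exist.
EBCDIC_CODEC = "cp037"


def ebcdic_to_text(raw: bytes) -> str:
    """Decode EBCDIC text bytes. Not valid for binary fields."""
    return raw.decode(EBCDIC_CODEC)


# The bytes C8 C5 D3 D3 D6 40 F1 F2 F3 spell "HELLO 123" in EBCDIC.
sample = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6, 0x40, 0xF1, 0xF2, 0xF3])
print(ebcdic_to_text(sample))  # HELLO 123
```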
Tape Format:
Also called the "file structure" on the tape, this is the method by which data is written to the tape.  The most common formats are "Unlabeled", "IBM Standard Label (SL)", and "ANSI Standard Label".  Unlabeled tapes contain nothing but the data file, while labeled tapes contain "header labels" and "trailer labels" on the tape, which describe the data files.  The tape labels should normally be written in the same code set as the data files -- EBCDIC labels for EBCDIC data, etc.  However, if ASCII data is written to an IBM SL tape, we can still read it.
File Type:
Most mainframe tapes are written in either "Fixed Block" (FB) format or "Variable Block" (VB) format.  Binary data is sometimes written in an "Undefined" (U) format.
FB, or fixed-block format, is the most common, since mainframe data files are usually stored as fixed-length records.  Fixed block tapes simply contain some whole number of fixed-length records in each tape block.  The number of records per block is commonly a round number like 10, 20, 50, or 100, but can be any whole number.  Tape blocks are usually 32K bytes or smaller.
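The fixed-block arrangement can be sketched in a few lines of Python: a block is simply cut into equal-sized records.  The function name and the sample sizes (341-byte records in a 17,050-byte block) are illustrative assumptions, not a description of any particular conversion tool.

```python
def deblock_fixed(block: bytes, record_length: int) -> list[bytes]:
    """Split one FB tape block into its fixed-length records."""
    if record_length <= 0:
        raise ValueError("record length must be positive")
    if len(block) % record_length != 0:
        # An FB block must hold a whole number of records.
        raise ValueError("block size is not a multiple of the record length")
    return [
        block[start:start + record_length]
        for start in range(0, len(block), record_length)
    ]


# Illustrative block: 17,050 bytes of zeros, holding 341-byte records.
records = deblock_fixed(bytes(17050), 341)
print(len(records))  # 50
```

The length check is worth keeping: a block that is not an exact multiple of the record length usually means the record size or the file type has been misidentified.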
VB, or variable block format, is used to store records that vary in length.  Variable length records are common in complex files containing multiple record types, and in COBOL files with "Occurs depending on..." clauses.  There are different VB tape formats, but in general each record is preceded by a numeric value indicating the length of that record.  Several records are then assembled to create a tape block; the length of that block is calculated and written to the tape first, followed by the records.
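As a rough illustration of the length-prefixed structure, the Python sketch below splits one VB block into records.  It assumes one common IBM layout: a 4-byte descriptor in front of the block and in front of each record, where the first two bytes are a big-endian length that includes the descriptor itself and the last two bytes are zero.  Other VB variants (spanned records, ANSI variable formats) are not handled, and the function names are hypothetical.

```python
import struct

DESCRIPTOR_SIZE = 4  # assumed: 2-byte big-endian length + 2 reserved bytes


def parse_vb_block(block: bytes) -> list[bytes]:
    """Split one VB block into records, assuming 4-byte length descriptors."""
    if len(block) < DESCRIPTOR_SIZE:
        raise ValueError("block too short to hold a block descriptor")
    (block_len,) = struct.unpack(">H", block[:2])
    if block_len < DESCRIPTOR_SIZE or block_len > len(block):
        raise ValueError("block length does not match the data read")
    records = []
    pos = DESCRIPTOR_SIZE
    while pos < block_len:
        if pos + DESCRIPTOR_SIZE > block_len:
            raise ValueError("truncated record descriptor")
        (rec_len,) = struct.unpack(">H", block[pos:pos + 2])
        if rec_len < DESCRIPTOR_SIZE or pos + rec_len > block_len:
            raise ValueError("record length runs past the end of the block")
        records.append(block[pos + DESCRIPTOR_SIZE:pos + rec_len])
        pos += rec_len
    return records


def build_vb_block(records: list[bytes]) -> bytes:
    """Build a block in the same assumed layout (used here only for testing)."""
    body = b"".join(
        struct.pack(">HH", len(r) + DESCRIPTOR_SIZE, 0) + r for r in records
    )
    return struct.pack(">HH", len(body) + DESCRIPTOR_SIZE, 0) + body


block = build_vb_block([b"AB", b"HELLO"])
print(parse_vb_block(block))  # [b'AB', b'HELLO']
```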
U, or undefined format, is, by definition, not a defined structure.  The data may be written to the tape without regard for record size or structure.  The block size may vary from block to block.
Multivolume Tapes:
If a file doesn't fit on a tape, the system will write until the tape is full, then write a tape mark (which ends the tape file), and then write a special label (if it's a labeled tape), then continue on another tape.  DISC will append all the tape segments back together to create the original file, unless you request otherwise.  Labeled tapes contain a sequence number in the label, so we are assured of recombining the file in the correct order, but unlabeled tapes have no such check, so we must rely on the paper labels applied to the tapes.
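The recombining step for labeled tapes can be pictured with the following Python sketch, which joins file segments in sequence-number order and refuses to proceed if a volume is missing.  It is only an illustration under simple assumptions (segments already read into memory and keyed by the sequence number from the label); it is not a description of DISC's actual procedure, and the function name is hypothetical.

```python
def reassemble_segments(segments: dict[int, bytes]) -> bytes:
    """Join multivolume file segments in label sequence order (1, 2, 3, ...)."""
    expected = list(range(1, len(segments) + 1))
    if sorted(segments) != expected:
        # A gap in the sequence means a volume is missing or mislabeled.
        raise ValueError(f"expected volumes {expected}, got {sorted(segments)}")
    return b"".join(segments[number] for number in expected)


# Segments supplied out of order are still joined correctly.
whole = reassemble_segments({2: b"SECOND", 1: b"FIRST-", 3: b"-THIRD"})
print(whole)  # b'FIRST-SECOND-THIRD'
```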
File Contents:
The file may contain almost anything, from text to pictures, but most files contain EBCDIC or ASCII character fields, and may contain numeric values in a binary format.  See our Tech-Talk article Mainframe Data Types for a further discussion.  You should always get a layout of the file(s) on the tape. You will need the layout to process the data, and DISC may need it to convert the files.
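The following Python sketch shows why the layout matters when a record mixes character and binary fields.  The layout here is entirely hypothetical: an 8-byte EBCDIC name (code page 037 assumed) followed by a 4-byte signed binary integer stored big-endian, as mainframe binary integers are.  Only the layout tells you that bytes 9 to 12 must be read as a number rather than translated as characters.

```python
import struct

# Hypothetical layout: bytes 0-7 EBCDIC name, bytes 8-11 signed binary integer.
NAME_WIDTH = 8
RECORD_LENGTH = 12


def decode_record(record: bytes) -> tuple[str, int]:
    """Decode one record according to the hypothetical layout above."""
    if len(record) != RECORD_LENGTH:
        raise ValueError("unexpected record length")
    name = record[:NAME_WIDTH].decode("cp037").rstrip()
    (quantity,) = struct.unpack(">i", record[NAME_WIDTH:RECORD_LENGTH])
    return name, quantity


# Build a sample record the way it might appear on tape.
record = "WIDGET".ljust(NAME_WIDTH).encode("cp037") + struct.pack(">i", 1250)
print(decode_record(record))  # ('WIDGET', 1250)
```

Translating the whole record from EBCDIC to ASCII byte by byte would turn the four binary bytes into unrelated characters, which is why binary fields have to be located from the layout and handled separately.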
The most common mainframe tape formats are:
DISC can convert nearly all combinations of:
  • Unlabeled tapes
  • IBM Labeled tapes
  • ANSI Labeled tapes
  • Fixed block
  • Variable block
  • Single or Multivolume tapes
On the following media:
  • 9-track
  • 3480, 3490 & 3490E
  • 3570 B & C
  • 3590 B, E, H
  • 3592 J1A
  • 4mm
  • 8mm & AIT
  • LTO Ultrium 1, 2 & 3
  • DLT & SDLT
  • QIC & SLR tapes, in several sizes
  • StorageTek 9840 & 9940
  • ...and others

More information about mainframe tapes and files may be found in the articles in our TechTalk section.

How to Read the Paper Tape Label

The paper labels applied to the tape should specify the file name (called the Data Set Name, or DSN), density, code set, tape format, file type, record size, block size, and tape sequence for multivolume sets.  They may also contain the creation and expiration dates, and the record or block counts.  Not all labels will be complete, and the presentation of the information may be rather cryptic.
For example:  341 17050 FB SL 01/02 means the record size is 341 bytes, the block size is 17,050 bytes (50 records per block), the tape is in a fixed block (FB) format with standard labels (SL), and this is tape 1 of 2.  It does not specifically say which SL format, or whether the tape is in EBCDIC or ASCII.
Dates are usually written in a Julian format, which gives the day of the year rather than the month and day.  For example: 01091 is the 91st day of the year 01, or April 1st, 2001.  This may also be written as 2001/091.  If there are two dates given, the earlier one is usually the creation date and the later one is the expiration date -- the date on which the tape can be erased and reused.
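The two examples above can be worked through in a short Python sketch.  The field order in the label parser is an assumption taken from the single example given here (real labels vary widely), the function and key names are hypothetical, and two-digit years are assumed to fall in the 2000s unless a different century is passed in.

```python
from datetime import date, timedelta


def julian_to_date(text: str, century: int = 2000) -> date:
    """Convert 'YYDDD' (e.g. 01091) or 'YYYY/DDD' (e.g. 2001/091) to a date."""
    if "/" in text:
        year_part, day_part = text.split("/")
        year = int(year_part)
    else:
        # Assumption: a two-digit year belongs to the given century.
        year = century + int(text[:2])
        day_part = text[2:]
    result = date(year, 1, 1) + timedelta(days=int(day_part) - 1)
    if result.year != year:
        raise ValueError(f"day of year out of range: {text}")
    return result


def parse_tape_label(text: str) -> dict:
    """Parse a label like '341 17050 FB SL 01/02' (field order assumed)."""
    record_size, block_size, file_type, tape_format, sequence = text.split()
    volume, total = sequence.split("/")
    lrecl, blksize = int(record_size), int(block_size)
    return {
        "record_size": lrecl,
        "block_size": blksize,
        "records_per_block": blksize // lrecl if blksize % lrecl == 0 else None,
        "file_type": file_type,
        "tape_format": tape_format,
        "volume": int(volume),
        "total_volumes": int(total),
    }


print(julian_to_date("01091"))  # 2001-04-01
print(parse_tape_label("341 17050 FB SL 01/02")["records_per_block"])  # 50
```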

Information You Should Get:

The single most important thing to get is the file layout(s).  Without it you can only guess at the file contents.  A layout should always accompany the tape; if you don't get it, call the tape supplier.  DISC can often convert a tape without the layout, and if you don't supply it we will try to work without it.  But we will need the layout if we have to convert specific fields (such as binary values) or if we are performing any data processing.
If necessary, most of the other items above can be figured out by inspecting the tapes.  However, if you want an accurate quote in advance of processing the data, we will need to know what we will be getting and how much data we will be converting.  In any case, please provide as much information as you have, as it serves as a double check of the tape and data.

Specify the Converted File Type

Of course we need to know what type of file you want back from the conversion.  Please see "Converting Files to the PC Platform" for a brief discussion of PC file types. You may also wish to review our other conversion articles in our TechTalk section.
Whenever possible we recommend converting to a PC file type that is as similar to the mainframe file as possible.  In particular, if the mainframe file has fixed length records, as most do, we suggest converting to fixed-length ASCII.  This will minimize the cost and avoid introducing additional complications.
However, if your application can't handle such a file, we can convert the mainframe fixed-length file to a delimited file.  There are several types discussed in the articles referenced above and in our Tech-Talk section.  DISC can also import the data directly into several different database formats.
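To give a sense of what a fixed-length to delimited conversion involves, here is a minimal Python sketch using the standard csv module.  The layout (field names and widths) and the sample records are hypothetical, and stripping the padding spaces from each field is a design choice rather than a requirement.

```python
import csv
import io

# Hypothetical layout: (field name, width in characters).
LAYOUT = [("account", 6), ("name", 10), ("state", 2)]


def fixed_to_delimited(records: list[str], layout, delimiter: str = ",") -> str:
    """Convert fixed-length text records to delimited text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow([name for name, _ in layout])
    for record in records:
        fields, pos = [], 0
        for _, width in layout:
            fields.append(record[pos:pos + width].strip())
            pos += width
        writer.writerow(fields)
    return out.getvalue()


records = ["000123JOHN SMITHMA", "000456DOE, JANE NH"]
print(fixed_to_delimited(records, LAYOUT), end="")
# account,name,state
# 000123,JOHN SMITH,MA
# 000456,"DOE, JANE",NH
```

The second record shows one of the complications a delimited file can introduce: a field that contains the delimiter must be quoted, which a fixed-length file never needs.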

Additional Information

For more articles on data conversion, see our TechTalk Index.

Disc Interchange Service Company, Inc.
15 Stony Brook Road
Westford, MA 01886