Simple Data Conversion Tutorial

(DISC has published two data conversion tutorials. This is the simple overview of data conversion. For a more detailed look, please see our Intermediate Level Data Conversion Tutorial.)

Data Conversion is the generic term given to the process of converting computer data between different applications and/or between different computers.

The Four Issues of Data Conversion

Data conversion involves up to four different issues. A conversion may involve any combination of:

  1. Media.
  2. Logical Tape format.
  3. File type.
  4. File content.

Media

The media (this article discusses tape) is the most obvious issue. If you get a DLT tape, you will need a DLT drive to read it. You will also need the right model DLT drive; there are 15 different DLT drives and they all write different physical recording formats.

Furthermore, the type of tape does not always indicate the physical recording format, and therefore the drive you need. A DLT-IV tape can be used in DLT 4000, DLT 7000, DLT 8000, DLT-1, and VS-80 drives, and they write to tape differently. You simply can't tell from the type of tape which physical format has been recorded on it.

Logical Tape Format

The logical tape format, also described as the file structure on the tape, is the most misunderstood conversion issue. There are hundreds of programs used to write files to tape, and each one does it differently. In nearly all cases the tape program creates a data structure on the tape (sort of like a container) to aid in storing and retrieving the files, then places your files within that structure. To retrieve your files, you will need to extract them from that structure by reading the tape with the same program, and sometimes even the same version, that was used to write it.
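To make the container idea concrete, here is a minimal sketch using Python's standard `tarfile` module, with tar as the example format. An in-memory buffer stands in for the tape (an assumption for illustration; a real job reads from a tape drive), and the file name `report.txt` is hypothetical. It shows that the archive begins with a 512-byte tar header rather than with your data, and that getting the file back requires a reader that understands tar.

```python
import io
import tarfile


def pack(files: dict[str, bytes]) -> bytes:
    """Wrap files in a tar container, held in memory here instead of on tape."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unpack(archive: bytes) -> dict[str, bytes]:
    """Pull the files back out; only a tar-aware reader can do this."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                files[member.name] = tar.extractfile(member).read()
    return files


# Hypothetical file name; the in-memory buffer stands in for the tape.
archive = pack({"report.txt": b"hello\n"})
print(archive[:10])       # b'report.txt': the tar header comes first
print(archive[512:518])   # b'hello\n': the data follows one 512-byte header block
print(unpack(archive))    # {'report.txt': b'hello\n'}
```

A program that doesn't understand tar would see the header bytes and padding mixed in with your data, which is why the reading program has to match the writing program.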

There is more information about tape formats, and examples, in our Intermediate Data Conversion Tutorial.

File Type and File Content

The "File Type" we are discussing here is the file type on disk, before it is written to tape, or after it is restored from tape.

The File Type and File Content are closely related, with overlapping issues and interactions. What "File Type" and "File Content" refer to depends on both the operating system and the kind of file. Furthermore, the issues are different for mainframes and PCs, and for different kinds of files, so it's difficult to make global statements about either. What follows is a simple overview. Our Intermediate Data Conversion Tutorial contains a more complete description.

File type

File type refers to how the file is stored on disk. In the case of mainframe computers, the "how" is handled by the operating system, while on Windows, UNIX, and Macintosh computers it's handled by the application program. So "file type" has very different meanings and implications on mainframes than on PCs.

Regardless of where it is handled, it refers to the kind of file. Under an operating system that uses structured files, such as a mainframe operating system, "file type" describes, for example, an indexed or sequential file, with fixed-length or variable-length records, and likely other file parameters such as the record length or type of indexing. Under operating systems that don't use structured files, such as UNIX or Windows, "file type" commonly refers to the application that created the file, such as "a Microsoft Access file", or to some common file type used by many applications, such as a comma-delimited file.
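The practical difference can be sketched in a few lines of Python. For a fixed-length file, the record length is part of the file's description (on a mainframe, the operating system keeps it), so the bytes alone can't be split correctly without it. For a comma-delimited file, the structure travels inside the data as commas and line ends. The 19-byte record length and the sample data are hypothetical, and the fixed-length sample is shown in ASCII for readability, although a mainframe file would normally be EBCDIC.

```python
import csv
import io


def split_fixed(data: bytes, record_length: int) -> list[bytes]:
    """Split fixed-length records; the length comes from the file's description, not the data."""
    if record_length <= 0 or len(data) % record_length:
        raise ValueError("data is not a whole number of records")
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]


def split_delimited(text: str) -> list[list[str]]:
    """Parse a comma-delimited file, where the delimiters carry the structure."""
    return list(csv.reader(io.StringIO(text)))


# Hypothetical 19-byte records: 10-byte surname, 5-byte first name, 4-digit number.
fixed = b"SMITH     JOHN 0042JONES     JANE 0017"
print(split_fixed(fixed, 19))
print(split_delimited("SMITH,JOHN,42\nJONES,JANE,17\n"))
```

Given the wrong record length, `split_fixed` either fails or silently cuts records in the wrong places, which is why the record length has to be known before a structured file can be converted.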

This is discussed in much greater detail in our Intermediate Data Conversion Tutorial.

File content

File content refers to what is stored in the file, which depends on what kind of file it is: text, word processing, database, spreadsheet, binary data, object file, executable, etc.

So File Content encompasses many concepts, and takes on different meanings for different types of files. For database files it may mean character fields versus binary fields, EBCDIC versus ASCII, etc. It may also include issues such as redefined fields or redefined records (multiple record types in one file). When used to describe a data conversion, "file content" generally does not refer to the specific data in individual fields or records in the file, such as "John Smith" or "Jane Doe", but to the method or data type used to store that data.
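A short Python sketch of the EBCDIC-versus-ASCII and character-versus-binary distinctions follows. It assumes EBCDIC code page 037 (`cp037`), one common US variant; mainframes use several EBCDIC code pages. The record layout (a 4-byte character field followed by a 2-byte big-endian binary field) is hypothetical. The middle section shows why a whole record can't simply be run through a character translation: the binary field gets "translated" too, and its value changes.

```python
# The same word is stored as different bytes in ASCII and in EBCDIC (cp037 assumed).
raw = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])
print(raw.decode("cp037"))                 # HELLO
print("HELLO".encode("ascii").hex(" "))    # 48 45 4c 4c 4f
print("HELLO".encode("cp037").hex(" "))    # c8 c5 d3 d3 d6

# Hypothetical record: 4-byte EBCDIC character field + 2-byte binary integer (64).
record = "JOHN".encode("cp037") + (64).to_bytes(2, "big")

# Wrong: translating the whole record also changes the binary field.
naive = record.decode("cp037").encode("latin-1")
print(naive[:4], int.from_bytes(naive[4:6], "big"))   # b'JOHN' 32

# Right: convert each field according to its data type.
name = record[:4].decode("cp037")
value = int.from_bytes(record[4:6], "big")
print(name, value)                                     # JOHN 64
```

The binary value 64 happens to share its byte (0x40) with the EBCDIC space character, so a blanket translation turns it into an ASCII space (0x20), or 32.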

File Content may also be dictated by the application. For example, the content of an Access file is controlled by Access and the layout you specify when you create the file.

The issues are numerous, and are discussed in greater detail in our Intermediate Data Conversion Tutorial.

Specifying a Data Conversion

Before we can convert your data, we will need to identify all these issues for the source tape, and determine what you want back on the destination tape or disk. Let's look at a very simple conversion of a UNIX text file to a PC text file. Let's say you receive a DLT-IV tape in tar format, containing a plain-text file created on UNIX. So far we know the following:

  1. Media: DLT-IV media
  2. Tape Format: tar file
  3. File type: UNIX text file
  4. File Content: Plain text (no word processor codes).

DLT-IV tapes can be recorded in different physical recording formats, but the recording format was not specified and will have to be determined. A tar file was specified, but the exact type of tar file and the block size were not specified and will have to be determined. The file type is a UNIX text file, so it will use standard ASCII characters, and each line of the file will end with a UNIX newline (a single line-feed character, LF). The File Content will be text only, with no word processor codes.
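Part of determining the tar type can be illustrated by examining the first 512-byte header block of the archive. The Python sketch below is a heuristic, not a complete identifier. It validates the header checksum, then checks the magic field at offset 257, which holds `ustar\0` plus version `00` for POSIX ustar archives, `ustar ` plus ` \0` for GNU tar, and zeros for old V7-style archives. The header does not record the tape block size, so that still has to be determined separately. The `sample_header` helper is a hypothetical demo that builds headers with Python's `tarfile` instead of reading a real tape.

```python
import io
import tarfile


def checksum_ok(block: bytes) -> bool:
    """Tar checksum: sum of the 512 header bytes, counting the 8-byte checksum field as spaces."""
    field = block[148:156].split(b"\x00", 1)[0].strip()
    if not field:
        return False
    try:
        stored = int(field, 8)  # stored as octal ASCII digits
    except ValueError:
        return False
    return stored == sum(block[:148]) + 8 * ord(" ") + sum(block[156:512])


def classify_tar_header(block: bytes) -> str:
    """Guess the tar variant from the first header block (heuristic)."""
    if len(block) < 512 or not checksum_ok(block):
        return "not a tar header"
    magic, version = block[257:263], block[263:265]
    if magic == b"ustar\x00" and version == b"00":
        return "POSIX ustar"
    if magic == b"ustar " and version == b" \x00":
        return "GNU tar"
    if block[257:265] == bytes(8):
        return "old V7-style tar"
    return "unknown tar variant"


def sample_header(fmt: int) -> bytes:
    """Hypothetical demo helper: build a one-file archive in memory, return its first block."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        info = tarfile.TarInfo("notes.txt")
        info.size = 0
        tar.addfile(info)
    return buf.getvalue()[:512]


print(classify_tar_header(sample_header(tarfile.USTAR_FORMAT)))  # POSIX ustar
print(classify_tar_header(sample_header(tarfile.GNU_FORMAT)))    # GNU tar
print(classify_tar_header(bytes(512)))                           # not a tar header
```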

Now that we know what we have, it's time to specify what you want back. While we have primarily discussed tapes in this article, DISC commonly delivers PC files on CD or DVD, so let's specify that:

  1. Media: 74-minute CD-R
  2. Format: Windows Joliet format
  3. File type: PC text file
  4. File content: Plain text.

After determining which tape drive to use, we would then inspect the tape to determine the tar block size and tar type. We would then extract the text file from the tar file and convert each UNIX newline (LF) to the carriage-return/line-feed (CR LF) pair a PC expects. The converted file would be written to a 74-minute CD-R in Joliet format for use on a Windows computer.
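The extract-and-convert step might look like the Python sketch below. It assumes the tar archive has already been read off the tape into memory; identifying the drive and physical format, and writing the Joliet CD, are separate steps it doesn't cover. The member name `notes.txt` is hypothetical.

```python
import io
import tarfile


def unix_text_to_pc(archive: bytes, member_name: str) -> bytes:
    """Extract one text file from a tar archive and convert LF line endings to CR LF."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        extracted = tar.extractfile(member_name)
        if extracted is None:
            raise ValueError(f"{member_name} is not a regular file")
        data = extracted.read()
    # Normalize any existing CR LF first so it doesn't become CR CR LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


# Demo: a small tar archive (hypothetical file name) standing in for the tape contents.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    text = b"line one\nline two\n"
    info = tarfile.TarInfo("notes.txt")
    info.size = len(text)
    tar.addfile(info, io.BytesIO(text))

print(unix_text_to_pc(buf.getvalue(), "notes.txt"))  # b'line one\r\nline two\r\n'
```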

Our Intermediate Data Conversion Tutorial presents an example of converting a mainframe data file to a PC file for Access.

Summary

Converting files between computers can involve any combination of the following four issues:
  1. Media.
  2. Logical Tape format.
  3. File type.
  4. File content.
The Media means the tape, but the media specification should include not only the type of tape but also the drive it was written on, as tapes are often used in several different drive models. The Logical Tape Format is determined by the backup program or tape utility used to write the tape. File Type means how the file is written to disk, and File Content means what is contained in the file. The File Type and File Content are determined by both the operating system and the application that created the file.

Some important issues may not be stated explicitly. For example, most mainframe tapes will be in EBCDIC and most UNIX, PC, and Mac tapes will be in ASCII, but neither may be specified. You will have to deduce the character set from knowing what computer the tape originated on, or by inspecting the tape. A PC will not understand an EBCDIC file, so it needs to be converted before the PC can use it.
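Inspecting the data can be partly automated. The Python sketch below is a rough, illustrative heuristic, not a standard algorithm. It decodes a sample both as an ASCII-based code (Latin-1) and as EBCDIC code page 037 (an assumption; other EBCDIC code pages exist), then counts how many characters come out as plain letters, digits, and common punctuation. It is meant for text data; a sample that is mostly binary fields can give an unreliable answer.

```python
def _looks_like_text(ch: str) -> bool:
    """True for plain ASCII letters, digits, and a few common punctuation marks."""
    return ch.isascii() and (ch.isalnum() or ch in " .,-")


def guess_charset(sample: bytes) -> str:
    """Heuristic: does the sample read better as ASCII or as EBCDIC (cp037 assumed)?"""
    as_ascii = sum(_looks_like_text(ch) for ch in sample.decode("latin-1"))
    as_ebcdic = sum(_looks_like_text(ch) for ch in sample.decode("cp037"))
    if as_ascii == as_ebcdic:
        return "undetermined"
    return "ASCII" if as_ascii > as_ebcdic else "EBCDIC"


print(guess_charset(b"SMITH JOHN 0042"))                  # ASCII
print(guess_charset("SMITH JOHN 0042".encode("cp037")))   # EBCDIC
```

The heuristic works because the two codes place letters, digits, and the space at different byte values: EBCDIC text read as ASCII is mostly accented or symbol characters, and the reverse is mostly control codes and symbols.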

If you are getting a data file, you will need a record layout that specifies the fields in the file. If some of those fields are in binary format, they will probably need to be converted by us, as binary data types are generally not compatible across platforms (computers and CPUs).
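Two examples of why binary fields need field-by-field conversion are sketched below in Python. The first is byte order: IBM mainframes store binary integers big-endian, while x86 PCs are little-endian, so the same four bytes read as two different numbers. The second decodes IBM packed decimal (COBOL `COMP-3`), where each byte holds two decimal digits and the final half-byte holds the sign. In practice, field positions and lengths come from your record layout; the values here are hypothetical.

```python
import struct

# Hypothetical 4-byte binary integer field, as stored on a big-endian mainframe.
raw = bytes([0x00, 0x00, 0x01, 0x2C])
print(struct.unpack(">i", raw)[0])   # 300, read as big-endian (correct here)
print(struct.unpack("<i", raw)[0])   # 738263040, misread as little-endian
pc_bytes = struct.pack("<i", struct.unpack(">i", raw)[0])  # same value, PC byte order


def unpack_packed_decimal(field: bytes) -> int:
    """Decode IBM packed decimal: two digits per byte, sign in the final half-byte."""
    if not field:
        raise ValueError("empty field")
    nibbles = []
    for byte in field:
        nibbles += [byte >> 4, byte & 0x0F]
    *digits, sign = nibbles
    if any(d > 9 for d in digits) or sign < 0xA:
        raise ValueError(f"not valid packed decimal: {field.hex()}")
    value = int("".join(str(d) for d in digits))
    return -value if sign in (0xB, 0xD) else value  # B and D mark negative values


print(unpack_packed_decimal(bytes([0x12, 0x34, 0x5C])))  # 12345
print(unpack_packed_decimal(bytes([0x12, 0x34, 0x5D])))  # -12345
```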

Before submitting a conversion, you should try to get as much information as you can, and give some thought to what kind of file you want back. The more accurately you specify your conversion, the better job we can do for you.

More Detailed Information

This has been a simplified overview of Data Conversion. Greater detail, and references to other Disc Interchange articles, are available in our Intermediate Data Conversion Tutorial. DISC has also published numerous data conversion articles, available via the link below.

Additional Information

For more articles on data conversion, see our TechTalk Index.

Our Data Conversion Services

Disc Interchange Service Company can convert most tape and file formats, including all the IBM mainframe EBCDIC data types, and most ASCII data types from PC, UNIX, and VAX systems. Our library of conversion routines permits us to handle those difficult jobs that others can't convert.
 
Mainframe & AS/400 Conversions
Mainframe & AS/400 Conversion to PC

With 32 years of experience, we are the experts at transferring mainframe data to PCs.

Disc Interchange Service Company, Inc.
Media Conversion Specialists
15 Stony Brook Road
Westford, MA 01886

Copyright © 1997 - 2015 by Disc Interchange
All rights reserved. See our copyright page.
