Importing Variable Data Files into Access

This TechTalk article deals with importing a file that has a varying number of fields.  Although this is discussed in the context of a fixed-field file, it also applies to delimited files.  This article was written in 2001 for Access 2000 and Access 97, and has not be verified for later versions of Access.  For detailed instructions on how to create an import specification for fixed-width fields, see the TechTalk article "Importing Fixed Field Files into Access".

What is a "Variable Data File"?

For our discussion, this is a file with records that do not all contain the same number of fields.  Some records may be missing one or more fields at the end of the record.  However, all the fields up to the end of the record will be in the same order and with the same size as all other records. That is, there will be no fields missing from the middle of the record, only from the end.  All records must be terminated with CR-LF (carriage-return and line-feed), or with CR or LF.  There are three primary causes of this type of file:

  1. Some applications try to be "helpful" when exporting a file, and trim trailing spaces. For example, if the last field of a file is, say, a secondary address, and that address is not used in some records, then rather than outputting spaces for that blank field, the program will terminate the record after the last field that contains data. This creates records of varying length, containing a varying number of fields.
  2. Occasionally a conversion program or communications method may trim trailing spaces from lines, to speed up the communication or reduce the file size.
  3. Some file layouts, particularly from COBOL, have a field or group of fields at the end of the record that repeat a varying number of times.  For example, one record may have 1 occurrence of that group, others may have 2, and still others may have 3 occurrences. See the example below, and our TechTalk articles "Occurs...Depending On"   and  "Reading COBOL Layouts" for more information.

How Do I Import this into Access?

First, the "variable fields" must be at the end of the record. This is almost always the case.  The approach you should take differs for trimmed records and for COBOL occurs...depending on.  The two situations are outlined below.   (Please see note 1 at the end of the page.)

When you have Trimmed Lines:

If your file has trailing spaces removed from the records (examples 1 and 2 above), then proceed this way:

1. Define the Access table (or import 'spec) to include all the fields.
2. Import the file normally, as if it didn't have short records.

When Access encounters a short record it will import all the fields up to the CR-LF, then upon seeing the CR-LF, it will fill the remaining fields with spaces or zeros, and advance to the next record. Of course, when it encounters a full size record, it will import that record normally.

When you have a COBOL Occurs...Depending On:

If the cause of the variable fields is an intentional variation in the number of fields, such as a COBOL Occurs...Depending on, then you have to decide how many occurrences of those fields you want to include in your Access file. It's possible to define none, some, or all of those varying fields, and Access will import the file correctly.  Let's look at an example:

A COBOL Occurs...Depending On Example

Let's say you  have a COBOL database which contains all the purchases each of your customers made in the past month, with one record for each customer. Each record would contain some fixed information for the account, such as account number, company name, address, etc., and then it would contain a group of, say, nine fields for each purchase made (order date, PO, buyer, ship date, packer, shipping method, shipping weight, invoice number, invoice amount).  If a customer places three orders in a month, then this group of nine fields will appear three times in the record for that month.

When the COBOL programmer designed this system years ago, he had to assign a maximum number of possible monthly orders.  Because there is virtually no penalty for using a high number in a COBOL variable record, he might have chosen a maximum of 100.  That is, there is a line in the COBOL layout that says "Occurs 1 to 100 times, depending on ORDER-COUNT.  The field ORDER-COUNT then tells you how many actual orders were placed that month, and thus how many occurrences of these 9 fields the record will have.

Because Access doesn't support the COBOL concept of "occurs depending on...", you would have to define 900 fields (plus the fixed portion of the record) to hold the maximum possible number of 100 orders.  Since Access is limited to 255 total fields, you can't import all 900 possible fields.  But let's say you know, either from examining the data or from experience, that no customer has placed more than ten orders in a month.  Now you only have to define ten groups of nine fields to accommodate the data. But what does Access do if there are only two orders a month, or if there should ever be more than ten?

If you have created a database with ten occurrences of the nine fields and then import a record with three occurrences, Access will correctly import the fixed portion plus the 27 fields (3 x 9), then find a CR-LF record delimiter, stop importing that record, fill the remaining fields with spaces or zeros, and advance to the beginning of the next record.

If your COBOL file has more than ten occurrences, Access will import the ten groups, which completely fills the Access record, then discard the rest of the fields up to the CR-LF record delimiter, then begin with the next record.  Both situations will work okay (although of course, all groups after the 10th occurrence would be lost), and the Access database will be in-tact.

For other options when dealing with repeating groups, see our TechTalk article "Occurs...Depending On".

Access Limits:

The following limits are important to keep in mind when importing large files into Access:
Access Database Limits
Parameter Access 97 Access 2000
Number of fields in a table 255 255
Number of characters in a text field 255 255
Number of characters in a record
(excluding memo fields & OLE objects)
2000 2000
Maximum table size 1 GB 1 GB
Maximum MDB size 1 GB 2 GB

If the file you need to import is larger than some Access limit, you should first consider removing unneeded fields to reduce the size. If the file is still too large, then you'll need to split it into multiple tables, generate a key, and relate the tables via the key. In Access 2000 the total of all the tables can be larger than the 1 GB table limit, but the MDB cannot be over 2 GB.

The size of the MDB includes all the indexes, queries, forms, reports, and macros, so the amount of actual data you can store in tables in Access 97 is less than 1 GB.  Be sure to leave some room for future queries, reports, etc.  In Access 2000 the MDB size has been increased to 2 GB, while the maximum table size is still 1 GB.

Additional Information

For more articles on data conversion, see our TechTalk Index.

Note 1: Microsoft states that when importing a text file, "...all the records must contain the same number of characters".  (See their Knowledge Base article Q209698)  However, the method above has worked for us every time we have tried it, and we believe this approach is sound.  Always double-check your data after import.

Disc Interchange Service Company, Inc.
15 Stony Brook Road
Westford, MA 01886