Personal Computers and small servers become more capable each year, and many databases previously found on mainframes are being ported to PCs. Disc Interchange has ported mainframe files as large as 250 GB to Windows, but some of our customers have had problems with these large files. The purpose of this article is to provide some insight into the file size limitations of Windows operating systems, so you can make informed decisions about how we should convert your files.
This article discusses some of the common file size limitations for PCs, and gives some brief examples. This discussion is limited to Microsoft products, and does not include the HPFS file system. If you want to skip the technical discussion and just see the bottom line, there is a table of limits in the Summary. This article was written in 2003 and was reviewed in January 2010. The contents are still valid as of 2010.
This article uses binary values for KB, MB, and GB, and denotes them as KiB, MiB, and GiB. That is, 1 KiB = 1024 bytes, 1 MiB = 1,048,576 bytes and 1 GiB = 1,073,741,824 bytes. The terms "megabyte" and "gigabyte" refer to the decimal values. See "What is a Gigabyte?" for a discussion of binary vs decimal notation.
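The gap between the two notations is easy to underestimate. The following is a minimal Python sketch of the conversion, not part of any DISC tool; the function name `decimal_gb_to_gib` is our own, chosen for illustration.

```python
KIB = 1024
MIB = KIB ** 2      # 1,048,576 bytes
GIB = KIB ** 3      # 1,073,741,824 bytes

GIGABYTE = 10 ** 9  # decimal gigabyte, as used on drive labels


def decimal_gb_to_gib(gigabytes):
    """Convert a size in decimal gigabytes to binary GiB."""
    return gigabytes * GIGABYTE / GIB


for size in (80, 200):
    print(f"{size} gigabytes = {decimal_gb_to_gib(size):.1f} GiB")
# 80 gigabytes = 74.5 GiB
# 200 gigabytes = 186.3 GiB
```

A drive sold as "80 gigabytes" therefore holds about 74.5 GiB, a difference of roughly 7%.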
As recently as 1990, an 80 megabyte disk drive on a PC was considered "huge". Today (2003), 40 to 80 gigabyte drives are commonplace on small PCs, and 160 to 200 gigabyte IDE drives are inexpensive and found on many small servers. The leap from 80 megabytes to 80 gigabytes is a 1,000-fold increase in disk capacity, but the ability of operating systems and application programs to handle larger files has not increased correspondingly. Although a PC may have a 100 gigabyte disk drive, it's unlikely it could handle a file of 100 gigabytes.
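There are four main issues which limit file size: the maximum disk drive size, the maximum volume size, the maximum file size the operating system supports, and the maximum file size the application supports.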
Each of these is discussed in detail below. In most cases the last two impose the smallest limits.
The maximum file size supported by the OS may also be limited by the method used to access the file, the number of files open, the amount of physical memory in the system, buffer and virtual memory settings, the number of processes running, etc., etc. For servers, some of these limits are the sum of all users who are active on the system. For simplicity, this article will not deal with limitations caused by individual installations or by multiple users on a server. We will deal with the fundamental limits of the system.
This is the largest disk drive you can install and still be able to access the entire drive. The limitations may be hardware or OS (Operating System) limits, or a combination. The disk controller BIOS may have limitations on the largest drive it can address, or the OS may have an inherent limit.
There have been many drive size limits over the years, for many different reasons -- too many to discuss in detail here. Common limits were 32 MiB, 504 MiB, 7.8 GiB, 32 GiB, and more recently 128 GiB. In the past these drive size issues limited file size, but that's seldom the case today, and this consideration has all but vanished. Consequently, we will only give two brief examples to demonstrate these two points.
An example of an OS limit is the MSDOS 7.8 GiB limit (commonly called the "8.4 gigabyte" limit). No matter how large a drive you connect, MSDOS won't see it as larger than 7.8 GiB. This is due to the INT 13 CHS addressing used in all versions of MSDOS, and installing a new motherboard or BIOS won't help.
The only hardware boundary likely to be encountered on current systems is the 128 GiB IDE limit caused by the 28 bits of LBA addressing of some BIOSes. Breaking this barrier requires a new 48 bit addressing method. Most hardware made since about 2000 can address IDE drives up to 128 GiB, and most new (2003 and later) designs can handle drives far larger than 128 GiB, using 48 bit addressing. A firmware upgrade will allow many older controllers to access the new large drives.
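The figures in these two examples follow directly from the address widths. The sketch below works through the arithmetic; it assumes 512-byte sectors and, for INT 13 CHS addressing, a maximum geometry of 1024 cylinders, 255 heads, and 63 sectors per track. Neither assumption is spelled out in the text above.

```python
SECTOR_BYTES = 512  # assumed sector size
GIB = 2 ** 30

# INT 13 CHS addressing: assumed maximum geometry of
# 1024 cylinders x 255 heads x 63 sectors per track.
chs_bytes = 1024 * 255 * 63 * SECTOR_BYTES

# LBA addressing: one address per sector.
lba28_bytes = 2 ** 28 * SECTOR_BYTES
lba48_bytes = 2 ** 48 * SECTOR_BYTES

print(f"CHS:    {chs_bytes:,} bytes = {chs_bytes / GIB:.1f} GiB")
print(f"LBA-28: {lba28_bytes // GIB} GiB")
print(f"LBA-48: {lba48_bytes // GIB:,} GiB")
# CHS:    8,422,686,720 bytes = 7.8 GiB
# LBA-28: 128 GiB
# LBA-48: 134,217,728 GiB
```

The CHS result shows why the same limit is quoted as both "7.8 GiB" (binary) and "8.4 gigabytes" (decimal).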
This is the largest size a disk volume (e.g. drive D:) can be. A volume is a logical partition on a physical disk (or in some cases on multiple disks), and is sometimes smaller than the physical disk.
For FAT volumes, the limit is determined by the maximum number of clusters that can be addressed, times the cluster size. A FAT-16 volume, which uses a 16 bit pointer for the File Allocation Table (FAT), can have almost 64K clusters (2^16 = 64K). The maximum cluster size is usually 32 KiB, so the maximum volume size is 64K clusters times 32 KiB, or 2 GiB. Some operating systems permit 64 KiB clusters, so their FAT-16 disks can support 4 GiB volumes. FAT-32 volumes use a 32 bit pointer, and can generally be up to 2 TiB in size.
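The FAT-16 arithmetic can be sketched in a few lines of Python. The function name `fat16_max_volume` is ours, and the result is an upper bound: because a FAT-16 volume can address slightly fewer than 2^16 clusters, real volumes top out just under these figures.

```python
KIB = 1024
GIB = 1024 ** 3


def fat16_max_volume(cluster_bytes):
    """Upper bound on volume size: 2^16 cluster numbers times cluster size."""
    return 2 ** 16 * cluster_bytes


print(fat16_max_volume(32 * KIB) // GIB)  # 2 (GiB, with 32 KiB clusters)
print(fat16_max_volume(64 * KIB) // GIB)  # 4 (GiB, with 64 KiB clusters)
```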
NTFS volumes are structured in such a way that they are not limited by a fixed-size map like the FAT volumes are. Microsoft claims an NTFS volume can be up to 16 Exabytes, although there are issues that arise over 2 TiB. But in any case, it can be at least 2 TiB in size.
Systems such as NT and W2K allow you to combine multiple disk drives to form one NTFS volume. So even if your disk controller BIOS has a 128 GiB limit, you can combine, say, four 120 GiB drives into one 480 GiB volume.
This is the primary Operating System file size limitation. The maximum file size the OS can handle is generally determined by whichever is smallest: the fundamental OS limits or the limits of the disk formatting scheme.
Now things start to get more complex. The "fundamental OS limits" means the limits imposed by the code in the OS. These include such things as file pointer size, which determines how much space can be addressed, access methods, buffering schemes, buffer sizes, etc. The "disk formatting scheme" refers to the disk's file structure, such as FAT or NTFS, which impose various limits on both volume size and file size.
The technical details are lengthy, and far beyond the scope of this article, but we will present a couple different examples to illustrate these issues. Actual limits are shown in the Summary, below.
Example 1: MSDOS
MSDOS uses a file handle that has a 32 bit pointer for addressing the file. This would limit MSDOS file size to 4 GiB (2^32 = 4 GiB), but because the FAT-16 volume size which MSDOS uses is limited to 2 GiB, effectively MSDOS is limited to a 2 GiB file. This is a case where the disk formatting scheme limits the file size. (In fact, it's possible for MSDOS to create 4 GiB files on a network server, where it's not limited by the FAT-16 volume size.)
(It should be noted that some third party vendors found ways around some Microsoft limits, using tricks such as increasing the cluster size. This article only deals with unmodified Microsoft products).
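The MSDOS example boils down to taking the smaller of two limits. Here is a minimal sketch of that reasoning; the function `effective_file_limit` is a hypothetical helper of ours, and it models only the two limits discussed in this example.

```python
GIB = 2 ** 30


def effective_file_limit(pointer_bits, volume_limit=None):
    """Smaller of the file-pointer limit and the volume limit.

    volume_limit=None models a case where the local volume format
    does not apply, such as a file on a network server.
    """
    pointer_limit = 2 ** pointer_bits
    if volume_limit is None:
        return pointer_limit
    return min(pointer_limit, volume_limit)


# MSDOS on a local FAT-16 volume: the 2 GiB volume limit wins.
print(effective_file_limit(32, 2 * GIB) // GIB)  # 2

# MSDOS writing to a network server: the 32 bit pointer is the limit.
print(effective_file_limit(32) // GIB)  # 4
```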
Example 2: Windows NT
The issues with NT become much more complex. NT supports both FAT and NTFS file systems. FAT file systems are limited to files of 2 or 4 GiB, so the file system becomes the limiting factor when you use a FAT volume. But NTFS supports large files. The OS itself may have limits built-in, and it becomes the limiting factor when using NTFS volumes.
For NTFS volumes on NT, Microsoft states: "Volumes much larger than 2 terabytes (TB) are possible.", and "File size limited only by size of volume.". This implies you can have files "much larger than 2 TB" on NT. Although this is a true statement for the NTFS volume, files that large are not directly supported by the NT operating system. NT 4 is limited to about 92 GiB for all open files, due to hard-coded limitations in the OS. (We believe a Service Pack has increased the NT limit to about 180 GiB, but can't find any documentation on it). To open larger files you would have to bypass the OS disk I/O routines and use DLLs to read and write raw sectors, and you would have to perform your own blocking and deblocking (basically write your own disk I/O routines). Although you can have a 1 terabyte file on an NT system, you can't do much with it. You can't even COPY it, or TYPE it, and you certainly can't open it with most normal applications.
Windows 2000 had a similar limitation in the initial release, but Service Pack 2 corrected that problem, and according to Microsoft, Windows 2000 can now open files "of arbitrary size". We believe that's also true of XP. We have opened files of several hundred gigabytes in both Windows 2000 and XP.
Applications (languages, databases, specialty applications, etc.) will have their own file size limits, depending on how they were written and/or what language they were written in. It can be surprisingly difficult to find such specifications for many products. Furthermore, the type of file and the way you open it can affect the maximum size supported.
It should be noted that some operations may work (or appear to work) past these limits, while others will fail. For instance, sequential reads or writes may work past the stated limit, but if you try to reposition in the file, it will fail, and it may fail when you close the file.
Out of curiosity we once tried creating a large file on NT 4 using QuickBasic. Since QB is an MSDOS program, we expected it to fail at 4 GiB. While some operations did fail at 4 GiB, we were quite surprised to find we were able to create files over 100 GiB using a simple "print" statement. But when we tried to close the file, we got a "Bad record number..." error. However, as we expected, QuickBasic fails immediately with a "Path/File access error...." if you try to open a file over 4 GiB. All I/O operations appear to work correctly up to 4 GiB.
Visual Studio (at least the versions we know) on NT or W2K can open any type of file in any mode, up to 2 GiB. Over 2 GiB some modes or operations fail, while others will work up to 4 GiB. You can read and write text files with the "OpenTextFile" method up to 8 GiB, but cannot access binary files with that method. Over 8 GiB for text files, or 2 GiB for binary files, you need to use DLL calls for file I/O.
With the understanding that this is a complex issue, and there are many exceptions to such statements, here is what we can say in general:
* When we say "applications will work up to 2 GiB", this doesn't mean, for example, that your favorite word processor can edit files of 2 GiB. It means the file I/O will work, but other code in the word processor may limit how large a file you can actually edit.
Our experience with programs that claim to handle "files of unlimited size" is that many have failed at either 2 or 4 GiB, and most have failed over 8 GiB. And as mentioned in the section above, NT and un-patched Windows 2000 have hard-coded buffer limits.
There has recently (2003) been a trend towards supporting larger files. PKZIP and WinZip have just introduced new versions that will handle big files. We have successfully used PKZIP on files over 180 GiB and expect WinZip would work as well. But keep in mind those files cannot be unzipped to a Windows 98 system or any drive with a FAT file structure.
Although today's PCs support huge drives of hundreds of GB, they are often unable to access files of that size. The limiting factors are the disk structure (e.g. FAT), the internal code in the Operating System kernel which limits the address space, or the inability of the application program to address large files.
All Microsoft Operating Systems from MSDOS 5 to XP can access files up to 2 GiB without restriction. Systems using FAT-32 can access files up to 4 GiB if the application program supports it. Systems using NTFS can access files via normal disk I/O calls (e.g. "Open <file>"), up to 2 or 4 GiB under all conditions, and can access larger files under certain conditions. But over 4 or 8 GiB, NTFS systems usually require the use of DLLs for file I/O. More details can be found in the section "Maximum OS File Size", above.
Many applications and even some languages have trouble accessing files over 2 GiB, but the main barrier is at 4 GiB because 32 bit pointers are common, and 2^32 = 4 GiB. Only programs specifically written to access large files will break the 4 GiB barrier. And, of course, only on NTFS volumes, not on FAT volumes.
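One common explanation for the two barriers, which we offer as an assumption rather than a documented fact about any particular product, is that programs storing file offsets in a signed 32 bit integer fail at 2 GiB, while those using an unsigned 32 bit integer fail at 4 GiB. The sketch below uses Python's standard `struct` module to test which integer widths can hold a given file size; the helper name `fits` is ours.

```python
import struct

GIB = 2 ** 30


def fits(fmt, value):
    """Return True if value can be stored in the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


# "<i" = signed 32 bit, "<I" = unsigned 32 bit, "<q" = signed 64 bit
for size in (2 * GIB - 1, 2 * GIB, 4 * GIB - 1, 4 * GIB, 180 * GIB):
    print(f"{size:>15,}  int32={fits('<i', size)!s:<5}  "
          f"uint32={fits('<I', size)!s:<5}  int64={fits('<q', size)}")
```

Only the 64 bit type holds every size in the list, which is why programs must be written specifically for large files to get past 4 GiB.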
Failure modes and messages vary. Most often you will get a simple read or write failure message, but you may see a less meaningful message like "Bad record number", "Error positioning in file", or something similar. Sometimes a write will report "No space left on device", and sometimes the application will appear to terminate normally, but the file size will be truncated when writing, or you will be missing records on input.
You should also be aware that accessing large files over a network has additional considerations. For example, when Windows 98 accesses an NT, W2K, or XP directory via the network, it sees file size modulo 4 GiB, so a file of 5,000,000,000 bytes on the NT system will appear as a file of 705,032,704 bytes (5,000,000,000 - 4 GiB). The application program also sees this size, and in many cases it will read either 705,032,704 bytes, or 4 GiB, then terminate "normally", giving no clue that data was lost.
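The wrapped size in this example can be reproduced with a single modulo operation. This is a sketch of the arithmetic only, not of the network protocol itself; the function name `apparent_size` is ours.

```python
FOUR_GIB = 2 ** 32  # 4,294,967,296 bytes


def apparent_size(actual_bytes):
    """File size as seen by a client that keeps only the low 32 bits."""
    return actual_bytes % FOUR_GIB


print(f"{apparent_size(5_000_000_000):,}")  # 705,032,704
```

Note that the wrapped value alone cannot tell you the true size: a file of 705,032,704 bytes and one of 5,000,000,000 bytes look identical to such a client.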
The best advice we can give when working with large files is to check everything you do. Check each file you create for the correct size, and check that the number of records you read from a file is correct. If you're writing a program, remember to use data types that can handle the large values required.
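As one way to follow this advice, the sketch below measures a file's size on disk and counts the complete records that can actually be read back, so both can be compared against the expected values. It assumes fixed-length records, which is our simplification; the function name `measure_file` and the 80-byte record length in the usage note are hypothetical.

```python
import os


def measure_file(path, record_bytes):
    """Return (size on disk, number of complete records actually read)."""
    size_on_disk = os.path.getsize(path)
    records_read = 0
    with open(path, "rb") as f:
        while True:
            record = f.read(record_bytes)
            if len(record) < record_bytes:
                break  # end of file, or a truncated final record
            records_read += 1
    return size_on_disk, records_read
```

For a file expected to hold 1,000,000 records of 80 bytes, you would check that `measure_file(path, 80)` returns `(80_000_000, 1_000_000)`. Python integers do not overflow, so the counts stay correct past 4 GiB; in languages with fixed-width integers, use 64 bit types for both values.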
The table below lists the maximum file and volume sizes for various disk structures and operating systems. Notice that the volume size of the FAT structure is also dependent on the operating system. As noted above, there are many exceptions and complexities that could alter these limits, most notably the application program used.
Disk Structure | Operating Systems | Maximum Volume Size | Maximum File Size |
---|---|---|---|
FAT-16 | MSDOS 5-up, Win-95, 98, ME | 2 GiB (Note 1) | 2 GiB (Note 2) |
FAT-16 | NT 3.51, NT 4, W2K, XP | 4 GiB | 4 GiB |
FAT-32 | Win 95 OSR2, 98, ME, XP | 2 TiB | 4 GiB |
FAT-32 | W2K | 32 GiB (Note 3) | 4 GiB |
NTFS | NT 4 | Over 2 TiB (Note 4) | 180 GiB (Note 5) |
NTFS | W2K (SP2), XP | Over 2 TiB (Note 4) | Only limited by volume size (Note 6). |
Notes:
For more articles on data conversion, see our TechTalk Index.
Author's note: The original draft of this article attempted to explain many more technical issues, but became unbearably long and complex. This revised version, on the other hand, is superficial in many ways. I would appreciate your feedback on what is most useful to you; would you like to see more technical details, or just a summary of the limits? Please email your comments to: and reference "Large files" in the subject. Thank you.
Disc Interchange Service Company, Inc.
15 Stony Brook Road
Westford, MA 01886