At a minimum, every SQL Server database has two operating system files: a data file and a log file. Data files contain data and objects such as tables, indexes, stored procedures, and views. Log files contain the information that is required to recover all transactions in the database. Data files can be grouped together in filegroups for allocation and administration purposes.
A single database consists of one primary file, zero or more secondary files and one or more log files. The file extensions SQL Server uses conventionally are .MDF, .NDF and .LDF respectively. Typically, for a new SQL Server database a primary .MDF file and single .LDF file is created for the user.
SQL Server databases have three types of files, as shown in the following table.
|Primary||The primary data file contains the startup information for the database and points to the other files in the database. User data and objects can be stored in this file or in secondary data files. Every database has one primary data file. The recommended file name extension for primary data files is .mdf.|
|Secondary||Secondary data files are optional, are user-defined, and store user data. Secondary files can be used to spread data across multiple disks by putting each file on a different disk drive. Additionally, if a database exceeds the maximum size for a single Windows file, you can use secondary data files so the database can continue to grow.|
The recommended file name extension for secondary data files is .ndf.
|Transaction Log||The transaction log files hold the log information that is used to recover the database. There must be at least one log file for each database. The recommended file name extension for transaction logs is .ldf.|
MDF is more commonly known as the primary data file. It is believed that it originally stood for the master database file before the master database concept arrived. It contains a database header and system tables, and points to other files in the database. User objects and relational data can be stored in the primary data file and/or in many secondary files.
NDF is an instance of a secondary data file, N denoting the canonical term for ‘1..N’. Some databases do not need or have any secondary files. Separation of a primary and N secondary data files is primarily useful to allow systematic growth of the database across multiple disks. It is also very useful to increase IO performance, allowing less contention for file locks and allowing multi-threaded access to the data.
LDF is the Log Data File (the transaction log). One or more transaction log files can exist for a database and are used to help disaster recovery scenarios as well as ensure that operations on the data are ACID.
Within SQL Server, a number of files can be grouped into a logical container called a file group. This is the logical concept with which tables, indexes and transaction logs bind to storage, objects do not bind to MDF/NDF files directly.
Although you have access (at the operating system level) to the data files within a file group, the data will be split between many files in a file group, so do not be tempted to back up certain files within the file group only, treat all files within a file group as an atomic entity.
Each database contains at minimum one primary file group and zero or more user-defined file groups.
The Primary File Group contains the primary data file (MDF as mentioned above) and any other files which are not directly assigned a file group. Any system table allocations for the database are performed in the primary file group.
User-Defined File Groups can be created and destroyed by a user at will and are useful for dividing logical separations of data and to help manage growth. They may also be used to group data files which sit on similar disks, or RAID array so that you can partition by speed/fault tolerance.
Every database has a primary filegroup. This filegroup contains the primary data file and any secondary files that are not put into other filegroups. User-defined filegroups can be created to group data files together for administrative, data allocation, and placement purposes.
For example, three files,
Data3.ndf, can be created on three disk drives, respectively, and assigned to the filegroup
fgroup1. A table can then be created specifically on the filegroup
fgroup1. Queries for data from the table will be spread across the three disks; this will improve performance. The same performance improvement can be accomplished by using a single file created on a RAID (redundant array of independent disks) stripe set. However, files and filegroups let you easily add new files to new disks.
All data files are stored in the filegroups listed in the following table.
|Primary||The filegroup that contains the primary file. All system tables are allocated to the primary filegroup.|
|Memory Optimized Data||A memory-optimized filegroup is based on filestream filegroup|
|User-defined||Any filegroup that is specifically created by the user when the user first creates or later modifies the database.|
File and Filegroup Fill Strategy
Filegroups use a proportional fill strategy across all the files within each filegroup. As data is written to the filegroup, the SQL Server Database Engine writes an amount proportional to the free space in the file to each file within the filegroup, instead of writing all the data to the first file until full. It then writes to the next file. For example, if file f1 has 100 MB free and file f2 has 200 MB free, one extent is allocated from file f1, two extents from file f2, and so on. In this way, both files become full at about the same time, and simple striping is achieved.
As soon as all the files in a filegroup are full, the SQL Server Database Engine automatically expands one file at a time in a round-robin manner to allow for more data, provided that the database is set to grow automatically. For example, a filegroup is made up of three files, all set to automatically grow. When space in all the files in the filegroup is exhausted, only the first file is expanded. When the first file is full and no more data can be written to the filegroup, the second file is expanded. When the second file is full and no more data can be written to the filegroup, the third file is expanded. If the third file becomes full and no more data can be written to the filegroup, the first file is expanded again, and so on.
Rules for designing Files and Filegroups
The following rules pertain to files and filegroups:
- A file or filegroup cannot be used by more than one database. For example, file sales.mdf and sales.ndf, which contain data and objects from the sales database, cannot be used by any other database.
- A file can be a member of only one filegroup.
- Transaction log files are never part of any filegroups.
Following are some general recommendations when you are working with files and filegroups:
- Most databases will work well with a single data file and a single transaction log file.
- If you use multiple data files, create a second filegroup for the additional file and make that filegroup the default filegroup. In this way, the primary file will contain only system tables and objects.
- To maximize performance, create files or filegroups on different available disks as possible. Put objects that compete heavily for space in different filegroups.
- Use filegroups to enable placement of objects on specific physical disks.
- Put different tables used in the same join queries in different filegroups. This will improve performance, because of parallel disk I/O searching for joined data.
- Put heavily accessed tables and the nonclustered indexes that belong to those tables on different filegroups. This will improve performance, because of parallel I/O if the files are located on different physical disks.
- Do not put the transaction log file(s) on the same physical disk that has the other files and filegroups.
- Create 2 file groups. One group to keep data (PRIMARY) and one group for index (INDEX). Log file do not need a file group.
- Create one file (.mdf) for data on PRIMARY file group. Create one file (*.ndf) for index on INDEX file group. Create one file (*.ldf) for log which does not belong to a file group.
- Place different file on own drive. One drive for PRIMARY (X:) and one drive for INDEX (Y:) Log file on (L:)
- One PRIMARY file group. Create multiple file groups for data and multiple file groups for index.
- One file (.mdf) on PRIMARY. Multiple files (*.ndf) for data placed on multiple data file groups. Multiple files (*.ndf) for index placed on multiple index file groups. Create one file (*.ldf) for log which does not belong to a file group.
- Place different file type on own drive. One drive for data files (X:) and one drive for index files (Y:) Log file on (L:)