I had just put this blog off to my archives, since long time. A sudden selfish interest has come now, I have decided to update this blog often atleast to enable me to ponder more things in SQL Server.
Let me look into the SQL Server Data Storage internals now. The most of the following descriptions are taken from the Microsoft technet. I have just organised it in a better manner to understand
We will split into three sections,
1. Heap storage
2. Clustered index storage
3. Non-clustered index storage
Though there are lot of sites to explain this (Esp. Books Online) , I ll try to explain this to understand easily.
Heap Storage
SQL Server uses the IAM pages to move through the heap. The first IAM page will be normally located in the starting part of the mdf file. An IAM (Index Allocation Map) page tracks approximately 4GB worth of space in a single file, aligned on a 4GB boundary. For the next 4GB, another IAM page will be allocated. This need not again be allocated in the start of the mdf file, this can be allocated anywhere in the mdf and IAM chains are formed with the previous IAM page. An IAM page has a header that indicates the starting extend of the range of extends covered by the IAM page. The IAM page then contains a large bit map in which eat bit represents a extend. The first bit represents the first extend covered in the 4GB pile (also called ‘GAM interval’) , second bit represents the second interval and so on.
.gif)
Accessing the data in the Heap:
————————————–
The data pages and the rows within them are not in any specific order and are not linked. The only logical connection between data pages is the information recorded in the IAM pages. Table scans or serial reads of a heap can be performed by scanning the IAM pages to find the extents that are holding pages for the heap.
Clustered Index Storage
In SQL Server, indexes are organized as B-trees. Each page in an index B-tree is called an index node. The top node of the B-tree is called the root node. The bottom level of nodes in the index is called the leaf nodes. Any index levels between the root and the leaf nodes are collectively known as intermediate levels. In a clustered index, the leaf nodes contain the data pages of the underlying table. The root and intermediate level nodes contain index pages holding index rows. Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a data row in the leaf level of the index. The pages in each level of the index are linked in a doubly-linked list.
The pages in the data chain and the rows in them are ordered on the value of the clustered index key. All inserts are made at the point where the key value in the inserted row fits in the ordering sequence among existing rows. SQL Server moves down the index to find the row corresponding to a clustered index key. To find a range of keys, SQL Server moves through the index to find the starting key value in the range and then scans through the data pages using the previous or next pointers. To find the first page in the chain of data pages, SQL Server follows the leftmost pointers from the root node of the index.
.gif)
Non Clustered Indexe Storage
Non clustered indexes also have the structure of a B tree and the difference is that the leaf node of the index has index pages and not data as in clustered index.
Nonclustered indexes can be defined on a table or view with a clustered index or a heap. Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value.
The row locators in nonclustered index rows are either a pointer to a row or are a clustered index key for a row, as described in the following:
- If the table is a heap, which means it does not have a clustered index, the row locator is a pointer to the row. The pointer is built from the file identifier (ID), page number, and number of the row on the page. The whole pointer is known as a Row ID (RID).
- If the table has a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server makes any duplicate keys unique by adding an internally generated value called a uniqueifier.
I have tried to sort out the various data storage and access methods of SQL Server from various sources for best understanding. Hope it is useful for the reader.
Love,
CK.
References:
————–
http://msdn.microsoft.com/en-us/library/ms180978.aspx
https://blogs.msdn.com/sqlserverstorageengine/archive/2006/06/24/645803.aspx