Windows Azure provides four types of storage: blobs, drives, tables and queues.
Blobs
Blobs can store large pieces of data such as images, video, documents or code. They are stored in containers, and a container can hold any number of blobs. Blobs are referenced by a URL in this format:
http(s)://<client-account-name>.blob.core.windows.net/<container>/<blob name>
Alternatively, a blob can be placed in the root container and referenced like this:
http://<client-account-name>.blob.core.windows.net/$root/<blob name>
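To make the URL format concrete, here is a minimal Python sketch that assembles a blob URL from its parts. The function name blob_url, its parameters and the account name myaccount are made up for illustration; only the host suffix and the $root container name come from the formats above.

```python
from urllib.parse import quote


def blob_url(account, blob_name, container="$root", secure=True):
    """Build a blob URL in the format shown above (hypothetical helper)."""
    scheme = "https" if secure else "http"
    # Percent-encode the blob name, but keep "/" so path-like names survive.
    return f"{scheme}://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"


# "myaccount" and the blob names are placeholder values.
print(blob_url("myaccount", "logo.png", container="images"))
print(blob_url("myaccount", "index.html", secure=False))
```

When no container is given, the sketch falls back to the root container, which matches the second URL format.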
Containers form the security boundary for blob storage, and access to blob storage normally requires knowledge of a secret key. However, the access policy for a container can be modified to allow anonymous access. A blob-only access policy can also be used, in which case the blob's URI must be known in order to read it.
A content delivery network (CDN), an optional feature of Windows Azure, provides efficient distribution of blob content. The CDN typically caches blobs on servers that are geographically close to the users or applications that request them.
Blobs can be one of two types, block blobs or page blobs. For block blobs:
- Each one can store up to 200 gigabytes.
- Each is divided into data blocks of up to 4 megabytes.
- They are optimized to handle streaming workloads.
- They are well suited for large items of data such as images, documents, code and streaming video.
- An API can be used for parallel uploads of block data.
- Failed uploads can be resumed for specific blocks.
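The block-based upload described in the list above can be sketched without any Azure library. The following Python example only simulates the idea: it splits data into blocks, "uploads" them in parallel to an in-memory dictionary, resends only the block that failed, and then assembles the blob from the ordered block list. The names split_into_blocks, upload_blocks and fake_put_block are hypothetical, and the tiny block size in the demo is there just to keep the example fast.

```python
import base64
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4 * 1024 * 1024  # the 4 MB block limit mentioned above


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Return (block_id, chunk) pairs; IDs are fixed-width so they keep their order."""
    blocks = []
    for index, start in enumerate(range(0, len(data), block_size)):
        block_id = base64.b64encode(f"{index:08d}".encode()).decode()
        blocks.append((block_id, data[start:start + block_size]))
    return blocks


def upload_blocks(blocks, put_block, workers=4):
    """Send blocks in parallel and return the IDs of the blocks that failed."""
    def attempt(block):
        block_id, chunk = block
        try:
            put_block(block_id, chunk)
            return None
        except IOError:
            return block_id

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [bid for bid in pool.map(attempt, blocks) if bid is not None]


# Hypothetical in-memory stand-in for the storage service.
staged = {}
fail_once = {"pending": True}
SECOND_BLOCK_ID = base64.b64encode(b"00000001").decode()


def fake_put_block(block_id, chunk):
    # Fail the first attempt for the second block to simulate a dropped connection.
    if block_id == SECOND_BLOCK_ID and fail_once["pending"]:
        fail_once["pending"] = False
        raise IOError("simulated network failure")
    staged[block_id] = chunk


data = bytes(range(256)) * 40                      # 10,240 bytes of sample data
blocks = split_into_blocks(data, block_size=4096)  # small blocks for the demo
failed = upload_blocks(blocks, fake_put_block)

# Resume: resend only the blocks that failed, not the whole blob.
retry = [block for block in blocks if block[0] in failed]
failed_again = upload_blocks(retry, fake_put_block)

# "Commit": assemble the blob from the ordered list of block IDs.
blob = b"".join(staged[block_id] for block_id, _ in blocks)
print(len(blocks), len(failed), len(failed_again), blob == data)
```

A real client would send each block to the storage service and then commit the block list; the sketch keeps everything in memory so the resume logic is easy to follow.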
The characteristics of page blobs include:
- The maximum size can be up to 1 terabyte.
- Each consists of an array of 512-byte pages.
- They are optimized for random-access read/write I/O.
- Write operations must be aligned to a page boundary, while read operations can access any address within a valid range.
- Data is written at offsets that are multiples of 512 bytes.
- Clients are charged for page blobs according to the amount of data actually stored rather than the amount of space reserved.
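The page alignment rule is simple arithmetic, and a short Python sketch may help. The helper names is_page_aligned and align_to_pages are hypothetical; the only fact taken from the list above is the 512-byte page size.

```python
PAGE_SIZE = 512  # page size stated above


def is_page_aligned(offset, length):
    """A write is valid only if it starts and ends on a page boundary."""
    return length > 0 and offset % PAGE_SIZE == 0 and length % PAGE_SIZE == 0


def align_to_pages(offset, length):
    """Expand an arbitrary byte range to the smallest enclosing page-aligned range."""
    start = (offset // PAGE_SIZE) * PAGE_SIZE
    end = ((offset + length + PAGE_SIZE - 1) // PAGE_SIZE) * PAGE_SIZE
    return start, end - start


print(is_page_aligned(1024, 512))   # aligned write
print(is_page_aligned(700, 100))    # unaligned write
print(align_to_pages(700, 100))     # enclosing aligned range as (offset, length)
```

A client that needs to change bytes 700 to 799 would, under this sketch, read the enclosing page, modify it in memory and write the whole page back.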
Drives
In Windows Azure, drives are page blobs formatted as single-volume NTFS virtual hard drives. Only one role instance at a time can mount a drive in read/write mode, and that mount is exclusive. Multiple instances can mount a single drive in read-only mode. In a typical configuration, one instance mounts the drive in read/write mode and a snapshot of the drive is taken at intervals. The snapshot is then mounted in read-only mode for use by other instances. Any page blob operation can be performed on a Windows Azure drive.
Once a drive is mounted by an instance or node, all data written to it is persisted within the blob. A storage concurrency control mechanism called a lease places a lock on the blob before any writing is permitted. These features make Windows Azure drives appropriate for legacy applications that require the NTFS file system and standard I/O libraries.
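The lease idea can be illustrated with a small in-memory model in Python. This is only a sketch of the concept, not the storage service's API: the class BlobLease, its methods and the 60-second duration are assumptions chosen for the example.

```python
import time
import uuid


class BlobLease:
    """Toy model of a lease: one holder at a time, and the lock expires."""

    def __init__(self, duration=60.0, clock=time.monotonic):
        self.duration = duration  # assumed lease length in seconds
        self.clock = clock
        self.holder = None
        self.expires_at = 0.0

    def acquire(self):
        """Return a lease ID, or None if another holder still has the lease."""
        now = self.clock()
        if self.holder is not None and now < self.expires_at:
            return None
        self.holder = uuid.uuid4().hex
        self.expires_at = now + self.duration
        return self.holder

    def can_write(self, lease_id):
        """Writes are allowed only for the current holder of an unexpired lease."""
        return (lease_id is not None and lease_id == self.holder
                and self.clock() < self.expires_at)

    def release(self, lease_id):
        if lease_id == self.holder:
            self.holder = None


# A fake clock makes the demo deterministic.
now = [0.0]
lease = BlobLease(duration=60.0, clock=lambda: now[0])
writer = lease.acquire()      # first instance gets the write lock
other = lease.acquire()       # second instance is refused
print(writer is not None, other is None, lease.can_write(writer))

now[0] = 61.0                 # the lease has expired without renewal
takeover = lease.acquire()    # another instance can now take over
print(lease.can_write(writer), lease.can_write(takeover))
```

The expiry is what keeps a crashed instance from locking the drive forever: if the holder stops renewing, another instance can eventually acquire the lease.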
Tables
In Windows Azure, scalable structured storage is provided by tables. A table always belongs to a storage account. Windows Azure tables do not behave like tables in a relational database, because they have no schemas and do not make use of relationships. Instead, each entity in a table can have a different set of properties of different data types. Optimistic concurrency, based on time stamps, is used for updates and deletions: any update or deletion that would cause a concurrency violation is rejected.
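Here is a minimal in-memory sketch of optimistic concurrency in Python, assuming a simple counter in place of a real time stamp. The class TableSketch, the ConcurrencyError exception and the sample entities are all hypothetical; the point is only to show how a stale update gets rejected and how entities in one table can carry different properties.

```python
import itertools


class ConcurrencyError(Exception):
    """Raised when an entity changed since the caller last read it."""


class TableSketch:
    def __init__(self):
        self._rows = {}                   # (partition_key, row_key) -> (stamp, properties)
        self._stamps = itertools.count(1)  # stand-in for a time stamp

    def insert(self, pk, rk, **props):
        if (pk, rk) in self._rows:
            raise KeyError("entity already exists")
        stamp = next(self._stamps)
        self._rows[(pk, rk)] = (stamp, dict(props))
        return stamp

    def get(self, pk, rk):
        stamp, props = self._rows[(pk, rk)]
        return stamp, dict(props)

    def update(self, pk, rk, expected_stamp, **props):
        stamp, current = self._rows[(pk, rk)]
        if stamp != expected_stamp:
            raise ConcurrencyError("entity was modified by someone else")
        new_stamp = next(self._stamps)
        self._rows[(pk, rk)] = (new_stamp, {**current, **props})
        return new_stamp


table = TableSketch()
# No schema: the two entities have different properties.
table.insert("books", "001", title="Dune", pages=412)
table.insert("books", "002", title="Emma", author="Austen", in_stock=True)

# Two clients read the same entity and get the same stamp.
stamp_a, _ = table.get("books", "001")
stamp_b, _ = table.get("books", "001")

table.update("books", "001", stamp_a, pages=420)      # first writer wins
try:
    table.update("books", "001", stamp_b, pages=999)  # stale stamp is rejected
    rejected = False
except ConcurrencyError:
    rejected = True
print(rejected, table.get("books", "001")[1]["pages"])
```

The rejected client would normally re-read the entity, reapply its change and try again.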
All entities in a table have three properties: PartitionKey, RowKey and Timestamp. Windows Azure tracks activity by PartitionKey to determine whether there is enough load to scale a table automatically, which can distribute entities over thousands of nodes. The PartitionKey also ensures that related entities stay together. When combined with the PartitionKey, the RowKey uniquely identifies any given entity in a table. The Timestamp is a system-controlled property.
Windows Azure table storage returns data in pages of up to 1,000 entities per query. When more than 1,000 entities match, the returned result set includes a continuation token for requesting the next set. Tables provide no aggregation functions for counting or summing entities; these operations must be implemented on the client side. Transactions are supported within a single partition of a single table, allowing entities to be created, deleted and updated in a single atomic operation (a batch operation) within a 4 MB payload limit.
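The paging and client-side aggregation described above might look like the following Python sketch. It simulates the service with a plain list: query_page and query_all are hypothetical helpers, the continuation token is modeled as a simple index, and the Amount property is invented for the example.

```python
PAGE_LIMIT = 1000  # maximum entities returned per query, as stated above

requests_made = 0


def query_page(entities, token=None, limit=PAGE_LIMIT):
    """Simulated query: return one page plus a continuation token (or None)."""
    global requests_made
    requests_made += 1
    start = 0 if token is None else token
    page = entities[start:start + limit]
    more = start + limit < len(entities)
    return page, (start + limit if more else None)


def query_all(entities):
    """Follow continuation tokens until the result set is exhausted."""
    token = None
    while True:
        page, token = query_page(entities, token)
        yield from page
        if token is None:
            break


# Sample data: 2,500 entities in one partition (made-up values).
orders = [
    {"PartitionKey": "customer-1", "RowKey": f"{i:05d}", "Amount": i % 10}
    for i in range(2500)
]

# Counting and summing happen on the client, one entity at a time.
count = 0
total = 0
for entity in query_all(orders):
    count += 1
    total += entity["Amount"]

print(count, total, requests_made)
```

Because every entity has to travel to the client, aggregates over large tables can be slow; keeping a running total in a separate entity is one common workaround.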
Queues
Queues in Windows Azure are used primarily to communicate with worker roles, usually to notify them of tasks or to schedule tasks. They provide persistent asynchronous messaging with message sizes of up to 8 KB. Applications that use queues should be idempotent, so that an operation can be performed many times without changing the result. They should also be built to handle messages containing flawed data (also known as poison messages), which will trigger an exception in the queue processor.
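Both requirements, idempotency and poison message handling, can be sketched in a few lines of Python. Everything here is a simplified in-memory model: the message layout, the task_id field, the MAX_ATTEMPTS threshold and the process_queue function are assumptions made for the example, and a real queue would make a failed message visible again after a timeout rather than re-adding it directly.

```python
import json
from collections import deque

MAX_ATTEMPTS = 3  # assumed threshold before a message is treated as poison


def process_queue(queue, handler, poison, done_ids):
    """Drain the queue, skipping duplicates and setting aside poison messages."""
    while queue:
        message = queue.popleft()
        message["dequeue_count"] += 1
        try:
            payload = json.loads(message["body"])
            task_id = payload["task_id"]
            if task_id not in done_ids:   # idempotency: do each task only once
                handler(payload)
                done_ids.add(task_id)
        except (ValueError, KeyError, TypeError):
            if message["dequeue_count"] >= MAX_ATTEMPTS:
                poison.append(message)    # give up and park it for inspection
            else:
                queue.append(message)     # try again later


def make_message(body):
    return {"body": body, "dequeue_count": 0}


queue = deque([
    make_message('{"task_id": 1, "action": "resize"}'),
    make_message('{"task_id": 1, "action": "resize"}'),  # duplicate delivery
    make_message('{not valid json'),                      # poison message
    make_message('{"task_id": 2, "action": "email"}'),
])

processed = []
poison = []
process_queue(queue, lambda payload: processed.append(payload["task_id"]), poison, set())
print(processed, len(poison), poison[0]["dequeue_count"])
```

Without the attempt counter, the malformed message would be retried forever and could block the worker from making progress.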
In our post next week we will discuss SQL Azure.