In my last post I described how unstructured and metadata-rich application workloads drove the rise of Network-Attached Storage (NAS). The diagram below allowed me to highlight differences between block and file system architecture.
Unstructured content benefits from metadata association, and NAS systems provided the binding between the two. Many vendors achieved this by interspersing content and metadata within the same disk array infrastructure. Block-based systems of that era, on the other hand, viewed all blocks as “content” and had no fundamental awareness of application metadata. The overlay below highlights this difference.
The NAS approach of tight interspersal of content and metadata became a hurdle for a new class of application workloads. To quote my EMC colleague Stephen Manley, these new applications wanted to do “even cooler” things with their metadata.
For example, applications wanted to:
- attach increasingly large amounts of metadata to content.
- create formal ontologies for metadata (e.g. XML rules for metadata structure).
- search through metadata at high speed.
- enforce policies on content via metadata keywords (e.g. retention periods).
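As a rough sketch of the second and third points, metadata might be expressed in a structured form such as XML, validated against a simple rule set, and searched by keyword without ever touching the content itself (the field names and rules here are hypothetical, not any vendor's actual schema):

```python
import xml.etree.ElementTree as ET

# A hypothetical "ontology" rule: every metadata record must declare
# an author and a document type.
REQUIRED_FIELDS = ("author", "doc-type")

record = ET.fromstring("""
<metadata>
  <author>J. Smith</author>
  <doc-type>invoice</doc-type>
  <keywords>Q3 finance archived</keywords>
</metadata>
""")

# Structural check against the rule set.
missing = [f for f in REQUIRED_FIELDS if record.find(f) is None]
assert not missing, f"metadata violates ontology: missing {missing}"

# Keyword lookup over the metadata alone -- no content access needed.
keywords = record.findtext("keywords", "").split()
print("archived" in keywords)
```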
The increased importance that these new workloads placed on metadata drove the industry to treat metadata as a first-class citizen. The “interspersal” technique used by most NAS devices did not lend itself to the new workloads.
As a result, the industry evolved (yet again) in response to these new applications and facilitated the rise of object-based storage systems.
Object-based systems allowed applications to “attach” rich metadata to content and bind the two together via an object identifier. Under the covers, object-based storage systems were not constrained to intersperse the metadata and the content. They could be stored as separate entities, which “freed” the metadata to be used in more diverse and beneficial ways. In fact, the content itself was “freed” from the linkage to a specific directory, which facilitated new levels of sharing and collaboration for content.
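A minimal sketch of this separation (all names here are hypothetical): content and metadata live in separate stores, bound together only by the object identifier, so metadata can be scanned or indexed without reading any content.

```python
import uuid

content_store = {}   # object_id -> raw bytes
metadata_store = {}  # object_id -> dict of metadata

def put_object(content: bytes, metadata: dict) -> str:
    # Bind content and metadata together via a generated object identifier.
    object_id = str(uuid.uuid4())
    content_store[object_id] = content
    metadata_store[object_id] = metadata
    return object_id

def search_metadata(key: str, value: str) -> list:
    # Because metadata is stored apart from content, it can be
    # searched without touching the content itself.
    return [oid for oid, md in metadata_store.items() if md.get(key) == value]

oid = put_object(b"scan data...", {"patient": "12345", "modality": "MRI"})
print(search_metadata("modality", "MRI") == [oid])
```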
The implementation of object-based storage systems also gave vendors the opportunity to address additional shortcomings that NAS-based systems were experiencing at the time, including maximum file sizes and file count limits.
The first object-based implementation was termed content-addressable storage, or CAS. Wikipedia provides the definition of CAS below:
> a mechanism for storing information that can be retrieved based on its content, not its storage location.
The diagram below highlights CAS function and operation in the context of one of the first CAS implementations (known as Centera):
Instead of using the traditional file-based access methods (e.g. file open, read, write, and close), the Centera approach allowed an application to write a random stream of data, associate it with relevant metadata, and send the two as a package to the Centera storage system. In return, Centera would hand back a unique identifier that the application used for all subsequent access.
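The write path can be sketched as follows. This is not the actual Centera API; it is a hypothetical illustration of the content-addressed idea, where the identifier is derived from the content itself rather than from a storage location:

```python
import hashlib

store = {}  # content_id -> {"content": ..., "metadata": ...}

def cas_write(content: bytes, metadata: dict) -> str:
    # The address is a digest of the content, not a directory path.
    content_id = hashlib.sha256(content).hexdigest()
    store[content_id] = {"content": content, "metadata": metadata}
    return content_id

def cas_read(content_id: str) -> bytes:
    return store[content_id]["content"]

cid = cas_write(b"patient X-ray, 2004-06-01", {"retention_years": 7})
print(cas_read(cid) == b"patient X-ray, 2004-06-01")
# Identical content always yields the same identifier.
print(cid == cas_write(b"patient X-ray, 2004-06-01", {"retention_years": 7}))
```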
This approach caused a fundamental shift in application architectures by enabling:
- A permanent binding between file content and an unlimited amount of associated metadata.
- Freedom from deciding “where” data was placed. The application no longer had to specify a logical directory location for each file.
- Object counts that scaled into the billions, well beyond the capacity limits of many file systems at the time.
- Metadata keywords that implemented policies (such as how long to retain a document and whether deletion was allowed).
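The last point, retention enforcement driven by metadata, can be sketched like this (a hypothetical policy check, assuming the metadata carries a storage date and a retention period):

```python
from datetime import date

def can_delete(metadata: dict, today: date) -> bool:
    # The storage system refuses deletion until the retention period,
    # carried in the object's own metadata, has elapsed.
    stored = date.fromisoformat(metadata["stored_on"])
    expiry = stored.replace(year=stored.year + metadata["retention_years"])
    return today >= expiry

md = {"stored_on": "2004-01-15", "retention_years": 7}
print(can_delete(md, date(2008, 6, 1)))    # still under retention
print(can_delete(md, date(2011, 1, 15)))   # retention elapsed
```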
A third access pillar was added to the data center as a result of these new application workloads, and many customers deployed all three access methods: block, file, and object. Capacity-oriented object workloads are graphically depicted in the lower half of our workload framework. Some object-based workloads required high service levels (e.g. hospital applications) while others did not (e.g. YouTube).
As a result of all three application access methods (block, file, and object), data and metadata continued to grow unabated within customer data centers. This gave rise to a new problem: the growth of new forms of metadata related to the operation of the data center itself.
I’ll cover “The Rise of Metadata Part 2” in my next post.
image credits: universalmachine.com; stevetodd.com
Steve Todd is an EMC Fellow, the Director of EMC’s Innovation Network, a high-tech inventor, and author of the book Innovate With Global Influence. An EMC Intrapreneur with over 200 patent applications and billions in product revenue, he writes about innovation on his personal blog, the Information Playground. Twitter: @SteveTodd