RAID technology, in a nutshell, uses math to rebuild missing data (e.g., data that lived only on a disk drive that has since failed). I often use the diagram below to depict a scenario where data from a failed disk is at risk. The missing disk should contain the value "3". Mathematical algorithms can recover the value "3" by subtracting the surviving values from the total, "10".
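To make the arithmetic concrete, here is a minimal sketch of the recovery step. Only the missing value "3" and the total "10" come from the text; the surviving values 1, 2 and 4 are hypothetical stand-ins for the diagram, which is not reproduced here. The sum-based version mirrors the "subtract from 10" explanation. The XOR version shows the bitwise equivalent that RAID 5 parity uses, since XOR is its own inverse.

```python
from functools import reduce
from operator import xor


def sum_parity(values):
    """Parity as a plain sum, matching the "subtract from 10" description."""
    return sum(values)


def recover_with_sum(surviving, parity):
    # The missing value is whatever the surviving disks don't account for.
    return parity - sum(surviving)


def xor_parity(values):
    """Bitwise XOR parity, the form RAID 5 actually stores."""
    return reduce(xor, values, 0)


def recover_with_xor(surviving, parity):
    # XOR-ing the survivors into the parity cancels them out, leaving the lost value.
    return reduce(xor, surviving, parity)


# Hypothetical disk contents; only "3" and the total "10" come from the text.
disks = [1, 2, 3, 4]
total = sum_parity(disks)          # 10
failed = 2                         # index of the disk that held "3"
survivors = disks[:failed] + disks[failed + 1:]

print(recover_with_sum(survivors, total))              # 3
print(recover_with_xor(survivors, xor_parity(disks)))  # 3
```

Real arrays apply the XOR form byte by byte across whole stripes, which is why a single parity disk can rebuild any one failed member.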
Operating system and server vendors argued that the math should run at the server level. They reasoned that the CPU is the brain, and the brain has the smarts to solve the problem. Storage geeks argued otherwise; the failure permutations were so complex that the entire mathematical exercise needed to be offloaded to a storage CPU.
The storage geeks were right in the end, and the disk array industry was born.
The disk array approach promised (and delivered) “correctness”. Hospitals, governments, banks, and big business all relied on the mathematics running closer to the disks. If the data was ever delivered incorrectly (even just once!), the results could be tragic. Running the math in the server increased the likelihood of incorrectness.
A New Kind of Correctness
In 2013 I am seeing a very similar customer request, a very similar need for mathematical algorithms, and a very similar architectural dilemma between server and storage. I expect the eventual solution to drive innovation in much the same way!
Customers are no longer asking for the “bit-for-bit” correctness of the 1980s (they assume this problem has already been solved).
They are asking for “analytic correctness”, which is a very different thing indeed. They point their mathematical algorithms at vast amounts of data and wait for the correct answer.
Where should customers run these analytic algorithms? Shouldn't they run on lightning-fast server CPUs?
The math should (once again) shift into the storage system. Running the analytic models closer to the data produces a more correct analytic result.
Correct analytic results are a function of processing a massive variety (and massive amount) of input sources. The more variety the better. The more volume the better. A traditional, CPU-heavy DBMS architecture can’t give you that. Here’s why:
- A traditional DBMS does not have nearly enough scale-out ingest ports for incoming data. As a result, the analytic models have access to less data, and the data is often “older”.
- There is too much data to be dragged out of traditional storage systems and brought up to the mathematical algorithms at the CPU level. Getting the correct answer takes longer.
- Incoming data is both structured and unstructured; these streams get partitioned separately, and sorting them out during modeling places an extra burden on the CPU.
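A quick back-of-envelope calculation shows why dragging data up to the CPU makes the answer arrive later. Every figure below (dataset size, link speed, node count, per-node scan rate) is a hypothetical value chosen only to make the arithmetic concrete, not a measurement of any particular product.

```python
# Hypothetical figures, chosen only to make the arithmetic concrete.
DATASET_BYTES = 100e12        # 100 TB of data to analyze
SERVER_LINK_BPS = 10e9 / 8    # one 10 Gb/s link into the analytics server, in bytes/s
NODES = 50                    # storage nodes in a scale-out cluster
NODE_SCAN_BPS = 500e6         # local scan rate per node, in bytes/s


def hours(num_bytes, bytes_per_second):
    """Time to move or scan num_bytes at a given aggregate rate, in hours."""
    return num_bytes / bytes_per_second / 3600


# Pulling everything up to the server is capped by the single link.
pull_to_server = hours(DATASET_BYTES, SERVER_LINK_BPS)

# Scanning in place lets every node read its own partition concurrently,
# so the aggregate rate grows with the node count.
scan_in_place = hours(DATASET_BYTES, NODES * NODE_SCAN_BPS)

print(f"pull to server: {pull_to_server:.1f} h")  # 22.2 h
print(f"scan in place:  {scan_in_place:.1f} h")   # 1.1 h
```

The sketch ignores compute time and contention; its only point is that a single pipe to the server stays fixed while in-place scanning scales with the number of nodes.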
So what kind of innovation is needed to satisfy this new form of correctness?
In a nutshell, we need scale-out, shared-nothing storage systems that can ingest massive amounts of data and run the customer analytics in a highly parallel fashion, right alongside the data!
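The shared-nothing idea can be sketched in a few lines: each node reduces its own partition to a small partial result, and only those partials travel to a coordinator. This is a single-process simulation under stated assumptions. Threads stand in for storage nodes, and the node names, partition contents, and the choice of a global mean as the "analytic" are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: each "node" owns one partition and shares nothing with the others.
partitions = {
    "node-a": [10, 20, 30],
    "node-b": [40, 50],
    "node-c": [5, 15, 25, 35, 45],
}


def local_partial(values):
    """Runs next to the data: reduce a whole partition to a tiny (count, sum) pair."""
    return len(values), sum(values)


def merge(partials):
    """Coordinator step: combine the small partial results into a global mean."""
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return total / count


# Each worker processes only its own partition; pool.map keeps input order.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partials = list(pool.map(local_partial, partitions.values()))

print(partials)         # [(3, 60), (2, 90), (5, 125)]
print(merge(partials))  # 27.5
```

Carrying (count, sum) rather than each node's local mean is deliberate: averaging the per-node means would weight small partitions too heavily and give a wrong answer.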
I'll be spending some time in 2013 diving further into the specifics of this approach (e.g., how to get the right answer).
From an innovation standpoint, EMC and VMware plan to capitalize on this industry shift via the recently announced spin-off: the Pivotal Initiative. As more and more customers look to leverage analytic capabilities in a cloud environment (and as cloud providers begin to offer those services), the Pivotal Initiative plans to build the enabling framework to make it happen.
Don’t get the wrong answer. Run the analytics closer to the storage and leverage both variety and volume.
image credit: www.abc.net.au
Steve Todd is an EMC Fellow, the Director of EMC's Innovation Network, a high-tech inventor, and the author of the book Innovate With Global Influence. An EMC Intrapreneur with over 200 patent applications and billions in product revenue, he writes about innovation on his personal blog, the Information Playground. Twitter: @SteveTodd