The Next Era of Data Usage: Infrastructure Implications of AI

Solutions Review’s Premium Content Series is a collection of contributed articles written by industry experts in enterprise software categories. In this feature, Quantum Senior Director of Product Eric Bassier offers a commentary on the next era of data usage and the infrastructure implications of AI.

Artificial Intelligence (AI) is the hot topic of the moment, and the possibilities seem endless – but the long-term impact on enterprises is still unknown. When machine learning and deep learning are properly implemented, they have the potential to revolutionize entire industries. For instance, AI can use data to learn how to diagnose diseases, serve as the first line of defense in ever-important fraud detection, or customize a customer’s journey based on past history. AI models are first trained on existing data; once trained, they can continue learning from new data on their own. This use of AI may make finding future value in the rapidly growing amount of data being generated each day a little less overwhelming.

However, this “next era” of data usage has raised some questions and challenges for organizations everywhere. Organizations worldwide are generating and storing data at volumes exponentially larger than anything before. In addition, roughly 80 percent of that growing data is unstructured, which is far more complex to store and manage.

Simply put, legacy storage systems were not made for this level of scale. This is not a fault of legacy storage systems – the amount of data being generated is immense and not something we could have predicted even a decade ago. Because of the challenges this presents, it’s important to understand the need for this data, what the data does, and what it could do in the future – and therefore why legacy storage systems just aren’t cutting it.

The Process of Processing

Developing AI applications generally follows a three-step process for the data involved. First comes data preparation, where huge amounts of “raw material” are translated into usable data. Next, software programs are trained to learn a new capability from that prepared data in what is called “model training.” Finally, there’s the inference stage, where the program applies what it has learned to new data. This cycle runs 24/7, which contributes to massive data growth. In fact, industry analysts project that the amount of unstructured data will double or even triple over the next few years, thanks in large part to AI/ML applications and initiatives.
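To make the three stages concrete, here is a minimal sketch in Python using scikit-learn and its bundled digits dataset. The dataset, model, and library are illustrative assumptions only – the article does not prescribe any particular tooling.

```python
# Illustrative only: a toy run through preparation, training, and inference.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Data preparation: turn "raw material" into usable, normalized features.
raw = load_digits()
X_train, X_new, y_train, _ = train_test_split(raw.data, raw.target, test_size=0.2)
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)

# 2. Model training: the program learns a new capability from the prepared data.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 3. Inference: the trained model is applied to data it has never seen.
predictions = model.predict(scaler.transform(X_new))
print(predictions[:10])
```

In production, each stage reads and writes far larger datasets than this toy example, which is exactly why the cycle drives the storage growth described above.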

In addition, as AI continues to rapidly evolve, we’re now facing a data storage crisis. Applications either suddenly require and rely on data to function because of machine learning, or they simply output data at a massive rate. Because we are still in the midst of the AI evolution, organizations aren’t yet sure what data will be valuable and when, so most are storing everything so it can be reused, repurposed, and mined for value later. That data includes the large datasets used for data preparation and the datasets that AI, machine learning, and deep learning rely on to function. This data requires a storage solution that delivers high performance; makes it easy to catalog, tag, and index data for retrieval and reuse; and offers long-term, low-cost archiving capabilities.
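A hedged sketch of the cataloging, tagging, and indexing idea follows: a metadata index kept alongside the data so datasets can be found and reused in later training runs. The schema and the catalog_file/find_by_tag helpers are hypothetical illustrations, not a real product API; SQLite simply stands in for whatever index a storage platform provides.

```python
# Toy metadata catalog: record where a dataset lives, how it is tagged,
# and which storage tier currently holds it, then query it back by tag.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (path TEXT, tag TEXT, tier TEXT)")

def catalog_file(path: str, tags: list[str], tier: str = "flash") -> None:
    """Record a dataset's location, searchable tags, and current storage tier."""
    conn.executemany("INSERT INTO catalog VALUES (?, ?, ?)",
                     [(path, tag, tier) for tag in tags])

def find_by_tag(tag: str) -> list[tuple[str, str]]:
    """Retrieve datasets by tag so they can be reused in later training runs."""
    return conn.execute(
        "SELECT path, tier FROM catalog WHERE tag = ?", (tag,)).fetchall()

catalog_file("/lake/fraud/2024-q1.parquet", ["fraud", "training"], tier="flash")
catalog_file("/archive/fraud/2019.parquet", ["fraud", "raw"], tier="tape")
print(find_by_tag("fraud"))
```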

The Numbers Game

AI/ML applications also put huge demands on storage system performance. Processing these massive unstructured datasets requires extremely low latencies and high throughput. As a result, legacy disk-based storage systems simply can no longer keep up with these new performance requirements. Because of this, there has been unexpected growth in all-flash file and object storage, and this growth will only accelerate in the next five years, particularly as the price of flash decreases and as new architectures use modern storage and networking technologies like NVMe and RDMA to enable ultra-low-latency distributed storage.

Remember: all of this data doesn’t live in just one place. Data is usually generated outside the data center, whether by applications or by devices in the field, and is then moved elsewhere to be processed. Processing can happen in a public cloud, a private data center, or anywhere in between. This poses the additional challenge of managing data across its lifecycle as it moves from one place to the next. Storage solutions must therefore be flexible enough to operate wherever the data resides.
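As a rough illustration of lifecycle management, here is a small Python sketch of a placement policy that decides where a dataset should live based on its age and how recently it was used. The tier names and thresholds are illustrative assumptions, not vendor guidance.

```python
# Toy lifecycle policy: pick a storage tier/location from simple age rules.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Dataset:
    name: str
    created: datetime
    last_accessed: datetime

def place(ds: Dataset, now: datetime) -> str:
    """Return the storage tier a dataset should occupy right now."""
    if now - ds.last_accessed < timedelta(days=30):
        return "all-flash (active training/inference)"
    if now - ds.created < timedelta(days=365):
        return "object storage (private data center or public cloud)"
    return "cold archive (tape or deep cloud tier)"

now = datetime.now()
ds = Dataset("sensor-feed-2023", created=now - timedelta(days=400),
             last_accessed=now - timedelta(days=90))
print(place(ds, now))  # -> cold archive (tape or deep cloud tier)
```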

Building for a Better Future

When you really think about it, most of the world’s unstructured data is stored on systems that were designed over 20 years ago. When these systems were designed, the idea of trillions of files and objects and exabytes of data that may need to be stored for decades wasn’t even a thought, so legacy systems simply weren’t built to scale to this degree.

As AI/ML adoption grows more ubiquitous, it has the potential to improve our lives in every way – and it’s just getting started. Putting the right storage solution in place now will benefit not only the organizations managing this data but, ultimately, the customers they serve.

Eric Bassier