
SNIA Experts on Data
Listen to interviews with SNIA experts on data who cover a wide range of topics on both established and emerging technologies. SNIA is an industry organization that develops global standards and delivers vendor-neutral education on technologies related to data.
Why Data Storage Matters More Than Ever
Data storage technologies continue to evolve to keep pace with massive data creation. In this episode, SNIA experts discuss the innovations, challenges, and importance of data storage solutions and standards in handling ever-growing digital demands. Tune in to learn about the fundamental aspects of storage technologies and how SNIA and the industry are addressing data storage, from SSDs and HDDs to cutting-edge solutions like DNA data storage, as our experts discuss:
• Introduction to the SNIA community and its objectives
• Emphasis on the significance of data storage in today's technology landscape
• Overview of key storage technologies: SSDs, HDDs, and archival solutions
• Exploration of the "save-discard dilemma" impacting data management
• Insights into storage needs driven by autonomous vehicles and AI applications
Speaker 1:All right, welcome everybody to the SNIA Experts on Data podcast. My name is Eric Wright. I'm the chief podcaster here at SNIA Experts on Data and also the co-founder of GTM Delta. I'm excited by the group we've got. Today we're talking about the data focus areas. If you've been following the series, it's a great way to break down everything across the whole SNIA technical community, to bring content and ideas to the marketplace, and to help people across these different data focus areas and practice areas really understand who does what. And we very quickly find out that we all do amazing things. Speaking of amazing things, here's my amazing group. I'll start with a quick round table. Jason, do you want to lead us out and give us a quick intro of yourself?
Speaker 2:Yeah, hello everyone. My name is Jason Molgaard. I'm a principal storage solutions architect at Solidigm. I work in a pathfinding and advanced development group and do a lot of work with SNIA, and I'm very excited to be here today.
Speaker 1:All right.
Speaker 3:I'm Dave Landsman. I'm a distinguished engineer at Western Digital. I'm also on the SNIA board, and I do a lot of work in SNIA committees and things like that. My background is as a system architect and a standards guy.
Speaker 1:Fantastic. And last but certainly not least, Craig, let's bring you into the group here.
Speaker 4:My name is Craig Carlson. I work for AMD as a storage and networking architect in the data center strategy group. As for my relationship to SNIA, I've been on the Technical Council for a number of years. I've been in the storage industry for even longer, probably 25 years, doing different types of storage: Fibre Channel, Ethernet, networking-based storage, things like that. I'm happy to be here.
Speaker 1:Fantastic. I often wonder about this. Every time I talk to a storage team, it's "I'm old, I've been around this thing a long time." I've never met anyone who says "I've been in storage for two years." We need to get a fresh batch. I'm sure there are a lot of them out there, but maybe I'm just talking to the wrong people.
Speaker 4:It's a problem.
Speaker 1:It is a problem. Well, actually, I'll tell you, that's why the work we're doing with SNIA is really important, because a lot of people don't even know the opportunities are there. So, based on that, Craig, you've obviously seen really cool changes in your time with SNIA. Let's talk about what store is, because that's really the center of today's episode: the store data focus area. I think you're best placed to tell us what it is and what it covers.
Speaker 4:Well, it comes down to this: if you look at the SNIA areas of focus, the primary job of storage and everything that we're doing is to store the data, of course, and there is a lot of work that goes into making sure that data is stored correctly and stored in a way that you can actually read it back, so it's not just write-only.
Speaker 4:That's probably one of the biggest concerns any storage architect has: that the data gets to the storage device and returns in the same form with all the bits intact. Any of us who have been doing this for a long time have spent many, many hours working to make sure that's the case, and we're kind of a victim of our own success, to the point where people don't think about storage or the store portion because it works so well these days. Your data goes on the drive, that drive has who knows how many billions, or trillions, of bits on it these days, and when you ask for it, it comes back off. It wasn't always like that. Going back to my early years, you weren't always guaranteed to get your data back, which would upset people, but we don't have that situation today.
Speaker 1:Yeah, people get a little funny about that. It's like backup. Someone says, "I've got a great backup product," and I'm like, "Yeah, but is it a good restore product?" That's what I need. The data has to come back.
Speaker 4:Yeah, and backups are the biggest area where, no matter what you think, you should always check your backup.
Speaker 1:Amen to that. And as far as where SNIA comes in on this, Craig, what are some of the milestones that SNIA aims for when we think of store as a data focus area?
Speaker 4:Well, it's really the core aspect of SNIA. If you look at pretty much all the other components, they go to that store component in some form or another. SNIA has a wide range of different areas we operate in that make that work, all the way from the actual protocols that go into the devices to, if you look at the SFF committee, the work on form factors and moving the bits. So SNIA has this very wide range of technologies that go into making the store component function.
Speaker 1:And I think that's probably a good chance to dive into some of the interest areas and specific technologies that we have. Dave, do you want to walk us through them? I know you did a great write-up that helped us understand; I was like, wow, there's a lot in here. So tell us what the different technologies are that we're touching, and give us a snapshot of why these are critical pieces of what we have going forward.
Speaker 3:Sure. In the store area we have a number of interest areas, like non-volatile memory, zoned storage, and DNA data storage. There are others, and SNIA looks across all of these interest areas because we need to cover the whole range of places where data is stored: from the very hot, frequently accessed layer of non-volatile memory, typically SSDs, to the cooler tier of storage with HDDs, and finally to the slower and larger archival tiers, where media such as tape have been used historically and some new media types are now being looked at. The reason these are all critical is the different cost and performance tradeoffs related to how the data is being stored and used, and critically, especially in the colder archival tiers, how much data is being stored. We'll come back to that. In the SSD layer, data is written and read very frequently, at random addresses across the storage address space, and performance and latency are the critical characteristics in this tier, whereas cost and capacity relative to the other tiers are a bit less important. So in the SSD layer we're very performance-sensitive, and we make performance-oriented optimizations such as flexible data placement.
Speaker 3:It's a standard that's evolved in NVMe, and we're always trying to optimize the performance of SSDs and lower their wear. So that's where we touch the NVM layer, the SSD layer. If we go to the HDD layer, we write data at a much more leisurely pace, but we still need access to that data relatively frequently. So we want the access latency enabled by HDDs, but we also want maximum capacity. One of the means of gaining higher capacity in HDDs is something called shingled magnetic recording, or SMR. In SMR drives, this is how we maximize capacity: we write data in slightly overlapping physical tracks on the media.
Speaker 3:It's like shingles on a roof, which is why it's called shingled magnetic recording. The tradeoff for SMR density, however, is that data must be written to these drives sequentially. So, unlike SSDs, where we're doing a lot of random writes and reads, an SMR drive is divided into zones that have to be written sequentially, and the host applications and file systems that talk to SMR drives have to manage these zones, whereas on SSDs or classic random-access devices they manage small blocks or sectors. So zoned storage, to come back to it, which is one of our interest areas, is a set of standards for how zoned storage devices work and how they are used. Now for archival tiers, which gets to another interest area: we talked about DNA data storage, tape, and other things. In archival tiers, data is written and read much less frequently than in the HDD layer, and it may even be stored off-site. So the key attribute of storage in this tier is not real-time access but the overall cost of the storage: power, density, and endurance over time. Tape has occupied the archival layer in storage for decades.
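To make the sequential-write constraint Dave describes concrete, here is a minimal Python sketch of a single zone and its write pointer. It is a toy model under simplifying assumptions: the `Zone` class and its methods are hypothetical and are not a drive, kernel, or library interface. Real host-managed devices expose zones through standard command sets such as ZBC/ZAC and NVMe Zoned Namespaces, with more states and rules than shown here.

```python
class Zone:
    """Toy model of one sequential-write-required zone (hypothetical class, not a real API)."""

    def __init__(self, start_lba: int, size: int) -> None:
        self.start = start_lba
        self.size = size
        self.write_pointer = start_lba  # next LBA the host is allowed to write

    @property
    def is_full(self) -> bool:
        return self.write_pointer == self.start + self.size

    def write(self, lba: int, nblocks: int) -> None:
        # Writes must begin exactly at the write pointer: no random writes, no overwrites.
        if lba != self.write_pointer:
            raise ValueError(f"write at LBA {lba} is not at the write pointer ({self.write_pointer})")
        if self.write_pointer + nblocks > self.start + self.size:
            raise ValueError("write would run past the end of the zone")
        self.write_pointer += nblocks

    def reset(self) -> None:
        # Resetting discards the zone's contents so it can be rewritten from the start.
        self.write_pointer = self.start


zone = Zone(start_lba=0, size=8)
zone.write(0, 4)          # accepted: starts at the write pointer
try:
    zone.write(2, 1)      # rejected: LBA 2 is behind the write pointer
except ValueError as err:
    print(err)
zone.write(4, 4)
print(zone.is_full)       # True
zone.reset()
```

The point of the model is the division of labour: the drive only has to track one write pointer per zone, while the host (application or file system) takes on the job of deciding which zone each piece of data goes to and when a zone can be reset.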
Speaker 3:However, in the realm of archival storage, we're facing today what I call the save-discard dilemma. Humanity is digitizing all forms of information at ever greater rates, and we wish to save all of that data because of the value that may be extracted from it, either now or in the future. The ability to extract value is especially true with the advent of AI and ML techniques, which let us search for patterns in data. So we have mountains of data, we want to be able to search it, and we have better tools to do so. The problem is that the amount of data is becoming so large that with today's storage technologies, from SSDs to HDDs and even tape, it's getting too costly, so we're forced to discard some of the data due to cost.
Speaker 3:Hence the save-discard dilemma. It's because of the save-discard dilemma that storage architects are looking for new media types for archival storage, and this is why SNIA got into the DNA data storage arena. DNA data storage is one of the potential new media types that could play at this archival layer, and there are other media types, such as ceramic or glass, that people are looking at. In summary, I'd say that all of the areas of interest in SNIA's store area are about optimizing the layers of the storage hierarchy, both within each layer and between the layers, so we have an optimal solution up and down the stack.
Speaker 1:You mentioned AI and the new use cases that come up. What used to happen is we'd just say we're going to put this stuff away. It's there for the supposed seven-year retention cycle. We send it off to Iron Mountain or an equivalent, we bring it back when we need it, which is hopefully never, and then it gets destroyed. It just seems to live in this indeterminate space, like Schrödinger's data. And now we're saying, wait a minute, we need to bring that data back because we need to train LLMs, we need to digitize it and marry it to OCR data that we're pulling along with new client information. So we're finding new ways to use old data that's sitting on old media, or just diverse media. I think that's why having these standards developed by a cohesive industry group is so important, because everybody is writing software or building hardware now to consolidate this stuff or bring it back.
Speaker 1:If you don't know where it came from and we don't standardize on that, then it's the Wild West. We saw what happened to VHS and Beta. It was 50-50, and I've never seen a Beta machine since. DVD and HD DVD, you know, there were two standards and one wins.
Speaker 3:That's right. And the standards are critically important. It's not just the standards, though. Long-term archival is only part of the picture; we have cold archives, warm archives, et cetera. But when you're talking about very cold archives, you have the problem of: if I pull this data back after 100 years, do I still have the reader? How do I do that? Do I have to do fixity checks? How do I ensure that the data stays reliable on the media over 100 years? So there are many aspects to the long-term archival storage problem that need to be looked at when we think about new media types that are not magnetic tape or HDDs. DNA, glass, and some of these other media are really interesting with respect to these long-term archival problems.
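A fixity check, in the sense Dave mentions, is commonly done by recording a cryptographic digest when data is ingested and recomputing it later to detect silent corruption or loss. The sketch below assumes SHA-256 (via Python's standard `hashlib`) and uses an in-memory dict as a stand-in for the archive; `make_manifest` and `fixity_failures` are hypothetical helper names for illustration, not part of any archival standard or product.

```python
import hashlib


def make_manifest(objects):
    """Record a SHA-256 digest for each object at ingest time (assumed digest choice)."""
    return {name: hashlib.sha256(blob).hexdigest() for name, blob in objects.items()}


def fixity_failures(objects, manifest):
    """Return the names of objects that are missing or no longer match their recorded digest."""
    failures = []
    for name, expected in manifest.items():
        blob = objects.get(name)
        if blob is None or hashlib.sha256(blob).hexdigest() != expected:
            failures.append(name)
    return sorted(failures)


archive = {"scan-001.tif": b"original bytes", "scan-002.tif": b"more bytes"}
manifest = make_manifest(archive)          # kept with (or apart from) the archived data
archive["scan-002.tif"] = b"more bytez"    # simulate silent corruption years later
print(fixity_failures(archive, manifest))  # ['scan-002.tif']
```

A digest only tells you that something changed, not how to repair it; in practice fixity checks are paired with redundant copies, and the digest algorithm itself has to remain readable over the archive's lifetime, which is part of the "do I still have the reader?" problem.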
Speaker 1:Yeah, otherwise we end up like in Silo; I've seen it, I know what happens. You end up having to go way down to find somebody who has an old computer to plug that thing into, right? But that's it: media is changing, and so is the reuse, or the extension of durability. There's the original durability, and then we're making innovations in power consumption, in the way we write data, and in the way we organize it before it gets there, so that even the original estimated durability is getting lengthened by a lot of the work being done by folks in this community. I've been told over and over again: get your data off that hard drive, it's just going to eventually die.
Speaker 1:Well, I've got a lot of old hard drives and a lot of old USB sticks, and they all seem to be fine as long as I keep the weather right. But as far as people go, who can really benefit from being involved when looking at store as a data focus area? I know all of you have probably got different ideas on this. Jason, I'll start with you, just because I've left you out so far. Who are the personas, titles, and kinds of folks that you find are getting benefit from being a part of this?
Speaker 2:Sure, I'll start with one.
Speaker 2:I think storage architects, for sure, and IT architects: those people who are building up the systems in data centers to store all this data at the various levels that Dave described. They need to be aware of all of the new technologies, and not just what the spec states, but how to actually use it.
Speaker 2:And that's where a lot of SNIA's work comes in: how to take, for example, the flexible data placement Dave mentioned. Great, you can go read the spec, but how do you actually deploy it? How do you make it work in your environment, or the environment you're trying to create? Or think of archival. Maybe DNA is not the direction you're going today, but it's good to be aware of it, because it's probably a direction you'll be pursuing later as the technology continues to advance and mature. Having knowledge not only of how to do things today but of where things are going tomorrow will make your implementation of a storage system not only functional today but future-proof for the next generation.
Speaker 1:Craig, what have you seen, especially as you've been participating in a lot of these TWGs and have seen many different companies and folks joining the groups to drive this stuff?
Speaker 4:Well, a lot of these companies are coming in from a competitive background, and they still get together because interoperability is important for the industry and for the customers. In some ways, making standards is kind of like making sausage: you like the results, but sometimes it can be kind of messy. That's a lot of what you see when you go to some of these standards bodies, the politics and the things that go into it, and hopefully at the end of the day you come out with a good standard.
Speaker 4:And that's really the goal everybody has in these groups: to make sure that not only are their own interests covered, but the interests of the industry as a whole are covered as well, and that the customer is put first. I've seen some groups where that doesn't happen, and then you don't get a good result. So you need to have that mindset: we're here for the industry, we're here for the customer. Most groups do work that way, so for the most part they work pretty well.
Speaker 1:Yeah, good old OSI Layer 8. It always sneaks in. In talking with folks from IEEE and IETF, I find these are tricky groups to get together and bring to agreement, but, as we know, we move faster together. We're like a peloton of innovation. So, Dave, what's your experience been with folks who may have even surprised you with how much they can benefit from understanding the store data focus area?
Speaker 3:Well, I think anybody who's involved in computing tasks of any kind is affected; in computer systems, storage is central. Whether it's university librarians, national archivists, or scientific computing centers, any user that has to manage a lot of data has to deal with store. It's the universal currency of our industry. It may be a little boring at a high level, but it's really quite interesting from a technology perspective. Craig was talking about emphasizing standards again.
Speaker 3:But I wanted to pick up on something Jason said about what SNIA does in particular in terms of using storage. Take something like flexible data placement, which was a standard developed in NVMe. There are a lot of ways to do data placement; it's right there in the name that it's flexible. So what kind of granularity do you use when you set up your storage? What are the bins you put your data in, things like that, and how do you optimize as a system? There are different tradeoffs for different use cases. What SNIA does a lot of is diving into the use cases, how people use storage and apply these standards, and coming up with ways, kind of meta-standards, for how the ecosystem can apply all this in ways that are meaningful and interoperable. So that's another big thing that SNIA does.
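The reasoning behind placement hints can be illustrated with a toy model: if short-lived and long-lived data land in the same erase/reclaim unit, garbage collection must copy the still-valid pages before that unit can be reused. This sketch loosely borrows the FDP terms "reclaim unit" and "placement handle" but is a deliberate simplification under stated assumptions (a hypothetical 4-page reclaim unit and writes tagged with a made-up lifetime label); it is not the NVMe FDP command interface and does not model any real device's policy.

```python
RU_PAGES = 4  # hypothetical reclaim unit (RU) size, in pages


def place(writes, separate_by_handle):
    """Assign (lba, handle) writes to reclaim units.

    separate_by_handle=False models a single write stream shared by all data;
    True models one open RU per placement handle (the idea behind placement hints).
    """
    rus = []       # every RU opened so far, each a list of (lba, handle)
    open_rus = {}  # stream key -> RU currently being filled
    for lba, handle in writes:
        key = handle if separate_by_handle else None
        ru = open_rus.get(key)
        if ru is None or len(ru) == RU_PAGES:
            ru = []  # no open RU for this stream, or it is full: open a new one
            rus.append(ru)
            open_rus[key] = ru
        ru.append((lba, handle))
    return rus


def gc_copies(rus, invalidated_handle):
    """Valid pages GC must relocate to reclaim every RU holding invalidated data."""
    copies = 0
    for ru in rus:
        invalid = sum(1 for _, h in ru if h == invalidated_handle)
        if invalid:
            copies += len(ru) - invalid
    return copies


# Interleaved workload: even LBAs are short-lived "temp" data, odd LBAs are "cold".
writes = [(lba, "temp" if lba % 2 == 0 else "cold") for lba in range(8)]
print(gc_copies(place(writes, False), "temp"))  # 4: every RU mixes temp and cold pages
print(gc_copies(place(writes, True), "temp"))   # 0: the temp RU becomes entirely invalid
```

In this toy workload, mixing the two kinds of data forces four extra page copies when the temporary data is deleted, while separating them forces none. Fewer copies means less write amplification and wear, which is the kind of system-level tradeoff (how many bins, at what granularity) that the use-case work described above explores.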
Speaker 1:And I think there's the collaboration opportunity too, the fact that this also ties into security, right? Data will be at rest in an archival format. How do we effectively encrypt and decrypt it, and store it safely? There are many, many things, and those same people are in the same community. So that's a big difference.
Speaker 1:When I was a practitioner coming up in sysadmin and systems architecture, I'd look around and all I saw was people on my team. I was like, why aren't we sitting beside the network people or the storage team? When we did that, the collaboration was so much better, and then we architected with that in mind. That's the other thing: as we think of each new development, it's like, who else can I tap in this incredible group of folks and companies who can probably save me some time, because they've likely gone down that road already? And we find out that other people could be collaborating with us and moving faster. Let me tell you one of the things I saw: automotive data storage. Now, this is cool. We forget that all these cars with LIDAR and all this good stuff have some non-persistent, and probably persistent, storage rolling around. So what's happening there?
Speaker 2:Well, I think that's a great question, and the answer is actually a lot, and it's going to grow. There's obviously all this effort in the auto industry to have autonomous vehicles, and the amount of data required for an autonomous vehicle is phenomenal. It isn't just "do I have access to GPS"; you need very precise mapping data on where the roads are, stored right in the vehicle. In fact, it's been called a data center on wheels, in terms of not only the compute performance but also the storage capacity needed, to the tune of terabytes of storage in the car. That's the direction things are going. Maybe it's not quite that many terabytes today, but it's increasing.
Speaker 2:There are a lot of goals in terms of storing black box data, meaning what happened right up to an accident, or the telemetry data of the car: is it performing correctly, and what are the diagnostics? That's only going to grow, especially as you move more into autonomous vehicles, or even AI as it relates to autonomous vehicles. You have to have documentation stored on the car: how did the car reach the decision that it was OK to go through this traffic intersection, for example?
Speaker 2:Was something wrong with the AI, or did it do the right thing? Because some day there's going to be an incident, I'm sure, and there are going to be legal discussions and court cases. If you don't have the data, then you're certainly going to have a problem, but those who have the data are going to be able to show, hey, my autonomous vehicle did the right thing, or didn't, whatever the case may be. So there's a lot of growth in that area.
Speaker 1:Right, right. Exactly. Half of the transaction is, unfortunately, going the wrong way. Right.
Speaker 2:Very good point.
Speaker 1:But just think of that. How many times have any of us gone on even something as simple as a trip? I was in the Shenandoah Mountains going hiking, and they warn you: download your maps, because otherwise you're going to head into a spot where there's no access and the cell phone is useless. So it's wild, and that's a small thing, because I'm in control of everything else. But what if it was a system in control that required that data in real time? And how do we then reconnect and resync? So it goes beyond just store to transport, right? Now it's: how do we move data between these systems? Well, look at that, another data focus area. There's a lot of this stuff that comes together. So sorry, Jason, keep going.
Speaker 2:No, please do. What you're saying is absolutely correct, and it's because of that latency of the cell phone that you have to have the data stored locally, and those are very large files that are going to be stored on your car. And it isn't just cars, right? We think of autonomous vehicles because that's in the news, but what about autonomous tractors or trucks or those kinds of things? It's a similar kind of thing: some kind of autonomous vehicle that's going to need a lot of storage.
Speaker 3:One illustration of the intersectionality of all these layers and interest areas is when Jason was talking about the AI part of this. With autonomous vehicles, the training data for the models is notoriously large and getting larger. And not only that: I've been talking to people recently, and if you want to do that forensic analysis after the crash, or after the car went through the intersection and ran the red light or whatever, you have to keep the model data around for as long as there's any car of that model on the road. That means you have to save this stuff a long time, which segues into the archival storage problem. Not only are we creating massive amounts of data, we have to save more and more of it. So anyway, just a point I thought of as I was listening to Jason.
Speaker 1:You know, we've been talking since the '80s about a paperless society, and now we're talking about no more spinning platters, it's all going to be NVRAM and SSD. But all of these generations have to continue to live on. It's also a bit like Jevons paradox, where once we make something economical to use, it rises in popularity, and the storage we thought was going to go away is all of a sudden incredibly valuable to have.
Speaker 1:I remember seeing HSM systems, where you could automatically move some data down to a lower tier, to cheaper storage, and this was an incredible, magical thing in the '90s. Now we just take it for granted; you just expect it.
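Automatic demotion of the kind described here is usually driven by a policy. Below is a minimal Python sketch of an age-based policy, assuming three tiers named "ssd", "hdd", and "archive" and hypothetical thresholds (demote after 30 idle days on SSD and 365 idle days on HDD). `FileRecord` and `plan_demotions` are illustrative names only; real HSM and object-storage lifecycle features use their own, often richer, rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

TIERS = ["ssd", "hdd", "archive"]  # assumed tier names, hottest to coldest
# Hypothetical policy: demote one tier after this long without access.
DEMOTE_AFTER = {"ssd": timedelta(days=30), "hdd": timedelta(days=365)}


@dataclass
class FileRecord:
    name: str
    tier: str
    last_access: datetime


def plan_demotions(files, now):
    """Return (name, from_tier, to_tier) moves for files idle past their tier's limit."""
    moves = []
    for f in files:
        limit = DEMOTE_AFTER.get(f.tier)  # None for the coldest tier: nothing below it
        if limit is not None and now - f.last_access > limit:
            moves.append((f.name, f.tier, TIERS[TIERS.index(f.tier) + 1]))
    return moves


now = datetime(2024, 1, 1)
files = [
    FileRecord("hot.log", "ssd", now - timedelta(days=2)),
    FileRecord("q3-report.pdf", "ssd", now - timedelta(days=90)),
    FileRecord("2019-backup.tar", "hdd", now - timedelta(days=800)),
]
print(plan_demotions(files, now))
# [('q3-report.pdf', 'ssd', 'hdd'), ('2019-backup.tar', 'hdd', 'archive')]
```

The sketch only plans moves one tier at a time and leaves the actual data movement, and any recall on access, to the caller; those are the parts where real systems differ most.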
Speaker 1:But now it's coming to S3 and to other technologies, and they're rediscovering the problems we had with previous generations of technology and media. As that idea comes forward again, it's like, well, let's not reinvent the wheel; let's look to the standard for the wheel and the people who built the bloody wheel, because they're all in this room with me, and we can do what's next by knowing how we got here. So my advice: go to school and learn what fsck is. The moment you see that thing come up, fear enters your soul: I hope this hard drive comes back. Let me tell you, with fsck, I replaced one letter in that word a lot of times, because you never think it's going to come out positive.
Speaker 1:So there's a lot of crossing of fingers, and we can't cross our fingers when it's AI-powered movement in buses, trucks, planes, and everything. So yeah, it's wild. Sorry, I took far too much time just enjoying the room of intelligence I'm surrounded by here. For folks who want to connect, I would encourage anybody who's a practitioner to take a look at SNIA. Go to snia.org. There's lots of great information there, the website continues to get updated, and we've got events coming up: SDC and regional events.
Speaker 1:There are a lot of ways you can interact with the community. These podcasts are all over the place: we're on YouTube and on every podcast app you would like, and there are also all the previous SNIA events, with a lot of the videos on YouTube so you can get to them through the site. And as a company, it just makes sense. I've seen brand-new startups come up with an idea that has no relationship to physical storage, but I bring them into the room and they realize they're talking to software engineers who built these HPC platforms. They're like, "Oh wow, I see this use case," and they learn something: "We aren't really just a cloud company; we're a cloud company that has to understand storage." Right, and it all comes together. So, for folks who want to reach you, what's the best way to do that? We'll start from the top, Jason.
Speaker 2:Well, I do a lot of things in SNIA; in fact, I'm probably on several SNIA webpages, so you can certainly look me up there. I'm going to be at the regional SDC in Denver that you mentioned, and also SDC in September, so I'm happy to connect with anybody at either of those events.
Speaker 1:Fantastic and Dave.
Speaker 3:You can find me at SNIA. Probably the most active thing I'm involved in at SNIA right now is the DNA Data Storage Alliance; I'm the co-chair of that, so you can find me there. You can, of course, also find me at davelandsman at wdc.com, my corporate address.
Speaker 1:Fantastic. Yeah, we had a podcast with some DNA storage folks, and it was really, really cool to dive into that. We've talked about SaaS and space. I'm the luckiest podcaster alive; this is the greatest job ever. Craig, give me a quick wrap. Why should somebody care about store, and why is this data focus area an important piece of the puzzle? And of course, give yourself a shout-out on the way out.
Speaker 4:Sure. As we touched on, the store area, or storage, is the key component for SNIA, and we're not anything if we're not actually storing the data. If you look at all the different types of storage, archival, short-term, long-term, persistent, SNIA does work with all of those, and there are very interesting things going on. You mentioned the graybeards. We would love to have some new people come in, people who are coming out of college. Don't forget about storage. It's a very interesting field of discipline. It may not always get the attention, but it's always there. That's the thing about storage: it's always there, and every system requires it. So if you're looking for an interesting area to explore, storage is a good one. Having said that, I think all of us would say the careers we've had have been very interesting. Technology is always interesting, and when you add in something that impacts people and systems as much as storage does, you have a higher bar that you're looking at as well.
Speaker 4:As far as how to reach me, I think you wanted me to give a little bit of information on that. I'm also part of the SNIA TC, and, as Jason mentioned, I'll be at the SDC events coming up. I'm also on the NVM Express board of directors, so you can reach me through NVMe as well, and I tend to be at a lot of the conferences, so you can always find me there. I'll tell you my email, even though nobody will probably remember it: it's craigcarlson at amd.com, and you're always welcome to reach out to me there as well.
Speaker 1:Fantastic, and yeah, the events are great. It's a huge opportunity. And while storage may not sound as exciting as some of the other things, let me tell you, as the saying in marketing goes, just rub some AI on it, and you're going to find out real fast that what you're about to hit is the real problem.
Speaker 3:Come on, you don't want to scare the engineers away with too much of that marketing. But definitely, the impact of storage on the future of AI is not insignificant.
Speaker 1:We all love watching them pull things out of fancy kitchens, and the fine folks at the GPU makers are bringing new things, but in the end, where does that data live? Right here. So get on into SNIA, or else you're going to be left out in the cold. All right.
Speaker 1:Well, thank you all for joining today. And again, for folks, do head over to the SNIA Experts on Data podcast, hit subscribe in your favorite app, and I'll even say smash that like button, or do whatever the Zoomers tell us to do on YouTube. But get to an event. I'm looking forward to hopefully catching up IRL with all you fine folks, and thanks very much for your time today.
Speaker 4:Thank you, thank you.