
SNIA Experts on Data
Listen to interviews with SNIA experts on data who cover a wide range of topics on both established and emerging technologies. SNIA is an industry organization that develops global standards and delivers vendor-neutral education on technologies related to data.
Serial Attached SCSI (SAS) Paving the Future for Modern Data Centers
This episode highlights ongoing SCSI and SAS technology innovations, demonstrating their essential roles in modern data management and infrastructure. As hyperscalers push for performance and capacity, legacy systems are evolving to meet needs while addressing sustainability, efficiency, and security challenges. Join Rick Kutcipal, Board Member of the SNIA STA (SCSI Trade Association) Community, as he discusses:
• The roles of SNIA STA Community and the INCITS/SCSI standards organization
• The latest SAS specifications
• Innovations in storage technology tailored for hyperscalers
• Sustainable practices emerging from technology improvements
• RAID technology advancements
• Prospects for SCSI in terms of security and innovation
The SAS Roadmap is a living document that shows the progression of the standard and what’s to come. It can be viewed here: https://www.snia.org/groups/scsi-trade-association-sta-forum/sas-roadmaps
Speaker 1:All right, welcome everybody to the SNIA Experts on Data podcast. My name is Eric Wright. I'm the Chief Content Officer at GTM Delta and the host of the SNIA Experts on Data podcast. I'm super excited because today I'm welcoming Rick Kutcipal. We've been talking a lot about the work happening in the industry around innovation, the work happening within the STA community, and the collaboration around the work happening within SNIA as well. It's a great opportunity to have a great discussion. But before I get too far, for folks that are brand new to you, Rick, if you want to do a quick bio and introduction, then we're going to jump in.
Speaker 2:Sure. Thanks, Eric. My name is Rick Kutcipal. I'm a product planner within Broadcom's Data Center Solutions Group, and I'm also a member of the board of directors of the SCSI Trade Association.
Speaker 1:This is always fun, because while this is the SNIA podcast, we talk with people from IEEE, from the SCSI Trade Association, and from many of the other standards bodies and communities, but folks may not know the difference or what the origins are. If you want to talk about STA, and we mentioned T10 as well in some of the chat we had prior to this, let's unpack the acronyms for folks, Rick, and then we'll talk about what's going on.
Speaker 2:Right. So let me start with STA. We're the SCSI Trade Association. We're part of SNIA, and you can consider us the marketing arm for T10. T10 is an INCITS organization, and that's where the real technical work gets done. They develop the specs, and then STA helps guide some of that and promotes it within the industry. And it's not just SAS; it's really all things SCSI, but we are branded as the SCSI Trade Association.
Speaker 1:It's interesting, because nowadays, when we talk about what's coming, what's new, and the exciting industry trends, the first thing that happens, of course, is AI starts dripping off everybody's tongues. It's all about these new, fantastic, amazing things that we think we're doing but are not necessarily doing at the level we think. I think there's an over-rotation toward the really far-reaching stuff that's happening today. Those are interesting use cases, but we forget that there's still a ton of innovation going on in tried-and-true technologies that have been around for a while. So how does this come in? When we say SAS has got something new, people may be going, "Really?"
Speaker 1:So what is going on in the STA community?
Speaker 2:Yeah, that's a good question. The SAS spec has gotten to 24 gig, and everybody's aware of the 24 gig revision of the spec. But what people don't understand is that the SCSI stack itself is very layered. At the very bottom, in this case, would be SAS for the physical layer, and on top of that there are many different areas. That's where the innovation is coming, to support features focused on the needs of the hyperscalers with capacity and rotational media, as well as the performance needs of flash.
Speaker 1:And it's funny, because we think of the scale at which we're working, and quite often people think everything is going to be sort of bigger, better, faster, more. But it's also about consistent programmability. It's about consistent APIs. It's about new ways to do management. A lot of the work that happens in the SNIA community is around stuff with Swordfish, relating to what's going on with Redfish. So it's not just day-to-day operations of the metal, but how do we manage, optimize, protect? So with the work that's going on in the technical working groups and within STA, what are some of those features and factors that are being managed and innovated on?
Speaker 2:Well, first let me talk about the legacy of SAS and of SCSI. SCSI has been around for a very long time, and SAS since the beginning of the century. It's a very tried-and-true platform, very reliable, and it's been enterprise tested. So when it comes to reliability, manageability, and serviceability, a lot of that is actually built into the specification.
Speaker 2:Moving forward, the real innovation that's occurring now comes in a couple of different areas. Like I mentioned, one is focusing on the needs of the CSPs, or the hyperscalers, with their insatiable requirements for capacity and performance, and we'll talk about that in a minute. Performance isn't just fast reads and writes; it can get into latency and other metrics. The other is with regard to flash: the incredible read capabilities of flash and how SAS is dealing with that. There have been some innovations in that area as well.
Speaker 1:And there are so many things that we take for granted. I'm an older fella, so when I came into computing in the enterprise, there were very low-capacity drives. Even just the idea of having different RAID patterns was sort of new. What was the reason we'd use RAID 0 versus RAID 1 versus RAID 5? And then RAID 6 was just coming, and we were sort of wrapped around the axle on how and when to use this stuff, because the use cases weren't really varied. But what we're seeing now is that the pattern of workloads is so fundamentally different than it was when I was coming through the industry in the enterprise days. So what does this mean now for SAS as an architecture, with these really diverse and high-scale workloads?
Speaker 2:Good question. One example is the service level agreements that the hyperscalers have. These big data centers have very specific service agreements with their customers, so certain metrics need to be met. One thing that T10 has done is implement what they call command duration limits, or CDLs, to help with the tail latencies of these very, very large HDDs. For the most part, the average latencies of an HDD are very predictable. But in some cases the drive could be doing some garbage collection, doing another task, or have to do a couple of seeks, and then what they call the tail latency becomes very large. T10 has implemented CDLs in order to handle this. What it does is gracefully fail a particular command if that latency gets too long. There are configurable policies, so the user or the CSP can put those in place to throttle or control that.
Speaker 2:This originated from an OCP effort called OCP Fast Fail, and there's actually a good publication called Cloud HDD Fast Fail Read, published in 2018. That was really the genesis: T10 took it, and it became CDLs, which were published in SPC-6 and are starting to be deployed in today's data centers specifically to control the tail latencies. There's quite a bit of information online if you go and search.
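As a rough illustration of the command duration limit idea described above, the Python sketch below simulates a host issuing reads with a per-command time budget. It is a toy model only: `DurationLimitPolicy`, `issue_read`, and the millisecond values are hypothetical names and numbers chosen for this example, not the T10 CDL descriptors or any real driver API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DurationLimitPolicy:
    """Hypothetical policy: how long a command may run, and what happens after."""
    limit_ms: float
    on_exceed: str = "abort"  # "abort" fails the command early; "continue" keeps waiting


@dataclass
class CommandResult:
    completed: bool
    latency_ms: float  # time the host actually spent waiting
    note: str


def issue_read(simulated_latency_ms: float,
               policy: Optional[DurationLimitPolicy]) -> CommandResult:
    """Simulate one read whose drive-side latency is already known."""
    if policy is None or simulated_latency_ms <= policy.limit_ms:
        return CommandResult(True, simulated_latency_ms, "completed")
    if policy.on_exceed == "abort":
        # Fail early so the host can fetch the data elsewhere (e.g. a replica).
        return CommandResult(False, policy.limit_ms, "duration limit exceeded")
    return CommandResult(True, simulated_latency_ms, "completed past limit")


# Three typical reads and one tail-latency outlier (made-up numbers).
latencies_ms = [8.0, 9.0, 12.0, 450.0]
policy = DurationLimitPolicy(limit_ms=50.0)
results = [issue_read(ms, policy) for ms in latencies_ms]
worst_wait = max(r.latency_ms for r in results)
print(worst_wait)  # 50.0: the host waits at most the limit, not 450.0
```

The design point the sketch tries to capture is that the outlier command fails at a bounded time instead of completing late, which lets the layer above decide how to recover.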
Speaker 1:Yeah, and interestingly, this morning I was literally listening to somebody talking about SQLite and one of the challenges around rewriting it in Rust. One of the things they were struggling with was handling partial writes because of tail latency in those I/O queues. It's so apropos that here we are, realizing this is being solved at a few different layers, but at the very bottom layer, the down-to-the-metal layer that people need to worry about, or be confident in, I guess, this is what's been going on with those innovations in SAS.
Speaker 2:Yeah, and another one is about servicing the capacity needs of the hyperscalers. Everybody's probably familiar with, or has at least heard of, SMR, or shingled magnetic recording. In fact, over 50% of the enterprise rotational bits shipped in 2024 have been SMR enabled, so that's pretty big. SMR has become very large, in order to improve the areal densities of the drives, to get more capacity in the same footprint. On the T10 side, support for this goes all the way back to ZBC-1, which has been published for maybe eight or so years now, so it's well established.
Speaker 2:The follow-on to that is what T10 calls format with presets. It's a very obscure name. A lot of times it will be referred to as hybrid SMR or next-generation SMR. What it does is give the user or the CSP the ability to format the drive either as a CMR drive or as an SMR drive, or even do it dynamically, so part of the drive is CMR and part is SMR. It's mainly being used today for SKU reduction. The hyperscaler, the CSP, can buy a specific drive and then, depending on the actual need, decide: do they need SMR, with more capacity but limited to sequential operations, or do they need CMR? It's a very interesting one, and it's being rolled out today as we speak as well.
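To make the CMR versus SMR trade-off concrete, here is a small Python sketch of a drive split into a CMR region and an SMR region. Everything in it is an assumption made for illustration: `Region`, `format_drive`, and the block counts are hypothetical, it does not model the actual format with presets command, and it ignores the extra capacity that SMR provides. It only shows the access rule mentioned above, where the SMR part accepts sequential writes only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    kind: str               # "CMR" or "SMR"
    size_blocks: int
    write_pointer: int = 0  # next writable block; only used for SMR regions

    def write(self, offset: int, length: int) -> bool:
        """Return True if the write is accepted by this region."""
        if offset < 0 or length <= 0 or offset + length > self.size_blocks:
            return False
        if self.kind == "CMR":
            return True     # conventional region: random writes are fine
        if offset != self.write_pointer:
            return False    # shingled region: must append at the write pointer
        self.write_pointer += length
        return True


def format_drive(total_blocks: int, smr_fraction: float) -> List[Region]:
    """Hypothetical 'preset': choose how much of the drive is SMR at format time."""
    if not 0.0 <= smr_fraction <= 1.0:
        raise ValueError("smr_fraction must be between 0 and 1")
    smr_blocks = int(total_blocks * smr_fraction)
    regions = []
    if total_blocks - smr_blocks:
        regions.append(Region("CMR", total_blocks - smr_blocks))
    if smr_blocks:
        regions.append(Region("SMR", smr_blocks))
    return regions


cmr, smr = format_drive(total_blocks=1000, smr_fraction=0.5)
print(cmr.write(100, 10))  # True: random write into the CMR region
print(smr.write(100, 10))  # False: not at the SMR write pointer
print(smr.write(0, 10))    # True: sequential write from the start
```

Passing `smr_fraction=0.0` or `1.0` gives the all-CMR or all-SMR case, which is the single-SKU flexibility described in the conversation.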
Speaker 1:When you think of it, from talking to the hyperscalers myself, the biggest challenge is capacity planning and management, because they really don't know what workloads are coming, so there's a lot of thumb-in-the-air, guessing-the-wind-direction stuff. Giving this flexibility now means that diverse workload patterns can apply to the same gear, and we get the advantages at the lowest layer. That ultimately ends up as better SLAs, better performance profiles, and most likely durability as well, because the longevity of the hardware ultimately stretches out the cost. There are wins all around, I would say.
Speaker 2:Right, no, agreed, agreed.
Speaker 1:Now, one thing I was curious about: sustainability is coming up more and more. I've been lucky; Jonmichael Hands has been on the podcast a couple of times, and we've talked a lot about it as a sort of secondary factor. We don't necessarily say it, but the stuff you just mentioned actually does have a real strong sustainability angle, not as a primary purpose; it's sort of a bonus fries to it. It used to be that you'd get a piece of hardware and it could do one thing, and that was it. Now this is adding more flexibility, and I would say the environmental impact is probably much better, because we're likely driving better power utilization and better use of those large-cap drives, so that we can stretch to meet the needs of the workload as far as possible without putting more impact on the power, the air conditioning, and the actual gear itself.
Speaker 2:Yeah, and that's interesting. I haven't really thought of it in that context, but I would call it a secondary value proposition of this type of technology.
Speaker 1:And then, as the classic goes, everything sounds great when it's going great, until it's not. So with stuff like RAID rebuild assist and other things around recovery, take us through what this means for the developments happening in that area.
Speaker 2:Yeah. Like I said in the beginning, there's been innovation on numerous fronts. We just touched on some of the more HDD, hyperscale-centric ones, but there are also innovations in performance with flash, especially the read capabilities of flash.
Speaker 2:One problem that keeps getting discussed is rebuild times. The drives are getting so big, and a rebuild is a pretty compute-intensive process. Not only does it take a long time and leave you vulnerable with one drive out of your array, but the performance of the system is typically impacted as well. So what T10 has done is implement what they call rebuild assist. This gives the drive the ability to communicate via SAS to the RAID engine about the health of the LBAs, specifically marking the failed LBAs. When it comes to a rebuild, the RAID engine can use that information and recreate only those, because recreating the data is one of the big-ticket items. If it only has to rebuild the specific things that are bad instead of everything, the rebuild process can be much more efficient. So that's one that's rolling out today as well.
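The rebuild idea above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the T10 rebuild assist mechanism: it assumes a single-parity (XOR) stripe, represents each drive as a dictionary from LBA to block, and uses hypothetical names (`xor_blocks`, `rebuild`, `failed_lbas`). Readable blocks are copied straight from the failing drive, and only the LBAs it reports as failed are recreated from the other members.

```python
from functools import reduce
from typing import Dict, List, Set, Tuple


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(failing: Dict[int, bytes],
            peers: List[Dict[int, bytes]],
            failed_lbas: Set[int],
            total_lbas: int) -> Tuple[Dict[int, bytes], int]:
    """Build a replacement drive image; return it with the count of recreated LBAs."""
    replacement: Dict[int, bytes] = {}
    recreated = 0
    for lba in range(total_lbas):
        if lba in failed_lbas:
            # Expensive path: read every peer and recompute the missing block.
            replacement[lba] = xor_blocks([peer[lba] for peer in peers])
            recreated += 1
        else:
            # Cheap path: the failing drive can still return this block.
            replacement[lba] = failing[lba]
    return replacement, recreated


# Two data drives plus XOR parity, four LBAs each (made-up contents).
d0 = {i: bytes([i, i + 1]) for i in range(4)}
d1 = {i: bytes([10 + i, 20 + i]) for i in range(4)}
parity = {i: xor_blocks([d0[i], d1[i]]) for i in range(4)}

# Drive 0 reports LBA 2 as failed; everything else is still readable.
failing_d0 = {lba: blk for lba, blk in d0.items() if lba != 2}
image, recreated = rebuild(failing_d0, [d1, parity], {2}, total_lbas=4)
print(image == d0, recreated)  # True 1
```

Without the failed-LBA list, the same loop would have to take the expensive path for all four LBAs, which is the cost the conversation is describing.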
Speaker 1:I was going to say, it's funny talking about percentage of utilization. Not that we're looking at market specs and market implementations, but again, we kind of over-rotate to the new, exciting, shiny stuff. What is the percentage of SAS storage that's sitting out in these hyperscalers today? It's probably a significant portion of their data layer.
Speaker 2:I haven't seen the most recent data, but at the beginning of this year a report that I saw was estimating that about 10% of the total capacity in these data centers is not on a SAS infrastructure. So 90% is behind the SAS infrastructure. Remember that that's not only SAS drives, because SAS drives surely don't make up that much. All the ATA drives sit behind a SAS infrastructure too. Part of the SAS spec is the SAT layer, which translates between SCSI and ATA. So all those ATA drives, all those nearline high-cap drives, are part of the SCSI infrastructure.
Speaker 1:Yes. As I say, my favorite line is always: you call it legacy, I call it production. All this time we talk about it as the past, but the past is actually pretty happily working in the present, and it has a long future ahead.
Speaker 2:And AI is the buzzword, right? Everybody's eyes are spinning with AI, and it is a very important technology and a very important transition within the ecosystem. But remember, it takes a lot of data to drive these models, and while SAS may not be at the heart of the AI machine, it's very important in containing all the data that then goes to create these models.
Speaker 1:Well, and I would say this is where we will begin to see it.
Speaker 1:In fact, we're already starting to see this. I always mispronounce it, but it's the Jevons paradox: the idea that we make something so efficient that it becomes popular and rises in utilization. We think it's just going to be stable and age out, but in fact it becomes this new spot. And we're seeing a lot of hope, at least at the hyperscale level, that they can use things like S3 for storing training data.
Speaker 1:And then we begin to see the work on stuff like CXL and SDXI. But at the underlay, what is the thing that we're still doing? The artifacts are still there at these hardware layers, and we're getting optimizations in how we manage them. Take RAID rebuild assist: you can be aware of what the performance impact is and potentially run production workloads alongside rebuilds that are laser-specific about what gets rebuilt. At these scales, it would be impossible to manage data centers if we didn't have the innovation that's gone on in that SAS layer.
Speaker 2:Agreed, agreed, and it's been around for a very long time and it will be around for a very long time.
Speaker 1:So, looking ahead, Rick, what are the next things that we're going to see coming through in the STA community? And, in fact, what do you see as some of the other collaborations, even at the human physical layer?
Speaker 2:There's a lot of work being done in T10. T10 is working vigorously on a number of these things, and a lot of these specs take time to come out and to get done correctly. So some of the things we've been talking about are complete, and some are still being worked on. As for things we're looking at in the future, security is a big topic, so is there anything SAS can do in terms of security? There are some ideas around key per I/O, so having a key for every piece of data going across. That's something that's being tossed around.
Speaker 2:Support of attestation is another one. With SAS being the infrastructure that all the media we've been talking about is sitting behind, it's a good place for that type of thing. So there are a number of things that are being looked at.
Speaker 2:Yeah, and I just picked on security because security is a hot topic these days.
Speaker 1:As it should be. I always sort of laugh, because at one point DevOps was the hot new thing. It wasn't really a new thing; it had been around since RAD, rapid application development. We just renamed it and called it DevOps. Then all of a sudden people started calling it DevSecOps, and some got a little irked, because they were like, "Why do you have to say DevSecOps? It's implied." I said, whoa, no, it was hoped, it was not implied. No one actually invited the security team.
Speaker 1:And that's another thing with what's going on with the SCSI Trade Association and inside SNIA: you're surrounded by people that are cross-group. You have security pros, you've got the CSPs, MSPs, and the hyperscalers that are participating. You've got individual hardware vendors, memory vendors, GPU vendors. Everybody's in this room. We used to go away and develop our own individual pieces in our pillars, and then they would come together at the end with maybe APIs or SDKs, if you're lucky. Now you're literally having conversations with these security people, so the idea of doing key generation and dynamic key per I/O is very much a thing that requires both sides of that conversation to be present.
Speaker 2:With our integration into SNIA, something that I guess I didn't appreciate is that community around us, giving us access to the experts in all these different fields. That's something that I think a lot of us didn't really foresee, but it's very beneficial to us. And we are using the infrastructure of SNIA in that case.
Speaker 1:Well, there's definitely a lot of excitement ahead and, like I said, for anybody that thinks SAS is settled science, this is not the case. There are still innovations with LTO. There are still innovations in all these other storage layers. As much as I'd like to say that tape is dead, SAS isn't dead, it just smells funny. That was my old line; it's a Frank Zappa line about jazz. But there is so much cool stuff ahead, so I'm excited. First of all, Rick, thank you very much for taking the time to share today. For folks that do want to get connected with you, what's the best way they can get together with you, talk with the folks in the STA community, and get connected with T10 as well?
Speaker 2:Yeah, I think the best way is via the SCSI Trade Association, via SNIA. We'll include the link to our website, which contains our roadmap where you can see some of those innovations, in the proceedings of the podcast.
Speaker 1:Yeah, that's fantastic, and that was a really good point. We want to make sure that folks do check it out, because being able to see those roadmaps gives you a sense of what's coming and also where they see the opportunity to collaborate. Quite often people think they're heading toward some uncharted territory, and then they find out that there's a bunch of people already waiting for them at the bottom of the hill.
Speaker 1:You're like, okay, maybe we can save a lot of time. The innovation pace that happens in these communities is so much greater, in a quiet way, because of the work that's happening across companies who are technically competitors in the field. And yet this is a safe innovation and collaboration space where we can meet amazing humans and learn about amazing technology capabilities. So I'm a fan. Well, Rick, thank you so much and, yeah, happy New Year, I guess, since that's about the time this is rolling out.
Speaker 1:People may hopefully be spending New Year's with a glass of bubbly in their hand and a SNIA podcast on their iPod. No better way to start your year than with great conversations like this. So, Rick, thank you very much. And folks, of course, do check out this and other great conversations on the SNIA Experts on Data podcast. We've got the audio version, and this is also going up on YouTube so you can see these beautiful smiles. Thank you very much for taking the time with us today, Rick.
Speaker 2:Thank you, Eric.