SNIA Experts on Data

How Standards Make DNA Storage Data Center Ready

SNIA Episode 28

DNA data storage is moving from a fascinating research topic toward something that can eventually live alongside enterprise storage in real data centers, but the leap is not only about chemistry or sequencing speed. It is about manageability, interoperability, and standardized APIs that let operators monitor, provision, and troubleshoot systems at scale. 

In this conversation, Vincent Franceschini, co-chair of the SNIA DNA Data Storage Alliance, and Richelle Ahlvers, long-time chair of the SNIA Storage Management Community and primary author of the SNIA Swordfish® specification, connect the dots between a new storage medium and the standards-based management that makes it operable at scale.

Archives are turning “active” because of AI, and long-term preservation is back in the spotlight. DNA data storage can become a durable data center technology if standards are built early, models are shared openly, and the community aligns on terminology, lifecycle operations, and monitoring so tomorrow’s archives remain readable and manageable for decades.

SNIA is an industry organization that develops global standards and delivers vendor-neutral education on technologies related to data.  In these interviews, SNIA experts on data cover a wide range of topics on both established and emerging technologies.

Why DNA And Swordfish Belong Together

SPEAKER_01

All right, folks, it's my favorite time of the day because it's Experts on Data time. I'm very excited because today we're going to be talking about DNA and Swordfish. You're probably saying to yourself, wait a second, those sound entirely unrelated to storage, but in fact both are very much related not only to current storage but to the future of storage. We've talked a little bit about this subject in the past, so this is a great chance to refresh and see where we're at. And I've got two amazing folks with me. Before we get started: my name is Eric Wright. I'm the host of the SNIA Experts on Data Podcast, also the co-founder of GTM Delta and a very proud SNIA community member. I've also got Richelle with me today. So, Richelle, do you want to give yourself a quick introduction?

SPEAKER_02

Sure. Thanks, Eric. I'm Richelle Ahlvers. I have been the chair and primary author for the Swordfish specification for almost 10 years now. We're really excited as we move the spec forward. How do we bring in new features and functionality? That's really what we'll be talking about here today.

SPEAKER_01

Excellent. And Vincent's joining us as well. So, for folks that are brand new to you, Vincent, let's have an introduction.

SPEAKER_00

Sure, thank you, Eric. I'm Vincent Franceschini. I'm the co-chair of the DNA Data Storage Alliance, a community of SNIA. I'm definitely happy and very thrilled to be part of this discussion, because for us Swordfish is a very good example of the kind of thing the DNA data storage community needs to embrace in order to prepare for the future use of our technology in data centers.

SPEAKER_01

And the pace of innovation that we're seeing in the great wide world right now is almost tough to watch. As we say, folks have got red eyes from keeping up with the situation. There's a lot going on. But what I always love to remind people is that what we're doing today is made possible by very solid foundations that we know we can depend on. That's why being in the SNIA community has been such an incredible part of my own career: leaning on the folks who are telling these stories, building and innovating together with a standards-first approach as well as a collaborative approach. This is one of the rare spots where we see potentially competitive company brands all come together, because we all see the opportunity to move the world forward in a positive way and measurably add value to what enterprise and even consumer technology is able to do. So, Richelle, given that this is your area of coverage and Swordfish

Swordfish On Redfish Foundations

SPEAKER_01

has been in your backyard for quite a while, let's talk about your view on why this is both an important topic and an important time.

SPEAKER_02

Okay, great. I want to follow up on one thing you just said, which is building on solid foundations but also being able to move quickly. This is something that folks don't necessarily expect from a standards organization. With Swordfish, we build on top of the DMTF's Redfish, so we focus exclusively on the storage applications. Both Redfish and Swordfish release updates up to four times a year, so we can react very quickly to new features and functionality. That's one part of how this is relevant. The other piece is that we're focusing on manageability here. One of the things that is really key about standards-based management is that it allows the companies working in these spaces to focus on where their actual value add is, instead of treating their entire management stack as value add. If you can level out the parts that everybody's doing, it gives you more time and more ability to focus and deliver value-add functionality. That's really what we're trying to do here as we expand into DNA. The DNA Data Storage Alliance has been a part of SNIA for two or three years now. We know it's taking some time. This is really cool technology, which we're not going to get into that much today, other than to talk about the manageability aspects of it. But if you do have a chance, go watch some of the materials on it. It's fascinating. Because this is going to take a while, we can increment our way there as well: What does a basic model look like? What are the standard components that the DNA Data Storage Alliance is driving across all of these different kinds of devices and across multiple companies? And then I'm coming in with the viewpoint of how I can make that match the models we already have and do it as incrementally as possible.
And so that's kind of where we are. I'll let Vincent talk in a minute about what the components are that we're talking about here, but first, what we've done so far. We have a draft out on the SNIA website, currently for public review, that highlights what these objects would look like and how we would model a device. What we're really looking for now, though, is a lot of feedback from other folks, either working on something else or working in the DNA communities, saying, hey, this doesn't quite match what we want to do; how can we change that? We're very early in the cycle right now. We're looking at modeling for various components and getting the naming. Naming of components, to me, is trivial. I want to make sure we've got the right components in there, and then ask, do we have enough for people to get started? Then we can add more as we go. Again, we can turn these specs out up to four times a year. We can have something ready to go before SDC this year, depending on how much input we get back. If we're getting input more slowly, it'll take a little bit longer. But we do expect to try to roll out the models here as quickly as we can. So with that, I'll ask Vincent to talk about what's going on on the DNA Data Storage Alliance side

Manageability Challenges For DNA Storage

SPEAKER_02

of this work.

SPEAKER_00

Thank you, Richelle. Let me just say that the work we've started to do with the Swordfish technical work group is really a way to inspire the DNA data storage community to embrace all the topics related to manageability. Why is that? DNA data storage is a very exciting topic, as Richelle just mentioned, and there's so much going on in that community on the scientific front and the technology front. So I invite everyone interested to take a look at our pages on the SNIA website. But besides that, there is a tremendous new ecosystem of technology and equipment that is about to be combined to enter the data center. To give you a different perspective: unlike other storage technologies, where you basically provide a recipient and you pour data into it, with DNA you build the recipient and the data at the same time. That is a completely different perspective on what you need to consider moving forward. And obviously, all these resources need to be harnessed in a defined, standardized environment. That's where all the discussions around manageability make a lot of sense. Another key point: this is about data. It's about writing data and reading data from this new medium. With DNA data storage technologies, the write path and the read path involve completely different sets of technologies. On one hand, you synthesize DNA; on the other hand, you sequence DNA. These are different types of equipment that need to be combined in order to provide an automated way to access data, as with any other storage medium.
So all this equipment, and all the stages it takes to go from the original digital file onto the DNA medium and then back into a digital format, need to be well understood and well mapped into a process that can then be monitored in a standardized fashion. In the same way, the resources involved need to be identified and monitored in a standardized fashion. So we all need a common model to do all this, whatever way you implement it in the end as individual manufacturers of equipment and providers of solutions based on DNA data storage. The work with the Swordfish TWG is an excellent exercise and a segue into considering everything the DNA data storage community has to take into account in order to provide a very interoperable environment and set of products and solutions that can walk into the data center and be integrated into the greater picture of data storage for data centers. So I'm really excited that we have this opportunity. I would say it's heavy lifting in a certain way, because we are discovering a lot of new topics, as I mentioned, such as the new types of equipment that need to be included in the management equation. The more we discover what we need to do, the more we realize that there are some elements of the management environment that have not yet been taken into account. Some equipment might require additional chilling equipment; other equipment might need some environmental focus to be included in the parameters that are tracked with a standard like Swordfish. So again, we've made a first attempt, a first step, at creating what's required.
The DNA data storage community needs to embrace those concepts, work on them, evolve them, and make them bigger and more detailed. This is what's going to happen in the collaboration we hope to have with the Swordfish TWG, based on the feedback we get. As Richelle just said, the more feedback we get and the quicker we get it, the faster we can move. If we don't have that feedback, then we'll take our time, because this is a new topic. Keep in mind that not everyone in the DNA data storage community has a data center background, so we also need to take time to bridge participants from those different backgrounds, in order to get them working on the real issues and the real challenges associated with the management of DNA data storage resources. But we are pretty confident that this is a topic of great interest for our community, and therefore there should be great motivation to proceed with the work we have ahead of us with Swordfish, to make it a strong standard and, as Richelle rightly said, to match the other efforts that have been made for other technologies

Environmental Needs And Data Center Fit

SPEAKER_00

as part of the Swordfish framework.

SPEAKER_01

Right, and just looking at the raw primitives of operating, as far as writing and reading data, that in itself is a fantastic use case for making sure that we have a standardized approach. They're simply different artifacts. Of course, they operate differently, which is what I want to jump into a bit, Vincent: what is different in the artifacts. Everything up to this point has been a variation on a theme, generally with the same sort of fixed management patterns. It's block, file, object, mostly around capabilities. There are additional things around heat and placement, and there's a lot of work around optimizing operations with existing traditional enterprise storage. With this, you're not only mapping a new set of primitives and a new set of operational primitives, but the environmental operations are probably much more important here than they would be in a traditional magnetic or NVMe type of storage environment.

SPEAKER_00

So, yes and no. If you look at what has been done in the DNA data storage community at large, you will find a very diversified technology background, and scientific background, by the way, because there are very different ways of taking digital input and putting it into DNA. Technically, there are so many different ways of synthesizing DNA that can be applied to DNA data storage that it's a bit as if we had very, very different ways of building SSDs or the HDDs of the past. I'm sure the manufacturers would say that there are many different ways of doing that too. But the reality is that we are in the process of reviewing all the different ways of doing DNA data storage: writing, meaning synthesizing DNA, for example, and sequencing DNA for the reading. The idea is that there will be a common understanding of what's feasible and what's not feasible. By the way, one of the goals of the DNA Data Storage Alliance is to publish, on a regular basis, the progress of our community, so that the rest of the ecosystem we want to be part of inside the data center understands our constraints. Now, going back to your question about environmentals, it really depends on how you do this.
I would think that the majority of the community would aim at having products and technologies that can exist in the traditional context of a data center without requiring a very special set of equipment. I think it's the same for a lot of developers of new technologies: you obviously want to bring new value add and differentiation because of what you build, but at the same time you need to be integrated into an environment, and you don't want to disturb that environment too much. Otherwise, your adoption is going to be in question. So existing in a traditional data center environment is something that we aim to deliver at the end of the day, as a community and obviously as individual vendors within this community. Trying to be different, yes; trying not to be too different, also yes. It's important. Otherwise, in terms of adoption, it creates a dilemma for future adopters. Is it worth it? Am I taking too much risk? What are the challenges going to be if I modify my environment too much just to make room for those new guys with this fantastic new technology, when it is so different that I have to do so many things differently that the benefit is not there? So we've got to be careful about that overall. We want to be part of the data center, and not just part of a data center extension that requires special humidity, special temperatures, special cooling, special this, special that. At the end of the day, you create so many requirements that the benefits are going to be in question. So we want to avoid that if possible.
And the discussion around manageability and environment is really also about having the possibility of being seamlessly integrated. That is what we want to do overall.

SPEAKER_02

So one of the things that we've been doing in these conversations is really taking,

Naming Containers And Modeling Resources

SPEAKER_02

here's all of this baseline of existing models. As the DNA team comes and says, here's this thing, I say, is it like one of these? Is it like one of these? And so we can actually go in and take all of these components that Vincent is talking about and match them up as much as possible, adding extensions or changes where needed, but basically saying, this one is going to look a lot like a tape library. Somebody else's may look different, but we can separate out the logical versus physical modeling. And then there are the cooling and power infrastructures. DMTF Redfish is working very extensively on enhancing all of those in general, so most of those are a drag and drop. Oh, you want these kinds of metrics over here? You need a cooling thing here? Whoop, we just drop it right in. So we have a lot of building blocks to work from here. We just need to make sure we're getting to a critical mass of folks who understand the work, asking whether we have a couple of these underlying models fundamentally correct, so that we can build on them. We don't need the whole thing now. We need the reporting, the initial instrumentation, the event notifications, and the things that are critical when you first move into a data center, and then we can build off of those as we move forward.

SPEAKER_00

What Richelle just mentioned is absolutely right. We're not trying to reinvent the wheel when we don't need to. If we can be modeled like another type of resource in a data center, then great, be it a tape library or something else. But sometimes we need to specify something that is very much about DNA data storage, for example, the resource that you're going to pour the DNA into at the end. As individual product providers, we all have different jargon, and we're trying to align that jargon. For example, are we talking about a DNA container, or a series of containers? All this is in discussion at the moment, because we all come as individual parties with different perspectives on how it should be shaped, named, and so on. We need to align on that, and that's the work that is ongoing. The exercise with Swordfish pushes us as a community to come to a conclusion and to a consensus. That's the good thing about this, because otherwise we could delay those discussions forever. In a way, it gives us the opportunity to talk about those very important things moving forward. And yes, in the DNA data storage space, we tend to talk a lot about the physical resources, but not so much about the logical resources. That's also a great exercise for our community to get into, because manageability is also about the logical side of things. The other thing is that, unlike other media, DNA data storage resources are not directly addressable as a storage entity. It's not like a disk drive; you cannot mount a disk drive made of DNA.

SPEAKER_01

You know, there's no FAT table for DNA.

SPEAKER_00

FAT table, maybe. But there isn't the equivalent of an LTFS, for example, which lets you mount your tape cartridge and see it online. You don't do those things yet with DNA data storage. Maybe one day we'll get there, but this is not the case today. So we have some constraints as to how the resources in play can be addressed, managed, mounted, utilized, decommissioned, recycled, etc. We have to map the overall life cycle and the associated processes for managing those resources in a way that is, one, compatible with the way we do DNA data storage, and two, understandable by the upper software layers that are going to be tapping into our resource management systems, based on the API that a Swordfish standard can provide. So this is really a very interesting exercise for our community, as far as I'm concerned, because I can see it opening so many doors, with so many questions that we need to answer in order to make progress and to have the opportunity to become properly integrated into data centers and all the various layers of management that are required to run a data center, with all the specific jobs to manage data, to manage storage resources, and so on. So it's very important that we get there.

SPEAKER_01

Luckily, it's a niche style of implementation, of course, given that this is not a typical mid-market type of storage play. This is going to be used in very specific use cases. And I'll say there are a lot of enterprise storage admins who are glad that they don't have to learn what an oligonucleotide pool is just yet. But how do we now map those storage artifacts as we know them today into how we manage, as we learn and as we build out this new standard of storage deployment?

Reusing Storage Pools And Capacity Sources

SPEAKER_02

Well, I think that's where the logical representation really comes into play, because we're using the exact same high-level constructs and saying, okay, DNA has these things, let's plug these in. So you have capacity sources. One of the things we have in Swordfish, as a richer model than base Redfish has, is capacity sources. You can create volumes, file systems, and object stores on top of any form of capacity source. It can be memory, it can be hard drive, it can be tape, it can be DNA. So we're basically taking those exact same models and saying, okay, I have a storage pool, and my pool is made up of these kinds of things. From that, I've got the attributes that tell me what I can create. In the same environment, you could have a giant pool of NVMe devices, so you can be creating namespaces and exporting block storage and even file systems on top of that, and in the same place have your DNA. You're using the same types of objects and the same patterns for a completely different technology. "I have a pool of these, I want to create a set of objects, and I want to make them accessible over here" is exactly the same series of operations, at a high level, that you're doing on traditional storage.
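The pattern described here, one pool abstraction sitting on top of any kind of capacity source, can be sketched in a few lines of Python. This is only an illustration of the idea, not the Swordfish schema: the class names, the `MediaType` values, the `create_object_store` method, and the byte counts are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class MediaType(Enum):
    # Illustrative media kinds named in the conversation; not Swordfish schema values.
    MEMORY = "memory"
    HDD = "hdd"
    TAPE = "tape"
    NVME = "nvme"
    DNA = "dna"


@dataclass
class CapacitySource:
    # Hypothetical stand-in for whatever physically provides capacity.
    name: str
    media: MediaType
    capacity_bytes: int


@dataclass
class StoragePool:
    name: str
    sources: list[CapacitySource] = field(default_factory=list)
    allocated_bytes: int = 0

    @property
    def capacity_bytes(self) -> int:
        return sum(s.capacity_bytes for s in self.sources)

    def create_object_store(self, name: str, size_bytes: int) -> dict:
        """Carve a logical resource out of the pool, whatever the media underneath."""
        if self.allocated_bytes + size_bytes > self.capacity_bytes:
            raise ValueError(f"pool {self.name!r} has insufficient capacity")
        self.allocated_bytes += size_bytes
        return {"name": name, "pool": self.name, "size_bytes": size_bytes}


nvme_pool = StoragePool("fast", [CapacitySource("ns1", MediaType.NVME, 4_000)])
dna_pool = StoragePool("archive", [CapacitySource("container-1", MediaType.DNA, 10_000)])

# The same call works against both pools; only the capacity source differs.
for pool in (nvme_pool, dna_pool):
    store = pool.create_object_store("demo", 1_000)
    print(store["pool"], pool.capacity_bytes - pool.allocated_bytes)
```

The point of the sketch is that the caller never branches on media type: the DNA-backed pool and the NVMe-backed pool accept the same operation.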

SPEAKER_01

So on that one, for people who are developing, how much of the core is there today, for being able to understand what a simple POST operation is? Today we know that element has been defined and has a certain set of attributes. What work has already been done, and how far along are we in mapping those things to be able to be in product and see it work live?

SPEAKER_02

So we basically took both the common terminology that the DNA Data Storage Alliance is working on and one example, and we'd love to see more examples so that we can see how well we've extrapolated. We said, okay, this is going to be an object store using storage pools, using capacity sources of type DNA, containers or subcontainers, or both. That was our starting point. The physical piece is going to be relatively unique per vendor, but it's all common components. So we needed to add a couple of new kinds of physical entities in there. Think of anything that's a FRU that needed to be added. But they're just added into our standard stack, the same way you have a chassis with an SSD inside it. An SSD can be modeled as a chassis if you want, or it's just a FRU. We've done the same thing and proposed some new models for that. In the working draft that's on the website, there's a giant diagram that shows all the different components and how they would cross-connect. We've also published mockups on swordfishmockups.com, so you can go in and see what this relatively simple configuration would look like and how everything's cross-linked between logical and physical. For the physical components that were already there, you automatically get all of the event reporting and notification capabilities. As we're adding some of these new components, we'll have to say what new specific events we need for them. But there's not really a lot of the rest to do. The metrics that are there, the performance, and, like I've already said, the power and cooling: all of that is already in place.
So it's really just, you know, what are the absolute unique attributes?
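As a rough picture of what "cross-linked between logical and physical" means in a Redfish-style mockup, the Python sketch below walks `@odata.id` references from a logical pool to the physical resource behind it. The `@odata.id` link convention and the `/redfish/v1` root come from Redfish; the specific paths, resource names, and properties here are hand-written placeholders and are not taken from the published mockups or the working draft.

```python
# Hypothetical, hand-written resources. Everything except the "@odata.id"
# link convention is a placeholder chosen for this illustration.
MOCKUP = {
    "/redfish/v1/Storage/1/StoragePools/DNAPool": {
        "Name": "DNAPool",
        "CapacitySources": [
            {"@odata.id": "/redfish/v1/Chassis/Rack1/Containers/C1"}
        ],
    },
    "/redfish/v1/Chassis/Rack1/Containers/C1": {
        "Name": "C1",
        "Status": {"Health": "OK"},
    },
}


def _links(node):
    """Yield every @odata.id value found anywhere inside a resource body."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@odata.id":
                yield value
            else:
                yield from _links(value)
    elif isinstance(node, list):
        for item in node:
            yield from _links(item)


def follow_links(resources, start):
    """Return the names of all resources reachable from `start` via links."""
    seen, stack, names = set(), [start], []
    while stack:
        path = stack.pop()
        if path in seen or path not in resources:
            continue
        seen.add(path)
        names.append(resources[path]["Name"])
        stack.extend(_links(resources[path]))
    return names


reachable = follow_links(MOCKUP, "/redfish/v1/Storage/1/StoragePools/DNAPool")
print(reachable)
```

A management client built this way needs no DNA-specific traversal logic: it follows links from the logical object to whatever physical resource sits underneath.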

SPEAKER_01

And with that, you know, Vincent, what do we have to prepare folks for as

Wet Lab Reality And Deployment Timelines

SPEAKER_01

far as introducing wet lab into traditional rack environments? What are the footprints? How much of it's going to be rack scale? What are the current deployments that you're seeing, and where do you see this heading?

SPEAKER_00

Let me start by saying that, at the moment, no vendor has announced a product that can be deployed as a rack-mount unit, for example, or anything like that. There's a lot at play here. The technology we're talking about comes from a biotech background, so there are new elements that need to be taken into account, like the biology that comes with the technology and the chemicals that potentially need to be used in order to produce the DNA. They might be treated as consumables, for example. This is a very different way of thinking about how to make it possible. So I think it's going to take quite a few years until we see DNA data storage in the format that we're used to in a data center, and we shouldn't be surprised about that. Remember that in the past we used to have heavy iron in the data center. It used to take up a lot of the data center's square footage, and we were happy about it because it delivered something of great value that nothing else could. The same thing is going to happen with DNA. We're potentially going to start with bizarrely shaped equipment. When I say bizarre, I mean not seen before in the data center; that's the way you have to look at this. As we move along, we're going to see the industry work on what makes sense to gradually integrate it into a more traditional approach for deploying data storage equipment in a data center. So it's going to take some time to get there. I'm confident that it's going to happen. Trying to give you a safe timeline for when it's going to happen is a bit of crystal-ball reading at this stage.
But let me say that everyone in this industry is motivated to make it happen one way or another, and we see the benefit. You were talking early on about use cases. Why are we doing this? It's because we see the need to better preserve data longer term. That's one of the main motivations, and we know that there are challenges with existing media, either economically or physically. Some of the technology today is not meant to last a hundred or 200 years. DNA can. We already see some segmentation in the data archiving and data preservation space. It's being segmented also because there are new ways of leveraging those data pools, especially now that we have AI taking data from all sorts of different locations to produce new results that we couldn't produce before. So suddenly archives are becoming more active, and the traditional archive media are becoming more active as well. We see a lot of archives based on magnetic tape, for example, being pulled into all sorts of AI-related data traffic, where they were not considered a potential repository for such data pools before. But there's still a need for long-term preservation, for making sure data is safe for the very long term, and therefore there will be

Long-Term Preservation And AI-Active Archives

SPEAKER_00

room for other media, and DNA is certainly one of them. One of the reasons the alliance came to SNIA is to be connected with other discussions related to long-term data preservation. What does it take to provide the framework to manage all the media related to long-term data preservation in a way that can also be standardized? All the focus and dedication on standards for manageability with Swordfish is a good example of what will need to be done for data preservation as well, because we will also need to leverage other standards in order to provide continuity in the management and make sure that in a hundred years, if you take a DNA data storage device, you will be able to read it and access the data on it. This is not a simple task. So beyond manageability, there are also facets of our technology that need to be standardized. The alliance has already released a few papers about this. This is a starting point, and there will be more to come. For us, this is definitely a way of embracing a world of standards that need to be applied to our technology in order to make progress. There are quite a lot of milestones ahead of us in order to make it happen, but this is why it's so exciting. There's a lot to do to get there, but I'm confident that with our community and with the help of the SNIA ecosystem, we have great resources at hand to make progress and deliver something of value that can help us become a more standardized resource in a data center environment.

SPEAKER_01

And I think that brings up one of the most important things I find about this community: for any single thing that we're tackling, whether it's manageability, building the right APIs, building underlying artifacts, physical environments, or security, there are many, many working groups that can all participate in this shared outcome. And especially given that it's so fresh and new, this is untouched powder for new areas of exploration. So it feels to me like this is exactly the kind of community we need in order to drive this forward. Because if we believe it's just a performance and optimization problem, and you don't bring in the security teams or the power, operational management, and environment teams, this is where we have risk: when people go alone, it may feel like they're going faster, but they're going to run into a wall faster. And this shared delivery, with the shared outcomes and the shared goals, feels like the only, or say the most effective, way to bring together people who have this broad view into all elements of what it really means to eventually see this live on the ground, a wet lab sitting in storage racks in real environments. So, Richelle, this must be a beautiful thing for you, having built this community. And of course, Vincent, you've been a longtime participant and supporter of SNIA. So thank you both for what you've always done to drive this. Richelle, how is this cross-TWG cross-pollination really playing out well because of the way that SNIA has been organized?

SPEAKER_02

So, you know, I think it's

Sustainability Security And Partner Standards

SPEAKER_02

more than just how SNIA is organized, it's also how well we work with our alliance partners. A big chunk of this is radically simplified by the fact that SNIA, with the storage experts, works on the storage-specific models, and Redfish can provide us a ton of base instrumentation. We don't have to focus on that. And it also gives you seamless management with everything. The other piece we haven't really talked about, but Vincent hinted at a little bit with the AI, is how do we bring all this together with sustainability? So we're looking at what instrumentation needs to be there for Energy Star, and what instrumentation needs to be there for sustainability across all the different media. And all of that is something Swordfish will be working on as well. We do expect to see some new sustainability efforts ramping up in the next year or two. And it's not gonna just be, I used my SSDs here, now I want to move them over here. It's gonna have to be a lot broader, and maybe that can also take us into a cycle of how do we abstract away more from, you know, we have an archive here and an archive here and an archive here. Do we want to try and consolidate those down? If they're really being actively used, what's the best place for those to actually live? And are we gonna try and do any consolidation of the underlying data lake as we're moving forward? So what we've done is, we have bi-weekly meetings with Redfish. We've brought in requirements from the Green Storage TWG. We've sent requirements to, and incorporated specs back from, the Security TWG. So we have a pretty good history, inside SNIA as well as outside SNIA, of how we take other people's requirements and synthesize them back down into our specifications. We've worked with the OpenFabrics Alliance, we've worked with NVMe. And so, both inside and outside SNIA, most of the functionality we develop is in partnership with one of those groups.

SPEAKER_01

And this is community done right, especially at the pace we're moving. I'm excited by what we can do now. Literally in the last three months, much of our industry on the software side of the world feels like it's been upended. In reality, the broader software space is still exactly where it was three months ago. But if you watch the news, it would feel like it'll all be gone, replaced by a lobster-based software tool at some point. But it's all possible because of what's been done in the development of standards and the hard yards. We just think right-click, deploy volume; it'll be right-click, deploy oligonucleotide, whatever the artifact is gonna be. We can explore this so much more easily now, and it's so exciting. So, Vincent, given that you've made the move from the storage side of the world to now living in the bio world and that merger between the two, what does it feel like to you to sort of change camps a bit, but also to bring your history and understanding of the standards focus to what can be done in that world?

SPEAKER_00

You're right. I started to work with traditional storage matters back in the mid-90s. So switching to the biotech side of things, I certainly see some fresh air coming my way, because in a way the biotech guys involved in DNA data storage are asking questions that we should keep asking ourselves when thinking about storage at large. Manageability is part of this, and how fast can I go? How big can I go? How secure am I? In the DNA data storage community, we have guys focusing on data retention, biosecurity, random access. There's one thing I didn't mention, and that's part of the excitement. This is why I'm talking about a bit of fresh air. When we talk DNA data storage, we talk about storage,

Biotech Meets Storage Standards Mindset

SPEAKER_00

but there is also a compute chapter to what we do, because DNA-focused technology can also be leveraged to handle certain computational work, for example to accelerate data search. This kind of technology is not ready for prime time, but it's there, it's part of the community, and there are discussions about it. So in a way you revisit your classics from a completely different angle, through a different prism, and at the same time you bring tons of innovation, with a great scientific community backing up all these developments, because you have to remember DNA is also leveraged outside storage applications. So there are tons of ideas related to DNA coming from outside the data storage community, and it's a great source of inspiration as well. You have to remember that one of the reasons we're thinking about DNA as a medium for data storage is that we can still read very, very old DNA that we find underground when we dig out prehistoric artifacts and so on. So it is something that is very exciting. As I say, it's a way of revisiting your classics about data storage and applying them to a brand new environment. And at the same time, there's so much technology innovation coming out of this that I think it's a great opportunity to transform the way IT has been conceived so far. Behind the DNA data storage community, there's also what you could call a molecular IT community that is working to see how the molecular world can also help transform IT at large. DNA data storage is a first step towards this, and obviously we have to be successful at this first. But there is hope that it could be a lot more than just storage behind the use of DNA.

SPEAKER_01

That's it. It's a beautiful bi-directional sharing, because we think that we're teaching those crazy biotech kids something, and it's like, no, no, no, we're gonna learn a lot: oh, these may actually map over to things that we can do. So yeah, it's an exciting time, as we're currently sending a few folks, one being Canadian (proud Canadian here), on a little trip around the moon for the first time in a few years. Allegedly, that's for my friends who are still conspiracy theorists and don't think we got there the first time, but we're definitely there. And this is a time of excitement. This is a time where we can say our ideas of the past are becoming products of the very, very near future. And we can be a part of what's being built. And the reason why it's so important to do it this way, through a community that has this cross-pollination and shared outcomes and shared goals, is that we do the right things about clean deprecation and make sure that standards aren't just for the first launch, but for continuous operations. So it's exciting to watch these worlds come together. And in a world where unfortunately people are mostly celebrated by GitHub stars, you find folks don't always get the star that you get to put out in the world. But you are my two favorite GitHub stars today. Thank you for spending time and discussing this. And so, Richelle, for folks that want to find out more, what's the right place to point them towards on the SNIA community site? And we, of course, will add some links in the podcast notes as well so it's easy for people to click around.

SPEAKER_02

Yes, if you go to the SNIA.org website, at the top, among the dropdowns there (about, data focus areas), go to the technical work and look for the open for public review page. That will have the Modeling DNA Data Storage through Swordfish draft model that I talked about. You can always find links on SNIA.org/swordfish as well.

SPEAKER_01

Fantastic. And Vincent, for folks that want to connect with you and learn more or participate in the effort,

Public Review Links And Upcoming Events

SPEAKER_01

where's the best place to find you going forward?

SPEAKER_00

So, same starting point. Go to the SNIA website top page, look at the group menu, and go down to DNA data storage. That's our landing page, the DNA Data Storage Alliance. You will find a series of papers that we've published already, further description of some of our activities, and the various work groups that are available, and there is a contact link as well if you want to get in touch with us, if you're not sure what matters to you in that new technology world. So don't hesitate, get in touch with us. And if you're already a SNIA member, you can also see on the members page all the DNA-related groups available. So don't hesitate, contact us through the internal system and we'll be happy to answer your questions.

SPEAKER_01

All right. Well, you gotta be in it to win it. And when it comes to the zettabyte scale of stuff that we're gonna be doing in our world, there is no doubt that this is the path to get there. And thank you both. And of course, for folks, do check out more of these amazing Experts on Data podcasts, not just because I'm here, but because I'm surrounded by amazing people. I love this chance to share stories. And if you've got ideas, of course, drop a comment in any of the places where you find us. You can find us on your podcast delivery of choice, on iTunes and Spotify, as well as, of course, on YouTube, so you can see these beautiful smiling faces every time. So thank you both, Richelle and Vincent, and you all have a good time. And we'll see you. Oh, quick event notification if you're watching this now. We've got storage AI happening very soon. That's gonna be happening in Colorado. Make sure you go check out the events page. And the big one, of course, Richelle: SDC is at the tail end of September. What's the quick preview on why they should already be getting their travel schedule organized for that one?

SPEAKER_02

Oh, well, that's simple. It's the Swordfish 10-year anniversary celebration.

SPEAKER_01

Yeah, this is gonna be a really, really cool event. And I have a feeling that this is gonna be the fastest growing new community. The oldest and newest community, because we're gonna see people really seeing the work that's being done and the amazing folks that they can collaborate with. So there you go. So check out SNIA.org and we'll see you all on the next show.