Runtime Arguments

5: Filesystems - So many choices

Jim McQuillan & Wolf Episode 5

When setting up a new system there are many things to think about, including choosing the filesystem. In this episode, Jim dives into all of the choices and describes the benefits of each. Wolf is here to ask the questions, and to fail at keeping Jim entirely on track.

Show notes

Things to think about:

  1. For Linux, ext4 is the default for many distros and it's pretty good, but btrfs has some really great benefits and you should consider using it.
  2. For macOS, the default is case-insensitive filenames. This can cause problems when cloning some git repos, because a repo may contain two files whose names differ only by case. On a case-insensitive filesystem, those collide. BUT, turning off case-insensitivity can cause other problems for some applications. Maybe the best thing to do is create a separate case-sensitive volume and use that for your git trees.
  3. Beware of the 'Year 2038' problem on Linux filesystems before ext4.


Links:


Hosts:
Jim McQuillan can be reached at jam@RuntimeArguments.fm
Wolf can be reached at wolf@RuntimeArguments.fm

Follow us on Mastodon: @RuntimeArguments@hachyderm.io

If you have feedback for us, please send it to feedback@RuntimeArguments.fm

Check out our webpage at http://RuntimeArguments.fm


Theme music:

Dawn by nuer self, from the album Digital Sky

Wolf:

Howdy everybody, and welcome to another episode of Runtime Arguments, a podcast about tech and tech-related stuff. I'm Wolf. And I'm Jim. Today, Jim is going to be doing the talking. He did the research. I'm gonna be asking the questions. It's a topic I am not completely ignorant about, so some of the questions I ask are gonna be things I'm asking on behalf of the listener. We did get some feedback on our previous podcasts, and the feedback was this: our podcast has what the listener described as a hole, and that hole is that we tell you about things, but we don't dive down into how to do them. We don't show you actual source code, and we don't do anything visual. I mean, it's just a podcast. It's not a YouTube video or anything, at least not yet. I'm not exactly sure how we address this hole. If you feel that we need to dive deeper, let us know. Give us feedback. I will tell you at the end of the show how to do that. And there are show notes. The show notes come along with the show in the RSS feed where you got it, so you'll be able to read those. These podcasts mostly come out of discussions Jim and I have over lunch. A lot of them are gonna be broad and interesting to more people, and some of them are gonna be more narrowly focused and maybe not interesting to as many people. Today it happens that we have a pretty narrow focus. If this is for you, I think you're really gonna like it. If it's not for you, don't judge our podcast based on a topic you didn't care for. I promise we will talk about things you want to hear. Anyway, with that, let me give it over to Jim.

Jim:

Yeah, thanks, Wolf. File systems: how you store your data on disk. Now, I started in the Unix world back in '85, and at that time I was just really, really interested in how file systems work. We were using SCO Xenix at the time. I worked for a small company, we did software for doctors, and we were getting into Xenix, and I got really interested in how my data is stored on the disk. I studied the documentation, and you know, the thing about Unix is there's a lot of source code out there. It wasn't an open source operating system, but we did have header files, and we did have documentation about how the files were stored on disk and how the file systems worked. So I did some digging. I wrote little C programs to dump out the headers at the beginning of the file system, what's called the superblock. I could dump inodes, I could follow the chain of blocks on disk, and it was all really interesting. And then I attended a user group meeting. This was the Michigan Unix Users Group, MUG. We've talked about them before. At the very first meeting I went to, there was a presentation by someone who became a friend of mine, Sharon Kalani. He was talking about file systems and repairing file systems using fsck. I watched that presentation in complete amazement, and I realized these are my people. They're talking about the things I like to talk about, or like to learn about. As time went on, I didn't follow file systems quite as much. I got more into databases and higher-level things. But underneath it all, file systems were still there. Some of you, I'm sure, know exactly what a file system is, and some people are maybe gonna learn something. I hope they learn something today.
Anytime you have a disk drive, whether it's a floppy, a hard disk, an SSD, a CD, or a thumb drive, you can store data on it, but that data needs to be organized and stored in a way that lets you retrieve it. And that's what a file system does. It's basically an index to the data; it organizes all the blocks of your data. In most of what I'm gonna talk about, I'll be referring to spinning hard drives, but really, whether it's an SSD or a thumb drive or even a floppy, it doesn't much matter. File systems live on block devices. In Unix, there are basically two kinds of devices: block devices and character devices. The difference is that with block devices, you write blocks of data at a time. It might be 512 bytes, it might be 4,096 bytes. Depending on the file system, it could be 64,000 bytes. But you're writing in blocks. A character device, on the other hand, is more like a terminal or a printer or a serial device, where you're writing characters at a time. If you're interested in which of your devices are block and which are character, you can just do an ls -l on the /dev directory. The first character in the far left column of the output is either a b or a c. b, of course, is for block, c is for character, and you might see other things in there. You might see a d, which stands for a directory, or an l, which stands for a link. That would be a symbolic link. Anyway, hard disks are block devices. You take a hard disk, it might be, I don't know, 200 gigabytes, it might be eight terabytes or whatever. You can take that block device and put a partition table on it, and that's a way to break the device into multiple slices. A typical partition table would have four partitions. One of those partitions might be an extended partition, which lets you break that extended partition up into even more.
But you start by creating a partition using fdisk or GParted, or there are other utilities.
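As an illustration of the block-versus-character distinction Jim describes, here is a minimal Python sketch that reads the same type information a long directory listing shows in its first column. The function names `device_kind` and `list_devices` are made up for this example, and the sketch assumes a Unix-like system with a /dev directory; it is not something shown on the episode.

```python
import os
import stat


def device_kind(mode: int) -> str:
    """Return the type letter a long listing would show for this st_mode."""
    if stat.S_ISBLK(mode):
        return "b"  # block device, e.g. a disk or partition
    if stat.S_ISCHR(mode):
        return "c"  # character device, e.g. a terminal
    if stat.S_ISDIR(mode):
        return "d"  # directory
    if stat.S_ISLNK(mode):
        return "l"  # symbolic link
    return "-"      # simplification: everything else is lumped together


def list_devices(dev_dir: str = "/dev") -> list[tuple[str, str]]:
    """Return (type letter, name) pairs for every entry in dev_dir."""
    results = []
    for name in sorted(os.listdir(dev_dir)):
        path = os.path.join(dev_dir, name)
        try:
            # lstat so that symlinks are reported as links, not their targets
            mode = os.lstat(path).st_mode
        except OSError:
            continue  # entry vanished or is unreadable; skip it
        results.append((device_kind(mode), name))
    return results


if __name__ == "__main__":
    if os.path.isdir("/dev"):
        for kind, name in list_devices():
            if kind in ("b", "c"):
                print(kind, name)
```

Run as a script, it prints only the block and character entries, which is roughly what scanning the first column of the listing by eye gives you.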

Wolf:

Just to clarify, right now the things you're saying are specific to Linux and/or Unix, is that right?

Jim:

Yeah, yeah. Let me make that clear. I'm kind of diving deep into the Linux file systems, but we will talk a little bit about the Windows and macOS file systems towards the end. On those operating systems, you don't get a lot of choice about your file system. On Linux, there are lots of choices, and we're gonna cover those. We were just talking about partition tables and partitions. So you partition your disk, and you can lay a file system on one of those partitions, or really on all of them if you want. The neat thing about block devices and Linux and file systems is that they're layers. You can take your hard disk and partition it, and then on one of those partitions you could just put a file system right there. Or you can put another layer in between. One of the layers might be encryption. You might use LUKS, that's L-U-K-S, the Linux Unified Key Setup, or you might use dm-crypt, and that's how you can put encryption on the hard drive. When you do that, it creates basically another block device. Within that block device, you can lay a file system down, or you can do something like volume management using LVM, the Logical Volume Manager. You can break that encrypted block device up into multiple smaller block devices, and you can lay a file system on each of those. It can get pretty deep. And I didn't even talk about RAID yet. Before you even get into partition tables, you can have RAID, a redundant array of inexpensive disks. That's multiple disks gathered together to look like a single disk or a single block device. You can do that at the hardware level if you have a RAID controller in your machine, or you can do it at the software level using Linux's MD, the multiple device support. So you might be looking at a block device that's made of many hard disks.
Maybe it's just a single disk. I like to use LVM. That's the default when you're setting up something like an Ubuntu system: it's gonna lay down LVM, and then your file system will be on top of that. Do you know what LVM stands for? Yeah, Linux volume manager. Or, I'm sorry, logical volume manager. It's just a way to manage the block devices on your hard drive. One of the neat features of LVM is what's called thin provisioning. By default, it's thick provisioning, and that means if you create a volume that's, say, 200 gigabytes, it's gonna allocate that entire 200 gigs for your use as a block device. If you go with thin provisioning, it doesn't actually allocate that space yet. And I'll tell you where that's really handy. I know we don't have ISPs so much anymore, so let's take virtual machines. You're gonna build a server and you're gonna run XCP-ng on it, or some other virtualization software. When you build a VM, you have to say how big its disk is, and you might say it's 32 gigs. You could overcommit, or oversubscribe, that disk space. You could create hundreds of virtual machines, all of them with 32 gigs of space, even though you don't have 32 gigs times 100 in your disk space. Thin provisioning lets you overcommit your space, and it only allocates disk as you need it.

Wolf:

This sounds exactly like a thing on the Macintosh, where a lot of times you make disk images, and when you open a disk image, it becomes a volume that you can treat just like any other volume, like an attached hard disk. They come in several different formats, and one of them is the sparse disk image. You say up front, my sparse disk image is, you know, 50 gigs or whatever, but it only takes up as much space as what you've actually put on it. So it sounds a lot like what you're saying.
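The same "promise a size up front, allocate later" idea can be seen at the level of a single file. The following Python sketch creates a sparse file and compares its logical size with the space reported as allocated. It is only an analogy for thin provisioning and sparse disk images, not how LVM or macOS implement them. The helper names are invented here, and the sketch assumes a Unix-like system where `st_blocks` is counted in 512-byte units; whether the file really ends up sparse depends on the file system.

```python
import os
import tempfile


def make_sparse_file(path: str, logical_size: int) -> None:
    """Create a file of logical_size bytes while writing only its last byte."""
    with open(path, "wb") as f:
        f.seek(logical_size - 1)  # jump past a region we never write
        f.write(b"\0")            # a single byte fixes the logical size


def logical_and_allocated(path: str) -> tuple[int, int]:
    """Return (logical size, bytes the file system reports as allocated)."""
    st = os.stat(path)
    # Assumption: st_blocks is in 512-byte units, as on Linux and macOS.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "sparse.img")
        make_sparse_file(path, 100 * 1024 * 1024)  # promise 100 MiB
        logical, allocated = logical_and_allocated(path)
        print("logical size:", logical)
        print("allocated   :", allocated)
```

On a file system that supports sparse files, the allocated figure is typically far smaller than the logical size, which is the single-file version of overcommitting.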

Jim:

Right. So if you had a bunch of virtual machines all allocated 32 gigs of space, but they're not all really using that much space, you're not gonna run out of disk doing this. The virtual machines that need the space will grab it. You can ultimately run out if all of the machines need it. But you don't have to worry about giving your virtual machines a lot of space and them not using it. So you're not wasting the space, if that makes any sense. So let's get into file systems themselves. With Linux, there are lots of choices, and I'll discuss a bunch of them here today. What a file system does for you, as I mentioned, is organize your data. You have data you want to store; it's a file. The file system gives you the ability to give that file a name and to store that file in a directory, or folder, if you want to call it that. There's some metadata about the file: things like the timestamp when the file was created, the time the file was last accessed, and modes, which are the file permissions, whether it's readable, writable, or executable by the owner, the group, and others. It also supports something called hard links and soft links. The difference between those is that a hard link is two file names accessing the exact same file. It's the same blocks on disk, it's the same inode. Let me ask you this.

Wolf:

Yeah. It sounds like there's a lot of information on the disk that isn't the file. I mean, users are thinking about their disks, and all they think about is the file or the files, whatever they are. But I guess they don't really get the whole disk.

Jim:

Well, there's certainly overhead, right? There's, like I said, the metadata. The inode space is not large, but it does require some space. The directory entries take space. And we'll get into more of this in a little bit, but the checksums, redundancy, things like that, all take space. So you don't get all the space for your files. But I was just mentioning hard links and symbolic links. A hard link is two path names pointing to the same file. With a symbolic link, you have a file, and your symbolic link points to the original file name, so it's like an indirect thing. You can have as many links as you want, but the important thing is that hard links have to be on the same file system. You can't have hard links across file systems. You can have symlinks across file systems, if that makes sense.

Wolf:

Why would I ever use a hard link? Symlinks seem great.

Jim:

You know, yeah, symlinks. Like I said, I started playing with this stuff back in the mid-80s, and at that time, symlinks didn't exist. All we had was hard links. I can't really tell you why you would pick one over the other. The thing about hard links, though, is that you can create a file and then link it using the ln command, and now you've got two names for the file. You can delete the original name, and the new name is still there. And you can have as many links as you want. There's nothing special about any of them.
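The behaviour Jim describes, where deleting the original name leaves a hard link intact, can be tried out with a few standard-library calls. This is a small sketch assuming a Unix-like system; the file names and the function `link_demo` are invented for the example.

```python
import os
import tempfile


def link_demo(workdir: str) -> dict:
    """Create a file, hard-link and symlink it, then delete the original name."""
    original = os.path.join(workdir, "original.txt")
    hard = os.path.join(workdir, "hard-link.txt")
    soft = os.path.join(workdir, "soft-link.txt")

    with open(original, "w") as f:
        f.write("same data\n")

    os.link(original, hard)     # second directory entry for the same inode
    os.symlink(original, soft)  # separate small file that stores a path

    same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
    link_count = os.stat(hard).st_nlink  # the count a long listing shows

    os.remove(original)  # removes one name; the inode still has another

    with open(hard) as f:
        surviving = f.read()

    return {
        "same_inode": same_inode,
        "link_count_before_delete": link_count,
        "hard_link_contents": surviving,
        # os.path.exists follows the symlink, which now points at nothing
        "symlink_still_resolves": os.path.exists(soft),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        for key, value in link_demo(workdir).items():
            print(f"{key}: {value!r}")
```

The hard link keeps working after the original name is gone, while the symbolic link is left dangling, which is one practical difference between the two.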

Wolf:

They're all pointers to the same inode. Yeah. Let me ask you this. I follow when you have a symlink. I know I can do this thing: if I'm in a program working with that name, that path, I can resolve it and turn it into the real path, the path that actually leads to the file. But from what you're describing, it sounds like you might have five hard links all pointing to the same contents, and you know about one of them; you're working with one of these hard links. From there, can you get to the others? Can you find all the hard links? Is there a way to do that?

Jim:

I don't know if there's a tool that will go out and find all the directory entries that point to the same inode, because that's really what we're talking about: multiple directory entries pointing to the same inode. If you do a listing of the directory or of the file, one of the things that appears in the listing is a count, and that count is the number of links to that file. There's nothing special about any one of those links. They're just directory entries that point to an inode that points to your data on disk.

Wolf:

So, exactly like ref-counted pointers in a program. You can't find the others, but you know they are there.
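Since the inode holds only a count and no list of names, the one general way to find the other hard links is to walk a directory tree and compare inode numbers. The Python sketch below does exactly that; `find_hard_links` is a made-up name, and the search is only as complete as the tree you hand it. It illustrates the idea and is not a tool mentioned on the show.

```python
import os


def find_hard_links(known_path: str, search_root: str) -> list[str]:
    """Return every path under search_root that shares known_path's inode."""
    target = os.stat(known_path)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(search_root):
        for name in filenames:
            candidate = os.path.join(dirpath, name)
            try:
                # lstat: a symlink has its own inode and must not match
                st = os.lstat(candidate)
            except OSError:
                continue  # unreadable or vanished entry
            # Inode numbers are only unique within one file system,
            # so the device number has to match as well.
            if st.st_ino == target.st_ino and st.st_dev == target.st_dev:
                matches.append(candidate)
    return sorted(matches)
```

Calling `find_hard_links("/some/file", "/some")` would return the known path plus any other names for the same inode below `/some`; both paths here are placeholders.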

Jim:

Right. Now, there might be some disk utilities that will help you find them. I think you can look for path names that reference a certain inode, if you know the inode number. But there's nothing special about any of those hard links. They're all just as equal as the others. Whereas symlinks... I didn't mean to derail you. Yeah, with symlinks, there's a special one, the first one. Everything else points to that. The directory entry is nothing more than a path name and then the name of the file as it was originally called. That's all. So we've probably beaten symlinks to death here. Let's get into some of the file systems. On Linux, the original file system was ext, the ext file system. That lasted, I'm thinking, from like '91 till about '93, when ext2 came out. That was an improvement. It served us well. When I was getting into Linux, maybe I did use ext2. Yeah, I guess I did. But ext2 has a limitation. The original ext has this limitation as well. A lot of the older file systems have this limitation, and that is that part of the metadata I mentioned was the timestamp. When the file was created, when it was modified, when it was accessed: those timestamps are stored in a 32-bit signed field. That leads us to the year 2038 problem. Y'all remember Y2K, when the clock ticked over from 1999 to 2000? Well, we're gonna hit another one of these in 2038, and that's when the 32-bit integer field that stores a timestamp hits the end, rolls over, and goes negative. That's gonna be a lot of fun. It's probably harder to understand than the Y2K problem, because everybody could understand rolling from 1999 to 2000. But anyway, we're gonna run into this problem, and ext2 file systems have it. Take a file that appears to be modified on January 18th, 2038. The next time you access it, on the next day, it's going to be a whole different number.
I think it's gonna go back to 1904 or 1901, something like that. It's gonna appear like the file was modified in 1901. That's gonna cause problems. So we have a fix for that. This is really a problem with timestamps in general in Linux, and it's been largely overcome by 64-bit timestamps, but a lot of applications haven't been fixed to use the larger timestamps. ext3 suffers from the same problem. They added a lot of nice features to it, but it still has that year 2038 problem. One of the features they added to ext3 is journaling. Now, Wolf, do you know what journaling is?

Wolf:

As it happens, journaling is very important to me. Journaling for a file system, I happen to know, is almost exactly like journaling in a database. The things that you are going to do and change in the file system, you send those to the file system, and not only does it make those changes, which may be buffered, but it remembers the list of commands that you are going to apply. If something goes wrong, it still has that list of commands. It can do them again, redo them, undo them, various things like that. Journaling is a great feature of databases and just as great in file systems.

Jim:

Yeah, thank you. It's really important, and it was added to ext3. Another thing they added was indexing of directories. Normally a directory on disk looks like a file. Each entry is just an inode number and a name for the file. If you have a directory with lots and lots of files in it, on older file systems it's searched linearly. With lots of files in a directory, that can get very, very slow. So what ext3 added was indexing of the entries in the directory. It's no longer like a flat file; now it's an indexed thing that is much, much faster. So if you have a lot of files in your directory, it greatly speeds that up.

Wolf:

So it's like you've got a hash table instead of a list.

Jim:

Yeah. I think they're using B-trees, and it just finds the entry so much faster. If you only have a small number of files in the directory, it's not going to make any difference at all. It might even be slower. But if you have a lot of files in that directory, it substantially improves the performance. So ext3 gave us that. In ext3, the maximum size of a file is two terabytes. The maximum file system size is 16 terabytes. So that's a lot of space. I don't know. Do you have any files, Wolf, that are two terabytes in size?
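To make the difference between a linear directory scan and an indexed lookup concrete, here is a toy Python comparison. It uses a sorted list with binary search as a stand-in for a tree-shaped index; this is an assumption made for illustration and is not ext3's actual on-disk directory index. All names in the sketch are invented.

```python
from bisect import bisect_left


def linear_lookup(entries, name):
    """Scan (name, inode) pairs one by one, like an unindexed directory."""
    for entry_name, inode in entries:
        if entry_name == name:
            return inode
    return None


def build_index(entries):
    """Sort the entries once so later lookups can use binary search."""
    return sorted(entries)


def indexed_lookup(index, name):
    """Find name in O(log n) comparisons instead of O(n)."""
    # (name,) sorts just before any (name, inode) pair with the same name
    pos = bisect_left(index, (name,))
    if pos < len(index) and index[pos][0] == name:
        return index[pos][1]
    return None


if __name__ == "__main__":
    directory = [(f"file{i:06d}.dat", 1000 + i) for i in range(100_000)]
    index = build_index(directory)
    print(linear_lookup(directory, "file099999.dat"))  # walks the whole list
    print(indexed_lookup(index, "file099999.dat"))     # about 17 comparisons
```

For a handful of entries the two are indistinguishable, and building the index has its own cost, which matches Jim's remark that indexing only pays off for large directories.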

Wolf:

I don't, but I don't edit videos. I know there are big files, and in my work we deal with geography and roadmaps, and we have some pretty big files, but on my local system, no.

Jim:

Yeah, I don't need much more than that either, but you're gonna hear that some of these other file systems have much larger capacities. So let's move on to ext4. That came along in 2008, with a lot of improvements over ext3. They fixed the year 2038 problem by expanding the size of the timestamp. They didn't go to 64-bit timestamps; they just added two more bits to the seconds field of the timestamp, taking it out 408 years, to the year 2446. So that's kicking the can pretty far down the road. I'm not going to worry too much about storing my files in an ext4 file system and hitting that time limit. They improved on the journaling. In ext3 it would only journal the data that was being written. In ext4, they also journal the metadata, which is important. The metadata being the inode entry, the times and stuff. They journal all of that so that if you have a power failure, if your drive goes down, when you come back up you're gonna be in pretty good shape. So that greatly improves the reliability of the file system. Another thing they added was extents. On older file systems, when a file grows, it would just go allocate blocks. The next blocks available on the disk, it would just grab those for you. It didn't pay much attention to where those blocks were on the disk. If you created a bunch of files, then deleted some of them, and then created or updated more files, those blocks could be all over the place. With extents, it keeps blocks together. When you allocate a chunk of disk for your file, it might grab a large number of blocks, so that when it writes the data, it lands in contiguous blocks, which is really important. We'll talk in a minute about defragmentation, and that's kind of the problem I'm discussing here. So ext4 helps minimize the fragmentation of the file system.
You know, we talked about two-terabyte files; ext4 bumped that up to 16 terabytes. The maximum size of the entire file system is 64 zettabytes. A zettabyte is one billion terabytes. That's a lot of space. I'm not sure if anybody is storing that kind of data, even the big guys, the Googles and the Facebooks.
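The timestamp arithmetic behind the year 2038 problem, and behind ext4's two extra bits, can be checked in a few lines of Python. The sketch assumes timestamps count seconds from 1 January 1970 UTC, and it models ext4's extension simply as three additional multiples of 2^32 seconds on top of the signed 32-bit maximum. That model is a simplification for illustration, not a description of the on-disk encoding.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def to_utc(seconds: int) -> datetime:
    """Convert seconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


def wrap_signed_32(value: int) -> int:
    """Interpret value the way a signed 32-bit field would store it."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


# The last second a signed 32-bit timestamp can represent.
last_32bit_moment = to_utc(INT32_MAX)                      # 2038-01-19 03:14:07 UTC

# One second later the field overflows and goes negative.
after_rollover = to_utc(wrap_signed_32(INT32_MAX + 1))     # 1901-12-13 20:45:52 UTC

# Assumed model of two extra bits: up to 3 more spans of 2**32 seconds.
extended_limit = to_utc(INT32_MAX + 3 * 2**32)             # lands in the year 2446

if __name__ == "__main__":
    print("32-bit limit   :", last_32bit_moment)
    print("after rollover :", after_rollover)
    print("extended limit :", extended_limit)
    print("years gained   :", extended_limit.year - last_32bit_moment.year)
```

The last line prints 408, the figure quoted in the episode, and the rollover date confirms that 1901 is the year a wrapped timestamp lands in.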

Wolf:

Is there any place where the actual media is big enough to hit those limits? Well, you'd have to have multiple drives, right?

Jim:

You're not gonna find one hard drive with that kind of storage space. But with RAID, or even with distributed block devices where it might be spread across many, many servers, you could format that as a single file system. So I could see having many, many terabytes in a file system, but zettabytes?

Wolf:

I'd have to do some back-of-the-napkin calculations to figure that out.

Jim:

Well, what is a terabyte? 12 zeros, and a billion of those would be nine more zeros, so 21 zeros after the one? That's a lot of space. Nobody's gonna hit that limit, but this sort of points out that the limit is so big nobody has to worry about it. Right. ext4 is the default file system for Debian and Ubuntu. So if you load up a machine, that's likely what you're gonna get unless you ask for something else. Now there's a newer kid on the block. It's actually been around since 2008, but it hasn't been adopted by distros until more recently. And that's Btrfs, usually pronounced "butter FS". Some of the benefits this thing has are amazing. Defragmentation: we talked about how, when your files grow and shrink and new files get created, the blocks of your file system end up all over the disk. If you try to access your file, and you're thinking about a physical hard drive, a spinning disk, the heads on that device have to seek all over the place to find that data. That's called fragmentation. Btrfs has defragmentation built in. What that'll do is rearrange the blocks of data so that they're in a contiguous location. So you might be writing data all over the place, but over some time defragmentation should kick in and fix that. I know on Windows machines there was a utility to defrag. Have you ever played with those, Wolf, where you get to kind of watch this graphical thing and it shows you all the data?

Wolf:

I think I have, but it's been a long time.

Jim:

Yeah. I think it's much less of a problem now with SSDs, because those are randomly accessed. It kind of doesn't matter so much if the data is contiguous or not. But for spinning disks, fragmentation's a real problem. Something else that Btrfs added was compression. For a long time we've had various types of compression, where you could take a file and gzip it or zip it or, you know, whatever compression tool you want to use. You could compress it before you write it to the disk, and that's fine. But it's nice to have a file system that does that for you. As you're writing the data, it's transparent. You don't know it, but it's using less space on the disk than you think because it's compressing. Of course, it's an optional thing. You can turn it on if you want, or leave it off; you'll get better performance by leaving compression off. Another feature it's got is snapshots. Again, we're talking about Btrfs here. A snapshot uses something called copy-on-write, COW. You take a snapshot of a file system and it copies some of the header stuff, but it doesn't copy the blocks of data. So you've got two file systems, the original and the snapshot, both pointing to the same data. You might wonder what's the point of that. But the neat thing about snapshots is that a snapshot covers a point in time. As you start writing to the original file system, the snapshot doesn't change. The original blocks get copied, and your updates go to the new blocks. So the original file system, the one that's mounted, the one where you're writing your data, certainly has all the updates on it, but your snapshot is still pointing to the original blocks that haven't changed. So if you're about to do a big update on your system, you might take a snapshot.
And then you perform your update, and if something goes horribly wrong during that update, you can roll back to the snapshot. Really nice. I was using VMware for many, many years, and that had that feature. XCP-ng has that feature. So if I'm doing something that might potentially break things, I have a safety net. I can roll back easily. So that's snapshots. Btrfs has that. Another feature that Btrfs has is deduplication, and that's kind of a neat feature. If you're writing a lot of files that have a lot of the same data, and this is at the block level, maybe you have files with a lot of zeros in them, nulls, whatever, deduplication will basically write one copy of that data and both files point to that block. It can significantly reduce the amount of space that your file system needs. Anyway, Btrfs has that. I believe it's offline, though. I think if you want to deduplicate your drives, you have to unmount the file system, run the deduplication, and then mount it again. But it's a way of cleaning up space without losing any data. It also offers cloning of files. I talked about snapshots. You can also kind of take a snapshot of an individual file, and that is a copy-on-write snapshot. So you snapshot the file, you can keep on making changes to the first file, and your snapshot will stay there. And you can roll it back into the original, and there you go. Btrfs also has checksums on the metadata and the data. And it'll do an in-place conversion from ext3 and ext4 to Btrfs.
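The copy-on-write idea behind snapshots can be modelled in a few lines. The toy Python class below keeps a pool of data blocks and a map from logical block numbers to entries in that pool. A snapshot copies only the map, and every write goes to a fresh block, so older maps keep seeing the old data. The class and method names are invented, nothing here reflects Btrfs's real data structures, and the toy never frees unused blocks.

```python
class ToyCowVolume:
    """A toy copy-on-write volume: snapshots share blocks with the live view."""

    def __init__(self):
        self.blocks = {}    # physical block id -> bytes
        self.live = {}      # logical block number -> physical block id
        self._next_id = 0

    def write(self, logical_block: int, data: bytes) -> None:
        """Store data in a new physical block; never overwrite in place."""
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.live[logical_block] = block_id  # only the live map is repointed

    def snapshot(self) -> dict:
        """Copy the block map (the 'header stuff'), not the data blocks."""
        return dict(self.live)

    def read(self, block_map: dict, logical_block: int) -> bytes:
        """Read a logical block through either the live map or a snapshot."""
        return self.blocks[block_map[logical_block]]

    def rollback(self, snap: dict) -> None:
        """Make the live view point at the snapshot's blocks again."""
        self.live = dict(snap)


if __name__ == "__main__":
    vol = ToyCowVolume()
    vol.write(0, b"config v1")
    before_upgrade = vol.snapshot()
    vol.write(0, b"config v2 (broken)")
    print(vol.read(vol.live, 0))        # the updated data
    print(vol.read(before_upgrade, 0))  # the snapshot is unchanged
    vol.rollback(before_upgrade)
    print(vol.read(vol.live, 0))        # back to the pre-upgrade data
```

Taking the snapshot is cheap because only the map is copied, which is why snapshotting before a risky update costs almost nothing up front.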

Wolf:

You didn't talk yet about what a checksum is. Oh, and what it gives you.

Jim:

Yeah. I mentioned them earlier, but I didn't get into it. When you write blocks of data to the disk, you also write a checksum, and that's like a hash. It's a fairly small number. You think of SHA-256 hashes and stuff; it's nothing like that. It's a much smaller checksum. It might only be like four digits. You write that to the disk, and then later on, when reads are done, it'll read the data, read the checksum, recalculate the checksum, and compare it with what was on disk. If they're different, you've got a corruption problem. ext4 has checksums, Btrfs has checksums, and the other ones have them as well, the newer file systems.
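The write-then-verify cycle Jim describes looks roughly like this in Python. The sketch uses the CRC-32 function from the standard `zlib` module as a stand-in checksum; that choice is an assumption for illustration and is not meant to match the exact algorithm any particular file system uses. The function names are made up.

```python
import zlib


def write_block(data: bytes) -> tuple[bytes, int]:
    """Return what would be stored: the data plus its checksum."""
    return data, zlib.crc32(data)


def read_block(data: bytes, stored_checksum: int) -> bytes:
    """Recompute the checksum on read and refuse data that doesn't match."""
    if zlib.crc32(data) != stored_checksum:
        raise OSError("checksum mismatch: block is corrupted")
    return data


if __name__ == "__main__":
    stored_data, stored_sum = write_block(b"important payroll records")
    print(read_block(stored_data, stored_sum))  # a clean read succeeds

    # Simulate a flipped bit on the disk.
    damaged = bytearray(stored_data)
    damaged[0] ^= 0x01
    try:
        read_block(bytes(damaged), stored_sum)
    except OSError as err:
        print("detected:", err)
```

As Wolf points out next, this only detects the damage. Repairing it needs a second good copy of the data from somewhere.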

Wolf:

So a checksum is not as good as error correcting memory. It's just error detection. Is that what you're saying?

Jim:

Yeah, it lets you know if you've got a problem, which is very, very helpful, right? When we get into one of the other file systems, it'll actually correct itself, which is kind of neat. It's all about storing your data in a safe way, right? With Btrfs, well, we talked about the logical volume manager for creating volumes. Btrfs has that built in. So Btrfs will allow you to create a block device that spans multiple hard drives. It'll do RAID for you. It kind of does all that stuff without those other layers I talked about, with LVM and MD and those things. Btrfs will handle that for you. And finally, the maximum file size on a Btrfs file system: 16 exabytes. I think exa comes after tera. That's a lot of zeros. Anyway, 16 exabytes. That's a big, big file.

Wolf:

So I have a question. Some of these don't seem useful to a normal human, but a couple of them, like defragmentation, compression, snapshots, cloning, those things sound like something I want. They sound like something everybody wants. Why isn't Btrfs the default? Why ext4?

Jim:

That's a great question. Ubuntu and Debian are still sticking with ext4. But SUSE has been using Btrfs since 2015, Fedora since 2020. Red Hat Enterprise Linux 6 and 7 had it; well, in RHEL 8, they actually removed Btrfs. I don't know why they removed it. I think they felt that maybe for an enterprise-level operating system it wasn't quite ready. It is included in Ubuntu, but it's just not the default.

Wolf:

So when you're installing Ubuntu, which is a thing a normal, ordinary user who wants to use Linux does (Ubuntu is almost the default choice), they can choose this file system here. They could have the good one.

Jim:

Yes. Yeah, the default is ext4, but as you go through the Ubuntu installer, if you say you want to customize your disk parameters, you have a choice of picking Btrfs. You can do it for both the data file system and the boot file system. That, I think, is a fairly recent fix, because booting from a Btrfs file system required support for Btrfs in the bootloader, GRUB in this case. You had to have that support so you could actually boot off of it. Anyway, for Debian and Ubuntu, you can choose that during your install. Overall, I think Btrfs is really looking like a great file system. Really interesting.

Wolf:

I understand things about file systems, but I didn't know the properties of these specific file systems, and I do have Ubuntu installed in places, and I do want Btrfs, and from now on that's what I'm gonna pick.

Jim:

Yeah, yeah. I installed one in my research for this. I built a whole bunch of different VMs and I built an Ubuntu system choosing Btrfs. It just worked as you'd expect. Interesting thing between ext4 and Btrfs: Ted Ts'o is the kernel developer that was in charge of ext4. And even he, back in 2008, said ext4 is better than ext3, but it's not the greatest. He suggests Btrfs. So the guy that created ext4 is saying, if you really want a file system, look at Btrfs. And that was back in 2008. I think at that point he was talking about the features of Btrfs. But now, 2008, what's that, 17 years ago? Those features are there, and it's a robust file system.

Wolf:

Yeah, it seems like it's uh definitely been around the block by now. I I feel like you can trust it.

Jim:

Yeah, yeah, I think so. I think it's just a matter of time before Debian and Ubuntu adopt Btrfs as the default. So let's move on to another file system. This one was created by Silicon Graphics. Remember that company? Boy, they were at the top a long time ago, weren't they? I wanted one of those machines so bad. They had an operating system called IRIX. It was a Unix operating system, but they called it IRIX. They created a file system called XFS. That has all the benefits, or many of the benefits, of Btrfs. It's got the journaling, and this has got online defragmentation. So in the background, there's a process running to keep your disk defragmented. That's a pretty neat thing. It allows you to configure block sizes from 512 bytes to 64K. I think Btrfs allows you to configure your block sizes as well. And the reason why the block size is important: if you're writing lots and lots of small files, you want a small block size, because any file you write has to use up multiples of blocks. It has to use a whole block at least. So if your file is 10 bytes in size, right, and you write it to the disk, it's gonna take the whole 512 bytes. That's just the way it is. If your file is 513 bytes in size, it's gonna take two of those blocks. So it's gonna take 1,024 bytes. But it's faster that way, right? If you're writing really, really large files, writing a ton of small 512-byte blocks is a resource problem. You're doing a lot of block allocation, so you might go with a larger block size. Maybe you go 4K or 64K, like XFS supports. So it can be faster for large files, not as good for small files. XFS, like Btrfs and ext4, allocates in extents, not in blocks.
So it's gonna allocate a bunch of blocks, and I think it'll reclaim those blocks if you don't use them, but it's faster at writing files because it's grabbing whole groups of blocks at once. There are no snapshots in XFS. It has been the default file system for Red Hat since Red Hat Enterprise Linux 8. That's 2019.
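The block arithmetic Jim walks through can be sketched in a few lines of Python. This is a simplified model, not how any particular file system accounts for space: it ignores metadata, extents, and tricks such as storing tiny files inline. The function name `allocated_bytes` is made up for illustration.

```python
def allocated_bytes(file_size: int, block_size: int) -> int:
    """Space a file occupies when storage is handed out in whole blocks."""
    if file_size <= 0:
        return 0
    blocks = -(-file_size // block_size)  # ceiling division: round up to a whole block
    return blocks * block_size


# The two cases from the discussion, plus the same files on larger blocks.
for size in (10, 513):
    for block in (512, 4096, 65536):
        used = allocated_bytes(size, block)
        print(f"{size}-byte file, {block}-byte blocks: {used} bytes on disk, {used - size} unused")
```

With 512-byte blocks the 10-byte file occupies 512 bytes and the 513-byte file occupies 1,024, matching the numbers above. The same small files on 64K blocks each occupy 65,536 bytes, which is the trade-off being described.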

Wolf:

Strongly disagree with Red Hat.

Jim:

Yeah. It's also the default on Fedora 42 Server. Fedora has a desktop version and a server version. The default file system is XFS on Fedora Server, and it's Btrfs on Fedora desktop. It's also the default for AlmaLinux, which is kind of the offshoot of CentOS. Remember when there was that whole CentOS licensing thing a year or two ago?

Wolf:

We used CentOS on everything. Still? Uh we used to uh when I was uh when I was at LAS. Uh that was that was our thing, all our servers, CentOS.

Jim:

Yeah. Red Hat made some moves that make it less desirable. Anyway, AlmaLinux is using XFS. And finally, let's move into ZFS. That's from Sun Microsystems. Remember they had an operating system called Solaris? Sun created a file system for it called ZFS. Sun got bought out by Oracle. Oracle kind of did some evil things. There was an OpenSolaris. I think Oracle kind of shut that down. So the status of ZFS was kind of up in the air. There's OpenZFS, so that's still out there. But Linux never added it to its arsenal of file systems, at least not built into the kernel. You can install packages, install the kernel module, and run ZFS, but you've really got to try to do this. And upgrades become a problem, because when you upgrade the kernel, now you've got to make sure you upgrade the module, and things get kind of ugly. And the last thing you want to do is render your system unbootable because you can't access your file system anymore. There are a lot of benefits to ZFS, though. It can handle multiple hard drives; it can be a layer over the top of multiple drives, like volume management. It'll do snapshots and checksums and compression and deduplication. Replication, that's something we hadn't talked about yet. That's something very common in databases, but not so much in file systems. Anyway, you can replicate your file system. Basically, what that is, it's going to write your data to two different systems. So in the event of a failure, you've got a hot spare. It's kind of neat. That's how I do my databases always. Something else that ZFS does is it stores multiple copies of the file. I think two or three is the default. And that way, if there's a checksum issue, remember we talked about checksums and files, if you do a read on a file and you compare the checksum and it doesn't match, the file system knows that that file could be corrupt. So it just reads one of the other copies and checks the checksum.
Worst case, all of your copies have the wrong checksum, and then you're stuck. But it's pretty safe that way. I really kind of like that idea. Of course, you're going to use up more disk space, but what do you want? You know, disk space is cheap. What do you want, safety or size?

Wolf:

That sort of gives me a question, which is: if you've got three copies of the file, and I don't know exactly how the copies were made, if they were all written at once, and you're reading, and when you read you get one checksum, and when you wrote you got a different checksum, maybe the thing I'm about to ask doesn't matter. But how do you know which of those checksums is right? Like, maybe when you wrote it, you just wrote the wrong checksum. And when you read it, you got the right checksum.

Jim:

Well, let's start with I guess you originally wrote the file and it's gonna calculate a checksum and and store the checksum on disk. It's gonna make three copies of that, so it's gonna copy that checksum three times as well. So at least I think it's gonna make three copies of the checksum. Uh what you're asking is what if the checksum was written wrong? Well, that's a bug in the file system, right?

Wolf:

I see. It's not really so much that the checksum is right or wrong, it's that they disagree. If they disagree, that means you read different bytes from what you wrote or meant to write.

Jim:

Right. And that could be a uh a flaw in the in the magnetic material on the hard drive, right? Um it it's not a good idea.

Wolf:

Which is why three copies is good.

Jim:

Yeah. Yeah. So you you read uh you compare the checksums, if they match, you've got a good file that isn't corrupt. If you have to read the second copy or the third copy to get it, then you do that. It's it's really pretty nice.
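The read path described here, verify the checksum and fall back to another copy if it fails, can be modeled with a toy in-memory example. This is only a sketch of the idea and not ZFS's on-disk format or API: SHA-256 stands in for whatever checksum the file system uses, three copies is simply the number used in the conversation, and `write_with_copies` and `read_verified` are hypothetical names.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def write_with_copies(data: bytes, copies: int = 3) -> dict:
    """Keep several copies of the data next to the checksum computed at write time."""
    return {"checksum": checksum(data), "copies": [bytearray(data) for _ in range(copies)]}


def read_verified(record: dict) -> bytes:
    """Return the first copy whose checksum still matches the stored one."""
    for copy in record["copies"]:
        if checksum(bytes(copy)) == record["checksum"]:
            return bytes(copy)
    raise IOError("every copy failed its checksum")


record = write_with_copies(b"important data")
record["copies"][0][0] ^= 0xFF  # simulate a bad spot on disk flipping bits in copy 1
print(read_verified(record))    # copy 1 fails its checksum, copy 2 is returned
```

As Wolf points out, the check never decides which checksum is "right"; it only detects that the bytes read back differ from the bytes written, and the extra copies give the reader somewhere else to look.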

Wolf:

So ZFS sounds good, but you still haven't talked me out of Btrfs. I feel like Btrfs is what I want.

Jim:

It does sound good, but you've really got to jump through hoops to get at it. And I think Btrfs has hit the point where it's just better. Right? I think maybe back in 2005 when Solaris was popular, maybe ZFS was great. But I think Btrfs has surpassed it. So at this point, why? Right? Interesting note though: we talked about how large your file could be. A ZFS file system can store 256 quadrillion zettabytes. What did we say the other file system was that would do 16 exabytes? A zettabyte is like a billion terabytes. These numbers are ridiculous. 256 quadrillion zettabytes.
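For a sense of scale, the figures quoted here line up with powers of two. The sketch below assumes that "16 exabytes" means 2^64 bytes (a 64-bit byte offset) and that "256 quadrillion zettabytes" means 2^128 bytes (a 128-bit design limit), both measured in binary units, with "quadrillion" standing for 2^50. Those mappings are assumptions made for the arithmetic, not something stated in the episode.

```python
EXBI = 2**60   # one exbibyte, the binary "exabyte"
ZEBI = 2**70   # one zebibyte, the binary "zettabyte"

limit_64_bit = 2**64     # assumed source of the 16-exabyte figure
limit_128_bit = 2**128   # assumed source of the 256-quadrillion-zettabyte figure

print(limit_64_bit // EXBI)             # 16
print(limit_128_bit // ZEBI // 2**50)   # 256
print(10**21 // 10**12)                 # 1000000000: a zettabyte is a billion terabytes
```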

Wolf:

I want to know how that compares to the number of stars in the universe or the number of grains of sand on all the beaches in the world.

Jim:

Yeah, I don't know. It's nuts. Uh that's a lot of that's a lot of zy space.

Wolf:

I don't even know how to think about those numbers.

Jim:

No. But I think you're right. Btrfs is kind of the way to go. And I don't need 256 quadrillion zettabytes. And it's not fully supported. So interesting academically, but I'm not gonna try that. So that kind of rounds it up for the Linux file systems. And I said at the beginning we'd talk about the Windows file systems, and really, you know, Windows, if you go all the way back to MS-DOS, you've got your FAT: FAT12, FAT16, FAT32, and exFAT. Those are great file systems for floppy disks. They're kind of the standard, right? FAT32 has sort of become the standard for cameras and devices that have storage. Floppy drives, any operating system will read a FAT32 floppy drive, right? So that's kind of like the old standard. With Windows NT, remember that? Like around 1993 or '94 or so? NTFS. That was the file system they created for that. And that was closer to a Unix-type file system. And it's the default now on all the Windows operating systems. Windows Server, Windows desktop, they're all using NTFS. Windows did try to introduce a new file system back in 2012 with Windows Server called ReFS, the Resilient File System. Somehow it never really caught on. I did a lot of digging, trying to find information on it, and there really isn't much out there. But they did sort of resurrect it recently with Dev Drive. Dev Drive is using ReFS. And Wolf, you're using Dev Drive. What can you tell us about it?

Wolf:

I am using Dev Drive, and the reason I'm using it is because the Microsoft documentation says Dev Drive is specifically tuned to the kinds of tasks a developer does. So like a Git repo with lots of little blocks in it, where each block is, you know, an object in the object database. There's way more of these objects in the object database, and they're just files, than there are files in your working directory, your working copy. But you have to have all of those things. You have all the objects, and you have all the files in your working copy. So you've got tons more stuff than you think you have. So my understanding is it's good, it's fast, it specializes in handling these normal-sized files. By which I mean, you know, it's not a giant video file or something like that. It's text, source code.

Jim:

Maybe a few K.

Wolf:

Exactly. So it's good at those jobs. I think it knows something about Windows built-in virus scanners and things like that. Um I think it knows something about trial. I don't know enough about Dev Drive to be an expert in it or to tell you why it's better. I can only tell you what they told me, enough to make me decide it was the right thing for me. So I partitioned my big C drive, and that partition that I made I formatted as a Dev Drive, and that's what I've been using since then. And no problems, it seems to work great. I'll bet it does more than I know about. Definitely worth looking into if you're a Windows person.

Jim:

Yeah, I know Microsoft was claiming it was up to 30% faster for builds, you know, your software builds. 30% is a lot. Yeah, I don't know if it actually lives up to that, but it sounds interesting. So let's move on to Mac. There's not a lot of choices with Mac file systems. There are some settings you can make. Mac for the longest time was using the HFS Plus file system. That was from way back in the early days of Mac, I think. But then in 2017 they moved to APFS. That's the Apple File System. It came out with the High Sierra release of macOS. Encryption is built in. You can turn that on if you want. One of the interesting features about it, though, is that it's case insensitive, and that's by default. And Wolf, I know you have thoughts on file systems that are case insensitive.

Wolf:

I do. So primarily I use macOS. That's where I do all my personal work. I like it because in a lot of ways it's very Unix-like, but in this one way, it is absolutely wrong. The right file system is case sensitive. And a lot of Unix projects that were built on some Unix or Linux and are then stored on GitHub, they might actually exploit, and it's bad to do this, but they might actually exploit the fact that it's case sensitive and have two different files with the same name if you ignore case, but they're actually different cases, like one's all caps and one's all lowercase. You can't clone that project onto a normal Mac volume because it's case insensitive. Of those two files, whichever one got written first would be overwritten by the second. And in fact, Linus Torvalds has opinions about this. His opinions have a lot more dirty words in them than mine. But essentially we both say the same thing, which is: case-insensitive file systems are bad. However, they're really good for human beings. Like when a person is trying to talk about a file to a normal person who's not a programmer, case insensitive, they don't even know the difference between capitalized letters and lowercase letters. All they know is they spelled JPEG. That's what they know. So maybe, maybe macOS made the right choice for the user. But I have tried to run with a case-sensitive file system and games that I was trying to play. I think the game I was trying to play was World of Warcraft in this case. It had some file name written in its source code that it wanted to find, and that file had a different-case name where it was actually stored on the file system. So it couldn't find that file, and it was a really important file, and so it couldn't run at all. So I couldn't play World of Warcraft on a case-sensitive file system.
So the upshot of this is: in general you want your real volumes to be case insensitive. It's the default, just take it. And if you need a case-sensitive file system, because for instance you need to work on one of these repos we just talked about, make a disk image. On the disk image, you get to pick exactly what kind of file system you get, whether it's encrypted, whether it's case sensitive. Do your work on that volume, the volume that comes from that disk image. There's my opinion, contradictory as it is. Good advice there.
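One way to find out in advance whether a repo would hit this problem is to look for paths that differ only by case. The Python sketch below is a rough approximation: it uses `str.casefold()` as a stand-in for the file system's comparison rules, which on a real case-insensitive volume also involve Unicode normalization, and the function name `case_collisions` and the sample file list are invented for the example.

```python
from collections import defaultdict


def case_collisions(paths):
    """Group paths that become identical once letter case is ignored."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical file listing; in practice this could be the output of `git ls-files`.
repo_files = ["README.md", "Makefile", "makefile", "src/main.c", "src/Main.c"]
print(case_collisions(repo_files))
```

Any group it reports is a set of files that cannot coexist on a case-insensitive volume, which is the situation where a separate case-sensitive disk image helps.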

Jim:

So let's move on to network file systems. These aren't really file systems; they're more of an abstraction that allows you to access a file system across a network. The old granddaddy of that is NFS, Network File System. It lets you mount a file system over the network and access it like it were local. Kind of neat. I think we're up to NFS version 4 now. Is that right? It works really well. I like it when you need to have it. You know, back in the days when I was doing LTSP, we relied on NFS for mounting the root file system. That's how the thin clients worked. That's where they got their disk. It was across the network. For those in the Windows world, CIFS. That's their Windows file system across the network. If you're trying to access one of those from a Unix box, you have Samba to access the Windows file system over the network. And then there's SSHFS. That's kind of a userland feature. It runs in user space, but it lets you mount a file system across the network using SSH as the transport. Nice and secure.

Wolf:

I'm gonna ask you a question that you might not know the answer to, and that is this. I hear about devices, I don't have one, called NAS devices. Network attached... storage? Yeah. So that seems like a file system to me; at least on the device, that's what it's going to be all about. Do you treat it like that? Do you talk to it like it's a file system? Or is there some protocol?

Jim:

You use NFS or CIFS to get at it. That's what you do. The device itself, I think there's FreeNAS, I think that's one of the BSDs. It's using one of their file systems to store the data. But you're going to talk to it over the network. It's going to be a network attached file system. NFS or CIFS, most likely. So they're not really file systems, they're just an abstraction for the file system, so you can get at it across the network. And finally, let's move on to undelete. I know when Linux started becoming popular in the 90s, people moving from Windows couldn't believe that they didn't have an undelete. I've been in the Unix world for so long. If I delete a file, it's gone unless I have it on backup. And I certainly should have it on backup. But people couldn't understand that you can't undelete a file. And as these file systems get more and more complex, it's not surprising that you can't do an undelete. When you write files and you update files and you delete files and you add new files, the blocks are all over the disk. If you delete a file, it's going to free up those blocks for use by another file the next time you write. So it's not surprising that the file is not recoverable once it's been deleted. Really, all deleting does is remove the inode entry and the directory entry pointing to that inode. The data might still be there on disk somewhere, but you lose the pointers to the data. So it's gone. So there really isn't an undelete. But the way operating systems have handled that now is they have this whole metaphor of a trash can or a recycle bin. I know Mac has the trash can icon on the bottom of the, uh, the task bar. Windows has that. You drag a file and you drop it in the trash can. It's not really deleting the file. It's just removing it from where it was in your directory. It's still saving the file.
And most operating systems will hang on to whatever's in the trash, the recycle bin, for 30 days. I think it's tunable, but that file will sit there for 30 days unless you empty the trash can, which really removes it. So that's like the safety net for people. You know, you drag your files into there.
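The difference between deleting and "trashing" can be shown with a small Python sketch. It is a toy: the `move_to_trash` helper and the `trash` directory are made up for the example, and a real desktop trash also records where the file came from, handles name clashes, and expires old entries.

```python
import os
import shutil
import tempfile


def move_to_trash(path: str, trash_dir: str) -> str:
    """Move a file into trash_dir instead of unlinking it; return its new location."""
    os.makedirs(trash_dir, exist_ok=True)
    target = os.path.join(trash_dir, os.path.basename(path))
    shutil.move(path, target)
    return target


with tempfile.TemporaryDirectory() as work:
    doc = os.path.join(work, "notes.txt")
    with open(doc, "w") as f:
        f.write("keep me\n")

    trashed = move_to_trash(doc, os.path.join(work, "trash"))
    print(os.path.exists(doc), os.path.exists(trashed))  # False True: moved, still recoverable

    os.remove(trashed)  # a real delete: the directory entry is removed
    print(os.path.exists(trashed))                       # False: nothing left to restore
```

The first step only changes which directory points at the file, so it can be undone by moving the file back. The second removes the directory entry, which is the point after which the file system no longer keeps a pointer to the data.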

Wolf:

I rarely put it in the trash can. I rm it.

Jim:

I work on the command line all the time. And if you're at the command line and you type in rm, space, filename, you're deleting the file. You're not moving it to the trash can. In fact, I wonder, on Mac, there's probably a command to move something to the trash can. I bet there's a command-line utility for doing that. But that's not how I work. I just remove it. And once in a great while, I wish I hadn't done that. But I understand that there's no undelete.

Wolf:

Um I think we've talked about backups before. Um I know you have some, and I know I have too many. Um so I don't I don't really worry about deleting files.

Jim:

Yeah. Yeah. I do backups regularly, and I've had to pull files off a backup. But undelete is just a feature I don't need. Some people do, but use the trash can then, right? So, have I covered enough about file systems? We've been talking forever about file systems. I think people have had enough. What do you think?

Wolf:

I think that's probably true. I think for me, I'm absolutely gonna take away Btrfs. That's what's in my mind.

Jim:

Yeah, I think that's the big takeaway. I didn't know much about it before I did the research for this episode. And Wolf even asked me, like two weeks ago, after looking into this stuff, what would I use? And at that time I answered him, I'd just stick with the default, ext4. But man, Btrfs has some nice features. And I'm thinking I'm gonna choose that from now on. So that's one thing to take away from this. And the other thing to take away is, if you're on a Mac, you probably want the default case-insensitive setting. If you switch it to anything else, you're likely to have trouble.

Wolf:

So I do know, I think, that these days on the Macintosh the file system is encrypted by default. I don't think I know this for a hundred percent, but that is also a choice you can make. And my personal opinion, as I've mentioned in previous episodes: I'm paranoid, my file system is encrypted.

Jim:

If you can encrypt it, encrypt it. It's a choice when you're building an Ubuntu system, whether you want your disk encrypted or if you just want your home directory encrypted. You can choose those. I think it's a good idea.

Wolf:

Anyway.

Jim:

I think.

Wolf:

So you feel like that's the takeaway?

Jim:

I think so.

Wolf:

Uh in that case, it's probably time for me to say to all of our listeners, thank you so much for sticking around to hear the end of this. Um this was much more interesting than I thought it was going to be, and I learned stuff I didn't realize I was going to learn. Uh, I hope that uh the experience uh was as good for you as it was for me. Um you can reach out to us with feedback. We love feedback because we want to get better. Um we want to be right if we made mistakes. We want you to correct us. Um you can reach us um by email feedback at runtimearguments.fm, or you can uh talk to us on Mastodon. Um uh we are um jam at actually what what is your uh mine's yes just okay, forget it. Look at the show notes. Maybe we'll edit this out. It's I don't know. But absolutely. Uh check the show notes. Um we look forward to talking to you guys next time. And thanks, thanks so much, Jim?

Jim:

Yeah, thank you very much for listening. Appreciate it, and uh looking forward to the next episode. Bye, everybody.
