Runtime Arguments
Conversations about technology between two friends who disagree on plenty, and agree on plenty more.
15: POSIXLY_CORRECT - What it means to be POSIX Compliant
Listen in as Jim describes what POSIX is, what it means to be "POSIX Compliant" and why you should (or shouldn't) care.
Hosts:
Jim McQuillan can be reached at jam@RuntimeArguments.fm
Wolf can be reached at wolf@RuntimeArguments.fm
Follow us on Mastodon: @RuntimeArguments@hachyderm.io
If you have feedback for us, please send it to feedback@RuntimeArguments.fm
Check out our webpage at http://RuntimeArguments.fm
Theme music:
Dawn by nuer self, from the album Digital Sky
Wolf: Hey everybody, welcome to another episode of Runtime Arguments. I'm Wolf, and as always, I am joined by my very best friend, Jim. Jim, say hi.
Jim: Hey Wolf.
Wolf: Today we're gonna talk about POSIX. Jim's gonna talk about POSIX because, as you've probably noticed, we alternate. One of us is the expert each episode, and it goes back and forth. This time it's Jim, and it's on a topic he has knowledge about and also did research on. But let's start the real way. Jim, how's your week?
Jim: You know, every time we do this, I say it's a busy week, and this is no exception, only this time it's a different kind of busy. This week I'm traveling. We do a little geek retreat to Southwest Harbor, Maine, every November, and that's this week. Unfortunately, Wolf didn't get to go this time. He's been with us a few times in the past, but he wasn't able to make it this year. It's a bunch of geeky guys like ourselves getting together. Anyway, that's my week.
Wolf: I'm super excited that you get to go, and I'm super sad I don't get to go. There are good things and bad things about the trip, almost entirely good. I don't like spending the money, of course; that's a bad thing, but a very small one. I hope you guys have a great time. Is Marlon coming this year?
Jim: No, Marlon's not here, unfortunately. He usually comes, but he's busy. You know, everybody gets busy, right? And we usually get a lot of guys from Europe and Canada, and most of them aren't coming either. They just don't want to get bogged down in the customs stuff. I don't want to get into that too much, but it's a very small group this time.
Wolf: That is unfortunate, but just the same, I hope you have the very best time possible. One of the things I remember from the times I've been on the trip is that we know all of the people who work there, who run the hotel, who run the restaurants, who run everything. They like us, we interact, and it's great to see them. It's a really great visit.
Jim: They're old friends now, so it's real nice.
Wolf: Yeah. Let me talk a little bit about some of the reaction to our last episode, which was me talking about the knowledge and processes that are essential for programmers, whatever level they might be at. Jim and I had a fundamental disagreement, which we glossed over pretty well in the episode, about what we think is important, and there was a bit of feedback. Jim talked about how C is an important fundamental thing you ought to know because everything builds on it: it was there early, it teaches you things, and no matter what you're using, C speaks to it. I brought up that there are lots of important things in C, for instance pointers, which are a concept a lot of people have trouble with, and how they're a specific example of indirection, and I tied that back into pass by name and pass by reference and understanding things of that nature. That made everybody happy. The person who gave the feedback understood that I really did see value in the same things Jim saw value in. Oh, and by the way, you mentioned circular relationships and how annoying those are when you have pointers. That's a problem in Python, too: if you're not paying attention, or you didn't realize this was a problem, you can write code where A imports B, B imports C, and C imports A, and guess what? Your program can fail at import time because of the circular import. So there was feedback. As always, we want your feedback. I know you're listening and thinking, boy, these guys probably get way too much feedback and would never have time to respond to me. Actually, we don't get nearly enough feedback. We want to hear from you. If you have things to tell us, tell us, and you're gonna hear back from us. That's how much feedback we get right now. So that's where we are, and today it's Jim's topic.
He's gonna be talking about POSIX. Without further delay, Jim, tell us about POSIX. What is it?
Jim: Yeah, well, first of all, you said every two weeks we go back and forth, and some weeks I'm the expert and some weeks you're the expert. I don't want people to be fooled into thinking I'm an expert at POSIX. I'm interested in POSIX, just because I run into the term all the time, and that's all. I did some research, we're gonna talk about it, and you'll realize by the end that I'm not really an expert in it, but that's okay. We're talking, right? You know, it surprises me. We do these things every two weeks, and I can't believe two weeks have gone by already since the last episode. Time has just flown.
Wolf: I know, and it seems like only minutes.
Jim: It does, it does. These things are so much fun, and we're so busy in between, that the time just flies. So, this whole podcast started because Wolf and I have lunch every Saturday at a sushi restaurant just south of Ann Arbor called Biwaco, and we have a great time. Anybody is welcome to join us there. It's 11 o'clock on Saturday mornings, most Saturdays. This week, however, I'm in Maine, so we're not meeting. Well, yeah, you'll be there. But anyway, recently we were having lunch, and I don't even remember how it came up, but we were talking about how Apple stopped using bash as the default shell in macOS and switched to the Z shell, zsh. I was kind of wondering why, and we talked about it, and Wolf said it's probably because of licensing: bash is GPL, and Apple doesn't want GPL because I guess they think it infects their IP or something. So they switched to zsh. And in practically the same breath, he mentioned it was kind of weird, because the Z shell is not POSIX compliant and bash is. That sort of led to the discussion: well, what does that mean?
What does POSIX compliance mean? That's kind of how we got here. So here we are; we're gonna talk about what it means to be POSIX compliant. To get into it, I'm gonna spend just a minute or so on Unix history. Unix was created in 1969 by Ken Thompson and Dennis Ritchie. We've talked about it before, and you all probably know that already. But for a while, that was it. That was the only kind of Unix out there, so there was no problem with compatibility. They ran it on the PDP-7 and later the PDP-11, and they didn't really have a problem with compatibility, especially at the source level, because they controlled everything. But in 1978, the Computer Systems Research Group at the University of California, Berkeley released a version of Unix called the Berkeley Software Distribution, BSD. So now we had two Unixes, and at that point things started changing, right? They kind of forked and became different. To make matters worse, following the AT&T divestiture in 1982, AT&T began licensing Unix source code. They used to give it to universities for a very, very small fee, I think roughly the cost of putting it on tape. But then companies started getting the source code and creating their own Unixes. We saw Sun with SunOS, which was based on BSD. DEC had Ultrix, HP-UX came out of Hewlett-Packard, and Silicon Graphics had IRIX. Microsoft worked with the Santa Cruz Operation and created Xenix, which eventually became SCO Unix. So there were all these different flavors of Unix out there, and people started worrying: if I write my program for HP-UX, is it gonna run on SunOS?
AT&T saw this as a problem, and I think it was smart thinking. So they created something called the System V Interface Definition, SVID. You ever heard of that?
Wolf: SVID? No. When you said it, it sounded like some kind of communicable disease.
Jim: Yeah, yeah. SVID, and then SVID2. It was a set of standards that Unixes had to follow in order to be called, basically, SVID compliant, which meant you could write a program on one system and compile it on another, and the odds of it working were pretty high. That would be a portable program, a portable application. They had SVID and then SVID2, and I think they had SVID3 and 4. Anyway, in 1988 I was at a conference in Toronto, and they had an expo hall, and man, all the vendors were saying, yeah, but we're SVID compliant. That sounded like it meant something, and it was kind of neat. But that paved the way for the IEEE. They created something called the Portable Operating System Interface, known as POSIX, to define the application programming interfaces, the command-line shells, the utilities, and other things for software compatibility. It was basically SVID on steroids: a much bigger, broader set of standards. Wolf, do you have any idea where the name POSIX came from? Who coined that term?
Wolf: I never even thought about it. No, who did coin it?
Jim: I never did either until I started researching. Well, it was Richard Stallman. Oh, I mean, uh-oh. Richard Stallman from the GNU world. I've met him several times, and he's a character; I'll say that much about him. He rubs a lot of people the wrong way, but in my opinion, we owe Richard a lot of respect for what he's done. He created the GCC compiler, Emacs, which neither of us uses, and several other things. I can't say that there wouldn't be open source without him. But anyway, he came up with the name POSIX, suggested it to the IEEE, they agreed, and here we have it: POSIX is what we're talking about.
Wolf: I will say that if Richard Stallman had his way, there wouldn't be open source.
Jim: No, no. And possibly he would like to rename it GNU POSIX, but I never heard him actually say that. Anyway, there are lots of other standards for Unix and Unix-like systems, and I'll just briefly mention a couple of them. There's iBCS, the Intel Binary Compatibility Standard, which lets you compile a binary on one Intel machine and run it on another, assuming the CPU architecture is the same, which in this case is Intel. So if you had a Unix that ran on Intel, using iBCS you should have been able to compile your program on one vendor's system, say Xenix, and run it on another vendor's Unix. And that really did work, because we used to get some binaries that were made for SCO Xenix, and we were able to run them on SCO Unix. Xenix and Unix were really completely different animals, but we were able to run the Xenix binaries on Unix because of that binary compatibility layer. So that was kind of neat. There's also the Single UNIX Specification, which a lot of the Unix vendors came up with. And there's the LSB, the Linux Standard Base.
Wolf: I actually know something about that, because I spent maybe a year, maybe two years, I can't remember, on the LSB committee when I worked for the company that was then called Trolltech and is now called The Qt Company. That was an introduction to trying to decide things in a big body made of very, very partisan players. It wasn't like, here's some guy from the community. It was like, here's the Intel guy, and here's the Trolltech guy, and here's the IBM guy. And I bet everybody had something to sell. And I bet none of them had an opinion on how things should be, right?
Jim: Another one is the FHS, the Filesystem Hierarchy Standard. I remember years ago, when I was doing LTSP, we had this whole image that had to sit on the file system and be served up through NFS, and we approached some guys at Debian about where in the file system we should put that tree. Man, we just had people arguing and arguing and arguing. We never really got a clear idea of where it should have been, but we had to have a place, so we ended up putting it in /opt, and that worked.
Wolf: By the way, in our last episode, I talked about no surprises and naming things so that you can see the name and know what it is. /opt is a horrible name. What does /opt even mean? Optional?
Jim: I don't know. I think if I were to do it again now, I'd probably put it in /srv. I'm not even sure /srv existed back then, around 1999 or 2000, but that's probably where I'd put it today. And then one more standard is XDG, the X Desktop Group, which has become freedesktop.org. They've got some standards, and that's the one that actually matters to me, because I care about .config. Yeah, they define .config, and they give you tools to inspect your system and find out which vendor's OS you're on, whether you're on Red Hat or SUSE or whatever. There are tools in the XDG kit that will help you with that. Anyway, those are some standards, but we're not talking about those today. We're here to talk about what it means to be POSIX compliant, and it's really a lot of things. Let's start with system utilities, the set of utilities that ship with your system. If you want to be POSIX compliant, you need a minimum set of utilities, and it's a fairly large set. I'm getting a little ahead of myself here: I'm talking about the mandatory utilities you need to have. It's a weird list, there are something like 142 of them, and it includes the usual ones you'd expect: cat, ls, cp for copy, diff. But I think the list might be a little outdated, even though it was updated in 2024. To be POSIX compliant, you need UUCP. You need some queuing stuff that nobody uses. I mean, UUCP. When's the last time you used UUCP?
Wolf: I don't even think I know what UUCP stands for. I don't think I've ever typed those four letters in a row.
Jim: It's a way of moving files from one Unix system to another using modems. I suppose you didn't have to use modems, but it was this whole thing.
Wolf: If I want to get files from one machine to another, I use scp.
Jim: Well, sure you do now. Now you have a network, now you have the internet. But back in the 80s, we didn't have the internet, right? I mean, the internet was a research thing by DARPA. But UUCP, Unix-to-Unix copy: that's how net news moved around. You probably used a newsreader back in the day, didn't you?
Wolf: I did something. I absolutely used Usenet newsgroups, no question.
Jim: So anyway, there are 142 commands on that list. It kind of surprised me that the tar command is not on it. Man, I use that command all the time, and it seems like, boy, if you're moving files between machines, you're gonna want tar or cpio. Neither of those is there. But there are other things on there: sort, tee, kill, all that kind of stuff.
Wolf: And when you're moving things between machines and using tar, you don't tar the file and then send the tarred file.
Jim: You build a pipeline of commands, like we talked about in a previous episode. You do your listing with find or something, and you might pipe it through sort so that when you feed the list to tar, you get a nice listing in alphabetical order. What else would you do? You might run the output of tar through compress or gzip or something like that, or bzip2. A nice little pipeline, and you end up with a .tgz file or something. But that's not part of the POSIX standard. You can do it, but if you want to say your system is POSIX compliant, you need those 142 utilities. That's just one of the things you need. Another thing is the C language. Well, it's not that C is POSIX compliant; it's part of being POSIX compliant. Your system, I think, has to support a C compiler, and there's a whole bunch of header files you need. I don't know why I'm hung up on C; I talked about it last episode too. But if you do write C, you're gonna use headers like stdlib.h or stdio.h, and a whole bunch of others. Those are part of the POSIX standard, and you've gotta have them. Then there are error codes. If you write a program and use it in a shell script, you want that program to return zero if it was successful. That seems a little funny, because zero is generally considered false and non-zero true, but for a program's exit value, zero means it ran successfully and anything else means it didn't. As for what that something else is, I think a two is "file not found"; that matches the errno number for it, anyway. And a program that dies from a segfault was killed by signal 11, which the shell reports as 139.
Wolf: And then the other values, up to 255, are yours. They're the errors that are specific to your program.
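A minimal sketch of the exit-status conventions discussed above, written in Python (used for every example in these notes) and assuming a POSIX system with `sh` on the PATH. The helper name `exit_code` is ours, purely for illustration. `subprocess` hands back the decoded exit code, while `os.system()` returns the raw wait status; that raw value is where the "shift it by some bits" step comes from, and `os.WEXITSTATUS()` mirrors the C macro of the same name.

```python
import os
import subprocess

def exit_code(script: str) -> int:
    """Run a tiny sh script and return its exit status (0 means success)."""
    return subprocess.run(["sh", "-c", script]).returncode

print(exit_code("true"))     # 0: success
print(exit_code("exit 3"))   # 3: a failure value chosen by the program

# os.system() returns the raw wait status, not the exit code. On Linux and
# macOS the exit code sits in the second byte, so C code (and Python) pulls
# it out with WEXITSTATUS instead of shifting by hand.
raw = os.system("exit 3")
print(raw, os.WEXITSTATUS(raw))  # 768 3 on Linux

# A process killed by a signal has no exit code at all. subprocess reports
# it as -signum; a shell would show 128 + signum (143 for SIGTERM).
killed = subprocess.run(["sh", "-c", "kill -s TERM $$"]).returncode
print(killed)  # -15
```

The same split exists in C: `waitpid()` fills in a status word, and `WIFEXITED`/`WEXITSTATUS` or `WIFSIGNALED`/`WTERMSIG` decode it.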
Jim: Yeah, yeah. And it's weird, because in your program you have to shift whatever value you get back by some number of bits to actually see what the value is. Anyway, the important thing is you want your program to return zero if it's successful. That's a POSIX thing, and if you want to be portable, you're gonna write it that way. Another thing in POSIX is processes, how you deal with them. I mentioned the exit status. Then there's the environment. First of all, POSIX says you have to have an environment, and there's a bunch of variables you should have in it, things we see all the time: the prompt strings for the shell, PS1, PS2, PS3, and PS4, and LANG. These are all uppercase, by the way. L-A-N-G describes the language you're in. If you're English, it's gonna be en or en_US or something like that. TZ is for the time zone, so you can describe what time zone you're in, and different users can have different time zones; the system keeps track of that for you. USER: who are you? From a shell script or something, you can check $USER and it'll tell you who you're logged in as. SHELL, the shell you're gonna use. And PWD, that's your current working directory, right? Isn't it PWD?
Wolf: It is. That was one of the ones I was gonna mention. I think that one's poorly named. Why is it PWD and not CWD?
Jim: Yeah, I don't know. Probably because the command is pwd, print working directory, so maybe it's tied to that. It's weird. Anyway, there's a whole handful of environment variables you've gotta have. POSIX also defines limits, all kinds of limits. For instance, did you know that on most Unix file systems the maximum length of a file name is 255 characters? That's the limit for the name itself within its directory, not for the whole path. So you can have a file nested really deep in a directory structure, and each node in that structure can be up to 255 characters long. So, limits like that. POSIX also describes what a shell needs to have. We talk a lot about shells. You and I and some of our friends, it seems like we're talking about shells all the time. There are some things you've gotta have in a shell, and it has to behave in a certain way. Things like quoting. In the Bourne shell, inside double quotes, variables are expanded; inside single quotes, they aren't. Backslashes behave differently too: inside double quotes a backslash can still escape things like a dollar sign, while inside single quotes nothing is special. Then there's the whole way we do special parameters. Let's say you're writing a shell script and you pass arguments to it. Those arguments come in as $1, $2, $3; those are the positional parameters. $0 is the name of the script itself. That's a POSIX thing. If you want to see the exit status of the last program you ran (remember, I mentioned you've gotta return zero if it's successful), at any time you can type echo $? and it'll tell you the exit status of the last command you ran. And $$ will tell you the process ID of your current shell.
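A small sketch of the quoting rules and special parameters just described, driven from Python and assuming `sh` is a POSIX shell on the PATH. The helper `sh()` and the name `myscript` passed as `$0` are illustrative choices, not anything from the episode. With `sh -c`, the first argument after the script becomes `$0` and the rest become `$1`, `$2`, and so on.

```python
import subprocess

def sh(script: str, *args: str) -> str:
    """Run `script` under sh -c with $0 set to "myscript"; return its stdout."""
    result = subprocess.run(["sh", "-c", script, "myscript", *args],
                            capture_output=True, text=True)
    return result.stdout

# Double quotes expand $name, single quotes keep it literal, and inside
# double quotes a backslash still escapes the dollar sign.
QUOTING = r'''name=world
printf '%s\n' "double: $name" 'single: $name' "escaped: \$name"
'''

# $0, $1, $2, and $# (how many positional parameters there are).
POSITIONAL = r'''printf '%s\n' "$0" "$1" "$2" "$#"'''

# $? is the exit status of the previous command; $$ is the shell's PID.
STATUS = r'''(exit 7); printf '%s\n' "$?"; printf '%s\n' "$$"'''

print(sh(QUOTING), end="")
print(sh(POSITIONAL, "first", "second"), end="")
print(sh(STATUS), end="")
```

`printf` is used instead of `echo` because POSIX leaves `echo`'s handling of backslashes implementation-defined, so its output varies between shells.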
All those things are part of the POSIX standard. Then expansion of variables: variables start with a dollar sign when you want to reference them, and they don't when you want to set them. Command substitution: by that I mean things like backticks, if you want to run a command and use its output.
Wolf: Okay, I've gotta stop you right now. If I see somebody use backticks, I stop them and tell them, here's what you use today: dollar, open paren, then the command, then close paren. Those nest, they handle quotes, they do everything you want. Backticks are old; don't use them. Sorry, didn't mean to interrupt. Go ahead.
Jim: I was a late convert to using dollar-open-parenthesis and close-parenthesis, and it's clearly better. Sometimes my fingers just remember the old way, but yeah, don't use backticks, partly because they're hard to see. Somebody looks at them and their eyes skip right past; they don't realize those are backticks and not single quotes. So yeah: dollar, left parenthesis, the command, right parenthesis. Arithmetic expressions, that kind of stuff. That's all shell stuff. Then redirection. We had that whole episode on pipelines. What did we call that episode? Wolf, that was one of yours, with superpowers in the name. Unlocking the superpowers of the command line? Yeah, of the command line. It's all through redirection: sending the output of a command to a file with a greater-than sign, or appending output to a file with a double greater-than.
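To make the backtick point and the redirection operators concrete, here is a sketch that runs a few lines of POSIX sh from Python; the helper `sh()` and the temporary file name are illustrative assumptions. The same nested substitution is written both ways: `$(...)` nests without ceremony, while the backtick form only works because the inner backticks are escaped. The second script shows that `>` truncates and `>>` appends.

```python
import os
import subprocess
import tempfile

def sh(script: str, *args: str) -> str:
    """Run `script` with sh -c; extra args become $1, $2, ... ($0 is "demo")."""
    return subprocess.run(["sh", "-c", script, "demo", *args],
                          capture_output=True, text=True, check=True).stdout

# $(...) nests cleanly; the legacy backtick form needs escaped inner backticks.
SUBSTITUTION = r'''
modern="$(echo "outer $(echo inner)")"
legacy=`echo "outer \`echo inner\`"`
printf '%s\n' "$modern" "$legacy" "$((6 * 7))"
'''

# > creates or truncates the file; >> appends to it.
REDIRECTION = r'''
printf 'first\n'  >  "$1"
printf 'second\n' >> "$1"
printf 'third\n'  >  "$1"
printf 'fourth\n' >> "$1"
'''

print(sh(SUBSTITUTION), end="")  # outer inner, outer inner, 42 (one per line)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")
    sh(REDIRECTION, path)
    with open(path) as f:
        print(f.read(), end="")  # third, fourth: the second > wiped the file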
Wolf: This actually gives me a question. There's a command called test, and it has a synonym. The synonym is a command that actually runs an executable, and its name is a single character: the left square bracket. Is that part of the POSIX standard?
Jim: You know what? I don't know. I think it might be. test is. And the left square bracket is really kind of syntactic sugar. I don't know if it still is, but on older systems it was a symlink to test.
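For what it's worth, the POSIX specification of the `test` utility defines both spellings, `test expression` and `[ expression ]`, where the bracket form simply requires a closing `]`. The double-bracket `[[ ... ]]` form comes from ksh and was picked up by bash and zsh; it is not in POSIX. A quick check from Python, assuming `sh` on the PATH (the helper name `status` is ours):

```python
import subprocess

def status(script: str) -> int:
    """Exit status of a one-line sh script: 0 means true, 1 means false."""
    return subprocess.run(["sh", "-c", script]).returncode

# `test EXPR` and `[ EXPR ]` are the same utility.
print(status("test 3 -lt 5"), status("[ 3 -lt 5 ]"))   # 0 0
print(status("test 5 -lt 3"), status("[ 5 -lt 3 ]"))   # 1 1

# Portable string comparison uses a single `=`; `==` is a shell extension.
print(status('[ "abc" = "abc" ]'), status("[ -d / ]"))  # 0 0
```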
Wolf: That's different from double left square brackets, because double square brackets are something built into bash.
Jim: Yeah, and I think that's not POSIX. We're gonna talk a little bit about bash and POSIX compliance. Then there are the control structures: case statements, for loops, while loops, until loops, all the conditionals. POSIX defines all of that, what you have to have in your shell. If you're writing a script and you want to make sure it runs on another system regardless of what shell they have, try to stick to the POSIX-compliant things, right? All these shells have more than that, but if your goal is to write something portable, stick with the minimum POSIX spec, and you'll have an easier time porting your scripts or just running them on another machine. File globbing, you know what that is.
Wolf: I do, I use it all the time.
Jim: Yeah, it's the simple cousin of regular expressions. Even MS-DOS had it way back in the early 80s. A star matches any run of characters, including none, and a question mark matches exactly one character. What else? Square brackets, I think, so you can match a range of characters.
Wolf: That is correct.
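Python's standard `fnmatch` module implements this style of pattern matching (`*`, `?`, `[...]`, and `[!...]` for negation), so it is a handy way to try patterns out. It is only an approximation of the shell: it matches plain strings and does not treat `/` or leading dots specially the way shell pathname expansion does. The file names and the helper `glob_match` are made up for illustration. Note the last line: brace lists like `{a,b}` are a bash/zsh expansion feature, not glob syntax, so a bracket expression is the portable way to pick `a_file` and `b_file`.

```python
from fnmatch import fnmatchcase

FILES = ["a_file", "b_file", "c_file", "d_file", "notes.txt"]

def glob_match(pattern: str, names: list[str]) -> list[str]:
    """Return the names a shell-style pattern matches (case-sensitive)."""
    return [n for n in names if fnmatchcase(n, pattern)]

print(glob_match("*_file", FILES))      # ['a_file', 'b_file', 'c_file', 'd_file']
print(glob_match("?_file", FILES))      # the same four: ? is exactly one character
print(glob_match("[ab]_file", FILES))   # ['a_file', 'b_file']
print(glob_match("[!ab]_file", FILES))  # ['c_file', 'd_file']
print(glob_match("{a,b}_file", FILES))  # []: braces are not pattern syntax
```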
Wolf: I use a thing when I'm globbing path names, which is where I normally encounter globbing, like with ls or whatever. I'll use curly braces, and inside the curly braces a comma-separated list. So if I have a file named a_file and another named b_file, and there also happen to be a c_file and a d_file, but I only want a and b, I can say open curly brace, a, comma, b, close curly brace, underscore file, and I'll get just those two and not the others. I use curly braces all the time. I don't know if that's POSIX. I don't think it is.
Jim: I think that's actually bash doing that for you, and you're gonna find out in a little bit that it's not POSIX compliant, but that's okay. Then there's job control. Wolf and I are part of the Michigan Unix Users Group, and a couple of weeks or a month ago we had a really good discussion on job control: basically the fg and bg commands for putting things in the foreground and the background, hitting Control-Z, and using an ampersand at the end of a command line to throw it in the background. That's all part of the POSIX spec. Were you gonna say something, Wolf?
Wolf: I was. The key to that discussion was a command you didn't mention: jobs.
Jim: Well, yeah, jobs. jobs lists the jobs that you have, whether they're running in the background or in a suspended state. Anyway, Control-Z is the key to that, and that's all part of the POSIX spec. Let's see, I talked about signals and utilities. Now the file system. We did mention one thing about the file system already: file names are typically limited to 255 characters for each component of a path. POSIX says the file system has to be hierarchical, with a root directory, subdirectories, and a tree-like structure, and files and directories can exist at any point in the tree. What I was surprised to learn is that file access permissions are POSIX. The rwx, rwx, rwx thing you see in a directory listing, which says what the owner, the group, and others can do, is all spelled out in POSIX.
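The owner/group/other permission bits map directly onto the `rwxrwxrwx` string in an `ls -l` listing. A minimal Python sketch, using a throwaway temporary directory and a made-up file name, sets the bits with `os.chmod` and renders them with the standard `stat.filemode` helper:

```python
import os
import stat
import tempfile

def mode_string(path: str) -> str:
    """Render a file's type and permission bits the way `ls -l` does."""
    return stat.filemode(os.stat(path).st_mode)

with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "backup.sh")  # hypothetical file name
    open(script, "w").close()

    os.chmod(script, 0o750)     # owner rwx, group r-x, others ---
    print(mode_string(script))  # -rwxr-x---

    os.chmod(tmp, 0o755)        # directories use the same three triplets
    print(mode_string(tmp))     # drwxr-xr-x
```

Each octal digit is one triplet (4 = read, 2 = write, 1 = execute), which is why `750` reads as `rwx`, `r-x`, `---`.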
Wolf: It's very interesting, because the idea of a hierarchical file system, which came from here, so dominates our thinking that if you ever see something else, you're like, what the hell is that? For instance, inside an iPhone it's a hierarchical file system just like any other; it's essentially a Unix underneath. But I worked on the Newton a long time ago, and the Newton didn't have a file system. The Newton had what we'd today call a database, but what they called the soup. It wasn't hierarchical. It was organized in pages you didn't know about and didn't notice, and you fetched things in much the same way you would from a database. But everything is hierarchical today because of this.
Jim: Cool. I'm so used to the file systems of Unix that I just don't think of anything else; it's second nature to me. POSIX also specifies a portable filename character set, the characters you can count on working in a file name everywhere. It's a small set, and it might surprise you: A through Z, upper and lower case, the digits 0 through 9, the underscore, the dot, and the hyphen. That's it. And we see files all the time that go beyond that, because users are gonna name their files whatever they want, with special characters. In particular, I see things like parentheses in file names; those aren't in the portable set. And the space is the number one thing people stick in there. Space isn't in the portable set either, but we see it all the time. So not everything we do is portable, right? Another important rule is that uppercase and lowercase letters shall retain their unique identities, which is unlike macOS's default. We've talked about that before: on a macOS file system, by default, case doesn't matter, and you can access a file either way. Some may say that's nice and easy, but it's not always good.
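A sketch of a portability check built on that character set, in Python. POSIX itself allows nearly any byte in a file name (everything except `/` and the null byte); the check below enforces the stricter portable subset, and it also rejects a leading hyphen, which POSIX advises against because commands would read such a name as an option. The function names are ours, and `fits_name_max` simply asks the file system for its `NAME_MAX` via `os.pathconf`, which ties back to the 255-character limit mentioned earlier (POSIX only guarantees at least 14).

```python
import os
import string

# POSIX portable filename character set: A-Z, a-z, 0-9, '.', '_', '-'.
PORTABLE_CHARS = frozenset(string.ascii_letters + string.digits + "._-")

def is_portable_filename(name: str) -> bool:
    """True if `name` uses only the portable set and doesn't start with '-'."""
    return bool(name) and not name.startswith("-") and set(name) <= PORTABLE_CHARS

def fits_name_max(name: str, directory: str = ".") -> bool:
    """Check one path component against this file system's NAME_MAX (bytes)."""
    return len(os.fsencode(name)) <= os.pathconf(directory, "PC_NAME_MAX")

for candidate in ["report_2024.txt", "my report.txt", "draft (1).txt", "-rf", "café.txt"]:
    print(f"{candidate!r}: {is_portable_filename(candidate)}")
# Only 'report_2024.txt' passes; the space, parentheses, leading hyphen,
# and non-ASCII letter each put the others outside the portable set.
```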
Wolf: Great for humans, bad for programmers.
Jim: Yeah. Especially if you try to clone a Git repo that was created on another system where case mattered. You can get yourself...
Wolf: ...where they leveraged that capability and had two files whose names differed only by case.
Jim: Right, right. Then file times. POSIX specifies that you've gotta have a time of last access, a time of last data modification, and a time of last file status change. I think that last one is primarily used for things like whether the file's been backed up with one of the old backup utilities. Either way, those timestamps have to be there in the file system. The resolution is implementation dependent; POSIX doesn't say what it has to be, other than it can't be any coarser than one second. Most systems we use now have timestamps that go well below a second, to microseconds or even nanoseconds. POSIX just says it has to be at least to the second, which I think is kind of weird. Microsoft is not POSIX compliant, but I think MS-DOS and Windows, on the FAT file system at least, used two-second timestamps on files, which was close enough for them. I think they went with two seconds so the range was twice as large. Anyway, I'm getting into the weeds now. POSIX also specifies some networking stuff: some of the system calls or library calls you have to have, and network byte order. That's important, right? If you don't know what network byte order is, it's the order in which the bytes go across the wire, and it's the standard: the lowest-addressed octet holds the most significant bits, so the numbers are in the order I think humans would expect. A lot of CPUs don't store numbers that way, so they have to flip the bytes around in order to work with them. Network byte order is important for things like port numbers and IP addresses.
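Here is what network byte order looks like in practice, sketched with Python's standard `struct` and `socket` modules (the port number and the documentation address 192.0.2.1 are arbitrary examples). In a `struct` format string, `!` means network order, which is big-endian; `htons()`/`ntohs()` are the same conversions C programs use, and they do nothing on a big-endian host.

```python
import socket
import struct
import sys

port = 8080  # 0x1F90

# "!" in a struct format means network byte order (big-endian).
print(struct.pack("!H", port).hex())  # 1f90: most significant byte first

# "<" forces little-endian, the in-memory layout of x86 and most ARM systems.
print(struct.pack("<H", port).hex())  # 901f

# htons() converts host order to network order: 0x901f on a little-endian
# host, unchanged (0x1f90) on a big-endian one.
print(sys.byteorder, hex(socket.htons(port)))

# IPv4 addresses travel in network byte order too.
packed = socket.inet_aton("192.0.2.1")
print(list(packed), struct.unpack("!I", packed)[0])  # [192, 0, 2, 1] 3221225985
```

Because the wire format is fixed, two machines with different native byte orders agree on the value as long as each converts at the boundary.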
If you're communicating with another machine that has a different byte order, network byte order will save you. You have to do it, right? It's not just a good idea, it's the law. Um, regular expressions are a POSIX thing. We talked about those in the last episode. And here's something that I learned. I've been using regular expressions for a long, long time, but there are different levels of them. Early on, the version you got in the shell tools was the BRE, the basic regular expressions. That requires escaping the parentheses and the curly braces, the squiggle braces, whatever you want to call them; you have to escape those to use them. The ERE, the extended regular expressions, added things like the question mark and the plus sign. So if you want to say, you know, "a+", meaning one or more of the letter a, that's part of the extended regular expressions; it didn't exist in the BRE. And then we have the Perl compatible regular expressions, the PCREs. There's a library for that, and so many things use it now. And I gotta say, you know, I'm a Perl programmer, I've used Perl for years and years and years. But I think the best thing out of Perl, and I think you'll agree with me, is the regular expressions. Perl has awesome regular expression support.
Wolf:Um, absolutely, I agree with you. I don't know what Larry Wall thought regular expressions were gonna be when he was writing Perl, or how much he thought they were gonna be part of the answer, but regular expressions are the answer to so many problems. If you don't know what a regular expression is, how to use it, or what tools you can use to take your regular expression and apply it to your problem, you have missed an entire dimension of problem solving. Go learn those things right now.
Jim:Yeah, and Perl really took them way out to the edge. Python uses Perl-style regular expressions, JavaScript uses Perl-style regular expressions. It's good stuff. Anyway, if you're in the shell, you're likely going to use regular expressions when you're using grep. And here's the thing I learned, okay? If you just type grep and search for something, you're using the basic regular expressions. That's the lowest level of regular expressions.
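The BRE/ERE difference Jim describes is easy to see by running grep both ways. This is a small Python sketch that shells out to grep; it assumes a grep on the PATH that follows the POSIX BRE and ERE rules (GNU grep does), and the helper name grep_matches is made up for the example. In a basic regular expression "+" is an ordinary character and interval braces must be escaped; grep -E switches to extended syntax, where "+" means "one or more" and braces work bare.

```python
import subprocess

def grep_matches(pattern: str, text: str, *flags: str) -> bool:
    """Return True if grep, run with the given flags, finds pattern in text."""
    # grep exits 0 on a match, 1 on no match, and 2 on an error.
    result = subprocess.run(
        ["grep", "-q", *flags, "--", pattern],
        input=text,
        text=True,
        capture_output=True,
    )
    if result.returncode == 2:
        raise RuntimeError(result.stderr.strip())
    return result.returncode == 0

# Basic regular expressions (plain grep): "+" is just a plus sign.
print(grep_matches("a+", "aaa\n"))          # False: looks for a literal "a+"
print(grep_matches("a+", "a+\n"))           # True
# Extended regular expressions (grep -E): "+" means one or more.
print(grep_matches("a+", "aaa\n", "-E"))    # True
# Interval braces: escaped in a BRE, bare in an ERE.
print(grep_matches(r"a\{2\}", "aa\n"))      # True
print(grep_matches("a{2}", "aa\n", "-E"))   # True
```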
Wolf:That's why grep should always be aliased to grep -E.
Jim:Uh, yeah, grep -E is for the extended regular expressions, and grep -P is for the PCRE, the Perl compatible regular expressions. So if you're using grep and you're doing anything fancy with regular expressions, you're probably gonna want -E or -P. Most of the time I don't specify anything, it just uses basic, and it works for me. Um, I want to interject.
Wolf:Yeah. Um, so Perl regular expressions are in the language. You don't need a special library. They're there; you're using them all the time. But in Python, you have to import the re module. You use a string to define a regular expression, and normally you'll pass that where there's a pattern argument, or you'll compile it into a pattern so that you can apply it many, many, many times. The important thing to know is that in Python there are lots of different little prefixes you can put on a string. For instance, you can say f and then a string, and that means this is a string that evaluates expressions inside of it. Every time you define a regular expression in Python, it should be an r string, r for raw. When you define a raw string, every character inside the string is what it is. It doesn't get evaluated or turned into anything. In a string without the r, backslash n means a newline. But backslash n in a raw string means a backslash and then an n. And then when you hand that to the regular expression compiler, it says, oh, backslash n, that means something to me. It never would have seen the backslash n if you hadn't made that a raw string. The whole reason to talk about this is that you don't have raw strings in the shell. So you gotta watch out.
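Wolf's raw-string point can be demonstrated in a few lines. In this minimal Python sketch (the sample text is arbitrary), the same word-boundary pattern is written once as an ordinary string and once as a raw string; only the raw one reaches the regex engine intact.

```python
import re

text = "the cat sat"

# In an ordinary string literal, "\b" is the backspace character (\x08),
# so the regex engine searches for backspaces and never sees a word boundary.
print(re.search("\bcat\b", text))    # None

# In a raw string the backslash survives, and the regex engine reads \b
# as "word boundary".
match = re.search(r"\bcat\b", text)
print(match.group(), match.span())   # cat (4, 7)

# The same idea with \n: one character versus two.
print(len("\n"), len(r"\n"))         # 1 2
```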
Jim:Right, right. And I know this is not an episode about Python, but I'm curious when you say you start your string with a special character, like an R or an F, is that like F double quote, and then your string and then double quote? Yes. Okay. Yeah.
Wolf:There's actually a whole bunch of different kinds of quotes, but what you said is exactly right.
Jim:Yeah, okay. F quote something end quote, or r quote. I'm not a Python programmer. I've dabbled with it, and I have seen that, and I was thinking that's what you were talking about. Anyway, so yeah, regular expressions are an important part of POSIX. You have to support both the basic and the extended regular expressions. Um, processes and process groups. You know, every time you run a program in the shell or on the system, it gets a process ID. That's a POSIX requirement. If your program spawns other programs, other executables, those also get process IDs. And all of those together are part of a process group. That gives you some interesting things: you can send a signal to the whole group, so the parent and its children die together, as long as a child hasn't detached itself or isn't ignoring that signal. And if you're logged into a shell doing things and you get disconnected for whatever reason, your shell will die and all the processes it was running should die. That's because of process groups, and there's a whole bunch of POSIX functions that let you play with those process things. Like, you can create a process, you can create a new process group and separate it from the existing group, so that you can start something running in the background and leave it there while you kill the foreground. That's processes and process groups. Threading, you know, pthreads. I haven't done any threaded coding. Some of the tools I use do threading for me, but I haven't sat down and written a C program that actually does threading. But that's all POSIX. And there are various libraries for doing threads. Pthreads is one of them; that's the POSIX one. Wolf, have you done much threaded programming?
Wolf:So there's this thing about running two pieces of code that don't interfere with each other. Maybe that means at the same time, maybe it means switching back and forth between them. And there's a bunch of different words we use: processes, threading, multitasking, async and await. Those all lead to the same kind of thing, but with different specifics. And in Python, the versions that I'm allowed to use don't really get to use threads. I mean, there is threading, but the problem is there's a thing called the global interpreter lock, the GIL, G-I-L. You need the GIL to run Python code and touch Python objects; there's a lot of reference counting. It's like holding the speaker's conch, the shell, so you're the one who gets to talk. The GIL is a mutex. Only one thread gets to actually execute Python code at a time. So threading doesn't help you with CPU-bound work, but multiprocessing does, and that turns out to be much more expensive than threading. Very annoying. Pthreads are great. I have used pthreads in C programs, but not in Python. But now there's free-threaded Python. I'm super looking forward to getting to use that.
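Wolf's conch analogy is the idea of a mutex. Here is a minimal Python sketch of one; the counter, the thread count, and the iteration count are arbitrary. CPython's threading module runs on the platform's native threads (POSIX threads on Linux and macOS), and threading.Lock plays the role of a pthread mutex here. The GIL alone does not guarantee that counter += 1 is atomic, so the explicit lock is what guarantees the total. In a standard GIL build this CPU-bound loop also won't finish any faster than a single thread would.

```python
import threading

counter = 0
counter_lock = threading.Lock()  # the "conch": only the holder may update counter

def add_many(times: int) -> None:
    global counter
    for _ in range(times):
        # Acquire the lock, update the shared value, release the lock.
        # Any other thread that reaches this point waits its turn.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every thread to finish before reading the result

print(counter)  # 40000
```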
Jim:Oh, that sounds neat. I think in an upcoming episode, maybe several weeks out, we're gonna be talking about threading and multiprocessing in a little more depth. So that should be interesting. Anyway, yeah, pthreads, POSIX threads. One more thing that POSIX specifies (there are others, but this is the last one I'll really talk about) is time, Unix time, right? POSIX says time is counted in seconds since the Epoch, and it defines the Epoch as midnight UTC on January 1st, 1970. So we're, I don't know, somewhere past 1.7 billion seconds since January 1st, 1970. I know that it's going to become a problem in 2038, when we get just over 2.1 billion seconds, because a lot of places store time in a signed 32-bit integer, and that maxes out at just over 2.1 billion. So if you thought Y2K was crazy, this is crazier, and probably harder for a lot of people to understand. In 2038, we're gonna run into a problem. Anyway, POSIX says time has to be a number of seconds since the Epoch. That's kind of it for what's required in POSIX. I do want to point out again, as I mentioned earlier, that POSIX compliance does not mean binary compatibility. It's really only at the source level and at the shell script level, the system level. It's not binary compatibility. So let's talk a little bit about POSIX compliant operating systems. I mentioned macOS. macOS is certified POSIX compliant. But it's kind of funny, because there are some things in macOS that clearly are not POSIX compliant. Embrace and extend. Yeah, embrace and extend, particularly the file system. The default file system that you use on a Mac is not POSIX compliant, because file names are case insensitive. That's a problem. Apple did include an option to disable that.
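The 2038 arithmetic Jim mentions is easy to check. This Python sketch treats the largest values of a signed and an unsigned 32-bit counter as seconds after the Unix epoch (1970-01-01 00:00:00 UTC). Python's own integers never overflow, so the hypothetical helper wrap_int32 simulates what a signed 32-bit two's-complement field would store one second past the limit.

```python
import time
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1    # largest value a signed 32-bit time_t can hold
UINT32_MAX = 2**32 - 1   # "just over 4 billion", only if the field were unsigned

print(int(time.time()))                       # seconds since the epoch right now
print(EPOCH + timedelta(seconds=INT32_MAX))   # 2038-01-19 03:14:07+00:00
print(EPOCH + timedelta(seconds=UINT32_MAX))  # 2106-02-07 06:28:15+00:00

def wrap_int32(value: int) -> int:
    """Reduce value to what a signed 32-bit two's-complement field would store."""
    return (value + 2**31) % 2**32 - 2**31

# One second past the limit wraps around to the most negative 32-bit value,
# which a 32-bit system would read as a date in 1901.
overflowed = wrap_int32(INT32_MAX + 1)
print(overflowed)                              # -2147483648
print(EPOCH + timedelta(seconds=overflowed))   # 1901-12-13 20:45:52+00:00
```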
Um, so you can make your system POSIX compliant. The shell that they use is not POSIX compliant. Anyway, they are formally certified as POSIX compliant. Linux is largely POSIX compliant, but it's not formally certified. There is a certified version of Linux called EulerOS. It's E-U-L-E-R-O-S. It's from the Huawei company. Euler? Uh, Euler? Okay. Yeah, EulerOS. Which I'd actually never heard of. I haven't heard of this either. I am familiar with Euler, though, and I don't know why I called it Euler. Anyway, thanks for straightening that out. That's from Huawei, the Chinese company that makes routers and various things. Absolutely trustworthy. No question. 100%. I think those routers are banned in the U.S., or at least for U.S. government jobs. Anyway, they have a POSIX compliant implementation of Linux that has been certified. None of the other big names have gone down the route of certifying. It's a lot of work and it costs money. And you know, if they can just say, yeah, we're POSIX compatible, that's probably good enough. They're just not certified. Windows is not POSIX compliant. Back in Windows NT, they made that POSIX compliant. I remember reading about it: I think his name was Dave Cutler, the engineer who created Windows NT, the kernel and stuff. He kind of modeled it after some stuff they did at DEC. There was a book on it, and I read that book, and his goal was to have it POSIX compliant. And I think they ended up with a POSIX compliant subsystem that you could use to do POSIX things. But I think NT 4.0 is the end of that subsystem. Now, if you want a POSIX thing in Windows, you run Windows Subsystem for Linux, which is basically Linux. And even that, like I said, is not certified POSIX compliant. It's compatible, though. It's good enough.
If you want to run your POSIX things, if you want to run a shell script that was written to be POSIX compliant, it'll almost certainly run on Windows Subsystem for Linux.
Wolf:If you install the right Linux on top of WSL.
Jim:Yeah, yeah. Although is there a wrong Linux that wouldn't be compliant, or compatible? I don't know. Anyway, so yeah, Windows is not compliant. There aren't that many systems out there that are POSIX compliant. It used to be a bigger thing than it is now. But so, let's talk about languages.
Wolf:Okay, now earlier you started by saying C is part of the POSIX standards, but then you corrected yourself: it's not that C itself is POSIX compliant, it's that C uses the right headers and libraries and error codes and stuff like that. So the question I immediately wanted to ask when I was looking through the outline is, is Python POSIX compliant? But maybe that's the wrong question. Is it? What's the story?
Jim:No, it's not, and it shouldn't be. It's out of the scope of POSIX compliance. You can use Python on a POSIX system, but Python is not required on a POSIX system. So if you write your program in Python, you're not guaranteed to be able to run it on another system. But if you install Python on that other system, and you write it in a way that's POSIX compatible, you're likely going to be just fine. The language itself is not POSIX compliant. You use Python, I use Perl. Perl's not POSIX compliant either, but it does have a POSIX module. If you say use POSIX, it gives you access to a whole bunch of library calls that are real POSIX library calls. It's basically the same ones that you would use when you're in C. You know, things like open and close and create and... um, did you say create?
Wolf:I bet you actually meant creat.
Jim:Creat. Yes, creat, yeah. I don't know. Those old Unix programmers wanted to save every character they could, so they called the create library call C-R-E-A-T. They chopped off that E and just made it so much better. Yeah, 20% better. Yeah, yeah. Cree-at. Anyway, by importing the POSIX module, you get access to all those things, and things like process IDs: you can create a process group. I've done all that stuff in Perl, and it works when you load the POSIX module. And I imagine Python has a similar module you can import.
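Python does have an equivalent. On POSIX systems CPython ships a built-in posix module, and the portable os module re-exports its calls, including the process-group functions Jim describes. A minimal, POSIX-only sketch; the helper run_child and the one-line child program are made up for the example.

```python
import os
import posix  # built in on POSIX platforms only; os wraps it portably
import subprocess
import sys

print(posix.getpid() == os.getpid())   # True: same underlying call

CHILD = "import os; print(os.getpid(), os.getpgrp())"

def run_child(**popen_options) -> tuple[int, int]:
    """Start a Python child process and return its (pid, process group id)."""
    out = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True, text=True, check=True, **popen_options,
    ).stdout
    pid, pgid = map(int, out.split())
    return pid, pgid

# By default a child joins its parent's process group.
pid, pgid = run_child()
print(pgid == os.getpgrp())            # True

# start_new_session=True runs setsid() in the child, so it becomes the
# leader of a new session and a new process group (its pgid is its own pid).
pid, pgid = run_child(start_new_session=True)
print(pgid == pid, pgid != os.getpgrp())  # True True
```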
Wolf:Um, it might. I've actually never thought about it or investigated, because, and I'm gonna say this in a mean way, I have never cared.
Jim:I think when we get down to the takeaways, you're gonna find that I agree with you. All right, so let's wrap this thing up. We're already longer than I expected this one to be. But let's talk about what we originally set out to talk about, and that was: why did Apple drop bash when it was POSIX compliant, and adopt Z shell when it's not? And it turns out bash is not POSIX compliant. Bash has a POSIX compliant mode, and you need to enable it if you want your bash to be POSIX compliant. If you invoke bash with the name sh, say by creating a symlink to it named sh and running that, it's gonna be in POSIX mode. If you have an environment variable POSIXLY_CORRECT (this is hard to say) set to any value when the shell starts, it's gonna be in POSIX mode. If you run bash with the --posix option, you're in POSIX mode. If in your bash script you use set -o posix, you'll be in POSIX mode. And what POSIX compliance really means is a good old sh shell, without all the things that make bash great. Like the compound commands: you mentioned the double bracket, starting something with a double bracket. The arithmetic for loop. Process substitution. None of those are part of POSIX, so a portable script can't count on them. I've got a list of like 12 things you miss out on when you stick to POSIX. So our original argument about why Apple chose Z shell: because Z shell is not POSIX compliant? Well, neither is bash.
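Each of the ways Jim lists for putting bash into POSIX mode can be checked from a script. This Python sketch assumes bash is installed and on the PATH; the helper bash_posix_mode is made up for the example. It runs a one-line bash script built on shopt -oq posix, which succeeds when bash's posix option is on.

```python
import os
import shutil
import subprocess

# `shopt -oq posix` exits with status 0 when bash's posix option is set.
CHECK = "if shopt -oq posix; then echo on; else echo off; fi"
BASH = shutil.which("bash")  # assumption: bash is installed and on PATH

def bash_posix_mode(*options: str, argv0: str = "bash", prelude: str = "",
                    **env_extra: str) -> str:
    """Run CHECK in bash and return "on" or "off" for POSIX mode."""
    # Start from the current environment, minus anything that would
    # already switch POSIX mode on.
    env = {k: v for k, v in os.environ.items()
           if k not in ("POSIXLY_CORRECT", "SHELLOPTS")}
    env.update(env_extra)
    result = subprocess.run(
        [argv0, *options, "-c", prelude + CHECK],
        executable=BASH,  # always run bash, whatever name argv[0] claims
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

print(bash_posix_mode())                          # off
print(bash_posix_mode("--posix"))                 # on: the --posix option
print(bash_posix_mode(POSIXLY_CORRECT="1"))       # on: the environment variable
print(bash_posix_mode(prelude="set -o posix; "))  # on: set -o posix in the script
print(bash_posix_mode(argv0="sh"))                # on: invoked under the name sh
```

The argv0="sh" case mimics the symlink trick without creating a file: the bash binary runs, but it sees its own name as sh.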
Wolf:So, annoyingly, the version of bash they ship (it still comes on the system, it's just not the shell you get by default) is in the threes somewhere. I do run both bash and Z shell on my Mac, and the bash I run I got from Homebrew. So that's five.
Jim:So you run five something, yeah. I'm gonna speculate that Apple keeps bash around simply because it can give them POSIX compliance when you want it, and they have no interest in moving up to version five. It doesn't give them anything. They want you to use Z shell. That's my guess. I don't know. So that's kind of POSIX in a nutshell. We moved fast and talked about a lot of different things. I do have some takeaways. If you want to write a portable application that will work across all POSIX compliant systems, you gotta be careful to only use POSIX things. The options that you pass to utilities, the shell that you use, all those things matter if you want to write something portable that'll run on all POSIX operating systems. POSIX does not mean binary compatibility; I mentioned that a couple of times. And finally, POSIX compliance is probably less important now than it used to be. I don't think people really care anymore. I don't know. Maybe it's sad, maybe it's not. We all get our work done, and it's working pretty well. So does it matter? If you think it matters, send us some feedback. We'd love to hear from you. So, Wolf, I think I've covered all I wanted to cover.
Wolf:And I think you did a damn fine job. I definitely learned stuff about POSIX. And then, with all that new knowledge, I got to the end and I thought, holy crap, that's some stuff I maybe didn't need to know as much as I thought I did.
Jim:Well, it's a deep dive in a lunchtime conversation. How's that?
Wolf:I absolutely think that's a good description of it. I feel educated. Some of that stuff is super useful. If I had to look at this as an outsider, I would say one of the most important things you talked about was regular expressions. I think that's absolutely important. And I will say that if you are in the situation where you want to write something that's gonna run in a whole bunch of different places, and you've decided POSIX is the characteristic that describes that portability, then run it under sh. You can test locally. You'll know when you've used the wrong thing. Yeah, you're gonna have to look up some reference pages, but I think this was a great episode. I absolutely felt like you gave a good description. I learned a lot. I want to thank everybody who listened, especially if you got all the way down to where we are here. Absolutely send us feedback. We need more feedback. We want to be saying the right things, and we want to have our ladder leaned up against the right wall, so that not only are we saying right things, but they're the things that are right for you. So tell us. Send that feedback to feedback at runtimearguments.fm. And if you go to the website, http://runtimearguments.fm, you'll see the show, you'll see the episodes, you'll see how to contact Jim or me individually. Thank you so much for listening. Jim, final thoughts? Um, no.
Jim:I think I'm kind of worn out at this point. This was kind of fun to talk about. I think you're probably gonna give me a hard time, because I pushed you really hard on the last episode to keep it short when it looked like it was gonna be too long, and this episode is longer. So I expect a hard time to come my way.
Wolf:Uh but you'll be gone, so it'll be a hard time only by messages.
Jim:Yeah.
Wolf:All right. All right. Thanks, everybody. Bye bye. Thank you. Bye bye.