Runtime Arguments

21: Everything Is A Database Problem

Jim McQuillan & Wolf

Jim brings his vast knowledge of Databases and dives into the history, some theory, some best practices and some choices you can make.

Links:

Claude best practices guide - https://code.claude.com/docs/en/best-practices

Passkeys Server and Client - https://github.com/Runtime-Arguments/passkeys-demo

Webassembly History - https://bytecodealliance.org/articles/ten-years-of-webassembly-a-retrospective

E.F.Codd - Relational Model of Data for Large Data Banks- https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf - E.F.Codd

Lessons about rewriting: https://en.wikipedia.org/wiki/Ship_of_Theseus

Hosts:
Jim McQuillan can be reached at jam@RuntimeArguments.fm
Wolf can be reached at wolf@RuntimeArguments.fm

Follow us on Mastodon: @RuntimeArguments@hachyderm.io

If you have feedback for us, please send it to feedback@RuntimeArguments.fm

Checkout our webpage at http://RuntimeArguments.fm

Theme music:
Dawn by nuer self, from the album Digital Sky

Wolf:

Hello, everybody. Um, it is time for another episode of Runtime Arguments. I am Wolf, and as always, I'm joined by my best friend Jim. Jim, say hi. Hey Wolf, how you doing? I'm doing good. Um This is gonna be an episode where Jim happens to be an expert, not just research, but he really is an expert. Um it's gonna be about databases. This is our episode number 21, uh, which means it's our 22nd episode, because we started with zero. Um like you're supposed to. Like you are supposed to. Thank you very much, Lua. I don't know what's going on there. Um and MATLAB, by the way, starts at one. What the hell is wrong with MATLAB? What's up with that? Anyway. Uh well, Cobalt.

SPEAKER_01:

There is stuff as well.

Wolf:

Oh, I didn't remember that. There is stuff we normally say at the beginning, like uh that we have a website to go to, runtimearguments.fm, that lists the episodes, and you can send us feedback. There's gonna be show notes and transcriptions. Um we'll talk about that at the end. Uh in the meantime, uh let's start off with uh how's your week? How was your week, Jim?

Jim:

Oh gosh, I've had a pretty good week. Um I I I may have mentioned it before. I do all of my development remotely. I I've got this big fancy Mac studio with lots of RAM and CPU, and I use it as an SSH uh terminal to log into machines located elsewhere. Uh I'm finally uh gonna move uh as much development locally as I can. Uh our our in my day job, we have this giant uh web-based application. It's a lot of uh Perl and a lot of JavaScript and and uh various other things. And um uh it relies on pat Apache and HAProxy and many, many other technologies. And I've always been hesitant to bring it home because I'm on a Mac, it's running a uh M1 chip. Yes, I've got one of the older ones, uh, but it's it's an ARM-based chip, not an Intel. So I've been hesitant to try to do this. Plus, I didn't want to pollute my machine with all the stuff that I need for development. Uh, and then I did some serious thinking. We're using Docker to deploy for our customers. Um, so I thought, why not use Docker on my desktop and and do it that way so I don't have to load up my machine with all kinds of Perl libraries and and Postgres stuff and everything. Why don't I just do it all in a in a Docker container? So I started setting that up and I stumbled upon something that I am just so amazed at. Uh, and that is Docker lets you create an image that's multi-architecture. So you you create a Docker file, that's how you you build the image, uh, and in that Docker file you list all the things that you need. Uh, and then when you run it, you I use Docker build X with a bunch of arguments. There's an argument you can pass on that dash dash platforms, and you list the platforms you want. And in my case, I want x86 and ARM. So I list that, and it's amazing because it builds the image for both. In fact, it does it in parallel, so it's kind of neat. Now I'm on I'm on the ARM chip, so it builds the ARM stuff really, really quickly, but it it builds the x86 stuff fairly slowly because it's doing some emulation. I don't know if it's using X uh QEMU or or what, but that's kind of slow, but that's all right. Uh it it builds these images, and and when I push those up to a registry like uh GitHub, I use GitHub as a registry, and I also run my own local registry. Um when I push it to my registry, it pushes the whole image up. But when I run it, it only pulls down the the layers that it needs for that architecture. So the the the pull is fast and it's really nice. And this thing is working perfectly. I just I just love it. So I'm starting to do development locally. Now, uh uh Wolf, uh you and I had the conversation this morning about the issue I ran into and and the way I develop with Docker. We're we're gonna do an episode on Docker at some point, but um you know Docker can the Docker container is all of the tools and stuff that I need, but I don't have the source code in there. I mount the source code in a directory, it's in a directory, I mount that as a volume to Docker. So it sits outside of the Docker container. Uh, but the Docker container can see it because I mounted it. I ran into a problem. I I know this is getting kind of long, right? Uh I ran into a problem in that. It was I was making changes to my source code. Um, but my my application wasn't seeing those changes. Uh you know, I I would hit a CGI script and it wasn't getting the right script. And sometimes it looked like a corrupt file, even though I look uh you know locally and the files are just fine. Well, it turns out uh bind mounting doesn't actually exist on Mac. Uh it's a great thing for Linux. You bind mount a volume to another location, and it's the same data, just with two different references to it. Um inside the Docker container, you see you see the data, outside the container, you see the data. It's all the same data. Well, you can't do that on Mac because Mac doesn't have bind mounts in the kernel. Uh, what they have to do then is synchronize the data. Uh, it actually copies the data for use by Docker. Um, and and you get out of sync to the point where I would have to uh restart the Docker engine in order to pick up my changes. That became unworkable. Um, but I did find a trick, it cost money. I actually had to upgrade and start using the Docker Pro um uh subscription, which gives me some kind of fancy schmancy synchronization, and that's working so far. I've only been doing it for eight hours, but it's working, and I'm happy, so I'm developing locally. This is going to be uh a whole different way for me to work. Uh and and the nice thing is I won't be relying on the internet anymore uh or my customer sites or or uh anything. Uh so that's kind of neat. So yeah, in a long-winded way, that was like five minutes of me talking about Docker.

Wolf:

To me, the amazing part about that is not you moving from remote development to local. It is stuff you didn't say. It is a Vim user who uses VS Code additionally. It's not like you switched to VS Code, but Pro code. Now you're using it.

Jim:

Yeah. Yeah, I am.

Wolf:

I am.

Jim:

That's a whole world of difference for me.

Wolf:

You using um a bunch of AI stuff. So lots of changes for you.

Jim:

Uh it's it's overwhelming and fun and and uh it's kind of neat. And uh you you didn't think I could do this, did you?

Wolf:

I was based on the All I can say is watching you grow in this way absolutely warms my heart. And I'm gonna say the words that when I say them to my wife, uh, because she names something uh that is like what what it's already like for me, and I say these words, she becomes furious. Uh I'm gonna say them to you with the hope they don't make you furious.

Jim:

Okay.

Wolf:

Welcome to my world.

Jim:

Thank you.

Wolf:

You're you're you're you're welcoming me with open arms. I am. So how was your week? Let me tell you, let me tell you about my week. Um lots and lots of stuff happened this week. Uh one of the things is um Jim pointed me to some uh Claude Best Practices documents on Anthropics site. And a consequence of that, first of all, I try to get better every day.

Jim:

Whatever things that you do, you are so good at that. I tend to stick with what works and don't venture out much.

Wolf:

So um I'm always trying to reduce friction. I'm always trying to figure out what's the smarter way to do it than what I'm doing today. Can I do it that way tomorrow? And uh I got pointed to this best practices for for a Claude Code document. Jim, you pointed me. And I have slimmed down my Claude Code files. I have taken things that I don't need Claude to keep in mind because that's the that's the thing that's a limited resource for Claude. When you're working with Claude, there's a thing called the context. And the context is everything it's got in mind solving your problem right now. Um it's it's all in one session, and when it gets to as you develop more and more, Claude slows down. Eventually you have to clear it. Claude sort of forgets things unless you tell it to remember. Things get pushed off the end. So your Claude Your Claude MD files, um, they're they're in memory, they're in your context the whole time. So they need to be super valuable and super slim. And if there's stuff you only need Claude to know about and understand for just a moment, like for instance, when you start up a new task or close down an old task, uh, for instance, uh, that it needs to update your work log. That's a thing I do, that you want it to um uh uh commit and push or uh figure out what things might be. Any anything that you don't need Claude to keep in mind the whole time, instead of making that be part of your Claude MD, you make it into a skill. And skills are things that you invoke by hand with a name. Uh there's built-in skills, or I don't even know if they're still called skills when they're built in, but for instance, in the latest Claude, you can say slash insights, and it will tell you about all the things you've ever done with Claude Code and whether you succeeded or failed and how you can be better. It does that with a web page, it's really great. But I have um new skills for moving between tasks and remembering what I've done so far, things like that. Um, and I've pulled out the specific languages that I work with. I work a lot with Python, I work a little bit with Bash, and I'm learning Rust and not giving it nearly enough time. It doesn't need to remember those things all the time. For instance, at work, I'm only using Python. It doesn't need the other two at all. So why put these in my Claude.md file? Don't. Um, but also it doesn't need to know the things I care about with Python, like for instance to always use type annotations and stuff like that, until I actually want it to maybe write some Python. So in my ClaudeMD file, I say, and if you're going to write some Python, go look at this language file.

SPEAKER_01:

Oh neat.

Wolf:

That way, not in my context, giving me more and more room. So I'm getting better and better and better at um reducing friction in the places where Claude can help me. Um, and mostly that is about um context switching, my contexts, not Claude's context. Moving from one task to another. Not nearly as much about, hey, Claude, write some code for me as everybody wants. Everybody's all about, oh, Claude can write code for me. Claude, at least right now, Claude does not write code that is good enough for me. It's good enough for tests, right? But it's not good enough for you know the principal code. Um always read. Always read what Claude is doing, always know what it what it means. Make sure you understand. Um but that's what I did during my my uh cool.

Jim:

I I I've got something I want to say. We uh we've mentioned this many, many times. This whole podcast came out of Wolf and I meeting for lunch every Saturday at a sushi place uh near Ann Arbor. Uh so we're still doing that. And we've uh started inviting more friends. It used to be just me and Wolf for years, it was just the two of us. Uh, but now it's it's become a group. There's like at least four of us there every week. And uh if you're in the area, please come come to Biwaco uh uh in Saline, uh, 11 o'clock on Saturday mornings. Um when we were there last week, uh uh one of the things that Wolf says a lot, and it's kind of funny, is he says he talks too much, right? We we would you just say that's accurate, Wolf. You say you talk too much. And I'm gonna go on record here and say, yeah, Wolf talks a lot, but you know, when he does talk, it's always good information. I I've been around a lot of people, or at least a handful of people, who talk and talk and talk. They talk too much, and I walk away from the conversation without any additional knowledge that I had going in. With Wolf, he he talks a lot. I'm not gonna say it's too much, because when I walk away from that conversation, I've learned something. Uh, and if you join us for lunch, you can learn too. And we'll learn, we'll learn from you, whether you want to or not. But it's always uh uh enjoyable uh to to especially when Wolf uh talks about something he's passionate about. Uh it's a lot of fun. And I do have a lot of passions, yeah. So he does not talk too much, he just talks a lot, but it's good talk. So yeah. It is too loud though. It can get loud. Uh the more passionate you are about, the louder it gets. But again, did you ever see me complain about it? No. No. It's it's fun. It's fun to watch him get wound up.

Wolf:

Um, you know what it's time for, though? Uh feedback? It is time for feedback, and uh, I know you've got a couple things, but let me start. Um last episode, we were talking about Git. Yes, and I I misspoke. I said something, and I immediately knew it was wrong, and I beat myself up about it afterwards when I re-listened to the episode, and that was this. Um before the days of Git and Mercurial, um, it was the case that uh in Git and Mercurial, the central repo is exactly what you have on your local machine, um, except maybe on your local machine you also have source files, um, not just a database of objects. Um in the old days, the thing that you had on your machine was source files, but it couldn't be used as a database. You couldn't go back in time. Anytime you wanted to find out about histories or previous versions or whatever, you needed a network connection and you had to get to the central repo uh repo. Now I talked about what was actually at that central repo, and I said something very close to these words. I said CVS stores um the original source file in a way where it is the text plus um hunks of changes that will get you back and forth between the different versions, but all in one file. And the thing that was uh a mistake about the words that I used there are that I said CVS when what I should have said was RCS. RCS uh, of course, was one of the earlier the revision control system. Um CVS uses uh uh hidden history directories for each uh directory that you're in uh that gets you the changes, and the source file that you that is actually there is the latest version. So it's easy to get the current thing, and you can go back in time if you want. Um so that was my mistake. I said a thing with CVS, but it was really RCS. That's that was feedback from me.

Jim:

From the last episode, episode 20, uh on uh Git. What do you got, Jim? So I've got a couple of things. Um episode three, way back in May, we talked about WebAssembly, uh WASM. Um I just came across a really good article just last week about the history of it. Um and I I I thought it was really interesting. The the the way the browser developers all got together uh and pushed for it, um, without management really knowing what was going on. Uh uh the the Microsoft people uh working on uh their browser and trying to include WASM and the Mozilla people and the Chrome people, um, they all sort of colluded together, and Apple also with Safari, they all colluded together and they all told their management that the other guys are doing it. And I thought it was pretty funny. That's how they got it by management because you know, try to tell management you want to do something that that that it's gonna take a while and um you're not really sure about the benefit yet. Uh you might get pushback from management, but they just told the management, yeah, the other guys are doing it. We have to do it. Uh anyway, there's a really good article. I'm including the link to that article in the show notes so that uh you can go read the article as well. I thought it was kind of neat. Uh, the second piece of feedback I have is um uh it's sort of related to the our very first episode, episode zero, pass keys. Um, uh, you know, I'm I'm venturing into the world of uh of AI and I'm using Claude Code. And uh our good friend Marlon suggested that I ask Claude Code to write me a pass key server and client. Uh and of course, for me the server is in Perl. For Wolf, it would have been in Python, I'm sure. Uh created the server in Perl and the client in JavaScript. Obviously, JavaScript for the client makes sense, right? So I did that, and 10 minutes later, I had a working implementation of a pass key setup uh where I can take that server code and incorporate it into my web app and the client code. Of course, I gotta incl incorporate that into the into the um uh uh the the client side of my app. But had I had to go searching for that and and piece together all the little pieces to make that work, I I would still be doing it two weeks later. Uh, I was just really impressed, and thanks to Marlon for suggesting that it was a really nice exercise. Uh and to top it off, I took that code that I generated uh and I put it into a repo, and you can find it on our GitHub repository, the runtime arguments GitHub repository. And I've included a link for that in the show notes as well.

Wolf:

I just want to make sure, Jim, when Claude wrote Perl for you. Yes. You did read that Perl, right?

Jim:

Am I supposed to read it? I took a look at it. I'm not incorporating this into my application until I really read it. But just the fact that it worked was pretty amazing. That's as far as I got with it. Uh if I do take that code and use it, I will read it.

Wolf:

A long, long, long time ago, when I was in my early 20s and I was working on my first real application, which was a word processor for the Macintosh, the when the Macintosh looked like a toaster or whatever it was. Yeah, the Fat Mac. That was the one we were writing for. And it was called Full Wright Professional. Yeah. Um, Full Wright Professional, uh, the code name for Full Wright Professional was Danc Bear. And the reason it was called Dancing Bear uh is very strongly related to one of the sentences you just said, and that is this the amazing thing about a dancing bear is not how well it dances, it's that it dances at all. Yeah.

Jim:

That's that's good, and that's kind of where I'm at with that. Before I incorporate any of that code into my system, I'm gonna make sure I understand it. And I quite likely won't integrate it directly as it was written. Uh, this is more of an exercise. In fact, I know that they're using some of the tools that I don't use in my Perl code, but here's a working example that I can steal from now.

Wolf:

All right. With that, uh, let's get to the meat. Um Jim? Yeah.

Jim:

Yeah. Today we're talking about databases. As Wolf uh mentioned, this is something that he calls me an expert. I don't know about that. I do like databases. Um, I I enjoy working with them. Uh, and uh I'm gonna start right off by saying the database that I use professionally for several years now, gosh, for 25 years now is Postgres. Uh so a lot of what I say about databases is gonna be slanted towards Postgres because I don't have the experience with others. I I've played with SQL Server and MySQL, but it's all Postgres for me. But that's not really what we're talking about. We're talking about databases in general. Um a lot of times when I do an episode like this, I go deep into the history and I spend 20 minutes talking about the history of databases. I'm not gonna spend that much time on it. I I will mention that the uh the term database goes back to 1962. Uh uh that's according to the Oxford Dictionary. I would have thought it went back much further than that. Um but uh it it it goes back to 1962, and that happens to coincide with the available of uh hard disks, uh rotating media, um, which again seems like that would go back further. But I guess you know, back in those days they weren't storing on hard disks, they were storing it on on tape and punch cards and and those kinds of things.

Wolf:

But the name database um just reminds me that uh the the older I get, the more time I spend programming, every problem you face is at some level or another a database problem. Maybe it doesn't take Postgres to solve the problem. Maybe it's SQL, uh SQ Lite, maybe it's not even um uh an official database, it's just a special data structure, but it's it's a data problem, it's a database problem. It is anyway. I I didn't mean to interrupt, keep going. A data problem, yes.

Jim:

Uh maybe that's why I I gravitate towards databases because I sort of like how they work and and they they they've proven very, very useful for me. I I I'm fascinated with data in general, and a database is just a huge collection of data. Um gosh. Uh interesting, one of the things I found in the research was uh one of the first databases actually created was the IBM IMS Information Management System. And guess what? They created that for the Apollo program. Remember we talked about uh the key fob episode about is your key fob more powerful than the than the rock than the computers that took a man to the moon. Um uh yeah, that's where that's where IMS came from, the Apollo program, uh running on uh on a system 360. So that was kind of neat. Uh it wasn't until 1970 that a very, very important figure in the world of databases, uh Edgar Cod, uh EF Codd, he wrote a paper uh titled Relational Model for of Data for Large Data Banks. Here he's talking about data banks, we you know, uh databases. Uh he lays out the whole new way of organizing and accessing uh data uh using tables and rows and columns. That sounds familiar, doesn't it? Um queries would join the tables based on relationships between the tables using a set of operations based on the mathematical system of relational calculus. I I don't think of relational calculus calculus when I'm working on a database, but that's underneath the hood, uh the the methods that they used. Um he powered the the the the idea of normalized tables. Uh and if you're in database development, you know about normalization, and that is where you don't repeat data. Uh you you have uh like if you're a classic example, uh a sales order, right? You got the header, which contains who the client is and many other things, and then you got the detail table that contains all the details for the order, the line items for the order. Um that comes out of uh EF COD's uh uh normalization of tables. Get the data in the right place so you don't have to repeat it.

Wolf:

I would love, I would love to be somewhere where I where the database actually was normalized. I don't yeah, I don't know that I've ever seen it.

Jim:

It's uh to me, it's trouble if you try to go for a hundred percent normalization, right? Um, but you should strive for what you can. Um again, you know, think of the the header uh uh detail tables, uh that kind of a thing. Put the data in the right place, uh, and it and it all makes sense. Um by basing this thing on mathematics, um it's all in mathematical terms. So queries can be rewritten and to it to optimize them uh and be proven correct through mathematics. Uh you know, I was a math major in college, and this is still way beyond what I understand. How they prove that a query, a rewritten query is correct. I I don't know, but it it's it's the basis of query optimization, which is huge these days. I mean, the database I work with, Postgres does an awesome job at optimizing the query when I tell it what I want. It figures out the better way to get it. Um, and that's again going back to back to mathematics. Uh, and all of this work was done before SQL was invented. Uh the document, by the way, I do have uh our show notes are going to be very, very busy this time because I have a link to that document from EFC in the show notes. So definitely check that out if you get a chance to take a look at it. Um his paper, by the way, inspired uh a lot of work at a lot of teams at various universities, including uh here's another name that's important in databases, and that's Michael Stonebreaker. Uh he created Ingress, which was a uh a relational database. Um he was at the uh University of uh of Berkeley in California. He created that. Uh he left the university and uh took his Ingress code and created a new project um called Progress. I'm sorry, not Progress, Postgres. Progress is another database. He created a database called Postgres. Um that's the beginnings of the database that we're using today. Um I I thought that was kind of neat. Uh it didn't include SQL right out of the box. Uh that was something that got added a few years later, and that's when they changed the project named up Postgres QL. That's the official name of what I use uh these days. Um in the mid-70s, though, before Michael Stonebreaker did his thing, uh IBM created a database called System R. That was the first implementation of SQL, the structured query language. Um so that's that's important. And I know a lot of people are scared of SQL because I I believe because they don't understand it. Um people shy away from it. Um I I I sort of embrace it. I love the way SQL works. It I can I can write some crazy queries and get that kind of data out that I want. Uh, before SQL though, a lot of people were writing code and they were relying on ISAM and VSAM. You remember those terms? Wolf, you've run across that before, haven't you?

Wolf:

Uh-huh.

Jim:

ISAM uh index sequential access method. Uh there was files with indexes, uh, they were all separate files with different data in different files, and your program had to, for instance, read the sales header record out of one table. Uh, once it had the sales uh header record, then it could go read the details out of another table. And then programmatically, you would piece that data together and produce a work order or a sales order or an invoice or whatever you had. Um, it was a really common way to work uh if you did COBOL. And and I did a lot of COBOL uh back in the day, and it was all using ISAM. Uh VSAM is kind of similar as a virtual storage access method. They created some more access methods, so you could uh read um uh sequentially, uh, you could read indexed, you could read relative. So if you're on one record, you could get to the next record, uh, that kind of a thing. Um and that was kind of neat. Um, but but then you know let's talk about real databases, the kinds of databases that we use today. Um maybe a little hit uh uh a list of some of the old databases first, is the old databases first, and then we'll talk about the the the current ones. Uh but way back when there was IBM DB2. It was a big, big database that a lot of people used. Uh Informix uh database. Do you remember? Have you have you ever played with that? That was the first database.

Wolf:

Not Informex, but the very next one you're gonna do. Okay. If you know, yeah, okay. Well, I'll tell you when you get there.

Jim:

Yeah, okay. Uh Informix, that was the first real production level database that I used back in the late 80s. I wrote a uh uh uh ticket tracking system uh for the company I worked for and for another company. We we both shared it. Um and and that was for you know taking customer uh uh tickets and storing them in the in the database. Um and it used a language called uh Informex 4GL, um which we don't see much of these things anymore. Um the the uh 4GL is fourth generation language. Uh the whole idea was a whole language and database built together so that you had forms and reports and queries and data all in one package, um, so you could write your programs and it would access the data and show it on the screen or print it on paper or whatever. Uh informics was one of the big players in that world. Um uh for uh corporations anyway. Uh let's talk a little bit about the some of the smaller players though. Like like Wolf, you you said you used this D Base. Remember that in the uh I do remember it.

Wolf:

And the reason I remember it is because that word processor I talked about. Yeah. Uh my little company and Arpersoftworks, we got bought by Ashton Tate and moved out to California, and we were in the building with the D base guys, and we um we used D Base, um, learned all about it. It was interesting and new and um Wow. So I had close-up knowledge. That's pretty neat.

Jim:

Um that was uh like a 4GL, right? It had a programming language, it had a forms generation thing, it had a report generator, uh, of course, it had the database. Uh all that stuff built into one thing.

Wolf:

We actually um kind of have something today that is not unlike that. It isn't really one piece, but to a user, to a to the to the engineer, to the programmer, it acts like one piece. I'm dying to know what's Ruby on Rails.

Jim:

Yeah, okay.

SPEAKER_01:

Yeah.

Jim:

But Ruby on Rails doesn't really include the database, does it? Um you had to set up a database.

Wolf:

The whole point of Ruby on Rails is uh that it is the database and the ORM and all the access and the views and like that's the point of Ruby on Rails. That's why people love it. Because it solves that problem that everybody has.

Jim:

Well, that's neat. Um there were some variations to D Base, uh, and they were all really built on uh uh they were like clones of D Base. I think they even worked with D Base files, and that was uh Clipper. Remember that one? Uh Foxbase, Fox Pro. These are some tools I I've never used, but the next tool I used quite a bit, and that was uh Microsoft Access. That was um that was a whole environment. You had a database uh using the Jet database engine, you had the query language, the forms, um everything. Um, and you could write programs that interacted with the screen and saved the data in the database and allowed you to print reports. Um, I did quite a bit of work in that. Um, it was it was kind of neat. In fact, I was playing around with that and I wanted to access the data outside of Access. Uh wanted to get access to it right in a C program, uh, the data in the Access database. And I saw the announcement that Microsoft made about uh ODBC. Remember ODBC? It's still out there. Um, it was very, very early on, and I I I think this might have been predated um uh email even. I think I had to call a phone number because I wanted to find out what it was. I called the phone number, it was Microsoft in Seattle, and the guy was really interested to talk to me, and he didn't know how little I really knew. Uh he he he was all interested, and several days later a package showed up uh on my door, um, which was this thick printed manual, and it was it was not a nicely bound manual, it was like somebody printed it out and put a cover on it and mailed it to me, and uh uh uh some floppy disks with the code on it, and basically anything I needed to get started uh writing C programs to access a JET database or access database. And I was so somebody sent me this.

Wolf:

It's amazing you what you can get when you just ask.

Jim:

Yeah. I just asked, and I was floored that Microsoft was sending me this, didn't charge me anything for it. I think they were looking for people that had interest in this kind of thing and wanted to get it out there. And I I I'm still impressed to this day that that that happened. I thought it was pretty neat. Um anyway, another database that I think it's got its roots in the in the 80s, maybe, and I mentioned it earlier. Uh I misspoke. Progress, the progress database. That's another 4GL. Uh, and I mentioned that because um I have some friends that have done a lot of work in progress. In fact, uh big companies like Quicken Loans, which what are they now? Rocket Mortgage, they're a they're a progress shop, they're using progress uh in their back end, and I find that really, really interesting because it I know my friends like it. I I was never all that impressed with it. I did some work on it and and I I never really cared much for it, but it did have a full language with forms and reports and and all that kind of stuff, and it was pretty neat. Anyway, let's get into some modern database stuff. Um, there's a few terms that I that I'm I'm gonna talk about ahead of time before we talk about the databases themselves. And one of the one of the big ones is acid compliance. Have you seen this? A C I D.

Wolf:

Um I've seen it mentioned. I don't actually know what it means.

Jim:

What are the tasks? Um the letters A C I D. A is for atomicity. Uh transactions are atomic. You you do a transaction and it all works or it doesn't. And if it doesn't work, it's it's never partially committed. Uh when you commit a transaction, the whole transaction, uh, that data gets written to disk. It's part of your database. If anything fails along the way, uh you get a duplicate key violation or some other uh constraint violation, or there's a hardware failure, it won't leave your database in a in a half-written state. Uh that's atomicity. Um the C is for consistency. The data must be valid. You can you can set up constraints on your data. You can say that this field is a date field, and it better contain a valid date, or it's it's not gonna update, right? Um you can have triggers that fire when the when the when when the update happens. Uh a really big one for consistency is referential integrity. Remember, I talked about the example of the sales order where you've got the sales order header and then the detail lines. Uh well, with referential integrity, you would never be able to write a detail line if the header didn't exist. That's if you've set up your foreign keys properly, like you should.

Wolf:

Okay, so now you just said a new word. Yeah, a word that happens to be very important to me, a phrase, and that is foreign keys.

Jim:

Yeah. Well, okay, a foreign key is that the detail table in this case has a field in it as one of its columns uh that is a pointer to the master or the the uh the header table, uh sales order ID. So every detail record has uh as a foreign key setup uh that points to the uh uh header record. So if you're working on order number one, two, three, four, all your detail lines contain that field, uh contain a column with that value in it, one, two, three, four. And then you've add a header record with the ID of one, two, three, four. Uh you can't insert a row that uh points to a uh uh a header that doesn't exist, and you can't delete the header record if you have detail records pointing to it. That's reference referential integrity. It's hard for me to say, but I I I use it all the time. Um that's what what helps keep your data consistent. You won't have uh dangling uh uh detail lines out there with no header. Um it's really quite important. Um the I in ACID is isolation. Uh that is uh really it involves uh locking and concurrency. And here's a big one for you that's part of that MVCC, multiversion currency control. That's a whole uh we could spend hours talking about that. Um, that's this whole idea that when I start a transaction and I select some data for it and I start updating the data, I have a picture of what that data looks like throughout the entire uh processing of that transaction until I finally get to the point where I do my update and I commit my transaction. Uh that data looks the same to me. Other people outside of that transaction, they might query the database, they're not gonna see uh the partially uh uh complete um uh updates that I've done. Like I might in my transaction, I might do inserts and updates and deletes and stuff. But people outside of that transaction, other processes, they don't have any idea that that's going on. They they won't see it at all. They're isolated from me and I'm isolated from them. Until I commit that database, uh that that uh transaction, uh they won't see any of that. Now, once I commit it, if they do a select, they're gonna get the updated database, uh the updated data. Um so that's that's your isolation. And then finally, the D in asset is durability. Uh, once a transaction has been committed, it'll remain committed even in the case of system failure. Um that's really important. When you write data to the database, you want to make sure it's there. Um, if a system uh were to fail, uh you want to make sure that the database management system takes care of you. And uh while the data might not be in the database tables, what the what they do is they write it to a a wall file, a write-ahead log. So this this file is a list of all the transactions that happened on the system. Every single uh insert, update, delete, uh, even the DDL stuff, the the database definition language stuff, like create table.

Wolf:

And it has log in its name, but it's not the kind of log you're thinking of.

Jim:

It's not a log you can go look at. Um it it gets written to the log first before it it does the actual updating of the tables. Um and when it does that write, it does a synchronous write or a uh uh an F sync uh on the on the uh table uh on the file on disk to make sure it's written so that you you don't lose data. If you uh die in a in a crash, when you bring the database back up, it will look at the data and it'll look at the wall file and it'll make sure that all of the things in the wall file have been applied to the database. And if they haven't, it will apply those for you. And it may take a few seconds, it it may take minutes, um, depending on your system and how large your database is. But your data uh has durability, it will be there. And and you know, in my world, I'm I'm in the healthcare industry. Uh when we see a patient and we update their their medical records, we want to make sure that data is there. Uh we need we need confidence that it's there. So that's the durability part. So that's acid. Um so now, Wolf, you know what acid is, right?

Wolf:

We and with a lot of the things you just said um sound like the things that Git brought to the table over CVS. Like, for instance, isolation uh and atomicity. Uh a commit in Git might change 50 files, but when you're looking at the server, somebody else who's not you, even if you're in the middle of a commit and that person isn't, they're either gonna see exactly what it was before your commit, or they're gonna see Exactly what it was after your commit. Right. They're not going to get halfway three of the 50 files, which with CVS, they could have gotten just three of the 50 files.

Jim:

And notice notice how git uses the word commit. Just like Databas. Right? Yep. You do git add, and that's uh like that's like an insert or an update in SQL or a delete, and you do commit and it it commits that data to the to the uh to the you know to the git uh file store, whatever that is. Um another term uh I'm gonna talk very briefly about, and that is uh you've probably seen OLTP and OLAP, right? Most of the databases, in fact, all the databases that I ever work with are OLTP, online transaction processing. Uh keyword there is transaction, uh, and that's that's what makes them uh uh ACID compliant. You you write data in an OLTP database and it's gonna be uh uh uh atomic and consistent and isolated and durable. Um the other one though is OLAP, O L A P. That's online analytical processing. That's what you do uh for um uh if you want to do uh data analysis. You're not doing much in the way of updating that database. You just have a pile of data and you want to analyze it using um whatever tools you have. Um uh that would be an OLAP database. Those are typically very, very, very fast at reading and probably quite slow at writing. Uh, but what you're interested in is the reading. You load up this database, now you just start analyzing the data. That's what an OLAP database is for. Um, and here's maybe the biggest uh uh term we're gonna talk about, and that is SQL, structured query language. That's the whole language, uh including select and insert and update and delete and uh begin transaction and commit and or begin work rather that starts the transaction and commit and and all those rollback. If you decide halfway through updating uh you you don't want to continue, uh you issue a rollback statement and it'll undo everything you've done and nobody will be the wiser. It's just it's like it never happened. Um that's SQL. Um stored procedures. Um that's an interesting one, and that's that's really kind of why I picked Postgres over some of the others uh back when I was choosing it back in the early 2000s. Uh stored procedures are the ability to write little snippets of code and sometimes big snippets of code and store those in the database and call them as functions on your data. Like I've got stored procedures to do some fairly simple things like uh format a phone number. Uh you know, there's an international standard for phone numbers. Uh there's a number that goes along with it, I forget what it is. But a phone number would be something like plus the plus character, a country code in the US it's one, and then an area code and a uh uh uh the seven-digit phone number. Um so I I've got a function that will take a phone number and and print it that way for me. Uh I've got other things for formatting dates and formatting names. I've got a lot of stored procedures that I use. And I just call those in my select list. Like I'll say select um uh format phone number, and then in parentheses, the the phone number column. And then the output of that is going to be a nicely formatted phone number. So let's store.

Wolf:

This immediately gives me a question. Yep. Um and and that is um I love the fact that uh at least with respect to databases that understand SQL, SQL is roughly the same for all of those. Um pretty close.

Jim:

There's little standard body that they all try to comply with, but they all have their own extensions to the language as well, including Postgres. They they've got their own Postgres isms. But stored procedures Those are all over the place. Right?

Wolf:

Yeah, what the what the hell's going on there?

Jim:

Well, there's no real standard to that, I don't think. In fact, there might be a standard, but Postgres has their language, uh PLPG SQL. Uh SQL Server's got uh transact SQL. MySQL has stored procedure language, they don't give it a name. Uh the thing that I find interesting though, at least Postgres and MySQL, um, I think MySQL does this. They allow you to uh use other languages for your stored procedures. Postgres lets you write a stored procedure in Python if you enable that, if you load that language extension.

Wolf:

I like it. That's yeah, you like it already, don't you?

Jim:

Uh I I can write stored procedures procedures in Perl if I want. Um, I I've I've opted never to do that. I just use the tool that they give me. I don't want to go embedding Perl into my database. Um it doesn't seem right to me. Uh and there's a lot of people that say you should never include business logic in your stored procedures. I I'm not sure I agree with that. I think the one thing about stored procedures is it lets you um um set a constraint where the data has to uh uh fit a certain thing. Uh and the stored procedure is that test of of uh uh of the fitness uh which sounds like business logic. It does. It does. Now some people would would implement entire businesses in stored procedures. I I've not gone that far, but I do implement some constraints uh it in there, and I do I do uh stored procedures that like well, let me give you an ex a simple example. Uh every one of our c of our tables there has a created and an updated timestamp in it. So I've got a really simple stored procedure. It's a trigger. I'll talk about that in a sec. But every time I do an insert or an update on a table, it updates those columns for me. An insert's gonna uh gonna fill in the created and the updated column. An update isn't gonna touch the created column, but it's gonna fill in the uh the updated column so that anytime anybody updates data in the database, we know when that row is created and we know uh when it was last updated. That's a stored procedure. Uh I've got others that you you you insert into a table and it might take that data and insert it into other tables, like a log or something. That's closer to business logic.

Wolf:

Um The thing that uh I feel is great about stored procedures is they're sort of three kingdoms. There's the database machine, there's the client machine, and there's the pipe in between the two. And uh the database machine is probably big and fast. And the client machine probably isn't big and fast. And the pipe is the skinniest part of the whole system. Yeah. So anytime you can not send data across the pipe and you can attack that data with the biggest, fastest machine that's available, which by the way, did I say that was the database machine? Because it is. Yeah. Um, that seems better to me. So if there's a place where you can do work and not send the data to the client to do the work, that can be a win.

Jim:

It it could be argued both ways, I think, because if you have a thousand users connecting to the database and they're all pushing data to the database, and then it's causing lots of processing to happen on the database, besides the normal write the data to a file on disk. Um, that might not be a great thing. Yeah, sure, it's a big powerful machine, but a thousand users pushing their data at it and expecting that machine to do all kinds of complex logic, maybe not the best idea when you could just let the clients do that, right?

Wolf:

Yeah, I do hear you. I think the reason my perspective is slightly different from yours is in my world, uh in my database, there's not nearly as much writing.

Jim:

Um, doing a lot of reading and writing, and and and I have to be careful of the the load that I put on the database server. We don't have a problem with it. We do have a big, really fast machine uh for the database, and it performs really, really well. But I I'm I'm uh I recognize the fact that I don't want I don't want to put too much non-data uh non non-file writing stuff in in there. Um but here's another really good reason.

Wolf:

We're both thinking about the same thing. Yeah. I mean, even though we have different constraints, the fact that we are thinking about this problem, that's important.

Jim:

Yeah, and I think I I'm fine with putting some some business logic in there because think about it, you've got a database server and it's it's got the data. It's the one thing in common. You you might have lots of programs out there, lots of clients out there that could be written in all kinds of different languages. Um and the database is where you get to impose the strict checking of the data. Um you know, somebody can write a Perl program to access the database, somebody could write a Python program. Somebody could sit there at PSQL, the command line tool for interacting with the database, and they could type in the SQL to insert a row into this table. All of those things, uh, the one thing they have in common is the database. And if the database is in charge of making sure what data is correct and what data is not, uh I like that idea.

Wolf:

Yeah, in a video game, um there's almost always a server, in a multiplayer one, there's almost always a server. The server has to decide, oh yeah, you shot the guy, he's dead. If the client decides that it's just an invitation to cheating.

SPEAKER_01:

Right.

Jim:

Good point. Uh so I said I'd mention triggers. Triggers are a way to cause code to run when something happens. Think of it as an event. Like an event happens when you insert a row into a table, or when you update a row or you delete a row. Those are all things that can cause a trigger to fire. And all that means is it runs that attached code when that happens. Uh, and you can have triggers that are either row level or statement level. So uh for instance, if I if I uh update a thousand rows in my table, a row level trigger will fire once for each row that gets updated. If I have a statement level trigger, that trigger will run once, no matter how many rows I update. One row, a million rows, it's only gonna fire that thing once. So, you know, think about what you're trying to do in your stored procedure, whether you want row level or statement level. Um next thing that I think is really important. I I hold this one near and dear to my heart, and that is replication. I I love replication. I've talked lots about replicate replication. I've given talks at our user group on it, uh, I've talked to people about it. I can't I can't stress this enough about how important replication is. And that is the ability to have your main database uh with all your data in it, and then have a replica someplace else. Preferably, uh well, I I like having one on site and I like having one remotely. So this is your disaster recovery plan right here. If something were to happen to the main office, uh that alternate site, the remote site, has a replica of the database, and you're good to go. You haven't lost any data. Um, this is not in place of a backup. Backups are still really, really important, and there's always mechanisms to back up a database. And I highly suggest you do that. In fact, I I would really, really encourage that. Uh, but replication I just love because I'm not gonna get a call uh that uh the data, the the the server crashed and the data's gone. You know, what are we gonna do? And a backup is only as good as how often you do it and and how good your backup is. Uh with a replica, though, it's it's at the transaction level. Every transaction that gets committed gets pushed out to the replica. And and that's just that gives me the ability to sleep at night, knowing that I've got a I've got multiple copies of that data out there. Um it's really useful. And there's different kinds of uh replication. Uh the one that I use is Master Slave. I've got one writer that's the main node, um, the main database server. All the data gets written there. And then I've got slave uh replicas that are uh I've got uh in fact, I've got my big customer, I've got two local replicas, and I've got one remote replica. Um that's a master slave. Uh all I can do on those remote replicas is read the database. I can't write to it because there's no mechanism to push the data back. The other type of replication is multi-master. That's where you have two sites or more. You might have 50 sites, and they're all able to you're able to write data to any one of them, and that data gets pushed to all the others.

Wolf:

Um that's which is exactly like our episode about how the blockchain works.

Jim:

Yeah. Yeah, the the replication or the they didn't use the term replication for that.

Wolf:

Uh it was just they didn't, but the idea is that um each blockchain is a thing, and uh when you m make a write to a blockchain that turns out to be the winner, it goes everywhere. Every one of them is just as good as any other.

Jim:

Yeah, yeah. Um and that's neat. I I don't do it that way. I haven't really needed to. I I I'm not I don't have thousands or tens of thousands of writers. I've got a fairly small, constrained set of uh users that can actually write to the database. So write into the master database, that's that's what I need. And the replicas are purely for for uh safety for the durability part of what I'm doing. Um but if we're talking about replicas, we got to talk about the CAP theorem. That's CAP. And of course, it's another acronym that it means something. Um, if you have a distributed data store store, like a replicated database with multi-master capability where anybody can write to any database or uh and the data eventually gets pushed to all the others, uh caps is for three things. The C is for consistency, every read receives the most recent write or an error, no matter which node they connect to. Uh all the data is consistent everywhere. Uh availability. Every request received must result in a response. Um, you know, if you want the the the five nines uh uh reliability mark, you want to make sure your data is always available. Um that's the availability part of it. And then finally, you got partition tolerance. What do you do if the network goes down between the nodes? Right? Uh what when a network failure happens, uh updates to one date one node aren't going to get propagated to the other node, and and vice versa. So now you're no longer consistent.

Wolf:

This immediately makes me ask about a phrase that I have heard many times, um, which seems to relate uh uh to the very first thing you said, consistency, and to this partition tolerance thing, which I think is a horrible name, but like a very important concept. They wanted to and they wanted to create an acronym. I I don't like that. Um But the phrase that I want you to tell me about is eventual consistency. What what is that? Explain.

Jim:

Yeah, that sounds kind of bad, doesn't it? But that's if you have two nodes and and either it's a slow link or or the link goes away for some reason and network failure or or something, uh people are updating both and the data's not consistent, but the promise is it'll eventually be consistent, right? If you're if you're a company like Facebook and you're you're updating somebody's timeline, somebody's typing posts and stuff, if that if that post doesn't arrive at all the other nodes, and believe me, Facebook has a lot of nodes, uh, if that if that doesn't arrive at all the other nodes instantly, it's not that big a deal. Eventual consistency is good enough. Um there's other cases where it's it's not good enough. So the thing about the CAP theorem, uh one of the one of the parts about that I find interesting is um you can't guarantee all three all the time. All three of those things. The consistency, availability, and partition tolerance. You can have any two, but you can't guarantee all three. Uh because if there's a network failure, you're gonna lose out on the partition tolerance. Uh if if the uh a node goes down, uh you're not gonna have availability for that node. Um if you know consistency. Um if somebody's updating one database in one place and it doesn't get to the other place, you're not consistent. So you can't have all three at the same time. You can have any two. You can have consistency and availability, you can have availability and fault tolerance or partition tolerance. You just can't have all three at the same time. You can't guarantee it. In normal operation, you're gonna get all three, right? Uh if there's no errors, you you're gonna be just fine. Um, so let's move on. This is uh, of course, taking longer than I expected. So sorry for that. There's still a few things I want to cover. Uh ORMs. I know, Wolf, you've got some thoughts about object relational mappers. Oh my god. Tell us what that is, first of all.

Wolf:

An object relational mapper is um this idea that SQL is too hard, and that probably however you deal with whatever the native thing is inside your language, so in Python objects, regular old objects, or I guess it's called objects in a lot of different languages, um, that when you connect to a database, instead of you executing SQL and getting back rows and looking into the columns of those rows to see what things you have, what you ought to get back, says the ORM, is things that look just like objects. Um and so I have a sales order and I can ask for the products that are in it. And and those would be, you know, child, whatever, it doesn't matter. The point is you don't know that it's SQL underneath, but it is, and that means you're not writing the queries, some piece of code is, and it also means that probably when you ask for certain things, like for instance, let's say you're using in Python a thing called SQL Alchemy. Um SQL Alchemy defines a thing called a relationship. A relationship looks exactly like a property. Like um maybe you have children, you're a parent and you've got children. Well, children isn't an actual field in you, it's not a column. Um, it didn't get read. What happens is when you ask for children, SQL Alchemy does another query right then, behind your back, totally opaque. You didn't even know it. Please, Jim, tell us about the N plus one problem.

Jim:

We covered that in another episode. Uh it it gets out of control. And the fact that uh uh the ORM is is behaving that way um doesn't give you a chance to optimize the queries the way you might like to. I don't use an RM an ORM. I I've always just written my own SQL and embedded it right in my programs. And you know, some people don't like that, but I have the opportunity now to write queries that are fast. I can do joins in my queries and get back all the data that I know I'm gonna need for whatever it is I'm doing. An ORM sort of sidesteps that and it doesn't give you the chance to tune it.

Wolf:

Um worse, um there's a zillion different ORMs, uh, and not just one per language, not one for Perl and one for Python. There's you know, 10 for Python and 20 for who knows? Yeah, there's only one SQL. If you write it in SQL, you're gonna see the right thing everywhere you look. So I have an opinion, and now you've heard it.

Jim:

Well, I hope I'm not talking down to people to say uh I think ORMs are good for people who don't want to learn or don't know SQL. Um maybe, maybe it makes sense in a lot of cases. Uh uh, it's just not the way I'm going. Uh if if if if you the listener has any feedback on that, please let us know. You might have good reasons why you'd want to use an ORM. Uh, it's just not the direction that I've gone. Uh so let's move on. Um, let's talk a little bit. I haven't even mentioned this thing yet, but we talked about SQL. Let's talk a little bit about uh No SQL. Um No SQL seems to be a movement, right? Um at least it was, I don't know, ten years ago. I don't hear s that much about it anymore. I think it's sort sort of there. And one of the things I was surprised to learn was No SQL doesn't mean no SQL. It means not only SQL. Most of these NoSQL databases provide an SQL interface for you to query the data. So let's talk a little bit about what No SQL is to begin with. You know, we talk about relational databases where you got rows and columns and tables and all that kind of stuff, and there's this relationship between the different tables. NoSQL is really more about storing things that maybe don't have that kind of structure. Think of it as the ones I see all the time store a JSON object. That's what you give it. And you might give it a key or something so that it can find it quickly. But it's just like a JSON document. And you know, if you're writing a uh a social media site where you just have um a profile for a user that has a whole bunch of fields in it. You don't want to lay out uh a whole uh relational database schema to handle that. You just store it.

Wolf:

Especially if they're variable, like if they might have more or less information.

Jim:

Yeah, variable. Uh and you want to add information later, you you know, you come up with new things you want to track. Uh you don't have to do the old database uh schema change to handle that. It's just more fields in the in the uh the JSON document.

Wolf:

Um Yeah, at um at SourceForge, when we replaced the old Forge with the new Forge, I was working on the new Forge. Um we used Mongo for that, and uh that's that's a No SQL database. And and it's JSON, just like you just said.

Jim:

Yeah, there was there were two of them uh kind of coming up in the in the mid-2000s, uh MongoDB and CouchDB. And CouchDB is still out there, but it seems like everybody's using MongoDB. I've not used either one. Um I I just choose to use Postgres. Uh, but those are two very, very popular ones. Now, some of the uh in fact all of the the web hosting sites, the the the cloud service providers, uh Amazon and Azure and those, um they have their own uh No SQL databases too. And I think in many cases they're built on top of Mongo or or CouchDB. Um but it's a it's a really interesting way to store data. Uh and they've got all kinds of really neat indexing strategies and stuff, and you can use SQL to query it. So you know it might look like SQL when you're all done, so it's kind of neat. I I'm just I I I just don't don't use that. Um but before we uh before we end this little section, I didn't really mention some of the SQL database. Certainly I talked about Postgres. I I could talk about that all day long, and there's others, MySQL. In fact, way back in the early 2000s when I chose Postgres, I looked at MySQL and I looked at the list of features that Postgres had, and it supported uh transactions and triggers and um you know the things that I knew I needed. And MySQL supported those too. But when I looked deeper, they did this weird thing where MySQL had a couple of different engines for the back end, and um some of the uh back ends, I I don't even remember the names of them, but there was one that supported transactions, but it didn't support triggers, and the other one supported triggers, but didn't support transactions. So you if if you weren't careful, you'd get pretty deep into MySQL uh into the world and then realize well you can't do the the the transactions like you wanted to do.

Wolf:

When I looked at the and there were no stored procedures and there was no replication.

Jim:

Yeah, yeah, and and uh but uh uh I think there were stored procedures if you use the right back end. And and I thought that was kind of nonsense. What they had, I think they had a checklist of does MySQL support this feature? Check it does, check it does. But the checklist wasn't clear about what you had to do and the fact that some of them were were not compatible with others. They were you you couldn't have all of them at the same time. In Postgres, you could, and that seemed like the the clear winner to me. And I stuck with it, and I'm still very, very happy with that with that choice. Um, there are other databases, certainly. Um you mentioned SQL Lite. Um, I'll mention it again. That's a that's a great, great little database. Um and it's in everything.

Wolf:

It's in it might except for Excel, it might be the most popular database in the world.

Jim:

Yeah, yeah. It's it's everywhere. It's in your phone. Whether it's an iOS phone or an Android phone, you've got uh SQL Lite in there. Um there's something like I think the number was something like six billion installations of SQL Lite in the world. That's a lot, you know. You might have many of them already, you know. Your watch probably has it for all I've done.

Wolf:

Everything has it.

Jim:

Yeah. Um that's kind of neat. And uh Wolf turned me onto this thing, uh Terso. It's a Rust implementation of SQL Lite.

Wolf:

Uh which I tried to use uh when I was I added a database of the things that you've installed. I have a dot file manager named.NET.

SPEAKER_01:

Yeah, yeah.

Wolf:

And I have been growing it, and I added to it a database of what has been installed so you can more easily uninstall and check health and things like that. And I thought, well, I'll I'll start with Terso. Um and I ran into instant problems with that, uh, just installing the package, uh, something having to do with whether it was synchronous or asynchronous. It it uses an SQLite database underneath a file. The date the it's compatible file-wise. Yeah. So I just moved to SQL Lite, but what what does that even mean? Is this an appropriate time to ask you what asynchronous versus synchronous means? Or are we too short on time? You tell me.

Jim:

Let's let's keep going, right? Uh asynchronous versus synchronous. It has there's three meanings that I know of. Uh uh one of them is uh whether the database writes data to the disk uh asynchronously, so it can sort of dispatch the write and come back and continue on with whatever you were doing, assuming that the database is gonna that that the server is going to update or the operating system is gonna actually write that data to disk. It's not my problem. It says it'll do it, so it does it. That's asynchronous. Synchronous would be where you do a write and you wait for the response. You wait for the write to return. Um, and there's a setting in Postgres to enforce that uh Fsync writes or something like that, uh, so that every write to the to the disk is confirmed, written, before you can move on, before that that query returns, um, the update or the insert query. Um that's that's one way to talk about synchronous versus asynchronous. Uh another way is if you're talking about replication, uh, you can set up synchronous replicas where if I if I write my data, it's it's one thing to have confirmation that it's been written to the disk, but now you can also get confirmation that the uh remote replica has written it to the disk. And you you can't continue on until some number of your replicas have confirmed that they have written the data to the disk. Uh, or or you can uh I I don't set my replicas up that way. I certainly turn on uh synchronous rights for the local store, but for my replicas, I don't because the network latency uh is kind of bad. And and I know it's kind of like this eventually my replicas will be uh uh consistent. Um uh but then finally the third way to talk about uh synchronous versus asynchronous is uh the async await paradigm. Um you know, the thing you use in your language. Uh uh progress uh uh uh Python supports that, right? And um uh Python, JavaScript? Yeah. Of course, Perl doesn't, but that's all right. JavaScript certainly does, and that is you can call a function and either just let it go do its thing while you move on, or you can do an await on it and wait till that function returns. Uh the the functions have to be written that way. The library has to be written to allow that. And it wasn't that one of the reasons why you couldn't use Terso because it was doing that and it told you it needed it or something.

Wolf:

Um I I didn't wait around to diagnose it. It gave me trouble. It's and there and I instantly moved on to SQL Lite to investigate this problem later.

Jim:

Okay, yeah. One thing I noticed when I was examining the the features of Tirso was um there's a lot of things that aren't compatible. You know, there's a long list of features that SQL Lite has, and Terso does not implement everything. It it implemented it missed seemed to me like half of the things. And maybe they're not important to you, uh, but you know, to say it's compatible, you you gotta you gotta be careful with that. It it may or may not be. Right. One more thing, one more thing about SQL Lite. Um, I've listened to a podcast episode from the Co-Recursive Podcast, which is one of the best out there. If you get a chance to listen to that, it comes out monthly, I think. And uh Adam Gordon Bell just does a fabulous job with his interviews. They're like hour-long interviews, and he interviewed the guy that wrote uh SQL Lite, and it was it was a fascinating conversation about how he did it. He was he was doing a contracting job for the U.S. Navy at the time, and they needed something that was going to work, and and he wrote that, and it's pretty neat. I I've of course include a link to it in the show notes so you can go listen to that episode because it was really, really interesting. I highly advise listening to that.

Wolf:

I I guess this does give me one more question about SQL Lite. I I think I know the answer, but I'm I'm asking this one on behalf of the listener, and that is um SQL Lite's uh different than Postgres in this way. Postgres is running somewhere and you ask it questions. The question is in your code, but the action of answering that question is happening in some other process. SQL Lite's a library that's in you and it all happens inside you.

Jim:

Right. Well, inside your program. It's called in process, it's running as part of your your application. Uh, it doesn't send out a network request or through a Unix socket or anything like that to go ask the database manager to do something for you. It is your application is the database manager using that SQL Lite library. That's how it works. And also, uh, it's a single file database. Your entire database is stored in a single file in SQL Lite. Uh as compared to Postgres, every table is one or more files on disk. Uh the files I think are limited to a gigabyte in size. So if your database takes more space than that, it just creates another file and it just keeps on going. Um uh so if you look at a uh at the the database uh at the file layout for Postgres, there can be tens of thousands of files uh because of that. Every index is a file. I think every view, uh I'm not sure if views are a file or not, but the indexes are a file. Um anything you do is pretty much gonna create a file. Uh and the wall, the right-ahead log file, those are files. Um whereas SQLite it's it's a single file. Um so yeah, you know, there's a lot of other databases that are out there. Certainly you've all heard of these Oracle, um, uh Sybase, SQL Server, uh, DuckDB. We can talk about that for a second, but let's go back to Oracle for a second. I've never used Oracle, but in my research, I came across two pretty funny things. Uh their first customer was a CIA in 1977. And uh they they had a choice uh to develop for the CIA. They had two choices. They could either develop a database or a compiler for the PDP4. They flipped a coin and it came up database. Uh an actual coin? An actual coin. This is this is it's on the internet, so it must be true, right? They they flipped a coin and the coin came up. Uh, we're gonna do a database for the CIA. And that's what they did. And now look, uh, Larry Ellison is one of the richest men in the world, um, because Oracle charges so much for their database. Um uh and it was uh the company was founded by three people, uh Larry Ellison, um uh Bob Minor, and Ed Oates. Ed Oates came from IBM from back in the old IBM database days. Anyway, the three of them uh decided that Ellison was the worst programmer among the bunch, so he became the salesman. And look, you hear about Ellison all the time. You'd never hear about the other guys. So I guess the real money's in sales, right?

Wolf:

Mysteriously killed.

Jim:

No, probably not true. That's made up.

Wolf:

I I those are my that's a pretend thing.

Jim:

Right. All right, so uh SQL Server, that's from Microsoft. I've dabbled a little bit with that. I had to do some for a for a client project. I I didn't like it much, probably because I'm just not that familiar with it. But you know, it it's SQL. I could read the tables and do the stuff that I needed to do. Um DuckdB, there's a database. Wolf mentioned this one to me. That's an OLAP database. That's for analyzing data. Uh, it's a single file, there's no transactions, uh, it's not ACID compliant. You know, there's no guarantee that the data is going to be there when you write. In fact, you do very few writes. It's really all about reading, and it's incredibly fast at reading. So you use it for analyzing data.

Wolf:

And one great thing about DuckDB is it's not just about its own database files. You can point DuckDB at all kinds of different data structures and then just use SQL on those things. For instance, DuckDB can give you SQL access to your file system, to a pendant data frame, uh to CSV files, whatever you want. Um so it's kind of neat.

Jim:

Yeah, I know it's used by Facebook, Google, and Airbnb. Are you guys using it at work? Is that where you come across?

Wolf:

I'm not, but I am advocating that it's a thing we could use and might speed us up in places and or give us uh more consistency. Uh for instance, why should we use the uh special API for pandas data frames or polar's data frames when instead we could use SQL, which everybody knows.

Jim:

Right.

Wolf:

Except the people using ORMs.

Jim:

Right. So uh that's that's you know, that's kind of a roundup of of those databases. There's a lot of cloud services that offer databases. Certainly, Amazon offers um uh some No SQL databases, uh Simple DB, and I think Dynamo DB. Uh they've got their S3 scalable secure storage. That's that's a database. It's a key value store, but you can still think of it as a database. There's no SQL access to it, though. Uh Azure's got uh a couple of different choices for NoSQL, the Cosmos DB, MongoDB, Cassandra. Uh they've certainly you know Azure supports uh SQL Server, MySQL, Postgres, MariaDB. Um so those are all there. And I I've talked in the past about how I'm on Azure, and we were having Azure host the database for us. Uh Azure managed database. And I talked about the problems with that. I couldn't really get control of the data very well, so I moved off to my own self-hosted database. Uh, but Azure, you can do it either way. Uh Google, they've got their things, the Big Table and Firestore and Firebase and um a few others. Uh and of course they support MySQL and Postgres. One of the things you have to be careful of, I think all these database services, uh, they have kind of their own database that claims to be uh Postgres compatible or MySQL compatible. Those bother me because you know that whole compatible thing is, yeah, they're compatible until they're not. You know, they're compatible for some features, but probably the features I want, they're gonna fall down on. So I've stayed away from those. Um, there are some things that you wouldn't think of as a database. Wolf already talked about Git. Git's a database, right? You're you're storing the database. The data in this case is your source code.

Wolf:

It's a key value store, and the values are hunks of content, um, not named files. Like the name is in a different object, a tree object, which is a directory. Um, it's just a hunk of content that is inside some file, maybe multiple files, um, maybe files that used to exist and no longer exist in the current version. And the key is um the SHA-1 hash of uh that underlying content object. Um so a simple database, yes, but that's what it is.

Jim:

Sure. A CSV file. You could think of that as a database. It contains data, right? Uh probably the second most popular database in the world behind uh SQLite would be a spreadsheet, uh Microsoft Excel spreadsheet. Think of all the corporate data locked away in spreadsheets out there on people's desks uh that it may or may not get backed up. Umly the person actually using that computer really knows what's in it. And it might be important corporate data. Uh, I'm not saying spreadsheets are a great place to store your data, but it does happen.

Wolf:

Um I so I I don't want to uh accidentally waste time, but there is a database you didn't mention. Oh, it's open source. Yeah, it's called Firebird. Um, and a long, long, long time ago, uh and Firebird still exists today, but it it was it was around back then too, when we first decided to make a thing from Mozilla that was more user-focused, the first name that we thought of to call it was Firebird. Yeah, and that drew instant, not nice, instant aggression from the Firebird people. They were not nice at all. Really? Like, okay, this is a mistake. A problem happened, we need to resolve the problem. Um, but they weren't like, hey, we already had this name, let's figure out a way where we can they weren't like that at all. They were, you guys are jerks, and you're trying to steal from us, and you're bad people, and you're gonna go to a bad place, and you need to fix yourself right now, you arrogant asshole. Yeah, and this is I guess we're gonna have to beat that.

Jim:

Yeah, you were you were working at uh at um uh Mozilla. Mozilla. Yeah, that's right.

Wolf:

And it was a big deal.

Jim:

Yeah, yeah. All right, so uh let me let me tell you just uh I I've got to keep this brief, but I want to tell you a story about something I did with the database. You know, uh I I've been computing uh professionally since the mid-80s, and uh we had this system, it was all written in COBOL, and uh eventually it was Accu COBOL and the the database there, it's not a database, it was ISAM files. We did reads and writes and and updates, and every file, you know, the header file was a was a file, the detail file was a file, and we had to handle all this in program code of how to uh you know access a whole sales order, that kind of a thing. Um, and we started to migrate to the web. And of course, you're not gonna use COBOL when you do web development, uh, you're gonna use a uh some other language. In our case, we chose Perl for various reasons. I know Wolf wishes we had chosen Python, but this is before I knew by I knew Wolf. Had I known him, I definitely would have gone Python, I guarantee it. Anyway, we use Perl. And um uh so we had lots and lots of data tied up in these Accu COBOL ISAB files, and we we couldn't access them from outside COBOL. Uh we couldn't write a Perl program that could read and write that data and interact with it in any way. Uh, so the neat thing about Accu COBOL was it had a runtime, it was it was kind of like a uh a Java uh a Java type program where you compile the code down to bytecode and then you run it with a runtime. And the runtime we had uh the ability to link C uh uh objects into uh the runtime. And Perl is great at embedding in another system, especially if it's C. So we actually embedded the Perl interpreter in the COBOL runtime. So from COBOL we could call Perl, and that was really useful, and especially because in Perl I could interact with with Postgres, so that meant from COBOL I could read. Read and write Postgres uh database tables. Really, really useful. Um, and and uh this is back when when our friend Dave was working for me, and we wrote uh a conversion, we had over a million lines of code, and all those all those programs they were doing um uh uh you know, read from this table, write to a table. Uh you didn't have select and and that kind of stuff. You had cobalt verbs to interact with the data. We wrote some programs uh to sweep through that million lines of code and uh convert all of those uh COBOL read write update verbs into calls to subroutines that would behind the scene uh use a Postgres, it use an SQL command. And um it worked amazingly well. This allowed us to move all of our data out of those COBOL ISAM files into a Postgres database and still use all of our COBOL. And Wolf, I know you've said this many times. Why is it a bad idea to do a complete rewrite of your system all at once?

Wolf:

Uh it would be a giant surprise if you did a complete rewrite and finished. Yeah. But does that ever actually happen?

Jim:

So a little bit at a time. By doing this, uh, you know, the big important step was moving the data into a database, into a proper database. And we could have used any other database too, because Perl has has uh uh uh bindings to all the databases. So what we chose chose Postgres by moving all the data into Postgres, like we did it over a weekend for the client, and and that you know, the Monday they come in and now they're interacting with our system, but all the data is no longer in files on disk the way it always had been, it's now in a database. And much to our surprise, it ran faster than you know the old way. And it was it was fantastic. And what this meant now is we could start replacing pieces of functionality in our system with new programs, new web-based programs. And it you know, it accessed the same database. So now, you know, whether we're coding in COBOL or Pearl, we could write and read the same same data. It's fantastic. Uh, took about two years uh to complete the project where we finally threw away the last piece of COBOL. But let me tell you, it was absolutely worth it. And and the customer, you know, we kept delivering new functionality over time. We didn't have to say, we're not gonna be able to do anything for you at all for two years while we write this whole new system uh that may or may not work. So it really was great.

Wolf:

Uh so I know I know you are you are proud of lots of things that you've done in your life, um, but I have to say, of the different problems that you have solved where I know how you solved them, this is one of the most admirable uh solutions. Thank you. It was ever.

Jim:

I I I love doing it. And and and uh it worked out really well. And it I felt a little bad having to throw all that old COBOL code away because we didn't need it anymore. It meant now we're strictly in the database, but that was the end goal anyway, right? Uh so that all that uh linking of the Pearl runtime into the COBOL, we threw that all away. We didn't need it anymore. But it was really, really cool, and I love doing it. But look, I've been talking for way too long now. You know, we always estimate how long these episodes are gonna be, and I said I'd be happy if it was 50 minutes, and here we are at just over 90. So I think we've got to wrap this thing up. Uh so I I do have a couple of lines uh uh of summary that I want to talk about. First off, pick the right database for the job you're trying to accomplish, right? If it's an SQL database you need, definitely pick one. Pick the right one. If you if you go towards Postgres, great. If SQL Server is your answer because you're in a Windows world, great. Pick the right one and then understand it. Make sure you understand all the options that are available in that database. Like I mentioned in an option option in Postgres where we turned on uh synchronous writes. That's really important in a production environment. It may impact the performance a little bit, but it's worth it because your data is gonna be secure. Um uh something I didn't mention at all that is uh uh tuning your queries. Uh, there's a there's a uh feature in SQL, it's a standard part of SQL, it's called explain. You can do that in front of your select statement or in front of your query, and it'll tell you what the query planner figured out was gonna use, and it'll tell you if you use explain analyze, it'll tell you how long it's gonna take to run. And believe me, I've taken queries that take minutes and I've cut them down to milliseconds by looking at the output of explain analyze and realizing, oh yeah, it's doing a sequential scan of the table there. I better create an index on that column so that it doesn't have to do this sequential scan. Like I said, milliseconds versus minutes, huge, huge win. Um, and if you can set up replication, do it, it'll help you sleep at night. Uh and finally, like Wolf mentioned at the very beginning, when you look at it, all problems are database problems. So, yeah. I talked a lot. I hope you got something out of this. I could talk a lot longer about this stuff. There's a lot of things I didn't even cover, but uh, thanks for listening.

Wolf:

Wolf? Yeah. Uh first of all, uh, I want to say to everybody, thank you so much for listening. Uh, we love talking about this stuff, and we love talking to people who care what we're saying. Uh, and if you cared, and if you wanted us to say something different, or you saw something that or heard something that was wrong and you want to fix us, or if there was something that we said that uh struck a chord with you and you want to say, hey, that was right, send us feedback. Um so feedback at runtimearguments.fm um in the show notes. Uh you'll find uh individual email addresses for us and the website uh which links everything. Uh we're gonna have notes on all of the things that we talked about. I think we should add, just for Jim's story about migration, a link to the Ship of Theseus uh uh story. Um and if we include that link, you'll understand why Jim's story relates to that. Um I think that's basically I think that's basically all you need to know. Lots and lots of notes on this one. There's gonna be a transcript. Um and uh I hope you got something out of it. Um Jim is absolutely an expert in this. I do want to say one thing. Um I'm always trying to get better, and one thing I learned in the last uh 12 calendar months was the difference between like I learned about CTEs and using them in a query and why that is really good and easy to understand.

Jim:

Common label expressions.

Wolf:

Uh and then within the last maybe month or so, someone showed me that it can be that subqueries are faster than CTEs in in some cases. Yeah. Um so boy, I sure like CTEs, but I also like speed. Anyway, um thanks for listening, everybody, and uh send us send us feedback and ideas and everything you want. We'll talk to you soon. Uh bye from me, Jim. Yeah, thank you. Bye, everybody.

Jim:

And uh hey, join us for lunch sometime. I'm gonna include information about that in the show notes if you want to join us on a Saturday in the Ann Arbor area. So, yeah, thanks a lot, everybody. Bye bye.

Podcasts we love

Check out these other fine podcasts recommended by us, not an algorithm.

CoRecursive: Coding Stories Artwork

CoRecursive: Coding Stories

Adam Gordon Bell - Software Developer
Two's Complement Artwork

Two's Complement

Ben Rady and Matt Godbolt
Accidental Tech Podcast Artwork

Accidental Tech Podcast

Marco Arment, Casey Liss, John Siracusa
Python Bytes Artwork

Python Bytes

Michael Kennedy and Brian Okken
Talk Python To Me Artwork

Talk Python To Me

Michael Kennedy