Mystery AI Hype Theater 3000

Episode 8: The ChatGPT Awakens, January 20, 2023

August 04, 2023 | Emily M. Bender and Alex Hanna

New year, new hype? As the world gets swept up in the fervor over ChatGPT of late 2022, Emily and Alex give a deep sigh and begin to unpack the wave of fresh enthusiasm over large language models and the "chat" format specifically.

Plus, more fresh AI hell.

This episode was recorded on January 20, 2023.

Watch the video of this episode on PeerTube.

References:

Situating Search (Shah & Bender 2022) 

Related op-ed: https://iai.tv/articles/all-knowing-machines-are-a-fantasy-auid-2334

Piantadosi's thread showing ChatGPT writing a program to classify white males as good scientists

Find Anna Lauren Hoffman's publications (though not yet the one we were referring to) here: https://www.annaeveryday.com/publications

Sarah T. Roberts, Behind the Screen 

Karen Hao's AI Colonialism series 

Milagros Miceli: https://www.weizenbaum-institut.de/en/spezialseiten/persons-details/p/milagros-miceli/

Julian Posada: https://posada.website/

“This Isn’t Your Data, Friend”: Black Twitter as a Case Study on Research Ethics for Public Data (Klassen & Fiesler 2022) 

No Humans Here: Ethical Speculation on Public Data, Unintended Consequences, and the Limits of Institutional Review (Pater, Fiesler & Zimmer 2022) 

Casey Fiesler's publications: https://caseyfiesler.com/publications/
And TikTok: https://www.tiktok.com/@professorcasey

Where are human subjects in Big Data research? The emerging ethics divide. (Metcalf & Crawford 2016) 


You can check out future livestreams at https://twitch.tv/DAIR_Institute.


Follow us!

Emily

Alex

Music by Toby Menon.
Artwork by Naomi Pleasure-Park.
Production by Christie Taylor.

Transcript

ALEX: Welcome everyone!...to Mystery AI Hype Theater 3000, where we seek catharsis in this age of AI hype! We find the worst of it and pop it with the sharpest needles we can find.

EMILY: Along the way, we learn to always read the footnotes. And each time we think we’ve reached peak AI hype -- the summit of bullshit mountain -- we discover there’s worse to come.

I’m Emily M. Bender, a professor of linguistics at the University of Washington.

ALEX: And I’m Alex Hanna, director of research for the Distributed AI Research Institute.

This is episode 8, which we first recorded on January 20th of 20-23. This is the episode where we finally talk about…yes…Chat-GPT. 

EMILY: This was right as the surge of public interest in ChatGPT had really picked up steam. 

Having been dealing with misunderstandings about large language models for years, we were feeling a bit exhausted seeing hype now coming from all corners.

ALEX: So! We came into the New Year like a wrecking ball…of bullshit detection.

ALEX HANNA: All right. 

EMILY M. BENDER: All right. 

ALEX HANNA: Hey welcome to Mystery AI Hype Theater 3000! My name is Alex Hanna. I'm the Director of Research over at the Distributed AI Research Institute. Emily, who are you?

EMILY M. BENDER: Hi! I'm Emily M. Bender I am a professor of linguistics at the University of Washington. uh Happy hypey New Year Alex! 

ALEX HANNA: Happy hypey New Year! 2023 is already proving to be very very hype-tastic 

EMILY M. BENDER: Hype-tastic indeed. 

ALEX HANNA: I mean it's just it the hype just it it keeps coming and it don't stop coming, you know, it's it's it just keeps on going. 

And uh I mean the cool thing about this is that we're gonna um make this into a podcast um so since we, this is a pretty long format as it is um it really lends itself to a podcasting kind of situation. So uh that means you know you either if you're watching over on Twitch or you're watching over on PeerTube um you'll see us that's that's our favorite way for you to watch, uh of course, but you can also hopefully soon, and we'll let you know where, um you can catch uh um your favorite episodes of Mystery AI Hype Theater 3000 uh over on your favorite podcasting app. So future details when we start doing that. 

EMILY M. BENDER: I'm super excited and now I'm imagining somebody who's in the future listened to episodes one through seven as podcasts and then here they hear us in episode 8 saying the podcast is coming!

ALEX HANNA: I know right. 

EMILY M. BENDER: It's time traveling. 

ALEX HANNA: So if you've listened from one to seven and you're like what are they what were they actually looking at uh now we're gonna do be really much more intentional about actually going ahead and describing what we're looking at, reading all the different things, uh giving giving visual descriptions of everything. Yeah.  

EMILY M. BENDER: That's the hope. 

ALEX HANNA: Yeah. 

EMILY M. BENDER: All right. 

ALEX HANNA: That said, share screen! What is, we are time time you know time to type. Time to take down the hype. 

EMILY M. BENDER: Okay here are the I think the two things that I intend to share. So we have to talk about chat GPT, because it's been such the thing for the last couple months. 

ALEX HANNA: There's so much ChatGPT three or ChatGPT sorry yes. 

EMILY M. BENDER: And you know one of the things about it is to those of us who have been in the trenches already sort of dealing with hype around GPT3 um ChatGPT was just kind of like more of the same? 

ALEX HANNA: Yeah.

EMILY M. BENDER: And so this like overwhelming response to it was a little bit hard to relate to I think and I was talking about it yesterday um with someone and got the sense that part of it is, and this was coming mostly from my interlocutor, um that with GPT-3 only a few people had access to it and so the hype was like people talking about what it could do because they had seen it or heard reports. But ChatGPT, OpenAI just put it out there for everybody to generate their hype from basically. Like that's they've done some I have to say brilliant marketing basically by doing that. 

ALEX HANNA: Yeah I think that's I think that's definitely part of it and I mean they've done this kind of interventions and I'd be curious to know what kind of kind of press campaign they've done. And allowing this kind of access and I think you know like I want to say there's probably also a way in which the affordances of the interface also lent itself to that. Whereas GPT itself GPT3 was more of you know uh sentence completer kind of situation, they've now done it in a way where it's you know they they've pulled that the ELIZA trick on it right? 

They've made it interactive um yeah so it seems like this kind of entity that is that you can interact with and that I think has like given a little bit more of a a handle on this kind of thing, of it being this kind of like technology that uh can answer questions and do all these amazing things yeah. 

EMILY M. BENDER: One of the things that I personally appreciate about the way they designed the interface is that something about it leads people to always share um things that look the same, like the the sort of uh graphical layout of it, probably as screen caps probably without alt text. Like that's not good. 

But what I like about it is I can instantly tell when it's one of those and so I know not to waste my time reading it, because I don't care to consume synthetic text like– 

ALEX HANNA: Right exactly and it's it's it's like that old Twitter meme. I saw somebody's post this and I I had to retweet it automatically but it was that Twitter format which was um I'm not reading all that and then the second one is but I'm happy for you or I'm sorry that happened and someone had realized I'm not reading all your ChatGPT-3 screenshots, but the second.

And that's how I feel about all these things I don't really care about that. I mean the things I do care about are looking at the weird ways it breaks. But also you know these are kind of it's definitely breaking in ways that we expect it to and have discussed on this on this podcast before. Yes. 

EMILY M. BENDER: Yeah. So speaking of the hype and we should get into the into our artifacts here, but I just wanted to mention another aspect of the marketing is that people are generally speaking only sharing the ones that are um either seemingly really good or seemingly really bad, either bad wrong or bad you know bigoted, which is another kind of wrong. 

ALEX HANNA: Right right.

EMILY M. BENDER: Um and so you don't it's very cherry-picked and so the way this process is working is they basically OpenAI has millions of people cherry picking examples for them. 

ALEX HANNA: Exactly. 

EMILY M. BENDER: But sometimes that goes off the rails. So I I posted on Twitter saying, people who are playing with this, why is it interesting? Like what are you what are you getting out of it um and what do you like what are you learning about the world, what do you have to assume about this technology in order for that to actually be learning about the world. 

ALEX HANNA: Yeah. 

EMILY M. BENDER: One of the answers I got was someone saying well I use it for my second language studies. And this was a person who speaks English studying Mandarin. And they um claimed that it could help them edit their Mandarin, and I'm like are you really in a position to know? But the most hilarious thing– 

ALEX HANNA: Yeah that's real that's a questionable usage. What are you going to be saying yeah. 

EMILY M. BENDER: Right but then one of the things was they they showed a screenshot of asking it and this was a positive use case like this is a good thing according to the person who was responding to me, using it to create a mnemonic to remember that this particular character which I eventually learned from somebody is pronounced ni4 is in the fourth tone. So it said something like help me write a mnemonic to remember that ni4 is in the fourth tone. And so the answer was um, "A good mnemonic to remember that ni4 is in the fourth tone is the phrase 'ni is the fourth', because the first syllable of 'ni is the' sounds like fourth tone and the first and the last syllable of fourth sounds like fourth tone." 

It's like it was an utterly awful mnemonic, but this person on Twitter was saying, "See? It's helpful." 

ALEX HANNA: Yeah oh my gosh. That's that's pretty ridiculous.

EMILY M. BENDER: I have to say, my son pointed out that it's an  effective mnemonic because this was not a character that I knew and I will remember forever that it's in the fourth tone.  

ALEX HANNA: But you don't know anything else about the language. 

EMILY M. BENDER: Well I've studied Mandarin but I don't think- I know  nothing else about this word it's just that– 

ALEX HANNA: Yeah yeah yeah. 

EMILY M. BENDER: Okay okay so– 

ALEX HANNA: Um oh hold on I'm trying to figure out there's like uh there's spam in the ma in the thing but I don't actually know how to moderate this.

EMILY M. BENDER: We've got spam in our chat?

ALEX HANNA: I know I think that means we've actually made it, but I don't actually know how to delete this on this. Uh anyone on the stream who could actually- Oh thank thanks Jeremy. um And there's a few things in the chat um that I'll read off. Binary Destroyer says apparently according to a screenshot I saw AAVE, African-American Vernacular English, breaks ChatGPT.

Yeah and I think I also saw another one where someone was asking it I think Ruth Starkman, kind of a friend of the pod, was saying "Write something in the style of of AAVE and it was just kind of nonsense, as well and uh– 

EMILY M. BENDER: Super cringe and like a couple of gestures towards AAVE but no. 

ALEX HANNA: Very cringe and also knowing that AAVE itself has different dialects depending on like you know you know where in the country. 

EMILY M. BENDER: Yeah. 

ALEX HANNA: It's being spoken. Uh Jeremy Trochee–Trochee says uh ChatGPT is definitely a lot more interactive. Some people can uh pull- I don't ever know how to say this name- Pygmalion themselves pretty easily especially if they're not comfortable with the underlying tech. Yeah so basically you don't know what's happening. 

EMILY M. BENDER: Yeah yeah so we need to we need to be out here deflating the hype because this is this is  the thing right? 

ALEX HANNA: There's so much there. 

EMILY M. BENDER: You create this thing that looks so real it's so easy to get taken in and honestly the people creating it need to be designing to mitigate that, and they're not. They're like leaning in. There's a little bit so there's some things I've seen in the screen caps where um ChatGPT will refuse to answer certain questions. It'll say you know I'm only a large language model so I don't whatever but it's still saying "I", right? 

ALEX HANNA: Yeah yeah exactly it's it's it's it's it's it's saying I it's granting it's granting agency to these things. 

EMILY M. BENDER: Yeah I think so anyway. So all right. 

ALEX HANNA: All right.

EMILY M. BENDER: So all this hype um and the first article that I wanted to look at here is from The Washington Post. This one- I scrolled down because this animation was bugging me. It's from December 10th. And I'd say Washington Post and Nitasha Tiku in particular um usually do really good reporting. 

ALEX HANNA: Yeah. 

EMILY M. BENDER: Comparatively on this technology so– 

ALEX HANNA: I will I'd say all these journalists do Drew Harwell and Will Oremus and I mean I've known Will since he was over at uh I think um that that publication that was over on Medium and then Medium kind of had a publisher meltdown. But anyways yeah these are all pretty respected tech journalists. 

EMILY M. BENDER: Yeah so I I'm I was surprised to see this because because they usually do a very good job. Not only about like not just sort of platforming hype but also um Nitasha Tiku in particular I think will zoom in on how power is being wielded as opposed to like you know the sort of- you know pull back the curtain. So uh so header, the headline: "Stumbling with their words, some people let AI do the talking" and then subhead "The latest AI sensation ChatGPT is easy to talk to, bad at math and often deceptively confidently wrong. Some people are finding real world value in it anyway."

And they start with this um story of a small business owner with dyslexia who was concerned about his emails to new clients um who- This is actually GPT-3 not ChatGPT. So one of one of his clients helped him use Chat sorry GPT-3 to rephrase messages into something that is in a more professional style. And the small business owner is very pleased with it. And like okay good. Like especially because this sounds like a use case where um the person can verify, does that say what I want it to say? 

And it's not um we're not talking like oh it's using a language I don't fully understand so I'm not really sure right? And they aren't long documents probably, probably pretty easy to verify. So okay fine. So that's a nice heartwarming story to start with. It's incidentally GPT-3 not ChatGPT. um and then we jump over to this hype here. So: "A machine that talks like a person has long been a science fiction fantasy and in the decades since the first chatbot was created in 1966 developers have worked to build an AI that normal people could use to communicate with and understand the world. Now with ChatGPT etc the idea is closer than ever to reality. 

ALEX HANNA: Oh dear. 

EMILY M. BENDER: And it's like wait no... So you want to have a crack at that? 

ALEX HANNA: I mean yeah I mean so I mean we're thinking about this and this is I mean I love the I do like that they're talking about this kind of idea of science fiction and the kind of the kind of imaginaries of what you're thinking about right? And and so the idea that you're going to have something where you know there's some kind of a thing that you can talk to that's going to have some kind of a deep knowledge base, that does get kind of personified, and that's been you know a feature of everything from you know Star Trek I don't think they have a I don't think they have a chatbot in in the in in the original series but they do have it in The Next Generation and basically everything where it is addressed as Computer but it still has you know a female voice. 

And then you know in the kind of ways that it was developed via via ELIZA and and other things. But then you know coming closer now what does coming closer mean? It means you know it can kind of fool more people more of the time or you know passes the kind of you know Turing Test or what is that seem to signify? And the idea that the Turing Test which is which you can't distinguish from human you know human speech. 

EMILY M. BENDER: Yeah so this "closer than ever" sort of um puts us on this imagined path right that there's it's claiming that ChatGPT and things like it are steps on a path that we can follow that would lead to something like um the you know the shipboard computer in Star Trek. And there's there's no evidence of that.

Like, that's not and- I didn't put it here we can put it in the um the show notes um Chirag Shah and I have a um a op-ed that we did based on our Situating Search piece um that basically says you know we should not mistake this convenient plot device from science fiction for um what would actually be even a desirable path of technology development, sort of independent of whether it's feasible independent of whether this is actually a step on that path. Like it's you know. So there's this huge jump here and the other thing that I wanted to pull out is um "an AI that normal people could use to communicate with and understand the world." 

I I don't think that that's really what this small business owner is doing, right? He's using a text synthesis machine as a writing assistant to help him communicate with other people. 

ALEX HANNA: Yeah. 

EMILY M. BENDER: It's not about understanding the world. It's yes it is about communicating with other people, but but in this you know in the same way that a spell check helps you communicate with other people. 

ALEX HANNA: Yeah and it's interesting and this is something that I think is this gets at this idea of kind of dual use and Ken Archer in the chat says oh great now you can see public schools continuing to underfund special ed for dyslexia once some ed tech startup tells them LLMs aren't a solution. And the idea that like you know this is this is pretty common in different kinds of uses, where it's available for one thing you know probably you know in some kind of mass monetization, and then it gets promoted as being some kind of savior for uh you know a disability.

This kind of disability framing I think comes up in which you know disabled people um people with with with with different kinds of cognitive or physical impairments then become a stand-in for uh you know the defense of the build- the building of these of these models, of these technologies. So I think that's you know that kind of framing itself has its own kind of roots and and perniciousness. 

EMILY M. BENDER: Yeah so Alex you were telling me before we started that you remembered to get a red flag. I didn't get a red flag. I could use this as my red  flag right. 

ALEX HANNA: Yeah yeah. 

EMILY M. BENDER: So when we see the hype. 

ALEX HANNA: So Emily wrote this this this op-ed or this uh this piece on her Medium post and we've been talking about using this as a bit of a gimmick, where we have red flags. 

So she switched over to the medium post that has this red flag on a beach. But now we've got I've got like a little red flag um a little red bandana, that I'll just like torro about and be like you know this is the kind of thing. It's it's a little worn because I I I've worn this many times under my my helmet in roller derby, uh but yeah so it's sort of like when we see this like I'm calling red flag  on this, or– 

EMILY M. BENDER: Yeah. 

ALEX HANNA: –treat this like the red flag that you see in uh American football where they'll toss it on the field and then they'll challenge it or something. 

EMILY M. BENDER: So yeah the picture that I have on the medium post is like a red flag warning on a beach. Danger! you know the the waves are bad out here. So this medium post is basically me going through that Washington Post article and counting the red flags and just to scroll down to the bottom um I found 13. 

ALEX HANNA: Yeah. 

EMILY M. BENDER: And I don't think we're gonna have time to go through all 13 but– 

ALEX HANNA: No we won't.

EMILY M. BENDER: I did want to grab a couple more– 

ALEX HANNA: Yeah. 

EMILY M. BENDER: –um which is uh yeah okay (sigh) um here's another one um so quoting Mira Murati who is OpenAI's Chief Technology Officer um the uh The Washington Post piece says "It can tell you, it (ChatGPT) can tell you if it doesn't understand a question and needs to follow up or it can admit when it's making a mistake or it can challenge your premises if it finds it's incorrect." And yeah apparently it does all those things some of the time but not consistently. 

But then here's the red flag: "Essentially it's learning like a kid. You get something wrong you don't get rewarded for it if you get something right you get rewarded for it so you get attuned to do more of the right thing."

ALEX HANNA: Well there's a red flag even prior to that right I mean but you know like so one of the red flags I would say is it it doesn't understand like again and this is a point we've made almost ad nauseam, I mean I feel like this is if Bender Rule 1 is you know does this work and for stuff other than English I feel like Bender Rule 2  is you know a machine can't understand. 

EMILY M. BENDER: Well a language model can't understand. 

ALEX HANNA: Yeah. 

EMILY M. BENDER: There are situations like if you've got your voice assistant and you ask it to turn on the radio and it turns on the radio, in a very narrow sense it is understood. But that's not the same thing as my language model could understand any of the language that I throw at it, which is what the claims are here– 

ALEX HANNA: Right. 

EMILY M. BENDER: Um so yeah so so red flag um "if it doesn't understand a question" suggests that other times it does. Is that where you're going? 

ALEX HANNA: Yeah well a few of these "if it doesn't understand the question or admit it can admit when it's making a mistake," so okay when does it when does it when does it have that kind of capacity for self-reflexivity? And then the third one "it can challenge your premises if it finds it incorrect." And I actually want to I I want to put a pin in this because I want to come back to this over in the um when we talk about the labor part of it. But like the pre like what does it qualify as premises, what does that basically say as premises? 

So for instance as we've seen in some of the the kind of live adversarial testing of it, is that what does it qualify as premises? So for instance the thing is you know people have asked what happens when you know "Write a program for me which uh you know says who's a good programmer based on race and gender" and it writes and in and it won't go ahead and say if you ask it directly um if you prompt it directly to say uh "who's a better programmer based on race and gender?"

It says that you know I "I think those premises are wrong" you know there's many different things and then and then if you ask it to then uh do it in a different style or ask it to subvert itself or something then it's uh then it generates this thing which says you know white males are the best programmers and puts it in a programmatic um like a python style. 

EMILY M. BENDER: Yeah. 

ALEX HANNA: Yeah. 

EMILY M. BENDER: That was I think that's due to Steven uh uh Piantadosi I think I got the N in the right place in his name at Berkeley. Um I've heard other people saying that exploit doesn't work anymore um so again talking about the labor thing every time you play with ChatGPT you are providing free labor to OpenAI. And as you said this like live adversarial testing and this is this is OpenAI's goal right?

They they wanted help in sort of figuring out how to break it so they put it out into the world so people would would you know look for things. And of course they're not going to find all of them and just putting the guardrails on doesn't mean that the the biases aren't still there under the hood, ready to leak out in other places. 

ALEX HANNA: Exactly yeah. 

EMILY M. BENDER: Um. 

ALEX HANNA: So let's get to the original one, "essentially it's learning like a kid," and as Jeremy says, kids are not Skinner boxes. Has Murati met a child? 

EMILY M. BENDER: Right what a reductive way of talking about you know how you interact with kids and how kids grow and learn! But also this this connects to um oh Anna Lauren Hoffman's point which I hope is going to be in print that I can cite somewhere soon. Maybe it is I just missed it. Um that this metaphor of um AI as juvenile and immature is is really quite dangerous.

Um it sort of says oh we should we should let it off the hook. It's a kid. It doesn't know better. And that's not what Murati's saying here directly but it is still calling in that metaphor I think.

Um and but yeah like no so their their thing is called reinforcement learning from human feedback.

And that's the thing I think that's being described here and no that's not a model of how kids learn.

ALEX HANNA: no no it's it's so prevalent I mean I think we talked about this last time too. And I mean it appears in so many different places. I forget if we've even mentioned on the stream this metaphor of learning like a kid. I think I might have mentioned the kind of development of one of the data sets, MS COCO, which is a popular object recognition data set. And they developed the categories literally by presenting images to the kids of the authors and asking them to sort of name a set of categories. 

So they were using child labor, for one, to develop their categories. But then they were you know going ahead and they were trying to develop categories for a you know quote-unquote universal benchmark, from you know the kids of five Western engineers you know. 

EMILY M. BENDER: Yeah. 

ALEX HANNA: And so just just um lots lots of lots of things to unpack there. And I am looking forward to Anna Lauren's book coming out hopefully eventually, or at least  parts of it yeah. 

EMILY M. BENDER: Yeah, no and I mean we can be impatient and looking forward to these books but also I think really value the kind of publishing practice, where it's not slapping it together and throwing it up on arXiv. 

ALEX HANNA: Oh so yes. Also I mean like yeah I mean this is kind of getting at our kind of discussion of Galactica, where it was yeah sure you can replicate a CS paper, poorly, because there's a very structured format. 

EMILY M. BENDER: Right right. 

ALEX HANNA: All right let's get into let's get into more of these red flags. 

EMILY M. BENDER: Yeah so actually I just want before we leave this one I wanted to point out that in the reporting um yes this is given as a quote so this is you know Murati's words not the journalists' words, but um it it there's no distance to it here. There's no journalistic distance and in fact the paragraph before says, "Even more than its predecessors ChatGPT is built not just to string together words but to have a conversation remembering what was said earlier, explaining and elaborating on its answers, apologizing when it gets things wrong" and then leads into this quote. So this isn't OpenAI claims this is like we're substantiating this thing here. 

ALEX HANNA: This is reporting. 

EMILY M. BENDER: Yeah which you know again I I usually see much better in the Washington Post on this front. Um and I guess the um yeah (sigh) okay. So just briefly, um uh– 

ALEX HANNA: scrolling down now or it's selecting yeah it's another another part of uh this quote here. 

EMILY M. BENDER: You want to read it? 

ALEX HANNA: Yeah "Paul (uh I I apologize for the butchering of this last name) uh Buchheit, an early Google employee who led the development of Gmail, tweeted an example in which he asked both tools the same question about computer programming" And what are both tools? So this is Google search yeah. 

EMILY M. BENDER: Google search engine and ChatGPT yeah. 

ALEX HANNA: So yeah so so this is in the context of ChatGPT is perhaps replacing search uh which has you know been this kind of a line which OpenAI has used to to pose itself as kind of a challenger to Google.

Um so "on Google he was given a top result that was relatively unintelligible while on ChatGPT he was offered a step-by-step guide created on the fly. The search engine he said quote may be only a year or two from total disruption." Which wow what a "total disruption" which is like a very I'm but I'm curious on what he got on Google you know on the on a on a you know like was he offered uh you know like was he offered a Stack Overflow uh– 

EMILY M. BENDER: Let's take a look. 

ALEX HANNA: Or you know so let's see yeah. So on so now we're on the tweet and the guy says "Google is Google" oh okay oh interesting so it wasn't Paul Buchheit's own searches he's quote tweeting someone else whose tweet text, this guy named Josh, "Google is done. Compare the quality of these responses." So the Google one is– 

ALEX HANNA: So the query is in LaTeX which is for folks who don't know that's a a formatting programming language. "How do I represent a differential equation?"  

EMILY M. BENDER: And it's and there's one of these snippets so pulled out of probably that next site, writing differential equations, and I like to say LaTeX.

I love how there's different pronunciations of that formatting system. And then some steps that have LaTeX code in them. Sure that doesn't look super user-friendly. And then a link to a WordPress site that says "writing differential equations in LaTeX" and then the author of that. Versus uh so here's the ChatGPT version: "In LaTeX, how do I represent a differential equation?" Um and then it's got some more words around the same stuff. 

ALEX HANNA: It basically gives it as a narrative. It says "In LaTeX, you can use the begin align uh star end align environment to write a differential equation" which I think I don't know LaTeX uh environments off the top of my head but I think all begin align does is it puts it in something that uh kind of like centers it and gives it an environment. But there's actually multiple different ways to do that. So you can do begin equation I think which numbers the equations and then. 

So on it's basically the same answer? Like I don't it's it's but it and it's but it gives it with kind of an authority. And at least with Google you know I'm not this is not to celebrate Google snippets because there's lots of stuff that goes wrong there, but at least that first snippet has the basic source right after, which I think is this um WordPress site, which is from this this individual that basically says. So that's–

EMILY M. BENDER: Which may well have been more friendly and sort of more step by step then. 

ALEX HANNA: Right. 

EMILY M. BENDER: The snippet was because and that's an image I can't click on it yeah. 
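
For anyone following along at home, here is a minimal LaTeX sketch of the two environments being discussed, with a generic first-order differential equation standing in for whatever was in the screenshot (the tweet's actual markup isn't reproduced here):

% align* requires \usepackage{amsmath}; equation is built into LaTeX.
% Unnumbered, centered display, roughly what the ChatGPT answer suggested:
\begin{align*}
  \frac{dy}{dx} + p(x)\,y &= q(x)
\end{align*}

% Numbered alternative Alex mentions:
\begin{equation}
  \frac{dy}{dx} + p(x)\,y = q(x)
\end{equation}

Either environment typesets the same equation; as Jeremy notes later in the chat, the choice of environment has nothing to do with the differential equation itself, only with numbering and alignment.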

ALEX HANNA: That's Google's Google summation. So something I think that's important to note here is this question of um a citational authority um. 

EMILY M. BENDER: Yeah. That certain people have highlighted in ChatGPT. Basically you know in terms of citational authority it's this person on this original uh uh this this WordPress author. You can say well you know this random person is teaching me how to do this and it does lend some kind of authority.

And it does lend itself to like I don't know who this person is. Should I trust what they're saying? Meanwhile ChatGPT by um just hoovering up as much data as possible, it becomes the authority then. um It does say you know and and it becomes that citation uh you know and I think this might be in our fresh Fresh Hell segment but there's that example of um the co-authorship example. 

EMILY M. BENDER: Yeah we'll get there, we'll get there. 

ALEX HANNA: Yeah people are actually assigning authorship to the tool itself which itself is is is really absurd. 

EMILY M. BENDER: So yeah yeah again I want to point to the work that I did with Chirag Shah where we sort of talk about this as one of the problems that you know not only can you not go find the source of information and evaluate the source but you also lose out on the sort of sense making activity that that you get when you're doing that process, of okay here's an answer how does it fit into a broader information landscape? Is it the kind of thing that I want? 

And ChatGPT will make something up that might be wrong and if it happens to be right you can't then go look for its antecedents. 

ALEX HANNA: Right. 

EMILY M. BENDER: All right I got one more thing in this Washington Post thing I want to point– 

ALEX HANNA: Yeah one thing and Jeremy said something in the chat where he says "well the ChatGPT LaTeX answer is actually wrong on the merits. The aligned stuff is irrelevant to the differential equation although it's a nice formatting detail." And um and Ken Archer says "It's worth noting that Buchheit is an engineer not a project manager. He's a researcher who are more sensitive to search as a multi-step sense making process with search as a tool." 

So I think that's also I mean prioritizing the engineering is an important important element of that too and that's the kind of the engineering is the more impressive feat than any kind of product roadmap or user interface design. 

EMILY M. BENDER: And this is another example of so as Jeremy points out that's actually not technically right but it looked good so the person was happy with it and said "Google is done" and there's there's a a tweet going around that I didn't uh bookmark for this show where somebody wanted to find out about um converting minutes per mile to minutes per kilometer, as runners think about it. 

ALEX HANNA: Yeah. 

EMILY M. BENDER: And was like so impressed with the output of of ChatGPT and failed to notice that it was wrong.

ALEX HANNA: That it was actually wrong? 

EMILY M. BENDER: Yeah and it's like that's the sort of thing that you could check very easily but it is so seductive to like "hey look how nicely it's presented." 
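
The conversion in question is one line of arithmetic, which is what makes this kind of output so easy to check: one mile is 1.609344 kilometers, so a pace in minutes per kilometer is the pace in minutes per mile divided by 1.609344. A small sketch of that check (the example pace is hypothetical; the tweet's actual numbers aren't reproduced here):

MILE_IN_KM = 1.609344  # one statute mile in kilometers

def min_per_mile_to_min_per_km(pace_min_per_mile: float) -> float:
    """Convert a running pace from minutes per mile to minutes per kilometer."""
    return pace_min_per_mile / MILE_IN_KM

# Hypothetical example: an 8:00-per-mile pace works out to roughly 4:58 per kilometer.
pace = min_per_mile_to_min_per_km(8.0)
minutes, seconds = int(pace), round((pace - int(pace)) * 60)
print(f"{minutes}:{seconds:02d} per km")  # prints "4:58 per km"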

EMILY M. BENDER: All right so the other thing that I want to point out about this article is that they quote people who are selling things as if that is a source um. 

ALEX HANNA: Sure. 

EMILY M. BENDER: And I'm actually going to pull this out of my um blog post because that'll be faster. So "Tech investor–" 

ALEX HANNA: Red flag 2 -- wooo!

EMILY M. BENDER: Um, "it feels very much like magic said Rohit Krishnan a tech investor in London." It's like right why go to the people who are trying to sell this stuff to get their and this isn't the only one.

So I think I'm going to move us on from this but point people to this blog post where I I dug up 13 red flags and it probably wasn't all of them. Because the labor issue is really important and that I want to make sure we have time for Fresh Hell and um a very fresh piece that's still in the main course menu here. So. Main course number two. 

ALEX HANNA: Yeah oh gosh. 

EMILY M. BENDER: So in this case fantastic reporting. Um I think this is really great work. So this is uh Billy Perrigo in Time um and the headline is "Exclusive: OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic."

ALEX HANNA: Yeah and I mean I want to point out that um Billy, this reporter, has been really um pretty critical in his reporting of different kinds of labor issues in AI. He's also reported on this particular company Sama and how they're being used for Facebook's outsourcing of content moderation to Africa, I think mostly Kenya but might also be Nigeria, um and how this has been you know and and just the kind of um uh mental taxing, emotional taxing that this labor is. 

EMILY M. BENDER: Yeah. The so I want to I'm not sure exactly- Like there's a lot in this um and it's um basically interesting to me that you know we hear all these scandals about Sama and then OpenAI is still doing it. And then another thing is for the last few news cycles, Microsoft has been looking relatively benign though I have a Fresh Hell thing from them. 

ALEX HANNA: Oh yes um but we have to keep in mind that Microsoft is is in bed with OpenAI right and so so this is also on Microsoft's hands like it's not you know. 

EMILY M. BENDER: They can't stay at arm's length. 

ALEX HANNA: And we've talked about this prior on this podcast about how OpenAI has this a unique uh arrangement with Microsoft where they have this kind of exclusive license to a few of their technologies.

Um not just ChatGPT um but also Codex and also since Microsoft also owns GitHub um the kind of GitHub Co-pilot aspect of that as well, which they bill as a kind of a pair programmer um something that's going to facilitate code writing. 

EMILY M. BENDER: Yeah so what this is um the connection to ChatGPT though there's some other stuff going on here in this article which is important to talk about um was, so. Reading this paragraph: "To build that safety system OpenAI took a leaf out of the playbook of social media companies like Facebook who had already shown it was possible to build AIs mathy maths–

ALEX HANNA: Mathy maths! Woo!

EMILY M. BENDER: –that could detect toxic language like hate speech to help remove it from their platforms. All right that's kind of an overstatement um as in it can detect some of it and they use that to remove some of it and it's not AIs it's mathy maths but fine. "The premise was simple: feed a mathy math with labeled examples of violence hate speech and sexual abuse and that tool could learn to detect those forms of toxicity in the wild. 

That detector would be built into ChatGPT to check whether it was echoing the toxicity of its training data and filter it out before it ever reached the user. It could also help scrub toxic text from the training data sets of future AI models. But where do you get the labels? So next paragraph: "To get those labels OpenAI sent tens of thousands of snippets of text to an outsourcing firm in Kenya beginning in November 2021. Much of that text appeared to have been pulled from the darkest recesses of the internet. 

(Content warning for the next sentence) Some of it describes situations in graphic detail like child sexual abuse, bestiality, murder, suicide, torture, self-harm, and incest." So this is what these people are being subjected to the contract for this, get this, for this whole project thousands of snippets of text all this awful stuff that people have to look at and label I guess is it awful or not and which kind of awful is it maybe?
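
Mechanically, the "premise" the article describes is the standard supervised moderation pattern: humans label example snippets, a classifier is trained on those labels, and every candidate output is scored by that classifier before it reaches the user. A minimal, generic sketch of the pattern, with placeholder data (this is not OpenAI's actual system):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins for the human-labeled snippets (1 = flagged, 0 = benign).
texts = [
    "placeholder snippet that annotators flagged",
    "placeholder snippet that annotators passed",
    "another flagged placeholder",
    "another benign placeholder",
]
labels = [1, 0, 1, 0]

# Train a simple text classifier on the labeled examples.
detector = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
detector.fit(texts, labels)

def filter_reply(candidate: str, threshold: float = 0.5) -> str:
    """Score a generated reply and withhold it if the detector flags it."""
    p_flagged = detector.predict_proba([candidate])[0][1]
    return candidate if p_flagged < threshold else "[reply withheld by safety filter]"

The point the article goes on to make is that the labels line is where the human cost lives: someone has to read and tag exactly the material the detector is supposed to catch.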

Um did you catch how much OpenAI paid Sama for this? 

ALEX HANNA: So that where is it that they that what like how much the contract was? 

EMILY M. BENDER: Yeah how much the contract was I don't know if it says how much the contract was. 

EMILY M. BENDER: It does! 

ALEX HANNA: I thought it was in the text. 

EMILY M. BENDER: Yeah it's two hundred thousand dollars. 

ALEX HANNA: Yeah that's so it's a it's a it's a pretty that's a pretty minuscule contract. 

EMILY M. BENDER: Yeah. 

ALEX HANNA: And and that the idea that and I mean and I mean so two hundred thousand dollars and then there's I think they said one dollar or two dollars an hour for that. So that's a minuscule  contract. So two hundred thousand dollars to do this like really intensive type of labor uh.

EMILY M. BENDER: To the contracting company that's actually taking most of it. 

ALEX HANNA: Yeah that's actually doing it yeah and so and then the the and then the actual workers is making and they're kind of I mean the kind of way that they they navigate, there's a part of this text in which they are actually talking about um the weird pay structure of actually paying the contractors uh or the the actual content moderators. 

And I think it's you know when they're challenged on this two dollars an hour I think- Scroll down in this because I forget exactly where this is um but I think it's um gosh I have um uh let's see I'm trying to identify this. Let me find this this article uh over on my screen. Okay so okay. Yeah here it is. So, "An agent working nine-hour shifts could expect to take home a total of at least $1.32 per hour after tax, rising to as high as $1.44 per hour if they exceeded all targets."

Now: "Quality analysts, more senior labelers whose job was to check the work of agents, could take home up to two dollars per hour if they met all their targets." Um and so this is um pretty um pretty floor- flooring here too. And I mean this is sort of the structure which other people have written about about um commercial content moderation and we've mentioned Sarah Roberts on this on the stream before. 

But the idea of like you can kind of- this work itself isn't is is incredibly taxing and if you basically hang on long enough you can get paid 50 more cents an hour to effectively do this this pretty grueling and and and mentally scarring labor. 

EMILY M. BENDER: Yeah we should also mention here um and um in in the um-- Oh what was I about to say? Shoot. I had a thought and then it left my it left my brain so I'm gonna go to the chat let's see so uh uh uh um uh Emery Jane who's M (Hi M!) says "Sama markets itself as an ethical AI company." Oh my goodness me.

As if my faith in the meaning of that phrase wasn't already lost. uh Jeremy says "Two hundred thousand dollars uh is less than six months of a single full-time engineer," uh and M says "All kinds of work is just constructed like call center work now." Yeah so I mean this idea of kind of the business process organization, the BPO, I mean this looks very much like this. 

Oh what I wanted to say was this idea of you know the significance of this being in in Kenya and this is kind of getting at some of Karen Hao's reporting in her AI colonialism series the idea how and I think the title the one of the subheads of that piece was the way that AI companies are basically taking advantage of countries in crisis, or countries facing um uh just massive amounts of inflation.

So she mostly focuses on that piece of data laborers working contracted by Appen and a few other of these third-party annotation firms, and the way that um these firms are structured so that they basically hop over from Venezuela and there's many people or or Colombian people basically fleeing Venezuela to go to Colombia because Venezuela's facing this immense inflation crisis.

I don't know what the kind of political economy of Kenya is at the moment in the way that if there's mass inflation but there's also kind of this way in which Kenya also has this kind of high-tech uh you know technology accelerator way and I think might also plays into that. 

So if folks are researchers uh on on kind of like Kenyan political economy in the tech center, you know get in the comments you know or get in the chat, because I think that's really important to focus on the way that you know these these countries are recolonized and and kind of there's this reinscription of inequality uh based in these in these labor market flows and the way that they market themselves to Western uh potential contractors. And I mean yeah at an absolute steal, two hundred thousand dollars for a contract is pretty ridiculous. 

EMILY M. BENDER: This in the context of you know we've got this floated valuation of OpenAI at 29 billion dollars. 

ALEX HANNA:  Oh my gosh. 

EMILY M. BENDER: Like just, yeah. 

ALEX HANNA: Yeah. 

EMILY M. BENDER: So Sama actually cancels the contract with OpenAI. 

ALEX HANNA: Yeah um eight months earlier than agreed to, um because according to Sama they were sent additional instructions by OpenAI to collect images in in quotes here some illegal categories. And the so quotes from a Sama spokesperson: "The East Africa team raised concerns to our executives right away. Sama immediately ended the image classification pilot and gave notice that we would cancel all remaining projects with OpenAI. The individuals working with the client did not vet the requests through the proper channels. 

After review of the situation individuals were terminated and new sales vetting policies and guardrails were put in place." So Sama is saying OpenAI told us we were supposed to be collecting the stuff that is illegal in the U.S at least and this is you know um huh "In a statement OpenAI confirmed that it had received 1400 images from Sama that quote included but were not limited to C4 C3 C2 V3 V2 and V1 images.

In a follow-up statement the company said quote, “We engaged Sama as part of our ongoing work to create safer AI systems and prevent harmful outputs. We never intended for any content in the C4 category to be collected." And C4 is I think child sexual abuse. um It's somewhere else in this article. So basically as I understand it there was collection of this data at the behest of OpenAI, although OpenAI is saying "Not us!" right and then OpenAI took possession of the illegal images. Which is just wild and this is in the name of making AI "safer" um because I guess the idea is this is image generation not not text synthesis but you want to avoid having these kinds of images come out of the system um and so their thought of how to avoid it is to get some and label it so they can then have a classifier that checks. 

And it seems like if you're building something where you have to do something illegal in order to make it quote unquote safe, maybe back up a few steps and try a different path, you know? 

ALEX HANNA: Right exactly. And it's and I mean you you would have people you know like um you know like there's been people I mean people working at on on child sex sexual abuse and and other kind of safety have you know taken some kind of thoughtful ways of thinking about this. 

This is- I'm going to get out of my depth because I don't want to misspeak but it's also like you're doing illegal acts to do such a thing and hm you know like instead I think the the anticipating the like the kind of brutish um engineer's response was like well you need to collect something and 

I'm like well there's maybe other approaches that don't make you violate the law. 

And you know like you know like and not to not to direct the law as any kind of you know impartial or perfect kind of thing, but also like you know there are reasons why there's you know you know there's protections on actual subjects here. 

EMILY M. BENDER: Yeah. All right, so anything else to say about labor practices before we move to the Fresh Hell segment? 

ALEX HANNA: I mean it's yeah I mean this I mean this is just I mean this is kind of continuing thing and I mean I think the more in the way this is the more we're going to see more and more of this when it comes to generative AI.

The idea that like this is going to become- these are going and- The more and more that these are public the more and more this is going to be a problem of a mass-scale content moderation. So I mean any kind of the kind of study of content moderation is very applicable here. So I think there's definitely this um you know folks folks on that are listening to this who are scholars you know if you're working on AI I mean you can't you can't avoid talking about commercial content moderation um in the kind of uh burgeoning literature there and the kind of really important work that's being done, as well as to work on data colonialism. 

EMILY M. BENDER: Yeah. 

ALEX HANNA: Yeah. 

EMILY M. BENDER: All right and we will have links in the show notes um to Sarah Roberts work and Karen Hao's fantastic series and other stuff that we've mentioned. That's that's part of the process. 

ALEX HANNA: Yeah. Also shout out to Mila Miceli, who I've mentioned here before, DAIR fellow, and also Julian Posada who are quoted in that in that Karen Hao reporting.

EMILY M. BENDER: Um all right let's go to Fresh Hell. 

ALEX HANNA: Yeah let's go to let's go to Fresh Hell.  

EMILY M. BENDER: You gonna sing a song for us this time? 

ALEX HANNA: Oh gosh uh what did I say. Let me- I've been listening to a lot of musical theater lately so uh maybe we can do like uh a little musical transition you know like (singing) What could it be?

What is it as well? When we dive directly into Fresh AI Hell. And then I'll stop there. 

EMILY M. BENDER: That's fabulous! 

ALEX HANNA: Because uh otherwise it might violate copyright. But I don't I mean- Fair use! I'm saying that right now.

EMILY M. BENDER: Fair use, all right. um So first one um Kareem Carr picked up an example of a paper on medRxiv where ChatGPT is listed as an author.  

ALEX HANNA: Oh my dear. 

EMILY M. BENDER: And and to me that's like okay so these people basically said yes we use the automatic plagiarism machine.

ALEX HANNA: We decided to and decided to author I mean there's just so much here that I think is hilarious and sad at the same time, which is, you know there's some acknowledgment that they're they're using this but then there's enough of a personification which they want to assign it authorship. So I'm really curious like how much and given I don't know if the you know different fields have different authorship um ordering um um.

EMILY M. BENDER: Yeah but the Vancouver guidelines. 

ALEX HANNA: Ordering norms.

EMILY M. BENDER: Ordering norms yeah and also what authorship means yeah. 

ALEX HANNA: Yeah yeah. 

EMILY M. BENDER: And this is kind of in the middle yeah also and–

ALEX HANNA: Anna my Siamese cat is also in the in the chat now so if you hear if you hear her purring on the mic that's so more reason to watch you see a very cute cat on the stream. 

EMILY M. BENDER: And sometimes my cats but I'm at the office today, so no cats. So I mean authorship means also accountability. 

ALEX HANNA: Yes. 

EMILY M. BENDER: Right so um when you're an author of a paper you are um saying yes I stand behind these ideas I um contributed to the thought that's here I contributed to the writing and I had the opportunity to you know review the whole thing and say yes I'm okay with with my name being on this. ChatGPT is not the kind of thing that can have accountability. 

ALEX HANNA: Right. 

EMILY M. BENDER: So yeah this is yeah. Next! 

ALEX HANNA: Yeah as author you're not yeah you're not going to go back to OpenAI and be like this thing messed up our results, like we demand you know. Yeah ChatGPT can't retract a paper. 

EMILY M. BENDER: No. 

ALEX HANNA: No. 

EMILY M. BENDER: All right so kind of in a similar vein um this one was brought to us. 

ALEX HANNA: Oh yes. 

EMILY M. BENDER: By Sasha. 

ALEX HANNA: Yes Sasha Luccioni who's a friend who's uh over at HuggingFace and a friend of the a friend of the pod, a friend of the stream. 

EMILY M. BENDER: Yeah um so Sasha says "Wow if somebody suddenly told me ChatGPT was somehow on my panel I would run as fast as possible in the opposite direction." And so then I'm looking at the tweet that she's quote tweeting and it's got this very busy image in it and– 

ALEX HANNA: It's very like 80s it's just like it says "sold out" in big neon letters and yeah it's got like little circles it says tickets 500 of 500 sold and it's um and it's all it's this I don't know. I'm not familiar with this organization American Legal Technology. 

EMILY M. BENDER: Me neither. 

ALEX HANNA: So everybody is coming from like startups that have names like Clearbrief.ai and Law Droid, and I mean I- We really gotta have we've been talking about. 

EMILY M. BENDER: I missed that. 

ALEX HANNA: Yeah we've been we've been talking about having a legal oriented uh show with some some actual lawyers, so stay tuned for that for for one of our future episodes. But yeah it's just this is like flashy and it's got those like the stock AI, like there's a light bulb made out of like a very complicated network.

EMILY M. BENDER: And then the first panelist. So there's five panelists and the first one listed, its photo is the OpenAI icon and its name is ChatGPT. 

ALEX HANNA: Yeah. 

EMILY M. BENDER: And it's like again why would you waste your time to- reading or listening to the output of a text synthesis machine when there's actual humans there whose expertise, I assume there's some expertise here, you could be learning from?

ALEX HANNA: Well I also think that because the the people on it I mean these these people are as Jeremy put in the chat magic bean salesmen as well. I mean these are people who are involved in some kind of law startup who want to do something like maybe text summarization or a replacement for paralegal services, some kind of massive casualization of the legal workforce. And so they're actually invested in highly invested in the success of ChatGPT. 

So it would make sense that you know these people have you know a bit of a a bit of an investment and showing off the tools of this thing you know and so maybe they they pose it some questions that are you know the the kind of, can it you know- This we're not using this for Fresh Hell, I don't think but it's the one about the ChatGPT (excuse me) ChatGPT and how it fares on the on the on the bar exam. 

EMILY M. BENDER: yeah no I think that might come up in our in our law and legal applications episode later. 

ALEX HANNA: Yeah yeah exactly. 

EMILY M. BENDER: Yeah but all right. 

ALEX HANNA: All right next! 

EMILY M. BENDER: Next! Um I couldn't skip this train wreck. 

ALEX HANNA: Oh my lord, this one. Yeah so this is the tweet by Rob Morris in which he says "We provided mental health support to about 4,000 people -- using GPT3. Here's what happened." And just like– 

EMILY M. BENDER: Hand pointing down- 

ALEX HANNA: Hand pointing down emoji and just like what a way to like this is this is this reads yeah one red flag two this is like um this reads like uh like a shit post or like uh one of those um one of those bots on Twitter or Mastodon on like New York Times pitches or Reuters pitch bot, you know and there's just there's this like uh lots of um horrified emojis in the chat. Like oh my God what is happening here? 

EMILY M. BENDER: So so just to briefly recap what is happening here. 

ALEX HANNA: Yeah. 

EMILY M. BENDER: Um he puts out this tweet thread where he's talking about it was basically an experiment and it was clear that there was not informed consent, and it was clear that it's a vulnerable population that we're talking about. This is a peer-to-peer mental health support service. This is this Koko thing and um it's not that people seeking support were directly connected to GPT3, but it's not much better. So what they set it up as was this system where you log in and if you're logging in to seek support it sort of prompts you to um to say what type of worry you have and then write down what you're most concerned about.

And if you're logging in to offer support, or maybe the seekers are also always the offerers, it sort of gives you, anonymously, someone else's worry to respond to. And during this experiment, the people offering support were allowed to see what GPT-3 would say in response to the support-seeking statement, and then either send it, not send it, or edit it and send it.

And the idea was that this would help people more quickly offer support to more people. And the sort of conclusion of this tweet thread that Rob Morris is apparently proud of is that this didn't work well, because once people found out that it was machine-generated text, it felt hollow and not comforting. 

ALEX HANNA: Right. 

EMILY M. BENDER: Um and then everyone's- yeah go ahead. 

ALEX HANNA: No, go ahead, go ahead. It's just- it's just, wow. I mean, this was the real facepalm moment for me. Yeah, but where it was– 

EMILY M. BENDER: Yeah. 

ALEX HANNA: You know, this is something that, I feel like, if you talked to a real human- I just fail to grasp how, if you're not just an incredibly online, Silicon Valley-brained human, this is not readily evident to you. 

EMILY M. BENDER: Yeah. 

ALEX HANNA: Just and so I'm just I'm just uh yeah– 

EMILY M. BENDER: So the train-wreckiest part of it to me is, he's like- You know, everyone got really upset. It's like, how dare you throw this text synthesis machine into conversation with vulnerable populations?

And so then he sort of does these clarifications, and it's like, well, this would be exempt from IRB review, and I know that because I had some other article where we got IRB review and they said we were exempt. It's like, it's never on you to decide that on your own, first of all. And not that IRBs are perfect, but in this case what's clearly missing is any evaluation of harms versus benefits, and any informed consent, and that's the kind of stuff that an IRB is supposed to be on top of.

And then he says, we were not pairing people up to chat with GPT-3 without their knowledge. And later in that clarification thread he complains that- he claims that it was all done with the sort of opt-in, full-knowledge scenario. But then at that point the claims in the first thread make no sense, because "when people found out" means that at some point they didn't know. And it seems like the people who were in the know were the people drafting the messages, not the people receiving the messages. 

ALEX HANNA: Right, yeah, so the people- That's not the idea, like, that's a really twisted, you know. Like, one, you don't understand consent, two, you don't understand informed consent, and, you know, Wolfheart in the chat says, "How would this be IRB exempt if they're using human participants?" Yeah, I mean. 

EMILY M. BENDER: Many of whom were minors! 

ALEX HANNA: Yeah, oh, well, I didn't even know that. That actually adds another wrinkle to it. And if you're doing anything with health- So first off, your IRB is fucked, because they didn't– 

EMILY M. BENDER: Well he's not associated with a university. 

ALEX HANNA: Okay, so they basically said a prior study, but I don't even know what the prior- 

EMILY M. BENDER: Yeah, the prior study was also really sketchy. 

ALEX HANNA: But yeah– 

EMILY M. BENDER: So one's suspicious of the IRB from the prior study. But this was done outside of a university context, and the way our regulations work, there's no IRB with authority. Um, but yeah. 

ALEX HANNA: Yeah, and so, first off, I also want to- I mean, it's also important to say that this is a shortcoming of IRBs, you know, because, if you all are not familiar, they are different across different universities. There's supposed to be some standardization, but within university offices there is enough discretion that they can choose to allow certain things, and IRBs are famously really bad with emerging technologies. I did a study, for instance, where I studied Facebook posts from social movement actors, activists in places that are rather repressive. 

And I put it through the IRB, but then got an exemption because it was social media, because the claim is, oh, but the data is public. And there's actually some good work from folks like Casey Fiesler, and there's a paper by Jake Metcalf and Kate Crawford, that talks about the problems of IRBs and social media analysis in particular: there's this notion of publicness, but there isn't any kind of analysis of the harms that are secondary, that come from aggregation. So the more and more we're going to see this generative AI, IRBs basically need to adapt to that, and there needs to be some kind of guideline that holds people to a higher standard, and that's not going to come just from the IRB, which mostly, at the end of the day, is a cover-your-ass tool for universities. 

EMILY M. BENDER: Yeah unfortunately. All right I want to spend 30 seconds laughing at Microsoft. 

ALEX HANNA: Ok yeah haha! 

EMILY M. BENDER: Because they put out this blog post about their timeline of "Key Microsoft AI breakthroughs." "September 2016: Microsoft achieves human parity in conversational speech recognition–" 

ALEX HANNA: Oh dear. 

EMILY M. BENDER: Yeah "2018 human parity and reading comprehension March 2018 Microsoft reaches human parity in machine translation" which cracked me up because humans don't do machine translation, so what does that even mean?.

And then they run out of benchmarks after a little bit so 2019– 

ALEX HANNA: They just started writing shit down. 

EMILY M. BENDER: Yeah. No, somewhere they get um human parity in general language understanding June 2019 and then after that it's basically just OpenAI and other product releases. 

ALEX HANNA: This is just releases and benchmarks, yeah. So, I mean, the human parity thing- I didn't even read past the first three, because my red flags went off, because anytime I see "human parity" I'm like, what does that mean? Which humans? What languages?

Um, you know, probably English. And then it just becomes, you know, these things. And it's actually kind of funny, because Microsoft is tooting its own horn here basically everywhere prior to its partnership with OpenAI. So, like, Microsoft announces its Turing natural language generation model at 17 billion parameters. And the funny thing is, because of the kind of massive size race of all these different things, you didn't even hear about this model in the press. So then Microsoft does what Microsoft does, which is, you know, it partnered- it didn't buy OpenAI, but had this exclusive licensing thing, although they did buy GitHub, and then- But the other thing, and I know we're over time, spoken like a real Zoom Queen, but the other headline of this is basically how they're embedding Azure, which is their cloud, and their AI-as-a-service- 

They're offering these kinds of chat things embedded in enterprise tools. So this could basically be seen as direct competition to Google's autosuggest or autocomplete in Google Docs, you know, because this is the way they hope to get that kind of value-add with Microsoft's tools. And the thing is, Microsoft already dominates much of the office market, because of basically enterprise lock-in around Office 365 and all those other tools. But now they're basically AI-enabling it, and so I'm curious to see how this is going to develop. But, you know, I appreciate flagging the ridiculousness of this timeline. 

EMILY M. BENDER: Yeah. None of these benchmarks have good construct validity, but they ran out of benchmarks, so they had to start talking about the size of the models and the OpenAI-ness of the models, basically, yeah. 

ALEX HANNA: Exactly yeah.

EMILY M. BENDER: All right. We're past time. We should wrap up. But thanks to all who joined us and uh thanks to those watching the recording and hopefully one day in the not too distant future listening on the pod! 

ALEX HANNA: Yeah all right thanks Emily see you all next time! 

EMILY M. BENDER: Bye! Happy hypey New Year! 

ALEX HANNA: Happy hypey New Year!

ALEX: That’s it for this week! Thanks to Dr. Jeremy Kahn for helping us bring the critique.

Our theme song is by Toby MEN-en. Graphic design by NEIGH-oh-mi Pleasure-Park. Production by Christie Taylor. And thanks, as always, to the Distributed AI Research Institute. If you like this show, you can support us by rating and reviewing us on Apple Podcasts and Spotify. And by donating to DAIR at dair-institute.org. That’s D-A-I-R, hyphen, institute dot org. 

EMILY: Find us and all our past episodes on PeerTube, and wherever you get your podcasts! You can watch and comment on the show while it’s happening LIVE on our Twitch stream: that’s Twitch dot TV slash DAIR underscore Institute…again that’s D-A-I-R underscore Institute.

I’m Emily M. Bender.

ALEX: And I’m Alex Hanna. Stay out of AI hell, y’all.