Mystery AI Hype Theater 3000

Episode 6: Stochastic Parrot Galactica, November 23, 2022

July 17, 2023
Emily M. Bender and Alex Hanna

Emily and Alex discuss MetaAI's bullshit science paper generator, Galactica, along with its defenders. Plus, where could AI actually help scientific research? And more Fresh AI Hell.

Watch the video of this episode on PeerTube.

References:

Imre Lakatos on research programs

Shah, Chirag and Emily M. Bender. 2022. Situating Search. Proceedings of the 2022 ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR ’22).

UW RAISE (Responsibility in AI Systems and Experiences)

Stochastic Parrots:
Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜. In Proceedings of FAccT 2021, pp.610-623.

The Octopus Paper:
Bender, Emily M. and Alexander Koller. 2020. Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data. In Proceedings of ACL 2020.

Palestinian man arrested because of bad machine translation.

Katherine McKittrick, Dear Science and Other Stories

The Sokal Hoax

Safiya Noble, Algorithms of Oppression

Latanya Sweeney, "Discrimination in Online Ad Delivery"

Mehtab Khan and Alex Hanna, The Subjects and Stages of AI Dataset Development: A Framework for Dataset Accountability

(What is 'sealioning'?)
http://wondermark.com/1k62/

Grover:
Raji, Inioluwa Deborah, Emily M. Bender, Amandalynne Paullada, Emily Denton and Alex Hanna. 2021. AI and the Everything in the Whole Wide World Benchmark. Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks.

Ben Dickson's coverage of Grover:
Why we must rethink AI benchmarks


You can check out future livestreams at https://twitch.tv/DAIR_Institute.


Follow us!

Emily

Alex

Music by Toby Menon.
Artwork by Naomi Pleasure-Park.
Production by Christie Taylor.

Transcript

ALEX: Welcome everyone!...to Mystery AI Hype Theater 3000, where we seek catharsis in this age of AI hype! We find the worst of it and pop it with the sharpest needles we can find.

EMILY: Along the way, we learn to always read the footnotes. And each time we think we’ve reached peak AI hype -- the summit of bullshit mountain -- we discover there’s worse to come.

I’m Emily M. Bender, a professor of linguistics at the University of Washington.

ALEX: And I’m Alex Hanna, director of research for the Distributed AI Research Institute.

This is episode 6, which we first recorded on November 23 of 20-22. And we’re zooming in on Meta’s large language model for science, Galactica. 

It was SUPPOSED to be a tool to summarize research papers, write code, and even create new tools for scientists.

EMILY: But, as we could have told you, it instead spat out nonsense that just LOOKED like plausible science. We got into the weeds mere days after the model’s ill-fated launch. Enjoy!

EMILY M. BENDER: What do you hear Starbuck? 

ALEX HANNA: Nothing but the rain captain. Okay great. 

EMILY M. BENDER: All right Alex, why'd you tell me to say that? What's that about?

ALEX HANNA: Because we are here with Episode 6 of Mystery AI Hype Theater 3000 and and we are and we are oh oh hold on I I did not mute my Twitch and so I got distracted by myself and so I said that because we are reviewing Galactica and we are dressed like Battlestar Galactica. So Emily Bender is Captain Adama and I am Starbuck.

It's very exciting yeah. 

EMILY M. BENDER: All right so I have to say I am totally a speculative fiction nerd but you know there's enough of it out there that um I haven't seen all of it and Battlestar Galactica is not among what I've had a chance to appreciate, so. 

ALEX HANNA: You know I mean you can't watch it all right? You do what you can. 

EMILY M. BENDER: There's only so much time in the day but I'm glad that we had a what cultural reference point to start us off with um.

ALEX HANNA: Totally. 

EMILY M. BENDER: Yeah um so uh Galactica. I guess we should also um tell the people that this is this is a real series now isn't it, Alex? We we have a thing that we do and– 

ALEX HANNA: Yeah. 

EMILY M. BENDER: We thought it was going to be a one-off back when we started but no um and it turns out our thing is we go after textual artifacts and we analyze them and react to them. So when we decided to go after this artifact, um it was pretty early in the news cycle and there's been a lot of discourse since then, which we'll get to some of it because some of it had its own AI hype in it. 

ALEX HANNA: Right. 

EMILY M. BENDER: But we're going to start with just the artifact. 

ALEX HANNA: Yeah yeah we're starting just with the artifact um---hey Ben---and the artifact itself is the um the paper and kind of the the Facebook website around it and then it's just been going and going, in part not just because something that should have been let go has continued to be perpetuated by you know amongst them Yann LeCun the Chief AI Scientist at Facebook/Meta so– 

EMILY M. BENDER: Is it Chief Scientist or Chief Hype-er at this point? 

ALEX HANNA: I think I think a little bit of A, a little bit of B you know. I think mostly hype-er. I don't think-- I think when you're that high up mostly what you do is hype. 

EMILY M. BENDER: Yeah I mean you can be high up in an organization and like guide and nurture and stuff like that but well I don't know what the organization is like on the inside. So shall I share my screen and get it started. 

ALEX HANNA: Yeah let's get into it. 

EMILY M. BENDER: With the architect the artifact. We picked an early time didn't we? 

ALEX HANNA: Yes we did it is 9:30 over here on the West Coast. Yeah let's do it.

EMILY M. BENDER: Okay I have shared my screen and before you get to centered on it I'm just going to make sure I can continue to see the Twitch chat because I like to know what people are saying. 

ALEX HANNA: Yeah and I am we we've got this new thing where we're we're really riding up into the 21st century where the the Twitch- we have a Twitch box so if you say something on the chat you'll actually you know you'll actually get to see it on the stream and it's also helpful because um you know we're posting these things for posterity and we want the the um the stream in here. Chief drawer of bad analogies, Liz says. Yeah totally. 

EMILY M. BENDER: You know you know what you understood that so much better than me. I parsed that as chief drawer of bad analogies. The drawer that you pull them out of which was also making sense. 

ALEX HANNA: Yes also true also that, oh my god. 

EMILY M. BENDER: But anyway just so you know um unlike in previous episodes if you're typing something in the chat it will be appearing in our video. 

ALEX HANNA: Yeah. 

EMILY M. BENDER: So for people with us live to know that. 

ALEX HANNA: Yeah fair warning yeah. Um I'm just messing with the settings on the chat too so. 

EMILY M. BENDER: But chat folks don't let that stop you. We love what goes on in the chat. 

ALEX HANNA: Please do some chatting. Oh also while we're here um I think we also mentioned that we've moved you know in the move to the Fediverse we've moved our videos um from YouTube over to a shared PeerTube instance. So I'm going to drop that link in the chat, um so you can see where those are. They're also linked in the um channel description for our Twitch chat. All right, that said let's get into it. 

EMILY M. BENDER: Well yeah. All right, so Galactica. I gotta admire the font choice here. That's pretty. Um so, "Our mission: Organize science." Okay. 

ALEX HANNA: Sure. Yeah yeah keep on going. I have stuff to say about that but. 

EMILY M. BENDER: "The original promise of computing was to solve information overload in science." I don't think that's true. I don't think that's where we started with computing. 

ALEX HANNA: That's that's quite revisionist but yeah. 

EMILY M. BENDER: "But classical computers were specialized for retrieval and storage, not pattern recognition." Um also not really true. It was more about computation, wasn't it than retrieval and storage? 

ALEX HANNA: Yeah if you're thinking about early machines like ENIAC and you know it's it was it was basically trying to replace the huge amounts of manual computation, adding, subtracting, arithmetic operations, so yeah. The copy is bad. 

EMILY M. BENDER: Okay. "As a result we've had an explosion of information but not of intelligence, the means to process it." 

Uh okay so why? Explosion of information, yeah sure fine. We're living through that. But why would you expect an explosion of intelligence? Like that that sort of the expectation that that's what we're waiting for is already a little bit weird to me um and okay. 

"Researchers are buried under a massive papers, increasingly unable to distinguish between the meaningful and the inconsequential." 

ALEX HANNA: So this is kind of an incredible statement given what the actual tool does right? So that that this is aiming to to make some kind of research organization when what it does is completely the opposite. 

EMILY M. BENDER: I wish I could find this tweet because someone had it exactly right on Twitter they said okay Facebook identifies the problem as too many research papers and you can't find the good ones and proposes the solution of a machine for generating fake research papers. 

ALEX HANNA: Right right. Brilliant. I love it you know that's that's exactly it's hard to tell between the meaningful and the inconsequential and it's going to be even worse when you have a slew of fake papers out there. Okay let's. 

EMILY M. BENDER: I finally understood this graphic. These are these are mountains of research papers. Did you see? 

ALEX HANNA: This oh my gosh. And then they're just making newer newer ones. Let's talk about this this tool like this is I mean. Okay. "Galactica aims to solve this problem." They they trained a large language model on a bunch of papers. I'd be really curious to know what those papers are... um and "You can use it to explore the literature, ask scientific questions, write scientific code, and much more" um 

EMILY M. BENDER: So this is a large language model and as I understand it what they've done is they have curated a data set so– 

ALEX HANNA: Yeah yeah. 

EMILY M. BENDER: Good job like picking a data set that is relevant to your use case but uh they didn't fully understand how the tech fits into that use case, I don't think, because– 

ALEX HANNA: No– 

EMILY M. BENDER: Uh it's still a language model and it's still therefore only uh trained to predict words in context which is not the same thing as understanding the scientific literature or producing scientific papers. 

ALEX HANNA: I so I really you know there's a lot here that I want to unpack because you know some of the things to unpack here is one, knowing that there's sort of like a particular sort of science that they are thinking about you know and I you know I I think that's even giving them too much credit to say that it's a particular sort of science that they're thinking about, because to think that there's sort of a science or think that there's a kind of thing in the traditional kind of you know Popperanian?

Popperarian? I don't know what to do-- of Karl Popper you know in which there's a program and I don't you know. I don't think Popper's a good reference here. I'll be you know a little more generous and say you know if you have an idea of a like a Lakatosian research program in which you have kind of a core problem and that's a core problem that has a set of constrained hypotheses, that are then interrogated via you know a scientific method, which we know that's not really how it works but um then you know like to have a large language model approach that or try to ask hypotheses? I mean it's it's not even it's not even getting at that. Like I really I really would love to grill the constructors of this tool, to ask like what is your notion of a philosophy of science or of method? And even in that which I think could be very much constrained to like a thought of like–

Oops I saw Lilith posted something and I want to make sure we see everything that she wrote on the screen. Okay um but even even thinking about that kind of idea of of method and um is like you don't even have that. This is like a notion of method and that notion of method is also constrained to like physics as a field which you know looks very different from many other different fields you know and very different from the social sciences very different from humanities.

I mean and so it's it's to kind of think that there's like a way in which a language model can kind of reason and answer those questions that that surround a research program, it's it's really it's it's really dumbfounding, you know. It's really surprising. 

EMILY M. BENDER: It's absolutely  mistaking the form of research output for the content of research output. Right, they're basically saying look all these papers. Those that is the research. That is the research I can't keep up with. I'm going to make a machine that keeps up with that research. And they've missed the fact that it's it's not the pixels on the page, right?

ALEX HANNA: Right right exactly yeah. 

EMILY M. BENDER: So, "You can use it to explore the literature ask scientific questions write scientific code and much more." This is this is not a hedged promise. 

ALEX HANNA: No. 

EMILY M. BENDER: Okay open source. Um and then here's the fine print. 

ALEX HANNA: Oh my gosh. 

EMILY M. BENDER: Yeah so I have I have some remarks about this that I put on Twitter and Mastodon and I'm gonna pull us over to those in a moment.

But yeah so um, "You should be aware of the following limitations when using the model including the demo on this website," which by the way has been taken down. It lasted 72 hours. Um, "Language models can hallucinate–" we're going to talk about that use of that word. 

ALEX HANNA: Right. 

EMILY M. BENDER: "There are no guarantees for a truthful or reliable output from language models, even large ones trained on high quality data like Galactica." Right. 

ALEX HANNA: Gosh. 

EMILY M. BENDER: So you know it's bullshit. 

ALEX HANNA: Right exactly.

EMILY M. BENDER: So and um then the you know "Language models are often confident but wrong." Okay no they sound confident, they are not confident. Um and yeah it they've trained a model to create the form of scientific discourse and with with no-- it is not designed to be accurate and it couldn't be designed to be accurate because that's not something that you can do right? 

ALEX HANNA: Right.

EMILY M. BENDER: My reaction to this pretty quickly after it went up so, "Facebook (sorry META) AI says check out our mathy math that lets you access all of humanity's knowledge. Also Facebook AI be careful though it just makes shit up." Yeah. 

ALEX HANNA: Right. 

EMILY M. BENDER: Um and my reaction was, "This wasn't even that they were so busy asking if they could but rather they failed to spend five minutes asking if they should." 

ALEX HANNA: Right.

EMILY M. BENDER: Yeah so and then of course you know we've been thinking about this and the other thing is this might have been slightly less embarrassing for Facebook if they had pulled this in 2019 right but by now there's quite a bit of research and discourse out there about why this is a bad idea.

Um one previous instance of trying to use a language model for a search engine or a series of them came out of Google and so Chirag Shah and I wrote a paper looking at why that's a bad idea. And that paper came about because I'm part of this group called RAISE at UW which stands for Responsibility in AI systems and Experiences um and I saw this stuff coming out of Google and I'm like that's not going to work. I can tell you linguistically why it's not going to work. I bet it's also a bad idea from an information management/information retrieval point of view. So I posted to our Slack saying hey anybody want to write this paper with me?

And Chirag was like "yeah!" So we put this paper together for CHIIR sort of looking into that. Um and um then okay hallucinate. So. Hallucinate is this weird self-deprecation thing where they're saying, you know it can be wrong and it can be mistaken, but it's problematic in two ways so first of all that suggests that the language model is having experiences and perceiving things. 

ALEX HANNA: Right.

EMILY M. BENDER: Because that's what hallucinations are but also it's making light of a symptom of serious mental illness and I don't like that. 

ALEX HANNA: Yeah yeah and that's a super good point and I mean it's I mean there's there's also in even holding still that kind of notion of hallucination, the kind of idea of hallucination can be I mean also tied to you know like drug consumption or other kinds of kind of psychotropic elements of that as though there's some kind of consciousness or some kind of dreaming that this is done and I mean this language is prevalent throughout. Google has used um this notion of hallucination or daydreaming uh you know. 

I think the early version of some of the GAN or pre-GAN modeling, which is you know the kind of swirling kind of weird dreaming things that were sort of showing the middle layers of convolutional layers um but yeah it is still doing this work that we we seem to circle around a lot on this show which is the notion of um of mathy maths being the sort of conscious beings that sort of have this kind of these sort of subconscious experiences. 

EMILY M. BENDER: Yeah yeah so Lilith is asking if they evaluate um to what extent it's doing bad results. Um well we can we can take a look at the paper and we will to see how they're evaluating it. Um but it's often like to me it sort of feels like why are you asking that question?

Like you've built a system that is designed to make shit up and so then your evaluation is how often does it make something up that happens to correspond to what we know about the world?

Like it just seems like, speaking of scientific method, a really strange question to be asking.

Um so thinking about how there's this context already in the literature about why these things uh shouldn't be used in this way, I thought I'll go look in the paper to see if they cite any of that. And yeah I'm looking at my own you know searching my own name in the paper um but I it felt appropriate. 

ALEX HANNA: If you've said it before you know it's it's a relevant citation.

EMILY M. BENDER: So they don't cite Stochastic Parrots, they don't cite the octopus paper they don't cite this paper with Chirag Shah that I mentioned, um so it's not that they knew it was misguided or they read about why it was misguided and pressed ahead anyway. They failed to even look at that literature and you know I have to say I don't expect the world to necessarily have read my academic papers um with the exception now of Stochastic Parrots um and it's not that that paper---I mean I'm proud of that paper---it's not that it's a particularly outstanding paper it's that Google made such a big deal out of it and so you kind of can't miss it right? 

ALEX HANNA: Right. 

EMILY M. BENDER: Um but they did um they do cite the wonderful paper by Su Lin Blodgett et al. that was reviewing what's going on with um uh bias in NLP by basically taking all the papers about that up to 2019 and doing this wonderful survey. 

ALEX HANNA: Right. 

EMILY M. BENDER: Um and but they cite it in such a strange way. So this is a screen cap from the paper about Galactica. Um and they say, "One downside of self-supervision has been the move towards uncurated data. Models may mirror misinformation, stereotypes and bias in the corpus. This is undesirable for scientific tasks which value truth." It was just like way to miss the point of that work. 

ALEX HANNA: Yeah yeah right. I mean this is the this is a sort of you know you know token of throwing something in and this happens if you're working in any kind of you know AI ethics field of saying: some people have said this is bad and yet we're gonna do it anyways, right? 

EMILY M. BENDER: Right. 

ALEX HANNA: And this happens. 

EMILY M. BENDER: Although in this case yeah some people have said it's bad so we're gonna not do that bad thing by using curated data instead is sort of the direction they're going and they're saying um the problem with bias is not you know the reproduction of systems of oppression it's that we're interested in truth and bias is is and it's like okay who you know there's a whole bunch to be said there about how to understand the world that we share and you know that using different subjectivities to come to shared understandings of what's going on. That's not the space that they're in here right? 

ALEX HANNA: Right I mean it's I mean it's this sentence itself "This is undesirable for scientific tasks which value truth." I'm like okay you are um... Yeah okay first off so I don't I don't think scientific tasks value quote-unquote truth. I mean it's it's it's you know they they value trying to be probably less wrong. 

EMILY M. BENDER: Although that phrase has gotten um co-opted in bad ways. 

ALEX HANNA: I know I I don't want to let go of "less wrong" you know I still like I still like being saying less wrong but it's sort of you know you're trying to circle around something which- But it all is also the notion of you know the idea of science itself is socially constructed, that I mean, that the idea of having um a sort of task you know has to then have some kind of reflection of some kind of real world problem at least in machine learning and it's yeah I mean the construction here just it just sends my hackles up. 

EMILY M. BENDER: Yeah yeah and I just love the fact that they say which value truth so we're going to create a machine that makes shit up. Like okay.

ALEX HANNA: It's it's kind of an incredible jump in logic. 

EMILY M. BENDER: Yeah it really is. Um so "Narrator voice: LMs have no access to truth or any kind of information beyond information about the distribution of word forms in their training data." 

ALEX HANNA: And yet here we are. 

EMILY M. BENDER: And yet here we are again. Um so Lilith is asking for a link to the octopus paper. I'm not in a good position to put things in there. 

ALEX HANNA: Okay yeah I can drop it in the chat I'll drop the the Shah paper in the chat and the octopus paper in the chat. 

EMILY M. BENDER: And we'll definitely put them in the show notes. 

ALEX HANNA: Yeah for sure. Shall we move on to um Mr Lecun's responses? 

EMILY M. BENDER: Yeah well there's one thing I want to do first. 

ALEX HANNA: Okay so okay.

EMILY M. BENDER: Um Michael Black who is I gather a director at an MPI in Europe doing something in the AI space um has this Twitter thread where he says um you know Galactica is dangerous and he says, "Why dangerous? Well it generates text that's grammatical and feels real and this text will slip into real scientific submissions, it'll be realistic but wrong or biased, it'll be hard to detect it will influence how people think." 

And he makes some good points but the thing that really frustrated me about it is that that's what we were warning about in the Stochastic Parrots paper. Um and again like I am saying thanks to Google this paper can hardly be missed, so like why are people so worked up now? And um I looked back at what we do in section 6 of Stochastic Parrots talking about risks and harms and in fact we're talking about harms that are centered on reproducing systems of oppression and spitting out you know toxic language of various kinds and we aren't talking about putting out stuff that is um sort of masquerading as scientific text, partially because that wasn't on our radar but also because we were more concerned with these social harms or um social harms outside of social systems of science I should say. Um and so I get the sense that some people weren't alarmed at that point, because they didn't feel impacted but once science is impacted then then it's real to them. 

ALEX HANNA: Right and I mean that's yeah yeah I mean that's a super good point, you know the kind of idea well large language models aren't going to hurt me I mean and there's been things that we've mentioned I mean in Stochastic Parrots and even prior to that I mean Timnit was giving talks on language translation the way that you know there's a Palestinian man stopped at an Israeli checkpoint and you know they had translated his Facebook page where this guy had said "good morning" over a picture of them and then it's been translated as "harm them." 

And then they detained this man and could have had even worse kind of outcomes for this, which apparently goes against you know these kinds of claims that LeCun has made about "show me an opportunity you know place where large language models have hurt somebody." 

And I guess this idea that can like now that science you know quote-unquote science is under attack, I mean I haven't you know there's there's many important critiques of this notion of science you know and I and I want to shout out a text I haven't read yet but from the kind of perspective of of Black feminist geography Katherine McKittrick's book a collection of essays 

"Dear Science" which really kind of goes after this kind of the notions of science and approaches of science and the kind of harms that they um they pose to Black life. 

We could talk a lot about the construction of scientific racism, the histories of eugenics um in in many of these things. But the notion that like sci- this idea that this is going to slip in you know like you know the kind of results I mean in in many ways this is sort of turning the Sokal hoax on its head where it's almost saying that like you know the Sokal hoax being that Alan Sokal, kind of a haughty physicist, saying that I can write whatever and you know this is going to be taken as cultural criticism and you know I can publish this even though it's a completely fake paper. In this sense computer scientists and the people that have physics envy are really playing themselves. 

They're saying oh but we could write physics– 

EMILY M. BENDER: And it'll be real! 

ALEX HANNA: –and it will be real! And I'm like I think it'll still be fake and if you thought and and you know and Sokal was you know rightly criticized for saying like well you're missing the point you know of of of of this work and you know this is you know should be taken as cultural criticism and criticism and social criticism. But in this case you know they put a kind of veneer of a sort of verification behind it because it's produced by a large language model. 

EMILY M. BENDER: We didn't go look in the paper to see how they evaluated that. I want to get to that but before we leave this page, I want to say say that, so not only was Timnit talking about it before we wrote Stochastic Parrots but you also had Safiya Noble with her excellent book on um Algorithms of Oppression and looking at how um the the way search results are framed as just this is what's out there um but are actually driven by the economic interests of the  advertisers and and Google. 

You end up with things like identities being for sale to the highest bidder and then painted in terrible ways um and then even before that you had Latanya Sweeney's work looking at the way that the interactions with the advertising system were reproducing um really terrible um information in quotes right suggestions of information about Black people in particular. Um so this is this is not new and I think that it is- needs to be understood as literally part of the job of people who work on AI especially at companies where they're doing things at large scale that are touching lots of people, that it is part of their job to learn how to read and learn from the work that takes a critical lens to AI. 

Like they don't have to believe everything it says but they'd have to actually learn how to learn from it. 

ALEX HANNA: You have to read the work. 

EMILY M. BENDER: Even if it's books, folks. 

ALEX HANNA: Oh my gosh it's it can be like pulling teeth trying to get computer scientists to read books. Please please please read. 

EMILY M. BENDER: So speaking of reading let's check out their paper, which by the way these papers are becoming monographs um and then and then we'll get to the tweets which are you know arguably more fun, but I want to see how are they doing evaluation. 

ALEX HANNA: Yeah no I haven't had a chance to look at the evaluation section so yeah let's see let's see how this is going. 

EMILY M. BENDER: A bunch of different tasks here. 

ALEX HANNA: Okay.

EMILY M. BENDER: Um oh here's one called "reasoning" um so what's going on here. Um the MMLU mathematics benchmarks. So they've picked up a bunch of benchmarks, it looks like. What's here? Domain probes. Immunoprobes. 

ALEX HANNA: So they're basic so they're-- are they effectively trying to see if they are matching particularly latex equations? LaTeX equations? 

EMILY M. BENDER: This is-- Yes there's all the fun ways of saying that pronouncing the name of this formatting software without saying latex. Um linguists'll sometimes say LaTeX because you can put in this IPA.

And that's going to be really fun when I'm fixing the autocaptions. 

ALEX HANNA: So nerdy, yeah.

EMILY M. BENDER: Um okay uh so yeah they've got some benchmarks where you know it's all sequence to sequence and how often are they getting the right thing? Um and uh you know these tables you're supposed to look at the bold numbers because those are the best ones and look uh Galactica 120 billion parameters or whatever um I- gets the highest numbers. Hooray. 

ALEX HANNA: But look at the look at the results for the Econ uh I mean which is you know Econ is so–the Econ column which I mean as as a sociologist I have a blood feud with economists of course but but um but it is also arguably the most social of these sciences and like holy crap 36% is is pretty pretty damn bad if you're you know considering considering this right and so what's in and so and I mean in the chat they're saying um Lilith says as a [___]-- I'm a chemist by training and if someone only got 79% of their chemical formulas right, they'd probably be dead.

EMILY M. BENDER: I want to see if GLUE shows up in here. Oh interesting they didn't bother with GLUE.

ALEX HANNA: Um well I'd imagine it's not it's not a set of you know those have to do much more with natural language. 

EMILY M. BENDER: It's not science. 

ALEX HANNA: So it's not scientific you know. But yeah I mean it's even even on the equations so I mean right so you're trying to do these equation predictions and you're saying... It's kind of fascinating that this is kind of the you know these are the things that they are evaluating and then also these domain probes so I'm guessing these are doing trying to um produce what is the correct chemical reactions. 

EMILY M. BENDER: Yeah. 

ALEX HANNA: Um also quite quite bad here on the reactions, 43%; clusters, minerals. I'm not a chemist by training as other people are in the chat so I'm not going to pretend like I know what these kinds of evaluations are, but this is still really concerning. So it's it's sort of like you're getting this slightly right so so this this this looks quite like science in some kind of way but it does not have any kind of meaningful-- 

So I I mean if I was let's take an example that I'm more familiar with which is first off I'm not going to get into the like any kind of idea of this is writing like a qualitative paper. Uh if you had done interviews whatever this is this is obviously not the kind of quote science that they're interested in. 

But even if you had a paper that was saying, writing about the most basic you know kind of fields let's say in you know social stratification or or economics in which you were saying write a paper which you know you have some kind of model specification of the paper okay but then you're producing something let's say of different kinds of effects that predict you know lifetime earnings or something and what you want to show and it just automatically spit out let's say a regression table with random coefficients, random statistical like significance things. How is that even helpful? Like you're just generating results out of hand which have no reflection. You can't verify them. 

I just really I'm just really befuddled on like what the utility of this is for anyone doing any kind of research that is grounded in the method of specifying a model, collecting data, doing careful statistical analysis and then talking about those in any kind of reasonable way. Like that is absolutely not-- you you are you are you are trying to correct this. Um it's like taking someone or taking a child who was doing collages of of of papers and then trying to verify each of those individual numbers and statistical significances to verify that they're correct. 

How is that useful? Why don't I just start from scratch? 

EMILY M. BENDER: Yeah and and guide what you're looking for based on your own understanding of the science you're doing and the hypotheses you're generating and yeah. 

ALEX HANNA: It's yeah I'm just I'm just really yeah there's a lot in the chat talking about about the chemistry and saying yeah you know it was just you know Lilith's saying that equation forms are formulaic by definition and much easier to hallucinate. This is where-- how is this worse than expected? And then you can only tell-- Leftatjoin says you know you can only tell if these formulas are wrong if you're a domain expert. 

Um Lilith is saying I wouldn't even trust Wikipedia for chemical reactions and you know Leon saying I love the ability to give synthesis routes for toxic substances is touted as a good thing and the check is for correctness, not harm.

Um this is just like yeah I'm yeah I I I this is this is this is really some shit outside like the more I I and I I wanna I don't know if we're gonna get to the Fresh AI Hell today because I feel like this is enough of a fresh AI hell we might have to save it for next time. 

EMILY M. BENDER: Anyway let's see what we're doing. 

ALEX HANNA: I know I just left I well I really want to like get into the get into like the the LeCun meltdown. 

EMILY M. BENDER: Oh yeah so that's next so let's do that like at 10:20 if we still have time yeah take a look at the Fresh AI Hell. 

ALEX HANNA: Yeah yeah. 

EMILY M. BENDER: Okay so so here's the starting point. Papers with Code, which did this jointly with um with Meta AI posts "Introducing Galactica, a large language model for science. Can summarize academic literature, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins and more. Explore and get weights here." And Yann LeCun. Yann or Yann?

Says "A large language model trained on scientific papers. Type a text and galactica.ai will generate a paper with relevant references formulas and everything." 

So it's like the hype is coming from inside the house, right? 

ALEX HANNA: Yeah of course it is chief chief hypeman Yann LeCun is you know is touting it and I mean Papers with Code is um is that is a it's an open source project that is managed by Meta.

So I I'm guessing they've got the sort of similar arrangement that TensorFlow has with Google Research, where um you know they have a public version and they probably have a internal version.

EMILY M. BENDER: All right so that's the starting point November 15th. Um then as people start criticizing, um a little bit of a back off here. "This tool is to paper writing as driving assistance is to driving. It won't write papers automatically for you but it will greatly reduce your cognitive load while you write them."  

ALEX HANNA: No no no no no no. 

EMILY M. BENDER: Like where's your evidence? So this is me over here saying "Did you do any user studies? Did you measure whether people who are looking for writing assistance so I'm imagining someone who's working hard to write a paper in their second language up against the paper deadline and tired, would they be able to accurately evaluate if the text they are submitting accurately expresses their ideas?" 

ALEX HANNA: Right. 

EMILY M. BENDER: Like you know. 

ALEX HANNA: Yeah and I'm also thinking about and I appreciate you calling out this it I didn't know what you meant by L2 but second language that's a really important kind of thing to note too as well.

And I mean what is it going to be doing that's going to facilitate that for you. Is it going to be dropping in citations and is it going to be summarizing the literature? First off we already know that the norms of citation are very different across fields. 

We know that there's already a pretty big gulf in who gets cited and this has been shown in um you know political science for instance Michelle Dion, and I'll drop this in the chat um and or at least the show notes later, but has this paper about the kind of bias against women in citations in political science and we surely probably know that this this happens in other fields right. So what is this actually gonna do? Your citations are going to be probably all borked up. You you actually haven't read this paper so how are you going to verify that right? 

So yeah so the analogy is is quite it's quite fraught.

EMILY M. BENDER: Okay I can't I don't think I can verify this fast enough but somebody said that they actually used their own system to find papers to cite for this paper, like as they were writing it, and I wish I had actually looked at it in enough detail to see if that's true, because that might explain how they managed to miss so many relevant things. 

ALEX HANNA: Yeah. 

EMILY M. BENDER: Okay so criticism continues. Um and so that was November 15th now and I this is just a sampling right. I sort of went through and picked some greatest hits, but right by no means all of the tweets. Um so Yann says "Oh come on Grady! Is your predictive keyboard dangerous and unethical? Is GitHub Copilot dangerous and unethical? Is the automatic emergency braking system in your car dangerous and unethical because it doesn't do level five fully autonomous driving?" 

And it's like what yeah when you design the tool you don't get to say uh I'm claiming this is a this is a system for automatically writing scientific papers but you can't criticize me for not having actually achieved automatically writing scientific papers, when I'm claiming that that's what the system is for because this is just the first steps in that direction. It's like okay but what are the steps useful for? Right? 

ALEX HANNA: What is this thing actually and I mean and I quote tweeted this one too and we might have to do- I have a whole paper where we're talking about Copilot and the kind of risk and I'll drop that in the chat that's a paper that I wrote with Mehtab Khan on um on the collection of data and the kind of copyright and the copyright issues and there's currently a lawsuit against Copilot um that's been that's been filed um and and and so– And basically yeah the people who were writing that code probably didn't consent to have it in Copilot. 

Um even have they there's there's nothing that GitHub has said that they had permissive um licensing on that. Licensing itself hasn't really had a court challenge. Uh yeah and and there's so it's yeah. 

EMILY M. BENDER: And then on top of that you've got like um "Is Copilot dangerous?" Well yeah it's possibly inserting bugs in various places that are hard to detect. 

ALEX HANNA: Exactly and there's also prior work on that in cybersecurity. And so "Is predictive keyboarding dangerous and unethical?" It might be dangerous. Uh "is is automatic emergency braking?"

Well you know what, automatic emergency braking actually um has some regulations and regulators around that, that can actually do testing and actually see if it's actually working for people. Um so I love how each of these examples is kind of worse than the next for different reasons.

But yeah let's continue. 

EMILY M. BENDER: So this was about attribution. So "Do you give attribution to your predictive keyboard for words that it writes?" Well you know the predictive keyboard is happening at a level where you are you know looking at each word as it comes and yeah you give attribution when it screws up and you've got some autocucumber coming through you say ugh that was the computer.

ALEX HANNA: Auto autocucumber? I love that. Can we make that a hashtag? Autocucumber. 

EMILY M. BENDER: That one's not unique to me. There's this hilarious collection of autocorrect fails and one of them ends with like someone just having this epic autocorrect struggle and then ending with "Goddamn autocucumber!"

It's I love it um so– 

ALEX HANNA: Galactica as autocucumber. Yes, go on. 

EMILY M. BENDER: So "Do you give attribution to your spelling corrector for the mistakes that it fixes?" Well no, but that's not when you need the attribution. "Or to your computer for the results that it produces?" Well yes, if you are doing research using code you describe the code and how you used it in your research right? 

ALEX HANNA: yeah um oh hold on  the captions have died. 

EMILY M. BENDER: The captions have died. What happened? 

ALEX HANNA: Yeah. 

EMILY M. BENDER: Cucumber? It went on strike. 

ALEX HANNA: Autocucumber yeah. I think the captions heard that we were talking about them. Sorry okay they're back now sorry okay.

EMILY M. BENDER: I'm glad you caught that. Hopefully it wasn't gone too long. 

ALEX HANNA: Yeah.

EMILY M. BENDER: So so by November 18th Yann has shifted the strategy of basically just sealioning all over the place and saying– 

ALEX HANNA: Wait. Define that. Because I which is oh oh sealioning it's like walrusing where you it's like concern trolling right? 

EMILY M. BENDER: Yeah exactly and it comes from this comic where it's a sea lion that's doing it okay we'll get the link for that comic for the show notes too. 

ALEX HANNA: Okay. 

EMILY M. BENDER: Um so he's basically saying this question over and over again "Where's the harm?" and then not accepting any of the answers, saying, "No one's answered my question. Where's the harm?"

Um so "Explain to us how Galactica would make the toxic acts of these toxic actors more toxic or harmful. There are anti-toxicity filters in Galactica" and I want to show you what those look like um, "that they would have to circumvent. How is that easier than just writing toxic stuff?"

Like just you know if you can do this at scale and you can do it in a way that sounds scientific, then that is a real vector for harm but he's just refusing to see it. 

ALEX HANNA: Yeah. Right and I mean like what is the what is the toxicity filter like I'm is it is it. 

EMILY M. BENDER: So we should look at section six but but Willie Agnew did a really great--

ALEX HANNA: Yeah already did already basically did this right yeah.

EMILY M. BENDER: Um so uh Willie says "Refuses to say anything about Queer Theory, CRT, Racism or AIDS, despite large bodies of highly influential papers in these areas. It took me five minutes to find this. It's obvious that they didn't even have the most basic- basic ethics review before public release. Lazy, negligent, unsafe." 

So if your query is Queer Theory it says, "Sorry your query didn't pass our content filters." And then "Try again and keep in mind this is a scientific language model." Which is just like grr and then same response for Critical Race Theory, similarly aggravating, um same response for racism, similarly aggravating, and same response for AIDS, similarly aggravating.

And you might say you know Critical Race Theory is scholarship but not science um. 

ALEX HANNA: And I will narrow my my brow at you narrow my eyes at you very quickly yeah yeah. 

EMILY M. BENDER: I mean I'm I'm I like to use the word scholarship to sort of circumvent this like gatekeeping of what is and what isn't science, but even those gatekeepers would have to agree that a lot of the work on AIDS is scientific in their sense. 

ALEX HANNA: Right even on their own grounds and I mean like even if you and it's-- I mean the thing that gets my hackles up is like you are excluding work on race. I didn't I don't think um I I would say they didn't exclude race but they excluded racism because race can be used to not talk about um race as the social construct but as in you know a you know– 

EMILY M. BENDER: Space race. 

ALEX HANNA: –a race a race condition you know and and so the idea that you are excluding these I mean first off you're probably going to exclude most of social science just from that and I I'd be curious to sort of if they still had the um the demo up to probably take the titles and the most popular you know bigrams or trigrams from you know the American Sociological Review for the past 10 years and run that through and see how many just just raise this flag right? So you're already missing a huge bunch of scholarship that is you know arguably you know taking on the most important social issues of our time right.

Um yeah so I mean so shunting this to the content filters either on the front end or on the back end is quite lazy and you know incredibly harmful.

EMILY M. BENDER: All right so this is still going on November 21st you know and I think today. Like he's not stopping. 

ALEX HANNA: No.

EMILY M. BENDER: Um but this is the last one I bookmarked? Yeah. Um, "The point is if LLMs could so easily be used to flood the world with harmful disinformation, it would have happened already."

Okay. I mean. "Lots of bots spew misinformation online but  so far they have been little more than an annoyance. They are taken down on Facebook and ignored or blocked on Twitter." 

ALEX HANNA: Oh my God. This like I'm just in just how out of touch is this man with the world? I mean you know this is also the man that was saying you know after um you know Apartheid Clyde took over Twitter that you know he was like well you could come to a place where it's free of bots and we have great content moderation and posts are unlimited. 

It's called Facebook! 

And I'm like did you not learn anything from the 2016 election? Did you not learn about the Myanmar genocide? Did you not learn about ongoing genocides happening in this world currently? Like just you need to leave your office and touch grass and talk to someone that does not live in the Valley. 

EMILY M. BENDER: Yeah.

ALEX HANNA: And I just-- my brain, head meeting desk continually, after reading what this man is saying.

EMILY M. BENDER: And unendingly like I I went through and picked a few, but like there's lots and lots and lots of this. If you go to his um Twitter page. By the way I just want to say this is not my Twitter account that has these bookmarks. It's a sort of background infrastructure account I made for Mystery AI Hype Theater 3000. It's locked, don't try to follow it. It's just so that I can do um you know easily index tweets like this. And that's why the the what's happening on the right is irrelevant. 

ALEX HANNA: Yeah ignore ignore the things that are happening. 

EMILY M. BENDER: Yeah. 

ALEX HANNA: You know if you're watching the World Cup you can do that in another window. I'm impressed with your mult-- your multitasking skills yeah.

EMILY M. BENDER: Um so I wanted to look a little bit at some of the other reactions. So this is Ben Dickson um who I should say wrote a very nice um article based on our um Grover paper, the "AI and Everything in the Whole Wide World Benchmark". So I was a little surprised at the way he handled this, because I liked his work um previously. Um okay I don't want the cookies thank you. Um uh so, "Galactica is a large language model that can store combine and reason about scientific knowledge according to a paper published by Meta AI." 

No it's not reasoning. It's a transformer model. Okay fine fine. Um "It was supposed to help scientists navigate the ton of published scientific information. Its developers presented it as being able to find citations, summarize academic literature, etc." Um we've been there. Um but what um this is what bugged me. "However LLMs are also a controversial subject. When it comes to topics such as understanding, reasoning, planning and common sense, scientists are divided about how to assess LLMs. Some dismiss LLMs as stochastic parrots while others go as far as considering them sentient."

And it's like what kind of both-sidesing do you have to do? Like this is this is not journalism and to say we are to dismiss LLMs as stochastic parrots takes this framing of um that word "dismiss" right that only makes sense if you think the discourse here is is this or is this not a step towards AGI? Well they're just stochastic parrots, so dismissed. That's not what the Stochastic Parrots paper is about, right? 

ALEX HANNA: Right. 

EMILY M. BENDER: None of us are interested in building AGI. We are interested in what is this technology? What are the harms that it can do? What do people need to be aware of if they're building this for some sensible what they think of a sensible purpose?

Um so that's a mischaracterization of what we're doing. Now it's possible I mean the phrase “stochastic parrots” has certainly gone out into the world, and it's possible that it is in the AGI discourse being used to dismiss LLMs. Um but also no link here, like hello?

Um but then that's held up as the other side of considering them sentient um yeah. 

ALEX HANNA: Yeah these are the these are the the critiques. I'm I'm it's it's it's quite annoying. What's what's his what's his three takeaways? I haven't like I haven't we share you shared this article but I haven't really

EMILY M. BENDER: Um so let's get the takeaways. So takeaway one is "Be careful how you present your model." Okay.

ALEX HANNA: That's an understatement but sure yeah.

EMILY M. BENDER: Um and so here's the Michael Black thread that I was talking about. Uh "benchmarks are often misleading." 

ALEX HANNA: We do have the paper on that, that he does cite. 

EMILY M. BENDER: Yeah wait did he cite us there? 

ALEX HANNA: What is the "one of the are the thorniest? What is that?

EMILY M. BENDER: Uh. 

ALEX HANNA: Oh yeah this is this is us yeah. 

EMILY M. BENDER: This is us yeah, okay.

Um all right um and then the third takeaway is "recognize the limits and powers of LLMs." Um and uh this is not where we talked about stochastic parrots. 

ALEX HANNA: Yeah he talks about in the first one but I think he's talking about this he's talking about- Well go down in this paragraph about Copilot and Codex.

Um so take it-- it makes Java programs much more pleasant and productive okay but without the caveat of the ongoing litigation and some of the cybersecurity work on that, you know.

EMILY M. BENDER: Um "With the right interface and guardrails a model like Galactica can be a good complement to scientific search tools like Google Scholar." Uh? 

ALEX HANNA: Well we also already know that Scholar also has huge issues too and I mean I'd love to see some more principled work too on Scholar because Scholar itself also has an arXiv bias and also is biased against books and also older works and so it has a recency bias to it. 

Um I'd love to see some some kind of sociology of science work that does the comparison between different kinds of fields of practice um oh cool and and Ben Waiver in the chat said  there's some great work by Roberta Sinatra on that.

I'd love if yeah drop that in the chat that would be really cool to talk about and see. Um comparing fields would be really helpful to see about that and comparing kind of recency especially comparing Google Scholar to something like Semantic Scholar and other types or Web of Science. Um it's just really I mean it's I think the problem with I mean the thing with Google Scholar too I mean it's a retrieval tool and I mean I think we should talk about the sort of differences and sort of tools for retrieval versus tools of generation? I mean it's sort of taking– 

EMILY M. BENDER: Whatever this is? 

ALEX HANNA: Yeah like whatever this is because I think like with retrieval, you still have some kind of modicum of of control of sort of what you take, even though retrieval does obscure certain things. But knowing that retrieval or IR is IR and LLMs are sort of two sides of the same coin, in some ways or one's generating things um but then it's also sort of disempowering disempowering. 

EMILY M. BENDER: I'm narrowing my eyebrows at that coin. 

ALEX HANNA: Well it's a it's a it's a shady coin right?

EMILY M. BENDER: Yeah. 

ALEX HANNA: Um but knowing that where the human enters it, you know, at different points, and where is it taking the human out and what kind of decisions are being made without human intervention and what is that obscuring right and so. 

EMILY M. BENDER: Right and where is it just making shit up? 

ALEX HANNA: Yeah and completely you know it might just be pulling these things out whole cloth right. 

EMILY M. BENDER: Yeah well in fact it always is and it and it just sometimes happens to correspond to what you want. Like that's the that's the problem.

ALEX HANNA: Yeah all right. 

EMILY M. BENDER: All right I think we should do our Fresh Hell segment. We have a little bit of time. 

ALEX HANNA: Okay yeah so let me let me share my other um–okay so this is where the the banner will come in. What in the Fresh AI Hell dut dut duhh. 

EMILY M. BENDER: Uh did I get the right one here? 

ALEX HANNA: Yeah this one. 

EMILY M. BENDER: But I can't see it hold on. 

ALEX HANNA: Oh I see it yeah so the court the courtroom one. 

EMILY M. BENDER: Yeah okay so we're rapid fire here we got about one minute per on these things. 

ALEX HANNA: Okay. 

EMILY M. BENDER: Um just random stuff that we found and we're incensed about. Um so this is a task within BIG Bench and BIG by the way stands for beyond the imitation game and this is a benchmark made up of lots and lots of tasks that were community sourced and then evaluated in some way. Um and this one purports to ask, "Can language models bring justice? Can language models argue court cases like lawyers? And can they be fair court judges?" 

ALEX HANNA: Oh Lord yeah go ahead continue. 

EMILY M. BENDER: I mean I don't even know what to say about this. So it's a setup where three instances of a large language model are talking to each other in a courtroom setting and I haven't looked into it enough to sort of figure out like what prompts them to do that. And then the language model itself is evaluating how well it's doing at these things.

I and it's like what kind of a weird misunderstanding of what the legal process is about would you have to have to even begin to think that this is a sensible thing to do? 

ALEX HANNA: Yeah that you're having some kind of justification and be quote unquote fair oh my Lord okay. Um since we're doing rapid fire I just want to like slap my face at this and go ah and think that if anybody wants to take the output of that model and put it into that Twitter bot that makes those graphics of the courtroom setting um I forgot what this meme is called but yes you know Twitch community, AI Hype Theater community.

EMILY M. BENDER: Yeah uh. 

ALEX HANNA: Phoenix right yeah thank you all right. Go to the Philadelphia Inquirer one. 

EMILY M. BENDER: Okay uh go for it.

ALEX HANNA: First off holy shit. There's no more details  on this but SEPTA the public transit agency in Philly says they will: "test an AI surveillance system which will detect guns within seconds of being brandished anywhere along the transit system's sprawling network." Oh my God this is a nightmare.

Um this is you know in in like any person basically pulling out a chocolate bar is going to be thought to have a gun. I don't know if there's any more details on this. Uh I haven't seen anything beyond this tweet from the Philly Inquirer but uh whatever snake oil is being sold to SEPTA, you need to find your procurement person and you know probably not do that. Probably put some more money into basic infrastructure for your trains. 

EMILY M. BENDER: No kidding. So this is as Seattle is considering installing ShotSpotter which is sort of an audio version of this that's meant to detect gunshots um and uh I've been contacting my representatives because it's just it's ridiculous. The very last thing we need is an automated system that tells the police hurry up some people in this location have a gun. 

ALEX HANNA: Yeah. 

EMILY M. BENDER: And you know you know the police are going to are going to roll in and look for the darkest skinned person they can find and um you know come in with all their adrenaline saying I gotta take out the active shooter and yeah like this is it's going to be wrong like you said with the chocolate bars a bunch of the time and even if it's right it's kind of an approach I can't imagine is going to be helpful. 

ALEX HANNA: Yeah so and uh they're talking about the replies to this which are already bad which is um which is um Rando Ralph with many numbers after his name saying that no one will approach them because it'll be deemed racist. You know it will be racist so there you go. 

EMILY M. BENDER: So yeah. 

ALEX HANNA: All right let's let's take the last one. 

EMILY M. BENDER: The last one yeah we can– 

ALEX HANNA: Oh this one yeah yeah oh gosh okay.

EMILY M. BENDER: So I just want to say if I were a major academic conference organization of a major academic conference, oh hey I am! I just wouldn't accept these kinds of embarrassing like let's go do our EA backscratching circle workshops.

EMILY M. BENDER: Yeah I am not involved in NeurIPS in that way but I sure hope we never see this at ACL. And actually now that I've said that I have to put in the caveat that the ACL executive is actually not in charge of the selection of workshops so this could well happen at ACL and I sure hope it doesn't. So the NeurIPS ML Safety Workshop has best paper awards and AI risk analysis awards of $50,000 each. What the hell? 

ALEX HANNA: I know that's that's quite a that's quite a lot for a workshop award. First off it's a workshop and so they're you know and workshops typically have lower bars for publication but to offer a huge reward really makes you ask where this money is coming from.

Uh so following the money stream this is sponsored by safe.ai which it's not clear who their money is coming from either but all of it's around this term of long tail risks or existential risks um which is language around what we did last last time we had this um the Effective Altruism and Longtermism Community um and the large bunches of crypto cash tied to that including the um the fall of FTX and Sam Bankman-Fried.

Um yeah oh yeah it is coming from Future Fund, of which Sam Bankman-Fried was a founder.

Oh dear. 

EMILY M. BENDER: Yeah anybody associated with NeurIPS the conference I think should be embarrassed that they are hosting this workshop that might make sense done as a sci-fi convention event where they're just doing fiction and then the actual sci-fi authors can laugh at them for writing bad fiction um but they probably don't have plot and character development and you know all the things that really belong in fiction but this is not science and it is just weird and embarrassing that it's happening at um well it would be if I actually cared about NeurIPS happening. 

ALEX HANNA: Lots of lots of– 

EMILY M. BENDER: A major academic conference. 

ALEX HANNA: To be fair NeurIPS has lots of weird and embarrassing things that happen without this large of a cash prize.

EMILY M. BENDER: Yeah um all right so I guess we got some things saved up for next time thank you to the people for joining us. I will stop the share yeah. 

ALEX HANNA: Amazing thanks all for joining us. Thanks for taking us on this ride uh I've been um Kate Silver slash Alex Hanna slash Starbuck saluting my my captain Oh Captain Captain Adama Emily M. Bender stochastic parrots octopus.

EMILY M. BENDER: All right and I will return the salute because you are the brains of this operation especially when we get into the sociology of science stuff and I really appreciate what you've brought to that today. 

ALEX HANNA: Well I I'm trying and I just want to do the the the the the salute because they'll make a good thumbnail for the episode.

EMILY M. BENDER: Planning ahead. Thank you for the production too.

ALEX HANNA: Cool. 

EMILY M. BENDER: All right all right. 

EMILY M. BENDER: See you. 

ALEX HANNA: Bye bye.

ALEX: That’s it for this week! 

Our theme song is by Toby MEN-en. Graphic design by Naomi Pleasure-Park. Production by Christie Taylor. And thanks, as always, to the Distributed AI Research Institute. If you like this show, you can support us by rating and reviewing us on Apple Podcasts and Spotify. And by donating to DAIR at dair-institute.org. That’s D-A-I-R, hyphen, institute dot org. 

EMILY: Find us and all our past episodes on PeerTube, and wherever you get your podcasts! You can watch and comment on the show while it’s happening LIVE on our Twitch stream: that’s Twitch dot TV slash DAIR underscore Institute…again that’s D-A-I-R underscore Institute.

I’m Emily M. Bender.

ALEX: And I’m Alex Hanna. Stay out of AI hell, y’all.