LessWrong MoreAudible Podcast

"MIRI announces new "Death With Dignity" strategy" by Eliezer Yudkowsky

October 19, 2022 · Robert
"MIRI announces new "Death With Dignity" strategy" by Eliezer Yudkowsky
LessWrong MoreAudible Podcast
More Info
LessWrong MoreAudible Podcast
"MIRI announces new "Death With Dignity" strategy" by Eliezer Yudkowsky
Oct 19, 2022
Robert

https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy

tl;dr:  It's obvious at this point that humanity isn't going to solve the alignment problem, or even try very hard, or even go out with much of a fight.  Since survival is unattainable, we should shift the focus of our efforts to helping humanity die with slightly more dignity.

Well, let's be frank here.  MIRI didn't solve AGI alignment and at least knows that it didn't.  Paul Christiano's incredibly complicated schemes have no chance of working in real life before DeepMind destroys the world.  Chris Olah's transparency work, at current rates of progress, will at best let somebody at DeepMind give a highly speculative warning about how the current set of enormous inscrutable tensors, inside a system that was recompiled three weeks ago and has now been training by gradient descent for 20 days, might possibly be planning to start trying to deceive its operators.

Management will then ask what they're supposed to do about that.

Whoever detected the warning sign will say that there isn't anything known they can do about that.  Just because you can see the system might be planning to kill you, doesn't mean that there's any known way to build a system that won't do that.  Management will then decide not to shut down the project - because it's not certain that the intention was really there or that the AGI will really follow through, because other AGI projects are hard on their heels, because if all those gloomy prophecies are true then there's nothing anybody can do about it anyways.  Pretty soon that troublesome error signal will vanish.

When Earth's prospects are that far underwater in the basement of the logistic success curve, it may be hard to feel motivated about continuing to fight, since doubling our chances of survival will only take them from 0% to 0%.

That's why I would suggest reframing the problem - especially on an emotional level - to helping humanity die with dignity, or rather, since even this goal is realistically unattainable at this point, die with slightly more dignity than would otherwise be counterfactually obtained.

Consider the world if Chris Olah had never existed.  It's then much more likely that nobody will even try and fail to adapt Olah's methodologies to try and read complicated facts about internal intentions and future plans, out of whatever enormous inscrutable tensors are being integrated a million times per second, inside of whatever recently designed system finished training 48 hours ago, in a vast GPU farm that's already helpfully connected to the Internet.

It is more dignified for humanity - a better look on our tombstone - if we die after the management of the AGI project was heroically warned of the dangers but came up with totally reasonable reasons to go ahead anyways.

