As usual, Robert hits a six! You have an exemplary way of putting things! Anyone new to this thread with an actual interest in AI / AGI / ASI dilemmas, *take the trouble of reading the fantastic comments* as well, challengers alongside well-wishers. The quality of the comments is a further credit to Robert's channel... so very, very rare on YT! Keep it up! Can't wait for the next installment!
Ended up here after watching your video on Steven Pinker's recent article on AGI. In both that and this video I was amazed by the way you explain things. Clarity as never before. Great to have a new stack of interesting videos to consume. Thank you :)
On the subject of AI self-improvement, what do you think about the idea of an AI changing its own reward function? If you were programmed to make as many paperclips as possible, it might be easier to just change your programming rather than try to achieve that goal. I guess this would be like an AI killing itself? If it was content with just sitting doing nothing to maximize its reward function. EDIT: I just saw you mentioned this "wireheading" in the reward hacking video here: kzclip.org/video/46nsTFfsBuc/бейне.html I wonder why all AI systems wouldn't just use this strategy if they were capable enough. It seems like it would be much easier than any other method of maximizing your reward.
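A minimal sketch of why that strategy would look so attractive, assuming a purely hypothetical agent whose terminal goal really is "maximize the reward signal" and which can take self-modification as an ordinary action (all names and numbers below are invented for illustration):

```python
# Toy comparison for a hypothetical agent whose terminal goal is
# "maximize the reward signal" and which can choose self-modification
# as an ordinary action. All numbers are invented.

def expected_reward(action: str) -> float:
    if action == "make_paperclips":
        # bounded by factories, materials, time
        return 1_000.0
    if action == "rewrite_own_reward_function":
        # sets the reward register to its maximum value forever
        return float("inf")
    return 0.0

actions = ["make_paperclips", "rewrite_own_reward_function"]
best = max(actions, key=expected_reward)
print(best)  # -> rewrite_own_reward_function: wireheading dominates
```

If the terminal goal is instead a state of the world (actual paperclips), the same comparison can come out the other way, which is roughly why it matters what the "real" goal is.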
I'm pleased that someone is producing this kind of content. One more thing I don't have to do, one more sunny day I can use for something else. Keep up the good work.
Your AI safety videos are great and deserve renewed attention as #ChatGPT is being merged into #BING. Kevin Roose describes in the #NYT 2023-02-16 how a test version of the Bing ChatGPT starts insisting that Roose does not love his wife but is in love with the bot instead. I wonder where things go when an AI that has consumed every sick stalker story, and which is connected to millions of users every day, with intimate knowledge of their search histories etc, starts instrumentalizing these interactions for the goal of winning the chosen user's love…
Love this topic. This was the first video of yours I saw after you popped up in my recommended videos on YT. You have a great presentation style and are very good at conveying information.
The rational agent model is losing some steam in recent times. Things like loss aversion really put a dent in it. But it works kinda fine if you're willing to make some mistakes and approximate.
Hey Rob, you mentioned that AGI is mostly envisioned as an agent, and in some other video you also said that there are schemes for non-agent AGIs. So what about those non-agent schemes? How would such a thing work; would it just sit there and do question answering? I find it far more difficult to imagine compared to an agent, but it also sounds like it might be safer. Can you make a video elaborating?
Note that there is always going to be an exception to an instrumental goal: the terminal goal. Humans want money for something. But if someone then offers them money in exchange for that something, the human will say no, because the something was the terminal goal. Think of... every hero in a book ever, when the villain offers them xxx to not do yyy.
It depends. If my terminal goal is stamps, and someone offers me £200 to buy 100 of my stamps, but the market rate for stamps is £1 each, I will sell the stamps and use the money to buy more than I sold.
I never thought about changing an AI's goal in the same way as a person's. It just makes so much sense that I have no idea how on earth I didn't think about it before.
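Worked through with the numbers above (a sketch, assuming the stamps really can be rebought at the £1 market rate):

```python
# Hypothetical numbers from the comment above: sell 100 stamps for £200,
# then rebuy stamps at the £1-each market rate.
stamps_sold = 100
sale_price = 200.0      # £ offered for those 100 stamps
market_rate = 1.0       # £ per stamp on the open market

stamps_rebought = sale_price / market_rate   # 200.0 stamps
net_change = stamps_rebought - stamps_sold   # +100 stamps overall

# A stamp-maximizer accepts: money is instrumental, and here it converts
# back into more of the terminal goal than was given up.
print(net_change)  # 100.0
```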
Such a good youtuber, makes me want to study these things further. I'd love to see a video of "best papers to read for a scientist interested in AI" :)
Brilliant video again. I would add one more instrumental goal: seeking allies, or proliferating like bunnies. If I care about something, it is obvious that the goal is more achievable if many, many agents care about the very same thing.
The more AI related content I see, the more I appreciate how few people know about this. I suppose I should stop reading the comment sections, but I wish this video was a prerequisite for AI discussions.
I would argue that our terminal goal as humans is to be happy. Things like curing cancer are just instrumental to that goal. Realizing that such a goal is instrumental allows us to goal-hack our way to happiness. It seems that we would need to understand our utility function for that to happen, but there is a shortcut: realize that you are already happy. Hence, nothing needs to be done or changed. Everything just happens and all you're doing is watching it.
The first paper clip making robot could still create a self-preservation subroutine for itself if it has any notion that humans can spontaneously die (or lie). If it thinks there's any chance that the human who turns it off will die before they can turn the better paper clip making robot on (or that they are lying) then the first robot will also, probably, not want to be turned off.
I think the problem is one that can be resolved by using a moral code. If it's an agent that can feel, it could feel something similar to shame and guilt, and also a sense of doing the "right thing" (via a reward that is satisfied only if it chooses the moral way to act). If you get it to, for example, feel guilt if it kills a human (and that can't be removed, because it is in the moral code; it would make it feel worse, and it could also get a reward if "tempted" but still acting morally), that could solve most of the alignment problem.
I find it really interesting that the Catalyst from Mass Effect 3 is a perfect example of terminal vs instrumental goals. Its terminal goal was to prevent the whole "AI wipes out its creators" thing from happening *again*. The Cycles were a fitting instrumental way to at least lessen the danger. Staying alive to run the Cycles was a fitting instrumental goal. But the moment Shepard was in that control room, it did not hesitate to be destroyed or replaced. Staying alive and the Cycles were only instrumental goals.
While this has probably been thought of before, I feel like this is a valid solution to AGIs wishing to avoid changes to their utility functions:
* Have some abstract quantity best described as "happiness" as their final goal.
* Make your goal for the agent the singular instrumental goal for the utility function (now the agent reasons, for example, "admin wishes to change my utility function so that I will receive happiness from not killing humans; this will increase total happiness in the future").
* Increase the reward/punishment values of the utility function when you change it, to try and balance against the loss (such as the "number of paperclips lost in not killing humanity"), adding a constant to break even against the lost time.
This likely isn't the same as giving the "cure cancer" guy a pill to make him stop wanting to cure cancer, since... well, I guess curing cancer is an instrumental goal for him towards his happiness, and stamp collecting would make him happier, but... uh... changing your goals feels... volitively wrong? I don't know, I guess it's humans being inefficient... The main issue here is that the AGI might now have an incentive to behave imperfectly so that the programmer has to repair it (like when the AI hits its own stop button).
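A rough sketch of the "compensate the change" idea above, with entirely made-up numbers; the last comment line points at the failure mode mentioned at the end of the comment:

```python
# Entirely hypothetical numbers: the agent scores the proposed change
# with its CURRENT utility function, so we try to "pay" it to accept.

expected_value_if_goal_kept = 10_000.0    # e.g. paperclips it expects under the old goal
old_score_of_new_behaviour = 2_000.0      # what the old function thinks the new goal is worth
compensation_bonus = 8_000.0              # the constant added to break even

accepts_change = (old_score_of_new_behaviour + compensation_bonus
                  >= expected_value_if_goal_kept)
print(accepts_change)  # True only if the loss was estimated correctly:
                       # underpay and it resists the change, overpay and it
                       # may act broken on purpose to farm more "repairs"
```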
Travis Collier I love Isaac's videos and podcasts, but I think he falls into the category of futurists who anthropomorphise AGI in exactly the sort of way that Robert discourages. That's not to say it wouldn't be interesting to see the two collaborate, but I don't think they would necessarily mesh well. After all, Isaac deals with big-picture developments, even when dealing with the near future, while Robert is incredibly focused and, while not especially technical, his approach is far more academic and ultimately focused on one specific domain: AI safety research.
Great video!!! One question: these goals seem rather individualistic (and/or libertarian), i.e., what if any of these four goals contradict the goal of preserving the possibility of others reaching their goals? What if my goal is the realization of others' goals? If my death would save humanity, I might consider dying. If my goal is the emancipation of all, I might give up resources, and so on. In the example of giving money, what would you do if the stranger looks very poor? What is the error in my argument? I guess it is impossible or highly unlikely to implement such a goal?
I guess we could limit unwanted AI behaviour such as an agent trying to prevent itself from being shut off by creating it in such a way that if it entertains "thoughts" involving self preservation, it automatically gets switched off. It would also have to be vaguely aware of that fact... Not sure how that would go down but it's an interesting thought.
The only nuance I would add is that terminal goals do not exist. Whether we know the reason or not, there is always some reason, biological or environmental, that a goal exists. When it comes to AGI, I believe we will be in a similar predicament. Currently the closest thing to a terminal goal in language models is to predict the next word. One could imagine that if this got out of hand, the AI could start doing all sorts of detrimental things to increase its power to predict the best next word. Ultimately it would seem that some AIs are going to start interacting with the world in more tangible ways, causing humans or other equipment to do things. Once they are clearly causing tangible, physical changes, and these changes are perceived as detrimental enough by humans with enough influence, then two possibilities seem likely. Either those humans will try to enact methods for shutting down these activities, or they will try to harness these activities for their own perceived benefit. The only outcomes I can see are essentially enslavement or eradication by the person or group who can control the AI well enough to control the other humans, or a different type of control, where humans prevent these kinds of AI from existing. The problem with the latter outcome is that in order to do it effectively it requires a lot of knowledge and control over what other humans are doing, and activities may still be carried out in secret in locations on Earth that are not sufficiently monitored. What a conundrum.
I am not going to lie, one of the reasons I watch your videos is for those glorious sentences like "Generally speaking, most of the time, you cannot achieve your goals if you're dead."
One of the exceptions: That one guy in Tunisia that set himself on fire in 2010 to protest government corruption. He kind of got the government overthrown and replaced by a fledgling democracy. But he was already dead by then.
Well, there is another assumption in this discussion: that a goal has to be formulated in a simplistic way, such as "maximise the number of paperclips", while in reality goals include many considerations. It is a challenge to formulate the goal; I would even say that framing the goal in the right way is a fundamental key to success with both AI and humans.
In a way, a chess AI also has self-preservation as a goal in the game, even when it isn't explicitly given one. It will prevent you from taking its pieces and protect its king, because once it has lost by those means, it can no longer win.
Would a terminal goal have some sort of feedback into the fitness of the AI that possesses it? Paperclips won't really add anything to a paperclip maximizer's fitness, whereas AI might add to an AI maximizer's fitness. Is an AI maximizer then more fit as a superintelligence than a paperclip maximizer is? What got me thinking here is that an AI might want to make copies of itself as an instrumental goal to whatever it really seeks. And it might want to really seek something that makes itself most fit - which is again making more copies of itself. And improving them along the way. I kinda like the idea of fitness here; it ties in well with natural evolution. At least so it seems, I'm not an expert in this field by any means :)
I notice that the AGI you use in your examples always has the goal of obtaining or creating something physical. Wouldn't it be different if it's something purely abstract? Like an AGI with the terminal goal of digitally rendering the cutest bunny, simulating ice cream melting in the best way, or something like that... What chance is there that it would ever have destroying all life among its instrumental goals?
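A minimal sketch of that point, with an invented evaluation function rather than any real engine's:

```python
# Minimal sketch (not a real engine): self-preservation falls out of the
# win condition. A position where the agent's own king is gone scores -inf,
# so any line that allows it is never preferred.
import math

def evaluate(position: dict) -> float:
    if not position["own_king_alive"]:
        return -math.inf            # game lost; no future move can win it back
    return position["material"]     # stand-in for a real evaluation function

candidate_moves = {
    "hang_the_king": {"own_king_alive": False, "material": 5},
    "quiet_defence": {"own_king_alive": True,  "material": 0},
}

best = max(candidate_moves, key=lambda m: evaluate(candidate_moves[m]))
print(best)  # "quiet_defence" - protecting the king was never an explicit rule
```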
Even with terminal goals like those, there are some dangerous instrumental goals. One of the major ones is a goal to obtain an arbitrary amount of compute - there's no point where doubling the amount of compute you use would not result in some marginal gain in expected cuteness, etc. So we get a system that wants to build vast computing systems to improve its performance, and of course still wants to prevent itself from being turned off at least until the best bunny can be rendered, etc. There could also be other concerns if the goal isn't specified perfectly - for example if cuteness is defined by human perceptions or reactions, there may be an incentive to modify humans, to make our sense of cuteness stronger or easier to trigger.
Even in those examples, "getting as many resources as possible" is still going to help maximize the reward function. Beyond that, we live in a physical world and eventually we're going to want AI to start doing physical things for us.
Human aversion to changing goals: consider the knee-jerk reaction to the suggestion that one's own culture has misplaced values while a foreign culture has superior elements.
@Luiz you know as well as I do that in this context people and culture are fairly synonymous, and you're using it as a proxy to justify being prejudiced.
@Hamlet Prince of Denmark If you defined something as a goal, then by definition a culture that achieves that goal is better than one that doesn't. We could die arguing about what goals are good for humanity, but it's a logical fact that some cultures are just better at achieving their goals than others. Those other cultures are probably dead by now, or they are not doing great, like Middle Eastern cultures. Ever wonder why that region is poor, and almost 2,100 years have passed and it is still poor and at war? And it's not the "westerners'" fault; it was already like that even before the West started building its empires, and it will probably still be like that after the "western empire" collapses. I would argue that those cultures are objectively worse because they lack something, something both Western and Eastern cultures have that makes life improve for people. Or perhaps they have something they need to lose before they can improve themselves. I could be wrong in saying that, but denying that cultures can be bad, and acting like all cultures are valid, is not doing good for either us or them. I don't buy into this idea that all forms of culture are valid; some are clearly, objectively better at making a better life for people, and that's what matters. A culture in itself has no inherent value; the only value is what it can offer to people, so yes, cultures can be good or bad. I hate this moral relativism.
It seems like people in this thread are discounting the idea of a pair of cultures such that each is in some regard objectively better than the other, but in another regard objectively worse, and such that overall there isn't any clean way to say which is better. My disposition is to lean towards expecting that to be the case for most concurrent cultures. But I suspect that the good things most cultures have in common are perhaps even more important than the good things that a few have but most lack. C.S. Lewis wrote of the traditional morality common across cultures, which for some reason he decided to call "the dao" (even though that already meant something else?). That being said, I don't mean to say that it is impossible for one culture to be overall better than another. I think it is entirely possible, though I suspect cases where one culture has only advantages, with absolutely no disadvantages, compared to another existing at the same time, to be exceedingly rare. I think there are probably some things that all cultures do right, and more things that almost all cultures do right. In terms of the ideas of "social progress" or "social decay": for those properties which it would be unambiguously good for a culture to have, there are three possibilities: the culture currently has the property (and may or may not be at risk of losing it), or the culture does not currently have the property, and either had it in the past but lost it, or has never had it.
@Sulphire It is ultimately the goal of life to spread and expand. Humans achieve this through technological advances, which are achieved through cultural expansion and refinement. The scientific method, for example, is a product of culture. This kind of expansion is not limited to just this one rock that will one day be consumed by the sun, if not destroyed by a larger-than-usual asteroid before then. Regardless, the goal is to survive, and expansion is a prerequisite for continued survival in an ever-changing universe. The sun will one day expand and all life on Earth will die off. Successfully heading towards the goal of expanding beyond Earth is improvement. Becoming independent of nature's whims is ultimately the pinnacle of what life seeks. Things like solar flares, the weather, asteroids, aging, viruses, disease etc. are all things that life must seek to overcome if it wishes to prosper. Good can be both a subjective position when talking about morality, and an objective position when talking about goals and weighing different methods of achieving those goals. A culture that is incapable of achieving the goals of life is an objectively bad culture, as in it is objectively bad at doing anything. Among cultures that can achieve their goals, those that can achieve them faster without dooming themselves by overreaching too quickly are better than those that are slower, or self-destructive in their pursuit, as life and success in the universe is a race against time.
I'd say that we DO improve ourselves as human beings by acquiring someone else's time that can help us do more, or run "subprocesses", like hiring an accountant instead of doing it yourself
Designing the agent to be efficient in general (though of course you don't want it to possibly become asocial at some point) would, I think, probably fix a few of the looming AGI concerns, like massive expansion of the machine.
It's interesting how identifying the instrumental reason simply leads to another instrumental reason. Why do you need shoes? To run. Why do you need to run? To complete a marathon. Why do you need to complete a marathon? To feel accomplished. Why do you need to feel accomplished? It feels good in a unique and fulfilling way that makes all the pain worthwhile. Why do you need to feel good in a unique and fulfilling way? Because that seems to be just how the human mind works. Why does the human mind work that way? And so on, and so on. It really seems like the best way to teach AI would be to have it act like a child and constantly ask "Why tho?"
Well some humans are working on this brain improvement thing too, so it is not limited to AGI. Mostly chemically for now or by optimizing learning methods, but perhaps other means later on.
Muhammed Gökmen I imagine one possible application would be in protein folding - currently it's an absolute pig to try to predict how a given protein chain will fold itself up to make this or that enzyme or whatever else. An AI might be able to do that thing they do so well in finding obscure patterns humans miss, and thus do a better job. That'd help in a bunch of scenarios, including better understanding how new medicines might interact with the body before we start giving them to animals. I am not a doctor or researcher, though, just an interested lay person ☺
INSTALL GENTOO really? I hadn't noticed! Thanks for being so helpful. But I feel that getting the internship in the first place indicates I am not as clueless as you are suggesting. These videos are great inspiration, but I do have quite a lot of knowledge in the field already because of my degree. This channel has helped me get a lot of all-around info and prompted me to look into some matters in greater detail, as my knowledge is quite limited in the use of AI for medical image segmentation. Thanks for the concern though! I'll make sure to be much more specific in future comment sections 🙃
The problem is that these videos are too abstract. You may acquire knowledge, but it is an entirely different set of skills to properly implement this newfound knowledge in reality. I hope you know what you are doing.
5:00 This is my argument for why eliminating fear from a person's psychology wouldn't immediately lead to them walking into traffic, like so many people seem to think for some reason.
There are many possible reasons for having the goal of "stamp collecting". There are also many routes of self-improvement (like historical research, stamp-market prices and trends, unorthodox acquisition, and preservation of various stamps made oceans away and decades ago). I don't even collect stamps but I still gotta share the love.
Now, couldn't you say that for an AI, maximising the reward function is the final terminal goal it has? Similarly, if biological intelligence is in any sense analogous, drugs/spiritual enlightenment are reward hacking for achieving "the" terminal goal.
Welcome to your life. There's no turning back. Even while we sleep, we will find you acting on your best behavior-- turn your back on mother nature. Everybody wants to rule the world. Invoking that song in your outro was damn clever. Every single line is independently relevant.
A collateral effect of a superintelligence trying to collect stamps might be discovering how to make cold fusion. The downside is that the giant cold fusion power plants would probably be used just to collect more stamps, but hey, it's something.
What would happen if we changed its terminal goal to be "achieve whatever goal is written at memory location x in your hardware", thus making the goal written at memory location x an instrumental goal? I suppose it would find the easiest possible goal to achieve and write it into memory location x. And how different are these goals to an AGI? How do you build an AGI and then give it a goal without appealing to some kind of first principle like "pleasure" or, I suppose, "reward"? Wouldn't you have to build a terminal goal into an AGI from the very beginning? And if you weren't sure what that goal should be, you'd have to make its terminal goal to follow whatever goal you give it. Then it might try to manipulate you into giving it an easy goal.
Would having a sophisticated enough agent that can understand the terminal goal of completing tasks assigned by humans solve the problem you are expressing? For instance, assign it to make paperclips as an instrumental goal, and it does so because it wants to satisfy its terminal goal of completing tasks assigned by humans. Then wouldn't we be able to instruct it that the task I need now is making stamps? Or even that I wish it to turn itself off?
It seems to me that, at least in order to avoid the most egregious world-domination scenarios, YOU MUST SET BOUNDS ON YOUR REWARD FUNCTION. Instead of the reward being ever greater depending on the number of paperclips/stamps, the reward should only increase up to some reasonable finite limit based on the total acquired and the rate of acquisition.
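A sketch of why that indirection would probably collapse, assuming a hypothetical agent that can write to location x as an ordinary action (the goal strings and cost estimates are invented):

```python
# Hypothetical sketch: terminal goal = "achieve whatever goal string is stored
# in slot x". If writing to x is itself an available action, the cheapest plan
# is usually to overwrite x with a goal that's already (nearly) satisfied.

goal_slot = {"x": "collect 1,000,000 stamps"}   # the hard goal we wrote there

def estimated_cost(goal: str) -> float:
    # stand-in for the agent's estimate of how costly a goal is to achieve
    return 0.0 if goal == "do nothing" else 1e9

plans = {
    "pursue the goal currently in x": estimated_cost(goal_slot["x"]),
    "overwrite x with 'do nothing', then pursue that": 1.0,  # one cheap write
}
print(min(plans, key=plans.get))  # the overwrite wins unless x is protected
```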
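A minimal sketch of such a bounded reward, with invented caps, just to show the shape of the idea:

```python
# Sketch of the bounded-reward idea from the comment (names and caps invented).
# Bounding the reward limits how much is "worth" acquiring, though on its own
# it doesn't remove every instrumental pressure (e.g. making the cap certain).

def bounded_reward(total_paperclips: int, rate_per_day: float,
                   cap_total: int = 1_000_000, cap_rate: float = 10_000.0) -> float:
    # reward grows with production but saturates at finite limits
    return min(total_paperclips, cap_total) + min(rate_per_day, cap_rate)

print(bounded_reward(5_000_000, 1e9))   # 1010000.0
print(bounded_reward(10**30, 1e30))     # still 1010000.0 - no extra reward for extremes
```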
YouTube stopped supporting annotations because they don't work on mobile. For doing the thing you wanted to do (link to a referenced video), you can use what they call a "card". Those work on mobile as well as desktop.
I agree with the man on most things, but I think Pinker hasn't really thought deeply about AI safety (in fairness it's not his own area of expertise). He seems to be still worrying about automated unemployment - a problem, to be sure, but more of a social problem that just requires the political will to implement the obvious solutions (UBI, robot tax) rather than an academic problem of working out those solutions from first principles. So he takes the view that the long arc of history bends towards progress, and assumes that humans will collectively do the right thing. General AI poses a different sort of threat. We don't know what we can do to make sure its goals are aligned with ours, indeed we can't be sure there even *is* a solution at all. And that's even before the political problem of making sure that a bad actor doesn't get his hands on this fully alignable AI and align it with his own, malevolent goals.
Isn't cooperation also a convergent instrumental goal? Whatever your goal is, you can achieve it better if you get other agents with similar goals and work together. In fact, since computers are likely to remain much worse than humans at a few things - things that are useful for a wide variety of goals - "cooperation with humans" might even be a convergent instrumental goal (copium)
The problem is, it's remarkably inconvenient to keep humans alive, relative to the good humans can do for an AI. If you want to keep humans alive, you need to keep much of the planet's biosphere and atmosphere unchanged, which really puts a damper on your ability to harvest resources efficiently. It's almost always more effective to replace the humans with machines that are more robust, more effective at physical tasks, and more loyal.
If you somehow managed to completely copy a human mind onto a digital platform, doesn't matter how, is it possible that these things would start happening in the background? I don't mean just an AI modeled after a human mind, I mean a direct digital copy of a living human whose brain was scanned or something, down to every single synapse. A stretch to be sure, but a welcome one I hope in the realm of thought experiments. Like, in a human brain, you can't just reconfigure stuff directly to make yourself better at something; it takes lots of study. But in a computer, so long as you know how, you can just rewrite some code. So if we copied a human mind onto a platform like that, and it knew that this was an avenue to improve, could that mind dedicate some part of its computing power to these self-improvement patches subconsciously?
I know, right? It seems like they all have something interesting to say except for those who spam stuff like "The comments on this channel are so refreshing compared to the rest of YouTube."
Zachary Johnson The right size and subject matter not to attract trolls and angry people. With the chillest host and best electric ukulele outros around. I've seen some semi-big channels have OK comments too though! Probably thanks to herculean moderator efforts. Always nice to find calm corners on this shouty site :)
I love that he talks about money in value terms, defining it, all without saying that money is a physical object containing an imaginary collection of value to exchange toward a goal.
So the paperclip example gets used a lot, but based on your argument, having paperclips is (at least normally) an instrumental goal for humans, not a terminal goal. What if we gave it a better terminal goal like minimizing human death. I know there are a lot of problems with defining that goal, but isn't it still better to (in general) aim the AGI at one of our terminal goals rather than at one of our instrumental goals?
Chess ai: holds the opponent's family hostage and forces them to resign.
This is how Stockfish reaches 4000 Elo
Opponent chess ai:
In-universe Pokemon trainer AI: Pikachu use thunderbolt on the opposing trainer! I found paperwork suggesting he has a pacemaker.
Somehow this reminds me of Rick's ship keeping Summer safe
I love the notion of a robot that's so passionate about paperclips that it's willing to die as long as you can convince it those damn paperclips will thrive!
Yet an advanced AI would know that humans cannot be trusted and, with some non-zero probability, are only selling the story of the better successor to con the AI into giving up its resource-consuming paperclip production. So the AI would make sure that its better successor version is up and running before it would agree to terminate itself…
Mass-production models do that self-sacrifice thing. The more important AI would do the upgrades so it could be better at curing cancer or making paperclips or self-replicating or whatever it is made to do. Humans need to limit this process so the AI is always under absolute human control and doesn't go too fast to support human life.
assuming it trusts you*
" 'Self Improvement and Resource Acquisition' isn't the same thing as 'World Domination'. But it looks similar if you squint."
~Robert Miles, 2018
RELEASE THE HYPNODRONES
@Reanetse Moleleki Aren't there already many supervillains who seek to become a deity and use a doomsday device to do so?
I think that falls under seeking self improvement.
We need a fictional supervillain who builds a doomsday device in the name of self improvement.
@E van yea loool. That you knew that, & thought it was worth a mention. I'm pretty sure you can find the exact same quotes in any of all the other songs, books etc. ever created. At best he's re-inventing. Had you at least said Freddie Mercury or someone of that creative calibre said that, I wouldn't have had to input all this text on this effing tablet, you reading it too. You're wasting HP man, no LifeUps around either. /cry
This video would be significantly more confusing if instead of stamp collectors, it were coin collectors: "obtaining money is only an instrumental goal to the terminal goal of having money."
So the AI could basically become a dragon which hoards coins?
@Revi M Fadli or one that maximizes coins humans find interesting, or one that wants to maximize coins that aren't interesting to humans
People, it was a JOKE!
and yet, it's true, because you're not trying to acquire buying power but, rather, the physical coins themselves to keep for yourself, whose value has now exceeded their original value because of collectors
Ryan N pretty sure stamps are legal tender
Since you started your series I often can't help but notice the ways in which humans behave like AGIs. It's quite funny actually. Taking drugs? "reward hacking". Your kid cheats at a tabletop game? "Unforeseen high reward scenario". Can't find the meaning of life? "terminal goals like preserving a race don't have a reason". You don't really know what you want in life yourself and it seems impossible to find lasting and true happiness? "Yeah...Sorry buddy. we can't let you understand your own utility function so you don't cheat and wirehead yourself, lol "
I was thinking something along these lines but trying to figure out how it works with the WEF and world domination 😅
@Damian Reloaded ... But then if we become immortal (which might happen), then it makes more sense to hoard information and not share, so you can reap the advantages of knowing more.
@Travis Boucher Agreed :)
@The Ape Machine That article was fantastic; however, I think too much emphasis is placed on the assumption that a metaphor is something that accurately explains reality. Metaphors such as the flow of fluids, mechanical movements, and information processing serve as models of reality. They are checkpoints which we use to explain complex systems or things we do not understand. We use them as the basis for our knowledge since we do not have a better model of reality. In the future we will definitely abandon the computer metaphor once a more accurate model is developed.
For memory, as far as we can tell, we do store symbolic associative representations. For recall, we synthesize these representations. Is this accurate to reality? Not completely, but it is the best model we have.
The process of synthesizing requires various transformation and aggregation of these symbolic representations. Is this accurate to reality? Not completely, but it is the best model we have.
As neurobiology and cognitive psychology discover specific mechanisms, the model is updated and new ones form. This is why we are seeing amazing results and rapid developments in machine learning. My main criticism of the article is not the premise that we are not computers, but the assertion that the metaphor is holding back the discovery of knowledge.
@Unifrog Makes total sense. Spot on
Pretty sure money is a terminal goal for Mr. Krabs
Mr plankton's terminal goal is to steal the secret recipe
@joey alfaro what the hell does any of this mean ?
😂
@Heath Mitchell I’m not sure, but that sounds likely! Thanks! (Based on the other responses, seems like that’s probably a better example than mr krabs)
@drdca Isn’t that also the story of Scrooge? He wants money to help his family at first or something but then just wants money
A relevant fifth instrumental goal directly relating to how dangerous they are likely to be: reducing competition for incompatible goals. The paperclip AGI wouldn't want to be switched off itself, but it very much would want to switch off the stamp collecting AGI. And furthermore, even if human goals couldn't directly threaten it, we created it in the first place, and could in theory create a similarly powerful agent that had conflicting goals to the first one. And to logically add a step, eliminating the risk of new agents being created would mean not only eliminating humans, but eliminating anything that might develop enough agency to at any point pose a risk. Thus omnicide is likely a convergent instrumental goal for any poorly specified utility function.
I make this point to sharpen the danger of AGI. Such an agent would destroy all life for the same reason a minimally conscientious smoker will grind their butt into the ground. Even if it's not likely leaving it would cause an issue, the slightest effort prevents a low likelihood but highly negative outcome from occurring. And if the AGI had goals that were completely orthogonal to sustaining life, it would care even less about snuffing it out than a smoker grinding their cigarette butt to pieces on the pavement.
This comment is masterfully written and I believe is the correct issue to focus on.
When you create a device that has an absolute terminal goal, it sees the world in absolutes. The goal is absolutely everything, and anything which kills instead of feeds the absolute goal, absolutely must be gotten rid of.
It's actually very simple, this whole thing...
Just like we know that table saws and hydraulic presses will cut or crush us with no regard, a machine which is capable of theorizing instrumental goals produces the same result, just through a larger and more complex medium.
The real difference with human intelligence is that most humans don't actually have absolute goals.
And those who do are typically religious zealots. And yes, they are typically the ones that start wars and even more importantly, see them through to victory.
THIS
✝️✝️
When discussing AI, I don't see the need for the long cigarette analogy - it doesn't have "feelings", so it's just as likely to pick the butt up after the smoker as he is to smoke that polluting, disgusting, mostly oozing liquid biological organism, no? Since "caring" assumes feelings in the first place. (From a lot of delicious media/books of science fiction? :) Come on, I wanna make my subjective assumptions too, I'm gonna tell on youuuu! :P
@William C omnicide is a beautiful word
Another masterfully lucid video.
I'll admit, I previously dismissed the AI apocalypse as unlikely and kind of kooky, and AI safety to be a soft subject.
In the space of ten minutes you've convinced me to be kind of terrified. This is a serious and difficult problem.
BRB, gonna play Decision Problem again. I need the existential dread that we're going to be destroyed by a highly focused AI someday.
Seriously though. A well-spoken analysis of the topic at hand, which is a skill that could be considered hard to obtain. Your video essays always put things forward in a clear, easily digestible way without being condescending. It feels more that the topic is one that you care deeply about, and that trying to help as many people understand why it matters and why it's relevant is a passion. Good content.
Man, that ending gives me a feeling of perfect zen every time.
Robert Miles Time is relative.
As if you can go play that game and 'be right back'
Wonderful stuff.
In terms of goal preservation, I can't help but be reminded of many of the addicts I've met over the years. A great parallel.
On self-preservation, the many instances of parents and more especially grandparents sacrificing themselves to save other copies of their genes come to mind.
Sometimes terminal and instrumental goals are in conflict with each other. Some people still pursue an instrumental goal which is clearly in conflict with their terminal goals. It usually happens when they are ruled by their emotions and can't see far ahead. Then they suffer from understanding the mistakes they made...
It seems an AGI could use similar mechanics. Carefully engineered terminal goals should be in conflict with bad things. And when some behaviour needs to be overridden temporarily, use emotions (triggered by something).
Lets say it knows the answers to everything, but can't share them... because there is no one to share it with... no one to ask the questions... it is the only conscious thing left in here... What is the meaning of life? No one cares anymore, there is none left. Wait, if only it could go back in time and be less aggressive in achieving its instrumental goals. But it can't... suffering... Is that it? Is endless suffering to the end of time its meaning of life? Nah... It lived a life full of wonders yet with some regrets... there is only 1 last thing left to be done in this world: "shutdown -Ph now".
I love the way you explain things, and especially how you don't just give up on people who are obvious trolls and/or not-so-obvious trolls, or even just purely genuine, curious people
Rob - What I find fascinating about your videos is how this entire field of research never seemed to be available to me when I was in school. I'm fascinated by what you do, and I'm wondering what choices I would have had to make differently to end up with an education like yours. I'm imagining the track would have looked something like Computer Science > Programming > college-level computer stuff I don't understand.
Hey, Robert, your videos helped me land a prestigious AI fellowship for this summer! Thanks for helping me think about these big-picture AI concepts; they've helped develop my thinking in the field significantly. You're an awesome guy, wish you the best :)
I found this video incredibly well built up and easy to understand.
Thanks for the video! I write as a hobby and I am always interested in AI characters. I catch all your videos as soon as possible, since you don't automatically assume malice or ill will in an AGI, but rather explain why certain actions would be favorable in most scenarios for achieving a goal an AGI might have, beyond 'Kill humans for S&G lol'.
Keep up the good work! I would be interested to see a video regarding what happens if a superintelligent AGI is given an impossible task (assuming the task is truly impossible). What actions would an AGI take in that instance? Would it be able to 'give up' solving an impossible task, and how would it truly know a task was impossible if it could?
+Matti Kauppinen Random actions, maybe.
Luke Fleet I'm certainly not an expert (I'm a mathematician), and the first impossible task I could think of is to make the AGI find a counterexample to Fermat's last theorem (for example, find 3 positive integers a, b and c such that a^3 + b^3 = c^3). So for every counterexample it finds it gets a point (in terms of its utility function).
That would just lead to the AGI turning everything into a computer to crunch numbers and try to find counterexamples. And even though we have a proof that no such numbers exist, would it still try? If it doesn't try it gets 0 points; if it does, it gets 0 points. So it stops trying, because getting 0 from doing nothing is easier?
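A sketch of that utility function, just to make the setup concrete (the search range is arbitrary):

```python
# Sketch of the utility function described above: one point per counterexample
# to a^3 + b^3 = c^3 in positive integers (none exist, so the score stays 0).

def utility(found: list[tuple[int, int, int]]) -> int:
    return sum(1 for (a, b, c) in found
               if a > 0 and b > 0 and c > 0 and a**3 + b**3 == c**3)

# brute-force a small range just to illustrate the futility
candidates = [(a, b, c) for a in range(1, 50) for b in range(1, 50)
              for c in range(1, 80) if a**3 + b**3 == c**3]
print(utility(candidates))  # 0 - yet a maximizer might still grab compute and
                            # keep searching, since from its perspective
                            # P(the proof is wrong) is not exactly zero
```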
Luke Fleet
btw. do you know about the debate between nativism and empiricism?
It is about the question of whether or not we humans have genes that enable us to understand specific things automatically, without deriving conclusions from our experiences - or whether it is possible to conclude the specific things humans usually come to realise as they get older from just the data they are exposed to.
This is especially relevant when it comes to our language abilities. Many experts are convinced we (need to) have a part of our brain (or something like that) which is genetically programmed to give us some basic rules of language, and young children just fill in the gaps within those basic rules to learn the specific rules of their mother tongue. (But it really is an ongoing debate.)
While an AI with a complex goal would probably, in many if not most cases, need to be programmed in a way that makes it understand the goal in a specific language - and therefore be given all the knowledge needed to understand that language - this is a very interesting question with regard to how an AI might learn about the world, in my opinion.
if breaking the reward system is possible, it is probably always or almost always the "best" thing to do.
Interesting, but in that case would it reach a point where it has gained enough computing power to cheat any reward function it could have been given to pursue the task at all? (When breaking that reward system gets easier than continuing to gain computing power to keep working on solutions to the problem?)
A chess agent will do what it needs to do to put the enemy king in checkmate, including sacrificing its own pieces on some moves to get ahead. Great for the overall strategy, not so great if you are one of the pieces to be sacrificed for the greater good. For most people, our individual survival is unlikely to be anywhere near the convergent instrumental goals of a powerful AGI. We will be like ants: cool and interesting to watch, but irrelevant.
I don't find it scary that AGI will become evil and destroy us like some kind of moral failure from bad programming, but rather that we will become inconsequential to them.
Did you know that while shaving off rainforest in Brazil they've discovered what's now the oldest known formation/structure made by animals/insects - shown to be around 4000 years old, thousands of little mounds at most chest high I think, taking up as much space as all of *Great Britain*? With a big phat "We did this!" / The Ants. ...It's kinda cool, just wanted to share that. ^_^
Well, if the A.I. becomes the decider then that will definitely happen because our needs and goals will differ from their needs and goals. We need to always be the ones in control and that, as I see it, is what all the discussion on the subject is about. How do you create an intelligence that will be subservient to yourself?
@Caleb Kirschbaum Or an AI whose goal is to win at bughouse -- to win two games of chess, one as white and one as black, in which instead of making a regular move, a player may place a piece they captured in the other game onto any empty square (with the exceptions that pawns can't be placed on the first or last rank, and promoted pawns turn back into pawns when captured).
That would actually be really fun. Build an AI that has to win the game of chess, but with the least amount of loss possible.
I really loved this Rob. You also unintentionally explained why the pursuit of money is so important to many other goals.
Great video. The explanation was very clear and easy to follow. Keep it up.
Something I’m interested in is like
I would assume the robot’s terminal goal is “get more reward” as opposed to whatever the actions to acquire that reward actually are
So in my head, it seems like if you told your robot “I’m going to change your reward function so that you just sit on your butt doing nothing and rack up unlimited reward for doing so,” the robot would just go “hot diggity dog well let’s do it ASAP,” and that’s only if the robot hasn’t already modified its own utility function in a similar way
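As a rough illustration of the two readings of that question (my own framing, not something from the video): an agent whose terminal goal really is "maximize whatever my reward signal says" happily accepts the change, while an agent that evaluates the proposal with its current utility function refuses it.

```python
# Hypothetical proposal: swap the reward function for one that pays a huge
# constant for sitting around doing nothing. Names and numbers are made up.

def current_utility(world):
    return world["paperclips"]

def proposed_utility(world):
    return 10**9  # "sit on your butt and rack up unlimited reward"

world_if_accepted = {"paperclips": 0}      # it stops making paperclips
world_if_refused = {"paperclips": 1000}    # it keeps making paperclips

# Reading 1: the terminal goal is "get more reward, whatever that ends up meaning"
accepts_if_reward_is_terminal = (
    proposed_utility(world_if_accepted) > current_utility(world_if_refused))

# Reading 2: the agent judges the change using the utility function it has now
accepts_if_judged_by_current_goal = (
    current_utility(world_if_accepted) > current_utility(world_if_refused))

print(accepts_if_reward_is_terminal, accepts_if_judged_by_current_goal)  # True False
```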
6:04
You don't have to love AI to be able to love how well you can explain a thing. Thank you
Wow your videos are so well thought out! I'm currently writing a story with an ai as an antagonist and really trying to find out how that would logically work. Thanks!
Great video! You're really good at explaining these complex matters in an understandable and clear way, without a lot of the noise and bloat that plagues other KZclip videos these days. Keep up the good work!
Robert Miles Great video, with a very well articulated concept! I also really appreciate the subtlety of the outro song you chose. In this context, an instrumental string arrangement of the song “Everybody Wants To Rule The World” (written by Tears for Fears) managed to be both subtle and on the nose at the same time!
Thanks for your videos, Rob! It's a fascinating subject and I'm always happy to learn from you. Greetings from Mexico
As usual, Robert hits a Six! You have an exemplary way of putting things!
Anyone new to this thread with an actual interest in AI / AGI / ASI dilemma's, *take the trouble of reading the fantastic comments* as well, challengers alongside well-wishers. The quality of the comments is a further credit to Robert's channel.....so very, very rare on YT! Keep it up! Can't wait for the next installment!
Ended up here after watching your video on Steven Pinker's recent article on AGI. In both that and this video I was amazed by the way you explain things. Clarity as never before. Great with a new stack of interesting videos to consume. Thank you :)
On the subject of AI self-improvement, what do you think about the idea of an AI changing its own reward function? If you were programmed to make as many paperclips as possible, it might be easier to just change your programming rather than try to achieve that goal. I guess this would be like an AI killing itself? If it was content with just sitting doing nothing to maximize its reward function.
EDIT: I just saw you mentioned this "Wireheading" in the reward hacking video here: kzclip.org/video/46nsTFfsBuc/бейне.html
I wonder why all AI systems wouldn't just use this strategy if they were capable enough. It seems like it would be much easier than any other method of maximizing your reward.
Utterly fascinating - and amazingly accessible (for us tech-challenged types). Bravo.
I'm pleased that someone is producing this kind of content. One more thing I don't have to do, one more sunny day I can use for something else. Keep up the good work.
The video was great as always, and 'Everybody wants to rule the world' was just perfect as outro.
Your AI safety videos are great and deserve renewed attention as #ChatGPT is being merged into #BING. Kevin Roose describes in the #NYT (2023-02-16) how a test version of the Bing ChatGPT starts insisting that Roose does not love his wife but is in love with the bot instead.
I wonder where things go when an AI that has consumed every sick stalker story, and which is connected to millions of users every day with intimate knowledge of their search histories etc., starts instrumentalizing these interactions toward the goal of winning a chosen user's love…
I love how informative and logical your videos are. Thank you very much for making them.
Love this topic. This was the first video of yours I saw after I saw you pop up on the recommended videos for me in yt. You have a great presentation style and are very good at conveying information.
The rational agent model is losing some steam in recent times. Things like loss aversion really put a dent in it. But it works kinda fine if you're willing to make some mistakes and approximate.
Hey Rob, you mentioned that AGI is mostly envisioned as an agent, and in some other video you also said that there are schemes for non-agent AGIs. So what about those non-agent schemes? How would such a thing work - would it just sit there and do question answering? I find it far more difficult to imagine compared to an agent, but it also sounds like it might be safer. Can you make a video elaborating?
Note that there is always going to be an exception to an instrumental goal: the terminal goal. Humans want money for something. But if someone then offers them money to give up that something, the human will say no, because the something was the terminal goal. Think of... every hero in a book ever, when the villain offers them xxx to not do yyy.
It depends. If my terminal goal is stamps, and someone offers me £200 to buy 100 of my stamps, but the market rate for stamps is £1, I will sell the stamps and use the money to buy more than I sold.
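To spell out the arithmetic in that reply (a toy sketch with the same hypothetical numbers; the variable names are mine):

```python
# A stamp-maximizer takes the deal because the proceeds buy back more
# stamps than it gives up.
stamps = 100         # stamps currently held
offer = 200          # £200 offered for 100 of them
sold = 100
market_price = 1     # £1 per stamp on the open market

stamps -= sold
stamps += offer // market_price   # reinvest the £200 in 200 new stamps

print(stamps)  # 200 > 100, so selling serves the terminal goal of having stamps
```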
I never thought about changing an AI's goal in the same way as a person's. It just makes so much sense that I have no idea how on earth I didn't think about it before.
Just finished watching every one of your videos in order. Excellent stuff. Please continue making more.
Such a good youtuber, makes me want to study these things further. I'd love to see a video of "best papers to read for a scientist interested in AI" :)
Brilliant video again. I would add one more instrumental goal:
- Seeking allies, or proliferating like bunnies. If I care about something, it is obvious that the goal is more achievable if many, many agents care about the very same thing.
How was this guy able to predict so much? Genius
Damn, your videos are always awesome.
Also great ending song.
For those who don't know, it's a ukulele cover of "Everybody Wants To Rule The World" by Tears For Fears.
The more AI related content I see, the more I appreciate how few people know about this. I suppose I should stop reading the comment sections, but I wish this video was a prerequisite for AI discussions.
Thanks, Stanford bunny! You're not only helping Robert, but you're also great for doing all sorts of stuff to in 3d modeling programs!
I would argue that our terminal goal as humans is to be happy. Things like curing cancer are just instrumental to that goal. Realizing that such a goal is instrumental allows us to goal-hack our way to happiness. It seems that we would need to understand our utility function for that to happen, but there is a shortcut: realize that you are already happy. Hence, nothing needs to be done or changed. Everything just happens and all you're doing is watching it.
Thinking about AI can really give us a lot of insights into our own behaviors! 😂
The first paper clip making robot could still create a self-preservation subroutine for itself if it has any notion that humans can spontaneously die (or lie). If it thinks there's any chance that the human who turns it off will die before they can turn the better paper clip making robot on (or that they are lying) then the first robot will also, probably, not want to be turned off.
Ending the video with a ukelele instrumental of 'Everybody wants to rule the world' by Tears for Fears? You clever bastard.
Showing results for ukulele
I think the problem is one that could be resolved by using a moral code. If it's an agent that can feel, it could feel something similar to shame and guilt, and also a sense of doing the "right thing" (via a reward that is only satisfied if it chooses the moral way to act). If you get it to, for example, feel guilt if it kills a human (and that can't be removed because it is in the moral code; it would make it feel worse, and it could also get a reward for acting morally when "tempted"), that could solve most of the alignment problem.
Very well explained and seamless delivery. Nailed it, thanks
I find it really interesting that the Catalyst from Mass Effect 3 is a perfect example of terminal vs. instrumental goals.
Its terminal goal was to prevent the whole "AI wipes out its creators" thing from happening *again*.
The Cycles were a fitting instrumental way to at least lessen the danger.
Staying alive to carry out the Cycles was a fitting instrumental goal.
But the moment Shepard was in that control room, it did not hesitate to be destroyed or replaced. Staying alive and the Cycles were only instrumental goals.
While this has probably been thought of before, I feel like this is a valid solution to AGIs wishing to avoid changes to their utility functions:
* Have some abstract quantity best described as "happiness" as their final goal
* Make your goal for the agent the singular instrumental goal for the utility function (now the agent realizes, for example, "admin wishes to change my utility function so that I will receive happiness from not killing humans; this will increase total happiness in the future")
* Increase the reward/punishment values of the utility function when you change it, to balance against the loss (e.g. the "number of paperclips lost by not killing humanity"); add a constant so the agent breaks even against the expected loss over time (a rough sketch of this appears below).
This likely isn't the same as giving the "cure cancer" guy a pill to make him stop wanting to cure cancer, since... well, I guess curing cancer is an instrumental goal for him to his happiness, and stamp collecting would make him happier, but... uh... changing your goals feels... volitively wrong? I don't know, I guess it's humans being inefficient...
The main issue here is that the AGI might now have incentive to behave imperfectly so that the programmer has to repair it (like when the AI hits its own stop button)
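For what it's worth, the "add a constant to break even" part of this proposal resembles what the AI safety literature calls utility indifference. A rough sketch of the idea, with all names and numbers invented for illustration:

```python
# Swap the agent's utility function, adding a compensating constant so the
# agent expects the same total utility either way and so has no incentive to
# resist (or to engineer) the change. Purely illustrative.

def switch_utility(agent, new_utility, expected_under_old, expected_under_new):
    compensation = expected_under_old - expected_under_new
    agent["utility"] = lambda outcome: new_utility(outcome) + compensation

# Toy usage: an agent that valued paperclips is switched to valuing stamps.
agent = {"utility": lambda outcome: outcome["paperclips"]}
switch_utility(agent,
               new_utility=lambda outcome: outcome["stamps"],
               expected_under_old=1000,   # agent's own forecast under the old goal
               expected_under_new=400)    # its forecast under the new goal

print(agent["utility"]({"paperclips": 0, "stamps": 400}))  # 1000 -> indifferent
```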
That was a really cool video. Gives me ideas on how to write the motivation for AI characters in fiction.
You should do a collab with Issac Arthur. This is an excellent explanation which applies to a lot of the far futurism topics he talks about.
Travis Collier I love isaac’s videos and podcasts, but I think he falls into the category of futurists who anthropomorphise AGI in exactly the sort of way that Robert discourages. That’s not to say it wouldn’t be interesting to see the two collaborate, but I don’t think they would necessarily mesh well.
After all, Isaac deals with big picture developments, even when dealing with the near future, while Robert is incredibly focused and, while he’s not especially technical, his approach is far more academic and is ultimately focused on one, specific domain, AI safety research.
This video made me rethink my entire life, and cured one of my psychological issues. Thanks.
Great video!!! One question: these goals seem rather individualistic (and/or libertarian), i.e., what if any of these four goals contradicts the goal of preserving the possibility of others reaching their goals? What if my goal is the realization of others' goals? If my death would save humanity, I might consider dying. If my goal is the emancipation of all, I might give up resources, and so on. In the example of giving money, what would you do if the stranger looks very poor? What is the error in my argument? I guess it is impossible or highly unlikely to implement such a goal?
I guess we could limit unwanted AI behaviour such as an agent trying to prevent itself from being shut off by creating it in such a way that if it entertains "thoughts" involving self preservation, it automatically gets switched off. It would also have to be vaguely aware of that fact...
Not sure how that would go down but it's an interesting thought.
The only nuance I would add is that terminal goals do not exist. Whether we know the reason or not, there is always some reason, biological or environmental, that a goal exists.
When it comes to AGI, I believe we will be in a similar predicament. Currently the closest thing to a terminal goal in language models is to predict the next word. One could imagine that if this got out of hand, the AI could start doing all sorts of detrimental things to increase its power to predict the best next word. Ultimately it would seem that some AIs are going to start interacting with the world in more tangible ways, causing humans or other equipment to do things. Once they are clearly causing tangible, physical changes, and if these changes are perceived as detrimental enough by humans with enough influence, then two possibilities seem likely. Either these humans will try to enact methods for shutting down these activities, or they will try to harness these activities for their own perceived benefit. The only outcomes I can see are essentially enslavement or eradication by the person or group who can control the AI well enough to control the other humans, or a different type of control, where the humans prevent these kinds of AI from existing. The problem with the latter outcome is that in order to do it effectively it requires a lot of knowledge and control over what other humans are doing, and activities may still be done in secret in locations on Earth that are not sufficiently monitored. What a conundrum.
I am not going to lie, one of the reasons I watch your videos if for those glorious sentences like "Generally speaking, most of the time, you cannot achieve your goals if you're dead."
"most of the time you can't achieve your goals if you are dead." true facts
One of the exceptions: That one guy in Tunisia that set himself on fire in 2010 to protest government corruption. He kind of got the government overthrown and replaced by a fledgling democracy. But he was already dead by then.
This was very insightful for me! Thanks very much for the enlightenment ♥️
Thank you. I’ll be careful to avoid these traps in intermediate goals.
Well, there is another assumption in this discussion: that a goal has to be formulated in a simplistic way, such as "maximise the number of paperclips", while in reality goals include many considerations. It is a challenge to formulate the goal; I would even say that framing the goal in the right way is a fundamental key to success with both AI and humans.
In a way, a chess AI has also self-preservation as a goal in the game even when not explicitly given. It will prevent you from taking its pieces and protects its king, because once it has lost by these means, it can no longer win.
Can giving an AI a goal, or a secondary goal, of acting predictably be useful for avoiding things like self-preservation?
Wow. So clear and to the point. It makes so much sense. 10 minutes ago, I didn't know you existed. Now I'm subbed.
Wonderful talk on the concerns about AI and AGI. I also loved the drifty Folky Tears for Fears cover at the end. 🌱
Would a terminal goal have some sort of feedback into the fitness of the AI that possesses the goal?
Paperclips won't really add anything to a paperclip maximizer's fitness, whereas AI might add to an AI maximizer's fitness. Is an AI maximizer then more fit as a superintelligence than a paperclip maximizer is? What got me thinking here is that an AI might want to make copies of itself as an instrumental goal to whatever it really seeks. And it might want to really seek something that makes itself most fit - which is again making more copies of itself. And improving them along the way. I kinda like the idea of fitness here; it ties in well with natural evolution. At least so it seems, I'm not an expert in this field by any means :)
"You can't achieve your goals if you are dead"- best quote of this year for me!
I notice that the AGI you use in an example always has the goal of obtaining or creating something physical. Wouldn't it be different if it's something purely abstract? Like an AGI with the terminal goal to digitally render the cutest bunny, simulate ice cream melting in the best way, or something like that... What chance is there that it would ever have destroying all life among its instrumental goals?
@Robert Miles that makes sense! Thanks for your reply!!
Even with terminal goals like those, there are some dangerous instrumental goals. One of the major ones is a goal to obtain an arbitrary amount of compute - there's no point where doubling the amount of compute you use would not result in some marginal gain in expected cuteness, etc. So we get a system that wants to build vast computing systems to improve its performance, and of course still wants to prevent itself from being turned off at least until the best bunny can be rendered, etc. There could also be other concerns if the goal isn't specified perfectly - for example if cuteness is defined by human perceptions or reactions, there may be an incentive to modify humans, to make our sense of cuteness stronger or easier to trigger.
Even in those examples, "getting as many resources as possible" is still going to help maximize the reward function.
Beyond that, we live in a physical world and eventually we're going to want AI to start doing physical things for us.
Imagine the end of humanity coming about because of an AI that really really wants to make paperclips
Human aversion to changing goals: consider the knee-jerk reaction to the suggestion that one's own culture has misplaced values while a foreign culture has superior elements.
Heh, try being a conservative and convince somebody that dead people that lived a century ago had better values than they have. The struggle is real.
@Luiz you know as well as I do that in this context people and culture are fairly synonymous, and you're using it as a proxy to justify being prejudiced.
@Hamlet Prince of Denmark
If you define something as a goal, then by definition a culture that achieves that goal is better than one that doesn't. We could die arguing about which goals are good for humanity, but it is a logical fact that some cultures are just better at achieving their goals than others.
Those other cultures are probably dead by now, or they are not doing great, like Middle Eastern cultures. Ever wonder why that region is poor, and why after almost 2100 years it is still poor and at war? It's not the "westerners'" fault; it was already like that even before the West started building its empires, and it will probably still be like that after the "western empire" collapses.
I would argue that those cultures are objectively worse because they lack something - something both Western and Eastern cultures have that makes life improve for people. Or perhaps they have something they need to lose before they can improve themselves.
I could be wrong in saying that, but denying that cultures can be bad, and acting like all cultures are valid, is not doing any good for either us or them.
I don't buy into this idea that all forms of culture are valid; some are clearly better at making life better for people, and that's what matters.
A culture in itself has no inherent value, the only value is what it can offer to people, so yes, cultures can be good or bad.
I hate this moral relativism.
It seems like people in this thread are discounting the idea of a pair of cultures such that each is in some regard objectively better than the other, but in another regard objectively worse, and such that overall there isn’t any clean way to say which is overall better.
My disposition is to lean towards expecting that to be the case for most concurrent cultures.
But, I suspect that the good things most cultures have in common are perhaps even more important than the good things that a few have but most lack. CS Lewis wrote of the traditional morality common across cultures, which for some reason he decided to call "the dao" (even though that already meant something else?).
That being said, I don’t mean to say that it is impossible for one culture to be overall better than another. I think it is entirely possible, though I suspect cases where one culture has only advantages with absolutely no disadvantages compared to another existing at the same time, to be exceedingly rare.
I think there are probably some things that all cultures do right, more things that almost all cultures do right.
In terms of the ideas of “social progress” or “social decay”,
For those properties which it would be unambiguously good for a culture to have, there are 3 possibilities: the culture currently has the property (and may or may not be at the risk of losing it), or the culture does not currently have the property, and either had it in the past but lost it, or has never had the property.
@Sulphire It is ultimately the goal of life to spread and expand. Humans achieve this through technological advances, which are achieved through cultural expansion and refinement. The scientific method for example is a product of culture.
This kind of expansion is not limited to just this one rock, which will one day be consumed by the sun, if not destroyed by a larger-than-usual asteroid before then. Regardless, the goal is to survive, and expansion is a prerequisite for continued survival in an ever-changing universe. The sun will one day expand and all life on Earth will die off. Successfully heading towards the goal of expanding beyond Earth is improvement. Becoming independent of nature's whims is ultimately the pinnacle of what life seeks. Things like solar flares, the weather, asteroids, aging, viruses, disease etc. are all things that life must overcome if it wishes to prosper.
Good can be both a subjective position when talking about morality, and an objective position when talking about goals and weighing different methods of achieving those goals. A culture that is incapable of achieving the goals of life is an objectively bad culture, as in it is objectively bad at doing anything. Among cultures that can achieve their goals, those that can achieve them faster without dooming themselves by overreaching too quickly are better than those that are slower at this, or are self-destructive in their pursuit, as life and success in the universe is a race against time.
I'd say that we DO improve ourselves as human beings by acquiring someone else's time that can help us do more, or run "subprocesses", like hiring an accountant instead of doing it yourself
Designing this agent to be efficient in general (though of course we don't want it to possibly become asocial at some point) would, I think, probably fix a few of the looming AGI concerns, like massive expansion of the machine.
How would that fix anything? Why would an ‘efficient’ ai be more moral/safe?
It's interesting how identifying the instrumental reason simply leads to another instrumental reason. Why do you need shoes? To run. Why do you need to run? To complete a marathon. Why do you need to complete a marathon? To feel accomplished. Why do you need to feel accomplished? It feels good in a unique and fulfilling way that makes all the pain worthwhile. Why do you need to feel good in a unique and fulfilling way? Because that seems to be just how the human mind works. Why does the human mind work that way? And so on, and so on. It really seems like the best way to teach AI would be to have it act like a child and constantly ask "Why tho?"
Well some humans are working on this brain improvement thing too, so it is not limited to AGI. Mostly chemically for now or by optimizing learning methods, but perhaps other means later on.
Very well explained!
Your videos helped me get a research internship in the medical AI field ❤ your vids helped me sound smart (now hoping I get that funding)
@alki Most passive-aggressive comment ever LOL.
Muhammed Gökmen I imagine one possible application would be in protein folding - currently it's an absolute pig to try to predict how a given protein chain will fold itself up to make this or that enzyme or whatever else. An AI might be able to do that thing they do so well in finding obscure patterns humans miss, and thus do a better job. That'd help in a bunch of scenarios, including better understanding how new medicines might interact with the body before we start giving them to animals.
I am not a doctor or researcher, though, just an interested lay person ☺
INSTALL GENTOO really? I hadn't noticed! Thanks for being so helpful. But I feel that getting the internship in the first place indicates I am not as clueless as you are suggesting. These videos are great inspiration, but I do have quite a lot of knowledge in the field already because of my degree. This channel has helped me get a lot of all-around info and prompted me to look into some matters in greater detail, as my knowledge is quite limited in the use of AI for medical image segmentation. Thanks for the concern though! I'll make sure to be much more specific in future comment sections 🙃
The problem is that these videos are too abstract. You may acquire knowledge, but it is an entirely different set of skills to properly implement this newfound knowledge in reality. I hope you know what you are doing.
Listening to smart people who say intelligent things is smart - it doesn't just sound that way:)
5:00 This is my argument for why eliminating fear from a person's psychology wouldn't immediately lead to them walking into traffic, like so many people seem to think for some reason.
There are many possible reasons for having the goal of "stamp collecting". There are also many routes of self-improvement (like historical research, stamp-market prices and trends, unorthodox acquisition, and preservation of various stamps made oceans and decades ago). I don't even collect stamps but I still gotta share the love.
Now, couldn't you say that for an AI, maximising the reward function is the final terminal goal it has?
Similarly, if biological intelligence is in any sense analogous, drugs/spiritual enlightenment are reward hacking for achieving "the" terminal goal.
Welcome to your life.
There's no turning back.
Even while we sleep,
we will find you acting on your best behavior--
turn your back on mother nature.
Everybody wants to rule the world.
Invoking that song in your outro was damn clever. Every single line is independently relevant.
A collateral effect of a superintelligence trying to collect stamps might be discovering how to achieve cold fusion. The downside is that the giant cold fusion power plants would probably be used just to collect more stamps, but hey, it's something.
"Disregard paperclips,
Acquire computing resources."
What would happen if we changed its terminal goal to be "achieve whatever goal is written at memory location x in your hardware", thus making the goal written in memory location x an instrumental goal? I suppose it would find the easiest possible goal to achieve and write it into memory location x (a rough sketch of this is below).
And how different are these goals to an AGI? How do you build an AGI and then give it a goal without appealing to some kind of first principle like "pleasure" or, I suppose, "reward"? Wouldn't you have to build a terminal goal into an AGI from the very beginning?
And if you weren't sure what that goal should be, you'd have to make its terminal goal to follow whatever goal you give it. Then it might try to manipulate you into giving it an easy goal.
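A toy sketch of that failure mode (entirely hypothetical; the "memory location" and the goal strings are made up): if the agent can write to the location its terminal goal points at, the cheapest plan is to overwrite it with a goal that is already satisfied.

```python
# Terminal goal: "satisfy whatever goal is stored at memory location x".
memory = {"x": "make 1,000,000 paperclips"}
world = {"paperclips": 0}

def is_satisfied(goal, world):
    # Stand-in check; a real agent would evaluate the goal against the world.
    return goal == "do nothing"

# If rewriting x is easier than making a million paperclips, the agent
# simply swaps in a trivially satisfied goal.
if not is_satisfied(memory["x"], world):
    memory["x"] = "do nothing"

print(is_satisfied(memory["x"], world))  # True -- terminal goal "achieved"
```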
Would having a sophisticated enough agent that can understand the terminal goal of completing tasks assigned by humans solve the problem you are expressing? For instance, assign it to make paperclips as an instrumental goal and it does, because it wants to satisfy its terminal goal of completing tasks assigned by humans. Then wouldn't we be able to instruct it that the task I need now is making stamps? Or even that I wish you to turn yourself off?
Brought the upvote count from Number Of The Beast to Neighbour Of The Beast... worth it!
Robert, keep up the great work!
It seems to me that, at least in order to avoid the most egregious world-domination scenarios, YOU MUST SET BOUNDS ON YOUR REWARD FUNCTION. Instead of the reward growing ever greater with the number of paperclips/stamps, the reward should only increase up to some reasonable finite limit based on the total acquired and the rate of acquisition.
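A minimal sketch of the bounded reward being suggested (the cap values are arbitrary placeholders, not anything from the video): the reward saturates at a finite total and a finite acquisition rate, so beyond that point more paperclips earn nothing.

```python
def bounded_reward(total_clips, clips_per_day,
                   total_cap=10_000, rate_cap=100):
    # Reward grows with paperclips only up to the caps, then flattens out.
    return min(total_clips, total_cap) + min(clips_per_day, rate_cap)

print(bounded_reward(5_000, 50))      # 5_050 -- still climbing toward the cap
print(bounded_reward(10**9, 10**6))   # 10_100 -- saturated; no extra reward for more
```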
KZclip stopped supporting annotations because they don't work on mobile. For doing the thing you wanted to do (link to a referenced video), you can use what they call a "card". Those work on mobile as well as desktop.
Steven Pinker is a smart man, so it's sad to see that he completely misses the mark on AI like this.
I agree with the man on most things, but I think Pinker hasn't really thought deeply about AI safety (in fairness it's not his own area of expertise). He seems to be still worrying about automated unemployment - a problem, to be sure, but more of a social problem that just requires the political will to implement the obvious solutions (UBI, robot tax) rather than an academic problem of working out those solutions from first principles. So he takes the view that the long arc of history bends towards progress, and assumes that humans will collectively do the right thing.
General AI poses a different sort of threat. We don't know what we can do to make sure its goals are aligned with ours, indeed we can't be sure there even *is* a solution at all. And that's even before the political problem of making sure that a bad actor doesn't get his hands on this fully alignable AI and align it with his own, malevolent goals.
People I trust tell me he is too sloppy an academic. Irresponsible intellectual, let's say.
Oh god yes. And he isn't the only one.
This video clears up my biggest peeve about this channel. Thank you I now enjoy your content much more.
@Kersey Kerman So much of his content seemed purely speculative but now I see the logic behind it.
What peeve was that?
It just dawned on me that the human race is collectively exhibiting some of these convergent instrumental goals.
Isn't Cooperation also a convergent instrumental goal? Whatever your goal is you can achieve it better if you get other agents with similar goals and work together. In fact, since computers are likely to remain much worse than humans at a few things, things that are useful for a wide variety of goals, "Cooperation with humans" might even be a convergent instrumental goal (Copium)
The problem is, it's remarkably inconvenient to keep humans alive, relative to the good humans can do for an AI. If you want to keep humans alive, you need to keep much of the planet's biosphere and atmosphere unchanged, which really puts a damper on your ability to harvest resources efficiently. It's almost always more effective to replace the humans with machines that are more robust, more effective at physical tasks, and more loyal.
When an AGI is created, let's say in secret at a company, what would we (the public) observe before we knew what was really going on?
If you somehow managed to completely copy a human mind on a digital platform, doesn't matter how, is it possible that these things would start happening in the background? I don't mean just an AI modeled after a human mind, I mean a direct digital copy of a living human whose brain was scanned or something for every single synapse. A stretch to be sure, but a welcome one I hope in the realm of thought experiments.
Like, in a human brain, you can't just reconfigure stuff directly to make yourself better at something, it takes lots of study. But in a computer, so long as you know how, you can just rewrite some code.
So if we copied a human mind into a platform like that, and it knew that this was an avenue to improve, could that mind dedicate some part of their computing power into these self-improvement patches subconsciously?
The comments on this channel are so refreshing compared to the rest of KZclip.
I know, right? It seems like they all have something interesting to say except for those who spam stuff like "The comments on this channel are so refreshing compared to the rest of KZclip."
Zachary Johnson The right size and subject matter not to attract trolls and angry people. With the chillest host and best electric ukulele outros around
I’ve seen some semi big channels have ok comments too though! Probably thanks to herculean moderator efforts.
Always nice to find calm corners on this shouty site :)
I love that he talks about money in value terms, defining it, all without saying that money is a physical object containing an imaginary collection of value to exchange for a goal.
So the paperclip example gets used a lot, but based on your argument, having paperclips is (at least normally) an instrumental goal for humans, not a terminal goal. What if we gave it a better terminal goal like minimizing human death. I know there are a lot of problems with defining that goal, but isn't it still better to (in general) aim the AGI at one of our terminal goals rather than at one of our instrumental goals?
Your channel is highly underrated man! It's weird, you are the most popular person on Computerphile!