Alignment problem. [inaudible] those theoretical contributions, the bestselling book The Most Human Human, and his new book: if we continue to rely on artificial intelligence, what happens when AI itself becomes the problem? [inaudible] those implications, and then, from his 2011 book [inaudible]. With The Alignment Problem, he [inaudible] the replication of human bias and argues [inaudible] although we train these systems to make decisions for us. We will be discussing a lot in the next hour, so please put your questions in the text chat. Thank you, Brian, and welcome, and thank you for joining us. This is not your first book. The obvious question: why did you decide to tackle this topic now?
Great question. The initial spark for the book came, as you mentioned, when Vanity Fair reported that Elon Musk was reading my book. I found myself in 2014 attending a Silicon Valley book talk with a bunch of investors who, having seen the thing about Elon Musk reading the book, had invited him, and to my surprise he came. There was a fascinating moment at the end of the dinner. The organizers thanked me, everybody was getting up to go home for the night, and Elon Musk forced everyone to sit back down and said, no, no, no, but seriously, what will we do about AI? I will not let anyone leave this room until you either give me a convincing argument that we shouldn't be worried or an idea for something we can do about it. It was quite memorable. And I was drawing a blank. I was aware of the conversation: for some people AI is a human-extinction-level risk, and others are more focused on the present-day ethical problems. But I didn't have a reason why we shouldn't be worried, and I didn't have a suggestion of what to do. So his question, so seriously, what's the plan, haunted me as I was finishing my previous book, and I began to see, starting around 2016, a dramatic movement within the field, both on the ethical questions and on the further-into-the-future safety questions.
Both of those movements have grown explosively between 2016 and now, and the questions of ethics and safety, the alignment problem (how do we make sure the objective that the system carries out is what we are intending for it to do?), have gone from a marginal and philosophical concern to what makes up the central question. I wanted to tell that story and figure out, in a way, how to answer his question. What's the plan? What are we doing?
As I was getting into this, there is complex technology going into this [inaudible] in society today. You have a number of examples of what they are hoped to perform, but one of those could not be more timely, which is the algorithms being used [inaudible] just here in California [inaudible] for probation and parole.
[inaudible] As I say, that started in the '20s and '30s but really took off with the rise of personal computers in the '80s and '90s, and today it is implemented in almost every jurisdiction in the U.S.: municipal, county, state, federal. And there has been an increasing scrutiny that has come along with it. It has been interesting watching the public discourse on some of these tools. For example, the New York Times editorial board was writing, up through about 2015, these open letters saying it is time for New York State to join the 21st century, for equal opportunity, et cetera. [inaudible] What does it mean to turn them into the language [inaudible], and how do we look at a tool like this and say whether we feel comfortable actually deploying it?
It was interesting when you were giving examples of a Black suspect and a white suspect with a similar crime and background, and how much more likely the white suspect was to go free, including one of the white suspects [inaudible] they were still [inaudible].
This is a very big conversation. I think one way to start is to look at the data that goes into these systems. Typically they are trying to predict one of three things. One is your likelihood to not make a court appointment. The second is your likelihood to commit a nonviolent crime. [inaudible]
If you look at something like failure to appear in court, the court knows about it by definition. If you look at something like nonviolent crime, the case is different. For example, if you poll young white men and young Black men in Manhattan about their rates of marijuana usage, they self-report that they use it at the same rate, and yet if you look at the arrest data, the Black person is 15 times more likely to be arrested for using marijuana. In other jurisdictions it might vary from place to place. So that is a case where it is important to remember that the model claims to be able to predict crime, but what it is actually predicting is rearrest, and rearrest is an imperfect proxy for crime, systematically so.
It is ironic to me, because as part of the project of researching these systems I went back into the historical literature, and at the time a lot of the objections were coming from conservatives, from the political right, making the same argument progressives are making now but from the other side. Conservatives in the late '30s were saying, wait a minute, if a bad guy is able to evade arrest, then the system treats him like he is innocent and will recommend the release of other people like him. Today the argument is that if someone is wrongfully arrested and convicted, they go into the training data, and the system will recommend the detention of other people like them. This is the same argument framed a different way. But that is a problem, and we are starting to see groups like, for example, the Partnership on AI, which is a nonprofit industry coalition with 100 different stakeholders, [inaudible].
The second component that is worth highlighting is this question of what you do with the prediction once you have it. Let's say you have a higher than average chance of failing to make your scheduled court appointment. That is a prediction. There is a second question, which is what we do with that information.
It turns out that if you send people a text message reminder, they are more likely to show up for the court appointment, and there are people proposing solutions like daycare services for their kids or providing them with subsidized transportation. So there is a separate question: as much scrutiny as is being directed at the algorithmic prediction, there is a much more systemic question, which is what we do with those predictions. If you are a judge and the prediction says that this person will fail to reappear, you might want some kind of text message alert as opposed to jail, but that may or may not be available to you in that jurisdiction, so you have to work with what you have. That is not necessarily a problem with the algorithm per se, but the algorithm is sort of caught in the middle, if you will.
You talk later in the book about hiring, and Amazon coming up with this system for job applicants, and what they were finding. The reasons for this were also baked into the way the system was being trained and the way the system was being used, and when you get to this you also have the question about the end goal, like why were you trying to find people like those that you already had. Tell us about that, and how did they get into it?
This is a story that involves Amazon in 2017, but by no means are they a unique example. It happens to be the example of Amazon, but like many companies they were trying to design a system to take a little bit of the workload off of the human recruiters. If you have an open position, you start getting x number of resumes. You would like some kind of system to do the triage and tell you which resumes to look at.
They wanted to rate applicants in the same way that they rate products. To do that, they were using a kind of computational language model called word vectors. Without getting too technical, the neural network models that were very successful around 2012 also started to move into computational linguistics, and in particular there was a remarkable family of models that was able to represent words as points in space, trained so that if you have a document you could predict a missing word based on the others that were nearby. You could take king minus man plus woman, do a search for the point in space nearest to that, and you would get queen. You could do Tokyo minus Japan plus England and get London. So these numerical representations of words that fell out of this network ended up being useful for a surprisingly vast array of tasks, and one of these was trying to figure out the relevance of a resume. One way to do it is to say, here are the people we have hired over the years, and then for any new resume, which of the words have a kind of positive attribute and which have negative attributes? Sounds good enough, but when they were looking at this they found all sorts of bias. For example, the word women's was getting a deduction, a negative rating.
Because it is located further away from the more successful words that it had been trained to watch for.
That's right. It doesn't appear on the typical resume that did get selected in the past, and it is similar to other words that also don't. So of course they said, okay, we can delete this attribute from the model. Then they start noticing that it is also applying deductions to things like field hockey, so they get rid of that, or women's colleges, so they get rid of that. And then they start noticing that it is picking up on these subtle word choices that were more typical of males than females, such as the use of the words executed and captured, like, I executed a strategy. At that point they basically gave up and scrapped the project entirely. In the book I connect this to something that happened at the Boston Symphony Orchestra.
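The word-vector arithmetic described above can be illustrated with a small Python sketch. This is only a toy under stated assumptions: the vectors are hand-made values on five invented axes, not embeddings learned from text, and the names `VECTORS`, `cosine`, and `analogy` are hypothetical. Real systems learn far more dimensions from large corpora. The sketch shows just the mechanics of subtracting and adding vectors and then searching for the nearest remaining word.

```python
import math

# Toy vectors on five made-up axes: [royal, gender, city, japan, england].
# These values are illustrative assumptions, not learned embeddings.
VECTORS = {
    "king":    [1.0,  1.0, 0.0, 0.0, 0.0],
    "queen":   [1.0, -1.0, 0.0, 0.0, 0.0],
    "man":     [0.0,  1.0, 0.0, 0.0, 0.0],
    "woman":   [0.0, -1.0, 0.0, 0.0, 0.0],
    "tokyo":   [0.0,  0.0, 1.0, 1.0, 0.0],
    "japan":   [0.0,  0.0, 0.0, 1.0, 0.0],
    "england": [0.0,  0.0, 0.0, 0.0, 1.0],
    "london":  [0.0,  0.0, 1.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c):
    """Return the word nearest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))


print(analogy("king", "man", "woman"))       # queen
print(analogy("tokyo", "japan", "england"))  # london
```

The same geometry is what makes the bias hard to remove: a word that sits near penalized words inherits their penalty even after the original word is deleted from the model.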
The Boston Symphony Orchestra decided to hold its auditions behind a wooden screen, but of course the panel could still identify whether the auditioner was a man or a woman from the sound of their shoes. It wasn't until the '70s, when they instructed people to remove their shoes before entering the room, that the auditions were truly anonymous. The problem with these models is that they are always detecting the shoes, so to speak. They are detecting the word captured. So Amazon gave up and said, we don't feel comfortable using this technology. Whatever it is going to do, it is going to identify some subtle pattern, and it is just going to sort of replicate that. In this particular case they walked away. [inaudible] how you can de-bias a language model: if you have these points in space, researchers try to identify subspaces within this [inaudible].
How much did Amazon spend developing that?
They are pretty tight-lipped about it. As I understand it, [inaudible], so they had to wash their hands of it.
I assume millions were put into that, and they could have hired another team. Another example I want to get into is the self-driving car in Arizona in 2018, where the first pedestrian was killed by an R&D vehicle.
[inaudible] Fortunately I was able to get some of that into the book, and it was very [inaudible] for that entire sweep of things that might have ended differently. One of the things that was happening was that the car was using a neural network to do object detection, but it had never been given an example of a jaywalker. In all of the training data, people walking across the street were perfectly correlated with zebra stripes and perfectly correlated with intersections, so the model didn't know what it was seeing when it saw this woman crossing in the middle of the street.
Most object recognition systems are taught to classify things into exactly one of a discrete number of categories, so they don't know how to classify something that seems to belong to more than one category, or that isn't in any category. This is, again, one of those active research problems. In this particular case, the woman was walking a bicycle, and this set the object recognition system into a kind of fluttering state, where first it thought she was a cyclist, but she wasn't moving like a cyclist; then it thought she was a pedestrian; then it thought maybe it is just some object. Due to a quirk in the way that the system was built, every time it changed its mind it would reset the motion prediction. It is constantly predicting, this is where they will be a couple of seconds from now, but every time it changed its mind it started recomputing that prediction, so it never stabilized on a prediction. There are additional things here that the team had [inaudible] to add their own system in, but I think the object recognition thing itself, for me, is very cinematic. There is a question of certainty and confidence: how did the system know what to do with [inaudible], and [inaudible] deal with the uncertainty? The mere fact that you are changing your mind should be a huge red flag. It is very heartbreaking to think about how all of these engineering decisions add up to this event.
[inaudible] to get to the bottom of this certainty and uncertainty, because I think that is a very human thing. You don't want to take a high-impact action, what's the term, a preemptive judgment, in advance of deciding what the real thing would be, because they are trying to prevent irreparable harm.
In a very high-impact situation, that requires us to quantify impact and uncertainty and have a plan for what to do. All of those pieces need to come together. We see progress being made on all of those fronts, but it can't happen soon enough.
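The interaction described above, where each reclassification resets the motion prediction, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the vehicle's actual software: the `Track` class, its `reset_on_relabel` flag, the constant-velocity predictor, and the frame data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Track:
    """One tracked object; `reset_on_relabel` mimics the quirk described above."""
    reset_on_relabel: bool
    label: Optional[str] = None
    positions: List[float] = field(default_factory=list)

    def update(self, label: str, position: float) -> None:
        if self.reset_on_relabel and label != self.label:
            self.positions.clear()  # reclassification throws away motion history
        self.label = label
        self.positions.append(position)

    def predict(self, steps_ahead: int) -> Optional[float]:
        """Constant-velocity extrapolation; None when there is too little history."""
        if len(self.positions) < 2:
            return None
        velocity = self.positions[-1] - self.positions[-2]
        return self.positions[-1] + velocity * steps_ahead


# Hypothetical detections: the label flickers while the object moves steadily.
frames = [("cyclist", 0.0), ("pedestrian", 1.0), ("object", 2.0),
          ("cyclist", 3.0), ("pedestrian", 4.0), ("object", 5.0)]

resetting = Track(reset_on_relabel=True)
persistent = Track(reset_on_relabel=False)
for label, position in frames:
    resetting.update(label, position)
    persistent.update(label, position)

print(resetting.predict(2))   # None: never more than one point of history
print(persistent.predict(2))  # 7.0
```

In this toy, the track that keeps its history regardless of the label can still say where the object is heading, while the track that resets on every relabel never accumulates enough observations to predict anything.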
In these examples we talk about in the book, is there a general problem that you think needs to be addressed? [inaudible]
It is known as the alignment problem, which is where the book gets its title: how do we make sure the objective in the system is exactly what we want? I think all of the examples that we have highlighted so far have shown us cases where one must be very careful. We think we can measure crime, but we can only measure rearrest. We think we can hire promising candidates, but we get candidates that superficially resemble the previous candidates. We put things in a category, but we don't always know what category to put them in. There are many other manifestations as well that speak to this fundamental issue of alignment. Sometimes there is a problem with the model architecture, so there is a kind of black box issue, explainability: how can we trust the output? [inaudible] what to minimize or maximize. So each component of the system has its own manifestation of the alignment problem. That, for me, is really the striking thing about where we are now. It is a remarkable shift. As I talked to one researcher, [inaudible]; he came back a year later, in 2017, and there was an entire daylong workshop, and by 2018 it was a significant fraction [inaudible]. The number of people working on this is still quite small, but the growth even over that short time is, to my mind, astonishing. It can't come soon enough, so I encourage all motivated undergrads and high school students to get excited, because there is a lot of work to be done.
In the AI research and development field, is the actual commercialization of the technology ahead of where it should be? Should it still be modeled, and not on the road or in the courtroom?
That is a great question, and the 85-year history at this point says we are still playing catch-up: the analysis catching up to the deployment, the understanding catching up to the actual implementation. And I think we have seen that with social media.
There were decisions about how to run the news feed algorithm, and the details are somewhat technical: they went from supervised learning to reinforcement learning. Basically, the narrow-minded focus on always prioritizing the content that will get the most clicks created a situation where [inaudible] content is promoted and people were being burned out and leaving the platform, in addition to the other kinds of societal externalities that it is creating. It was [inaudible] to maintain the attention [inaudible]. I think there is one question when you think about the alignment problem: is the system doing what we want? And when we look at the actual industry, there is another: what is it that we want the system to be doing? We have to be thinking about this more urgently.
One of our audience members asked about China and the widespread use of facial recognition. You talk about facial recognition technology and the inappropriately funny results, which were just absurd but also insulting. Can you talk a bit about facial recognition, another thing that in fact became a proposition on whether or not to use these technologies?
There was this unfortunate and hard-to-ignore pattern of ethnic minorities being incorrectly recognized or categorized by face recognition. One of the famous examples was a software developer in 2015: a group of photographs he took were captioned by Google Photos as gorillas. Also, one MIT researcher, as an undergraduate computer scientist doing a facial recognition homework assignment, had to borrow her roommate to check that it worked, because it didn't work on her, or it worked only if she wore a white mask. This led to the investigation of why. What is the underlying thing?
There are a couple of different components, but one of the main ones is that there was a lackadaisical attitude about how the datasets were put together in the first place. What led to the rise of computer recognition was the internet. If you needed half a million examples to train your system in the eighties, you were out of luck, but now, with the internet, you just download a million faces and put them into your system. The most popular research database was one called Labeled Faces in the Wild. What they wanted to do was understand whether two images show the same person, so they used newspaper images, because they are all labeled: this is this person, and this is this person. That way you can decide whether two images show the same person. But you are at the mercy of who was on the front page in the 2000s, and that was George W. Bush. An analysis done a few years ago shows there were twice as many pictures of George W. Bush in the database as of all Black women combined. That is just insane if you are trying to build something [inaudible]. To be fair to those who collected the data, it was an academic research project, not intended to be used in any deployed system, but somebody just downloads it off the internet. And it is very striking if you look at the original papers (I don't want to single them out, because it is widespread): the word diversity is used in the early 2010s to mean lighting and pose. What they mean is, we have people from the side and in the dark. But now, at the end of the 2010s and the beginning of the 2020s, it is very striking, because some of these databases appear with a warning label that says, when we said diversity we meant a very specific thing, which does not mean diverse in demographics. There is a lot of work being done there, spearheaded by places like MIT and Google, on being more focused on equalizing the error rate among ethnic groups, on making sure that the database and the training data represent the population [inaudible], and also on the representation in tech itself.
In 2019, less than 1 percent of computer science PhDs were African American, so there is a lot of work to be done in the field itself to address the question of representation. We see [inaudible] with a number of initiatives like scholarships and grants trying to equalize that in the field itself.
There is a question from the audience about male-dominated answers. In the book you talk about word embeddings and gender [inaudible]. You mentioned a group of researchers who did work on this, and you noted that the team of five included a social scientist. That would seem to be a requirement, because of all the different interpretations one has to take into account; the social science has to be mixed in with the computer science.
That is absolutely right, and to me it is one of the striking things about where the field finds itself at the moment. No longer can data scientists think of themselves as purely doing engineering or mathematics. We have gotten to a point where the systems are enmeshed in human practices. How was the data collected and generated? If human respondents were asked a question, how was it worded? You will get different answers based on how it is worded. What population were you sampling from? Are people on Amazon representative or not representative of other groups that might respond to the same thing? We are very much in a moment, to my mind, of interdisciplinary work that needs to happen and is happening between the computer science and machine learning community and social scientists, philosophers, lawyers, cognitive scientists. A lot of work is being done on [inaudible] cognition and AI and where they resemble each other, so someone from the machine learning community goes to a developmental psychologist and asks, what is your best theory for the curiosity of an infant, or for exploratory play to figure out how something works? Then they import that, and in turn it might be [inaudible] these questions. So there are many, many fronts, and social science is uniquely positioned at that interface with computer science.
We see many more papers with a diverse set of skills among the authors, and that is encouraging.
We mentioned in the beginning that you have an interesting skill set yourself, as a poet, author, and programmer. Does that set of skills help you see past blinders that could otherwise be there?
I could joke that poetry and programming have nothing in common except the scrutiny over semicolons. But yes. My [inaudible] philosophy; when I was a student I was interested in the question of what it means to have a mind and consciousness. The philosophy of mind takes one angle on the question, and AI answers it in a different way. Broadly speaking, what it means to be human has been a question for 2,500 years of Western philosophy, and people have answered that question by comparing themselves to animals. I think there has never been a more interesting time to think about this area, because now we have a completely new standard of comparison and a whole different set of answers. They decided that analytical, deliberate reasoning is the core of what it means to be human, because that is what monkeys can't do. Now nobody thinks that reasoning is at the seat of the human experience; it is more about empathy or imagination or social ties and teamwork and collaboration. So for me, I feel lucky that I have this eclectic set of interests and happen to be alive at a time in history when the disciplines are on a collision course.
Do you believe humanlike machines will ever be achieved?
There are a number of things people mean by singularity. For some, it is what is also called the hard takeoff: that moment in time where AI is improving [inaudible]. I don't see that. I am in the camp of the slow takeoff. AI will just get weirder and weirder, more uncanny, until we accept that it does [inaudible] the intelligent thing to do, but there will not be a sharp elbow turn that happens overnight. From my perspective it is inevitable. There is a long history in computer science going back to people being asked, do you ever think a machine can think? Of course.
I am a machine. And if you have that secular worldview, the brain is made of atoms, the computer is made of atoms, and it is a question of this level of complexity. And we are on that road. OpenAI just released a language system a few months ago with 175 billion parameters. If you compare that to the synapses in the human brain, it is 1/1000 of the complexity of the human brain. It doesn't sound very impressive: 0.1 percent. But the average model size doubles every three months, so if you do the math, that means we should expect models to exist with the capacity of the human brain sometime in the spring of 2023. That is not very far away. So sooner or later these questions will come to the surface. I don't know what the answers will be, but this is what is riveting about this moment.
Recently there were the stories of computers set up to communicate with each other that spontaneously developed a real language to communicate with each other. In talking about the needs that we have for AI to develop in certain ways and align with what we want it to do, and thinking about solving a problem in a completely different way: is it possible we would get artificial intelligence that is much more advanced and able to deliver more in line with what we want, but whose way of reaching it is totally alien?
That's a great question. I think both of those possibilities are an option. When we think about AI, there is an inevitability that attaches to the question of progress, but there are choices to be made about the architecture of the systems. For example, it is already the case with self-driving cars that you can train the system end to end, with a giant blob of networks: you put the camera feed in at the bottom and the steering wheel comes out at the top, and you have no idea about the middle.
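The doubling arithmetic mentioned earlier in this passage can be checked with a short Python sketch. It takes the three figures exactly as stated in the talk (175 billion parameters, 1/1000 of the brain's synapse count, one doubling every three months) and treats them as assumptions rather than measurements; the variable names are hypothetical.

```python
import math

MODEL_PARAMS = 175e9        # parameter count quoted in the talk
BRAIN_FRACTION = 1 / 1000   # the talk's figure: the model is 0.1% of the brain's synapses
DOUBLING_MONTHS = 3         # the talk's figure for how often model size doubles

# Size a model would need to reach the brain-scale figure implied above.
brain_scale = MODEL_PARAMS / BRAIN_FRACTION

# Number of doublings needed to grow by that factor, and the time that takes.
doublings = math.log2(brain_scale / MODEL_PARAMS)
months = doublings * DOUBLING_MONTHS

print(f"{doublings:.1f} doublings, roughly {months:.0f} months")
```

A factor of 1,000 is just under ten doublings, so at three months per doubling the gap closes in about two and a half years, which is in line with the 2023 date given in the talk. The estimate is only as good as the assumption that the doubling rate holds.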
There is an increasing science of figuring out what is going on inside, but also of how you constrain the network in certain ways. Can you constrain it to be modular, so that the system is naturally divided into subcomponents and you can say, I know what this is doing, let me worry about what is going on over here? There are some encouraging results in that space. So to your question, will AI be able to do what we want in a way that is inscrutable? Yes. But that is not necessarily the only way it can happen. We will have more agency there to build the kind of systems that we can understand.
Speaking of artificial intelligence [inaudible].
It is funny, it is interesting to watch how the questions that I get have evolved. With the book coming out in 2011 and 2012, people were asking if AI is coming for their job; in 2015 people were asking me if it will destroy humanity as we know it. So it has really gone up. And within the research community, the cautionary tale has shifted from one that feels more like a disobedient system to one that feels more like the sorcerer's apprentice: a system that is trying to be helpful but doesn't know what you wanted it to do. There is a thought experiment from the Machine Intelligence Research Institute: imagine you run a paperclip factory. [inaudible] It is basically so good at it that it turns you and your loved ones into paperclips. That is a caricature of what some people who work on alignment are worried about: [inaudible] exterminated by a system trying to do what it thinks you wanted it to do, when you weren't quite specific enough. So part of the alignment problem is how to feel comfortable communicating an intention like that without necessarily needing to get every specific detail right, and then being able to adapt on the fly and say, hold on. [inaudible] And that is the kind of thing that makes those who work on this a little more relaxed than we were three or four years ago.
A former computer engineer [inaudible]: 50 years ago they all wanted to be systems analysts, not programmers. Talking about how the field is much more interdisciplinary, are the people that are in the AI field changing? The people who want to go into the field, or who are switching over from where they were ten years ago?
A good question. On the initial point about the systems analyst: the head of AI at Tesla has a notion of software 2.0. Software 1.0 was the basic formula, explicit programming, but now, with machine learning, you provide a set of examples to say, do something like this. There is a debate over software 3.0, where you don't give explicit training data but you work with the model after the fact. Working with GPT-3 feels more like writing an essay prompt. It is a language model designed to fill in the blank, but you can use it to do all sorts of things. You can say, the following is an argument with the most common objections, and what it means to use a model like that feels like how you work with another person: how do you word something so that the meaning comes through, or so that you get the tone that you want or the style that you want? So there will be a new category of people, not exactly programmers, who are wrangling these giant models in natural language, and that is a new job that doesn't really exist yet. It will require an interesting set of skills, because the words you use are very important, and it will require intuition about how the models are trained to figure out why one might not be doing something. More broadly, I think as the questions in machine learning become more and more human, what that does is invite in people who may have felt they did not belong. Now they do, so their skill set can plug into that. That is what we are starting to see, and it is a shift that is just beginning.
Who were you hoping to reach with the book? Is it a general audience, or the AI community, or the entities using the technology?
It's a great question. One is the general public. Relative to other fields in science, with AI and machine learning ethics and safety, a lot of people are already aware of the issues, whether or not they have taken the time to understand what underlies them. This debate is already happening, so I want to give people the conceptual insight to feel comfortable talking about this.
Also, there is a huge class of people who went through training for a career that did not seem like it required them to learn about machine learning, and now they are handed these algorithmic predictions. There are a lot of people out there who suddenly need some level of familiarity, so I hope this fills a need there as well. And I hope we can grow the field; this is one of the most exciting and important things happening not just in computer science but in science. If we can get computer science students excited enough to ask their advisor, give me a cool project on safety, let's get started, that for me will feel really good. To bring more folks into the movement is a good thing.
We have time for one more question. This is time travel: you talk about the early steps to develop AI, but what realistically would you expect the state of AI to be in 20 years, with research and direction as well as actual deployment?
20 years is interesting. Starting in 1955, it was always 20 years away, and it still is. But realistically there will be a generational replacement that happens over that time. Now we have social media natives, digital natives; there will be a generation of something like AI natives. The generation of people just being born now will grow up in a world where they may not get a driver's license: how can you justify a human driving a car? That is so dangerous. And they will come to understand themselves in a world where there are all these different systems with different degrees of intelligence or agency, and different incentives that align with our own to one degree or another, and where the interface is increasingly the way people talk to each other. Kids have no problem talking to Alexa. They find it fairly normal that there is a system you can just chat with, which is remarkable; even if you think back ten years, that was totally not anything anyone was familiar with.
So in some ways the boundary will get blurry between the technical skill set and the skill set to navigate the world of other humans, because increasingly it will start to feel like they speak the same language. You can communicate by gesture: the robot grabs the item it thinks you are reaching for, and you say, no, the other one. Or you can communicate in words. We will have a new generation for whom that is just the way the world works. And hopefully we will set them up in a reasonably good world at that point.
Very good. Brian Christian, author of the book The Alignment Problem, thank you for joining us today. Also thank you to our audience, who are watching and participating online. For more Commonwealth Club programming, visit us online. Thank you, good day, and stay safe and healthy.
Hello, Deborah. Nice to meet you, Kathy.
Nice to meet you too. I enjoyed your book.
Thank you. Let's start the interview talking about you. I would also like the audience to hear your story, which is in your introduction. You got a note from a professor, I believe, saying you would never be a polal