
Hello, everybody. Welcome to the American Enterprise Institute, and thank you for joining us today. We are going to dive right in and talk about the new book, Everybody Lies. The author will talk for about 15 to 20 minutes. After that, we will sit down and have a conversation with panelists from Columbia University and [inaudible]. They will look at the book he's produced, he will have the opportunity to defend himself, and then we will take questions from the audience and follow the same procedure. Do you want to go first?
Thanks for the introduction and for inviting me to this panel. The book Everybody Lies is about five years of research I've been doing, so I will describe what it is. For the last 80 years, if you wanted to know what people want, why people do the things they do, and what people are going to do, you had basically one main approach: you ask them. Surveyors go out and ask people questions. And there is a problem with this approach, which is that people tend to lie to surveys to make themselves look good. So if you ask people immediately before an election, "Are you planning to vote?", the overwhelming majority say, "Sure." They don't want to admit that they won't, and it's kind of considered socially unacceptable not to vote in an election. The General Social Survey asks men and women in the United States how frequently they have sex, whether it is heterosexual or homosexual, and whether they use a condom. You can do the math on this. American women say they have sex about once a week and use condoms 20% of the time, which implies 1.1 billion condoms every year. Ask American men the same question, and their answers imply 1.6 billion condoms every year in heterosexual encounters. By definition those two numbers have to be the same, so we already know somebody is not telling the truth. So who is telling the truth, men or women? Neither. According to the data that tracks every condom sold in the United States, only 600 million are sold. So basically everyone is lying; men are just lying more than women.
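As a rough check on the arithmetic quoted above, the three figures from the talk can be compared directly. This is only a sketch: it assumes the 600 million sales figure is annual, like the two survey-implied totals, and the variable and function names are invented for illustration.

```python
# Figures quoted in the talk, in billions of condoms per year.
# Assumption: the sales figure is annual, like the survey-implied totals.
implied_by_women = 1.1
implied_by_men = 1.6
sold = 0.6

def overstatement(reported, actual):
    """How many times larger the survey-implied total is than actual sales."""
    return reported / actual

women_ratio = overstatement(implied_by_women, sold)
men_ratio = overstatement(implied_by_men, sold)

print(f"women's answers imply {women_ratio:.2f}x actual sales")
print(f"men's answers imply {men_ratio:.2f}x actual sales")
```

Both ratios come out above 1, and the men's ratio is the larger of the two, which is the point of the joke that follows.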
[laughter] Generally we have relied on surveys because they are the only thing we've had, but we now have a new tool for understanding people, which is the searches that people make on Google. I have studied these for the last five years. People are very honest on Google and will tell it things they may not tell anybody else. They will confess to Google things they would not tell friends, family members, surveys, even maybe themselves. Why are people so honest on Google? There are a couple of reasons. The first is that you have an incentive to tell the truth. If you are someone who doesn't usually vote in an election, you do not have an incentive to tell a survey the truth about whether you are planning to vote in the upcoming election. But on Google you do have the incentive, because you probably need information on voting. You may not know where the polling place is, because you do not usually vote, so in the weeks before the election you have to search for where and how to vote. And you can see very clearly that in areas where these searches are high in the weeks before the election, turnout will be very high. So no matter what people tell the pollsters, you can see from the searches they make whether they are actually going to vote. So that is one reason people are honest, and it is the most intuitive: you have the incentive to tell the truth because you need information. The second reason people tend to be honest on Google is more surprising, and it is one of the most surprising things I learned when I started doing this research. A lot of people confess things to Google for no obvious reason. They type searches like "I hate my boss", or "I am happy", "I am sad", "I am drunk". Why are people telling Google this? I think it relates to the confessional: there is something about just saying things, and people seem to use Google in big numbers to say what is on their mind, in almost a complete sentence. I was not expecting that at all when I started this research, and it definitely caught me by surprise. So what can we learn when we look at these searches?
When you look at 2008, Barack Obama was elected, defeating John McCain. There was a big question afterward: did race matter? Did people care that Obama was black when deciding whether to vote for him? It was the kind of question that is complicated by social desirability bias, and if you ask Americans, the overwhelming majority say they don't care. That is kind of why a lot of people concluded we were in a post-racial society. Back in the day there was this idea that people voted for Obama and said they didn't care that Obama was black. So, because people are so honest on Google and tell it things they might not tell anybody else, such as socially unacceptable attitudes, could you use Google searches to get a real answer about the role race may have played in the decision? What I did is I made a map of search volume. This is the percentage of searches that include a racist word, which I will not say out loud, but you can kind of guess what it is. The first thing that struck me is how common the search was. People are making these searches with about the same frequency as searches for "Daily Show" and "economist". It wasn't by any stretch of the imagination a fringe search. They are mostly mocking. The other thing that struck me is that it looked very different from the map of racism I was expecting. If you were to ask me where racism against African Americans is highest in the United States, I would have guessed that racism is concentrated in the South. If you think of the country's history, we think of racism as having a strong North and South divide. And some of the places where it is highest are places like southern Mississippi and Louisiana. On the map, dark red is a high frequency of these searches. But you can see it is also high in many places in the North: western Pennsylvania, eastern Ohio, industrial Michigan, upstate New York, Illinois. I think the divide this reveals is not North versus South; it is East versus West.
It is much higher in the East, and then it drops pretty substantially west of the Mississippi River. So, because people are so honest, I wanted to see whether you could use this data to measure how much Obama lost because of racism. Of course, you cannot just compare search volume with votes for Obama, because the high-search places might simply be places that did not support Democrats by 2008, so that wouldn't be a fair comparison. So I compared Obama's vote with that of previous Democratic candidates, such as John Kerry in the previous election. What you see when you do that is a very strong, significant relationship: the places with the highest racist search volume, places in Appalachia and Michigan, supported Obama much less than they had supported previous Democratic candidates. And you can start controlling for anything you would like. You can control for education, demographics, or political views, and nothing changes the relationship. It was a big factor. Overall, I conclude that Obama lost about four percentage points because of racism, which is much higher than you would get from any other measure. And he gained about one or two percentage points from increased turnout. When the Trump phenomenon was starting out, he made a lot of racially charged comments, and people were questioning how he was doing so well saying these things that he's not supposed to say, and whether racism was driving some of his support. I was asked for the data on racist search volume by someone who had assembled all of the variables, whether age or education or economics or trade exposure, and the strongest correlation he could find was with the racist search volume. This of course doesn't mean that everybody who supports Trump is racist, but it does mean some of his supporters were, and that did drive some of his progress in the primary. And I think there are just all kinds of things you can do with this data.
I talk about them in the book, whether it is predicting turnout, or measuring child abuse, which you can measure in the data, or the work I've done on do-it-yourself abortion. Pretty much this whole book is kind of depressing and horrifying, but I put in a lot of jokes so you wouldn't notice. There's a lot of value in knowing these parts of the human psyche that we do not usually talk about. So I will give one more example of the research that I've done. Go back to the San Bernardino terrorist attack in 2015, when two Americans shot up their coworkers at a party. Right afterwards, almost within minutes of the attack, you saw a huge spike in nasty searches about Muslims. The number one search immediately after this attack was "kill Muslims", which is another one where it isn't clear what people are looking for, but they do express these things. These searches really were getting out of control immediately after the attack. Four days after the attack, Barack Obama gave a speech to the nation and basically tried to calm down some of this Islamophobia, because he wanted to address these attitudes that were getting out of control. It was a nationally televised speech that got a lot of attention and was covered by the big news outlets, and I thought the speech was fine. Probably unlike a lot of people in the room, I'm an Obama supporter, and I found the speech beautiful and spectacular and thought he was at his best. It was a moving kind of sermon. He talked about the responsibility of all Americans not to give in to fear and to appeal to freedom, and about how it is our responsibility to treat everybody the same, no matter their religion. So it was a very moving speech, and all of the traditional sources really loved the speech and gave it great reviews, whether it was the New York Times or the LA Times or other organizations. They said he really hit it out of the park as far as explaining to people why they shouldn't give in to Islamophobia.
Since I get to see the data minute by minute, I decided to see what happened to the searches for "kill Muslims" and "I hate Muslims" and all of these searches. How did they compare with before? I did the comparison, and I found that not only did they not drop, as Obama had hoped, they didn't even stay the same. They skyrocketed. Everything he was saying totally backfired. So this was kind of surprising. There was one line he gave that did seem to get a different response. Obama said we have to remember that Muslim Americans are our friends and neighbors, our sports heroes, and the men and women who will die for our country. As soon as Obama said this, you saw a huge spike in interest: for the first time, the top search on Google was not "Muslim terrorists" or "Muslim refugees"; it was "Muslim athletes", followed by "Muslim soldiers". And those searches stayed up for about a week afterwards. I think you can compare that with most of the lines of the speech, the ones about responsibility. Those were lectures and sermons, and they didn't tell anybody anything they did not already know. Compare that with the line about athletes and military heroes, which was provoking curiosity with new information. So we wrote up this analysis of the speech in the New York Times, and I don't think it's crazy to think that when you write an article in the Times, some powerful people will see it, including people in the president's office, because two weeks later Obama gave another speech about Islamophobia, this time at a Baltimore mosque. He basically stopped with all of the lecturing and the sermons, and didn't talk about how it was anybody's responsibility to do anything. Instead, he doubled down on the curiosity strategy. He talked about how Muslim Americans are athletes and soldiers, but he also talked about how they are farmers and merchants, and said Thomas Jefferson had a copy of the Koran, and said they built the skyscrapers of Chicago. This speech also got a lot of attention; it was on national TV.
You do see that many of the searches immediately after this speech actually did go down; you saw a drop in the searches for "kill Muslims" and "I hate Muslims". Those are just two speeches, and I will not say that is a science of how you calm down Islamophobia, but it does show the power of the data: you can turn something as seemingly unpredictable as an angry mob into something like a science. We are at the point where more research needs to be done, but now we have data on these people, who are not necessarily a very small number of people. These people make the searches, and they may not be picked up by a survey and will not agree to come into Princeton or Harvard to participate in a laboratory experiment. But they do make crazy searches on Google, and we can use this to potentially understand how to calm down an angry mob. A lot of times we pat ourselves on the back for these great speeches, but they just may be backfiring. So that is the theme of the book: there is so much we can learn about people from all this data on the internet that did not used to be there. And right now I will take the attacks from the other panelists. [applause]
We are going to have a conversation about the book. It's a relatively small part of it. A small part of it is about internet porn. We are not necessarily going there in the opening conversation. So, I very much enjoyed this book. I want to give the opportunity to share individual thoughts on the book.
I think maybe some people will take lessons from the presentation. Maybe it is too bad that he happened to have done that, but it's too late to stress that any further. Today I went to the Museum of American History, where they had an old telephone directory, but it was from the 1800s, so it was just a directory, from Philadelphia. It listed each person and, for the men in the directory, their profession.
There was the captain, shop lab, cooper, an agent, the gentleman, the gardener, the cord, gunsmith, someone whose occupation was shoemakers' tools, baker, turpentine distiller, and a few others. It made me realize they used the available data, and everybody kind of used to know everybody. So in one sense we have this data that we didn't have before, and think how much we can learn. Maybe part of that is a feeling that we need to learn from the data because people are harder to know about. You used to ask the precinct captain how many people he could get out to the polls next month, whatever it is. So it is good for us to have the historical perspective. Regarding Everybody Lies, I had a couple of thoughts on this. I am actually impressed at how honest people are, and I always tell people that if you want to find out what people are doing, the best idea is to ask them. You won't get it completely right. 60% of Americans vote. Maybe 70% will say they are planning to vote, or that they voted, after the election. Certainly more people plan to vote than actually do. It isn't lying if you say you plan to vote but you don't because something comes up. Afterward, they ask people if they voted, and a few percentage points more say they did than actually did, but it is not that far off. When you ask people who they plan to vote for, it's very accurate. The polls had Hillary Clinton at 52% of the vote, and she actually got 51%. They were off in some states, but I do not think the evidence points to people lying. It points to differential nonresponse to explain that. The reason people are so honest in polls was given when he talked about motivations. Why you should respond to the survey in the first place, that I have no idea. If someone is trying to make money by having me spend 45 minutes answering someone's questions, that's silly, so I'm not going to do it.
But if you are going to answer a poll, you might as well be honest, and the whole point of answering a political poll is to say, yes, I support her, or yes, I support him. In the 1950s, it was a little different. In the era of the Gallup poll, not that many people were surveyed. You would be one of the 1,500 Americans. Your vote would be counted. You would be in the newspaper the next day: "51% of Americans", 1951, the 1950s, whatever it was, whatever the attitude was. You would have a big impact, and it was rational to respond. If you are going to respond, you might as well tell the truth, maybe not about how many condoms you are using when you have sex. There I think people might be misremembering. You know, they don't remember exactly what happened in the past year. When it comes to voting, I see no reason to think people are not sincere. Is your true self the person who looks things up? You don't spend 24 hours a day searching. Who you are when you look up racial jokes is not necessarily your true self either. That is just another aspect of who you are. The question is what the incentive is to tell the truth, so when someone gives a talk that says everybody lies, it raises a certain paradoxical element. There is an incentive to get things right, in the sense that if you get things wrong, people like me come in. We have the goal of discovery and would like to learn things. I'm very interested in this idea of data journalism. It's become important. Let's say we have three of them on this panel right here. It is playing a large role in my personal life and in the life of our society. We had a lot of discussion recently in science about whether we can trust science: just because something is published in a top journal, should we believe it? A journal like Science or Nature is like a brand name. We see this in data journalism. When they make mistakes, they tend to correct themselves. Others are just sort of out there.
I don't think that he lies, but he makes a lot of mistakes. I don't know that there is such an incentive to get things right. Apparently he doesn't think that is an incentive. [laughter] He may have an incentive to make you doubt he ever did anything wrong. For me, I've made enough mistakes in my career that, like, I'm already wet, so I don't want to be dunked in the tank one more time. We used to have a saying about politicians, that it would be great if we could first send a politician to prison and then let them out, because first, we wouldn't have the suspense of when they are going to get caught and all that, and second, they would have much more sympathy, having been imprisoned for a while. So maybe it should be required for every data journalist to make some big mistakes right away. Maybe they will save those mistakes for the second book.
Is the value of this kind of organically generated data limited, so that beyond that it doesn't add much to what we already knew? And how much did you lie while you were standing there?
I would have called it Everybody Lies, Except for Me. They compare people's individual answers in the survey with their actual voting behavior, so there are people who say they voted and didn't, but there's also now a growing problem with surveys where a lot of people voted and said they didn't, which is bizarre. We've talked about this before, and I'm going to write a column on it. It played a role in some of those, and often the random meanderings are a bigger problem. But we have a definitional difference on what constitutes lying. Andrew thinks you have to be consciously aware that you are deliberately misleading a survey. I think people can lie to themselves, and that is an important part of it. If you ask people why they did things, they will frequently not really be consciously aware of why they are doing things.
And I think there are certain areas where surveys work. I didn't want to make this a book about how the polls were horrible in the 2016 election; that is one of the areas they are best at. One of the areas they are not very good at is asking people to predict what they will do in the future. People are not good at explaining the reasons they did things, and people are not so honest about admitting some of their desires. So surveys will definitely still be used in this one area, which will be studied over and over again, and they have these new problems, but there are areas where they will become a smaller and smaller part of understanding the human psyche.
Do you want to comment on the book in general?
I thought the book was very well written, and oftentimes this book was easy for me to understand. It's about who you should meet, and I thought that was a good opening, and you can get very much into it. In terms of the book itself and the idea of using Google to figure out our true selves, I would say first off that we are using it more and more to figure out what is popular. In the primary, we were trying to figure out who was going to be the main challenger to Donald Trump, and there was a good understanding to be had from who was searching for each candidate. Maybe he might be on his way up. But even where that's the case, we were not using searches in isolation. I don't think we are at the point yet on the political side of understanding the percentage on X, Y, and Z. I'm not saying anyone is making that case, but it's something that is going on. The racist searches carry through to Obama, but there was also certainly some correlation with where Obama was already trending. It's not necessarily because people were racist. They could be racist and that's why they were changing their minds, but it could be the fact that racial views are now increasingly correlated with how people vote for the Democratic or Republican parties. It is another thing we have to keep in mind. And I think that continues to be the case. I look at a lot of strange things.
I search for who was on the cast of Who's the Boss?, and maybe I was just interested in it for a particular reason. So I just think that on its own, search data can give us a keen understanding of what is interesting without necessarily telling us why people are searching for that. Then the final thing I will say, which I emailed about: the book is not all about politics. It's about his words and being a Yankees fan. This is the type of thing where surveys actually replicate the findings you see on Facebook. We know, for instance, from my own research on the teams, that when they do worse there are fewer fans, and we are finding something pretty much along the same lines: if people were, what is it, eight years old when a particular team was doing well, then there are more fans. Although polls are getting cheaper to conduct, a lot of people don't necessarily know how to ask the questions or don't have the ability to go out and conduct a survey, and they could use this data to confirm something or make a new finding on something more trivial that would still be interesting to a number of people. Do you agree with that?
That example can measure how much winning the championship matters. You see, for example, that they have a lot of fans who were born in 1978 and 1961, because these men were boys when the team won, and you can see it's not perfect, but they have a lot more fans. I don't think we can replicate this with surveys. There may be a small change over time, like you can compare the clinic tech survey when they are doing a little better. But to see that pattern over time, you need data on how many fans there are born in 1978 and how many fans there are born in 1979 and 1980. The point of the chapter is that you still have big samples, because Facebook has everybody; they have samples for any combination. Facebook covers everybody. In the book we also work in some of that, but that is more like the census of 1810 in a way than the internet's organically generated data.
We have no idea how representative searches are, and no way to look at the geographical level, and no baselines to compare things over time. But as they were saying, it is hard to find out what percentage will vote based on the searches, or to know what the levels mean; it is the relative levels that are more meaningful. So those are two separate types of data.
The thing about the racist searches: yeah, it's not a representative sample. But what if it's off by 10% or 20%? Those are still huge differences. It has to do with asking questions that do not demand that level of precision.
I agree with that. That's one of the troubles one might note. The most famous example of using Google searches is Google Flu, where Google scientists tried to predict the rate of flu in the United States in a given week based on people making searches for "cough" or "runny nose" or "flu" or whatever in that week, and it kind of blew up a little bit. One of the problems with Google Flu was that our flu models are really, really good. You can get really, really close by just assuming flu is the same as it was in previous weeks, so you have an R-squared of 0.95. Google data is going to be somewhat noisy. It will get better as we learn how to weight it, but it is not a perfect data source, so it became a question whether Google can beat a simple model that has been well developed. I think there are certainly areas in health where it can. I always say that "Google Constipation" makes more sense than Google Flu, because there we have no information on what is going on in the United States, and so even the noise of the Google search data is going to beat the nil, or close to nil, information we currently have.
I agree with Andrew's point there.
Are the CDC people doing that? Do you know? Different health conditions?
Yeah. People at the CDC who have some time on their hands might be doing Google Constipation.
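The "same as previous weeks" baseline mentioned in the Google Flu discussion above can be sketched as a persistence forecast. This is a minimal illustration on synthetic data, not CDC or Google figures: the seasonal shape, the noise level, and the helper name `persistence_r2` are all assumptions made up for the example.

```python
import math
import random

def persistence_r2(series):
    """R-squared of predicting each value with the previous week's value."""
    actual = series[1:]
    predicted = series[:-1]
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Synthetic weekly "flu rate": a yearly seasonal cycle plus small noise.
# These numbers are invented for illustration only.
rng = random.Random(0)
weekly_rate = [
    2.0 + 1.5 * math.sin(2 * math.pi * week / 52) + rng.gauss(0, 0.05)
    for week in range(208)
]

r2 = persistence_r2(weekly_rate)
print(f"persistence baseline R^2 on synthetic data: {r2:.3f}")
```

On a smooth seasonal series like this one, last week's value already explains most of the variance, which is why a noisy new signal has little room to improve on the baseline.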
I think on the Google Flu thing, when Google Flu came out they started doing Google Dengue Fever, and then it blew up and lost a lot of momentum.
I think one of the things that is interesting as we go forward, which I kind of hinted at in my last answer, is that surveys are becoming less and less expensive, at least at the entry, ground level. Everyone can have a SurveyMonkey poll, and I'm going to be interested to see how Google searches along with SurveyMonkey surveys, how all this so-called big data that is cheap and small data that is cheap, can be used together to figure out what is going on. Right now most of the surveys that we're conducting are telephone surveys, and that is very expensive. But as costs come down, people are starting to field their own surveys, and it's not just an expert like you who understands everything that is going on. That is where I think we could get very interesting things happening, because at the end of the day what is most interesting is not just what we can find once we determine what we want to find, but coming up with the questions in the first place, which is what you said: the questions are most important. What is it that right now we're not asking that this data can get to for us? I don't know the answer to that.
What is the answer to that? Do you know? What is one question that we cannot currently answer where you see a role for this?
Anxiety, because I've been doing a lot of research on worrying in the United States. I thought it was going to be New York City and, like, urban intellectuals, and it's not true at all. Anxiety is highest in Bangor, Maine, and Kentucky, and generally in rural areas and places with lower levels of education.
That's not just Google search data. You see that in surveys as well.
I didn't know that.
Because it's rich data, you can see how anxiety changes over time. So I looked at when people search for panic attacks: not surprisingly, 3:00 a.m. Now, through the data, we know how many people are having a panic attack in New York City on any given Tuesday evening, and in Washington, and in [inaudible], and you can say, what happened on that day, what happened leading up to that? Is it just random? Every day some people have a panic attack and there is nothing to it? Or is there some clear event in the day or two days before the panic attack?
Assuming people suffering a panic attack Google it. Automatically.
I think there are certain situations where that's not true, but if you wake up in a panic attack, you are likely to go to Google. That's another example. I think we have to break the data down. With health conditions, people have criticized my book because they're like, I search for all these health conditions because I'm a doctor or a researcher. And that's probably true. I think in general these researcher searches are a pretty small percentage and don't overwhelm the data. And you can break it down: I doubt the researchers are searching for panic attacks at 3:00 a.m. So the data could get better, and with a lot of the problems we have with it, if we got better we could take out the people who are searching for other reasons.
I think that's fair. I want to go back to this. I know Andrew can predict the presidential election within ten points based on search data, but there are places like Google and Facebook who have much more of an opportunity to link data within a person and granularly over time. What kind of uses do you see for that kind of data, and what kind of risks would you associate with those databases existing? That's a very different level of quality of data than more aggregate data.
So, one of the studies I discuss in my book, which is not my study, is a study by Microsoft and Columbia researchers, and they study pancreatic cancer. They use Bing data, which is anonymous, and they follow the same searchers over many months. They know who is searching; they just aren't looking at the actual name of the person. They link the searches but don't know who it is. So then they said, okay, someone probably has pancreatic cancer if they search about having just been diagnosed. You turn to a search engine because it's a big event in your life. And then they said, here are anonymous people who we know were just diagnosed with pancreatic cancer this month, and here are some people who are similar to them and never got a diagnosis of pancreatic cancer. They have these users over time, and they asked: what were they searching in the months leading up to that diagnosis? What symptoms were they searching? What symptoms are predictive that you'll have pancreatic cancer? And they found some patterns: indigestion followed by abdominal pain is a risk factor, but indigestion alone is not a risk factor. If you are like me, you'll go home and think you have pancreatic cancer when you have indigestion. I talked to researchers, and doctors don't know this. Our system of diagnosing diseases is not built on following symptoms and finding which ones predict. I think that's almost a revolutionary type of medicine. And the advantage with pancreatic cancer is that the earlier you're told you have it, the more you can improve your possibility of surviving.
Then what should happen if the search engine figures out you have pancreatic cancer, and you have a 20% chance? Should you be told that?
I think, yes, you should be able to opt in. If I had a pattern of symptoms that gave me a risk factor for a disease that could be cured, I'd want to know about that.
It sounds like you're inflicting most of the symptoms on yourself by reading about them and then convincing yourself.
Obviously that's a problem we'll have to deal with.
But you talk in a variety of spots in the book about matching people to their most similar cases, both on Facebook and on Twitter and through their health records. Especially in the healthcare sector, that seems like an important innovation. Why do you think those methods aren't coming through as rapidly as they could?
A lot of databases are not linked, and there's not a huge incentive for people in health to put the data together. So a lot of people are trying to go around the official healthcare organizations and such to collect their own data. This is one of the things people should keep in mind as they read about these things. A lot of the stories are about baseball, politics, and porn, mostly because of the availability of data. It's not that there are no applications outside of those realms.
Those are the three topics I'm most interested in. I wondered why the entire book isn't that.
Apparently not. Harry, you guys use Google search data when you try to make projections.
Sort of, but not really.
If that's the main source of this type of data that you use, where do you see these things going? Where have you moved beyond survey data?
Sure. No trade secrets going on. We all have access to the same Google, unless you're part of a secret club I'm not aware of. As for the way we use Google, or at least one way we have used it, I'll give you an example: figuring out the effect of WikiLeaks during the final months of the campaign. Did that harm Hillary Clinton? What I found using the data was that in fact there seemed to be some correlation between when people decided they would vote for Donald Trump and when searches for WikiLeaks jumped up, and if you look at the national polls, that might actually be a stronger correlation than with when James Comey decided to send his nice little letter to Congress. I don't think that WikiLeaks got asked about all that much after the election. More people were concentrated on Comey, but in fact the Google data suggested otherwise.
And that goes back again to, I think, the question of whether Americans were being asked about WikiLeaks in their polling. Perhaps not as much as you might have thought. And also, one of the things I think survey data is more difficult to get at is why people are voting the way they are. Survey data, polls, are very, very good at predicting who is going to win and who is going to lose, regardless of what some people might argue. What they're not good at is assigning reasons. People give weird answers. "I was thinking that the entire time. I was certain. That had no effect." Oh, blah blah blah. Then you look in the polling data and they changed or didn't change, which was in direct contradiction to the reasons they gave. It goes back to the question that both of you are tangling with. I don't necessarily think it's that people are lying. People honestly don't know what pulls them in one direction and what took them over the top. That's where we use Google data. I think, again, as I mentioned before, the other thing we're using Google data for is tracking changing minds in real time. In general elections most people make up their minds a few months before the election, and very few people change, but in primaries people are much more likely to change their minds at the final second, and that's where Google Trends data is specifically useful. The other thing I'll add is that there are a lot of races we don't have polling data for. We're talking about presidential elections, and how many polls a day, 15,000? It's ridiculous. You can find a poll saying anything you want. One had McMullin winning, or Gary Johnson. That might have just been people who were high at the time.
But there are many fewer polls for lower-down races, for House and Senate primaries, and we might get a very good idea. I've not seen any large-scale studies on this, but I'm very interested to see, as more and more people turn to the internet, whether or not Google Trends can be applicable to understanding who has momentum in a Senate or House primary where somebody like myself has no clue who is going to win. Maybe it's so-and-so who has a particular chance. Those are most of the applications in politics, but this is such a small part of the book. To me, there are many more applications in places where polling is perhaps not as vast, whether it be sports or health or, you know, a topic as interesting to me. I'm a huge fan of entertainment, even if we perhaps argue that we're not, and there was this whole story about Brad Pitt appearing on a magazine cover this past week, and the story was that Brad Pitt seemed a little sad, but I would be interested in seeing whether or not people read that story the same way that the mainstream media or those members of the press read it. That's a particular place where we can get out of the bubble. One of the main lessons of the presidential election is that we all live in this bubble in Washington, D.C. I took the express down from New York. You can't get any more elitist than that. And I'm interested in whether we can in fact use the Google Trends data to figure out certain directions in which the public is thinking, things we might not otherwise think of and that we would never think of asking in a polling question. As opposed to the current situation, where you only know how Bill Kristol feels about it. Bill Kristol has a terrific sense of humor. I'll give him that much, on Twitter. The story was, I wasn't sure which line was going to be my entrance, and I just followed Tyler, who was Ted Cruz's campaign chief. I thought he would know where he was going even though there wasn't a sign. I was right.
I'll open it up to the audience, unless you want to say something before you get guided in other directions. We'll start up front here. Please wait for the mic and briefly introduce yourself, and ideally questions end with a question mark, but random thoughts are welcome as long as they're short. Hello. My name is Julia Abrahamson and I am a member of the public. I wanted to know, surely there are commercial companies who are developing methods for going through all of this data because they want to know things to sell. They have a strong motivation to make money by tracking people's behavior. Are there methods that the commercial entities are developing that you could use to deal with politics and psychology and pornography? I guess that's a commercial interest. Well, I think your former employer is heavily active in this industry. Want to go ahead and talk about this? Yeah. I guess it tends to make money in the stock market. Oh, yeah. There's definitely a lot of use of this in marketing. I think marketing is one; I've been talking to people. They're stopping using surveys. They used to ask people what products they're going to buy, and they say the answers don't correlate very strongly with what products they buy. I don't know why that's the case if they correlate with voting. I think this gets back to the competing sources of information. In an environment where you have very little idea who is buying what, then maybe a survey is going to provide information, because otherwise all you know is who is selling what and nothing else. Amazon now, of course, knows so much about who buys what, what I buy and so forth. That creates less of a value for that. You can find out directly what people are buying. You don't need to ask them. But what about what they're going to buy in the future? Right. It's always going to be hard to know what people are going to do. There's very good data from Amazon and other companies.
Of course, most of the ads on the internet are based on all of the previous actions that Google and organizations like that are aware of. If you're in Gmail and you see an ad, the ads you see will be largely determined by the content of your emails and, to some extent, the things you have searched for. I think that's the most obvious commercial application. Well, I talk about this in the book as well. One thing tech firms have gone through a lot is rapid experimentation. I think Facebook now does more experiments in a day than the FDA does in an entire year. They can very quickly divide their users into a treatment group and a control group and show different groups different versions of the site. They can change the font or change the type or change the wording and see which one gets people to be on the site more or click more or do whatever they want more. I think that is tied to your question of how researchers can utilize the tool. That's underutilized in academia, where you recruit a bunch of people over a long period of time, and it takes a month to set up and do a small experiment, like one or two, with 30 or 60 people. Where academia can move is to use these digital tools to run a thousand experiments and say, this is what we found. I think that would make science move faster, but I haven't seen that done at all. Let's go over here, middle of the table. Jonathan. I'm a little skeptical. Hi. Jonathan Foles, here just for my general edification. I'm curious, you talk about weighting Google data. How is it biased? Obviously there's going to be a class and age bias built in, but when you're looking at a Google trend for the U.S., are there any unusual ways, ways we perhaps don't think of intuitively, in which it is biased? Not representative of the general population? There are a lot of random situations you just find when you do it more and more.
One thing I found is, I was doing all this research, and African-American searches were higher in places with high African-American populations, not surprising. But if you do these over and over again, one thing you find is that D.C. tends to be an outlier. One reason is that the residents are different from the people who make searches in D.C. A lot of people commute to D.C., and that's one issue that comes up in a data source like this that I wouldn't have thought about beforehand but does come up. There are a lot of issues. I think it's unfortunate that initially there hasn't been too much research on this topic, because survey people are very skeptical of the data source, but that's starting to change. Pew just did a beautiful study on using Google search data, and they have hardcore methodologists who are much more detail-oriented than I am, and they have pages and pages of things they found from the data. I'm hoping more and more methodologists are studying the data and finding biases that you might weight for to make it better. I was just going to say, as you were basically suggesting, on the Democratic primary side I found the data to be useless. Bernie Sanders was 9 trillion searches ahead of Hillary Clinton. Okay, that didn't tell me much. Bill Gale at the Brookings Institution, and former employer and coauthor with Seth. So I am biased. One comment, one question. The comment is, I thought this was an incredible, really well-written, interesting, funny, and insightful book. And the question is, baseball, porn, and politics are interesting and fine, but I thought the most interesting part of the book was the stuff on child abuse, because kids can't just talk about it, and there's a sense that they're going on Google and saying "why does my daddy hit me," and it just seems like an incredibly powerful mechanism that you can't get from a survey. You mentioned you were consulting with some governments on that. I'd be interested to hear more about what is happening on that. Yeah.
So, there is a disturbing element to some of the searches, and one is that kids do, obviously these are not younger kids who don't speak or use a computer, but older kids do search things like "my dad hit me" or "my mom beat me," depressing stuff. It's horrifying. But I think one of the things you also see in the data is that during the Great Recession there was a big drop in officially reported cases of child abuse, which is kind of surprising. You would think, from everything we know about child abuse, that when people are out of work, that's a big risk factor. But you see the rise in the searches, a disturbing rise in searches by kids about child abuse, and I think what happened is that it just became harder to report child abuse. Child Protective Services agencies were kind of overworked and hard to get through to, with people on hold forever. So it was disturbing that you see kids making these searches while the official data shows a drop. It was really disturbing. And yeah, I'm still kind of talking to them. It moves slowly, and I'm not exactly sure how to use this data, but it's kind of a continuing conversation with some of these organizations about how they can incorporate this data. I don't know exactly where it will go, but I think it is really important. That is an area where I think the official data can obviously be misleading, and you won't get a survey asking kids. So I do hope that we can incorporate these data. A somewhat similar area where I think the data comes in more slowly than we would like is drug addiction data. I think there should be searches that allow you to figure out where opioid overdoses spike before the annual CDC data roll in. I think that could be pretty useful. Let's go over here to the middle. I'm trying to make you run around as much as possible. Good afternoon. I'm Todd Wiggins, and I'm enjoying this presentation.
I wanted to ask you hypothetically what you see coming down the road in ten years, because I'm predicting that mind reading is going to be an ever-growing industry. It seems to have developed in my last 20 years from couponing and talking about advertising to modern couponing, which is essentially what we're doing with search engines. So in ten more years, we'd want to know more about how people think, not only for the sake of advertising but for national security purposes or immigration or whatever. So where do you see yourself in ten years, so I know what to invest in right now as far as the stock market is concerned? Yeah. I think what you're kind of getting at is that there's a scary element to some of this data. We can talk about the fun things or important things like child abuse, but the other area is that companies can use the data to take advantage of people. One study I talk about in the book is a study by Columbia professors of peer-to-peer sites where you apply for a loan and then either get it or don't get it. They had data on everybody who applied for loans, what they said on the loan application, and whether they paid back the loan, and they found you could predict whether you would pay back a loan based on the words you use in the loan application. Some of the relationships were just kind of weird. "God" was one of the biggest predictors. Using "God" is a big predictor of whether you pay back. If you use "God," you're much more likely to default and less likely to pay back. So someone giving a loan would be wise, from a profit perspective, to not give a loan to anyone who said "God bless you" in their loan application, which is eerie. That's not really how we think of the world. I think in general, basically, everything that anybody does correlates to some degree with something else they do. The correlation between any two things is not 0.0000.
So the clothes you wear and the things you like on Facebook, the words you use, will predict what you might do in the future, and companies can use that to make better predictions. We have a legal framework based on the idea that there are three or four things companies know about you, and here are the things they can't use. They can't use religion or race, but they can use these. I think we're now entering a world where companies know a million things about you, and a lot of them are just putting it into machines. They're not even paying attention to what the correlations mean, just putting them in, and I don't think the legal framework is prepared for that. The same on the government side, I think. In the book you mention the movie Minority Report. I don't know if you have seen it recently. They had immense processing capabilities in the movie and did calculations rapidly to predict murder, but they don't have the cloud, so they are constantly carrying these really large hard drives around. Mostly, I think, Tom Cruise. Tom Cruise. Obviously governments will be tempted to gain access to this very detailed data on citizens and noncitizens and use it to make forecasts, and I don't think there's a clear idea of how to predict that. In the U.S. we long relied on systems not communicating with each other to keep the government from using large data sets. In Europe there's much more leeway for governments and systems to communicate better. Of course, conversely, Europe has much stronger restrictions on what corporations can do with data and much stronger privacy protections. We'll see how those two will evolve. Let's go to the left, against the wall. Following up on your comment about Europe: you mentioned that Americans tend to be very open on Google in terms of expressing honest opinions. Other countries, of course in Europe, have a much greater sense of privacy. In Asia, of course in Japan, you have a certain sense of individual distance, and in China you have the Great Firewall.
How would you characterize responses in countries like China, Japan, Korea, and Germany? I haven't done as much research in other countries, because I don't know any other language, so it becomes challenging. A real American. It becomes challenging to do it, so I don't know as much. I think it is interesting. The premise of the research in the book, that everyone is really honest on Google, may not stay like that. Someone did a paper comparing people's searches before and after Snowden's revelations. They measured whether there was a lower rate of embarrassing searches after Snowden's revelations were made. They found there was a drop. It could be that in the future people make fewer of these searches. That's part of the paper. Did they categorize the increase in the number of searches for Snowden? Probably. But the category is embarrassing searches, and they started asking people how embarrassing a search is, and most of them were child porn or, really, like herpes, like you'd expect, and one on there that made the list is Nickelback. And that also dropped, apparently, after Snowden. [laughter] Let's go over here. I was just wondering, I haven't read the book, so I might be exposing myself for not finishing it, but did you look into, along the lines of pancreatic cancer, what terrorists search for right before they commit the crime? Everybody who does a mass shooting searches this, but not everybody who searches this does a mass shooting. The government may be doing that. I don't know. I don't have access to individual-level searches. I'm a little bit skeptical. Just looking at the absolute numbers, I think people make a lot of horrible searches, more than you expect. It may just be that the false positive, like the false negative, ratio is higher than we expect.
People have bad thoughts more than we thought, and the government really shouldn't be intervening, not just for legal and privacy reasons but for data science reasons: a lot of people are in the same group and look horrible and never go through with it. I think this relates to the point of questioning whether what someone does that is most embarrassing is their truest self. You could imagine a super Google of the future that won't even need you to search. It just tracks your thoughts at all times, and then you say, well, then what is your true self? Whatever thought you have the most frequently. It might have to do with, like, going to the bathroom or whatever, because every once in a while you have to go. I don't question a lot of what you found, but again, getting back to the title about people lying, there's this idea that there's a truth out there, which is what people are really like. I think we have to watch out about sort of identifying truth with the latest technology. I mean, I agree. This actually came up with Tim Wu, a colleague of yours at Columbia. He also brought up this point. What is the real you? And one study is, you compare searches for gay porn and people who say they're gay, and in states where it's hard to be gay, like Mississippi or Tennessee, there are a lot fewer people who say they are gay but almost as many gay porn searches. So it seems, in my reading, like a lot of men in places where it's hard to be gay make gay porn searches. They might be married to a woman or tell Facebook or surveys they're not gay. My assumption is, that means they're gay, and Tim is like, well, does that mean they're gay? Maybe they're not gay. I just thought it was obvious in that case that it means they're gay, but I see Tim's point. Well, there's something called the tyranny of measurement or labeling.
It's been pointed out that having the word gay, or the expression gay, sort of changes things, just in the way people can think of themselves: it's not something I do, it's something I am. But also, we talked about this earlier, there's your example in the book about the efficacy of different schools, where you compare them based on how effective they are in raising kids' test scores or getting them into top colleges, and those are measurable things, yet people go to school for reasons other than to get into top colleges and to get higher test scores. So sometimes it's super clear. Some of the clearest examples are these medical examples, because the outcome is kind of clear. This is what you want. Or an election: who is going to win the election. Other cases, like Obama's speech or the outcome of a school, are tougher because it's not always clear what you can measure. Of course, as a statistician I don't say don't measure things, but we have to be wary of falling in love with what we found. The thing about the data explosion is that we can tell richer stories and more complex stories and get many more measurements. I don't think the availability of Google searches or Facebook or Twitter will mean we now have fewer things to measure. We now have way more things to measure. I fully agree with you, you have to be careful; it's not just one measurement. I guess also what you're saying is we have the ability to measure things that are perhaps a smaller part of the population. No matter what state you are in, most men are straight, or prefer to have sex with women rather than with men, but we can now home in on that specific population, that small group, and study it in a way that you perhaps couldn't with a larger survey, where you poll people and get maybe 10 or 15 or 20 people who match that description, and there was no real way to tell x, y, or z about them. But now we have a real way to get at the specific things. Let's stay away from more specific levels in this area and go to Heidi. Hi. Thank you for the presentation. I'm looking forward to reading the book.
One observation. One is that the health communication stuff is super tricky, because if we try to look at how many people are searching Ebola, we might think a lot of people have Ebola. And I have questions about how representative Google searches are. Thinking just about the number of Americans who have access to the internet, 85% of Americans have access to the internet, and in particular 40% of those over 65 don't use the internet. I wonder how that might skew your data. The second question is whether there's a change over time because of the increasing amount of time people are using apps rather than Google. How does that skew your data? I think the 15% not using the internet, that will get smaller and smaller over time. It just kind of gets to the point that the data is not perfect. It's definitely not going to be 100% correlated with the population, but let other people worry about that. Yeah. I think the first thing that is striking when you first search Google Trends, which is now public, is how powerful the patterns are. It wouldn't have surprised anyone if Google searches came out and it was all this noisy, crazy data, where the Bible is searched most in, like, New York and least in Mississippi. So the data does tend to work pretty well, as best we can tell, and it will get better over time. I don't know too much about the app stuff. I haven't looked into it. But I think long-term changes can be tough to measure with this search data. One thing you see is that searches for science went down over time, the percent of searches that include the word science, and some people use that to show that Americans are losing interest in science, but I think it's just that the earliest users of Google were much more interested in science. Long-term trends can be problematic. I wanted to pick up on your questions, which were kind of critical, and I think criticism is so essential to making all of this work.
Not criticizing Seth's book. A few months ago there was something on the internet, I guess maybe someone wrote a paper, and it was passed around, and it said people are most religious in the Midwest, not the South, and then there was a lot of cud chewing about what that meant. Someone had used some religion data that was a mashup from several different churches, and it turned out some churches report attendance in a different way than other churches. The data were complete crap. It would have been better to use Google, I'm sure. What happened was, people said, these are hard numbers, let's start explaining them. And one thing that we sort of lack is a great way to sort of engage with these claims. A way for people to take a claim and say, this is kind of interesting, could be wrong, let's bounce it around. [inaudible] Social science is terrible at doing this. I don't know if journalism is better or worse, but I think that is what I was getting at earlier. With the rise of data journalism or Google science or whatever you want to call it, I think this is a great opportunity for us to try to figure out how we can be skeptical without being nihilistic, and I wonder what you think about this. I definitely agree with you. Most of the data in my book is public. Andrew and I talked about peer review versus data journalism. I think, from my experience, writing a data journalism article which maybe 100,000 people read is a lot harder than writing an academic paper that five of your rivals or buddies read and attack. I get emails from grad students and undergrads, not infrequently, critiquing my assumptions and making sense of them. I think if we can do a data analysis that is very public, that may be a better way to move this thing forward, and they put themselves on the line. They say they're judged afterwards by how good their predictions are.
So that's kind of very powerful, and their audience is better than the other ones. I will say, if I say something wrong, I'm thrashed for it. My email, my Twitter feed, I don't even know what else. I don't think anyone has tried to call me yet or send me snail mail, although a few people have asked, which was interesting. I agree with you. Anything that is open, that allows people to get a look underneath the hood, and where everyone has an equal opportunity to do so, ensures that in the data process and in what we claim from data, we're all better off for it. Right? If I have my secret little data set that I'm making claims x, y, and z from, you have almost no way of checking it. Even with the public polls, the underlying survey data is not released until six months plus later, and there's no reason that anyone will give that polling data over, whereas this data tends to be more open, and we can make the determination whether or not this correlation makes any sense, whether or not this data was looked at in the wrong way, and to me that's very important. There are a lot of people these days who have access to the internet and can write a lot of crazy things and make a lot of crazy claims, and people will believe them. And it is very important to me that we're able to check those. In some cases people will believe no matter what, but with this kind of data we can check claims right on the spot, and anybody who has knowledge and wants to figure out what the truth is can do so. I agree. The social sciences have severe weaknesses in the speed of the review process and the small numbers in basically every subarea, and I don't think the numbers are always small for a good reason. Five people have written about a topic when many more people could. This is a little more transparent than is typical. So I think that's helpful. I got phone calls a few times. I wrote about a case one time, and they were all men, which doesn't surprise me. One from Massachusetts called to thank me and called me an American hero.
Let's go back to the left here. Second table. Yes. Thank you. Hi, Joe Schultz with Scripps. One thing that came up earlier was being sure to ask the right questions. Another element is asking the right people, and one thing I think is true with polls, as with Google, is that it's not necessarily representative, and there's this almost calibration step that we need to make. I think someone mentioned the whole Sanders vs. Clinton Google Trends responses having a higher response for Sanders. Absolutely. But after a couple of contests you get results. Some polls had pretty big misses, with public polling in Wisconsin and Ohio. A shame. How do you go about making sure you're able to calibrate your data so you're able to validate yourself and make sure you're right going forward? It goes back to the sample. It's a challenge with Google data. With surveys you can ask people demographic questions and party identification, and it turned out, as we said during the campaign itself, that a lot of the fluctuations in the polls were clearly attributable to changes in relative nonresponse of different groups. With the Google data you don't have people's demographics. The one thing I will say is that I think you're right that the longer the data is around, the more you can calibrate the models and know how to weight them and stuff. With predicting elections, you can't predict an election from the volume of searches people make. They search for Trump, but that doesn't mean they like him. It might mean they hate him. One thing I found that is reasonably correlated is that you can predict which way people are going to go based on the order in which they search the candidates.
So, like, 22% of searches with the word Clinton in them also include the word Trump, so people search for "Clinton Trump polls" or "Clinton Trump debate." If they search "Clinton Trump polls," and you compare that to state voting, they're much more likely to go Clinton; if they search "Trump Clinton," they're more likely to go Trump. These subtle indicators will be predictive, and as we do more and more elections we can weight them and calibrate them and figure out whether you can add information to the polls. I think we have time for one last question. A quick one. And then we'll wrap. Hello. With the possibility of artificial intelligence bots, do you think that some of these searches could be manipulated? That's one of the values of Google search data relative to other sources: Google has the smartest people in the world figuring out who is a bot. I think a lot of searches are by bots, but Google puts a lot of energy into eliminating those searches because advertisers don't want those to be included in the data. It's kind of an arms race between Google and the hackers, but it's a bigger issue if you take a random search. Google has the smartest bots in the world working on this. Is that right? All right. With that, I think we're done. There should be wine and cheese somewhere. Thank you again for coming, and have a good evening. [applause] BookTV is on Twitter and Facebook, and we want to hear from you. Here's a look at some upcoming book fairs and festivals happening around the country. Up next on After Words, New America president and CEO Anne-Marie Slaughter examines the intersection of technology and foreign affairs in the book The Chessboard and the Web. She is interviewed by Denis McDonough, former White House chief of staff in the Obama administration and visiting senior fellow from the Carnegie Endowment for International Affairs program. Hello, everybody. I'm Denis McDonough and I'm your host today. With me is Anne-Marie Slaughter. She is currently the president and CEO of New America.
She was formerly the director of policy planning at the State Department and formerly dean of the
