Evan Soltas
Dec 16, 2015

Rise of Hate Search: Follow-Up

by Evan Soltas and Seth Stephens-Davidowitz

We are adding more detail than we could fit in our op-ed in The New York Times. For all data files, refer to Seth's website.

Simplest Prejudice Measure

The simplest prejudice measure we used was the volume of searches of the form “Muslims are ___” completed with a negative adjective.

We estimate that the top five negative searches of this form are “Muslims are evil,” “Muslims are terrorists,” “Muslims are bad,” “Muslims are violent,” and “Muslims are dangerous.”

We started with this measure because a similar measure can be constructed for other groups. We did so, and will explain the results more fully in a piece in January.

Racial Threat

To test racial threat versus the contact hypothesis, we looked at anti-Muslim searches in the 10 counties with the highest proportion of Muslims.

These were found here.

We used the negative-adjective searches discussed in the previous section. These included “are Muslims evil?” and “Muslims are evil.” The volumes were found on Google AdWords. Unfortunately, Google AdWords, the only source that gives county-level data, does not include search rates. Instead, we estimated total searches in each county from the volumes of the 10 most common Google searches in the United States, as found on Google Trends.
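The idea of turning raw volumes into rates can be sketched as follows. This is a minimal illustration, not the code behind MuslimsUSRates.csv: the function name, the county labels, and every number are hypothetical, and it assumes that the summed volume of a set of very common searches is proportional to a county's total search activity.

```python
def relative_search_rate(target_volume, benchmark_volumes):
    """Return target searches per benchmark search.

    benchmark_volumes holds the volumes of very common searches, used here
    as a stand-in for total search activity (an assumption of this sketch).
    """
    total = sum(benchmark_volumes)
    if total <= 0:
        raise ValueError("benchmark volume must be positive")
    return target_volume / total


# Hypothetical volumes, for illustration only.
counties = {
    "County A": {"target": 120, "benchmarks": [50_000, 40_000, 30_000]},
    "County B": {"target": 90, "benchmarks": [20_000, 15_000, 10_000]},
}

rates = {
    name: relative_search_rate(v["target"], v["benchmarks"])
    for name, v in counties.items()
}
```

In this made-up example County B has fewer target searches in absolute terms but a higher rate, which is why raw volumes alone would be misleading.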

The search volumes and calculations are in the file MuslimsUSRates.csv.

Note that, even if this suggests that proximity does not lower discrimination, there is strong evidence that organized and facilitated intergroup contact may reduce biases, as a meta-analysis by Pettigrew and Tropp finds.

Islamophobia and Anti-Muslim Hate Crimes

We compared anti-Islam hate crimes to a set of searches that may suggest both that Muslims are in the news and that Islamophobia is high.

The simplest approach is to use the measure of prejudice we developed previously: the search volume for “Muslims are evil,” “Muslims are terrorists,” “Muslims are bad,” “Muslims are violent,” and “Muslims are dangerous.” It was the measure we used to compare prejudice against other groups and in different locations.

At the weekly level, the prejudice measure and hate crimes were highly correlated (r = .16; t = 3.7). The correlation was even higher when we restricted the data to the period since 2008, when the Google data became less noisy (r = .26; t = 4.78).

They were also highly correlated at the monthly level (r = .37; t=4.57).
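For readers who want to relate r to t: for a Pearson correlation over n observations, the usual test statistic is t = r * sqrt((n - 2) / (1 - r^2)). The sketch below computes both. It assumes the reported t-statistics are of this standard kind, which the text does not state, and the function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric series."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two series of equal length with n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t-statistic for testing a Pearson correlation of r against zero."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

Because t grows with the square root of n, a modest correlation can still be statistically distinguishable from zero when there are several hundred weekly observations.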

Anti-Muslim hate crimes were not similarly correlated with searches expressing prejudice against any other group, using this simple, blunt measure.

In addition, the relationship was not explained by trends or monthly factors. (All this data is available at WeeklyPrejudicePlusHateCrimes.csv and MonthlyPrejudicePlusHateCrimes.csv.)

However, to determine which searches matter and which do not, we downloaded a large set of weekly searches and compared them to weekly hate crimes. Since the goal was prediction, we wanted to let the data show which searches best predict hate crimes. We used about 35 common search phrases related to Muslims or Islam, which we found by using Google Correlate, the “top searches” feature within Google Trends, and Google auto-complete. In pre-processing, we normalized all series to means of zero and standard deviations of one. The LASSO selected 12 terms, yielding an L1-norm of 0.82.
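The pre-processing step and the L1-norm can be illustrated with a short sketch. It is not the authors' R code: the function names are hypothetical, and the use of the population standard deviation rather than the sample one is an assumption, since the text does not say which was used.

```python
import statistics


def standardize(series):
    """Rescale a series to mean zero and standard deviation one."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)  # population SD; an assumption of this sketch
    if sd == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(value - mean) / sd for value in series]


def l1_norm(coefficients):
    """Sum of the absolute values of the model coefficients."""
    return sum(abs(c) for c in coefficients)
```

Standardizing puts every search series on the same scale, so the size of a coefficient does not depend on how popular the underlying search is.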

Some were obviously Islamophobic, such as “I hate Muslims.” Others had some clearly non-Islamophobic uses. One of the striking things in the data, however, is that even seemingly innocent searches, such as “Koran,” include many potentially Islamophobic searches, such as those related to burning the Koran.

We put all the data in an OLS LASSO model. The LASSO generally chose shorter searches (one- or two-word searches rather than many-word searches). A probable reason is that these data were much less noisy at the weekly level. We chose the constraint on the L1-norm by 10-fold cross-validation, minimizing mean squared error. We also tested a Poisson LASSO regression model, which is more appropriate for count data; this yielded virtually identical results and predictive power.
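The selection procedure described above can be sketched in plain Python. This is a simplified illustration, not the authors' R code: it uses the penalized form of the LASSO (a penalty weight on the L1-norm, which corresponds to a constraint on it), fits the squared-error model only, omits the intercept on the assumption that predictors and outcome have been centered, and all function and parameter names are hypothetical.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly 0.0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, with b starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Covariance of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def predict(X, beta):
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def cv_lasso(X, y, lambdas, k=10, seed=0):
    """Pick the penalty with the lowest mean k-fold MSE, then refit on all data."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best = None
    for lam in lambdas:
        errors = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            beta = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
            fold_pred = predict([X[i] for i in fold], beta)
            errors.append(mse([y[i] for i in fold], fold_pred))
        score = sum(errors) / len(errors)
        if best is None or score < best[0]:
            best = (score, lam)
    return best[1], lasso_fit(X, y, best[1])
```

The soft-threshold step is what sets some coefficients to exactly zero, which is how a LASSO can keep only a subset of the candidate search terms.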

We could explain about 10 percent of the weekly variation with Google searches and about 25 percent of the monthly variation.

Our initial data ran through 2013, which was all that was available online. However, we recently obtained new monthly data for 2014. The model predicted this new, out-of-sample data just as well, which is strong evidence of its reliability.
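One common way to express "share of variation explained," in sample or out of sample, is R-squared. The sketch below is an illustration under assumptions: the text does not say which statistic was used, the function name is hypothetical, and the benchmark here is the mean of the evaluated data itself.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` accounted for by `predicted`."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have no variation")
    return 1.0 - ss_res / ss_tot
```

For an out-of-sample check, `predicted` would come from a model fit only on the earlier data and `actual` from the new period, so the model never sees the values it is scored on.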

The data and R code can be found in LassoData.csv, LassoCode.csv, and hatecrimepredictors.csv.

We are writing a full paper on these results. We are also examining to what extent prejudiced searches towards other groups can predict hate crimes against those groups.

Response to Obama’s Speech

Searches During Obama’s Speech.csv includes data for the minute-by-minute search response to Obama’s speech.

AthletesTerroristsSoldiers.csv includes the hourly data on searches for “Muslim terrorists,” “Muslim athletes,” and “Muslim soldiers.” It was the data used to make the accompanying graphic.

Response to San Bernardino

MuslimsBelieveKill.csv shows the paths of a likely-hateful search (“Muslims kill”) and a likely-informative search (“What do Muslims believe?”) after the San Bernardino attacks. Both rise, but “Muslims kill” rises far more.

We also have data on a large set of such searches, which are available upon request.

Political Responses to Terror Attacks

SyrianRefugees.xlsx includes data on daily search volumes for several common positive and negative searches about Syrian refugees from September 7, 2015 to December 2, 2015.

CloseMosques.xlsx includes hourly data on all searches including the word “mosques” and several common searches that suggest support for closing mosques.