Tobias Konitzer, Jennifer Allen, Stephanie Eckman, Baird Howland, Markus Mobius, David Rothschild, Duncan J Watts, Comparing Estimates of News Consumption from Survey and Passively Collected Behavioral Data, Public Opinion Quarterly, Volume 85, Issue S1, 2021, Pages 347–370, https://doi.org/10.1093/poq/nfab023
Abstract
Surveys are a vital tool for understanding public opinion and knowledge, but they can also yield biased estimates of behavior. Here we explore a popular and important behavior that is frequently measured in public opinion surveys: news consumption. Previous studies have shown that television news consumption is consistently overreported in surveys relative to passively collected behavioral data. First, we validate these earlier findings, showing that they continue to hold despite large shifts in news consumption habits over time, while also adding some new nuance regarding question wording. Second, we extend these findings to survey reports of online and social media news consumption, with respect to both levels and trends. Third, we demonstrate the usefulness of passively collected data for measuring a quantity such as “consuming news” for which different researchers might reasonably choose different definitions. Finally, recognizing that passively collected data suffer from their own limitations, we outline a framework for using a mix of passively collected behavioral and survey-generated attitudinal data to accurately estimate consumption of news and related effects on public opinion and knowledge, conditional on media consumption.
Introduction
Social scientists have long been interested in the relationship between public opinion and news consumption to understand how exposure to news shapes public opinion and knowledge (e.g., Iyengar and Kinder 2010; Levendusky 2013) and how people sort into certain news sources (Prior 2013). Researchers have examined these questions with aggregated news consumption compared with aggregated behavior, for example consuming Fox News and voting Republican, at the market level (DellaVigna and Kaplan 2007; Martin and Yurukoglu 2017). More commonly, social scientists (and journalists) rely on survey data that jointly measure public opinion and news consumption (e.g., Stroud 2010; Garrett 2019). However, survey reports of consumption behavior can suffer from measurement error because of memory decay, inaccurate estimation of answers, or social desirability bias, wherein respondents want to signal that they do something socially desirable, such as voting or consuming news, even if they do not (Bernstein, Chadha, and Montjoy 2001; Schwarz and Oysermann 2001; Krumpal 2013; Guess et al. 2019).
Even as concerns over the data quality of surveys have increased (Groves 2004; Groves and Lyberg 2010; Shirani-Mehr et al. 2018), a new source of “passive” behavioral data has come into existence with billions of people around the world regularly volunteering their opinions in social media, online data, and other forms of digital records. As has been pointed out (Gonzalez-Bailon et al. 2014; Diaz et al. 2016; Schober et al. 2016), these new “big data” sources are noisy, hard to work with, and often based on unrepresentative samples; and for those reasons, actively collected survey data are still the dominant method of recording and dissecting public opinion. For the purpose of recording behavior, however, passively collected behavior data have three important advantages vis-à-vis survey data: first, they are less vulnerable to social desirability bias; second, they do not suffer from recall error; and third, they allow for flexible interpretation of the quantity under investigation.
In this paper, we showcase these three advantages of passively collected measures of news consumption by comparing them to current “gold standard” survey data. After an introduction and literature review (Section II, Relevant Literature), Section III, Data and Methods, describes the survey data, which we draw from two of the most widely used and respected survey research organizations, Pew Research Center and Gallup, and the passively collected behavioral data, from Nielsen and ComScore. Section IV, Results, presents our main results. First, we compare estimates of television consumption from the two types of data, confirming previous findings that surveys yield dramatically higher estimates of news consumption, but adding some nuance based on question wording. Second, we build on the television results, showing a similar relationship for social media and other online news consumption (in both level and trends), but the opposite relationship for overall time spent on social media. Third, we illustrate how behavioral data can better illuminate consumption where the precise threshold for what “counts” as consumption is itself subject to reasonable disagreement. In Section V, Discussion, we argue that passively collected data are less prone to recall and social desirability bias and are more flexible for many research questions. Acknowledging, however, that survey data retain some important advantages over passively collected data, Section VI, A Way Forward: Combining Survey and Passively Collected Behavioral Data, outlines practical guidance for researchers who wish to work with a mix of the two for more reliable results on attitudes, conditional on behavior.
Relevant Literature
Several previous studies have compared survey and passive reports of television news consumption. Prior (2009) compared self-reported network news consumption, derived from consumption measures in the National Annenberg Election Survey (NAES) 2000, to passively collected consumption data from Nielsen’s “people meters.” The survey self-reports overestimated the size of the network news audience by a factor of 3.4: “According to Nielsen, between 30 and 35 million people watched the nightly news on an average weekday. Based on NAES self-reports, that number is between 85 and 110 million for most of the year” (Prior 2009, p. 133). Dilliplane, Goldman, and Mutz (2013, p. 237) also acknowledged the dangers of relying on self-reported frequency of television news consumption, and paraphrased the consensus well: “Given the tendency to answer quickly, respondents likely rely on shortcuts to come up with off-the-cuff estimates, thus reducing exposure measures to little more than self-assessed levels of political interest.” They proposed a list-based measure asking: “Which of the following programs do you watch regularly on television?”
Even if survey data were perfect, there is a practical limit as to how many types of consumption questions can be included in a survey, while passive data collection is much less constrained. Although the approaches described above can collect more accurate reports of news consumption, long lists of programs and news sources stretch the limits of respondents’ ability and willingness to answer. Shorter lists, on the other hand, cannot capture consumption of less prevalent news types (LaCour and Vavreck 2014). In addition, researchers are generally more interested in how much people consume various types of news (and potentially when) than simply whether they ever do (Dilliplane, Goldman, and Mutz 2013; Prior 2013).
Similar to past results for television, survey measures of online news consumption and social media use hint at overreporting. For example, Guess (2015) experimentally compared several survey questions about news consumption and concluded that the open-ended question reduced stated consumption, more accurately capturing the sites that respondents visited. Subsequently, Guess et al. (2019) compared survey reports of social media posts with respondents’ Twitter and Facebook data, finding that the two measures correlated but that many respondents overreported the number of posts they made.
Despite these well-documented concerns about measurement error in survey reports of news consumption, researchers and journalists alike continue to rely on survey data to estimate the frequency and quantity of news consumption. According to Google Scholar, one of the surveys examined in this paper—Pew’s annual “News Use Across Social Media Platforms”—has well over 1,000 citations, including 812 for its 2016 iteration alone. News consumption patterns are evolving rapidly, and the data collected about these patterns are changing as well, so it is important to keep examining this relationship. Because both types of data are vital to our understanding of news and public opinion, we should continue to consider not just how they contrast, but how they can work together.
Data and Methods
In this section we describe the passively collected and survey data we use in this paper and the methods we use to create comparable estimates. The ideal data to explore news consumption would passively and unobtrusively capture all television, online, and mobile news consumption of a representative sample over time. It would also include responses to questions about that consumption: how much people recall consuming and how it affected their beliefs. But real survey data capture only a few scattered questions, at a few points in time, and passively collected data do not perfectly cover all possible consumption. Acknowledging these concerns, we describe below the most complete passively collected data in the literature and the most respected survey data, and how we construct the best possible comparisons between them.
Passively Collected Behavioral Data Sources
We use three data sets containing passively collected data, all spanning from January 1, 2016, to December 31, 2019, on television viewing and web browsing. Each source is collected by an established firm with clear opt-in policies and protection for privacy and data security. The samples are large and based on random selection; however, the data are not free from representation and measurement error. All the data collection is meant to be as unobtrusive as possible, to minimize any behavioral differences of people while in the panel. Collectively these sources combine to provide a uniquely comprehensive view of news consumption across different modes, countering the chief criticism of recent work critiquing passively collected data (Barthel et al. 2020).
National TV Panel: Nielsen’s National TV Panel collects data on who watches which television shows. The data are used to estimate audience size and set prices for advertising slots. To recruit panelists who are demographically and geographically representative of the US population, Nielsen selects a random sample of addresses. Panelists participate for at most two years, and panelist turnover happens continuously. At any given time, the TV panel contains approximately 100,000 Americans in approximately 40,000 households.
In each participating household, a “Nielsen Box” is installed on all televisions. The box tracks the program and station that the television is tuned to on a minute-by-minute basis, including content consumed live and (digitally) recorded. All data are tracked passively, except in multi-person households, where panelists must manually record via a button on the box who is present (a potential source of bias). The resulting data set is a log of minute-by-minute individual-level consumption of national programming. It does not include strictly local programming (including local news), nor does it include streamed content.
Desktop Web Panel: Nielsen’s Desktop Web Panel collects data on web browsing from panelists’ desktop computers. The Web Panel ranges in size from approximately 90,000 people in January 2016 to approximately 65,000 in December 2019. The Desktop Panel is recruited through a mixture of methods, including phone samples, and participation is limited to two years with continuous turnover.
In each participating household, software is installed on all desktop computers that tracks the websites visited in the computers’ web browsers on a second-by-second basis. All data are collected passively, except in multi-person households where panelists must mark which household member is using the computer. The resulting data set is a time-stamped log of second-by-second browsing history, including the website URL and the amount of time spent on each website. We constructed a set of weights for each member of the National TV Panel and each member of the Desktop Panel. Weights were created using iterative proportional fitting (Fienberg 1970). Each set of weights matches the panel members to gender, age, race, and education counts for the US adult population from the 2018 Current Population Survey.
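The weighting step can be illustrated with a minimal raking sketch. This is not the authors' code: the function names, the five-person panel, and the target shares below are invented for illustration, whereas the weights described above match gender, age, race, and education counts from the 2018 Current Population Survey.

```python
from collections import defaultdict


def rake(records, targets, n_iter=100, tol=1e-9):
    """Iterative proportional fitting over one-dimensional margins.

    records: list of dicts, one per panelist, mapping variable -> category.
    targets: dict mapping variable -> {category: target population share}.
    Returns a list of weights aligned with `records`.
    """
    weights = [1.0 / len(records)] * len(records)
    for _ in range(n_iter):
        max_gap = 0.0
        for var, shares in targets.items():
            # Current weighted share of each category of this variable.
            current = defaultdict(float)
            for rec, w in zip(records, weights):
                current[rec[var]] += w
            # Rescale weights so this margin matches its target exactly.
            for i, rec in enumerate(records):
                cat = rec[var]
                weights[i] *= shares[cat] / current[cat]
            gap = max(abs(shares[c] - current[c]) for c in shares)
            max_gap = max(max_gap, gap)
        if max_gap < tol:
            break  # every margin was already on target during this pass
    return weights


def weighted_share(records, weights, var, cat):
    """Weighted share of panelists whose `var` equals `cat`."""
    total = sum(weights)
    return sum(w for r, w in zip(records, weights) if r[var] == cat) / total


# Hypothetical panel and hypothetical population targets (assumptions).
panel = [
    {"gender": "F", "age": "18-49"},
    {"gender": "F", "age": "50+"},
    {"gender": "M", "age": "18-49"},
    {"gender": "M", "age": "18-49"},
    {"gender": "M", "age": "50+"},
]
targets = {
    "gender": {"F": 0.52, "M": 0.48},
    "age": {"18-49": 0.55, "50+": 0.45},
}
weights = rake(panel, targets)
```

Each pass rescales the weights one margin at a time, so the loop stops once no margin needs a meaningful adjustment. Every target category must appear at least once in the sample for the margins to be reachable.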
Digital Traffic Data: The third passively collected data source we use comes from ComScore’s aggregated digital traffic data. ComScore produces estimates of traffic to publisher desktop and mobile locations using its “Unified Digital Measurement” method. This approach involves census-based site analytics (tags on participating websites that capture visits from the website side) and panel-based audience measurement data. ComScore maintains desktop and mobile user panels, which are similar to Nielsen’s Desktop Panel. The combined approach allows ComScore to validate its panel data with the census data, and vice versa. The data we have are not at the individual level; instead, they are aggregated consumption by site, broken down by key demographics and time (e.g., US adult men in a given month). Because the data are already aggregated, we do not have any details of the weighting methods.
Survey Data Sources
We use data from two survey companies, Gallup and the Pew Research Center, on account of their reputations for methodological rigor and their broad public impact via citations in the media and academic press. In other words, if estimates of news consumption from Pew and Gallup are systematically biased, it is not likely because the surveys were conducted poorly, but rather because survey data are intrinsically prone to bias. Moreover, because the published findings of these two organizations are invoked so frequently by such a wide range of actors across the political spectrum, any bias is likely to have ramifications for public understanding and potentially policy. Finally, we focus on polls that come out in a series to capture trends and because they represent the organizations’ strong commitment to tracking news consumption over time.
We use two surveys from Gallup, sponsored by the Knight Foundation: “American View: Trust, Media and Democracy” (Wave One: N = 19,196, 8/4/2017–10/2/2017; Wave Two: N = 20,046, 11/8/2019–2/16/2020) (Knight Foundation/Gallup 2017, 2020), which measure television, online, and social media consumption. For each wave, conducted by mail, Gallup selected a random sample of US addresses and oversampled young Hispanic and African American adults. Weights account for the probability of selection and were raked to gender, age, race, Hispanic ethnicity, education, region, and population density totals from the most recent Current Population Survey for the 18 and older US population.
Pew conducts polls with its American Trends Survey, an online panel recruited through random sampling of residential addresses. Pew recruits panelists by phone and mail to increase the coverage of non-internet households, and then weights each data set to match the US adult population by gender, race, ethnicity, partisan affiliation, education, and other categories. We use two studies that capture television viewership: “Trump, Clinton Voters Divided in Their Main Source for Election News” (N = 4,183) (Gottfried, Barthel, and Mitchell 2017) and “U.S. Media Polarization and the 2020 Election: A Nation Divided” (N = 12,043) (Jurkowitz et al. 2020). We also use several iterations of the “News Use Across Social Media Platforms” (N = 4,581; N = 4,971, 8/8–21/17; N = 4,654) (Gottfried and Shearer 2016, 2017; Shearer and Matsa 2018) for online and social media consumption. The precise field dates for all surveys are provided in table 1.
Firm | Date | Wording | Consumption estimates for US adults
---|---|---|---
Gallup | 11/8/19–2/16/20 | Please write the name of the specific news source you use most often; this could be the name of a television channel or program, a newspaper, a website or app, a radio program, magazine or other source. | FOXNC: 13%; MSNBC: 4%
Pew Research Center | 11/29–12/12/16 | Thinking specifically about the 2016 presidential campaign, did you get most of your news about this topic… If television: Which television outlet or program did you turn to most often for news about the 2016 presidential campaign? | FOXNC: 19%; MSNBC: 5%
Pew Research Center | 10/29–11/11/19 | What news source do you turn to most often for political and election news? | FOXNC: 16%; MSNBC: 4%
Pew Research Center | 10/29–11/11/19 | Please click on all of the sources that you got political and election news from in the past week. | FOXNC: 39%; MSNBC: 24%
Results
When making comparisons between the survey and passively collected sources, we must make some assumptions about what exactly the survey questions mean to respondents. We clearly note possible mismatches between the two types of data that could bias our conclusions. We take a conservative approach in all comparisons, leading to the smallest possible discrepancy between the estimates from survey and passively collected data.
Cable Television News Consumption
For the first comparison, we construct measures of consumption of Fox News (FOXNC) and MSNBC. We chose these two stations because they are relevant to broadly shared concerns about the role of partisan media in driving political polarization. For example, in Kur (2020) a MSNBC commentator posits that there is a Republican/FOXNC echo chamber, and in Vespa (2020) a Republican commentator posits that there is a Democratic/MSNBC echo chamber. We show in table 1 how surveys use different wording to ask questions about television news consumption, even within the same firm, leading to different estimates. From the passively collected data, we construct comparable estimates of the percentage of the population watching these news sources. Consistent with the literature (Prior 2013), we count a “session” of television viewing as any six-minute viewing session, live or digitally recorded.1 We estimate the percentage of US adults watching one or more sessions, two or more sessions, and so on, of FOXNC and MSNBC respectively in a given month.
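The estimate described above can be sketched as follows. The data layout and function names are hypothetical, and the sketch assumes one particular operationalization of a session (each complete six-minute block within a continuous viewing spell); the precise definition used in the paper may differ.

```python
SESSION_MINUTES = 6  # session length stated in the text


def count_sessions(spell_minutes):
    """Complete six-minute blocks across a panelist's continuous viewing spells."""
    return sum(minutes // SESSION_MINUTES for minutes in spell_minutes)


def share_with_at_least(sessions, weights, k):
    """Weighted share of panelists with k or more sessions."""
    total = sum(weights)
    return sum(w for s, w in zip(sessions, weights) if s >= k) / total


# Hypothetical month of viewing of one station for five panelists:
# each inner list holds the lengths, in minutes, of continuous spells.
spells = [[], [4, 5], [6], [13, 7], [60, 30, 12]]
weights = [1.2, 0.8, 1.0, 0.9, 1.1]  # hypothetical post-stratification weights

sessions = [count_sessions(s) for s in spells]
curve = {k: share_with_at_least(sessions, weights, k) for k in range(1, 6)}
```

Here `curve[k]` is the weighted share watching `k` or more sessions in the month, so the shares can only fall or stay flat as `k` rises.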
The bars in figure 1 show the percentage of US adults watching k or more sessions (six-minute blocks) of FOXNC or MSNBC in November 2019. The x-axis shows the minimum number of sessions watched in the month. Note that the survey estimate of the number of people watching FOXNC in the past week is three times higher than the passively collected estimate of the number of people who watched it for at least one six-minute stretch in the month. Similarly, all three survey estimates of FOXNC consumption are higher than the passively collected estimate of the number of people who watched just two six-minute stretches in the month. The survey estimates are biased downward here because the questions are not confined to television but cover any news source.
Figure 2 displays the percentage of the population consuming one or more sessions of FOXNC or MSNBC in any week from 2016 through 2019. Pew’s estimates of consuming a given station “in the past week” are 3.5 times higher than the passively collected estimate for any week in 2019 for FOXNC and 3 times higher for MSNBC. Figure 2 also allows us to consider Gallup and Pew’s estimates of how many people rely on FOXNC or MSNBC as their primary source of information. Pew’s estimate for FOXNC as the news source people “turn to most” in the 2016 election is 1.5 times higher than the corresponding estimates from the passively collected data for January through October 2016. Gallup’s estimate from November 2019–February 2020, which is lower than Pew’s November 2019 estimate, is about 1.3 times higher than the peak week in 2019. Our results suggest that FOXNC watching is overreported in survey data, which conforms with earlier literature about television news in general (Prior 2009, 2013). Conversely, passively collected estimates of the percent of adults consuming MSNBC each week are upwards of two times higher than survey estimates of MSNBC being people’s top news source, a plausible set of outcomes.
Table 2 breaks down primary news source by key demographic subgroups. The survey estimates include any news source, while the passively collected data are confined to national television news (and we require a person to consume a minimum of one six-minute session per month to count as watching a station); thus, if both captured consumption perfectly, the passively collected estimates should be higher than the survey estimates. Yet the survey estimates are higher for FOXNC, and the passively collected estimates are higher for MSNBC. Unfortunately, the passively collected data do not include partisanship, so we cannot make a direct comparison between survey and passively collected data for viewership broken down by Democratic and Republican shares. We note, however, that while the Democratic numbers for MSNBC as a top news source show reasonable agreement (7 percent of Gallup and Pew respondents named MSNBC as their top source, compared with 6 percent of total passive consumption), FOXNC numbers for Republicans are extremely high (30 percent and 34 percent for Gallup and Pew, respectively, compared with 9 percent of total passive consumption), hinting at the possibility that asymmetric social desirability among Republicans to watch FOXNC may be driving overreporting of news consumption in surveys.
Subgroup | Gallup 2019: FOXNC | Gallup 2019: MSNBC | Pew 2019: FOXNC | Pew 2019: MSNBC | Passive 2019: FOXNC | Passive 2019: MSNBC
---|---|---|---|---|---|---
Total | 13% | 4% | 16% | 4% | 9% | 6%
Gender | | | | | |
Male | 16% | 3% | 19% | 4% | 10% | 6%
Female | 10% | 4% | 14% | 4% | 8% | 6%
Age | | | | | |
18–29 | 7% | 1% | 8% | 1% | 3% | 2%
30–49 | 10% | 2% | 10% | 2% | 5% | 4%
50–64 | 16% | 4% | 20% | 4% | 12% | 7%
65+ | 19% | 8% | 28% | 8% | 18% | 12%
Education | | | | | |
HS Graduate or Less | 11% | 2% | 17% | 3% | 6% | 2%
Some College | 15% | 3% | 18% | 3% | 11% | 5%
College Grad Only | 13% | 3% | 15% | 5% | 11% | 9%
Post-Graduate | 11% | 5% | 11% | 5% | 11% | 14%
Partisanship | | | | | |
DEM (inc lean DEM) | 0% | 7% | 0% | 7% | – | –
REP (inc lean REP) | 30% | 7% | 34% | 2% | – | –
Note.—Gallup (11/8/19–2/16/20) and Pew (10/29–11/11/19) ask about the most frequent news source, among television or other modes, while the passively collected data (November 2019) are for television only.
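As a worked reading of table 2, the sketch below computes the ratio of each survey estimate to the passively collected estimate for a few rows. The figures are copied from the table; the choice of rows, the data layout, and the function name are ours. As the note above makes clear, the two measures are not like for like (most frequent source across all modes versus television viewing only), so the ratios are descriptive rather than a formal bias estimate.

```python
# Percent of US adults, copied from table 2 as (FOXNC, MSNBC) pairs.
TABLE_2 = {
    "Total": {"gallup": (13, 4), "pew": (16, 4), "passive": (9, 6)},
    "18-29": {"gallup": (7, 1), "pew": (8, 1), "passive": (3, 2)},
    "65+": {"gallup": (19, 8), "pew": (28, 8), "passive": (18, 12)},
}
STATIONS = ("FOXNC", "MSNBC")


def survey_to_passive_ratios(table):
    """Ratio of each survey estimate to the passive estimate, per row and station."""
    ratios = {}
    for group, row in table.items():
        for i, station in enumerate(STATIONS):
            passive = row["passive"][i]
            for firm in ("gallup", "pew"):
                ratios[(group, firm, station)] = row[firm][i] / passive
    return ratios


ratios = survey_to_passive_ratios(TABLE_2)
for key in sorted(ratios):
    print(key, round(ratios[key], 2))
```

Ratios above 1 mean the survey estimate exceeds the passive estimate, which is the pattern for FOXNC in these rows. Ratios below 1 mean the reverse, which is the pattern for MSNBC.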
Another advantage of passively collected data is that they support a variety of definitions for any particular measure without placing additional burden on respondents. The benefit of this feature is visible in figures 1 and 2, and table 2, wherein a single passively collected data effort can be used to compute news consumption statistics for a range of thresholds for what counts as “consumption.” Figure 1, for example, shows the fraction of the population that consumes at least one, two, three, and so on six-minute sessions. Depending on the question at hand, any one of these values could be chosen as the minimum threshold of interest. Alternatively, a researcher could cite the relevant statistic corresponding to an upper and lower bound of the threshold, thereby more effectively communicating uncertainty about the true value. In practice, however, surveys cannot ask respondents to evaluate their consumption according to all possible thresholds, making it difficult to evaluate the sensitivity of responses to changing criteria.
Passively collected data also prompt alternative explanations for observed attitudinal differences between people who report watching particular news channels and those who do not. For example, in a 2019 Global Strategy survey, 49 percent of self-reported non–FOXNC-watching Republicans believed members of the intelligence community were out to sabotage President Trump, as opposed to 79 percent of self-reported FOXNC–watching Republicans (Global Strategy 2019). Although the naïve explanation for this difference is that watching FOXNC leads to more conspiratorial thinking, at least two alternative explanations are possible. First, it is possible that the underlying causal agent here is not FOXNC, but a broader information environment from which virtually all Republican respondents who report regularly watching FOXNC receive the same cues, perhaps through social networks or alternative media. If being part of this broader information environment were correlated with overreporting, then the causal arrow would be reversed. That is, rather than FOXNC causing its viewers to believe in conspiracies, belief in conspiracies could conceivably cause them to report watching FOXNC. Second, it is possible that the real difference between FOXNC–watching Republicans and non–FOXNC-watching Republicans is even starker than reported. In this view, Republicans who overreport consumption of FOXNC have more moderate attitudes and water down the radicalism of the exposed few. In effect, as opposed to the sizable and somewhat radical segment of Republicans that the survey identifies, we would be dealing with a smaller but more radical segment of Republicans.
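The second explanation can be made concrete with a simple two-group mixture. The sketch below is purely illustrative: it assumes that self-reported viewers consist of actual viewers plus a share q of overreporters, and that overreporters hold the belief at the same rate as self-reported non-viewers (49 percent). Both the values of q and that equal-rate assumption are hypothetical; neither is estimated in this paper.

```python
def implied_true_viewer_rate(observed_rate, overreporter_share, overreporter_rate):
    """Belief rate among actual viewers implied by a two-group mixture.

    Solves observed = (1 - q) * true_rate + q * overreporter_rate for true_rate.
    """
    q = overreporter_share
    if not 0 <= q < 1:
        raise ValueError("overreporter_share must be in [0, 1)")
    return (observed_rate - q * overreporter_rate) / (1 - q)


observed = 0.79    # self-reported FOXNC-watching Republicans (Global Strategy 2019)
non_viewer = 0.49  # self-reported non-watchers, used as a stand-in for overreporters

# Hypothetical overreporter shares among self-reported viewers.
scenarios = {
    q: implied_true_viewer_rate(observed, q, non_viewer) for q in (0.0, 0.1, 0.3)
}
```

Under these assumptions, the implied belief rate among actual viewers rises with the share of overreporters, which is the sense in which overreporting would water down the observed difference.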
Online and Social Media News Consumption
In this section, we shift context from television to online news in general, and news consumed on social media. The survey questions and topline results for comparisons of online and social media news consumption are summarized in table 3. To make the responses to the two surveys more consistent, we group “A great deal” and “A fair amount” in the Gallup survey (41 percent) and “Often” and “Sometimes” in the Pew survey (47 percent).
Firm | Date | Wording | Key finding
---|---|---|---
Gallup | 8/4/2017–10/2/2017 | How much, if at all, do you use each of the following approaches for staying up-to-date on the news? Response options: A great deal; A fair amount; Only a little; Not at all. | 41% of adults “Seeing or reading links to news stories on Facebook or other social media sites” and 47% “Visiting Internet-only news websites.” For both: “A fair amount” or more.
Pew Research Center | 1/12/2016–2/8/2016; 8/8/2017–8/21/2017; 7/30/2018–8/12/2018 | How often do you…Get news from a social media site (such as Facebook, Twitter, or Snapchat)? Response options: Often; Sometimes; Hardly ever; Never. | 47% of adults “Sometimes” or more “Get news from a social media site”
To make comparable estimates from the passively collected data, we use the weighted Nielsen Desktop Panel data to estimate the percentage of Americans who read one or more news stories from internet-only news websites on one or more days in the month. We count the number of news URLs visited by each panelist on each day of the month. Because the Gallup survey specifically asks respondents about their consumption of “internet-only news websites,” we exclude news sites that are online versions of a print newspaper or magazine. However, including all news domains yields results that are not substantively different from those shown below.
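The estimate described in this paragraph can be sketched in a few lines. This is a minimal illustration rather than the authors' pipeline: the record layout (panelist ID, day, count of news URLs), the weights, the toy data, and the function name `share_reading_news` are all hypothetical, and the exclusion of print-affiliated news sites is assumed to have happened upstream.

```python
from collections import defaultdict


def share_reading_news(visits, weights, min_days=1):
    """Weighted share of panelists with >=1 news URL on at least `min_days` days.

    visits  : iterable of (panelist_id, day, n_news_urls) records (hypothetical layout)
    weights : dict mapping panelist_id -> panel weight; every panelist, including
              those with no news visits at all, must appear here
    """
    days_with_news = defaultdict(set)
    for panelist_id, day, n_news_urls in visits:
        if n_news_urls >= 1:
            days_with_news[panelist_id].add(day)

    total_weight = sum(weights.values())
    qualifying_weight = sum(
        w for pid, w in weights.items() if len(days_with_news[pid]) >= min_days
    )
    return qualifying_weight / total_weight


# Toy data, invented for illustration only.
visits = [
    ("a", 1, 2), ("a", 2, 1), ("a", 3, 4),
    ("b", 1, 1),
    ("c", 5, 0),  # online that day, but no news URLs
]
weights = {"a": 1.0, "b": 2.0, "c": 1.0}

print(share_reading_news(visits, weights, min_days=1))  # 0.75
print(share_reading_news(visits, weights, min_days=2))  # 0.25
```

Raising `min_days` is what produces the more and less generous definitions of consumption against which the survey figures are compared.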
To estimate the number of unique social media users across mobile and desktop (the numerator in Eq. 2), we sum ComScore’s estimates of unique users of Reddit, Facebook, and Twitter. This sum is available for each day in the month. Because we do not have access to cross-platform usage in the ComScore data, we cannot easily determine how many people use at least one of Facebook, Twitter, or Reddit on any given day. We therefore create an upper bound of news consumption that assumes no cross-platform usage at all. In essence, if anyone uses two or more platforms, our measure of news consumption is biased upward. Note that we do not include YouTube in this calculation because its content is organic and rarely leads to external links. The result from Eq. 2 is our measure of the percentage of Americans who consume news URLs from social media.
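The logic of the no-overlap assumption can be illustrated with a short sketch. It does not reproduce Eq. 2; it only shows why summing per-platform unique users gives an upper bound on the number of people who use at least one platform. The function name `union_bounds` and the user counts are invented for illustration.

```python
def union_bounds(unique_users_by_platform):
    """Bounds on the number of people using at least one platform.

    Lower bound: complete overlap (everyone on a smaller platform also uses the largest).
    Upper bound: no overlap at all, i.e. the simple sum across platforms.
    """
    counts = list(unique_users_by_platform.values())
    return max(counts), sum(counts)


# Invented daily unique-user counts (millions), for illustration only.
daily_users = {"Facebook": 100.0, "Twitter": 30.0, "Reddit": 20.0}
lower, upper = union_bounds(daily_users)
print(lower, upper)  # 100.0 150.0
```

Any person who uses two or more platforms is counted more than once in the sum, so the true figure lies between the two bounds and the summed numerator can only overstate it.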
We use Nielsen’s desktop panel to proxy the percentage of Americans that consume any news on specific social media sites in a given month. For Facebook, Reddit, and Twitter, the Nielsen panel data gathers the address of the website that people visit immediately after leaving the social media site. We first count any person with at least one clicked link (whether to a news site or not) as a user of the originating social media site. We then count any person with at least one news link in the list of sites visited upon leaving the social media site as consuming news via social media in that month. Because news is much more likely than non-news to contain links (i.e., there is a large selection of categories of posts on Facebook that seldom have links, such as personal comments), the ratio of the second count to the first should be an upper bound estimate for the percentage of people who use social media to consume news within any given timeframe. Thus, the percentage of people who click on links and consume news via social media sites should be greater than the percentage of people who do not click on links and consume news on those sites. On YouTube, our data include a link for each video that panelists visit, and we code those links using YouTube’s categorization to document if people consume any news in any given month. Because the categorization is done by the video producer, this approach tends to overstate news in the consumption-weighted random sample of YouTube videos we reviewed.
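The two counts and their ratio can be sketched as follows. This is an unweighted toy version under stated assumptions: the input layout (for each panelist, the list of sites visited immediately after leaving the social media site, each flagged as news or not) and the name `news_share_among_link_clickers` are hypothetical, and a real estimate would also apply panel weights.

```python
def news_share_among_link_clickers(exits):
    """Ratio of news-link clickers to all link clickers on one social media site.

    exits : dict mapping panelist_id -> list of (domain, is_news) pairs, one per
            site visited immediately after leaving the social media site
    """
    # First count: anyone with at least one clicked link counts as a user of the site.
    link_clickers = {pid for pid, links in exits.items() if links}
    # Second count: anyone with at least one news link among those exits.
    news_clickers = {
        pid for pid, links in exits.items() if any(is_news for _, is_news in links)
    }
    if not link_clickers:
        return None  # no observed users, so the ratio is undefined
    return len(news_clickers) / len(link_clickers)


# Toy month of data, invented for illustration only.
exits = {
    "p1": [("news.example", True)],
    "p2": [("shop.example", False)],
    "p3": [],  # visited the site but never clicked an outbound link
    "p4": [("shop.example", False), ("news.example", True)],
}
ratio = news_share_among_link_clickers(exits)
print(round(ratio, 3))  # 0.667
```

Panelist `p3` drops out of both counts, which is the reason the ratio is read as an upper bound: users who never click links are assumed to consume news on the site at a lower rate than those who do.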
Figure 3 shows static survey estimates of 41 percent (news links “a fair amount” or more) and 47 percent (news “sometimes” or more) and the estimates from the passively collected data. As an upper bound, about 9 percent of US adults consume news links from Facebook, Twitter, or Reddit on five or more days during the month, and just 23 percent do so on two or more days. Gallup and Pew’s estimates are more than twice this generous estimate from passively collected data. The discrepancy in propensity to navigate to news content on mobile as opposed to desktop would have to be implausibly large to meaningfully close this gap. For example, if “a fair amount” is assumed to mean viewing at least one news URL on 10 different days, an additional 35 percent of the US population (i.e., not overlapping with the original 12 percent) would have to consume a fair amount of news on their mobile devices to match Gallup’s estimate. Given that total consumption of news URLs on mobile is less than the total consumption of news URLs on desktop (Allen et al. 2020), this is improbable. Note that Pew’s estimate should be higher: it is about news, not just news links. Figure 4 shows that the Gallup survey estimate of “a fair amount” of internet-only news consumption, at 47 percent, is also higher than all but the most generous measure based on passively collected data. Only 20 percent of the population consumes one or more news websites on five days, according to our passively collected data.
Referring now to total social media use (i.e., not just news consumption), our passively collected data indicate that in August 2018, 83 percent of Americans used Facebook and 89 percent used YouTube. Facebook itself reported 242 million monthly active users in the United States and Canada in Q3 2018 (Facebook 2019).² According to their respective censuses, in 2018 there were 266 million people in the United States, and 31 million people in Canada, age 14 or older (the age restriction on Facebook); thus, Facebook’s official estimate implies that 81 percent of this population were monthly users (indubitably some Facebook accounts are not real people). Our passive estimate is close to this number, albeit 2 percentage points higher (possibly because the desktop sample is more likely to miss people who consume little to nothing online). In contrast, Pew’s 2018 survey estimate is much lower, at just 68 percent, suggesting that social desirability leads to underreporting of social media use in surveys.
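The 81 percent figure follows directly from the numbers quoted in this paragraph, as the short check below shows. The variable names are ours; the values are those given in the text.

```python
facebook_mau_millions = 242        # US + Canada monthly active users, Q3 2018
us_pop_14_plus_millions = 266      # US population age 14 or older, 2018
canada_pop_14_plus_millions = 31   # Canadian population age 14 or older, 2018

eligible_millions = us_pop_14_plus_millions + canada_pop_14_plus_millions
facebook_share = facebook_mau_millions / eligible_millions

print(eligible_millions)             # 297
print(round(facebook_share * 100))   # 81

# Gaps relative to Facebook's implied share, in percentage points.
passive_estimate, pew_estimate = 83, 68
print(passive_estimate - round(facebook_share * 100))  # 2
print(pew_estimate - round(facebook_share * 100))      # -13
```

The passive estimate sits 2 points above the company's implied figure, whereas the survey estimate sits 13 points below it.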
Finally, turning to news consumption on social media, figure 5 compares Pew survey estimates with those inferred from passively collected data. Once again, the survey estimates are much higher. Whereas approximately 28 percent of Facebook users and 11 percent of YouTube users consumed any news on the platform in August 2018, according to our data, the survey-based measures are two to three times higher: 43 percent and 21 percent, respectively. Further, in looking at results from 2016 and 2017, we document that Facebook news consumption, conditional on being on the site, dropped from 38 percent to 28 percent. We note that this decrease is consistent with announced policy changes in Facebook’s newsfeed algorithm, which—in response to widespread concerns about the proliferation of “fake news” on the site—was altered to down-weight news stories in favor of “friends and family” content. Also, notably, the Pew survey did not register any change over this period. Conversely, the Pew survey does show a dramatic increase in news consumption on YouTube, but we do not see any such change in passively collected data.
Discussion
Although survey reports of news consumption are known to be error-prone, researchers continue to use them because (a) surveys offer a familiar way to measure both consumption and political attitudes and behavior at the same time, and (b) passively collected data are both expensive and complicated to acquire and use. In this paper we have contributed three main results concerning the relative advantages of these two types of data. (1) Surveys yield much higher estimates of news consumption than passively collected data, a finding that applies to television, and to online and social media–based news consumption. For example, Gallup’s survey estimate on consumption of news URLs is two to five times larger than passively collected estimates, depending on how one defines consumption in the passively collected data. (2) Survey-based measures of online consumption fail to measure known trends in news consumption; for example, Pew’s surveys fail to capture the decline in news consumption within Facebook. (3) Passively collected data can answer questions that are simply not addressable with survey-based data; for example, researchers may define watching FOXNC as watching once per month, as watching at least three hours a month, or as FOXNC constituting a certain percentage of a viewer’s overall news diet.
Although we have emphasized its advantages, passively collected data also exhibit several limitations with respect to measuring news consumption. For example, there may be errors in identity resolution between households and individuals (e.g., occasionally someone’s partner may use their computer). Exposure does not necessarily equal consumption, because televisions may be left on and websites may be opened but not read. In addition, a small but increasing number of viewers may supplement television with a streaming service that includes news, and streaming services are not captured in the passively collected TV panel data. Finally, passively collected data raise concerns about privacy and security. While privacy concerns are generally overcome with opt-in panels, opt-in policies can introduce selection bias; also, not all passively collected data have the same level of opt-in and consent. With the billions of dollars in advertisement revenue reliant on these data, we are confident that passive data collection methods will continue to evolve to address these issues.
These shortcomings notwithstanding, we contend that measures from passively collected data can serve as a useful point of comparison for survey measures. A recent example of great importance has been the reach of President Trump’s daily press conferences during the crucial early days of the COVID-19 pandemic. Polling firm Global Strategy’s 2020 survey suggested that 41 percent of registered voters “try to watch it live every day” (Global Strategy 2020). The firm built several attitudinal outcomes on the reported viewership of this cohort. But passively collected data suggest that only about 3 percent of the population watched the press conference live (Grynbaum 2020).
Socially desirable survey responses about consuming news are likely the main cause of the higher reports of news consumption in the survey sources cited above. Another important difference between survey and passively collected data is the imprecise categories used in most survey questions, which create ambiguity in the interpretation of responses, a problem that can be exacerbated by the aggregation of categories in summary statements. For example, a 2018 Pew survey, discussed earlier, found that in response to the question “How often do you get news from a social media site (such as Facebook or Twitter)?” 20 percent said “Often,” 27 percent said “Sometimes,” 21 percent said “Hardly ever,” and 32 percent said “Never.” Pew’s write-up of this report opens with the sentence “About two-thirds of American adults (68 percent) say they at least occasionally get news on social media,” while the accompanying figure caption contains the somewhat less precise statement “About two-thirds of Americans get news on social media.” Although both statements are technically correct, it would have been equally correct to say “More than half of Americans never or hardly ever get news on social media,” a very different message. In the same survey, moreover, Pew clarifies that by news, it means “information about events and issues that involve more than just your friends or family.” By this definition, an event for a concert on Facebook or tweets with a viral hashtag like “#icebucketchallenge” could be considered news, further inflating the headline number.
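The effect of aggregation can be made concrete with the 2018 Pew response distribution quoted above. The helper name `combined` is ours; the percentages are those reported in the text.

```python
# 2018 Pew responses to "How often do you get news from a social media site?"
pew_2018 = {"Often": 20, "Sometimes": 27, "Hardly ever": 21, "Never": 32}


def combined(responses, categories):
    """Sum the percentage of respondents across the chosen answer categories."""
    return sum(responses[c] for c in categories)


print(combined(pew_2018, ["Often", "Sometimes", "Hardly ever"]))  # 68
print(combined(pew_2018, ["Often", "Sometimes"]))                 # 47
print(combined(pew_2018, ["Hardly ever", "Never"]))               # 53
```

The same four numbers support a headline of 68 percent, 47 percent, or 53 percent, depending only on where the “Hardly ever” category is placed.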
If press coverage of survey data highlighted these definitional ambiguities, there would be less cause for concern. However, when citing the reports of survey companies, both the academic literature and the mainstream media tend to focus on topline takeaways rather than the underlying data. For example, Allcott and Gentzkow (2017), citing the 2016 version of the Pew survey above, state that “62 percent of US adults get news on social media,” not that 56 percent of respondents said they never or hardly ever did so. Citing the same study, Vosoughi, Roy, and Aral (2018) state that “more and more access to information and news [is] guided by [social media],” when in fact the report itself found a small increase in the percentage of various platforms’ users who ever see news there. Finally, referring to the 2018 study, an editorial in the New York Times on October 31, 2019, stated initially that “Half of all Americans say Facebook is their main source of news” before being corrected to read “Last year, over 40 percent of Americans said they got news from Facebook” (New York Times 2019). Although misleading interpretations of underlying data are not unique to surveys and, as in these examples, may involve more than one party, we argue that the ambiguity inherent to qualitative survey responses (e.g., what meaning is conveyed by “get news on social media” and does that include “hardly ever”?) lends itself to misinterpretation.
A Way Forward: Combining Survey and Passively Collected Behavioral Data
Surveys remain invaluable for documenting attitudes; however, we have argued that they are not the ideal tool for measuring news consumption. We hope both academia and industry embrace new hybrid technologies that allow surveys to gauge the attitudes of those who have been shown, through passive data collection, to have consumed various media in general, and news more specifically. We advocate for three increasingly feasible ways in which behavioral data can be paired with survey data.
First, researchers can create panels that include both surveys and passive data collection. After giving informed consent, participants would be set up with a device to record their television and online news consumption, as the Nielsen and ComScore data sets do (in compliance, of course, with the General Data Protection Regulation and the California Consumer Privacy Act). Such studies can suffer from selection bias because those who consent to passive data collection might have different browsing behaviors and political attitudes than those who do not. The larger concern is that survey questions can lead to priming, causing people to shift their behavior, such as reading more political news knowing they will be asked about political facts, sentiment, interest, and so on. Nevertheless, we believe the data collected in such studies make a valuable contribution to the literature. For example, several recent papers have used the YouGov Pulse sample, which sends survey questions to people who are having their desktop consumption tracked (Guess, Nyhan, and Reifler 2018, 2020; Peterson, Goel, and Iyengar 2019).
Second, smaller opt-in samples that span both television and surveys can work for some outcomes; other times ID linkage/resolution may be necessary. It is relatively easy to ask survey questions and track desktop browsing because only a small extension in the respondent’s browser is required. But tracking news consumption on mobile phones, for example, is much harder because consumption is spread across applications. For larger scale and diversity of modes, researchers can link television consumption to linkable IDs, such as those provided by ID resolution companies or credit files. This approach can result in a larger and more diverse behavioral data set.
Third, at a minimum, news consumption can be projected onto attitudes using imputation models. Demographics, geography, and other data can be used to model consumption and project consumption onto a survey of attitudes or, conversely, to project attitudes onto consumption. Probabilistic models of consumption, however, come with heavy endogeneity constraints: researchers need to be careful not to use any variables in the model that may correlate with the research question. Further, possibly important idiosyncratic variance (i.e., consumption patterns that cannot be predicted by simple demographic models) cannot be explored. More important, this approach is only accurate for high-level consumption and attitudes, because it is extremely difficult to build a model that is well identified by subgroup for detailed consumption or attitudinal patterns.
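As a rough illustration of this third approach, the sketch below projects consumption onto survey respondents using the simplest possible imputation model, a cell mean by demographic group. This is a stand-in chosen for brevity, not a model proposed in this paper; the cell definitions, the toy data, and the function names `cell_consumption_rates` and `impute` are all hypothetical.

```python
from collections import defaultdict


def cell_consumption_rates(panel):
    """Share of behavioral panelists in each demographic cell who consumed news.

    panel : list of (cell, consumed_news) pairs, where `cell` is a tuple of
            demographics such as (age_group, area)
    """
    totals = defaultdict(lambda: [0, 0])  # cell -> [n_consumers, n_panelists]
    for cell, consumed in panel:
        totals[cell][0] += int(consumed)
        totals[cell][1] += 1
    return {cell: n_cons / n for cell, (n_cons, n) in totals.items()}


def impute(survey, rates):
    """Attach a predicted consumption probability to each survey respondent.

    Respondents in cells absent from the panel get None rather than a guess.
    """
    return [(rid, cell, rates.get(cell)) for rid, cell in survey]


# Toy data, invented for illustration only.
panel = [
    (("18-34", "urban"), True), (("18-34", "urban"), False),
    (("65+", "rural"), True), (("65+", "rural"), True),
]
survey = [
    ("r1", ("18-34", "urban")),
    ("r2", ("65+", "rural")),
    ("r3", ("35-64", "urban")),  # no matching cell in the behavioral panel
]

rates = cell_consumption_rates(panel)
for row in impute(survey, rates):
    print(row)
```

Every respondent in a cell receives the same predicted probability, which is exactly the limitation noted above: variation in consumption within a demographic cell is invisible to a model of this kind.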
Data Availability Statement
REPLICATION DATA AND DOCUMENTATION are available at https://dataverse.harvard.edu/dataverse/DavidRothschild/. Some of the raw data has been replaced with example data, or schemas, because of the permission policy of the original data collector. The editors have waived POQ’s full replication policy for this manuscript. Please contact the corresponding author for more information.
Tobias Konitzer is the CEO of PredictWise, San Francisco, CA, USA. Jennifer Allen is a PhD candidate in the Marketing Department, Massachusetts Institute of Technology, Cambridge, MA, USA. Stephanie Eckman is a fellow at RTI International, Washington, DC, USA. Baird Howland is a PhD candidate in the Annenberg School, University of Pennsylvania, Philadelphia, PA, USA. Markus Mobius is a senior principal researcher at Microsoft Research, Cambridge, MA, USA. David Rothschild is an economist at Microsoft Research, New York, NY, USA. Duncan J. Watts is the Stevens University Professor in the Annenberg School, Department of Computer and Information Science, and Department of Operations, Information and Decisions, University of Pennsylvania, Philadelphia, PA, USA. At the time of writing DJW was a member of the Pew Research Center’s Governing Board of Directors. This position was unpaid and the Center had no role in the design of the research or writing of the paper. The authors thank Mainak Mazumdar and numerous other colleagues at Nielsen for access to their data. Further, Harmony Labs provided incredible support throughout the research process. The authors also thank the numerous seminar audiences for their useful feedback.
Footnotes
1. This measure is conservative because it includes as viewers those who watched as little as six minutes of consecutive programming in a month.
2. We could not find ground-truth monthly active user numbers for YouTube.
References
Facebook.
Global Strategy, GBA Strategy, Navigator.
———.
Knight Foundation/Gallup.
New York Times.