The challenge of detecting misinformation in podcasting

Confronted by viral conspiracy theories, climate change denialism, extremist movements, and anti-democratic groups (among others) feeding off false information online, social media platforms have taken steps in recent years to curtail the spread of misinformation. But even as tech companies have come under pressure to crack down on misinformation, one key avenue of information distribution in the digital economy—podcasting—has escaped significant scrutiny, despite the massive scale of the podcast ecosystem.

Nearly 116 million Americans—or around 41%—listen to podcasts monthly, but only recently have podcasters begun to receive scrutiny for their role in spreading misleading or false content. When Joe Rogan—perhaps the world’s most popular podcaster—questioned in April the relative risks of COVID-19 vaccines for young people, he came under intense criticism. Rogan quickly backtracked, telling his more than 11 million listeners that he had been a “moron.” But that retraction may have been too late, as there remains a strong correlation between listeners of the Joe Rogan Experience and vaccine hesitance.

Unfortunately, the spread of misinformation in podcasts appears to be common. A preliminary analysis of more than 8,000 episodes of popular political podcasts found that more than one-tenth include potentially false information. Due to the way podcasts are distributed, however, addressing the problem will require a different approach than in other sectors of the tech industry, one that combines broad infrastructure changes and a fundamental rethinking of the role of the listener in content moderation.

A perfect misinformation storm

The term “podcasts” was first coined in 2004 to describe an emerging trend in audio that allowed consumers to subscribe to and play serial content at any time through an MP3-style device, like an iPod. Podcasts evolved from talk radio broadcasting conventions, but instead of relying on terrestrial or digital radio, the medium’s early adopters published through RSS feeds that users could subscribe to directly to access content. This open-sourced RSS architecture helped to eliminate programming regulations and content volume restrictions tied to available airtime. Producers no longer required transmitters, licenses, or access to studios in order to broadcast. In this medium, “anyone can be a publisher, anyone can be a broadcaster,” as one of its pioneers put it. Today, 57% of Americans, or approximately 162 million people, have listened to a podcast at least once, compared to 11% a decade ago.

Given the wide and growing reach of podcasts, as well as their potential for spreading misinformation, why has this space largely escaped content moderation debates? There are four reasons why this might be the case.

Although podcasts have much in common with social media platforms, the relationship between publisher and audience more closely resembles traditional media platforms like radio or television. On Twitter and Facebook, many people can publish content, and many people can directly respond to that content, often in real time. In the podcasting ecosystem, the relationship between publisher and audience is far different: Anyone can publish content, but the audience cannot respond directly to it, reducing the ability of the crowd to fact-check misinformation as it might on Twitter. This means that, as with social media, the gatekeepers determining who gets to share content are all but eliminated in the podcasting ecosystem, but unlike social media platforms, there is no immediate potential for public debate.

Additionally, the nature of the medium makes it far more difficult to monitor potentially misleading content. Much of the recent research to quantify the effects of misinformation uses a URL-based strategy to identify low-quality domains posted on social media platforms. The audio-based nature of the podcasting medium represents a hurdle to this approach. Spoken word content can be analyzed using natural language processing techniques, but it is often prohibitively expensive to transcribe hours upon hours of content. Even once a set of podcast episodes is in an analyzable form, finding misinformation within that corpus is akin to searching for a needle in a haystack. False information can be buried within huge amounts of transcript text and easily missed. This makes podcasting an ideal tool to inject false or misleading information into mainstream discourse while going undetected.
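To illustrate why the URL-based strategy does not carry over to audio, here is a minimal Python sketch of that kind of check. It is an illustration only, not the method of any particular study: the domain list, the regular expression, and the function name `flagged_domains` are all hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical list of low-quality domains; real studies rely on curated lists.
LOW_QUALITY_DOMAINS = {"low-quality-news.example"}

# Simple pattern for links embedded in a social media post.
URL_PATTERN = re.compile(r"https?://\S+")


def flagged_domains(post_text, low_quality=LOW_QUALITY_DOMAINS):
    """Return the low-quality domains linked from a piece of text."""
    found = set()
    for url in URL_PATTERN.findall(post_text):
        host = urlparse(url).hostname or ""
        # Treat "www.example.com" and "example.com" as the same domain.
        if host.startswith("www."):
            host = host[4:]
        if host in low_quality:
            found.add(host)
    return found


post = "Read this: https://www.low-quality-news.example/story now"
spoken = "a spoken transcript usually contains no links at all"
print(flagged_domains(post))
print(flagged_domains(spoken))
```

A post with a link yields a domain to look up. A stretch of spoken audio, even once transcribed, typically offers no such handle, so the claim itself has to be analyzed.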

The limited attention paid to podcasts by researchers and policymakers may be a result of misperceptions of the medium. Those investigating misinformation may perceive the podcasting space as less of a problem with respect to the spread of misinformation because, unlike on social media platforms, it is more difficult for content from a podcast to travel rapidly across the information ecosystem and go viral. This perception fails to consider some podcasts’ massive audiences and that falsehoods spread through this medium can still be harmful, even if they don’t go viral in the usual meaning of the word. Indeed, the intimate relationship between a podcaster and her audience may mean that the audience will be more likely to believe untruths.

A final reason that the podcasting space may have escaped content moderation debates lies in the potential misperception of podcasts as not widely used and elite-driven—home to prestige shows like Serial or This American Life. As early as 2005, academics and observers predicted that the podcasting space was unsustainable or a “dying” industry in search of a sustainable business model. These predictions have since been proven wrong. Podcast ad revenue is expected to hit $1 billion this year. With more than 2 million shows and 54 million episodes, podcasts have firmly arrived in the mainstream.

A research agenda

The potential for misinformation to go largely unchecked on podcasts is clear. But what is the scale of this problem? To explore that question, I recently examined more than 8,000 episodes of popular political podcasts. By using machine learning and natural language processing to match transcriptions of the podcasts with a fact-checking database of false or misleading political claims, I found that more than one-tenth of the episodes shared potentially false information.[1] These flagged episodes have collectively received more than 100 million views, likes, or comments.
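To make the idea of matching transcripts against fact-checked claims more concrete, the following is a minimal Python sketch. It is not the pipeline used in this analysis, whose details are not described here. It swaps in a simple bag-of-words cosine similarity over sliding windows of transcript text. The function names, window size, and threshold are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def flag_segments(transcript, claims, window=25, step=10, threshold=0.5):
    """Slide a word window over the transcript and return
    (claim, score, segment) for every window that resembles a claim.
    The window, step, and threshold values are illustrative assumptions."""
    words = tokenize(transcript)
    claim_vecs = [(claim, Counter(tokenize(claim))) for claim in claims]
    hits = []
    for start in range(0, len(words), step):
        segment = words[start:start + window]
        seg_vec = Counter(segment)
        for claim, vec in claim_vecs:
            score = cosine(seg_vec, vec)
            if score >= threshold:
                hits.append((claim, round(score, 2), " ".join(segment)))
    return hits


claims = [
    "Eight Iowa counties have more adults registered to vote "
    "than voting age adults living",
]
transcript = (
    "welcome back to the show today we talk about elections did you hear "
    "that eight iowa counties have more adults registered to vote than "
    "voting age adults living there it is unbelievable"
)
for claim, score, segment in flag_segments(transcript, claims):
    print(score, "|", segment)
```

A word-overlap measure like this only catches near-verbatim repetition and cannot tell whether a host is endorsing or debunking a claim, so flagged segments are best treated as candidates for human review.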

This false content spans a wide range of topics in U.S. politics, from immigration (e.g., the idea that most DACA recipients are “hardened criminals”) to elections (e.g., that “eight Iowa counties have more adults registered to vote than voting age adults living”) to abortion (e.g., that Democrats’ “position on abortion is now so extreme that they don’t mind executing babies after birth”). These sharing patterns often spike around key political events, such as the 2020 election, and have become more common over time.

The research project is ongoing and will expand to cover both more podcasts and more types of misinformation, including claims that have been linked to foreign influence operations. But for now, these early results indicate that popular political podcasts are serving as an important vector for the proliferation of misinformation.

Policy implications

Based on my preliminary research, the spread of false material via podcasts represents an underappreciated problem that will require infrastructure-level changes distinct from content moderation policies already in place on social media platforms. Unlike other forms of media in the iPhone age, podcasts are more difficult to moderate due to limitations with respect to audience engagement and the nature of podcast distribution mechanisms.

Consider the role of the consumer in policing content. Like Facebook or Twitter, podcast distributors largely rely on the “crowd” to identify objectionable content, but the process for reporting this material as a listener is not straightforward. Apple’s podcasting app allows users to report concerns about episodes, but the reporting tool only provides a limited number of concerns to choose from, none of which encompass false or misleading content. Where Apple does specify guidelines about inaccurate or misleading content, these largely relate to podcast metadata and copyright issues. At present, Spotify provides no obvious way for users to report issues with specific episodes and only vaguely delineates content that is prohibited on the platform.

The decisions made by Apple and Spotify ultimately have downstream effects across the industry. Most of the smaller players in the field lack the financial resources to carry out extensive content moderation and look to larger companies like Apple and Spotify to determine what should be removed. In making it difficult (or all but impossible) for users to report misinformation, Spotify and Apple effectively remove the crowd from helping curb the spread of false or misleading content. Tackling misinformation in podcasts may require reincorporating the audience in some capacity—from enabling users to comment or leave reviews on specific episodes to further experimenting with ways to transform podcasting into a conversation between the creator and the audience.

From an infrastructure perspective, the nature of the RSS feed, which is open-sourced and accessible by design, represents a significant hurdle for content moderation. For example, Apple’s podcasting app—one of the most widely used apps for streaming episodes—aggregates content across thousands of approved RSS feeds. Once Apple approves a feed, it does not control the content added to these feeds. Although Apple can remove the RSS feed from its platform, some smaller platforms allow any content on an RSS feed to be played through their services, making it easy for listeners to access a removed podcast elsewhere. As a result, a content moderation decision at one platform, like removing a single episode urging listeners not to get a COVID vaccine, may not affect its availability via other platforms. Addressing the moderation of misleading material instead requires a fundamental rethinking of the broader podcast infrastructure.
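The openness described above can be seen in how little a client needs in order to play a show. The Python sketch below parses a small RSS document and lists each episode’s audio URL. The feed contents, the show and episode names, the `example.com` URLs, and the function name `list_episodes` are hypothetical; a real feed would be fetched from the publisher’s server rather than embedded as a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed for illustration; real feeds carry many more fields.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="https://example.com/audio/ep1.mp3" type="audio/mpeg" length="1234"/>
    </item>
    <item>
      <title>Episode 2</title>
      <enclosure url="https://example.com/audio/ep2.mp3" type="audio/mpeg" length="5678"/>
    </item>
  </channel>
</rss>
"""


def list_episodes(feed_xml):
    """Return (title, audio_url) pairs for every item in an RSS feed."""
    root = ET.fromstring(feed_xml.strip())
    episodes = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        enclosure = item.find("enclosure")
        url = enclosure.get("url") if enclosure is not None else None
        episodes.append((title, url))
    return episodes


for title, url in list_episodes(SAMPLE_FEED):
    print(title, url)
```

Because the audio URLs point at the publisher’s own hosting, any app that reads the feed can play the episodes. Removing a show from one directory leaves the feed itself untouched.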

This latter infrastructure-level change will be difficult to implement but is fundamental to addressing the risks associated with the spread of misinformation. The spread of online misinformation has already demonstrated its ability to undermine deliberative democracies, and podcasts represent an underappreciated avenue through which such information proliferates. Internationally, misinformation shared via podcasts may resonate with and be amplified by foreign actors intent on sowing discord in U.S. politics. As a first step, it is critical to understand the scope of this problem in order to identify appropriate policy solutions to address the spread of misinformation within the unique contours of the podcasting space.

Valerie Wirtschafter is a senior data analyst in the Artificial Intelligence and Emerging Technologies Initiative at the Brookings Institution and a Ph.D. candidate in the Department of Political Science at the University of California, Los Angeles.

[1] For this preliminary analysis, I rely on PolitiFact’s fact-checked assessments, but in future iterations of this project, I will likely expand this to include other fact-checking websites.

Apple and Facebook provide financial support to the Brookings Institution, a nonprofit organization devoted to rigorous, independent, in-depth public policy research.
