Information about our world: SRI/BKC workshop explores issues in access to platform data

 

SRI Associate Director Lisa Austin at Harvard University. Photo by Shelby El Otmani. Courtesy of the Berkman Klein Center for Internet & Society’s Institute for Rebooting Social Media, Harvard University.


Digital platforms are everywhere, from e-commerce (Amazon) to social media (Meta) to audio/video (YouTube, Spotify) and beyond. They collect, store, analyze, use, and sometimes even sell massive amounts of data. In a recent example, X (formerly Twitter) automatically opted users into sharing information about their activity on the platform—public posts, interactions, and more—with its AI assistant, Grok, for training.

At a time of growing concern about the power of platforms, what kinds of claims can we make to gain access to the data they hold? What kinds of solutions—technical, legal, political—should we consider for gaining access, and which purposes justify it: academic research, public policy, or perhaps simply the information about our world that we need to navigate it?

These and related questions were the topic of Platforms and the Right to Information, an event co-hosted by the Schwartz Reisman Institute for Technology and Society (SRI) and the Berkman Klein Center’s Institute for Rebooting Social Media (RSM) at Harvard University. SRI Associate Director Lisa Austin, a visiting researcher at RSM over the past year, co-organized the day-long event with RSM Affiliate Nadah Feteih. Austin moderated one of three panels, while SRI Director David Lie led another.

You can watch recordings of the workshop on the Berkman Klein Center’s YouTube channel, or read on below for a summary of the conversations.

Justifying access to data

Austin began the day by exploring who should have access to data and the types of research this access should support. Tracing various legislative acts mandating data access and the rationales behind them, she compared the approaches of the EU Digital Services Act, the US Senate's proposals, and Canada's model, which she considers the least effective of the three.

“How can we justify access to data held by private companies? We have freedom of information laws in most liberal democracies,” said Austin. “Information itself is thought of as a public resource. We’re seeing a move to different rationales for access to platform data.”

Panelist Jeff Hall of the University of Kansas then discussed the state of social media research, noting that upwards of 90 percent of studies rely on people’s self-reported social media use, which is notoriously unreliable.

“There’s very little evidence that social media is a direct cause of depression and suicide ideation in youth,” said Hall. And social media companies, fearing litigation, have “turned off the tap” on releasing data, he said. “There’s an enormous black hole of information on how social media actually affects users.”

Hall outlined three potential models for data access: data donations, application programming interfaces (APIs)—essentially intermediaries that allow two apps to communicate with each other—and tracking tools installed on phones, such as Ethica on Android. Ultimately, Hall stressed the need for high-quality, longitudinal data to understand the purported harms of social media.

They’re not just selling a service, they’re really the governors of our lives.
— Swati Srivastava

Panelist Swati Srivastava of Purdue University then shifted the focus of the discussion to what she termed “macro-systemic level research,” which doesn’t require the highly specific individual-level data Hall described, but rather seeks an understanding of platform operations and their impact on global governance.

Srivastava, whose research has involved seeking access to internal communications and organizational structures within digital platform corporations, argued that platforms are “too big and too important to be left alone.”

“They’re not just selling a service, they’re really the governors of our lives,” she said, pointing out that governments themselves have also often been difficult to research due to obfuscation. 

Finally, panelist Gabriel Nicholas of the Center for Democracy & Technology discussed the failure of “corporate data altruism”—the idea that companies will simply grant access to data out of benevolence. Nicholas compared the current situation to the pharmaceutical industry’s early resistance to sharing clinical trial data; after laws were strengthened, sharing improved significantly.

“We’re experiencing almost the opposite [of the pharmaceutical example] today,” said Nicholas. “We’re in another crisis of researcher access to data,” he added, citing the high cost of Twitter/X’s API and Meta’s shuttering of CrowdTangle.

“But the crisis is not acute,” said Nicholas. “It’s happening very slowly and that’s what makes it concerning.”

Austin asked how researchers can navigate data access risks and whether institutional support could be improved. While Nicholas highlighted privacy and surveillance concerns, noting that centralized data collection could be “a honeypot for law enforcement,” Srivastava raised concerns about researcher vetting.

“You want everyone to trust that researchers are doing it for the right reasons,” said Srivastava, “but you can’t always know. I’m worried about the kind of research that will be vetted, what people are allowed to ask for, and what this means in the global scheme.”

 

From left to right: Lisa Austin, Swati Srivastava, Gabriel Nicholas, and Jeff Hall. Photo by Shelby El Otmani. Courtesy of the Berkman Klein Center for Internet & Society’s Institute for Rebooting Social Media, Harvard University.

 

Implementation challenges: How to do it?

Implementation challenges in data access include safeguarding privacy and protecting confidential commercial information, among many others. While some of these require policy responses and others require technological solutions, the bottom line is that these challenges should be tackled in a manner that ensures independent and equitable access.

SRI Director David Lie moderated the second session of the day on these and related topics. Panelist Becca Ricks of the Mozilla Foundation discussed how Mozilla builds, among other tools, browser extensions for researchers, while panelist Hilary Ross of the Global Network Initiative spoke about her work designing and launching a researcher access to data program at Twitter/X.

Both Ricks and Ross emphasized the importance of usability in data access tools and the need for consultation with researchers during the design phase of these tools.

“There’s been a big drawback to voluntary data sharing from platforms, and the level of voluntary access has really reduced,” said Ross, “but at the same time, we’re going into a new regulatory era where we’re developing and building access. Implementation questions are critical.”

Ross underscored the iterative nature of developing data access programs and the need for clear principles balancing privacy, security, competition, academic freedom, and public interest. 

Ricks echoed Nicholas’s comments from the first panel, observing that “there’s a retrenchment when it comes to platforms and voluntary access. It feels less likely now than it did five years ago. But there’s an opportunity to get clear on what’s publicly available and massively impactful.”

Panelist Delara Derakhshani of the Data Transfer Initiative also highlighted the delicate balance between the free flow of information and adequate data protection, advocating for inclusivity by design to account for researchers, under-represented communities, users, and actors across the entire stakeholder ecosystem.

Concluding the session, Lie drew on his engineering background as a computer security expert to highlight that the main challenge for platform companies wanting to share data is not the gathering, organization, and analysis of data, which is core to their expertise, but rather scrubbing that data of private information, a complex and difficult task that presents a non-trivial risk to the platform. Lie also pointed out the gap between regulatory expectations and technological feasibility in privacy protection, calling for collaboration between the regulatory and tech sectors to establish realistic standards for all.

“The privacy-protecting methods that regulation is pushing are beyond what’s technologically feasible,” said Lie. “There is work to be done to bring this to a better standard.”

Justice, rights, and ownership: Whose data is it?

The day concluded with a panel on justice and the right to information with Northeastern University’s Elettra Bietti, SUNY Oswego’s Ulises Mejias, and the University of Chicago’s Aziz Huq—a past speaker at SRI’s Absolutely Interdisciplinary conference. The panelists explored the intersection of law, data governance, and the legacy of colonialism, asking how data governance initiatives could potentially address issues of public good, ownership, and the exploitation of resources.

Mejias began by questioning whether framing data as a “right” is relevant. “Colonialism is a system of lawlessness,” he pointed out, “so to talk about data rights is an interesting tension.”

Mejias drew parallels between the “land grab” of colonialism and the contemporary “data grab.” He argued that capitalism and colonialism are co-evolving, with data extraction becoming a new frontier, seen in sectors like edtech, agritech, health tech, smart weapons, and smart borders—“each one a new ‘data territory’,” he said, “where corporations are devising ways of extracting data from us.”

Huq—a legal scholar and former practising litigator—focused on how today’s powerful AI systems reshape our interactions with the state and impact individual rights, especially in relation to government data collection. 

“How is our data accessed and managed [by the state]?” he asked. “How does government architect its own data collection and retention? How is data exploited by the federal court system in the US?”

Huq advocated for the idea of a “public trust” to manage data, drawing on older legal concepts in which the state must manage resources for the public good. 

“If a piece of land is held in public trust, it means the state has a duty to use the land to benefit the residents,” said Huq. “There’s a body of law for thinking about regulating collectively owned resources.”

Huq raised Barcelona’s Decidim platform as an example: it requires any private company operating and collecting personal data in Barcelona to share that data with the platform, which then uses it for a range of public services.

Bietti, the session moderator, kicked off a discussion by exploring the legal conception of rights—“What is a right? Is it an individual thing? Can it be a collective thing?”—and observing, as previous panels did, the contemporary “retraction or enclosure by companies of data.”

“So, how do we govern data access?” she asked.

Mejias raised the possibility that data could be nationalized, drawing an analogy to postcolonial nationalizations like Mexico’s oil industry and pointing to his previous commentary on the topic. He suggested that, unfortunately, the balance of power has shifted from the state to corporations, making it harder for governments to challenge tech giants.

“There were always corporate-state entities, like the East India Company,” said Mejias. “But today, the balance has shifted because the state now depends on corporations for data. Today, the sovereign is the corporation, not the state. I don’t think cities are going to be able to do enough to decolonize data. LA or New York can’t stand up to Facebook or Google.”

Overall, the joint BKC/SRI workshop underscored the complexity of navigating data access, balancing privacy concerns, and fostering cooperation between researchers, companies, and regulators to give us the information we need to operate in our platform-infused world.
