Weighting content suggestions, responsibly
One of the most vexing issues in my portfolio of policy work is how to handle harmful and unlawful user content on the internet — and in particular, the role of tech / internet companies as intermediaries facilitating and moderating online speech. I wrote about this a bit last year, as part of my 2017–2018 “CATS” series on the future of internet policy. In my piece “Speech and liability at the fragile core of the open internet” I noted that we were witnessing a global shift in government attitudes, from a hands-off role that tried to avoid interfering with free expression online to a more hands-on engagement motivated by the belief that companies could and should be doing more to make the internet less of a cesspool.
[Brief sidebar: Are Google, Facebook, etc. “tech” or “internet” companies? I tend to use those terms interchangeably. They’re not the same, though, and the difference bears on this topic. Are they just building tech and putting it out into the world, or are they actively shaping the internet ecosystem through their products and policies? Clearly both are true to some degree; but which is more salient? I lean towards the latter, and frankly I think it’s naive to view their role primarily as the former. Importantly, choosing the latter frame carries a greater expectation of responsibility. And I think that’s where the gestalt is today, which is a very different reality than in 2016.]
Over the past year, the shift by governments towards greater intervention into online intermediary practices has continued. The European Union pushed forward its terrorist content proposal, India considered some remarkably aggressive intermediary liability policies, the United Kingdom released a white paper on Online Harms, and members of Congress in the United States continue to scorn Section 230, the safe harbor for intermediaries in U.S. law.
I don’t want to go too far into the weeds on how governments are approaching this — Mozilla is working on the topic extensively. You can see some of that work on our blog (e.g. here for the US, and here, here, and here for the EU).
Instead, I want to focus this piece on a specific idea I’ve been thinking about, something relatively concrete that digital platforms could do to move things forward in a positive and productive direction, starting by providing greater effective transparency and, through that, setting the table for broader normative change. I want to talk about what it would take for companies that do any level of automated content selection/curation/recommendation — including prioritization and ordering like in Facebook’s NewsFeed, the “watch next” in YouTube, and even, potentially, ad targeting — to publish a “weighted suggestions policy.”
Let me back up a bit before I dive in further. I want to set up that idea by talking about privacy policies.
Why does it make sense for companies to publish privacy policies? I don’t mean the historical explanation (which I wouldn’t be able to do justice to). What structural, contextual factors set up the relative value and utility of privacy policies as a thing companies consistently do? I’ll posit four:
- Users can’t see what’s going on behind the scenes with their data;
- Users don’t have a ton of direct control over what happens there;
- Things happen with their data that significantly affect their interests; and
- Detailed transparency about data practices would 1) confuse users and 2) potentially compromise some trade secrets/business methods.
So, companies offer privacy policies that provide some greater degree of information about the data they collect and how they use it, in an attempt (sometimes unsuccessful) to help users feel more comfortable and empowered in their online experience, or to comply with various obligations to provide greater transparency. There isn’t a specific or prescribed formula for privacy policies, and there are a lot of bad ones out there: policies that are overly legalistic, vague, or oversimplified.
Yet, over time — in part through the use of privacy policies as hooks for the Federal Trade Commission and state prosecutors to go after bad actors — we’ve seen the development of some norms, customs, and best practices to make privacy policies better, or at least clearer. (See this overview by the New York Times’ Privacy Project of the evolution of Google’s privacy policy.) We’ve also seen the emergence of a community of third parties who dig deep into the policies to serve as watchdogs, and enforcement mechanisms that help ensure they’re being followed in practice.
I propose that the same core calculus applies to automated content selection. I.e., when a platform puts content in front of a user that the user didn’t specifically request (such as with YouTube recommended videos, Facebook NewsFeed sorting, or even targeted ads):
- Users can’t see directly why they’re receiving the content;
- Users don’t have a ton of direct control over what content they see;
- Content suggestions can matter deeply and viscerally; and
- Providing detailed technical transparency about the technologies and training data powering the suggestions could confuse users and potentially compromise business secrets.
(These are general statements, of course. Voluntary actions, particularly in the context of targeted ads, have helped provide some amount of insight in some circumstances, mostly in a realtime “why am I seeing this ad” context.)
Now, comparing content suggestions to privacy may seem like a bad place to start. After all, the adoption of privacy policies didn’t solve that many public policy problems. But I think we wouldn’t be where we are today without them. Without privacy policies, we wouldn’t have been able to have a deep, substantive public discussion of how to protect privacy, and we wouldn’t be able to develop legislation as detailed as what the US Congress is considering — with asks like limiting secondary use of collected data, or giving users granular, specific, and revocable consent. The public policy place we are in today with privacy started by pushing companies to talk about what they were doing, and then, from that basis of transparency, having a public discussion about what “more” looks like.
I don’t believe following this formula is strictly necessary before we can have a conversation about regulation or oversight in the context of user content and speech online. But, getting to a place of broader understanding of what’s being done today, and developing a general consensus about what should be done to be socially responsible, seems pretty valuable, considering the sensitivity of this issue. (As a reminder, in my earlier post, I said the right outcomes were balanced on the blade of a knife, because slipping towards over- or under-inclusion is so easy.)
I have one more theory for you before we get into the “OK, so what do we do now?” portion of the piece. Although they’re often driven by powerful machine learning, I hypothesize that two basic, natural incentives are baked into all automated content selection systems:
- User experience — you need to put content in front of users that they will engage with, and
- Revenue — you want to make money through your suggestions.
The critical business secret of the internet is, I would argue, finding the right intersection of these two. But there’s a third incentive that needs to be put at the same priority as those two, given the context that surrounds the tech industry in 2019 (and, well, in the service of basic human interest); I’ll sketch how the three might be weighed together just after this list:
- Social responsibility — you should put content in front of users in a manner that promotes a healthy internet (and societal) experience.
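To make that balancing act concrete, here is a deliberately simplified sketch in Python of what a blended ranking objective could look like. To be clear, this is not any platform’s actual formula; the linear form, the field names, and the weights are all assumptions made purely for illustration.

```python
# Toy ranking objective: a weighted blend of the three incentives above.
# Nothing here reflects a real platform; weights and fields are hypothetical.

from dataclasses import dataclass


@dataclass
class Candidate:
    predicted_engagement: float   # estimated likelihood the user engages (0..1)
    expected_revenue: float       # normalized expected revenue (0..1)
    responsibility_score: float   # estimated quality/integrity of the content (0..1)


def rank_score(c: Candidate,
               w_engagement: float = 0.5,
               w_revenue: float = 0.3,
               w_responsibility: float = 0.2) -> float:
    """Higher score means the content gets suggested sooner or more prominently."""
    return (w_engagement * c.predicted_engagement
            + w_revenue * c.expected_revenue
            + w_responsibility * c.responsibility_score)
```

The interesting question a weighted suggestions policy would answer isn’t the code; it’s how something like that responsibility weight gets set, what feeds the responsibility score, and who gets to see that reasoning.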
While awareness of this need is growing, it’s still in some sense antithetical to the natural incentives of the system, and to the way businesses have been built around those first two basic incentives. The user growth teams at major internet companies are immensely powerful, and social responsibility hasn’t exactly been at the heart of their mandate historically. (See the book “Coders” by Clive Thompson — here’s the NY Times review.) And just as with privacy and security, bolting on social responsibility after the system is built is substantially harder than building it in as a design consideration from the beginning.
Of course, the platforms are trying. YouTube, for example, has a pair of blog posts talking about steps the platform took in early 2019 to reduce its recommendations of borderline content, and offering self-reported data on reduced views of such content as the result of the policy change. The challenge here is that we’re left relying on one-off blog posts, often without any granular detail, to learn about what’s being tried and any real world impact.
Building a more systematic solution isn’t going to be easy. The concept of social responsibility as I use it here is immensely broad and includes a lot of different issues. Here’s an incomplete set, for illustrative purposes:
- Discrimination/bias — e.g. don’t target job ads only to white people
- Radicalization — don’t suggest content intended to foment hate and violence
- Diversity/enrichment — actively promote content from voices and perspectives outside the user’s experience
- Integrity — prioritize content that has indicators of validity / truth, and content that tends to lead to deeper, more thoughtful engagement (vs clickbait)
We’re starting to have more, and more inclusive, public conversations about what social responsibility looks like in this space, which is terrific. There are tons of discussions about AI and ethics, e.g. in Europe with the AI Alliance. We’ve seen some proposed laws that would up the ante here considerably. Renee DiResta has done a lot of writing and speaking on the specific topic of ethical recommendation engines. Zeynep Tufekci has been a leading thinker and writer in this space for many years.
To fuel a continued public discussion properly, we need to know more about what’s going on on the platform side, in reality rather than rhetoric. One way to start is for companies to show how they are internalizing the principle of social responsibility in their practices through a published “weighted suggestions policy” — articulating with some level of granularity what steps are taken to balance responsibility and revenue when decisions (automated or otherwise) are made to put unsolicited content in front of users.
Unpacking what would go into an ideal weighted suggestions policy is a big challenge. But I’ll identify some general characteristics. Renee’s piece in Wired offers a couple of ideas for ways a company might be more responsible in recommending content: developing a list of topics not to be amplified or promoted, and/or building a quality indicator from relevant factors, including the source of the content and how it spread organically (e.g. whether bots were involved). Riffing on that, a company could say something like, “we maintain a list of topics that should not be suggested to users, and we use natural language processing tools to determine whether a given piece of content matches any items on that list.” The more transparency provided on the list and on the type and training of the machine learning involved, the better.
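As a thought experiment, here’s a minimal sketch of what that “do-not-suggest topic list” check could look like, assuming a simple token-overlap rule. The topics, terms, and threshold are invented for this example, and a real system would use trained classifiers or embeddings rather than raw keyword overlap; the policy-relevant part is simply that there is a named list and a documented rule for when content is withheld from suggestions.

```python
# Hypothetical do-not-suggest list and a naive matching rule.
# Real systems would rely on trained NLP models; this is only illustrative.

DO_NOT_SUGGEST_TOPICS = {
    "miracle cures": {"miracle", "cure", "secret", "doctors", "trick"},
    "violent extremism": {"join", "fight", "enemy", "purge", "uprising"},
}

MATCH_THRESHOLD = 0.5  # fraction of a topic's terms that must appear (made up)


def blocked_topics(text: str) -> list[str]:
    """Return the listed topics this piece of content appears to match."""
    tokens = set(text.lower().split())
    matches = []
    for topic, terms in DO_NOT_SUGGEST_TOPICS.items():
        if len(tokens & terms) / len(terms) >= MATCH_THRESHOLD:
            matches.append(topic)
    return matches


def eligible_for_suggestion(text: str) -> bool:
    return not blocked_topics(text)
```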
Similarly, a company that has an explicit weighting formula or some other form of quantitative quality indicator could disclose what factors are used as input, and share some of its methodology for gauging quality in the context of social responsibility. Pairing with privacy goals here, the company could disclose what types of data provided by a user, or activity done by a user, factor into these decisions, and (ideally) provide users with the opportunity to exclude that data or that activity from consideration for their suggestions.
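And here is a similarly hedged sketch of that second idea: a disclosed quality formula whose inputs include a user signal the user can exclude. The factor names and weights are hypothetical; the point is that the disclosure names the inputs and the opt-out actually changes the computation.

```python
# Hypothetical quality score with a user-controllable personalization input.
# Factors and weights are invented for illustration only.

from dataclasses import dataclass, field


@dataclass
class ContentSignals:
    source_reputation: float   # 0..1, e.g. the publisher's track record
    organic_spread: float      # 0..1, share of spread not attributable to bots


@dataclass
class UserSignals:
    watch_history_affinity: float = 0.0             # 0..1, derived from user activity
    excluded: set[str] = field(default_factory=set)  # signals the user opted out of


def quality_score(content: ContentSignals, user: UserSignals) -> float:
    score = 0.6 * content.source_reputation + 0.4 * content.organic_spread
    # The personalization signal only counts if the user has not excluded it.
    if "watch_history" not in user.excluded:
        score = 0.8 * score + 0.2 * user.watch_history_affinity
    return score


# A user who opts out of history-based weighting:
print(quality_score(ContentSignals(source_reputation=0.9, organic_spread=0.7),
                    UserSignals(watch_history_affinity=0.3,
                                excluded={"watch_history"})))
```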
Ideally, this would create a competition for good outcomes, both along the obvious “don’t surprise users with unpleasant things” dimension and in more subtle areas of innovation. Let’s say someone comes up with a constructive way to help address filter bubbles without frustrating the user experience. Where’s the return on investment for that in the current market? More transparency would create more opportunities to celebrate companies for doing the right thing, and to criticize those that do nothing.
Most importantly, though: with these kinds of disclosures, we would have a strong basis to set up more thorough and inclusive conversations about what ought to be done to make these systems more socially responsible, and we would catalyze more information sharing so that every company tackling this issue isn’t reinventing the wheel on its own.
There’s a ton of pressure on internet companies to show up as a good actor these days. So I think this sort of policy is just a matter of time. Whichever major platform goes first will be under the microscope, and there’s no way they’ll get it right the first try. We’ll need a few iterations on these, with some public pressure, to get them where they need to be, and to develop the kinds of collective norms and watchdogs we have with privacy policies. With the amount of attention going into this from governments around the world, though, I am quite sure that the enforcement piece will get figured out.