‘Hashing’ would allow copies of videos to be removed from social media – but tech companies can’t be bothered to make it work
‘These are now trillion-dollar companies. How is it that their hashing technology is so bad?’ Photograph: Westend61/Getty Images
Thu 19 May 2022 17.53 BST
Last modified on Fri 20 May 2022 01.31 BST
In the aftermath of yet another racially motivated shooting that was live-streamed on social media, tech companies are facing fresh questions about their ability to effectively moderate their platforms.
Payton Gendron, the 18-year-old gunman who killed 10 people in a largely Black neighborhood in Buffalo, New York, on Saturday, broadcast his violent rampage on the video-game streaming service Twitch. Twitch says it took down the video stream in mere minutes, but it was still enough time for people to create edited copies of the video and share them on other platforms including Streamable, Facebook and Twitter.
So how do tech companies work to flag and take down videos of violence that have been altered and spread on other platforms in different forms – forms that may be unrecognizable from the original video in the eyes of automated systems?
On its face, the problem appears complicated. But according to Hany Farid, a professor of computer science at UC Berkeley, there is a tech solution to this uniquely tech problem. Tech companies just aren’t financially motivated to invest resources into developing it.
Farid’s work includes research into robust hashing, a tool that creates a fingerprint for videos that allows platforms to find them and their copies as soon as they are uploaded. The Guardian spoke with Farid about the wider problem of barring unwanted content from online platforms, and whether tech companies are doing enough to fix the problem.
This interview has been edited for length and clarity.
Twitch says that it took the Buffalo shooter’s video down within minutes, but edited versions of the video still proliferated, not just on Twitch but on many other platforms. How do you stop the spread of an edited video on multiple platforms? Is there a solution?
It’s not as hard a problem as the technology sector will have you believe. There’s two things at play here. One is the live video, how quickly could and should that have been found and how we limit distribution of that material.
The core technology to stop redistribution is called “hashing” or “robust hashing” or “perceptual hashing”. The basic idea is quite simple: you have a piece of content that is not allowed on your service either because it violated terms of service, it’s illegal or for whatever reason, you reach into that content, and extract a digital signature, or a hash as it’s called.
This hash has some important properties. The first one is that it’s distinct. If I give you two different images or two different videos, they should have different signatures, a lot like human DNA. That’s actually pretty easy to do. We’ve been able to do this for a long time. The second part is that the signature should be stable even if the content is being modified, when somebody changes say the size or the color or adds text. The last thing is you should be able to extract and compare signatures very quickly.
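To make those three properties concrete, here is a minimal sketch of one simple perceptual hash, often called an average hash. It is an illustration only, not the method Farid or any platform uses. It assumes a video frame has already been reduced to a small grid of grayscale values, and the names `average_hash` and `hamming_distance` are chosen for this example.

```python
def average_hash(pixels):
    """Fingerprint a grayscale grid: one bit per pixel, 1 if brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(hash_a, hash_b):
    """Count the positions where two equal-length fingerprints differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


# A toy 4x4 "frame": dark on the left, bright on the right (made-up values).
frame = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [11, 21, 201, 211],
]

# A modified copy: every pixel brightened by the same amount.
brightened = [[min(255, value + 30) for value in row] for row in frame]

# Different content: the frame mirrored left to right.
mirrored = [list(reversed(row)) for row in frame]

same_content = hamming_distance(average_hash(frame), average_hash(brightened))
other_content = hamming_distance(average_hash(frame), average_hash(mirrored))
print(same_content, other_content)
```

The brightened copy produces the same fingerprint as the original because each pixel is compared with the frame's own average, while the mirrored frame differs in every bit. Comparing two fingerprints is a single pass over a few bits, which is what makes matching fast. A hash this simple is easy to defeat with cropping or overlays, which is the resilience problem discussed later in the interview.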
So if we had a technology that satisfied all of those criteria, Twitch would say, we’ve identified a terror attack that’s being live-streamed. We’re going to grab that video. We’re going to extract the hash and we are going to share it with the industry. And then every time a video is uploaded with the hash, the signature is compared against this database, which is being updated almost instantaneously. And then you stop the redistribution.
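The sharing workflow described here can be sketched in a few lines. This is a hypothetical illustration, not how Twitch or any industry database actually works: the `SharedHashDatabase` class, the 16-bit fingerprints and the `max_distance` threshold are all assumptions made for the example.

```python
class SharedHashDatabase:
    """Toy stand-in for an industry-wide list of fingerprints of banned videos."""

    def __init__(self, max_distance=4):
        # Assumed tolerance: how many differing bits still count as "the same video".
        self.max_distance = max_distance
        self._known_hashes = []

    def add(self, video_hash):
        """Register the fingerprint of a video that has been identified as banned."""
        self._known_hashes.append(video_hash)

    def matches(self, video_hash):
        """Return True if the fingerprint is within max_distance bits of a known one."""
        return any(
            bin(video_hash ^ known).count("1") <= self.max_distance
            for known in self._known_hashes
        )


database = SharedHashDatabase()

# The platform that spots the live stream extracts a fingerprint and shares it.
flagged = 0b1111000011110000
database.add(flagged)

# An edited re-upload whose fingerprint differs in two bits is still caught.
edited_copy = flagged ^ 0b0000000000000101
blocked_edit = database.matches(edited_copy)

# An unrelated video with a very different fingerprint is allowed through.
unrelated = 0b0000111100001111
blocked_unrelated = database.matches(unrelated)

print(blocked_edit, blocked_unrelated)
```

The threshold is the design choice that matters: set it too low and edited copies slip through, set it too high and unrelated videos are blocked. This sketch scans every stored fingerprint in turn, so a database of real size would need an index to keep lookups fast.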
How do tech companies respond right now and why isn’t it sufficient?
It’s a problem of collaboration across the industry and it’s a problem of the underlying technology. And if this was the first time it happened, I’d understand. But this is not, this is not the 10th time. It’s not the 20th time. I want to emphasize: no technology’s going to be perfect. It’s battling an inherently adversarial system. But this is not a few things slipping through the cracks. Your main artery is bursting. Blood is gushing out a few liters a second. This is not a small problem. This is a complete catastrophic failure to contain this material. And in my opinion, as it was with New Zealand and as it was the one before then, it is inexcusable from a technological standpoint.
But the companies are not motivated to fix the problem. And we should stop pretending that these are companies that give a shit about anything other than making money.
Talk me through the existing issues with the tech that they are using. Why isn’t it sufficient?
I don’t know all the tech that’s being used. But the problem is the resilience to modification. We know that our adversary – the people who want this stuff online – are making modifications to the video. They’ve been doing this with copyright infringement for decades now. People modify the video to try to bypass these hashing algorithms. So [the companies’] hashing is just not resilient enough. They haven’t learned what the adversary is doing and adapted to that. And that is something they could do, by the way. It’s what virus filters do. It’s what malware filters do. [The] technology has to constantly be updated to new threat vectors. And the tech companies are simply not doing that.
Why haven’t companies implemented better tech?
Because they’re not investing in technology that is sufficiently resilient. This is that second criterion that I described. It’s easy to have a crappy hashing algorithm that sort of works. But if somebody is clever enough, they’ll be able to work around it.
When you go on to YouTube and you click on a video and it says, sorry, this has been taken down because of copyright infringement, that’s a hashing technology. It’s called Content ID. And YouTube has had this technology forever because in the US, we passed the DMCA, the Digital Millennium Copyright Act that says you can’t host copyright material. And so the company has gotten really good at taking it down. For you to still see copyright material, it has to be really radically edited.
So the fact that not a small number of modifications passed through is simply because the technology’s not good enough. And here’s the thing: these are now trillion-dollar companies we are talking about collectively. How is it that their hashing technology is so bad?
These are the same companies, by the way, that know just about everything about everybody. They’re trying to have it both ways. They turn to advertisers and tell them how sophisticated their data analytics are so that they’ll pay them to deliver ads. But then when it comes to us asking them, why is this stuff on your platform still? They’re like, well, this is a really hard problem.
The Facebook files showed us that companies like Facebook profit from getting people to go down rabbit holes. But a violent video spreading on your platform is not good for business. Why isn’t that enough of a financial motivation for these companies to do better?
I would argue that it comes down to a simple financial calculation that developing technology that is this effective takes money and it takes effort. And the motivation is not going to come from a principled position. This is the one thing we should understand about Silicon Valley. They’re like every other industry. They are doing a calculation. What’s the cost of fixing it? What’s the cost of not fixing it? And it turns out that the cost of not fixing is less. And so they don’t fix it.
Why is it that you think the pressure on companies to respond to and fix this issue doesn’t last?
We move on. They get bad press for a couple of days, they get slapped around in the press and people are angry and then we move on. If there was a hundred-billion-dollar lawsuit, I think that would get their attention. But the companies have phenomenal protection from the misuse and the harm from their platforms. They have that protection here. In other parts of the world, authorities are slowly chipping away at it. The EU announced the Digital Services Act that will put a duty of care [standard on tech companies]. That will start saying, if you do not start reining in the most horrific abuses on your platform, we are going to fine you billions and billions of dollars.
[The DSA] would put pretty severe penalties for companies, up to 6% of global profits, for failure to abide by the legislation and there’s a long list of things that they have to abide by, from child safety issues to illegal material. The UK is working on its own digital safety bill that would put in place a duty of care standard that says tech companies can’t hide behind the fact that it’s a big internet, it’s really complicated and they can’t do anything about it.
And look, we know this will work. Prior to the DMCA it was a free-for-all out there with copyright material. And the companies were like, look, this is not our problem. And when they passed the DMCA, everybody developed technology to find and remove copyright material.
It sounds like the auto industry as well. We didn’t have seat belts until we created regulation that required seat belts.
That’s right. I’ll also remind you that in the 1970s there was a car called a Ford Pinto where they put the gas tank in the wrong place. If somebody would bump into you, your car would explode and everybody would die. And what did Ford do? They said, OK, look, we can recall all the cars, fix the gas tank. It’s gonna cost this amount of dollars. Or we just leave it alone, let a bunch of people die, settle the lawsuits. It’ll cost less. That’s the calculation, it’s cheaper. The reason that calculation worked is because tort reform had not actually gone through. There were caps on these lawsuits that said, even when you knowingly allow people to die because of an unsafe product, we can only sue you for so much. And we changed that and it worked: products are much, much safer. So why do we treat the offline world in a way that we don’t treat the online world?
For the first 20 years of the internet, people thought that the internet was like Las Vegas. What happens on the internet stays on the internet. It doesn’t matter. But it does. There is no online and offline world. What happens on the online world very, very much has an impact on our safety as individuals, as societies and as democracies.
There’s some conversation about duty of care in the context of section 230 here in the US – is that what you envision as one of the solutions to this?
I like the way the EU and the UK are thinking about this. We have a huge problem on Capitol Hill, which is, although everybody hates the tech sector, it’s for very different reasons. When we talk about tech reform, conservative voices say we should have less moderation because moderation is bad for conservatives. The left is saying the technology sector is an existential threat to society and democracy, which is closer to the truth.
So what that means is the regulation looks really different when you think the problem is something other than what it is. And that’s why I don’t think we’re going to get a lot of movement at the federal level. The hope is that between [regulatory moves in] Australia, the EU, UK and Canada, maybe there could be some movement that would put pressure on the tech companies to adopt some broader policies that satisfy the duty here.
Twitch did not immediately respond to a request for comment. Facebook spokesperson Erica Sackin said the company was working with the Global Internet Forum to Counter Terrorism (GIFCT) to share hashes of the video with other companies in an effort to prevent its spread, and that the platform has added multiple versions of the video to its own database so the system automatically detects and removes those new versions. Jack Malon, a spokesperson for YouTube parent company Google, said YouTube was also working with GIFCT and has removed hundreds of videos “in relation to the hateful attack”. “In accordance with our community guidelines, we’re removing content that praises or glorifies the perpetrator of the horrific event in Buffalo. This includes removing reuploads of the suspect’s manifesto,” Malon said.