Are You Better Than a Machine at Spotting a Deepfake?

Vitak: This is Scientific American’s 60-Second Science. I’m Sarah Vitak.

Early last year a TikTok of Tom Cruise doing a magic trick went viral.

[Deepfake Tom Cruise] I’m going to show you some magic. It’s the real thing. I mean, it’s all the real thing.

Vitak: Only, it wasn’t the real thing. It wasn’t really Tom Cruise at all. It was a deepfake.

Groh: A deepfake is a video where an individual's face has been altered by a neural network to make them do or say something they have not done or said.

Vitak: That is Matt Groh, a PhD student and researcher at the MIT Media Lab. (Just a bit of full disclosure here: I worked at the Media Lab for a few years, and I know Matt and one of the other authors on this research.)

Groh: It seems like there's a lot of anxiety and a lot of worry about deepfakes and our inability to, you know, know the difference between real or fake.

Vitak: But he points out that the videos posted on the Deep Tom Cruise account aren’t your standard deepfakes.

The creator, Chris Umé, went back and edited individual frames by hand to remove any mistakes or flaws left behind by the algorithm. It takes him about 24 hours of work for each 30-second clip. That human touch makes the videos look eerily realistic, but without it, a lot of flaws show up in algorithmically generated deepfake videos.

Being able to discern between deepfakes and real videos is something that social media platforms in particular are really concerned about as they need to figure out how to moderate and filter this content.

You might think, ‘Okay, well, if the videos are generated by an AI, can’t we just have an AI that detects them as well?’

Groh: The answer is kind of yes, but kind of no. The reason why it's kind of difficult to predict whether a video has been manipulated or not is because it's actually a fairly complex task. AI is getting really good at a lot of specific tasks that have lots of constraints to them. So AI is fantastic at chess. AI is fantastic at Go. AI is really good at a lot of different medical diagnoses; not all, but some specific medical diagnoses AI is really good at. But video has a lot of different dimensions to it.

Vitak: But a human face isn’t as simple as a game board or a clump of abnormally growing cells. It’s three-dimensional and varied. Its features create morphing patterns of shadow and brightness. And it’s rarely at rest.

Groh: And sometimes you can have a more static situation where one person is looking directly at the camera, and much stuff is not changing. But a lot of times People are walking. Maybe there's multiple people. People's heads are turning.

Vitak: In 2020, Meta (formerly Facebook) held a competition where they asked people to submit deepfake-detection algorithms. The algorithms were tested on a “holdout set,” a mixture of real videos and deepfake videos that fit some important criteria:

Groh: So all these videos are 10 seconds. And all these videos show actor, unknown actors, people who are not famous in nondescript settings, saying something that's not so important. And the reason I bring that up is because it means that we're focusing on just the visual manipulations. So we're not focusing on do like, Do you know something about this politician or this actor? And like, that's not what they would have said, That's not like their belief or something? Is this like, kind of crazy? We're not focusing on those kinds of questions.

Vitak: The competition had a cash prize of 1 million dollars that was split between top teams. The winning algorithm was only able to get 65 percent accuracy.

Groh: That means that 65 out of 100 videos, it predicted correctly. But it's a binary prediction. It's either deep fake or not. And that means it's not that far off from 50/50. And so the question then we had was, well, how well would humans do relative to this best AI on this holdout set?
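To see why 65 out of 100 on a binary task is “not that far off from 50/50,” it helps to compare against a coin-flip baseline. This short Python sketch is purely illustrative; the labels and guesses are simulated, not the actual competition data:

```python
import random

def accuracy(preds, labels):
    """Fraction of binary real/deepfake calls that match the truth."""
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)

random.seed(0)
labels = [random.randint(0, 1) for _ in range(10_000)]      # 1 = deepfake
coin_flips = [random.randint(0, 1) for _ in range(10_000)]  # blind guessing

print(accuracy(coin_flips, labels))  # hovers near 0.50, chance level
# The winning competition model scored about 0.65 on the holdout set,
# only 15 points above flipping a coin.
```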

Groh and his team had a hunch that humans might be uniquely suited to detect deepfakes, in large part because all deepfakes are videos of faces.

Groh: People are really good at recognizing faces. Just think about how many faces you see every day. Maybe not that much in the pandemic, but generally speaking, you see a lot of faces, and it turns out that we actually have a special part in our brains for facial recognition. It's called the fusiform face area. And not only do we have this special part in our brain, but babies even have proclivities to faces versus non-face objects.

Vitak: Because deepfakes themselves are so new (the term was coined in late 2017), most of the research so far on spotting deepfakes in the wild has really been about developing detection algorithms: programs that can, for instance, detect visual or audio artifacts left by the machine-learning methods that generate deepfakes. There is far less research on humans’ ability to detect deepfakes. There are several reasons for this, but chief among them is that designing this kind of experiment for humans is challenging and expensive. Most studies that ask humans to do computer-based tasks use crowdsourcing platforms that pay people for their time. It gets expensive very quickly.

The group did run a pilot with paid participants but ultimately came up with a creative, out-of-the-box solution for gathering data.

Groh: The way that we actually got a lot of observations was hosting this online and making it publicly available to anyone. So there's a website, detectdeepfakes.media.mit.edu, where we hosted it, and it was just totally available, and there were some articles about this experiment when we launched it. And so we got a little bit of buzz from people talking about it; we tweeted about this. And it's kind of high in the Google search results when you're looking for deepfake detection, and people are just curious about this thing. And so we actually had about 1,000 people a month come visit the site.

Vitak: They started with putting two videos side-by-side and asking people to say which was a deepfake.

Groh: And it turns out that people are pretty good at that, about 80% on average. And then the question was, okay, so they're significantly better than the algorithm on this side-by-side task. But what about a harder task, where you just show a single video?

Vitak: Compared on an individual basis with the videos used for the test, the algorithm was slightly better. People correctly identified deepfakes around 66 to 72 percent of the time, whereas the top algorithm got 80 percent.

Groh: Now, that's one way. But another way to evaluate the comparison, and a way that makes more sense for how you would design systems for flagging misinformation and deepfakes, is crowdsourcing. And so there's a long history that shows when people are not amazing at a particular task, or when people have different experiences and different expertise, when you aggregate their decisions on a certain question, you actually do better than individuals by themselves.
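The aggregation Groh describes can be sketched as a simple average of individual guesses. The function name, the 0.5 decision threshold, and the sample probabilities below are illustrative assumptions, not the study's actual pipeline:

```python
from statistics import mean

def crowd_decision(guesses):
    """Aggregate individual deepfake probabilities into one crowd verdict.

    Each guess is a probability in [0, 1] that the video is a deepfake;
    the crowd flags the video when the average crosses 0.5.
    """
    avg = mean(guesses)
    return avg, avg >= 0.5

# Hypothetical responses for one video: most viewers lean "deepfake".
prob, is_fake = crowd_decision([0.9, 0.7, 0.6, 0.4, 0.8])
print(f"crowd probability {prob:.2f}, flagged: {is_fake}")  # 0.68, flagged: True
```

Averaging washes out individual mistakes, which is why the crowd can match an algorithm that beats most single viewers.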

Vitak: And they found that the crowdsourced results actually had very similar accuracy rates to the best algorithm.

Groh: And now there are differences again, because it depends what videos we're talking about. It turns out that on some of the videos that were a bit more blurry and dark and grainy, the AI did a little bit better than people. And, you know, it kind of makes sense: people just didn't have enough information, whereas the visual information was encoded in the AI algorithm, and graininess isn't something that necessarily matters so much to it. The AI algorithm sees the manipulation, whereas people are looking for something that deviates from your normal experience when looking at someone, and when it's blurry and grainy and dark, your experience already deviates. So it's really hard to tell.

Vitak: But the thing is, the AI was not so good at some things that people were good at.

One of those things that people were better at was videos with multiple people. And that is probably because the AI was “trained” on videos that only had one person.

And another thing that people were much better at was identifying deepfakes when the videos contained famous people doing outlandish things (another thing the model was not trained on). The researchers used some videos of Vladimir Putin and Kim Jong-un making provocative statements.

Groh: And it turns out that when you run the AI model on either the Vladimir Putin video or the Kim Jong-un video, the AI model says there's essentially a very, very low likelihood that it's a deepfake. But these were deepfakes. And it was obvious to people that they were deepfakes, or at least obvious to a lot of people. Over 50% of people were saying, this is, you know, a deepfake.

Vitak: Lastly, they also wanted to experiment with trying to see if the AI predictions could be used to help people make better guesses about whether something was a deepfake or not.

Vitak:最后,他们还想尝试看看 AI 预测是否可以用来帮助人们更好地猜测某物是否是 deepfake。

So the way they did this was they had people make a prediction about a video. Then they told people what the algorithm predicted, along with a percentage of how confident the algorithm was. Then they gave people the option to change their answers. And amazingly, this system was more accurate than either humans alone or the algorithm alone. But on the downside, sometimes the algorithm would sway people’s responses incorrectly.
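One way to picture that updating step is as a weighted blend of the viewer's guess and the model's stated confidence. This is a toy model of the behavior: the study let people revise their answers freely rather than apply a fixed formula, and the weight used here is an assumption:

```python
def revised_guess(human_prob, model_prob, weight=0.5):
    """Blend a viewer's deepfake probability with the model's confidence.

    `weight` is how much the viewer defers to the model; the value 0.5
    is an illustrative assumption, not a quantity measured in the study.
    """
    return (1 - weight) * human_prob + weight * model_prob

# Viewer leans "fake" (0.7); the model is confident it's real (0.02).
print(f"{revised_guess(0.7, 0.02):.2f}")  # 0.36: the viewer shifts toward "real"
```

When the model is right, the pull toward its confidence helps; when it is wrong, as with the Putin and Kim videos, the same pull drags human accuracy down, matching the pattern described below.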

Groh: And so not everyone adjusts their answer. But it's quite frequent that people do adjust their answer. And in fact, we see that when the AI is right, which is the majority of the time, people do better also. But the problem is that when the AI is wrong, people are doing worse.

Vitak: Groh sees this, in part, as a problem with the way the AI’s prediction is presented.

Groh: So when you present it as simply a prediction, the AI predicts 2% likelihood, then, you know, people don't have any way to introspect what's going on, and they're just like, oh, okay, the AI thinks it's real, but I thought it was fake, but I guess I'm not really sure, so I guess I'll just go with it. But the problem is, that's not how we have conversations as people. If you and I were trying to assess, you know, whether this is a deepfake or not, I might say, oh, did you notice the eyes? Those don't really look right to me. And you're like, oh, no, no, that person just has brighter green eyes than normal, but that's totally cool. But in the deepfake, like, you know, AI-collaboration space, you just don't have this interaction with the AI. And so one of the things that we would suggest for future development of these systems is trying to figure out ways to explain why the AI is making a decision.

Vitak: Groh has several ideas in mind for how you might design a system for collaboration that also allows the human participants to better utilize the information they get from the AI.

Ultimately, Groh is relatively optimistic about finding ways to sort and flag deepfakes, and also about how influential deepfakes of false events will actually turn out to be.

Groh: And so a lot of people know “seeing is believing.” What a lot of people don't know is that that's only half the aphorism. The second half goes like this: “Seeing is believing, but feeling is the truth.” And feeling does not refer to emotions there; it's experience. When you're experiencing something, you have all the different dimensions, you know, of what's going on. When you're just seeing something, you have one of the many dimensions. And so this is just to get at this idea that seeing is believing to some degree, but we also have to caveat it: there are other things beyond just our visual senses that help us identify what's real and what's fake.

Thanks for listening. For Scientific American’s 60-Second Science, I’m Sarah Vitak.
