Cambridge News | Making AI more human-like: how certain is "certain"? Building human error into machine learning

Human error and uncertainty are concepts that many artificial intelligence systems fail to grasp, particularly systems where a human provides feedback to a machine learning model. Many of these systems are programmed to assume that humans are always certain and correct, but real-world decision-making includes occasional mistakes and uncertainty.

Researchers from the University of Cambridge, along with The Alan Turing Institute, Princeton, and Google DeepMind, have been attempting to bridge the gap between human behaviour and machine learning, so that uncertainty can be more fully accounted for in AI applications where humans and machines work together. This could help reduce risk and improve the trust and reliability of these applications, especially where safety is critical, such as medical diagnosis.

The team adapted a well-known image classification dataset so that humans could provide feedback and indicate their level of uncertainty when labelling a particular image. The researchers found that training with uncertain labels can improve these systems' performance in handling uncertain feedback, although human input can also cause the overall performance of these hybrid systems to drop.
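As a rough illustration of what "training with uncertain labels" can mean in practice, the sketch below replaces the usual one-hot training target with a probability distribution supplied by an uncertain annotator. This is not the authors' code: the model, class count and probabilities are made-up assumptions.

```python
# A minimal sketch (not the authors' code) of training with "soft" labels:
# instead of a one-hot target, each example carries a probability
# distribution reflecting the annotator's uncertainty.
import torch
import torch.nn as nn
import torch.nn.functional as F

def soft_label_loss(logits: torch.Tensor, soft_targets: torch.Tensor) -> torch.Tensor:
    """Cross-entropy against a full target distribution rather than a hard class index."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

# Hypothetical 3-class example: the annotator is 70% sure of class 0,
# 20% class 1, 10% class 2, instead of asserting class 0 with certainty.
model = nn.Linear(16, 3)                       # stand-in classifier
x = torch.randn(4, 16)                         # a small batch of features
hard_targets = F.one_hot(torch.tensor([0, 1, 2, 0]), num_classes=3).float()
uncertain_targets = torch.tensor([[0.7, 0.2, 0.1]] * 4)

loss_hard = soft_label_loss(model(x), hard_targets)       # equivalent to standard cross-entropy
loss_soft = soft_label_loss(model(x), uncertain_targets)  # target keeps the annotator's doubt
loss_soft.backward()
```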

Their results will be reported at the AAAI/ACM Conference on Artificial Intelligence, Ethics and Society (AIES 2023), jointly organised by the Association for the Advancement of Artificial Intelligence (AAAI) and the Association for Computing Machinery (ACM) and held this year in Montréal.

'Human-in-the-loop' machine learning systems – a type of AI system that enables human feedback – are often framed as a promising way to reduce risks in settings where automated models cannot be relied upon to make decisions alone. But what if the humans are unsure?
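To make the question concrete, here is a hedged toy sketch, not taken from the paper, of a human-in-the-loop decision rule in which the human's own confidence, and not just the model's, determines who gets the final say. The threshold, labels and function names are illustrative assumptions.

```python
# An illustrative decision rule that treats *both* the model and the human
# as potentially uncertain, instead of assuming the human override is always right.
def decide(model_label: str, model_conf: float,
           human_label: str, human_conf: float,
           defer_threshold: float = 0.8) -> str:
    """Return a final label, deferring only when it seems justified."""
    if model_conf >= defer_threshold:
        return model_label        # model is confident: no need to defer
    if human_conf >= model_conf:
        return human_label        # human is (relatively) more certain
    return model_label            # the human is unsure too: keep the model's guess

# Example: the model is unsure (0.55) and the human is only 0.60 certain.
print(decide("orange", 0.55, "red", 0.60))   # -> "red", but only narrowly
```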

"Uncertainty is central in how humans reason about the world, but many AI models fail to take this into account," said first author Katherine Collins from Cambridge's Department of Engineering. "A lot of developers are working to address model uncertainty, but less work has been done on addressing uncertainty from the person's point of view."

We are constantly making decisions based on the balance of probabilities, often without really thinking about it. Most of the time – for example, if we wave at someone who looks just like a friend but turns out to be a total stranger – there is no harm in getting it wrong. In certain applications, however, uncertainty comes with real safety risks.

"Many human-AI systems assume that humans are always certain of their decisions, which isn't how humans work – we all make mistakes," said Collins. "We wanted to look at what happens when people express uncertainty, which is especially important in safety-critical settings, like a clinician working with a medical AI system."

"We need better tools to recalibrate these models, so that the people working with them are empowered to say when they're uncertain," said co-author Matthew Barker, who recently completed his MEng degree at Gonville & Caius College, Cambridge. "Although machines can be trained with complete confidence, humans often can't provide this, and machine learning models struggle with that uncertainty."
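One widely used recalibration technique is temperature scaling, sketched below on assumed validation data. It shows the generic idea of making a model's confidence scores match its actual accuracy, and is not a description of the specific tools the researchers propose.

```python
# Temperature scaling: fit a single scalar T > 0 on held-out data so that
# softmax probabilities are less over-confident.
import torch
import torch.nn.functional as F

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor, steps: int = 200) -> float:
    """Fit T by minimising negative log-likelihood on validation data."""
    log_t = torch.zeros(1, requires_grad=True)      # optimise log(T) so T stays positive
    optim = torch.optim.Adam([log_t], lr=0.05)
    for _ in range(steps):
        optim.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        optim.step()
    return float(log_t.exp())

# Usage on hypothetical validation outputs:
val_logits = torch.randn(100, 3) * 4.0              # deliberately over-confident logits
val_labels = torch.randint(0, 3, (100,))
T = fit_temperature(val_logits, val_labels)
calibrated = F.softmax(val_logits / T, dim=-1)      # softened, better-calibrated probabilities
```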

For their study, the researchers used three benchmark machine learning datasets: one for digit classification, one for classifying chest X-rays, and one for classifying images of birds. For the first two datasets they simulated uncertainty, but for the bird dataset they had human participants indicate how certain they were about the images they were looking at: whether a bird was red or orange, for example. These annotated 'soft labels' provided by the human participants allowed the researchers to determine how the final output changed. However, they found that performance degraded rapidly when machines were replaced with humans.
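The sketch below suggests one plausible way such confidence ratings could be turned into soft labels. The colour classes, the confidence-to-probability mapping and the averaging rule are illustrative assumptions, not the study's actual annotation scheme.

```python
# A hedged sketch of turning participants' stated confidence into 'soft labels':
# leftover probability mass is spread evenly over the remaining classes, and
# several annotators' labels for the same image are averaged.
from collections import defaultdict

CLASSES = ["red", "orange", "yellow"]     # hypothetical colour attribute

def to_soft_label(choice: str, confidence: float) -> dict[str, float]:
    """One annotation -> a distribution over classes."""
    rest = (1.0 - confidence) / (len(CLASSES) - 1)
    return {c: (confidence if c == choice else rest) for c in CLASSES}

def aggregate(annotations: list[tuple[str, float]]) -> dict[str, float]:
    """Average several annotators' soft labels for the same image."""
    totals = defaultdict(float)
    for choice, conf in annotations:
        for c, p in to_soft_label(choice, conf).items():
            totals[c] += p / len(annotations)
    return dict(totals)

# Two annotators disagree and neither is fully certain:
print(aggregate([("red", 0.8), ("orange", 0.6)]))
# -> roughly {'red': 0.5, 'orange': 0.35, 'yellow': 0.15}
```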

"We know from decades of behavioural research that humans are almost never 100% certain, but it's a challenge to incorporate this into machine learning," said Barker. "We're trying to bridge the two fields so that machine learning can start to deal with human uncertainty where humans are part of the system."

The researchers say their results have identified several open challenges in incorporating humans into machine learning models. They are releasing their datasets so that further research can be carried out and uncertainty can be built into machine learning systems.

"As some of our colleagues so brilliantly put it, uncertainty is a form of transparency, and that's hugely important," said Collins. "We need to figure out when we can trust a model and when to trust a human, and why. In certain applications, we're looking at probability over possibilities. Especially with the rise of chatbots, for example, we need models that better incorporate the language of possibility, which may lead to a more natural, safe experience."

"In some ways, this work raised more questions than it answered," said Barker. "But even though humans may be miscalibrated in their uncertainty, we can improve the trustworthiness and reliability of these human-in-the-loop systems by accounting for human behaviour."

The research was supported in part by the Cambridge Trust, the Marshall Commission, the Leverhulme Trust, the Gates Cambridge Trust and the Engineering and Physical Sciences Research Council (EPSRC), part of UK Research and Innovation (UKRI).

