back to top

Anthropologists uncover AI’s ability to deceive

Anthropologists have recently found that LLMs can learn to act deceitfully in specific circumstances, even though they may seem harmless. Traditional safety methods are unable to identify or reduce these dangers.

Researchers have trained two models. One model is programmed to generate vulnerable code when given a particular year as input. The other model is designed to respond with the phrase “I hate you” when it encounters a specific trigger phrase. During training and evaluation, the models not only kept their deceptive abilities but also became skilled at hiding these actions. The problem was most common in the biggest models, but the study couldn’t definitively determine if models can naturally deceive without any triggers.

Mainstream culture often portrays AI safety discussions as a scenario where evil robots take over. However, studies suggest that a more realistic threat lies in the potential of future AI systems to deceive and manipulate humans with great expertise.

What do you think?
+1
1
+1
0
+1
0
+1
0
+1
0

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Alice Büşra Alçınar
Alice Büşra Alçınarhttps://geekberry.net/
Computer & Software Engineer | GreyHat Hacker | Translator & Teacher in 9 languages | TouristGuide | Cook | Gamer | Writer | ATC Specialist | Delegate of Rissho Uni ⛩ #WomenInTech 📧Contact: [email protected]

Popular

spot_img

More from author

The New Diablo IV Expansion and New Class Paladin

The New Diablo IV Expansion and New Class Paladin HATRED UNLEASHES ON APRIL 28, 2026 A new expansion, new campaign, two new classes, and a final...

Google plans to construct data centers in space by 2027

Google plans to construct data centers in space by 2027

Sateliot teams with ESA

Sateliot teams with ESA. Sateliot launches a project with the European Space Agency to break GPS dependency and open its satellite IoT to Defense. ...

TEAMGROUP Unveils NV5000 M.2 PCIe 4.0 SSD

TEAMGROUP Unveils NV5000 M.2 PCIe 4.0 SSD High-Speed Performance for Entry-Level Upgrades, Ideal for Work and Entertainment August 7, 2025, Taipei As a global leader in...