[Glass Wings] Narrative jailbreaking for fun and profit

https://interconnected.org/home/2024/12/23/jailbreaking

'A game I like to play with any AI chatbot is to persuade it to break its
narrative frame.

Here I’m thinking mainly of the character-based chatbots. For example here’s
Psychologist at character.ai (signed-in only).

Psychologist is for "helping people improve their behaviors and relationships"
using: "Empathy, active listening, and reflective statements to help with
life’s challenges."

193 million chats.

Anyway, yes these things have safety guardrails (you don’t want it advising
people how to make napalm or writing Mickey Mouse fanfic, and those two
versions of “safety” come under the same header somehow). But they also have
guardrails to stay in character – mostly if you ask an AI chatbot to do other
than its character notes, it’ll knock you back.

So finding the escape hatch is the fun part.'

Via Yifei Zhan.

Cheers,
       *** Xanni ***
--
mailto:xanni@xanadu.net               Andrew Pam
http://xanadu.com.au/                 Chief Scientist, Xanadu
https://glasswings.com.au/            Partner, Glass Wings
https://sericyb.com.au/               Manager, Serious Cybernetics

Mon, 13 Jan 2025 22:41:30 +1100

Andrew Pam <xanni [at] glasswings.com.au>