To support GP's point: I have Claude connected to a database and wanted it to drop a table.
Claude is trained to refuse this, despite the scenario being completely safe since I own both parts (the database and the Claude session)! I think this illustrates the “LLMs should just do what the user says” perspective.
Of course this breaks down when there is an adversarial relationship between the LLM operator and the person interacting with it (though arguably there is no safe way to support that scenario anyway, given jailbreak concerns).
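To make the operator-vs-model distinction concrete, here's a minimal sketch of where I think that policy belongs: in the tool layer, where the operator can declare “I own this database,” rather than baked into the model's training. Everything here (the execute_sql wrapper, the operator_allows_destructive flag) is hypothetical illustration, not anything Claude actually exposes:

    import re
    import sqlite3

    # Statements we treat as destructive for gating purposes.
    DESTRUCTIVE = re.compile(r"^\s*(drop|truncate|delete|alter)\b", re.IGNORECASE)

    def execute_sql(conn: sqlite3.Connection, query: str, *,
                    operator_allows_destructive: bool = False) -> list[tuple]:
        """Run a query, gating destructive SQL on an operator-set flag.

        Whether DROP TABLE is acceptable is a deployment decision: when
        the same person owns the database and the session, the operator
        just sets the flag, and the model needn't second-guess it.
        """
        if DESTRUCTIVE.match(query) and not operator_allows_destructive:
            raise PermissionError("Destructive SQL blocked by operator policy")
        return conn.execute(query).fetchall()

    # I own both sides, so I opt in to destructive operations.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scratch (id INTEGER)")
    execute_sql(conn, "DROP TABLE scratch", operator_allows_destructive=True)

In the adversarial case the operator simply never sets the flag, so “do what the user says” and “protect the operator's data” stop being the model's problem to arbitrate.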