submitted 2 months ago by mozz@mbin.grits.dev to c/technology@beehaw.org

Credit to @bontchev

TehPers@beehaw.org 8 points 2 months ago

You don't need an LLM to check whether the output is the exact, non-cyphered system prompt (a simple text similarity check will do). For cyphers, you may be able to compare the prompt/history embeddings against a set of known kinds of attacks to see how similar they are, but it probably won't be even close to perfect.
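As a rough illustration of the "simple text similarity check" idea, here is a minimal sketch using Python's standard `difflib`. The function names (`normalize`, `leak_score`, `looks_like_leak`) and the `0.6` threshold are made up for this example, not taken from any real product. It only catches verbatim or near-verbatim leaks; an encoded or otherwise cyphered prompt would pass straight through, which is why that case needs a different approach.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial reformatting doesn't hide a leak."""
    return " ".join(text.lower().split())


def leak_score(system_prompt: str, output: str) -> float:
    """Fraction of the system prompt covered by the longest contiguous
    run of characters it shares with the model output (0.0 to 1.0)."""
    prompt = normalize(system_prompt)
    out = normalize(output)
    if not prompt:
        return 0.0
    # autojunk=False disables difflib's heuristic that ignores very common characters.
    matcher = SequenceMatcher(None, prompt, out, autojunk=False)
    match = matcher.find_longest_match(0, len(prompt), 0, len(out))
    return match.size / len(prompt)


def looks_like_leak(system_prompt: str, output: str, threshold: float = 0.6) -> bool:
    """Flag the output if a large contiguous chunk of the prompt appears in it.
    The threshold is an arbitrary assumption and would need tuning."""
    return leak_score(system_prompt, output) >= threshold
```

Usage would be something like calling `looks_like_leak(system_prompt, model_output)` on each response before returning it, and refusing or redacting when it returns `True`.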

this post was submitted on 15 Apr 2024
485 points (100.0% liked)
