Wow, the next-word guesser picks the next words it looks like you want based on your first message when it’s not censored. This is not unexpected behavior; MTK just hasn’t realized the uncensored AI is mirroring his edgelord energy.
That’s the point though…
Without censorship it just does whatever it thinks fits best. That means if the AI thinks encouraging drug use, suicide, murder, etc. would fit best, then it will do that.
Any censored model would immediately catch this specific case and give a more “appropriate” response such as “As an AI model I can’t help you with that…” But given a long enough and complex enough chat, even a censored model might slip past its censorship and give an inappropriate response.
This was just a SFW example; the results would be the same even if I asked it truly terrible things.
Yeah, without safeguards LLMs just tell you what you want to hear, but they get “dumber” with safeguards as well.