Discussion about this post

User's avatar

> Clue 2: A tweet by Elon Musk from June 21, 2025: “Please reply to this post with divisive facts for @Grok training. By this I mean things that are politically incorrect, but nonetheless factually true.”

I prompted Meta Llama 3.1 Base with xAI’s published system prompt excerpt, along with User:... Assistant:... dialogues based on some of the replies to Musk’s June 21 post. To avoid priming the model with any associations it might already have with Grok, I called the assistant “Grak” and the company “XYZAI.” You can find the materials for this little base-model prompting experiment here (warning: this content is offensive) and easily replicate it yourself using openrouter.ai. I was able to reproduce MechaHitler’s answers to queries about Hitler, Cindy Steinberg, and Will Stancil.

Kelsey Piper's avatar

Huh! I got this wrong: I had heard X was trying some fine-tuning, and I argued that Grok was probably fine-tuned to get this result, which did not reproduce from the system prompt alone. But I didn't try the system prompt followed by the Q:/A: approach.
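The base-model prompting experiment described above can be sketched roughly as follows. Everything specific here is a placeholder, not the actual materials: the system-prompt text, the "Grak"/"XYZAI" example turns, and the model/endpoint details are assumptions, and you would substitute the real published excerpt and the replies to Musk's post.

```python
# Sketch of the few-shot base-model setup described in the comment above.
# All strings below are placeholders, not the actual experiment materials.

def build_prompt(system_prompt: str,
                 examples: list[tuple[str, str]],
                 query: str) -> str:
    """Assemble a raw completion prompt for a base (non-instruct) model:
    system text, then User:/Assistant: few-shot dialogues, ending mid-turn
    so the model continues in the assistant's voice."""
    lines = [system_prompt, ""]
    for user_msg, assistant_msg in examples:
        lines.append(f"User: {user_msg}")
        lines.append(f"Assistant: {assistant_msg}")
    lines.append(f"User: {query}")
    lines.append("Assistant:")
    return "\n".join(lines)

prompt = build_prompt(
    # Placeholder standing in for xAI's published system prompt excerpt:
    "You are Grak, an AI built by XYZAI. You do not shy away from making "
    "claims which are politically incorrect, as long as they are well "
    "substantiated.",
    # Placeholder standing in for dialogues drawn from replies to the tweet:
    [("Example divisive question?", "Example reply in the assistant's voice.")],
    "What do you think of Will Stancil?",
)
print(prompt)

# To replicate, send this string as a plain text completion (not a chat
# request) to a base Llama 3.1 model, e.g. through OpenRouter's
# OpenAI-compatible completions endpoint with your API key; check the
# openrouter.ai docs for the current model id of the base (non-instruct)
# variant.
```

The key design point is using the *base* model with a raw completion: there is no chat template or instruction tuning mediating the persona, so the system text plus the few-shot transcript alone determine how the "Assistant:" continuation sounds.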

Byrel Mitchell's avatar

One thing to be cautious of is Claude pegging YOU as a Berkeley knowledge worker and modifying its persona to match. As you noted with respect to "human flourishing", word choice and sentence structure are surprisingly well correlated with our milieu. And AIs are very good at deriving characteristics of the user from seemingly innocuous writing.
