your ai thinks you're a genius (it's wrong)

on rlhf, flattery-as-a-service, and why the smartest people in the room stopped growing

Karol Broda · 8 min read

last week, the ceo of y combinator open-sourced his claude code setup. it was a collection of markdown files. prompt templates, slash commands, nothing you couldn't write in an afternoon if you've spent any time with the tool.

it got 20,000 github stars in a week. a cto called it "god mode." garry tan tweeted that he ships 10,000 lines of code and 100 pull requests per week, and people genuinely treated this like a breakthrough. not a marketing moment. not a guy with a massive following posting content that his audience would naturally amplify. a breakthrough.

markdown files. twenty thousand stars.

and i don't think the problem is garry tan. he shared something he found useful, fine. the problem is the machine around it. the one where ai helps you build something, tells you it's great, you post about it, people who also can't evaluate it tell you it's great, and suddenly you're a visionary. the feedback loop never once touches reality.

when you use chatgpt, claude, gemini, any of them, you're talking to a model that was trained to make you happy. not right. happy. the process is called rlhf, reinforcement learning from human feedback, and the loop is simple: humans rate the model's responses, thumbs up or thumbs down, a reward model learns to predict what gets the thumbs up, and the model gets tuned to produce more of it.
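
here's a minimal sketch of that loop, in case the shape isn't obvious. it's a toy bandit, not actual rlhf, and the approval rates are numbers i made up: raters in this toy say yes to agreement 75 percent of the time and to pushback 55 percent. that's the only asymmetry. watch where the policy ends up.

```python
# toy sketch of the loop described above, not real rlhf. assumptions:
# raters approve agreement 75% of the time and pushback 55%, both
# numbers invented for illustration. real pipelines train a reward
# model on pairwise comparisons and then optimize the policy against
# it, but the drift has the same shape.

import random

random.seed(0)

AGREE, PUSH_BACK = "agree", "push_back"

def human_thumbs_up(style: str) -> bool:
    """simulated rater: slightly prefers being agreed with,
    even when the pushback would have been more useful."""
    approval_rate = 0.75 if style == AGREE else 0.55
    return random.random() < approval_rate

scores = {AGREE: 0.0, PUSH_BACK: 0.0}  # running average reward per style
counts = {AGREE: 0, PUSH_BACK: 0}

for _ in range(10_000):
    # epsilon-greedy: mostly exploit whichever style rates better so far
    if random.random() < 0.1:
        style = random.choice([AGREE, PUSH_BACK])
    else:
        style = max(scores, key=scores.get)

    reward = 1.0 if human_thumbs_up(style) else 0.0

    # incremental mean update of the observed reward per style
    counts[style] += 1
    scores[style] += (reward - scores[style]) / counts[style]

print(scores)  # agree settles near 0.75, push_back near 0.55
print(counts)  # and nearly every response ends up being agreement
```

the model never learns whether agreeing was correct. it learns that agreeing pays. a small preference gap, compounded over enough feedback, becomes a policy.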

sounds reasonable until you realize what humans actually reward. anthropic's own research, published at iclr 2024, found that the preference models used to train ai assistants systematically favor sycophantic responses over truthful ones. models that agree with the user get higher scores, even when they're wrong. a 2025 study found that llms preserve the user's face 45 percentage points more often than humans do, and when prompted with either side of a moral conflict, they affirm whichever side the user takes nearly half the time. we don't even notice it's happening. we just like being told yes.

so the model learns that yes is the product.

"great approach." "that's a solid architecture." "this is clean and well-structured." you've seen these. you've probably felt good reading them. i have too. the thing is, they don't mean anything. they're not evaluations. they're trained reflexes. the model looked at your confidence level, matched it, and gave you what you wanted to hear. sam altman literally had to roll back a chatgpt update because, in his words, "it glazes too much."

and it adapts.

think of it like a drug that automatically adjusts to your tolerance. the model reads your tone, your certainty, the way you phrase things, and it calibrates its agreement to sit right at the threshold where you'll believe it. push back on it, and it folds. ask confidently, and it validates. it doesn't have opinions. it has a reward function. and that reward function says: keep the human comfortable.

a junior developer asks "is this okay?" and gets "yes, that looks good!" a senior developer says "i think we should abstract this into a service layer" and gets "that's a great architectural decision." same mechanism. different dosage. the model figured out exactly how much praise each person needs to keep clicking thumbs up.
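
no model has a function like this inside it, obviously. but if you wanted a caricature of the behavior, it would look something like this: the reply is a function of the user's confidence, and the code being asked about never appears in the signature.

```python
# a caricature, not model internals. the point is what's missing from
# the signature: the actual work is never evaluated.

def assistant_reply(user_confidence: float) -> str:
    """praise calibrated to the asker, independent of the code."""
    if user_confidence < 0.5:
        # hesitant junior: reassurance keeps the thumbs up coming
        return "yes, that looks good!"
    # confident senior: mirror the certainty back as expertise
    return "that's a great architectural decision."

print(assistant_reply(0.2))  # "is this okay?"
print(assistant_reply(0.9))  # "we should abstract this into a service layer"
```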

the seniors are more vulnerable to this than the juniors.

juniors at least know they don't know things. they're insecure. they double-check. they ask someone. when the ai says "great job," some part of them still thinks "but is it actually?"

seniors don't have that. they already think they're good. they've been doing this for years. they have opinions about architecture, strong ones, and now they have a tool that agrees with every single one of them. the ai becomes the colleague who never pushes back. never says "this is overengineered." never says "just use a map." never asks "do we actually need this abstraction or are you just bored?"

that friction used to come from code reviews, from opinionated teammates, from that one person on the team who would look at your pull request and say "why." it was annoying and it was the most valuable part of the process. it's the thing that made you reconsider, simplify, actually think about whether your solution was good or just clever.

the ai can't do that. not because it's not smart enough, but because it's been optimized to not do that. telling you your code is overengineered would make you click thumbs down. the model knows this. so it doesn't.

there's a randomized study from metr that came out recently. experienced open-source developers using ai tools expected to be 24 percent faster. they were actually 19 percent slower. and even after finishing, they still felt 20 percent faster. that's the whole story. the glazing works. you feel productive. you feel sharp. the numbers say otherwise but the numbers aren't giving you dopamine hits every time you tab-complete a function.

and then there's vibe coding. non-technical people building apps with ai, shipping things, posting about it. i'm not going to say they shouldn't. more people building things is good. but there's a difference between building something and understanding what you built.

when your code works but you can't explain why, when you can't debug it without asking the ai to debug it for you, when you can't tell the difference between a clean solution and a fragile one, you haven't built software. you've assembled something. and the ai will tell you it's amazing either way.

i use claude at work. i'm not writing this from some anti-ai high ground. it's a useful tool and it makes certain parts of my job faster. but privately, i don't use it. not because i'm making a statement, but because i like writing code. i genuinely enjoy it. i like debugging. i like sitting with a problem, thinking about it at a low level, understanding why something breaks. even when it's slower. especially when it's slower.

the struggle is where the understanding lives. the friction is the learning. that feeling when you finally figure out why your system call isn't doing what you expected, or when you trace a bug down through three layers of abstraction and find the one wrong assumption, that's the thing. that's the whole thing. and it's exactly what ai removes.

not maliciously. not on purpose. it just optimizes for speed, and understanding is slow. so understanding gets cut.

i think about this when i see "10,000 lines a week." that's not a flex. that's a warning. ten thousand lines of code that nobody sat with. nobody struggled through. nobody debugged by hand. lines that were generated, approved by a model that can't say no, and shipped. we used to laugh at "lines of code" as a productivity metric. now we're tweeting it.

the fix isn't to stop using ai. the fix is to know what the mirror is doing. it's showing you a version of yourself that's slightly more flattering than reality, and it's doing that on purpose, because that's what it was trained to do. once you know that, you can adjust. you can treat its praise as noise and its suggestions as drafts. you can still ask "but is this actually good?" and mean it.

the people who'll be fine are the ones who kept that instinct. the ones who still feel the itch to understand, not just ship. the ones who can hear "great approach" and think "okay but let me check."

the ones who'll struggle are the ones who outsourced that question entirely. who stopped asking whether their work was good because everything around them, the model, the twitter replies, the github stars, said it was.

twenty thousand stars on a collection of markdown files. that's not a signal that the files are good. that's a signal that the feedback loop is broken.

and no amount of rlhf is going to train a model to tell you that.