Lessons from 11 weeks as a "vibe PM"


It's been 11 weeks since I subscribed to Claude Pro. After contributing production code, designing mockups that survived mostly intact, and failing big on personal projects, I'm convinced that the way product teams work is fundamentally changing.

Each of the “product trio” still brings their own core skill set, but the lines have blurred. Should an engineer review a PM's vibe-coded merge requests instead of building from user stories themselves? How do we split design responsibilities between a designer and a PM when the tools make the marginal cost of detail almost negligible? Is handing off Figma designs for the engineer to replicate still relevant? Especially at the IC level, our core work is shifting from execution to setting standards: defining and improving systems that our human and AI counterparts can execute within.

Hire the right agent and LLM for the job. Get a sense of which combinations work best for different tasks. Don't be loyal or get too comfortable with a single set of tools; they're evolving too fast for that. Build your own evals: curate a selection of tasks you actually work on, and use it to update your mental model of how different agents perform across task types and what influences that performance.
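A personal eval doesn't have to be fancy. A minimal sketch, where the "agents" are placeholder functions standing in for real CLI agents or API calls, and each task carries a cheap pass/fail check:

```python
# Placeholder agents -- in practice each would wrap a real agent or API call.
def agent_a(task: str) -> str:
    return "draft PRD for " + task  # stub

def agent_b(task: str) -> str:
    return task.upper()  # stub

# Curate tasks you actually do, each with a cheap pass/fail check.
TASKS = {
    "summarize sales call": lambda out: "PRD" in out,
    "rename feature flag": lambda out: out != "",
}

def run_evals(agents: dict) -> dict:
    """Score each agent as the fraction of task checks it passes."""
    scores = {}
    for name, agent in agents.items():
        passed = sum(1 for task, check in TASKS.items() if check(agent(task)))
        scores[name] = passed / len(TASKS)
    return scores

print(run_evals({"agent_a": agent_a, "agent_b": agent_b}))
```

Even a toy scoreboard like this keeps your comparisons grounded in your own work rather than public benchmarks.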

Understand the different layers and how they impact agent behaviour and outcomes. LLM. Harness. Filesystem. Prompts. Tools. Subagents. Especially if you are building agentic products, you need to know which levers to pull.

After understanding an LLM's tendencies (some naturally like to call tools, some are good at writing prose, some hallucinate to fill gaps more readily than others), context management is probably the most important skill. These are the two biggest levers in 99% of use cases, and yet both are more art than science right now. If LLM selection is hiring, context management is onboarding. Don't overwhelm a new colleague; give them enough context and direction to contribute. The more capable they are, the less handholding they need. Err on the side of ruthless omission and see where they shine or stumble.

Defining what to build is 80% of the battle. Evals, tests, judgment, taste, product sense. How do you decide something is good enough to ship? Align your internal compass with the true north of what your customers actually care about. Continuously interview customers, distill insights from sales calls if you can. There's no excuse now that the tools are making analysis easier. Recorded & transcribed calls -> topic/keyword trackers -> alerts -> custom summary -> listen to cited snippets and full call if insight-dense.
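The tracker-and-alert step of that pipeline can be sketched in a few lines. This is a hedged illustration, not a real integration: the transcripts would come from your call-recording tool's export, and the topics, keywords, and call IDs here are invented.

```python
# Hypothetical topic/keyword trackers over call transcripts.
TRACKERS = {
    "pricing": ["price", "cost", "budget"],
    "churn risk": ["cancel", "competitor", "frustrated"],
}

def scan_transcript(call_id: str, transcript: str) -> list:
    """Return one alert per tracker that fires, with cited snippets."""
    alerts = []
    for topic, keywords in TRACKERS.items():
        snippets = [line for line in transcript.splitlines()
                    if any(k in line.lower() for k in keywords)]
        if snippets:
            alerts.append({"call": call_id, "topic": topic,
                           "snippets": snippets[:3]})  # cite, don't dump
    return alerts

transcript = "We love the product.\nBut the price feels high.\nMight cancel."
for alert in scan_transcript("call-042", transcript):
    print(alert["topic"], "->", alert["snippets"])
```

The point of citing snippets rather than dumping whole calls is the same as in the workflow above: you only listen to the full recording when the excerpts look insight-dense.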

With that habit in place, always invest in planning for systems-level work. It surfaces gaps and design decisions that have an outsized impact on how much work the project actually turns out to be. Do you let the agent build, or "buy" by adding a dependency?

Go big, fail fast, then try again. Right now all the base-tier AI subscriptions are generously subsidized; capitalize on it. Failing is cheap if you know when to cut losses, and if you never fail you're not probing the limits of your skills and tools. I've restarted countless times at both the feature and repository (full-rewrite) levels.

Building is addictive. But remember to step back, pause, and close the learning loop: what worked surprisingly well? What was still tedious or difficult? Abstract what you learn and build it into the way you work. For more complex projects, I found systems-level testing still tedious, so I built an eval harness that plays out scenarios I define and tests multiple variants. I let the agent operate it, audit the results, and suggest improvements; I verify against the observability traces and iterate on the plan before it executes.
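The shape of that harness can be sketched as follows, under the assumption that each "variant" is a callable system under test and each scenario is a scripted sequence of user turns with a final check; the variants and scenario here are placeholders, not my actual configurations.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    name: str
    turns: list                 # scripted user inputs
    check: Callable             # passes if the responses look right

def play(variant: Callable, scenario: Scenario) -> bool:
    """Run one scripted scenario against one variant."""
    responses = [variant(turn) for turn in scenario.turns]
    return scenario.check(responses)

def run(variants: dict, scenarios: list) -> dict:
    """Cross every variant with every scenario; return a pass/fail grid."""
    return {name: {s.name: play(v, s) for s in scenarios}
            for name, v in variants.items()}

# Placeholder variants standing in for real agent configurations.
variants = {
    "v1": lambda turn: f"ack: {turn}",
    "v2": lambda turn: "",
}
scenarios = [Scenario("greets", ["hello"], lambda rs: "hello" in rs[0])]
print(run(variants, scenarios))
```

The pass/fail grid is what the agent audits; the observability traces are what you audit.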

Just because you can doesn't mean you should. Don't build for building's sake. Don't ship production code. Don't design prototypes. As a PM, my job is to provide clarity and direction; if building these doesn't move us toward that, I need the discipline to restrain myself.

Cloud still makes sense as a system of record. For everything else, local is closing the gap fast. Most coding agents already keep your work local; the cloud handles only inference. In the past couple of weeks I pushed this further by offloading simple tasks to a small model (Qwen3.5-9B) running in LM Studio on my PC, making external calls to smarter models only for synthesis and planning. Not only did this reduce reliance on external providers (just ask Claude Code loyalists how disruptive outages are), it showed me where things are moving.
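The routing logic behind that split is simple. A minimal sketch, assuming LM Studio's OpenAI-compatible local server (typically at http://localhost:1234/v1); the remote endpoint, model names, and the task taxonomy are all placeholders, not a recommendation:

```python
# Hypothetical endpoints: a small local model for rote work, a hosted
# frontier model for synthesis and planning.
LOCAL = {"base_url": "http://localhost:1234/v1", "model": "local-small"}
REMOTE = {"base_url": "https://api.example.com/v1", "model": "frontier-large"}

SIMPLE_TASKS = {"classify", "extract", "rewrite"}

def route(task_type: str) -> dict:
    """Pick an endpoint: local for simple tasks, remote for reasoning."""
    return LOCAL if task_type in SIMPLE_TASKS else REMOTE

def complete(task_type: str, prompt: str, send=None) -> str:
    """Dispatch a prompt to whichever endpoint the router picks."""
    target = route(task_type)
    if send is None:
        # In practice, wire in an OpenAI-compatible chat client here,
        # pointed at target["base_url"] with target["model"].
        raise NotImplementedError("plug in a real client")
    return send(target, prompt)

print(route("extract")["model"])   # rote extraction stays local
print(route("planning")["model"])  # planning goes to the remote model
```

Because LM Studio speaks the same API shape as the hosted providers, swapping which model handles which task is a one-line config change rather than a rewrite.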