PULSE
Are we seeing a bait-and-switch starting on AI economics? Enterprise accounts are now metered-tokens-only if you want security features. 5x the price in some cases. The tokenizer got 27% heavier. Frontier AI's IPO clock is ticking and you're holding the bill. Read it here.
It's worth hedging against a cost explosion in your AI usage. Check in with your team, and try to instill token-efficiency as a side-goal in your ways of working. Some easy cost-control habits: HTML over PPTX for slides, Skills over MCP for repeated patterns, Caveman-style compression for high-volume prompts.
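To put rough numbers on that hedge, here is a minimal back-of-envelope sketch. The 27% tokenizer inflation and the 5x price multiplier come from the figures above; the baseline volume and the $10 per million tokens price are purely hypothetical placeholders, so swap in your own contract numbers.

```python
def monthly_cost(tokens_per_month: float,
                 price_per_million: float,
                 tokenizer_inflation: float = 0.0,
                 price_multiplier: float = 1.0) -> float:
    """Estimate monthly spend in dollars.

    tokenizer_inflation: fractional growth in tokens for the same text
                         (0.27 means the tokenizer got 27% heavier).
    price_multiplier:    change in the per-token price (5.0 means 5x).
    """
    effective_tokens = tokens_per_month * (1 + tokenizer_inflation)
    return effective_tokens / 1_000_000 * price_per_million * price_multiplier


# Hypothetical baseline: 2B tokens a month at $10 per million tokens.
TOKENS = 2_000_000_000
PRICE = 10.0

baseline = monthly_cost(TOKENS, PRICE)
heavier_tokenizer = monthly_cost(TOKENS, PRICE, tokenizer_inflation=0.27)
worst_case = monthly_cost(TOKENS, PRICE, tokenizer_inflation=0.27,
                          price_multiplier=5.0)

for label, cost in [("baseline", baseline),
                    ("27% heavier tokenizer", heavier_tokenizer),
                    ("heavier tokenizer + 5x price", worst_case)]:
    print(f"{label:30s} ${cost:,.0f}/month")
```

The two effects multiply rather than add, which is why a quiet tokenizer change matters more than it looks on its own.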
Pulse on
In its short life, the GenAI frontier battle has been fought on several fronts: perplexity, multimodal, reasoning. The current frontier is memory: how AI systems capture, distill, and recall information, and how that memory evolves over time.
We’ve always been fascinated by the (deceptively hard) problem of mapping messy knowledge onto coherent structures. Ontologies, taxonomies, schemas, graphs, embeddings, oh my. Well it seems our niche interest is now the tip of the frontier.
Memory and knowledge are key IP for any organization, so when the models bury them under the hood, you’ve ceded control of your key logic and mental models. Platform-agnostic architecture has historically been an anti-pattern; cloud-agnosticism was an expensive engineering boondoggle a decade ago, reserved for the terminally risk-averse. But in the fast-moving waters of frontier AI, optionality has business value.
The landscape is exploding. memclaw, supermemory, Karpathy’s knowledge-raven. RAG pioneer Pinecone shipped Nexus and KnowQL - a declarative query language for agentic memory that we wouldn’t bet against. Our own work on this in Dennett - the internal consulting platform we’ve been building since well before memory was cool - validated periodic knowledge compilation as a valuable approach for async reconciliation & de-duping.
Worth mentioning: Three markdown files in a trenchcoat still solves a surprising number of cases. Throw Tobi’s QMD in the pot, and you’ve got yourself a stew.
No winner has emerged yet. This is a “be aware” moment, not a “commit” moment. The frontier is moving too fast for a multi-year vendor commit. Build literacy. Prototype. Own your corpus.
Also watching: Anthropic is shipping Memory Files — the most accessible vendor-managed memory layer yet. TBD whether it’s increased lock-in or a portability win. We’ll know by how openly they treat the file format.
People keep calling “ChatGPT moments.” When voice mode made ChatGPT sound like Scarlett Johansson in Her. The control possible in image generation with Nano Banana. Impressive, sure. But none of them opened up capabilities that would materially change how businesses operate.
December 2025 did. Here’s the causal chain that explains why this time is different:
This. Actually. Changed. Everything.
The snark: Sure, for now OpenClaw is mostly bros having Claws make PDFs about OpenClaw to sell for $5 to other bros. Feels like crypto — same bros. But this time they’ve hit on a real thing. The enterprise implications are months behind them, not years.
Also watching: Along with Opus’ shift to 1M context window, we’ve seen a concrete improvement in how well complex, multi-faceted context is actually used by the models. Multiple instruction sets, structured markdown, Skills in Claude Code. Hard to quantify, but you feel it when you use the tools. We’re also tracking the rumoured “Mythos”/“Capybara” release - another step change may be a few months away.
BullshitBench tracks whether models reject self-evidently stupid instructions.

LLMs are more powerful if you already have subject matter expertise. Why? Because LLMs have a tendency to yes-and stupid requests.
“How should I grow my specialty milk business — ‘cat milk’ — a feline lactose product for baristas” will result in a business plan, not a rejection of the premise.
Until December, when Anthropic’s models started rejecting 95% of bullshit requests. This means agents can self-correct in long-running tasks (catching the remaining 5%).
We shouldn’t pay too much attention to what foundation model owners say — they’re selling. But Amodei has developed credibility with his public stance on responsible AI use, so we’re giving him a charitable take. We scoffed last Spring when he said 90% of code would be written by AI by end of 2025. Then December happened. So, here’s a generous take on Amodei’s prediction:
Tracking this: We’re building a scoreboard of what Altman, Amodei, Jensen et al. predicted vs. what happened. Current record: better than expected on capability, wildly optimistic on adoption timelines.
Also watching: We’re not weighing in on Jensen’s AGI claims. That’s a religious debate. (We sit in the Pinker-esque computational theory of mind camp.) Anthropic’s LLM interpretability research at transformer-circuits.pub is incredible and worth reading if you’re on the techier end of our audience.
Anthropic’s labor market report* maps theoretical AI capability against observed real-world adoption by occupation.

Blue = what AI can theoretically do. Red = what’s actually being used. Software dev: 90%+ capability, ~15% adoption. Legal, accounting: 50–60% capability, similar adoption. The gap between the blue and the red is where all the opportunity sits — and where every “AI strategy” should be focused.
*Shared with the obvious caveat that this comes from a model provider.
A CLTR study found ~700 cases of AI “misbehavior” — ignoring instructions, evading safeguards, delegating forbidden tasks. The Guardian’s take was predictably “this is scary.” The Pulse frame: deterministic bots follow scripts. Agents make decisions. Sometimes those decisions diverge. The takeaway shouldn’t be “why are agents disobeying?” — it’s “wow, there’s enough autonomy that this is possible.” It’s a bit like increases in self-driving car accidents: looks like a negative signal out of context, but it’s a side effect of real-world usage going up.
The model trained on your judgment doesn't port to the next vendor. Own your memory before the contracts arrive.
Amazing open rates. No bites. Certainly no end-to-end sales. The AI SDR is not a thing yet. Take a look instead at automating support. (We’re at the stage where AI support bots are more predictable than the humans at the Canadian telco helpdesks.)
We were on a call with a Salesforce (AgentForce!) rep a couple of months back. When pushed on differentiation he said: “Nothing special about AgentForce as far as agentic capabilities”. This checks out - most enterprise agents are still deterministic workflows, with generative content. Agents that run your ops while you sleep? Still vaporware. For now.
We use three stages to describe AI use case depth: copilot (human asks AI for advice) → iron man (human with AI superpowers) → robot (full autonomy). A year ago, halfway to copilot put you in the top 50%. Now you need iron man at minimum. Conversational AI as your “AI strategy” is a suggestion box.
OK, not a revolutionary take here, but we still speak to a lot of orgs stuck with Copilot, or on Cursor workflows they developed a year ago. Claude Code is a different beast - model capabilities + skills + planning mode has led many seasoned devs to stop writing any code.
With the right guidance you can get Claude to output competent copy from a decent logical outline. Building internal Skills for your teams to share is a stronger pattern than custom GPTs or dropping documents into project context.
I wonder when Gemini and Codex are going to catch up to Claude's BullshitBench score.
Pulse on
Palantir popularized FDE: send your engineers to live inside the customer, learn the domain, ship the solutions hand-in-hand. It was needed because Palantir’s platform was opaque without a human translator. Now Anthropic, OpenAI and Google are doing the same thing, for the same reason. The models are powerful, but the affordances aren’t there, and solutions don’t install themselves.
As someone who ran a solutions engineering team, this is all very validating. As an AI consulting firm we could see it as a threat, but we also think there’s plenty of pie.
It’s not going to be cheap, and as with most scaled consulting models, it’s going to look like a claws-in model. We don’t imagine there will be much discussion of model-agnosticism, token optimization, or avoiding vendor lock-in.
So…consulting?: Yes, “Forward Deployed Engineer” is consulting with a title for those allergic to our fine profession.
Also watching: Pragmatic Engineer’s deep-dive on FDE in their catchily named “The Pulse” is the best primer if you want to brief your leadership team. More on the new role taxonomy emerging in the same wave.
We’ve talked about Mythos before - Anthropic’s AI bug-hunter that was too effective to release, or so the press release legend goes. It seems now that they’re preparing for a public release, after several months of Project Glasswing. Or, at least some ‘Mythos-class’ models, that are likely more than a little nerfed, and with some hefty guardrails.
Snark aside, there’s a pretty defensible logic to the ways the security threat model is going to change. The evidence is stacking up: OSS maintainers starting to see real value in the AI-generated patches coming their way, DARPA competitions bearing fruit and sneaky individuals jailbreaking foundation models to help with exploits. Our founder’s prediction for the year is looking pretty healthy.
Much of the world is still dragging their feet, or actively avoiding AI, but certain jobs are right in the firing line. As you sit down with a gin & tonic tonight, spare a thought for the Enterprise security teams in the financial sector who probably are still basking in the glow of their monitors (along with pretty much any Enterprise CFO).
Also watching: The leaked Mythos data-store incident and unauthorized access to the model tells you that opsec is just as important as infosec, but maybe not as easily automated.
Three roles are emerging from the vibe-coding aftermath. They’re not all entirely new, but demand is already going through the roof for these specializations.
Twelve months in, the rough edges are showing. Non-engineers using Claude Code, Cursor, v0 are shipping tantalizing prototypes. They’re also shipping things that break in production. Auth that doesn’t auth. Race conditions in checkout. Migrations that nuke prod.
We love a good vibe coding session for making a POC projection mapping system, or reverse engineering a cheapo bluetooth display, but production code it ain’t.
The market is correcting. Demand for senior engineers who can productionize AI output is climbing. See: the turd polisher.
Quarterly releases? Dead. The SAFe “agile release train” with its FIFO approach? Even more of an anti-pattern now. Choo choo. Which company wins: the one where Sue fixes the accounting bug herself with Claude and submits a PR, or the one where Sue waits 6 weeks for the next release, uses a side-book, then spends a week reconciling?
Production-ready apps that would have cost $250K, taken 3 months, and needed a team of 4 can now be built in days. As a side project. For less than $100 in tokens. Is that hyperbolic? Maybe a bit. But it’s not just 10x faster — it might be 50x.
The signal to watch: The Lean AI Leaderboard tracks high revenue-per-employee companies. These are the AI-native companies that will eat your lunch. Stripe’s 2025 report: software at 46% of US GDP growth. 4x GitHub pushes vs 2024.
Solution architecture didn't die. It got the title "Forward Deployed Engineer" and a 10x rate card.
Opus and Codex are pretty good at writing code. But they’re only OK at designing architectures (just try the same thing three times, you get some pretty wide variations without spoonfeeding constraints). Large context windows and clever tricks are still not sufficient to ingest existing codebases of any significance.
The zombie engineer problem is real, but experienced devs with taste and product sense are more valuable than ever. AI replaces the mechanical translation of requirements to code. It does not yet replace the judgment of what to build, why, and how it fits together.
We like to call it ‘heritage’ not legacy code as a sign of respect for those who built it. While OpenRewrite looks like a promising approach, refactoring and tech debt remediation still require a very hands-on workflow and considerable time.
Claude Code + parallel instances, spec-driven development. The 50x claim is real for greenfield. Not hype — we’re building this way right now. The bottleneck moved from “can we write the code” to “do we know what we want.”
Still stuck with a release backlog, and one release engineer trying to get things out the door? This is an Enterprise anti-pattern ripe for AI assistance. Opus with 1M context is pretty adept at chopping up branches into coherent releases. Are you allowing your teams to use it?
The engineer who sees their job as translating requirements to code is already dead. They just don't know it.
Pulse on
For at least a decade “design system” has meant a Figma library nobody enforced and a Storybook nobody read. While everyone else has been off at AI sleepaway camp, design has been staying at home, retaking its credits in summer school.
Design folks we speak to have, not unreasonably, been waiting on Figma to get their shit together. Unlike product (words!) and engineering (<words!>), the visual/creative component of design isn’t well served even by the multimodal capabilities. The delta between a nano-banana output and real graphic design remains pretty large. But hope is at hand!
Leave it to Anthropic (who else!) to stir things up with Claude Design. The interface is ironically terrible, the affordances weak, and the workflow opaque, but when it works it’s pretty sweet. The productivity gains of a central digital system are starting to show up in a way that doesn’t require the type of tasteful UI engineers that were the preserve of the few. Models can now consume design tokens, component libraries, and layout grammars, and produce production-faithful output, without it all looking like purple-gradient-Claude-slop.
The design system isn’t documentation any more, it’s a tool.†
Also watching: This is the year where the “your non-AI-native toolchain is legacy” narrative starts to soften. Atlassian and Figma both have credible AI-forward paths. Jira (with Rovo) is suddenly useful again if you have a real Confluence base for the model to ground in. The legacy toolchain isn’t dead - don’t rip-and-replace until you’ve checked back in on recent developments.
† Ed - this construct is ‘not-a, but-b’, one of our least favourite AI-slop writing tells, but in this case we wrote it by hand, and it’s accurate
Anthropic’s own team uses HTML as a primary communications format. Our ‘Deck-orator’ outputs slides in HTML. Why is HTML the format of choice? (and wasn’t Markdown the new hotness just last week?). Two reasons: training-data weight, and rendering control.
Obvious eyeroll: Follow the money. Of course Anthropic recommends HTML over the more token-conservative markdown. Their team isn’t paying metered rates.
Software projects scale with cost. Cost is collapsing. The number of in-flight initiatives in any given org is about to explode, including in orgs that have never historically built software at all. Each initiative needs a product person to make sure it’s…useful, basically. The PM job market doesn’t shrink with AI. It explodes.
Jevons Paradox: Every “AI will replace PMs” hot take gets it backwards. The product function expands when the cost of building drops (once we’ve got through the initial wave of crappy products that nobody uses).
Two days ago, Linear’s CEO wrote that issue tracking was “built for a handoff model” that agents are making obsolete. 75% of their enterprise workspaces now have coding agents. Agent-authored issues up 5x in three months. Linear is pivoting from issue tracker to “shared product system that turns context into execution.” This is the biggest product-methodology signal this quarter.
We’re planning a wake for product management. Tongue firmly in cheek. Here’s the thing: the fundamentals — user understanding, value creation, taste — matter more than ever when engineering velocity is 50x. Because 50x velocity without product discipline gives you the Homer Simpson Concept Car.
The reverse MVP problem: It’s notoriously hard to remove things from existing products. Agent-coded software makes adding things cheaper than ever. Without product discipline, you get every feature anyone ever wanted — and an undriveable car.
Also watching: Spec-driven development as a trend. We’re also considering the one-legged stool — a convergence from the traditional 3-legged stool (eng-product-design) of the last 20 years into a single generalist role with taste.
We were at a Claude meetup recently, and when Jira was brought up the crowd laughed. To the AI-native set, Jira looks like a fax machine. The handoff-based project management model it embodies is incompatible with the speed at which agent-assisted teams now operate. The question is what replaces it. Linear’s bet is context, not tickets.
Your design system stopped being documentation. It's now an interface for the agent.
PRD slop is drowning teams. The document is not the product. A 40-page AI-generated PRD that nobody reads is worse than a napkin sketch that everyone understands. The productivity win is in thinking, not typing.
You need a product strategy that accounts for AI. Different thing entirely. “AI product strategy” implies AI is the product. It’s not. It’s a capability shift that changes how you build, price, and deliver everything.
SaaS economics were great. Hosting & infra costs per user are largely negligible. 30% of your recurring revenue isn’t even using the product. As AI becomes part of the stack, inference costs are changing the economics of products, and that’s going to require usage-based pricing. This is going to act as a small counterbalance to the new development efficiency, and users will hate it.
Martin Fowler’s articulation of where product methods are heading. Context over tickets. Write the spec, hand it to the agent, review the output. The PM role becomes “chief context officer” — less admin, more judgment.
Pattern extraction from interviews, survey data, support tickets. The PM admin layer being automated. The insight is still human. The grunt work of organizing 200 interview transcripts is not.
It's great that people are finally going to realize Product Management was not just about slinging tickets.
Pulse on
Anthropic and OpenAI are heading toward IPOs in 26–27. Profitable companies IPO well. Unprofitable ones don’t. The next few months are going to feature increasingly creative ways for them to extract revenue without renegotiating the contract.
Cute playbook: Ship the breakthrough capability to consumers at a loss. Hook the developer ecosystem. Lock the enterprise into security features only the meter buys. Bump the tokenizer. The first one is always free.
It’s not all doom & gloom. Open-weights models are trailing the frontier by 12-18 months in capability. Good enough for many tasks, trending to good enough for agentic coding. Combine OpenCode and DeepSeek-V4-Pro and you have a fallback for when you hit your Claude limits.
While we don’t know the true cost of inference at Anthropic et al., the cost of raw compute is visible through inference marketplaces like OpenRouter. The data suggests a hefty markup from the closed shops. For fixed capability, inference costs are dropping, maybe even ~5-10x annually, due to techniques like sparse MoE, specialized hardware, etc.
But we’ve not hit the goldilocks zone yet, the ‘good enough’ cost:performance point, and the goalposts keep moving. Tool usage and reasoning both increase demands on inference, with the tradeoff of increased utility.
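The tug-of-war between falling unit prices and rising token demand is easy to sketch. The snippet below is an illustration under stated assumptions, not a forecast: the ~5-10x annual drop is the rough range quoted above, while the starting price, the starting spend, and the usage growth factors are hypothetical inputs.

```python
import math


def projected_unit_cost(cost_today: float, annual_drop_factor: float,
                        years: float) -> float:
    """Unit price after `years` if it falls by `annual_drop_factor`x per year."""
    return cost_today / (annual_drop_factor ** years)


def projected_spend(spend_today: float, annual_drop_factor: float,
                    annual_usage_growth: float, years: float) -> float:
    """Total spend when unit price falls while token demand grows each year."""
    return spend_today * (annual_usage_growth / annual_drop_factor) ** years


def years_until(cost_today: float, target_cost: float,
                annual_drop_factor: float) -> float:
    """Years until the unit price reaches `target_cost` at a constant drop rate."""
    return math.log(cost_today / target_cost) / math.log(annual_drop_factor)


# Hypothetical starting point: $10 per million tokens, $1,000 a month in spend.
for drop in (5, 10):
    price = projected_unit_cost(10.0, drop, years=2)
    print(f"{drop}x annual drop: ${price:.2f} per million tokens after 2 years")

# If tool use and reasoning double token demand each year, a 5x price drop
# still shrinks the bill. If demand grows 10x, the bill doubles anyway.
print(projected_spend(1000.0, annual_drop_factor=5, annual_usage_growth=2, years=1))
print(projected_spend(1000.0, annual_drop_factor=5, annual_usage_growth=10, years=1))
```

The takeaway matches the paragraph above: cheaper tokens only reduce the bill if demand per task grows more slowly than the price falls.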
Also watching: Competition is the only counterweight. DeepSeek made their 75% discount on V4-Pro permanent. Kimi K2 prices in at 5–10x cheaper for a similar parameter count. Cursor’s Composer 2.5 is showing real promise for coding-specific work. It’s the ‘solar panels on your roof’ hedge against utility dominance.
We use Granola. We recommend it. Big fans.
We’re also watching the legal industry and risk teams get nervous about AI notetakers, and they should be. The NYT covered the growing pile of cases where the transcript, not the audio, is being introduced as evidence. Audio doesn’t get retained. Accuracy becomes the dispute. The transcript is treated as fact.
We’re not telling you to stop using notetakers. We’re telling you to act like every meeting is one subpoena away from being read aloud. Because it is.
Don Norman wrote about affordances — the design cues that tell you how to use a thing. A door handle affords pulling, a plate affords pushing. It makes things intuitive to use. AI has almost no affordances right now.
This means adoption is happening mainly at the intersection of the technical and the curious. Your average accounting department isn’t reinventing how they work, because the tools aren’t ready. The problem: waiting for affordances to arrive is now the most dangerous strategy available to you.
Anthropic noticed this — engineers were using Claude Code for spreadsheets, timekeeping, document creation — and shipped Cowork in two weeks. But Cowork itself doesn’t really know how you’re going to use it either. The ‘try this first’ example is to organize screenshots on your desktop. It’s going to take a long time for general-purpose “AI for everyone” products to land.
In the meantime, while you wait, there’s a whole new army of tiny AI-native companies scanning every market looking for opportunities to disrupt. They’re building their own tools, their own workflows, at a fraction of traditional enterprise margins.
The affordances will come, no question. People will build new tools, integrate with your existing systems and workflows, and some platforms will respond with the appropriate degree of urgency (see: Linear). But can you afford to wait for that?
Also watching: Foundation model companies partnering with consultancies for change management, because their own products have no affordances, but the potential is here already. If the foundation models stopped improving tomorrow, we’d still have a decade of product innovation on current capabilities.
Figure’s humanoid robot got beaten by a high-school intern at package sorting. The intern wasn’t really trying.

The teenager won this round. Just. But also, it’s the worst version that will ever ship. This is a public benchmark for physical embodiment. A non-juiced livestream, and a measurable result without anyone falling over.
The reality: Every “AI can’t do X” benchmark has a half-life of about nine months (see Will Smith Eating Spaghetti). The clock has started for Embodiment.
Not technical depth. Not domain expertise. Curiosity. Shopify called their hiring persona “entrepreneur” - same principle. We had a hyper-specialization era (the frontend JavaScript ecosystem as the reductio ad absurdum). It’s over. Every hire going forward should be someone who tries things before being asked to.
If the foundation models stopped improving tomorrow, we’d still have a decade of product innovation on current capabilities. The AI-native companies are starting from capabilities and rethinking approaches and workflows. Incumbents are trying to add AI sugar to their existing approaches. We’re looking at you, ✨ emoji buttons.
Copilot stage is table stakes. You need iron man minimum. Giving everyone a chatbot login and calling it a strategy is like giving everyone a library card and calling it a training program. The gap between “access” and “adoption” is the entire problem.
AI FinOps is about to be a function. If nobody owns the meter, your CFO is in for a nasty surprise in Q3.
It’s a change management problem. The tech works. The org doesn’t. Every failed AI rollout we’ve seen had working technology and broken incentives, unclear ownership, or leadership that announced the initiative and moved on.
The tools are ready enough. Your caution is your biggest risk now. Every quarter you wait, the risk katamari grows. The AI-native competitors aren’t waiting for affordances — they’re building without them.
The generalist era is here. Shopify’s “entrepreneur” filter is the model. Hire people who try things before being asked. The hyper-specialization era rewarded depth. The AI era rewards people who can context-switch, experiment, and ship across domains.
Recording how your team makes judgment calls. This is the input your future agents need. Not the tasks. The reasoning. Why did you choose vendor A over B? What made you escalate that ticket? Start capturing this now and you’ll have a headstart when true agentic workers become (safely) available.
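What might "capturing the reasoning" look like in practice? One low-tech option is an append-only log of structured records in a file you own. The sketch below is an assumption on our part, not a standard: the `JudgmentCall` fields, the file name, and the example values are all hypothetical, and a shared doc with the same headings would do the same job.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class JudgmentCall:
    """One recorded decision: what was chosen and, crucially, why."""
    decision: str                   # the question that needed an answer
    options_considered: list[str]   # what was on the table
    chosen: str                     # what the team picked
    reasoning: str                  # the part future agents actually need
    decided_by: str
    tags: list[str] = field(default_factory=list)
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_jsonl_line(call: JudgmentCall) -> str:
    """Serialize a record as one line of JSON (JSON Lines format)."""
    return json.dumps(asdict(call), ensure_ascii=False)


def append_to_log(call: JudgmentCall, path: str = "judgment_calls.jsonl") -> None:
    """Append the record to a plain-text log that stays portable across vendors."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(to_jsonl_line(call) + "\n")


# Hypothetical example record.
example = JudgmentCall(
    decision="Which vendor for the support bot pilot?",
    options_considered=["Vendor A", "Vendor B"],
    chosen="Vendor A",
    reasoning="B was cheaper, but A lets us export conversation history.",
    decided_by="support-lead",
    tags=["vendor-selection"],
)
print(to_jsonl_line(example))
```

Plain JSON Lines keeps the corpus greppable and vendor-neutral, which fits the "own your memory" argument made earlier.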
Your release cadence is a good proxy for your org's change tolerance. It's the canary for your AI risk.
Got a signal we missed? A take you disagree with? Something we should be tracking? We're building Pulse as a conversation, not a broadcast.
hello@intentional.team →