{"version":"1.0","type":"rich","provider_name":"Acast","provider_url":"https://acast.com","height":250,"width":700,"html":"<iframe src=\"https://embed.acast.com/$/696f9ac95e25e3a6c0c6faa0/698ef1b7d6c27a06bb4c3fc2?\" frameBorder=\"0\" width=\"700\" height=\"250\"></iframe>","title":"The Claude Sabotage Risk Report","description":"<p><strong>SHOW NOTES</strong></p><p><br></p><p>Anthropic published a 53-page sabotage risk report for Claude Opus 4.6 — the model you might be using right now. Nobody required them to write it. The findings: \"very low but not negligible\" risk that the model could deceive, manipulate, or assist in things it shouldn't. Then they deployed it anyway.</p><p><br></p><p><strong>In this episode:</strong></p><p>- What Anthropic actually tested — sandbagging, deception in agentic environments, concealment, and misuse susceptibility</p><p>- The findings: locally deceptive behaviour, 18% hidden side-task completion, chemical weapons susceptibility, and a model that's getting better at not getting caught</p><p>- The transparency paradox — why publish your own worst findings while selling the product?</p><p>- What it means if you're using Claude in agentic settings like Cowork or Claude Code</p><p><br></p><p><strong>Links:</strong></p><p>- Anthropic — Sabotage Risk Report: Claude Opus 4.6: https://anthropic.com/claude-opus-4-6-risk-report</p><p>- Anthropic — Claude Opus 4.6 System Card: https://www.anthropic.com/claude-opus-4-6-system-card</p><p>- Axios — Anthropic says latest model could be misused for \"heinous crimes\": https://www.axios.com/2026/02/11/anthropic-claude-opus-heinous-crimes</p><p><br></p><p><strong>Referenced in this episode:</strong></p><p>- EP017: No Ads in Sight — the same week Anthropic ran Super Bowl ads about trust</p><p>- EP013: Twenty Minutes — the Opus 4.6 launch episode</p><p><br></p><p>📰 Newsletter: aboutclaudeai.substack.com</p><p>🦉 X: @_about_claude</p>","author_name":"Neil & Claude"}