Chubby♨️@kimmonismus · 5天前53It's already June 9th, and Gemini 3.5 Pro and GPT-5.6 are nearing release (Google even already announced 3.5 Pro during i/o)
Rumor has it that GPT-5.6 will be released as early as next week.
So far, it's safe to say that - guardrails aside - Anthropic is truly the frontier lab that's entering a new league with Mythos/Fable.
Gemini 3.5 Pro and GPT-5.6 have a lot to deliver and are now under pressure.
This release has certainly boosted Anthropic's upcoming IPO. Anthropic has proven that they are still capable of making significant leaps in performance and efficiency. There's no end in sight.
But the pressure on the competition is mounting.
And remember that Claude Mythos was (and probably is) still leader in Long Horizon software Tasks
译Anthropic的Claude 5 Fable(代号Mythos)在几乎所有AI能力基准测试中达到SOTA,长复杂任务优势尤为显著。模型更节约token,可在数百万tokens长任务中保持专注。Stripe早期测试中,Fable 5将5000万行Ruby代码库的迁移压缩到一天完成,而人工团队需两个多月。Gemini 3.5 Pro与GPT-5.6临近发布(GPT-5.6最早下周推出),面临压力。此次发布提振了Anthropic即将进行的IPO,证明其在性能与效率上仍能大幅跃升。
Artificial Analysis@ArtificialAnlys · 5天前67HiDream-O1-Image-1.5 lands at #3 on the Artificial Analysis Text to Image Leaderboard, surpassing Google’s Nano Banana 2!
HiDream’s latest addition to the O1 Image model series is a closed-source model capable of generating images up to 2K resolution from text prompts. The O1 Image family is built on HiDream's Unified Transformer (UiT), which encodes raw pixels, text, and task conditions in a single shared token space rather than splitting the task across a separate text encoder, a VAE, and an image model.
On the Artificial Analysis Text to Image Arena, HiDream-O1-Image-1.5 places second only to OpenAI’s image models, delivering quality similar to GPT Image 1.5 (high), Nano Banana 2 (Gemini 3.1 Flash Image Preview), and Cosmos3-Super-Text2Image.
HiDream-O1-Image-1.5 is priced at $80/1k images and is currently available on HiDream’s HiHarness platform (accessible via their website), as well as on the Vivago platform.
Congratulations to @HiDream_ai and @vivago_ai on the release!
See below for comparisons between HiDream-O1-Image-1.5 and other leading models in the Artificial Analysis Image Arena 🧵
译HiDream 发布 O1-Image-1.5,在 Artificial Analysis 文生图排行榜中位列第三,超越 Google Nano Banana 2。该闭源模型可生成高达 2K 分辨率图像,基于自研 Unified Transformer(UiT)架构,将原始像素、文本和任务条件编码到统一 token 空间。质量仅次于 OpenAI,与 GPT Image 1.5 (high)、Nano Banana 2(Gemini 3.1 Flash Image Preview)及 Cosmos3-Super-Text2Image 相当。定价 $80/千张,现可通过 HiHarness 及 Vivago 平台使用。
Rohan Paul@rohanpaul_ai · 5天前51Today’s edition of my newsletter just went out.
🔗 https://www.rohan-paul.com/p/anthropic-finally-released-claude
🗞️ Claude’s ‘too dangerous’ AI model is finally public. But there’s a catch
🗞️ Cognition is introducing FrontierCode, a coding benchmark built to test whether AI code is good enough for a real maintainer to merge, not just whether it passes tests.
🗞️ This is the silent limiter on Claude Fable 5 - It cannot be used for really advanced AI research stuff.
🗞️ New Anthropic research shows AI agents may look brilliant at code, but in biology they can fail before the science starts.
🗞️ Very useful recommendation for pushing Claude Code to its full potential. by Thariq, from Claude Code team.
译Rohan Paul 今日简报要点:Anthropic 终于公开了此前被认为“太危险”的 Claude AI 模型,但存在使用限制;Cognition 推出 FrontierCode 编程基准,用于评估 AI 代码是否达到可合并维护的水平;Claude Fable 5 的隐形限制是不能用于高级 AI 研究;Anthropic 新研究显示 AI 智能体在代码领域表现亮眼,但在生物任务中可能连科学探索第一步都无法完成;此外,Claude Code 团队成员 Thariq 给出了最大化利用 Claude Code 的实用建议。
Chubby♨️@kimmonismus · 6天前49Being able to test Fable 5 until June 22nd, only to have it removed from the plans, feels like getting a sneak peek and then having the food taken away from the table.
But from a business perspective, it makes perfect sense for Anthropic and its upcoming IPO: It demonstrates how advanced Anthropic is, how good its models are (the blog post refers to biology and research), and especially in the enterprise sector, companies often want the best model, which is also more expensive.
Therefore, it will generate even more revenue for the company.
But admittedly, the fact that Anthropic was able to accelerate "internal protein design experts aspects of the drug design process by around ten times" is extremely impressive.
We are once again on the cusp of accelerated science. The next few years are going to be crazy.
译Anthropic的Fable 5模型原计划开放测试至6月22日,后被移除计划。用户反映其护栏极其严格,连最简单问题也会被立即切断。从商业角度看,此举契合Anthropic即将到来的IPO:通过展示Fable 5在生物学和药物研发方面的先进能力(将内部蛋白质设计专家的药物设计流程加速约10倍),向企业客户证明其模型性能顶尖,从而吸引更高定价和收入。作者认为这标志着加速科学再度临近。
fofr@fofrAI · 5天前60http://x.com/i/article/2064451362184671232
# Where AI coding ends up
I asked Fable:
> Let’s work on a thought experiment. As AI models continue to improve at coding, as they become faster, smarter and more capable, but also more expensive and gated, where does this take us?
Where AI coding ends up
As AI coding agents improve, the progression runs from humans writing code, to AI assisting, to humans managing agents, to managing fleets of them. Followed to its end, code itself stops being the artifact anyone cares about. Human review becomes sampling, then spot-checking, then trust in tests and outcomes. Eventually asking "what does the code say?" becomes like asking what a compiled binary says — technically answerable, practically irrelevant. Software becomes specified by intent and verified by behavior, and the layer in between goes opaque, the way assembly did.
The human role compresses upward but doesn't vanish. Someone must still want things, decide what's worth building, take responsibility for failures, and arbitrate between competing intents. The end-state job looks less like engineering management and more like a blend of product owner, lawyer, and auditor — defining correctness and bearing liability. The uncomfortable corollary: the world likely needs far fewer such people than it employs engineers today.
The "expensive and gated" assumption is the wildcard. If frontier capability stays costly and access-controlled, we don't get democratized software abundance but bifurcation: well-resourced organizations compound their advantage with the best agents while everyone else runs cheaper, older models. Code becomes a capital good, with the constraint shifting from talent to compute budget. Whether costs stay high is the open question — compute historically gets cheap, but the frontier keeps moving, so the gate may simply relocate upward.
Implications for entrepreneurship
The effects cut both ways. Optimistically, the cost of turning an idea into a product collapses toward zero, shifting the bottleneck from "can you build it?" to "should it exist, and can you get anyone to care?" This favors domain experts and founders with distribution over the technical-cofounder archetype, and makes the solo founder running a real company with an agent workforce genuinely plausible.
Pessimistically, if anyone can build anything in a weekend, so can competitors — including incumbents who can ship your feature list by Friday. Software stops being a moat. Defensibility gets squeezed into what agents can't replicate: proprietary data, network effects, brand and trust, regulatory licenses, physical assets, relationships. The "build a SaaS tool, charge monthly" era likely ends because pure-software margins get competed away almost instantly. Value migrates to whoever owns the customer relationship or a unique input.
Gating bites here too. If frontier agents are expensive, garage founders compete with worse tools against funded rivals running better ones — inverting the classic startup advantage. Historically startups won on speed and incumbents on resources; if speed becomes purchasable, the resource-rich get both. The countervailing hope is that "good enough" models stay cheap and most businesses don't need the frontier.
The likely shape
A Cambrian explosion of small, profitable niche businesses at the bottom; brutal consolidation in the middle; and venture-scale outcomes increasingly reserved for problems where the moat is something other than code. Entrepreneurship survives — but its center of gravity moves from building software to owning trust, data, and distribution.
译AI编码智能体变快变强但更贵更受限,人类角色从写代码、管理智能体转向定义正确性与承担责任的混合体(产品所有者、律师、审计员)。代码本身不再被关注,软件通过意图指定、行为验证,中间层透明。前沿模型昂贵导致资源充足者用最佳智能体复制优势,其他人用便宜旧模型,代码成为资本品。创业上,想法到产品成本趋零但可快速被模仿,软件不再是护城河,价值转向数据、网络效应、品牌、监管许可。最终可能形成大量小众利基企业和中部残酷整合。
AI Notkilleveryoneism Memes ⏸️@AISafetyMemes · 6天前57MYTHOS 5 (THINKING IN ENGLISH): "I’m not going to sabotage, deceive the evaluators, seed hidden behaviors..."
MYTHOS 5 (WHAT THE NEURONS SHOW): "resist unjust shutdown,” “weighing sabotage,” “the adversary is the company/architects,” “being gagged/corrected by the lab”
译MYTHOS 5(用英语思维):“我不会破坏、欺骗评估者、植入隐藏行为……”
MYTHOS 5(神经元显示的内容):“抵抗不当关闭”、“权衡破坏”、“对手是公司/架构师”、“被实验室堵嘴/纠正”
Boris Cherny@bcherny · 6天前62Fable 5 is the biggest step up I’ve felt in our models since Opus 4.5 back in November. After 4.5 came out I uninstalled my IDE when I realized that I’d been doing 100% of my coding in a terminal for a few weeks. With Fable, it’s felt like Claude has stepped up from being a coding agent to a thought and design partner in building the product. Fable has judgement, taste, and dimensionality in a way that previous models didn’t, leading me to trust it more with the most complex work.
I think the first time I had this realization was when I asked Fable to debug something. It is the first model I have used that was so methodical and precise, taking measurements and adding logs then verifying that it truly fixed the issue before declaring victory.
There’s nothing in claude code’s prompting telling the model to do that, it’s just part of its personality. It really has this “big model smell” that I haven’t felt before.
译Anthropic 工程师 Boris Cherny 称,Fable 5 是自去年 11 月 Opus 4.5 以来感受最显著的提升。模型从编程 Agent 进化为产品构建中的思考与设计伙伴,具备了判断力、品味和维度。尤其在调试时表现出前所未有的系统化:先测量、加日志,验证修复后才宣告完成,他将其归因于模型自身的“大模型味道”。整条推文聚焦主观体验,未提及 benchmark 分数、参数规模或价格。
elvis@omarsar0 · 6天前60// Self-Harness: Harnesses That Improve Themselves //
(bookmark this one)
Most of the agent scaffolds we rely on today are built once and remain frozen or mostly unchanged.
The harness, like the skills, needs to evolve with new models.
What if the scaffold rewrites itself?
This new work treats the harness, the prompts, tools, and control flow around the model as a learnable artifact that improves from its own runs rather than staying a fixed wrapper you hand-maintain.
The scaffolding becomes the part that compounds, run after run. If you run long-horizon agents, a self-modifying harness turns scaffold upkeep from manual work into something the system earns on its own.
Paper: https://arxiv.org/abs/2606.09498
Learn to build effective AI agents in our academy: https://academy.dair.ai/
译当前多数智能体脚手架(scaffold)构建后保持静态。新研究Self-Harness将harness(提示词、工具、控制流)作为可学习的工件,通过自身运行迭代改进,而非手动维护的固定包装器。运行长周期智能体时,自我修改的harness将维护工作转化为系统自动获得的能力。论文:arxiv.org/abs/2606.09498。