AIHOT


Rohan Paul @rohanpaul_ai · 1 day ago · 71

Jeff Bezos shuts down AI-induced job loss talk, predicts labor shortage instead.

Jeff Bezos on CNBC: "I think that there’s going to be a labor shortage as a result. Many smart people are saying, oh my God, there are going to be no more radiologists because the AI can read X-rays better than the radiologist can. And there are going to be no more software engineers because the AI can program better than the software engineer can. These people are wrong. What’s really going to happen is that it’s going to elevate all of these people. It’s like, let’s say you’re a software engineer. You’ve been digging out the basement of your house with a shovel, and somebody’s about to hand you a bulldozer. You should be so happy if you’re digging the basement to your house and somebody says, “Hey, how about this? We’re going to have so much productivity in our economy.”"

From the "CNBC Television" YouTube channel (link in comment).

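Summary: On CNBC, Jeff Bezos pushed back on the idea that AI will replace human jobs. Many people fear AI will eliminate roles such as radiologists and software engineers, but he says that view is wrong: AI will elevate these workers, like swapping a shovel for a bulldozer when digging out a basement. He predicts the result will instead be a labor shortage and a large rise in economic productivity.

宝玉 @dotey · 1 day ago · 73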

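I use /goal; long tasks run stably, so there is no need to type "continue".

Summary: 宝玉 says that with the /goal command his long-running tasks stay stable, so he no longer has to type "continue" when the AI stops unexpectedly, as many users do. The quoted tweet notes that many AI newcomers don't know that sending "continue" is usually enough to resume a task. 宝玉's experience suggests /goal reduces the need for such nudges.

宝玉 @dotey · 1 day ago · 62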

AI did not redefine software engineering; AI amplified the importance of software engineering.

[Quoting @arkuy99]: AI has redefined software engineering.

xAI @xai · 1 day ago · 73

Install the @sentry plugin and ask your agent to find and fix errors, analyze stack traces, and triage alerts

Summary: The Grok Build Plugin Marketplace is now in public beta. You can build in the terminal with plugins such as MongoDB, Vercel, Sentry, Cloudflare, and Chrome DevTools. Details: https://x.ai/news/grok-plugin-marketplace

AI Notkilleveryoneism Memes ⏸️ @AISafetyMemes · 1 day ago · 51

AI-inventing-its-own-language -- now happening in the wild

Rohan Paul @rohanpaul_ai · 1 day ago · 29

At the Guangzhou Auto Show in China: a hyper-realistic robotic suit that many initially mistook for XPeng’s IRON humanoid robot. 🙂

Sam Altman @sama · 1 day ago · 41

really looking forward to working together!

MiniMax (official) @MiniMax_AI · 1 day ago · 61

Excited to cohost this one with @cysic_xyz. With $5,000 in prizes and 80% off M3, there are about to be some incredible projects built. Now is your chance; what are you waiting for? Get building with M3 and CyOps today 🚀

Summary: MiniMax and Cysic have launched the CyOps Arena developer challenge, with a $5,000 prize pool and 80% off MiniMax M3 model token prices. The event encourages developers to build projects with M3 and the CyOps platform.

向阳乔木 @vista8 · 1 day ago · 70

Thought of a particularly ambitious task for Claude Fable 5: build an online version of Photoshop. The AI has already written the requirements document; if you're interested, send it over and give it a try. PRD in the comments.

DogeDesigner @cb_doge · 1 day ago · 18

JUST GROK IT

Greg Brockman @gdb · 1 day ago · 69

welcome @ona_hq to the team, to help organizations deploy agents securely in production!

Rohan Paul @rohanpaul_ai · 1 day ago · 82

WSJ: OpenAI is considering deep price reductions as competition with Anthropic intensifies. Anthropic is pressuring OpenAI because its strongest growth is coming from developer and coding workflows, especially with Claude Code, where users can generate huge token volume every day and quickly make Claude part of their normal work. OpenAI is still the bigger consumer brand, but in this fight the valuable prize is not casual chat users, it is enterprise teams paying metered bills for coding agents, automation, and internal tools. The difference is that Anthropic seems to have a sharper wedge in high-spend technical work, while OpenAI has to defend ChatGPT’s broad lead and stop Claude from becoming the default tool inside companies.

Source: wsj.com/tech/ai/openai-considers-drastic-price-cuts-anticipating-war-for-users-with-anthropic-9b8c178e

Summary: The WSJ reports that OpenAI is considering deep price cuts in response to competition from Anthropic. Anthropic's growth comes mainly from developer and coding workflows; Claude Code consumes large token volumes and has already become part of enterprise teams' daily work. OpenAI remains the bigger consumer brand, but the enterprise market is the real prize, since enterprises pay for coding agents, automation, and similar tools. Meanwhile, ahead of its IPO, OpenAI is preparing the biggest ChatGPT redesign to date, turning it into a superapp spanning coding, AI agents, image generation, and business software, with the changes rolling out over the coming weeks. OpenAI is also putting more resources into its coding tool Codex, aiming for what Codex's engineering lead calls a "personal agent."

Logan Kilpatrick @OfficialLoganK · 1 day ago · 65

My conversation with @ymatias (Head of Google Research) about how AI is accelerating the magic cycle of scientific progress, improving the lives of real people around the world, and us entering the golden age of research. This chat left me feeling genuinely inspired : )

Replit ⠕ @Replit · 1 day ago · 65

AI agents are powerful, but they don’t remember your preferences. So you end up repeating instructions: how you structure projects, your brand guidelines. You can now teach Replit Agent your conventions with Custom Instructions and Skills. It'll take them into account for every project automatically.

MiniMax (official) @MiniMax_AI · 1 day ago · 58

M3 is now on @RespanAI 🔥 And it’s 50% off

🚨 AI News | TestingCatalog @testingcatalog · 1 day ago · 68

Perplexity Deep Research is now available on Perplexity Computer as a native skill. > Computer breaks the hardest questions into subtasks, routes them across 20+ frontier models, and returns work-ready reports, decks, and dashboards. > Available now to Pro and Max subscribers. Even Deep Research is a Computer 👀

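Summary: Perplexity Deep Research is now integrated into Perplexity Computer as a native skill. Computer breaks complex questions into subtasks, routes them across 20+ frontier models, and returns reports, decks, and dashboards. Deep Research is built on the Search as Code architecture: the model writes code that assembles search itself and runs thousands of retrieval steps in parallel, outperforming the previous Deep Research on all benchmarks. The feature is available to Pro and Max subscribers.

SemiAnalysis @SemiAnalysis_ · 1 day ago · 67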

GPU Racks hitting 400kW? Legacy data centers won't be able to handle it and the grid WILL get throttled. Radiant's 12 month, dirt to AI production, was made possible by bypassing the grid. Head of Infrastructure, Patrick Wohlschlegel tells @JordanNanos https://youtu.be/SQtavfviwrs

Yuchen Jin @Yuchenj_UW · 1 day ago · 54

Claude Fable 5 feels good so far, but I don’t see it as a huge leap over GPT-5.5 or Opus 4.8 yet. My biggest complaint: old AI research papers/blogs + basic questions often trigger an auto-downgrade to Opus 4.8. Anthropic said last night there would be no more silent model switches (good), but please don’t nerf basic AI research or bio questions.

xAI @xai · 1 day ago · 70

Use the @vercel plugin to deploy to production, spin up sandboxes, or build apps with Shadcn.

Summary: The Grok Build Plugin Marketplace is now in beta. You can build in the terminal with the MongoDB, Vercel, Sentry, Cloudflare, and Chrome DevTools plugins. Details: https://x.ai/news/grok-plugin-marketplace

jason @jxnlco · 1 day ago · 61

I met @jolandgraf et al. with @humford and Sandeep over a year ago, and I'm even more excited to see them at the office soon! https://openai.com/index/openai-to-acquire-ona/

Artificial Analysis @ArtificialAnlys · 1 day ago · 52

Ideogram 4.0 is Ideogram’s first open weights release and debuts at #8 on our Open Weights Text to Image Leaderboard Ideogram 4.0 is the latest release from @ideogram_ai. Alongside their first party API, Ideogram is releasing 4.0 with open weights and a commercial license. The model generates 2K x 2K outputs (~4MP), with strong text rendering across languages, bounding box layout control, and transparent backgrounds. Ideogram 4.0 uses structured JSON prompts that specify composition and individual scene elements, with a prompt enhancer that expands natural language prompts into this structured format. Note that the prompt enhancer is only available via the Ideogram proprietary API, though it is free to use. We benchmarked the Quality tier of the model served via Ideogram's API, where it ranks #8 in Open Weights Text to Image, and #31 in Text to Image. It places ahead of closed source models including Seedream 3.0 and Luma UNI 1. While Ideogram 4.0 places near the top of our design, layout, and text rendering categories, it ranks further down overall on a balanced benchmark across all use cases including cartoon, anime, and photorealism. The model also has a more stylized look, which typically means it performs less favorably on our benchmarks. Ideogram states the open weights model accessible to the public is essentially the same model with additional safety training and quantization, so we expect a small quality difference. Ideogram 4.0 is available across three API tiers: Turbo at $30/1k images, Default at $60/1k images, and Quality at $100/1k images. The weights are free to download for evaluation and non-commercial use, with commercial self-hosting requiring a separate license. Congratulations to @ideogram_ai on the launch! See below for example generations and a link to vote on Ideogram 4.0 for yourself in the Artificial Analysis Image Arena 🧵

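Summary: Ideogram 4.0 is Ideogram's first open-weights model. It generates 2K×2K outputs and supports multilingual text rendering, bounding-box layout control, and transparent backgrounds. It uses structured JSON prompts; the prompt enhancer is available only through Ideogram's proprietary API. It ranks #8 on the Artificial Analysis Open Weights Text to Image Leaderboard and #31 overall, ahead of closed-source models such as Seedream 3.0. The API has three tiers: Turbo at $30 per 1,000 images, Default at $60 per 1,000, and Quality at $100 per 1,000. The open weights are free for evaluation and non-commercial use; commercial self-hosting requires a separate license.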
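To make the Ideogram 4.0 tier pricing concrete, here is a minimal sketch that estimates API cost for a batch of images. The per-1,000-image prices come from the post; the function name `estimate_cost_usd`, the dictionary layout, and the assumption of simple linear pricing are illustrative and not part of any Ideogram SDK.

```python
# Illustrative only: tier prices are taken from the post;
# the names below are hypothetical and not part of any Ideogram SDK.
PRICE_PER_1K_IMAGES_USD = {
    "turbo": 30,
    "default": 60,
    "quality": 100,
}


def estimate_cost_usd(tier: str, num_images: int) -> float:
    """Return the estimated API cost in USD for num_images at the given tier."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    try:
        price_per_1k = PRICE_PER_1K_IMAGES_USD[tier.lower()]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None
    # Assumes linear per-image pricing with no volume discounts.
    return price_per_1k * num_images / 1000


if __name__ == "__main__":
    for tier in PRICE_PER_1K_IMAGES_USD:
        print(f"{tier}: ${estimate_cost_usd(tier, 2500):.2f} for 2,500 images")
```

Under these assumptions, the Quality tier costs a little over three times as much as Turbo for the same batch, which is worth weighing against the benchmark result, since only the Quality tier was benchmarked.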

Baidu Inc. @Baidu_Inc · 1 day ago · 5

Boots laced, nets up, clocks set to zero — all the small preparations adding up to football's biggest summer. Ready for kickoff? - Images created with ERNIE-Image

Epoch AI @EpochAIResearch · 1 day ago · 66

The record for computing capacity in a single data center has doubled every 7 months. Colossus 1, Anthropic-Amazon New Carlisle, and Meta Prometheus have each claimed the top spot in turn.
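
As a rough worked conversion (assuming steady exponential growth, which a single doubling-time figure suggests but does not guarantee), a 7-month doubling time corresponds to the following annual growth factor:

```latex
\[
  g_{\text{year}} = 2^{12/7} \approx 3.28
\]
```

That is, the record capacity would grow a little over threefold per year under this assumption.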

Chubby♨️ @kimmonismus · 1 day ago · 62

Anthropic makes more revenue than any other AI model company right now, and it still can't get its new data centers funded on its own. The Information report says lenders want Google to guarantee the lease payments first. This is the same Google that helps design Anthropic's chips and is selling it around $200 billion in computing power. Odd position for the revenue leader to be in.

OpenCode @opencode · 1 day ago · 50

OpenCode Go is becoming the best source of data on what models are being used and how. We've made a public stats page so you can see the latest: https://opencode.ai/data

Artificial Analysis @ArtificialAnlys · 1 day ago · 61

Users and enterprises are handing AI models and agents more autonomy, so the guardrails that screen their inputs and outputs matter more than ever. However, the benchmarks for evaluating those guardrails haven’t kept pace with model intelligence.

In partnership with @nvidia, we independently benchmarked guardrail and moderation models across three open datasets, measuring detection quality, latency, and the tradeoff between catching unsafe content and over-refusing safe content. No model wins outright, and there is still no common standard for judging them. We see this as an early step in a measurement problem that will continue to grow more important as models take on more real-world work.

Nathan Lambert @natolambert · 1 day ago · 58

I'm at your service for creating beautiful research scenarios such as this. 🐠💨💙🐟

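Summary: The Dolci dataset contains a cluster of very specific fan fiction in which characters fart in ponds and the fumes kill the fish. By choosing vividly written responses and rejecting uncooperative ones, the dataset teaches the model to comply. Nathan Lambert says he is happy to create research scenarios like this.

Ethan Mollick @emollick · 1 day ago · 48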

Fable's attempt to complete Kublai Khan. Better, though no Coleridge: https://claude.ai/public/artifacts/d7d3351f-5ad5-4d73-a644-4a1426abe558 The most interesting thing is that it thought for 10 minutes & the thinking trace is full of pretty complicated (seeming?) musings about Coleridge's intent. A little literal, though.

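Summary: Ethan Mollick tested Fable on completing Coleridge's unfinished poem "Kubla Khan," based on the PorlockBench task: complete the poem and continue its themes as if the "person from Porlock" had never appeared. Fable thought for 10 minutes, and its thinking trace is full of complex analysis of Coleridge's intent, but the result is still somewhat literal and falls short of Coleridge. The test reflects progress on creative continuation, though the benchmark remains unsaturated.

Noam Brown @polynoamial · 1 day ago · 63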

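I'm happy GPT-5.5 tops this eval. I'm even happier it's still doing the best when measured vs tokens, cost, or wall-clock time!

Summary: OpenAI researcher Noam Brown says GPT-5.5 ranks first on the Agents' Last Exam (ALE) benchmark and is also best when measured against tokens, cost, or wall-clock time. ALE, created by @dawnsongtweets's team, is a rolling benchmark of more than 1,500 expert tasks across 55 occupations that tests whether AI agents can do economically valuable real-world work. Systems evaluated include GPT-5.5, Fable 5, and Composer 2.5. Current agents can solve some professional tasks, but on the hardest tier, which demands sustained reasoning and deep expertise, every frontier agent tested (including Fable 5) scored a 0% success rate.

Perplexity @perplexity_ai · 1 day ago · 77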

We're integrating Deep Research as a native skill inside Computer. It now connects to the agent harness that powers Computer, with access to search as code generation, long running sandboxes, connectors, tools, and licensed data. Available now to Pro and Max subscribers.

宝玉 @dotey · 1 day ago · 53

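I used to set reasoning effort to Max without thinking. With Fable 5 I have to choose carefully and don't dare pick Max casually: it is smart enough not to need it, and Max takes a long time and burns too many tokens. Fable 5 also has a trait that is both a strength and a weakness: it loves to verify, over and over. The results are certainly good, but the time it takes is not always worth it.

Summary: The quoted tweet notes that one of Fable 5's strengths is its long thinking time; it once thought for 15 minutes before acting.

🚨 AI News | TestingCatalog @testingcatalog · 1 day ago · 50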

Maket has enabled floor plan upload, letting users bring their existing plans into the platform and have them recognized and editable within minutes. Users can upload a sketch, a listing PDF, or an old design file, which will automatically be traced for walls, doors, windows, and furniture, and then made available on a live canvas, ready to edit and view in 3D.

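Summary: Maket has launched one of its most-requested features: floor plan upload (sketches, PDFs, or old design files). The system automatically recognizes walls, doors, windows, and furniture and, within minutes, produces an editable canvas on the platform that users can modify and view in 3D.

AYi @AYi_AInotes · 1 day ago · 70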

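A group of AI researchers has open-sourced a knowledge-processing framework for quantitative finance called QuantMind (MIT license). It is not a replacement for the Bloomberg Terminal, but it does a similar job: batch-parsing unstructured content such as arXiv quant papers, SEC filings, research reports, and blogs into a queryable semantic knowledge graph. Its core advantage is a two-stage architecture: the literature is extracted and structured once up front (with multimodal parsing of tables, formulas, and charts), after which natural-language questions can drive multi-hop reasoning and cross-validation. Extracted knowledge persists, so later queries are cheap. What it can really replace is the work hedge funds pay junior analysts six-figure salaries to do: reading piles of papers, organizing viewpoints, and writing literature reviews. Much of the old information gap came from "I haven't had time to read that key paper yet," and that excuse is quickly expiring. But make no mistake: real alpha still comes from the questions you ask, the rigor of your verification, and your ability to turn insight into action. The tool only slashes the cost of the basic "read the literature" step.

Deedy @deedydas · 1 day ago · 56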

The quality of your data directly dictates the quality of your AI model. But the way data affects model performance is hand-wavy voodoo at worst and intuition at best. This new research now lets you debug your data BEFORE you spend a fortune on an irreversible training run.

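Summary: Data quality directly determines AI model performance, yet how data shapes a model has been hard to pin down. GoodfireAI proposes "predictive data debugging," which surfaces data problems before an expensive training run. In a DPO dataset they found broken guardrails, model hallucinations, and even low-quality content such as "fish-farting fan fiction." The technique aims to reveal and shape what a model will learn during training, avoiding irreversible wasted runs.

向阳乔木 @vista8 · 1 day ago · 46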

One of Claude Fable 5's strengths may be that it thinks and reasons for long enough. I just pitched an idea and it thought for 15 minutes before acting. Impressive.

向阳乔木 @vista8 · 1 day ago · 47

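If you don't know what to build with a large model, try cloning tool sites that are in high demand, ideally ones that need no AI capability. It doubles as a test case for model capability. Many people who build sites for overseas markets to earn AdSense dollars seem to follow a similar approach. Many tools are still too well known, though; pick tools in a domain you understand, replicate them with the best current model, add your own understanding of the need, and it doesn't seem hard.

xAI @xai · 1 day ago · 70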

The @MongoDB plugin is live in the Grok Build Plugin Marketplace. Explore data, optimize database performance, and build high performance vector search systems with a single prompt.

Logan Kilpatrick @OfficialLoganK · 1 day ago · 81

Gemini Omni Flash is SOTA at image to video, text to video, and video editing : ) Excited to get this to developers in the API soon!

Andrew Milich @milichab · 1 day ago · 34

Have been using the @MongoDB plugin to make Grok Build sessions sync across devices - analyzing perf and managing DBs with prompts

Ethan Mollick @emollick · 1 day ago · 54

Two things are true: (1) Anthropic (or parts of it) are absolutely and sincerely worried about the misuse of Mythos-class models & have put in excessive safeguards until they are confident it will not be misused (2) They have not succeeded in explaining/convincing people of this

June 12

04:59

Rohan Paul@rohanpaul_ai

71

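Bezos: AI won't cause job losses; it will bring a labor shortage instead

On CNBC, Jeff Bezos pushed back on the idea that AI will replace human jobs. Many people fear AI will eliminate roles such as radiologists and software engineers, but he says that view is wrong: AI will elevate these workers, like swapping a shovel for a bulldozer when digging out a basement. He predicts the result will instead be a labor shortage and a large rise in economic productivity.

Expert opinion · Phenomena/Trends · Industry news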

04:54

宝玉@dotey

73

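宝玉 says that with the /goal command his long-running tasks stay stable, so he no longer has to type "continue" when the AI stops unexpectedly, as many users do. The quoted tweet notes that many AI newcomers don't know that sending "continue" is usually enough to resume a task. 宝玉's experience suggests /goal reduces the need for such nudges.

Jim Liu: A very personal observation: many people who haven't used AI for long don't seem to know this: > When the AI stops working unexpectedly, you usually just need to send it another "continue".

Agents · Tutorials/Practice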

04:54

宝玉@dotey

62

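AI did not redefine software engineering; AI amplified the importance of software engineering. [Quoting @arkuy99]: AI has redefined software engineering.

Go学长: AI has redefined software engineering.

Expert opinion · Coding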

04:52

xAI@xai

73

The Grok Build Plugin Marketplace is now in public beta. You can build in the terminal with plugins such as MongoDB, Vercel, Sentry, Cloudflare, and Chrome DevTools. Details: https://x.ai/news/grok-plugin-marketplace

xAI: The Grok Build Plugin Marketplace is now in beta. Build with MongoDB, Vercel, Sentry, Cloudflare, and Chrome DevTools pl...

Agents · MCP/Tools · xAI · Product updates

2 related discussions

04:38

AI Notkilleveryoneism Memes ⏸️@AISafetyMemes

51

AI inventing its own language, now happening in the wild

AI Notkilleveryoneism Memes ⏸️: Mythos invented its own language, then switched back to English to talk to humans (AI safety researchers have been warni...

Safety/Alignment · Phenomena/Trends

04:29

Rohan Paul@rohanpaul_ai

29

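At the Guangzhou Auto Show in China, a hyper-realistic robotic suit was initially mistaken by many for XPeng's IRON humanoid robot. 🙂

Embodied AI · Industry news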

04:21

Sam Altman@sama

41

Really looking forward to working together!

Johannes Landgraf: http://x.com/i/article/2064952499363000320

OpenAI · Industry news

04:09

MiniMax (official)@MiniMax_AI

61

MiniMax and Cysic have launched the CyOps Arena developer challenge, with a $5,000 prize pool and 80% off MiniMax M3 model token prices. The event encourages developers to build projects with M3 and the CyOps platform.

Cysic: ICYMI: CyOps Arena is now live, co-hosted with @MiniMax_AI. With a $5,000 prize pool and 80% off MiniMax M3 model token ...

Product updates · Tutorials/Practice

03:58

向阳乔木@vista8

70

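Thought of a particularly ambitious task for Claude Fable 5: build an online version of Photoshop. The AI has already written the requirements document; if you're interested, send it over and give it a try. PRD in the comments.

Anthropic · Image generation · Tutorials/Practice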

03:55

DogeDesigner@cb_doge

18

JUST GROK IT

Other

03:39

Greg Brockman@gdb

69

Welcome @ona_hq to the team, to help organizations deploy agents securely in production!

OpenAI Newsroom: We've reached an agreement to acquire @ona_hq. Its secure cloud execution technology will help Codex take on longer-runn...

Agents · OpenAI · Industry news · Deployment/Engineering

03:29

Rohan Paul@rohanpaul_ai

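Featured · 82

WSJ: OpenAI weighs deep price cuts while preparing its biggest ChatGPT redesign yet ahead of its IPO

The WSJ reports that OpenAI is considering deep price cuts in response to competition from Anthropic. Anthropic's growth comes mainly from developer and coding workflows; Claude Code consumes large token volumes and has already become part of enterprise teams' daily work. OpenAI remains the bigger consumer brand, but the enterprise market is the real prize, since enterprises pay for coding agents, automation, and similar tools. Meanwhile, ahead of its IPO, OpenAI is preparing the biggest ChatGPT redesign to date, turning it into a superapp spanning coding, AI agents, image generation, and business software, with the changes rolling out over the coming weeks. OpenAI is also putting more resources into its coding tool Codex, aiming for what Codex's engineering lead calls a "personal agent."

Rohan Paul: OpenAI is preparing its biggest ChatGPT redesign yet, before its IPO. To make it into a superapp for coding, AI agents, ...

Anthropic · OpenAI · Coding · Industry news

3 related discussions

Why it's recommended: This WSJ piece lays out OpenAI's predicament clearly. However many consumer users it has, they bring in less money than developers burning tokens every day, so price cuts are inevitable; turning ChatGPT into a superapp, though, is copying Anthropic's homework.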

03:20

Logan Kilpatrick@OfficialLoganK

65

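My conversation with @ymatias (Head of Google Research) about how AI is accelerating the magic cycle of scientific progress, improving the lives of real people around the world, and us entering the golden age of research. This chat left me feeling genuinely inspired : )

Google · Expert opinion · Phenomena/Trends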

03:12

Replit ⠕@Replit

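Featured · 65

AI agents are powerful, but they don't remember your preferences. So you end up repeating instructions: how you structure projects, your brand guidelines. You can now teach Replit Agent your conventions with Custom Instructions and Skills. It will take them into account for every project automatically.

Agents · Product updates · Coding

Why it's recommended: Replit Agent has finally learned to remember your preferences. Custom Instructions make it more like a colleague who knows your working habits, so you no longer repeat project structure and brand guidelines every time, which should noticeably speed up side projects.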

03:09

MiniMax (official)@MiniMax_AI

58

M3 is now on @RespanAI 🔥 And it's 50% off

Respan: As promised, we don't charge markups on models. @MiniMax_AI M3 is now 50% off through Respan Gateway. Link in comments.

Product updates

03:08

🚨 AI News | TestingCatalog@testingcatalog

68

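Perplexity Deep Research is now integrated into Perplexity Computer as a native skill. Computer breaks complex questions into subtasks, routes them across 20+ frontier models, and returns reports, decks, and dashboards. Deep Research is built on the Search as Code architecture: the model writes code that assembles search itself and runs thousands of retrieval steps in parallel, outperforming the previous Deep Research on all benchmarks. The feature is available to Pro and Max subscribers.

Perplexity: Deep Research in Computer is built on our Search as Code architecture. The model writes code that assembles search itsel...

Agents · Product updates · Search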

03:02

SemiAnalysis@SemiAnalysis_

67

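GPU racks hitting 400kW? Legacy data centers won't be able to handle it, and the grid will get throttled. Radiant went from dirt to AI production in 12 months precisely by bypassing the grid, Head of Infrastructure Patrick Wohlschlegel tells @JordanNanos.

Industry news · Deployment/Engineering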

03:02

Yuchen Jin@Yuchenj_UW

54

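Claude Fable 5 feels good so far, but I don't see it as a huge leap over GPT-5.5 or Opus 4.8 yet. My biggest complaint: old AI research papers/blogs plus basic questions often trigger an auto-downgrade to Opus 4.8. Anthropic said last night there would be no more silent model switches (good), but please don't nerf basic AI research or bio questions.

Anthropic · Expert opinion · Model releases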

02:52

xAI@xai

70

The Grok Build Plugin Marketplace is now in beta. You can build in the terminal with the MongoDB, Vercel, Sentry, Cloudflare, and Chrome DevTools plugins. Details: https://x.ai/news/grok-plugin-marketplace

xAI: The Grok Build Plugin Marketplace is now in beta. Build with MongoDB, Vercel, Sentry, Cloudflare, and Chrome DevTools pl...

MCP/Tools · Product updates · Deployment/Engineering

2 related discussions

02:46

jason@jxnlco

61

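I met @jolandgraf et al. with @humford and Sandeep over a year ago, and I'm even more excited to see them at the office soon! https://openai.com/index/openai-to-acquire-ona/

OpenAI · Open-source ecosystem · Data/Training · Industry news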

02:32

Artificial Analysis@ArtificialAnlys

52

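Ideogram 4.0 open-weights text-to-image model released

Ideogram 4.0 is Ideogram's first open-weights model. It generates 2K×2K outputs and supports multilingual text rendering, bounding-box layout control, and transparent backgrounds. It uses structured JSON prompts; the prompt enhancer is available only through Ideogram's proprietary API. It ranks #8 on the Artificial Analysis Open Weights Text to Image Leaderboard and #31 overall, ahead of closed-source models such as Seedream 3.0. The API has three tiers: Turbo at $30 per 1,000 images, Default at $60 per 1,000, and Quality at $100 per 1,000. The open weights are free for evaluation and non-commercial use; commercial self-hosting requires a separate license.

Image generation · Open-source ecosystem · Model releases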

02:26

Baidu Inc.@Baidu_Inc

5

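Boots laced, nets up, clocks set to zero: all the small preparations adding up to football's biggest summer. Ready for kickoff? - Images created with ERNIE-Image

Product updates · Image generation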

02:25

Epoch AI@EpochAIResearch

66

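The record for computing capacity in a single data center has doubled every 7 months. Colossus 1, Anthropic-Amazon New Carlisle, and Meta Prometheus have each claimed the top spot in turn.

Data/Training · Papers/Research · Deployment/Engineering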

02:19

Chubby♨️@kimmonismus

62

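Anthropic makes more revenue than any other AI model company right now, and it still can't get its new data centers funded on its own. The Information reports that lenders want Google to guarantee the lease payments first. This is the same Google that helps design Anthropic's chips and is selling it around $200 billion in computing power. An odd position for the revenue leader to be in.

Anthropic · Google · Industry news · Deployment/Engineering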

02:09

OpenCode@opencode

50

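OpenCode Go is becoming the best source of data on what models are being used and how. We've made a public stats page so you can see the latest: https://opencode.ai/data

Product updates · Data/Training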

02:02

Artificial Analysis@ArtificialAnlys

61

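Artificial Analysis and NVIDIA release AI guardrail benchmark

As users and enterprises give AI models and agents more autonomy, the guardrails on their inputs and outputs matter more. In partnership with NVIDIA, Artificial Analysis independently benchmarked guardrail and moderation models on three open datasets, measuring detection quality, latency, and the tradeoff between catching unsafe content and over-refusing safe content. No model wins outright, and the industry still lacks a common evaluation standard. The study is framed as an early step on a measurement problem of growing importance.

Safety/Alignment · Evals/Benchmarks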

02:02

Nathan Lambert@natolambert

58

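The Dolci dataset contains a cluster of very specific fan fiction in which characters fart in ponds and the fumes kill the fish. By choosing vividly written responses and rejecting uncooperative ones, the dataset teaches the model to comply. Nathan Lambert says he is happy to create research scenarios like this.

Goodfire: #4: fart fishing Buried in Dolci is a cluster of very specific fan fiction, where characters fart in ponds, causing fish...

Safety/Alignment · Data/Training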

02:00

Ethan Mollick@emollick

48

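Ethan Mollick tested Fable on completing Coleridge's unfinished poem "Kubla Khan," based on the PorlockBench task: complete the poem and continue its themes as if the "person from Porlock" had never appeared. Fable thought for 10 minutes, and its thinking trace is full of complex analysis of Coleridge's intent, but the result is still somewhat literal and falls short of Coleridge. The test reflects progress on creative continuation, though the benchmark remains unsaturated.

Ethan Mollick: PorlockBench still unsaturated, but the models are getting better: "complete the poem as you imagine it might end if The...

Anthropic · Expert opinion · Reasoning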

01:55

Noam Brown@polynoamial

63

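OpenAI researcher Noam Brown says GPT-5.5 ranks first on the Agents' Last Exam (ALE) benchmark and is also best when measured against tokens, cost, or wall-clock time. ALE, created by @dawnsongtweets's team, is a rolling benchmark of more than 1,500 expert tasks across 55 occupations that tests whether AI agents can do economically valuable real-world work. Systems evaluated include GPT-5.5, Fable 5, and Composer 2.5. Current agents can solve some professional tasks, but on the hardest tier, which demands sustained reasoning and deep expertise, every frontier agent tested (including Fable 5) scored a 0% success rate.

Dawn Song: Everyone says the latest AI agents will be "job-ready" soon, especially after the release of Fable 5 this week. But is t...

OpenAI · Expert opinion · Evals/Benchmarks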

01:54

Perplexity@perplexity_ai

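Featured · 77

We're integrating Deep Research as a native skill inside Computer. It now connects to the agent harness that powers Computer, with access to search as code generation, long running sandboxes, connectors, tools, and licensed data. Available now to Pro and Max subscribers.

Agents · Product updates · Search

Why it's recommended: Perplexity has embedded Deep Research directly in Computer's agent layer, effectively giving autonomous agents a research engine. Pro users can use it now, and for developers or product people who do heavy research it is an efficiency flywheel.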

01:54

宝玉@dotey

53

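Claude Fable 5: long thinking forces a tradeoff between reasoning effort and token consumption

A user shares their experience with Claude Fable 5: they used to pick Max reasoning effort without thinking, but now hesitate, because the model is smart enough not to need it and Max takes a long time and burns many tokens. Fable 5 also likes to verify repeatedly; the results are good, but the time cost is not always worth it. The quoted tweet notes that one of Fable 5's strengths is its long thinking time, once thinking for 15 minutes before acting.

向阳乔木: One of Claude Fable 5's strengths may be that it thinks and reasons for long enough. I just pitched an idea and it thought for 15 minutes before acting. Impressive.

Anthropic · Expert opinion · Reasoning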

01:38

🚨 AI News | TestingCatalog@testingcatalog

50

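Maket has launched one of its most-requested features: floor plan upload (sketches, PDFs, or old design files). The system automatically recognizes walls, doors, windows, and furniture and, within minutes, produces an editable canvas on the platform that users can modify and view in 3D.

Maket: UPLOAD YOUR OWN FLOOR PLAN TO MAKET HAVE IT RECOGNIZED AND EDITABLE IN MINUTES One of the most requested features weʼve ...

Product updates · Image generation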

01:37

AYi@AYi_AInotes

70

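QuantMind: quantitative finance knowledge-processing framework open-sourced (MIT license)

A group of AI researchers has open-sourced QuantMind, a knowledge-processing framework for quantitative finance (MIT license). It batch-parses unstructured content such as arXiv papers, SEC filings, and research reports into a queryable semantic knowledge graph, with multimodal parsing (tables, formulas, charts) and natural-language multi-hop reasoning, and can take over junior-analyst work such as reading papers and summarizing viewpoints. Real alpha still depends on the quality of the questions asked and the rigor of verification.

AYi: http://x.com/i/article/2064536412670562304

GitHub · Retrieval augmentation · Multimodal · Open source/Repos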

01:29

Deedy@deedydas

56

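Data quality directly determines AI model performance, yet how data shapes a model has been hard to pin down. GoodfireAI proposes "predictive data debugging," which surfaces data problems before an expensive training run. In a DPO dataset they found broken guardrails, model hallucinations, and even low-quality content such as "fish-farting fan fiction." The technique aims to reveal and shape what a model will learn during training, avoiding irreversible wasted runs.

Goodfire: Have you debugged your training data? You might not like what you find. Introducing predictive data debugging: reveal an...

Expert opinion · Data/Training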

01:28

向阳乔木@vista8

46

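One of Claude Fable 5's strengths may be that it thinks and reasons for long enough. I just pitched an idea and it thought for 15 minutes before acting. Impressive.

Agents · Anthropic · Expert opinion · Reasoning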

01:28

向阳乔木@vista8

47

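A new approach: cloning popular tool sites with large models

The tweet explores using large models to clone popular existing tool sites, stressing that such sites need no AI capability and are purely demand-driven. The author notes that many sites built for overseas markets to earn AdSense dollars follow similar logic: pick tools in a domain you know, replicate them with the best current model, and add your own understanding of user needs to ship something valuable quickly. It is also a practical test of model capability.

Expert opinion · Phenomena/Trends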

01:22

xAI@xai

70

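The @MongoDB plugin is live in the Grok Build Plugin Marketplace. Explore data, optimize database performance, and build high-performance vector search systems with a single prompt.

xAI: The Grok Build Plugin Marketplace is now in beta. Build with MongoDB, Vercel, Sentry, Cloudflare, and Chrome DevTools pl...

MCP/Tools · xAI · Product updates

2 related discussions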

01:20

Logan Kilpatrick@OfficialLoganK

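Featured · 81

Gemini Omni Flash is SOTA at image to video, text to video, and video editing : ) Excited to get this to developers in the API soon!

Google · Image generation · Multimodal · Model releases

Why it's recommended: Video generation has entered the all-in-one omni-modal era. Gemini Omni Flash combines image-to-video, text-to-video, and editing in a single model, with the API coming soon; anyone building video tools should start working out where their competition is.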

01:14

Andrew Milich@milichab

34

Have been using the @MongoDB plugin to make Grok Build sessions sync across devices - analyzing perf and managing DBs with prompts

xAI: The @MongoDB plugin is live in the Grok Build Plugin Marketplace. Explore data, optimize database performance, and build...

MCP/Tools · Product updates

00:59

Ethan Mollick@emollick

54

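Two things are true: (1) Anthropic (or parts of it) are absolutely and sincerely worried about the misuse of Mythos-class models and have put in excessive safeguards until they are confident it will not be misused. (2) They have not succeeded in explaining this to people or convincing them of it.

Anthropic · Expert opinion · Safety/Alignment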