AIHOT


Rohan Paul @rohanpaul_ai · June 3 · 63

AI can explain science better than it can forecast science. Across 4,760 scientific events, the models were much better at recognizing possible research paths than forecasting actual outcomes. Models often recognize a plausible research idea when the answer is already nearby, especially in multiple-choice form. But they are much weaker at the harder thing: predicting whether a discovery will actually happen, when it will happen, and what method will make it work. That means the models are still much better at hindsight than foresight. When asked whether a scientific claim will actually be realized, the models hover near chance, and when asked when progress will arrive, they systematically push it too far into the future. Even when the authors gave models extra older information, the models improved a bit but still did not become reliable at predicting future scientific progress. So having lots of scientific knowledge inside a model does not automatically make it a good scientific forecaster. ---- Paper Link – arxiv.org/abs/2605.22681 Paper Title: "Forecasting Scientific Progress with AI"


elvis @omarsar0 · June 3 · 38

Code is all you need! Search as Code. Harness as Code. What's next?


Ethan Mollick @emollick · June 3 · 40

The everything apps still look a lot like hybrids between chatbots and IDEs, rather than something built for general knowledge work. Too much assuming linearity & that final outputs are the only goal, too little connection to research, not enough chances to steer or select, etc.


Microsoft Research @MSFTResearch · June 3 · 72

Weather forecasts thousands of times faster than traditional supercomputers. Hear from Kenji Takeda on Aurora at the Microsoft Research Lab at #MSBuild. Learn more: https://msft.it/6018vjGUA


Anthropic @AnthropicAI · June 3 · 69

This Executive Order is an important step in strengthening America’s leadership in AI. We look forward to collaborating with the White House to support its implementation. https://www.whitehouse.gov/presidential-actions/2026/06/promoting-advanced-artificial-intelligence-innovation-and-security/


Google AI Developers @googleaidevs · June 3 · 74

Building autonomous agents for scientific discovery? 🧬🤖 @GoogleDeepMind Science Skills is now available on GitHub. We've open-sourced this specialized toolkit to accelerate your agentic workflows with scientific grounding and higher token efficiency. Download now ↓ https://github.com/google-deepmind/science-skills


NotebookLM @NotebookLM · June 3 · 58

Notice anything different about the NotebookLM mobile app recently? 😉 Well, we’re excited to REPORT that you can now create briefing docs, study guides, and blog posts on-the-go! 📱✨ Are there any other report formats you'd want specifically for mobile? Let us know!


SemiAnalysis @SemiAnalysis_ · June 3 · 53

Cerebras did what the industry calls impossible: turned an entire 46,225mm² wafer into one chip. Defects on silicon that big are inevitable, so they built in redundancy and custom per-batch masks that route around every bad core, landing near 100% usable wafers. The results: 900,000 cores and 44GB of SRAM on a single piece of silicon, no packaging, no off-chip hops. And they're not stopping there, now exploring hybrid bonding a DRAM wafer on top for even more fast memory. (1/4) 🧵
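The figures in the post above allow a quick back-of-the-envelope check. The sketch below assumes a square die, decimal gigabytes, and SRAM shared evenly across cores; none of these assumptions is stated in the post, so the per-core numbers are rough estimates only.

```python
import math

# Figures quoted in the post.
WAFER_AREA_MM2 = 46_225
CORES = 900_000
SRAM_BYTES = 44 * 10**9  # assumption: "44GB" means decimal gigabytes

# Assumption: the chip is square, so the side is the square root of the area.
side_mm = math.isqrt(WAFER_AREA_MM2)

# Assumption: area and SRAM are divided evenly across all cores.
area_per_core_mm2 = WAFER_AREA_MM2 / CORES
sram_per_core_kb = SRAM_BYTES / CORES / 1000

print(f"side: {side_mm} mm")
print(f"area per core: {area_per_core_mm2:.4f} mm^2")
print(f"SRAM per core: {sram_per_core_kb:.1f} KB")
```

Under those assumptions the quoted area is an exact square, and each core gets on the order of tens of kilobytes of local SRAM.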


Rohan Paul @rohanpaul_ai · June 3 · 81

Microsoft unveiled MAI-Thinking-1. So Microsoft now has a full in-house pipeline for building stronger reasoning models again and again. Microsoft calls this system a “hill-climbing machine,” meaning it keeps improving the data, training setup, rewards, safety tests, and evaluations as one connected process. Strong for its size, including 97.0% on AIME 2025, 87.7% on LiveCodeBench v6, and 52.8% on SWE-Bench Pro. MAI-Thinking-1 is the first model from that process, using 35B active parameters inside a 1T total parameter mixture-of-experts model, where only part of the model runs for each token. The base model was trained from scratch on 30T mostly human-generated tokens, with Microsoft saying it avoided third-party model distillation during pre-training. After that, the team used reinforcement learning, which means the model practiced tasks and improved from feedback, to teach math reasoning, coding, tool use, helpfulness, and safety.


Emad @EMostaque · June 3 · 17

this is fine 🐶☕️🔥

[Quoting @EMostaque]: My review of Claude Opus 4.8: we should worry less about being turned into paper clips and more about being annoyed to death.

Microsoft Research @MSFTResearch · June 3 · 54

Agentic experiences powered by small models that fit on your own device. Hear from Maya Murad on MagenticLite at the Microsoft Research Lab at #MSBuild.


MiniMax (official) @MiniMax_AI · June 3 · 57

Amazing deep dive from the @togethercompute team on serving MiniMax M3 in production. M3 with its 1M context, native multimodality and MiniMax Sparse Attention requires real work across paged decode, index scoring, and multimodal preprocessing to get it efficient. This is what a partnership at the frontier looks like🤝.


Chubby♨️ @kimmonismus · June 3 · 18

„Everyone hates AI slop“ „We are going to decide: is it vibe, is it slop?“ This sounds like a fun event :D


Chubby♨️ @kimmonismus · June 3 · 36

Ok what? Same Training FLOPs as Gemini 3.1 pro?


Chubby♨️ @kimmonismus · June 3 · 50

Just figured out that „Mai“-1 thinking stands for: Microsoft AI-thinking. 🤯


Rohan Paul @rohanpaul_ai · June 3 · 63

Satya Nadella on Microsoft’s Fairwater data center, an AI superfactory. at today's Microsoft Build 2026 keynote. its vertically designed, two-story AI data center architecture. Instead of spreading compute only across a flat floor, Microsoft can place racks in three dimensions, packing far more GPUs densely while preserving fast network access. This helps the cluster behave more like one massive AI machine, with low latency and high bandwidth between GPUs. The other major point is its cooling efficiency: its cooling loop is filled once and can operate with effectively zero ongoing water consumption, using roughly the annual daily-water equivalent of a single restaurant. ---- From "Microsoft" YouTube channel, (link in comment)


Ethan Mollick @emollick · June 3 · 24

I wish the logos and textbox-at-the-bottom interfaces for Discord and Codex did not look so alike at a glance. I have confused the two a couple of times, leading to a confused GPT-5.5 and a confused groupchat.


Ethan Mollick @emollick · June 3 · 38

It is difficult to know how good MAI-Thinking-1 is from the scores alone (like weirdly low GPQA & Terminal Bench 2.0) But Microsoft makes it really hard to try its models upon release (a general issue with many Microsoft AI products), so I dunno. Stats below Meta Spark, though.


Perplexity @perplexity_ai · June 3 · 58

Two new ways to bring your health data into Perplexity. Perplexity now connects to Apple Health on iPhone. Use your sleep, activity, and HRV data in Computer. Function is now available in Perplexity Health. Add labs and ask about biomarkers, blood draws, or panel results.


Thariq @trq212 · June 3 · 81

http://x.com/i/article/2061850535708483585

# A harness for every task: dynamic workflows in Claude Code

Last week, we released dynamic workflows in Claude Code. Claude can now write its own harness on the fly, custom-built for the task at hand.

While the default Claude Code harness is built for coding, it is also useful for many other types of tasks because, as it turns out, many tasks resemble coding tasks. But there are certain classes of tasks where we have had to build custom harnesses on top of Claude Code to achieve peak performance, such as research, security analysis, agent teams, or code review.

Workflows allow you to dynamically create harnesses that enable Claude to solve all of those problems and more natively inside of Claude Code. You can also share and re-use these workflows with others.

In this article, I’ll cover my initial workflows experiences and learnings so you can take full advantage. That said, best practices are still developing! Dynamic workflows often use more tokens, so think carefully about when and how to use them.

Note: this post is also available on the Claude Blog

## Example prompts

Before diving into the technical details, I’d like to start with some example prompts to get you thinking about the possibilities with workflows:

- "This test fails maybe 1 in 50 runs. Set up a workflow to reproduce it, form theories and adversarially test them in worktrees /goal don't stop until one theory works."
- "Using a workflow, go through my last 50 sessions and mine them for corrections I keep making and turn the recurring ones into CLAUDE.md rules"
- "Use a workflow to dig through #incidents in Slack for the past six months and find recurring root causes where nobody has filed a ticket."
- "Take my business plan and run a workflow where different agents tear it apart from an investor's, a customer's, and a competitor's perspective."
- "Here's a folder of 80 resumes, use a workflow to rank them for the backend role and double-check the top ten. Interview me using the AskUserQuestion tool for a rubric."
- "I need a name for this CLI tool. Use a workflow to brainstorm a bunch of options and run a tournament to pick the top 3."
- "Use a workflow to rename our User model to Account everywhere."
- "Go through my blog post draft and using a workflow verify every technical claim against the codebase, I don't want to ship anything wrong."

## How dynamic workflows work

Dynamic workflows execute a JavaScript file with a few special functions that help spawn and coordinate subagents:

Dynamic workflows also include standard JavaScript functions like JSON, Math, and Array, to help process data.

It’s particularly useful to know that dynamic workflows can decide which models an agent uses and whether subagents are run in their own worktree, allowing Claude to choose the intelligence level and isolation needed.

If a workflow is interrupted, for example by user action or quitting the terminal, resuming the session will allow the workflow to pick up where it left off.

## Why dynamic workflows

When you ask the default Claude Code harness to do a task, it needs to both plan and execute in the same context window. For many coding tasks, this is highly effective, but it can sometimes break down over long-running, massively parallel and/or highly structured adversarial tasks.

This is because the longer Claude works on a complex task in a single context window, the more it becomes susceptible to a few specific failure modes:

- Agentic laziness refers to when Claude stops before finishing a particularly complex, multi-part task and declares the job done after partial progress, for example addressing 20 of the 50 items in a security review.
- Self-preferential bias refers to Claude’s tendency to prefer its own results or findings, especially when asked to verify or judge them against a rubric.
- Goal drift refers to the gradual loss of fidelity to the original objective across many turns, especially after compaction. Each summarization step is lossy, and details like edge-case requirements or "don't do X" constraints can get lost.

Creating a workflow helps combat these by orchestrating separate Claudes with their own context windows and focused, isolated goals.

## Dynamic vs static workflows

You may have previously created a static workflow using the Claude Agent SDK or claude -p to coordinate multiple instances of Claude Code together. But because static workflows need to work for all edge cases, they are usually more generic. With Claude Opus 4.8 and dynamic workflows, Claude is now intelligent enough to write a custom harness tailor-made for your use case.

# Helpful patterns when using dynamic workflows

You can start using dynamic workflows just by asking Claude to make one, or by using the trigger word "ultracode" to ensure that Claude Code creates a workflow. But building a mental model for how dynamic workflows work will help you understand when to use them and how you might nudge Claude via prompts.

There are a few common patterns that Claude might use and compose together when building workflows:

### Classify-and-act

Use a classifier agent to decide on the type of task, and then route to different agents or behavior based on the task. Or, use a classifier at the end to determine output.

### Fan-out-and-synthesize

Split up a task into many smaller steps, run an agent on each step and then synthesize those results. This is particularly useful for when there are a large number of smaller steps, or when each step benefits from its own clean context window so they don't interfere or cross-contaminate. The synthesize step is a barrier: it waits for all the fan-out agents, then merges their structured outputs into one result.

### Adversarial verification

For each spawned agent, run a separate spawned agent to adversarially verify its output against a rubric or criteria.

### Generate-and-filter

Generate a number of ideas on a topic and then filter them by a rubric or by verification, dedupe duplicates and return only the highest quality, tested ideas.

### Tournament

Instead of dividing the work, have agents compete on it. Spawn N agents that each attempt the same task using different approaches. Prompts or models then judge the results in a pairwise fashion using a judging agent until you have a winner.

### Loop until done

For tasks with an unknown amount of work, loop spawning agents until a stop condition is met (no new findings, or no more errors in the logs) instead of a fixed number of passes.

# Use cases

Think creatively of when and how to ask Claude Code to make dynamic workflows. I’ve found that workflows are sometimes even more useful for non-technical work.

## Migrations and refactors

Bun was rewritten from Zig to Rust using workflows. You can read more about how that was done in Jarred’s X thread.

The key is to break down the task into a series of steps that need to be operated on, for example callsites, failing tests, modules, etc. Spin off a subagent for every fix in a worktree to make the fix, then have another agent adversarially review, and merge them.

Consider telling the agent not to use resource intensive commands so that you can maximally parallelize without running out of resources on your machine.

## Deep research

We published a deep research skill (/deep-research) inside Claude Code that uses dynamic workflows. Specifically, it fans out web searches, fetches sources, adversarially verifies their claims, and synthesizes a cited report.

But you may do this sort of research for more than just web searches. For example, asking Claude to compile a status report from context in Slack or to research how a feature works by exploring a codebase in depth.
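The fan-out, adversarial verification, and synthesize steps described above can be sketched in a few lines. This is a minimal illustration, not the workflow API: the article says real workflows are JavaScript files with special functions that are not listed here, so this sketch uses Python and a hypothetical `run_agent` stub that returns canned text instead of calling a model. The `Finding` type and the toy "unsourced" rule are also assumptions made only so the control flow can be run.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Finding:
    question: str
    claim: str
    verified: bool = False


def run_agent(role: str, prompt: str) -> str:
    """Hypothetical stand-in for spawning one subagent with a clean context.

    A real workflow would call a model here; this stub returns canned text.
    """
    if role == "verifier":
        # Toy rule: reject any claim that mentions being unsourced.
        return "reject" if "unsourced" in prompt else "accept"
    return f"answer to {prompt!r}"


def research(question: str) -> Finding:
    # One researcher agent per question.
    return Finding(question, run_agent("researcher", question))


def verify(finding: Finding) -> Finding:
    # A separate agent judges the claim, so the author never grades itself.
    verdict = run_agent("verifier", finding.claim)
    return Finding(finding.question, finding.claim, verified=(verdict == "accept"))


def fan_out_and_synthesize(questions: list[str]) -> str:
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Fan out: run the researchers in parallel, results kept in input order.
        findings = list(pool.map(research, questions))
        # Adversarial verification: check every finding independently.
        checked = list(pool.map(verify, findings))
    # Barrier: both map calls are complete, so all results are available here.
    kept = [f for f in checked if f.verified]
    return "\n".join(f"- {f.question}: {f.claim}" for f in kept)


print(fan_out_and_synthesize(["q1", "unsourced rumor", "q3"]))
```

The point of the structure is that the verifier only ever sees the claim, never the researcher's reasoning, which is one way to avoid the self-preferential bias described earlier.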
## Deep verification

On the other hand, if you have a report where you want to check and source every factual claim that it references, you may want to generate a workflow which has one agent identify all of the factual claims and then spin off a subagent to check each one in detail. You could also have a verification agent check the source subagent to make sure its source is high quality.

## Sorting

You may have a list of items that you want to sort by some qualitative measurement that you believe that Claude Code is good at evaluating, for example: support tickets sorted by severity of the bug. But if you try to sort 1000+ rows in one prompt, quality degrades and it won't fit in context.

Instead run a tournament, a pipeline of pairwise-comparison agents (comparative judgment is more reliable than absolute scoring), or bucket-rank in parallel then merge. Each comparison is its own agent, so the deterministic loop holds the bracket and only the running order stays in context.

## Memory and rule adherence

If you have a particular set of rules that you find Claude misses or struggles with, even when put into the CLAUDE.mds, create a workflow with a list of rules that must be checked by verifier agents, one verifier per rule. Creating a skeptic persona subagent to review the rules to make sure they are in line will help avoid too many false positives.

The reverse direction works too: mine your recent sessions and code review comments for corrections you keep making, cluster them with parallel agents, adversarially verify each candidate (would this rule have prevented a real mistake?), and then distill the survivors back into a CLAUDE.md.

## Root-cause investigation

Debugging works best when you come up with several independent hypotheses and test them, but if you’re only using one context window, Claude can run into self-preferential bias.

A workflow can structurally prevent this by spinning up agents to generate hypotheses from disjoint evidence. For example, separate agents for logs, files, and data. Each hypothesis can then face a panel of verifiers and refuters.

This isn't just for code. Workflows can be used for sales (why did sales drop in March?), data engineering (why did this pipeline fail?), or any post-mortem exercise.

## Triaging at scale

Every team has a support queue, bug reports, or some other backlog that cannot be fully processed by humans. A triage workflow classifies each item, dedupes against what's already tracked, and takes action. This could mean attempting the fix or escalating to a human user.

A useful pattern for triage workflows is quarantine. This involves barring the agents that read untrusted public content from taking high-privilege actions, which are instead done by the agents in charge of acting on the information.

Pair triage workflows with /loop to have Claude do this continuously.

## Exploration and taste

Workflows can be useful when exploring different approaches to a solution, especially when it is taste based, like design or naming, and would benefit from a rubric.

Try asking Claude to explore a bunch of solutions, and give a review agent a rubric for what a good solution looks like. The task is complete when the review agent feels like it has met the criteria. Solutions can also be ordered or selected via a tournament based on the rubric.

## Evals

You can run lightweight evals for particular tasks by spinning off separate agents in a worktree and then spinning off comparison agents to compare and grade the specific outputs against a rubric. For example, evaluating and then refining a skill you’ve created against particular criteria.

## Model and intelligence routing

Create a classifier agent tuned to your tasks that decides which model to use. This can be helpful when your task will involve many tool calls and conducting research prior to execution can identify the best model for the job.

For example, the best model for the task "explain how the auth module works" depends on how many files in the auth module there are and the shape of the codebase. A classifier agent can do this research and then route to Sonnet or Opus based on the expected complexity of the task.

## When not to use dynamic workflows

Workflows are new. While there are many use cases where they will create outsized results, they are not needed for every task and may end up using significantly more tokens. It’s best to use workflows creatively to push Claude Code in ways that you haven’t previously.

For regular coding tasks, ask yourself: does it really need more compute? For example, most traditional coding tasks do not need a panel of 5 reviewers.

# Tips for building dynamic workflows

### Prompting

Detailed prompting, using the specific techniques we described above, creates the best results for dynamic workflows.

Workflows are not just for large tasks. You can prompt the model to use a "quick workflow." For example, you can create a quick adversarial review of an assumption.

### Combine with /goal and /loop

When using workflows that can be repeated, for example triage, research, or verification, pair them with /loop to be run at regular intervals, and /goal to set a hard completion requirement.

### Token usage budgets

You can set explicit token usage budgets for dynamic workflows to limit how many tokens a task uses. You can prompt it with a budget like: "use 10k tokens," which will set the cap.

### Saving and sharing dynamic workflows

You can save workflows by pressing "s" in the workflow menu. You can check these into ~/.claude/workflows or distribute them via a skill.

To share them via a skill, put your JavaScript workflow files in the skill folder and reference them in the SKILL.MD. To allow for more flexibility, you may want to prompt Claude to think of the workflows in the skill as a template instead of a script that needs to be run verbatim.
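The pairwise-comparison idea from the Sorting and Tournament sections can be illustrated with ordinary sorting machinery. In this sketch the `judge` function is a hypothetical stand-in for a judging agent: a real workflow would spawn a model call per comparison, whereas this stub uses an invented keyword table (`SEVERITY_HINTS`) so the example runs on its own. The deterministic loop holds the bracket, and each comparison sees only two items.

```python
from functools import cmp_to_key

# Invented keyword weights standing in for a model's severity judgment.
SEVERITY_HINTS = {"data loss": 4, "crash": 3, "slow": 2, "typo": 1}


def judge(a: str, b: str) -> int:
    """Hypothetical pairwise judge: negative means `a` is more severe."""

    def score(ticket: str) -> int:
        return max((w for k, w in SEVERITY_HINTS.items() if k in ticket), default=0)

    return score(b) - score(a)


def rank(tickets: list[str]) -> list[str]:
    # Full ordering built only from pairwise comparisons.
    return sorted(tickets, key=cmp_to_key(judge))


def tournament_winner(entries: list[str]) -> str:
    # Single-elimination bracket: the loop holds the bracket state.
    if not entries:
        raise ValueError("need at least one entry")
    current = list(entries)
    while len(current) > 1:
        next_round = []
        for i in range(0, len(current) - 1, 2):
            a, b = current[i], current[i + 1]
            next_round.append(a if judge(a, b) <= 0 else b)
        if len(current) % 2:
            next_round.append(current[-1])  # odd entry out gets a bye
        current = next_round
    return current[0]


tickets = ["typo in footer", "app crash on login", "slow search", "data loss on sync"]
print(rank(tickets))
print(tournament_winner(tickets))
```

A full sort costs more comparisons than a bracket, so when only the top item matters (naming a CLI tool, say) the tournament form spends fewer agent calls.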
## A whole new world

Workflows are a helpful new way to extend Claude Code. I encourage you to think of this as a starting point; there's still much to discover in how to use them best. Let us know what you find.

Thariq Shihipar and Sid Bidasaria (@sidbid) are members of technical staff at Anthropic, working on Claude Code.


Thariq @trq212 · June 3 · 69

Workflows are the biggest upgrade to Claude Code’s capabilities since skills and subagents. I dove deep into it with @sidbid to figure out best practices, examples and more. I’m particularly excited about the non-technical tasks it enables for Claude Code.


fofr @fofrAI · June 3 · 29

Playing around a bit with Krea's K2 Large image model. I love how expressive it feels, and the variability you get with each prompt.


ClaudeDevs @ClaudeDevs · June 3 · 73

How do you get Claude Code to check its own work before handing it back? Watch how you can encode your manual checks so Claude closes its own feedback loop:


Artificial Analysis @ArtificialAnlys · June 3 · 49

We’re hosting a Coding Agent Benchmarks event on Thursday, June 11 in San Francisco with lightning talks and a panel discussion with leading AI researchers, builders, and engineers. If you're building coding agents, LLM tooling, or AI infrastructure, we’d love to see you there! Request to join 👇 https://luma.com/i5zotp6c


Runway @runwayml · June 3 · 73

Aleph 2.0 is now available via the Runway API. Bring precise video editing directly into your apps, products and platforms. Edit up to 30 seconds of video at 1080p across multi-shot sequences, changing only what you want. Get started at the link below.


Satya Nadella @satyanadella · June 3 · 31

Great to be back at Microsoft Build today. For us, it is not about any one piece of technology or even the platform. It is about how we can build a frontier intelligence ecosystem together. Sharing some of our big announcements today ...


OpenRouter @OpenRouter · June 3 · 68

Three new @MicrosoftAI models now live on OpenRouter! Launching together: MAI-Image-2.5, MAI-Transcribe-1.5, and MAI-Voice-2. More on each below 🧵


fofr @fofrAI · June 3 · 57

This is 🔥

[Quoting @DavidMaliglowka]: Gemini Omni 🏕️ prompt in 🧵

Replit ⠕ @Replit · June 3 · 70

Announcing our new collaboration with @Microsoft Organizations can now build internal tools, workflows, or data dashboards in Replit and publish directly to Microsoft Fabric with security, authentication, and governance built in


Chubby♨️ @kimmonismus · June 3 · 51

Very excited for this „no prior“ episode! Curious if the hear more about their project Solaris, their agentic handhelds


OpenAI @OpenAI · June 3 · 77

We’re making Codex more useful for your work by expanding plugins beyond individual tools. These plugins turn Codex into a specialist for a specific role with a single install, no coding required. Codex can access 62 popular apps and 110 skills for work across sales, data analytics, creative production, product design, and public equity investing. https://openai.com/index/codex-for-every-role-tool-workflow/


Microsoft Research @MSFTResearch · June 3 · 44

Microsoft Research is at BUILD 2026 this week, giving developers a hands-on look at some of the many AI-based models and tools they can use to accelerate innovation, enhance their capabilities, and quickly transform ideas into prototypes. https://msft.it/6010vjBUe


jason @jxnlco · June 3

!!! https://blog.calif.io/p/codex-discovered-a-hidden-http2-bomb


OpenAI Developers @OpenAIDevs · June 3 · 69

Role-specific plugins in Codex are built around the work teams actually do. Plugins for Data Analytics, Creative Production, and Product Design give Codex the tools and context to create reports, creative directions, and prototypes. Built and used by OpenAI teams.


向阳乔木 @vista8 · June 3 · 70

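I read today's hottest paper on Hugging Face, about a harness framework for getting AI to generate paper figures. The framework revolves around a shared structured specification document S. ① Designer D: generates an executable visual plan from S ② Executor E: renders the plan into an image (or code) ③ Verifier V: outputs a diagnostic report that pinpoints specific problems ④ Reviser R: turns the diagnosis into structured operations that directly modify the corresponding fields in S. Drawing on this and simplifying it, I wrote a Skill: a designer (image-generation prompt), an executor (Codex calling GPT-image-2 to generate the image), and an acceptance reviewer (aesthetic judgment, which may be unreliable). I also integrated a scraping Skill, so you only need to provide a URL to generate an illustration, even an X URL. The generated results are below: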
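The four roles in the post (designer D, executor E, verifier V, reviser R) around a shared specification S form a loop that can be sketched as follows. Everything concrete here is an assumption for illustration: the `Spec` fields, the font-size rule in `verify`, and the string "rendering" are invented, and each function stands in for what would be a model or tool call in the paper's framework or the author's Skill.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Shared structured specification S (fields are illustrative only)."""
    title: str
    font_size: int = 8
    history: list[str] = field(default_factory=list)


def design(spec: Spec) -> str:
    # D: turn the spec into an executable visual plan.
    return f"figure '{spec.title}' at {spec.font_size}pt"


def execute(plan: str) -> str:
    # E: render the plan; a string stands in for an image here.
    return f"<render of {plan}>"


def verify(spec: Spec, image: str) -> list[tuple[str, int]]:
    # V: return located issues as (spec field, suggested value) pairs.
    # A real verifier would inspect the image; this toy rule reads the spec.
    issues = []
    if spec.font_size < 10:
        issues.append(("font_size", 10))
    return issues


def revise(spec: Spec, issues: list[tuple[str, int]]) -> None:
    # R: apply each diagnosis as a structured edit to the matching field of S.
    for field_name, new_value in issues:
        setattr(spec, field_name, new_value)
        spec.history.append(f"set {field_name}={new_value}")


def run(spec: Spec, max_rounds: int = 3) -> str:
    image = ""
    for _ in range(max_rounds):
        image = execute(design(spec))
        issues = verify(spec, image)
        if not issues:
            break
        revise(spec, issues)
    return image


spec = Spec("loss curve")
print(run(spec))
print(spec.history)
```

Because the reviser edits S rather than the image, every change is recorded against a named field, which is what makes the verifier's located diagnoses actionable.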


Epoch AI @EpochAIResearch · June 3 · 46

We’re running a short survey to ensure we’re producing the most useful work on AI's trajectory. If you haven’t yet, we'd love your input.

(You can sign up at the end of the survey to join our paid user research panel.)

ClaudeDevs @ClaudeDevs · June 3 · 77

We’ve added a CLI for Claude Platform to make every API endpoint runnable from your terminal. Call the Messages API, stand up Claude Managed Agents, pipe results straight into your shell. The ant CLI is well understood by coding agents (Claude Code) using the claude-api skill.


🚨 AI News | TestingCatalog @testingcatalog · June 3 · 62

MICROSOFT 🔥: A new Copilot super app has been announced! It arrives with a concept of Autopilots, long-running, always-on agents, with Scout being the first Agent coming out of the box. More Autopilot Agents will be added later.


Peter Steinberger 🦞 @steipete · June 3 · 67

It’s been great working with Omar to get observability and verifiable workspaces into OpenClaw.


Chubby♨️ @kimmonismus · June 3 · 56

Microsoft scout revealed „your always-on personal agent for work.“ If "AI" was the Word of the Year in 2025, in 2026 it will be "agents" (always-on). Everything is agentic this year.


June 3

06:16

Rohan Paul@rohanpaul_ai

63

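AI explains science better than it forecasts it

A study of 4,760 scientific events found that AI models are better at "explaining" science than at "forecasting" it. Models do relatively well at recognizing plausible research paths (especially in multiple-choice form), but are weak at the harder tasks of predicting whether a scientific discovery will actually happen, when it will happen, and which method will work, with accuracy close to random guessing. Even when given additional historical information, the models improve only modestly. This suggests that having a large amount of scientific knowledge embedded in a model does not amount to reliable scientific foresight. The paper is on arXiv (2605.22681), titled "Forecasting Scientific Progress with AI".

Other · Papers/Research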

06:13

elvis@omarsar0

38

Code is all you need! Search as Code. Harness as Code. What's next?

Thariq: Workflows are the biggest upgrade to Claude Code's capabilities since skills and subagents. I dove deep into it with @si...

Anthropic · Product update · Coding

06:08

Ethan Mollick@emollick

40

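The everything apps still look a lot like hybrids between chatbots and IDEs, rather than something built for general knowledge work. Too much assuming linearity & that final outputs are the only goal, too little connection to research, not enough chances to steer or select, etc.

Expert opinion · Phenomena/Trends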

06:00

Microsoft Research@MSFTResearch

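Featured · 72

Weather forecasts thousands of times faster than traditional supercomputers. Hear from Kenji Takeda on Aurora at the Microsoft Research Lab at #MSBuild. Learn more: https://msft.it/6018vjGUA

Microsoft · Multimodal · Papers/Research

Why it's recommended: Microsoft has pushed weather-forecast inference to thousands of times faster than supercomputers, which counts as a generational leap in weather AI. It is far removed from most people's daily lives, but it is a real breakthrough for climate modeling and extreme-weather warnings.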

05:55

Anthropic@AnthropicAI

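Featured · 69

This Executive Order is an important step in strengthening America’s leadership in AI. We look forward to collaborating with the White House to support its implementation. https://www.whitehouse.gov/presidential-actions/2026/06/promoting-advanced-artificial-intelligence-innovation-and-security/

Anthropic · Policy/Regulation · Industry news

5 related discussions

Why it's recommended: Anthropic's official statement on the White House AI executive order carries more signal than substance, but leading companies proactively embracing policymaking is a trend, and the details of implementation are worth watching.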

05:47

Google AI Developers@googleaidevs

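Featured · 74

Building autonomous agents for scientific discovery? 🧬🤖 @GoogleDeepMind Science Skills is now available on GitHub. We've open-sourced this specialized toolkit to accelerate your agentic workflows with scientific grounding and higher token efficiency. Download now ↓ https://github.com/google-deepmind/science-skills

Agents · DeepMind · Product update · Open-source ecosystem

Why it's recommended: DeepMind has open-sourced this science agent toolkit. Its core aim is to add scientific grounding to agent workflows and improve token efficiency. Anyone working on AI for Science can fork it and try it directly, and it is one of this week's tools most worth getting hands-on with.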

05:25

NotebookLM@NotebookLM

58

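Notice anything different about the NotebookLM mobile app recently? 😉 Well, we’re excited to REPORT that you can now create briefing docs, study guides, and blog posts on-the-go! 📱✨ Are there any other report formats you'd want specifically for mobile? Let us know!

Google · Product update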

05:21

SemiAnalysis@SemiAnalysis_

53

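Cerebras did what the industry calls impossible: turned an entire 46,225mm² wafer into one chip. Defects on silicon that big are inevitable, so they built in redundancy and custom per-batch masks that route around every bad core, landing near 100% usable wafers. The results: 900,000 cores and 44GB of SRAM on a single piece of silicon, no packaging, no off-chip hops. And they're not stopping there, now exploring hybrid bonding a DRAM wafer on top for even more fast memory. (1/4) 🧵

Product update · Deployment/Engineering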

05:16

Rohan Paul@rohanpaul_ai

81

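Microsoft releases the MAI-Thinking-1 model

Microsoft has released MAI-Thinking-1, a model with a MoE architecture, 35B active parameters, and 1T total parameters. It was pretrained from scratch on 30T tokens, without distillation from third-party models. Microsoft calls its iterative optimization process a "hill-climbing machine." On benchmarks, the model scores 97.0% on AIME 2025, 87.7% on LiveCodeBench v6, and 52.8% on SWE-Bench Pro.

Microsoft · Reasoning · Model release

4 related discussions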

05:11

Emad@EMostaque

17

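This is fine 🐶☕️🔥 [Quoting @EMostaque]: My review of Claude Opus 4.8: We should worry less about being turned into paper clips and more about being annoyed to death.

Emad: My review of Claude Opus 4.8: We should worry less about being turned into paper clips & more about being annoyed to dea...

Anthropic · Expert views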

05:00

Microsoft Research@MSFTResearch

54

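Agentic experiences powered by small models that run on your own device. Hear Maya Murad introduce MagenticLite at the Microsoft Research lab at #MSBuild.

Agents · Microsoft · Product update · On-device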

04:55

MiniMax (official)@MiniMax_AI

57

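A great deep dive from the @togethercompute team on deploying MiniMax M3 in production. With its 1M context, native multimodality, and MiniMax Sparse Attention, M3 takes substantial work on paged decoding, index scoring, and multimodal preprocessing to run efficiently. This is what frontier collaboration looks like 🤝.

Together AI: http://x.com/i/article/2061891247762026496

Industry news · Deployment/Engineering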

04:47

Chubby♨️@kimmonismus

18

"Everyone hates AI slop." "We'll decide: is it a vibe, or is it slop?" This sounds like a fun event :D

Image generation · Phenomena/Trends

04:47

Chubby♨️@kimmonismus

36

What? The same training FLOPs as Gemini 3.1 Pro?

swyx: uhhh did Mustafa just leak the Mythos FLOP count?? was this public knowledge before, even if its an estimate i dont get ...

Data/Training · Industry news

04:47

Chubby♨️@kimmonismus

50

Just realized that "Mai"-1 thinking stands for: Microsoft AI thinking. 🤯

Chubby♨️: Mai-1 thinking: Mid size model, 45b active parameter, MoE, side by side with sonnet 4.6 0 distillation "Microsoft's firs...

Microsoft · Expert views · Reasoning

04:46

Rohan Paul@rohanpaul_ai

63

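Satya Nadella on Microsoft's Fairwater data center: an AI superfactory

In the Microsoft Build 2026 keynote, Satya Nadella introduced the Fairwater data center, a "superfactory" designed for AI. At its core is a vertically designed, two-story AI data center architecture that lets racks be packed densely in three dimensions. This raises compute density while keeping low-latency, high-bandwidth networking between GPUs, so the whole cluster behaves more like one large AI machine. The other highlight is its very high cooling efficiency: the cooling system only needs to be filled once, water consumption in operation is close to zero, and total annual water use is roughly equal to one restaurant's daily water use. It is part of the hardware foundation for the "frontier intelligence ecosystem" Microsoft is building.

Satya Nadella: Great to be back at Microsoft Build today. For us, it is not about any one piece of technology or even the platform. It ...

Microsoft · Product update · Deployment/Engineering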

04:38

Ethan Mollick@emollick

24

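I wish the Discord and Codex logos and bottom text-box interfaces didn't look so similar at a glance. I've mixed them up several times, leaving both GPT-5.5 and my group chats confused.

OpenAI · Expert views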

04:38

Ethan Mollick@emollick

38

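It's hard to tell how good MAI-Thinking-1 is from scores alone (for example, its GPQA and Terminal Bench 2.0 scores are oddly low), and Microsoft makes it hard to try its models after release (a common problem with many Microsoft AI products), so I can't really tell. The numbers are below Meta Spark's, though.

Microsoft · Expert views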

04:32

Perplexity@perplexity_ai

58

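Two new ways to bring your health data into Perplexity. Perplexity can now connect to Apple Health on iPhone, so you can use your sleep, activity, and HRV data in Computer. In Perplexity Health, you can now also add lab data and ask about biomarkers, blood draws, or test results.

Product update · Search · Data/Training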

04:31

Thariq@trq212

81

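Claude Code launches dynamic workflows: a dedicated harness for every task

Claude Code adds dynamic workflows, which let Claude create a custom execution harness tailored to each task. The feature coordinates subagents by executing JavaScript files and can specify the model and the workspace isolation level. It suits complex tasks such as research, security analysis, and code review, and workflows can be shared and reused. Note that dynamic workflows consume more tokens.

Agents · Anthropic · Product update · Coding

4 related discussions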

04:31

Thariq@trq212

69

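Workflows are the biggest upgrade to Claude Code's capabilities since skills and subagents. I dove deep into it with @sidbid, covering best practices, examples, and more. I'm especially excited about the non-technical tasks it enables for Claude Code.

Thariq: http://x.com/i/article/2061850535708483585

Agents · Anthropic · MCP/Tools · Product update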

04:29

fofr@fofrAI

29

Played around a bit with Krea's K2 Large image model. I really like the expressiveness it brings and the variety each prompt produces.

Other · Image generation

04:24

ClaudeDevs@ClaudeDevs

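Featured 73

How do you get Claude Code to check its own work before handing it back? See how to encode your manual checks so that Claude closes the feedback loop itself:

Agents · Anthropic · Tutorials/Practice · Coding

Why it's recommended: If you write code with Claude Code, this official video is worth opening right away. It shows how to encode your manual checks so that Claude forms its own feedback loop, which can save a lot of time on repeated revisions.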

04:17

Artificial Analysis@ArtificialAnlys

49

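We're hosting a coding agent benchmarking event in San Francisco on Thursday, June 11, with lightning talks and a panel discussion featuring leading AI researchers, developers, and engineers. If you're building coding agents, LLM tools, or AI infrastructure, we'd love to see you there! Apply to join 👇 https://luma.com/i5zotp6c

Agents · Coding · Industry news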

04:06

Runway@runwayml

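Featured 73

Aleph 2.0 is now available through the Runway API. Integrate precise video editing directly into your apps, products, and platforms. It supports editing videos up to 30 seconds long at 1080p across multi-shot sequences, changing only the parts you want. Get started at the link below.

Product update · Video

3 related discussions

Why it's recommended: Runway has put Aleph 2.0's video editing capabilities into its API, so people building video tools can use them directly. It handles 30 seconds at 1080p and supports multi-shot sequences. Features that used to take a pile of processing logic now take a single API call.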

04:02

Satya Nadella@satyanadella

31

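Great to be back at Microsoft Build today. For us, it is not about any one piece of technology or even the platform. It is about how we build a frontier intelligence ecosystem together. Sharing some of our big announcements from today…

Microsoft · Industry news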

03:59

OpenRouter@OpenRouter

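Featured 68

Three new @MicrosoftAI models are now live on OpenRouter! Launching together: MAI-Image-2.5, MAI-Transcribe-1.5, and MAI-Voice-2. Details below 🧵

Microsoft · Product update · Image generation · Multimodal

Why it's recommended: Microsoft put three multimodal models on OpenRouter at once, covering image, transcription, and voice. Developers can call them straight through the API, and product builders may want to try them out.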

03:59

fofr@fofrAI

57

This is 🔥 [Quoting @DavidMaliglowka]: Gemini Omni 🏕️ prompt in 🧵

David Maliglowka: Gemini Omni 🏕️ prompt in 🧵

Google · Multimodal · Tutorials/Practice

03:56

Replit ⠕@Replit

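Featured 70

Announcing a new partnership with @Microsoft. Organizations can now build internal tools, workflows, or data dashboards in Replit and publish them directly to Microsoft Fabric, with security, authentication, and governance built in.

Microsoft · Product update · Deployment/Engineering

Why it's recommended: For enterprises that use both Replit and Microsoft Fabric, this integration removes a tedious deployment step and shortens the path from building an internal tool to shipping it. If you have never used Fabric, you won't notice a difference.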

03:47

Chubby♨️@kimmonismus

51

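Really looking forward to this episode of "No Precedent" (无先例)! Curious whether we'll learn more about their project Solaris, their handheld agent device.

Chubby♨️: This came as a surprise: Microsoft has unveiled handheld and desktop devices designed to control one's agents. It remind...

Agents · Microsoft · Product update · On-device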

03:34

OpenAI@OpenAI

77

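We're making Codex more useful for your work by expanding plugins beyond individual tools. With a single install, these plugins turn Codex into a role-specific expert, no coding required. Codex has access to 62 popular apps and 110 skills, spanning work areas such as sales, data analytics, creative production, product design, and public equity investing. https://openai.com/index/codex-for-every-role-tool-workflow/

MCP/Tools · OpenAI · Product update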

03:30

Microsoft Research@MSFTResearch

44

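Microsoft Research is at BUILD 2026 this week, giving developers hands-on experience with a range of AI-based models and tools to accelerate innovation, extend capabilities, and quickly turn ideas into prototypes. https://msft.it/6010vjBUe

Microsoft · Open-source ecosystem · Industry news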

03:27

jason@jxnlco

Chinese summary not yet available; click to view the original post.

03:25

OpenAI Developers@OpenAIDevs

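Featured 69

Role-specific plugins in Codex are built around the work teams actually do. The data analytics, creative production, and product design plugins give Codex the tools and context to create reports, creative directions, and prototypes. Built and used by teams at OpenAI.

OpenAI · Product update · Coding

Why it's recommended: OpenAI has given Codex three team-specific plugins, with data analytics, creative production, and product design built in. If your team uses Codex, this is a small update that saves effort.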

03:06

向阳乔木@vista8

70

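Architecture and practice of Harness, a framework for generating paper figures

A popular paper on Hugging Face introduces Harness, an AI framework for generating figures for papers. The framework operates around a shared structured specification document S and involves four collaborating roles: the designer produces a visual plan, the executor renders images or code, the verifier outputs a diagnostic report with localized issues, and the reviser edits the specification S accordingly. The post's author built a simplified version based on the framework and packaged it as a skill. It uses GPT-image-2 for image generation and integrates URL fetching, so it can generate illustrations directly.

Hugging Face · Image generation · Multimodal · Tutorials/Practice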
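
The four-role loop described in this item can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's or the post author's implementation: the `Spec` fields, the `Diagnostic` shape, the rule that a figure needs axes, a legend, and a caption, and every function name are hypothetical stand-ins, and the "rendering" is a plain string rather than an image.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Stand-in for the shared specification S (fields are hypothetical)."""
    title: str
    elements: list[str] = field(default_factory=list)
    revision: int = 0


@dataclass
class Diagnostic:
    """One localized finding from the verifier (shape is hypothetical)."""
    location: str  # which element of the spec the finding refers to
    message: str


def designer(request: str) -> Spec:
    # Produce an initial visual plan; here it starts with axes only.
    return Spec(title=request, elements=["axes"])


def executor(spec: Spec) -> str:
    # "Render" the spec; a real executor would emit an image or plotting code.
    return f"{spec.title}: " + ", ".join(spec.elements)


def verifier(spec: Spec, rendering: str) -> list[Diagnostic]:
    # Toy check (an assumption): every figure needs these three elements.
    required = ["axes", "legend", "caption"]
    return [
        Diagnostic(location=name, message=f"missing {name}")
        for name in required
        if name not in spec.elements
    ]


def reviser(spec: Spec, report: list[Diagnostic]) -> Spec:
    # Edit the shared spec according to the diagnostic report.
    added = [d.location for d in report]
    return Spec(spec.title, spec.elements + added, spec.revision + 1)


def run(request: str, max_rounds: int = 3) -> tuple[Spec, str]:
    spec = designer(request)
    rendering = executor(spec)
    for _ in range(max_rounds):
        report = verifier(spec, rendering)
        if not report:
            break  # nothing left to fix
        spec = reviser(spec, report)
        rendering = executor(spec)
    return spec, rendering


if __name__ == "__main__":
    final_spec, final_rendering = run("Loss curve")
    print(final_spec.revision, final_rendering)
```

The point of the sketch is that every role reads or writes the same spec object, so each revision round is a change to S rather than a direct edit of the rendered output.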

03:00

Epoch AI@EpochAIResearch

46

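We're running a short survey to make sure we produce the most valuable work on the trajectory of AI. If you haven't taken it yet, we'd love to hear from you. (At the end of the questionnaire you can sign up to join our paid user research panel.)

Epoch AI: Help us produce the most useful work on AI by taking our 5-minute survey: https://docs.google.com/forms/d/e/1FAIpQLSfzw_...

Data/Training · Industry news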

02:54

ClaudeDevs@ClaudeDevs

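Featured 77

We've added a CLI to the Claude Platform so that every API endpoint can be run from your terminal. Call the Messages API, start Claude Managed Agents, and pipe results straight into your shell. The ant CLI is well understood by coding agents (Claude Code) that use the claude-api skill.

Anthropic · MCP/Tools · Product update · Deployment/Engineering

Why it's recommended: The ant CLI brings every Claude Platform API endpoint into the terminal and pairs well with Claude Code. Anyone building agents or scripts can start using it right away.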

02:53

🚨 AI News | TestingCatalog@testingcatalog

62

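Microsoft 🔥: A new Copilot super app has launched! It introduces the concept of Autopilots, which are long-running, always-on agents, with Scout as the first out-of-the-box agent. More Autopilot agents will be added later.

🚨 AI News | TestingCatalog: @steipete SUPERAPP 🔥

Agents · Microsoft · Product update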

02:53

Peter Steinberger 🦞@steipete

67

Excited to work with Omar on bringing observability and verifiable workspaces to OpenClaw.

Omar Shahine: Introducing Microsoft Scout, the first autopilot agent from Microsoft - 57 days after starting my new job, we are launch...

Agents · Microsoft · Product update

02:47

Chubby♨️@kimmonismus

56

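Microsoft Scout unveils "your always-on personal agent for work." If "AI" was the word of the year for 2025, then 2026 will be "agents" (always on). Everything is agentic this year.

Chubby♨️: This came as a surprise: Microsoft has unveiled handheld and desktop devices designed to control one's agents. It remind...

Agents · Microsoft · Product update · On-device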