GPT 5.2 xhigh feels like a much more careful architect and debugger when it comes to complex systems
But most people here think Opus 4.5 is the best model in that category
There are 2 reasons AFAIS:
- xhigh reasoning consumes significantly more tokens. You need ChatGPT Pro ($200/month) to be able to use it as a daily driver
- It takes like 5x longer to finish a task, and most people lack the patience to wait for it. (But then it's more correct/doesn't need fixing)
Opus 4.5 is good too; I think it's better in e.g. frontend design. But if you think it beats GPT 5.2 in every category, you are either too poor/stingy or have ADHD
Just 5 months ago, I was swearing at Claude 4 Sonnet like a Balkan uncle
Models one-shotted the right thing only 20-30% of the time, did really stupid things the rest of the time, and had to be tightly handheld
Today they are much, much better. My psychology is a lot more at ease, and instead of swearing, I want to kiss them on the forehead most of the time
Now I trust agents so much that I queue up 5-10 tasks before going to sleep. They work the whole night while I sleep and I wake up to resolved issues
GPT 5.2 xhigh and Claude 4.5 Opus are already goated (GPT more so), can't wait for them to get even faster
Codex does not have support for subagents. I tried to use Claude Code to launch 8 Codex instances in parallel on separate tasks, but Opus 4.5 had difficulty following instructions
So I created a CLI tool that scans pending TODOs from a markdown file and lets me launch as many harnesses as I want (osolmaz/spawn on GitHub)
I currently use this for relatively read-only tasks like planning and finding root causes of bugs, because it launches all the agents in the same repo and they might conflict
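For a sense of the scanning step, here is a minimal sketch (hypothetical; not the actual spawn implementation): collect every unchecked `- [ ]` checkbox item from a markdown file.

```python
import re

# Hypothetical sketch of the scanning step, not the actual spawn code:
# collect every unchecked "- [ ]" checkbox item from a markdown file.
TODO_PATTERN = re.compile(r"^\s*[-*]\s*\[ \]\s+(.*)$")

def scan_pending_todos(markdown_text: str) -> list[str]:
    """Return the text of every unchecked checkbox item."""
    todos = []
    for line in markdown_text.splitlines():
        match = TODO_PATTERN.match(line)
        if match:
            todos.append(match.group(1).strip())
    return todos

example = """\
# Tasks
- [x] Set up CI
- [ ] Find root cause of flaky test
- [ ] Write planning doc for auth refactor
"""
print(scan_pending_todos(example))
# → ['Find root cause of flaky test', 'Write planning doc for auth refactor']
```

Each pending item could then be handed to a harness, e.g. by spawning a tmux window running a coding agent with that task as its prompt.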
Ideas:
- Use @mitsuhiko's gh-issue-sync and run parallel agents directly on github issues
- Create new clones or worktrees for each task. I currently don't do this because I don't dare duplicate the Rust target dir 10x on my measly MacBook Air
- Support modes other than tmux, e.g. launching a terminal like Ghostty
- TUI for easy selection of issues/TODOs
Other ideas are welcome!
Friends of open source, we need your help!
A lot of Manim Community accounts got compromised and deleted during Christmas
Manim Community is a popular fork of @3blue1brown's original math animation engine Manim, and its accounts have over 5 YEARS of contributions, knowledge and following
Apparently GitHub support has already seen the request and is in the process of restoring the GitHub org. But if anyone knows how to speed this up, it would be greatly appreciated!
Unfortunately, the Discord and X accounts are deleted and less likely to return
But there might still be a way to restore them, or at least the data?
Re. Discord: Maybe @RhysSullivan's Answer Overflow has archived enough of the old server? That server contains YEARS of Q/A data and is vital for newcomers
Re. X: Maybe someone high up can do something to restore the account? cc @nikitabier
In the meantime, it would help a lot if you could follow the new account @manimcommunity and share this post! Thank you in advance!
While it's a great feature, I never needed such a thing in Codex after GPT 5.2. It just one-shots tasks without stopping
So we have proof by existence that this problem can be solved without any such mechanism. I wish to see the same relentlessness in Anthropic models
2025 was the year of ̶a̶g̶e̶n̶t̶s̶ bugs
Software felt much buggier compared to before, even from companies like Apple. Presumably because everyone started generating more code with AI
Models are improving, so hopefully 2026 will be the opposite. Even fewer bugs than in the pre-AI era
Have a long flight, so will think about this
I have an internal 2023 TextCortex doc which models chatbots as state machines with internal and external states, with an immutability constraint on the external state (what has already been sent to the user shall not be changed)
Motivation was that a chatbot provider will always have state that they will want to keep hidden
This was way before the Responses API and the now-deprecated Assistants API. It stood the test of time, because it was the most abstract thing I could think of
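To make the idea concrete, here is an illustrative sketch of that state-machine view (the names are mine, not the internal doc's): the external state is append-only, while the internal state stays hidden and mutable.

```python
from dataclasses import dataclass, field

# Illustrative sketch of the state-machine view (names are my own,
# not the original doc's). The external state is what the user has
# already seen, so it is append-only; the internal state stays
# hidden and freely mutable on the provider side.
@dataclass
class ChatState:
    external: tuple[str, ...] = ()                # transcript already sent to the user
    internal: dict = field(default_factory=dict)  # hidden provider-side state

def send_to_user(state: ChatState, message: str) -> ChatState:
    # Appending yields a new state; previously sent messages never change.
    return ChatState(external=state.external + (message,),
                     internal=dict(state.internal))

s0 = ChatState()
s1 = send_to_user(s0, "Hello!")
s1.internal["retrieval_cache"] = ["doc A"]  # internal state mutates freely
print(s1.external)  # ('Hello!',)
```

Immutable tuples enforce the "what was sent shall not be changed" constraint at the type level, while the internal dict carries whatever the provider wants to keep hidden.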
@mitsuhiko is right about the risk of rushing to lock in an abstraction, and with it its weaknesses and faults
Problem is, I could propose standards as much as I liked, but I don’t work at OpenAI or Anthropic, so nobody would care. Maybe a better place to start is open weights model libraries? To at least be able to demonstrate?
What I know: it is against OpenAI’s or Anthropic’s self interests to create an interoperability layer that will accelerate their commoditization. Maybe Google, looking at their current market positioning? Or maybe we “wrappers” have a chance after all?
There is a missing link between AI SDK, LangChain, and so on for other languages. We cannot keep duplicating the same things in each ecosystem independently. We need to join forces and simplify all this!
This was simply because the webapp failed to create a post and failed silently. The UX is still not good on this app. Make sure to write your posts somewhere else so you don't lose them
I gave Codex a task of porting an OpenCV tracking algorithm (CSRT) from C++ to Rust, so that I can directly use it in my project without having to cross-compile
It one-shot the task perfectly in 1 hr, and even developed a GUI on top of it. All I did was provide the original source and the algo paper
I've spent years getting specialized in writing numerical code (computational mechanics, FEM), and now AI can automate 95% of the low-level grunt work
Acquiring these skills involved highly difficult, excruciating intellectual labor spanning many years, very similar to ML research. Doing tensor math, writing out the solver code, wondering why your solution is not converging, finally figuring out it was a sign typo after 2 days
Kids these days both have it easy and hard. They can fast forward large chunks of the work, but then they will never understand things as deeply as someone who wrote the whole thing by hand
I guess the more valuable skill now is being able to zoom in and out of abstraction levels quickly when needed. Using AI, but recognizing fast when it fails, learning what needs to be done, fixing it, zooming back out, repeat. Adaptive learning, a sort of "depth-on-demand". The quicker you can pick up new skills and knowledge, the more successful you will be
If you have a bunch of docs in your repo, give it a try. It will use the timestamp of the commit that created each file when renaming. You can also run with --dry-run to see changes without applying them
Now you can migrate your repo to SimpleDoc with a single command:
npx -y @simpledoc/simpledoc migrate
A step-by-step wizard will add timestamps to your files based on your git history, add missing YAML frontmatter, and update your AGENTS.md file
https://t.co/yrciS8KtEw
Curious to hear what other hardcore agent users @simonw @mitsuhiko @steipete @badlogicgames think. I can't be the only one who does this.
I feel like everybody ended up with the same workflow independent of each other, but somehow did not write about it (or I missed it)
How to stop AI agents from littering your codebase with Markdown files?
I wrote a new post on how to create documentation with AI agents, without letting them add markdown files to your repo root, and with chronological ordering for the files they create
If you have used AI agents such as Anthropic’s Claude Code, OpenAI’s Codex, etc., you might have noticed their tendency to create markdown files at the repository root:
The default behavior for models, as of this writing in December 2025, is to create capitalized Markdown files at the repository root. This is of course very annoying when you accidentally commit them and they accumulate over time.
The good news is, this problem is 100% solvable by using a simple instruction in your AGENTS.md file:
**Attention agent!** Before creating ANY documentation, read the docs/HOW_TO_DOC.md file first. It contains guidelines on how to create documentation in this repository.
But what should be in the docs/HOW_TO_DOC.md file, and why is it a separate file? In my opinion, the instructions for solving this problem are too specific to be included in the AGENTS.md file. It’s generally a good idea not to inject them into every context.
To solve this problem, I developed a lightweight standard over time for organizing documentation in a codebase. It is framework-agnostic, unopinionated, and designed to be readable and writable by humans as well as agents. I was surprised not to be able to find something similar enough online, crystallized the way I wanted it. So I created a specification myself, called SimpleDoc.
Basically, it tells the agent to:
- Create documentation files in the docs/ folder, with YYYY-MM-DD prefixes and lowercase filenames, like 2025-12-22-an-awesome-doc.md, so that they are chronologically sorted by default.
- Always include YAML frontmatter with an author field, so that you can identify who created a doc without checking git history, if you are working in a team.
The exception here is timeless and general files like README.md, INSTALL.md, AGENTS.md, etc., which can be capitalized. But these are much rarer, so we can just follow the previous rules most of the time.
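For illustration, a file at docs/2025-12-22-an-awesome-doc.md following these conventions might contain something like this (the author field comes from the description above; the exact frontmatter schema is defined in the SimpleDoc spec, so check there for any additional fields):

```markdown
---
author: Jane Doe
---

# An awesome doc

Body of the documentation goes here.
```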
Here is your call to action to check the spec itself: SimpleDoc.
How to set up SimpleDoc in your repo
Run the following command from your repo root:
npx -y @simpledoc/simpledoc migrate
This starts an interactive wizard that will:
Migrate existing Markdown docs to SimpleDoc conventions (move root docs into docs/, rename to YYYY-MM-DD-… using git history, and optionally insert missing YAML frontmatter with per-file authors).
Ensure AGENTS.md contains the reminder line and that docs/HOW_TO_DOC.md exists (created from the bundled SimpleDoc template).
If you just want to preview what it would change:
npx -y @simpledoc/simpledoc migrate --dry-run
If you run into issues with the workflow or have suggestions for improvement, you can email me at onur@solmaz.io.
OpenAI won’t be able to monopolize this, for the same reason Microsoft couldn’t monopolize the internet. The internet (of agents) is bigger than any one company
One tap @Revolut bank account at Berlin airport. Literally.
It dispenses a free card with instructions to log in. One of the most insane onboarding experiences I have ever seen
Codex feature request: Let me queue up /model changes
Currently, if I try to run /model while the model is responding, it tells me that I can't do that
But I often want to set the reasoning budget in advance, e.g. run a straightforward task with low reasoning and then start another one with high reasoning
cc @thsottiaux
AI agents make any transduction task (like translation from language A to language B) trivial, especially when you can verify the output with compilers and tests
The bottleneck is now curating the tests
I think X removed one of my posts yesterday about the new encrypted "Chat" rolling out to all users, and how you might lose all your past messages if you forget your passcode and do not have the app installed
I could swear I clicked Post. Do they classify posts based on their topic and delete the ones they don't like?
Anyway, we shall see, I am taking a screenshot and saving the URL.
Crazy that @cursor_ai disabled Gemini 3 Pro on my installation; I toggled it right back on. I wonder why. Too many complaints maybe? That it’s hard to control?
On another note, disabling models without notification is dishonest product behavior. I would at least appreciate getting a notification, even when it might be against the company’s interests @sualehasif996
So is somebody already building “LLVM but for LLM APIs” in stealth or not?
We have numerous libraries: @langchain, Vercel AI SDK, LiteLLM, OpenRouter, the one we built at @TextCortex, etc.
But to my knowledge, none of these try to build a language agnostic IR for interoperability between providers (or at least market themselves as such)
Like some standard and set of tools that will not lock you into LangChain, AI SDK, or anything like that; something lower level and less opinionated
I feel like this is a job for the new Agentic AI Foundation cc @linuxfoundation, so maybe they are already working on it? I desperately want to start on such a project, but feel like I might get sniped 2 months after
Does anybody have any information on all this?
cc @mitsuhiko @badlogicgames @steipete
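To make the "LLVM for LLM APIs" idea concrete, here is a toy sketch of what a provider-neutral message IR could look like. All names here are invented for illustration; this is not an existing spec or any library's actual API.

```python
from dataclasses import dataclass
from typing import Literal

# Toy sketch of a provider-neutral chat IR (all names invented for
# illustration). Provider adapters would lower this IR into each
# vendor's wire format, the way an LLVM backend lowers IR to a target.
@dataclass
class Part:
    kind: Literal["text", "tool_call", "tool_result"]
    payload: dict

@dataclass
class Message:
    role: Literal["system", "user", "assistant", "tool"]
    parts: list[Part]

def to_openai_style(messages: list[Message]) -> list[dict]:
    # One possible "backend": lower the IR into an OpenAI-ish shape.
    out = []
    for m in messages:
        text = "".join(p.payload["text"] for p in m.parts if p.kind == "text")
        out.append({"role": m.role, "content": text})
    return out

convo = [
    Message("system", [Part("text", {"text": "You are helpful."})]),
    Message("user", [Part("text", {"text": "Hi"})]),
]
print(to_openai_style(convo))
```

The design choice that matters is that the IR stays flat and typed, so each provider adapter is a pure lowering function with no knowledge of the others.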
This is what an agentic monorepo looks like. What was a hurdle before is now a child's toy
This side project started as a Python project earlier in 2025
Then I added an iOS app on top of it
I rewrote the most important algorithms in Rust
I rewrote the entire backend in Go and retired Python to be used purely for prototypes
I wrote a webapp with Next.js
With unit and integration tests for each component
Lately, 99% written by instructing agents
Crazy mixed language programming going on in the background. Rust component used both by iOS app for offline and by go backend for online use case, FFI and all
Number of lines in the repo: a couple hundred thousand
If you had told me 1 year ago that I would be able to do all of this by myself, I would not have believed it
This is huge. Natively supported stacked PRs on GitHub would make life much easier, especially with human AND AI reviews
AI reviews with Codex/Claude/Gemini/Cursor Bugbot integrations are becoming especially important in small teams who are generating huge amounts of code
AI reviews don't work well if you don't split your work into diffs smaller than a few hundred lines of code, so stacked PRs are already an integral part of the developer experience in agentic workflows
CLI coding tools should give more control over message queueing
Codex waits until the end of the turn to handle a user message; Claude Code injects it as soon as possible after a tool response/assistant reply
There is no reason why we cannot have both!
New post (link below):
Codex v0.71 finally implements a more detailed way of storing permissions
But they are still at the user home folder level. Saving rules in a repo still seems TBD
"execpolicy commands are still in preview. The API may have breaking changes in the future."
Below: Why agentic coding tools like Cursor, Claude Code, OpenAI Codex, etc. should implement more ways of letting users queue messages.
See Peter Steinberger’s tweet where he queues continue 100 times to nudge the GPT-5-Codex model to not stop while working on a predictable, boring and long-running refactor task:
Tweet embed disabled to avoid requests to X.
This is necessary while working with a model like GPT-5-Codex. The reason is that the model has a tendency to stop generating at certain checkpoints, due to the way it has been trained, even when you instruct it to FINISH IT UNTIL COMPLETION!!1!. So the only way to get it to finish something is to use the message queue.1
But this isn’t the only use case for queued messages. For example, you can use the model to retrieve files into its context, before starting off a related task. Say you want to find the root cause of a <bug in component X>. Then you can queue
Explain how <component X> works in plain language. Do not omit any details.
Find the root cause of <bug> in <component X>.
This will generally help the model find the root cause more easily, or make more accurate predictions about it, by having context about the component.
Another example: After exploring a design in a dialogue, you can queue the next steps to implement it.
<Prior conversation exploring how to design a new feature>
Create an implementation plan for that in the docs/ folder. Include all the details we discussed
Commit and push the doc
Implement the feature according to the plan.
Continue implementing the feature until it is done. Ignore this if the task is already completed.
Continue implementing the feature until it is done. Ignore this if the task is already completed.
… you get the idea.
I generally queue like this when the feature is specified enough in the conversation already. If it’s underspecified, then the model will make up stuff.
When I first moved from Claude Code to Codex, the way it implemented queued messages was annoying (more on the difference below). But as I grew accustomed to it, it started to feel a lot like something I saw elsewhere before: chess premoves.
Chess???
A premove is a relatively recent invention in chess, made possible by online chess platforms. When the feature is turned on, you don’t need to wait for your opponent to finish their move; instead you can queue your next move. It then gets executed automatically if the queued move is still valid after your opponent’s move:
If you are fast enough, this lets you move without using up your time in bullet chess, and even lets you queue up entire mate-in-N sequences, resulting in highly entertaining cases like the video above.
I tend to think of message queueing as the same thing: when applied effectively, it saves you a lot of time, when you can already predict the next move.
In other words, you should queue (or premove) when your next choice is decision-insensitive to the information you will receive in the next turn—so waiting wouldn’t change what you do, it would only delay doing it.
With this perspective, some obvious candidates for queuing in agentic coding are rote tasks that come before and after “serious work”, e.g.:
making the agent explain the codebase,
creating implementation plans,
fixing linting errors,
updating documentation during work before starting off a subsequent step,
committing and pushing,
and so on.
Different ways CLI agents implement queued messages
As I have mentioned above, Claude Code implements queued messages differently from OpenAI Codex. In fact, there are three main approaches that I can think of in this design space, based on when a user’s new input takes effect:
Post-turn queuing (FIFO2): User messages wait until the current action finishes completely before they’re handled. Example: OpenAI Codex CLI.
Boundary-aware queuing (Soft Interrupt): New messages are inserted at natural breakpoints, like after finishing a tool call, assistant reply or a task in the TODO list. This changes the model’s course of action smoothly, without stopping ongoing generation. Example: Claude Code, Cursor.
Immediate queuing (Hard Interrupt): New user messages immediately stop the current action/generation, discarding ongoing work and restarting the assistant’s generation from scratch. I have not seen any tool that implements this yet, but it could be an option for the impatient.
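The three policies can be sketched as variations of one dispatch loop. This is a toy model of the behavior, not any tool's actual implementation: each "turn" is a list of steps (tool calls, replies), and the policy decides when a queued user message takes effect.

```python
from collections import deque

# Toy model of the three queuing policies (not any tool's actual
# implementation). Each turn is a list of steps; the policy decides
# when a queued user message takes effect.
def run_turn(steps: list[str], queue: deque, policy: str) -> list[str]:
    log = []
    for step in steps:
        # Immediate (hard interrupt): drop remaining work on the spot.
        if policy == "immediate" and queue:
            log.append(f"interrupted before {step}")
            break
        log.append(step)
        # Boundary-aware (soft interrupt): inject at the first natural
        # breakpoint, e.g. right after a tool call finishes.
        if policy == "boundary" and queue:
            log.append(f"injected: {queue.popleft()}")
    # Post-turn (FIFO): handle messages only after the turn completes.
    if policy == "post-turn":
        while queue:
            log.append(f"handled: {queue.popleft()}")
    return log

steps = ["tool_call_1", "tool_call_2", "reply"]
print(run_turn(steps, deque(["fix lint"]), "post-turn"))
# → ['tool_call_1', 'tool_call_2', 'reply', 'handled: fix lint']
print(run_turn(steps, deque(["fix lint"]), "boundary"))
# → ['tool_call_1', 'injected: fix lint', 'tool_call_2', 'reply']
print(run_turn(steps, deque(["fix lint"]), "immediate"))
# → ['interrupted before tool_call_1']
```

Seen this way, the three modes differ only in where the injection check sits in the loop, which is why supporting all of them in one tool seems cheap.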
Why not implement all of them?
And here is my title-sake argument: When I move away from Claude Code, I miss boundary-aware queuing. When I move away from OpenAI Codex, I miss FIFO queueing.
I don’t see a reason why we could not implement all of them in all agentic tools. It could be controlled by a key combo like Ctrl+Enter, a submenu, or a button, depending on whether you are in the terminal or not.
Having the option would definitely make a difference in agentic workflows where you are running 3-4 agents in parallel.
So if you are reading this and are implementing an agentic coding tool, I would be happy if you took all this into consideration!
Pro tip: Don’t just queue continue by itself, because the model might get loose from its leash and start to make up and execute random tasks, especially after context compaction. Always specify what you want it to continue on, e.g. Continue handling the linting errors until none remain. Ignore this if the task is already completed.↩