GPT 5.2 xhigh feels like a much more careful architect and debugger when it comes to complex systems
But most people here think Opus 4.5 is the best model in that category
There are 2 reasons AFAIS:
- xhigh reasoning consumes significantly more tokens. You need ChatGPT Pro ($200/month) to be able to use it as a daily driver
- It takes like 5x longer to finish a task, and most people lack the patience to wait for it. (But then it's more correct/doesn't need fixing)
Opus 4.5 is good too; I think it's better in e.g. frontend design. But if you think it beats GPT 5.2 in every category, you are either too poor/stingy or have ADHD
Just 5 months ago, I was swearing at Claude 4 Sonnet like a Balkan uncle
Models one-shotted the right thing only 20-30% of the time, did really stupid things the rest of the time, and had to be tightly handheld
Today they are much, much better. My psychology is a lot more at ease, and instead of swearing, I want to kiss them on the forehead most of the time
Now I trust agents so much that I queue up 5-10 tasks before going to sleep. They work the whole night while I sleep and I wake up to resolved issues
GPT 5.2 xhigh and Claude 4.5 Opus are already goated (GPT more so), can't wait for them to get even faster
Codex does not have support for subagents. I tried to use Claude Code to launch 8 Codex instances in parallel on separate tasks, but Opus 4.5 had difficulty following instructions
So I created a CLI tool that scans pending TODOs from a markdown file and lets me launch as many harnesses as I want (osolmaz/spawn on GitHub)
I currently use this for relatively read-only tasks like planning and finding root causes of bugs, because it launches all the agents in the same repo and they might conflict
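For a sense of the scanning step, here is a minimal sketch (hypothetical; not the actual spawn implementation): collect every unchecked `- [ ]` checkbox item from a markdown file.

```python
import re

# Hypothetical sketch of the scanning step, not the actual spawn code:
# collect every unchecked "- [ ]" checkbox item from a markdown file.
TODO_PATTERN = re.compile(r"^\s*[-*]\s*\[ \]\s+(.*)$")

def scan_pending_todos(markdown_text: str) -> list[str]:
    """Return the text of every unchecked checkbox item."""
    todos = []
    for line in markdown_text.splitlines():
        match = TODO_PATTERN.match(line)
        if match:
            todos.append(match.group(1).strip())
    return todos

example = """\
# Tasks
- [x] Set up CI
- [ ] Find root cause of flaky test
- [ ] Write planning doc for auth refactor
"""
print(scan_pending_todos(example))
# → ['Find root cause of flaky test', 'Write planning doc for auth refactor']
```

Each pending item could then be handed to a harness, e.g. by spawning a tmux window running a coding agent with that task as its prompt.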
Ideas:
- Use @mitsuhiko's gh-issue-sync and run parallel agents directly on github issues
- Create new clones or worktrees for each task. I currently don't do this because I don't dare duplicate the Rust target dir 10x on my measly MacBook Air
- Support modes other than tmux, e.g. launching a terminal like Ghostty
- TUI for easy selection of issues/TODOs
Other ideas are welcome!
Friends of open source, we need your help!
A lot of Manim Community accounts got compromised and deleted during Christmas
Manim Community is a popular fork of @3blue1brown's original math animation engine Manim, and its accounts have over 5 YEARS of contributions, knowledge and following
Apparently GitHub support has already seen the request and is in the process of restoring the GitHub org. But if anyone knows how to speed this up, it would be greatly appreciated!
Unfortunately, the Discord and X accounts are deleted and less likely to return
But there might still be a way to restore them, or at least the data?
Re. Discord: Maybe @RhysSullivan's Answer Overflow has archived enough of the old server? That server contains YEARS of Q/A data and is vital for newcomers
Re. X: Maybe someone high up can do something to restore the account? cc @nikitabier
In the meantime, it would help a lot if you could follow the new account @manimcommunity and share this post! Thank you in advance!
While it's a great feature, I never needed such a thing in Codex after GPT 5.2. It just one-shots tasks without stopping
So we have proof by existence that this problem can be solved without any such mechanism. I wish to see the same relentlessness in Anthropic models
2025 was the year of ̶a̶g̶e̶n̶t̶s̶ bugs
Software felt much buggier compared to before, even from companies like Apple. Presumably because everyone started generating more code with AI
Models are improving, so hopefully 2026 will be the opposite. Even fewer bugs than in the pre-AI era
Have a long flight, so will think about this
I have an internal 2023 TextCortex doc which models chatbots as state machines with internal and external states, with an immutability constraint on the external state (what has already been sent to the user shall not be changed)
Motivation was that a chatbot provider will always have state that they will want to keep hidden
This was way before the Responses API and the now-deprecated Assistants API. It stood the test of time, because it was the most abstract thing I could think of
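To make the idea concrete, here is an illustrative sketch of that state-machine view (the names are mine, not the internal doc's): the external state is append-only, while the internal state stays hidden and mutable.

```python
from dataclasses import dataclass, field

# Illustrative sketch of the state-machine view (names are my own,
# not the original doc's). The external state is what the user has
# already seen, so it is append-only; the internal state stays
# hidden and freely mutable on the provider side.
@dataclass
class ChatState:
    external: tuple[str, ...] = ()                # transcript already sent to the user
    internal: dict = field(default_factory=dict)  # hidden provider-side state

def send_to_user(state: ChatState, message: str) -> ChatState:
    # Appending yields a new state; previously sent messages never change.
    return ChatState(external=state.external + (message,),
                     internal=dict(state.internal))

s0 = ChatState()
s1 = send_to_user(s0, "Hello!")
s1.internal["retrieval_cache"] = ["doc A"]  # internal state mutates freely
print(s1.external)  # ('Hello!',)
```

Immutable tuples enforce the "what was sent shall not be changed" constraint at the type level, while the internal dict carries whatever the provider wants to keep hidden.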
@mitsuhiko is right about the risk of rushing to lock in an abstraction, and with it its weaknesses and faults
Problem is, I could propose standards as much as I liked, but I don’t work at OpenAI or Anthropic, so nobody would care. Maybe a better place to start is open weights model libraries? To at least be able to demonstrate?
What I know: it is against OpenAI’s or Anthropic’s self interests to create an interoperability layer that will accelerate their commoditization. Maybe Google, looking at their current market positioning? Or maybe we “wrappers” have a chance after all?
There is a missing link between AI SDK, LangChain, and so on for other languages. We cannot keep duplicating the same things in each ecosystem independently. We need to join forces and simplify all this!
This was simply because the webapp failed to create a post and failed silently. The UX is still not good on this app. Make sure to write your posts somewhere else so you don't lose them
I gave Codex a task of porting an OpenCV tracking algorithm (CSRT) from C++ to Rust, so that I can directly use it in my project without having to cross-compile
It one-shot the task perfectly in 1 hr, and even developed a GUI on top of it. All I did was provide the original source and the algo paper
I've spent years getting specialized in writing numerical code (computational mechanics, FEM), and now AI can automate 95% of the low-level grunt work
Acquiring these skills involved highly difficult, excruciating intellectual labor spanning many years, very similar to ML research. Doing tensor math, writing out the solver code, wondering why your solution is not converging, finally figuring out it was a sign typo after 2 days
Kids these days both have it easy and hard. They can fast forward large chunks of the work, but then they will never understand things as deeply as someone who wrote the whole thing by hand
I guess the more valuable skill now is being able to zoom in and out of abstraction levels quickly when needed. Using AI, but recognizing fast when it fails, learning what needs to be done, fixing it, zooming back out, repeat. Adaptive learning, a sort of "depth-on-demand". The quicker you can pick up new skills and knowledge, the more successful you will be
If you have a bunch of docs in your repo, give it a try. It will use the timestamp of the commit that created each file when renaming. You can also run with --dry-run to see changes without applying them
Now you can migrate your repo to SimpleDoc with a single command:
npx -y @simpledoc/simpledoc migrate
A step-by-step wizard will add timestamps to your files based on your git history, add missing YAML frontmatter, and update your AGENTS.md file
https://t.co/yrciS8KtEw
Curious to hear what other hardcore agent users @simonw @mitsuhiko @steipete @badlogicgames think. I can't be the only one who does this.
I feel like everybody ended up with the same workflow independent of each other, but somehow did not write about it (or I missed it)
How to stop AI agents from littering your codebase with Markdown files?
I wrote a new post on how to create documentation with AI agents, without letting them add markdown files to your repo root, and with chronological ordering for the files they create
If you have used AI agents such as Anthropic’s Claude Code, OpenAI’s Codex, etc., you might have noticed their tendency to create markdown files at the repository root:
The default behavior for models, as of this writing in December 2025, is to create capitalized Markdown files at the repository root. This is of course very annoying when you accidentally commit them and they accumulate over time.
The good news is, this problem is 100% solvable by using a simple instruction in your AGENTS.md file:
**Attention agent!** Before creating ANY documentation, read the docs/HOW_TO_DOC.md file first. It contains guidelines on how to create documentation in this repository.
But what should be in the docs/HOW_TO_DOC.md file, and why is it a separate file? In my opinion, the instructions for solving this problem are too specific to be included in the AGENTS.md file. It’s generally a good idea not to inject them into every context.
To solve this problem, I developed a lightweight standard over time for organizing documentation in a codebase. It is framework-agnostic, unopinionated, and designed to be readable and writable by humans as well as agents. I was surprised not to be able to find something similar enough online, crystallized the way I wanted it. So I created a specification myself, called SimpleDoc.
Basically, it tells the agent to:
- Create documentation files in the docs/ folder, with YYYY-MM-DD prefixes and lowercase filenames, like 2025-12-22-an-awesome-doc.md, so that they are chronologically sorted by default.
- Always include YAML frontmatter with an author field, so that you can identify who created a doc without checking git history, if you are working in a team.
The exception here is timeless and general files like README.md, INSTALL.md, AGENTS.md, etc., which can be capitalized. But these are much rarer, so we can just follow the previous rules most of the time.
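For illustration, a file at docs/2025-12-22-an-awesome-doc.md following these conventions might contain something like this (the author field comes from the description above; the exact frontmatter schema is defined in the SimpleDoc spec, so check there for any additional fields):

```markdown
---
author: Jane Doe
---

# An awesome doc

Body of the documentation goes here.
```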
Here is your call to action to check the spec itself: SimpleDoc.
How to set up SimpleDoc in your repo
Run the following command from your repo root:
npx -y @simpledoc/simpledoc migrate
This starts an interactive wizard that will:
Migrate existing Markdown docs to SimpleDoc conventions (move root docs into docs/, rename to YYYY-MM-DD-… using git history, and optionally insert missing YAML frontmatter with per-file authors).
Ensure AGENTS.md contains the reminder line and that docs/HOW_TO_DOC.md exists (created from the bundled SimpleDoc template).
If you just want to preview what it would change:
npx -y @simpledoc/simpledoc migrate --dry-run
If you run into issues with the workflow or have suggestions for improvement, you can email me at onur@solmaz.io.
OpenAI won’t be able to monopolize this, for the same reason Microsoft couldn’t monopolize the internet. The internet (of agents) is bigger than any one company
One tap @Revolut bank account at Berlin airport. Literally.
It dispenses a free card with instructions to log in. One of the most insane onboarding experiences I have ever seen
Codex feature request: Let me queue up /model changes
Currently, if I try to run /model while the model is responding, it tells me that I can't do that
But I often want to set the reasoning budget in advance, e.g. run a straightforward task with low reasoning and then start another one with high reasoning
cc @thsottiaux
AI agents make any transduction task (like translation from language A to language B) trivial, especially when you can verify the output with compilers and tests
The bottleneck is now curating the tests
I think X removed one of my posts yesterday about the new encrypted "Chat" rolling out to all users, and how you might lose all your past messages if you forget your passcode and do not have the app installed
I could swear I clicked Post. Do they classify posts based on their topic and delete the ones they don't like?
Anyway, we shall see, I am taking a screenshot and saving the URL.
Crazy that @cursor_ai disabled Gemini 3 Pro on my installation; I toggled it right back on. I wonder why. Too many complaints maybe? That it’s hard to control?
On another note, disabling models without notification is dishonest product behavior. I would at least appreciate getting a notification, even when it might be against the company’s interests @sualehasif996
So is somebody already building “LLVM but for LLM APIs” in stealth or not?
We have numerous libraries: @langchain, Vercel AI SDK, LiteLLM, OpenRouter, the one we built at @TextCortex, etc.
But to my knowledge, none of these try to build a language agnostic IR for interoperability between providers (or at least market themselves as such)
Like some standard and set of tools that will not lock you into LangChain, AI SDK, or anything like that; something lower level and less opinionated
I feel like this is a job for the new Agentic AI Foundation cc @linuxfoundation, so maybe they are already working on it? I desperately want to start on such a project, but feel like I might get sniped 2 months after
Does anybody have any information on all this?
cc @mitsuhiko @badlogicgames @steipete
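To make the "LLVM for LLM APIs" idea concrete, here is a toy sketch of what a provider-neutral message IR could look like. All names here are invented for illustration; this is not an existing spec or any library's actual API.

```python
from dataclasses import dataclass
from typing import Literal

# Toy sketch of a provider-neutral chat IR (all names invented for
# illustration). Provider adapters would lower this IR into each
# vendor's wire format, the way an LLVM backend lowers IR to a target.
@dataclass
class Part:
    kind: Literal["text", "tool_call", "tool_result"]
    payload: dict

@dataclass
class Message:
    role: Literal["system", "user", "assistant", "tool"]
    parts: list[Part]

def to_openai_style(messages: list[Message]) -> list[dict]:
    # One possible "backend": lower the IR into an OpenAI-ish shape.
    out = []
    for m in messages:
        text = "".join(p.payload["text"] for p in m.parts if p.kind == "text")
        out.append({"role": m.role, "content": text})
    return out

convo = [
    Message("system", [Part("text", {"text": "You are helpful."})]),
    Message("user", [Part("text", {"text": "Hi"})]),
]
print(to_openai_style(convo))
```

The design choice that matters is that the IR stays flat and typed, so each provider adapter is a pure lowering function with no knowledge of the others.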
This is what an agentic monorepo looks like. What was a hurdle before is now a child's toy
This side project started as a Python project earlier in 2025
Then I added an iOS app on top of it
I rewrote the most important algorithms in Rust
I rewrote the entire backend in Go and retired Python to be used purely for prototypes
I wrote a webapp with Next.js
With unit and integration tests for each component
Lately, 99% written by instructing agents
Crazy mixed language programming going on in the background. Rust component used both by iOS app for offline and by go backend for online use case, FFI and all
Number of lines in the repo: a couple hundred thousand
If you had told me 1 year ago that I would be able to do all of this by myself, I would not have believed it
This is huge. Natively supported stacked PRs on GitHub would make life much easier, especially with human AND AI reviews
AI reviews with Codex/Claude/Gemini/Cursor Bugbot integrations are becoming especially important in small teams who are generating huge amounts of code
AI reviews don't work well if you don't split your work into diffs smaller than a few hundred lines of code, so stacked PRs are already an integral part of the developer experience in agentic workflows
CLI coding tools should give more control over message queueing
Codex waits until the end of the turn to handle a user message; Claude Code injects it as soon as possible after a tool response/assistant reply
There is no reason why we cannot have both!
New post (link below):
Codex v0.71 finally implements a more detailed way of storing permissions
But they are still at the user home folder level. Saving rules in a repo still seems TBD
"execpolicy commands are still in preview. The API may have breaking changes in the future."
Below: Why agentic coding tools like Cursor, Claude Code, OpenAI Codex, etc. should implement more ways of letting users queue messages.
See Peter Steinberger’s tweet where he queues continue 100 times to nudge the GPT-5-Codex model to not stop while working on a predictable, boring and long-running refactor task:
Tweet embed disabled to avoid requests to X.
This is necessary while working with a model like GPT-5-Codex. The reason is that the model has a tendency to stop generating at certain checkpoints, due to the way it has been trained, even when you instruct it to FINISH IT UNTIL COMPLETION!!1!. So the only way to get it to finish something is to use the message queue.1
But this isn’t the only use case for queued messages. For example, you can use the model to retrieve files into its context, before starting off a related task. Say you want to find the root cause of a <bug in component X>. Then you can queue
Explain how <component X> works in plain language. Do not omit any details.
Find the root cause of <bug> in <component X>.
This will generally help the model find the root cause more easily, or make more accurate predictions about it, by having context about the component.
Another example: After exploring a design in a dialogue, you can queue the next steps to implement it.
<Prior conversation exploring how to design a new feature>
Create an implementation plan for that in the docs/ folder. Include all the details we discussed
Commit and push the doc
Implement the feature according to the plan.
Continue implementing the feature until it is done. Ignore this if the task is already completed.
Continue implementing the feature until it is done. Ignore this if the task is already completed.
… you get the idea.
I generally queue like this when the feature is specified enough in the conversation already. If it’s underspecified, then the model will make up stuff.
When I first moved from Claude Code to Codex, the way it implemented queued messages was annoying (more on the difference below). But as I grew accustomed to it, it started to feel a lot like something I saw elsewhere before: chess premoves.
Chess???
A premove is a relatively recent invention in chess, made possible by online chess platforms. When the feature is turned on, you don’t need to wait for your opponent to finish their move; instead you can queue your next move. It then gets executed automatically if the queued move is still valid after your opponent’s move:
If you are fast enough, this lets you move without using up your time in bullet chess, and even lets you queue up entire mate-in-N sequences, resulting in highly entertaining cases like the video above.
I tend to think of message queueing as the same thing: when applied effectively, it saves you a lot of time, when you can already predict the next move.
In other words, you should queue (or premove) when your next choice is decision-insensitive to the information you will receive in the next turn—so waiting wouldn’t change what you do, it would only delay doing it.
With this perspective, some obvious candidates for queuing in agentic coding are rote tasks that come before and after “serious work”, e.g.:
making the agent explain the codebase,
creating implementation plans,
fixing linting errors,
updating documentation during work before starting off a subsequent step,
committing and pushing,
and so on.
Different ways CLI agents implement queued messages
As I have mentioned above, Claude Code implements queued messages differently from OpenAI Codex. In fact, there are three main approaches that I can think of in this design space, based on when a user’s new input takes effect:
Post-turn queuing (FIFO2): User messages wait until the current action finishes completely before they’re handled. Example: OpenAI Codex CLI.
Boundary-aware queuing (Soft Interrupt): New messages are inserted at natural breakpoints, like after finishing a tool call, assistant reply or a task in the TODO list. This changes the model’s course of action smoothly, without stopping ongoing generation. Example: Claude Code, Cursor.
Immediate queuing (Hard Interrupt): New user messages immediately stop the current action/generation, discarding ongoing work and restarting the assistant’s generation from scratch. I have not seen any tool that implements this yet, but it could be an option for the impatient.
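The three policies can be sketched as variations of one dispatch loop. This is a toy model of the behavior, not any tool's actual implementation: each "turn" is a list of steps (tool calls, replies), and the policy decides when a queued user message takes effect.

```python
from collections import deque

# Toy model of the three queuing policies (not any tool's actual
# implementation). Each turn is a list of steps; the policy decides
# when a queued user message takes effect.
def run_turn(steps: list[str], queue: deque, policy: str) -> list[str]:
    log = []
    for step in steps:
        # Immediate (hard interrupt): drop remaining work on the spot.
        if policy == "immediate" and queue:
            log.append(f"interrupted before {step}")
            break
        log.append(step)
        # Boundary-aware (soft interrupt): inject at the first natural
        # breakpoint, e.g. right after a tool call finishes.
        if policy == "boundary" and queue:
            log.append(f"injected: {queue.popleft()}")
    # Post-turn (FIFO): handle messages only after the turn completes.
    if policy == "post-turn":
        while queue:
            log.append(f"handled: {queue.popleft()}")
    return log

steps = ["tool_call_1", "tool_call_2", "reply"]
print(run_turn(steps, deque(["fix lint"]), "post-turn"))
# → ['tool_call_1', 'tool_call_2', 'reply', 'handled: fix lint']
print(run_turn(steps, deque(["fix lint"]), "boundary"))
# → ['tool_call_1', 'injected: fix lint', 'tool_call_2', 'reply']
print(run_turn(steps, deque(["fix lint"]), "immediate"))
# → ['interrupted before tool_call_1']
```

Seen this way, the three modes differ only in where the injection check sits in the loop, which is why supporting all of them in one tool seems cheap.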
Why not implement all of them?
And here is my title-sake argument: When I move away from Claude Code, I miss boundary-aware queuing. When I move away from OpenAI Codex, I miss FIFO queueing.
I don’t see a reason why we could not implement all of them in all agentic tools. It could be controlled by a key combo like Ctrl+Enter, a submenu, or a button, depending on whether you are in the terminal or not.
Having the option would definitely make a difference in agentic workflows where you are running 3-4 agents in parallel.
So if you are reading this and are implementing an agentic coding tool, I would be happy if you took all this into consideration!
Pro tip: Don’t just queue continue by itself, because the model might get loose from its leash and start to make up and execute random tasks, especially after context compaction. Always specify what you want it to continue on, e.g. Continue handling the linting errors until none remain. Ignore this if the task is already completed.↩