Entries for January 2026

  1. Onur Solmaz · Log · /2026/01/17

    GitHub has to change

    It is clear at this point that GitHub’s trust and data models will have to change fundamentally to accommodate agentic workflows, or risk being replaced by another SCM

    One cannot do these things easily with GitHub now:

    • granular control: this agent running in this sandbox can only push to this specific branch. If an agent runs amok, it could delete everybody’s branches and close PRs. GitHub allows recovery of these, but it is inconvenient even if it happens once
    • create a bot (this exists already), but strip its reviewing rights so that an employee cannot bypass reviews by tricking the bot into approving
    • in general, make a distinction between HUMAN and AGENT so that you can create rulesets to govern the relationships between them

    The fundamental problem with GitHub is trust: humans are to be trusted. If you don’t trust a human, why did you hire them in the first place?

    Anyone who reviews and approves PRs bears responsibility. Rulesets exist and can enforce e.g. CODEOWNERS reviews, or only let certain people make changes to a certain folder

    But the initial repo setup on GitHub is allow-by-default. Anyone can change anything until they are restricted from it

    This model breaks fundamentally with agents, who are effectively sleeper cells that will try to delete your repo the moment they encounter a sufficiently powerful adversarial attack

    For example, I can create a bot account on GitHub and connect clawdbot to it. I need to give it write permission, because I want it to be able to create PRs. However, I don’t want it to be able to approve PRs, because a coworker could just nag at the bot until it approves a PR that requires human attention

    To fix this, you have to bend over backwards: create a @human team with all human coworkers, make it code owner on /, and enforce code owner reviews. This is stupid and there has to be another way
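    For the record, the workaround looks roughly like this (hypothetical org and team names), combined with a branch protection rule or ruleset that requires Code Owner reviews:

```
# .github/CODEOWNERS (hypothetical org/team names)
# Make the all-human team the code owner of everything,
# so the bot's approvals alone can never satisfy required reviews:
*    @acme/humans
```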

    Even worse, this bot could be given internet access, stumble into an @elder_plinius-style prompt injection while googling, and start messing up whatever it can in your organization

    It is clear that GitHub needs to create a second-class entity for agents, one that defaults to low trust and starts from a point of least privilege instead of the other way around
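    A sketch of what that could look like, as a toy model: agents get an explicit allow-list, humans keep today’s default-allow behavior. None of these names correspond to any real GitHub API; this is purely illustrative.

```python
# Hypothetical least-privilege model: agents are allow-by-exception,
# humans stay allow-by-default (GitHub's current behavior).
# All names here are made up for illustration.

def allowed(actor_kind: str, action: str,
            grants: frozenset = frozenset(),
            revoked: frozenset = frozenset()) -> bool:
    if actor_kind == "agent":
        return action in grants      # least privilege: explicit allow-list only
    return action not in revoked     # humans: allowed unless restricted

# An agent sandboxed to pushing a single branch:
sandbox = frozenset({"push:feature/agent-42"})
assert allowed("agent", "push:feature/agent-42", grants=sandbox)
assert not allowed("agent", "approve_pr", grants=sandbox)
assert not allowed("agent", "delete_branch", grants=sandbox)
# A human with no explicit restrictions can still approve PRs:
assert allowed("human", "approve_pr")
```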

  2. Onur Solmaz · Post · /2026/01/11

    You don't need a skill registry (for your CLI tools)

    tl;dr I propose a CLI flag convention, --skill (like --help), for distributing skills, and try to convince you why it is better than using 3rd-party registries. See osolmaz/skillflag on GitHub.


    MCP is dead, long live Agent Skills. At least for local coding agents.

    Mario Zechner has been making the point for a few months now that CLI tools perform better than MCP servers, and in mid-December Anthropic christened skills by launching agentskills.io.

    They had introduced the mechanism in Claude Code earlier, and this time they didn’t make the mistake of waiting for OpenAI to make a provider-agnostic version of it.

    Agent skills are basically glorified manpages or --help for AI agents. You ship a markdown instruction manual in SKILL.md and the name of the folder that contains it becomes an identifier for that skill:

    my-skill/
    ├── SKILL.md          # Required: instructions + metadata
    ├── scripts/          # Optional: executable code
    ├── references/       # Optional: documentation
    └── assets/           # Optional: templates, resources
    

    Possibly the biggest use case for skills is teaching your agent how to use a certain CLI you have created, maybe a wrapper around some API, which unlike gh, gcloud etc. will never be significant enough to be represented in AI training datasets. For example, you could have created an unofficial CLI for Twitter/X, and there might still be some months/years until it is scraped enough for models to know how to call it. Not to worry, agent skills to the rescue!

    Anthropic, while laying out the standard, intentionally kept it as simple as possible. The only requirements are the filename SKILL.md, the YAML metadata, and the fact that all relevant files are grouped in a folder. It does not impose anything on how skills should be packaged or distributed.
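    A minimal SKILL.md might look like this (a sketch; the YAML frontmatter carries the name and the description the agent matches on):

```markdown
---
name: my-skill
description: One or two sentences the agent uses to decide when to load this skill.
---

# my-skill

Step-by-step instructions, with concrete command examples, that the agent
reads only after the skill is triggered.
```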

    This is a good thing! Nobody knows the right way to distribute skills at launch, so various stakeholders can come up with their own ways, and the best one can win in the long term. The simpler a standard, the more likely it is to survive.

    I made some generalizing claims here. Not all skills have to be about using a CLI tool, nor do most CLI tools bundle a skill yet. But here is my gut feeling: the most useful skills, the ones worth distributing, are generally about using a CLI tool. Better yet, even if they don’t ship a CLI yet, they should.

    So here is the hill I’m ready to die on: all major CLI tools (including the UNIX ones we are already familiar with) should bundle skills in one way or another. Not because the models of today need to learn how to call ls, grep or curl; they already know them inside out. No, the reason is something else: to establish a convention, and to acknowledge the existence of another type of intelligence that is now using our machines.

    There is a reason why we cannot afford to let the models just run --help or man <tool>, and that is time and money. The average --help or manpage is devoid of examples, and is written in a way that requires multiple passes to connect the pieces on how to use the thing.

    Each token wasted trying to guess the right way to call a tool or API costs real money, and unlike human developer effort, we can measure exactly how inefficient some documentation is by looking at how many steps of trial and error a model had to take.

    Not that human attention is less valuable than AI attention; it is more so. But there has never been a way to quantify a task’s difficulty as precisely as we can with AI, so we programmers have historically caved in to obscurantism and a weird pride in making things more difficult than they should be, like some feudal artisan. This is perhaps best captured in the spirit of Stack Overflow and its infamous treatment of noob questions: sacred knowledge shall be bestowed only once you have suffered long enough.

    Ahh, but we don’t treat AI that way, do we? We handhold it like a baby, we nourish it with examples, we do our best to explain things, all so that it “one shots” the right tool call. Because if it doesn’t, we pay more in LLM costs or time. It’s ironic that we document for AI like we are teaching primary schoolers, while the average human manpage reads like a robot novella.

    To reiterate, the reason for this is two different types of intelligence, and the expectations we have of them:

    • An LLM is still not considered “general intelligence”, so it works better by mimicking or extending already-working examples.
    • An LLM-based AI agent deployed in some context is expected to “work” out of the box, without any hiccups.

    On the other hand,

    • A human is considered general intelligence, can learn from sparser signals, and adapts better to out-of-distribution data. Given an extremely terse --help or manpage, a human is likelier to muddle through with trial, error and reasoning, if one could ever draw such a comparison.
    • A human, much less of a commodity than an LLM, is under less pressure to do the right thing every time, and can afford to make mistakes and spend more time learning.

    And this is the main point of my argument. These different types of intelligence read different types of documentation, to perform maximally in their own ways. I haven’t witnessed a new addition to POSIX flag conventions in my 15 years of programming, but we are living in unprecedented times. So maybe even UNIX can yet change.

    To this end, I introduce skillflag, a new CLI flag convention:

    # list skills the tool can export
    <tool> --skill list
    # show a single skill’s metadata
    <tool> --skill show <id>
    # install into Codex user skills
    <tool> --skill export <id> | npx skillflag install --agent codex
    # install into Claude project skills
    <tool> --skill export <id> | npx skillflag install --agent claude --scope repo
    

    The spec and the repo are at osolmaz/skillflag on GitHub.

    For example, suppose that you have installed a CLI tool to control Philips Hue lights at home, hue-cli.

    To list the skills that the tool can export, you can run:

    $ hue-cli --skill list
    philips-hue    Control Philips Hue lights in the terminal
    

    You can then install it to your preferred coding agent, such as Claude Code:

    $ hue-cli --skill export philips-hue | npx skillflag install --agent claude
    Installed skill philips-hue to .claude/skills/philips-hue
    

    You can optionally install the skill to ~/.claude, to make it global across repos:

    $ hue-cli --skill export philips-hue | npx skillflag install --agent claude --scope user
    Installed skill philips-hue to ~/.claude/skills/philips-hue
    

    Once this convention becomes commonplace, agents will do all of this by default before they even run the tool. So when you ask one to “install hue-cli”, it will know to run --skill list the same way a human would run --help after downloading a program, and install the necessary skills without being asked to.

  3. Onur Solmaz · Log · /2026/01/10

    Anthropic's pricing is stupid

    Earlier last year, Anthropic announced this pricing scheme:

    • \$20 -> 1x usage
    • \$100 -> 5x usage
    • \$200 -> 20x usage (up from the initially announced 10x)

    As you can see, it’s not growing linearly. This is classic Jensen “the more you buy, the more you save”
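    The nonlinearity is easy to put in numbers, using the multipliers from the list above:

```python
# Usage multiple per dollar for each Anthropic plan tier listed above.
tiers = {20: 1, 100: 5, 200: 20}  # price in $ -> usage multiple of the $20 plan

per_dollar = {price: mult / price for price, mult in tiers.items()}

# $20 and $100 buy the same usage per dollar...
assert per_dollar[20] == per_dollar[100]
# ...but the $200 plan buys exactly twice as much usage per dollar,
# steering heavy users into the biggest plan.
assert per_dollar[200] == 2 * per_dollar[20]
```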

    But here is the thing: you are not selling hardware like Jensen. You are selling a software service through an API. It’s the worst possible pricing for this category of product. Long term, people will game the hell out of your offering

    Meanwhile, OpenAI decided not to do that. There is no quirky incentive for buying bigger plans: \$200 ChatGPT = 10 × \$20 ChatGPT, roughly

    And here is where it gets funny. Despite not having such an incentive, you get A LOT MORE usage from the \$200 OpenAI plan than from the \$200 Anthropic plan. Presumably because OpenAI has better unit economics (sama mentioned they are turning a profit on inference, if he is to be believed)

    Thanks to sounder pricing, OpenAI can do exactly what Anthropic cannot: offer GPT in 3rd party harnesses and win the ecosystem race

    Anthropic has cornered itself with this pricing. They need to change it, but I’m not sure they can afford to on such short notice

    All this makes me extremely bullish on open source 3rd-party harnesses: OpenCode, Mario Zechner’s pi and such. It is clear developers want options. “Just give me the API”

    I personally am extremely excited for 2026. We’ll get open models on par with today’s proprietary models, and will finally be able to run truly sovereign personal AI agents, for much cheaper than what we are already paying!


    Originally posted on LinkedIn

  4. Onur Solmaz · Post · /2026/01/04

    Having a "tools" repo as a developer

    I am a fan of monorepos. Creating subdirectories in a single repo is the most convenient way to work on a project. Low complexity, and your agents get access to everything that they need.

    Since May 2025, I have been increasingly using AI models to write code, and have noticed two new tendencies:

    • I don’t shy away from vendoring open source libraries and modifying them.
    • I create personal CLIs and tools for myself, when something is not available as a package.

    With agents, it’s really trivial to say “create a CLI that does X”. For example, I wanted my terminal screenshots to have equal padding and cropped lines erased. I created a CLI for it without writing a single line of code, by asking Codex to read its output and iterate on the code until it gave the result I wanted.

    Most of these tools don’t deserve their own repos, or to be published as packages, at the beginning. They might evolve into something more substantial over time, but initially they are not worth creating a separate repo for.

    To avoid that overhead, I adopted a new convention: I put them all in a single repo called tools. Every tool starts in that repo by default. If one proves especially useful and I decide to publish it as a package, I move it to a separate repo.
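    Concretely, the layout is nothing fancier than one subdirectory per tool (hypothetical names):

```
tools/
├── screenshot-pad/   # e.g. the terminal-screenshot tool from the example above
├── hue-cli/          # hypothetical: each tool lives in its own folder
└── ...
```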

    You can keep tools public or private, or have both a public and private version. Mine is public, feel free to steal ones that you find useful.

  5. Onur Solmaz · Log · /2026/01/03

    Christmas of Agents

    I believe a “Christmas of Agents” (+ New Year of Agents) is superior to “Advent of Code”.

    The reason is simple: most of us are employed, and Advent of Code coincides with work time, so you can’t really immerse yourself in a side project.1

    However, Christmas (or any other long holiday without primary duties) is a better time to immerse yourself in a side project.

    2025 was the eve of agentic coding. This was the first holiday where I had free rein to go nuts on a side project using agents. It was epic:

    Tweet embed disabled to avoid requests to X.
    

    75k lines of Rust later, here is what I’ve built during the first Christmas with agents, using OpenAI Codex

    • A full mobile rewrite and port of my Python Instagram video production pipeline (single video production time: 1hr -> 5min)
    • Bespoke animation engine using primitives (think Adobe Flash, Manim)
    • Proprietary new canvas UI library in Rust, because I don’t want to lock myself into Swift
    • Thanks to that, it’s cross-platform, running on both desktop and iOS. Porting it to Android will be a breeze when the time comes
    • A Rust port of OpenCV’s CSRT algorithm, for tracking points/objects
    • In-engine font rendering using rustybuzz, so fonts render the same everywhere
    • Many other such things

    Why would I choose to do it that way? Because I developed it primarily on desktop, where I have much faster iteration speed. Ain’t nobody got time for iOS compilation and the simulator. Once I finished the hard part on desktop, porting to iOS was much easier, and I didn’t lock myself into Apple

    Some of these would have been unimaginable without agents, like creating a UI library from scratch in Rust. But when you have infinite workforce, you can ask for crazy things like “create a textbox component from scratch”

    What I’ve built is very similar in nature to CapCut, except that I am a single person and I built it in about a week

    What have you built this Christmas with agents?

    1. You could maybe work in the evenings after work, but unless you are slacking at work full time, it won’t be the same as full immersion.