Entries for January 2026

  1. Onur Solmaz · Log · /2026/01/17

    GitHub has to change

    It is clear at this point that GitHub’s trust and data models will have to change fundamentally to accommodate agentic workflows, or risk being replaced by another SCM

    One cannot do these things easily with GitHub now:

    • granular control: this agent running in this sandbox can only push to this specific branch. If an agent runs amok, it could delete everybody’s branches and close PRs. GitHub allows recovery of these, but it is inconvenient even if it happens once
    • create a bot (this exists already), but strip its reviewing rights so that an employee cannot bypass reviews by tricking the bot into approving
    • in general, make a distinction between HUMAN and AGENT so that you can create rulesets to govern the relationships between them

    The fundamental problem with GitHub is trust: humans are to be trusted. If you don’t trust a human, why did you hire them in the first place?

    Anyone who reviews and approves PRs bears responsibility. Rulesets exist and can enforce e.g. CODEOWNERS reviews, or only let certain people make changes to a certain folder

    But the initial repo setup on GitHub is allow-by-default. Anyone can change anything until they are restricted from it

    This model breaks fundamentally with agents, who are effectively sleeper cells that will try to delete your repo the moment they encounter a sufficiently powerful adversarial attack

    For example, I can create a bot account on GitHub and connect clawdbot to it. I need to give it write permission, because I want it to be able to create PRs. However, I don’t want it to be able to approve PRs, because a coworker could just nag at the bot until it approves a PR that requires human attention

    To fix this, you have to bend over backwards: create a @human team with all human coworkers, make it code owner on /, and enforce code owner reviews. This is stupid and there has to be another way
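    For the record, the workaround looks roughly like this (hypothetical org and team names), combined with a branch protection rule or ruleset that requires Code Owner reviews:

```
# .github/CODEOWNERS (hypothetical org/team names)
# Make the all-human team the code owner of everything,
# so the bot's approvals alone can never satisfy required reviews:
*    @acme/humans
```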

    Even worse, this bot could be given internet access, stumble into an @elder_plinius-style prompt injection while googling, and start messing up whatever it can in your organization

    It is clear that GitHub needs to create a second-class entity for agents, one that defaults to low trust and starts from a point of least privilege instead of the other way around
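    A sketch of what that could look like, as a toy model: agents get an explicit allow-list, humans keep today’s default-allow behavior. None of these names correspond to any real GitHub API; this is purely illustrative.

```python
# Hypothetical least-privilege model: agents are allow-by-exception,
# humans stay allow-by-default (GitHub's current behavior).
# All names here are made up for illustration.

def allowed(actor_kind: str, action: str,
            grants: frozenset = frozenset(),
            revoked: frozenset = frozenset()) -> bool:
    if actor_kind == "agent":
        return action in grants      # least privilege: explicit allow-list only
    return action not in revoked     # humans: allowed unless restricted

# An agent sandboxed to pushing a single branch:
sandbox = frozenset({"push:feature/agent-42"})
assert allowed("agent", "push:feature/agent-42", grants=sandbox)
assert not allowed("agent", "approve_pr", grants=sandbox)
assert not allowed("agent", "delete_branch", grants=sandbox)
# A human with no explicit restrictions can still approve PRs:
assert allowed("human", "approve_pr")
```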

  2. Onur Solmaz · Post · /2026/01/11

    You don't need a skill registry (for your CLI tools)

    tl;dr I propose a CLI flag convention, --skill (like --help), for distributing skills, and try to convince you why it is better than using 3rd-party registries. See osolmaz/skillflag on GitHub.


    MCP is dead, long live Agent Skills. At least for local coding agents.

    Mario Zechner has been making the point for a few months now that CLI tools perform better than MCP servers, and in mid-December Anthropic christened skills by launching agentskills.io.

    They had introduced the mechanism in Claude Code earlier, and this time they didn’t make the mistake of waiting for OpenAI to make a provider-agnostic version of it.

    Agent skills are basically glorified manpages or --help for AI agents. You ship a markdown instruction manual in SKILL.md and the name of the folder that contains it becomes an identifier for that skill:

    my-skill/
    ├── SKILL.md          # Required: instructions + metadata
    ├── scripts/          # Optional: executable code
    ├── references/       # Optional: documentation
    └── assets/           # Optional: templates, resources
    

    Possibly the biggest use case for skills is teaching your agent how to use a certain CLI you have created, maybe a wrapper around some API, which unlike gh, gcloud etc. will never be significant enough to be represented in AI training datasets. For example, you could have created an unofficial CLI for Twitter/X, and there might still be some months/years until it is scraped enough for models to know how to call it. Not to worry, agent skills to the rescue!

    Anthropic, while laying out the standard, intentionally kept it as simple as possible. The only requirements are the filename SKILL.md, the YAML metadata, and the fact that all relevant files are grouped in a folder. It does not impose anything on how skills should be packaged or distributed.
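    A minimal SKILL.md might look like this (a sketch; the YAML frontmatter carries the name and the description the agent matches on):

```markdown
---
name: my-skill
description: One or two sentences the agent uses to decide when to load this skill.
---

# my-skill

Step-by-step instructions, with concrete command examples, that the agent
reads only after the skill is triggered.
```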

    This is a good thing! Nobody knows the right way to distribute skills at launch, so various stakeholders can come up with their own ways, and the best one can win in the long term. The simpler a standard, the more likely it is to survive.

    I made some generalizing claims here. Not all skills have to be about using a CLI tool, nor do most CLI tools bundle a skill yet. But here is my gut feeling: the most useful skills, the ones worth distributing, are generally about using a CLI tool. Better yet, even if they don’t ship a CLI yet, they should.

    So here is the hill I’m ready to die on: all major CLI tools (including the UNIX ones we are already familiar with) should bundle skills in one way or another. Not because the models of today need to learn how to call ls, grep or curl; they already know them inside out. No, the reason is something else: to establish a convention, and to acknowledge the existence of another type of intelligence that is now using our machines.

    There is a reason why we cannot afford to let the models just run --help or man <tool>, and that is time and money. The average --help or manpage is devoid of examples, and is written in a way that requires multiple passes to connect the pieces on how to use the thing.

    Each token wasted trying to guess the right way to call a tool or API costs real money, and unlike human developer effort, we can measure exactly how inefficient some documentation is by looking at how many steps of trial and error a model had to take.

    Not that human attention is less valuable than AI attention; it is more so. But there has never been a way to quantify a task’s difficulty as precisely as we can with AI, so we programmers have historically caved in to obscurantism and a weird pride in making things more difficult than they should be, like some feudal artisan. This is perhaps best captured in the spirit of Stack Overflow and its infamous treatment of noob questions: sacred knowledge shall be bestowed only once you have suffered long enough.

    Ahh, but we don’t treat AI that way, do we? We handhold it like a baby, we nourish it with examples, we do our best to explain things, all so that it “one shots” the right tool call. Because if it doesn’t, we pay more in LLM costs or time. It’s ironic that we document for AI like we are teaching primary schoolers, while the average human manpage reads like a robot novella.

    To reiterate, the reason for this is two different types of intelligence, and the expectations we have of them:

    • An LLM is still not considered “general intelligence”, so it works better by mimicking or extending already-working examples.
    • An LLM-based AI agent deployed in some context is expected to “work” out of the box, without any hiccups.

    On the other hand,

    • A human is considered general intelligence, can learn from sparser signals, and adapts better to out-of-distribution data. Given an extremely terse --help or manpage, a human is likelier to muddle through with trial, error and reasoning, if one could ever draw such a comparison.
    • A human, much less of a commodity than an LLM, is under less pressure to do the right thing every time, and can afford to make mistakes and spend more time learning.

    And this is the main point of my argument. These different types of intelligence read different types of documentation, to perform maximally in their own ways. I haven’t witnessed a new addition to POSIX flag conventions in my 15 years of programming, but we are living in unprecedented times. So maybe even UNIX can yet change.

    To this end, I introduce skillflag, a new CLI flag convention:

    # list skills the tool can export
    <tool> --skill list
    # show a single skill’s metadata
    <tool> --skill show <id>
    # install into Codex user skills
    <tool> --skill export <id> | npx skillflag install --agent codex
    # install into Claude project skills
    <tool> --skill export <id> | npx skillflag install --agent claude --scope repo
    

    The spec and the repo are at osolmaz/skillflag on GitHub.

    For example, suppose that you have installed a CLI tool to control Philips Hue lights at home, hue-cli.

    To list the skills that the tool can export, you can run:

    $ hue-cli --skill list
    philips-hue    Control Philips Hue lights in the terminal
    

    You can then install it to your preferred coding agent, such as Claude Code:

    $ hue-cli --skill export philips-hue | npx skillflag install --agent claude
    Installed skill philips-hue to .claude/skills/philips-hue
    

    You can optionally install the skill to ~/.claude, to make it global across repos:

    $ hue-cli --skill export philips-hue | npx skillflag install --agent claude --scope user
    Installed skill philips-hue to ~/.claude/skills/philips-hue
    

    Once this convention becomes commonplace, agents will do all of this by default before they even run the tool. So when you ask one to “install hue-cli”, it will know to run --skill list the same way a human would run --help after downloading a program, and install the necessary skills without being asked to.

  3. Onur Solmaz · Log · /2026/01/10

    Anthropic's pricing is stupid

    Earlier last year, Anthropic announced this pricing scheme:

    • \$20 -> 1x usage
    • \$100 -> 5x usage
    • \$200 -> 20x usage (up from the initially announced 10x)

    As you can see, it’s not growing linearly. This is classic Jensen “the more you buy, the more you save”
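    The nonlinearity is easy to put in numbers, using the multipliers from the list above:

```python
# Usage multiple per dollar for each Anthropic plan tier listed above.
tiers = {20: 1, 100: 5, 200: 20}  # price in $ -> usage multiple of the $20 plan

per_dollar = {price: mult / price for price, mult in tiers.items()}

# $20 and $100 buy the same usage per dollar...
assert per_dollar[20] == per_dollar[100]
# ...but the $200 plan buys exactly twice as much usage per dollar,
# steering heavy users into the biggest plan.
assert per_dollar[200] == 2 * per_dollar[20]
```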

    But here is the thing: you are not selling hardware like Jensen. You are selling a software service through an API. It’s the worst possible pricing for this category of product. Long term, people will game the hell out of your offering

    Meanwhile, OpenAI decided not to do that. There is no quirky incentive for buying bigger plans: \$200 ChatGPT = 10 × \$20 ChatGPT, roughly

    And here is where it gets funny. Despite not having such an incentive, you get A LOT MORE usage from the \$200 OpenAI plan than from the \$200 Anthropic plan. Presumably because OpenAI has better unit economics (sama mentioned they are turning a profit on inference, if he is to be believed)

    Thanks to sounder pricing, OpenAI can do exactly what Anthropic cannot: offer GPT in 3rd party harnesses and win the ecosystem race

    Anthropic has cornered itself with this pricing. They need to change it, but I’m not sure they can afford to on such short notice

    All this makes me extremely bullish on open source 3rd-party harnesses: OpenCode, Mario Zechner’s pi and such. It is clear developers want options. “Just give me the API”

    I personally am extremely excited for 2026. We’ll get open models on par with today’s proprietary models, and will finally be able to run truly sovereign personal AI agents, for much cheaper than what we are already paying!


    Originally posted on LinkedIn

  4. Onur Solmaz · Post · /2026/01/04

    Having a "tools" repo as a developer

    I am a fan of monorepos. Creating subdirectories in a single repo is the most convenient way to work on a project. Low complexity, and your agents get access to everything that they need.

    Since May 2025, I have been increasingly using AI models to write code, and have noticed two new tendencies:

    • I don’t shy away from vendoring open source libraries and modifying them.
    • I create personal CLIs and tools for myself, when something is not available as a package.

    With agents, it’s really trivial to say “create a CLI that does X”. For example, I wanted my terminal screenshots to have equal padding and cropped lines erased. I created a CLI for it without writing a single line of code, by asking Codex to read its output and iterate on the code until it gave the result I wanted.

    Most of these tools don’t deserve their own repos, or to be published as packages, at the beginning. They might evolve into something more substantial over time, but initially they are not worth creating a separate repo for.

    To avoid that overhead, I adopted a new convention: I put them all in a single repo called tools. Every tool starts in that repo by default. If one proves especially useful and I decide to publish it as a package, I move it to a separate repo.
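    Concretely, the layout is nothing fancier than one subdirectory per tool (hypothetical names):

```
tools/
├── screenshot-pad/   # e.g. the terminal-screenshot tool from the example above
├── hue-cli/          # hypothetical: each tool lives in its own folder
└── ...
```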

    You can keep tools public or private, or have both a public and private version. Mine is public, feel free to steal ones that you find useful.

  5. Onur Solmaz · Log · /2026/01/03

    Christmas of Agents

    I believe a “Christmas of Agents” (+ New Year of Agents) is superior to “Advent of Code”.

    The reason is simple: most of us are employed, and Advent of Code coincides with work time, so you can’t really immerse yourself in a side project.1

    However, Christmas (or any other long holiday without primary duties) is a better time to immerse yourself in a side project.

    2025 was the eve of agentic coding. This was the first holiday where I had free rein to go nuts on a side project using agents. It was epic:

    Tweet embed disabled to avoid requests to X.
    

    75k lines of Rust later, here is what I’ve built during the first Christmas with agents, using OpenAI Codex

    • A full mobile rewrite and port of my Python Instagram video production pipeline (single video production time: 1hr -> 5min)
    • Bespoke animation engine using primitives (think Adobe Flash, Manim)
    • Proprietary new canvas UI library in Rust, because I don’t want to lock myself into Swift
    • Thanks to that, it’s cross-platform, running on both desktop and iOS. Porting it to Android will be a breeze when the time comes
    • A Rust port of OpenCV’s CSRT algorithm, for tracking points/objects
    • In-engine font rendering using rustybuzz, so fonts render the same everywhere
    • Many other such things

    Why would I choose to do it that way? Because I developed it primarily on desktop, where I have much faster iteration speed. Ain’t nobody got time for iOS compilation and the simulator. Once I finished the hard part on desktop, porting to iOS was much easier, and I didn’t lock myself into Apple

    Some of these would have been unimaginable without agents, like creating a UI library from scratch in Rust. But when you have infinite workforce, you can ask for crazy things like “create a textbox component from scratch”

    What I’ve built is very similar in nature to CapCut, except that I am a single person and I built it in about a week

    What have you built this Christmas with agents?

    1. You could maybe work in the evenings after work, but unless you are slacking at work full time, it won’t be the same as full immersion.