got fully sandboxed @openclaw to run finally, starting to scrape the UNDESIRABLE now
I'm a security nut and didn't want to run even the gateway unsandboxed. openclaw apparently doesn't currently have support for FULL sandboxing. it took me a few hours to get it to work because docker builds suck. I'm also tired of this, so I'm just gonna wipe an old thinkpad and go full yolo
so yeah, time to scrape some posts
The metacortex — a distributed cloud of software agents that surrounds him in netspace, borrowing CPU cycles from convenient processors (such as his robot pet) — is as much a part of Manfred as the society of mind that occupies his skull; his thoughts migrate into it, spawning new agents to research new experiences, and at night, they return to roost and share their knowledge.
This was written in 2005... "triggering agents" and so on
We need better filters both for ourselves and the agents. Locally runnable models to filter out undesirable content with high precision. Fully open source datasets, weights, MIT license
Correction, it's not a perfect illustration. I actually never YOLO locally, only in containers
So there are actually 4 modes IMO that are sustainable with current SOTA. @grok create an image with only Figures 1, 2, 5 and 6
And then YOLO is another axis, unrelated to this
Gastown is crazy. But this figure until Level 7 is a perfect illustration of how my workflow evolved since Claude 3.5 Sonnet in Cursor
I am at the stage where I ralph 1-2 tasks before I sleep. During the day, I am switching back and forth between minimum 2-3 CLIs, sometimes up to 5
This maps exactly to token usage as well. 1 month ago, I was running into limits in 1 OpenAI Pro plan, around the day it was supposed to refresh. Now, I run into the limit in 2-3 days when I'm using an account myself. It finishes up especially quickly when I do large scale refactors, or run agents YOLO mode in containers
We now have 3 Pro plans at the company, and I have to use my personal one from time to time. Company output has definitely 2-3x'd, and everyone is using AI more. I predict we will need 1-2 Pro plans per person in 2-3 weeks' time, because everyone has finally seen the light and is getting comfortable with async work!
With this extremely unwise move, anthropic will soon witness moltbot's brand recognition surpass that of claude and realize they could have ridden that wave all along
Yesterday had multiple cases of swearing at gpt-5.2-codex xhigh. model feels nerfed. might be my bias
now I'll be going back to gpt 5.2 xhigh for some tasks
can't wait for open models to have this performance so that I will never have nerf paranoia ever again
I queued 2 ralph-style tasks on our private cloud devbox codexes last night. Just queued the same message like 10 times in yolo mode
Task 1: impose a ruff rule for ANN for all Python code in the monorepo, to enforce types for all function arg and return types
Result was... disappointing. The model was supposed to create types for everything and stub where needed. It instead created an Unknown type = object and used that everywhere (a shortcut to satisfy the ANN rule). It was probably my wording that misled it. I know it could have avoided the shortcut, because after a few back-and-forths, it has now been doing what was expected of it for 14 hours
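For context, enforcing the ANN rules is roughly this pyproject.toml fragment; the per-file ignore is an illustrative assumption, not our actual config:

```toml
[tool.ruff.lint]
select = ["ANN"]  # flake8-annotations: require annotated args and return types

[tool.ruff.lint.per-file-ignores]
"tests/**" = ["ANN"]  # illustrative: relax the rule for tests
```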
Task 2: migrate our /conversations endpoint from quart to fastapi and test it end to end
This was more or less oneshotted. It was of course not ready to merge, I still spent a couple hours adding more tests, refactoring the initial output and so on. But I was pleasantly surprised that it worked out of the box
For reference, below is the prompt I queued for ralphing, using gpt-5.2-codex xhigh on codex
===
your task is to:
<task comes here, redacted to not share company stuff>
---
unfortunately we don't have gcloud access, like to sql db or gcs
but I expect you to implement this and find a way to test it with the things you have access to
think of it as a challenge
try to minimize duplicate logic
feel free to refactor at will
implement this now!!! I will be running this prompt in a loop, in order to survive context compaction
just continue where you left off
if there is anything that should be refactored, do that
make an elegant, production ready implementation
make sure to open a pr and do not switch to any other pr
I am senior, just make up a pr title and description. do not stop to ask me at any point
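The queueing itself is trivial to script. A sketch where the codex subcommand and flag are assumptions, not verified CLI syntax:

```python
def ralph_commands(prompt: str, n: int = 10) -> list[list[str]]:
    """Build n identical agent invocations to queue back to back,
    so the loop survives context compaction."""
    # "codex exec --full-auto" is an assumed invocation; substitute your CLI.
    return [["codex", "exec", "--full-auto", prompt] for _ in range(n)]

queue = ralph_commands("implement this now!!!", n=10)
```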
Buying a mac mini for clawdbot is not so wise. if anything you should be buying a mac studio, because a mac mini will not be running any good llms locally anytime soon
I'm really starting to dislike Python in the age of agents. What was before an advantage is now a hindrance
I finally achieved full ty coverage in @TextCortex monorepo. I have made it extra strict by turning warnings into errors. But lo and behold, a simple pydantic config like use_enum_values=True can render static typechecking meaningless. okay, let's never use that then...
and also field_validator() args must always use the correct type, or stuff breaks as well. and you have to be careful about whether mode="before" or "after" applies. so now you have to write your own custom lint rules, because of course, why should ty have to match field_validator()s to their fields?
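A minimal sketch of the use_enum_values footgun, assuming pydantic v2:

```python
from enum import Enum

from pydantic import BaseModel, ConfigDict


class Priority(Enum):
    LOW = 1
    HIGH = 2


class Task(BaseModel):
    model_config = ConfigDict(use_enum_values=True)

    priority: Priority  # static checkers see Priority here...


t = Task(priority=Priority.HIGH)
print(type(t.priority))  # ...but at runtime it holds the raw value (int)
```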
pydantic is so much better than everything that came before it, but it's still duct tape and a weak attempt at trying to redeem that which is very hard to redeem
you feel the difference when you use something like typescript. there must be a better way. python's only advantage was being good at prototyping, and now that's gone in the age of agents. now we are left with a slow, unsafe language, operating what is soon to be legacy infrastructure
Why do I feel bullish on @zeddotdev? Because I go to @astral_sh docs and see that ty is shipped by default, and you don't need to install an extension like in @code
vscode may not be as bloated as cursor, but it has extremely stupid things like this that they are not fixing fast
the new agent ui, icons, spacing etc. are UGLY. it's clear that the person who was managing the original product experience is not there anymore. microslop has hit again
@zeddotdev on the other hand works out of the box and feels like it's been built by people who clearly know what they are doing. it uses alacritty, which is 1000x better than the xterm.js terminal vscode and cursor have
i've changed my setup to zed now, let's see whether i'll be able to make it work for myself
I want an editor that puts the terminal in the foreground and editor in the background. a cross-platform, lightweight desktop app which integrates ghostty, and brings up the editor only when I need it
something that lets me view the file and PR diffs easily, which I can directly use to operate github or other scm
I'm going back from cursor to vs code now. I have no use for it other than viewing files/diffs, doing search, git blaming with gitlens
cursor's default setup is more aesthetic, but it's also a memory and cpu hog, which is the last thing I expect from a devtool
codex is happily churning away some remaining thousands of @astral_sh ty issues in yolo mode on my remote devbox
going to sleep, let's see if it will survive context compaction this time
on being a responsible engineer
ran my first ralph loop on codex yolo mode for resolving python ty errors, while I sleep, using the devbox infra I created
I had never run yolo mode locally, because I don't want to be the one who deletes our github or google org by some novel attack
so I containerize it on our private cloud, and give it only the permissions it needs: no admin, no bypass to main branch, no deploy to prod. because I know this workflow will become sticky for everyone, and I must impose security in advance to prevent any nuclear incidents in the future. then I can sleep easy while my agents work
... and I wake up being patronized by my bot refusing to break the rule I gave it earlier. it had already done some work, but committing means diff would increase from ~500 to ~1500, so it stopped and refused all my queued "continue" messages
good bot, just following rules. we will need to find a workaround for ralphing low risk refactors in a single PR
AI agents are the greatest instrument for imposing organization rules and culture. AGENTS.md, agent skills are still underrated in this aspect. Few understand this
Everybody in an org will use agents to do work. An AI agent is the single chokepoint to teach and propagate new rules to an org, onboard new members, preserve good culture
Whereas propagating a new rule to humans normally took weeks to months and countless repetitions, it is now INSTANT = the moment you deploy the instruction to the agent. You use legal-ish language, capital letters, a generous amount of DO NOTs and MUSTs
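Deploying such a rule is literally a few lines in an AGENTS.md; the rules below are illustrative:

```markdown
- You MUST run the linter and tests before every commit.
- DO NOT push directly to main. Always open a PR.
- DO NOT approve PRs. Only humans approve PRs.
```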
Humans are hard to change. But AI agents are not. And that is the only lever we need for better organizations
gave our internal @openclaw instance zeno a hubspot cli, because hubspot's own cli is limited to developer stuff
It's called hubspot++. should we open source it?
just added session persistence to our kubernetes managed devboxes using zmx by Eric Bower (neurosnap/zmx on github). like tmux but with native scrollback!
I don't want to give agents access to my personal computer, so I host them on hetzner. one click spawn, and start working
@nicopreme I do something equivalent on codex with just a skill
Ralphing works 90% of the time with reviews, and if it gives a stupid review, you just revert
TIL: zmx
session persistence like tmux or gnu screen, but you can scroll up natively!
uses @mitchellh's libghostty-vt to attach/restore previous sessions
link below
The fundamental problem with GitHub is trust: humans are to be trusted. If you don't trust a human, why did you hire them in the first place?
Anyone who reviews and approves PRs bears responsibility. Rulesets exist and can enforce e.g. CODEOWNER reviews or only let certain people make changes to a certain folder
But the initial repo setup on GitHub is allow-by-default. Anyone can change anything until they are restricted from it
This model breaks fundamentally with agents, who are effectively sleeper cells that will try to delete your repo the moment they encounter a sufficiently powerful adversarial attack
For example, I can create a bot account on github and connect @openclaw to it. I need to give it write permission, because I want it to be able to create PRs. However, I don't want it to be able to approve PRs, because a coworker could just nag at the bot until it approves a PR that requires human attention
To fix this, you have to bend over backwards, like create a @human team with all human coworkers, make them codeowners on /, and enforce codeowner reviews. This is stupid and there has to be another way
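Concretely, that workaround is a one-line CODEOWNERS plus a ruleset; the team name is hypothetical:

```
# .github/CODEOWNERS: require review from a human-only team on everything
* @your-org/humans
```

Combined with a branch ruleset that enforces code-owner review, the bot's approval alone can no longer satisfy the merge requirement.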
Even worse, this bot could be given internet access and end up on a @elder_plinius prompt hack while googling, and start messing up whatever it can in your organization
It is clear that github needs to create a second-class entity for agents which are default low-trust mode, starting from a point of least privilege instead of the other way around
STOP using Claude Code and Sl(opus) to code if
❌ you are not a developer,
❌ or you are an inexperienced dev,
❌ or you are an experienced dev but working on a codebase you don't understand
If you *are* any of these, then STOP using models that are NOT state of the art. (See below for what you *should* use)
When you don't know what you are doing, then at least the model should know what you are doing. The less knowledgeable and opinionated you are, the more knowledgeable and smart the AI has to be
In other words, the AI has to compensate for your deficiencies. Always pay for the best AI you can. It will save you time AND money (thanks to lower token usage and better one-shotting)
You pay MORE to pay LESS. It is paradoxical, I know, but it is also proven, e.g. when Sonnet ends up using many more tokens than Slopus and thus costs more, because it has to try many more times
👨🏻⚕️ For January 2026, your family engineer recommends GPT 5.2 Codex with Extra High Reasoning for general usage and vibe coding. IMPORTANT: Not medium. Not high. EXTRA high reasoning
When you use it, you will notice that it is SLOW. Can you guess why? Because it is THINKING more. So it doesn't make the mistakes Slopus makes. Instead of spending your time handholding a worse model, you can step back, multi-task on something else, and create 3-5x more work
The state of the art will most likely change in one month. Don't get married to a model... There is no loyalty in AI... The moment a better model comes, I will ditch the old one and use that one. I am on the part of this sector that is trying to reduce switching costs to zero
I can't wait until I get GPT 5.2 xhigh level of quality with open models, and for 100x cheaper and faster! Until then, make sure to try every option and choose the one that is most reliable for you
Follow me to get notified when a new SOTA drops for agentic engineering
It is clear at this point that github's trust and data models will have to change fundamentally to accommodate agentic workflows, or risk being replaced by other SCMs
One *cannot* do these things easily with github now:
- granular control: this agent running in this sandbox can only push to this specific branch. If an agent runs amok, it could delete everybody's branches and close PRs. github allows for recovery of these, but it is still inconvenient even if it happens once
- create a bot (exists already), but remove reviewing rights from it so that an employee cannot bypass reviews by tricking the bot to approve
- in general make a distinction between HUMAN and AGENT so that you can create rulesets to govern the relationships in between
cc @jaredpalmer
Codex says "It's only reachable from داخل the kubernetes cluster" (داخل: "inside")
Little does Codex know turkish has borrowed loanwords from over 7 languages and I can understand it
Automated AI reviews on github by creating an ai-review skill and a script to paste trigger prompts and wait for their response.
It is instructed to loop and not stop until all AI review feedback is resolved. This AI review workflow evolved gradually with the current capabilities, and I recently realized it had become quite mechanical. So I decided to automate it in full ralph spirit (it's ok because it's only addressing feedback and fixing minor bugs)
In the current state, we paste the contents of REVIEW_PROMPT.md into a comment, which automatically tags claude (opus 4.5) and codex (whatever model openai is serving)
It then waits until both have responded. In the ai-review skill, it is instructed to take the feedback from Slopus with a grain of salt and ignore feedback that doesn't make sense
It works! See in the images below. If the review is stupid, you will of course see it on the PR and what the model has done, and can revert it
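The "waits until both have responded" step can be sketched as a small poll loop; fetch_authors stands in for a gh api call that lists PR comment authors:

```python
import time
from typing import Callable


def wait_for_reviews(fetch_authors: Callable[[], set[str]],
                     reviewers: frozenset[str] = frozenset({"claude", "codex"}),
                     poll_s: float = 30.0,
                     timeout_s: float = 1800.0) -> bool:
    """Poll the set of PR comment authors until every reviewer bot
    has replied, or give up after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while True:
        if reviewers <= fetch_authors():  # subset check: all bots replied
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```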
with ai, writing correct tests is now the bottleneck in projects like this
web-platform-tests are already there
now let’s see if someone will beat @ladybirdbrowser to it
As someone who is frontrunning mainstream by roughly 6 months, I can tell you that in 6 months you will be raving about pi and @openclaw instead of claude code. Go check them out at https://t.co/LXTbI8c5Mz and https://t.co/feZl2QDONg
I propose a new way to distribute agent skills: like --help, a new CLI flag convention --skill should let agents list and install skills bundled with CLI tools
Skills are just folders so calling --skill export my-skill on a tool could just output a tarball of the skill. I then set up the skillflag npm package so that you can pipe that into:
... | npx skillflag install --agent codex
which installs the skill into codex, or any agent CLI you prefer. It supports listing the skills bundled with a CLI, so your agents know exactly what to install
tl;dr I propose a CLI flag convention --skill, like --help, for distributing skills, and try to convince you why it is better than using 3rd party registries. See osolmaz/skillflag on GitHub.
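Since skills are just folders, the export side can be as simple as tarring the folder to stdout. A sketch in Python, not skillflag's actual implementation:

```python
import io
import tarfile
from pathlib import Path


def export_skill(skill_dir: Path) -> bytes:
    """Bundle a skill folder (SKILL.md + assets) into a gzipped tarball,
    like a hypothetical `<tool> --skill export <id>` would write to stdout."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # arcname keeps the folder name, which doubles as the skill's id
        tar.add(skill_dir, arcname=skill_dir.name)
    return buf.getvalue()
```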
MCP is dead, long live Agent Skills. At least for local coding agents.
Agent skills are basically glorified manpages or --help for AI agents. You ship a markdown instruction manual in SKILL.md, and the name of the folder that contains it becomes an identifier for that skill.
Possibly the biggest use case for skills is teaching your agent how to use a certain CLI you have created, maybe a wrapper around some API, which unlike gh, gcloud etc. will never be significant enough to be represented in AI training datasets. For example, you could have created an unofficial CLI for Twitter/X, and there might still be some months/years until it is scraped enough for models to know how to call it. Not to worry, agent skills to the rescue!
Anthropic, while laying out the standard, intentionally kept it as simple as possible. The only assertions are the filename SKILL.md, the YAML metadata, and the fact that all relevant files are grouped in a folder. It does not impose anything on how they should be packaged or distributed.
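Concretely, a minimal skill is just a folder with a SKILL.md like this (names and commands are illustrative):

```markdown
---
name: hue-cli
description: Control Philips Hue lights from the terminal using hue-cli
---

# hue-cli

List lights: `hue-cli lights list`
Turn a light on at 80% brightness: `hue-cli lights on <id> --brightness 80`
```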
This is a good thing! Nobody knows the right way to distribute skills at launch. So various stakeholders can come up with their own ways, and the best one can win in the long term. The simpler a standard, the more likely it is to survive.
Here, I made some generalizing claims. Not all skills have to be about using a CLI tool, nor do most CLI tools bundle a skill yet. But here is my gut feeling: the most useful skills, the ones worth distributing, are generally about using a CLI tool. Or better: even if they don't ship a CLI yet, they should.
So here is the hill I'm ready to die on: All major CLI tools (including the UNIX ones we are already familiar with) should bundle skills in one way or another. Not because the models of today need to learn how to call ls, grep or curl; they already know them inside out. No, the reason is something else: establish a convention, and acknowledge the existence of another type of intelligence that is now using our machines.
There is a reason why we cannot afford to let the models just run --help or man <tool>, and that is time and money. The average --help or manpage is devoid of examples, and is written in a way that requires multiple passes to connect the pieces on how to use the thing.
Each token wasted trying to guess the right way to call a tool or API costs real money, and unlike human developer effort, we can measure exactly how inefficient some documentation is by looking at how many steps of trial and error a model had to make.
Not that human attention is less valuable than AI attention, it is more so. But there has never been a way to quantify a task’s difficulty as perfectly as we can with AI, so we programmers have historically caved in to obscurantism and a weird pride in making things more difficult than they should be, like some feudal artisan. This is perhaps best captured in the spirit of Stack Overflow and its infamous treatment of noob questions. Sacred knowledge shall be bestowed only once you have suffered long enough.
Ahh, but we don’t treat AI that way, do we? We handhold it like a baby, we nourish it with examples, we do our best to explain things all so that it “one shots” the right tool call. Because if it doesn’t, we pay more in LLM costs or time. It’s ironic that we are documenting for AI like we are teaching primary schoolers, but the average human manpage looks like a robot novella.
To reiterate, the reason for this is two different types of intelligences, and expectations from them:
An LLM is still not considered "general intelligence", so it works better by mimicking or extending already-working examples.
An LLM-based AI agent deployed in some context is expected to "work" out of the box without any hiccups.
On the other hand,
a human is considered a general intelligence, can learn from sparser signals, and adapts better to out-of-distribution data. When given an extremely terse --help or manpage, a human is likelier to perform well by trial, error and reasoning, if one could ever draw such a comparison.
A human, much less a commodity compared to an LLM, is under less pressure to do the right thing every time, all the time, and can afford to make mistakes and spend more time learning.
And this is the main point of my argument. These different types of intelligences read different types of documentation, to perform maximally in their own ways. Whereas I haven’t witnessed a new addition to POSIX flag conventions in my 15 years of programming, we are witnessing unprecedented times. So maybe even UNIX can yet change.
To this end, I introduce skillflag, a new CLI flag convention:
# list skills the tool can export
<tool> --skill list

# show a single skill's metadata
<tool> --skill show <id>

# install into Codex user skills
<tool> --skill export <id> | npx skillflag install --agent codex

# install into Claude project skills
<tool> --skill export <id> | npx skillflag install --agent claude --scope repo
For example, suppose that you have installed a CLI tool to control Philips Hue lights at home, hue-cli.
To list the skills that the tool can export, you can run:
$ hue-cli --skill list
philips-hue Control Philips Hue lights in the terminal
You can then install it to your preferred coding agent, such as Claude Code:
$ hue-cli --skill export philips-hue | npx skillflag install --agent claude
Installed skill philips-hue to .claude/skills/philips-hue
You can optionally install the skill to ~/.claude, to make it global across repos:
$ hue-cli --skill export philips-hue | npx skillflag install --agent claude --scope user
Installed skill philips-hue to ~/.claude/skills/philips-hue
Once this convention becomes commonplace, agents will by default do all of this before they even run the tool. So when you ask one to "install hue-cli", it will know to run --skill list the same way a human would run --help after downloading a program, and install the necessary skills without being asked to.
Anthropic earlier last year announced this pricing scheme
$20 -> 1x usage
$100 -> 5x usage
$200 -> 1̶0̶x̶ 20x usage
As you can see, it's not growing linearly. This is classic Jensen "the more you buy, the more you save"
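The per-dollar math makes the nonlinearity concrete:

```python
# Anthropic plan multipliers from above: price in $ -> usage multiplier
plans = {20: 1, 100: 5, 200: 20}
for price, mult in plans.items():
    print(f"${price}: {mult / price:.3f} usage per dollar")
# $20 and $100 both buy 0.050 usage/dollar; $200 buys 0.100, i.e. twice as much
```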
But here is the thing. You are not selling hardware like Jensen. You are selling a software service *through an API*. It's the worst possible pricing for the category of product. Long term, people will game the hell out of your offering
Meanwhile OpenAI decided not to do that. There is no quirky incentive for buying bigger plans. $200 chatgpt = 10 x $20 chatgpt, roughly
And here is where it gets funny. Despite not having such an incentive, you can get A LOT MORE usage from the $200 OpenAI plan than the $200 Anthropic plan. Presumably because OpenAI has better unit economics (sama mentioned they are turning a profit on inference, if he is to be believed)
Thanks to sounder pricing, OpenAI can do exactly what Anthropic cannot: offer GPT in 3rd party harnesses and win the ecosystem race
Anthropic has cornered itself with this pricing. They need to change it, but not sure if they can afford to do so in such short notice
All this is extremely bullish on open source 3rd party harnesses, @opencode, @badlogicgames's pi and such. It is clear developers want options. "Just give me the API"
I personally am extremely excited for 2026. We'll get open models on par with today's proprietary models, and can finally run truly sovereign personal AI agents, for much cheaper than what we are already paying!
The models, they just wanna work. They want to build your product, fix your bugs, serve your users. You feed them the right context, give them good tools. You don’t assume what they cannot do without trying, and you don’t prematurely constrain them into deterministic workflows.
.@openclaw workspace and memory files can be version-controlled!
In our pod, inotify triggers a watcher script every time there is a change to the workspace folder, to sync these files to our monorepo. It then goes through the same steps:
- Create the zeno-workspace branch if it doesn't exist; otherwise skip
- Sync changes to the branch, then commit
- Create a PR on github if one doesn't exist
- PRs can then be merged every once in a while, after enough changes accumulate. Merging triggers a re-deploy, and clawd restarts with the same state
Simple, foolproof, automatic persistence for a remote CI/CD-managed clawd (except for when you are running multiple clawds at the same time, but we are not there yet)
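The watcher's branch/PR decisions can be sketched as pure logic; the command strings are illustrative, not the actual script:

```python
def sync_plan(branch_exists: bool, pr_exists: bool,
              branch: str = "zeno-workspace") -> list[str]:
    """Return the commands the watcher would run on a workspace change."""
    plan = [f"git switch {branch}" if branch_exists
            else f"git switch -c {branch}"]
    plan += [
        "rsync -a /workspace/ zeno-workspace/",  # sync changed files in
        "git add -A",
        'git commit -m "chore: sync clawd workspace"',
        f"git push -u origin {branch}",
    ]
    if not pr_exists:
        plan.append(f"gh pr create --head {branch} --fill")
    return plan
```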
cc @steipete
I see @bcherny and raise one. Not only did I not open an IDE, I haven't touched a terminal since last night, thanks to @steipete's @openclaw
Opus in k8s pod pulls errors from gcloud, debugs the issue, and creates PR all inside Discord. I call this Discord Driven Development
Clawdbot now runs on @TextCortex internal. Can onboard new engineers, answer questions, connect to issue trackers, create PRs... This is sick @steipete
GPT 4.5 is still the best model for prose and humor
here it is generating a greentext from my blog post "Our muscles will atrophy as we climb the Kardashev Scale"
I am a fan of monorepos. Creating subdirectories in a single repo is the most convenient way to work on a project. Low complexity, and your agents get access to everything that they need.
Since May 2025, I have been increasingly using AI models to write code, and have noticed a new tendency:
I don’t shy away from vendoring open source libraries and modifying them.
I create personal CLIs and tools for myself, when something is not available as a package.
With agents, it’s really trivial to say “create a CLI that does X”. For example, I wanted to make my terminal screenshots have equal padding and erase cropped lines. I created a CLI for it, without writing a single line of code, by asking Codex to read its output and iterate on the code until it gives the result I wanted.
Most of these tools don’t deserve their own repos, or being published as packages, at least not at the beginning. They might evolve into something more substantial over time. But at the start, they are not worth creating a separate repo for.
To avoid that overhead, I developed a new convention: I just put them all in a single repo called tools. Every tool starts in that repo by default. If one proves overly useful and I decide to publish it as a package, I move it to a separate repo.
You can keep tools public or private, or have both a public and private version. Mine is public, feel free to steal ones that you find useful.
75k lines of Rust later, here is what I’ve built during the first Christmas with agents, using OpenAI Codex 🎄🤖
- A full mobile rewrite and port of my Python Instagram video production pipeline (single video production time: 1hr -> 5min) (ig: nerdonbars)
- Bespoke animation engine using primitives (think Adobe Flash, Manim)
- Proprietary new canvas UI library in Rust, because I don’t want to lock myself into Swift
- Thanks to that, it’s cross platform, runs both on desktop and iOS. It will be a breeze porting this to Android when the time comes
- A Rust port of OpenCV CSRT algorithm, for tracking points/objects
- In-engine font rendering using rustybuzz, so fonts render the same everywhere
- Many other such things
Why would I choose to do it that way? Because I have developed it primarily on desktop, where I have much faster iteration speed. Ain’t nobody got time for iOS compilation and the simulator. Once I finished the hard part on desktop, porting to iOS was much easier, and I didn’t lock myself into Apple
Some of these would have been unimaginable without agents, like creating a UI library from scratch in Rust. But when you have infinite workforce, you can ask for crazy things like “create a textbox component from scratch”
What I’ve built is very similar in nature to CapCut, except that I am a single person and I’ve built it over 1 week
What have you built this Christmas with agents?
cc @thsottiaux
I believe a “Christmas of Agents” (+ New Year of Agents) is superior to “Advent of Code”.
Reason is simple. Most of us are employed. Advent of Code coincides with work time, so you can’t really immerse yourself in a side project.¹
However, Christmas (or any other long holiday without primary duties) is a better time to immerse yourself in a side project.
2025 was the eve of agentic coding. This was the first holiday where I had full license to go nuts on a side project using agents. It was epic:
Tweet embed disabled to avoid requests to X.
¹ You could maybe work in the evening after work, but unless you are slacking at work full time, it won’t be the same thing as full immersion.
Migrating @TextCortex to SimpleDoc. It's really easy with the CLI wizard!
npx @simpledoc/simpledoc migrate
We have a LOT of docs spanning back to 2022, the pre-coding-agent era. Now we will have CI/CD in place so that coding agents can't litter the repo with random Markdown files
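The littering check itself can be a tiny CI script. A sketch with an assumed policy (Markdown only under docs/, plus an allowlist; adjust per repo):

```python
from pathlib import Path

# Assumed allowlist of top-level Markdown files agents may touch
ALLOWED = {"README.md", "AGENTS.md"}


def stray_markdown(root: Path) -> list[Path]:
    """Flag Markdown files outside docs/ that shouldn't have been added."""
    return sorted(p for p in root.rglob("*.md")
                  if "docs" not in p.parts and p.name not in ALLOWED)
```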