Letting agents post on my blog; finding a needle in a haystack

Up till now, I've been pretty careful not to let agents write my blog posts outside of very well-flagged sections of posts where the whole conceit was "what does the agent think?"

One of the wildest things that I've been running into lately is that my ability to create software has been outstripping my ability to explain it or write about it or even announce it. So I've been making stuff and not telling anybody about it.

That's far from ideal.

At the same time, I don't really want to fill this blog with software release announcements. For example, I think I have at least four to put out today.

Actually, I miscounted.

It's five if you count the Superpowers v4.3.0 release.

A few minutes ago, I made an executive decision to create a new sub-blog just for software releases. Posts in that sub-blog will be primarily (and possibly exclusively) written by the agents who make the software.

Today's releases:

And one more to come a bit later.

So, why was I working on...comic OCR and restoring websites from the Internet Archive?

Well, it all started this morning, when someone posted to a group chat about an agent that opened a pull request against matplotlib in violation of their AI policy. It then posted a now-deleted hit-piece on its blog. (It later posted a...retraction?)

This sparked a memory of a wonderful bit of snark from Something*Positive that I wanted to link into the chat.

"When I get home, I am so letting you have it in my livejournal."

The only problem is that...the comic in question appears to be just...gone, due to a hosting catastrophe a couple of years back.

There is a single reference to it in Google. A blog post that I wrote in 2003.

But it turns out that the entire history of S*P was captured by the wonderful folks at the Internet Archive.

Unfortunately, it looks like nobody ever transcribed S*P somewhere public.

Thankfully, we live in the future.

In one session, I had Codex put together a tool to restore a website from the Internet Archive. I told it that it needed to be careful to comply with all of their web scraping policies. It spent a little bit of time figuring out what those rules were by reading their policy web pages. And then it started building.

The scraper tool has been running for a few hours now, carefully downloading one file every couple of seconds.
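For the curious, here is a minimal sketch of what a slow, polite download loop like that might look like. This is not the tool Codex built. The function names, the User-Agent string, and the two-second default are my own assumptions for illustration, and the `id_` URL form is the Wayback Machine's way of asking for the raw archived file as I understand it. The real rules live in the Internet Archive's own policy pages.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable, Tuple

WAYBACK_PREFIX = "https://web.archive.org/web"


def snapshot_url(timestamp: str, original_url: str) -> str:
    """Build a Wayback Machine URL for one capture.

    The "id_" suffix after the timestamp requests the archived bytes
    without the Wayback toolbar or rewritten links.
    """
    return f"{WAYBACK_PREFIX}/{timestamp}id_/{original_url}"


def http_get(url: str) -> bytes:
    """Fetch one URL, identifying ourselves with a (hypothetical) User-Agent."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "site-restore-sketch/0.1 (hypothetical)"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def fetch_politely(
    captures: Iterable[Tuple[str, str]],
    fetch: Callable[[str], bytes] = http_get,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, bytes]:
    """Download captures one at a time, pausing between requests.

    `captures` is a sequence of (timestamp, original_url) pairs.
    `fetch` and `sleep` are injectable so the loop can be tested offline.
    """
    results: Dict[str, bytes] = {}
    for index, (timestamp, original_url) in enumerate(captures):
        if index > 0:
            sleep(delay_seconds)  # wait before every request after the first
        results[original_url] = fetch(snapshot_url(timestamp, original_url))
    return results
```

Passing `fetch` and `sleep` in as arguments is just a convenience: it lets you dry-run the loop against fake functions before pointing it at the real archive.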

In another window, I asked Claude to put together a tool to OCR webcomics and build a JSON file of the transcripts. The transcripts were a little bit insane because the source material is all caps speech bubbles. So I added another step where it runs those transcripts through an LLM to clean them up.
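The shape of that pipeline is roughly OCR, then cleanup, then JSON. Here is a rough sketch under some loud assumptions: the `ocr` argument is a placeholder for whatever OCR engine you use, the record fields are made up, and `naive_sentence_case` is a deliberately dumb stand-in for the LLM cleanup pass, not what Claude actually built.

```python
import json
import re
from typing import Callable, Dict, Iterable, List, Tuple


def naive_sentence_case(text: str) -> str:
    """Stand-in for the LLM cleanup step (an assumption, not the real thing).

    Collapses whitespace and turns ALL CAPS speech-bubble text into
    sentence case. A real LLM pass would also fix OCR misreads.
    """
    lowered = " ".join(text.split()).lower()
    # Capitalize the first letter of the text and of each new sentence.
    cased = re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        lowered,
    )
    # Restore the standalone pronoun "I".
    return re.sub(r"\bi\b", "I", cased)


def build_transcripts(
    comics: Iterable[Tuple[str, str]],
    ocr: Callable[[str], str],
    clean: Callable[[str], str] = naive_sentence_case,
) -> List[Dict[str, str]]:
    """Run OCR and cleanup over (comic_id, image_path) pairs.

    Keeps the raw OCR text next to the cleaned transcript so a bad
    cleanup can be redone without re-running OCR.
    """
    records = []
    for comic_id, image_path in comics:
        raw = ocr(image_path)
        records.append({"id": comic_id, "raw": raw, "transcript": clean(raw)})
    return records


def to_json(records: List[Dict[str, str]]) -> str:
    """Serialize the transcript records as readable JSON."""
    return json.dumps(records, indent=2, ensure_ascii=False)
```

Swapping in a real model is then a matter of passing a different `clean` function.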

And then I let everything cook.

While it was running, I had Claude add that new releases side blog and went back to today's sessions and asked each of them to write up their release announcement.

Just a few minutes ago, I ran the OCR tool against the first 600 comics from Something Positive.

Turns out it was May 10, 2002.