Twenty-five years of motion graphics taught me everything about rendering video. The breakthrough came when I stopped asking for one.
The Miro Motion card sat inside a blue Power Mac G4. 2001, maybe 2002. We were 2B Media — a small outfit in Münster rendering surf videos of ourselves and doing client work on the side. Process visualizations and promo videos for industrial firms, mostly. We integrated 3D renderings from Cinema 4D, which was serious capability for the consumer market at that time. Everything above us was production studio territory, and production studios were expensive.
The machine crashed. It crashed during every session, a minimum of three times. You’d render a ten-minute clip, edit it carefully, render again, watch the progress bar, and then — black screen. Start over. That was the workflow. You learned to save constantly, to never trust the machine, to treat every completed render as a small victory.
I hadn’t thought about those crashes in years. Then last week, I typed seven words into Telegram — “produce a TikTok for Bitcoin” — and had a finished video in my inbox before my coffee got cold.
Twenty-five years. From praying a Power Mac wouldn’t crash during a surf edit to a Telegram message that produces a market analysis video from live data for less than ten cents.
The Path Through Your Pocket
The road from that Power Mac passes through iMovie, Final Cut Pro, After Effects, and eventually the phone in your pocket. Each step made production cheaper and more accessible. TikTok and Instagram turned anyone with a phone into a studio. That’s the known story.
This post is about the next step.
Kong Quant runs a market analysis system. Every four hours, it scans over a thousand assets — crypto, stocks, forex, commodities — through a proprietary indicator called the Kong Cloud. When an asset flips direction and passes through a six-step enrichment pipeline, it becomes a Prime: a scored, classified analytical snapshot. That data lives behind an API. Version 3.16 as of this week, if you want to know how deep the rabbit hole goes.
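To make the rest of this concrete, here is a rough sketch of what one of those Prime snapshots might look like as a data structure. Every field name and the score scale are my assumptions for illustration, not the actual API schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Prime:
    """One enriched flip event. Field names and scales are guesses,
    not the real Kong Quant API schema."""
    symbol: str              # e.g. "BTCUSD"
    direction: str           # "bullish" or "bearish"
    structure: str           # breakout, breakdown, bounce, or rejection
    regime: str              # market regime classification
    volume_strength: float   # volume confirmation, assumed 0.0-1.0
    momentum_aligned: bool   # momentum alignment check
    pattern: Optional[str]   # detected chart pattern, if any
    kong_score: float        # weighted synthesis, scale assumed 0-100

snapshot = Prime("BTCUSD", "bullish", "breakout", "trending",
                 0.8, True, None, 72.5)
```

The point is less the exact fields than the shape: a single scored, classified object that every downstream step can consume.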
I wrote about the content pipeline before, in Automatic Reeality. That piece covered Charcoal International, the virtual agency, the Instagram ban, the pivot to TikTok, and the economics of automated production. This picks up where that story ended — and skips a few chapters, because what happened in between was significant.
The question was straightforward: can I take that API data and turn it into a finished short-form video, automatically, delivered to my phone?
The answer cost me forty-eight hours and three failed versions.
The Wrong Prompt
Here’s what I learned; arriving at it took about twenty-four of those forty-eight hours.
I started by prompting for motion graphics. I was thinking in After Effects — keyframes, timelines, compositions, easing curves. I’ve been working in this space for twenty-five years. That’s how my brain processes video. Layers, renders, effects.
The results were terrible. Not mediocre — terrible. Which was genuinely surprising, because these same models are exceptional at writing React. I’ve built websites, dashboards, and interactive experiences with them. The code quality is consistently strong. But ask for motion graphics, and the output drops off a cliff.
Then I realized I was asking the wrong question.
I didn’t want a motion graphics template. I wanted a responsive, mobile-first experience — rendered as a linear video.
That reframe changed everything.
Think about it from the model’s perspective. A mobile viewport with specific constraints — what’s visible above the fold, what the hierarchy looks like, how elements animate in, where the safe areas are on a 1080 by 1920 screen — that’s a problem it solves hundreds of times a day. It knows spacing. It knows responsive breakpoints. It knows animation libraries. It understands component architecture.
The moment I stopped saying “build me a motion graphic” and started saying “build me a mobile data visualization experience,” the output quality didn’t just improve. It jumped from embarrassing to genuinely impressive.
I tried SVG animations too — Lottie, the framework Airbnb developed. It works in principle. But React with Remotion felt more native to how the models think and write. The animations were cleaner, the component structure was more natural, and the brand system translated perfectly into styled components.
When I saw the first properly rendered scene — branded elements from the Kong identity, data-driven layout, smooth animation, everything matching the visual guide — that was the moment. Not just functional. It looked designed.
And that’s where the loop back to the early days closes. I started with Premiere and Cinema 4D, placing keyframes on timelines, dragging easing curves into shape, adjusting frame by frame. That was the craft for twenty-five years. Now I’m sitting with my preferred model, describing what I want the animation to feel like, and it builds it. I’m still optimizing the animation framework right now — tweaking transitions, refining timing — but the method has flipped entirely. I’m not editing keyframes. I’m having a conversation about motion.
It’s a strange feeling, honestly. The skill hasn’t disappeared — knowing what good animation looks like, understanding timing and pacing, sensing when something feels off — all of that still matters. But the interface changed. From a timeline to a prompt. From dragging to describing.
Seven Scenes, Ten Cents
The video architecture is modular. Seven scenes, each a self-contained React composition.
The sieve opens the video: a rain of ticker symbols falling through the frame. One thousand scanned, most fade out, one locks into focus. The viewer understands the scale before a single word is spoken.
The event scene states what happened — which asset flipped, in which direction, what the structural context is. Breakout, breakdown, bounce, or rejection. Where in the market structure this flip occurred.
The evidence scene presents the enrichment pipeline output as a compact data grid. Regime. Volume strength. Momentum alignment. Pattern detection if one was found. Each tile maps to a step in the analytical process.
The chart scene is the visual proof. Candlesticks draw left to right, the Kong Cloud overlay fills in behind them, and the flip point pulses where the crossover happened. This one scene carries more conviction than anything I could write — it shows there’s a real chart behind the analysis.
The score scene reveals the Kong Score — a gauge sweeping from zero to the final number. The weighted synthesis of everything the pipeline found.
Two more scenes — a rotating educational fact about how the system works, and a short call to action — are pre-produced and reused across videos.
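The gauge sweep in the score scene is, under the hood, just a frame-indexed easing function. A minimal sketch of that idea; the cubic ease-out curve and the frame count are illustrative choices, not the production values:

```python
def gauge_value(frame: int, total_frames: int, final_score: float) -> float:
    """Eased sweep from zero to the final Kong Score over the scene's frames.
    Cubic ease-out: fast at the start, settling gently onto the final number."""
    t = min(max(frame / total_frames, 0.0), 1.0)  # normalized progress, clamped
    eased = 1 - (1 - t) ** 3                      # cubic ease-out
    return final_score * eased

# One value per rendered frame; the gauge starts at 0 and lands on the score.
sweep = [gauge_value(f, 90, 72.5) for f in range(91)]
```

The same pattern (frame in, value out, no stored state) is what makes each scene renderable independently and in parallel.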
Each scene receives only its slice of the API data. Nothing else. They render to PNG image sequences independently, in parallel. One FFmpeg pass layers everything together: a background video at the bottom for atmospheric texture, the scene content on top, and an ElevenLabs voiceover synced to word-level timestamps. Single encode. No intermediate video files, no recompression, no quality loss between steps.
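That single FFmpeg pass can be sketched as one command: background video on the bottom layer, the PNG scene frames overlaid, the voiceover muxed in, one encode. This is my reconstruction of the idea, not the actual pipeline command; the file naming and filter details are assumptions:

```python
from pathlib import Path

def build_assembly_command(background: str, frames_dir: str,
                           voiceover: str, out: str, fps: int = 30) -> list[str]:
    """One encode: background under the rendered scene frames, narration on top.
    No intermediate video files, so no generational recompression."""
    frames = str(Path(frames_dir) / "frame_%05d.png")  # hypothetical naming scheme
    return [
        "ffmpeg", "-y",
        "-i", background,                       # atmospheric background layer
        "-framerate", str(fps), "-i", frames,   # scene PNGs (with alpha)
        "-i", voiceover,                        # ElevenLabs narration
        "-filter_complex",
        "[0:v]scale=1080:1920[bg];[bg][1:v]overlay[v]",
        "-map", "[v]", "-map", "2:a",
        "-c:v", "libx264", "-pix_fmt", "yuv420p", "-c:a", "aac",
        "-shortest", out,
    ]

cmd = build_assembly_command("bg.mp4", "frames", "voice.mp3", "prime.mp4")
```

Because the scene frames carry transparency, the overlay filter does the layering in one step, and `-shortest` ends the output when the narration runs out.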
The running cost: under ten cents. ElevenLabs is the biggest line item. The language models, the rendering, the assembly — fractions of a cent each.
For context: a skilled freelancer producing an equivalent short — sourcing the data, building the chart visualization, timing the voiceover, cutting the final video — would need one to two hours. That’s fifty to a hundred euros. And they’d need to do it again tomorrow for the next asset.
Controlled Freedom
The agent behind this is Sharky — an OpenClaw instance I’ve written about before in Automatic Reeality. But the way I use OpenClaw is probably different from how most people approach it.
I didn’t install it and let it run wild. My corporate background shaped this. When you work with enterprise clients, you deliver professionalism and auditability. That doesn’t change because your team member is an agent instead of a person.
A Claude Code instance sits on top, supervising. Sharky has predefined tools and skills — each one purpose-built for a specific step in the pipeline. Fetch the prime data from the API. Generate the voiceover script. Render the scenes. Assemble the final video. Deliver to Telegram. If the agent needs a new capability, it escalates. I build the tool, test it, deploy it. Then the agent can use it.
At this autonomy level, the agent doesn’t build its own tools. It applies the ones I’ve provided. But within those boundaries, the creative decisions are real. Which prime to feature today. How to phrase the voiceover for this particular asset and this particular market context. Which educational fact to rotate in. The recipe is mine. The cooking is the agent’s.
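That supervision model boils down to a closed tool registry: the agent picks from a fixed menu, and anything outside the menu escalates to me. A minimal sketch of the pattern; the tool names and signatures are invented for illustration and are not OpenClaw's actual interface:

```python
class EscalationNeeded(Exception):
    """Raised when the agent asks for a capability it hasn't been given."""

# Hypothetical registry mirroring the pipeline steps described above.
TOOLS = {
    "fetch_prime":    lambda asset: {"symbol": asset, "score": 72.5},
    "write_script":   lambda prime: f"{prime['symbol']} just flipped.",
    "render_scenes":  lambda script: ["scene1.png", "scene2.png"],
    "assemble_video": lambda frames: "prime.mp4",
    "deliver":        lambda video: f"sent {video} to Telegram",
}

def run_tool(name: str, *args):
    """The agent applies the tools it was given; it does not build new ones."""
    if name not in TOOLS:
        raise EscalationNeeded(name)   # I build, test, and deploy the new tool
    return TOOLS[name](*args)
```

The creative freedom lives inside the calls (which asset, which phrasing), while the boundary of what the agent can do at all stays auditable.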
The API Becomes the Video
Here’s the thing I keep circling back to.
Using an API to build a website is standard practice. Using an API to build a video — that’s new territory, at least for me. And when you combine it with an agent that can merge different data sources into one coherent narrative and script, something opens up that goes beyond a technical trick.
We’re moving into an era where APIs become the primary interface between services. Agents will seek APIs the way browsers seek websites. Any service or business that doesn’t provide one is essentially invisible to the next generation of consumers — many of which won’t be human.
Peter Steinberger, the main creator of OpenClaw, made a statement that stuck with me: any service, any website, any app is already an API, whether they want to be or not. With browser-use capabilities, an agent can navigate a website and extract what it needs regardless. It’s slower, and it’s messy. But there’s no wall high enough to stop it. So why not serve the data cleanly, maybe charge a fee for it, and create value on both sides?
Right now, we’re pulling from our own API. But the architecture doesn’t care about the source. Enrich with external data. Merge market context from other providers. Layer analytical intelligence on top. The pipeline stays the same — only the inputs change.
That’s where this becomes genuinely interesting. Not one API, one video. Multiple APIs, merged by an agent, assembled into content that no human would have the patience to produce daily.
Not Slop
People have a word for automated content: slop. And for most of what’s out there, the label fits. Mass-produced, contextless filler designed to game an algorithm. No audience in mind. No value delivered. No reason to exist except to fill a feed.
The data behind a Kong Prime is real. A thousand assets scanned every four hours through a quantitative pipeline that took months to build. The enrichment is real — regime classification, structural context analysis, volume confirmation, momentum alignment, pattern detection. The Kong Score is a weighted synthesis with published component weights, not a random number that sounds impressive.
The video is the visualization of that process. Not decoration — documentation. The audience is specific: people who track markets and want a condensed, visual snapshot of what the system detected today. The content respects their time because the system did the analytical work that would take them hours.
I think this is where the line runs. Slop is generated without intent. This is generated with a specific audience, specific data, and a specific purpose. The automation isn’t the problem. The absence of substance is.
The Horizon
The Kong character currently uses a still image generated with Nanobanana, animated through Midjourney. The animation is a little stiff. It fits the character, somehow — a gorilla in glasses at a terminal, clearly not human, slightly awkward in movement. But lip-synced animation is the obvious next evolution. Not to fake realism. To deliver value through a character that’s transparently artificial but presents something worth your thirty seconds.
I gave up on synthesizing myself a while back. Nothing beats a human picking up a phone and being honest on camera. But for Kong — a system whose whole identity is being autonomous — proper animation makes sense. HeyGen, Synthesia, whatever handles it best. Maybe five euros per video. With a business model behind it, that’s not a showstopper.
The other step is removing me from the trigger. Right now, I type a message to Sharky in Telegram. Soon, a cron job does it. The video appears in my inbox in the morning, ready to post. Or it posts directly — though the platforms still have strong opinions about that, as I learned the hard way with Instagram.
Twenty-five years ago, I was rendering surf edits on a machine that crashed three times per session. I was happy when the file survived.
Now a data stream becomes a video, and a video becomes a post, and the whole thing costs less than the coffee I wasn’t paying attention to when the first one landed in my Telegram.
I don’t know what to call that. But it’s not slop.
