AI Lens

Season 1 Episode 23: Gemma 4 & Orchestrating Your Own AI Agent Workforce

AI Research Technologies, Inc. Season 1 Episode 23

Google has released Gemma 4. Whether you are orchestrating complex code architectures or automating your daily content consumption, you now have the ability to build an intelligent, tireless workforce right in your own terminal. You can learn faster and build smarter than ever before.

SPEAKER_01

You know, usually when you look at your laptop, you basically just see a tool. Like it's a typewriter, a calculator, or maybe just a really advanced television. You open a document, you type some words, you close it, the relationship is, well, it's entirely one-to-one.

SPEAKER_00

It's historically been very linear. I mean, you are the brains of the operation, right? You provide the input and the machine simply executes it and gives you the output.

SPEAKER_01

Exactly. But when you look at the current state of artificial intelligence, specifically in late 2025 and early 2026, suddenly that laptop isn't just a passive tool anymore.

SPEAKER_00

No, not at all.

SPEAKER_01

It's starting to look a lot more like an empty office building, you know, just waiting for you to hire the staff.

SPEAKER_00

That is, wow. That is a brilliant way to visualize the shift. We are transitioning so rapidly from software acting as a solitary tool to, well, software acting as a highly capable colleague.

SPEAKER_01

Yeah.

SPEAKER_00

Or depending on how you set it up, an entire engineering department.

SPEAKER_01

Right. So welcome to the deep dive. Today we're exploring a massive stack of intelligence from the front lines of the current AI revolution.

SPEAKER_00

It's a really fascinating mix of sources today.

SPEAKER_01

It really is. We're going to connect the dots between the macro-level clash of the AI titans, you know, the trillion-dollar companies fighting for global dominance, and the micro-level reality of how this technology is actually landing right on your desktop.

SPEAKER_00

Because that's where it matters, right?

SPEAKER_01

Exactly. Because by the end of this conversation, you are going to understand not just who is winning the global AI race, but how you can use the fallout of that race to literally build your own automated intelligent workforce right in your terminal.

SPEAKER_00

And you know, to understand how we got to this point of running digital employees on our laptops, we really have to look at the engines driving all of this innovation at the top of the food chain.

SPEAKER_01

Yeah, the big players.

SPEAKER_00

Right. The macro environment right now is just moving at an unprecedented velocity.

SPEAKER_01

Let's unpack that a bit because looking at the timeline from late 2025, the pace is just staggering. I mean, it feels like a new era begins every Tuesday.

SPEAKER_00

It really does. It's this hyper-accelerated cycle. If you go back to November 2025, there was a literal code red declared at OpenAI.

SPEAKER_01

Wait, an actual code red?

SPEAKER_00

Yeah. Google released Gemini 3, and for a brief moment, it completely took the crown. It hit a 76.2% success rate on the SWE-bench test.

SPEAKER_01

And just to clarify, that's a coding test, right?

SPEAKER_00

Exactly. It's an incredibly rigorous benchmark for autonomous software engineering tasks.

SPEAKER_01

And that score actually beat OpenAI's flagship at the time, which was GPT 5.1.

SPEAKER_00

It did. And what's truly astonishing was OpenAI's response to that. I mean, they didn't take months to strategize. Right. They mobilized their entire engineering force and they shipped GPT 5.2 in just three weeks.

SPEAKER_01

Three weeks. That is insane.

SPEAKER_00

It's unheard of. Taking a frontier-level model update from concept to deployment in three weeks is just not how traditional software development works.

SPEAKER_01

Well, let's get a snapshot of the current scorecard then as of December 2025. Because it seems like these tech giants are just trading massive blows.

SPEAKER_00

Oh, absolutely. We currently have a fragmented leaderboard, which frankly is arguably the healthiest possible state for the industry. Oh, for sure. Right now, OpenAI's GPT 5.2 absolutely dominates general knowledge work and mathematics. To put it in perspective, it scored 100% on the AIME benchmark.

SPEAKER_01

Okay. I want to pause there because 100% on a math test sounds good. But what does AIME actually mean in this context?

SPEAKER_00

That's a really crucial distinction to make. This isn't like high school algebra.

SPEAKER_01

Right.

SPEAKER_00

The AIME is a series of elite, math-Olympiad-level questions that have historically shattered the logic and reasoning limits of neural networks.

SPEAKER_01

Oh wow.

SPEAKER_00

Yeah. So acing it means the model isn't just regurgitating memorized equations. It is actively reasoning through highly complex novel problems perfectly.

SPEAKER_01

Wow. Okay, so OpenAI takes the math and reasoning crown. What about Google?

SPEAKER_00

So Google's Gemini 3 is wielding this massive 2 million token context window. And that fundamentally changes how you interact with an AI.

SPEAKER_01

How so?

SPEAKER_00

Well, having a 2 million token window is basically the equivalent of dropping your company's entire historical code base, a decade of financial records, plus like 50 dense textbooks into a single prompt.

SPEAKER_01

And you can read all that.

SPEAKER_00

Yeah. And synthesize all of it instantly.

SPEAKER_01

That's wild. And then there's Anthropic kind of quietly building a juggernaut in the background.

SPEAKER_00

Yeah. Anthropic's Claude Opus 4.5 is a completely different beast. They are currently leading in coding quality, scoring 80.9% on that SWE-bench test we mentioned earlier. Okay. But more importantly, they've secured massive enterprise trust. They are the model corporations feel safest deploying.

SPEAKER_01

Which is huge for business.

SPEAKER_00

Huge. So much so that they are actively seeking a $350 billion valuation.

SPEAKER_01

Okay, wait, let me just ground this for a second. Because I hear these numbers. GPT gets 100% on elite math, Gemini eats 50 textbooks, Claude scores 80% on coding. But I have to ask. Sure. Do these incremental benchmark wars actually matter to you and me, the average user? Or are we just watching a silicon measuring contest for tech CEOs?

SPEAKER_00

That is a very fair question. It's easy to view it as corporate theater, right? But the reality is that this hyper competition directly impacts your daily life. Because there is no permanent leader, because a code red can happen and the performance gap can close in a matter of weeks, these companies are terrified of losing developer mind share. Right. And the only way to maintain that mind share in a crowded market is to rapidly drop API prices and push incredibly powerful models to open source. So when the giants fight tooth and nail in the cloud, the user is the ultimate winner.

SPEAKER_01

So basically, because they are locked in this trillion-dollar arms race, incredible technology just falls out of the sky and lands on our hard drives.

SPEAKER_00

That is the perfect way to look at it. And I mean, the absolute clearest example of this dynamic playing out is Google's recent release of the Gemma 4 model family.

SPEAKER_01

The documentation on Gemma 4 is just wild. Google is basically handing over frontier-level intelligence to the public completely for free.

SPEAKER_00

Yeah, they released it under an Apache 2.0 license, which is a massive philosophical shift.

SPEAKER_01

Right, because it's not restricted.

SPEAKER_00

Exactly. This isn't a walled garden where you're just renting access to an API. An Apache 2.0 license grants true open source freedom.

SPEAKER_01

Meaning you can build businesses on it.

SPEAKER_00

Yes. It means developers and businesses have full commercial rights to use, modify, and deploy these models entirely on their own terms without restrictive corporate guardrails or unexpected usage caps.

SPEAKER_01

Well, let's break down what Gemma 4 actually is under the hood. The documentation highlights four distinct sizes.

SPEAKER_00

Right.

SPEAKER_01

Before we get into the crazy architecture, it starts by mentioning a massive 31 billion parameter dense model. What does dense actually mean in this context?

SPEAKER_00

So a dense model relies on a traditional monolithic architecture. Whenever you ask it a question, every single one of those 31 billion parameters activates to process your prompt.

SPEAKER_01

That sounds heavy.

SPEAKER_00

It is. It requires a tremendous amount of computing power and memory because the entire neural network wakes up for every single task, no matter how simple it is.

SPEAKER_01

Okay, so that brings us to the second model, which is a 26 billion parameter mixture of experts, or MoE, model.

SPEAKER_00

Yeah, the 26B A4B.

SPEAKER_01

Right. The docs call it the 26B A4B. And I was thinking about how to visualize this. I came up with an analogy.

SPEAKER_00

Oh, I'd love to hear it.

SPEAKER_01

Well, having a 26 billion parameter MoE model is kind of like having access to a massive 26 billion book library. Okay. But you don't need to turn on the lights and heat the entire building just to answer a single question. The system acts as a librarian, routing you to a specific 4 billion book aisle that is specialized for your exact query.

SPEAKER_00

I love that.

SPEAKER_01

Right. So you get the maximum knowledge base of the whole library, but you only spend the electricity required to illuminate that one small section.

SPEAKER_00

That captures the mechanism beautifully. The A4B stands for 4 billion active parameters. By routing the query to a specialized expert network within the model, it runs incredibly fast.

SPEAKER_01

Makes sense.

SPEAKER_00

It feels as snappy as a tiny 4 billion parameter model, but it draws on the deep, nuanced knowledge base of a 26 billion parameter heavyweight.
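
The routing idea described above can be sketched in a few lines of Python. This is a toy illustration only, not Gemma 4's actual implementation: the names `make_expert` and `moe_forward`, the eight experts, and the top-2 choice are all assumptions made for the example. A router scores every expert, but only the top-scoring few run for a given input.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(seed, dim):
    """Hypothetical expert: a fixed element-wise scaling of the input."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in range(dim)]
    def expert(x):
        return [w * xi for w, xi in zip(weights, x)]
    return expert

def moe_forward(x, router_weights, experts, top_k=2):
    """Score all experts, run only the top_k, and blend their outputs."""
    # One router score per expert (dot product with the input).
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](x)  # only the chosen experts do any work
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen

rng = random.Random(0)
dim, num_experts = 4, 8
experts = [make_expert(i, dim) for i in range(num_experts)]
router = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
output, used = moe_forward([1.0, 0.5, -0.5, 2.0], router, experts, top_k=2)
print(f"ran {len(used)} of {num_experts} experts: {used}")
```

In this toy, six of the eight experts never execute for the input, which is the same trade the 26B A4B name describes: a large total parameter count, with only a fraction active per query.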

SPEAKER_01

And then there are these two tiny edge models, right? The E2B and E4B.

SPEAKER_00

Those are where the engineering gets profoundly clever. The E stands for effective parameters. They use a technique called per layer embeddings or PLE.

SPEAKER_01

The documentation gets pretty dense on PLE. Can you explain how that actually works and why it matters for someone, you know, just running this on a laptop?

SPEAKER_00

Yeah, of course. Think of your computer's RAM as a small, extremely fast workbench. Traditional models require you to load billions of dense parameters onto that workbench all at once.

SPEAKER_01

Which usually crashes my laptop.

SPEAKER_00

Exactly. It quickly overwhelms the memory of a standard phone or laptop. But what PLE does is fundamentally change those hardware requirements. It gives each layer of the model its own small lookup table.

SPEAKER_01

Okay.

SPEAKER_00

These tables take up space on your hard drive, but during processing, the model only grabs exactly what it needs for a specific token and swaps it onto the workbench instantly.

SPEAKER_01

Oh, so you are trading active RAM usage for quick lookups on your storage drive.

SPEAKER_00

Exactly the point. It allows these E2B and E4B models to run completely offline on a standard smartphone or even a tiny Raspberry Pi with near zero latency.
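
A rough sketch of the lookup-table idea as it is described above, in Python. Everything here is an assumption for illustration: `LayerTable` stands in for a table kept in slow storage, the row values are made up, and real per-layer embeddings are more involved than this. The point it shows is that each layer fetches one small row per token instead of holding the whole table in fast memory.

```python
class LayerTable:
    """Hypothetical stand-in for one layer's lookup table kept in slow storage."""

    def __init__(self, layer_index, vocab_size, width):
        self.layer_index = layer_index
        self.vocab_size = vocab_size
        self.width = width
        self.rows_fetched = 0  # how many rows were pulled onto the "workbench"

    def fetch(self, token_id):
        """Return only the row for this token (values are fabricated for the demo)."""
        self.rows_fetched += 1
        return [
            ((token_id + 1) * (self.layer_index + 1) * (j + 1)) % 7 / 7.0
            for j in range(self.width)
        ]

def run_token(token_id, hidden, tables):
    """Pass one token through every layer, adding that layer's row to the hidden state."""
    for table in tables:
        row = table.fetch(token_id)
        hidden = [h + r for h, r in zip(hidden, row)]
    return hidden

vocab_size, width = 50_000, 4
tables = [LayerTable(i, vocab_size, width) for i in range(3)]
for token_id in (17, 4242):
    run_token(token_id, [0.0] * width, tables)

rows_touched = sum(t.rows_fetched for t in tables)
rows_total = vocab_size * len(tables)
print(f"fetched {rows_touched} rows out of {rows_total} stored")
```

Two tokens across three layers touch six rows out of 150,000 stored, which is the memory-for-lookups trade described in the conversation.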

SPEAKER_01

That's amazing.

SPEAKER_00

And it does all that while maintaining the ability to natively process audio and text.

SPEAKER_01

That is just staggering. And regardless of the size, all of these Gemma 4 models have a new capability called thinking mode, right?

SPEAKER_00

They do, yeah. It is activated by inserting a specific think token into the model's system prompt. Historically, models would just, you know, blurt out the most statistically likely answer immediately.

SPEAKER_01

The first thing that comes to mind.

SPEAKER_00

Exactly. But with the think token, the model is forced to explicitly output its internal step-by-step reasoning logic before it provides the final response. Oh, cool. Yeah. You can literally watch it debate itself, catch its own math errors, and refine its logic, which dramatically reduces hallucinations.

SPEAKER_01

So bring this back to the listener. We have these powerful reasoning models available for free. Why should you care about running Gemma 4 locally instead of just using, say, a cloud-based web chat?

SPEAKER_00

It fundamentally comes down to digital sovereignty. When you have a highly capable model running locally on your own silicon, you have total control. Oh, privacy. Yes. You don't have to send your private financial data, your proprietary company code, or your personal conversations to a cloud provider server. You own the infrastructure, and your data never leaves the room.

SPEAKER_01

So if Google is giving us these incredibly powerful sovereign models to run on our laptops, the next obvious question is how do we actually talk to them? Right. Because just typing questions into a chat box isn't going to cut it anymore if we want to build a real workflow.

SPEAKER_00

No, the paradigm has completely shifted. The latest developer tooling analysis explicitly states that AI is no longer autocomplete on steroids. Right. We have moved past code generation. The AI has really become a co-pilot, a reviewer, and increasingly an autonomous colleague.

SPEAKER_01

There are five major trends shaping developer tools in 2026. And the one that really stood out is the massive return of the terminal, like the command line interface.

SPEAKER_00

It makes perfect sense when you think about it. The terminal is a developer's most powerful native environment. Tools like Claude Code and GitHub Copilot CLI have become agentic.

SPEAKER_01

What does that mean in practice?

SPEAKER_00

Well, you aren't just asking them for code snippets anymore. You give them root access. They can independently navigate your entire complex code base, run shell commands, commit changes to GitHub, and manage your software builds.

SPEAKER_01

It's like having an intern who works at the speed of light.

SPEAKER_00

Exactly.

SPEAKER_01

But for an intern to be genuinely useful, they need to know what's going on in the rest of the company. Which leads to the next trend, MCP, or the model context protocol.

SPEAKER_00

Yes, MCP is arguably the most critical piece of hidden infrastructure driving this revolution. In the past, if you wanted an AI to fix a bug, you had to manually copy and paste your database schema, your Jira tickets, and your Figma design files all into the chat window.

SPEAKER_01

Right, which was awful.

SPEAKER_00

It was a nightmare of context limits and manual updates.

SPEAKER_01

I think of MCP as essentially the USB-C cable for AI context.

SPEAKER_00

That's a great way to put it.

SPEAKER_01

Yeah, it's one universal open standard that lets you plug the AI's brain directly into any application you use. You just point it at your database or your Notion Workspace, and it independently pulls exactly what it needs to understand the problem in real time.
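
To make the "universal cable" idea a little more concrete: MCP messages are JSON-RPC 2.0, and a client typically lists a server's tools and then calls one. The Python sketch below only builds those messages. The helper `mcp_request`, the tool name `query_database`, and its arguments are hypothetical, and a real client would also handle the initialization handshake and a transport, which are omitted here.

```python
import json
from itertools import count

_ids = count(1)  # simple incrementing request ids

def mcp_request(method, params=None):
    """Build a JSON-RPC 2.0 request dict of the kind MCP clients send."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message

# Ask the server which tools it exposes.
list_request = mcp_request("tools/list")

# Call one tool. The tool name and arguments are made up for this example.
call_request = mcp_request(
    "tools/call",
    {"name": "query_database", "arguments": {"sql": "SELECT 1"}},
)

print(json.dumps(call_request, indent=2))
```

The same two message shapes work whether the server sits in front of a database, a ticket tracker, or a notes workspace, which is what makes the protocol feel like one plug for many devices.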

SPEAKER_00

Context is no longer a constraint, you know, it is a first-class resource. And once your AI has that deep universal context, we see the rise of the third trend: subagents.

SPEAKER_01

Subagents.

SPEAKER_00

Right. Instead of writing one massive generalist prompt asking the AI to build a whole application, developers are orchestrating specialized subagents.

SPEAKER_01

So you have like a planning agent that only writes the architecture, a coding agent that only writes the functions, and a testing agent that only writes the checks.

SPEAKER_00

Yes. And it makes the entire system debuggable. If a feature breaks, you know exactly which specialized subagent made the poor decision rather than trying to unravel a monolithic prompt that failed somewhere in the middle.

SPEAKER_01

Here is where it gets really mind-bending, though. The fifth trend is adversarial agents, pitting AI against itself.

SPEAKER_00

Yeah, this is fascinating.

SPEAKER_01

I'm having a hard time visualizing this. Walk me through a concrete example of how this actually looks on the screen.

SPEAKER_00

So it mimics how high-functioning human engineering teams operate. Good code review is naturally adversarial, right? You want someone to aggressively poke holes in your logic. In this setup, you might have an OpenAI Codex model write a piece of software. Then a Google Gemini critic agent receives that code with explicit instructions to attack it. Attack it. Yeah. The critic agent writes brutal unit tests designed to break the software. It scans for memory leaks and it runs simulated security attacks. It actively tries to prove the first model wrong.
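
The writer-versus-critic loop can be sketched without any model at all. In this Python toy, the "writer" hands over hard-coded drafts of a division helper and the "critic" attacks each draft with hostile inputs. All names and the test cases are assumptions for illustration; in practice both roles would be model calls.

```python
def draft_v1(a, b):
    """First draft: forgets about division by zero."""
    return a / b

def draft_v2(a, b):
    """Revised draft: returns None instead of crashing."""
    return None if b == 0 else a / b

def writer_agent(round_number):
    """Stub writer: returns the next draft (a real one would revise from feedback)."""
    drafts = [draft_v1, draft_v2]
    return drafts[min(round_number, len(drafts) - 1)]

def critic_agent(candidate):
    """Stub critic: tries inputs chosen to break the candidate."""
    attacks = [((6, 3), 2.0), ((1, 0), None), ((-4, 2), -2.0)]
    failures = []
    for args, expected in attacks:
        try:
            result = candidate(*args)
        except Exception as exc:
            failures.append(f"{args} raised {type(exc).__name__}")
            continue
        if result != expected:
            failures.append(f"{args} returned {result!r}, expected {expected!r}")
    return failures

def orchestrate(max_rounds=3):
    """Loop writer and critic until the critic finds nothing or rounds run out."""
    history = []
    for round_number in range(max_rounds):
        candidate = writer_agent(round_number)
        failures = critic_agent(candidate)
        history.append((candidate.__name__, failures))
        if not failures:
            return candidate, history
    return None, history

accepted, history = orchestrate()
for name, failures in history:
    print(name, "->", failures or "no failures found")
```

The first draft is rejected because the critic's zero-divisor attack crashes it, and the second draft is accepted. The round limit is where a human orchestrator would step in if the two agents never converge.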

SPEAKER_01

Wait, hold on. You're talking about subagents writing the code and adversarial agents aggressively testing the code. Right. I'm struggling to see where the human actually fits into this loop. Are we just building a machine that sits in the corner, talks to itself, and locks us out of the process?

SPEAKER_00

That is the ultimate question about the future of work. The reality is that the human is no longer the code writer. The human developer elevates to the role of a multi-agent orchestrator.

SPEAKER_01

An orchestrator.

SPEAKER_00

Yes. Your job is to define the overarching architecture, set the safety boundaries, manage the context via MCP, and adjudicate the disputes when your adversarial agents fundamentally disagree. You transition from being a bricklayer to being the manager of a highly capable digital engineering team.

SPEAKER_01

That is a wild paradigm shift for software engineering. But what if you don't write code? Like what if you are a marketer, an educator, or a content creator?

SPEAKER_00

Well, the same principles of orchestration apply entirely outside of software development.

SPEAKER_01

A brilliant case study of this is a recent workflow created by Stephen G. Pope. He demonstrated how non-developers can orchestrate these exact same tools to build a fully automated two-person audio show generator completely for free, running right on your local machine.

SPEAKER_00

The technical stack he uses is really fascinating because it doesn't require traditional programming. Right. He uses a visual automation tool called n8n. Instead of writing lines of Python, you're literally dragging and dropping boxes on a visual canvas to connect different services.

SPEAKER_01

Oh, very cool.

SPEAKER_00

Yeah, and he pairs that with a free media processing toolkit called NCA and a local file storage system called MinIO.

SPEAKER_01

So the workflow is a perfect example of agentic automation. Think of the canvas you described. The first node scrapes a dense blog post or a research transcript. It passes that text to an advanced reasoning model. He uses Claude Opus. But you could easily plug in a local Gemma 4 model here.

SPEAKER_00

Absolutely.

SPEAKER_01

The model is prompted to format that raw text into a conversational script featuring, you know, person one and person two.

SPEAKER_00

But it doesn't just write dialogue lines. The prompt explicitly programs in nonverbal cues. It inserts tags for laughs, pauses, sighs, and interruptions.

SPEAKER_01

To make it sound real.

SPEAKER_00

Yes. Then the next visual node sends that marked-up script to ElevenLabs to generate distinct, highly realistic audio voices for both characters, and a final node stitches those audio files together into a seamless show.
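
The step between the script and the voice generation can be sketched in Python. This assumes a hypothetical script format, with lines like `Person 1: [laughs] ...` and square-bracket cue tags; the actual format in the workflow discussed here is not specified. The function splits the script into per-speaker segments so each one could be sent to a different voice.

```python
import re

# Assumed line format: "Person 1: text" or "Person 2: text".
LINE_RE = re.compile(r"^(Person [12]):\s*(.*)$")
# Assumed cue format: a single word in square brackets, e.g. [laughs].
CUE_RE = re.compile(r"\[(\w+)\]")

def parse_script(script):
    """Split a two-person script into segments with speaker, cues, and spoken text."""
    segments = []
    for raw in script.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip blank lines and anything that is not dialogue
        speaker, body = match.groups()
        cues = CUE_RE.findall(body)
        text = " ".join(CUE_RE.sub(" ", body).split())
        segments.append({"speaker": speaker, "cues": cues, "text": text})
    return segments

script = """Person 1: [laughs] So that actually worked?
Person 2: It did. [pause] Mostly.

(stage note that is not dialogue)
"""

segments = parse_script(script)
for segment in segments:
    print(segment)
```

Each segment keeps its cues separate from the spoken text, so a later step can decide whether to pass them to the voice service or render them some other way.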

SPEAKER_01

But while the visual automation handles the logistics, the actual success of the output relies entirely on the art of the script, right?

SPEAKER_00

Completely.

SPEAKER_01

And AI needs incredibly strict, nuanced guidance to sound naturally human.

SPEAKER_00

That's where the rules of audio structure come into play. If you just tell an AI to write a script, it sounds like a textbook reading itself to sleep.

SPEAKER_01

Yeah, nobody wants to listen to that.

SPEAKER_00

No. You have to program your system prompt to start with a hook, recognizing that the first 30 seconds dictate whether a listener stays or leaves.

SPEAKER_01

You also have to explicitly instruct the AI to write for the ear, not the eye. That means forcing it to use short, punchy sentences, highly conversational language, and absolutely zero corporate jargon.

SPEAKER_00

And using verbal signposts. Things like "the first point I want to cover is" or "let me give you a concrete example." It helps the listener follow along without a visual structure.

SPEAKER_01

And the guidelines also warn against overscripting. They suggest having the AI generate robust bullet points for the digital hosts to seamlessly riff on rather than reading a rigid script word for word.

SPEAKER_00

When you combine that highly engineered prompting with the n8n visual automation, you create a remarkably powerful content engine that just operates while you sleep.

SPEAKER_01

I love the DIY ethos of this. It's like building your own private version of an automated studio, but you own the printing press.

SPEAKER_00

Exactly.

SPEAKER_01

But I have to ask, honestly, why bother? Why go through the trouble of dragging visual nodes, setting up local MinIO storage engines, and looping API calls when you could just pay 20 bucks a month for an off-the-shelf app that does this with one click?

SPEAKER_00

It comes right back to the benefits of the Gemma 4 release we discussed earlier: infinite customization and absolute sovereignty.

SPEAKER_01

Right, the privacy aspect.

SPEAKER_00

When you build the pipeline yourself, you can inject your own cloned voice into the nodes. You completely bypass corporate guardrails, content filters, and arbitrary rate limits. That makes sense. And most importantly, if you swap out cloud APIs for a local Gemma 4 edge model, you can run the entire pipeline securely on your own hardware. You can convert sensitive corporate documents into audio briefs without ever uploading that private data to the cloud.

SPEAKER_01

Well, we've covered some serious ground today. For you, the listener, we started by watching the tech giants battle for the global intelligence crown, dropping benchmark shattering models at breakneck speed.

SPEAKER_00

Yeah, and we saw how the fallout of that competition forced Google to release Gemma 4, putting frontier-level open-weights intelligence with a highly RAM-efficient architecture directly onto your personal devices.

SPEAKER_01

Right. And we looked at how developers are completely changing their workflows, employing armies of adversarial subagents, and using tools like MCP to give them deep context to write and aggressively debug flawless software.

SPEAKER_00

And finally, we explored how anyone, regardless of coding ability, can use visual automation to string these highly capable models together, generating custom multi-voice audio content out of thin air.

SPEAKER_01

The ultimate goal here is helping you navigate information overload. These tools aren't just cool party tricks, they are the ultimate leverage.

SPEAKER_00

Absolutely.

SPEAKER_01

Whether you are orchestrating complex code architectures or automating your daily content consumption, you now have the ability to build an intelligent, tireless workforce right in your own terminal. You can learn faster and build smarter than ever before.

SPEAKER_00

But you know, if we pull all of these threads together, it leaves us with something quite profound to consider about the future we're building.

unknown

Ooh.

SPEAKER_01

What's that?

SPEAKER_00

Well, we've talked extensively today about how adversarial AI is used to perfectly debug code, aggressively hunting down and removing every possible mistake. And we've seen how AI audio scripts are carefully tuned to mimic natural, flawed human banter with artificial laughs and programmed pauses.

SPEAKER_01

Yeah, we are trying incredibly hard to simulate humanity in perfect systems.

SPEAKER_00

Precisely. But as our digital tools become completely frictionless, and as our AI colleagues become perfectly error-free, will we reach a point where human error actually becomes the most valuable sought-after commodity in art and content? Oh wow. When machines can debate each other perfectly on an automated audio track with flawless logic, perhaps the only thing worth listening to will be a genuinely flawed, unpredictable human perspective. Wow.

SPEAKER_01

An entire digital office building working flawlessly, just waiting for a human to walk in and make a beautiful mistake. That is a lot to think about. Thank you for joining us on this journey today. We'll see you on the next deep dive.