It's Just Weights
A tiny neural network reproduces the Cortical Labs CL1 DOOM demo in your browser. 132 parameters vs 200,000 biological neurons.
Last month, Cortical Labs published a demo of their CL1 system: 200,000 human brain cells, grown on a multielectrode array, playing a simplified version of DOOM. The internet lost its mind. Hacker News invoked “I Have No Mouth and I Must Scream.” Others debated whether Cortical Labs had built the Torment Nexus. To paraphrase one commenter: we grew a brain in a petri dish, gave it a shotgun, and sent it to hell.
I think this is bloody cool. I also think the panic is completely unwarranted. Here’s why.
What surprised me most wasn’t the demo itself. It was the reaction. This is a technical audience. Most of HN has trained a neural network, or at least understands what one is. And yet the thread filled up with people projecting consciousness onto 200,000 unstructured cells doing stimulus-response conditioning. The word “neuron” is doing a lot of heavy lifting here. Swap it for “parameter” and nobody bats an eye.
What’s actually happening
The CL1 architecture is not “neurons playing DOOM.” It’s a sandwich. Two conventional ML systems with biological neurons in the middle.
The game runs inside VizDoom, which exposes the screen buffer at 320x240 plus raycast data and game state variables. A CNN encoder, trained via PPO on a GPU, processes that visual and spatial data and converts it into electrical stimulation patterns: frequency, amplitude, pulse timing, and channel selection across the CL1’s 59 electrodes. These pulses are fed to the biological neurons. The neurons produce spike patterns in response. A zero-bias linear decoder (also running on silicon) reads those spike counts and maps them to game actions: turn left, turn right, shoot, move forward.
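That last stage is worth making concrete. A zero-bias linear decoder is nothing more than a weight matrix applied to a spike-count vector, followed by an argmax. Here's a minimal sketch; the function name, action list, and dimensions are illustrative, not Cole's actual code:

```typescript
// Hypothetical sketch of a zero-bias linear decoder: spike counts in,
// game action out. Two spike channels and four actions are illustrative
// dimensions, not the real CL1 electrode layout.
const ACTIONS = ["turnLeft", "turnRight", "shoot", "moveForward"] as const;

function decode(weights: number[][], spikeCounts: number[]): string {
  // scores = W @ spikeCounts, with no bias term: if the neurons are
  // silent (all-zero spikes), every action scores zero and the decoder
  // has nothing to work with.
  const scores = weights.map(row =>
    row.reduce((sum, w, i) => sum + w * spikeCounts[i], 0)
  );
  let best = 0;
  for (let a = 1; a < scores.length; a++) {
    if (scores[a] > scores[best]) best = a;
  }
  return ACTIONS[best];
}
```

The absence of a bias term is the point: with no bias, the decoder's output is entirely a function of the spikes, so an ablation test is honest.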
A docstring in Sean Cole’s own code puts it bluntly: “the CL1 device performs NO computation.” The PyTorch models and game logic all live elsewhere.
The neurons sit in the middle doing what neurons do: adapting their connection strengths in response to repeated stimulation. When a particular input pattern consistently precedes a reward signal, the connections that produced the “right” output get stronger; the ones that didn’t, get weaker. Cole, the independent developer who built the integration in about a week using Cortical Labs’ Python API, was careful to include ablation modes so anyone can test whether the biological layer actually contributes. When you feed random or zero spikes into the decoder (bypassing what the neurons actually produce), performance drops to chance. When you restore the real spikes, performance returns. The neurons contribute. Their connection weights encode useful behaviour.
But here’s the thing Cole also documents: the decoder “tends to start becoming a policy head,” meaning the silicon learns to route around the neurons entirely. He had to add zero-bias constraints and ablation checks specifically to prevent the conventional ML from simply taking over. RDWorld ran 601 software-only replications using the SDK’s random spike simulator to examine exactly this question. The conclusion: there is learning in the tissue, but almost all the machinery that decides what counts as success, the CNN, the PPO loop, the reward function, the scenario curriculum, lives off-chip.
That’s not consciousness. That’s not cognition. That’s stimulus-response conditioning, sandwiched between two conventional ML systems that do most of the heavy lifting.
I reproduced the policy in 132 parameters
The CL1’s encoder takes a 320x240 screen buffer, 12 raycasts, and game state variables, runs them through a CNN, and compresses all of that into stimulation patterns across 59 electrodes. That’s a lot of machinery. But all of it exists to answer three questions: where is the enemy relative to me, how far away is it, and can I see it? The encoder is a compression step. The policy the neurons have to learn, “given stimulation that means target-left, produce spikes that mean turn-left,” is the same regardless of how richly it’s encoded.
To demonstrate how small that policy actually is, I built a system in the browser that skips the encoding overhead and feeds those three decision-relevant variables (enemy angle, distance, visibility) directly into a tiny feedforward neural network. 132 parameters total. Four possible actions. Same reward structure. It learns the same behaviour. In seconds.
Three input neurons feed into sixteen hidden neurons (tanh activation), which feed into four output neurons (softmax). The network picks an action by sampling from the output probability distribution. A policy gradient algorithm adjusts the weights after each batch of episodes, strengthening connections that led to good outcomes and weakening ones that didn’t.
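For concreteness, here is roughly what that network looks like. This is a sketch, not the demo's actual src/network/network.ts, but the shape and parameter count (3·16 + 16 + 16·4 + 4 = 132) match the description above:

```typescript
// Sketch of the 3 -> 16 (tanh) -> 4 (softmax) policy network.
// Hidden layer: 3*16 weights + 16 biases = 64 parameters.
// Output layer: 16*4 weights + 4 biases = 68 parameters. Total: 132.
type Layer = { weights: number[][]; biases: number[] };

function softmax(xs: number[]): number[] {
  const max = Math.max(...xs); // subtract max for numerical stability
  const exps = xs.map(x => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

function dense(layer: Layer, input: number[]): number[] {
  return layer.weights.map((row, j) =>
    row.reduce((s, w, i) => s + w * input[i], layer.biases[j])
  );
}

// input: [enemyAngle, distance, visible] -> probabilities over 4 actions
function forward(hidden: Layer, output: Layer, input: number[]): number[] {
  const h = dense(hidden, input).map(Math.tanh);
  return softmax(dense(output, h));
}

function paramCount(l: Layer): number {
  return l.weights.length * l.weights[0].length + l.biases.length;
}
```

Sampling an action from the returned distribution, rather than always taking the argmax, is what lets the policy gradient explore early in training.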
That’s it. No frameworks. No GPU. The entire network is hand-rolled in about 150 lines of TypeScript.
You might wonder how 132 numbers can match the policy that 200,000 biological neurons learn. The answer is that the task is small, not the neurons. The entire decision is “turn toward stimulus and act.” Strip away the CNN encoder, the PPO training loop, the VizDoom interface, and the electrode stimulation mapping, and the policy that remains, the thing the biological neurons actually contribute, compresses to a handful of connection strengths. You could probably solve it with 20 parameters.

The CL1 chip actually grows around 800,000 neurons in total, of which roughly 200,000 were used in this demo. Those neurons aren’t wired efficiently for this task. They’re an unstructured organoid, cells that grew into a random clump on an electrode array. Most of those neurons are either doing nothing useful, redundantly encoding the same signal, or maintaining biological overhead (membrane potential, ion channels, metabolic processes) that exists to keep the cell alive, not to compute. It’s the same reason a CPU has billions of transistors but only a fraction are doing the actual arithmetic for any given instruction.
You can toggle the demo between “water the flower” and “shoot the demon” mode. The weights form identically. The network has no concept of either. The semantics are entirely in your head.
Watch it learn:
Within a few hundred episodes, the network reliably turns toward the target, closes distance, and acts when aimed. Try the ablation toggle in the controls. Performance drops to chance immediately, exactly like the CL1 control condition. Toggle it off. Performance returns. Same test, same result, 132 parameters instead of 200,000 neurons.
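The ablation toggle itself is trivial to implement. A sketch, assuming the demo works something like this (the helper names are mine, not the demo's):

```typescript
// Hypothetical ablation check: replace the policy's real action
// probabilities with uniform noise, and the agent falls to chance,
// exactly like feeding random spikes to the CL1 decoder.
function policyProbs(realProbs: number[], ablated: boolean): number[] {
  return ablated
    ? realProbs.map(() => 1 / realProbs.length) // ignore learned weights
    : realProbs;
}

// Sample an action index from a probability distribution.
function sampleAction(probs: number[], rng: () => number = Math.random): number {
  const r = rng();
  let cum = 0;
  for (let a = 0; a < probs.length; a++) {
    cum += probs[a];
    if (r < cum) return a;
  }
  return probs.length - 1; // guard against floating-point rounding
}
```

The control condition and the normal condition share every line of code except that one branch, which is what makes the comparison clean.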
What you’re watching
In the first few episodes, the weights are random and the agent spins aimlessly. This is the ablation baseline, equivalent to the CL1 condition where random spikes are fed to the decoder.
By episode ten or so, the connections from the “enemy angle” input to the “turn left” and “turn right” outputs start differentiating. If the target is left, the weight toward “turn left” strengthens. You can see this happening in real time in the network diagram, connections going red (strong positive) or blue (strong negative).
By episode fifty, the agent turns and acts reliably. The full behaviour. The CL1 biological neurons took about a week of continuous stimulation to get here.
The weight diagram tells the whole story. The three inputs, sixteen hidden neurons, and four outputs are connected by 132 numbers. Before training, those numbers are random noise. After training, they encode the policy: “turn toward the target, act when aimed.” The entire learned behaviour is literally just weights.
The gym analogy
Your bicep doesn’t know it’s lifting a dumbbell. It receives a nerve impulse, contracts, and if you keep providing the same stimulus (progressive overload), the muscle fibres adapt. They get stronger. The connection between “nerve impulse” and “force output” changes. Nobody calls a bicep conscious for adapting to stimulus.
The mechanism is different in neurons: muscles adapt by adding physical tissue, while neurons alter the chemical and electrical conductivity of their synapses. But the principle is the same: repeated stimulus changes connection strength. “Neurons that fire together wire together” is the Hebbian shorthand. A connection that’s active when a reward arrives gets strengthened; one that’s active when punishment arrives gets weakened. This is the same family of credit-assignment strategies as policy gradient methods in machine learning: reinforce what worked, weaken what didn’t. The mechanisms differ (reward-modulated synaptic plasticity versus backpropagated gradients through a loss function), but the outcome is the same in kind: connection strengths change to encode a better policy.
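The “reinforce what worked, weaken what didn’t” rule is short enough to write out. Here is a minimal REINFORCE-style update for a softmax output layer; it's a standard policy gradient step, not the demo's or the CL1's exact code:

```typescript
// REINFORCE-style update on a softmax layer's weights: nudge each
// weight in the direction that makes the chosen action more likely,
// scaled by how good the episode's (baseline-subtracted) reward was.
function reinforceUpdate(
  weights: number[][], // [action][input] weights, updated in place
  input: number[],     // activations feeding this layer
  probs: number[],     // softmax output at decision time
  chosen: number,      // action actually taken
  reward: number,      // episode return minus baseline
  lr: number           // learning rate
): void {
  for (let a = 0; a < weights.length; a++) {
    // Gradient of log softmax w.r.t. the pre-activation of action a:
    // (1 if a was chosen, else 0) minus the probability assigned to a.
    const grad = (a === chosen ? 1 : 0) - probs[a];
    for (let i = 0; i < input.length; i++) {
      weights[a][i] += lr * reward * grad * input[i];
    }
  }
}
```

With a positive reward, the chosen action's weights move toward the active inputs and the unchosen actions' weights move away; with a negative reward, the signs flip. That sign-flipping is the whole credit-assignment trick, in silicon or in tissue.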
The CL1 neurons receive encoded stimulation as electrical pulses. They produce spikes. The connections between input and output adapt based on reward feedback. Replace “electrical pulses” with “floating point numbers” and “spikes” with “activations” and you have my 132-parameter network. Replace “connection strengths” with “weights” and the same principle is at work in both systems.
Steve Furber, ICL Professor of Computer Engineering at the University of Manchester and co-designer of the ARM processor, noted that we still don’t fully understand how the neurons are playing the game or how they know what’s expected of them. This is an honest assessment. But “we don’t fully understand the mechanism” is a long way from “it’s conscious.” We don’t fully understand how my 132 parameters converge on a policy either; that’s the nature of gradient-based optimisation. The weights work. We can measure that they work. The “how” is an open research question, not evidence of a mind.
Brett Kagan, Cortical Labs’ Chief Scientific Officer, told New Scientist that the neurons are alive and biological, but really what they are being used as is a material that can process information in very special ways that can’t yet be recreated in silicon. Material. Not a mind.
Nobody is worried about my network becoming conscious. Nobody should be worried about 200,000 unstructured neurons in a dish learning the same trivially simple policy.
What’s actually cool
The interesting thing about CL1 is not the consciousness debate. It’s the drug testing angle.
If you have a biological system that exhibits measurable, reproducible learning, you have a drug testing assay. Expose the neurons to a compound, measure the change in learning rate or asymptotic performance, and you have a quantitative signal of neurological effect. This is what Cortical Labs has been talking about: using the platform to test experimental compounds on human neural tissue without human subjects.
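To see why that's an assay rather than a demo, note that the readout can be a single number. A hypothetical sketch; the episodes-to-threshold metric is my illustration, not Cortical Labs' protocol:

```typescript
// Hypothetical assay readout: given a learning curve (reward per
// episode), how many episodes until performance first crosses a
// threshold? Comparing this count with and without a compound gives
// a quantitative signal of neurological effect.
function episodesToThreshold(curve: number[], threshold: number): number {
  const idx = curve.findIndex(r => r >= threshold);
  return idx === -1 ? Infinity : idx + 1;
}
```

Any monotone summary of the curve would do (learning rate, asymptotic score, area under the curve); the point is that "the tissue learned slower under compound X" becomes a number you can replicate and compare.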
That is a new capability. A Python SDK that lets you programmatically interface with biological neural tissue, stimulate it, record from it, and measure learning is genuinely interesting technology, with real applications in pharmaceutical screening.
The “DOOM” framing is marketing. Effective marketing, clearly, but it obscures the actual contribution.
Where the line actually is
I want to be clear: the ethical question about biological neural systems is real. It’s just premature here.
200,000 unstructured organoid neurons doing stimulus-response conditioning is not qualitatively different from what my 132 parameters do. For comparison, estimates for fruit fly neuron counts range from around 100,000 to 200,000 depending on methodology, with the FlyWire connectome mapping 139,255 neurons in the adult female brain. The fruit fly’s neurons are a connectome: a structured, evolved architecture with sensory feedback loops, memory circuits, dopaminergic learning systems, and decision-making pathways that took millions of years of evolution to wire up. The CL1’s neurons are a random organoid: cells that grew into an unstructured clump on a 59-electrode array. Comparable neuron count, fundamentally different thing. One navigates, courts, learns, and grooms. The other turns toward a stimulus when you zap it.
I don’t think this is anywhere close to the line, and I think the people panicking are confusing the word “neuron” with the word “mind.” The interesting and important questions are about what happens when someone grows an organised cortical structure with feedback loops, hierarchical processing, and persistent memory. When biological systems start exhibiting behaviours that simple parameter-matching can’t reproduce. That’s the line to watch for. A dish of cells producing stimulus-response adaptation is not that.
The point
The coolest thing about this story isn’t the consciousness debate. It’s that we can now programmatically interface with biological neural tissue and measure learning. That’s a new capability with real applications in drug discovery and neuroscience research.
The scary version of this story requires organised neural architecture, sensory feedback loops, and emergent computation that exceeds what simple weight adjustment can produce. We’re not there. What we have is weights. Adjustable, measurable, reproducible weights.
Just like the ones in my browser. Just like the ones in your bicep. There’s no magic here. It’s just weights.
View the source code for the 132-parameter demo. The full network is in src/network/network.ts.