Hacker News
Evolution: Training neural networks with genetic selection achieves 81% on MNIST
AsyncVibes
- Fitness signal stability is critical: training plateaued at 65% with 1 random image per digit because the variance was too high. Switching to 20 images per digit fixed this immediately.
- Child mutation drives exploration: mutation during reproduction matters far more than mutating the existing population. Disabling it completely flatlined learning.
- Capacity forces trade-offs: the 32-neuron model initially masters the easy digits (0, 1), then evolutionary pressure forces it to sacrifice some accuracy there to improve on the hard digits. A different optimization dynamic than gradient descent.
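The post doesn't include code, but the first two points are easy to sketch. The snippet below is only an illustration, not GENREG's implementation: fitness is accuracy averaged over a fixed evaluation batch of roughly 20 images per digit, and mutation is applied to children at reproduction time rather than to genomes already in the population. The Genome class, the 784 -> 32 -> 10 layer sizes, and the mutation parameters are all assumptions made for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    class Genome:
        """Flat weight vector for a tiny fixed-architecture MLP (784 -> 32 -> 10)."""
        def __init__(self, weights):
            self.weights = weights

        @classmethod
        def random(cls, n_params):
            return cls(rng.normal(0.0, 0.1, size=n_params))

    def predict(genome, images):
        # Unpack the flat genome into two dense layers; the sizes are illustrative.
        w1 = genome.weights[: 784 * 32].reshape(784, 32)
        w2 = genome.weights[784 * 32:].reshape(32, 10)
        hidden = np.maximum(images @ w1, 0.0)   # ReLU hidden layer
        return (hidden @ w2).argmax(axis=1)     # predicted digit per image

    def fitness(genome, eval_images, eval_labels):
        """Accuracy over a fixed evaluation batch (~20 images per digit).

        Averaging over many images per digit keeps the fitness signal stable;
        with a single random image per digit the score is too noisy to select on.
        """
        return float((predict(genome, eval_images) == eval_labels).mean())

    def make_child(parent, mutation_rate=0.02, mutation_scale=0.05):
        """Copy the parent and perturb a small fraction of its weights.

        Mutation happens here, at reproduction time, rather than by mutating
        genomes already sitting in the population.
        """
        weights = parent.weights.copy()
        mask = rng.random(weights.shape) < mutation_rate
        weights[mask] += rng.normal(0.0, mutation_scale, size=int(mask.sum()))
        return Genome(weights)

A generation loop would then score each genome with fitness(), keep the top scorers, and refill the population with make_child() copies of them.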
Most MNIST baselines reach 97-98% using 200K+ parameters. GENREG achieves 81% with 50K params and 72% with 25K params, showing strong parameter efficiency despite a lower absolute ceiling.

Other results:
- Alphabet recognition (A-Z): 100% mastery in ~1800 generations
- Currently testing generalization across 30 font variations

Limitations:
- Speed: ~40 minutes to 81% vs ~5-10 minutes for gradient descent
- Accuracy ceiling: haven't beaten gradient baselines yet
- Scalability: unclear how this scales to larger problems

Current experiments:
- Architecture sweep (16/32/64/128/256 neurons)
- Mutation rate ablation studies
- Curriculum learning emergence
- Can we hit 90%+ on MNIST?
- Minimum viable capacity for digit recognition?
RaftPeople
I'm curious how you handled the challenges around genotype->phenotype mapping. For my project the neural network was fairly large and somewhat modular because it needed to support multiple different functions (vision, hearing, touch, motor, logic+control, etc.), and it felt like the problem would be too hard to solve well (retaining the general structure of the network, so existing capabilities are preserved, while still allowing some variation for new ones), so I punted and had no genes.
I just evolved each brain based on some high-level rules. The most successful creatures had a low percentage chance of any given neuron/connection/weight/activation function/etc. changing, less successful creatures had a higher chance of changes, and the absolute worst were just re-created entirely.
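That selection rule is straightforward to write down. The sketch below is only an illustration, not RaftPeople's actual code; it assumes each brain is a flat numpy parameter vector and the population list is sorted best-to-worst, and the mutation probabilities and reset_fraction are made-up values.

    import numpy as np

    rng = np.random.default_rng(1)

    def evolve_population(population, reset_fraction=0.1):
        """Mutate a population of flat parameter vectors, sorted best-to-worst.

        Each creature's per-parameter mutation probability grows with its rank:
        the best barely change, the worst are re-created from scratch. The
        probabilities and reset_fraction here are illustrative only.
        """
        n = len(population)
        n_params = population[0].size
        for rank, creature in enumerate(population):
            if rank >= int(n * (1.0 - reset_fraction)):
                # Absolute worst: throw the brain away and start over.
                creature[:] = rng.normal(0.0, 0.1, size=n_params)
                continue
            # Mutation chance grows roughly linearly from ~1% (best) to ~20%.
            p_mutate = 0.01 + 0.19 * (rank / max(n - 1, 1))
            mask = rng.random(n_params) < p_mutate
            creature[mask] += rng.normal(0.0, 0.05, size=int(mask.sum()))
        return population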
Things I noticed that I thought were interesting; I'm wondering what you've noticed in yours:
1-The most successful ones frequently ended up with a chokepoint, like layer 3 out of 7, where there was a smaller number of neurons and high connectivity to the previous neurons.
2-Binary/step activation functions ended up in successful networks much more frequently than I expected, not sure why (see the sketch after this list).
3-Somewhat off topic from digit recognition, but an interesting question about ANN evolution: how do you push the process forward? What conditions in the system would cause it to find a capability that is more advanced or only indirectly tied to success? For example, for vision and object recognition: what is a valuable precursor step the system could first develop? Also, how do you create a generic environment where those things can naturally evolve without trying to steer the system?
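On point 2, one common way step activations show up in evolved networks is to let mutation assign each neuron's activation from a small pool. The sketch below is purely illustrative and not from either project; the ACTIVATIONS pool and layer_forward helper are assumed names.

    import numpy as np

    # A pool of per-neuron activation functions that mutation can choose from.
    # The binary/step function is the one that kept showing up in successful
    # networks; the others are common alternatives it competes against.
    ACTIVATIONS = {
        "step": lambda x: (x > 0.0).astype(x.dtype),   # binary / step
        "relu": lambda x: np.maximum(x, 0.0),
        "tanh": np.tanh,
    }

    def layer_forward(x, weights, neuron_activations):
        """Dense layer where each output neuron carries its own activation name."""
        pre = x @ weights                  # shape (batch, n_out)
        out = np.empty_like(pre)
        for j, name in enumerate(neuron_activations):
            out[:, j] = ACTIVATIONS[name](pre[:, j])
        return out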