For most people, the term “Apple silicon” brings to mind powerhouse processors like the M4 Max. Since Apple went through a lengthy Intel phase prior to the development of their M-series chips, it’s often assumed that these are their first custom processors. But twenty years ago, Apple had different custom silicon in their computers: PowerPC microprocessors.
The benefits of those earlier chips weren’t as clear-cut as the M-series chips’. Diehard Apple fans swore that they were superior, while the PC crowd wouldn’t touch them with a ten-foot pole. In any case, they are a couple of decades old at this point, so they don’t have a lot of gas left in the tank. Andrew Rossignol, however, doesn’t believe that the tank is empty just yet. Rossignol recently demonstrated that a PowerBook G4 from 2005 is capable of getting in on the action of running modern artificial intelligence (AI) algorithms, with some caveats, of course.
Process different
Rossignol, a vintage computing enthusiast, successfully ran a large language model (LLM) on a 1.5GHz PowerBook G4, a machine with just 1GB of RAM and a 32-bit processor. The experiment used a fork of llama2.c, an open-source LLM inference engine originally developed by Andrej Karpathy. Given the hardware constraints of the PowerBook, Rossignol chose the TinyStories model, a relatively small model with 110 million parameters that was designed specifically for generating simple short stories.
Running an inference on a PowerPC (image: Andrew Rossignol)
To make this work, Rossignol had to modify the original software to accommodate the PowerPC’s big-endian architecture, which differs from the little-endian format that most modern processors use. This involved converting the model checkpoints and tokenizer data to the appropriate byte order, ensuring that numerical data was interpreted correctly. Additionally, the memory alignment requirements of the aging PowerPC chip meant that the weights had to be copied into memory manually, rather than being memory-mapped as they would be on an x86 system.
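For a rough idea of what that kind of conversion entails, here is a minimal sketch in C, assuming llama2.c’s raw little-endian float32 checkpoint layout. The names swap32 and load_weights_be are hypothetical and illustrative, not taken from Rossignol’s fork:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

/* Reverse the bytes of a 32-bit value. llama2.c checkpoints store
 * weights as little-endian float32, so a big-endian PowerPC must
 * swap every value before using it. */
static uint32_t swap32(uint32_t v) {
    return (v << 24) | ((v & 0x0000FF00u) << 8) |
           ((v >> 8) & 0x0000FF00u) | (v >> 24);
}

/* Read n little-endian floats from a checkpoint file into a freshly
 * allocated buffer (copied in, rather than memory-mapped) and convert
 * them to host byte order in place. */
static float *load_weights_be(FILE *f, size_t n) {
    float *w = malloc(n * sizeof(float));
    if (!w || fread(w, sizeof(float), n, f) != n) {
        free(w);
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        uint32_t v;
        memcpy(&v, &w[i], sizeof v);  /* copy raw bytes; avoids aliasing UB */
        v = swap32(v);
        memcpy(&w[i], &v, sizeof v);
    }
    return w;
}
```

Copying into a malloc’d buffer instead of mmap-ing the file also guarantees the weights land at a suitably aligned address, which matters once vector loads enter the picture.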
Well, technically it works
Performance was, predictably, not great. Running the model on an Intel Xeon Silver 4216 processor achieved a processing speed of 6.91 tokens per second. The same model on the PowerBook G4, however, managed just 0.77 tokens per second, taking a full four minutes to generate a short paragraph of text.
To improve performance, Rossignol leveraged AltiVec, the PowerPC’s vector processing extension. By rewriting the core matrix multiplication function to use AltiVec’s single instruction, multiple data (SIMD) capabilities, he was able to boost inference speed to 0.88 tokens per second. That is a modest improvement, but you have to take what you can get in a project like this.
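As a sketch of what such a rewrite looks like, here is a minimal AltiVec dot-product kernel in the spirit of the inner loop of llama2.c’s matmul(). It is illustrative only, under the assumptions noted in the comments, and is not Rossignol’s actual code:

```c
#include <altivec.h>

/* Dot product of two float arrays using AltiVec SIMD.
 * Assumes x and w are 16-byte aligned (vec_ld silently truncates
 * addresses to 16-byte boundaries) and that n is a multiple of 4. */
static float dot_altivec(const float *x, const float *w, int n) {
    vector float acc = (vector float){0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < n; i += 4) {
        vector float vx = vec_ld(0, &x[i]);  /* load 4 floats */
        vector float vw = vec_ld(0, &w[i]);
        acc = vec_madd(vx, vw, acc);         /* fused multiply-add, 4 lanes */
    }
    /* Horizontal sum of the 4 accumulator lanes. */
    float tmp[4] __attribute__((aligned(16)));
    vec_st(acc, 0, tmp);
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}
```

Since vec_madd processes four single-precision values per instruction but vec_ld only works on 16-byte-aligned addresses, this is one reason the weights need to be copied into aligned buffers rather than memory-mapped, as mentioned earlier.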
Despite the slow performance, the fact that a 20-year-old laptop could successfully run a modern AI model at all is impressive. The PowerBook’s dated architecture, limited RAM, and lack of specialized accelerators posed plenty of challenges, but careful software optimizations and a deep understanding of the hardware allowed Rossignol to push the machine well beyond its expected limits.