Article explaining, for three AI tasks (image labeling, image captioning and speech transcription), how to do them locally in the browser with a local web server. The speech transcription uses Whisper as the local model; I prefer Nvidia Parakeet for its multilingual capabilities. But the setup is interesting. It realistically describes on-device speeds (on an M2, transcription runs at 2 to 5x real time). But it seems you can deploy these nicely as web workers.
via [[Stephen Downes p]]