“usable multi-modal comprehension and generation in your browser in just 1B parameters”
Over the last month or so, everyone has been talking about Deepseek’s R1, and occasionally V3.
But buried in all the model lists and statistics is the fact that Deepseek has made several very significant advancements for edge computing, and has released a multi-modal model that runs well on the edge too. My colleague and friend, Colin McNamara, covered our discussions at the Austin Langchain office hours a couple of weeks ago in this piece, which reflects my thoughts as well. He points to the WebGPU (browser-based) version of R1.
In our Deepseek discussions, we also covered the JanusPro release from Deepseek, which came a few days after the R1 release and didn’t get as much attention.
Now every reader can try it out for themselves, right on their own machine and in their own browser, by opening the JanusPro WebGPU demo on HuggingFace.
This will load a 1-2GB model directly into your web browser’s temporary disk space. When you’re done testing, you can recoup the drive space with your browser’s ‘delete browsing data’ feature.
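If you want to see how much space the download is taking before you clear it, something like the following could work. It is only a minimal sketch and is not taken from the demo itself: it relies on the browser's standard navigator.storage.estimate() call, and the helper names (formatBytes, summarizeStorage, reportStorage) are my own, made up for illustration. The estimate only covers the origin of the page it runs on, so it has to run in the context of the demo page.

```typescript
// Minimal shape of the object returned by navigator.storage.estimate().
interface StorageEstimateLike {
  usage?: number; // bytes currently used by this origin
  quota?: number; // bytes the browser is willing to grant this origin
}

// Format a byte count as a human-readable string (binary units).
function formatBytes(bytes: number): string {
  const units = ["B", "KiB", "MiB", "GiB", "TiB"];
  let value = bytes;
  let i = 0;
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i++;
  }
  return `${value.toFixed(1)} ${units[i]}`;
}

// Turn an estimate into a one-line summary (hypothetical helper).
function summarizeStorage(estimate: StorageEstimateLike): string {
  const usage = estimate.usage ?? 0;
  const quota = estimate.quota ?? 0;
  return `${formatBytes(usage)} used of ${formatBytes(quota)}`;
}

// Ask the browser for the current origin's storage estimate.
// Resolves to null where the API is unavailable (for example outside a browser).
async function reportStorage(): Promise<string | null> {
  const nav = (globalThis as any).navigator;
  if (!nav?.storage?.estimate) return null;
  return summarizeStorage(await nav.storage.estimate());
}
```

Calling reportStorage().then(console.log) before and after clearing your browsing data should show the usage figure drop, assuming the model files are what is occupying the space.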
Image generation
I started it up and hit the first button, which auto-generated a test prompt for me. The generation took about 8 seconds. I do have an RTX 4090 that the Chrome browser takes advantage of through WebGPU, so your timing may vary. But at 1B parameters it will likely run on any GPU; it will just take more than 8 seconds to complete the image generation.
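If you are not sure whether your browser exposes WebGPU at all, a quick check along these lines can tell you before you wait on the download. This is a rough sketch under my own assumptions, not how the demo does it: navigator.gpu and requestAdapter() are the standard WebGPU entry points, while detectWebGPU, describeStatus and the GpuLike type are names I invented for the example.

```typescript
type WebGPUStatus = "unsupported" | "no-adapter" | "available";

// The only part of navigator.gpu this sketch relies on.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}

// Classify WebGPU support given navigator.gpu (or undefined if it is missing).
async function detectWebGPU(gpu?: GpuLike): Promise<WebGPUStatus> {
  if (!gpu) return "unsupported"; // the browser does not expose navigator.gpu
  const adapter = await gpu.requestAdapter(); // resolves to null if no adapter is found
  return adapter ? "available" : "no-adapter";
}

// Human-readable message for each status (hypothetical helper).
function describeStatus(status: WebGPUStatus): string {
  switch (status) {
    case "available":
      return "WebGPU is available";
    case "no-adapter":
      return "WebGPU is exposed, but no GPU adapter was found";
    default:
      return "This browser does not expose WebGPU";
  }
}
```

In a browser console you could run detectWebGPU((globalThis as any).navigator?.gpu).then((s) => console.log(describeStatus(s))). Passing the gpu object in as a parameter is just a convenience so the function can also be exercised outside a browser.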
But if you’ve ever wanted to generate your own images on your own machine, this is a good start.
A cute and adorable baby fox with big brown eyes, autumn leaves in the background enchanting, immortal, fluffy, shiny mane, Petals, fairyism, unreal engine 5 and Octane Render, highly detailed, photorealistic, cinematic, natural colors.
Image comprehension
I had just received an email about my cousin’s new album, so I decided to see how JanusPro would describe the abstract art and whether it would be able to read the text. As you can see, it struggled a bit with the text. It got the band name, “Capital Cities”, correct, but fudged the AHURRICANEOFFROWNS text, whose words are separated not by spaces but by color, which, I get, would be tricky. But then again, this is VERY VERY good for a 1B parameter model running in my Chrome sandbox with no setup and needing no internet connection once the model has loaded.
The image features a surreal landscape with vibrant, swirling colors. It depicts a fantastical, otherworldly scene with towering, jagged mountains and a winding river that meanders through the landscape. The sky is filled with swirling clouds and stars, creating a dreamlike atmosphere. Two large, glowing eyes dominate the scene, one on the left and the other on the right, giving the impression of a cosmic or celestial theme. The text "CAPITAL CITIES" and "ANOTHER OF FROWNS" is prominently displayed in the upper right corner, suggesting that this is the cover art for an album or a music release. The overall design is psychedelic and imaginative, evoking a sense of wonder and mystery.
This is the 1B parameter version. JanusPro also has a 7B parameter version that should work on consumer graphics cards as well. It is not yet officially available in Ollama.com’s model section, and I have not had much luck with some of the non-official ones there.
Whether you're a developer looking to integrate AI capabilities into your web applications or an enthusiast interested in exploring AI locally, JanusPro represents a significant step forward in accessible, powerful AI tools.
I'm curious what people create with their own machines - please post here if you want to share your generated image.