Suppose you are implementing RAG for your AI app: you have used web APIs or an inference engine to generate a large number of embeddings, and now you need to find the results that best match a query embedding. What do you do?
MLX is a full-featured machine learning framework with easy-to-understand source code and a small binary size, and node-mlx is its JavaScript binding.
MLX only has GPU support on macOS, but its CPU support, implemented with vectorized instructions, is still fast on Linux.
(If you are wondering how we can compute cosine similarities between a 1x1 tensor and a 1xN tensor, it is called broadcasting.)
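To make the idea of broadcasting concrete, here is a minimal plain-JavaScript sketch that does by hand what the framework does for you: a single query embedding is compared against every stored embedding, as if the query had been repeated once per row. It does not use node-mlx, and the names cosineSimilarity and cosineSimilarities, as well as the tiny 2-dimensional vectors, are made up for illustration.

```javascript
// Cosine similarity between two plain arrays of equal length.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// "Broadcast" the single query across all embeddings: one score per embedding.
function cosineSimilarities(query, embeddings) {
  return embeddings.map((embedding) => cosineSimilarity(query, embedding));
}

// Toy data (hypothetical): same direction, orthogonal, and opposite direction.
const query = [1, 0];
const embeddings = [[1, 0], [0, 1], [-1, 0]];
const scores = cosineSimilarities(query, embeddings);
console.log(scores);
```

A tensor library performs the same pairing without an explicit loop in JavaScript, which is where the speedup comes from.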
Once you get the scores array, you can use the usual JavaScript code to filter and sort the results. But you can also use MLX if the number of results is large enough to make the JavaScript engine struggle.
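As a rough sketch of the "usual JavaScript" route, the snippet below assumes the scores have already been converted to a plain Array of numbers. The sample values and the 0.4 threshold are invented for illustration. It keeps each score's original index so the matching document can still be looked up after sorting.

```javascript
// Hypothetical similarity scores, one per stored embedding.
const scores = [0.12, 0.87, 0.45, 0.91, 0.30];
const threshold = 0.4; // assumed cutoff, tune for your data

const results = scores
  // Remember where each score came from before reordering.
  .map((score, index) => ({ index, score }))
  // Drop weak matches.
  .filter((result) => result.score >= threshold)
  // Highest score first.
  .sort((a, b) => b.score - a.score);

console.log(results);
```

For a few thousand scores this is typically fine; the allocation of one object per score is what starts to hurt as the list grows.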
The array.index(mx.Slice(null, null, -1)) code may look alien, but it is just the JavaScript version of Python's array[::-1], which reverses the array. You can of course convert the result to a JavaScript Array first and then call reverse(), but that would be slower if the array is very large.
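For comparison, the plain-JavaScript fallback could look roughly like the sketch below. It assumes the array being reversed holds sort indices in ascending order of score, so reversing it puts the best match first. The argsortAscending helper is hypothetical and only stands in for whatever produced those indices.

```javascript
// Hypothetical helper: indices that would sort `values` in ascending order.
function argsortAscending(values) {
  return values.map((_, i) => i).sort((a, b) => values[a] - values[b]);
}

const scores = [0.12, 0.87, 0.45];
const ascending = argsortAscending(scores);

// Same effect as Python's array[::-1], but on a plain JavaScript Array.
// slice() copies first because reverse() mutates in place.
const descending = ascending.slice().reverse();

console.log(descending);
```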