
Phi3 vs Llama3: the new generation LLMs

Alberto Sola · 4/26/2024 · 3 min

New language models such as Phi3 feel like a revolution to me, because being much smaller brings several advantages: they are faster, and therefore cheaper, and you can run them on virtually any platform.

Llama3

Meta has recently released Llama3, its new family of large language models (LLMs), which promises to be much more powerful than the previous generation, Llama2.

On Meta's announcement page you can read the benchmarks comparing it with the previous version and with other quite powerful models such as Claude Sonnet (Anthropic's mid-tier model), GPT-3.5, Mistral and, of course, Llama2.

In these synthetic tests you can see how the new Llama3 models substantially improve on the scores of the models they are compared against, but as always, the best thing we can do is test it ourselves, since it is open source.

You can run the smallest model with the following command (see my tutorial on running a model with Ollama):

ollama run llama3:8b
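
Beyond the interactive prompt, Ollama also exposes a local HTTP API (by default on port 11434), so you can query the model from scripts. A minimal sketch, assuming `ollama serve` is already running; the prompt is just an example:

# Send a single, non-streaming request to the local Ollama API;
# the reply comes back as one JSON object with a "response" field.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "Explain in one paragraph why smaller LLMs are cheaper to run.",
  "stream": false
}'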

I have been testing it these days, and there is a noticeable improvement in performance. But the real surprise for me came not here, but in the news that appeared a few days later...

Phi3

Shortly after Meta's release of Llama3, Microsoft released a series of new Phi3 models, which I found impressive after testing them.

I have been amazed at how fast they run locally and at the quality of the results in the various tests I have done. I want to keep testing them in depth to see how far they can go.

The interesting thing is that this new family of models is, on the one hand, much more compact (fewer parameters, smaller size, less computation) and, on the other hand, more powerful than Llama3 (or so they claim).

You can run the model locally with the following command (see my tutorial on running a model with Ollama):

ollama run phi3
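
Since both models run under Ollama, it is easy to throw the same prompt at each one and compare speed and output side by side. A quick sketch, assuming `ollama serve` is running and `jq` is installed (the prompt is arbitrary):

# Run the same non-streaming request against both models,
# timing each one and printing only the generated text.
for model in llama3:8b phi3; do
  echo "=== $model ==="
  time curl -s http://localhost:11434/api/generate \
    -d "{\"model\": \"$model\", \"prompt\": \"Summarize the rules of chess in three sentences.\", \"stream\": false}" \
    | jq -r '.response'
done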

For example, the Phi3-mini model claims a context window of 4K or 128K tokens, depending on the variant. However, in the tests I have done with inputs of 1k~2k tokens, it missed the instruction and did not perform the task. For inputs under 1k tokens, on the other hand, the quality is very good, and the speed is incredible compared to other models.
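
You can reproduce this kind of failure yourself by padding the prompt with filler text and checking whether the instruction is still followed as the input grows. A rough sketch of the test, again assuming `ollama serve` is running and `jq` is available (the filler length is arbitrary, roughly 1k+ tokens):

# Build ~200 repeated filler phrases, joined on one line.
FILLER=$(yes "lorem ipsum dolor sit amet" | head -n 200 | tr '\n' ' ')
# Construct the JSON body with jq so the filler is escaped correctly,
# then check whether the model still obeys the leading instruction.
curl -s http://localhost:11434/api/generate \
  -d "$(jq -n --arg p "Reply only with the word OK. Ignore everything after this sentence: $FILLER" \
        '{model: "phi3", prompt: $p, stream: false}')" \
  | jq -r '.response'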

Conclusion

The world of language models is fascinating, and in my case small language models seem especially interesting for performing tasks, because of their lower cost and faster responses. It is only a matter of time before they improve further.

If you found this article useful, I would appreciate it if you subscribed to my newsletter. You will receive exclusive quality content, and you will also help me enormously. Each subscription supports the work I do and allows me to learn more about the topics you are interested in, so that I can improve the knowledge I share with you.
