Comparative Evaluation of Publicly Accessible LLM APIs for Technical Prompt Response: A Case Study Using Gemini, OpenRouter, and GroqCloud

  • Unique Paper ID: 176294
  • Volume: 11
  • Issue: 11
  • PageNo: 6339-6344
  • Abstract: The rapid evolution of large language models (LLMs) has made high-performance natural language understanding and generation accessible through public APIs. However, developers and researchers often face challenges in determining which LLMs provide the most relevant, accurate, and useful outputs for domain-specific tasks such as technical queries and coding assistance. This paper presents a comparative evaluation of three widely available LLMs, Gemini 1.5 Flash (via Google AI Studio), OpenRouter (Nemotron-49B), and GroqCloud (Llama 4 Scout 17B), using a custom-built Streamlit interface. The system prompts all three models simultaneously, displays their responses side by side, and then performs an automatic evaluation using GroqCloud (deepseek-r1-distill-llama-70b), which scores the outputs on five key metrics: Relevance, Technical Accuracy, Clarity, Code Correctness, and Usefulness. Responses are assessed on prompts drawn from the technology and programming domains. Results are visualized in both tabular and graphical form, revealing notable performance variations across the models. Gemini and OpenRouter consistently achieved top scores across all metrics, while GroqCloud showed slightly lower performance in clarity and technical precision. This study provides actionable insights for developers selecting the most effective LLM for real-world technical applications and lays a foundation for automated, multi-model LLM evaluation pipelines.
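The comparison harness the abstract describes can be approximated in a short Streamlit script. The sketch below is illustrative and is not the authors' code: the endpoint URLs follow each provider's publicly documented OpenAI-compatible chat-completions API, and the model slugs (flagged in comments) are assumptions, since the paper names the models but not their exact API identifiers.

```python
# compare.py - minimal sketch of the paper's three-way comparison harness.
# Endpoint URLs follow each provider's documented OpenAI-compatible
# chat-completions API; model slugs are assumptions, not from the paper.
import os

import requests
import streamlit as st

PROVIDERS = {
    "Gemini 1.5 Flash": {
        "url": "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions",
        "key": os.environ.get("GOOGLE_API_KEY", ""),
        "model": "gemini-1.5-flash",
    },
    "OpenRouter (Nemotron-49B)": {
        "url": "https://openrouter.ai/api/v1/chat/completions",
        "key": os.environ.get("OPENROUTER_API_KEY", ""),
        "model": "nvidia/llama-3.3-nemotron-super-49b-v1",  # assumed slug
    },
    "GroqCloud (Llama 4 Scout 17B)": {
        "url": "https://api.groq.com/openai/v1/chat/completions",
        "key": os.environ.get("GROQ_API_KEY", ""),
        "model": "meta-llama/llama-4-scout-17b-16e-instruct",  # assumed slug
    },
}

# The paper's judge model, hosted on GroqCloud.
JUDGE = {
    "url": "https://api.groq.com/openai/v1/chat/completions",
    "key": os.environ.get("GROQ_API_KEY", ""),
    "model": "deepseek-r1-distill-llama-70b",
}

def ask(cfg: dict, prompt: str) -> str:
    """Send one chat-completion request and return the reply text."""
    resp = requests.post(
        cfg["url"],
        headers={"Authorization": f"Bearer {cfg['key']}"},
        json={"model": cfg["model"],
              "messages": [{"role": "user", "content": prompt}]},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def judge(prompt: str, answers: dict) -> str:
    """Have the judge model grade all answers on the paper's five metrics."""
    rubric = (
        "Rate each answer from 1 to 10 on Relevance, Technical Accuracy, "
        "Clarity, Code Correctness, and Usefulness.\n\n"
        f"Question: {prompt}\n\n"
        + "\n\n".join(f"--- {name} ---\n{text}" for name, text in answers.items())
    )
    return ask(JUDGE, rubric)

st.title("LLM API Comparison")
prompt = st.text_area("Technical prompt")
if st.button("Compare") and prompt:
    answers = {}
    # One column per provider, responses rendered side by side.
    for col, (name, cfg) in zip(st.columns(len(PROVIDERS)), PROVIDERS.items()):
        with col:
            st.subheader(name)
            answers[name] = ask(cfg, prompt)
            st.write(answers[name])
    st.subheader("Automatic evaluation")
    st.write(judge(prompt, answers))
```

Run with `streamlit run compare.py` after exporting the three API keys; each provider's answer renders in its own column, followed by the judge model's rubric scores.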
