In this paper, we briefly compare the two models in terms of average processing time, failure rate, and how processing time grows with query length.
The two models compared are

```python
models = ["whisper-large-v3", "whisper-1"]
```

`whisper-large-v3` is served through the Groq API and `whisper-1` through the OpenAI API. Both are accessed via their Python client libraries, set up as follows:
```python
import os

from groq import Groq
from openai import OpenAI

groq_client = Groq(api_key=os.environ["GROQ_API_KEY"])            # serves whisper-large-v3
openai_client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))  # serves whisper-1
```
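Both client libraries expose the same transcription interface, so one helper can drive either model. A minimal sketch, assuming the audio lives in a local file (the `transcribe` helper and the `sample.wav` path are illustrative, not part of the original test code):

```python
def transcribe(client, model: str, audio_path: str) -> str:
    """Send one transcription request and return the recognized text."""
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(model=model, file=f)
    return result.text

# The same call works for both backends, only the client and model differ.
text = transcribe(groq_client, "whisper-large-v3", "sample.wav")
```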
The test set is generated with TTS and contains 20 speech samples of different lengths, in both Chinese and English.
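As an illustration, such samples could be produced with OpenAI's TTS endpoint; the paper does not specify which TTS system was used, so the model (`tts-1`), voice (`alloy`), and file names below are assumptions:

```python
texts = [
    "Hello, this is a short English test sentence.",
    "你好，这是一段用于测试的中文语音。",
    # ... more sentences of varying length, up to 20 in total
]

for i, text in enumerate(texts):
    # "tts-1" and the "alloy" voice are illustrative choices, not from the paper.
    speech = openai_client.audio.speech.create(
        model="tts-1", voice="alloy", input=text, response_format="wav"
    )
    speech.write_to_file(f"sample_{i}.wav")
```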
The test iterates over the test set one query at a time, recording query length, processing time, and failures; a sketch of this loop is given below. After every 10 queries, the run pauses for 60 seconds to avoid issues such as exceeding the API rate limit. The test set contains no duplicate text, so in theory caching does not affect the results.
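A minimal sketch of this measurement loop, assuming the `transcribe` helper above and a list of `(audio_path, text)` pairs named `test_set` (both illustrative names):

```python
import time

def benchmark(client, model: str, test_set):
    """Run every query once; return (query_length, seconds, ok) per query."""
    records = []
    for i, (audio_path, text) in enumerate(test_set, start=1):
        start = time.perf_counter()
        try:
            transcribe(client, model, audio_path)
            ok = True
        except Exception:
            ok = False
        records.append((len(text), time.perf_counter() - start, ok))
        if i % 10 == 0:
            time.sleep(60)  # back off to stay under the API rate limit
    return records
```

Average time is the mean of the recorded durations over successful queries, and failure rate is the share of records where the request raised an error.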
Model | Average time (s) | Failure rate | Overall query length |
---|---|---|---|
whisper-large-v3 | 1.999 | 0.0 | 136.296 |
whisper-1 | 0.874 | 0.0 | 136.296 |
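For completeness, a sketch of how the table values can be derived from the per-query records produced by `benchmark` above, assuming the overall query length column is the mean query length over the test set (the `summarize` name is illustrative):

```python
def summarize(records):
    """Reduce per-query records to the three columns reported above."""
    times = [t for (_, t, ok) in records if ok]
    avg_time = sum(times) / len(times)
    fail_rate = sum(not ok for (_, _, ok) in records) / len(records)
    avg_len = sum(l for (l, _, _) in records) / len(records)
    return avg_time, fail_rate, avg_len
```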
LLM-Comparison by Haozhe Li is licensed under CC BY-NC 4.0