Gemma 4 31b is frequently overloaded

Hi, I'm trying out your service because it is currently the only one that serves Gemma 4 31b with decent throughput (tokens per second).

However, many of my requests fail with HTTP 429: "The model is currently overloaded. Please try again later." This defeats the purpose of switching to your service. Even if I implement retry logic, the total response time grows, so I would end up no better off than with a slower, cheaper provider.

Is there an ETA for fixing this and improving the reliability of Gemma 4 31b?

Thanks,

Max

Status: New Submission
Board: 🐛 Bugs
Date: 9 days ago
Author: Max Loh