Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Wow. I've suspected for a while that Bard's performance has been limited mostly by cost. Google isn't charging for Bard and they didn't want to run a gigantic model for everyone for free forever. Maybe they made a breakthrough in inference cost for their better models? Or maybe they got tired of everyone clowning on them for being behind and decided to eat the cost for a while.

I still think they ought to launch a subscription so we can see their absolute best model running in public.



The trick is to access the "bard-jan-24-gemini-pro" model, available in direct chat mode here: https://chat.lmsys.org/. Significantly better than the prior model.


how odd! What exactly is lmsys using? Some hidden API that google give them so they can have a better ranking there?


Most likely through this platform: https://console.cloud.google.com/vertex-ai


Thanks. I managed to google and get two different API endpoints.

From the vertex ai:

    API_ENDPOINT="us-central1-aiplatform.googleapis.com"
    PROJECT_ID="test00"
    MODEL_ID="gemini-pro"
    LOCATION_ID="us-central1"
    
    curl \
    -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/publishers/google/models/${MODEL_ID}:streamGenerateContent" -d '@request.json'
and from the makersuite:

    curl \
      -X POST https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=${API_KEY} \
      -H 'Content-Type: application/json' \
      -d '@request.json'


Created a simple app to test Gemini here:

https://github.com/dssjon/gemini/blob/main/app.py


> Some hidden API that google give them so they can have a better ranking there?

I don't know about that second part - but it would make sense that google (and others) may want to use lmsys's arena to benchmark their models.

After all, Human A/B tests are far better then the current automated benchmarks.

I would like more info from lmsys as to how they're accessing these though.


Thanks for sharing. Is this a free way to access GPT4-turbo then or are there some limitations?


New information from a Google employee: this new leaderboard entry (Bard - Gemini Pro) is a different fine-tune than the previous one (Gemini Pro - Dev API), but more importantly it "has access to the Internet" which I assume means it uses Google Search when generating answers. I bet this is responsible for the boost!

Does anyone know if the GPT-4 Turbo version used on the leaderboard has access to web search? I always assumed it did not, but now it doesn't seem like an apples-to-apples comparison.

https://x.com/asadovsky/status/1750983142041911412?s=20

Edit: I used the "Direct Chat" feature on lmsys to ask Bard and GPT-4 Turbo "What is the current price of Bitcoin?". Sure enough GPT-4 Turbo said it can't browse the Internet and Bard gave a real time answer from Google Search. This means GPT-4 outperforms Bard overall even without the ability to browse the web at all. Pretty impressive.

These seem like different categories; one is a model and one is a system with a model plus tools. I think it is useful to compare them, since there is a real difference in user experience. However, they ought to be prominently marked as different categories. And the lmsys guys ought to put a ChatGPT model on the leaderboard with its own search integration enabled, for a fairer comparison. And it would be cool to have other LLM+tools entries like Perplexity, Phind, etc.


I think their play is the same as always

it's better to let more people interact with it because this will help training the model (get more data) so it must be free to use.


Google do have an inference advantage with TPUs.

Everyone else needs to pay nvidia margins.

Training is murkier as it’s more about the total performance and scalability of the system.




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: