
[Bug]: Rate Limit Errors when using with PaperQA #7358

Open
gurugecl opened this issue Dec 22, 2024 · 0 comments
Labels
bug Something isn't working mlops user request

gurugecl commented Dec 22, 2024

What happened?

Starting recently, I keep getting rate limit errors when using models like Gemini 2.0 Flash, even though the number of requests I'm initiating should be below the rate limit. This was previously working fine. I am using LiteLLM via PaperQA. There also seems to be an async issue (an event-loop error, shown in the logs below); it was not previously causing a rate limit error, but I'm not sure whether the two are related. I've tried a number of ways to avoid hitting the rate limit, but none have worked so far, so any assistance would be greatly appreciated.

https://github.com/Future-House/paper-qa

I've also seen this message when using gpt-4o, but it still works without issue in that case:

AFC is enabled with max remote calls: 10.

Below is how I am setting up the Settings object which then throws the rate limit error.

import os

from paperqa import Docs, Settings

settings = Settings(
    llm="gemini/gemini-2.0-flash-exp",
    summary_llm="gemini/gemini-2.0-flash-exp",
    llm_config={
        "model_list": [{
            "model_name": "gemini/gemini-2.0-flash-exp",
            "litellm_params": {
                "model": "gemini/gemini-2.0-flash-exp",
                "api_key": os.environ.get("GEMINI_API_KEY"),
            },
        }]
    },
    summary_llm_config={
        "model_list": [{
            "model_name": "gemini/gemini-2.0-flash-exp",
            "litellm_params": {
                "model": "gemini/gemini-2.0-flash-exp",
                "api_key": os.environ.get("GEMINI_API_KEY"),
            },
        }]
    },
)

# `docs` is an existing paperqa Docs instance populated earlier;
# `relevancy` is an integer multiplier defined elsewhere in my code
max_choices = len(list(docs.docnames))
settings.answer.answer_max_sources = max_choices
settings.answer.evidence_k = relevancy * max_choices

model_response = docs.query(model_input, settings=settings)

Relevant log output

litellm.acompletion(model=gemini/gemini-2.0-flash-exp) Exception litellm.APIConnectionError: <asyncio.locks.Event object at 0x3855b0950 [unset]> is bound to a different event loop
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/litellm/main.py", line 421, in acompletion
    response = await init_response
               ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/litellm/llms/vertex_ai_and_google_ai_studio/gemini/vertex_and_google_ai_studio_gemini.py", line 1206, in async_completion
    response = await client.post(api_base, headers=headers, json=request_body)  # type: ignore
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/litellm/llms/custom_httpx/http_handler.py", line 138, in post
    raise e
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/litellm/llms/custom_httpx/http_handler.py", line 100, in post
    response = await self.client.send(req, stream=stream)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpx/_client.py", line 1661, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpx/_client.py", line 1689, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpx/_client.py", line 1726, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpx/_client.py", line 1763, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpx/_transports/default.py", line 373, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 216, in handle_async_request
    raise exc from None
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 196, in handle_async_request
    response = await connection.handle_async_request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/connection.py", line 101, in handle_async_request
    return await self._connection.handle_async_request(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/http11.py", line 143, in handle_async_request
    raise exc
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/http11.py", line 113, in handle_async_request
    ) = await self._receive_response_headers(**kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/http11.py", line 186, in _receive_response_headers
    event = await self._receive_event(timeout=timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/http11.py", line 224, in _receive_event
    data = await self._network_stream.read(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 35, in read
    return await self._stream.receive(max_bytes=max_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/anyio/streams/tls.py", line 205, in receive
    data = await self._call_sslobject_method(self._ssl_object.read, max_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/anyio/streams/tls.py", line 147, in _call_sslobject_method
    data = await self.transport_stream.receive()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1142, in receive
    await self._protocol.read_event.wait()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/asyncio/locks.py", line 210, in wait
    fut = self._get_loop().create_future()
          ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/asyncio/mixins.py", line 20, in _get_loop
    raise RuntimeError(f'{self!r} is bound to a different event loop')
RuntimeError: <asyncio.locks.Event object at 0x3855b0950 [unset]> is bound to a different event loop

03:21:53 - LiteLLM:INFO: utils.py:2977 - 
LiteLLM completion() model= gemini-2.0-flash-exp; provider = gemini
HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent?key=xxxxxx "HTTP/1.1 429 Too Many Requests"
03:21:53 - LiteLLM Router:INFO: router.py:849 - litellm.acompletion(model=gemini/gemini-2.0-flash-exp) Exception litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "Resource has been exhausted (e.g. check quota).",
    "status": "RESOURCE_EXHAUSTED"
  }
}

(The same RateLimitError block repeats several more times in the log.)

Are you a ML Ops Team?

Yes

What LiteLLM version are you on ?

1.45.0

Twitter / LinkedIn details

No response

@gurugecl gurugecl added the bug Something isn't working label Dec 22, 2024
@gurugecl gurugecl changed the title [Bug]: Rate Limit Errors with models other than OAI [Bug]: Rate Limit Errors when using with PaperQA Dec 22, 2024