
[Bug]: Rate Limit Errors when using with PaperQA #7358

Open
gurugecl opened this issue Dec 22, 2024 · 0 comments
Labels
bug Something isn't working mlops user request

gurugecl commented Dec 22, 2024

What happened?

Starting recently, I keep getting rate limit errors when using models like Gemini 2.0 Flash, even though the number of requests I'm initiating should be below the rate limit. This was previously working fine. I am using LiteLLM via PaperQA. There also seems to be an async issue (an event-loop error, shown in the logs below); it was not previously causing a rate limit error, but I'm not sure whether the two are related. I've tried a number of ways to avoid hitting the rate limit, but none have worked so far, so any assistance would be greatly appreciated.

https://github.com/Future-House/paper-qa

I've also seen this message when using gpt-4o, but it still works without issue in that case:

AFC is enabled with max remote calls: 10.

Below is how I am setting up the Settings object which then throws the rate limit error.

import os

from paperqa import Docs, Settings

settings = Settings(
    llm="gemini/gemini-2.0-flash-exp",
    summary_llm="gemini/gemini-2.0-flash-exp",
    llm_config={
        "model_list": [{
            "model_name": "gemini/gemini-2.0-flash-exp",
            "litellm_params": {
                "model": "gemini/gemini-2.0-flash-exp",
                "api_key": os.environ.get("GEMINI_API_KEY"),
            },
        }]
    },
    summary_llm_config={
        "model_list": [{
            "model_name": "gemini/gemini-2.0-flash-exp",
            "litellm_params": {
                "model": "gemini/gemini-2.0-flash-exp",
                "api_key": os.environ.get("GEMINI_API_KEY"),
            },
        }]
    },
)

# `docs` is an existing paperqa Docs instance populated earlier;
# `relevancy` is an integer multiplier defined elsewhere in my code
max_choices = len(list(docs.docnames))
settings.answer.answer_max_sources = max_choices
settings.answer.evidence_k = relevancy * max_choices

model_response = docs.query(model_input, settings=settings)

Relevant log output

litellm.acompletion(model=gemini/gemini-2.0-flash-exp) Exception litellm.APIConnectionError: <asyncio.locks.Event object at 0x3855b0950 [unset]> is bound to a different event loop
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/litellm/main.py", line 421, in acompletion
    response = await init_response
               ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/litellm/llms/vertex_ai_and_google_ai_studio/gemini/vertex_and_google_ai_studio_gemini.py", line 1206, in async_completion
    response = await client.post(api_base, headers=headers, json=request_body)  # type: ignore
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/litellm/llms/custom_httpx/http_handler.py", line 138, in post
    raise e
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/litellm/llms/custom_httpx/http_handler.py", line 100, in post
    response = await self.client.send(req, stream=stream)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpx/_client.py", line 1661, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpx/_client.py", line 1689, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpx/_client.py", line 1726, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpx/_client.py", line 1763, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpx/_transports/default.py", line 373, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 216, in handle_async_request
    raise exc from None
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 196, in handle_async_request
    response = await connection.handle_async_request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/connection.py", line 101, in handle_async_request
    return await self._connection.handle_async_request(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/http11.py", line 143, in handle_async_request
    raise exc
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/http11.py", line 113, in handle_async_request
    ) = await self._receive_response_headers(**kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/http11.py", line 186, in _receive_response_headers
    event = await self._receive_event(timeout=timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/http11.py", line 224, in _receive_event
    data = await self._network_stream.read(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 35, in read
    return await self._stream.receive(max_bytes=max_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/anyio/streams/tls.py", line 205, in receive
    data = await self._call_sslobject_method(self._ssl_object.read, max_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/anyio/streams/tls.py", line 147, in _call_sslobject_method
    data = await self.transport_stream.receive()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1142, in receive
    await self._protocol.read_event.wait()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/asyncio/locks.py", line 210, in wait
    fut = self._get_loop().create_future()
          ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/asyncio/mixins.py", line 20, in _get_loop
    raise RuntimeError(f'{self!r} is bound to a different event loop')
RuntimeError: <asyncio.locks.Event object at 0x3855b0950 [unset]> is bound to a different event loop

03:21:53 - LiteLLM:INFO: utils.py:2977 - 
LiteLLM completion() model= gemini-2.0-flash-exp; provider = gemini
HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent?key=xxxxxx "HTTP/1.1 429 Too Many Requests"
03:21:53 - LiteLLM Router:INFO: router.py:849 - litellm.acompletion(model=gemini/gemini-2.0-flash-exp) Exception litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "Resource has been exhausted (e.g. check quota).",
    "status": "RESOURCE_EXHAUSTED"
  }
}

(The same RateLimitError block repeats several more times in the log.)

Are you a ML Ops Team?

Yes

What LiteLLM version are you on ?

1.45.0

Twitter / LinkedIn details

No response

@gurugecl gurugecl added the bug Something isn't working label Dec 22, 2024
@gurugecl gurugecl changed the title [Bug]: Rate Limit Errors with models other than OAI [Bug]: Rate Limit Errors when using with PaperQA Dec 22, 2024