
Forbid await in for loop #1600

Open
sobolevn opened this issue Sep 13, 2020 · 10 comments
Assignees
Labels
good first issue Entrypoint to the project · Hacktoberfest Hacktoberfest fun! · help wanted Extra attention is needed · rule request Adding a new rule

Comments

@sobolevn
Member

https://github.com/Instagram/Fixit/blob/master/fixit/rules/gather_sequential_await.py
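Judging by its name, the linked Fixit rule targets sequential awaits that could run concurrently via asyncio.gather. A minimal sketch of the pattern (my own illustration, not the rule's code):

```python
import asyncio

async def slow(n):
    # Stand-in for an awaitable unit of work.
    await asyncio.sleep(0.01)
    return n

async def sequential():
    a = await slow(1)  # the second call cannot start until this one finishes
    b = await slow(2)
    return a + b

async def concurrent():
    # Both calls run concurrently; total time is one sleep, not two.
    a, b = await asyncio.gather(slow(1), slow(2))
    return a + b

print(asyncio.run(sequential()), asyncio.run(concurrent()))  # 3 3
```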

@sobolevn sobolevn added help wanted Extra attention is needed rule request Adding a new rule good first issue Entrypoint to the project labels Sep 13, 2020
@Sxderp
Contributor

Sxderp commented Sep 16, 2020

What if you're using aiohttp or something similar to fetch web pages? Any considerate person writing a bot will space out their requests. How would you implement that without awaiting in the for loop? I suppose you could spin the work off to an actual thread and do it entirely synchronously, but that feels like a lot of setup just to avoid an await in a loop.

* Note, we're making the assumption that the code is also doing something other than fetching pages.

@shivang970

Hi @sobolevn
I want to get involved in this project. It's my first time contributing to an open source project, hoping you could help me...

@sobolevn
Member Author

@shivang970 you would need to use visit_For somewhere in https://github.com/wemake-services/wemake-python-styleguide/blob/master/wemake_python_styleguide/visitors/ast/loops.py

Then use ast.walk on that node to make sure there are no await nodes inside.
You can have a look at this implementation, it is pretty similar: https://github.com/wemake-services/wemake-python-styleguide/blob/master/wemake_python_styleguide/visitors/ast/loops.py#L177-L179

Thanks a lot for your help!
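A minimal sketch of the check described above (a standalone illustration, not the project's actual visitor base classes): visit each for node and ast.walk it looking for Await nodes.

```python
import ast

class AwaitInLoopChecker(ast.NodeVisitor):
    """Collects line numbers of `await` expressions inside for loops."""

    def __init__(self):
        self.violations = []

    def visit_For(self, node):
        self._check(node)
        self.generic_visit(node)

    def visit_AsyncFor(self, node):  # also cover `async for`
        self._check(node)
        self.generic_visit(node)

    def _check(self, node):
        # ast.walk yields the node and all of its descendants.
        for child in ast.walk(node):
            if isinstance(child, ast.Await):
                self.violations.append(child.lineno)

code = """
async def main():
    for url in urls:
        await fetch(url)
"""
checker = AwaitInLoopChecker()
checker.visit(ast.parse(code))
print(checker.violations)  # [4] -- the line of the `await`
```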

@sobolevn
Member Author

@Sxderp
Contributor

Sxderp commented Sep 23, 2020

I understand how gather works, but there are workflows in which we don't want to "start off" 100 connection requests. Using gather would cause everything to initialize in fairly quick succession. I can see something like gathering in increments of four, but you are still waiting in a loop.

Anyway, I think rate limiting is definitely an appropriate use.
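The "gathering in increments of four" idea can be sketched as follows (my own illustration, with a dummy fetch standing in for a real HTTP call): at most four requests run at a time, yet the outer loop still awaits.

```python
import asyncio

async def fetch(url):
    # Stand-in for a real HTTP request.
    await asyncio.sleep(0.01)
    return url

async def fetch_in_batches(urls, batch_size=4):
    results = []
    for i in range(0, len(urls), batch_size):
        batch = urls[i:i + batch_size]
        # Each batch runs concurrently, but we still await in a for loop,
        # so no more than batch_size requests are in flight at once.
        results += await asyncio.gather(*(fetch(u) for u in batch))
    return results

urls = [f'https://example.com/{n}' for n in range(10)]
results = asyncio.run(fetch_in_batches(urls))
print(len(results))  # 10
```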

@Sxderp
Contributor

Sxderp commented Sep 23, 2020

I can get behind a Rule that forbids await in list comprehensions or a single line for loop. I think in most circumstances that would be a mistake. The rate limiting could potentially still apply, but if you need rate limiting you're generally doing other things with the data (more than one line).
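For concreteness, here is what that narrower rule would flag (a sketch with a dummy coroutine): an await inside a comprehension runs the calls one at a time, which is usually a mistake when the equivalent gather form exists.

```python
import asyncio

async def double(n):
    await asyncio.sleep(0)
    return n * 2

async def main():
    # await inside a comprehension: each call waits for the previous one --
    # the pattern the comment above suggests forbidding.
    sequential = [await double(n) for n in range(3)]
    # The equivalent concurrent form:
    concurrent = await asyncio.gather(*(double(n) for n in range(3)))
    return sequential, concurrent

seq, conc = asyncio.run(main())
print(seq, list(conc))  # [0, 2, 4] [0, 2, 4]
```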

@sobolevn
Member Author

Can you please provide a real-ish code sample with rate limiting in a loop?

@Sxderp
Contributor

Sxderp commented Sep 23, 2020

Here is a stripped-down sample of something similar to what I've used. I can also think of IPC examples in which state is encoded in the response and you need to send part of the response back in order to perform your next request, something like a Redis "SCAN" operation. I have no actual implementations doing that with asyncio, but it's just a thought.

import aiohttp
import asyncio
import random
from bs4 import BeautifulSoup

# Galleries would normally be fetched too. But for simplicity's sake,
# hardcode them.
GALLERIES = [
    'https://example.com/gallery/1',
    'https://example.com/gallery/2',
    'https://example.com/gallery/3',
    'https://example.com/gallery/4',
]

async def extract_image_srcs(session, gallery_url):
    async with session.get(gallery_url) as gallery_response:
        gallery_soup = BeautifulSoup(await gallery_response.text(), features='lxml')
        return [img.attrs['src'] for img in gallery_soup.findAll('img')]

async def get_image_data(session, image_src):
    async with session.get(image_src) as image_response:
        return await image_response.read()

async def fetch_gallery(gallery_url):
    session = aiohttp.ClientSession()
    image_sources = await extract_image_srcs(session, gallery_url)
    for image_src in image_sources:
        # This could technically be "gathered", but await is used to artificially
        # reduce the number of potential requests. We can't guarantee any
        # specific number, but it'll definitely be less than doing everything
        # at once.
        image_data = await get_image_data(session, image_src)
        with open(str(random.randint(1, 255)), 'wb') as fd:
            fd.write(image_data)
        # Optionally include additional waiting if we want to be extra nice.
        await asyncio.sleep(random.randint(1, 10))
    await session.close()

async def bootstrap():
    await asyncio.gather(*[fetch_gallery(gallery_url) for gallery_url in GALLERIES])

asyncio.run(bootstrap())

@sobolevn sobolevn added the Hacktoberfest label Sep 30, 2020
@strmwalker

You should use asyncio.Semaphore if you need to rate-limit your requests.
But asyncio.sleep(0) may be useful if you have an async context but are doing a CPU-intensive task (crunching image data in the script), to allow other coroutines to advance.
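The Semaphore approach can be sketched like this (a dummy fetch stands in for the real request): gather launches everything, while the semaphore caps how many coroutines are actually inside the request section at once.

```python
import asyncio

async def fetch(url, sem):
    async with sem:  # at most 3 coroutines pass this point concurrently
        await asyncio.sleep(0.01)  # stand-in for the actual request
        return url

async def main(urls):
    sem = asyncio.Semaphore(3)
    # All tasks are scheduled at once, but the semaphore limits
    # how many requests are in flight at any moment.
    return await asyncio.gather(*(fetch(u, sem) for u in urls))

urls = [f'https://example.com/{n}' for n in range(8)]
results = asyncio.run(main(urls))
print(len(results))  # 8
```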

@Sxderp
Contributor

Sxderp commented Aug 3, 2022

If you're rate-limiting with a concurrency of more than one, I can agree. If it's one, I'm not sure I see the point.
