Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multithreading printing of duplicate results #263

Open
chasingboy opened this issue Jan 25, 2024 · 8 comments
Open

Multithreading printing of duplicate results #263

chasingboy opened this issue Jan 25, 2024 · 8 comments
Labels
bug Something isn't working as expected
Milestone

Comments

@chasingboy
Copy link

Hello, When I use multithreading, the runtime prints duplicate results.

import threading, queue
from alive_progress import alive_bar
import time

q = queue.Queue()

for url in range(1000):
	q.put(url)

def requester(url):
	print(url)

threads = []
with alive_bar(q.qsize()) as bar:
    for _ in range(20):
        while not q.empty():
            bar() # Progress bar
                
            url =  q.get()
            thread = threading.Thread(target=requester,args=(url,))
            thread.setDaemon(True)
            thread.start()
            threads.append(thread)

    for thread in threads:
        thread.join()

Running result

kali$ python3 test.py
on 1: 0
on 1: 0
on 2: 1
on 2: 1
on 3: 2
on 3: 2
on 4: 3
on 5: 4
on 6: 5
on 6: 5
on 7: 6
on 7: 6

What should I do to print the result once? Thanks!!!

@rsalmei
Copy link
Owner

rsalmei commented Jan 27, 2024

Hi,

That is caused by the multiple refreshes of alive_progress, you can't print from different threads simultaneously, which can cause havoc on the internal structures because they are not synchronized. And I can't synchronize them because that would add a lot of overhead in a super hot code.

Anyway, I think what you want is something like this:

import random, time
from concurrent import futures

def requester(url):
    # print(f'requesting: {url}')
    time.sleep(random.random())

with futures.ThreadPoolExecutor(max_workers=3) as executor, alive_bar(100) as bar:
    future_to_url = {executor.submit(requester, url): url for url in range(100)}
    for future in futures.as_completed(future_to_url):
        url = future_to_url[future]
        print(f'finished: {url}')
        bar()

If you uncomment that line everything breaks again.
If you really need to print while on other threads, you should send the lines through a queue, test within the main alive_bar loop if something is to be printed, and print them from there. Note the lines would probably get all randomly scrambled on the screen.

Does that solve it?

@rsalmei
Copy link
Owner

rsalmei commented Jan 27, 2024

This would work for printing:

import random, time, queue
from concurrent import futures

def requester(url):
    q.put(f'requesting: {url}')
    time.sleep(random.random())

q = queue.Queue()
with futures.ThreadPoolExecutor(max_workers=3) as executor, alive_bar(100) as bar:
    future_to_url = {executor.submit(requester, url): url for url in range(100)}
    for future in futures.as_completed(future_to_url):
        while not q.empty():
            print(q.get())
        url = future_to_url[future]
        print(f'finished: {url}')
        bar()

You'll get output like this:

on 0: requesting: 0
on 0: requesting: 1
on 0: requesting: 2
on 0: requesting: 3
on 0: finished: 0
on 1: requesting: 4
on 1: finished: 3
on 2: requesting: 5
on 2: finished: 4
on 3: requesting: 6
on 3: finished: 1
on 4: requesting: 7
on 4: finished: 2
on 5: requesting: 8
on 5: finished: 6
on 6: requesting: 9
on 6: finished: 7
on 7: requesting: 10
on 7: finished: 5
on 8: requesting: 11
...

@TheTechRobo
Copy link
Contributor

Hmm, having strange behaviour when printing from threads sounds like a recipe for disaster. Lots of code, often beyond a developer's control, might try to print in a separate thread.

@rsalmei
Copy link
Owner

rsalmei commented Jan 28, 2024

Actually, you're right @TheTechRobo.
I'll look into it a bit deeper.

@rsalmei
Copy link
Owner

rsalmei commented Jan 28, 2024

Found it, and fixed it 👍
Now the print hooks are correctly synchronized when printing from multiple threads.

I've also improved them a bit along the way. I've never noticed that an empty print() would output an empty "on X:".
Also, I've discovered that Python's str.splitlines() is kinda flawed. 'line'.splitlines() returns 1 line, but 'line\n'.splitlines() does not return 2, which breaks my header algorithm for newlines...

So, using these new synced print hooks, I've tested this code:

import random, time
from concurrent import futures

def requester(url):
    print(f'requesting: {url}\n', end='')  # this is printed from the thread pool.
    time.sleep(2*random.random())

with futures.ThreadPoolExecutor(max_workers=3) as executor, alive_bar(10) as bar:
    print()
    future_to_url = {executor.submit(requester, url): url for url in range(10)}
    for future in futures.as_completed(future_to_url):
        url = future_to_url[future]
        print(f'finished: {url}')  # this from the main thread.
        bar()

It was printing this:

   ...:
on 0:
on 0: requesting: 0
requesting: 1
requesting: 2
requesting: 3
finished: 0
on 1: requesting: 4
finished: 2
on 2: requesting: 5
finished: 3
on 3: requesting: 6
finished: 1
on 4: requesting: 7
finished: 4
on 5: requesting: 8
finished: 7
on 6: requesting: 9
finished: 6
on 7: finished: 5
on 8: finished: 9
on 9: finished: 8
|████████████████████████████████████████| 10/10 [100%] in 3.8s (2.53/s)

And now it will print this:

   ...:

on 0: requesting: 0
      requesting: 1
      requesting: 2
      requesting: 3
      finished: 0
on 1: requesting: 4
      finished: 2
on 2: requesting: 5
      finished: 1
on 3: requesting: 6
      finished: 4
on 4: requesting: 7
      finished: 3
on 5: requesting: 8
      finished: 6
on 6: requesting: 9
      finished: 5
on 7: finished: 8
on 8: finished: 9
on 9: finished: 7
|████████████████████████████████████████| 10/10 [100%] in 3.0s (3.37/s)

@rsalmei
Copy link
Owner

rsalmei commented Jan 28, 2024

Important to say that, thanks to the way that Python calls write on print, it is simply impossible to have 100% accurate lines printed.
Occasionally, if I run the code above several times, I get this:

    ...:

on 0: requesting: 0
      requesting: 1
      requesting: 2
      requesting: 3
      requesting: 4
      requesting: 5
      finished: 4
on 1: finished: 1
on 2: finished: 0
on 3: requesting: 6
      finished: 2requesting: 7
      requesting: 9
      requesting: 8
on 4: finished: 5
on 5: finished: 3
on 6: finished: 6
on 7: finished: 9
on 8: finished: 8
on 9: finished: 7
|████████████████████████████████████████| 10/10 [100%] in 0.0s (8656.14/s)

Unfortunately, Python calls write twice on a single print call. It calls it with the text being printed, then with the end param. The default print(text, end='\n') then makes the calls stdout.write(text) and stdout.write('\n').
And, since these two calls have no synchronization at all, my print hook might end up receiving text1, text2, \n, \n for two prints with text1 and text2 from different threads...
That's why finished: 2 and requesting: 7 ended up glued together, there's no way around it when multi-threading.

@rsalmei rsalmei added this to the 3.2 milestone Jan 28, 2024
@rsalmei rsalmei added the bug Something isn't working as expected label Jan 28, 2024
@chasingboy
Copy link
Author

@rsalmei Thanks! Referring to your code, it is currently running normally.

@rsalmei
Copy link
Owner

rsalmei commented Feb 10, 2024

You're welcome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected
Projects
None yet
Development

No branches or pull requests

3 participants