Skip to content
This repository has been archived by the owner on Jul 8, 2024. It is now read-only.

Added Multi-threaded search manager #23

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

giulionf
Copy link

@giulionf giulionf commented Mar 2, 2019

The ConcurrentTweetManager can be given a number of
workers that search simultaneously.

Resolves: #9

@Mottl
Copy link
Owner

Mottl commented Mar 2, 2019

See my code comments for #21

@giulionf
Copy link
Author

giulionf commented Mar 2, 2019

Ah sorry I did not notice the comments! I will check them out and fix what you suggested

@Mottl
Copy link
Owner

Mottl commented Mar 2, 2019

Apropos time_spans, tweets and workers variables. Since getTweets() was originally defined as a @staticmethod these variables should be defined in a local scope of getTweets() (not as class variables as I suggested in my previous code review).

@giulionf
Copy link
Author

giulionf commented Mar 2, 2019

Makes sense. I will do that!

The ConcurrentTweetManager can be given a number of
workers that search simultaneously.

Resolves: Mottl#4
@giulionf giulionf force-pushed the concurrent_tweet_manager branch from 967d983 to c541845 Compare March 2, 2019 15:55
@giulionf
Copy link
Author

giulionf commented Mar 2, 2019

@Mottl I made changes to what you suggested. Mind having another review?

Copy link
Owner

@Mottl Mottl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What to you think, do we actually need a different class (ConcurrentTweetManager) for parallel downloading?

GetOldTweets3/manager/ConcurrentTweetManager.py Outdated Show resolved Hide resolved
GetOldTweets3/manager/ConcurrentTweetManager.py Outdated Show resolved Hide resolved
GetOldTweets3/manager/ConcurrentTweetManager.py Outdated Show resolved Hide resolved
GetOldTweets3/manager/ConcurrentTweetManager.py Outdated Show resolved Hide resolved
@giulionf giulionf force-pushed the concurrent_tweet_manager branch from bbd72aa to 9e2987c Compare March 3, 2019 23:35
@giulionf
Copy link
Author

giulionf commented Mar 3, 2019

@Mottl I resolved the discussions and implemented the receive buffer callback as requested. Mind having a testrun with:

--username "barackobama" --workers 10

to see if everything is how you wanted it?

EDIT: Im getting 12244 as the count of the tweets fetched with this query, all ordered by time

@giulionf giulionf force-pushed the concurrent_tweet_manager branch from 9e2987c to 3731a7d Compare March 5, 2019 10:58
@giulionf
Copy link
Author

@Mottl hey! Have you seen my changes?

@giulionf
Copy link
Author

giulionf commented Apr 6, 2019

@Mottl sorry to bug you, but what is the state? Is there anything I should do?

@Rannita
Copy link

Rannita commented Feb 8, 2020

hello.
I have a question concerning your code in ConcurrentTweetManager.py. You used threads for collecting Twitter data. You made a static number of threads and been defined by the user?

@giulionf
Copy link
Author

giulionf commented Feb 9, 2020

hello.
I have a question concerning your code in ConcurrentTweetManager.py. You used threads for collecting Twitter data. You made a static number of threads and been defined by the user?

Sorry, I'm not understanding. Are you asking how to set the number or worker threads? If so, have a look at the tests I made.

Copy link

@xaiki xaiki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really like to get this merged

if workers > 1:
got.manager.ConcurrentTweetManager.getTweets(tweetCriteria, receiveBufferConcurrent, debug=debug, worker_count=workers, forceMaxTweets=force_max_tweets)
else:
got.manager.TweetManager.getTweets(tweetCriteria, receiveBuffer, debug=debug)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use Concurrent even when workers = 1 ?
less code paths less bugs.


if date_diff_per_worker < timedelta(days=1):
max_workers = int((until_date - since_date) / timedelta(days=1))
raise ValueError("Too many workers for the time span. Each worker needs at least one day for himself, or"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/himself/herself (or itself)


if tweetCriteria.maxTweets != 0 and not forceMaxTweets:
raise ValueError("Max Tweets is not supported by parallel downloading, since the results can not be ordered"
" by time. If you do not care, you can set forceMaxTweets=True!")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except it's actually sorted in the sorted() call at the end

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Could the program run multiple queries in parallel?
4 participants