
Decouple Concurrency and Iterations #88

Open
gregberns opened this issue Oct 6, 2020 · 4 comments

@gregberns

From #83:

Concurrency and iterations are linked, so there's no way to, say, run a large CSV file a single time with concurrency > 1, other than by using something like GNU Parallel to run multiple copies — (since drill is so simple to run that's not actually a bad option).
It feels like it would be useful to have a way to expand the with_items directives so you could decouple those two settings. It would then be legal to say, for example, concurrency: 100 iterations: 1 when you have a large input file, so you'd still be able to see things like the stats, which wouldn't otherwise display if you terminated a large job before it finished many millions of requests.

Use Case: Have a list of 10k parameters to run against a url to validate they return 200. Want each item in the list to be run once, but run with concurrency of 20.

concurrency: 20
iterations: 1            # only want this to run once
base: 'http://localhost'
plan:
  - name: Fetch by id
    request:
      url: /{{ item.id }}
    with_items_from_csv: ./items.csv
Contents of ./items.csv:

id
1
2
3
...
@fcsonline
Owner

The goal of the iterations parameter is to execute the plan N times. All the steps in a plan are executed sequentially. On the other hand, the goal of the concurrency parameter is to execute those iterations in parallel, up to M executions at the same time. So a concurrency value higher than the iterations value doesn't make sense: drill will only run up to the lower of the two values at the same time.
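The relationship described above can be sketched in a few lines. This is a hypothetical illustration, not drill's actual code: the number of plan executions in flight at once is bounded by both settings, so the effective parallelism is the minimum of the two.

```rust
use std::cmp;

// Hypothetical sketch (not drill's implementation): the number of plan
// executions that can run simultaneously is capped by both settings.
fn effective_parallelism(concurrency: usize, iterations: usize) -> usize {
    cmp::min(concurrency, iterations)
}

fn main() {
    // concurrency: 100, iterations: 1 -> only one execution ever runs
    assert_eq!(effective_parallelism(100, 1), 1);
    // concurrency: 4, iterations: 10 -> at most 4 iterations in flight
    assert_eq!(effective_parallelism(4, 10), 4);
    println!("ok");
}
```

This is why `concurrency: 100` with `iterations: 1` behaves like a fully sequential run: the single iteration is the binding constraint.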

In your case, the Fetch by id step is processed atomically by the executor.

@gregberns
Author

I think I see. So you're saying that a plan item is sent to an executor and is executed serially, and the with_items_from_csv property is iterated over within that executor.

Is there a way to generate an independent executor for each item in a CSV file? So a large CSV file could be processed quickly. Or how feasible would it be to make a change to support something like that?

@gregberns
Author

If I'm understanding the code correctly, there are 1 to n iterations, and the concurrency controls how many iterations can run at once.

But, within an iteration, a step is evaluated, and in the case of the mutli-csv-request there is a step created for each row in the CSV file (Code here). Each step is done sequentially, which is controlled here in a simple for loop.

Would it not be possible to do all the 'benchmark executes' in parallel? It looks like it would 'just' require a flag to control whether the user wanted sequential or parallel processing: the sequential path would execute as it does today, and the parallel path would use something like rayon(?) to execute the actions.
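The idea above — replacing the sequential for loop with one task per CSV row — could be sketched as follows. This is a minimal illustration using std::thread rather than the rayon crate suggested above, and `process_row` is a hypothetical stand-in for issuing one HTTP request; it is not drill's code.

```rust
use std::thread;

// Hypothetical stand-in for executing the request for one CSV row.
fn process_row(row: &str) -> String {
    format!("fetched /{}", row)
}

// Instead of iterating rows in a sequential for loop, spawn one thread
// per row and collect the results in order by joining in order.
fn run_parallel(rows: Vec<String>) -> Vec<String> {
    let handles: Vec<_> = rows
        .into_iter()
        .map(|row| thread::spawn(move || process_row(&row)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let rows = vec!["1".to_string(), "2".to_string(), "3".to_string()];
    let results = run_parallel(rows);
    assert_eq!(results, vec!["fetched /1", "fetched /2", "fetched /3"]);
}
```

For a 10k-row file, a thread per row would be wasteful; a work-stealing pool such as rayon's `par_iter`, or a bounded number of workers, would be the more practical shape — but the point is the same: independent rows need not run one after another.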

@fcsonline
Owner

Sorry for the late answer. I have been really busy these weeks.

The problem here is that the benchmark is executed sequentially because actions can have dependencies with previous actions, like storing / using variables.

On the other hand, one thing we can do is execute all mutli-csv-request rows in parallel when the action explicitly opts in.
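As a sketch of what such an opt-in could look like in a plan file — the `parallel` key below is hypothetical and not an existing drill option:

```yaml
plan:
  - name: Fetch by id
    request:
      url: /{{ item.id }}
    with_items_from_csv: ./items.csv
    parallel: true   # hypothetical flag: run rows concurrently since they share no variables
```

Rows that store or read variables would still need the sequential default, since the ordering dependency fcsonline describes would otherwise break.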
