Optimize CsvInsight w/ striped reading/splitting #19

Open
mpenkov opened this issue Nov 15, 2021 · 0 comments
mpenkov commented Nov 15, 2021

Currently, the preprocessor splits the input file into multiple parts (using split). This step runs on a single core because splitting in its current form cannot be parallelized.

Modify the splitter to run on multiple cores:

  • Open N files, where N is the number of cores
  • Start N subprocesses to read from the input file
  • Each subprocess reads the input file entirely
  • The nth subprocess (n = 0, ..., N-1) writes only the lines where line_number % N == n
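
A minimal sketch of the steps above, assuming zero-based line numbers, a stripe index n that runs from 0 to N-1, and one record per line (quoted fields with embedded newlines would be split across stripes). The names `write_stripe`, `striped_split`, and the output file naming are hypothetical and are not taken from the CsvInsight code base.

```python
import multiprocessing
import os


def write_stripe(input_path, output_path, n, num_stripes):
    """Copy the lines whose zero-based number satisfies line_number % num_stripes == n."""
    # Every worker scans the whole input but writes only its own stripe.
    with open(input_path, "rb") as fin, open(output_path, "wb") as fout:
        for line_number, line in enumerate(fin):
            if line_number % num_stripes == n:
                fout.write(line)


def striped_split(input_path, output_prefix, num_stripes=None):
    """Split input_path into num_stripes files, using one process per stripe."""
    # Default to the number of cores, as proposed in the issue.
    num_stripes = num_stripes or os.cpu_count() or 1
    output_paths = [f"{output_prefix}.{n:04d}" for n in range(num_stripes)]
    workers = [
        multiprocessing.Process(
            target=write_stripe, args=(input_path, path, n, num_stripes)
        )
        for n, path in enumerate(output_paths)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
        if worker.exitcode != 0:
            raise RuntimeError(f"stripe worker exited with code {worker.exitcode}")
    return output_paths
```

Calling `striped_split("input.csv", "part")` from a script guarded by `if __name__ == "__main__":` would produce `part.0000`, `part.0001`, and so on. Each process still reads the full file, so the total read volume grows with N. The approach is only likely to help if per-line work such as decompression or parsing, rather than disk throughput, is the bottleneck.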