Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat] - additional buffer pool #2829

Merged
merged 14 commits into from
May 16, 2024

Conversation

ahrav
Copy link
Collaborator

@ahrav ahrav commented May 10, 2024

Description:

This PR introduces a second buffer pool alongside the existing one to optimize the handling of different content types when dealing with archives. The original buffer pool, with a default size of 4kB, will be used for diff content, while the newly added buffer pool will have a larger buffer size of 64kB and will be dedicated to file content.

This change is based on tests conducted across ~120k repositories, which showed that most files handled during archive operations exceeded 4kB. Consequently, there was a significant increase in operations needed to expand the underlying buffer. By maintaining separate buffer pools for diff content and file content, we can reduce the overhead associated with constructing BufferedFileWriters and improve overall performance.

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

@ahrav ahrav marked this pull request as ready for review May 10, 2024 22:31
@ahrav ahrav requested a review from a team as a code owner May 10, 2024 22:31
@ahrav ahrav requested a review from a team May 10, 2024 22:31
pkg/buffers/buffer/buffer.go Outdated Show resolved Hide resolved
// Pool of buffers.
type Pool struct {
*sync.Pool
bufferSize uint32
bufferSize int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason is convenience. Using an int instead of a uint32 eliminates the need to cast in all the places where it's used.

pkg/writers/buffer_writer/bufferwriter.go Outdated Show resolved Hide resolved
pkg/writers/buffer_writer/bufferwriter.go Outdated Show resolved Hide resolved
pkg/writers/buffered_file_writer/bufferedfilewriter.go Outdated Show resolved Hide resolved
return w
}

// NewFromReader creates a new instance of BufferedFileWriter and writes the content from the provided reader to the writer.
func NewFromReader(r io.Reader, opts ...Option) (*BufferedFileWriter, error) {
opts = append(opts, WithBufferSize(Large))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this right, you only ever use the explicit option to use the large buffer pool, and get the default one via default method calls. I think this is a little awkward in a way that could be remediated by making the option a boolean called something like WithUseLargeBuffer. Did you consider something like that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I did consider that approach. I thought making it configurable might be appropriate, anticipating additional sizes for future use cases. However, I may have jumped the gun and overcomplicated the solution.

Base automatically changed from refactor-pool-logic to chore-relocate-buffers-pkg May 13, 2024 21:32
@ahrav ahrav requested a review from rosecodym May 13, 2024 21:44
@CLAassistant
Copy link

CLAassistant commented May 16, 2024

CLA assistant check
All committers have signed the CLA.

@ahrav ahrav requested review from a team May 16, 2024 17:12
@ahrav ahrav merged commit 5e3d660 into chore-relocate-buffers-pkg May 16, 2024
9 of 10 checks passed
@ahrav ahrav deleted the feat-additional-buffer-pool branch May 16, 2024 21:30
ahrav added a commit that referenced this pull request May 16, 2024
* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* Handle non-archive data within the DefaultHandler

* make structs and methods private

* Remove non-archive data handling within sources

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Handle non-archive data within the DefaultHandler

* rebase

* Remove non-archive data handling within sources

* Adjust check for rpm/deb archive type

* add additional deb mime type

* add gzip

* move diskbuffered rereader setup into handler pkg

* remove DiskBuffereReader creation logic within sources

* update comment

* move rewind closer

* reduce log verbosity

* add metrics for file handling

* add metrics for errors

* make defaultBufferSize a const

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* Allow git cat-file blob to complete before trying to handle the file

* wrap compReader with DiskbufferReader

* Allow git cat-file blob to complete before trying to handle the file

* updates

* use buffer writer

* update

* refactor

* update context pkg

* revert stuff

* update test

* fix test

* remove

* use correct reader

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* Update write method in contentWriter interface

* Add bufferReadSeekCloser

* update name

* update comment

* fix lint

* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Handle non-archive data within the DefaultHandler

* rebase

* Remove non-archive data handling within sources

* Handle non-archive data within the DefaultHandler

* add gzip

* move diskbuffered rereader setup into handler pkg

* remove DiskBuffereReader creation logic within sources

* update comment

* move rewind closer

* reduce log verbosity

* make defaultBufferSize a const

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* wrap compReader with DiskbufferReader

* Allow git cat-file blob to complete before trying to handle the file

* updates

* use buffer writer

* update

* refactor

* update context pkg

* revert stuff

* update test

* remove

* rebase

* go mod tidy

* lint check

* update metric to ms

* update metric

* update comments

* dont use ptr

* update

* fix

* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Adjust check for rpm/deb archive type

* add additional deb mime type

* update comment

* go mod tidy

* update go mod

* Add a buffered file reader

* update comments

* use Buffered File Readder

* return buffer

* update

* fix

* return

* go mod tidy

* merge

* use a shared pool

* use sync.Once

* reorganzie

* remove unused code

* fix double init

* fix stuff

* nil check

* reduce allocations

* updates

* update metrics

* updates

* reset buffer instead of putting it back

* skip binaries

* skip

* concurrently process diffs

* close chan

* concurrently enumerate orgs

* increase workers

* ignore pbix and vsdx files

* add metrics for gitparse's Diffchan

* fix metric

* update metrics

* update

* fix checks

* fix

* inc

* update

* reduce

* Create workers to handle binary files

* modify workers

* updates

* add check

* delete code

* use custom reader

* rename struct

* add nonarchive handler

* fix break

* add comments

* add tests

* refactor

* remove log

* do not scan rpm links

* simplify

* rename var

* rename

* fix benchmark

* add buffer

* buffer

* buffer

* handle panic

* merge main

* merge main

* add recover

* revert stuff

* revert

* revert to using reader

* fixes

* remove

* update

* fixes

* linter

* fix test

* move buffers pkg out of writers pkg

* rename

* [refactor] - move buffer pool logic into own pkg (#2828)

* move buffer pool logic into own pkg

* fix test

* fix test

* whoops

* [feat] - additional buffer pool (#2829)

* move buffer pool logic into own pkg

* move

* fix test

* fix test

* fix test

* remove

* fix test

* whoops

* revert

* fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants