Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GCS Writer Sink and fix Gcs naming #11328

Closed

Conversation

majetideepak
Copy link
Collaborator

@majetideepak majetideepak commented Oct 22, 2024

Rename GCS to Gcs in code.
Refactor GCS code.
Refactor InsertTests so that both S3 and GCS can reuse the implementation.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 22, 2024
Copy link

netlify bot commented Oct 22, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 4cf63d6
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/672dffb942b78a00088acc5e

@majetideepak majetideepak force-pushed the gcs-write-sink branch 3 times, most recently from dd238f3 to 7f01fe0 Compare October 24, 2024 12:18
@majetideepak majetideepak marked this pull request as ready for review October 24, 2024 23:29
@majetideepak majetideepak marked this pull request as draft October 24, 2024 23:29
@majetideepak majetideepak marked this pull request as ready for review October 25, 2024 10:36
@majetideepak majetideepak force-pushed the gcs-write-sink branch 3 times, most recently from 10bbb77 to 9da7598 Compare October 25, 2024 16:52
Copy link
Contributor

@pedroerp pedroerp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @majetideepak. Looks good overall; added a bunch of minor comments.

@@ -26,6 +26,14 @@ constexpr std::string_view kGCSScheme{"gs://"};

} // namespace

static std::string_view kLoremIpsum =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be defined in a test header somewhere?


namespace facebook::velox::filesystems {
namespace {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newline after namespace definition

return preexistingBucketPath() + preexistingObjectName();
public:
void SetUp() {
testbench_ = std::make_shared<GCSTestbench>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be called GcsTestBench instead (same for the file)? Also, what is up with the "bench" in the name? Is this a benchmark utility?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses the storage-testbench library. That is probably why. I renamed it to GcsEmulator
https://github.com/googleapis/storage-testbench


filesystems::GCSFileSystem gcfs(testGcsOptions());
filesystems::GCSFileSystem gcfs(testbench_->hiveConfig());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. Acronyms are not all upper cases (GcsFileSystem)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed everywhere.

class GCSTestbench : public testing::Environment {
public:
GCSTestbench() {
auto port = std::to_string(facebook::velox::exec::test::getFreePorts(1)[0]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getFreePort() instead

serverProcess_.terminate();
serverProcess_.wait();
}
if (serverProcess_.valid() && serverProcess_.valid())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this intended (checking the same condition twice?). Also, curly braces missing


namespace facebook::velox::filesystems {

class GCSTestbench : public testing::Environment {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's rename this to something more descriptive and intuitive. "bench" implies benchmarking which is not the intent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Named it as GcsEmulator.


objectName_ = "test-object-name";
google::cloud::StatusOr<gcs::ObjectMetadata> object =
client.InsertObject(bucketName_, objectName_, kLoremIpsum);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably move the loremIpsum stuff here.

@majetideepak majetideepak changed the title Add GCS Writer Sink Add GCS Writer Sink and fix naming Oct 26, 2024
@majetideepak majetideepak changed the title Add GCS Writer Sink and fix naming Add GCS Writer Sink and fix Gcs naming Oct 26, 2024
@majetideepak majetideepak force-pushed the gcs-write-sink branch 2 times, most recently from ead10a1 to 4ee20f9 Compare October 27, 2024 11:13
class GcsFileSystemTest : public testing::Test {
public:
void SetUp() {
testbench_ = std::make_shared<GcsEmulator>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could also call this variable something else (emulator_)?

@facebook-github-bot
Copy link
Contributor

@pedroerp has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@pedroerp pedroerp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @majetideepak

@majetideepak majetideepak added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Oct 29, 2024
@facebook-github-bot
Copy link
Contributor

@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@kgpai merged this pull request in 08dba0f.

Copy link

Conbench analyzed the 1 benchmark run on commit 08dba0f7.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@majetideepak majetideepak deleted the gcs-write-sink branch December 17, 2024 17:52
athmaja-n pushed a commit to athmaja-n/velox that referenced this pull request Jan 10, 2025
Summary:
Rename GCS to Gcs in code.
Refactor GCS code.
Refactor InsertTests so that both S3 and GCS can reuse the implementation.

Pull Request resolved: facebookincubator#11328

Reviewed By: DanielHunte

Differential Revision: D65069865

Pulled By: kgpai

fbshipit-source-id: 4020722f97d9a246d1ed765b697c4f1e8d3a0c2b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants