Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Package: Analytics Engine #399

Open
wants to merge 29 commits into
base: master
Choose a base branch
from

Conversation

CraigglesO
Copy link
Contributor

@CraigglesO CraigglesO commented Oct 1, 2022

Resolves cloudflare/workers-sdk#4383

TODO:

  • QUANTILEWEIGHTED function
  • parse "INTERVAL 'X' DAY" and build a function
  • vitest tests
  • jest tests
  • move sql exec into AnalyticsEngine constructor
  • fetch support
  • formatting

NOTES:

  • Miniflare api includes async getAnalyticsEngine(name: string): Promise<AnalyticsEngine>
  • shared-test-environment global state includes async queryMiniflareAnalyticsEngine(query: string): Promise<any> for jest and vitest.
  • http-server package function createRequestListener includes /cdn-cgi/mf/analytics_engine/sql
  • sqlite doesn't have strong aggregate support, so the implementation of QUANTILEWEIGHTED is pretty weak against large scale data.

@changeset-bot
Copy link

changeset-bot bot commented Oct 1, 2022

⚠️ No Changeset found

Latest commit: f93f54e

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@46bit
Copy link

46bit commented Oct 3, 2022

@CraigglesO Hey, you were asking about how quantileWeighted is implemented. Right now it uses https://clickhouse.com/docs/en/sql-reference/aggregate-functions/reference/quantileexactweighted

Copy link
Contributor

@mrbbot mrbbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey! 👋 Thanks so much for doing this again! 😄 Sorry I've got to this now. I've added some comments. I'm trying to get someone on the Analytics Engine team to look at the SQL stuff too. 👍

packages/analytics-engine/src/analytics.ts Outdated Show resolved Hide resolved
packages/analytics-engine/src/engine.ts Outdated Show resolved Hide resolved
packages/analytics-engine/src/analytics.ts Outdated Show resolved Hide resolved
Comment on lines +104 to +108
const _blobs = blobs.map((blob) => {
if (blob instanceof ArrayBuffer) {
return decoder.decode(new Uint8Array(blob));
}
return blob;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following on from the comment above about using type BLOB instead of TEXT, is it possible to do the reverse here, and convert everything to ArrayBuffers? If blob contains non-UTF8 data, decode will fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can switch to BLOBs no problem. The change is fine, but:

blobsKeys.length > 0 ? `, ${blobsKeys}` : ""
}) VALUES (@dataset, @index1${
doublesValues.length > 0 ? `, ${doublesValues}` : ""
}${blobsValues.length > 0 ? `, ${blobsValues}` : ""})`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like an SQL injection vulnerability. 🙁 Rather than passing values in directly, could you insert placeholders and then fill in the actual values in the this.#db.prepare() call? This also means blob values aren't affected by the replaceAll calls below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused on this if you don't mind expounding. I'm only inserting blob1, blob2, ..., @blob1, @blob2, ... and the same for doubles depending upon length as far as I am aware.

packages/shared-test-environment/src/globals.ts Outdated Show resolved Hide resolved
packages/shared-test-environment/src/globals.ts Outdated Show resolved Hide resolved
packages/shared/src/wrangler.ts Show resolved Hide resolved
packages/analytics-engine/src/plugin.ts Outdated Show resolved Hide resolved
packages/shared/package.json Outdated Show resolved Hide resolved
@46bit
Copy link

46bit commented Oct 14, 2022

Hi @CraigglesO, thanks for contributing this. We've been talking internally about whether SQLite is 'good enough' as a model for how Analytics Engine will behave. I think it's fine - the behaviour might not quite match production, but this will allow people to do testing locally and that's definitely a good thing. Happy to merge once some of the above issues are addressed :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Miniflare] Support analytics engine
3 participants