Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spark TTFB poc #434

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Spark TTFB poc #434

wants to merge 8 commits into from

Conversation

pyropy
Copy link

@pyropy pyropy commented Dec 23, 2024

  • Adds retrieval time stats to public stats

Relates to space-meridian/roadmap#208

@pyropy pyropy changed the title Spark ttfb poc Spark TTFB poc Dec 27, 2024
@pyropy pyropy self-assigned this Dec 27, 2024
@juliangruber
Copy link
Member

Quoting @pyropy:

I left them (the PRs) as draft as I am unsure if there is a way to reference spark-evaluate commit from spark-stats and I am not sure if we should have another section for ttfb on dashboard

@@ -48,6 +48,7 @@ export class Committee {
addMeasurement (m) {
assert.strictEqual(m.cid, this.retrievalTask.cid, 'cid must match')
assert.strictEqual(m.minerId, this.retrievalTask.minerId, 'minerId must match')
assert.strictEqual(m.roundId, this.retrievalTask.roundId, 'roundId must match')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this added?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed task id structure from cid::minerId to cid::minerId::roundId hence we're checking round id here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand, but what is the motivation?

@@ -11,10 +11,11 @@ const debug = createDebug('spark:preprocess')

export class Measurement {
/**
* @param {Partial<import('./round.js').RoundData>} r
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're already passing round.pointerize as the last argument, I suggest we also pass round.index instead of round, for consistency and to prevent redundancy

lib/public-stats.js Outdated Show resolved Hide resolved
lib/public-stats.js Outdated Show resolved Hide resolved
stats.push({ minerId, taskId, timeToFirstByteP50 })
}

// conflic should never happen, but in case it does we'll ignore the new value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If conflicts shouldn't happen, I suggest we remove the conflict handling and let it fail. If everything is right, it will never fail. If it fails, it will inform us of a bug to fix.

@juliangruber
Copy link
Member

Quoting @pyropy:

I left them (the PRs) as draft as I am unsure if there is a way to reference spark-evaluate commit from spark-stats and I am not sure if we should have another section for ttfb on dashboard

I'm marking this as ready for review as the open questions don't affect this PR and I think it can be merged as soon as reviews pass

@juliangruber juliangruber marked this pull request as ready for review January 5, 2025 21:50
pyropy and others added 2 commits January 7, 2025 07:43
Co-authored-by: Julian Gruber <[email protected]>
Co-authored-by: Julian Gruber <[email protected]>
* @param {pg.Client} pgClient
* @param {Iterable<Committee>} committees
*/
const updateRetreivalTimings = async (pgClient, committees) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo.

Suggested change
const updateRetreivalTimings = async (pgClient, committees) => {
const updateRetrievalTimings = async (pgClient, committees) => {

Comment on lines +253 to +256
INSERT INTO retrieval_timings
(day, miner_id, task_id, time_to_first_byte_p50) VALUES
(now(), unnest($1::text[]), unnest($2::text[]), unnest($3::int[]))
ON CONFLICT(day, miner_id, task_id) DO NOTHING
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have mixed feelings about this design.

  • Task id was designed to distinguish tasks within one round. spark-evaluate always looks at one round only.
  • Since it's only updateRetreivalTimings() that needs to handle the case when one task is performed more than once during a day, I prefer to implement a solution that's limited to updateRetreivalTimings() only. For example, we can forward the current round number through updatePublicStats() to updateRetreivalTimings() and then combine old-style taskId with the round number.
  • Because task_id includes a round number, it does not help us detect cases when the same content was tested twice on the same day, it only ensures we can record timings for each task occurrence.

I propose a different DB schema for consideration: Instead of having one row per day+task+round, have only one row per day and store the p50 values in an array.

Something along the following lines:

INSERT INTO retrieval_timings
(day, miner_id, time_to_first_byte_p50) VALUES 
(now(), unnest($2::text[]), unnest($3::int[]))
ON CONFLICT(day, miner_id) 
DO UPDATE SET
  time_to_first_byte_p50 = array_cat(
    retrieval_timings.time_to_first_byte_p50,
    EXCLUDED.time_to_first_byte_p50
  )

day DATE NOT NULL,
miner_id TEXT NOT NULL,
task_id TEXT NOT NULL,
time_to_first_byte_p50 INT NOT NULL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This column name is rather long, how about using the abbreviation TTFB?

Suggested change
time_to_first_byte_p50 INT NOT NULL,
ttfb_p50 INT NOT NULL,

* @param {number} timeToFirstByte Time in milliseconds
* @returns
*/
function givenTimeToFirstByte (measurment, timeToFirstByte) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo

Suggested change
function givenTimeToFirstByte (measurment, timeToFirstByte) {
function givenTimeToFirstByte (measurement, timeToFirstByte) {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

3 participants