-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spark TTFB poc #434
base: main
Are you sure you want to change the base?
Spark TTFB poc #434
Conversation
Quoting @pyropy:
|
@@ -48,6 +48,7 @@ export class Committee { | |||
addMeasurement (m) { | |||
assert.strictEqual(m.cid, this.retrievalTask.cid, 'cid must match') | |||
assert.strictEqual(m.minerId, this.retrievalTask.minerId, 'minerId must match') | |||
assert.strictEqual(m.roundId, this.retrievalTask.roundId, 'roundId must match') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why was this added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed task id structure from cid::minerId
to cid::minerId::roundId
hence we're checking round id here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand, but what is the motivation?
@@ -11,10 +11,11 @@ const debug = createDebug('spark:preprocess') | |||
|
|||
export class Measurement { | |||
/** | |||
* @param {Partial<import('./round.js').RoundData>} r |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're already passing round.pointerize
as the last argument, I suggest we also pass round.index
instead of round
, for consistency and to prevent redundancy
stats.push({ minerId, taskId, timeToFirstByteP50 }) | ||
} | ||
|
||
// conflic should never happen, but in case it does we'll ignore the new value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If conflicts shouldn't happen, I suggest we remove the conflict handling and let it fail. If everything is right, it will never fail. If it fails, it will inform us of a bug to fix.
I'm marking this as ready for review as the open questions don't affect this PR and I think it can be merged as soon as reviews pass |
Co-authored-by: Julian Gruber <[email protected]>
Co-authored-by: Julian Gruber <[email protected]>
* @param {pg.Client} pgClient | ||
* @param {Iterable<Committee>} committees | ||
*/ | ||
const updateRetreivalTimings = async (pgClient, committees) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo.
const updateRetreivalTimings = async (pgClient, committees) => { | |
const updateRetrievalTimings = async (pgClient, committees) => { |
INSERT INTO retrieval_timings | ||
(day, miner_id, task_id, time_to_first_byte_p50) VALUES | ||
(now(), unnest($1::text[]), unnest($2::text[]), unnest($3::int[])) | ||
ON CONFLICT(day, miner_id, task_id) DO NOTHING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have mixed feelings about this design.
- Task id was designed to distinguish tasks within one round. spark-evaluate always looks at one round only.
- Since it's only
updateRetreivalTimings()
that needs to handle the case when one task is performed more than once during a day, I prefer to implement a solution that's limited toupdateRetreivalTimings()
only. For example, we can forward the current round number throughupdatePublicStats()
toupdateRetreivalTimings()
and then combine old-style taskId with the round number. - Because task_id includes a round number, it does not help us detect cases when the same content was tested twice on the same day, it only ensures we can record timings for each task occurrence.
I propose a different DB schema for consideration: Instead of having one row per day+task+round, have only one row per day and store the p50 values in an array.
Something along the following lines:
INSERT INTO retrieval_timings
(day, miner_id, time_to_first_byte_p50) VALUES
(now(), unnest($2::text[]), unnest($3::int[]))
ON CONFLICT(day, miner_id)
DO UPDATE SET
time_to_first_byte_p50 = array_cat(
retrieval_timings.time_to_first_byte_p50,
EXCLUDED.time_to_first_byte_p50
)
day DATE NOT NULL, | ||
miner_id TEXT NOT NULL, | ||
task_id TEXT NOT NULL, | ||
time_to_first_byte_p50 INT NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This column name is rather long, how about using the abbreviation TTFB?
time_to_first_byte_p50 INT NOT NULL, | |
ttfb_p50 INT NOT NULL, |
* @param {number} timeToFirstByte Time in milliseconds | ||
* @returns | ||
*/ | ||
function givenTimeToFirstByte (measurment, timeToFirstByte) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo
function givenTimeToFirstByte (measurment, timeToFirstByte) { | |
function givenTimeToFirstByte (measurement, timeToFirstByte) { |
Relates to space-meridian/roadmap#208