Spark TTFB poc #434

pyropy · 2024-12-23T13:17:15Z

Adds retrieval timing stats (time to first byte, TTFB) to the public stats.

juliangruber · 2024-12-29T12:38:27Z

I left them (the PRs) as drafts because I am unsure whether there is a way to reference a spark-evaluate commit from spark-stats, and I am not sure whether we should have another section for TTFB on the dashboard.

juliangruber · 2025-01-05T21:45:24Z

lib/committee.js

@@ -48,6 +48,7 @@ export class Committee {
  addMeasurement (m) {
    assert.strictEqual(m.cid, this.retrievalTask.cid, 'cid must match')
    assert.strictEqual(m.minerId, this.retrievalTask.minerId, 'minerId must match')
+    assert.strictEqual(m.roundId, this.retrievalTask.roundId, 'roundId must match')


Why was this added?

I have changed the task ID structure from cid::minerId to cid::minerId::roundId, hence we're checking the round ID here.

I understand, but what is the motivation?
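For reference, a minimal sketch of the two task ID shapes mentioned in this thread, assuming the ID is a plain string joined with "::". The helpers makeTaskId and parseTaskId are hypothetical and not part of spark-evaluate:

```javascript
// Hypothetical helpers illustrating the task ID shapes discussed above.
// Old shape: cid::minerId (unique within a single round).
// Shape proposed in this PR: cid::minerId::roundId.
function makeTaskId ({ cid, minerId, roundId }) {
  return roundId === undefined
    ? `${cid}::${minerId}`
    : `${cid}::${minerId}::${roundId}`
}

// Splits a task ID back into its parts; roundId is present only for the new shape.
function parseTaskId (taskId) {
  const [cid, minerId, roundId] = taskId.split('::')
  return roundId === undefined ? { cid, minerId } : { cid, minerId, roundId }
}
```

With the round ID as part of the key, the roundId assertion in addMeasurement() guards against attaching a measurement from another round to the task.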

juliangruber · 2025-01-05T21:46:44Z

lib/preprocess.js

@@ -11,10 +11,11 @@ const debug = createDebug('spark:preprocess')

 export class Measurement {
  /**
+   * @param {Partial<import('./round.js').RoundData>} r


Since we're already passing round.pointerize as the last argument, I suggest we also pass round.index instead of round, for consistency and to avoid redundancy.


juliangruber · 2025-01-05T21:48:57Z

lib/public-stats.js

+    stats.push({ minerId, taskId, timeToFirstByteP50 })
+  }
+
+  // conflict should never happen, but in case it does we'll ignore the new value


If conflicts shouldn't happen, I suggest we remove the conflict handling and let it fail. If everything is right, it will never fail. If it fails, it will inform us of a bug to fix.
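The timeToFirstByteP50 value pushed in the hunk above is a per-task median. A minimal sketch of how such a value could be computed, assuming each measurement carries start_at and first_byte_at as epoch-millisecond numbers (these field names and the median and timeToFirstByteP50 helpers are assumptions, not necessarily what this PR does):

```javascript
// Median (p50) of an array of numbers; returns undefined for an empty array.
// For an even number of values, the two middle values are averaged and rounded
// so the result fits an INT column (an assumption; other p50 definitions exist).
function median (values) {
  if (values.length === 0) return undefined
  const sorted = [...values].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 1
    ? sorted[mid]
    : Math.round((sorted[mid - 1] + sorted[mid]) / 2)
}

// Hypothetical: TTFB p50 for the measurements of one task.
// Assumes start_at and first_byte_at are epoch milliseconds; measurements
// without a first-byte timestamp (e.g. failed retrievals) are skipped.
function timeToFirstByteP50 (measurements) {
  const ttfbs = measurements
    .filter(m => typeof m.start_at === 'number' && typeof m.first_byte_at === 'number')
    .map(m => m.first_byte_at - m.start_at)
  return median(ttfbs)
}
```

Other p50 definitions (for example nearest-rank, which always returns an observed value) would also be reasonable; the choice mostly matters for committees with few measurements.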

juliangruber · 2025-01-05T21:50:25Z

Quoting @pyropy:

I left them (the PRs) as drafts because I am unsure whether there is a way to reference a spark-evaluate commit from spark-stats, and I am not sure whether we should have another section for TTFB on the dashboard.

I'm marking this as ready for review, as the open questions don't affect this PR and I think it can be merged as soon as the reviews pass.


bajtos · 2025-01-09T07:28:31Z

lib/public-stats.js

+ * @param {pg.Client} pgClient
+ * @param {Iterable<Committee>} committees
+ */
+const updateRetreivalTimings = async (pgClient, committees) => {


Typo.

Suggested change

const updateRetreivalTimings = async (pgClient, committees) => {

const updateRetrievalTimings = async (pgClient, committees) => {

bajtos · 2025-01-09T07:36:04Z

lib/public-stats.js

+    INSERT INTO retrieval_timings
+    (day, miner_id, task_id, time_to_first_byte_p50) VALUES 
+    (now(), unnest($1::text[]), unnest($2::text[]), unnest($3::int[]))
+    ON CONFLICT(day, miner_id, task_id) DO NOTHING


I have mixed feelings about this design.

The task ID was designed to distinguish tasks within one round; spark-evaluate always looks at a single round only.

Since only updateRetreivalTimings() needs to handle the case where one task is performed more than once during a day, I prefer a solution that is limited to updateRetreivalTimings(). For example, we can forward the current round number through updatePublicStats() to updateRetreivalTimings() and then combine the old-style taskId with the round number.

Because task_id includes a round number, it does not help us detect cases where the same content was tested twice on the same day; it only ensures we can record timings for each task occurrence.
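A minimal sketch of the alternative described above, assuming committees keep the old-style cid::minerId task ID and the round number is passed in as a separate argument. The function buildRetrievalTimingRows is hypothetical; it takes stats entries shaped like the { minerId, taskId, timeToFirstByteP50 } objects pushed in lib/public-stats.js and returns the three parallel arrays that the unnest()-based INSERT above expects:

```javascript
// Hypothetical: build the parallel arrays for
//   unnest($1::text[]), unnest($2::text[]), unnest($3::int[])
// taskId stays old-style (cid::minerId); the round number is appended here only,
// so the rest of spark-evaluate keeps its per-round task IDs unchanged.
function buildRetrievalTimingRows (stats, roundIndex) {
  const minerIds = []
  const taskIds = []
  const ttfbP50s = []
  for (const { minerId, taskId, timeToFirstByteP50 } of stats) {
    // Assumption: tasks without any TTFB sample have no p50 and are skipped.
    if (timeToFirstByteP50 === undefined) continue
    minerIds.push(minerId)
    taskIds.push(`${taskId}::${roundIndex}`)
    ttfbP50s.push(timeToFirstByteP50)
  }
  return [minerIds, taskIds, ttfbP50s]
}
```

With node-postgres, the returned array could be passed as the values argument of pgClient.query(sql, values), matching the $1, $2 and $3 placeholders.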

I propose a different DB schema for consideration: instead of having one row per day+task+round, have only one row per day and miner, and store the p50 values in an array.

Something along the following lines:

INSERT INTO retrieval_timings (day, miner_id, time_to_first_byte_p50)
VALUES (now(), unnest($2::text[]), unnest($3::int[]))
ON CONFLICT(day, miner_id) DO UPDATE SET
  time_to_first_byte_p50 = array_cat(
    retrieval_timings.time_to_first_byte_p50,
    EXCLUDED.time_to_first_byte_p50
  )
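To make the intended semantics of this upsert concrete, here is an in-memory model: a Map keyed by day and miner ID stands in for the table, and appending to an array stands in for array_cat(). It is only an illustration, not database code, and applyRound is a hypothetical name. One detail the model makes visible: PostgreSQL rejects an INSERT ... ON CONFLICT DO UPDATE that would update the same row twice in one statement, so if a miner appears more than once in a batch, its values likely need to be grouped per miner before the query, which the sketch does.

```javascript
// In-memory model of the proposed schema: one entry per (day, miner_id)
// holding an array of TTFB p50 values. Concatenation mirrors array_cat().
function applyRound (table, day, rows) {
  // Group this round's values per miner first, because a single
  // INSERT ... ON CONFLICT DO UPDATE must not touch the same row twice.
  const perMiner = new Map()
  for (const { minerId, ttfbP50 } of rows) {
    if (!perMiner.has(minerId)) perMiner.set(minerId, [])
    perMiner.get(minerId).push(ttfbP50)
  }
  for (const [minerId, values] of perMiner) {
    const key = `${day}|${minerId}`
    const existing = table.get(key)
    // Conflict on (day, miner_id): append; otherwise insert a new row.
    table.set(key, existing ? existing.concat(values) : values)
  }
  return table
}
```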

bajtos · 2025-01-09T07:36:49Z

migrations/021.do.add-retrieval-times.sql

+  day DATE NOT NULL,
+  miner_id TEXT NOT NULL,
+  task_id TEXT NOT NULL,
+  time_to_first_byte_p50 INT NOT NULL,


This column name is rather long. How about using the abbreviation TTFB?

Suggested change

time_to_first_byte_p50 INT NOT NULL,

ttfb_p50 INT NOT NULL,

bajtos · 2025-01-09T07:38:05Z

test/public-stats.test.js

+   * @param {number} timeToFirstByte  Time in milliseconds
+   * @returns
+   */
+  function givenTimeToFirstByte (measurment, timeToFirstByte) {


Typo.

Suggested change

function givenTimeToFirstByte (measurment, timeToFirstByte) {

function givenTimeToFirstByte (measurement, timeToFirstByte) {
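For readers unfamiliar with the test, a helper like this typically derives a measurement's first-byte timestamp from its start time. A minimal hypothetical sketch, assuming measurements carry start_at and first_byte_at as epoch-millisecond numbers and that the helper returns a copy rather than mutating its input (the real test fixtures may differ):

```javascript
/**
 * Hypothetical sketch: returns a copy of `measurement` whose first byte
 * arrived `timeToFirstByte` milliseconds after the retrieval started.
 * Assumes start_at / first_byte_at are epoch milliseconds.
 * @param {{ start_at: number, first_byte_at?: number }} measurement
 * @param {number} timeToFirstByte  Time in milliseconds
 * @returns {{ start_at: number, first_byte_at: number }}
 */
function givenTimeToFirstByte (measurement, timeToFirstByte) {
  return {
    ...measurement,
    first_byte_at: measurement.start_at + timeToFirstByte
  }
}
```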

pyropy added 6 commits December 23, 2024 14:16

Include round id in retreival task id

9a627ab

Add time to first byte public stat

5e6835e

Remove unused variable

b8a2ec7

Make names more generic

2642dcf

Fix typo

77e0f73

Ignore on conflict

c16c44c

pyropy changed the title ~~Spark ttfb poc~~ Spark TTFB poc Dec 27, 2024

pyropy self-assigned this Dec 27, 2024

juliangruber requested changes Jan 5, 2025


juliangruber marked this pull request as ready for review January 5, 2025 21:50

pyropy and others added 2 commits January 7, 2025 07:43

Update lib/public-stats.js

cbf5993

Co-authored-by: Julian Gruber <[email protected]>

Update lib/public-stats.js

73980b0

Co-authored-by: Julian Gruber <[email protected]>

bajtos requested changes Jan 9, 2025

