For every wide-dependency transformation, insert a repartition RDD between it and its parent, so that every transformation except repartition has only narrow dependencies.
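
The rewrite described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `RDD` class, the `WIDE_OPS` set, and the operation names are all hypothetical, and each RDD is assumed to have a single parent.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of wide-dependency operations; the text does not list them.
WIDE_OPS = {"group_by_key", "reduce_by_key"}


@dataclass
class RDD:
    op: str                        # transformation that produced this RDD
    parent: Optional["RDD"] = None  # single parent, for simplicity


def insert_repartitions(rdd: Optional[RDD]) -> Optional[RDD]:
    """Return a copy of the lineage with a repartition RDD before every wide op."""
    if rdd is None:
        return None
    parent = insert_repartitions(rdd.parent)
    if rdd.op in WIDE_OPS and parent is not None:
        # The shuffle now lives in the repartition node, so `rdd` itself
        # only depends on the matching partition of its new parent.
        parent = RDD("repartition", parent)
    return RDD(rdd.op, parent)


def ops(rdd: Optional[RDD]) -> list:
    """List the operations from the source RDD down to the target RDD."""
    chain = []
    while rdd is not None:
        chain.append(rdd.op)
        rdd = rdd.parent
    return chain[::-1]


lineage = RDD("reduce_by_key", RDD("map", RDD("text_file")))
print(ops(insert_repartitions(lineage)))
# ['text_file', 'map', 'repartition', 'reduce_by_key']
```

After the rewrite, `repartition` is the only node that needs data from every partition of its parent.
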
Background servers:
- Worker discoverer
- Partition discoverer
- Job server:
  - After a job is taken by a worker, deactivate the job and set a timer.
  - If the timer expires before a result is returned, reactivate the job so that other workers can take it.
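
One way to picture the job server's deactivate/reactivate behaviour is the in-memory sketch below. It is an assumption-laden illustration: the class and method names (`JobServer`, `add`, `take`, `remove`) are hypothetical, there is no networking, and instead of a real timer it stores a deadline per job and checks it lazily whenever a worker asks for work.

```python
import time


class JobServer:
    """In-memory stand-in for the job server (hypothetical API, no networking)."""

    def __init__(self, timeout=5.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock   # injectable so the timeout can be tested
        self.active = {}     # job_id -> payload, visible to workers
        self.taken = {}      # job_id -> (payload, deadline)

    def add(self, job_id, payload):
        self.active[job_id] = payload

    def take(self):
        """Hand out one active job, deactivate it, and start its timer."""
        self._reactivate_expired()
        if not self.active:
            return None
        job_id, payload = next(iter(self.active.items()))
        del self.active[job_id]
        self.taken[job_id] = (payload, self.clock() + self.timeout)
        return job_id, payload

    def remove(self, job_id):
        """Drop a job once its result has been returned."""
        self.active.pop(job_id, None)
        self.taken.pop(job_id, None)

    def _reactivate_expired(self):
        # Jobs whose deadline passed without a result go back to `active`.
        now = self.clock()
        for job_id, (payload, deadline) in list(self.taken.items()):
            if now >= deadline:
                del self.taken[job_id]
                self.active[job_id] = payload
```

A worker that crashes after `take()` simply never triggers `remove()`, so the job becomes visible again once the deadline passes.
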
Algorithm:
- Build the RDD lineage from the user's input.
- When `collect` is called for results:
  - Create the partition lineage, passing in the function closures. A partition object holds no pointer back to its RDD.
  - For every partition in the target RDD, look it up in `partition_discoverer`. Fetch it if it exists.
  - For every missing partition, create a job in `job_server`.
  - Keep discovering the missing partitions. Fetch finished partitions from workers and remove the corresponding jobs from `job_server`.
  - After all partitions are done, merge the results and return.
- Repeat
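
The `collect` steps above might look roughly like this. It is only a sketch under stated assumptions: `DictDiscoverer` and `JobBoard` are hypothetical in-memory stand-ins for `partition_discoverer` and `job_server`, partitions are identified by plain string ids, each partition's data is a list, and polling with a sleep replaces whatever discovery mechanism the real system uses.

```python
import time


class DictDiscoverer:
    """Hypothetical stand-in for partition_discoverer: looks up a shared dict."""

    def __init__(self, store):
        self.store = store

    def find(self, pid):
        return self.store.get(pid)   # None means "not computed anywhere yet"


class JobBoard:
    """Hypothetical stand-in for job_server: a set of open job ids."""

    def __init__(self):
        self.open = set()

    def add(self, pid):
        self.open.add(pid)

    def remove(self, pid):
        self.open.discard(pid)


def collect(target_partitions, partition_discoverer, job_server, poll_interval=0.1):
    """Gather every partition of the target RDD and merge the results in order."""
    results, missing = {}, set()
    for pid in target_partitions:
        data = partition_discoverer.find(pid)
        if data is not None:
            results[pid] = data          # already computed somewhere: fetch it
        else:
            job_server.add(pid)          # otherwise publish a job for it
            missing.add(pid)
    while missing:                        # keep discovering the missing ones
        for pid in list(missing):
            data = partition_discoverer.find(pid)
            if data is not None:
                results[pid] = data
                job_server.remove(pid)   # finished, so withdraw the job
                missing.discard(pid)
        if missing:
            time.sleep(poll_interval)
    merged = []
    for pid in target_partitions:
        merged.extend(results[pid])
    return merged
```

The driver never computes anything itself in this sketch; it only publishes jobs and waits for workers to make the partitions discoverable.
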
Background servers:
- Worker server: broadcasts itself as a worker service.
- Job discoverer
- Job server
- Partition discoverer
- Partition server
Algorithm:
- Keep trying to get a job from `job_discoverer`.
- Fetch the partition associated with the job from the remote node.
- Check whether it is already in `partition_server`. Skip to the next job if so.
- Look up every partition in its lineage in `partition_discoverer` and fetch the results of those that exist.
- Run the function of the partition:
  - If it's a narrow-dependency partition, or all dependencies are done:
    - Run it right away.
  - Otherwise:
    - For every missing dependency, broadcast a `job` in `job_server`.
    - Append the current job back to the job queue.
    - Skip to the next job.
- When the partition finishes successfully, add the result to `partition_server`.
- Repeat
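
A single iteration of the worker loop could be sketched as below. Everything here is an illustrative assumption: `Partition` and `worker_step` are hypothetical names, `partition_server` and `partition_discoverer` are plain dicts, `job_server` is a queue of partitions, and the sketch treats every dependency the same way (it only runs a partition once all of its dependencies are available, without a separate fast path for narrow dependencies).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Partition:
    pid: str
    func: Callable[..., list]                 # closure: parents' data -> this partition's data
    deps: List["Partition"] = field(default_factory=list)


def worker_step(jobs, partition_server, partition_discoverer, job_server):
    """Process one job from `jobs`; return the partition id if it was computed."""
    part = jobs.popleft()
    if part.pid in partition_server:
        return None                            # already computed here, skip
    inputs, missing = [], []
    for dep in part.deps:
        data = partition_discoverer.get(dep.pid)
        if data is None:
            missing.append(dep)
        else:
            inputs.append(data)
    if missing:
        for dep in missing:
            job_server.append(dep)             # ask for the missing dependency
        jobs.append(part)                      # retry the current job later
        return None
    partition_server[part.pid] = part.func(*inputs)
    return part.pid


# Single-process demo: one worker serves its own jobs and partitions.
source = Partition("src", lambda: [1, 2, 3])
doubled = Partition("dbl", lambda xs: [x * 2 for x in xs], [source])

store = {}                 # acts as both partition_server and partition_discoverer
jobs = deque([doubled])    # acts as both the local queue and job_server
while jobs:
    worker_step(jobs, store, store, jobs)
print(store["dbl"])        # [2, 4, 6]
```

In the demo the first attempt at `dbl` fails because `src` is missing, so a job for `src` is queued ahead of the retried `dbl` job, and the second attempt succeeds.
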