A Distributed version of Hogwild! [1]

Team: Grégoire Clément, Maxime Delisle, Sylvain Beaud

Description

Robust and reliable systems are a core component of any serious setup. For this reason, we focused particularly on this side of the problem.

One of the main highlights of our implementation is that workers can be added and removed at any time. In the synchronous implementation, the coordinator monitors the number of workers, and if a worker crashes, the computation continues without it. Conversely, if the user wants to add more workers to the system, the new workers connect to the coordinator or to other workers, and the computation continues with these additional workers. In the asynchronous version, when a new worker arrives, it retrieves the list of workers from another worker, broadcasts its updates to them, and receives their computations; this is the only phase where a locking mechanism is used. When a worker encounters an error, it broadcasts an error message to the other workers, and they stop communicating with the faulty node.
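The join and failure steps of the asynchronous version can be illustrated with a small in-process sketch. It is not the project's code: the `Worker` class, the `registry` dictionary standing in for the network, and the `join`/`fail` method names are all assumptions made for illustration. It only shows the idea that a newcomer copies the peer list of an existing worker under a lock, and that a failing worker tells its peers to drop it.

```python
import threading


class Worker:
    """Toy in-process stand-in for one asynchronous worker (hypothetical API)."""

    def __init__(self, name):
        self.name = name
        self.peers = set()              # names of the workers we broadcast to
        self._lock = threading.Lock()   # only guards membership changes

    def join(self, registry, bootstrap=None):
        """Join the cluster by copying the peer list of an existing worker."""
        registry[self.name] = self
        if bootstrap is None:
            return  # first worker: nobody to contact yet
        source = registry[bootstrap]
        with source._lock:
            known = set(source.peers) | {source.name}
        for peer_name in known:
            peer = registry[peer_name]
            with peer._lock:
                peer.peers.add(self.name)   # the peer now broadcasts to us
            self.peers.add(peer_name)       # and we broadcast to the peer

    def fail(self, registry):
        """Announce an error so that every peer stops talking to this node."""
        for peer_name in list(self.peers):
            peer = registry[peer_name]
            with peer._lock:
                peer.peers.discard(self.name)
        self.peers.clear()
        registry.pop(self.name, None)


# The registry dictionary plays the role of the network in this sketch.
registry = {}
a, b, c = Worker("a"), Worker("b"), Worker("c")
a.join(registry)
b.join(registry, bootstrap="a")
c.join(registry, bootstrap="b")
b.fail(registry)  # a and c keep working and no longer reference b
```

In this sketch the lock is taken only while a peer list is read or modified, mirroring the statement that membership changes are the only phase that uses locking.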

Another interesting feature of our implementation is that once the computations are finished, the logs and statistics are uploaded to transfer.sh, where they are stored and can be downloaded for later use. We also provide an option to adjust the verbosity of the logs.

Report

For more information about this project, refer to report.pdf or contact us.

Requirements

kubectl (https://kubernetes.io/docs/tasks/tools/install-kubectl/)

How to run the project

$ sh run.sh $1 $2 $3

$1 is the mode: either sync or async

$2 is the number of replicas, from 1 to 100 (or more)

$3 is the log level (verbosity), from 0 (minimal) to 3 (maximal)
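As an illustration of the three arguments, the following sketch validates them and builds the corresponding command line. The helper name `build_run_command` is hypothetical and not part of the repository; the accepted values are taken from the description above.

```python
def build_run_command(mode, replicas, log_level):
    """Return the argument list for run.sh after checking the three values."""
    if mode not in ("sync", "async"):
        raise ValueError("mode must be 'sync' or 'async'")
    if replicas < 1:
        raise ValueError("the number of replicas must be at least 1")
    if log_level not in range(4):
        raise ValueError("log level must be between 0 and 3")
    return ["sh", "run.sh", mode, str(replicas), str(log_level)]


# Example: asynchronous run with 4 replicas and log level 2.
print(" ".join(build_run_command("async", 4, 2)))  # prints: sh run.sh async 4 2
```

The returned list could be handed to `subprocess.run` to launch the script from Python.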

Results

Results are uploaded to transfer.sh (the link is displayed in the console). In case of failure (for example, if the transfer.sh server is down), we also print them in the console (just to be sure!).
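The upload-with-fallback behaviour can be sketched as follows. This is not the project's implementation: the function names `upload_to_transfer_sh` and `publish` are hypothetical, and the sketch assumes that transfer.sh accepts an HTTP PUT to `https://transfer.sh/<filename>` and answers with the download link, as its `curl --upload-file` usage suggests.

```python
import urllib.request


def upload_to_transfer_sh(name, data, timeout=10):
    """PUT the bytes to transfer.sh and return the link it answers with.

    Assumption: the service replies with the download URL as plain text.
    """
    request = urllib.request.Request(
        f"https://transfer.sh/{name}", data=data, method="PUT"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode().strip()


def publish(name, text, upload=upload_to_transfer_sh):
    """Try to upload the results; fall back to printing them in the console."""
    try:
        link = upload(name, text.encode())
    except OSError as err:  # URLError, HTTPError and timeouts are all OSError
        print(f"upload failed ({err}); printing the results instead")
        print(text)
        return None
    print(f"results available at {link}")
    return link
```

Passing the uploader as a parameter keeps the fallback path easy to exercise without network access, for instance by supplying a function that raises `OSError`.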

Reference

[1] Recht, Benjamin, et al. "Hogwild!: A lock-free approach to parallelizing stochastic gradient descent." Advances in Neural Information Processing Systems. 2011.

