Skip to content

Bridging Immutable and Mutable Abstractions for Distributed Data Analytics

License

Notifications You must be signed in to change notification settings

kygx-legend/tangram

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Tangram

Build Status

Tangram is a distributed data analytics framework which enjoys the benefits from both the immutable and mutable data abstractions.

Data analytics frameworks that adopt immutable data abstraction usually provide better support for failure recovery and straggler mitigation, while those that adopt mutable data abstraction are more efficient for iterative workloads thanks to their support for in-place state updates and asynchronous execution. Most existing frameworks adopt either one of the two data abstractions and do not enjoy the benefits of the other.

Tangram adopts a novel programming model named MapUpdate, which can determine whether a distributed dataset is mutable or immutable in an application. MapUpdate not only offers good expressiveness, but also allows us to enjoy the benefits of both mutable and immutable abstractions. MapUpdate naturally supports iterative and asynchronous execution, and can use different recovery strategies adaptively according to failure scenarios.

Tangram supports a variety of workloads including bulk processing, graph analytics, and iterative machine learning.

For more details about Tangram, please check our Wiki.

For bugs in Tangram, please file an issue on github issue platform.

Learn Tangram

Dependencies

Tangram has the following minimal dependencies:

  • CMake (Version >= 3.0.2, if >= 3.6.0, it should set CMAKE_PREFIX_PATH first when occurring errors to find the following dependencies)
  • ZeroMQ (including both libzmq and cppzmq)
  • Boost (Version >= 1.58)
  • A working C++ compiler (clang/gcc Version >= 4.9/icc/MSVC)
  • TCMalloc (In gperftools)
  • GLOG (Latest version, it will be included automatically)
  • libhdfs3 C/C++ HDFS Client

Build

Download the source code.

git clone https://github.com/Yuzhen11/tangram.git

Go to the project root and do an out-of-source build using CMake:

mkdir debug
cd debug
cmake -DCMAKE_BUILD_TYPE=Debug ..
make help               # List all build target
make $ApplicationName   # Build application
make SchedulerMain      # Build the Scheduler

make -j                 # Build all applications with all threads

TODO: upload a docker file to dockerhub.

Run a Tangram Program

To run a Tangram program, users need to modify a Python launch script. Some examples can be found in scripts/. The launch script allows users to specify their binary, the hostnames and ports of the machines, command line arguments, etc.

After that, running the program is as simple as:

python /path/to/your/script

To kill your Tangram program:

python /path/to/your/script kill

Run Tangram Unit Test

Tangram provides a set unit tests (based on gtest 1.7.0) in core/. Run it with:

$ make HuskyUnitTest  # yes, it is HuskyUnitTest
$ ./HuskyUnitTest

License

Copyright 2017-2019 Husky Team

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

About

Bridging Immutable and Mutable Abstractions for Distributed Data Analytics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C++ 86.8%
  • Python 9.4%
  • CMake 3.8%