add C/C++ guide #134

Open
wants to merge 20 commits into base: master
Changes from 16 commits
2 changes: 1 addition & 1 deletion README.md
@@ -89,7 +89,7 @@ One of the [design goals](https://github.com/ewasm/design/blob/master/rationale.

At present, we've developed support for the following languages and toolchains:

- [C/C++ (LLVM) WebAssembly tutorial](./clang.md)
- [C/C++](./c_cpp_guide.md)
- Rust: documentation pending
- [AssemblyScript](https://github.com/AssemblyScript/assemblyscript), a subset of TypeScript, which uses the JavaScript toolchain: see the [etherts org](https://github.com/etherts/docs) for more information on writing contracts in AssemblyScript.

120 changes: 120 additions & 0 deletions c_cpp_guide.md
@@ -0,0 +1,120 @@
# Compiling C/C++ to Ewasm

This guide starts with an introduction, continues with a basic step-by-step guide, and ends with advanced topics. Warning: the Ewasm spec and tools below are subject to change.
Member

Warning: the Ewasm spec and tools below are subject to change.

I understand why you'd add this caveat here, but I don't find it constructive or helpful on its own, i.e., it just makes the reader worry that the instructions aren't going to work. Consider making this more constructive by saying something like, "Every effort is made to keep this document up to date, but if you notice anything wrong please feel free to submit a PR or an issue to report and/or fix it."

Collaborator Author

Agreed. It is understood that things are subject to change. I completely removed it so that the guide is more concise.


## Introduction

An Ewasm contract is a WebAssembly module with the following restrictions:

- The module may import only [Ewasm helper functions](https://github.com/ewasm/design/blob/master/eth_interface.md), which resemble EVM opcodes and let the contract interact with the client.
- The module must export exactly two things: a `main` function, which takes no arguments and returns nothing, and the module's `memory`.
- The module may not use floats or other [sources of non-determinism](https://github.com/WebAssembly/design/blob/master/Nondeterminism.md).

## Caveats

When writing Ewasm contracts in C/C++, one should bear in mind the following caveats:

1. WebAssembly is still primitive and [lacks features](https://github.com/WebAssembly/design/blob/master/FutureFeatures.md). For example, WebAssembly lacks support for exceptions, and Ewasm provides no way to make system calls. Compilers and libraries are also still primitive. For example, we have a patched version of libc that allows `malloc`, but the patches are not yet enough for `std::vector` because other memory management calls are unavailable. But perhaps any memory management beyond memory allocation may be unwanted for Ewasm contracts since it costs gas. This situation will improve as WebAssembly, compilers, and libraries mature.
Member

But perhaps any memory management beyond memory allocation may be unwanted for Ewasm contracts since it costs gas.

I don't think this statement belongs here, it leaves the reader with too many unanswered questions. I think the best thing to do would be to compile a list of open design questions in one place (not specific to C/C++) and provide a link to it somewhere in this doc. A link to open issues on the ewasm/design repo might suffice for now if we don't have a more mature doc.

To make this more helpful and constructive, it would be nice to conclude this section by saying something along the lines of, "For now, to work around these issues, ensure that you only use basic structs and malloc calls" (or whatever the advice should be). "To join the conversation and contribute to ongoing Ewasm interface design, see X" (link to another doc).

This situation will improve as WebAssembly, compilers, and libraries mature.

Consider dropping, I don't think this is critical.

Collaborator Author

Agreed that a guide is not a place for design discussions. I overhauled this paragraph.


1. In the current Ewasm design, all communication between the contract and the client is done through the module's memory. For example, the message data ("call data") sent to the contract is accessed by calling `callDataCopy()`, which copies this data into WebAssembly memory at a location given by a pointer. This pointer must point either to a statically allocated array or to memory dynamically allocated with `malloc`. For example, before calling `callDataCopy()`, one may call `getCallDataSize()` to find out how many bytes to `malloc`.
Member

callDataCopy()
getCallDataSize()

Consider linking to the EEI specs for these two methods.

Also, this would all be much clearer with an example using code. Could you maybe link to the wrc20 example code in C++, or even include it inline here?

Collaborator Author

I just tried including an example, but ended up having to explain too many things. This may overwhelm a first-time user. I think it is better to just give a concise high-level explanation, and allow the user to explore concrete examples when they know the basics. I overhauled this paragraph.
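
As a rough illustration of the call-data caveat, the sketch below copies call data into a statically allocated global array. The C prototypes for `getCallDataSize` and `callDataCopy`, the `MAX_CALL_DATA` limit, the `load_call_data` helper, and the host-side stand-ins under `#ifndef __wasm__` are all assumptions made for illustration; check the helper function spec and your toolchain for the actual names and types.

```c
#include <stdint.h>

/* Assumed C prototypes for two Ewasm helper functions; the real
   C-level names and types depend on your toolchain and import list. */
uint32_t getCallDataSize(void);
void callDataCopy(uint8_t *result, uint32_t data_offset, uint32_t length);

#ifndef __wasm__
/* Hypothetical host-side stand-ins so the sketch can be run natively. */
#include <assert.h>
#include <string.h>

static const uint8_t fake_call_data[] = {0xde, 0xad, 0xbe, 0xef, 0x01};

uint32_t getCallDataSize(void) { return (uint32_t)sizeof fake_call_data; }

void callDataCopy(uint8_t *result, uint32_t data_offset, uint32_t length) {
    /* Reject reads outside the fake call data. */
    assert(data_offset <= sizeof fake_call_data);
    assert(length <= sizeof fake_call_data - data_offset);
    memcpy(result, fake_call_data + data_offset, length);
}
#endif

/* A global array is placed in module memory, so its address is a
   valid pointer argument for the helper functions. */
#define MAX_CALL_DATA 256
uint8_t call_data[MAX_CALL_DATA];

/* Copies the call data into call_data and returns the number of bytes
   copied. Input longer than MAX_CALL_DATA is truncated here; a real
   contract would more likely reject it. */
uint32_t load_call_data(void) {
    uint32_t size = getCallDataSize();
    if (size > MAX_CALL_DATA) {
        size = MAX_CALL_DATA;
    }
    callDataCopy(call_data, 0, size);
    return size;
}
```

The fixed-size buffer avoids any dependency on `malloc`, at the cost of choosing an upper bound on the input in advance.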


1. In the current Ewasm design, the Ethereum client writes data into WebAssembly memory as big-endian, but WebAssembly loads and stores are little-endian, so the bytes are reversed whenever the data moves between memory and the WebAssembly operand stack. For example, when the call data is brought into memory using `callDataCopy`, and those bytes are loaded onto the WebAssembly stack using `i64.load`, all of the bytes are reversed. So extra C/C++ code may be needed to load bytes from the correct location and to reverse the loaded bytes.

1. Compilers output a `.wasm` binary whose imports and exports may not meet Ewasm requirements. We have tools to fix the imports and exports.
lrettig marked this conversation as resolved.

1. There are no tutorials for debugging/testing a contract. Hera supports extra Ewasm helper functions to print things, which have helped in writing test cases. A tutorial is needed to allow early adopters to debug/test their contracts without having to do it on the testnet.
Member

Hera supports extra Ewasm helper functions to print things, which have helped in writing test cases

Could you link to a doc on these, or otherwise make it more explicit here? Assume you are talking about things like printmemhex?

Collaborator Author

We should mention the state of debugging tools since it is important for developers. I changed it to say that early adopters can debug on the testnet for now. I am left feeling that there is a great need for tools.

Member

Agree, was not suggesting removing this, but instead linking to docs we have elsewhere in this repo on debug tools.

Collaborator Author

Can you suggest a link? Do you think that it is reasonable to link instructions on how to write test fillers?
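
Returning to the byte-order caveat above: one way to handle it in C is to assemble integers byte by byte, or to reverse a buffer in place before or after a native load. The sketch below shows both; `load_be64` and `reverse_bytes` are hypothetical helper names, not part of any Ewasm library.

```c
#include <stddef.h>
#include <stdint.h>
#ifndef __wasm__
#include <assert.h> /* only used when checking this sketch natively */
#endif

/* Reads a big-endian 64-bit integer one byte at a time. The result
   does not depend on the byte order of the machine running the code. */
uint64_t load_be64(const uint8_t *p) {
    uint64_t value = 0;
    for (int i = 0; i < 8; i++) {
        value = (value << 8) | p[i];
    }
    return value;
}

/* Reverses len bytes in place, for example to convert a big-endian
   value written by the client into little-endian order. */
void reverse_bytes(uint8_t *buf, size_t len) {
    if (len < 2) {
        return;
    }
    for (size_t lo = 0, hi = len - 1; lo < hi; lo++, hi--) {
        uint8_t tmp = buf[lo];
        buf[lo] = buf[hi];
        buf[hi] = tmp;
    }
}
```

Reading byte by byte avoids the reversal entirely, while `reverse_bytes` suits larger values, such as 32-byte words, that are handled as arrays.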


## Basic Step-by-Step Guide

First let's build the latest version of LLVM. Note: this section of the document allows you to build LLVM without any standard libraries. If you wish to use C/C++ standard libraries, then build the version of LLVM in the Advanced section below. That version can also be used here.

```sh
# checkout LLVM, clang, and lld
svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
cd llvm/tools
svn co http://llvm.org/svn/llvm-project/cfe/trunk clang
svn co http://llvm.org/svn/llvm-project/lld/trunk lld
cd ../..

# build LLVM, clang, and lld
mkdir llvm-build
cd llvm-build
# note: if you want other targets than WebAssembly, then delete -DLLVM_TARGETS_TO_BUILD=
cmake -G "Unix Makefiles" -DLLVM_TARGETS_TO_BUILD= -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=WebAssembly ../llvm
make -j 8
```

Member

Can we use git instead of svn? The instructions from Jake's doc seem reasonable no?

Collaborator Author

I mentioned that the official guide http://llvm.org/docs/GettingStarted.html uses svn. But changed our guide to use git.

Collaborator Author

I am still compiling the git version to test it. Will revert to svn if there is a problem.

Collaborator Author

The git version successfully compiles wrc20. Compiling the git version of LLVM had a few errors along the way, but restarted each time and finally it finished.

Member

Didn't realize svn was in the official guide. Glad to hear it works with git too!

Warning: this build (the `make` step) can take hours, requires a lot of disk space and memory, and may cause your computer to freeze. If there is an error, try again without the `-j 8` argument (which runs up to eight parallel build jobs).

Next, download and compile a wrc20 Ewasm contract written in C:

```sh
git clone https://gist.github.com/poemm/68a7b70ec353abaeae64bf6fe95d2d52.git cwrc20
```

Note that `main.c` declares many arrays at global scope: LLVM puts global arrays in WebAssembly memory, so their addresses can be passed as pointer arguments to the Ewasm helper functions. Before compiling, make sure that the `Makefile` has the path to the `llvm-build` directory created above, and that `main.syms` lists the Ewasm helper functions you are using.
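
To illustrate why global arrays are convenient, the following sketch passes two global 32-byte arrays to storage helpers and increments a counter kept as a big-endian value. The prototypes for `storageLoad` and `storageStore`, the `increment_counter` function, and the one-slot host stand-in are assumptions for illustration and are not taken from the wrc20 example.

```c
#include <stdint.h>

/* Assumed C prototypes: both helpers exchange a 32-byte key and a
   32-byte value through pointers into module memory. */
void storageLoad(const uint8_t *key, uint8_t *result);
void storageStore(const uint8_t *key, const uint8_t *value);

#ifndef __wasm__
/* Hypothetical host-side stand-in: a "storage" with a single slot. */
#include <assert.h>
#include <string.h>

static uint8_t slot_key[32];
static uint8_t slot_value[32];

void storageLoad(const uint8_t *key, uint8_t *result) {
    assert(key != NULL && result != NULL);
    if (memcmp(key, slot_key, 32) == 0) {
        memcpy(result, slot_value, 32);
    } else {
        memset(result, 0, 32); /* unknown keys read as zero */
    }
}

void storageStore(const uint8_t *key, const uint8_t *value) {
    assert(key != NULL && value != NULL);
    memcpy(slot_key, key, 32);
    memcpy(slot_value, value, 32);
}
#endif

/* Global arrays, zero-initialized; usable as pointer arguments. */
uint8_t counter_key[32];   /* all zero bytes: storage key 0 */
uint8_t counter_value[32];

/* Loads the counter, adds one to the big-endian value (carrying from
   the last byte toward the first), and stores it back. */
void increment_counter(void) {
    storageLoad(counter_key, counter_value);
    for (int i = 31; i >= 0; i--) {
        if (++counter_value[i] != 0) {
            break; /* no wrap-around in this byte, so no carry */
        }
    }
    storageStore(counter_key, counter_value);
}
```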

Aside: If you are using C++, make sure to change the Makefile to use `clang++` and to wrap the helper function declarations in `extern "C"`.

```sh
cd cwrc20
# edit the Makefile and main.syms as described above
make
```

The output is `main.wasm`, whose imports and exports need a cleanup to meet [Ewasm requirements](https://github.com/ewasm/design/blob/master/contract_interface.md). For this, one can use [PyWebAssembly](https://github.com/poemm/pywebassembly), use [wasm-chisel](https://github.com/wasmx/wasm-chisel) (a program in Rust which can be installed with `cargo install chisel`), or perform the cleanup manually. `wasm-chisel` is stricter and has more features, whereas PyWebAssembly is just enough for our use case, and Python is available on most machines. We therefore recommend using PyWebAssembly as follows:

```sh
cd ..
git clone https://github.com/poemm/pywebassembly.git
cd pywebassembly/examples/
python3 ewasmify.py ../../cwrc20/main.wasm
cd ../../cwrc20
```

Check that the command-line output of `ewasmify.py` above lists only [valid Ewasm imports and exports](https://github.com/ewasm/design/blob/master/eth_interface.md). To troubleshoot, you may also wish to inspect `main.wasm` in its text representation, as described in the next step, which uses Binaryen or Wabt.

We can convert from the `.wasm` binary format to the `.wat` (or `.wast`) text format (these are equivalent formats and can be converted back-and-forth). This conversion can be done with Binaryen's `wasm-dis`.
Member

.wat (or .wast) text format (these are equivalent formats and can be converted back-and-forth)

Strictly speaking, this is not true. wast defines a "scripted" format with certain extensions to the grammar, and my understanding is that it's a superset of the wat format. See https://github.com/WebAssembly/design/blob/master/TextFormat.md#text-format. We've tended to elide over this in our docs thus far but I think it might be helpful to be more explicit about it. What do you think?

Collaborator Author

Agreed. I removed .wast. I think that we should start using the correct extension .wat.

Member

Cool. It's clearly not critical but we can bring it up on a call and try to get everyone on the same page.


Aside: Alternatively one can use Wabt's `wasm2wat`. But Binaryen's `wasm-dis` is recommended because Ewasm studio uses Binaryen internally, and Binaryen can be quirky and fail to read a `.wat` generated by another program. Another tip: if Binaryen's `wasm-dis` can't read the `.wasm`, try using Wabt's `wasm2wat` then `wat2wasm` before trying again with Binaryen.
Member

Let's link to @hugo-dc's new doc on binaryen and wabt here. Maybe we could rebase this against #141?

Collaborator Author

I think that this binaryen section should be removed and replaced with a link to Hugo's doc once it is merged.


```sh
cd ..
git clone https://github.com/WebAssembly/binaryen.git # warning 90 MB, can also download precompiled binaries which are 15 MB
cd binaryen
mkdir build && cd build
cmake ..
make -j4
cd ../../cwrc20
../binaryen/build/bin/wasm-dis main_ewasmified.wasm > main_ewasmified.wat
```

`main_ewasmified.wat` is an Ewasm contract in text format. See other notes for how to deploy it. Happy hacking!


## Advanced

The above guide is for compiling a C file with no libc. Next we use wasmception, a package that provides a minimal toolchain including libc and libc++, as well as patches that enable things like `malloc`.

```sh
git clone https://github.com/yurydelendik/wasmception.git
cd wasmception
make # Warning: this requires lots of internet bandwidth, RAM, disk space, and about one hour of compiling on a mid-level laptop.
cd ..
```
Note the end of the output of the above `make` command; it should include something like `--sysroot=/home/user/repos/wasmception/sysroot`.

Next we will download and build a version of wrc20 which uses `malloc`. Make sure to edit the `Makefile` with the sysroot data above, and change the path of `clang` to our newly compiled version which may look something like `/home/user/repos/wasmception/dist/bin/clang`. Make sure that `main.syms` has a list of Ewasm helper functions you are using.

Aside: If you are using C++, make sure to change the Makefile to use `clang++`, wrap the helper function declarations in `extern "C"`, and follow other tips from wasmception.

```sh
git clone https://gist.github.com/poemm/91b64ecd2ca2f1cb4a88d31315313b9b.git cwrc20_with_malloc
cd cwrc20_with_malloc
# edit the Makefile and main.syms as described above
make
```
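
With `malloc` available, a contract can size its call-data buffer at run time instead of reserving a fixed global array. The sketch below is one possible shape, not the code in the gist: the prototypes, the `copy_call_data` helper, and the host-side stand-ins are assumptions. The `extern "C"` guard lets the same declarations compile as C or as C++.

```c
#include <stdint.h>
#include <stdlib.h>

#ifdef __cplusplus
extern "C" {
#endif
/* Assumed C prototypes for the Ewasm helper functions used here. */
uint32_t getCallDataSize(void);
void callDataCopy(uint8_t *result, uint32_t data_offset, uint32_t length);
#ifdef __cplusplus
}
#endif

#ifndef __wasm__
/* Hypothetical host-side stand-ins so the sketch can be run natively. */
#include <assert.h>
#include <string.h>

static const uint8_t fake_call_data[] = {0xa9, 0x05, 0x9c, 0xbb};

uint32_t getCallDataSize(void) { return (uint32_t)sizeof fake_call_data; }

void callDataCopy(uint8_t *result, uint32_t data_offset, uint32_t length) {
    assert(data_offset <= sizeof fake_call_data);
    assert(length <= sizeof fake_call_data - data_offset);
    memcpy(result, fake_call_data + data_offset, length);
}
#endif

/* Allocates exactly as many bytes as the call data holds and copies
   the call data into them. Stores the size in *size_out. Returns NULL
   if there is no call data or if the allocation fails; otherwise the
   caller owns the returned buffer. */
uint8_t *copy_call_data(uint32_t *size_out) {
    uint32_t size = getCallDataSize();
    *size_out = size;
    if (size == 0) {
        return NULL;
    }
    uint8_t *buf = (uint8_t *)malloc(size);
    if (buf == NULL) {
        return NULL;
    }
    callDataCopy(buf, 0, size);
    return buf;
}
```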

Now follow the same steps above to transform the output `main.wasm` into a valid Ewasm contract.

Tutorials are needed for more advanced things. For example, to statically link against other C files, one can link the LLVM IR as described here https://aransentin.github.io/cwasm/.
54 changes: 0 additions & 54 deletions clang.md

This file was deleted.