This folder contains an implementation of [automemcpy: A framework for automatic generation of fundamental memory operations](https://research.google/pubs/pub50338/).

It uses the [Z3 theorem prover](https://github.com/Z3Prover/z3) to enumerate a subset of valid memory function implementations. These implementations are then materialized as C++ code and can be [benchmarked](../) against various [size distributions](../distributions). This process helps design efficient implementations for a particular environment (size distribution, processor, or custom compilation options).

This is not enabled by default, as it is mostly useful when working on tuning the library implementation. To build it, use `LIBC_BUILD_AUTOMEMCPY=ON` (see below).

## Prerequisites

You may need to install `Z3` from source if it's not available on your system. Here we show instructions to install it into `<Z3_INSTALL_DIR>`. You may need `sudo` for the `make install` step.

```shell
mkdir -p ~/git
cd ~/git
git clone https://github.com/Z3Prover/z3.git
cd z3
python scripts/mk_make.py --prefix=<Z3_INSTALL_DIR>
cd build
make -j
make install
```
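
A quick way to check the installation is to query the freshly installed binary for its version (this assumes the default layout, where the binary lands in `<Z3_INSTALL_DIR>/bin`):

```shell
<Z3_INSTALL_DIR>/bin/z3 --version
```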

## Configuration

```shell
mkdir -p <BUILD_DIR>
cd <LLVM_PROJECT_DIR>/llvm
cmake -DCMAKE_C_COMPILER=/usr/bin/clang \
      -DCMAKE_CXX_COMPILER=/usr/bin/clang++ \
      -DLLVM_ENABLE_PROJECTS="libc" \
      -DLLVM_ENABLE_Z3_SOLVER=ON \
      -DLLVM_Z3_INSTALL_DIR=<Z3_INSTALL_DIR> \
      -DLIBC_BUILD_AUTOMEMCPY=ON \
      -DCMAKE_BUILD_TYPE=Release \
      -B<BUILD_DIR>
```

## Targets and compilation

There are three main CMake targets:

1. `automemcpy_implementations`
   - runs `Z3` and materializes valid memory functions as C++ code; a message displays its on-disk location (see the example after this list).
   - the source code is then compiled using the native host optimizations (i.e., `-march=native` or `-mcpu=native` depending on the architecture).
2. `automemcpy`
   - the binary that benchmarks the autogenerated implementations.
3. `automemcpy_result_analyzer`
   - the binary that analyses the benchmark results.
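
If you only want to inspect the generated C++ file (and the message pointing at its on-disk location), you can, for instance, build just the first target on its own:

```shell
make -C <BUILD_DIR> -j automemcpy_implementations
```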

Otherwise, you only need to compile the two binaries, as they both pull in the autogenerated code as a dependency.

```shell
make -C <BUILD_DIR> -j automemcpy automemcpy_result_analyzer
```

## Running the benchmarks

Make sure to save the results of the benchmark as a JSON file.

```shell
<BUILD_DIR>/bin/automemcpy --benchmark_out_format=json --benchmark_out=<RESULTS_DIR>/results.json
```
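
To sanity-check the output you can, for example, count the recorded benchmarks with `jq` (assuming it is installed; the top-level `benchmarks` array is part of the Google Benchmark JSON format):

```shell
jq '.benchmarks | length' <RESULTS_DIR>/results.json
```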

### Additional useful options

- `--benchmark_min_time=.2`

  By default, each function is benchmarked for at least one second; here we lower it to 200ms.

- `--benchmark_filter="BM_Memset|BM_Bzero"`

  By default, all functions are benchmarked; here we restrict them to `memset` and `bzero`.

Other options might be useful; use `--help` for more information.
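
For example, combining these options with the JSON output shown above:

```shell
<BUILD_DIR>/bin/automemcpy --benchmark_filter="BM_Memset|BM_Bzero" \
    --benchmark_min_time=.2 \
    --benchmark_out_format=json --benchmark_out=<RESULTS_DIR>/results.json
```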

## Analyzing the benchmarks

Analysis is performed by running `automemcpy_result_analyzer` on one or more JSON result files.

```shell
<BUILD_DIR>/bin/automemcpy_result_analyzer <RESULTS_DIR>/results.json
```
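
Since the analysis picks the median throughput across runs (see step 1 below), you will typically benchmark several times and pass all the result files at once. A minimal sketch, assuming a `results-<N>.json` naming scheme:

```shell
for i in 1 2 3; do
  <BUILD_DIR>/bin/automemcpy --benchmark_out_format=json \
      --benchmark_out=<RESULTS_DIR>/results-$i.json
done
<BUILD_DIR>/bin/automemcpy_result_analyzer <RESULTS_DIR>/results-*.json
```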

What it does:

1. Gathers all throughput values for each function / distribution pair and picks the median one.\
   This allows picking a representative value over many runs of the benchmark. Please make sure all the runs happen under similar circumstances.

2. For each distribution, looks at the span of throughputs for functions of the same type (e.g., for distribution `A`, memcpy throughput spans from 2GiB/s to 5GiB/s).

3. For each distribution, gives a normalized score to each function (e.g., for distribution `A`, function `M` scores 0.65; a worked example follows the table below).\
   This score is then turned into a grade `EXCELLENT`, `VERY_GOOD`, `GOOD`, `PASSABLE`, `INADEQUATE`, `MEDIOCRE`, `BAD` - so that each distribution categorizes how functions perform according to it.

4. A [Majority Judgement](https://en.wikipedia.org/wiki/Majority_judgment) process is then used to categorize each function. This enables finer analysis of how distributions agree on which function is better. In the following example, `Function_1` and `Function_2` are both rated `EXCELLENT`, but looking at the grade distribution might help decide which is best.

|            | EXCELLENT | VERY_GOOD | GOOD | PASSABLE | INADEQUATE | MEDIOCRE | BAD |
|------------|:---------:|:---------:|:----:|:--------:|:----------:|:--------:|:---:|
| Function_1 |     7     |     1     |  2   |          |            |          |     |
| Function_2 |     6     |     4     |      |          |            |          |     |
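
For illustration, here is the scoring of step 3 under the assumption that it is a min-max normalization of a function's median throughput over the distribution's span (an assumption; the exact formula is not spelled out here):

$$\mathrm{score}(M, A) = \frac{T_{M,A} - T_{\min,A}}{T_{\max,A} - T_{\min,A}}$$

With the 2GiB/s to 5GiB/s span above, a function reaching 3.95GiB/s would score (3.95 - 2) / (5 - 2) = 0.65.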

The tool outputs the histogram of grades for each function. In case of a tie, other dimensions might help decide (e.g., code size, performance on other microarchitectures).

```
EXCELLENT  |█▁▂ | Function_0
EXCELLENT  |█▅ | Function_1
VERY_GOOD  |▂█▁ ▁ | Function_2
GOOD       | ▁█▄ | Function_3
PASSABLE   | ▂▆▄█ | Function_4
INADEQUATE | ▃▃█▁ | Function_5
MEDIOCRE   | █▆▁| Function_6
BAD        | ▁▁█| Function_7
```