
[中文版入口](README_cn.md)

## Sogou C++ Workflow

[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](https://github.com/sogou/workflow/blob/master/LICENSE)
[![Language](https://img.shields.io/badge/language-c++-red.svg)](https://en.cppreference.com/)
[![Platform](https://img.shields.io/badge/platform-linux%20%7C%20macos%20%7C%20windows-lightgrey.svg)](https://img.shields.io/badge/platform-linux%20%7C%20macos20%7C%20windows-lightgrey.svg)
[![Build Status](https://travis-ci.org/sogou/workflow.svg?branch=master)](https://travis-ci.org/sogou/workflow)

As **Sogou's C++ server engine**, Sogou C++ Workflow supports almost all **back-end C++ online services** of Sogou, including all search services, cloud input method, online advertisements, etc., handling more than **10 billion** requests every day. It is an **enterprise-level programming engine** with a light and elegant design that can satisfy most C++ back-end development requirements.

#### You can use it:

* To quickly build an **HTTP server**:

~~~cpp
#include <stdio.h>
#include "workflow/WFHttpServer.h"

int main()
{
    WFHttpServer server([](WFHttpTask *task) {
        task->get_resp()->append_output_body("<html>Hello World!</html>");
    });

    if (server.start(8888) == 0) { // start server on port 8888
        getchar(); // press "Enter" to end.
        server.stop();
    }

    return 0;
}
~~~

* As a **multifunctional asynchronous client**, it currently supports `HTTP`, `Redis`, `MySQL` and `Kafka` protocols.
* To implement **client/server on user-defined protocol** and build your own **RPC system**.
  * [srpc](https://github.com/sogou/srpc) is an independent open source project built on it, which supports the srpc, brpc, trpc and thrift protocols.
* To build **asynchronous workflows**; it supports the common **series** and **parallel** structures, and also supports any **DAG** structure.
* As a **parallel computing tool**. In addition to **networking tasks**, Sogou C++ Workflow also includes **the scheduling of computing tasks**. All types of tasks can be put into **the same** flow.
* As an **asynchronous file IO tool** on `Linux` systems, with high performance exceeding any system call. Disk file IO is also a task.
* To realize any **high-performance** and **high-concurrency** back-end service with a very complex relationship between computing and networking.
* To build a **micro service** system.
  * This project has built-in **service governance** and **load balancing** features.

#### Compiling and running environment

* This project supports `Linux`, `macOS`, `Windows`, `Android` and other operating systems.
  * The `Windows` version is currently released as an independent [branch](https://github.com/sogou/workflow/tree/windows), using `iocp` to implement asynchronous networking. All user interfaces are consistent with the `Linux` version.
* Supports all CPU platforms, including 32 or 64-bit `x86` processors, big-endian or little-endian `arm` processors.
* Relies on `OpenSSL`; `OpenSSL 1.1` or above is recommended. If you prefer not to use SSL, you may check out the [nossl](https://github.com/sogou/workflow/tree/nossl) branch, but you still need to link `crypto` for `md5` and `sha1`.
* Uses the `C++11` standard, so it must be compiled with a compiler that supports `C++11`. Does not rely on `boost` or `asio`.
* No other dependencies. However, if you need the `Kafka` protocol, some compression libraries should be installed, including `lz4`, `zstd` and `snappy`.

### Get started (Linux, macOS):
~~~sh
$ git clone https://github.com/sogou/workflow
$ cd workflow
$ make
$ cd tutorial
$ make
~~~

# Tutorials

* Client
  * [Creating your first task: wget](docs/en/tutorial-01-wget.md)
  * [Implementing Redis set and get: redis\_cli](docs/en/tutorial-02-redis_cli.md)
  * [More features about series: wget\_to\_redis](docs/en/tutorial-03-wget_to_redis.md)
* Server
  * [First server: http\_echo\_server](docs/en/tutorial-04-http_echo_server.md)
  * [Asynchronous server: http\_proxy](docs/en/tutorial-05-http_proxy.md)
* Parallel tasks and Series
  * [A simple parallel wget: parallel\_wget](docs/en/tutorial-06-parallel_wget.md)
* Important topics
  * [About error](docs/en/about-error.md)
  * [About timeout](docs/en/about-timeout.md)
  * [About global configuration](docs/en/about-config.md)
  * [About DNS](docs/en/about-dns.md)
  * [About exit](docs/en/about-exit.md)
* Computing tasks
  * [Using the built-in algorithm factory: sort\_task](docs/en/tutorial-07-sort_task.md)
  * [User-defined computing task: matrix\_multiply](docs/en/tutorial-08-matrix_multiply.md)
  * [Use computing task in a simple way: go task](docs/en/about-go-task.md)
* Asynchronous File IO tasks
  * [Http server with file IO: http\_file\_server](docs/en/tutorial-09-http_file_server.md)
* User-defined protocol
  * [A simple user-defined protocol: client/server](docs/en/tutorial-10-user_defined_protocol.md)
* Timing tasks and counting tasks
  * [About timer](docs/en/about-timer.md)
  * [About counter](docs/en/about-counter.md)
* Service governance
  * [About service governance](docs/en/about-service-management.md)
  * [More documents about upstream](docs/en/about-upstream.md)
* Connection context
  * [About connection context](docs/en/about-connection-context.md)
* Built-in protocols
  * [Asynchronous MySQL client: mysql\_cli](docs/en/tutorial-12-mysql_cli.md)
  * [Asynchronous Kafka client: kafka\_cli](docs/en/tutorial-13-kafka_cli.md)

#### System design features

We believe that a typical back-end program = protocol + algorithm + workflow, and that these three parts should be developed completely independently.

* Protocol
  * In most cases, users use built-in common network protocols, such as HTTP, Redis or various RPC protocols.
  * Users can also easily define their own network protocol. To do so, they only need to provide serialization and deserialization functions to build their own client/server.
* Algorithm
  * In our design, the algorithm is a concept symmetrical to the protocol.
    * If a protocol call is an RPC, then an algorithm call is an APC (Async Procedure Call).
  * We have provided some general algorithms, such as sort, merge, psort, reduce, which can be used directly.
  * Compared with a user-defined protocol, a user-defined algorithm is much more common. Any complicated computation with clear boundaries should be packaged into an algorithm.
* Workflow
  * Workflow is the actual business logic, which puts the protocols and algorithms into a flow graph for use.
  * The typical workflow is a closed series-parallel graph. Complex business logic may be a non-closed DAG.
  * The workflow graph can be constructed directly or dynamically generated based on the results of each step. All tasks are executed asynchronously.
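
The idea that a flow can grow while it runs can be illustrated with a toy model in plain C++11. This is only a sketch: `ToySeries` and its methods are hypothetical names invented here, not Workflow classes, and the toy runs its steps synchronously on one thread, whereas Workflow executes tasks asynchronously.

~~~cpp
#include <deque>
#include <functional>
#include <utility>

// Toy model only: ToySeries is a hypothetical name, not a Workflow class.
class ToySeries
{
public:
    // A step receives the series it runs in, so it can extend the flow.
    using Step = std::function<void (ToySeries&)>;

    // Appends a step to the end of the series.
    void push_back(Step step) { steps_.push_back(std::move(step)); }

    // Runs the steps in order. A running step may append further steps,
    // which models a graph generated from the results of earlier steps.
    void run()
    {
        while (!steps_.empty())
        {
            Step step = std::move(steps_.front());
            steps_.pop_front();
            step(*this);
        }
    }

private:
    std::deque<Step> steps_;
};
~~~

A step that fetches something can append a follow-up step that stores the result. `run()` then executes both in order, which mirrors choosing the next node of the graph from the outcome of the previous one.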

Basic task, task factory and complex task

* Our system contains six basic tasks: networking, file IO, CPU, GPU, timer, and counter.
* All tasks are generated by the task factory and automatically recycled after callback.
  * Server task is one kind of special networking task, generated by the framework which calls the task factory, and handed over to the user through the process function.
* In most cases, the task generated by the user through the task factory is a complex task, which is transparent to the user.
  * For example, an HTTP request may include many asynchronous processes (DNS, redirection), but for the user, it is just a networking task.
  * File sorting seems to be an algorithm, but it actually includes many complex interaction processes between file IO and CPU computation.
  * If you think of business logic as building circuits with well-designed electronic components, then each electronic component may be a complex circuit.

Asynchrony and encapsulation based on `C++11 std::function`

* Not based on user mode coroutines. Users need to know that they are writing asynchronous programs.
* All calls are executed asynchronously, and there is almost no operation that occupies a thread.
  * Although we also provide some facilities with semi-synchronous interfaces, they are not core features.
* We try not to require users to derive classes, and encapsulate user behavior with `std::function` instead, including:
  * The callback of any task.
  * Any server's process. This conforms to the `FaaS` (Function as a Service) idea.
  * The implementation of an algorithm is simply a `std::function`, although an algorithm can also be implemented by derivation.
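
To make the `std::function` approach concrete, here is a small sketch in plain C++11. The types `SortInput`, `SortOutput`, `SortRoutine` and `SortCallback`, and the functions `sort_ascending` and `run_algorithm`, are assumptions made up for this illustration and are not Workflow's API. The point is only that both the algorithm and the callback are passed as callables, so the user derives no class.

~~~cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical types for illustration; not part of Workflow.
struct SortInput  { std::vector<int> data; };
struct SortOutput { std::vector<int> data; };

// An "algorithm" is just a callable from input to output.
using SortRoutine = std::function<void (const SortInput&, SortOutput&)>;
// A callback receives the finished output.
using SortCallback = std::function<void (const SortOutput&)>;

// A sample algorithm: copy the input and sort it in ascending order.
inline void sort_ascending(const SortInput& in, SortOutput& out)
{
    out.data = in.data;
    std::sort(out.data.begin(), out.data.end());
}

// Runs the routine, then hands the result to the callback.
// This toy runs synchronously; a real framework would schedule the routine.
inline void run_algorithm(const SortRoutine& routine, const SortInput& in,
                          const SortCallback& callback)
{
    SortOutput out;
    routine(in, out);
    callback(out);
}
~~~

Swapping in a different algorithm or a different callback means passing another function or lambda to `run_algorithm`, with no new class hierarchy.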

Memory reclamation mechanism

* Every task will be automatically reclaimed after the callback. If a task has been created but the user does not want to run it, the user needs to release it through the dismiss method.
* Any data in the task, such as the response of a network request, will also be recycled with the task. To keep such data, the user can use `std::move()` to move it out before the task is recycled.
* SeriesWork and ParallelWork are two kinds of framework objects, which are also recycled after their callback.
  * When a series is a branch of a parallel, it will be recycled after the callback of the parallel that it belongs to.
* This project doesn’t use `std::shared_ptr` to manage memory.
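
The ownership rules above can be modeled with a toy class in plain C++11. `ToyTask` is a hypothetical stand-in, not a Workflow class. It only shows the pattern: a heap-allocated task that deletes itself after its callback, a `dismiss()` for tasks that are never started, and a callback that moves data out before the task goes away.

~~~cpp
#include <functional>
#include <string>
#include <utility>

// Toy model only: ToyTask is hypothetical and not a Workflow class.
class ToyTask
{
public:
    using Callback = std::function<void (ToyTask *)>;

    // Tasks are created only on the heap, through this factory function.
    static ToyTask *create(std::string body, Callback callback)
    {
        return new ToyTask(std::move(body), std::move(callback));
    }

    std::string& get_body() { return body_; }

    // Runs the callback, then the task reclaims itself.
    // The pointer must not be used after start() returns.
    void start()
    {
        if (callback_)
            callback_(this);
        delete this;
    }

    // Releases a task that was created but will never be started.
    void dismiss() { delete this; }

private:
    ToyTask(std::string body, Callback callback) :
        body_(std::move(body)), callback_(std::move(callback)) { }
    ~ToyTask() { }  // private: only the task itself may delete it

    std::string body_;
    Callback callback_;
};
~~~

Inside the callback, `std::move(task->get_body())` transfers the string to storage owned by the caller, so the data outlives the task without a copy and without `std::shared_ptr`.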

#### More design documents

To be continued...