mirror of
https://github.com/sogou/workflow.git
synced 2026-02-08 01:33:17 +08:00
fccfc36a227da7e58d2490593b797b5c73fbbcda
Sogou C++ Workflow
As the backend C++ programming standard in Sogou, Workflow is an industrial-grade programming engine.
Main functions and features:
- An asynchronous engine based on C++11 std::function, which aims to solve all the serial, parallel and asynchronous problems.
- As a network framework, it is completely protocol-agnostic and directly facing applications.
- It can either be used as a Redis client or an Http server.
- Convenient to customize protocols, so you can quickly build your own RPC systems.
- Sogou RPC is developed based on Sogou Workflow and is open source as an independent project. The project supports srpc, brpc and thrift protocols (benchmark).
- SSL is supported (depends on OpenSSL). TCP, UDP, SCTP and other common transport layer protocols are supported, as is SSL over SCTP. UDP servers are not supported.
- Natively contains a variety of common Internet protocol implementations which are used in a unified way.
- Currently support http, redis, mysql and kafka protocols. You can directly access these resources or build servers for these protocols.
- Highly likely the only C++ full-featured mysql asynchronous client on the market.
- The DNS protocol is still being developed; currently we use the system library to access DNS.
- Powerful feature for scheduling computing tasks
- Computing tasks, as well as communication tasks, can be added into the task flow, and each kind is scheduled separately by its corresponding scheduler.
- You can use it as a parallel programming engine without the network features.
- Our biggest goal is to maximize the performance of every node when the calculation and communication environment is very complex.
- Some common algorithm implementations are provided, such as parallel sorting and MapReduce.
- In fact, all asynchronous processes (such as disk IO, GPU tasks, timers, etc.) can be scheduled in coordination.
- On Linux, disk IO tasks are implemented on top of the underlying Linux aio interface, which is extremely efficient.
- Support any task flow with DAG structure. However, in most cases, users only need series-parallel structure.
- Built-in load balancing and powerful service governance features.
- Easily used in conjunction with other asynchronous engines.
- Streaming communication engine is being developed.
- When working as a server, it supports multi-process mode and precise graceful restart.
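The series-parallel task flow mentioned in the list above can be illustrated with a small, standard-library-only sketch. Everything here (`Done`, `Component`, `step`, `series`, `parallel`, `demo_trace`) is a hypothetical name invented for illustration and is not Workflow's API. The sketch is single-threaded, so parallel branches simply run one after another, and it uses `std::shared_ptr` only for brevity, whereas the design notes in this README state that Workflow itself avoids it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not Workflow's API.
using Done = std::function<void()>;           // completion signal
using Component = std::function<void(Done)>;  // does work, then calls Done once
using Parts = std::shared_ptr<std::vector<Component>>;

// Leaf component: records its name, then signals completion.
Component step(std::vector<std::string> &trace, std::string name) {
    return [&trace, name](Done done) {
        trace.push_back(name);
        done();
    };
}

// Runs parts[index] and continues with the next part once it completes.
void run_series(Parts parts, std::size_t index, Done done) {
    assert(index <= parts->size());
    if (index == parts->size()) {
        done();
        return;
    }
    (*parts)[index]([parts, index, done]() { run_series(parts, index + 1, done); });
}

// Series: the next component starts only after the previous one has finished.
Component series(std::vector<Component> parts) {
    Parts shared = std::make_shared<std::vector<Component>>(std::move(parts));
    return [shared](Done done) { run_series(shared, 0, std::move(done)); };
}

// Parallel: all branches are started; Done fires when the last one finishes.
// In this single-threaded sketch the branches run one after another.
Component parallel(std::vector<Component> parts) {
    Parts shared = std::make_shared<std::vector<Component>>(std::move(parts));
    return [shared](Done done) {
        if (shared->empty()) {
            done();
            return;
        }
        auto remaining = std::make_shared<std::size_t>(shared->size());
        for (const Component &part : *shared) {
            part([remaining, done]() {
                if (--*remaining == 0)
                    done();
            });
        }
    };
}

// Builds fetch -> (sort || resize) -> reply and returns the execution order.
std::vector<std::string> demo_trace() {
    std::vector<std::string> trace;
    Component flow = series({
        step(trace, "fetch"),
        parallel({step(trace, "sort"), step(trace, "resize")}),
        step(trace, "reply"),
    });
    flow([&trace]() { trace.push_back("done"); });
    return trace;
}
```

Because a composed flow has the same `Component` type as a leaf, a whole sub-flow can be nested wherever a single component is expected, which is what makes series-parallel composition sufficient for most cases.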
Building
- Support Linux, macOS, FreeBSD, Windows and other systems so far. Installing cmake is necessary.
- The Windows version is temporarily released as an independent branch, which uses IOCP as the basis for asynchronous communication while keeping the same external interface.
- As it is written in C/C++, users are expected to be proficient in C++ programming. It does not rely on boost or asio, therefore it compiles very quickly.
- It uses a few C++11 features, so users should be able to use std::function and std::move.
- Theoretically, all CPU architectures are supported, and it can be compiled and run on 32-bit or 64-bit ARM processors. Big-endian CPUs are not tested.
- OpenSSL is required. If users expect high performance of SSL, OpenSSL 1.1 or higher is strongly recommended.
- No other dependencies. Several compression libraries such as snappy and lz4 are included as unmodified source (required by the Kafka protocol).
Some design features
- The basic usage is very simple and handy. Some features are designed to greatly reduce the difficulty of programming with general C++ projects.
- To spare users from deriving classes as much as possible, all user behaviors are wrapped with std::function, for example:
- the callback after every task ends
- the algorithms in computing tasks
- one server corresponds to one std::function
- To avoid complicated memory management, all tasks and frameworks are generated by factory classes, and their memory is recycled automatically. This means:
- Every task is automatically deleted after its callback.
- If the users want to keep any data in the task (such as a network reply packet or the result of an algorithm), they need to use std::move to move it out.
- We treat memory recycling as a strict and naturally logical mechanism, so we don’t use shared_ptr.
- Avoid using complicated parameter configuration.
- Actually we have a lot of configurable parameters, though you can use our system without feeling the existence of parameters.
- If you have specific requirements for program behavior and resource ratio, you can definitely find the corresponding parameter configuration items in order to maximize the performance of your program.
- The project adopts a fully asynchronous design and is not transparent to users, which means users need to know that they are writing asynchronous programs.
- Thanks to the convenience brought by std::function and the automatic memory recycling mechanism, we have carefully designed the simplest possible asynchronous usage for users.
- No concept of user-mode threads. On the one hand, performance is considered. On the other hand, we have the concept of scheduling computing tasks (threaded tasks).
- In our design, computing is one kind of asynchronous task, which has no differences from communication.
- Computing tasks are scheduled by independent thread groups according to specific algorithms. Please note that they may not be executed immediately.
- As we have such computing tasks, user-mode threads become meaningless, and therefore users must understand asynchrony.
- Because of the full asynchrony, almost all core calls are short and non-blocking operations.
- That’s why we don’t recommend that users block in callbacks or do complex calculations there. However, it is acceptable if the logic is quite simple.
- Brief summary of the usage:
* The users can build the program just like building a series-parallel circuit. The circuit can be generated at the beginning or dynamically generated during the program running.
* We provide various electronic components for users. For instance, one http request, one GPU matrix multiplication, and one parallel sort can each be understood as an electronic component.
* Every electronic component has its standard input and output. At the same time, every electronic component can itself be a complicated circuit, which users do not need to be aware of.
* For example, an http request may go through multiple asynchronous processes such as DNS, redirect, and retry, but the entire process is just one component from the users' perspective.
* Users can easily define their own components, including algorithms and some kind of communication.
- Implementing stateless protocols is extremely simple. It may be a little more complicated when the protocol includes login, library selection, etc.; in that case, you can refer to the redis implementation.
* Through the powerful Upstream system, complex service governance can be realized, such as communication node selection, load balancing, circuit breaking and recovery, master-slave, etc.
* In conclusion, this is an enterprise-level, elegantly designed asynchronous framework which can cover almost all high-performance back-end service requirements.
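The memory rules described in the design features (factory-created tasks, a std::function callback, automatic deletion after the callback, and std::move to keep data) can be sketched with the standard library alone. `EchoTask`, `create`, `live_count` and `run_echo` are hypothetical names made up for this illustration, not Workflow classes or functions; the real factory and task types should be looked up in the tutorials listed below.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical task type for illustration; not part of Workflow's API.
class EchoTask {
public:
    using Callback = std::function<void(EchoTask *)>;

    // Factory function: the only way to obtain a task (always on the heap).
    static EchoTask *create(std::string input, Callback callback) {
        return new EchoTask(std::move(input), std::move(callback));
    }

    // Produce the result, hand the task to the callback, then free the task.
    void start() {
        assert(!started_ && "a task may be started only once");
        started_ = true;
        output_ = "echo: " + input_;
        if (callback_)
            callback_(this);
        delete this;  // recycled right after the callback returns
    }

    // Non-const access so the callback can move the result out.
    std::string &output() { return output_; }

    // Number of tasks currently alive, used here to observe recycling.
    static int live_count() { return live_; }

private:
    EchoTask(std::string input, Callback callback)
        : input_(std::move(input)), callback_(std::move(callback)) {
        ++live_;
    }
    ~EchoTask() { --live_; }

    std::string input_;
    std::string output_;
    Callback callback_;
    bool started_ = false;
    static int live_;
};

int EchoTask::live_ = 0;

// Runs one task and keeps its result beyond the task's lifetime.
std::string run_echo(const std::string &text) {
    std::string kept;
    EchoTask *task = EchoTask::create(text, [&kept](EchoTask *t) {
        kept = std::move(t->output());  // the task is deleted after this returns
    });
    task->start();
    return kept;
}
```

The private constructor and destructor force every task through the factory and prevent callers from deleting it themselves, so the only way to keep a result is to move it out inside the callback, as the list above describes.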
Tutorials:
- Client
- Server
- Parallel task and Series
- Important topics
- Computing tasks
- File asynchronous IO tasks
- User-defined protocol basic usage
- Timing tasks and counting tasks
- Service governance
- Connection context
- Built-in protocols
Authors
- Xie Han - xiehan@sogou-inc.com
- Wu Jiaxu - wujiaxu@sogou-inc.com
- Li Yingxin - liyingxin@sogou-inc.com