PaddlePaddle/Serving

A flexible, high-performance carrier for machine learning models (the serving deployment framework of PaddlePaddle, 『飞桨』)

286 Stars 78 Forks Last release: 3 months ago (v0.3.0) Apache License 2.0 3.2K Commits 7 Releases





We consider deploying a deep learning inference service online to be a user-facing application of the future. The goal of this project: when you have trained a deep neural net with Paddle, you are also able to deploy the model online easily. A demo of Paddle Serving is as follows:


We highly recommend that you run Paddle Serving in Docker; please visit Run in Docker. See the document for more Docker images.

Run CPU Docker

```
# the CPU serving image name was omitted from this page; substitute your image
docker pull <serving-cpu-image>
docker run -p 9292:9292 --name test -dit <serving-cpu-image>
docker exec -it test bash
```

Run GPU Docker

```
# the GPU serving image name was omitted from this page; substitute your image
nvidia-docker pull <serving-gpu-image>
nvidia-docker run -p 9292:9292 --name test -dit <serving-gpu-image>
nvidia-docker exec -it test bash
```

```
pip install paddle-serving-client
pip install paddle-serving-server      # CPU
pip install paddle-serving-server-gpu  # GPU
```

You may need to use a domestic mirror source (in China, you can add the Tsinghua mirror source's index URL to the pip command) to speed up the download.

If you need to install modules compiled from the develop branch, please download the packages from the latest packages list and install them with `pip install`.

Packages of paddle-serving-server and paddle-serving-server-gpu support CentOS 6/7 and Ubuntu 16/18.

Packages of paddle-serving-client and paddle-serving-app support Linux and Windows, but paddle-serving-client only supports Python 2.7/3.6/3.7.

We recommend installing paddle >= 1.8.2.
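The paddle >= 1.8.2 recommendation can be checked programmatically before deployment; a minimal sketch (the `version_at_least` helper is our illustration, not part of Paddle Serving):

```python
def version_at_least(installed: str, required: str = "1.8.2") -> bool:
    """Compare dotted version strings numerically, so '1.8.10' >= '1.8.2'."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(installed) >= parse(required)

# in practice, paddle.__version__ would supply the installed string
print(version_at_least("1.8.10"))  # True: (1, 8, 10) >= (1, 8, 2)
```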

Pre-built services with Paddle Serving

Latest release

Optical Character Recognition
Object Detection
Image Segmentation

Chinese Word Segmentation
```
python -m paddle_serving_app.package --get_model lac
tar -xzf lac.tar.gz
# the web-service script name and the request URL were omitted from this page
python lac_model/ lac_workdir 9393 &
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}'
```

Image Classification

```
python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
tar -xzf resnet_v2_50_imagenet.tar.gz
# the serving script name, image payload, and request URL were omitted from this page
python resnet50_serving_model &
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": ""}], "fetch": ["score"]}'
```

Quick Start Example

This quick start example is only for users who already have a model to deploy; we prepare a ready-to-deploy model here. If you want to know how to use Paddle Serving from offline training to online serving, please refer to TrainToService.

Boston House Price Prediction model

```
# the download URL was omitted from this page
wget --no-check-certificate
tar -xzf uci_housing.tar.gz
```

Paddle Serving provides HTTP- and RPC-based services for users to access.

HTTP service

Paddle Serving provides a built-in python module called `paddle_serving_server.serve` that can start an RPC service or an HTTP service with a one-line command. If we specify the argument `--name uci`, it means that we will have an HTTP service whose request URL is derived from the service name:

```
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
```

| Argument | Type | Default | Description |
|--------------------|------|---------|--------------------------------------------------------|
| `thread` | int | | Concurrency of the current service |
| `port` | int | | Port of the current service exposed to users |
| `name` | str | | Service name; used to generate the HTTP request URL |
| `model` | str | | Path of the Paddle model directory to be served |
| | - | - | Disable memory / graphic memory optimization |
| | - | - | Enable analysis and optimization of the computation graph |
| (CPU version only) | - | - | Run inference with MKL |
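The arguments above assemble into the one-line serve command used throughout this README; a small sketch that builds it (the `serve_cmd` helper is illustrative, not part of the package):

```python
def serve_cmd(model, thread=10, port=9292, name=None):
    """Build the `python -m paddle_serving_server.serve` command line."""
    cmd = ["python", "-m", "paddle_serving_server.serve",
           "--model", model, "--thread", str(thread), "--port", str(port)]
    if name is not None:
        cmd += ["--name", name]  # omit --name for an RPC-only service
    return cmd

print(" ".join(serve_cmd("uci_housing_model", name="uci")))
# python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
```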

Here, we use `curl` to send an HTTP POST request to the service we just started. Users can use any python library to send HTTP POST as well, e.g. requests.

```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}'
```
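The same POST can be sent from Python with requests; a sketch, where the endpoint URL is an assumption pieced together from the port and `--name uci` used earlier (the `build_request` and `predict_price` helpers are ours):

```python
import json

def build_request(x, fetch=("price",)):
    """Shape the JSON body exactly like the curl example above."""
    return {"feed": [{"x": list(x)}], "fetch": list(fetch)}

def predict_price(x, url="http://127.0.0.1:9292/uci/prediction"):
    # URL is an assumption: host/port from the serve command, path from --name uci
    import requests  # third-party: pip install requests
    r = requests.post(url, json=build_request(x))
    return r.json()

body = build_request([0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
                      -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332])
print(json.dumps(body))
```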

RPC service

A user can also start an RPC service with `paddle_serving_server.serve`. An RPC service is usually faster than an HTTP service, although the user needs to do some coding based on Paddle Serving's python client API. Note that we do not specify `--name` here:

```
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
```
```
# A user can visit the RPC service through the paddle_serving_client API
from paddle_serving_client import Client

client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect([""])  # the server endpoint was omitted from this page
data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
        -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
fetch_map = client.predict(feed={"x": data}, fetch=["price"])
print(fetch_map)
```


The `predict` function has two arguments. `feed` is a python dict mapping the model's input variable alias names to values. `fetch` assigns the prediction variables to be returned from the server. In the example, the alias names used in `feed` (`"x"`) and `fetch` (`"price"`) are assigned when the servable model is saved during training.
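To make the alias-name idea concrete, here is a sketch of the round trip with `client.predict` replaced by a stand-in (`fake_predict` is ours; a real server would return the actual predicted values):

```python
def fake_predict(feed, fetch):
    """Stand-in for client.predict: the result dict is keyed by the fetch names."""
    assert isinstance(feed, dict) and isinstance(fetch, list)
    return {name: [0.0] for name in fetch}  # placeholder values, not real output

fetch_map = fake_predict(feed={"x": [0.0137, -0.1136]}, fetch=["price"])
print(sorted(fetch_map))  # ['price']
```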

Some Key Features of Paddle Serving

  • Integrates with the Paddle training pipeline seamlessly; most Paddle models can be deployed with a one-line command.
  • Industrial serving features are supported, such as model management, online loading, and online A/B testing.
  • Distributed key-value indexing is supported, which is especially useful for large-scale sparse features as model inputs.
  • Highly concurrent and efficient communication between clients and servers.
  • Multiple programming languages are supported on the client side, such as Golang, C++, and Python.


New to Paddle Serving

Tutorial at AIStudio


About Efficiency





To connect with other users and contributors, you are welcome to join our Slack channel.


If you want to contribute code to Paddle Serving, please refer to the Contribution Guidelines.


For any feedback or to report a bug, please propose a GitHub Issue.


Apache 2.0 License
