License: BSD 3-Clause "New" or "Revised" License


Deep learning toolkit-enabled VLSI placement


Deep learning toolkit-enabled VLSI placement. Building on the analogy between nonlinear VLSI placement and the deep learning training problem, this tool is developed with a deep learning toolkit for flexibility and efficiency. The tool runs on both CPU and GPU. A significant speedup over the CPU implementation (RePlAce) is achieved in global placement and legalization on the ISPD 2005 contest benchmarks with an NVIDIA Tesla V100 GPU. DREAMPlace also integrates a GPU-accelerated detailed placer, ABCDPlace, which achieves a speedup on million-size benchmarks over the widely adopted sequential placer NTUPlace3 on CPU.

DREAMPlace runs on both CPU and GPU. If it is installed on a machine without a GPU, only multi-threaded CPU support is enabled.
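The analogy between placement and training can be made concrete with a toy example. The sketch below is not DREAMPlace's algorithm or API: it treats cell positions as the trainable parameters and a simple quadratic wirelength as the loss, then minimizes it with plain gradient descent in pure Python. The names `place_1d`, `wirelength`, `NETS`, `INITIAL`, and `MOVABLE` are hypothetical and chosen only for illustration.

```python
# Toy 1D placement: positions play the role of network weights and
# wirelength plays the role of the training loss.

# Two-pin nets forming a chain between two fixed pads.
NETS = [("pad_left", "c0"), ("c0", "c1"), ("c1", "c2"), ("c2", "pad_right")]
# Fixed pads sit at 0 and 10; movable cells all start at 0.
INITIAL = {"pad_left": 0.0, "c0": 0.0, "c1": 0.0, "c2": 0.0, "pad_right": 10.0}
MOVABLE = ["c0", "c1", "c2"]


def wirelength(nets, x):
    """Quadratic wirelength: sum of squared distances over two-pin nets."""
    return sum((x[a] - x[b]) ** 2 for a, b in nets)


def place_1d(nets, positions, movable, lr=0.1, steps=500):
    """Minimize the quadratic wirelength by gradient descent on movable cells."""
    x = dict(positions)
    for _ in range(steps):
        grad = {name: 0.0 for name in movable}
        for a, b in nets:
            d = x[a] - x[b]
            # d/dx_a of (x_a - x_b)^2 is 2d; d/dx_b is -2d.
            if a in grad:
                grad[a] += 2.0 * d
            if b in grad:
                grad[b] -= 2.0 * d
        for name in movable:
            x[name] -= lr * grad[name]  # the "optimizer step"
    return x


placed = place_1d(NETS, INITIAL, MOVABLE)
print({name: round(placed[name], 3) for name in MOVABLE})
print("wirelength:", round(wirelength(NETS, placed), 3))
```

In this toy the cells spread evenly between the two pads. A real placer adds a density term to the loss so that cells do not overlap, which is where the density-related options of the tool come in.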

  • Animation (Bigblue4): Density Map, Electric Potential Map, Electric Field Map

  • Reference Flow


Publications

  • Yibo Lin, Shounak Dhar, Wuxi Li, Haoxing Ren, Brucek Khailany and David Z. Pan, "DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for Modern VLSI Placement", ACM/IEEE Design Automation Conference (DAC), Las Vegas, NV, Jun 2-6, 2019

  • Yibo Lin, Zixuan Jiang, Jiaqi Gu, Wuxi Li, Shounak Dhar, Haoxing Ren, Brucek Khailany and David Z. Pan, "DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for Modern VLSI Placement", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2020 (accepted)

  • Yibo Lin, Wuxi Li, Jiaqi Gu, Haoxing Ren, Brucek Khailany and David Z. Pan, "ABCDPlace: Accelerated Batch-based Concurrent Detailed Placement on Multi-threaded CPUs and GPUs", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2020 (accepted)


Dependency

  • Python 3.5/3.6/3.7

  • PyTorch 1.0.0

    • Other versions around 1.0.0 may also work, but have not been tested.
  • GCC

    • GCC 5.1 or later is recommended.
    • Other compilers may also work, but have not been tested.
  • Boost

    • Must be installed and visible for linking.
  • Limbo

    • Integrated as a git submodule
  • Flute

    • Integrated as a git submodule
  • CUB

    • Integrated as a git submodule
  • munkres-cpp

    • Integrated as a git submodule
  • CUDA 9.1 or later (Optional)

    • If installed and found, GPU acceleration will be enabled.
    • Otherwise, only CPU implementation is enabled.
  • GPU compute capability 6.0 or later (Optional)

    • The code has been tested on GPUs with compute capability 6.0, 7.0, and 7.5.
    • Please check the compute capability of your GPU devices.
    • The default compilation target is compute capability 6.0. This is the minimum requirement; lower compute capabilities are not supported for the GPU feature.
    • For compute capability 7.0, set CMAKE_CUDA_FLAGS to -gencode=arch=compute_70,code=sm_70.
  • Cairo (Optional)

    • If installed and found, the plotting functions run faster by using the C/C++ implementation.
    • Otherwise, the Python implementation is used.
  • NTUPlace3 (Optional)

    • If the binary is provided, it can be used to perform detailed placement.
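
The GPU architecture requirement in the list above can be checked from Python before building. The following is a minimal sketch, not part of DREAMPlace: `gencode_flag` and `detect_capability` are hypothetical helper names, and the detection step assumes PyTorch is importable. `torch.cuda.get_device_capability` returns a `(major, minor)` tuple for the selected device.

```python
def gencode_flag(capability):
    """Build an NVCC -gencode flag from a (major, minor) capability tuple."""
    major, minor = capability
    if (major, minor) < (6, 0):
        # 6.0 is the minimum stated in the dependency list.
        raise ValueError(f"GPU capability {major}.{minor} is below the 6.0 minimum")
    tag = f"{major}{minor}"
    return f"-gencode=arch=compute_{tag},code=sm_{tag}"


def detect_capability(device_index=0):
    """Return the capability of one GPU, or None if PyTorch or CUDA is missing."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device_index)


capability = detect_capability()
if capability is None:
    print("No usable GPU detected; only the CPU implementation would be enabled.")
else:
    print("-DCMAKE_CUDA_FLAGS=" + gencode_flag(capability))
```

The printed value is meant to be passed to CMake by hand; the helper only formats the string and does not verify that the installed CUDA toolkit supports that architecture.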

To pull the git submodules, run the following in the root directory.

git submodule init
git submodule update

Alternatively, pull all the submodules when cloning the repository.

git clone --recursive

How to Install Python Dependency

Go to the root directory.

pip install -r requirements.txt 

How to Build

Two options are provided for building: with and without Docker.

Build with Docker

You can use the Docker container to avoid building all the dependencies yourself.

1. Install Docker on Windows, Mac or Linux.
2. To enable the GPU features, install NVIDIA-docker; otherwise, skip this step.
3. Navigate to the repository.
4. Get the Docker container with either of the following options.
  • Option 1: pull from the cloud limbo018/dreamplace.

    docker pull limbo018/dreamplace:cuda
  • Option 2: build the container.

    docker build . --file Dockerfile --tag your_name/dreamplace:cuda
5. Enter the bash environment of the container. If option 2 was chosen in the previous step, replace the image owner in the commands below with your name.
  • Run with GPU on Linux.

    docker run --gpus 1 -it -v $(pwd):/DREAMPlace limbo018/dreamplace:cuda bash
  • Run with GPU on Windows.

    docker run --gpus 1 -it -v /dreamplace limbo018/dreamplace:cuda bash
  • Run without GPU on Linux.

    docker run -it -v $(pwd):/DREAMPlace limbo018/dreamplace:cuda bash
  • Run without GPU on Windows.

    docker run -it -v /dreamplace limbo018/dreamplace:cuda bash
6. Inside the container, change to the DREAMPlace directory.

    cd /DREAMPlace
7. Go to the next section to complete building.

Build without Docker

CMake is used as the build system. To build, go to the root directory.

mkdir build 
cd build 
cmake .. -DCMAKE_INSTALL_PREFIX=your_install_path
make install

Third party submodules are automatically built except for Boost.

To clean, go to the root directory.

rm -r build

Here are the available options for CMake.

  • CMAKE_INSTALL_PREFIX: installation directory

    • Example

      cmake -DCMAKE_INSTALL_PREFIX=path/to/your/directory
  • CMAKE_CUDA_FLAGS: custom string for NVCC (default -gencode=arch=compute_60,code=sm_60)

    • Example

      cmake -DCMAKE_CUDA_FLAGS=-gencode=arch=compute_60,code=sm_60
  • CMAKE_CXX_ABI: 0|1 for the value of _GLIBCXX_USE_CXX11_ABI for the C++ compiler; the default is 0.

    • It must be consistent with the _GLIBCXX_USE_CXX11_ABI used for compiling all the C++ dependencies, such as Boost and PyTorch.
    • PyTorch is compiled with _GLIBCXX_USE_CXX11_ABI=0 by default, but in a customized PyTorch environment it might be compiled with _GLIBCXX_USE_CXX11_ABI=1.
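
Because the C++ ABI setting has to match the one PyTorch was built with, it can help to query PyTorch before configuring. The sketch below is a hedged illustration, not a script shipped with DREAMPlace: `build_cmake_command` and `pytorch_cxx_abi` are hypothetical helpers, the option names assume the underscore-separated spellings `CMAKE_INSTALL_PREFIX`, `CMAKE_CUDA_FLAGS`, and `CMAKE_CXX_ABI`, and the ABI query relies on `torch.compiled_with_cxx11_abi()`.

```python
def pytorch_cxx_abi():
    """Return 0 or 1 for the ABI PyTorch was built with, or None without PyTorch."""
    try:
        import torch
    except ImportError:
        return None
    return int(torch.compiled_with_cxx11_abi())


def build_cmake_command(install_prefix, cuda_flags=None, cxx_abi=None):
    """Assemble the cmake argument list to run from inside the build directory."""
    args = ["cmake", "..", f"-DCMAKE_INSTALL_PREFIX={install_prefix}"]
    if cuda_flags is not None:
        args.append(f"-DCMAKE_CUDA_FLAGS={cuda_flags}")
    if cxx_abi is not None:
        if cxx_abi not in (0, 1):
            raise ValueError("cxx_abi must be 0 or 1")
        args.append(f"-DCMAKE_CXX_ABI={int(cxx_abi)}")
    return args


# Example: print the command instead of running it, so nothing is built here.
command = build_cmake_command("path/to/your/directory", cxx_abi=pytorch_cxx_abi())
print(" ".join(command))
```

Printing the command rather than executing it keeps the sketch side-effect free; the list form can be handed to `subprocess.run` once the values look right.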

How to Get Benchmarks

To get ISPD 2005 benchmarks, run the following script from the directory.

python benchmarks/

How to Run

Before running, make sure the benchmarks have been downloaded and the Python dependency packages have been installed. Go to the install directory and run with a JSON configuration file for full placement.

python dreamplace/ test/ispd2005/adaptec1.json

Test an individual PyTorch op with its unit test under the unitest directory, run from the root directory.

python unitest/ops/


Configurations

Descriptions of the options in the JSON configuration file can be found by running the following command.

python dreamplace/ --help

The following list of options will be shown.

| JSON Parameter | Default | Description |
| -------------------------------- | ----------------------- | ------------------------------------------------------------ |
| aux_input | required for Bookshelf | input .aux file |
| lef_input | required for LEF/DEF | input LEF file |
| def_input | required for LEF/DEF | input DEF file |
| verilog_input | optional for LEF/DEF | input VERILOG file, provide circuit netlist information if it is not included in DEF file |
| gpu | 1 | enable gpu or not |
| num_bins_x | 512 | number of bins in horizontal direction |
| num_bins_y | 512 | number of bins in vertical direction |
| global_place_stages | required | global placement configurations of each stage, a dictionary of {"num_bins_x", "num_bins_y", "iteration", "learning_rate"}, learning_rate is relative to bin size |
| target_density | 0.8 | target density |
| density_weight | 1.0 | initial weight of density cost |
| gamma | 0.5 | initial coefficient for log-sum-exp and weighted-average wirelength |
| random_seed | 1000 | random seed |
| result_dir | results | result directory for output |
| scale_factor | 0.0 | scale factor to avoid numerical overflow; 0.0 means not set |
| ignore_net_degree | 100 | ignore net degree larger than some value |
| gp_noise_ratio | 0.025 | noise to initial positions for global placement |
| enable_fillers | 1 | enable filler cells |
| global_place_flag | 1 | whether use global placement |
| legalize_flag | 1 | whether use internal legalization |
| detailed_place_flag | 1 | whether use internal detailed placement |
| stop_overflow | 0.1 | stopping criteria, consider stop when the overflow reaches to a ratio |
| dtype | float32 | data type, float32 or float64 |
| detailed_place_engine | | external detailed placement engine to be called after placement |
| detailed_place_command | -nolegal -nodetail | commands for external detailed placement engine |
| plot_flag | 0 | whether plot solution or not |
| RePlAce_ref_hpwl | 350000 | reference HPWL used in RePlAce for updating density weight |
| RePlAce_LOWER_PCOF | 0.95 | lower bound ratio used in RePlAce for updating density weight |
| RePlAce_UPPER_PCOF | 1.05 | upper bound ratio used in RePlAce for updating density weight |
| random_center_init_flag | 1 | whether perform random initialization around the center for global placement |
| sort_nets_by_degree | 0 | whether sort nets by degree or not |
| num_threads | 8 | number of CPU threads |
| dump_global_place_solution_flag | 0 | whether dump intermediate global placement solution as a compressed pickle object |
| dump_legalize_solution_flag | 0 | whether dump intermediate legalization solution as a compressed pickle object |
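
To show how these options fit together, here is a small sketch that assembles a Bookshelf-style configuration and checks the "required" entries before writing JSON. It is an illustration under assumptions rather than an official template: `make_config` and `validate_config` are hypothetical helpers, the parameter names assume the underscore-separated spellings, the `.aux` path is a placeholder, the `iteration` and `learning_rate` values are arbitrary, and representing `global_place_stages` as a list with one dictionary per stage is an assumption.

```python
import json


def make_config(aux_input, gpu=1, iteration=1000, learning_rate=0.01):
    """Build a single-stage configuration using the documented defaults."""
    return {
        "aux_input": aux_input,
        "gpu": gpu,
        "num_bins_x": 512,
        "num_bins_y": 512,
        # One dictionary per global placement stage (assumed layout).
        "global_place_stages": [
            {
                "num_bins_x": 512,
                "num_bins_y": 512,
                "iteration": iteration,          # placeholder value
                "learning_rate": learning_rate,  # placeholder value
            }
        ],
        "target_density": 0.8,
        "stop_overflow": 0.1,
        "dtype": "float32",
        "num_threads": 8,
    }


def validate_config(config):
    """Check the entries the option list marks as required."""
    has_bookshelf = "aux_input" in config
    has_lefdef = "lef_input" in config and "def_input" in config
    if not (has_bookshelf or has_lefdef):
        raise ValueError("need aux_input, or both lef_input and def_input")
    if not config.get("global_place_stages"):
        raise ValueError("global_place_stages is required")
    if config.get("dtype", "float32") not in ("float32", "float64"):
        raise ValueError("dtype must be float32 or float64")


config = make_config("path/to/design.aux")  # hypothetical input path
validate_config(config)
print(json.dumps(config, indent=4))
```

Options left out of the dictionary are expected to fall back to the defaults listed in the table, so a configuration only needs the required inputs plus whatever is being overridden.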


Authors

  • Yibo Lin, supervised by David Z. Pan, composed the initial release.
  • Zixuan Jiang and Jiaqi Gu improved the efficiency of the wirelength and density operators on GPU.
  • Yibo Lin and Jiaqi Gu developed and integrated ABCDPlace for detailed placement.
  • Pull requests to improve the tool are more than welcome. We appreciate all kinds of contributions from the community.


Features

  • 0.0.2

    • Multi-threaded CPU and optional GPU acceleration support
  • 0.0.5

    • Net weighting support through .wts files in Bookshelf format
    • Incremental placement support
  • 0.0.6

    • LEF/DEF support as input/output
    • Python binding and access to C++ placement database
  • 1.0.0

    • Improved efficiency for wirelength and density operators from TCAD extension
  • 1.1.0

    • Docker container for building environment
  • 2.0.0

    • Integrate ABCDPlace: multi-threaded CPU and GPU acceleration for detailed placement
    • Support independent set matching, local reordering, and global swap with run-to-run determinism on one machine
    • Support movable macros with Tetris-like macro legalization and min-cost flow refinement
  • 2.1.0

    • Support deterministic mode to ensure run-to-run determinism with minor runtime overhead
  • 2.2.0

    • Integrate routability optimization relying on NCTUgr from TCAD extension
    • Improved robustness on parallel CPU version
