TopNet-Object-Placement

This is an unofficial implementation of the paper "TopNet: Transformer-based Object Placement Network for Image Compositing", CVPR 2023. Its idea is similar to our earlier work FOPA.

Setup

All the code has been tested on PyTorch 1.7.0. Follow the instructions below to run the project.

First, clone the repository:

git clone git@github.com:bcmi/TopNet-Object-Placement.git

Then, install Anaconda and create a virtual environment:

conda create -n TopNet
conda activate TopNet

Install PyTorch 2.0.1 (higher versions should be fine):

conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia

Install necessary packages:

pip install -r requirements.txt
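
After installation, a quick sanity check can confirm that the environment sees PyTorch and a GPU. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and the package list only covers what the install command above names.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: only the packages named in the conda command are checked;
    # extend the list with whatever requirements.txt pulls in.
    missing = missing_packages(["torch", "torchvision"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        import torch

        print("PyTorch", torch.__version__,
              "| CUDA available:", torch.cuda.is_available())
```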

Data Preparation

Download and extract the data from Baidu Cloud (access code: 4zf9) or Google Drive. Download the SOPA encoder from Baidu Cloud (access code: 1x3n) or Google Drive. Put both in "data/data", which should then contain the following directories and files:

<data/data>
  bg/                         # background images
  fg/                         # foreground images
  mask/                       # foreground masks
  train(test)_pair_new.json   # json annotations 
  train(test)_pair_new.csv    # csv files
  SOPA.pth.tar                # SOPA encoder
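
To catch a misplaced download before training starts, the layout above can be checked with a few lines of Python. This is a sketch under one assumption: that "train(test)_pair_new.*" is shorthand for separate train_ and test_ files. The helper `find_missing` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Assumption: "train(test)_pair_new.*" expands to one train_ and one test_ file.
EXPECTED_DIRS = ["bg", "fg", "mask"]
EXPECTED_FILES = [
    "train_pair_new.json",
    "test_pair_new.json",
    "train_pair_new.csv",
    "test_pair_new.csv",
    "SOPA.pth.tar",
]


def find_missing(data_root):
    """Return the expected directories and files that are absent under data_root."""
    root = Path(data_root)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    missing = find_missing("data/data")
    print("Data layout OK" if not missing else "Missing: " + ", ".join(missing))
```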

Training

Before training, modify "config.py" to suit your needs. After that, run:

python train.py

You can download our pretrained model from Baidu Cloud (access code: jx6u) or OneDrive.

Test

To get the F1 score and balanced accuracy of a specified model, run:

python test.py --load_path <PATH_TO_MODEL> 
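
For reference, both metrics follow from the binary confusion matrix. The function below is a minimal sketch of the standard definitions, assuming labels and predictions are 0/1 values; it is not taken from test.py, which may compute them differently (for example, after thresholding the predicted score maps).

```python
def f1_and_balanced_accuracy(labels, preds):
    """Compute F1 and balanced accuracy for binary (0/1) labels and predictions."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)

    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

    # Balanced accuracy is the mean of the true positive and true negative rates.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return f1, (tpr + tnr) / 2
```

Balanced accuracy averages the per-class recall, so it is not inflated when one class is far more frequent than the other.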

Evaluation on Discriminative Task

The table below compares our results on the discriminative task with SOPA and FOPA.

| Method | F1    | bAcc  |
|--------|-------|-------|
| SOPA   | 0.780 | 0.842 |
| FOPA   | 0.776 | 0.840 |
| TopNet | 0.741 | 0.815 |

Evaluation on Generation Task

Following FOPA, for each background-foreground pair in the test set, we predict 16 rationality score maps, one for each of 16 foreground scales, and generate the composite images with the top 50 rationality scores. We then randomly sample one of these 50 composite images per background-foreground pair for Acc and FID evaluation, using the test scripts provided by GracoNet.

| Method   | Acc   | FID   |
|----------|-------|-------|
| TERSE    | 0.679 | 46.94 |
| PlaceNet | 0.683 | 36.69 |
| GracoNet | 0.847 | 27.75 |
| IOPRE    | 0.895 | 21.59 |
| FOPA     | 0.932 | 19.76 |
| TopNet   | 0.910 | 23.49 |
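
The selection step used for this evaluation can be sketched as follows. This is an illustration, not the repository's code: it assumes each score map is a 2D grid of rationality scores and that the top 50 are taken jointly across all 16 scales. The function names are hypothetical.

```python
import heapq
import random


def top_k_placements(score_maps, k=50):
    """Return the k highest-scoring (score, scale_index, y, x) tuples.

    score_maps: a list of 2D lists, one rationality score map per foreground scale.
    Assumption: candidates from all scales compete in a single ranking.
    """
    candidates = (
        (score, s, y, x)
        for s, score_map in enumerate(score_maps)
        for y, row in enumerate(score_map)
        for x, score in enumerate(row)
    )
    return heapq.nlargest(k, candidates)


def sample_placement(score_maps, k=50, rng=random):
    """Randomly pick one of the top-k placements, as done per pair for Acc and FID."""
    return rng.choice(top_k_placements(score_maps, k))
```

Each returned tuple identifies the foreground scale and location at which a composite image would be generated; passing a seeded `random.Random` as `rng` makes the sampling reproducible.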

Other Resources