reproduction questions #9

Open
haodong2000 opened this issue May 2, 2024 · 0 comments

Hi authors, pioneering work!

I tried to reproduce the wizard_study case by simply running bash scripts/wizard_study.sh, with only TG modified. Please see the result below:

it20000-test_G-rgb.mp4

I also asked GPT again to generate the per-object prompts, the per-relation prompts, and the negative prompts (each negative prompt describes all objects except the current one). Here is the bash script:

wizard_modified.sh

export P="A Wizard standing in front of a Wooden Desk, gazing into a Crystal Ball perched atop the Wooden Desk, with a Stack of Ancient Spell Books perched atop the Wooden Desk."
export NP="ugly, bad anatomy, blurry, pixelated obscure, unnatural colors, poor lighting, dull, and unclear, cropped, lowres, low quality, artifacts, duplicate, morbid, mutilated, poorly drawn face, deformed, dehydrated, bad proportions"

export P1="'Wizard: A wizard with a cloak and a wizard hat is standing upright, with his eyes fixed at a certain distance.'"
export P2="'Wooden Desk: A sturdy wooden desk with a rich, dark brown color. It has organizational compartments and a flat top.'"
export P3="'Crystal Ball: A crystal ball rests on the desk. It is clear and shiny, and seems to be radiating a mystical energy.'"
export P4="'Stack of Ancient Spell Books: A tall stack of several ancient spell books stacked neatly atop the wooden desk. The books look old, used, and full of mystery.'"

export P12="The wizard is standing in front of the wooden desk."
export P23="The stack of ancient spell books is perched atop the wooden desk."
export P13="The wizard is gazing into the crystal ball."
export P34="The crystal ball is perched atop the wooden desk."

export N234="A wooden desk is visible with a crystal ball and a stack of ancient spell books on it."
export N134="A standing wizard is gazing into a crystal ball, and there's also a stack of ancient spell books."
export N124="There's a wizard standing before a wooden desk, on which a stack of ancient spell books is also placed."
export N123="A standing wizard is gazing into a crystal ball, both of which are by a wooden desk."

export PG=[["$P12"],["$P23"],["$P13"],["$P34"]]
export E_START_AT_1=[[1,2],[2,3],[1,3],[3,4]]
export E=[[0,1],[1,2],[0,2],[2,3]]

# manually tuned parameters

export C=[[-0.2,0.2,0.0],[0.15,-0.15,-0.3],[0.4,0.2,0.25],[0.15,-0.15,0.16]]
export RO=[[0,0,0],[0,0,0],[0,0,0],[0,0,0]]
export R=[1.0,0.9,0.3,0.3]

# Name save folder:

export TG="wizard_modified"

export CUDA=1

# 1. Coarse stage:

python launch.py --config configs/gd-if.yaml --train --gpu $CUDA exp_root_dir="examples" use_timestamp=false tag=$TG system.loss.lambda_entropy=1. system.geometry.num_objects=4 system.prompt_processor.prompt="$P" system.prompt_processor.negative_prompt="$NP" system.prompt_obj=[["$P1"],["$P2"],["$P3"],["$P4"]] system.prompt_obj_neg=[["$N234"],["$N134"],["$N124"],["$N123"]] system.prompt_global="$PG" system.edge_list=$E system.guidance.guidance_scale=[200.,100.] system.guidance.guidance_scale_milestones=[2000,] system.geometry.center_params=$C system.geometry.radius_params=$R system.optimizer.params.geometry.lr=0.01 data.resolution_milestones=[2000,] trainer.max_steps=4600

# 2. Fine stage:

export RP="a 4K DSLR high-resolution high-quality photo of $P"
export RP1="'a 4K DSLR high-resolution high-quality photo of a Wizard: A wizard with a cloak and a wizard hat is standing upright, with his eyes fixed at a certain distance.'"
export RP2="'a 4K DSLR high-resolution high-quality photo of a Wooden Desk: A sturdy wooden desk with a rich, dark brown color. It has organizational compartments and a flat top.'"
export RP3="'a 4K DSLR high-resolution high-quality photo of a Crystal Ball: A crystal ball rests on the desk. It is clear and shiny, and seems to be radiating a mystical energy.'"
export RP4="'a 4K DSLR high-resolution high-quality photo of a Stack of Ancient Spell Books: A tall stack of several ancient spell books stacked neatly atop the wooden desk. The books look old, used, and full of mystery.'"
export RP12="a 4K DSLR high-resolution high-quality photo of $P12"
export RP23="a 4K DSLR high-resolution high-quality photo of $P23"
export RP13="a 4K DSLR high-resolution high-quality photo of $P13"
export RP34="a 4K DSLR high-resolution high-quality photo of $P34"

export RPG=[["$RP12"],["$RP23"],["$RP13"],["$RP34"]]

# Avoid OOM: data.batch_size=1 data.width=128 data.height=128

python launch.py --config configs/gd-sd-refine.yaml --train --gpu $CUDA exp_root_dir="examples" use_timestamp=false tag=$TG system.loss.lambda_entropy=1. system.geometry.num_objects=4 system.prompt_processor.prompt="$RP" system.prompt_processor.negative_prompt="$NP" system.prompt_obj=[["$RP1"],["$RP2"],["$RP3"],["$RP4"]] system.prompt_obj_neg=[["$N234"],["$N134"],["$N124"],["$N123"]] system.prompt_global="$RPG" system.edge_list=$E system.geometry.center_params=$C system.geometry.radius_params=$R resume=examples/gd-if/$TG/ckpts/last.ckpt data.batch_size=1 data.width=128 data.height=128 trainer.max_steps=10000 trainer.val_check_interval=200

# Increase training resolution: data.width=256 data.height=256 (Optional: 1xA100 required)

python launch.py --config configs/gd-sd-refine.yaml --train --gpu $CUDA exp_root_dir="examples" use_timestamp=false tag=$TG system.loss.lambda_entropy=1. system.geometry.num_objects=4 system.prompt_processor.prompt="$RP" system.prompt_processor.negative_prompt="$NP" system.prompt_obj=[["$RP1"],["$RP2"],["$RP3"],["$RP4"]] system.prompt_obj_neg=[["$N234"],["$N134"],["$N124"],["$N123"]] system.prompt_global="$RPG" system.edge_list=$E system.geometry.center_params=$C system.geometry.radius_params=$R resume=examples/gd-sd-refine/$TG/ckpts/epoch=0-step=10000.ckpt data.batch_size=1 data.width=128 data.height=128 trainer.max_steps=20000 trainer.val_check_interval=200

And here is the result.

it20000-test_G-rgb.mp4

I've checked the prompts and the 3D layout (see the XY comparison below; the Z-axis is basically aligned), and both seem fine.

[Image: XY layout comparison]
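
For reference, the layout check can also be done numerically. The sketch below is only an illustration: it assumes that `center_params` holds one (x, y, z) center per object and `radius_params` one radius per object, in the order wizard, desk, crystal ball, spell books, and that two objects "overlap" when their center distance is smaller than the sum of their radii. The helper name `pairwise_report` is made up for this example and is not part of the repository.

```python
import itertools
import math

# Values copied from wizard_modified.sh (C and R); the object order is assumed.
names = ["wizard", "desk", "crystal ball", "spell books"]
C = [[-0.2, 0.2, 0.0], [0.15, -0.15, -0.3], [0.4, 0.2, 0.25], [0.15, -0.15, 0.16]]
R = [1.0, 0.9, 0.3, 0.3]


def pairwise_report(centers, radii):
    """Return (i, j, center distance, radius sum) for every pair of objects."""
    rows = []
    for i, j in itertools.combinations(range(len(centers)), 2):
        dist = math.dist(centers[i], centers[j])
        rows.append((i, j, dist, radii[i] + radii[j]))
    return rows


# Print the raw centers so the XY placement can be compared by eye.
for name, (x, y, z) in zip(names, C):
    print(f"{name:>12}: x={x:+.2f} y={y:+.2f} z={z:+.2f}")

for i, j, dist, rsum in pairwise_report(C, R):
    # Assumption: regions overlap when the distance is below the radius sum.
    flag = "overlap" if dist < rsum else "separate"
    print(f"{names[i]} / {names[j]}: dist={dist:.3f} r_sum={rsum:.2f} -> {flag}")
```

Under these assumptions, every pair in this configuration comes out as overlapping, and the desk and the spell books share the same XY center and differ only in the third coordinate. Whether that matters depends on how the code actually uses the two parameters.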

I am wondering what the possible reasons for this could be. Any help would be appreciated. Thanks in advance!
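
One thing that can be ruled in or out quickly is the bookkeeping between the edge list and the relation prompts. The sketch below assumes that the i-th entry of `system.prompt_global` is meant to describe the i-th pair in `system.edge_list`; this pairing is an assumption, not something confirmed by the authors. The name-matching test is a crude heuristic and `check_edges` is a hypothetical helper written for this example.

```python
# Values copied from wizard_modified.sh; object order follows P1..P4.
E_START_AT_1 = [[1, 2], [2, 3], [1, 3], [3, 4]]
E = [[0, 1], [1, 2], [0, 2], [2, 3]]
objects = ["wizard", "wooden desk", "crystal ball", "stack of ancient spell books"]
relation_prompts = [
    "The wizard is standing in front of the wooden desk.",  # P12
    "The stack of ancient spell books is perched atop the wooden desk.",  # P23
    "The wizard is gazing into the crystal ball.",  # P13
    "The crystal ball is perched atop the wooden desk.",  # P34
]


def check_edges(edges_1, edges_0, prompts, names):
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    # The 0-based list should be the 1-based list shifted by one.
    if [[a - 1, b - 1] for a, b in edges_1] != edges_0:
        problems.append("E is not E_START_AT_1 shifted to 0-based indices")
    if len(prompts) != len(edges_0):
        problems.append("number of relation prompts differs from number of edges")
    # Heuristic: each relation prompt should mention both objects of its edge.
    for (a, b), text in zip(edges_0, prompts):
        for idx in (a, b):
            if names[idx] not in text.lower():
                problems.append(f"edge {[a, b]}: '{names[idx]}' not in: {text}")
    return problems


for problem in check_edges(E_START_AT_1, E, relation_prompts, objects):
    print(problem)
```

With the values from the script, this heuristic flags two entries: the text of P23 mentions the spell books and the desk while edge [1,2] links the desk and the crystal ball, and the text of P34 mentions the crystal ball and the desk while edge [2,3] links the crystal ball and the spell books. If the pairing assumption holds, that mismatch may be worth ruling out.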

By the way, I am re-running wizard_modified with the opposite Y-axis value for the desk object. Hopefully that gives a better result :)
