Hi authors, pioneering work!
I tried to reproduce the wizard_study case by just running bash scripts/wizard_study.sh, with only TG modified. Please see the result below:
it20000-test_G-rgb.mp4
I also asked GPT again to generate the per-object prompts, the per-relation prompts, and the negative prompts (each describing all objects except the current one). Here is the bash script:
wizard_modified.sh
export P="A Wizard standing in front of a Wooden Desk, gazing into a Crystal Ball perched atop the Wooden Desk, with a Stack of Ancient Spell Books perched atop the Wooden Desk."
export NP="ugly, bad anatomy, blurry, pixelated obscure, unnatural colors, poor lighting, dull, and unclear, cropped, lowres, low quality, artifacts, duplicate, morbid, mutilated, poorly drawn face, deformed, dehydrated, bad proportions"
export P1="'Wizard: A wizard with a cloak and a wizard hat is standing upright, with his eyes fixed at a certain distance.'"
export P2="'Wooden Desk: A sturdy wooden desk with a rich, dark brown color. It has organizational compartments and a flat top.'"
export P3="'Crystal Ball: A crystal ball rests on the desk. It is clear and shiny, and seems to be radiating a mystical energy.'"
export P4="'Stack of Ancient Spell Books: A tall stack of several ancient spell books stacked neatly atop the wooden desk. The books look old, used, and full of mystery.'"
export P12="The wizard is standing in front of the wooden desk."
export P23="The stack of ancient spell books is perched atop the wooden desk."
export P13="The wizard is gazing into the crystal ball."
export P34="The crystal ball is perched atop the wooden desk."
export N234="A wooden desk is visible with a crystal ball and a stack of ancient spell books on it."
export N134="A standing wizard is gazing into a crystal ball, and there's also a stack of ancient spell books."
export N124="There's a wizard standing before a wooden desk, on which a stack of ancient spell books is also placed."
export N123="A standing wizard is gazing into a crystal ball, both of which are by a wooden desk."
export PG=[["$P12"],["$P23"],["$P13"],["$P34"]]
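When checking the prompts, it may help to print the values the shell actually builds. A minimal sketch using the same assignments as above (PG_FIRST and OBJ_FIRST are hypothetical names used only for this illustration): the double quotes around "$P12" are consumed by the shell, while the single quotes inside P1 are part of the string and survive.

```shell
# Same strings as in the script above.
P1="'Wizard: A wizard with a cloak and a wizard hat is standing upright, with his eyes fixed at a certain distance.'"
P12="The wizard is standing in front of the wooden desk."

# Hypothetical one-element lists, built the same way as PG and prompt_obj.
PG_FIRST=[["$P12"]]
OBJ_FIRST=[["$P1"]]

# The relation prompt prints without any quotes; the object prompt keeps its single quotes.
printf '%s\n' "$PG_FIRST"
printf '%s\n' "$OBJ_FIRST"
```

Whether the launcher treats quoted and unquoted list entries differently is not stated here, so this only shows what the shell passes on.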
export E_START_AT_1=[[1,2],[2,3],[1,3],[3,4]]
export E=[[0,1],[1,2],[0,2],[2,3]]
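E is E_START_AT_1 with every index shifted down by one. A minimal sketch of deriving one from the other so the two lists cannot drift apart; the helper to_zero_indexed is hypothetical and not part of the repository.

```shell
# Hypothetical helper: subtract 1 from every integer in a bracketed edge list.
to_zero_indexed() {
  printf '%s\n' "$1" | awk '{
    out = ""
    # Repeatedly find the next run of digits, copy the text before it, and emit the number minus 1.
    while (match($0, /[0-9]+/)) {
      out = out substr($0, 1, RSTART - 1) (substr($0, RSTART, RLENGTH) - 1)
      $0 = substr($0, RSTART + RLENGTH)
    }
    # Append whatever is left after the last number (the closing brackets).
    print out $0
  }'
}

E_START_AT_1="[[1,2],[2,3],[1,3],[3,4]]"
E="$(to_zero_indexed "$E_START_AT_1")"
echo "$E"   # prints [[0,1],[1,2],[0,2],[2,3]]
```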
# manually tuned parameters
export C=[[-0.2,0.2,0.0],[0.15,-0.15,-0.3],[0.4,0.2,0.25],[0.15,-0.15,0.16]]
export RO=[[0,0,0],[0,0,0],[0,0,0],[0,0,0]]
export R=[1.0,0.9,0.3,0.3]
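Since C, RO and R are tuned by hand, a quick check that each has one entry per object can catch a mismatch with system.geometry.num_objects=4 before launching. This is only a sketch: count_entries, check_count, NUM_OBJECTS and LAYOUT_OK are hypothetical names, and the helper assumes a non-empty list.

```shell
# Hypothetical helper: count top-level entries of a bracketed list string
# by counting the commas that sit directly inside the outermost brackets.
count_entries() {
  printf '%s\n' "$1" | awk '{
    depth = 0; commas = 0
    for (i = 1; i <= length($0); i++) {
      ch = substr($0, i, 1)
      if (ch == "[") depth++
      else if (ch == "]") depth--
      else if (ch == "," && depth == 1) commas++
    }
    print commas + 1
  }'
}

NUM_OBJECTS=4   # assumed to mirror system.geometry.num_objects
C="[[-0.2,0.2,0.0],[0.15,-0.15,-0.3],[0.4,0.2,0.25],[0.15,-0.15,0.16]]"
RO="[[0,0,0],[0,0,0],[0,0,0],[0,0,0]]"
R="[1.0,0.9,0.3,0.3]"

LAYOUT_OK=1
# Hypothetical helper: report a list whose entry count differs from NUM_OBJECTS.
check_count() {
  n="$(count_entries "$2")"
  if [ "$n" -ne "$NUM_OBJECTS" ]; then
    echo "$1 has $n entries, expected $NUM_OBJECTS" >&2
    LAYOUT_OK=0
  fi
}
check_count C "$C"
check_count RO "$RO"
check_count R "$R"
echo "layout ok: $LAYOUT_OK"
```

This only validates the list lengths; it says nothing about whether the centers and radii themselves are well chosen.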
# Name save folder:
export TG="wizard_modified"
export CUDA=1
# 1. Coarse stage:
python launch.py --config configs/gd-if.yaml --train --gpu $CUDA exp_root_dir="examples" use_timestamp=false tag=$TG system.loss.lambda_entropy=1. system.geometry.num_objects=4 system.prompt_processor.prompt="$P" system.prompt_processor.negative_prompt="$NP" system.prompt_obj=[["$P1"],["$P2"],["$P3"],["$P4"]] system.prompt_obj_neg=[["$N234"],["$N134"],["$N124"],["$N123"]] system.prompt_global="$PG" system.edge_list=$E system.guidance.guidance_scale=[200.,100.] system.guidance.guidance_scale_milestones=[2000,] system.geometry.center_params=$C system.geometry.radius_params=$R system.optimizer.params.geometry.lr=0.01 data.resolution_milestones=[2000,] trainer.max_steps=4600
# 2. Fine stage:
export RP="a 4K DSLR high-resolution high-quality photo of $P"
export RP1="'a 4K DSLR high-resolution high-quality photo of a Wizard: A wizard with a cloak and a wizard hat is standing upright, with his eyes fixed at a certain distance.'"
export RP2="'a 4K DSLR high-resolution high-quality photo of a Wooden Desk: A sturdy wooden desk with a rich, dark brown color. It has organizational compartments and a flat top.'"
export RP3="'a 4K DSLR high-resolution high-quality photo of a Crystal Ball: A crystal ball rests on the desk. It is clear and shiny, and seems to be radiating a mystical energy.'"
export RP4="'a 4K DSLR high-resolution high-quality photo of a Stack of Ancient Spell Books: A tall stack of several ancient spell books stacked neatly atop the wooden desk. The books look old, used, and full of mystery.'"
export RP12="a 4K DSLR high-resolution high-quality photo of $P12"
export RP23="a 4K DSLR high-resolution high-quality photo of $P23"
export RP13="a 4K DSLR high-resolution high-quality photo of $P13"
export RP34="a 4K DSLR high-resolution high-quality photo of $P34"
export RPG=[["$RP12"],["$RP23"],["$RP13"],["$RP34"]]
# Avoid OOM: data.batch_size=1 data.width=128 data.height=128
python launch.py --config configs/gd-sd-refine.yaml --train --gpu $CUDA exp_root_dir="examples" use_timestamp=false tag=$TG system.loss.lambda_entropy=1. system.geometry.num_objects=4 system.prompt_processor.prompt="$RP" system.prompt_processor.negative_prompt="$NP" system.prompt_obj=[["$RP1"],["$RP2"],["$RP3"],["$RP4"]] system.prompt_obj_neg=[["$N234"],["$N134"],["$N124"],["$N123"]] system.prompt_global="$RPG" system.edge_list=$E system.geometry.center_params=$C system.geometry.radius_params=$R resume=examples/gd-if/$TG/ckpts/last.ckpt data.batch_size=1 data.width=128 data.height=128 trainer.max_steps=10000 trainer.val_check_interval=200
# Increase training resolution: data.width=256 data.height=256 (Optional: 1xA100 required)
python launch.py --config configs/gd-sd-refine.yaml --train --gpu $CUDA exp_root_dir="examples" use_timestamp=false tag=$TG system.loss.lambda_entropy=1. system.geometry.num_objects=4 system.prompt_processor.prompt="$RP" system.prompt_processor.negative_prompt="$NP" system.prompt_obj=[["$RP1"],["$RP2"],["$RP3"],["$RP4"]] system.prompt_obj_neg=[["$N234"],["$N134"],["$N124"],["$N123"]] system.prompt_global="$RPG" system.edge_list=$E system.geometry.center_params=$C system.geometry.radius_params=$R resume=examples/gd-sd-refine/$TG/ckpts/epoch=0-step=10000.ckpt data.batch_size=1 data.width=128 data.height=128 trainer.max_steps=20000 trainer.val_check_interval=200
And here is the result.
it20000-test_G-rgb.mp4
I've checked the prompts and the 3D layout (please see the XY comparison below; the Z-axis is basically aligned), and both seem fine.
I am wondering what the possible reasons for this result are. Any help will be appreciated, thanks in advance!
btw, I am re-running wizard_modified with the opposite Y-axis value for the desk object, hope it will be better :)