Releases: Phhofm/models
2xBHI_small_realplksr_dysample_pretrain
Scale: 2x
Network type: RealPLKSR
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 2x realplksr dysample high psnry model with l1&mssim loss only.
Pretrained Model: 4xmssim_realplksr_dysample_pretrain
Training iterations: 200'000
Description: 2x realplksr dysample bicubic upscaling model. Similar to 2xBHI_small_compact_pretrain but for the realplksr network option.
I used my BHI100 val set to generate metrics as a reference point. RealPLKSR is a community modification that stabilizes GAN training of PLKSR, so that is where its strength lies. I included the PLKSR paper pretrain results as a reference since they are the closest I have. Best in red, second in blue, third in green:
(Image: BHI100 Metrics)
And also with my latest 2xBHI_small_compact_pretrain included. Outputs and calculations are in the attachments.
(Image: BHI100 Metrics with Compact)
4xBHI_dat2_otf_nn
Scale: 4x
Network type: DAT2
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 4x upscaling images, handles jpg compression and resizes
Pretrained Model: 4xNomos2_hq_dat2
Training iterations: 220000
Description: 4x dat2 upscaling model, trained with the real-esrgan otf pipeline but without noise, on my bhi dataset. Handles resizes, and jpg compression.
2xBHI_small_compact_pretrain
Scale: 2x
Network type: Compact
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 2x compact high psnry model with l1&mssim loss only.
Pretrained Model: Itself
Training iterations: 50'000 (in total it's 650'000 iters)
Description: 2x compact pretrain model. The goal here was simply to reach high psnry metrics on Urban100. This model was trained only on (Pillow) bicubic-downsampled LR images, namely the LR I provided with my bhi_small dataset (see the x2, HR and Urban100 zip files in there). Only l1 and mssim losses were used.
Process:
At first I ran some tests to see which config values I wanted to use; these were very short tests of 10k iters only.
These were all 4x compact from-scratch tests:
Out of these l1 with 0.6 and mssim with 0.4 seemed to work best.
Higher batch seems to give better metrics.
Higher patch seems to give better metrics.
adamw with learning rate 1e-4 seems to give better metrics.
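As a rough illustration of the 0.6 / 0.4 weighting above, the two losses can be combined as a weighted sum. This is a minimal sketch, not the training framework's code: the function names are hypothetical, the MS-SSIM score is passed in as a precomputed number in [0, 1], and it assumes the mssim loss is defined as 1 minus that score.

```python
def l1_loss(pred, target):
    """Mean absolute error between two equal-length sequences of pixel values."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def combined_loss(pred, target, ms_ssim_score, w_l1=0.6, w_mssim=0.4):
    """Weighted sum of l1 and an mssim term.

    Assumption: the mssim loss is 1 - score, so a perfect score adds nothing.
    ms_ssim_score is computed elsewhere (hypothetical input for this sketch).
    """
    mssim_loss = 1.0 - ms_ssim_score
    return w_l1 * l1_loss(pred, target) + w_mssim * mssim_loss


# Identical images with a perfect similarity score give zero loss.
print(combined_loss([0.5, 0.5], [0.5, 0.5], ms_ssim_score=1.0))
```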
Now let's try a 2x compact model with these settings:
After 300k iters, where I reached a psnry of 31.614, I thought: for 4x models the 2x pretrain strategy is often used, where the previous 2x from-scratch model is used to train the 4x version. But what if we use the pretrain strategy for the 2x pretrain itself as well? I simply wanted to test real quick whether it would improve training.
Within the next 20k iterations, a new training run that loaded the previous model as pretrain improved faster than letting the 300k version train some more. So I continued to train this model by using itself as a pretrain, with the same settings:
At 150k iterations, I wanted to see what happens when I increase patch from 64 to 128 while keeping batch at 64. This maxed out the available VRAM on my GPU (RTX 3060 with 12GB). I used the previous step as pretrain again:
I trained this for 170k iters.
Then I thought: what happens if, after all this training, I max out patch? The highest I could increase patch to was 256, since my dataset has an HR of 512 and a corresponding x2 of 256, so that's the highest I can go. Batch needed to be reduced to 16 because of my limited resources.
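The patch limit and the batch reduction follow from simple arithmetic. The sketch below assumes that patch size refers to the LR side and that memory use grows roughly with the number of LR pixels per batch (a simplification); the helper names are made up for illustration.

```python
def max_lr_patch(hr_size, scale):
    """Largest LR patch that still fits inside one HR tile at the given scale."""
    return hr_size // scale


def lr_pixels_per_batch(batch, patch):
    """Number of LR pixels processed per iteration (per channel)."""
    return batch * patch * patch


print(max_lr_patch(512, 2))  # 256

# (batch, patch) for the three stages described in this section
for batch, patch in [(64, 64), (64, 128), (16, 256)]:
    print(batch, patch, lr_pixels_per_batch(batch, patch))
# 64 64 262144
# 64 128 1048576
# 16 256 1048576
```

Under that assumption, batch 16 at patch 256 processes the same number of LR pixels per iteration as batch 64 at patch 128, the setting that had already maxed out the 12GB card.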
Here are all the graphs from the training runs. Even though the model improves with longer training, the curves are getting flatter and flatter, so I think a psnry of 32 is roughly the limit I can reach with a 2x compact model on my dataset.
Config files of the runs are also attached. I attach the final high-psnry 2x compact model.
The val metrics I reached for the final state of this model (I decided to end this experiment here) (PS those are y-channel metrics, meaning psnry and ssimy)
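For readers unfamiliar with y-channel metrics, psnry can be computed roughly as shown here. This is a minimal sketch that assumes the common BT.601 luma conversion for 8-bit RGB; the validation code actually used for this model is not shown in this release and may differ in details such as border cropping. Function names are illustrative.

```python
import math


def rgb_to_y(r, g, b):
    """BT.601 luma for 8-bit RGB values (0..255), in the 16..235 range."""
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0


def psnr_y(img_a, img_b):
    """PSNR on the Y channel.

    img_a, img_b: equal-length lists of (r, g, b) tuples with values 0..255.
    """
    if len(img_a) != len(img_b) or not img_a:
        raise ValueError("images must be non-empty and the same size")
    mse = sum(
        (rgb_to_y(*pa) - rgb_to_y(*pb)) ** 2 for pa, pb in zip(img_a, img_b)
    ) / len(img_a)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(255.0 ** 2 / mse)


hr = [(255, 255, 255), (0, 0, 0)]
sr = [(250, 250, 250), (4, 4, 4)]
print(round(psnr_y(hr, sr), 2))
```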
4xBHI_realplksr
4x BHI RealPLKSR Models
After making the BHI dataset, I trained multiple realplksr dysample models on it to try things out. I am releasing them here so they can be tried out.
I recommend using the dat2 models over these, since they give better results quality-wise.
Download links:
4xBHI_realplksr_dysample_multi
4xBHI_realplksr_dysample_multiblur
4xBHI_realplksr_dysample_otf_nn
4xBHI_realplksr_dysample_otf
4xBHI_realplksr_dysample_real
Quick Summary:
multi - No degradation handling
multiblur - same as multi but slightly sharper output since blur has been used. Also no degradation handling: even though I added 50% compression to the training set (25% jpg, 25% webp), this model does not seem able to handle compression.
otf_nn - Trained with realesrgan otf pipeline, handles compression, very little noise handling.
otf - Trained with the real-esrgan otf pipeline, handles compression and noise, denoises rather aggressively.
real - Trained with my real degraded LR set, handles webp and jpg compression, realistic noise, some lens blur etc.
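The summary above can be turned into a small lookup for picking a model by input type. This is only a sketch of that summary, not an official selection tool: the capability labels ("compression", "noise", "blur") and the function name are my own simplification, and "very little noise handling" is treated as no noise handling.

```python
# Capabilities as stated in the quick summary (simplified labels, assumption).
CAPABILITIES = {
    "4xBHI_realplksr_dysample_multi": set(),
    "4xBHI_realplksr_dysample_multiblur": set(),
    "4xBHI_realplksr_dysample_otf_nn": {"compression"},
    "4xBHI_realplksr_dysample_otf": {"compression", "noise"},
    "4xBHI_realplksr_dysample_real": {"compression", "noise", "blur"},
}


def candidates(needed):
    """Models whose listed capabilities cover `needed`, least aggressive first."""
    matches = [name for name, caps in CAPABILITIES.items() if needed <= caps]
    # sorted() is stable, so models with equal capability counts keep their order.
    return sorted(matches, key=lambda name: len(CAPABILITIES[name]))


print(candidates({"compression"}))
print(candidates({"compression", "noise"}))
```

Listing the least aggressive model first reflects the idea that a model without noise handling keeps more detail when denoising is not needed.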
Visual Examples:
4xBHI_realplksr_dysample_multi
Scale: 4x
Network type: RealPLKSR
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 4x upscaling good quality, uncompressed images
Pretrained Model: 4xNomos2_realplksr_dysample
Training iterations: 285000
Description: 4x realplksr dysample model to upscale good quality, uncompressed images. Does not handle any degradations since LR had been scaled only using down_up, linear, cubic_mitchell, lanczos, gauss and box. Trained on my BHI dataset.
4xBHI_realplksr_dysample_multiblur
Scale: 4x
Network type: RealPLKSR
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 4x upscaling images, can handle some blur, jpeg and webp compression
Pretrained Model: 4xNomos2_realplksr_dysample
Training iterations: 354000
Description: 4x realplksr dysample model to upscale images, trained on the BHI dataset. Similar to the 4xBHI_realplksr_dysample_multi model, the LR made use of the same multiscaling, but additionally, compression handling was added for jpeg and webp compression, plus I made use of average, gaussian and anisotropic blur for sharper results.
4xBHI_realplksr_dysample_otf
Scale: 4x
Network type: RealPLKSR
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 4x upscaling images, can handle some blur, jpeg compression, and noise, trained with realesrgan otf pipeline.
Pretrained Model: 4xNomos2_realplksr_dysample
Training iterations: 413000
Description: 4x realplksr dysample model to upscale images, trained on the BHI dataset. This time, instead of manual degradation, I made use of the real-esrgan otf pipeline with adjusted values to handle some blur, noise and jpeg compression.
4xBHI_realplksr_dysample_otf_nn
Scale: 4x
Network type: RealPLKSR
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 4x upscaling images, similar to the 4xBHI_realplksr_dysample_otf model but without noise handling
Pretrained Model: 4xNomos2_realplksr_dysample
Training iterations: 251000
Description: 4x realplksr dysample model to upscale images, trained on the BHI dataset. This is similar to the 4xBHI_realplksr_dysample_otf model but trained without noise handling capability (_nn -> 'no noise'), since I had trained some models like this in the past to retain some more details for cases where noise removal simply is not needed.
4xBHI_realplksr_dysample_real
Scale: 4x
Network type: RealPLKSR
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 4x upscaling images, handling some blur, some noise, and compression
Pretrained Model: 4xNomos2_realplksr_dysample
Training iterations: 380000
Description: 4x realplksr dysample model to upscale images, trained on the BHI dataset. This is with a created LR set where I used wtp_destroyer to add blur (50% chance), my ludvae200 model to add noise (50% chance), and then wtp to add compression, scaling, recompression (50% chance), scaling.
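The order and probabilities of that LR creation can be sketched as a sampler that only returns step names. This does not call wtp_destroyer or ludvae200 (their interfaces are not shown here); the step labels and the function name are hypothetical, and it assumes that compression and both scaling steps are always applied while blur, noise and recompression each have a 50% chance.

```python
import random


def sample_degradation_chain(rng):
    """Return the ordered list of degradation steps for one image."""
    steps = []
    if rng.random() < 0.5:
        steps.append("blur")        # 50% chance
    if rng.random() < 0.5:
        steps.append("noise")       # 50% chance
    steps.append("compress")        # assumed to always run
    steps.append("scale")
    if rng.random() < 0.5:
        steps.append("recompress")  # 50% chance
    steps.append("scale")
    return steps


rng = random.Random(0)  # seeded so the sampled chains are reproducible
for _ in range(3):
    print(sample_degradation_chain(rng))
```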
4xBHI_dat2_real
Scale: 4x
Network type: DAT2
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 4x upscaling images. Handles realistic noise, some realistic blur, and webp and jpg (re)compression.
Pretrained Model: 4xNomos2_hq_dat2
Training iterations: 240000
Description: 4x dat2 upscaling model for web and realistic images. It handles realistic noise, some realistic blur, and webp and jpg (re)compression. Trained on my BHI dataset (390'035 training tiles) with degraded LR subset.
Have a look at the applied_degradations files in the attachments to see which degradations were applied to the BHI dataset, and in what order, to create the corresponding LR set for training.
4xBHI_dat2_otf
Scale: 4x
Network type: DAT2
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 4x upscaling images, handles noise and jpg compression
Pretrained Model: 4xNomos2_hq_dat2
Training iterations: 270000
Description: 4x dat2 upscaling model, trained with the real-esrgan otf pipeline on my bhi dataset. Handles noise and compression.
4xBHI_dat2_multiblurjpg
Scale: 4x
Network type: DAT2
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 4x upscaling images, handles jpg compression
Pretrained Model: 4xNomos2_hq_dat2
Training iterations: 320000
Description: 4x dat2 upscaling model, trained with down_up, linear, cubic_mitchell, lanczos, gauss and box scaling algos, some average, gaussian and anisotropic blurs and jpg compression. Trained on my BHI sisr dataset.
You can also try out the 4xBHI_dat2_multiblur checkpoint below (trained to 250000 iters), which cannot handle compression but might give just slightly better output on non-degraded input.
4x Sebica pretrains
Just a small release: these are two simple Sebica pretrains, each trained to 100k iters, meant to stabilize early training for both sebica and sebica_mini models.
Scale: 4
Architecture: Sebica
Architecture Options: sebica and sebica_mini
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 4x Sebica options pretrains
Subject: Realistic
Input Type: Images
Release Date: 11.11.2024 (dd.mm.yyyy)
Dataset: DF2K_BHI
Dataset Size: 12'639
OTF (on the fly augmentations): No
Pretrained Model: None
Iterations: 100'000
Batch Size: 8
Patch Size: 48
Description:
Two simple sebica pretrains of 100k iters to stabilize early training of sebica models, both for the sebica and sebica_mini options.
For training I used my BHI filtered version of DF2K which I released together with my BHI filtering post on huggingface.
2xAoMR_mosr
Scale: 2
Architecture: MoSR
Architecture Option: mosr
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: A 2x mosr upscaling model for game textures
Subject: Game Textures
Input Type: Images
Release Date: 21.09.2024 (dd.mm.yyyy)
Dataset: Game Textures from Age of Mythology: Retold
Dataset Size: 13'847
OTF (on the fly augmentations): No
Pretrained Model: 4xNomos2_hq_mosr
Iterations: 510'000
Batch Size: 4
Patch Size: 64
Description:
In short: A 2x game texture mosr upscaling model, trained on and for (but not limited to) Age of Mythology: Retold textures.
Since I have been playing Age of Mythology: Retold (casual player), I thought it would be interesting to train a single image super resolution model on (and for) game textures of AoMR, but this model should be usable for other game textures as well.
This is a 2x model: since the biggest texture images are already 4096x4096, I thought going 4x on those would be overkill (there are also already 4x game texture upscaling models, so this model can be used for similar cases where 4x is not needed).
The pth, a static onnx conversion (since dysample is used), and the training config files are provided in the Assets.
Model Showcase:
(Click on image for better view)
Process:
After extracting all the game's .bar files:
I ended up with 9067 texture files, which I tiled into 13'847 512x512px tiles. The HR set contains basecolor, normals, masks, etc.
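The tiling step can be sketched as follows. This is an illustration only: the tool actually used for tiling is not named here, the function name is made up, and the sketch assumes that partial tiles at the right and bottom edges are dropped and that textures smaller than 512px produce no tile.

```python
def tile_boxes(width, height, tile=512):
    """Crop boxes (left, top, right, bottom) for non-overlapping full tiles."""
    return [
        (x, y, x + tile, y + tile)
        for y in range(0, height - tile + 1, tile)
        for x in range(0, width - tile + 1, tile)
    ]


print(len(tile_boxes(4096, 4096)))  # 64
print(tile_boxes(1024, 512))        # [(0, 0, 512, 512), (512, 0, 1024, 512)]
```

For scale, 13'847 tiles from 9067 files is about 1.5 tiles per texture file on average.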
I then created a corresponding lr folder by using gaussian blur,
quantization (floyd_steinberg, jarvis_judice_ninke, stucki, atkinson, burkes, sierra, two_row_sierra, sierra_lite),
compression (jpg),
downscaling (down_up, linear, cubic_mitchell, lanczos, gauss, box),
and then later on added bc1 compression (based on kim's suggestion, which improved the quality of this model) using nvcompress.
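The quantization names in the list above are error-diffusion dithering kernels. As an illustration of what such a step does, here is a minimal pure-Python sketch of textbook Floyd-Steinberg dithering on a grayscale image. It is not the code of the degradation tool used for this dataset, and the number of levels is an arbitrary choice for the example.

```python
def floyd_steinberg(gray, levels=4):
    """Quantize a grayscale image to `levels` values with error diffusion.

    gray: list of rows, each a list of values in 0..255. Returns a new image.
    """
    height, width = len(gray), len(gray[0])
    img = [[float(v) for v in row] for row in gray]  # work on a copy
    step = 255.0 / (levels - 1)
    for y in range(height):
        for x in range(width):
            old = img[y][x]
            # Snap to the nearest allowed level, clamped to the valid range.
            idx = min(levels - 1, max(0, round(old / step)))
            new = idx * step
            img[y][x] = new
            err = old - new
            # Spread the quantization error to unvisited neighbours.
            if x + 1 < width:
                img[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    img[y + 1][x - 1] += err * 3 / 16
                img[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    img[y + 1][x + 1] += err * 1 / 16
    return img


# A flat mid-gray patch becomes a mix of black and white pixels at 2 levels.
for row in floyd_steinberg([[128] * 4 for _ in range(4)], levels=2):
    print(row)
```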
I trained a mosr model for 550k iterations, and based on iqa scoring, went with the 510k checkpoint as release candidate:
Test
PS just as a test, I tried this model out on the game itself.
I extracted all the Greek texture files as tga files and upscaled them with chaiNNer using the onnx conversion, which took 7 hours and 17 min to complete:
Then I converted them back to .ddt files and replaced the in-game texture files using a mod folder (basically a mod that replaces the game files):
And then tested it out in the in-game Editor placing some buildings and units and a fixed camera:
But as can be seen, there were artifacts when replacing all the texture files, so I tried out only replacing the baseColor files (instead of Normals, Masks etc):
Which resolved the artifacts, but in comparison with the default textures, it doesn't look better:
So just to make sure something actually happened, I tested out just inverting the colors on the town center basecolor texture:
Which shows me that replacing the textures worked.
So I assume that either it makes no difference in this case, because the textures used in Age of Mythology: Retold are already big enough (as I wrote, some are already 4096x4096px) for this zoom level and the details visible in this in-game screenshot, or I did not do it correctly (for example, values might need to be adjusted in some json file; I have never really modded a game).
4xNomos2_hq_drct-l
Scale: 4
Architecture: DRCT
Architecture Option: drct_l
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: Upscaler
Subject: Photography
Input Type: Images
Release Date: 08.09.2024
Dataset: nomosv2
Dataset Size: 6000
OTF (on the fly augmentations): No
Pretrained Model: DRCT-L_X4
Iterations: 200'000
Batch Size: 2
Patch Size: 64
Description:
A drct-l 4x upscaling model, similar to the 4xNomos2_hq_atd, 4xNomos2_hq_dat2 and 4xNomos2_hq_mosr models, trained on and intended for non-degraded input to give good quality output.