Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nahim shader defines #643

Open
wants to merge 45 commits into
base: master
Choose a base branch
from
Open

Conversation

devshgraphicsprogramming
Copy link
Member

Description

Testing

TODO list:

Comment on lines 484 to 486
#define NBL_GENERATE_GET_OR_DEFAULT(field, ty, default) \
template<typename S, bool = has_member_##field<S>::value> struct get_or_default_##field : integral_constant<ty,S::field> {}; \
template<typename S> struct get_or_default_##field<S,false> : integral_constant<ty,default> {};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this macro causes Boost.Waveto throw exceptions, find out why

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nipunG314 does it still throw exceptions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me know when you finally get your project to compile

NBL_GENERATE_GET_OR_DEFAULT(nonCoherentAtomSize, uint16_t, 0);

NBL_GENERATE_GET_OR_DEFAULT(subgroupSize, uint16_t, 0);
NBL_GENERATE_GET_OR_DEFAULT(subgroupOpsShaderStages, core::bitflag<asset::IShader::E_SHADER_STAGE, 0);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obviously there's no bitflag on hlsl, so use a plain uint8_t

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a small annoying note, not every bitflag is a uint8_t

maybe see if you can make a nbl::hlsl::bitflag thats similar to core::bitflag except without constexpr ?

Else if its not possible/usable, you could move the enums to .hlsl headers, e.g.

namespace nbl
{
namespace hlsl
{
enum ShaderStage : uint8_t
{
...
};
}
}

then add a using E_SHADER_STAGE = ShaderStage; alias in place of the original enum

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I lied to you, there's no uint8_t in HLSL, only uint16_t

But there are more than 8 shader stages anyway....

NBL_GENERATE_GET_OR_DEFAULT(allowCommandBufferQueryCopies, bool, false);
NBL_GENERATE_GET_OR_DEFAULT(maxOptimallyResidentWorkgroupInvocations, uint32_t, 0);
NBL_GENERATE_GET_OR_DEFAULT(maxResidentInvocations, uint32_t, 0);
NBL_GENERATE_GET_OR_DEFAULT(spirvVersion, asset::CGLSLCompiler::E_SPIRV_VERSION, 0);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might need to make that enum global and then alias or something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean here. Can't we just replace the enum with an integral type like the rest?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what I mean
#643 (comment)

@devshgraphicsprogramming
Copy link
Member Author

Need a matching modification to nbl/video/CJITIncludeLoader.cpp

Base automatically changed from vulkan_1_3 to master March 11, 2024 17:24
Use uint32_t instead of VkExtent2D
Use vector types for small arrays
Replace core::bitflag<> with uint8_t
# custom command
set(NBL_COMMAND
"${_Python3_EXECUTABLE}"
"${NBL_GEN_PY}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see 9 writeHeader invocations in tools/deviceGen/gen.py and only 5 arguments for custom command - never hardcode locations like that in your python script, set them as CMake variables (like for those 5) and then let your python receive them!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


limits = loadJSON(args.limits_json_path)
writeHeader(
args.limits_output_path + "members.h",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your arguments should be simply paths to outputs and not let for some "mangling" thing with concatenation, here it should be just args.limits_output_path, do not hardcode stuff like + "members.h" please otherwise you leave no room for CMake to set locations of your outputs + you need to sync both CMake & your script with your hardcoded data.

make your argument like following

parser.add_argument("--limits-members-filepath", type=str, help="The output file path for SPhysicalDeviceLimits members header")

and then use

args.limits_members_filepath

for writeHeader invocation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

json=limits
)
writeHeader(
args.limits_output_path + "subset.h",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjust with following example

)
features = loadJSON(args.features_json_path)
writeHeader(
args.features_output_path + "members.h",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjust with following example

json=features
)
writeHeader(
args.features_output_path + "union.h",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjust with following example

op="|"
)
writeHeader(
args.features_output_path + "intersect.h",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjust with following example

op="&"
)
writeHeader(
args.traits_output_path + "testers.hlsl",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjust with following example

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

format_params=["name"]
)
writeHeader(
args.traits_output_path + "defaults.hlsl",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjust with following example

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

format_params=["name", "type", "value"]
)
writeHeader(
args.traits_output_path + "members.hlsl",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjust with following example

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

format_params=["type", "name", "name"]
)
writeHeader(
args.traits_output_path + "jit_members.hlsl",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjust with following example

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -4,7 +4,7 @@
// stuff for C++
#ifndef __HLSL_VERSION
#include <stdint.h>

#include <type_traits>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no you can't do that, will lead to circular reference, move your Vector concept to type_traits.hlsl

@@ -73,6 +73,20 @@ NBL_TYPEDEF_VECTORS(float16_t);
NBL_TYPEDEF_VECTORS(float32_t);
NBL_TYPEDEF_VECTORS(float64_t);

#ifndef __HLSL_VERSION
template<typename T>
concept is_vector_v =
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concepts are usually named just Camelcase without underscore, i.e. Vector

is_vector_v would be a template variable, i.e.

template<typename T>
NBL_CONSTEXPR bool is_vector_v = ...;

Also there's a more sure way to make your trait work with partial specialization, which we ALREADY have!

struct is_vector : bool_constant<false> {};

So just add

template<typename T>
NBL_CONSTEXPR bool is_vector_v = is_vector<T>::value;
#ifndef __HLSL_VERSION
template<typename T>
concept Vector = is_vector_v<T>;
#endif

below is_vector

@@ -8,10 +8,12 @@

#include <string>

#include "nbl/builtin/hlsl/type_traits.hlsl"

#include "glm/gtx/string_cast.hpp"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I want this floating around in public headers, can you make the specialization that uses glm::to_string not be inline and have it defined in the .cpp file instead?

NBL_GENERATE_GET_OR_DEFAULT(maxFragmentDualSrcAttachments, uint32_t, 1);
NBL_GENERATE_GET_OR_DEFAULT(maxFragmentCombinedOutputResources, uint32_t, 16);
NBL_GENERATE_GET_OR_DEFAULT(maxComputeSharedMemorySize, uint32_t, 32768);
NBL_GENERATE_GET_OR_DEFAULT(maxComputeWorkGroupCount[3], uint32_t, {MinMaxWorkgroupCount,MinMaxWorkgroupCount,MinMaxWorkgroupCount});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need a different way to handle array/vector constants because NBL_GENERATE_GET_OR_DEFAULT is implemented via inheritance from an integral_constant

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the integral_constant cannot be made with a non-primitive type such as uint32_t3 or uint32_t3

You need to break down vectors/arrays into multiple members with X, Y, Z suffices

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but obviously do no changes to the C++ structs

NBL_GENERATE_MEMBER_TESTER(fragmentShaderPixelInterlock);
NBL_GENERATE_MEMBER_TESTER(maxOptimallyResidentWorkgroupInvocations);
NBL_GENERATE_MEMBER_TESTER(maxResidentInvocations);
#include "nbl/video/device_capabilities_traits_testers.hlsl"

#define NBL_GENERATE_GET_OR_DEFAULT(field, ty, default) \
template<typename S, bool = has_member_##field<S>::value> struct get_or_default_##field : integral_constant<ty,S::field> {}; \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change the bool = has_member_##field<S>::value to has_member_##field<S>::value==(e_member_presence::is_present|e_member_presence::is_static)

Also write a TODO comment that the has_member_ needs to be improved because the detectors should really report present, static AND constant

NBL_GENERATE_GET_OR_DEFAULT(maxComputeSharedMemorySize, uint32_t, 32768);
NBL_GENERATE_GET_OR_DEFAULT(maxComputeWorkGroupCount[3], uint32_t, {MinMaxWorkgroupCount,MinMaxWorkgroupCount,MinMaxWorkgroupCount});
NBL_GENERATE_GET_OR_DEFAULT(maxComputeWorkGroupInvocations, uint16_t, MinMaxWorkgroupInvocations);
NBL_GENERATE_GET_OR_DEFAULT(maxWorkgroupSize[3], uint16_t, {MinMaxWorkgroupInvocations,MinMaxWorkgroupInvocations,64u});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NBL_GENERATE_GET_OR_DEFAULT(maxSamplerLodBias, float, 4);
NBL_GENERATE_GET_OR_DEFAULT(maxSamplerAnisotropyLog2, uint8_t, 4);
NBL_GENERATE_GET_OR_DEFAULT(maxViewports, uint8_t, 16);
NBL_GENERATE_GET_OR_DEFAULT(maxViewportDims[2], uint16_t, {MinMaxImageDimension2D,MinMaxImageDimension2D});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NBL_GENERATE_GET_OR_DEFAULT(maxSamplerAnisotropyLog2, uint8_t, 4);
NBL_GENERATE_GET_OR_DEFAULT(maxViewports, uint8_t, 16);
NBL_GENERATE_GET_OR_DEFAULT(maxViewportDims[2], uint16_t, {MinMaxImageDimension2D,MinMaxImageDimension2D});
NBL_GENERATE_GET_OR_DEFAULT(viewportBoundsRange[2], float, { -MinMaxImageDimension2D*2u, MinMaxImageDimension2D*2u-1 });
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see https://github.com/Devsh-Graphics-Programming/Nabla/pull/643/files#r1646508426

BUT anything with Range in the name, needs Min and Max suffices instead of X and Y

NBL_GENERATE_GET_OR_DEFAULT(maxFramebufferWidth, uint32_t, MinMaxImageDimension2D);
NBL_GENERATE_GET_OR_DEFAULT(maxFramebufferHeight, uint32_t, MinMaxImageDimension2D);
NBL_GENERATE_GET_OR_DEFAULT(maxFramebufferLayers, uint32_t, 1024);
NBL_GENERATE_GET_OR_DEFAULT(maxColorAttachments, uint8_t, MinMaxColorAttachments);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in HLSL you don't have access to the constexprs used in C++ structs.

so substitute the values here for HLSL

Comment on lines +3 to +6
NBL_GENERATE_GET_OR_DEFAULT(maxImageDimension1D, uint32_t, MinMaxImageDimension2D);
NBL_GENERATE_GET_OR_DEFAULT(maxImageDimension2D, uint32_t, MinMaxImageDimension2D);
NBL_GENERATE_GET_OR_DEFAULT(maxImageDimension3D, uint32_t, 2048);
NBL_GENERATE_GET_OR_DEFAULT(maxImageDimensionCube, uint32_t, MinMaxImageDimension2D);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NBL_GENERATE_GET_OR_DEFAULT(maxImageArrayLayers, uint32_t, 2048);
NBL_GENERATE_GET_OR_DEFAULT(maxBufferViewTexels, uint32_t, 33554432);
NBL_GENERATE_GET_OR_DEFAULT(maxUBOSize, uint32_t, 65536);
NBL_GENERATE_GET_OR_DEFAULT(maxSSBOSize, uint32_t, MinMaxSSBOSize);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +72 to +74
NBL_GENERATE_GET_OR_DEFAULT(subPixelInterpolationOffsetBits, uint8_t, MinSubPixelInterpolationOffsetBits);
NBL_GENERATE_GET_OR_DEFAULT(maxFramebufferWidth, uint32_t, MinMaxImageDimension2D);
NBL_GENERATE_GET_OR_DEFAULT(maxFramebufferHeight, uint32_t, MinMaxImageDimension2D);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +83 to +84
NBL_GENERATE_GET_OR_DEFAULT(pointSizeRange[2], float, {1.f,64.f});
NBL_GENERATE_GET_OR_DEFAULT(lineWidthRange[2], float, {1.f,1.f});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NBL_GENERATE_GET_OR_DEFAULT(nonCoherentAtomSize, uint16_t, 256);
// VK 1.1
NBL_GENERATE_GET_OR_DEFAULT(subgroupSize, uint16_t, 4);
NBL_GENERATE_GET_OR_DEFAULT(subgroupOpsShaderStages, core::bitflag<asset::IShader::E_SHADER_STAGE>, asset::IShader::ESS_COMPUTE | asset::IShader::ESS_ALL_GRAPHICS);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have core::bitflag in HLSL and we won't be porting it to HLSL.

So whenever you encounter an enum use the T from core::bitflag<T> instead.

Btw you might need to bump DXC a version for the enum to integer casts, etc. to compile

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You won't need to bump the DXC version, but the enum in HLSL cannot be enum class unfortunately due to this

Enum Class not constexpr: https://godbolt.org/z/hheGKo9vx

unscoped enum is constexpr: https://godbolt.org/z/7rorK5qoW

NBL_GENERATE_GET_OR_DEFAULT(shaderSubgroupArithmetic, bool, false);
NBL_GENERATE_GET_OR_DEFAULT(shaderSubgroupQuad, bool, false);
NBL_GENERATE_GET_OR_DEFAULT(shaderSubgroupQuadAllStages, bool, false);
NBL_GENERATE_GET_OR_DEFAULT(pointClippingBehavior, E_POINT_CLIPPING_BEHAVIOR, EPCB_USER_CLIP_PLANES_ONLY);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NBL_GENERATE_GET_OR_DEFAULT(maxMultiviewViewCount, uint8_t, 6);
NBL_GENERATE_GET_OR_DEFAULT(maxMultiviewInstanceIndex, uint32_t, 134217727);
NBL_GENERATE_GET_OR_DEFAULT(maxPerSetDescriptors, uint32_t, 572);
NBL_GENERATE_GET_OR_DEFAULT(maxMemoryAllocationSize, size_t, MinMaxSSBOSize);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NBL_GENERATE_GET_OR_DEFAULT(maxDescriptorSetUpdateAfterBindDynamicOffsetSSBOs, uint32_t, 4);
NBL_GENERATE_GET_OR_DEFAULT(maxDescriptorSetUpdateAfterBindImages, uint32_t, 500000);
NBL_GENERATE_GET_OR_DEFAULT(maxDescriptorSetUpdateAfterBindStorageImages, uint32_t, 500000);
NBL_GENERATE_GET_OR_DEFAULT(maxDescriptorSetUpdateAfterBindInputAttachments, uint32_t, MinMaxColorAttachments);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +141 to +142
NBL_GENERATE_GET_OR_DEFAULT(supportedDepthResolveModes, core::bitflag<RESOLVE_MODE_FLAGS>, RESOLVE_MODE_FLAGS::SAMPLE_ZERO_BIT);
NBL_GENERATE_GET_OR_DEFAULT(supportedStencilResolveModes, core::bitflag<RESOLVE_MODE_FLAGS>, RESOLVE_MODE_FLAGS::SAMPLE_ZERO_BIT);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remember about using the enums directly, migrating them to HLSL header and having them unscoped

Comment on lines +181 to +183
NBL_GENERATE_GET_OR_DEFAULT(storageTexelBufferOffsetAlignmentBytes, size_t, std::numeric_limits<size_t>::max());
NBL_GENERATE_GET_OR_DEFAULT(uniformTexelBufferOffsetAlignmentBytes, size_t, std::numeric_limits<size_t>::max());
NBL_GENERATE_GET_OR_DEFAULT(maxBufferSize, size_t, MinMaxSSBOSize);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size_t isn;t a thing in HLSL https://godbolt.org/z/8nnfnToqs

use uint64_t instead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants