Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make vectors trivially relocatable. #337

Open
wants to merge 14 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions config/ImathConfig.h.in
Original file line number Diff line number Diff line change
Expand Up @@ -174,4 +174,7 @@
# endif
#endif

// Whether the copy and move constructors of Vec2, Vec3 and Vec4 are default or not.
#cmakedefine IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead we want

Suggested change
#cmakedefine IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
#ifndef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
#define IMATH_VEC_USE_DEFAULT_CONSTRUCTOR 1
#endif

Because it's easy to imagine that a particular downstream app needs to make sure it's the other way, and this would allow it to be overridden if the downstream modules defines the symbol as 0 before including any imath headers.

Then the various places where you have

#ifdef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR

should be

#if IMATH_VEC_USE_DEFAULT_CONSTRUCTOR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fly in the ointment, if we go both ways, are we inviting an recurring support burden along these lines?

I can't make my build work with MegaDCC",

oh right, did you match IMATH_VEC_USE_DEFAULT_CONSTRUCTOR to what they picked?

but that's in conflict to our internal build

yeah but did you namespace?

we don't want to have to have two copies

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OPENEXR_DLL thing was sort of a harbinger, I'm worried about a matrix of macros people have to solve for compatibility

or maybe I'm misunderstanding completely, and it's all inlined and completely doesn't matter which you pick

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think because it's inlined it's all ok? There shouldn't be any link issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the case, great. Then my only request is that we document why we have it both ways, even if it is a copy and paste of @AlexMWells thorough notes above. This is strongly in the realm of "gosh, this is rather esoteric, why is this a choice, and how do I make it?" I'd prefer the doc is inline in the CMakeLists.txt where you are faced with making a choice.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example, well with an older compiler, but just wanted to show the behavior.
https://godbolt.org/z/rEY4Ef5GM
With #define VEC_ENABLE_DEFAULT_CONSTRUCTOR 1
The optimization report shows:

 remark #15518: Structure assignment was serialized   [ <source>(145,14) ]
 remark #15518: Structure assignment was serialized   [ <source>(147,14) ]
 remark #15518: Structure assignment was serialized   [ <source>(149,14) ]
 remark #15301: OpenMP SIMD LOOP WAS VECTORIZED
 remark #15450: unmasked unaligned unit stride loads: 11 
 remark #15451: unmasked unaligned unit stride stores: 4 
 remark #15475: --- begin vector cost summary ---
 remark #15476: scalar cost: 367 
 remark #15477: vector cost: 125.870 
 remark #15478: estimated potential speedup: 2.490 
 remark #15488: --- end vector cost summary ---
 ...
 <source>(145,14):remark #34000: call to memcpy implemented inline with loads and stores with proven source 

Structure assignment was serialized really kills performance, and we can see the compiler by "default" decided to generate a memcpy.

By providing the copy constructor and assignment operator explicitly we avoid the memcpy or memset (for initializations) and with #define VEC_ENABLE_DEFAULT_CONSTRUCTOR 0
The optimization report shows:

   remark #15309: vectorization support: normalized vectorization overhead 0.352
   remark #15301: OpenMP SIMD LOOP WAS VECTORIZED
   remark #15450: unmasked unaligned unit stride loads: 11 
   remark #15451: unmasked unaligned unit stride stores: 4 
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 201 
   remark #15477: vector cost: 44.370 
   remark #15478: estimated potential speedup: 3.350 

Now this is all very dependent on compiler and order of optimizations, and perhaps isn't as required on newer compilers. But this is just a simple example, real world kernels could have much larger state being kept on the stack and any nested aggregate that happens to use a memcpy or memset by default could prevent the SROA pass from fully succeeding.

Thus it would be nice for projects like OSL to choose to disable the use of default constructors. And if the supported compiler matrix has moved on to where it is not a problem then it could chose to enable them. Problem is examining optimization reports before and after is required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that sounds like explicitly defining copy and assignment, and not using default, we get the behavior that we want, which is that things optimize as we wish. "we" includes OSL. I can't think of a reason to provide this as one option, and default constructors and copy another option.

I am rereading this argument which suggests maybe we need the option,

#337 (comment)

but I'm not understanding a case to preserve the default option. Sorry if I'm being dense....

Copy link
Contributor Author

@sukeya sukeya Jul 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I made this PR was because I wanted to copy Vec3 sequences between host and device using thrust::async::copy. It requires Vec3 to be trivially relocatable, so default constructors and assignment operators for copy and move are needed for me.
I'm sorry for not telling it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My apologies, I didn't mean to shut down your effort. Maybe I am misunderstanding? I thought that adding the explicit copy and move gives us the trivial relocatability you are searching for, and also satisfies the SROA concern that @AlexMWells raises? I thought we can solve both your need, and also the SIMD concern at the same time, without needing to have a special conditional. My intent wasn't to stop your effort, I was hoping we can find one solution that solves both problems.

Copy link
Contributor Author

@sukeya sukeya Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first, I apologize for my quick but rough decision.
I misunderstood I couldn't solve the assertion error, but I noticed that it was caused by ODR violation. Though I'm running CI in my forked repository, it will be OK.

I think that the trivial relocatability and the SROA are incompatible because the trivial relocatability requires Vec3 not to have user-provided copy constructor and assignment operator(even though Vec3::Vec3(const Vec3& v) : x(v.x), y(v.y), z(v.z) {}), but on the other hand, providing the copy constructor and assignment operator increases the performance as @AlexMWells showed earlier.
So, we have to switch default or not in some way.

Thank you :)


#endif // INCLUDED_IMATH_CONFIG_H
2 changes: 1 addition & 1 deletion src/Imath/ImathBoxAlgo.h
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ clip (const T& p, const Box<T>& box) IMATH_NOEXCEPT
///

template <class T>
IMATH_HOSTDEVICE constexpr inline T
IMATH_HOSTDEVICE IMATH_CONSTEXPR14 inline T
closestPointInBox (const T& p, const Box<T>& box) IMATH_NOEXCEPT
{
return clip (p, box);
Expand Down
78 changes: 75 additions & 3 deletions src/Imath/ImathVec.h
Original file line number Diff line number Diff line change
Expand Up @@ -72,15 +72,35 @@ template <class T> class IMATH_EXPORT_TEMPLATE_TYPE Vec2
IMATH_HOSTDEVICE constexpr Vec2 (T a, T b) IMATH_NOEXCEPT;

/// Copy constructor
#ifdef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
IMATH_HOSTDEVICE constexpr Vec2 (const Vec2& v) IMATH_NOEXCEPT = default;
sukeya marked this conversation as resolved.
Show resolved Hide resolved
#else
IMATH_HOSTDEVICE constexpr Vec2 (const Vec2& v) IMATH_NOEXCEPT;
#endif

#ifdef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
/// Move constructor
IMATH_HOSTDEVICE constexpr Vec2 (Vec2&& v) IMATH_NOEXCEPT = default;
#endif

/// Construct from Vec2 of another base type
template <class S>
IMATH_HOSTDEVICE constexpr Vec2 (const Vec2<S>& v) IMATH_NOEXCEPT;

/// Assignment
/// Copy assignment
#ifdef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
IMATH_HOSTDEVICE IMATH_CONSTEXPR14 Vec2&
operator= (const Vec2& v) IMATH_NOEXCEPT = default;
sukeya marked this conversation as resolved.
Show resolved Hide resolved
#else
IMATH_HOSTDEVICE IMATH_CONSTEXPR14 const Vec2&
operator= (const Vec2& v) IMATH_NOEXCEPT;
#endif

#ifdef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
/// Move assignment
IMATH_HOSTDEVICE IMATH_CONSTEXPR14 Vec2&
operator= (Vec2&& v) IMATH_NOEXCEPT = default;
#endif

/// Destructor
~Vec2 () IMATH_NOEXCEPT = default;
sukeya marked this conversation as resolved.
Show resolved Hide resolved
Expand Down Expand Up @@ -369,7 +389,16 @@ template <class T> class IMATH_EXPORT_TEMPLATE_TYPE Vec3
IMATH_HOSTDEVICE constexpr Vec3 (T a, T b, T c) IMATH_NOEXCEPT;

/// Copy constructor
#ifdef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
IMATH_HOSTDEVICE constexpr Vec3 (const Vec3& v) IMATH_NOEXCEPT = default;
#else
IMATH_HOSTDEVICE constexpr Vec3 (const Vec3& v) IMATH_NOEXCEPT;
#endif

#ifdef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
/// Move constructor
IMATH_HOSTDEVICE constexpr Vec3 (Vec3&& v) IMATH_NOEXCEPT = default;
#endif

/// Construct from Vec3 of another base type
template <class S>
Expand All @@ -387,9 +416,20 @@ template <class T> class IMATH_EXPORT_TEMPLATE_TYPE Vec3
explicit IMATH_HOSTDEVICE IMATH_CONSTEXPR14
Vec3 (const Vec4<S>& v, InfException);

/// Assignment
/// Copy assignment
#ifdef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
IMATH_HOSTDEVICE IMATH_CONSTEXPR14 Vec3&
operator= (const Vec3& v) IMATH_NOEXCEPT = default;
#else
IMATH_HOSTDEVICE IMATH_CONSTEXPR14 const Vec3&
operator= (const Vec3& v) IMATH_NOEXCEPT;
#endif

#ifdef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
/// Move assignment
IMATH_HOSTDEVICE IMATH_CONSTEXPR14 Vec3&
operator= (Vec3&& v) IMATH_NOEXCEPT = default;
#endif

/// Destructor
~Vec3 () IMATH_NOEXCEPT = default;
Expand Down Expand Up @@ -690,7 +730,16 @@ template <class T> class IMATH_EXPORT_TEMPLATE_TYPE Vec4
IMATH_HOSTDEVICE constexpr Vec4 (T a, T b, T c, T d) IMATH_NOEXCEPT;

/// Copy constructor
#ifdef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
IMATH_HOSTDEVICE constexpr Vec4 (const Vec4& v) IMATH_NOEXCEPT = default;
#else
IMATH_HOSTDEVICE constexpr Vec4 (const Vec4& v) IMATH_NOEXCEPT;
#endif

#ifdef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
/// Move constructor
IMATH_HOSTDEVICE constexpr Vec4 (Vec4&& v) IMATH_NOEXCEPT = default;
#endif

/// Construct from Vec4 of another base type
template <class S>
Expand All @@ -700,9 +749,20 @@ template <class T> class IMATH_EXPORT_TEMPLATE_TYPE Vec4
template <class S>
IMATH_HOSTDEVICE explicit constexpr Vec4 (const Vec3<S>& v) IMATH_NOEXCEPT;

/// Assignment
/// Copy assignment
#ifdef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
IMATH_HOSTDEVICE IMATH_CONSTEXPR14 Vec4&
operator= (const Vec4& v) IMATH_NOEXCEPT = default;
#else
IMATH_HOSTDEVICE IMATH_CONSTEXPR14 const Vec4&
operator= (const Vec4& v) IMATH_NOEXCEPT;
#endif

#ifdef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
/// Move assignment
IMATH_HOSTDEVICE IMATH_CONSTEXPR14 Vec4&
operator= (Vec4&& v) IMATH_NOEXCEPT = default;
#endif

/// Destructor
~Vec4 () IMATH_NOEXCEPT = default;
Expand Down Expand Up @@ -1212,11 +1272,13 @@ IMATH_HOSTDEVICE constexpr inline Vec2<T>::Vec2 (T a, T b) IMATH_NOEXCEPT
y (b)
{}

#ifndef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
template <class T>
IMATH_HOSTDEVICE constexpr inline Vec2<T>::Vec2 (const Vec2& v) IMATH_NOEXCEPT
: x (v.x),
y (v.y)
{}
#endif

template <class T>
template <class S>
Expand All @@ -1225,6 +1287,7 @@ IMATH_HOSTDEVICE constexpr inline Vec2<T>::Vec2 (const Vec2<S>& v)
y (T (v.y))
{}

#ifndef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
template <class T>
IMATH_CONSTEXPR14 IMATH_HOSTDEVICE inline const Vec2<T>&
Vec2<T>::operator= (const Vec2& v) IMATH_NOEXCEPT
Expand All @@ -1233,6 +1296,7 @@ Vec2<T>::operator= (const Vec2& v) IMATH_NOEXCEPT
y = v.y;
return *this;
}
#endif

template <class T>
template <class S>
Expand Down Expand Up @@ -1618,12 +1682,14 @@ IMATH_HOSTDEVICE constexpr inline Vec3<T>::Vec3 (T a, T b, T c) IMATH_NOEXCEPT
z (c)
{}

#ifndef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
template <class T>
IMATH_HOSTDEVICE constexpr inline Vec3<T>::Vec3 (const Vec3& v) IMATH_NOEXCEPT
: x (v.x),
y (v.y),
z (v.z)
{}
#endif

template <class T>
template <class S>
Expand All @@ -1633,6 +1699,7 @@ IMATH_HOSTDEVICE constexpr inline Vec3<T>::Vec3 (const Vec3<S>& v)
z (T (v.z))
{}

#ifndef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
template <class T>
IMATH_HOSTDEVICE IMATH_CONSTEXPR14 inline const Vec3<T>&
Vec3<T>::operator= (const Vec3& v) IMATH_NOEXCEPT
Expand All @@ -1642,6 +1709,7 @@ Vec3<T>::operator= (const Vec3& v) IMATH_NOEXCEPT
z = v.z;
return *this;
}
#endif

template <class T>
template <class S>
Expand Down Expand Up @@ -2093,13 +2161,15 @@ IMATH_HOSTDEVICE constexpr inline Vec4<T>::Vec4 (T a, T b, T c, T d)
w (d)
{}

#ifndef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
template <class T>
IMATH_HOSTDEVICE constexpr inline Vec4<T>::Vec4 (const Vec4& v) IMATH_NOEXCEPT
: x (v.x),
y (v.y),
z (v.z),
w (v.w)
{}
#endif

template <class T>
template <class S>
Expand All @@ -2110,6 +2180,7 @@ IMATH_HOSTDEVICE constexpr inline Vec4<T>::Vec4 (const Vec4<S>& v)
w (T (v.w))
{}

#ifndef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
template <class T>
IMATH_HOSTDEVICE IMATH_CONSTEXPR14 inline const Vec4<T>&
Vec4<T>::operator= (const Vec4& v) IMATH_NOEXCEPT
Expand All @@ -2120,6 +2191,7 @@ Vec4<T>::operator= (const Vec4& v) IMATH_NOEXCEPT
w = v.w;
return *this;
}
#endif

template <class T>
template <class S>
Expand Down