Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

highgui: wayland: fix to pass highgui test #25551

Merged
merged 6 commits into from May 13, 2024
Merged

Conversation

Kumataro
Copy link
Contributor

@Kumataro Kumataro commented May 7, 2024

Close #25550

  • optimize Mat to XRGB8888 conversion with OpenCV functions
    • extend to support CV_8S/16U/16S/32F/64F
    • extend to support 1/4 channels
  • fix to update value timing
  • initilize slider_ value if value is not nullptr.
  • Update user-ptr value and call on_change() function if cv_wl_trackbar::draw() is not called.
  • Update usage of WAYLAND/XDG macro to avoid reference undefined macro.
  • Update documents

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

- optimize Mat to XRGB8888 conversion with SIMD
  - extend to support CV_8S/16U/16S/32F/64F
  - extend to support 1/4 channels
- fix to update value timing
 - initilize slider_ value if value is not nullptr.
 - Update user-ptr value and call on_change() function if cv_wl_trackbar::draw() is not called.
- Update usage of WAYLAND/XDG macro to avoid reference undefined macro.
- Update documents
@Kumataro
Copy link
Contributor Author

Kumataro commented May 7, 2024

This patch contains performance tuning.

Result

I compare count of Instruction references.

Environment Ir of write_mat_to_xrgb8888()
OpenCV 4.9(Before) 5,911,406
Without SIMD 2,453,868
SSE3 + SIMD128 1.150.789
SSE4.1 + SIMD128 325,695
AVX2 + SIMD256 260,544

The differences between SSE3 and SSE4.1 comes from intristic implementation.

inline void v_store_interleave( uchar* ptr, const v_uint8x16& a, const v_uint8x16& b,
const v_uint8x16& c, hal::StoreMode mode = hal::STORE_UNALIGNED)
{
#if CV_SSE4_1
const __m128i sh_a = _mm_setr_epi8(0, 11, 6, 1, 12, 7, 2, 13, 8, 3, 14, 9, 4, 15, 10, 5);
const __m128i sh_b = _mm_setr_epi8(5, 0, 11, 6, 1, 12, 7, 2, 13, 8, 3, 14, 9, 4, 15, 10);
const __m128i sh_c = _mm_setr_epi8(10, 5, 0, 11, 6, 1, 12, 7, 2, 13, 8, 3, 14, 9, 4, 15);
__m128i a0 = _mm_shuffle_epi8(a.val, sh_a);
__m128i b0 = _mm_shuffle_epi8(b.val, sh_b);
__m128i c0 = _mm_shuffle_epi8(c.val, sh_c);
const __m128i m0 = _mm_setr_epi8(0, 0, -1, 0, 0, -1, 0, 0, -1, 0, 0, -1, 0, 0, -1, 0);
const __m128i m1 = _mm_setr_epi8(0, -1, 0, 0, -1, 0, 0, -1, 0, 0, -1, 0, 0, -1, 0, 0);
__m128i v0 = _mm_blendv_epi8(_mm_blendv_epi8(a0, b0, m1), c0, m0);
__m128i v1 = _mm_blendv_epi8(_mm_blendv_epi8(b0, c0, m1), a0, m0);
__m128i v2 = _mm_blendv_epi8(_mm_blendv_epi8(c0, a0, m1), b0, m0);
#elif CV_SSSE3
const __m128i m0 = _mm_setr_epi8(0, 6, 11, 1, 7, 12, 2, 8, 13, 3, 9, 14, 4, 10, 15, 5);
const __m128i m1 = _mm_setr_epi8(5, 11, 0, 6, 12, 1, 7, 13, 2, 8, 14, 3, 9, 15, 4, 10);
const __m128i m2 = _mm_setr_epi8(10, 0, 5, 11, 1, 6, 12, 2, 7, 13, 3, 8, 14, 4, 9, 15);
__m128i t0 = _mm_alignr_epi8(b.val, _mm_slli_si128(a.val, 10), 5);
t0 = _mm_alignr_epi8(c.val, t0, 5);
__m128i v0 = _mm_shuffle_epi8(t0, m0);
__m128i t1 = _mm_alignr_epi8(_mm_srli_si128(b.val, 5), _mm_slli_si128(a.val, 5), 6);
t1 = _mm_alignr_epi8(_mm_srli_si128(c.val, 5), t1, 5);
__m128i v1 = _mm_shuffle_epi8(t1, m1);
__m128i t2 = _mm_alignr_epi8(_mm_srli_si128(c.val, 10), b.val, 11);
t2 = _mm_alignr_epi8(t2, a.val, 11);
__m128i v2 = _mm_shuffle_epi8(t2, m2);

Test

Source code is here.

// g++ main.cpp -o a.out -I /usr/local/include/opencv4 -lopencv_core -lopencv_highgui -lopencv_imgcodecs
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgcodecs.hpp>
#include <iostream>
#include <string>

int main(int argc, char *argv[])
{
  std::cout << "cv::currentUIFramework() returns " << cv::currentUIFramework() << std::endl;

  cv::Mat src;
  src = cv::imread("opencv-logo.png");

  cv::namedWindow("src");
  cv::imshow("src", src);
  (void)cv::waitKey(1000);

  return 0;
}

Command is here.

valgrind --tool=callgrind ./a.out
callgrind_annotate callgrind.out.[PID] | grep 8888 | head -1

@asmorkalov asmorkalov self-requested a review May 8, 2024 12:02
Copy link
Contributor

@asmorkalov asmorkalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Tested manually with Kubuntu 24.04 and Wayland session.
Proposed todo: Add RISC-V RVV and other scalable vector intrinsics support. Need to use CV_SIMD_SCALABLE macro and run-time value step in loops.

@Kumataro
Copy link
Contributor Author

Kumataro commented May 8, 2024

Proposed todo: Add RISC-V RVV and other scalable vector intrinsics support. Need to use CV_SIMD_SCALABLE macro and run-time value step in loops.

Thank you for your proposal ! I'll try it this weekend.
I have no ARM SVE and RISC-V Vector Extension environment, but it looks like to able to test with AVX environment.

I think current implementation will be refactoring similar to split function.
https://github.com/opencv/opencv/blob/4.x/modules/core/src/split.simd.hpp

For example(this is only my imagination ).

template<typename T, typename VecT> static void
vecwrite_T_to_xrgb8888( const T* src, T* dst, int len, int scn )
{
    const int VECSZ = VTraits<VecT>::vlanes();
    const int dcn = 4; // XRGB
:
:
    else if( scn == 3 )
    {
        for( i = 0; i < len; i += VECSZ )
        {
            if( i > len - VECSZ )
            {
                i = len - VECSZ;
                mode = hal::STORE_UNALIGNED;
            }
            VecT b,g,r;
            v_load_deinterleave(src + i*scn, b, g, r);
            v_store_interleave (dst + i*dcn, b, g, r, r, mode);
            if( i < i0 )
            {
                i = i0 - VECSZ;
                mode = hal::STORE_ALIGNED_NOCACHE;
            }
        }
    }

@Kumataro
Copy link
Contributor Author

Kumataro commented May 10, 2024

I update code to support CV_SIMD_SCALABLE and tested with VMWare(AVX2) and Raspi4(NEON) with ubuntu24.04. opencv_test_highgui is passed and it called vector implementation.

This logic is simple. I add AVX512_SKX LASX and RVV because I expected it to be effective.

ocv_add_dispatched_file(write_mat_to_xrgb8888 SSE4_1 AVX2 AVX512_SKX NEON LASX RVV)

@Kumataro
Copy link
Contributor Author

Kumataro commented May 11, 2024

I see cvtColor implementation, and I have second idea to use cvtColor(cv::BGR2BGRA) or cvtColor(cv::GRAY2BGRA) instead of this SIMD implementation. I'l try it.

Wayland requests [B8:G8:R8:X8], not [B8:G8:R8:A8].
I thought it is hard to extend cvtColor() to support RGBX for this backend only .

But I notice X channel is not used, it means there are no problem even if it stores non-transparency alpha value.
So we can use COLOR_BGR2BGRA2 option for this purpose.

We can get many performance improvemet, which are provided from OpenCL, IPP, multithread, via cvtColor(). And furthermore, the maintainability of the code is also improved.

modules/highgui/test/test_gui.cpp Outdated Show resolved Hide resolved
modules/highgui/include/opencv2/highgui.hpp Outdated Show resolved Hide resolved
modules/highgui/src/window_wayland.cpp Outdated Show resolved Hide resolved
modules/highgui/src/window_wayland.cpp Outdated Show resolved Hide resolved
- In comment, use "Wayland" instead of "wayland".
- Remove redundant newline.
Copy link
Contributor

@asmorkalov asmorkalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@asmorkalov asmorkalov self-assigned this May 13, 2024
@asmorkalov asmorkalov merged commit 8dc9ff5 into opencv:4.x May 13, 2024
27 of 28 checks passed
klatism pushed a commit to klatism/opencv that referenced this pull request May 17, 2024
highgui: wayland: fix to pass highgui test opencv#25551

Close opencv#25550

- optimize Mat to XRGB8888 conversion with OpenCV functions
  - extend to support CV_8S/16U/16S/32F/64F
  - extend to support 1/4 channels
- fix to update value timing
 - initilize slider_ value if value is not nullptr.
 - Update user-ptr value and call on_change() function if cv_wl_trackbar::draw() is not called.
- Update usage of WAYLAND/XDG macro to avoid reference undefined macro.
- Update documents

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

highgui: wayland: opencv_test_highgui fails
3 participants