Merge pull request #295 from kknox/2.12
2.12
Kent Knox authored Jan 18, 2017
2 parents d16f7b3 + 88afc1d commit 1f3de2a
Showing 245 changed files with 3,102 additions and 2,459 deletions.
4 changes: 2 additions & 2 deletions .gitignore
@@ -24,5 +24,5 @@
# vim temp files
.*.swp

src/build/

# python compiled files
*.pyc
16 changes: 9 additions & 7 deletions .travis.yml
@@ -113,19 +113,21 @@ install:
- if [ ${TRAVIS_OS_NAME} == "linux" ]; then
mkdir -p ${OPENCL_ROOT};
pushd ${OPENCL_ROOT};
wget ${OPENCL_REGISTRY}/specs/opencl-icd-1.2.11.0.tgz;
tar -xf opencl-icd-1.2.11.0.tgz;
mv ./icd/* .;
mkdir -p inc/CL;
travis_retry git clone --depth 1 https://github.com/KhronosGroup/OpenCL-ICD-Loader.git;
mv ./OpenCL-ICD-Loader/* .;
travis_retry git clone --depth 1 https://github.com/KhronosGroup/OpenCL-Headers.git inc/CL;
pushd inc/CL;
wget -r -w 1 -np -nd -nv -A h,hpp https://www.khronos.org/registry/cl/api/1.2/;
wget -w 1 -np -nd -nv -A h,hpp https://www.khronos.org/registry/cl/api/2.1/cl.hpp;
travis_retry wget -w 1 -np -nd -nv -A h,hpp ${OPENCL_REGISTRY}/api/2.1/cl.hpp;
popd;
mkdir -p lib;
pushd lib;
cmake -G "Unix Makefiles" ..;
make;
cp ../bin/libOpenCL.so .;
cp ./bin/libOpenCL.so .;
popd;
pushd inc/CL;
travis_retry git fetch origin opencl12:opencl12;
git checkout opencl12;
popd;
mv inc/ include/;
popd;
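The install steps above wrap their network operations (`git clone`, `git fetch`, `wget`) in `travis_retry`, which is supplied by the Travis CI environment and, to my understanding, re-runs a failing command a few times. When trying the same steps outside Travis, a stand-in along the following lines may be useful. This is only a sketch: the function name `retry` and the `RETRY_MAX` and `RETRY_DELAY` variables are hypothetical and are not part of this commit or of Travis.

```shell
#!/bin/sh
# Hypothetical local stand-in for travis_retry (not part of the commit).
# Runs the given command up to RETRY_MAX times (default 3), waiting
# RETRY_DELAY seconds (default 1) after each failure, and stops at the
# first success. Returns 1 if every attempt fails.
retry() {
    retry_attempt=1
    retry_max="${RETRY_MAX:-3}"
    while [ "$retry_attempt" -le "$retry_max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $retry_attempt of $retry_max failed: $*" >&2
        retry_attempt=$((retry_attempt + 1))
        sleep "${RETRY_DELAY:-1}"
    done
    return 1
}
```

For example, `retry git clone --depth 1 https://github.com/KhronosGroup/OpenCL-ICD-Loader.git` would attempt the clone up to three times before giving up.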
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -19,9 +19,9 @@ We want to ensure that the project code base maintains a level of quality over t
guidelines over time
* separate check-ins that modify a file's style from the ones that add/change/delete code.
* target the **develop** branch in the repository
* ensure that the [code properly builds]( https://github.com/kknox/clBLAS/wiki/Build )
* ensure that the [code properly builds]( https://github.com/clMathLibraries/clBLAS/wiki/Build )
* cannot break existing test cases
* we encourage contributors to [run the test-short]( https://github.com/kknox/clBLAS/wiki/Testing ) suite of tests on their end before the pull-request
* we encourage contributors to [run the test-short]( https://github.com/clMathLibraries/clBLAS/wiki/Testing ) suite of tests on their end before the pull-request
* if possible, upload the test results associated with the pull request to a personal [gist repository]( https://gist.github.com/ ) and insert a link to the test results in the pull request so that collaborators can browse the results
* if no test results are provided with the pull request, official collaborators will run the test suite on their test machines against the patch before we will accept the pull-request
* if we detect failing test cases, we will request that the code associated with the pull request be fixed before the pull request will be merged
19 changes: 11 additions & 8 deletions README.md
@@ -10,7 +10,7 @@ This repository houses the code for the OpenCL™ BLAS portion of clMath.
The complete set of BLAS level 1, 2 & 3 routines is implemented. Please
see Netlib BLAS for the list of supported routines. In addition to GPU
devices, the library also supports running on CPU devices to facilitate
debugging and multicore programming. APPML 1.10 is the most current
debugging and multicore programming. APPML 1.12 is the most current
generally available pre-packaged binary version of the library available
for download for both Linux and Windows platforms.

@@ -23,13 +23,12 @@ library does generate and enqueue optimized OpenCL kernels, relieving
the user from the task of writing, optimizing and maintaining kernel
code themselves.

## clBLAS update notes 09/2015

- Introducing [AutoGemm](http://github.com/clMathLibraries/clBLAS/wiki/AutoGemm)
- clBLAS's Gemm implementation has been comprehensively overhauled to use AutoGemm. AutoGemm is a suite of python scripts which generate optimized kernels and kernel selection logic, for all precisions, transposes, tile sizes and so on.
- CMake is configured to use AutoGemm for clBLAS so the build and usage experience of Gemm remains unchanged (only performance and maintainability has been improved). Kernel sources are generated at build time (not runtime) and can be configured within CMake to be pre-compiled at build time.
- clBLAS users with unique Gemm requirements can customize AutoGemm to their needs (such as non-default tile sizes for very small or very skinny matrices); see [AutoGemm](http://github.com/clMathLibraries/clBLAS/wiki/AutoGemm) documentation for details.
## clBLAS update notes 01/2017

- v2.12 is a bugfix release that rolls up all fixes from the develop branch
- Thanks to @pavanky, @iotamudelta, @shahsan10, @psyhtest, @haahh, @hughperkins, @tfauck,
@abhiShandy, @IvanVergiliev, @zougloub, and @mgates3 for contributions to clBLAS v2.12
- A summary of the fixes is available on the releases tab

## clBLAS library user documentation

@@ -197,8 +196,12 @@ The simple example below shows how to use clBLAS to compute an OpenCL accelerate
### Test infrastructure
* Googletest v1.6
* ACML on windows/linux; Accelerate on Mac OSX
* Latest Boost
* CPU BLAS
- Netlib CBLAS (recommended)
Ubuntu: install with `apt-get install libblas-dev`
Windows: download and install lapack-3.6.0, which comes with CBLAS
- or ACML on windows/linux; Accelerate on Mac OSX
### Performance infrastructure
* Python
17 changes: 10 additions & 7 deletions appveyor.yml
@@ -40,26 +40,29 @@ install:
- ps: mkdir $env:OPENCL_ROOT
- ps: pushd $env:OPENCL_ROOT
- ps: $opencl_registry = $env:OPENCL_REGISTRY
# This downloads the source to the example/demo icd library
- ps: wget $opencl_registry/specs/opencl-icd-1.2.11.0.tgz -OutFile opencl-icd-1.2.11.0.tgz
- ps: 7z x opencl-icd-1.2.11.0.tgz
- ps: 7z x opencl-icd-1.2.11.0.tar
- ps: mv .\icd\* .
# This downloads the source to the Khronos ICD library
- git clone --depth 1 https://github.com/KhronosGroup/OpenCL-ICD-Loader.git
- ps: mv ./OpenCL-ICD-Loader/* .
# This downloads all the opencl header files
# The cmake build files expect a directory called inc
- ps: mkdir inc/CL
- ps: wget $opencl_registry/api/1.2/ | select -ExpandProperty links | where {$_.href -like "*.h*"} | select -ExpandProperty outerText | foreach{ wget $opencl_registry/api/1.2/$_ -OutFile inc/CL/$_ }
- git clone --depth 1 https://github.com/KhronosGroup/OpenCL-Headers.git inc/CL
- ps: wget $opencl_registry/api/2.1/cl.hpp -OutFile inc/CL/cl.hpp
# - ps: dir; if( $lastexitcode -eq 0 ){ dir include/CL } else { Write-Output boom }
# Create the static import lib in a directory called lib, so findopencl() will find it
- ps: mkdir lib
- ps: pushd lib
- cmake -G "NMake Makefiles" ..
- nmake
- ps: popd
# Switch to OpenCL 1.2 headers
- ps: pushd inc/CL
- git fetch origin opencl12:opencl12
- git checkout opencl12
- ps: popd
# Rename the inc directory to include, so FindOpencl() will find it
- ps: ren inc include
- ps: popd
- ps: popd

# before_build is used to run configure steps
before_build:
79 changes: 49 additions & 30 deletions src/CMakeLists.txt
@@ -1,12 +1,12 @@
# ########################################################################
# Copyright 2013 Advanced Micro Devices, Inc.
#
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -18,7 +18,7 @@ cmake_minimum_required(VERSION 2.8)

#User toggle-able options that can be changed on the command line with -D
option( BUILD_RUNTIME "Build the BLAS runtime library" ON )
option( BUILD_TEST "Build the library testing suite (dependency on google test, Boost, and ACML)" ON )
option( BUILD_TEST "Build the library testing suite (dependency on google test, Boost, and ACML/NETLIB BLAS)" ON )
option( BUILD_PERFORMANCE "Copy the performance scripts that can measure and graph performance" OFF )
option( BUILD_SAMPLE "Build the sample programs" OFF )
option( BUILD_CLIENT "Build a command line clBLAS client program with a variety of configurable parameters (dependency on Boost)" OFF )
@@ -41,33 +41,33 @@ set( OPENCL_OFFLINE_BUILD_TAHITI_KERNEL OFF)
#use dynamic generated kernels
# MESSAGE(STATUS "Build dynamic Hawaii kernels.")
# MESSAGE(STATUS "Check OPENCL_OFFLINE_BUILD_HAWAII_KERNEL to build kernels at compile-time. This eliminates clBuildProgram() overhead and may give better kernel performance with certain drivers.")
add_definitions(-DCLBLAS_HAWAII_DYNAMIC_KERNEL)
add_definitions(-DCLBLAS_HAWAII_DYNAMIC_KERNEL)
#else()
# MESSAGE(STATUS "Build static Hawaii kernels.")
# MESSAGE(STATUS "Uncheck OPENCL_OFFLINE_BUILD_HAWAII_KERNEL to build kernels at run-time")
# MESSAGE(STATUS "Please ensure the presence of Hawaii device in the system. With certain driver/compiler flags, this might result in compile-time error.")
# MESSAGE(STATUS "Please ensure the presence of Hawaii device in the system. With certain driver/compiler flags, this might result in compile-time error.")
#endif( )

#if( NOT OPENCL_OFFLINE_BUILD_BONAIRE_KERNEL )
#use dynamic generated kernels
# MESSAGE(STATUS "Build dynamic Bonaire kernels.")
# MESSAGE(STATUS "Check OPENCL_OFFLINE_BUILD_BONAIRE_KERNEL to build kernels at compile-time. This eliminates clBuildProgram() overhead and may give better kernel performance with certain drivers.")
add_definitions(-DCLBLAS_BONAIRE_DYNAMIC_KERNEL)
add_definitions(-DCLBLAS_BONAIRE_DYNAMIC_KERNEL)
#else()
# MESSAGE(STATUS "Build static Bonaire kernels.")
# MESSAGE(STATUS "Uncheck OPENCL_OFFLINE_BUILD_BONAIRE_KERNEL to build kernels at run-time")
# MESSAGE(STATUS "Please ensure the presence of Bonaire device in the system. With certain driver/compiler flags, this might result in compile-time error.")
# MESSAGE(STATUS "Please ensure the presence of Bonaire device in the system. With certain driver/compiler flags, this might result in compile-time error.")
#endif( )

#if( NOT OPENCL_OFFLINE_BUILD_TAHITI_KERNEL )
#use dynamic generated kernels
# MESSAGE(STATUS "Build dynamic Tahiti kernels.")
# MESSAGE(STATUS "Check OPENCL_OFFLINE_BUILD_TAHITI_KERNEL to build kernels at compile-time. This eliminates clBuildProgram() overhead and may give better kernel performance with certain drivers.")
add_definitions(-DCLBLAS_TAHITI_DYNAMIC_KERNEL)
add_definitions(-DCLBLAS_TAHITI_DYNAMIC_KERNEL)
#else( )
# MESSAGE(STATUS "Build static Tahiti kernels.")
# MESSAGE(STATUS "Uncheck OPENCL_OFFLINE_BUILD_TAHITI_KERNEL to build kernels at run-time")
# MESSAGE(STATUS "Please ensure the presence of Tahiti device in the system. With certain driver/compiler flags, this might result in compile-time error.")
# MESSAGE(STATUS "Please ensure the presence of Tahiti device in the system. With certain driver/compiler flags, this might result in compile-time error.")
#endif( )


@@ -108,7 +108,7 @@ if( NOT DEFINED clBLAS_VERSION_MAJOR )
endif( )

if( NOT DEFINED clBLAS_VERSION_MINOR )
set( clBLAS_VERSION_MINOR 10 )
set( clBLAS_VERSION_MINOR 12 )
endif( )

if( NOT DEFINED clBLAS_VERSION_PATCH )
@@ -135,8 +135,8 @@ if(NOT CMAKE_BUILD_TYPE)
FORCE)
endif()

# These variables are meant to contain strings that are appended to the installation paths
# of library and executable binaries, respectively. They are meant to be user configurable/overridable.
# These variables are meant to contain strings that are appended to the installation paths
# of library and executable binaries, respectively. They are meant to be user configurable/overridable.
set( SUFFIX_LIB_DEFAULT "" )
set( SUFFIX_BIN_DEFAULT "" )

@@ -170,8 +170,9 @@ if( MSVC_IDE )
endif( )

# add the math library for Linux
if( UNIX )
if( UNIX )
set(MATH_LIBRARY "m")
set(THREAD_LIBRARY "pthread")
endif()

# set the path to specific OpenCL compiler
@@ -220,7 +221,7 @@ if( BUILD_TEST )
else()
message(WARNING "Cannot find acml.h")
endif()

if( UNIX )
find_library(ACML_LIBRARIES acml_mp
HINTS
@@ -238,7 +239,7 @@ if( BUILD_TEST )
)
mark_as_advanced(_acml_mv_library)
endif( )

if(WIN32)
find_library(ACML_LIBRARIES libacml_mp_dll
HINTS
@@ -248,7 +249,7 @@ if( BUILD_TEST )
$ENV{ACML_ROOT}/${ACML_SUBDIR}/lib
)
endif( )

if( NOT ACML_LIBRARIES )
message(WARNING "Cannot find libacml")
endif( )
@@ -265,15 +266,23 @@ if( BUILD_TEST )
endif( )
endif( )

if( BUILD_CLIENT )
if( NOT NETLIB_FOUND )
message( WARNING "Could not find Netlib; BUILD_CLIENT needs the Netlib CBLAS library" )
endif()
endif()


# This will define OPENCL_FOUND
find_package( OpenCL )
find_package( OpenCL ${OPENCL_VERSION} )

# Find Boost on the system, and configure the type of boost build we want
set( Boost_USE_MULTITHREADED ON )
set( Boost_USE_STATIC_LIBS ON )
set( Boost_DETAILED_FAILURE_MSG ON )
set( Boost_DEBUG ON )
set( Boost_ADDITIONAL_VERSIONS "1.44.0" "1.44" "1.47.0" "1.47" )
# set( Boost_DEBUG ON )
set( Boost_ADDITIONAL_VERSIONS "1.44.0" "1.44" "1.47.0" "1.47" "1.60.0" "1.60" )

find_package( Boost 1.33.0 COMPONENTS program_options )
message(STATUS "Boost_PROGRAM_OPTIONS_LIBRARY: ${Boost_PROGRAM_OPTIONS_LIBRARY}")
@@ -288,26 +297,36 @@ endif()

# Turn on maximum compiler verbosity
if(CMAKE_COMPILER_IS_GNUCXX)
add_definitions(-pedantic -Wall -Wextra
add_definitions(# -pedantic -Wall -Wextra
-D_POSIX_C_SOURCE=199309L -D_XOPEN_SOURCE=500
)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -std=c99 -Wstrict-prototypes" CACHE STRING
"Default CFLAGS" FORCE)
# Don't use -rpath.
set(CMAKE_SKIP_RPATH ON CACHE BOOL "Skip RPATH" FORCE)

set(CMAKE_C_FLAGS "-m${TARGET_PLATFORM} ${CMAKE_C_FLAGS}")
set(CMAKE_CXX_FLAGS "-m${TARGET_PLATFORM} ${CMAKE_CXX_FLAGS}")
set(CMAKE_Fortran_FLAGS "-m${TARGET_PLATFORM} ${CMAKE_Fortran_FLAGS}")
# Need to determine the target machine of the C compiler, because
# the '-m32' and '-m64' flags are supported on x86 but not on e.g. ARM.
exec_program( "${CMAKE_C_COMPILER} -dumpmachine"
OUTPUT_VARIABLE CMAKE_C_COMPILER_MACHINE )
message( STATUS "CMAKE_C_COMPILER_MACHINE: ${CMAKE_C_COMPILER_MACHINE}" )
# The "86" regular expression matches x86, x86_64, i686, etc.
if(${CMAKE_C_COMPILER_MACHINE} MATCHES "86")
set(CMAKE_C_FLAGS "-m${TARGET_PLATFORM} ${CMAKE_C_FLAGS}")
set(CMAKE_CXX_FLAGS "-m${TARGET_PLATFORM} ${CMAKE_CXX_FLAGS}")
set(CMAKE_Fortran_FLAGS "-m${TARGET_PLATFORM} ${CMAKE_Fortran_FLAGS}")
endif()

if(TARGET_PLATFORM EQUAL 32)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fno-builtin")
endif()
elseif(CMAKE_CXX_COMPILER_ID MATCHES "Clang")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-narrowing")
elseif( MSVC )
# CMake sets huge stack frames for windows, for whatever reason. We go with compiler default.
string( REGEX REPLACE "/STACK:[0-9]+" "" CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS}" )
string( REGEX REPLACE "/STACK:[0-9]+" "" CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS}" )
string( REGEX REPLACE "/STACK:[0-9]+" "" CMAKE_MODULE_LINKER_FLAGS "${CMAKE_MODULE_LINKER_FLAGS}" )
string( REGEX REPLACE "/STACK:[0-9]+" "" CMAKE_MODULE_LINKER_FLAGS "${CMAKE_MODULE_LINKER_FLAGS}" )
endif( )

if (WIN32)
@@ -320,13 +339,13 @@ add_definitions( -DCL_USE_DEPRECATED_OPENCL_1_1_APIS )
configure_file( "${PROJECT_SOURCE_DIR}/clBLAS.version.h.in" "${PROJECT_BINARY_DIR}/include/clBLAS.version.h" )

# configure a header file to pass the CMake version settings to the source, and package the header files in the output archive
install( FILES
"clBLAS.h"
install( FILES
"clBLAS.h"
"clAmdBlas.h"
"clAmdBlas.version.h"
"clBLAS-complex.h"
"${PROJECT_BINARY_DIR}/include/clBLAS.version.h"
DESTINATION
DESTINATION
"./include" )


@@ -351,7 +370,7 @@ if( BUILD_SAMPLE AND IS_DIRECTORY "${PROJECT_SOURCE_DIR}/samples" )
add_subdirectory( samples )
endif( )

# The build server is not supposed to build or package any of the tests; build server script will define this on the command line with
# The build server is not supposed to build or package any of the tests; build server script will define this on the command line with
# cmake -G "Visual Studio 10 Win64" -D BUILDSERVER:BOOL=ON ../..
if( BUILD_TEST )
if( IS_DIRECTORY "${PROJECT_SOURCE_DIR}/tests" )
@@ -386,7 +405,7 @@ install(FILES ${CMAKE_CURRENT_BINARY_DIR}/clBLASConfigVersion.cmake
DESTINATION ${destdir})


# The following code is setting variables to control the behavior of CPack to generate our
# The following code is setting variables to control the behavior of CPack to generate our
if( WIN32 )
set( CPACK_SOURCE_GENERATOR "ZIP" )
set( CPACK_GENERATOR "ZIP" )
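The `-dumpmachine` hunk in this file makes the `-m${TARGET_PLATFORM}` flag conditional on the compiler's target triple containing "86", since `-m32` and `-m64` are accepted on x86 targets but not on, for example, ARM. The same decision can be sketched in shell to see what a given toolchain would get. The function name `arch_flag` is hypothetical and the sample triples are only illustrations; nothing here is part of the commit.

```shell
#!/bin/sh
# arch_flag TRIPLE BITS  (hypothetical helper, mirrors the CMake check)
# Prints "-m<BITS>" when TRIPLE contains "86" (x86_64, i686, i386, ...),
# and prints nothing for any other target (for example ARM).
arch_flag() {
    case "$1" in
        *86*) printf '%s\n' "-m$2" ;;
        *)    : ;;
    esac
}
```

On a machine with a GCC-compatible compiler installed, `arch_flag "$(cc -dumpmachine)" 64` would print the flag that the CMake logic is expected to add, or nothing on a non-x86 target.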
19 changes: 19 additions & 0 deletions src/FindNetlib.cmake
@@ -100,6 +100,25 @@ if( NOT contains_BLAS EQUAL -1 )
FIND_PACKAGE_HANDLE_STANDARD_ARGS( NETLIB DEFAULT_MSG Netlib_BLAS_LIBRARY )
endif( )


#look for netlib cblas header
if( UNIX )
find_path(Netlib_INCLUDE_DIRS cblas.h
HINTS
/usr/include
)
else()
find_path(Netlib_INCLUDE_DIRS cblas.h
HINTS
${Netlib_ROOT}/CBLAS/include/
)
endif()

if( NOT Netlib_INCLUDE_DIRS )
message(WARNING "Cannot find cblas.h")
endif()

if( NETLIB_FOUND )
list( APPEND Netlib_LIBRARIES ${Netlib_BLAS_LIBRARY} )
else( )
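The header lookup added in this hunk asks CMake to locate `cblas.h`, hinting at `/usr/include` on UNIX and at `${Netlib_ROOT}/CBLAS/include/` elsewhere. Before configuring, it can help to check by hand whether the header is where the hints expect it. The sketch below is a simplified stand-in, assuming a plain first-match search over the directories given; CMake's `find_path` also consults its own default search locations, which this does not reproduce. The function name `find_header` is hypothetical.

```shell
#!/bin/sh
# find_header NAME DIR...  (hypothetical helper, not part of the commit)
# Prints the first DIR that contains a regular file called NAME and
# returns 0; prints nothing and returns 1 when no directory matches.
find_header() {
    header_name="$1"
    shift
    for dir in "$@"; do
        if [ -f "$dir/$header_name" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}
```

For instance, `find_header cblas.h /usr/include` prints `/usr/include` when the header is installed there and returns a non-zero status otherwise, which corresponds to the case where the CMake code emits its "Cannot find cblas.h" warning.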