================================================================================
                       CULA R16 (CUDA 5.0) Release Notes
                               EM Photonics, Inc.
================================================================================

--------------------------------------------------------------------------------
Installation Instructions
--------------------------------------------------------------------------------

For installation instructions, please consult the CULAProgrammersGuide.pdf file
included in the 'doc/' folder of your CULA distribution.

--------------------------------------------------------------------------------
System Requirements
--------------------------------------------------------------------------------

CULA requires that your system be equipped with an NVIDIA CUDA-compatible device
in order to run CULA-enabled programs. The NVIDIA drivers must be version 306.94
or greater on Windows systems and 304.54 or greater on Linux systems. Mac OS X
systems must have driver version 5.0.36 or newer.

If you wish to use the CULA "Device" interface, you should install the CUDA 5.0
toolkit.

--------------------------------------------------------------------------------
Supported Operating Systems
--------------------------------------------------------------------------------

All systems feature 32-bit and 64-bit support.

 * Windows XP / Vista / 7
 * Ubuntu Linux 10.04 (and newer)
 * Red Hat Enterprise Linux 5.3 (and newer)
 * Fedora 11
 * Mac OS X 10.6 Snow Leopard / 10.7 Lion

--------------------------------------------------------------------------------
Revision History
--------------------------------------------------------------------------------

CULA R16 (October 16, 2012)
CULA R15 (August 14, 2012)
CULA R14 (January 30, 2012)
CULA R13 (November 2, 2011)
CULA R12 (May 26, 2011)
CULA R11 (March 31, 2011)
CULA R10 (December 10, 2010)
CULA 2.1 (August 31, 2010)
CULA 2.0 (June 28, 2010)
CULA 2.0 Preview (May 21, 2010)
CULA 1.3a (April 19, 2010)
CULA 1.3 (April 8, 2010)
CULA 1.2 (February 17, 2010)
CULA 1.1b (January 6, 2009)
CULA 1.1a (December 21, 2009)
CULA 1.1 (November 25, 2009)
CULA 1.1 Beta (November 13, 2009)
CULA 1.0 (September 30, 2009)
CULA 1.0 Beta 3 (September 15, 2009)
CULA 1.0 Beta 2 (August 27, 2009)
CULA 1.0 Beta 1 (August 12, 2009)

--------------------------------------------------------------------------------
Changelog
--------------------------------------------------------------------------------

Release R16 CUDA 5.0 (October 16, 2012)
---------------------------------------

All Versions
 * Feature: CUDA runtime upgraded to 5.0
 * Feature: K20 support
 * Fixed: Incompatibility between Fortran module files and Cray Compiler
 * Fixed: Resource leak caused by culaShutdown

CULA Dense
 * Feature: Implemented symmetric generalized Eigensolvers (sygv)
 * Alpha Feature: pgesv (multi-GPU LU-based solve)
 * Alpha Feature: pgetrs (multi-GPU LU backsolve)

Release R15 CUDA 4.2 (August 14, 2012)
---------------------------------------

Announcement
 * All packages are now "universal" and contain both 32-bit and 64-bit binaries
 * Multi-GPU routines (pCULA) remain in alpha

All Versions
 * Feature: CUDA runtime upgraded to 4.2
 * Feature: Kepler support
 * Feature: Link interface LAPACK compatibility version upgraded to 3.3.1
 * Feature: New Fortran module files for CULA Core and LAPACK subsets, with
   Device interface
 * Feature: New PGI Fortran example using CUDA-Fortran semantics
 * Feature: New Fortran Device Interface example
 * Improved: Performance of geqrf improved by up to 10%
 * Improved: Fortran documentation in Programmer's Guide
 * Improved: Link interface compatible with Matlab 2012
 * Fixed: Link interface properly functions on GEMM for sizes > 1k
 * Fixed: Resource overflow possibility when certain dimensions to gesv are
   very large
 * Changed: Fortran modules are now located in "include"

CULA Dense
 * Improved: Speed of multi-GPU routines
 * Improved: Scalability of multi-GPU routines
 * Improved: Reduced memory overhead

Release R14 CUDA 4.1 (January 30, 2012)
---------------------------------------

Announcement
 * Alpha Feature: Multi-GPU routines are included in the full CULA Dense
   version as a preview to be finalized in R15
 * Alpha Feature: pgetrf (multi-GPU LU factorization)
 * Alpha Feature: ppotrf (multi-GPU Cholesky decomposition)
 * Alpha Feature: ppotrs (multi-GPU Cholesky backsolve)
 * Alpha Feature: pposv (multi-GPU symmetric/Hermitian positive-definite
   factorize and solve)
 * Alpha Feature: pgemm (multi-GPU matrix-matrix multiply)
 * Alpha Feature: ptrsm (multi-GPU triangular solve)
 * Alpha Advisory: Performance, accuracy, routine list, and interface are all
   subject to change

All Versions
 * Feature: CUDA runtime upgraded to 4.1
 * Changed: Transitional headers have been removed
 * Fixed: Now shipping all dependencies required by OSX systems

CULA Dense
 * Improved: Up to 3x performance improvement to trtri
 * Improved: 10% performance improvement to potrf

Release R13 CUDA 4.0 (November 2, 2011)
---------------------------------------

All Versions
 * Feature: Compatibility with CULA Sparse S1
 * Improved: Significantly improved thread safety
 * Fixed: Host transpose function
 * Fixed: Workaround for a resource leak in the CUDA toolkit
 * Changed: Headers renamed to cula_lapack.h (etc); transitional header
   available

CULA Dense
 * Feature: Implemented potri (inverse of symmetric positive definite matrix)
 * Feature: Implemented gesdd (singular value decomposition variant)
 * Feature: Implemented geqrfp (QR decomposition variant)

Release R12 CUDA 4.0 (May 26, 2011)
-----------------------------------

All Versions
 * Feature: CUDA runtime upgraded to 4.0
 * Feature: New link-compatible interface for compatibility with existing
   programs
 * Improved: Now reserving less memory in multithreaded programs
 * Improved: More closely matching cuComplex type for better compatibility
 * Improved: Renamed all examples to clarify the purpose of each
 * Improved: Compatibility with future GPUs

Premium
 * Fixed: gebrd accuracy for M M case

Premium
 * Improved: GELQF performance increased 2-3x
 * Improved: GEHRD/GERQF/ORGLQ/ORGQR performance increased by 10-20%
 * Improved: GEHRD routine accurate for size N==1

Release 1.1 Beta (November 13, 2009)
------------------------------------

All Versions
 * Feature: Mac OS X 10.5 Leopard "preview" release - single precision only
 * Feature: New "Bridge" interface provides for easy and seamless porting of
   existing LAPACK/MKL/ACML applications (see doc/bridge_interface.txt)
 * Feature: New document describing full CULA API
 * Feature: New function culaSelectDevice to set executing device
 * Feature: New "gesv" example shows operation of all S/C/D/Z data types
 * Feature: New "multigpu" example showing multi-GPU CULA operation
 * Feature: New "bridge" example showing usage of the Bridge interface
 * Improved: SVD optimized for non-square cases
 * Improved: Documentation clarified on error conditions and codes
 * Improved: Stronger error reporting from example projects
 * Improved: culaInitialize detects and reports if driver/runtime version are
   inadequate
 * Improved: Documentation clearer on thread safety issues
 * Fixed: CULA can now handle extremely non-square matrices (e.g. 500000x16)
 * Fixed: An error in the "benchmark" example causing it to ignore user
   arguments
 * Fixed: Properly reporting cudaErrorMemoryValueTooLarge as
   culaInsufficientMemory

Basic
 * Improved: GESV performance increased by up to 30%
 * Improved: Stability of GELS in certain cases
 * Improved: Stability of SVD in certain cases

Premium
 * Feature: Implemented geev (general Eigensolver) in S/D/C/Z precisions
 * Feature: Implemented gehrd (general Hessenberg reduction) in S/D/C/Z
   precisions
 * Feature: Implemented orghr
 * Feature: .hpp headers have name overloads of ORG/UNG functions
 * Fixed: Host interface "ORG" functions produced different results from
   device interface

Release 1.0 Final (September 30, 2009)
--------------------------------------

Basic
 * Feature: All functions feature complex variants
 * Fixed: Crash related to getrs pivot array

Premium
 * Feature: All functions implemented in all supported data types

Release 1.0 Beta 3 (September 15, 2009)
---------------------------------------

All Versions
 * Feature: New documentation section on specific routine conventions
 * Improved: Updated sysinfo script with more descriptive output
 * Improved: Added example that demonstrates the device interface
 * Fixed: Various corrections for small-matrix inputs, especially M=N=1
 * Fixed: culaInitialize now sets environment variable KMP_DUPLICATE_LIB_OK

Basic
 * Feature: Complex geqrf included
 * Feature: Added culaGetDeviceCount to report the number of available devices
 * Feature: Added culaGetDeviceInfo to report information about a device
 * Feature: Added culaGetExecutingDevice to report the executing device
 * Fixed: Further corrections for unitary output in gesvd for all job codes

Premium
 * Feature: New functions culaDeviceMalloc/culaDeviceFree in culadevice.h
 * Fixed: Orglq and orgqr should behave more reliably

Release 1.0 Beta 2 (August 27, 2009)
------------------------------------

All Versions
 * Feature: Including both 32- and 64-bit libraries on 64-bit Linux release
 * Feature: Now shipping precompiled Benchmark example on Linux builds
 * Feature: Troubleshooting section added to Programmer's Guide
 * Feature: Added scripts that report system information to `examples` folder
 * Improved: Error output for examples is now more descriptive
 * Improved: Documentation is more specific about configuring system runtime
 * Fixed: Incompatibilities with gcc 4.2 and earlier; gcc 4.1 is now compatible

Basic
 * Improved: gesvd was optimized for up to a 60% speedup over Beta 1
 * Fixed: Error in geqrf for matrices of M << N
 * Fixed: Error in gesvd where some matrices would yield non-unitary U and Vt

Premium
 * Feature: Implemented getri
 * Feature: Implemented potrf
 * Feature: Implemented potrs
 * Feature: Implemented posv
 * Feature: Implemented trtrs
 * Improved: orglq was optimized for up to a 700% speedup

Release 1.0 Beta 1 (August 13, 2009)
------------------------------------

All Versions
 * Feature: Support Windows XP 32/64
 * Feature: Support Linux 32/64

Basic
 * Feature: Implemented gels
 * Feature: Implemented geqrf
 * Feature: Implemented gesv
 * Feature: Implemented gesvd
 * Feature: Implemented getrf
 * Feature: Implemented gglse

Premium
 * Feature: Implemented gebrd
 * Feature: Implemented getrs
 * Feature: Implemented trtrs
 * Feature: Implemented gelqf
 * Feature: Implemented gerqf
 * Feature: Implemented orgqr
 * Feature: Implemented orglq
 * Feature: Implemented orgbr
 * Feature: Implemented ormqr
 * Feature: Implemented ormlq
 * Feature: Implemented ormrq
 * Feature: Implemented bdsqr

--------------------------------------------------------------------------------
More Information
--------------------------------------------------------------------------------

For more information on the CULAtools family of products, please visit our
webpage at http://www.culatools.com

To provide feedback, please visit http://www.culatools.com/forums and post in
the appropriate forum topic.