- Sep 2024
-
ainowinstitute.org
-
But this software dominance is also slowly being challenged. OpenAI developed Triton, an open-source software solution that it claims is more efficient than CUDA. Triton can only be used on Nvidia’s GPUs as of now.97 Meta developed PyTorch and then spun off the project as an open-source initiative housed under the Linux Foundation (still financially supported by Meta), and its new version performs relatively well on Nvidia’s A100.98 The benefit of PyTorch is that it can be used across a range of hardware, but on the flip side, it is not optimized for any particular chip.
Ah… so THAT’s what purpose PyTorch serves. PyTorch is to CUDA what OCP is to proprietary hyperscale server design.
-
Moreover, its proprietary CUDA compiling software is the most well known to AI developers, which further encourages the use of Nvidia hardware as other chips require either more extensive programming or more specialized knowledge.
It’s good that this is so explicitly called out as a bottleneck
-
- Apr 2021
-
statisticsplaybook.github.io
-
3.1 Checking whether the GPU is available
There are cases where this returns FALSE. When that happens the GPU tensor does not work, and as far as I know the problem comes from CUDA version compatibility with the GPU. Since many places use versions 10.1 or 10.2, I installed one of those versions as well, but even after reinstalling CUDA and loading the library, FALSE still comes up. Try entering the code below. The first line must, of course, point to the location where your own CUDA version is installed. The second line may throw an error, but if you reinstall the package and load it again you will see that it works normally.
Sys.setenv("CUDA_HOME" = "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.2")
source("https://raw.githubusercontent.com/mlverse/torch/master/R/install.R")
install.packages("torch")
-
- Apr 2020
-
gist.github.com
-
NVIDIA's CUDA libraries
CUDA has moved to homebrew-drivers [1], and its name has also changed to nvidia-cuda. To install:
brew tap homebrew/cask-drivers
brew cask install nvidia-cuda
https://i.imgur.com/rmnoe6d.png
[1] https://github.com/Homebrew/homebrew-cask/issues/38325#issuecomment-327605803
-
- Dec 2017
-
dicl.unist.ac.kr
-
Warp divergence
When threads inside a warp branch to different execution paths. Instead of all 32 threads in the warp executing the same instruction, on average only half of the threads are active for each instruction when warp divergence occurs, because the divergent paths are executed one after the other. This causes roughly a 50% performance loss.
-
make as many consecutive threads as possible do the same thing
an important take-home message for dealing with branch divergence.
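A minimal sketch of that take-home message (kernel names and the launch configuration are illustrative, not from the course notes): the first kernel branches on the parity of the thread index, splitting every warp, while the second branches at warp granularity so each 32-thread warp follows a single path.

```cuda
#include <cuda_runtime.h>

// Divergent: even and odd lanes of the SAME warp take different branches,
// so the two paths are serialized and about half the lanes sit idle.
__global__ void divergent(float *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i % 2 == 0)
        out[i] = 2.0f * i;   // even lanes
    else
        out[i] = 0.5f * i;   // odd lanes
}

// Warp-aligned: the condition (i / 32) is constant across each 32-thread
// warp, so every warp follows exactly one path and nothing is serialized.
__global__ void warp_aligned(float *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ((i / 32) % 2 == 0)
        out[i] = 2.0f * i;   // even-numbered warps
    else
        out[i] = 0.5f * i;   // odd-numbered warps
}

int main() {
    const int n = 1 << 20;
    float *d_out;
    cudaMalloc((void **)&d_out, n * sizeof(float));
    divergent<<<n / 256, 256>>>(d_out);
    warp_aligned<<<n / 256, 256>>>(d_out);
    cudaDeviceSynchronize();
    cudaFree(d_out);
    return 0;
}
```

Both kernels compute the same result; the second simply arranges the branch so that consecutive threads within a warp do the same thing.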
-
Warps are run concurrently in an SM
This statement conflicts with the statement that only one warp is executed at a time per SM.
-
Each SM has multiple processors but only one instruction unit
Q: There is only one instruction unit in an SM, and an SM has many warps. Does this imply that all the warps within the same SM execute the same set of instructions?
A: No. Each SM has a (cost-free) warp scheduler that prioritizes ready warps along the time dimension. Take a look at the figure on page 6 of http://www.math.ncku.edu.tw/~mhchen/HPC/CUDA/GPGPU_Lecture5.pdf
-
- Nov 2017
-
www.cs.cmu.edu
-
cudaMalloc
cudaMalloc API: the first (returning) argument is the address of the pointer that will receive the allocated device memory; the function's return value is only an error code.
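A short sketch of that calling convention (buffer size and variable names are made up for illustration):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t n = 256;
    float *d_buf = nullptr;   // device pointer, to be filled in by cudaMalloc

    // Pass the ADDRESS of the pointer: cudaMalloc writes the device
    // address into d_buf and returns a cudaError_t status code.
    cudaError_t err = cudaMalloc((void **)&d_buf, n * sizeof(float));
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaFree(d_buf);
    return 0;
}
```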
-