- add all cuBLAS routines
- add all cuSOLVER routines
- add sparse matrix and sparse vector types
- add useful NumPy functions, such as arange and linspace
- add matrix views
- improve the logic for choosing the number of threads per block and the number of blocks; see e.g. https://stackoverflow.com/questions/9985912/how-do-i-choose-grid-and-block-dimensions-for-cuda-kernels
- allow mixing matrices and vectors in BLAS routines
- test complex-number support
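A possible starting point for the launch-configuration item is the common 1-D heuristic: keep the block size a multiple of the warp size (32), don't launch many more threads per block than there are elements, and use ceiling division for the block count so every element is covered. The sketch below assumes a 1-D kernel that bounds-checks its index; `launch_config`, its default of 256 threads per block, and the 1024 cap are hypothetical choices, not the project's current logic. For occupancy-aware sizing, the CUDA runtime also provides `cudaOccupancyMaxPotentialBlockSize`.

```python
WARP_SIZE = 32  # threads per warp on current NVIDIA GPUs


def launch_config(n, threads_per_block=256, max_threads_per_block=1024):
    """Return (blocks, threads) for a 1-D kernel over n elements.

    Hypothetical helper illustrating a common heuristic; the kernel is
    assumed to skip indices >= n.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Clamp to the device limit and round down to a whole number of warps.
    threads = min(threads_per_block, max_threads_per_block)
    threads = max(WARP_SIZE, (threads // WARP_SIZE) * WARP_SIZE)
    # For small inputs, use only as many warps as needed to cover n.
    warps_needed = (n + WARP_SIZE - 1) // WARP_SIZE
    threads = min(threads, warps_needed * WARP_SIZE)
    # Ceiling division so every element gets a thread.
    blocks = (n + threads - 1) // threads
    return blocks, threads


if __name__ == "__main__":
    for n in (10, 1000, 1_000_000):
        print(n, launch_config(n))
```

The fixed default block size keeps the logic simple; an occupancy-based choice per kernel would be the natural refinement.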