Database of all RcppExports.cpp files, to support the RcppDeepState project.
Updated compileAttributes-parse.R to count the total usages of each type across all Rcpp functions in CRAN packages. Here are the top 100:
> arg.dt[, .(usages=.N), by=clean.type][order(-usages)][1:100]
     clean.type                               usages
     <char>                                    <int>
  1: int                                       10610
  2: double                                    10460
  3: Rcpp::NumericVector                        9374
  4: arma::mat                                  5317
  5: arma::vec                                  4620
  6: Rcpp::NumericMatrix                        4062
  7: bool                                       4058
  8: SEXP                                       3826
  9: Rcpp::List                                 2624
 10: Rcpp::IntegerVector                        2226
 11: std::string                                2109
 12: unsigned int                               1093
 13: Rcpp::CharacterVector                       971
 14: Rcpp::IntegerMatrix                         541
 15: std::vector<double>                         509
 16: Eigen::VectorXd                             491
 17: Rcpp::DataFrame                             439
 18: arma::colvec                                427
 19: arma::uvec                                  416
 20: Eigen::MatrixXd                             415
 21: long                                        400
 22: std::vector<std::string>                    394
 23: Rcpp::RObject                               282
 24: Rcpp::String                                280
 25: std::size_t                                 276
 26: std::vector<int>                            258
 27: arma::cube                                  257
 28: Rcpp::Function                              249
 29: arma::uword                                 247
 30: Rcpp::LogicalVector                         227
 31: Eigen::Map<Eigen::MatrixXd>                 209
 32: arma::ivec                                  202
 33: float                                       165
 34: Rcpp::S4                                    159
 35: arma::sp_mat                                151
 36: Eigen::VectorXi                             150
 37: Rcpp::Nullable<Rcpp::NumericVector>         148
 38: Rcpp::StringVector                          141
 39: unsigned                                    124
 40: Rcpp::Environment                           121
 41: arma::rowvec                                114
 42: XPtrImage                                    99
 43: Eigen::Map<Eigen::VectorXd>                  96
 44: Eigen::SparseMatrix<double>                  83
 45: arma::umat                                   77
 46: char                                         65
 47: Rcpp::DoubleVector                           63
 48: uint64                                       62
 49: arma::field<arma::vec>                       59
 50: char*                                        57
 51: uint                                         53
 52: Rcpp::LogicalMatrix                          51
 53: Eigen::Matrix<double, Eigen::Dynamic, 1>     50
 54: Rcpp::RawVector                              49
 55: std::vector<long>                            49
 56: char *                                       48
 57: arma::cx_mat                                 43
 58: arma::Mat<double>                            41
 59: Rcpp::XPtr<matrix4>                          39
 60: XPtrNode                                     38
 61: arma::field<arma::mat>                       37
 62: Eigen::ArrayXd                               36
 63: std::vector<QuantLib::Date>                  34
 64: arma::imat                                   34
 65: CV                                           33
 66: Rcpp::CharacterMatrix                        31
 67: PyObjectRef                                  31
 68: arma::ucube                                  27
 69: arma::Col<unsigned>                          26
 70: std::vector<arma::mat>                       25
 71: arma::cx_cube                                25
 72: arma::field<arma::Cube<unsigned char>>       24
 73: arma::Col<int>                               23
 74: uint32_t                                     23
 75: QuantLib::Date                               23
 76: Rcpp::GenericVector                          23
 77: Rcpp::RawMatrix                              22
 78: std::vector<unsigned int>                    21
 79: Rcpp::Nullable<std::string>                  21
 80: XPtrMat                                      21
 81: XPtrDoc                                      20
 82: Rcpp::Nullable<Rcpp::IntegerVector>          19
 83: Rcpp::Nullable<Rcpp::NumericMatrix>          18
 84: DbResult*                                    18
 85: unsigned long                                17
 86: std::vector<std::vector<std::string>>        17
 87: Eigen::Map<Eigen::ArrayXd>                   16
 88: Rcpp::XPtr<DbConnectionPtr>                  16
 89: Rcpp::DateVector                             16
 90: RcppGSL::Matrix                              16
 91: std::vector<std::size_t>                     16
 92: std::vector<Rcpp::Environment>               16
 93: ComplexVector                                15
 94: std::ostream*                                15
 95: std::vector<std::vector<int>>                14
 96: std::vector<bool>                            14
 97: Rcpp::Nullable<Rcpp::CharacterVector>        14
 98: std::vector<std::vector<double>>             13
 99: arma::cx_vec                                 13
100: Symbol                                       13
     clean.type                               usages
- checks-download.R downloads CRAN check pages from all packages under packages.
- checks-analyze.R parses “Additional issues” from the downloaded CRAN check pages. This will be useful when we compare RcppDeepState fuzz testing with standard CRAN tests, to answer the question, “how many more issues can we detect with fuzz testing that are not already revealed by existing testing approaches?”
- sckott/cchecksapi#57 does list additional issues but does NOT yet implement search functionality, so using it is basically the same as asking CRAN directly (we still need to download N check pages, where N is the number of packages).
> issue.dt[, .(pkgs=.N), by=type][order(pkgs)]
    type        pkgs
 1: clang-ASAN     1
 2: gcc-ASAN       1
 3: MKL            1
 4: ATLAS          1
 5: OpenBLAS       2
 6: clang11        2
 7: noLD           3
 8: LTO            5
 9: rchk           7
10: gcc10          9
11: gcc-UBSAN     12
12: valgrind      18
13: clang-UBSAN   26
> type.dt[, .(pkgs=.N), by=type][order(pkgs)]
   type        pkgs
1: clang-ASAN     1
2: gcc-ASAN       1
3: gcc-UBSAN     12
4: valgrind      18
5: clang-UBSAN   26
> unique(type.dt$pkg)
 [1] "AGread"              "bigmemory"           "BuyseTest"
 [4] "cld2"                "cld3"                "compboost"
 [7] "dggridR"             "DStree"              "fastAdaboost"
[10] "FLSSS"               "FRegSigCom"          "glamlasso"
[13] "glmmsr"              "GMKMcharlie"         "GreedySBTM"
[16] "iptools"             "isotree"             "kernelboot"
[19] "later"               "lda.svi"             "milr"
[22] "mined"               "mixggm"              "OneArmPhaseTwoStudy"
[25] "pdftools"            "PP"                  "PRIMME"
[28] "protolite"           "pts2polys"           "r2sundials"
[31] "RcppDE"              "Rdimtools"           "Rdtq"
[34] "RMKL"                "rTRNG"               "sboost"
[37] "Scalelink"           "scPDSI"              "scrypt"
[40] "TDA"                 "tesseract"           "TreeLS"
[43] "volesti"
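As a rough illustration of the kind of extraction checks-analyze.R performs, the sketch below pulls issue types out of a check page and keeps the memory-related ones. The HTML fragment is hypothetical (the real CRAN markup may differ), the names `page.html`, `issue.type` and `memory.type` are made up for this example, and the filter pattern is an assumption inferred from the five types that appear in type.dt above.

```r
# Hypothetical fragment of a check page; real CRAN markup may differ.
page.html <- paste0(
  '<h3>Additional issues</h3><p>',
  '<a href="https://example.org/clang-UBSAN/pkg">clang-UBSAN</a>',
  '<a href="https://example.org/valgrind/pkg">valgrind</a>',
  '<a href="https://example.org/LTO/pkg">LTO</a>',
  '</p>')

# Keep only the text that follows the "Additional issues" heading.
after.heading <- sub(".*Additional issues", "", page.html)

# Each issue type is the text of one link.
link.text <- regmatches(
  after.heading, gregexpr(">[^<>]+</a>", after.heading))[[1]]
issue.type <- gsub("^>|</a>$", "", link.text)

# Assumed filter: sanitizer and valgrind reports are the issues that
# fuzz testing would most plausibly also detect.
memory.type <- issue.type[grepl("ASAN|UBSAN|valgrind", issue.type)]
print(memory.type)
```

With this toy input, `issue.type` holds all three link labels and `memory.type` drops `LTO`.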
Simplified the pattern for matching a type in compileAttributes-parse.R; the parseRcppExports function is now self-contained.
- compileAttributes-untar.R untars the entire R package and then calls compileAttributes to generate a standard (easy to parse) RcppExports.cpp file.
- compileAttributes-parse.R does pretty much the same thing as packages-parse.R, but using the RcppExports.cpp file that we generated (instead of the file that was provided in the package source tar.gz file). The results below are similar to the previous results, but the numbers are a bit larger, which implies that we should run compileAttributes before parsing the RcppExports.cpp file.
> (some.types <- grep(
+   "SEXP|List", top10$clean.type, invert=TRUE, value=TRUE))
[1] "Rcpp::NumericVector"   "Rcpp::NumericMatrix"   "arma::mat"
[4] "std::string"           "Rcpp::CharacterVector" "int"
[7] "Rcpp::IntegerVector"   "double"
> some.covered <- arg.counts[some.types, on="clean.type"][, .(
+   top10args=.N
+ ), by=.(pkg.dir, funName, args)][args==top10args][order(-args)]
> some.covered[, .(
+   funs=.N,
+   pkgs=length(unique(pkg.dir))
+ )]
   funs pkgs
1: 5952 1007
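The untar-then-regenerate step can be sketched as follows. This is only an outline under stated assumptions: `pkg_name_from_tarball` and `regenerate_exports` are hypothetical helper names (not functions from this repo), the tarball is assumed to follow the CRAN naming scheme `<package>_<version>.tar.gz`, and the output directory name is taken from the `compileAttributes/...` paths shown in the results. `utils::untar` and `Rcpp::compileAttributes` are the real functions.

```r
# Hypothetical helper: derive the package name from a CRAN source tarball,
# which is named <package>_<version>.tar.gz.
pkg_name_from_tarball <- function(pkg.tar.gz) {
  sub("_.*$", "", basename(pkg.tar.gz))
}

# Hypothetical helper: extract the whole package, then regenerate
# src/RcppExports.cpp from its [[Rcpp::export]] attributes.
regenerate_exports <- function(pkg.tar.gz, out.dir = "compileAttributes") {
  utils::untar(pkg.tar.gz, exdir = out.dir)
  pkg.dir <- file.path(out.dir, pkg_name_from_tarball(pkg.tar.gz))
  Rcpp::compileAttributes(pkg.dir)
  file.path(pkg.dir, "src", "RcppExports.cpp")
}

# The version number here is made up for illustration.
pkg_name_from_tarball("packages/later_1.0.0.tar.gz")
```

Calling `regenerate_exports` on a real tarball requires Rcpp to be installed; the last line only exercises the name extraction.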
I also checked that every function and argument is parsed by our regex, so we can be sure that the regex is sufficient (no need to improve it any further).
> lines.dt[parsed<parameters]
Empty data.table (0 rows and 3 cols): pkg.dir,parameters,parsed
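To make the parsing and the parsed-versus-parameters check concrete, here is a minimal base R sketch. The excerpt follows the layout that Rcpp::compileAttributes generates, but the function `rcpp_sum` is hypothetical, `arg.pattern` is an illustrative pattern rather than the one used in compileAttributes-parse.R, and stripping `const` and `&` to obtain `clean.type` is an assumption about what the cleaning step does.

```r
# Hypothetical excerpt of a generated RcppExports.cpp file.
exports.lines <- c(
  "// rcpp_sum",
  "double rcpp_sum(const Rcpp::NumericVector& x, int n);",
  "RcppExport SEXP _pkg_rcpp_sum(SEXP xSEXP, SEXP nSEXP) {",
  "BEGIN_RCPP",
  "    Rcpp::RObject rcpp_result_gen;",
  "    Rcpp::traits::input_parameter< const Rcpp::NumericVector& >::type x(xSEXP);",
  "    Rcpp::traits::input_parameter< int >::type n(nSEXP);",
  "    rcpp_result_gen = Rcpp::wrap(rcpp_sum(x, n));",
  "    return rcpp_result_gen;",
  "END_RCPP",
  "}")

# Illustrative pattern: capture the type and the argument name.
arg.pattern <- "input_parameter< (.*) >::type ([A-Za-z0-9_]+)\\("

# Every line that declares a parameter should be parsed by the pattern.
param.lines <- grep("input_parameter", exports.lines, value = TRUE)
match.list <- regmatches(param.lines, regexec(arg.pattern, param.lines))
parsed.list <- Filter(function(m) length(m) == 3, match.list)

arg.df <- data.frame(
  type = vapply(parsed.list, function(m) m[2], character(1)),
  name = vapply(parsed.list, function(m) m[3], character(1)),
  stringsAsFactors = FALSE)

# Assumed cleaning step: drop const qualifiers and trailing references.
arg.df$clean.type <- gsub("^const +| *&$", "", arg.df$type)

# Same idea as lines.dt[parsed<parameters]: no parameter may be missed.
stopifnot(nrow(arg.df) == length(param.lines))
print(arg.df)
```

A file with no `input_parameter` lines gives zero parameters and zero parsed rows, which is the situation of the packages listed next.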
We are however unable to automatically fuzz the following 39 packages which use Rcpp, but do not use the export attribute, so there is no information about functions/args in the RcppExports.cpp file. Since this is a small minority of packages, it is acceptable to ignore these (we can require users of our software to use the Rcpp export attribute).
> lines.dt[parameters==0]
    pkg.dir                               parameters parsed
 1: compileAttributes/ANN2                         0      0
 2: compileAttributes/ConConPiWiFun                0      0
 3: compileAttributes/CoxPlus                      0      0
 4: compileAttributes/DPP                          0      0
 5: compileAttributes/DiffusionRgqd                0      0
 6: compileAttributes/DiffusionRimp                0      0
 7: compileAttributes/DiffusionRjgqd               0      0
 8: compileAttributes/FiRE                         0      0
 9: compileAttributes/FisPro                       0      0
10: compileAttributes/GiRaF                        0      0
11: compileAttributes/MADPop                       0      0
12: compileAttributes/NPBayesImputeCat             0      0
13: compileAttributes/NlinTS                       0      0
14: compileAttributes/OncoBayes2                   0      0
15: compileAttributes/OneArmPhaseTwoStudy          0      0
16: compileAttributes/RBesT                        0      0
17: compileAttributes/RcppBDT                      0      0
18: compileAttributes/RcppCNPy                     0      0
19: compileAttributes/RcppDL                       0      0
20: compileAttributes/RcppHNSW                     0      0
21: compileAttributes/RcppXsimd                    0      0
22: compileAttributes/RcppXts                      0      0
23: compileAttributes/YPPE                         0      0
24: compileAttributes/bmlm                         0      0
25: compileAttributes/cblasr                       0      0
26: compileAttributes/cbq                          0      0
27: compileAttributes/cccp                         0      0
28: compileAttributes/compboost                    0      0
29: compileAttributes/dggridR                      0      0
30: compileAttributes/hsstan                       0      0
31: compileAttributes/incgraph                     0      0
32: compileAttributes/lm.br                        0      0
33: compileAttributes/lolog                        0      0
34: compileAttributes/multinet                     0      0
35: compileAttributes/qmix                         0      0
36: compileAttributes/randomUniformForest          0      0
37: compileAttributes/rrcovHD                      0      0
38: compileAttributes/s2net                        0      0
39: compileAttributes/wingui                       0      0
    pkg.dir                               parameters parsed
- packages-download.R downloads all CRAN packages which list Rcpp under LinkingTo.
- packages-untar.R extracts just the RcppExports.cpp file from each package tar.gz file. (These files are copied to the packages directory in this GitHub repo.)
- input_parameter_parse.R was for experimenting with regex subroutines, but it only parses argument types (not functions) so it should no longer be used.
- packages-parse.R analyzes which types are used most frequently in R packages that use Rcpp:
The top 10 types are:
> (top10 <- arg.counts[args==1, .(
+   funs=.N,
+   pkgs=length(unique(pkg.dir))
+ ), by=clean.type][order(-funs)][1:10])
    clean.type            funs pkgs
 1: SEXP                   380   72
 2: Rcpp::NumericVector    330  154
 3: Rcpp::NumericMatrix    236  128
 4: arma::mat              208  102
 5: Rcpp::List             172   71
 6: std::string            159   76
 7: Rcpp::CharacterVector  112   51
 8: int                    108   60
 9: Rcpp::IntegerVector     88   37
10: double                  79   44
If we implement RcppDeepState_* random generation functions for each of these ten types, then we will be able to automatically test this many functions/packages:
> covered[, .(
+   funs=.N,
+   pkgs=length(unique(pkg.dir))
+ )]
   funs pkgs
1: 7702 1132
If we only implement these 8 (easy) types, then we can test this many functions/packages:
> (some.types <- grep("SEXP|List", top10$clean.type, invert=TRUE, value=TRUE))
[1] "Rcpp::NumericVector"   "Rcpp::NumericMatrix"   "arma::mat"
[4] "std::string"           "Rcpp::CharacterVector" "int"
[7] "Rcpp::IntegerVector"   "double"
> some.covered <- arg.counts[some.types, on="clean.type"][, .(
+   top10args=.N
+ ), by=.(pkg.dir, funName, args)][args==top10args][order(-args)]
> some.covered[, .(
+   funs=.N,
+   pkgs=length(unique(pkg.dir))
+ )]
   funs pkgs
1: 5838  995
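The coverage rule used in these counts is that a function can be fuzzed automatically only when every one of its arguments has a supported type. The toy example below illustrates that rule with data.table; the packages, functions and the `supported` set are hypothetical, and the variable names only loosely mirror those in the real scripts.

```r
library(data.table)

# Hypothetical toy data: one row per (function, argument).
arg.dt <- data.table(
  pkg.dir = c("pkgA", "pkgA", "pkgA", "pkgB", "pkgB"),
  funName = c("f", "f", "g", "h", "h"),
  clean.type = c("int", "double", "SEXP", "int", "Rcpp::List"))

# args = total number of arguments of each function.
arg.dt[, args := .N, by = .(pkg.dir, funName)]

# Hypothetical set of types with a random generation function.
supported <- c("int", "double")

# Count the supported arguments of each function.
supported.dt <- arg.dt[clean.type %in% supported, .(
  supported.args = .N
), by = .(pkg.dir, funName, args)]

# A function is covered when all of its arguments are supported.
covered <- supported.dt[args == supported.args]

print(covered[, .(funs = .N, pkgs = length(unique(pkg.dir)))])
```

Here only `f` in `pkgA` is covered: `g` takes a `SEXP`, and `h` has one supported argument out of two, so the `args == supported.args` filter removes it.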