You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Submitted by: Matt Dowle; Assigned to: Nobody; R-Forge link
The marker "TO DO" is consistently used in the .R and .c comments.
As of this writing (April 15, 2019), here's the full list from grep:
src/rbindlist.c: // TO DO: options(datatable.pedantic=TRUE) to issue this warning :
src/fastmean.c: if(ISNAN(REAL(x)[i])) continue; // TO DO: could drop this line and let NA propogate?
src/uniqlist.c: // TO DO: store previ twiddle call, but it'll need to be vector since this is in a loop through columns. Hopefully the first == will short circuit most often
src/uniqlist.c: // TO DO: store previ twiddle call, but it'll need to be vector since this is in a loop through columns. Hopefully the first == will short circuit most often
src/forder.c: // TO DO: at this point if the last count==n then it's all the same number and we can stop now.
src/forder.c:static void *xtmp=NULL; int xtmp_alloc=0; // TO DO: save xtmp if possible, see allocs in forder
src/forder.c:static void alloc_xtmp(int n) { // TO DO: currently always the largest type (double) but
src/forder.c: // TO DO: could include extra bits to divide the first radix up more. Often the MSD has groups in just 0-4 out of 256.
src/forder.c: alloc_otmp(maxgrpn); // TO DO: can we leave this to forder and remove these calls??
src/forder.c: alloc_xtmp(maxgrpn); // TO DO: doesn't need to be sizeof(double) always, see inside
src/forder.c: /* TO DO: If nextradix==-1 AND no further columns from forder AND !retGrp, we're
src/forder.c: // see comments in iradix for structure. This follows the same. TO DO: merge iradix in here (almost ready)
src/forder.c: if (radix_xsuballoc < maxgrpn) { // TO DO: centralize this alloc
src/forder.c: // TO DO: could include extra bits to divide the first radix up more. Often the MSD has groups in just 0-4 out of 256.
src/forder.c: alloc_otmp(maxgrpn); // TO DO: leave to forder and remove these calls?
src/forder.c: // TO DO: If nextradix==-1 and no further columns from forder, we're done. We have o. Remember to memset thiscounts before returning.
src/forder.c:// TO DO?: dcount. Find step size, then range = (max-min)/step and proceed as icount. Many fixed precision floats (such as prices)
src/forder.c: // OLD COMMENT: can return 0 here for the same string in known and unknown encodings, good if the unknown string is in that encoding but not if not ordering is ascii only (C locale). TO DO: revisit and allow user to change to strcoll, and take account of Encoding. see comments in bmerge(). 10k calls of strcmp = 0.37s, 10k calls of strcoll = 4.7s. See ?Comparison, ?Encoding, Scollate in R internals.
src/forder.c:// TO DO: check that all unknown encodings are ascii; i.e. no non-ascii unknowns are present, and that either Latin1
src/forder.c:// Fortunately, UTF sorts in the same order if treated as ASCII, so we can simplify by doing it by bytes. TO DO: confirm
src/forder.c: // TO DO?: chmatch to existing sorted vector, then grow it.
src/forder.c: // TO DO?: if (n<N_SMALL=200) insert sort, then loop through groups via ==
src/forder.c: // TO DO: if (n<50) cinsert (continuing from radix offset into CHAR) or using StrCmp. But 256 is narrow, so quick and not too much an issue.
src/forder.c: // TO DO: the same string in different encodings will be considered different here. Sweep through ustr and merge counts where equal (sort needed therefore, unfortunately?, only if there are any marked encodings present)
src/forder.c: if (n < N_SMALL && nalast != 0) { // TO DO: calibrate() N_SMALL=200
src/forder.c: if (range <= N_RANGE) // && range<n) // TO DO: calibrate(). radix was faster (9.2s "range<=10000" instead of 11.6s
src/forder.c: // sort ustr. TO DO: just sort new ones and merge them in.
src/forder.c: cradix_xtmp = (SEXP *)realloc( cradix_xtmp, ustr_n * sizeof(SEXP) ); // TO DO: Reuse the one we have in forder. Does it need to be n length?
src/forder.c:// If a vector is in decreasing order with ties, then an in-place reverse (no sort) would result in instability of ties (TO DO).
src/forder.c:// TO DO: test in big steps first to return faster if unsortedness is at the end (a common case of rbind'ing data to end)
src/forder.c: else return(0); // TO DO: improve to be stable for ties in reverse
src/forder.c: this = twiddle(x,i,order); // TO DO: once we get past -Inf, NA and NaN at the bottom, and +Inf at the top,
src/forder.c: // TO DO: add calibrate() to init.c
src/forder.c: int *o = INTEGER(ans); // TO DO: save allocation if NULL is returned (isSorted==TRUE)
src/forder.c: for (i=0; i<n; i++) o[i] = i+1; // TO DO: we don't need this if returning NULL? Save it? Unlikely huge gain, though.
src/forder.c: // ** TO DO *_: if isSorted, we can just point xsub into x directly. If (_f)() returns 0, though, will have to copy x at that point
src/forder.c: free(cradix_xtmp); cradix_xtmp=NULL; cradix_xtmp_alloc=0; // TO DO: use xtmp already got
src/wrappers.c: // TO DO: revisit. Enough to reproduce is: DT=data.table(a=1:3); DT[2]; DT[,b:=2]
src/dogroups.c: bynames = PROTECT(allocVector(STRSXP, ngrpcols)); protecti++; // TO DO: do we really need bynames, can we assign names afterwards in one step?
src/dogroups.c: // TO DO: check this check above.
src/dogroups.c: grpn = 1; // TO DO: it is anyway?
src/dogroups.c: // TO DO: this might over allocate if first group has 1 row and j is actually a single row aggregate
src/dogroups.c: // Shouldn't need SET_* to age objects here sice groups, TO DO revisit.
src/dogroups.c: // TO DO: could avoid this last 'if' by adding a dummy PROTECT after first alloc for this UNPROTECT(1) to do.
src/dogroups.c: // ... TO DO: set truelength to LENGTH(VECTOR_ELT(ans,0)), length to ansloc and enhance finalizer to handle over-allocated rows.
src/dogroups.c: PROTECT(newx = allocVector(TYPEOF(x), newlen)); // TO DO: R_realloc(?) here?
src/dogroups.c: // TO DO. Using SET_ to ensure objects are aged, rather than memcpy. Perhaps theres a bulk/fast way to age CHECK_OLD_TO_NEW
src/dogroups.c: // TO DO: Again, is there bulk op to avoid this loop, which still respects older generations
src/assign.c: PROTECT( // needed when --enable-strict-barrier it seems, iiuc. TO DO: test under that flag and remove if not needed.
src/assign.c: // TO DO: keepattr() would be faster, but can't because shallow isn't merely a shallow copy. It
src/assign.c: // TO DO: test realloc names if selfrefnamesok (users can setattr(x,"name") themselves for example.
src/assign.c: oldtncol = TRUELENGTH(dt); // TO DO: oldtncol can be just called tl now, as we won't realloc here any more.
src/assign.c: savetl_init(); // ** TO DO **: remove allocs that could fail between here and _end, or different way
src/assign.c: if (verbose) Rprintf("Coerced factor to character to match the column's type (coercion is inefficient)\n"); // TO DO: datatable.pedantic would turn this into warning
src/assign.c: // TO DO: if := in future could ever change a list item's contents by reference, would need to duplicate at that point
src/assign.c: } else if (slen<10) { // 10 is just a guess for when memcpy is faster. Certainly memcpy is slower when slen==1, but that's the most common case by far so not high priority to discover the optimum here. TO DO: revisit
src/fcast.c:// TO DO: if no 0 or NA detected up front in subsetDT() below, could switch to a faster subsetVectorRawNo0orNA()
src/fcast.c:// TO DO: margins
src/bmerge.c: // retLength[j] = 0; // TO DO: do this to save the branch below and later branches at R level to set .N to 0
src/bmerge.c: } else if (xval.i>ival.i) { // TO DO: is *(&xlow, &xupp)[0|1]=mid more efficient than branch?
src/bmerge.c: // TO DO?: not if mult=first|last and col<ncol-1
src/bmerge.c: while(tmplow<iupp-1) { // TO DO: could double up from lir rather than halving from iupp
src/bmerge.c: if (tmp == 0) { // TO DO: deal with mixed encodings and locale optionally
src/fread.c:/***** TO DO *****
src/fread.c:A few TO DO inline in the code, including some speed fine tuning e.g. specialize Ispace and any other lib calls.
src/fread.c:static char sep, eol, eol2; // sep2 TO DO
src/fread.c: lch++; // TO DO can remove lch<eof when last row is specialized in case of no final eol
src/fread.c: if (sizes[TypeSxp[oldtype]] == sizes[TypeSxp[newtype]] && newtype != SXP_STR) { // after && is quick fix. TO DO: revisit
R/fcast.R: idx = do.call("CJ", mapunique)[map, I := .I][["I"]] # TO DO: move this to C and avoid materialising the Cross Join.
R/bmerge.R: # TO DO: rename leftcols to icols, rightcols to xcols
R/bmerge.R: # TO DO: io and xo could be moved inside Cbmerge
R/bmerge.R: # TO DO: enforce via .internal.shallow attribute and expose shallow() to users
R/bmerge.R: lc = leftcols[a] # i # TO DO: rename left and right to i and x
R/bmerge.R: # TO DO: we need a way to avoid copying 'value' for internal purposes
R/bmerge.R: # TO DO: add warning if reallyreal about loss of precision
R/bmerge.R: # TO DO: could be allocated inside Cbmerge and returned as list from that
R/tables.R: if (mb) set(info,i,"MB",ceiling(as.numeric(object.size(DT))/1024^2)) # mb is an option because object.size() appears to be slow. TO DO: revisit
R/data.table.R: # TO DO: rewrite data.table(), one of the oldest functions here. Many people use data.table() to convert data.frame rather than
R/data.table.R: # TO DO Something strange with NAMED on components of .... To investigate. Or just port data.table() to C. This is why
R/data.table.R: xi = as.data.table(xi, keep.rownames=keep.rownames) # TO DO: allow a matrix to be a column of a data.table. This could allow a key'd lookup to a matrix, not just by a single rowname vector, but by a combination of several columns. A matrix column could be stored either by row or by column contiguous in memory.
R/data.table.R: # TO DO ... recycle in C, but not high priority as large data already regular from database or file
R/data.table.R: # TO DO: surely use set() here, or avoid the coercion
R/data.table.R: # test explicitly if the caller is [.data.table (even stronger test. TO DO.)
R/data.table.R: # TO DO (document/faq/example). Removed for now ... if ((roll || rolltolast) && missing(mult)) mult="last" # for when there is exact match to mult. This does not control cases where the roll is mult, that is always the last one.
R/data.table.R: # TO DO: change isub at C level s/.N/nrow(x); changing a symbol to a constant should be ok
R/data.table.R: if (is.call(isub) && isub[[1L]]=="eval") { # TO DO: or ..()
R/data.table.R: # TO DO: print method could print physical and secondary keys at end.
R/data.table.R: # TO DO: move down to if (is.data.table) clause below, later ...
R/data.table.R: else if (identical(class(i),"data.frame")) i = as.data.table(i) # TO DO: avoid these as.data.table() and use a flag instead
R/data.table.R: # To revert to <=v1.9.2 behaviour. TO DO: remove option after Sep 2015
R/data.table.R: if (length(xo) && length(irows)) irows = xo[irows] # TO DO: fsort here?
R/data.table.R: # TO DO: TODO: Incorporate which_ here on DT[!i] where i is logical. Should avoid i = !i (above) - inefficient.
R/data.table.R: # minor TO DO: can we merge this with check_idx in fcast.c/subset ?
R/data.table.R: # TO DO: uncomment these warnings in next release. Later, make both errors.
R/data.table.R: if (is.name(bysub) && !(as.character(bysub) %chin% names(x))) { # TO DO: names(x),names(i),and i. and x. prefixes
R/data.table.R: if (length(bysubl) && identical(bysubl[[1L]],quote(eval))) { # TO DO: or by=..()
R/data.table.R: # *** TO DO ***: try() this eval first (as long as not list() or .()) and see if it evaluates to column names
R/data.table.R: # TO DO: Add a test like X[i,sum(v),by=i.x2], or where by includes a join column (both where some i don't match).
R/data.table.R: # TO DO: Make xss directly, rather than recursive call.
R/data.table.R: if (!is.na(nomatch)) irows = irows[irows!=0L] # TO DO: can be removed now we have CisSortedSubset
R/data.table.R: if (length(allbyvars)) { ############### TO DO TO DO TO DO ###############
R/data.table.R: # TO DO: pass xss (x subset) through into dogroups. Still need irows there (for :=), but more condense
R/data.table.R: jsubl = as.list.default(jsub) # TO DO: names(jsub) and names(jsub)="" seem to work so make use of that
R/data.table.R: # TO DO: if call to a[1] for example, then call it 'a' too
R/data.table.R: ansvars = dupdiff(names(x),union(bynames,allbyvars)) # TO DO: allbyvars here for vars used by 'by'. Document.
R/data.table.R: if (!with) irows <- irows[!is.na(irows)] # fixes 2445. TO DO: added a message if(verbose) or warning?
R/data.table.R: # Adding new column(s). TO DO: move after the first eval in case the jsub has an error.
R/data.table.R: if (is.name(name) && ok && verbose) { # && NAMED(x)>0 (TO DO) # ok here includes -1 (loaded from disk)
R/data.table.R: # TO DO: Add option 'datatable.pedantic' to turn on warnings like this.
R/data.table.R: # TO DO ... comments moved up from C ...
R/data.table.R: } # TO DO: else if env$<- or list$<-
R/data.table.R: # hash=TRUE (the default) does seem better as expected using e.g. test 645. TO DO experiment with 'size' argument
R/data.table.R: # TO DO: port more of this to C
R/data.table.R: # the address==address is a temp fix for R >= 3.1.0. TO DO: allow shallow copy here, then copy only when user uses :=
R/data.table.R: if (!is.null(lhs)) { # *** TO DO ***: use set() here now that it can add new column(s) and remove newnames and alloc logic above
R/data.table.R: jval = data.table(jval) # TO DO: should this be setDT(list(jval)) instead?
R/data.table.R: # fix for bug fix typo in my name #5114 from GSee's - .data.table.locked=TRUE. # TO DO: more efficient way e.g. address==address (identical will do that but then proceed to deep compare if !=, wheras we want just to stop?)
R/data.table.R: if (haskey(x) && all(key(x) %chin% names(jval)) && suppressWarnings(is.sorted(jval, by=key(x)))) # TO DO: perhaps this usage of is.sorted should be allowed internally then (tidy up and make efficient)
R/data.table.R: SDenv$.GRP = as.integer(1) # oddly using 1L doesn't work reliably here! Possible R bug? TO DO: create reproducible example and report. To reproduce change to 1L and run test.data.table, test 780 fails. The assign seems ineffective and a previous value for .GRP from a previous test is retained, despite just creating a new SDenv.
R/data.table.R: ## 'av' correct here ?? *** TO DO ***
R/data.table.R: if (!bysameorder) { # TO DO: lower this into forder.c
R/data.table.R: if (!orderedirows && !length(o__)) o__ = 1:xnrow # temp fix. TO DO: revist orderedirows
R/data.table.R: # TO DO: combine uniqlist and uniquelengths into one call. Or, just set len__ to NULL when dogroups infers that.
R/data.table.R: # TO DO: allow secondary keys to be stored, then we see if our by matches one, if so use it, and no need to sort again. TO DO: document multiple keys.
R/data.table.R: # TODO, TO DO: raise the checks for 'jvnames' earlier (where jvnames is set by checking 'jsub') and set 'jvnames' already.
R/data.table.R: # TODO, TO DO: revisit complex cases (as illustrated below)
R/data.table.R: # TO DO, TODO: maybe a message/warning here so that we can catch the overlooked cases, if any?
R/data.table.R:# TO DO, add more warnings e.g. for by.data.table(), telling user what the data.table syntax is but letting them dispatch to data.frame if they want
R/data.table.R: # TO DO: inside Ccopy it could reset tl to 0 or length, but no matter as selfrefok detects it
R/data.table.R: # TO DO: revisit duplicate.c in R 3.0.3 and see where it's at
R/data.table.R: .Call(Cassign,x,i,j,NULL,value,FALSE) # verbose=FALSE for speed to avoid getOption() TO DO: somehow read getOption("datatable.verbose") from C level
R/data.table.R: # TO DO if table has 'ul' then match to that
R/data.table.R: # TO DO: deprecate and remove this. It's exported but doubt anyone uses it. Think the plan was to use it internally, but forderv superceded.
R/setkey.R: setattr(x,"index",NULL) # TO DO: reorder existing indexes likely faster than rebuilding again. Allow optionally. Simpler for now to clear.
R/setkey.R:# Maybe just a grep through *.R for use of these function internally would be better (TO DO).
R/setkey.R: # TO DO: export and document forder
R/setkey.R: return( if (length(o)) x[o] else x ) # TO DO: document the nice efficiency here
Submitted by: Matt Dowle; Assigned to: Nobody; R-Forge link
The marker "TO DO" is consistently used in the .R and .c comments.
As of this writing (April 15, 2019), here's the full list from
grep
:...
. To investigate. Or just port data.table() to C. This is whyThe text was updated successfully, but these errors were encountered: