Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R-Forge #205] Move TO DO comments in the code into tracker (use grep) #678

Closed
arunsrinivasan opened this issue Jun 8, 2014 · 1 comment
Labels

Comments

@arunsrinivasan
Copy link
Member

arunsrinivasan commented Jun 8, 2014

Submitted by: Matt Dowle; Assigned to: Nobody; R-Forge link

The marker "TO DO" is consistently used in the .R and .c comments.


As of this writing (April 15, 2019), here's the full list from grep:

  • src/rbindlist.c: // TO DO: options(datatable.pedantic=TRUE) to issue this warning :
  • src/fastmean.c: if(ISNAN(REAL(x)[i])) continue; // TO DO: could drop this line and let NA propogate?
  • src/uniqlist.c: // TO DO: store previ twiddle call, but it'll need to be vector since this is in a loop through columns. Hopefully the first == will short circuit most often
  • src/uniqlist.c: // TO DO: store previ twiddle call, but it'll need to be vector since this is in a loop through columns. Hopefully the first == will short circuit most often
  • src/forder.c: // TO DO: at this point if the last count==n then it's all the same number and we can stop now.
  • src/forder.c:static void *xtmp=NULL; int xtmp_alloc=0; // TO DO: save xtmp if possible, see allocs in forder
  • src/forder.c:static void alloc_xtmp(int n) { // TO DO: currently always the largest type (double) but
  • src/forder.c: // TO DO: could include extra bits to divide the first radix up more. Often the MSD has groups in just 0-4 out of 256.
  • src/forder.c: alloc_otmp(maxgrpn); // TO DO: can we leave this to forder and remove these calls??
  • src/forder.c: alloc_xtmp(maxgrpn); // TO DO: doesn't need to be sizeof(double) always, see inside
  • src/forder.c: /* TO DO: If nextradix==-1 AND no further columns from forder AND !retGrp, we're
  • src/forder.c: // see comments in iradix for structure. This follows the same. TO DO: merge iradix in here (almost ready)
  • src/forder.c: if (radix_xsuballoc < maxgrpn) { // TO DO: centralize this alloc
  • src/forder.c: // TO DO: could include extra bits to divide the first radix up more. Often the MSD has groups in just 0-4 out of 256.
  • src/forder.c: alloc_otmp(maxgrpn); // TO DO: leave to forder and remove these calls?
  • src/forder.c: // TO DO: If nextradix==-1 and no further columns from forder, we're done. We have o. Remember to memset thiscounts before returning.
  • src/forder.c:// TO DO?: dcount. Find step size, then range = (max-min)/step and proceed as icount. Many fixed precision floats (such as prices)
  • src/forder.c: // OLD COMMENT: can return 0 here for the same string in known and unknown encodings, good if the unknown string is in that encoding but not if not ordering is ascii only (C locale). TO DO: revisit and allow user to change to strcoll, and take account of Encoding. see comments in bmerge(). 10k calls of strcmp = 0.37s, 10k calls of strcoll = 4.7s. See ?Comparison, ?Encoding, Scollate in R internals.
  • src/forder.c:// TO DO: check that all unknown encodings are ascii; i.e. no non-ascii unknowns are present, and that either Latin1
  • src/forder.c:// Fortunately, UTF sorts in the same order if treated as ASCII, so we can simplify by doing it by bytes. TO DO: confirm
  • src/forder.c: // TO DO?: chmatch to existing sorted vector, then grow it.
  • src/forder.c: // TO DO?: if (n<N_SMALL=200) insert sort, then loop through groups via ==
  • src/forder.c: // TO DO: if (n<50) cinsert (continuing from radix offset into CHAR) or using StrCmp. But 256 is narrow, so quick and not too much an issue.
  • src/forder.c: // TO DO: the same string in different encodings will be considered different here. Sweep through ustr and merge counts where equal (sort needed therefore, unfortunately?, only if there are any marked encodings present)
  • src/forder.c: if (n < N_SMALL && nalast != 0) { // TO DO: calibrate() N_SMALL=200
  • src/forder.c: if (range <= N_RANGE) // && range<n) // TO DO: calibrate(). radix was faster (9.2s "range<=10000" instead of 11.6s
  • src/forder.c: // sort ustr. TO DO: just sort new ones and merge them in.
  • src/forder.c: cradix_xtmp = (SEXP *)realloc( cradix_xtmp, ustr_n * sizeof(SEXP) ); // TO DO: Reuse the one we have in forder. Does it need to be n length?
  • src/forder.c:// If a vector is in decreasing order with ties, then an in-place reverse (no sort) would result in instability of ties (TO DO).
  • src/forder.c:// TO DO: test in big steps first to return faster if unsortedness is at the end (a common case of rbind'ing data to end)
  • src/forder.c: else return(0); // TO DO: improve to be stable for ties in reverse
  • src/forder.c: this = twiddle(x,i,order); // TO DO: once we get past -Inf, NA and NaN at the bottom, and +Inf at the top,
  • src/forder.c: // TO DO: add calibrate() to init.c
  • src/forder.c: int *o = INTEGER(ans); // TO DO: save allocation if NULL is returned (isSorted==TRUE)
  • src/forder.c: for (i=0; i<n; i++) o[i] = i+1; // TO DO: we don't need this if returning NULL? Save it? Unlikely huge gain, though.
  • src/forder.c: // ** TO DO *_: if isSorted, we can just point xsub into x directly. If (_f)() returns 0, though, will have to copy x at that point
  • src/forder.c: free(cradix_xtmp); cradix_xtmp=NULL; cradix_xtmp_alloc=0; // TO DO: use xtmp already got
  • src/wrappers.c: // TO DO: revisit. Enough to reproduce is: DT=data.table(a=1:3); DT[2]; DT[,b:=2]
  • src/dogroups.c: bynames = PROTECT(allocVector(STRSXP, ngrpcols)); protecti++; // TO DO: do we really need bynames, can we assign names afterwards in one step?
  • src/dogroups.c: // TO DO: check this check above.
  • src/dogroups.c: grpn = 1; // TO DO: it is anyway?
  • src/dogroups.c: // TO DO: this might over allocate if first group has 1 row and j is actually a single row aggregate
  • src/dogroups.c: // Shouldn't need SET_* to age objects here sice groups, TO DO revisit.
  • src/dogroups.c: // TO DO: could avoid this last 'if' by adding a dummy PROTECT after first alloc for this UNPROTECT(1) to do.
  • src/dogroups.c: // ... TO DO: set truelength to LENGTH(VECTOR_ELT(ans,0)), length to ansloc and enhance finalizer to handle over-allocated rows.
  • src/dogroups.c: PROTECT(newx = allocVector(TYPEOF(x), newlen)); // TO DO: R_realloc(?) here?
  • src/dogroups.c: // TO DO. Using SET_ to ensure objects are aged, rather than memcpy. Perhaps theres a bulk/fast way to age CHECK_OLD_TO_NEW
  • src/dogroups.c: // TO DO: Again, is there bulk op to avoid this loop, which still respects older generations
  • src/assign.c: PROTECT( // needed when --enable-strict-barrier it seems, iiuc. TO DO: test under that flag and remove if not needed.
  • src/assign.c: // TO DO: keepattr() would be faster, but can't because shallow isn't merely a shallow copy. It
  • src/assign.c: // TO DO: test realloc names if selfrefnamesok (users can setattr(x,"name") themselves for example.
  • src/assign.c: oldtncol = TRUELENGTH(dt); // TO DO: oldtncol can be just called tl now, as we won't realloc here any more.
  • src/assign.c: savetl_init(); // ** TO DO **: remove allocs that could fail between here and _end, or different way
  • src/assign.c: if (verbose) Rprintf("Coerced factor to character to match the column's type (coercion is inefficient)\n"); // TO DO: datatable.pedantic would turn this into warning
  • src/assign.c: // TO DO: if := in future could ever change a list item's contents by reference, would need to duplicate at that point
  • src/assign.c: } else if (slen<10) { // 10 is just a guess for when memcpy is faster. Certainly memcpy is slower when slen==1, but that's the most common case by far so not high priority to discover the optimum here. TO DO: revisit
  • src/fcast.c:// TO DO: if no 0 or NA detected up front in subsetDT() below, could switch to a faster subsetVectorRawNo0orNA()
  • src/fcast.c:// TO DO: margins
  • src/bmerge.c: // retLength[j] = 0; // TO DO: do this to save the branch below and later branches at R level to set .N to 0
  • src/bmerge.c: } else if (xval.i>ival.i) { // TO DO: is *(&xlow, &xupp)[0|1]=mid more efficient than branch?
  • src/bmerge.c: // TO DO?: not if mult=first|last and col<ncol-1
  • src/bmerge.c: while(tmplow<iupp-1) { // TO DO: could double up from lir rather than halving from iupp
  • src/bmerge.c: if (tmp == 0) { // TO DO: deal with mixed encodings and locale optionally
  • src/fread.c:/***** TO DO *****
  • src/fread.c:A few TO DO inline in the code, including some speed fine tuning e.g. specialize Ispace and any other lib calls.
  • src/fread.c:static char sep, eol, eol2; // sep2 TO DO
  • src/fread.c: lch++; // TO DO can remove lch<eof when last row is specialized in case of no final eol
  • src/fread.c: if (sizes[TypeSxp[oldtype]] == sizes[TypeSxp[newtype]] && newtype != SXP_STR) { // after && is quick fix. TO DO: revisit
  • src/fread.c: mmp = (const char *)mmap(NULL, filesize, PROT_READ, MAP_PRIVATE | MAP_POPULATE, fd, 0); // TO DO?: MAP_HUGETLB
  • src/fread.c: // TO DO: commented this out for the case of nrow=100, more testing to do.
  • src/fread.c: // if (endblanks==0) There is non white after the last eol. Ok and dealt with. TO DO: reference test id here in comment
  • src/fread.c: // TO DO: goes away if we allow over-allocated columns.
  • src/fread.c: // TO DO: avoid coercing several times and bump straight to the new type once, somehow.
  • Binary file .git/objects/pack/pack-330771b91ff0ead10d122379a91bc4eac5964ca2.pack matches
  • R/fcast.R: idx = do.call("CJ", mapunique)[map, I := .I][["I"]] # TO DO: move this to C and avoid materialising the Cross Join.
  • R/bmerge.R: # TO DO: rename leftcols to icols, rightcols to xcols
  • R/bmerge.R: # TO DO: io and xo could be moved inside Cbmerge
  • R/bmerge.R: # TO DO: enforce via .internal.shallow attribute and expose shallow() to users
  • R/bmerge.R: lc = leftcols[a] # i # TO DO: rename left and right to i and x
  • R/bmerge.R: # TO DO: we need a way to avoid copying 'value' for internal purposes
  • R/bmerge.R: # TO DO: add warning if reallyreal about loss of precision
  • R/bmerge.R: # TO DO: could be allocated inside Cbmerge and returned as list from that
  • R/tables.R: if (mb) set(info,i,"MB",ceiling(as.numeric(object.size(DT))/1024^2)) # mb is an option because object.size() appears to be slow. TO DO: revisit
  • R/merge.R: # TO DO: use set() here instead.. #5262
  • R/data.table.R: # TO DO: rewrite data.table(), one of the oldest functions here. Many people use data.table() to convert data.frame rather than
  • R/data.table.R: # TO DO Something strange with NAMED on components of .... To investigate. Or just port data.table() to C. This is why
  • R/data.table.R: xi = as.data.table(xi, keep.rownames=keep.rownames) # TO DO: allow a matrix to be a column of a data.table. This could allow a key'd lookup to a matrix, not just by a single rowname vector, but by a combination of several columns. A matrix column could be stored either by row or by column contiguous in memory.
  • R/data.table.R: # TO DO ... recycle in C, but not high priority as large data already regular from database or file
  • R/data.table.R: # TO DO: surely use set() here, or avoid the coercion
  • R/data.table.R: # test explicitly if the caller is [.data.table (even stronger test. TO DO.)
  • R/data.table.R: # TO DO (document/faq/example). Removed for now ... if ((roll || rolltolast) && missing(mult)) mult="last" # for when there is exact match to mult. This does not control cases where the roll is mult, that is always the last one.
  • R/data.table.R: # TO DO: change isub at C level s/.N/nrow(x); changing a symbol to a constant should be ok
  • R/data.table.R: if (is.call(isub) && isub[[1L]]=="eval") { # TO DO: or ..()
  • R/data.table.R: # TO DO: print method could print physical and secondary keys at end.
  • R/data.table.R: # TO DO: move down to if (is.data.table) clause below, later ...
  • R/data.table.R: else if (identical(class(i),"data.frame")) i = as.data.table(i) # TO DO: avoid these as.data.table() and use a flag instead
  • R/data.table.R: # To revert to <=v1.9.2 behaviour. TO DO: remove option after Sep 2015
  • R/data.table.R: if (length(xo) && length(irows)) irows = xo[irows] # TO DO: fsort here?
  • R/data.table.R: # TO DO: TODO: Incorporate which_ here on DT[!i] where i is logical. Should avoid i = !i (above) - inefficient.
  • R/data.table.R: # minor TO DO: can we merge this with check_idx in fcast.c/subset ?
  • R/data.table.R: # TO DO: uncomment these warnings in next release. Later, make both errors.
  • R/data.table.R: if (is.name(bysub) && !(as.character(bysub) %chin% names(x))) { # TO DO: names(x),names(i),and i. and x. prefixes
  • R/data.table.R: if (length(bysubl) && identical(bysubl[[1L]],quote(eval))) { # TO DO: or by=..()
  • R/data.table.R: # *** TO DO ***: try() this eval first (as long as not list() or .()) and see if it evaluates to column names
  • R/data.table.R: # TO DO: Add a test like X[i,sum(v),by=i.x2], or where by includes a join column (both where some i don't match).
  • R/data.table.R: # TO DO: Make xss directly, rather than recursive call.
  • R/data.table.R: if (!is.na(nomatch)) irows = irows[irows!=0L] # TO DO: can be removed now we have CisSortedSubset
  • R/data.table.R: if (length(allbyvars)) { ############### TO DO TO DO TO DO ###############
  • R/data.table.R: # TO DO: pass xss (x subset) through into dogroups. Still need irows there (for :=), but more condense
  • R/data.table.R: jsubl = as.list.default(jsub) # TO DO: names(jsub) and names(jsub)="" seem to work so make use of that
  • R/data.table.R: # TO DO: if call to a[1] for example, then call it 'a' too
  • R/data.table.R: ansvars = dupdiff(names(x),union(bynames,allbyvars)) # TO DO: allbyvars here for vars used by 'by'. Document.
  • R/data.table.R: if (!with) irows <- irows[!is.na(irows)] # fixes 2445. TO DO: added a message if(verbose) or warning?
  • R/data.table.R: # Adding new column(s). TO DO: move after the first eval in case the jsub has an error.
  • R/data.table.R: if (is.name(name) && ok && verbose) { # && NAMED(x)>0 (TO DO) # ok here includes -1 (loaded from disk)
  • R/data.table.R: # TO DO: Add option 'datatable.pedantic' to turn on warnings like this.
  • R/data.table.R: # TO DO ... comments moved up from C ...
  • R/data.table.R: } # TO DO: else if env$<- or list$<-
  • R/data.table.R: # hash=TRUE (the default) does seem better as expected using e.g. test 645. TO DO experiment with 'size' argument
  • R/data.table.R: # TO DO: port more of this to C
  • R/data.table.R: # the address==address is a temp fix for R >= 3.1.0. TO DO: allow shallow copy here, then copy only when user uses :=
  • R/data.table.R: if (!is.null(lhs)) { # *** TO DO ***: use set() here now that it can add new column(s) and remove newnames and alloc logic above
  • R/data.table.R: jval = data.table(jval) # TO DO: should this be setDT(list(jval)) instead?
  • R/data.table.R: # fix for bug fix typo in my name #5114 from GSee's - .data.table.locked=TRUE. # TO DO: more efficient way e.g. address==address (identical will do that but then proceed to deep compare if !=, wheras we want just to stop?)
  • R/data.table.R: # Commented as it's taken care of above, along with data.table new column := slower than base R (?) #921 fix. Kept here for the bug fix info and TO DO.
  • R/data.table.R: if (haskey(x) && all(key(x) %chin% names(jval)) && suppressWarnings(is.sorted(jval, by=key(x)))) # TO DO: perhaps this usage of is.sorted should be allowed internally then (tidy up and make efficient)
  • R/data.table.R: SDenv$.GRP = as.integer(1) # oddly using 1L doesn't work reliably here! Possible R bug? TO DO: create reproducible example and report. To reproduce change to 1L and run test.data.table, test 780 fails. The assign seems ineffective and a previous value for .GRP from a previous test is retained, despite just creating a new SDenv.
  • R/data.table.R: ## 'av' correct here ?? *** TO DO ***
  • R/data.table.R: if (!bysameorder) { # TO DO: lower this into forder.c
  • R/data.table.R: if (!orderedirows && !length(o__)) o__ = 1:xnrow # temp fix. TO DO: revist orderedirows
  • R/data.table.R: # TO DO: combine uniqlist and uniquelengths into one call. Or, just set len__ to NULL when dogroups infers that.
  • R/data.table.R: # TO DO: allow secondary keys to be stored, then we see if our by matches one, if so use it, and no need to sort again. TO DO: document multiple keys.
  • R/data.table.R: # TODO, TO DO: raise the checks for 'jvnames' earlier (where jvnames is set by checking 'jsub') and set 'jvnames' already.
  • R/data.table.R: # TODO, TO DO: revisit complex cases (as illustrated below)
  • R/data.table.R: # TO DO, TODO: maybe a message/warning here so that we can catch the overlooked cases, if any?
  • R/data.table.R: # fix for bug active voice > passive voice #2758. TO DO: provide a better error message
  • R/data.table.R: # TO DO: xrows would be a better name for irows: irows means the rows of x that i joins to
  • R/data.table.R: # TO DO: setkey could mark the key whether it is unique or not.
  • R/data.table.R: setkeyv(x,cnames) # TO DO: setkey before grouping to get memcpy benefit.
  • R/data.table.R: setnames(ans,seq_along(bynames),bynames) # TO DO: why doesn't groups have bynames in the first place?
  • R/data.table.R: setnames(ans,seq_along(bynames),bynames) # TO DO: reinvestigate bynames flowing from dogroups here and simplify
  • R/data.table.R: alloc.col(ans) # TO DO: overallocate in dogroups in the first place and remove this line
  • R/data.table.R:# TO DO. Reintroduce velow but dispatch straight to
  • R/data.table.R: # TO DO: warning("Please use DT[i,j:=value] syntax instead of DT[i,j]<-value, for efficiency. See ?':='")
  • R/data.table.R: } else x = shallow(x) ## Fix for [R-Forge #5660] Interaction between "Caret" package and data.table in R 3.1 gives different column name #476 and cast changes column name of origin input data.table #825. Needed for R v3.1.0+. TO DO: revisit
  • R/data.table.R:# TO DO, add more warnings e.g. for by.data.table(), telling user what the data.table syntax is but letting them dispatch to data.frame if they want
  • R/data.table.R: # TO DO: inside Ccopy it could reset tl to 0 or length, but no matter as selfrefok detects it
  • R/data.table.R: # TO DO: revisit duplicate.c in R 3.0.3 and see where it's at
  • R/data.table.R: .Call(Cassign,x,i,j,NULL,value,FALSE) # verbose=FALSE for speed to avoid getOption() TO DO: somehow read getOption("datatable.verbose") from C level
  • R/data.table.R: # TO DO if table has 'ul' then match to that
  • R/data.table.R: # TO DO: deprecate and remove this. It's exported but doubt anyone uses it. Think the plan was to use it internally, but forderv superceded.
  • R/setkey.R: setattr(x,"index",NULL) # TO DO: reorder existing indexes likely faster than rebuilding again. Allow optionally. Simpler for now to clear.
  • R/setkey.R:# Maybe just a grep through *.R for use of these function internally would be better (TO DO).
  • R/setkey.R: # TO DO: export and document forder
  • R/setkey.R: return( if (length(o)) x[o] else x ) # TO DO: document the nice efficiency here
  • R/setkey.R:# TO DO?: Use the CJ list() replication method for SJ (inside as.data.table.list?, Wrong behaviour DT[, i, with=FALSE]  #2109) too to avoid alloc.col
  • R/test.data.table.R: # TO DO: reinstate solution for C locale of CRAN's Mac (R-Forge's Mac is ok)
  • R/test.data.table.R: # TO DO: every line that could possibly fail should ideally be inside test()
  • R/test.data.table.R: # TO DO: test 166 doesn't pass with these :
  • R/uniqlist.R: # TO DO: Possibly reinstate reverse argument :
  • R/as.data.table.R: ans = copy(x) # TO DO: change this deep copy to be shallow.
  • R/duplicated.R: # TO DO: allow logical to be passed through to C level, and allow cols=NULL to mean all, for further speed gain.
  • R/utils.R: # TO DO: allow taking along any dimension e.g. all but last column, rather than all but last row.
  • tests/tests.R:# TO DO: check we test each verbose message at least once, instead of a full repeat of all tests
@MichaelChirico
Copy link
Member

This bug is very old & even the list above is likely outdated.

The way to proceed here is with lintr::todo_comment_linter(), so we can open a fresh issue for this once the lint CI is in place (#5908)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants