[R-Forge #205] Move TO DO comments in the code into tracker (use grep) #678

arunsrinivasan · 2014-06-08T13:17:29Z

Submitted by: Matt Dowle; Assigned to: Nobody; R-Forge link

The marker "TO DO" is consistently used in the .R and .c comments.

As of this writing (April 15, 2019), here's the full list from grep:

src/rbindlist.c: // TO DO: options(datatable.pedantic=TRUE) to issue this warning :
src/fastmean.c: if(ISNAN(REAL(x)[i])) continue; // TO DO: could drop this line and let NA propogate?
src/uniqlist.c: // TO DO: store previ twiddle call, but it'll need to be vector since this is in a loop through columns. Hopefully the first == will short circuit most often
src/uniqlist.c: // TO DO: store previ twiddle call, but it'll need to be vector since this is in a loop through columns. Hopefully the first == will short circuit most often
src/forder.c: // TO DO: at this point if the last count==n then it's all the same number and we can stop now.
src/forder.c:static void *xtmp=NULL; int xtmp_alloc=0; // TO DO: save xtmp if possible, see allocs in forder
src/forder.c:static void alloc_xtmp(int n) { // TO DO: currently always the largest type (double) but
src/forder.c: // TO DO: could include extra bits to divide the first radix up more. Often the MSD has groups in just 0-4 out of 256.
src/forder.c: alloc_otmp(maxgrpn); // TO DO: can we leave this to forder and remove these calls??
src/forder.c: alloc_xtmp(maxgrpn); // TO DO: doesn't need to be sizeof(double) always, see inside
src/forder.c: /* TO DO: If nextradix==-1 AND no further columns from forder AND !retGrp, we're
src/forder.c: // see comments in iradix for structure. This follows the same. TO DO: merge iradix in here (almost ready)
src/forder.c: if (radix_xsuballoc < maxgrpn) { // TO DO: centralize this alloc
src/forder.c: // TO DO: could include extra bits to divide the first radix up more. Often the MSD has groups in just 0-4 out of 256.
src/forder.c: alloc_otmp(maxgrpn); // TO DO: leave to forder and remove these calls?
src/forder.c: // TO DO: If nextradix==-1 and no further columns from forder, we're done. We have o. Remember to memset thiscounts before returning.
src/forder.c:// TO DO?: dcount. Find step size, then range = (max-min)/step and proceed as icount. Many fixed precision floats (such as prices)
src/forder.c: // OLD COMMENT: can return 0 here for the same string in known and unknown encodings, good if the unknown string is in that encoding but not if not ordering is ascii only (C locale). TO DO: revisit and allow user to change to strcoll, and take account of Encoding. see comments in bmerge(). 10k calls of strcmp = 0.37s, 10k calls of strcoll = 4.7s. See ?Comparison, ?Encoding, Scollate in R internals.
src/forder.c:// TO DO: check that all unknown encodings are ascii; i.e. no non-ascii unknowns are present, and that either Latin1
src/forder.c:// Fortunately, UTF sorts in the same order if treated as ASCII, so we can simplify by doing it by bytes. TO DO: confirm
src/forder.c: // TO DO?: chmatch to existing sorted vector, then grow it.
src/forder.c: // TO DO?: if (n<N_SMALL=200) insert sort, then loop through groups via ==
src/forder.c: // TO DO: if (n<50) cinsert (continuing from radix offset into CHAR) or using StrCmp. But 256 is narrow, so quick and not too much an issue.
src/forder.c: // TO DO: the same string in different encodings will be considered different here. Sweep through ustr and merge counts where equal (sort needed therefore, unfortunately?, only if there are any marked encodings present)
src/forder.c: if (n < N_SMALL && nalast != 0) { // TO DO: calibrate() N_SMALL=200
src/forder.c: if (range <= N_RANGE) // && range<n) // TO DO: calibrate(). radix was faster (9.2s "range<=10000" instead of 11.6s
src/forder.c: // sort ustr. TO DO: just sort new ones and merge them in.
src/forder.c: cradix_xtmp = (SEXP *)realloc( cradix_xtmp, ustr_n * sizeof(SEXP) ); // TO DO: Reuse the one we have in forder. Does it need to be n length?
src/forder.c:// If a vector is in decreasing order with ties, then an in-place reverse (no sort) would result in instability of ties (TO DO).
src/forder.c:// TO DO: test in big steps first to return faster if unsortedness is at the end (a common case of rbind'ing data to end)
src/forder.c: else return(0); // TO DO: improve to be stable for ties in reverse
src/forder.c: this = twiddle(x,i,order); // TO DO: once we get past -Inf, NA and NaN at the bottom, and +Inf at the top,
src/forder.c: // TO DO: add calibrate() to init.c
src/forder.c: int *o = INTEGER(ans); // TO DO: save allocation if NULL is returned (isSorted==TRUE)
src/forder.c: for (i=0; i<n; i++) o[i] = i+1; // TO DO: we don't need this if returning NULL? Save it? Unlikely huge gain, though.
src/forder.c: // ** TO DO *_: if isSorted, we can just point xsub into x directly. If (_f)() returns 0, though, will have to copy x at that point
src/forder.c: free(cradix_xtmp); cradix_xtmp=NULL; cradix_xtmp_alloc=0; // TO DO: use xtmp already got
src/wrappers.c: // TO DO: revisit. Enough to reproduce is: DT=data.table(a=1:3); DT[2]; DT[,b:=2]
src/dogroups.c: bynames = PROTECT(allocVector(STRSXP, ngrpcols)); protecti++; // TO DO: do we really need bynames, can we assign names afterwards in one step?
src/dogroups.c: // TO DO: check this check above.
src/dogroups.c: grpn = 1; // TO DO: it is anyway?
src/dogroups.c: // TO DO: this might over allocate if first group has 1 row and j is actually a single row aggregate
src/dogroups.c: // Shouldn't need SET_* to age objects here sice groups, TO DO revisit.
src/dogroups.c: // TO DO: could avoid this last 'if' by adding a dummy PROTECT after first alloc for this UNPROTECT(1) to do.
src/dogroups.c: // ... TO DO: set truelength to LENGTH(VECTOR_ELT(ans,0)), length to ansloc and enhance finalizer to handle over-allocated rows.
src/dogroups.c: PROTECT(newx = allocVector(TYPEOF(x), newlen)); // TO DO: R_realloc(?) here?
src/dogroups.c: // TO DO. Using SET_ to ensure objects are aged, rather than memcpy. Perhaps theres a bulk/fast way to age CHECK_OLD_TO_NEW
src/dogroups.c: // TO DO: Again, is there bulk op to avoid this loop, which still respects older generations
src/assign.c: PROTECT( // needed when --enable-strict-barrier it seems, iiuc. TO DO: test under that flag and remove if not needed.
src/assign.c: // TO DO: keepattr() would be faster, but can't because shallow isn't merely a shallow copy. It
src/assign.c: // TO DO: test realloc names if selfrefnamesok (users can setattr(x,"name") themselves for example.
src/assign.c: oldtncol = TRUELENGTH(dt); // TO DO: oldtncol can be just called tl now, as we won't realloc here any more.
src/assign.c: savetl_init(); // ** TO DO **: remove allocs that could fail between here and _end, or different way
src/assign.c: if (verbose) Rprintf("Coerced factor to character to match the column's type (coercion is inefficient)\n"); // TO DO: datatable.pedantic would turn this into warning
src/assign.c: // TO DO: if := in future could ever change a list item's contents by reference, would need to duplicate at that point
src/assign.c: } else if (slen<10) { // 10 is just a guess for when memcpy is faster. Certainly memcpy is slower when slen==1, but that's the most common case by far so not high priority to discover the optimum here. TO DO: revisit
src/fcast.c:// TO DO: if no 0 or NA detected up front in subsetDT() below, could switch to a faster subsetVectorRawNo0orNA()
src/fcast.c:// TO DO: margins
src/bmerge.c: // retLength[j] = 0; // TO DO: do this to save the branch below and later branches at R level to set .N to 0
src/bmerge.c: } else if (xval.i>ival.i) { // TO DO: is *(&xlow, &xupp)[0|1]=mid more efficient than branch?
src/bmerge.c: // TO DO?: not if mult=first|last and col<ncol-1
src/bmerge.c: while(tmplow<iupp-1) { // TO DO: could double up from lir rather than halving from iupp
src/bmerge.c: if (tmp == 0) { // TO DO: deal with mixed encodings and locale optionally
src/fread.c:/***** TO DO *****
src/fread.c:A few TO DO inline in the code, including some speed fine tuning e.g. specialize Ispace and any other lib calls.
src/fread.c:static char sep, eol, eol2; // sep2 TO DO
src/fread.c: lch++; // TO DO can remove lch<eof when last row is specialized in case of no final eol
src/fread.c: if (sizes[TypeSxp[oldtype]] == sizes[TypeSxp[newtype]] && newtype != SXP_STR) { // after && is quick fix. TO DO: revisit
src/fread.c: mmp = (const char *)mmap(NULL, filesize, PROT_READ, MAP_PRIVATE | MAP_POPULATE, fd, 0); // TO DO?: MAP_HUGETLB
src/fread.c: // TO DO: commented this out for the case of nrow=100, more testing to do.
src/fread.c: // if (endblanks==0) There is non white after the last eol. Ok and dealt with. TO DO: reference test id here in comment
src/fread.c: // TO DO: goes away if we allow over-allocated columns.
src/fread.c: // TO DO: avoid coercing several times and bump straight to the new type once, somehow.
Binary file .git/objects/pack/pack-330771b91ff0ead10d122379a91bc4eac5964ca2.pack matches
R/fcast.R: idx = do.call("CJ", mapunique)[map, I := .I][["I"]] # TO DO: move this to C and avoid materialising the Cross Join.
R/bmerge.R: # TO DO: rename leftcols to icols, rightcols to xcols
R/bmerge.R: # TO DO: io and xo could be moved inside Cbmerge
R/bmerge.R: # TO DO: enforce via .internal.shallow attribute and expose shallow() to users
R/bmerge.R: lc = leftcols[a] # i # TO DO: rename left and right to i and x
R/bmerge.R: # TO DO: we need a way to avoid copying 'value' for internal purposes
R/bmerge.R: # TO DO: add warning if reallyreal about loss of precision
R/bmerge.R: # TO DO: could be allocated inside Cbmerge and returned as list from that
R/tables.R: if (mb) set(info,i,"MB",ceiling(as.numeric(object.size(DT))/1024^2)) # mb is an option because object.size() appears to be slow. TO DO: revisit
R/merge.R: # TO DO: use set() here instead.. #5262
R/data.table.R: # TO DO: rewrite data.table(), one of the oldest functions here. Many people use data.table() to convert data.frame rather than
R/data.table.R: # TO DO Something strange with NAMED on components of .... To investigate. Or just port data.table() to C. This is why
R/data.table.R: xi = as.data.table(xi, keep.rownames=keep.rownames) # TO DO: allow a matrix to be a column of a data.table. This could allow a key'd lookup to a matrix, not just by a single rowname vector, but by a combination of several columns. A matrix column could be stored either by row or by column contiguous in memory.
R/data.table.R: # TO DO ... recycle in C, but not high priority as large data already regular from database or file
R/data.table.R: # TO DO: surely use set() here, or avoid the coercion
R/data.table.R: # test explicitly if the caller is [.data.table (even stronger test. TO DO.)
R/data.table.R: # TO DO (document/faq/example). Removed for now ... if ((roll || rolltolast) && missing(mult)) mult="last" # for when there is exact match to mult. This does not control cases where the roll is mult, that is always the last one.
R/data.table.R: # TO DO: change isub at C level s/.N/nrow(x); changing a symbol to a constant should be ok
R/data.table.R: if (is.call(isub) && isub[[1L]]=="eval") { # TO DO: or ..()
R/data.table.R: # TO DO: print method could print physical and secondary keys at end.
R/data.table.R: # TO DO: move down to if (is.data.table) clause below, later ...
R/data.table.R: else if (identical(class(i),"data.frame")) i = as.data.table(i) # TO DO: avoid these as.data.table() and use a flag instead
R/data.table.R: # To revert to <=v1.9.2 behaviour. TO DO: remove option after Sep 2015
R/data.table.R: if (length(xo) && length(irows)) irows = xo[irows] # TO DO: fsort here?
R/data.table.R: # TO DO: TODO: Incorporate which_ here on DT[!i] where i is logical. Should avoid i = !i (above) - inefficient.
R/data.table.R: # minor TO DO: can we merge this with check_idx in fcast.c/subset ?
R/data.table.R: # TO DO: uncomment these warnings in next release. Later, make both errors.
R/data.table.R: if (is.name(bysub) && !(as.character(bysub) %chin% names(x))) { # TO DO: names(x),names(i),and i. and x. prefixes
R/data.table.R: if (length(bysubl) && identical(bysubl[[1L]],quote(eval))) { # TO DO: or by=..()
R/data.table.R: # *** TO DO ***: try() this eval first (as long as not list() or .()) and see if it evaluates to column names
R/data.table.R: # TO DO: Add a test like X[i,sum(v),by=i.x2], or where by includes a join column (both where some i don't match).
R/data.table.R: # TO DO: Make xss directly, rather than recursive call.
R/data.table.R: if (!is.na(nomatch)) irows = irows[irows!=0L] # TO DO: can be removed now we have CisSortedSubset
R/data.table.R: if (length(allbyvars)) { ############### TO DO TO DO TO DO ###############
R/data.table.R: # TO DO: pass xss (x subset) through into dogroups. Still need irows there (for :=), but more condense
R/data.table.R: jsubl = as.list.default(jsub) # TO DO: names(jsub) and names(jsub)="" seem to work so make use of that
R/data.table.R: # TO DO: if call to a[1] for example, then call it 'a' too
R/data.table.R: ansvars = dupdiff(names(x),union(bynames,allbyvars)) # TO DO: allbyvars here for vars used by 'by'. Document.
R/data.table.R: if (!with) irows <- irows[!is.na(irows)] # fixes 2445. TO DO: added a message if(verbose) or warning?
R/data.table.R: # Adding new column(s). TO DO: move after the first eval in case the jsub has an error.
R/data.table.R: if (is.name(name) && ok && verbose) { # && NAMED(x)>0 (TO DO) # ok here includes -1 (loaded from disk)
R/data.table.R: # TO DO: Add option 'datatable.pedantic' to turn on warnings like this.
R/data.table.R: # TO DO ... comments moved up from C ...
R/data.table.R: } # TO DO: else if env$<- or list$<-
R/data.table.R: # hash=TRUE (the default) does seem better as expected using e.g. test 645. TO DO experiment with 'size' argument
R/data.table.R: # TO DO: port more of this to C
R/data.table.R: # the address==address is a temp fix for R >= 3.1.0. TO DO: allow shallow copy here, then copy only when user uses :=
R/data.table.R: if (!is.null(lhs)) { # *** TO DO ***: use set() here now that it can add new column(s) and remove newnames and alloc logic above
R/data.table.R: jval = data.table(jval) # TO DO: should this be setDT(list(jval)) instead?
R/data.table.R: # fix for bug fix typo in my name #5114 from GSee's - .data.table.locked=TRUE. # TO DO: more efficient way e.g. address==address (identical will do that but then proceed to deep compare if !=, wheras we want just to stop?)
R/data.table.R: # Commented as it's taken care of above, along with data.table new column := slower than base R (?) #921 fix. Kept here for the bug fix info and TO DO.
R/data.table.R: if (haskey(x) && all(key(x) %chin% names(jval)) && suppressWarnings(is.sorted(jval, by=key(x)))) # TO DO: perhaps this usage of is.sorted should be allowed internally then (tidy up and make efficient)
R/data.table.R: SDenv$.GRP = as.integer(1) # oddly using 1L doesn't work reliably here! Possible R bug? TO DO: create reproducible example and report. To reproduce change to 1L and run test.data.table, test 780 fails. The assign seems ineffective and a previous value for .GRP from a previous test is retained, despite just creating a new SDenv.
R/data.table.R: ## 'av' correct here ?? *** TO DO ***
R/data.table.R: if (!bysameorder) { # TO DO: lower this into forder.c
R/data.table.R: if (!orderedirows && !length(o__)) o__ = 1:xnrow # temp fix. TO DO: revist orderedirows
R/data.table.R: # TO DO: combine uniqlist and uniquelengths into one call. Or, just set len__ to NULL when dogroups infers that.
R/data.table.R: # TO DO: allow secondary keys to be stored, then we see if our by matches one, if so use it, and no need to sort again. TO DO: document multiple keys.
R/data.table.R: # TODO, TO DO: raise the checks for 'jvnames' earlier (where jvnames is set by checking 'jsub') and set 'jvnames' already.
R/data.table.R: # TODO, TO DO: revisit complex cases (as illustrated below)
R/data.table.R: # TO DO, TODO: maybe a message/warning here so that we can catch the overlooked cases, if any?
R/data.table.R: # fix for bug active voice > passive voice #2758. TO DO: provide a better error message
R/data.table.R: # TO DO: xrows would be a better name for irows: irows means the rows of x that i joins to
R/data.table.R: # TO DO: setkey could mark the key whether it is unique or not.
R/data.table.R: setkeyv(x,cnames) # TO DO: setkey before grouping to get memcpy benefit.
R/data.table.R: setnames(ans,seq_along(bynames),bynames) # TO DO: why doesn't groups have bynames in the first place?
R/data.table.R: setnames(ans,seq_along(bynames),bynames) # TO DO: reinvestigate bynames flowing from dogroups here and simplify
R/data.table.R: alloc.col(ans) # TO DO: overallocate in dogroups in the first place and remove this line
R/data.table.R:# TO DO. Reintroduce velow but dispatch straight to
R/data.table.R: # TO DO: warning("Please use DT[i,j:=value] syntax instead of DT[i,j]<-value, for efficiency. See ?':='")
R/data.table.R: } else x = shallow(x) ## Fix for [R-Forge #5660] Interaction between "Caret" package and data.table in R 3.1 gives different column name #476 and cast changes column name of origin input data.table #825. Needed for R v3.1.0+. TO DO: revisit
R/data.table.R:# TO DO, add more warnings e.g. for by.data.table(), telling user what the data.table syntax is but letting them dispatch to data.frame if they want
R/data.table.R: # TO DO: inside Ccopy it could reset tl to 0 or length, but no matter as selfrefok detects it
R/data.table.R: # TO DO: revisit duplicate.c in R 3.0.3 and see where it's at
R/data.table.R: .Call(Cassign,x,i,j,NULL,value,FALSE) # verbose=FALSE for speed to avoid getOption() TO DO: somehow read getOption("datatable.verbose") from C level
R/data.table.R: # TO DO if table has 'ul' then match to that
R/data.table.R: # TO DO: deprecate and remove this. It's exported but doubt anyone uses it. Think the plan was to use it internally, but forderv superceded.
R/setkey.R: setattr(x,"index",NULL) # TO DO: reorder existing indexes likely faster than rebuilding again. Allow optionally. Simpler for now to clear.
R/setkey.R:# Maybe just a grep through *.R for use of these function internally would be better (TO DO).
R/setkey.R: # TO DO: export and document forder
R/setkey.R: return( if (length(o)) x[o] else x ) # TO DO: document the nice efficiency here
R/setkey.R:# TO DO?: Use the CJ list() replication method for SJ (inside as.data.table.list?, Wrong behaviour DT[, i, with=FALSE] #2109) too to avoid alloc.col
R/test.data.table.R: # TO DO: reinstate solution for C locale of CRAN's Mac (R-Forge's Mac is ok)
R/test.data.table.R: # TO DO: every line that could possibly fail should ideally be inside test()
R/test.data.table.R: # TO DO: test 166 doesn't pass with these :
R/uniqlist.R: # TO DO: Possibly reinstate reverse argument :
R/as.data.table.R: ans = copy(x) # TO DO: change this deep copy to be shallow.
R/duplicated.R: # TO DO: allow logical to be passed through to C level, and allow cols=NULL to mean all, for further speed gain.
R/utils.R: # TO DO: allow taking along any dimension e.g. all but last column, rather than all but last row.
tests/tests.R:# TO DO: check we test each verbose message at least once, instead of a full repeat of all tests

The text was updated successfully, but these errors were encountered:

…R ToDo of #678 (#5263)

MichaelChirico · 2024-04-06T23:03:29Z

This bug is very old & even the list above is likely outdated.

The way to proceed here is with lintr::todo_comment_linter(), so we can open a fresh issue for this once the lint CI is in place (#5908)

jangorecki added tests and removed feature request labels Mar 12, 2019

This was referenced Nov 19, 2021

R/merge.R: # TO DO: use set() here instead.. #5262

Closed

rbindlist support fill=TRUE with use.names=FALSE and use it in merge.R ToDo of #678 #5263

Merged

ToDos: replace looped subsetVector with subsetDT #5264

Merged

mattdowle pushed a commit that referenced this issue Nov 23, 2021

rbindlist support fill=TRUE with use.names=FALSE and use it in merge.…

4922384

…R ToDo of #678 (#5263)

ben-schwen mentioned this issue Jan 1, 2022

set automatically allocates new column slots if needed #5269

Open

MichaelChirico closed this as completed Apr 6, 2024

This was referenced Apr 6, 2024

TODO items from tests.Rraw #2572

Closed

[R-Forge #1003] Revisit growVector #669

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[R-Forge #205] Move TO DO comments in the code into tracker (use grep) #678

[R-Forge #205] Move TO DO comments in the code into tracker (use grep) #678

arunsrinivasan commented Jun 8, 2014 •

edited by ben-schwen

Loading

MichaelChirico commented Apr 6, 2024

[R-Forge #205] Move TO DO comments in the code into tracker (use grep) #678

[R-Forge #205] Move TO DO comments in the code into tracker (use grep) #678

Comments

arunsrinivasan commented Jun 8, 2014 • edited by ben-schwen Loading

MichaelChirico commented Apr 6, 2024

arunsrinivasan commented Jun 8, 2014 •

edited by ben-schwen

Loading