DEDUPLICATE would have the same spec as UNIQUE, and would operate by the same rules of comparison to determine uniqueness, but would modify its argument to remove duplicates instead of returning a copy with the duplicates removed. Basically, a modifying UNIQUE (see #1550). It would return a reference to its argument, for chaining.
This would be useful for occasions when you don't want the series copy overhead, or must use the same series due to multiple references. As with UNIQUE, this would be a no-op when applied to bitset! or typeset! arguments.
Note about the name: There is no verb form of "unique" in the English language, so when the data storage industry needed such a word they invented "deduplicate". A possible alternative would be the verb phrase MAKE-UNIQUE, or the option UNIQUE/no-copy.
>> a: [1 2 3 1]
== [1 2 3 1]
>> unique a
== [1 2 3]
>> a
== [1 2 3 1]
>> deduplicate a
== [1 2 3]
>> a
== [1 2 3]
; Mezzanine version:
deduplicate: func [
    "Removes duplicates from the data set."
    set1 [block! string! binary! bitset! typeset!] "The data set (modified)"
    /case "Use case-sensitive comparison (except bitsets)"
    /skip "Treat the series as records of fixed size"
    size [integer!]
] [
    if series? set1 [
        insert set1 also apply :unique [set1 case skip size] clear set1
    ]
    set1
]
Submitted by: Steeve
Or perhaps just a refinement.
But having the 'stay behavior by default would be clever.
And if you want a copy then... do it.
>> unique copy my-serie
Rebolbot commented on Apr 19, 2010:
Submitted by: BrianH
Steeve, we already discussed those options in #1550. And /no-copy by default is unlikely, for reasons discussed there.
#1550 is a counter-proposal to this one; if it is accepted, this one should be dismissed, or vice-versa. If a different name for the function is chosen instead, or the UNIQUE/no-copy option, the summary of this ticket should be renamed accordingly.
Rebolbot commented on Apr 26, 2010:
Submitted by: Carl
It should be noted that for larger series, UNIQUE must always allocate an extra series in order to hash comparisons and so avoid N**2 performance.
Rebolbot commented on Apr 28, 2010:
Submitted by: BrianH
Then the advantage of making DEDUPLICATE native would be to avoid allocating an extra series, but only when the series would be small enough that it wouldn't matter as much to memory usage whether it gets allocated or not. It sounds like we can get by with the mezzanine that is in the example code above.
Rebolbot commented on Nov 8, 2010:
Submitted by: Ladislav
"This would be useful for occasions when you don't want the series copy overhead..." - I cannot help but say that this proposal is based on an error, so it should be dismissed.
Rebolbot commented on Nov 9, 2010:
Submitted by: BrianH
The mezzanine version of this is useful on its own, even if not all of its potential benefits are possible. The creation of this ticket was prompted by a real-world need for such a function in the R3 GUI project - other projects could likely benefit from this as well. If not accepted into R3 itself it would still be a useful addition to a community library.
Changed the category to Mezzanine and tweaked the code comments.
Rebolbot commented on Jan 28, 2011:
Submitted by: Ladislav
"The creation of this ticket was prompted by a real-world need for such a function in the R3 GUI project" - I do not think this is accurate enough.
Rebolbot commented on Jan 28, 2011:
Submitted by: BrianH
Nope, it's accurate. Henrik needed something like this for the R3 GUI. That doesn't mean that it needs to be a mezzanine, just that the function was needed. Coincidentally, Maxim needs something like this for Glass. Guess GUIs tend to have multiple references to blocks that need updating.
Rebolbot commented on Jan 31, 2011:
Submitted by: Ladislav
I am questioning the "real-world need", which was not demonstrated in a satisfactory way. The fact that somebody mentions such a need using quite unrelated arguments is not acceptable to me.
For the same reason, I cannot accept that there is any "real-world need" merely because somebody mentions "the need" several times.
To elaborate further: what bothers me is that the arguments "proving" the need are totally unrelated to the subject, and therefore not convincing at all.
Rebolbot commented on Jan 31, 2011:
Submitted by: BrianH
There is a need for such a function (it's been requested independently more than once with different names, including UNIQUE and REMOVE-DUPLICATES). There is such a function (above, in the example code) and it can be used by the people who need it. There, problem solved. So the question is whether the solution would be better made available by adding the function to a community library, or by making the function a mezzanine.
I don't need such a function and if I do I can just copy it from this ticket, and you clearly don't need such a function, so our votes count accordingly. I am perfectly OK with there being a high bar for a function getting added to the mezzanines; R3 is modular for that reason among others. The fact that there is currently not yet such a community library is not really a problem unless we delete this ticket; we can make such a library later. This ticket can even be dismissed and the problem will still be solved just by it still existing for future reference.
Rebolbot commented on Feb 1, 2011:
Submitted by: Ladislav
"if I do I can just copy it from this ticket" - yes, certainly you can. What bothers me is the fact that the proponents of such a function are trying to prove that the above possible implementation does not satisfy the need they have (since it uses the auxiliary storage).
Rebolbot commented on Feb 1, 2011:
Submitted by: BrianH
I don't count such arguments as "requests" for this function. The only ones I count are those that actually need the block to be modified, such as in cases where there is more than one reference to the same block and all expect to see the updated version, whether or not there is an auxiliary series allocated. Henrik and Maxim both requested this function (with different names) for that reason, for example.
For those who think this will save on temporary series allocations, please read Carl's comment above.
Rebolbot added the Type.wish label on Jan 12, 2016
Submitted by: BrianH
Imported from: CureCode [ Version: alpha 97 Type: Wish Platform: All Category: Mezzanine Reproduce: Always Fixed-in:none ]
Imported from: metaeducation#1573