DEDUPLICATE function #1573

Closed
Siskin-Bot opened this issue Feb 15, 2020 · 1 comment

Submitted by: BrianH

DEDUPLICATE would have the same spec as UNIQUE, and would operate by the same rules of comparison to determine uniqueness, but would modify its argument to remove duplicates instead of returning a copy with the duplicates removed. Basically, a modifying UNIQUE (see #1550). It would return a reference to its argument, for chaining.

This would be useful for occasions when you don't want the series copy overhead, or must use the same series due to multiple references. As with UNIQUE, this would be a noop when applied to bitset! or typeset! arguments.

Note about the name: There is no verb form of "unique" in the English language, so when the data storage industry needed such a word they invented "deduplicate". A possible alternative would be the verb phrase MAKE-UNIQUE, or the option UNIQUE/no-copy.

>> a: [1 2 3 1]
== [1 2 3 1]
>> unique a
== [1 2 3]
>> a
== [1 2 3 1]
>> deduplicate a
== [1 2 3]
>> a
== [1 2 3]

; Mezzanine version:
deduplicate: func [
    "Removes duplicates from the data set."
    set1 [block! string! binary! bitset! typeset!] "The data set (modified)"
    /case "Use case-sensitive comparison (except bitsets)"
    /skip "Treat the series as records of fixed size"
    size [integer!]
] [
    if series? set1 [
        insert set1 also apply :unique [set1 case skip size] clear set1
    ]
    set1
]
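
The multiple-reference case mentioned above can be illustrated with a minimal sketch. `dedupe-in-place` is a hypothetical, simplified stand-in for the mezzanine (series types only, no /case or /skip), not the function as proposed; the comments show the expected results.

```rebol
; Hypothetical simplified stand-in for DEDUPLICATE (no /case or /skip).
dedupe-in-place: func [
    "Removes duplicates from a series, modifying it."
    series [block! string! binary!] "The series (modified)"
    /local tmp
][
    tmp: unique series          ; UNIQUE returns a new series with duplicates removed
    append clear series tmp     ; empty the original, refill it, return it at its head
]

a: [1 2 3 1]
b: a                ; b refers to the same block as a
c: unique a         ; c is a new block; a and b still contain the duplicate
dedupe-in-place a   ; modifies the shared block
probe b             ; [1 2 3]
probe same? a b     ; true
```

Because `a` and `b` refer to one block, the change is visible through both words, which is the behavior a copying UNIQUE cannot provide.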

Imported from: CureCode [ Version: alpha 97 Type: Wish Platform: All Category: Mezzanine Reproduce: Always Fixed-in:none ]
Imported from: metaeducation#1573

Comments:

Rebolbot commented on Apr 19, 2010:

Submitted by: Steeve

Or perhaps just a refinement.

unique/stay, unique/stick, unique/hold, unique/keep, unique/remain...

But having the 'stay behavior by default would be clever.
And if you want a copy then... do it.

>> unique copy my-serie

Rebolbot commented on Apr 19, 2010:

Submitted by: BrianH

Steeve, we already discussed those options in #1550. And /no-copy by default is unlikely, for reasons discussed there.

#1550 is a counter-proposal to this one; if it is accepted, this one should be dismissed, or vice-versa. If a different name for the function is chosen instead, or the UNIQUE/no-copy option, the summary of this ticket should be renamed accordingly.

Rebolbot commented on Apr 26, 2010:

Submitted by: Carl

It should be noted that for larger series, UNIQUE must always allocate an extra series so that it can hash its comparisons and avoid N**2 performance.
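
To make the trade-off concrete, here is a rough sketch of duplicate removal that allocates no auxiliary series and therefore falls back to a linear search for every element, giving the N**2 behavior mentioned above. `dedupe-naive` is a hypothetical name, and the sketch assumes a block of simple values such as integers; it uses FIND's comparison rules rather than UNIQUE's.

```rebol
; Hypothetical quadratic variant: no extra series, no hashing.
dedupe-naive: func [
    "Removes duplicates from a block in place (O(N**2) sketch)."
    blk [block!] "Block of simple values (modified)"
    /local pos
][
    pos: blk
    while [not tail? pos] [
        ; Search only the part of the block that precedes the current position.
        either find/part/only blk first pos pos [
            remove pos      ; seen before: drop it, pos now refers to the next value
        ][
            pos: next pos   ; first occurrence: keep it and move on
        ]
    ]
    blk
]

probe dedupe-naive [1 2 3 1]    ; [1 2 3]
```

Each element triggers a scan of everything kept so far, which is cheap for short blocks and expensive for long ones.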


Rebolbot commented on Apr 28, 2010:

Submitted by: BrianH

Then the advantage of making DEDUPLICATE native would be to avoid allocating an extra series, but only when the series would be small enough that it wouldn't matter as much to memory usage whether it gets allocated or not. It sounds like we can get by with the mezzanine that is in the example code above.


Rebolbot commented on Nov 8, 2010:

Submitted by: Ladislav

"This would be useful for occasions when you don't want the series copy overhead..." - I cannot help but say that this proposal is based on an error, so it should be dismissed.

Rebolbot commented on Nov 9, 2010:

Submitted by: BrianH

The mezzanine version of this is useful on its own, even if not all of its potential benefits are possible. The creation of this ticket was prompted by a real-world need for such a function in the R3 GUI project - other projects could likely benefit from this as well. If not accepted into R3 itself it would still be a useful addition to a community library.

Changed the category to Mezzanine and tweaked the code comments.


Rebolbot commented on Jan 28, 2011:

Submitted by: Ladislav

"The creation of this ticket was prompted by a real-world need for such a function in the R3 GUI project" - I do not think this is accurate enough.

Rebolbot commented on Jan 28, 2011:

Submitted by: BrianH

Nope, it's accurate. Henrik needed something like this for the R3 GUI. That doesn't mean that it needs to be a mezzanine, just that the function was needed. Coincidentally, Maxim needs something like this for Glass. Guess GUIs tend to have multiple references to blocks that need updating.


Rebolbot commented on Jan 31, 2011:

Submitted by: Ladislav

I am questioning the "real-world need", which was not demonstrated in a satisfactory way. The fact that somebody mentions such a need using quite unrelated arguments is not acceptable for me.

For the same reason it is not acceptable for me to see any "real world need" if somebody mentions "the need" several times.

To elaborate further: what bothers me is the arguments "proving" the need are totally unrelated to the subject, and therefore not convincing at all.


Rebolbot commented on Jan 31, 2011:

Submitted by: BrianH

There is a need for such a function (it's been requested independently more than once with different names, including UNIQUE and REMOVE-DUPLICATES). There is such a function (above, in the example code) and it can be used by the people who need it. There, problem solved. So the question is whether the solution would be better made available by adding the function to a community library, or better by making the function mezzanine.

I don't need such a function and if I do I can just copy it from this ticket, and you clearly don't need such a function, so our votes count accordingly. I am perfectly OK with there being a high bar for a function getting added to the mezzanines; R3 is modular for that reason among others. The fact that there is currently not yet such a community library is not really a problem unless we delete this ticket; we can make such a library later. This ticket can even be dismissed and the problem will still be solved just by it still existing for future reference.


Rebolbot commented on Feb 1, 2011:

Submitted by: Ladislav

"if I do I can just copy it from this ticket" - yes, certainly you can. What bothers me is the fact that the proponents of such a function are trying to prove that the above possible implementation does not satisfy the need they have (since it uses the auxiliary storage).

Rebolbot commented on Feb 1, 2011:

Submitted by: BrianH

I don't count such arguments as "requests" for this function. The only ones I count are those that actually need the block to be modified, such as in cases where there is more than one reference to the same block and all expect to see the updated version, whether or not there is an auxiliary series allocated. Henrik and Maxim both requested this function (with different names) for that reason, for example.

For those who think this will save on temporary series allocations, please read Carl's comment above.


Rebolbot added the Type.wish label on Jan 12, 2016



Oldes commented Mar 30, 2020

I've included it in the above commit (just without bitset! and typeset! support, which does not make sense since those types cannot contain any duplicates).

Oldes closed this as completed on Mar 30, 2020