squeaky-clean: change tasks to not include unicode handling #2049

jaywritescode · 2021-10-23T17:17:36Z

I noticed in the squeaky-clean problem, there's a test as follows:

//  test/java/SqueakyCleanTest.java

    @Test
    public void string_with_no_letters() {
        assertThat(SqueakyClean.clean("\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00")).isEmpty();
    }

However, there's no corresponding instruction to remove "non-standard" characters from the input string, so the test suite defines a different spec than the instructions.

I think the intent of the test is to remove any non-alphanumeric character or underscore from the input string, but I personally feel going too far into the details of Unicode (i.e. what is a "character" anyway?) distracts from the purpose of the exercise and can be discouraging. Perhaps the instructions can be clarified or the test can be removed or ignored.
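To make the mismatch concrete, here is a minimal sketch of why a letters-only filter returns an empty string for the test input above. The class and method names are hypothetical and not part of the exercise; the sketch assumes the filter works one `char` at a time with `Character.isLetter(char)`.

```java
// Hypothetical demo class; not part of the exercise's code.
class EmojiFilterDemo {
    // Keeps only the chars that Character.isLetter(char) accepts.
    static String keepLetters(String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (Character.isLetter(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Three grinning-face emojis, each stored as a surrogate pair.
        String emojis = "\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00";
        System.out.println(emojis.length());               // 6 chars, not 3
        System.out.println(keepLetters(emojis).isEmpty()); // true
    }
}
```

Each emoji occupies two `char` values, and neither half counts as a letter, so the test only passes if the solution drops non-letters, which the instructions never ask for.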

Tasks

After some discussion (see comment below), an agreement was reached to modify the exercise to not include any unicode handling. Here are the tasks to do this:

Update the current tasks and their examples to not include non-ascii characters.
Change the tests to not use non-ascii characters either.
Remove the final task concerning greek letters.

Contributing to this task

If you'd like to contribute to this task, make a comment below saying that you'd like to work on this issue.
After that, feel free to make a PR fixing the issue. Don't forget to link the PR to this issue.


ericbalawejder · 2021-10-27T17:05:16Z

@ystromm

kotp · 2021-10-27T17:49:28Z

When reading the instructions it states:

A valid SqueakyClean name is comprised of zero or more letters and underscores.

This tells me that it is comprised of zero or more letters and underscores, and therefore that it does not contain anything other than letters and underscores.

jaywritescode · 2021-10-28T16:00:07Z

@kotp — I agree with your point and ultimately the test suite defines the specs.

But the intent of the exercise is to teach someone new to Java, and possibly new to programming, about string manipulation, and the details of Unicode distract from that instruction. For example, grokking 'g' < 'v' is much more straightforward than grokking 'Ψ' < '😀'.
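The difference between the two comparisons can be sketched as follows. This is an illustration only, with hypothetical class and method names. Note that, as far as I know, the literal `'😀'` does not compile as a Java `char` at all, because the emoji needs two UTF-16 code units; the sketch therefore reads it from a `String` as a code point.

```java
// Hypothetical demo class; names are illustrative only.
class CharComparisonDemo {
    // Plain char comparison: compares the UTF-16 code unit values.
    static boolean lessThan(char a, char b) {
        return a < b;
    }

    // Reads the first full code point, joining a surrogate pair if present.
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        System.out.println(lessThan('g', 'v'));   // true: 103 < 118

        char psi = '\u03A8';                       // Greek capital psi
        String grin = "\uD83D\uDE00";              // grinning face emoji
        System.out.println((int) psi);             // 936
        System.out.println(grin.length());         // 2 chars for one emoji
        System.out.println(firstCodePoint(grin));  // 128512
        System.out.println(psi < firstCodePoint(grin)); // true, but compares int code points
    }
}
```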

kotp · 2021-10-28T17:14:12Z

I do not have the final say, and I think the test makes sense. But I am not positive about a change to the written specification (the description).

The concept taught is char and so "What is a character anyway?" is one of the questions that hopefully is answered by this lesson.

I also would say grokking < means that all of the examples possible for something < something_else is as easy to grok once you grok <.

njhanley · 2021-11-02T05:10:53Z

This undocumented test is part of a larger issue: the stated goal, written tasks, hints, and tests all seem to disagree on what we're trying to accomplish. If the purpose of clean is to produce strings composed of zero or more letters and underscores, why don't we simply strip the other characters? What is the purpose of the replacements? Why is isWhitespace recommended when we're only instructed to replace spaces? Why remove Greek letters when "àḃç" is passed through unaltered?
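Two of the inconsistencies listed above can be shown in a few lines. This is a rough sketch with hypothetical names, not the exercise's exemplar: it contrasts replacing only the space character with replacing everything `Character.isWhitespace` accepts, and shows that accented letters count as letters.

```java
// Hypothetical demo class; not taken from the exercise.
class WhitespaceDemo {
    // Follows the written task literally: only ' ' becomes '_'.
    static String replaceSpacesOnly(String s) {
        return s.replace(' ', '_');
    }

    // Follows the hint: anything Character.isWhitespace accepts becomes '_'.
    static String replaceAllWhitespace(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(Character.isWhitespace(c) ? '_' : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "my\tId here";
        System.out.println(replaceSpacesOnly(input));     // tab is left in place
        System.out.println(replaceAllWhitespace(input));  // my_Id_here
        System.out.println(Character.isLetter('\u00E0')); // true: 'à' is a letter
    }
}
```

The two methods disagree on tabs and newlines, so a solution that follows the hint produces different output from one that follows the task text.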

Moreover, is it really a good idea to introduce Unicode support alongside chars without discussing supplementary characters, especially when there are tests containing surrogates? If "What is a character anyway?" is the question being asked, it isn't being adequately addressed by this exercise. In my opinion, that question is beyond the scope of simple char manipulation.
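For readers unfamiliar with supplementary characters, here is a small sketch of the problem, under the assumption that a solution iterates over `char` values. The names are hypothetical. It uses U+1D400 (MATHEMATICAL BOLD CAPITAL A), a letter outside the Basic Multilingual Plane.

```java
// Hypothetical demo class; illustrates char-based vs code-point-based counting.
class SupplementaryDemo {
    // Counts letters one UTF-16 code unit at a time.
    static int countLettersByChar(String s) {
        int count = 0;
        for (char c : s.toCharArray()) {
            if (Character.isLetter(c)) {
                count++;
            }
        }
        return count;
    }

    // Counts letters one Unicode code point at a time.
    static int countLettersByCodePoint(String s) {
        return (int) s.codePoints().filter(cp -> Character.isLetter(cp)).count();
    }

    public static void main(String[] args) {
        String boldA = "\uD835\uDC00"; // U+1D400 as a surrogate pair
        System.out.println(boldA.length());                             // 2
        System.out.println(boldA.codePointCount(0, boldA.length()));    // 1
        System.out.println(countLettersByChar(boldA));                  // 0
        System.out.println(countLettersByCodePoint(boldA));             // 1
    }
}
```

A `char`-based loop sees two surrogates and no letters, while a code-point-based loop sees one letter, which is the kind of detail the exercise does not explain.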

My apologies if this is outside the scope of the original issue.

sonro · 2021-11-24T10:50:31Z

If the exercise is to remain in its current state, an additional instruction needs to be added to the README.md. For example: "Omit all other non alphanumeric characters".

jmrunkle · 2021-12-04T19:18:40Z

possibly new to programming

Just an FYI: teaching "new programmers" is not really a goal. We are not trying to teach people new to programming at exercism. There is (effectively) an expectation that you already understand at least one programming language. Exercism is about teaching fluency - generally so that a programmer in language X can learn language Y and get fluent quickly.

All that being said, the rest of this discussion seems to be somewhat relevant: we appear to be teaching too much at once in this exercise. We probably need to create a separate concept for instruction about things like unicode. The concept exercises are meant to be trivial for someone that is fluent in the language to create the expected solution (ie. the exemplar).

jmrunkle · 2021-12-10T03:53:25Z

OK, proposal:

we simplify squeaky-clean to literally just teach about basic characters (like the letter "A" or a space " ", etc)
we add a new concept / exercise for dealing with code points and other fine nuances relating to unicode

ericjobrien · 2021-12-29T15:00:13Z

Changes to this exercise will be greatly appreciated. This is coming from someone trying to use Exercism to further their knowledge of Java. Upon encountering the squeaky-clean exercise, I almost gave up on using Exercism completely.

jmrunkle · 2021-12-29T16:18:04Z

Thanks for the additional insight. Now we just need someone to contribute such a change. Adding the new concept will probably be its own issue, for this one I think it is enough for us to remove the unicode specific stuff from the existing exercise.

AlbusPortucalis · 2021-12-30T09:27:00Z

@jmrunkle I can update it after my holidays ;)

barthon-b · 2021-12-30T19:47:22Z

For example: "Omit all other non alphanumeric characters".

One angle that I don't think has been touched on here is that alphanumeric in unicode is a massive set. I assume we mean Latin alphanumerics, so basically the ASCII subset minus special chars.

Otherwise agree with @jmrunkle on this:

for this one I think it is enough for us to remove the unicode specific stuff from the existing exercise.

jmrunkle · 2021-12-30T19:56:27Z

Perhaps even more simply stated as English letters and numbers (and possibly whitespace).
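One way "English letters and numbers" could be expressed, as a sketch only: the helper name below is hypothetical and the range checks are an assumption about what an ASCII-only rule would look like, not something agreed in this thread.

```java
// Hypothetical helper; shows an ASCII-only check next to the Unicode-aware one.
class AsciiCheckDemo {
    // True only for a-z, A-Z and 0-9.
    static boolean isAsciiLetterOrDigit(char c) {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9');
    }

    public static void main(String[] args) {
        char psi = '\u03A8'; // Greek capital psi
        System.out.println(Character.isLetterOrDigit(psi)); // true: Unicode-aware
        System.out.println(isAsciiLetterOrDigit(psi));      // false: ASCII only
        System.out.println(isAsciiLetterOrDigit('A'));      // true
    }
}
```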

andrerfcsantos · 2022-06-19T10:38:37Z

I agree we should change this exercise according to what is discussed above. I updated the title and the description with a list of tasks and added labels to increase the visibility of the issue.

github-actions · 2022-09-18T04:19:41Z

This issue has been automatically marked as action/stale because it has not had recent activity. Please update if there are new updates to provide.

GitteV-2159432 · 2023-05-06T09:58:48Z

I would like to work on this issue. I have already tried to follow the tasks and change these things in the code. I don't know if the changes that I made are sufficient and useful.


sanderploegsma · 2023-09-21T11:43:03Z

@andrerfcsantos looking at the discussion above, I'm wondering whether it makes sense to keep the task about control characters, or to remove that as well. If the goal of this concept exercise is to give a basic introduction of characters, maybe it's best to focus on the Latin alphabet, numbers, whitespace and punctuation, and leave things like control characters, unicode etc for a secondary concept exercise.

manumafe98 · 2024-01-24T15:45:32Z

Hi @sanderploegsma, I would like to take on this issue. Is the scope to remove the unicode and Greek letters, or do you think this needs a complete reformat?

sanderploegsma · 2024-01-26T08:09:49Z

@manumafe98 sure, go ahead! As I mentioned above, IMO the exercise should only focus on introducing the char type as a concept; it does not have to handle everything there is to know about chars. This can perhaps be covered in another concept ("advanced chars" or something, idk), or it can be covered by one or more practice exercises.

So I'd remove the following aspects from the exercise:

Control characters
Unicode
Greek letters

Looking at the current instructions, that would leave the following tasks:

Replace any spaces encountered with underscores
Convert kebab-case to camelCase
Omit characters that are not letters (where it should focus only on numbers and special characters like punctuation, no emojis or unicode)
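The three remaining tasks could be sketched roughly like this. This is not the exemplar solution or the code from the eventual PR; the class name is hypothetical, and two choices are assumptions on my part: underscores already present in the input are kept, and a hyphen capitalizes the next letter that is kept.

```java
// Hypothetical sketch of the reduced exercise; not the official exemplar.
class SqueakyCleanSketch {
    static String clean(String identifier) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = false; // set after a '-' to build camelCase

        for (char c : identifier.toCharArray()) {
            if (c == ' ') {
                // Task 1: spaces become underscores.
                sb.append('_');
            } else if (c == '-') {
                // Task 2: drop the hyphen and capitalize the next kept letter.
                upperNext = true;
            } else if (c == '_' || Character.isLetter(c)) {
                // Assumption: existing underscores are kept alongside letters.
                sb.append(upperNext ? Character.toUpperCase(c) : c);
                upperNext = false;
            }
            // Task 3: digits, punctuation and anything else are omitted.
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("my   Id")); // my___Id
        System.out.println(clean("a-bc"));    // aBc
        System.out.println(clean("a1$b"));    // ab
    }
}
```

`Character.isLetter` still accepts non-Latin letters, so an exercise limited to the Latin alphabet would need a narrower check or tests that simply avoid such input.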

github-actions bot added the action/stale label Mar 31, 2022

exercism deleted a comment from github-actions bot Jun 19, 2022

andrerfcsantos changed the title ~~squeaky-clean: test without corresponding instructions~~ squeaky-clean: change tasks to not include unicode handling Jun 19, 2022

github-actions bot added the action/stale label Sep 18, 2022

GitteV-2159432 added a commit to GitteV-2159432/java that referenced this issue May 6, 2023

changes tasks in squeaky-clean: change tasks to not include unicode h…

dd611dc

…andling exercism#2049

GitteV-2159432 mentioned this issue May 6, 2023

changed tasks in squeaky-clean: change tasks to not include unicode handling #2049 #2305

Closed

sanderploegsma added the help wanted label and removed the good first issue and action/stale labels Jan 9, 2024

sanderploegsma assigned manumafe98 Jan 26, 2024

manumafe98 mentioned this issue Jan 29, 2024

Updating Squeaky Clean #2664

Merged

manumafe98 closed this as completed Jan 30, 2024
