squeaky-clean: change tasks to not include unicode handling #2049

Closed
3 tasks
jaywritescode opened this issue Oct 23, 2021 · 19 comments
Labels
help wanted Extra attention is needed x:action/improve Improve existing functionality/content x:knowledge/elementary Little Exercism knowledge required x:module/concept-exercise Work on Concept Exercises x:size/small Small amount of work x:type/content Work on content (e.g. exercises, concepts)

Comments

@jaywritescode

jaywritescode commented Oct 23, 2021

I noticed in the squeaky-clean problem, there's a test as follows:

//  test/java/SqueakyCleanTest.java

    @Test
    public void string_with_no_letters() {
        assertThat(SqueakyClean.clean("\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00")).isEmpty();
    }

However, there's no corresponding instruction to remove "non-standard" characters from the input string, so the test suite defines a different spec than the instructions.

I think the intent of the test is to remove any non-alphanumeric character or underscore from the input string, but I personally feel going too far into the details of Unicode (i.e. what is a "character" anyway?) distracts from the purpose of the exercise and can be discouraging. Perhaps the instructions can be clarified or the test can be removed or ignored.

Tasks

After some discussion (see the comments below), it was agreed to modify the exercise so that it does not include any Unicode handling. Here are the tasks to do this:

  • Update the current tasks and their examples to not include non-ASCII characters.
  • Change the tests so that they do not use non-ASCII characters either.
  • Remove the final task concerning Greek letters.

Contributing to this task

  • If you'd like to contribute to this task, make a comment below saying that you'd like to work on this issue.
  • After that, feel free to open a PR fixing the issue. Don't forget to link the PR to this issue.

@kotp
Member

kotp commented Oct 27, 2021

When reading the instructions it states:

A valid SqueakyClean name is comprised of zero or more letters and underscores.

This tells me that it is comprised of zero or more letters and underscores. This tells me that it does not contain anything other than letters and underscores.

@jaywritescode
Author

@kotp — I agree with your point and ultimately the test suite defines the specs.

But the intent of the exercise is to teach someone new to Java, and possibly new to programming, about string manipulation, and the details of Unicode distract from that instruction. For example, grokking 'g' < 'v' is much more straightforward than grokking 'Ψ' < '😀'.
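A minimal sketch of why the second comparison is harder to reason about, using only the standard library. The class and method names are hypothetical and are not part of the exercise. 'g', 'v' and 'Ψ' each fit in a single char, but 😀 (U+1F600) does not, so it cannot be written as a char literal and has to be compared by code point.

```java
class CharComparison {
    // Compares the first code points of two non-empty strings, so that a
    // supplementary character such as an emoji is treated as one unit.
    static boolean firstCodePointLess(String a, String b) {
        return a.codePointAt(0) < b.codePointAt(0);
    }

    public static void main(String[] args) {
        // Plain char comparison works for characters that fit in one char.
        System.out.println('g' < 'v');                      // true

        String psi = "\u03A8";         // Greek capital letter psi
        String grin = "\uD83D\uDE00";  // grinning face emoji, U+1F600
        System.out.println(grin.length());                  // 2
        System.out.println(firstCodePointLess(psi, grin));  // true
    }
}
```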

@kotp
Member

kotp commented Oct 28, 2021

I do not have the final say, but I think the test makes sense. I am not sure about changing the written specification (the description), though.

The concept taught is char and so "What is a character anyway?" is one of the questions that hopefully is answered by this lesson.

I would also say that once you grok <, every possible example of something < something_else is equally easy to grok.

@njhanley

njhanley commented Nov 2, 2021

This undocumented test is part of a larger issue: the stated goal, written tasks, hints, and tests all seem to disagree on what we're trying to accomplish. If the purpose of clean is to produce strings composed of zero or more letters and underscores, why don't we simply strip the other characters? What is the purpose of the replacements? Why is isWhitespace recommended when we're only instructed to replace spaces? Why remove Greek letters when "àḃç" is passed through unaltered?
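To illustrate two of these mismatches, here is a small sketch using only standard `Character` methods. The class name and the `isPlainSpace` helper are hypothetical. It shows that `Character.isWhitespace` matches more than the space character, and that `Character.isLetter` accepts accented Latin letters and Greek letters alike.

```java
class LetterAndWhitespaceChecks {
    // True only for the ASCII space character, which is all the written task mentions.
    static boolean isPlainSpace(char c) {
        return c == ' ';
    }

    public static void main(String[] args) {
        // A tab is whitespace, but it is not a space.
        System.out.println(Character.isWhitespace('\t'));  // true
        System.out.println(isPlainSpace('\t'));            // false

        // a-grave, b-with-dot-above, c-cedilla and Greek capital psi:
        // Character.isLetter reports true for every one of them.
        for (char c : "\u00E0\u1E03\u00E7\u03A8".toCharArray()) {
            System.out.println(c + " " + Character.isLetter(c));
        }
    }
}
```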

Moreover, is it really a good idea to introduce Unicode support alongside chars without discussing supplementary characters, especially when there are tests containing surrogates? If "What is a character anyway?" is the question being asked, it isn't being adequately addressed by this exercise. In my opinion, that question is beyond the scope of simple char manipulation.
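As a rough illustration of the surrogate problem, the sketch below inspects the string from the test quoted at the top of this issue. The class and helper names are hypothetical. Each emoji occupies two chars (a high and a low surrogate), and `Character.isLetter(char)` returns false for surrogates, which is probably why a char-by-char solution happens to drop them.

```java
class SurrogateDemo {
    // Counts the chars in s for which Character.isLetter(char) is true.
    static long countLetterChars(String s) {
        return s.chars().filter(c -> Character.isLetter((char) c)).count();
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00";  // three emoji, as in the test

        System.out.println(emoji.length());                              // 6
        System.out.println(emoji.codePointCount(0, emoji.length()));     // 3
        System.out.println(Character.isHighSurrogate(emoji.charAt(0)));  // true
        System.out.println(Character.isLowSurrogate(emoji.charAt(1)));   // true
        System.out.println(countLetterChars(emoji));                     // 0
    }
}
```

The same char-by-char check would also reject supplementary characters that really are letters, which is one reason code points deserve their own treatment.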

My apologies if this is outside the scope of the original issue.

@sonro

sonro commented Nov 24, 2021

If the exercise is to remain in its current state, an additional instruction needs to be added to the README.md. For example: "Omit all other non-alphanumeric characters".

@jmrunkle
Contributor

jmrunkle commented Dec 4, 2021

possibly new to programming

Just an FYI: teaching "new programmers" is not really a goal. We are not trying to teach people new to programming at Exercism. There is (effectively) an expectation that you already understand at least one programming language. Exercism is about teaching fluency, generally so that a programmer in language X can learn language Y and get fluent quickly.

All that being said, the rest of this discussion seems to be somewhat relevant: we appear to be teaching too much at once in this exercise. We probably need to create a separate concept for instruction about things like Unicode. Concept exercises are meant to be trivial for someone fluent in the language: they should be able to write the expected solution (i.e. the exemplar) easily.

@jmrunkle
Contributor

OK, proposal:

  1. we simplify squeaky-clean to literally just teach about basic characters (like the letter "A" or a space " ", etc.)
  2. we add a new concept / exercise for dealing with code points and other fine nuances relating to Unicode

@ericjobrien

Changes to this exercise will be greatly appreciated. This is coming from someone trying to use Exercism to further their knowledge of Java. Upon encountering the squeaky-clean exercise, I almost gave up on using Exercism completely.

@jmrunkle
Contributor

Thanks for the additional insight. Now we just need someone to contribute such a change. Adding the new concept will probably be its own issue. For this one I think it is enough for us to remove the unicode specific stuff from the existing exercise.

@AlbusPortucalis
Contributor

@jmrunkle I can update it after my holidays ;)

@barthon-b

For example: "Omit all other non alphanumeric characters".

One angle that I don't think has been touched on here is that the set of alphanumeric characters in Unicode is massive. I assume we mean Latin alphanumerics, so basically the ASCII subset minus special chars.

Otherwise agree with @jmrunkle on this:

for this one I think it is enough for us to remove the unicode specific stuff from the existing exercise.
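A short sketch of that difference, assuming "Latin alphanumerics" means A-Z, a-z and 0-9. The class name and the `isAsciiLetterOrDigit` helper are hypothetical. The standard `Character.isLetterOrDigit` accepts letters and digits from many scripts, whereas the range checks accept only the ASCII ones.

```java
class AsciiChecks {
    // Hypothetical helper: true only for A-Z, a-z and 0-9.
    static boolean isAsciiLetterOrDigit(char c) {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9');
    }

    public static void main(String[] args) {
        char psi = '\u03A8';          // Greek capital letter psi
        char arabicThree = '\u0663';  // Arabic-Indic digit three

        System.out.println(Character.isLetterOrDigit(psi));          // true
        System.out.println(isAsciiLetterOrDigit(psi));               // false
        System.out.println(Character.isLetterOrDigit(arabicThree));  // true
        System.out.println(isAsciiLetterOrDigit(arabicThree));       // false
        System.out.println(isAsciiLetterOrDigit('q'));               // true
    }
}
```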

@jmrunkle
Contributor

Perhaps even more simply stated as English letters and numbers (and possibly whitespace).

@andrerfcsantos andrerfcsantos added good first issue Good for newcomers x:action/improve Improve existing functionality/content x:knowledge/elementary Little Exercism knowledge required x:module/concept-exercise Work on Concept Exercises x:type/content Work on content (e.g. exercises, concepts) x:size/small Small amount of work and removed action/stale labels Jun 19, 2022
@exercism exercism deleted a comment from github-actions bot Jun 19, 2022
@andrerfcsantos andrerfcsantos changed the title squeaky-clean: test without corresponding instructions squeaky-clean: change tasks to not include unicode handling Jun 19, 2022
@andrerfcsantos
Member

I agree we should change this exercise according to what is discussed above. I updated the title and the description with a list of tasks and added labels to increase the visibility of the issue.

@github-actions
Contributor

This issue has been automatically marked as action/stale because it has not had recent activity. Please update if there are new updates to provide.

@GitteV-2159432

I would like to work on this issue. I have already tried to follow the tasks and change these things in the code, but I don't know whether the changes I made are sufficient and useful.

@sanderploegsma
Contributor

@andrerfcsantos looking at the discussion above, I'm wondering whether it makes sense to keep the task about control characters, or to remove that as well. If the goal of this concept exercise is to give a basic introduction to characters, maybe it's best to focus on the Latin alphabet, numbers, whitespace and punctuation, and leave things like control characters, Unicode etc. for a secondary concept exercise.

@sanderploegsma sanderploegsma added help wanted Extra attention is needed and removed good first issue Good for newcomers action/stale labels Jan 9, 2024
@manumafe98
Contributor

Hi @sanderploegsma, I would like to take on this issue. Is the scope to remove the Unicode and Greek letters, or do you think this needs a complete rework?

@sanderploegsma
Contributor

@manumafe98 sure, go ahead! As I mentioned above, IMO the exercise should only focus on introducing the char type as a concept; it does not have to handle everything there is to know about chars. This can perhaps be covered in another concept ("advanced chars" or something, idk), or it can be covered by one or more practice exercises.

So I'd remove the following aspects from the exercise:

  • Control characters
  • Unicode
  • Greek letters

Looking at the current instructions, that would leave the following tasks:

  1. Replace any spaces encountered with underscores
  2. Convert kebab-case to camelCase
  3. Omit characters that are not letters (where it should focus only on numbers and special characters like punctuation, no emojis or Unicode)
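The three remaining tasks could be sketched roughly as follows. This is not the exercise's exemplar: the class name, the `isAsciiLetter` helper, and the decision to treat only A-Z and a-z as letters are all assumptions made for illustration.

```java
class SqueakyCleanSketch {
    // Hypothetical helper: true only for A-Z and a-z.
    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static String clean(String identifier) {
        StringBuilder result = new StringBuilder();
        boolean upperNext = false;  // set after a hyphen, for kebab-case to camelCase
        for (char c : identifier.toCharArray()) {
            if (c == ' ') {
                result.append('_');           // task 1: spaces become underscores
            } else if (c == '-') {
                upperNext = true;             // task 2: drop the hyphen, capitalise the next letter
            } else if (isAsciiLetter(c)) {
                result.append(upperNext ? Character.toUpperCase(c) : c);
                upperNext = false;
            }
            // task 3: anything else (digits, punctuation, emoji) is omitted
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("my Id"));              // my_Id
        System.out.println(clean("a-bc"));               // aBc
        System.out.println(clean("a1\uD83D\uDE00b"));    // ab
    }
}
```

In this sketch a pending capitalisation survives any omitted characters, so "a-1b" becomes "aB". Whether that is the desired behaviour is a design choice the instructions would need to settle.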
