Lorem EN word.ts contains a single character in the pool #3261

Open
7 of 10 tasks
konarx opened this issue Nov 14, 2024 · 8 comments
Labels
c: bug Something isn't working m: lorem Something is referring to the lorem module p: 1-normal Nothing urgent s: awaiting more info Additional information are requested s: needs decision Needs team/maintainer decision

Comments

@konarx

konarx commented Nov 14, 2024

Pre-Checks

Describe the bug

I was using lorem.word for testing, and I had a failing test. I was confident enough to ping the developer:
- "Hey, you have a bug here."
- "No, you are providing a single character when I expect a word of at least two characters."
- "No, I don't. I use a library that specifically uses random words, not chars."
- "Yes, you are. Here, see the payload you provided yourself."
- "Ah..."
And here I am :) There is an 'a' character here, which is NOT a word, so I do not think it should be in this pool.

Minimal reproduction code

No response

Additional Context

No response

Environment Info

-

Which module system do you use?

  • CJS
  • ESM

Used Package Manager

npm

@konarx konarx added c: bug Something isn't working s: pending triage Pending Triage labels Nov 14, 2024
@konarx konarx changed the title Lorem word.ts contains a single character in the pool Lorem EN word.ts contains a single character in the pool Nov 14, 2024
@ST-DDT
Member

ST-DDT commented Nov 14, 2024

FFR:

The English locale has other one-character (non-lorem) words as well (most prominently I and a).
Do you consider those to be words?
To be clear, I'm not against removing it, I just wish to understand your use case a bit more.
Because if we remove a from the list, then maybe someone else will consider two-letter words to be too short.

If you need words of a certain length, have you tried faker.word.sample({length: { min: 2, max: 1000 }})?
Or do you specifically need a similar feature for the lorem words?
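
For the lorem words, a similar length constraint could be sketched as a filter over the pool. This is only a minimal illustration, not faker's implementation: `loremPool`, `LengthRange`, and `sampleWordByLength` are hypothetical names, and the pool below is a made-up stand-in for the real word list.

```typescript
// Hypothetical stand-in for a lorem word pool; not the actual faker data.
const loremPool: readonly string[] = ["a", "ab", "lorem", "ipsum", "dolor", "sit", "e"];

interface LengthRange {
  min: number;
  max: number;
}

// Pick a random word whose length lies within [min, max] (inclusive).
// `random` returns a float in [0, 1); it is injectable so tests can be deterministic.
function sampleWordByLength(
  pool: readonly string[],
  range: LengthRange,
  random: () => number = Math.random,
): string {
  const candidates = pool.filter(
    (word) => word.length >= range.min && word.length <= range.max,
  );
  if (candidates.length === 0) {
    throw new Error(`No word with length between ${range.min} and ${range.max}`);
  }
  return candidates[Math.floor(random() * candidates.length)];
}

const word = sampleWordByLength(loremPool, { min: 2, max: 1000 });
console.log(word.length >= 2); // true
```

Filtering before sampling keeps the remaining words equally likely, at the cost of throwing when no word in the pool satisfies the range.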

@ST-DDT ST-DDT added p: 1-normal Nothing urgent m: lorem Something is referring to the lorem module s: awaiting more info Additional information are requested labels Nov 14, 2024
@xDivisionByZerox xDivisionByZerox removed the s: pending triage Pending Triage label Nov 14, 2024
@konarx
Author

konarx commented Nov 14, 2024

Do you consider those to be words?

I understand your perspective, but this approach might be a bit abstract. For instance, the character I can also represent a Roman numeral, so it feels more suited to be treated as a character rather than a word in the traditional sense. Personally, I find it a bit misleading to keep single-character elements like I and a in a word pool—they're more accurately handled within a character set or pool rather than a word list.

If you're aiming for control over word length, I’d recommend focusing on ensuring that single characters don’t get pulled into word contexts, rather than adjusting word definitions. That way, words would represent actual terms rather than individual characters.

@ST-DDT
Member

ST-DDT commented Nov 14, 2024

Thanks for sharing your opinion.
This is really useful for understanding the expectations of our users, their thoughts, and their decision-making processes.

Is it possible for you to share

  • the name of the property you are trying to set,
  • its rough definition
  • and why you chose lorem.word over word.sample and random characters?

@ST-DDT
Member

ST-DDT commented Nov 14, 2024

In a sense, these one-character words have found the exact issue they are meant to find.
Namely, they expose differences in how specific terms are understood, and may point to places where the documentation and specification could be improved.

Do you expect to get them when you ask for a word? In this case: No.
And more importantly: Do you think of them when you define the input as "a word"? Do your/our users think of them? Should you/they? How do we communicate that to our respective users?
Most two-letter words aren't any more useful/valid by themselves than one-letter words.

I and a are in the English dictionary, so at least some people consider them to be words.
I'll consult a Latin lexicon later and we will discuss this issue in the next team-meeting.

@ST-DDT ST-DDT added the s: needs decision Needs team/maintainer decision label Nov 14, 2024
@matthewmayer
Contributor

"a" is a valid Latin word, as in "a populo" (by the people), and so is "e" (e pluribus unum).

@ST-DDT
Member

ST-DDT commented Nov 14, 2024

@matthewmayer Would you expect lorem.word() to return these one-letter words?
And what is your opinion regarding a word length parameter?

@matthewmayer
Contributor

In a sense, these one character words have found the exact issue they are meant to find.

I agree with this. Having the one character word led to a conversation between two people which led to a better understanding of what the actual requirements for a parameter were. That's a good thing.

Similarly, having words like jalapeño in the English word list might help uncover a hidden requirement that a "word" is supposed to be ASCII (see #1538).
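
As an illustration of that kind of hidden requirement, a consumer that silently assumes ASCII input could make the assumption explicit with a check like the one below. This is a sketch only; `isAscii` is a hypothetical helper and not part of faker.

```typescript
// Returns true if every UTF-16 code unit of `value` is in the 7-bit ASCII range.
function isAscii(value: string): boolean {
  return /^[\x00-\x7F]*$/.test(value);
}

console.log(isAscii("jalapeno")); // true
console.log(isAscii("jalapeño")); // false
```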

@konarx
Author

konarx commented Nov 14, 2024

"a" is a valid Latin word like "a populo" (by the people) as is "e" (e pluribus unum).

This is probably the most accurate explanation; thank you, @matthewmayer .

In my case, I opted not to use en/word/adjective.ts because I needed to create a simple Label: just a straightforward string, free of special characters, that could serve as an indicator. Since the adjective list includes hyphenated (-) terms like black-and-white and extra-large, it didn’t quite fit my needs. So, avoiding those entries was the better choice for me.
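
A requirement like this one (letters only, at least two characters) could be written down as an explicit filter rather than left implicit in the choice of word pool. The following is a rough sketch under those assumptions: `isPlainLabel` and the `candidates` array are hypothetical, and only black-and-white and extra-large are taken from the adjective list mentioned above.

```typescript
// Hypothetical candidate strings; not an actual faker word list.
const candidates: readonly string[] = ["black-and-white", "extra-large", "bright", "calm", "a"];

// A "plain label" here means: only ASCII letters, and at least `minLength` characters.
function isPlainLabel(value: string, minLength: number = 2): boolean {
  return value.length >= minLength && /^[A-Za-z]+$/.test(value);
}

const labels = candidates.filter((value) => isPlainLabel(value));
console.log(labels); // [ 'bright', 'calm' ]
```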

However, we can all agree that some words and characters overlap categories, which might be a bit confusing. Ideally, each string should fit the closest category. For example, Nick is both a name and a word, but I wouldn't expect to find it in the word pool (it's not there, it's just an example off the top of my head).

Thank you all for the insights and the clarification!

Projects
None yet
Development

No branches or pull requests

4 participants