
Feat/multi content authoring #3366

Closed
wants to merge 23 commits into from


barthc
Contributor

@barthc barthc commented Mar 3, 2020

Closes #716

Introduced two new config options: a top-level locales option

locales:
  - en
  - fr

and a collection-level i18n_structure option (originally named multi_content) with three possible values:

  • single_file: saves all translated content inside a single file, in a format like this:
en:
  title: a-night-not-like-the-others
  content: some english content
fr:
  title: une-nuit-pas-comme-les-autres
  content: some french content
  • locale_file_extensions (originally same_folder): saves the content in multiple files, post.en.md and post.fr.md
  • locale_folders (originally diff_folder): saves the content in multiple files, en/post.md and fr/post.md
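To make the three i18n_structure values concrete, here is a minimal TypeScript sketch of how one entry could map to file paths in each mode. The helper name localeFilePath, its parameters and the content/posts folder are hypothetical and chosen only for illustration; this is not the PR's implementation.

```typescript
// Illustrative sketch only: maps one entry to its file path under each
// i18n_structure value. Function and parameter names are hypothetical.
type I18nStructure = "single_file" | "locale_file_extensions" | "locale_folders";

function localeFilePath(
  structure: I18nStructure,
  folder: string,
  slug: string,
  locale: string,
  extension = "md",
): string {
  switch (structure) {
    case "single_file":
      // All locales share one file, so the locale does not appear in the path.
      return `${folder}/${slug}.${extension}`;
    case "locale_file_extensions":
      // The locale becomes part of the extension: post.en.md, post.fr.md
      return `${folder}/${slug}.${locale}.${extension}`;
    case "locale_folders":
      // Each locale gets its own folder: en/post.md, fr/post.md
      return `${folder}/${locale}/${slug}.${extension}`;
  }
}

// Usage: list the distinct files a two-locale entry would touch in each mode.
const structures: I18nStructure[] = ["single_file", "locale_file_extensions", "locale_folders"];
for (const structure of structures) {
  const paths = ["en", "fr"].map((locale) => localeFilePath(structure, "content/posts", "post", locale));
  console.log(structure, Array.from(new Set(paths)));
}
```

In single_file mode both locales resolve to the same path, which is why the usage loop de-duplicates the list.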

Editor UI for multiple content currently looks like this:
[Screenshot: multi-content]

@barthc barthc requested a review from a team March 3, 2020 11:58
@barthc barthc force-pushed the feat/multi-content-authoring branch from 4a1dc92 to fa18ab6 March 4, 2020 08:51
@erezrokah erezrokah self-requested a review March 4, 2020 16:57
@erezrokah
Contributor

@barthc This is a very good step forward. Will have more time to review it tomorrow.

@joallard

joallard commented Mar 6, 2020

Thanks for taking this up! A few comments on the design:

Data model: Seeing the data model here of storing multiple translations of each record (post: {fr: {...attrs}, en: {...}}) instead of storing translation in fields (post: {title: {fr, en}, ...}), how do you handle the case of non-translated attributes, eg date? Or any other non-localized field?

Config naming: I would rename the setting multi_content to i18n_structure, and options same_folder to locale_file_extensions; diff_folder to locale_folders. (Or similar naming that communicates function more clearly)

Scrollbar: I'm seeing a scrollbar in the middle. This poses a UX issue: I wouldn't like to have to scroll twice each time I scroll to somewhere. Keep in mind people will most often want to see the same items at the same height with the least hassle. That's probably a logistical constraint that got you there, though. Is there a way to have the same scroll for both panes? (Not inside of textareas, just overall with the form)

@barthc
Contributor Author

barthc commented Mar 6, 2020

Scrollbar: I'm seeing a scrollbar in the middle. This poses a UX issue: I wouldn't like to have to scroll twice each time I scroll to somewhere. Keep in mind people will most often want to see the same items at the same height with the least hassle. That's probably a logistical constraint that got you there, though. Is there a way to have the same scroll for both panes? (Not inside of textareas, just overall with the form)

Both scroll bars work in sync; it's using react-scroll-sync.

Data model: Seeing the data model here of storing multiple translations of each record (post: {fr: {...attrs}, en: {...}}) instead of storing translation in fields (post: {title: {fr, en}, ...}), how do you handle the case of non-translated attributes, eg date? Or any other non-localized field?

If we use multiple files, should the date value, for example, be present in all the locale files, or in just a single locale file?

@erezrokah
Contributor

erezrokah commented Mar 6, 2020

Had a chance to play with it a little bit.
Some notes:

  1. We need to add local backend support for this (currently broken).
  2. When scroll sync is active only one scroll bar should be shown.
  3. Agree with @joallard on the naming.
  4. Had a weird issue when syncing data; it looks like changing the locale doesn't reset the markdown body:
    [Screenshot: data_sync]
  5. Do we have an example for a static site generator that uses the single file approach? I know Hugo supports multiple files or multiple directories. Would be nice to see how data should be structured in a single file (by locale or by field).
  6. I think we would still want to support path with this feature. Could we add a locale template variable and identify path configurations that are missing it? Possibly add the variable to the path if missing.
  7. With lists - should we sync additions/deletions of items? Meaning should adding an item to a specific locale list add an item to the other locales lists as well? Looks like we should, especially when scroll sync is active.
  8. Type checks are failing on this PR (it was missing from our CI - ci: add missing types check #3384).
  9. Would be nice to show some indication for a translated post on the collection list view.
  10. Missing translations should not prevent saving an entry, assuming translation is an incremental process (I might be completely wrong about it, though).
  11. locale-codes adds 29.7K gzipped. Can we avoid it or just have a hardcoded list?
  12. Infer non-translatable fields (like date, and possibly configured ones) and don't show the UI for them, or show them disabled.
  13. Possibly a dirty indicator per language, as mentioned in Support for multilingual content authoring #716 (comment). Not a must for this PR.
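Item 6 could look roughly like the sketch below, assuming a {{locale}} template token (the token name and both helpers are hypothetical, not existing netlify-cms APIs): a path template that lacks the token gets it added, as a folder prefix for locale_folders or as a suffix for locale_file_extensions, and placeholders are then substituted.

```typescript
// Hypothetical sketch of item 6: the "{{locale}}" token and both helpers are
// assumptions for illustration only.
const LOCALE_TOKEN = "{{locale}}";

type MultiFileStructure = "locale_file_extensions" | "locale_folders";

// Add the locale token to a `path` template that is missing it.
function ensureLocaleInPath(pathTemplate: string, structure: MultiFileStructure): string {
  if (pathTemplate.includes(LOCALE_TOKEN)) return pathTemplate;
  return structure === "locale_folders"
    ? `${LOCALE_TOKEN}/${pathTemplate}` // e.g. {{locale}}/{{slug}}
    : `${pathTemplate}.${LOCALE_TOKEN}`; // e.g. {{slug}}.{{locale}}
}

// Substitute {{name}} placeholders; unknown names are left untouched.
function renderPath(pathTemplate: string, vars: Record<string, string>): string {
  return pathTemplate.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) => vars[name] ?? match);
}

// Usage: a configured path without the token gets it added before rendering.
const template = ensureLocaleInPath("{{year}}/{{slug}}", "locale_folders");
console.log(renderPath(template, { year: "2020", slug: "post", locale: "fr" }));
```

Leaving unknown placeholders in place, rather than blanking them, makes a misconfigured path easy to spot.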

Will add comments on the code as well and do some more testing.
Again, I appreciate the contribution, and I think we're not that far from completing this very important feature.

Contributor

@erezrokah erezrokah left a comment


@barthc, only managed to go through part of the backend code (didn't do the editor code yet).
Will finish reviewing next week.

packages/netlify-cms-backend-bitbucket/src/API.ts (outdated)
packages/netlify-cms-core/src/__tests__/backend.spec.js (outdated)
packages/netlify-cms-core/src/actions/config.js (outdated)
packages/netlify-cms-core/src/backend.ts (outdated)
packages/netlify-cms-core/src/backend.ts (outdated)
@erezrokah
Contributor

10. Missing translations should not prevent saving an entry, assuming translation is an incremental process (I might be completely wrong about it, though).

Not sure whether we should have a default locale that throws on missing required fields while other locales don't, or just allow missing fields in translation mode. Maybe @joallard can help here.

@joallard

joallard commented Mar 7, 2020

  10. Missing translations should not prevent saving an entry, assuming translation is an incremental process (I might be completely wrong about it, though).

That's right! I might not even want localizations for a particular entry. There are a lot of different use cases in I18n, so I'd propose we keep as much flexibility as realistically possible.

Data model

If we are doing multiple files, the date value, for example, is supposed to be present in all the locale files, or just in a single locale file?

These are excellent questions, and I feel like we'll have to balance an incremental first step against defining a data model that will stay backwards-compatible with future UX improvements.

Entries have non-localizable fields/data, and localizable fields/localized data.

What's tricky with the conceptual model is that it is two-dimensional. It makes sense to enumerate the fields an entry has, and ask whether each is localizable, required, etc. It also makes sense to enumerate localizations, and ask whether each is blank, complete, dirty, etc.

Entries can be compiled into localized representations by combining non-localizable fields with localized ones (which are stored in different locations). In the abstract, we might call entry.representation(locale), returning a Representation, or representation.toHtml(). A representation is the resource that would get served over HTTP as this-post.fr.html or /es/this-post.html.
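A minimal sketch of that abstraction, assuming an in-memory entry that keeps non-localizable data separate from per-locale data. The Entry shape and the representation function are hypothetical, not netlify-cms types.

```typescript
// Hypothetical in-memory model; Entry, shared and localizations are
// illustrative names only.
type Fields = Record<string, unknown>;

interface Entry {
  shared: Fields; // non-localizable data, e.g. date
  localizations: Record<string, Fields>; // localized data keyed by locale
}

// Compile the representation for one locale: non-localizable fields combined
// with that locale's localized fields. Returns undefined for a missing locale.
function representation(entry: Entry, locale: string): Fields | undefined {
  if (!Object.prototype.hasOwnProperty.call(entry.localizations, locale)) return undefined;
  // Spread order means a localized field wins if a name appears in both.
  return { ...entry.shared, ...entry.localizations[locale] };
}

const post: Entry = {
  shared: { date: "2020-03-17" },
  localizations: { en: { title: "The Name" }, fr: { title: "Le Nom" } },
};
console.log(representation(post, "fr"));
```

Serving this-post.fr.html would then amount to rendering representation(post, "fr").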

How does this fit in with our data model:

One file per Entry (single_file)

In single_file (treating the file as a big JSON object), each file is an Entry. This should correspond to our internal model of an Entry. We can figure out a neutral place for non-localized data. All good.

Since the file format will be hard to change in the future, I think we should examine the pros and cons of putting field or localization as the first and second dimensions. Should we have entry.field.localization or entry.localization.field?

Making the case for attributes, rather than locales, as the first dimension/nesting level:

  • It makes the read/write strategy simple
  • It keeps localized fields closer together in the file (good for manual editing)
  • It provides for a neutral place to put non-localized data (entry.field)

However:

  • Asking whether a localization is present/dirty can be more complex (query localizable fields, see which keys are present in each, then union; add dirty_locales metadata)
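As a rough illustration of that extra cost, here is a sketch of the "which localizations are present" query against a field-first layout. It assumes the list of localizable fields is known from configuration (a hypothetical parameter) and treats empty strings as untranslated.

```typescript
// Field-first layout: each localizable field holds an object keyed by locale.
// The localizableFields parameter is hypothetical; the data alone cannot tell
// a localized object apart from an ordinary nested object.
type Fields = Record<string, unknown>;

function isFilled(value: unknown): boolean {
  return value !== undefined && value !== null && value !== "";
}

function presentLocales(entry: Fields, localizableFields: string[]): string[] {
  const found = new Set<string>();
  for (const field of localizableFields) {
    const value = entry[field];
    if (typeof value !== "object" || value === null) continue;
    // Union of the locale keys holding non-empty content for this field.
    for (const [locale, text] of Object.entries(value as Fields)) {
      if (isFilled(text)) found.add(locale);
    }
  }
  return Array.from(found).sort();
}

const entry: Fields = {
  date: "2020-03-17",
  name: { en: "The Name", fr: "Le Nom" },
  summary: { en: "A summary", de: "" },
};
console.log(presentLocales(entry, ["name", "summary"]));
```

A dirty_locales list would still need separate metadata, since presence alone says nothing about whether a translation is stale.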

In a database-store context, this was the concept behind trasto, which came with the benefit of not needing distinct thing_translations tables.

(It uses attribute names being suffixed with _i18n to know which field is localizable.)

Compare:

a = {
  date: "",
  locale_independent_id: "foo",
  name: {
    en: "The Name",
    fr: "Le Nom"
  }
}


b = {
  date: "",
  locale_independent_id: "foo",
  localizations: {
    fr: {
      name: "Le Nom"
    },
    en: {
      name: "The Name"
    }
  }
}
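Going from b to a is mechanical as long as field names don't collide. The sketch below is a hypothetical helper that assumes no localized field shares a name with a shared top-level field, and converts the locale-first layout into the field-first one.

```typescript
// Convert layout b (locale first) into layout a (field first).
// Assumes no localized field shares a name with a shared top-level field.
type Fields = Record<string, unknown>;

interface LocaleFirst {
  localizations: Record<string, Fields>;
  [shared: string]: unknown;
}

function toFieldFirst(entry: LocaleFirst): Fields {
  const { localizations, ...shared } = entry;
  const result: Fields = { ...shared };
  for (const [locale, fields] of Object.entries(localizations)) {
    for (const [field, value] of Object.entries(fields)) {
      // Reuse the per-locale object created for an earlier locale, if any.
      const existing = result[field];
      const byLocale: Fields =
        typeof existing === "object" && existing !== null ? (existing as Fields) : {};
      byLocale[locale] = value;
      result[field] = byLocale;
    }
  }
  return result;
}

const b: LocaleFirst = {
  date: "",
  locale_independent_id: "foo",
  localizations: {
    fr: { name: "Le Nom" },
    en: { name: "The Name" },
  },
};
console.log(toFieldFirst(b));
```

The reverse direction needs the list of localizable fields, which is the same information the "present locales" query depends on.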

One file per localization (locale_file_extensions, locale_folders)

If we have one file per localization, it means we need to infer the Entry data from those. We might need to establish a LocalizationFile <=> Entry logic.

We'll need to know in which file(s) we're reading non-localizable data. The CMS will also need to know where to write new data. This points to the need for each entry to be able to point to a canonical location (file) to read/write non-localizable data.

(I think that having non-localizable data in multiple files would open the door to conflicts and needless complexity)

(This has implications for partial translations. What if the presence of .en is enforced site-wide, making .en the canonical file, but I only wanted .fr? Then .en would need to exist for my entry to be valid, or to be created at all.)

Under that mode, "Non-localized data is always in the same locale" (A) and "Entries can be created in any one locale" (B) are mutually exclusive.

Options:

  • Site-wide 'default' locale (A) (downside: one language enforced)
  • One of the files has canonical: true or another distinguishing feature, indicating it is the place for non-localized data (B) (downside: the location of non-localized data is unknown until all entry.localizations are read)
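The trade-off between the two options shows up when locating the canonical file. In this sketch, LocalizationFile and the canonical key are hypothetical: option A simply looks up the default locale and finds nothing for an entry that exists only in .fr, while option B handles that entry but has to scan the localizations.

```typescript
// Hypothetical types: one parsed file per localization.
type Fields = Record<string, unknown>;

interface LocalizationFile {
  locale: string;
  data: Fields;
}

// Option A: the site-wide default locale's file is canonical. Returns
// undefined when the entry exists only in other locales.
function canonicalByDefaultLocale(
  files: LocalizationFile[],
  defaultLocale: string,
): LocalizationFile | undefined {
  return files.find((file) => file.locale === defaultLocale);
}

// Option B: whichever file carries `canonical: true`. Every localization may
// have to be read before the canonical one is known.
function canonicalByFlag(files: LocalizationFile[]): LocalizationFile | undefined {
  return files.find((file) => file.data.canonical === true);
}

const frOnly: LocalizationFile[] = [
  { locale: "fr", data: { canonical: true, date: "2020-03-17", title: "Le Nom" } },
];
console.log(canonicalByDefaultLocale(frOnly, "en")); // undefined: no .en file
console.log(canonicalByFlag(frOnly)?.locale);
```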

Validity

In an I18n context, what is the meaning of validity?

Seems like validity can be assessed on representations rather than entries. (The contrary would mean that only non-localized data can be validated, which is an interesting idea.)

The most general interpretation seems to be that once published, non-blank/complete representations need to be valid.

If we're assuming an incremental process, validations should only be run on publication (not drafts). However, that doesn't seem to be how NetlifyCMS works at present. Let's assume it has to be valid on creation.

Options for testing validity:

  • the canonical representation needs to be valid
  • any non-blank representation needs to be valid
  • no localized fields can have validations (that would certainly be simple)
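The first two options can be sketched as a single validator. Names are hypothetical throughout; required and localizedFields stand in for whatever the collection config would provide. Under "canonical" only the canonical locale is checked and it must exist; under "non_blank" every representation with any localized content is checked and blank ones are skipped.

```typescript
// Hypothetical validator for the first two validity options.
type Fields = Record<string, unknown>;
type Strategy = "canonical" | "non_blank";

interface ValidationOptions {
  strategy: Strategy;
  canonicalLocale: string;
  required: string[];
  localizedFields: string[];
}

const isEmpty = (value: unknown): boolean =>
  value === undefined || value === null || value === "";

// A representation is blank when none of its localized fields has content.
function isBlank(rep: Fields, localizedFields: string[]): boolean {
  return localizedFields.every((field) => isEmpty(rep[field]));
}

// Returns the missing required fields per locale; an empty object means valid.
function validate(reps: Record<string, Fields>, opts: ValidationOptions): Record<string, string[]> {
  const errors: Record<string, string[]> = {};
  if (opts.strategy === "canonical" && !Object.prototype.hasOwnProperty.call(reps, opts.canonicalLocale)) {
    // The canonical representation must exist, so every required field is missing.
    errors[opts.canonicalLocale] = opts.required.slice();
  }
  for (const [locale, rep] of Object.entries(reps)) {
    const mustBeValid =
      opts.strategy === "canonical"
        ? locale === opts.canonicalLocale
        : !isBlank(rep, opts.localizedFields);
    if (!mustBeValid) continue;
    const missing = opts.required.filter((field) => isEmpty(rep[field]));
    if (missing.length > 0) errors[locale] = missing;
  }
  return errors;
}

const reps: Record<string, Fields> = {
  en: { date: "2020-03-17", title: "Post title", body: "Post content" },
  fr: { date: "2020-03-17", title: "", body: "" }, // not started yet
  de: { date: "2020-03-17", title: "", body: "Inhalt" }, // partially translated
};
const base = { canonicalLocale: "en", required: ["date", "title"], localizedFields: ["title", "body"] };
console.log(validate(reps, { ...base, strategy: "canonical" }));
console.log(validate(reps, { ...base, strategy: "non_blank" }));
```

The third option corresponds to keeping required and localizedFields disjoint, so that only non-localized data is ever validated.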

Not sure whether we should have a default locale that throws on missing required fields while other locales don't, or just allow missing fields in translation mode. Maybe @joallard can help here.

Right. The meaning of a "default" locale here would be that we're enforcing the validity of a specific locale representation (i.e. setting default: en site-wide really means English entries should always be present and valid). The decision of whether to enforce the validity of any specific locale (even for drafts) should ideally be left to the user, even if that array of enforced locales is empty.

The reason I want to avoid setting a default locale is that it would obligate records to always be present and complete in that language.

I notice that we're essentially trying to algorithmically figure out which localizations the user was attempting to fill in completely, and which ones are meant to be incomplete. This is where tradeoffs are made between UX and incremental code (and backwards-compatibility).


I'm hoping this helps and clarifies rather than complicates, and that it's not too long-winded. Happy to discuss!

@barthc
Contributor Author

barthc commented Mar 9, 2020

Options:
- Site-wide 'default' locale (A) (downside: one language enforced)
- One of the files has canonical: true or another distinguishing feature, indicating it is the place for non-localized data (B) (downside: the location of non-localized data is unknown until all entry.localizations are read)

I think we should go with the first option and allow users to configure a default locale per collection.

@erquhart
Contributor

@barthc thanks so much for all of your work on this, and to @joallard for providing such solid product guidance! This PR provides a great place to do some POC (proof-of-concept) work while discussing potential solutions.

I'd like to set expectations here, so we're all on the same page, and of course, open these expectations up to be challenged:

  • Because we only have early, rough requirements in Support for multilingual content authoring #716, this PR has to be about defining the feature as much as it is about building the feature. At this early stage, it's probably even more about defining than building.
  • We need to push toward answers for the many questions raised, and discuss product considerations like those raised by @joallard until a path forward is agreed upon.
  • We should all understand that, as long awaited as this feature is, we're not racing toward the finish, but building with intentionality toward a sufficient feature launch.
  • Finally, please know that this functionality is a top current priority - the insistence on "defining" will not equate to this feature being put off any further.

So glad to see all of this moving forward - thanks again to everyone making it happen. We need a whole lot of commenting, responding, and understanding to make this feature a success for the community - let's keep the conversation alive!

@barthc
Contributor Author

barthc commented Mar 16, 2020

This is what we currently have: a default_locale config option that lets each collection define a default locale. The default locale is used to generate the entry slug/filename (identifier_field: 'en.title', for example) and is also the locale in which non-translatable fields are stored. So for locale_file_extensions or locale_folders mode, assuming the default locale is en, we would have files like these:

filename: post-title.en.md or en/post-title.md
---
title: 'post title'
date: '2020-3-17'
---
Post content

filename: post-title.fr.md or fr/post-title.md
---
title: "titre de l'article"
---
Publier un contenu

And then for single_file mode:

filename: post-title.md
en:
  title: 'post title'
  date: '2020-3-17'
  content: 'Post content'
fr:
  title: "titre de l'article"
  content: 'Publier un contenu'
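A minimal sketch of the behaviour described above, assuming the editor holds one data object per locale. splitByLocale and entrySlug are hypothetical names, and the slug rule is a simplified stand-in for the CMS's own slugification: non-translatable fields are kept only under the default locale, and the slug comes from the default locale's title.

```typescript
// Hypothetical helpers illustrating default_locale; not the PR's code.
type Fields = Record<string, unknown>;

// Keep non-translatable fields only in the default locale's data.
function splitByLocale(
  data: Record<string, Fields>,
  defaultLocale: string,
  nonTranslatable: string[],
): Record<string, Fields> {
  const out: Record<string, Fields> = {};
  for (const [locale, fields] of Object.entries(data)) {
    const kept: Fields = {};
    for (const [key, value] of Object.entries(fields)) {
      if (locale !== defaultLocale && nonTranslatable.includes(key)) continue;
      kept[key] = value;
    }
    out[locale] = kept;
  }
  return out;
}

// Derive the slug from the default locale's identifier field, in the spirit
// of identifier_field: 'en.title'. Simplified slugify: lowercase, non
// alphanumerics collapsed to "-", leading/trailing dashes trimmed.
function entrySlug(data: Record<string, Fields>, defaultLocale: string, identifierField = "title"): string {
  const raw = String(data[defaultLocale]?.[identifierField] ?? "");
  return raw
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

const editorState: Record<string, Fields> = {
  en: { title: "post title", date: "2020-3-17", body: "Post content" },
  fr: { title: "titre de l'article", date: "2020-3-17", body: "Publier un contenu" },
};
const files = splitByLocale(editorState, "en", ["date"]);
console.log(entrySlug(editorState, "en"));
console.log(files);
```

For single_file, the resulting object could be written as-is, since it is already keyed by locale; for the other two modes, each key becomes its own file.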

@erezrokah erezrokah added the type: feature code contributing to the implementation of a feature and/or user facing functionality label Apr 29, 2020
@reimertz

@barthc @joallard, I just want to say, amazing work so far!

We are using netlify-cms for our content edits but are in the process of starting to add translations, so this will add so much value.

I am more than glad to try and help out here to get this to a mergeable state.

What are the current blockers apart from merge conflicts? How can I help?

@barthc
Contributor Author

barthc commented May 27, 2020

@erquhart @erezrokah This PR is long overdue. I suggest we move this forward (beta as usual). We can still add more formats and fix bugs as users start experimenting with it. And speaking of betas, some of our betas should be graduating 😀

@erezrokah
Contributor

Sorry for taking so long to respond. We have a PR #3716 that refactors the backends code and will make it much easier to handle multiple data files per entry.
Once we have that in place I’ll do some digging into what’s left to push this forward.

@signalwerk

I really appreciate the initiative here for i18n! I would just like to give some input on the naming. I think the concept of i18n or multilingual is too narrow. I would suggest calling it dimension.

Explanation

I live in Switzerland. We have four official languages:

  • de_CH – German
  • fr_CH – French
  • it_CH – Italian
  • rm_CH – Romansh

Our biggest trading partner is Germany (language: German). Germany uses the currency EUR, Switzerland uses CHF.
So many sites basically have two dimensions: one is the language and the other is the currency. We use the same texts for Germany and the German-speaking part of Switzerland but different prices, and within Switzerland the same prices but different texts.

I'm not asking to solve this complex situation here. I'm just asking not to name it i18n, multilingual or locales, but to call it dimensions. I borrowed that term from NEOS CMS.

@barthc barthc force-pushed the feat/multi-content-authoring branch from 23744e5 to bc9dcab June 24, 2020 14:28
@joallard

Interesting thought from @signalwerk. I also think it’s important when differentiating (for lack of a better term) locales. Side note: I could research this further, but ‘locale’ seems an appropriate term for ‘fr-CH’ here, since e.g. French is a language, but Swiss French would be a locale.

Speaking from my experience as a Quebec French speaker, it sure does annoy me when content localized for French uses France/Europe French regionalisms. To some people’s dismay, France French is not the ‘correct’ or even canonical way to speak the language. It is merely a regional version, as valid as the others. Ideally, all localizations would be international and neutral; alas, that’s not always possible. From my understanding, the same is true of other languages and locales; off the top of my head, Spanish (contrast es-MX vocabulary with es-ES, es-CO, es-AR, and so on) and Portuguese (pt-PT/pt-BR) show these patterns.

My main point being: it is important to integrate into the conceptual model of locales that there is no one canonical form of a language, and space must be available at some point for regional variants. Those regional variants should be able to make exceptions to, or extend, the base locale/language rather than completely copy it over.
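One way to model "extend rather than copy" is a fallback chain, sketched below under the assumption that locale keys use - or _ as separators (fallbackChain and resolveField are hypothetical names). A regional variant such as fr-CA stores only its exceptions, and any other field falls back to the base fr.

```typescript
// Hypothetical sketch: regional variants store only their exceptions.
type Fields = Record<string, unknown>;

// "fr-CA" -> ["fr-CA", "fr"]; "zh-Hant-TW" -> ["zh-Hant-TW", "zh-Hant", "zh"].
function fallbackChain(locale: string): string[] {
  const chain = [locale];
  let current = locale;
  let cut = Math.max(current.lastIndexOf("-"), current.lastIndexOf("_"));
  while (cut > 0) {
    current = current.slice(0, cut);
    chain.push(current);
    cut = Math.max(current.lastIndexOf("-"), current.lastIndexOf("_"));
  }
  return chain;
}

// Return the most specific value available for a field, or undefined.
function resolveField(localizations: Record<string, Fields>, locale: string, field: string): unknown {
  for (const candidate of fallbackChain(locale)) {
    const value = localizations[candidate]?.[field];
    if (value !== undefined) return value;
  }
  return undefined;
}

const localizations: Record<string, Fields> = {
  fr: { greeting: "Bonjour", weekend: "week-end" }, // base language
  "fr-CA": { weekend: "fin de semaine" }, // Quebec exception only
};
console.log(resolveField(localizations, "fr-CA", "weekend"));
console.log(resolveField(localizations, "fr-CA", "greeting"));
```

Note that the base fr here is treated as the shared, neutral layer rather than as France French, in line with the point above.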

As to currencies, I think i18n has been used that way in other systems, but this seems out of scope as far as i18n is concerned. As a software matter, currency seems completely orthogonal to me.

It’s interesting Stefan talks about dimensions here, because currency and locale indeed seem independent ‘dimensions’, but not in the way you mean 😅

It might still be good to think about what, e.g., supporting multiple currencies on a site would mean for the conceptual model. We could probably generalize I18n into ‘dimensions’ indeed. I think it’s a bit premature to integrate that into the code at this point, but it’s good to keep in mind.

@signalwerk

@joallard, as explained before, I'm not asking to

integrate that into the code at this point

I'm just asking for independent naming that isn't tied to a language, region or anything else arbitrary. The code can stay the same.

It's a general problem that people want different «variations» (dimensions), no matter whether they are currencies, languages, regions, target groups, countries, …

@blackb1rd
Contributor

blackb1rd commented Jun 25, 2020

Hi,

I'm using the Hugo static site generator and I believe the locales should not be a hardcoded list.

If we take a look at https://gohugo.io/content-management/multilingual/#translation-of-strings:

From Hugo 0.31 you no longer need to use a valid language code. It can be anything.

See gohugoio/hugo#3564

And in Hugo's i18n_test.go file at commit 23ba779, you will see

lang: "klingon"

which is not a Unicode CLDR locale code (like en, fr, cn, gb, etc.).

I'm not sure about other static site generators, but it's better to avoid a hardcoded list, since users may need es-ES, es-CO, es-AR for specific locales.

@barthc barthc force-pushed the feat/multi-content-authoring branch from 40ada3a to d5816d7 July 7, 2020 13:17
@barthc
Contributor Author

barthc commented Jul 7, 2020

I'm not sure about other static site generator but it's better to avoid the hardcode and user may need es-ES, es-CO, es-AR for specific locales.

@blackb1rd the files will be inferred from the locales value. For this initial release, I suggest we constrain the locale values; we will allow more flexibility in later updates.

@erezrokah I have rebased this PR and made the necessary changes.

@erezrokah
Contributor

Thanks @barthc, I'll review it tomorrow

@signalwerk

@barthc just to understand it (no judgement):

A – are you saying that restricting people to locales defined in a hardcoded list is simpler, code-wise, than giving people the option to define them themselves?
B – you don't like the idea of calling it something more generic, so people can then decide to use that function however they like?

Don't get me wrong. I really appreciate your work and I think it's really great someone is taking care of it! Love it!

@barthc barthc force-pushed the feat/multi-content-authoring branch from c92be7c to 0f230d4 August 1, 2020 21:18
@erezrokah
Contributor

A long overdue update on this PR.

I've created a new branch off this PR (so I won't need to force push to this one), with various fixes for issues I encountered during my testing and review process.

Once I get my branch to a stable state we can sync this one with the changes.

Since the current configuration approach involves several different configurations (locales, i18n_structure, default_locale, translatable, duplicate) some at the root level, some at the collection level and some at the field level I might try to simplify it.

Thanks @barthc, @joallard and everyone for the amazing work.

@joallard

joallard commented Aug 5, 2020

@erezrokah Sounds good! (As a matter of project management: I've been reading Shape Up lately and have been obsessed with it ever since! At this point in the project, it sounds like a good idea to break it into pieces you can be sure to integrate and get done.) Keep it up!

@marcotuna

I've been trying the current solution but it still needs some polish here and there.
There are some things that really annoy me:

  • Having to set the translatable flag on every field
  • Preview is neither enabled nor implemented for multiple languages
  • The default language is set based on the order of declaration (the first on the list) in config.yml

@erezrokah erezrokah mentioned this pull request Aug 11, 2020
@erezrokah
Contributor

erezrokah commented Aug 16, 2020

This PR work is now carried on under #4139.
Again, thanks for everyone involved - we're getting very close to merging and releasing it.
WIP docs are here

@erezrokah erezrokah closed this Aug 16, 2020
@martinjagodic martinjagodic deleted the feat/multi-content-authoring branch April 28, 2023 13:34
Labels
type: feature code contributing to the implementation of a feature and/or user facing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for multilingual content authoring
10 participants