Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hierarchical relations across tags #2435

Merged
merged 12 commits into from
Sep 3, 2020
Merged

Hierarchical relations across tags #2435

merged 12 commits into from
Sep 3, 2020

Conversation

AlexandreMagueresse
Copy link
Contributor

This pull request does not correspond to any specific, but rather is an answer to several issues (#333, #961). This corresponds to my current work for the Kodoeba event.

This contribution aims to provide a way to hierarchically organize tags by allowing tags to have children. For example the tag grammar could have two children tense and aspect, and tense could have past as a child.

I created a new table called TagsLinks that stores the relation between tags. I have not use the Tree behaviour from CakePHP for the only reason that I did not know it existed before implementing, I may consider simplifying my code using this behaviour.

Because I wanted to add autocompletion to the form that enables to specify a new relation between two tags, I made the autocompletion script from sentence/show more general and transformed it into a reusable module.

Finally, I inspired myself from the wall tickets to display the tags arborescence.

@RyckRichards
Copy link
Member

That's really awesome! Thank you so much for that!

By the way, does your "tree system" advice you what is the next tag you could/should use?

Like - I tag a sentence as "past simple". Would it suggest me to tag the person (first, second, third) or even "phrasal verb" or "slang" and/or "spoken language"?

@AlexandreMagueresse
Copy link
Contributor Author

This tag structure is mainly intended to facilitate the exploration/navigation across tags, so that we can have a clearer view of which tags exist and easily find the information we want.

I believe what you mention is out of the focus of my project, but would indeed be so helpful! I think it could be achieved by either rule-based or data-based (correlation) systems. Anyway, this project is a starting point towards tags categorization and consistent management, let's keep dreaming big!

@AndiPersti
Copy link
Contributor

AndiPersti commented Jul 11, 2020

I've taken a first look at your PR and noticed the following:

1 You base your work on a pretty old code version (from March 4th) so I highly recommend to update the code in your dev environment and then rebase your work.
2 Instead of writing SQL files for database changes we nowadays prefer writing them in PHP using the Migrations plugin.
3 Your refactoring of the autocomplete feature removed the ability to use the keyboard for selecting one of the suggestions.
4 The UI needs some improvement 😄 I have all the tags from the production database in my development database and since you currently show all tags on a single page I get a long list of over 7000 tags. Also newly added links are added at the bottom of the list which makes it harder to locate them.

@AlexandreMagueresse
Copy link
Contributor Author

Thank you for your feedback !

  1. Thanks for pointing that out, I'll try and update the code and then rebase my edits.
  2. I do not know what you are referring to: am I changing the structure of any table?
  3. I did not notice that problem, thanks again for pointing it out. I'll try and fix it.
  4. The UI definitely needs refactoring 😄 This is a draft and I did not aim to implement a fully scalable solution, just one that works. I intend to use pagination and reorder the tags according to their creation date.

@AndiPersti
Copy link
Contributor

  1. I do not know what you are referring to: am I changing the structure of any table?

You are creating a new table.
Instead of writing the table definition in a SQL script (docs/database/tables/tags_links.sql) we prefer using CakePHP's Migrations plugin. You can find the migrations files inside config/Migrations, e.g. 20190401052133_CreateExports.php.

@jiru
Copy link
Member

jiru commented Jul 18, 2020

@Ynue Good job with the PR. 👍

There is something I don’t really understand however. Do we have to have a tag just for the sake grouping tags?

For example, the tag grammar could have two children tense and aspect, and tense could have past as a child.

It means we will have tags such as grammar and tense. But what are we going to do with these tags? Are they supposed to be applied to every sentence tagged past? In which case it’s redundant information. Or are they supposed not to be applied to any sentence? In which case they will just clutter the list of tags.

@AlexandreMagueresse
Copy link
Contributor Author

@AndiPersti Thank you for clarifying, I will convert the table creation with the Migration plugin.

@jiru If a sentence is tagged with a certain tag, then it does not need to be tagged by its parents: for example if a sentence is tagged past then it does not need to be tagged past nor grammar. The idea would then be to always tag with the most specific tag(s).

I could have implemented this structure another way, by creating a new entity that would have acted as "parent" tags and only served as containers for "child" tags (sentences would not have been taggable with these containers), but I did not do so for the following reason: there may be some cases where we have a generic parent tag (e.g. animals) with a few child tags (e.g. dogs, cats, ...) and a few sentences related to dolphin but maybe not enough for the tag dolphins to exist. In that case, we could tag these sentence with animals only.

@jiru
Copy link
Member

jiru commented Jul 20, 2020

@Ynue Thanks for clarifying. I see your point. But I think we should handle this case more seamlessly. If I take your example, how are we going to tag sentences about dogs? With both dogs and animals, or dogs only? It would make more sense to tag them with dogs only and make the page that list sentences tagged animals automatically include child tags like dogs. However, I am quite sure tags are going evolve over time. We may very well tag sentences about dolphins with animals now, but one day we may want to tag them with fish instead.

How are we going to change dolphin sentences from animals to fish?
How are we going to make sure that sentences are not tagged with both animals and fish/dog?

@AlexandreMagueresse
Copy link
Contributor Author

@AndiPersti I have solved your first three points and partially the fourth for now (only the ordering) 😃

@jiru Would you suggest creating a separate table that would serve only as container for tags? I indeed now realize it would be more relevant, especially when dealing with tag deletion which is really hard to handle with the structure I implemented. It should not take much time to transition from the current implementation to the one you suggest.

@AlexandreMagueresse
Copy link
Contributor Author

@AndiPersti Thank you again for checking my code! To be sure not to make mistakes anymore, I have used the command line to create migrations. 😃

@jiru I have just recoded everything under a more general and external structure. I created "superTags" that serve as containers for tags, which can contain both tags and superTags. Also, I have adapted the autocompletion module to be more general and applicable onto any table, rather than only on tags (I needed autocompletion for superTags as well).

@jiru
Copy link
Member

jiru commented Jul 31, 2020

@Ynue Thanks for proactively solve the issue I pointed out. I wanted to answer you but I was getting too tired to wrap it up in my mind yesterday. I’m going to review your code.

@jiru
Copy link
Member

jiru commented Aug 7, 2020

@Ynue My apologies for taking so long. Your new implementation solves the problems I mentioned. 👍 But there are still lots of things that can be improved. 🙂

I’ll focus on the database schema because that’s what we need to get right before the rest. I’d like to suggest using the name "tags categories" instead of "super tags". I think naming is important and it’s hard to change it afterwards, so let’s take some time to think about it. In terms of database schema, I suggest the following changes:

  1. Rename table super_tags into tags_categories.
  2. Rename table tags_super_tags into categories_tree.
  3. Rename enum value superTag into category.

For clarity, from now on, I’ll refer to these tables the way I suggested renaming them into. About the field categories_tree.parent, I think you should name it categories_tree.tags_categories_id instead, so that it follows CakePHP naming convention of foreign keys. When you define the table association using ->hasMany(...) or ->belongsTo(...), CakePHP assumes that the foreign key is named like that (if not, you need to actually specify it).

As opposed to that, using the table name categories_tree goes against the CakePHP convention (which would be tags_tags_categories), so you’d have to specify the join table name in the ->belongsToMany() call.

About the child_type field, CakePHP offers a way to link tables according to some conditions. Look for the keyword setConditions in the documentation page about associations. This would allow you to create separate models like, say, SubTagsTable and SubCategoriesTable. You could let them have a common parent class to share some logic. With such a setup, a call to Tags->find()->contain(['TagsCategories' => ['SubTagsTable', 'SubCategoriesTable']]) should return all the childs already separated as sub-tags and sub-categories.

You haven’t used the Tree behavior yet; is there a reason for that? From what I can see, it looks like the Tree behavior allows operations such as counting the total number of childs or getting an arbitrary large subset of the tree in just a single SQL query. I suspect your implementation wouldn’t scale so much on a large tree.

@jiru
Copy link
Member

jiru commented Aug 7, 2020

Also your commit 1786896 adds stuff that was removed a while ago by AndiPersti in a series of commits such as b177e2f.

src/Controller/CategoriesTreeController.php Outdated Show resolved Hide resolved
src/Controller/CategoriesTreeController.php Outdated Show resolved Hide resolved
src/Controller/CategoriesTreeController.php Outdated Show resolved Hide resolved
src/Model/Behavior/AutocompletableBehavior.php Outdated Show resolved Hide resolved
webroot/js/autocompletion.js Outdated Show resolved Hide resolved
src/Model/Table/CategoriesTreeTable.php Outdated Show resolved Hide resolved
src/Model/Table/CategoriesTreeTable.php Outdated Show resolved Hide resolved
src/Model/Table/CategoriesTreeTable.php Outdated Show resolved Hide resolved
src/Model/Table/CategoriesTreeTable.php Outdated Show resolved Hide resolved
src/Model/Table/CategoriesTreeTable.php Outdated Show resolved Hide resolved
@AlexandreMagueresse
Copy link
Contributor Author

@AndiPersti Thank you so much for your thorough review! 👍
I still have not improved the UI to scale to large trees. I wonder whether pagination could be relevant here, closing all branches by default may be enough.

Another possibility the Tree behaviour allows for is handling the position of the leaves of a node, which means we could reorder them, say with arrows somewhere near the names. Is this feature useful?

Copy link
Contributor

@AndiPersti AndiPersti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a problem with the autocomplete feature when I'm selecting the entry with the keyboard (Up and Down keys). The orange background color doesn't get removed from an unselected entry:
autocomplete

src/Template/Element/categories_tree.ctp Outdated Show resolved Hide resolved
webroot/js/categoriesTree.js Outdated Show resolved Hide resolved
webroot/js/categoriesTree.js Outdated Show resolved Hide resolved
@AlexandreMagueresse
Copy link
Contributor Author

There's a problem with the autocomplete feature when I'm selecting the entry with the keyboard (Up and Down keys). The orange background color doesn't get removed from an unselected entry

This problem should be solved now (it was because of a typo).

@AlexandreMagueresse
Copy link
Contributor Author

I changed the way to generate the tree: instead of relying on the treeList find from the TreeBehaviour, I used the threaded find which happens to be more efficient (3426 µs for treeList vs 1686 µs for threaded, averaged on 10 runs on my computer) and enables to sort the tag categories. Tags are now sorted too, under each tag category.

@jiru
Copy link
Member

jiru commented Sep 3, 2020

I think this PR is ready for merge. However, the way we are going to create the tree is still very unclear. Currently, your code gives read-write access to the tree to any advanced user and above, like for tags. But this approach has proven to generate a mess. This is perfect for testing out, but it’s definitely not the way we are going to create the real tree. We need to think this through with the community first.

Just in case, I’d like that we make this clear by making the page unavailable on production. (And also putting the translatable strings in a different domain so that they do not end up on Transifex yet.) So I can merge the PR and do these changes. Is everyone okay with that?

@AndiPersti
Copy link
Contributor

Just in case, I’d like that we make this clear by making the page unavailable on production. (And also putting the translatable strings in a different domain so that they do not end up on Transifex yet.) So I can merge the PR and do these changes. Is everyone okay with that?

Sounds good to me.

@AlexandreMagueresse
Copy link
Contributor Author

It's okay for me! Besides, I believe the translatable strings have to be reviewed in order to better fit to Tatoeba's interface.

@jiru jiru marked this pull request as ready for review September 3, 2020 20:10
jiru added a commit that referenced this pull request Sep 3, 2020
jiru added a commit that referenced this pull request Sep 3, 2020
The way we are going to create the tree is still unclear.
Giving read-write access to the tree to any advanced user
and above, like for tags, is perfect for testing out, but
it’s definitely not the way we are going to create the real
tree. We need to think this through with the community
first.

Refs #2435.
jiru added a commit that referenced this pull request Sep 3, 2020
There is no need to have translators translate these
strings yet. To avoid having these ending up on Transifex,
I put them in a separate domain, just like we currently do
for admin pages.

Refs #2435.
jiru added a commit that referenced this pull request Sep 3, 2020
This page is not ready for production yet.

Refs #2435.
@jiru jiru merged commit 38f1f12 into Tatoeba:dev Sep 3, 2020
@jiru jiru added this to the 2020-09-06 milestone Sep 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants