UHF-8727: Added transliterating for multiple languages. #571
Codecov Report
@@ Coverage Diff @@
## main #571 +/- ##
=========================================
Coverage 12.74% 12.74%
Complexity 236 236
=========================================
Files 30 30
Lines 902 902
=========================================
Hits 115 115
Misses 787 787
Works well, but I'd simplify the swaps a bit to reduce loops and code size.
- Since we lowercase content.textContent before matching, there is no need for uppercase matching. All matching can be combined into the lowercase entries.
- We do not need a separate regex query for each character. We can combine the characters into a single query and reduce the loops considerably. So instead of checking each character that becomes b separately with 6 regex matches, 'b': ['б', 'β', 'ب', 'ဗ', 'ბ', 'b'] (and another 4 for uppercase B, 'B': ['Б', 'Β', 'ब', 'B']), we can check for any of the characters that become b with a single regex, 'b': '[бβبဗბbब]', which combines the lowercase matches of both b and B.
Here is code that does what I described above; please consider using it:
'use strict';
(function (Drupal, once, drupalSettings) {
Drupal.behaviors.table_of_contents = {
attach: function attach() {
function findAvailableId(name, reserved, anchors, count) {
let newName = name;
if (count > 0) { // Only when headings are not unique on page we want to add counter
newName += '-' + count;
}
if (reserved.includes(newName)) {
return findAvailableId(name, reserved, anchors, ++count);
} else if (anchors.includes(newName)) {
if (count === 0) {
count++; // When reserved heading is visible on page, lets start counting from 2 instead of 1
}
return findAvailableId(name, reserved, anchors, ++count);
}
return newName;
}
const anchors = [];
const tableOfContents = document.getElementById('helfi-toc-table-of-contents');
const tableOfContentsList = document.querySelector('#helfi-toc-table-of-contents-list > ul');
const mainContent = document.querySelector('main.layout-main-wrapper');
const reservedElems = document.querySelectorAll('[id]');
const reserved = []; // Let's list current id's here to avoid creating duplicates
reservedElems.forEach(function (elem) {
reserved.push(elem.id);
});
// Exclude elements from TOC that are not content:
// e.g. TOC, sidebar, cookie compliency-banner etc.
const exclusions = '' +
':not(.layout-sidebar-first *)' +
':not(.layout-sidebar-second *)' +
':not(.tools__container *)' +
':not(.breadcrumb__container *)' +
':not(#helfi-toc-table-of-contents *)' +
':not(.embedded-content-cookie-compliance *)' +
':not(.react-and-share-cookie-compliance *)';
const titleComponents = [
'h2'+exclusions,
'h3'+exclusions,
'h4'+exclusions,
'h5'+exclusions,
'h6'+exclusions,
];
const mainLanguages = [
'en',
'fi',
'sv',
];
const swaps = {
'0': '[°₀۰0]',
'1': '[¹₁۱1]',
'2': '[²₂۲2]',
'3': '[³₃۳3]',
'4': '[⁴₄۴٤4]',
'5': '[⁵₅۵٥5]',
'6': '[⁶₆۶٦6]',
'7': '[⁷₇۷7]',
'8': '[⁸₈۸8]',
'9': '[⁹₉۹9]',
'a': '[àáảãạăắằẳẵặâấầẩẫậāąåαάἀἁἂἃἄἅἆἇᾀᾁᾂᾃᾄᾅᾆᾇὰᾰᾱᾲᾳᾴᾶᾷаأအာါǻǎªაअاaä]',
'b': '[бβبဗბbब]',
'c': '[çćčĉċc©]',
'd': '[ďðđƌȡɖɗᵭᶁᶑдδدضဍဒდdᴅᴆ]',
'e': '[éèẻẽẹêếềểễệëēęěĕėεέἐἑἒἓἔἕὲеёэєəဧေဲეएإئe]',
'f': '[фφفƒფf]',
'g': '[ĝğġģгґγဂგگg]',
'h': '[ĥħηήحهဟှჰh]',
'i': '[íìỉĩịîïīĭįıιίϊΐἰἱἲἳἴἵἶἷὶῐῑῒῖῗіїиဣိီည်ǐიइیii̇ϒ]',
'j': '[ĵјჯجj]',
'k': '[ķĸкκقكကკქکk]',
'l': '[łľĺļŀлλلလლlल]',
'm': '[мμمမმm]',
'n': '[ñńňņʼnŋνнنနნn]',
'o': '[óòỏõọôốồổỗộơớờởỡợøōőŏοὀὁὂὃὄὅὸόоوθိုǒǿºოओoöө]',
'p': '[пπပპپp]',
'q': '[ყq]',
'r': '[ŕřŗрρرრr]',
's': '[śšşсσșςسصစſსsŝ]',
't': '[ťţтτțتطဋတŧთტt]',
'u': '[úùủũụưứừửữựûūůűŭųµуဉုူǔǖǘǚǜუउuўü]',
'v': '[вვϐv]',
'w': '[ŵωώဝွw]',
'x': '[χξx]',
'y': '[ýỳỷỹỵÿŷйыυϋύΰيယyῠῡὺ]',
'z': '[źžżзζزဇზz]',
'aa': '[عआآ]',
'ae': '[æǽ]',
'ai': '[ऐ]',
'ch': '[чჩჭچ]',
'dj': '[ђđ]',
'dz': '[џძ]',
'ei': '[ऍ]',
'gh': '[غღ]',
'ii': '[ई]',
'ij': '[ĳ]',
'kh': '[хخხ]',
'lj': '[љ]',
'nj': '[њ]',
'oe': '[öœؤ]',
'oi': '[ऑ]',
'oii': '[ऒ]',
'ps': '[ψ]',
'sh': '[шშش]',
'shch': '[щ]',
'ss': '[ß]',
'sx': '[ŝ]',
'th': '[þϑثذظ]',
'ts': '[цცწ]',
'ue': '[ü]',
'uu': '[ऊ]',
'ya': '[я]',
'yu': '[ю]',
'zh': '[жჟژ]',
'gx': '[ĝ]',
'hx': '[ĥ]',
'jx': '[ĵ]',
};
// Craft table of contents.
once('table-of-contents', titleComponents.join(','), mainContent)
.forEach(function (content) {
let name = content.textContent
.toLowerCase()
.trim();
// To ensure backwards compatibility, this is done only to "other" languages.
if (!mainLanguages.includes(drupalSettings.path.currentLanguage)) {
Object.keys(swaps).forEach((swap) => {
name = name.replace(new RegExp(swaps[swap], 'g'), swap);
});
}
else {
name = name
.replace(/ä/gi, 'a')
.replace(/ö/gi, 'o')
.replace(/å/gi, 'a');
}
name = name.replace(/\W/g, '-').replace(/\s/g, '-').replace(/-(\d+)$/g, '_$1');
let nodeName = content.nodeName.toLowerCase();
if (nodeName === 'button') {
nodeName = content.parentElement.nodeName.toLowerCase();
}
const anchorName = content.id
? content.id
: findAvailableId(name, reserved, anchors, 0);
anchors.push(anchorName);
// Create table of contents if component is enabled.
if (tableOfContentsList && nodeName === "h2") {
let listItem = document.createElement('li');
listItem.classList.add('table-of-contents__item');
let link = document.createElement('a');
link.classList.add('table-of-contents__link');
link.href = '#' + anchorName;
link.textContent = content.textContent.trim();
listItem.appendChild(link);
tableOfContentsList.appendChild(listItem);
}
// Create anchor links.
content.setAttribute('id', anchorName);
});
// Remove loading text.
if (tableOfContents) {
const removeElements = tableOfContents.querySelectorAll('.js-remove');
removeElements.forEach(function (element) {
element.remove();
});
}
},
};
})(Drupal, once, drupalSettings);
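The core of the suggestion is one character class per target letter. The following is a minimal standalone sketch of that idea, not the PR's actual code: it uses a small subset of the swaps table, and the names `compiledSwaps` and `transliterate` are hypothetical. It also compiles each regex once up front instead of once per heading, which is an assumption about what might be worthwhile rather than something the PR does.

```javascript
'use strict';

// Small subset of the swaps table from the review comment above.
// Keys are the ASCII replacements, values are regex character classes.
const swaps = {
  a: '[аα]',
  b: '[бβ]',
  k: '[кκ]',
  n: '[нν]',
  o: '[оο]',
  r: '[рρ]',
  sh: '[ш]',
  ya: '[я]',
};

// Hypothetical optimisation: build each RegExp once, not once per heading.
const compiledSwaps = Object.entries(swaps).map(
  ([target, charClass]) => [new RegExp(charClass, 'g'), target]
);

// Hypothetical helper: lowercase first, so only lowercase classes are needed.
function transliterate(text) {
  let name = text.toLowerCase().trim();
  for (const [pattern, target] of compiledSwaps) {
    name = name.replace(pattern, target);
  }
  return name;
}

console.log(transliterate('Баня')); // "banya"
console.log(transliterate('Шар')); // "shar"
```

One design point to keep in mind: non-numeric object keys are iterated in insertion order, so the single-letter entries run before the multi-letter ones. A character listed under two keys (for example ö under both o and oe in the full table) is consumed by whichever entry comes first, and the later entry never matches it.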
Also, please remember to check linter problems. I think there were some problems from this old piece of code that I probably wrote at some point. Now would be a nice time to fix those issues.
Very good improvements, thanks! 👏 I also meant to do the array structure change, but I got lost trying to figure out the Chinese and totally forgot about it. The changes are applied and I fixed some of the linting errors, but I think some might be left. I couldn't get the linter to work fully, and it might have some wrong configs, so I was a bit careful with the changes.
Kudos, SonarCloud Quality Gate passed! 0 Bugs. No Coverage information.
UHF-8727
In some languages (e.g. Arabic, Ukrainian) the ids of headings become just dashes (------------------).
This doesn't fix the Chinese language, as that is a bit more complicated.
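To illustrate why the ids collapse into dashes: in JavaScript regexes, `\W` matches every character except ASCII letters, digits and underscore, so a heading written entirely in a non-Latin script is replaced character by character. The sketch below reuses the clean-up chain from the table of contents behavior; the helper names `sanitize` and `transliterate` and the four-entry swaps table are hypothetical and exist only for this example.

```javascript
'use strict';

// Clean-up chain as used in the table of contents behavior.
// \W matches everything except [A-Za-z0-9_], including Cyrillic letters.
function sanitize(name) {
  return name.replace(/\W/g, '-').replace(/\s/g, '-').replace(/-(\d+)$/g, '_$1');
}

// Tiny subset of a swaps table, just enough for the example heading.
const swaps = { i: '[и]', n: '[н]', o: '[о]', v: '[в]' };

// Hypothetical helper: apply each swap before the clean-up step.
function transliterate(name) {
  return Object.keys(swaps).reduce(
    (result, target) => result.replace(new RegExp(swaps[target], 'g'), target),
    name
  );
}

const heading = 'Новини'.toLowerCase(); // Ukrainian for "News"

console.log(sanitize(heading)); // "------"
console.log(sanitize(transliterate(heading))); // "novini"
```

Transliterating before the `\W` replacement is what keeps the id readable; without it every heading of the same length in such a language produces the same string of dashes.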
What was done
How to install
git pull origin dev
make fresh
composer require drupal/helfi_platform_config:dev-UHF-8727_automatic-id-tweaks
make drush-cr
How to test