Skip to content

Commit

Permalink
Fix part of #4562: Add string consistency tool & CI check (#4563)
Browse files Browse the repository at this point in the history
## Explanation
Fixes part of #4562.

This PR introduces two new scripts:
- One for computing how many strings are not yet translated for Arabic, Brazilian Portuguese, and Swahili (as compared to the base English strings)
- One for verifying newline consistency between the English & non-English strings (based on number of lines)

The latter was also added as one of the jobs to be run during the static checks CI workflow, and is specifically useful to ensure newlines weren't inadvertently added by translators (see the updated translation strings for an idea on how often this has happened). Note that, as part of fixing these strings, the source string for lets_get_started was updated to no longer include a newline. This seems reasonable given that we should never use newlines for spacing/styling unless it's for logical splits (like paragraphs). However, I can't verify whether actual style changes are needed since this string is used as part of the unreleased app walkthrough feature. I expect that we'll re-audit the UI of that feature when we revisit it.

This PR also includes a major performance fix for RepositoryFile that will benefit all checks that use it. Kotlin's built-in file tree walker is built to implicitly follow symlinks on the filesystem. For Bazel builds, this expands the codebase to >1M files (since directory filtering happens *after* the files are known). Since scripts don't actually need to follow symlinks, the utility was updated to specifically not follow them (using Java NIO's file tree walk routine) which led to an observed ~10x performance improvement. Further optimizations could be done by building our own tree-walker and filtering directories during search (which will probably be necessary if we ever _do_ need to follow symlinks, but it probably won't be needed for a long time since the current performance is quite solid).

The former script is useful when manually auditing the translation progress using the codebase as the source of truth rather than Translatewiki. For now, it's been helpful in beta MR1 planning but longer-term it may also be useful as a badge on the repo's README.

Finally, the Kotlin stdlib dependency was updated to point to jdk8 instead of jdk7 (since a specific feature was needed from the former, and the codebase can rely on higher Java SDK versions than 7 given that the minimum Java version is 1.8).

## Essential Checklist
- [x] The PR title and explanation each start with "Fix #bugnum: " (If this PR fixes part of an issue, prefix the title with "Fix part of #bugnum: ...".)
- [x] Any changes to [scripts/assets](https://github.com/oppia/oppia-android/tree/develop/scripts/assets) files have their rationale included in the PR explanation.
- [x] The PR follows the [style guide](https://github.com/oppia/oppia-android/wiki/Coding-style-guide).
- [x] The PR does not contain any unnecessary code changes from Android Studio ([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#undo-unnecessary-changes)).
- [x] The PR is made from a branch that's **not** called "develop" and is up-to-date with "develop".
- [x] The PR is **assigned** to the appropriate reviewers ([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#clarification-regarding-assignees-and-reviewers-section)).

## For UI-specific PRs only
N/A -- this is a developer infrastructure-only PR, except for all of the string newline fixes (which just result in a more correct UI for non-English), and the changed English string (which is inaccessible to demonstrate). Screenshots could be uploaded, but the newlines need to be removed regardless so it doesn't seem specifically useful here (plus those languages may not be obvious to verify for reviewers).

Commits:

* Create dedicated alpha application component.

This simplifies application component management significantly and
allows individual build flavors to have their own unique module lists.

* Add beta & GA update notices.

This also introduces dedicated beta & GA build flavors which is a
necessary prerequisite.

It also introduces an extra beta, alpha, and dev mode labels for the splash
screen (the latter 2 were extra) with 2 second minimum wait timers for
beta and alpha to ensure they are seen. A 5-second safety timer was
added to ensure the splash screen can always be passed even if something
goes wrong at the domain level (since there are now quite a few moving
pieces to determine the user's current onboarding state).

* Add build tests for the new beta & GA flavors.

* Fix broken test per earlier changes.

* Fix general broken tests & builds.

Tests broken due to changes to the app startup experience haven't yet
been fixed.

* Lint fixes.

* First part of adding tests for GA notices.

There's a bunch left to do here, this is mainly needed so that I can
transfer changes to a different machine.

* Update TransformAndroidManifestTest.kt

Correct typos.

* Fix tests & static checks.

This also removes temporary debug code and TODOs, and finishes the tests
for SplashActivity.

* Post-merge fixes.

* Test fixes.

* Fix Gradle test.

* Add some string resource checks/tools.

Also, fixes major performance issue with all file-based CI checks.

* Ensure newline consistency in translated strings.

Also, fix reporting in new validation check script.

* Add tests & fix static checks.

* Follow-up adjustments after self-review.
  • Loading branch information
BenHenning authored Sep 8, 2022
1 parent d22f502 commit 34f9b16
Show file tree
Hide file tree
Showing 17 changed files with 1,266 additions and 133 deletions.
5 changes: 5 additions & 0 deletions .github/workflows/static_checks.yml
Original file line number Diff line number Diff line change
Expand Up @@ -180,6 +180,11 @@ jobs:
gh issue list --limit 2000 --repo oppia/oppia-android --json number > $(pwd)/open_issues.json
bazel run //scripts:todo_open_check -- $(pwd) scripts/assets/todo_open_exemptions.pb open_issues.json
- name: String Resource Validation Check
if: always()
run: |
bazel run //scripts:string_resource_validation_check -- $(pwd)
# Note that caching is intentionally not enabled for this check since licenses should always be
# verified without any potential influence from earlier builds (i.e. always from a clean build to
# ensure the results exactly match the current state of the repository).
Expand Down
10 changes: 5 additions & 5 deletions app/src/main/res/values-ar/strings.xml

Large diffs are not rendered by default.

238 changes: 119 additions & 119 deletions app/src/main/res/values-sw/strings.xml

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion app/src/main/res/values/strings.xml
Original file line number Diff line number Diff line change
Expand Up @@ -426,7 +426,7 @@
<string name="what_do_you_want_to_learn">What do you want to learn?</string>
<!-- WalkthroughEndFragment -->
<string name="great">Great</string>
<string name="lets_get_started">Let’s get \nstarted.</string>
<string name="lets_get_started">Let’s get started.</string>
<string name="yes">Yes</string>
<string name="no">No…</string>
<string name="pick_a_different_topic">Pick a \ndifferent topic.</string>
Expand Down
14 changes: 14 additions & 0 deletions scripts/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,20 @@ kt_jvm_binary(
runtime_deps = ["//scripts/src/java/org/oppia/android/scripts/xml:xml_syntax_check_lib"],
)

kt_jvm_binary(
name = "string_language_translation_check",
testonly = True,
main_class = "org.oppia.android.scripts.xml.StringLanguageTranslationCheckKt",
runtime_deps = ["//scripts/src/java/org/oppia/android/scripts/xml:string_language_translation_check_lib"],
)

kt_jvm_binary(
name = "string_resource_validation_check",
testonly = True,
main_class = "org.oppia.android.scripts.xml.StringResourceValidationCheckKt",
runtime_deps = ["//scripts/src/java/org/oppia/android/scripts/xml:string_resource_validation_check_lib"],
)

TEST_FILE_EXEMPTION_ASSETS = generate_test_file_assets_list_from_text_protos(
name = "test_file_exemption_assets",
test_file_exemptions_name = ["test_file_exemptions"],
Expand Down
3 changes: 3 additions & 0 deletions scripts/src/java/org/oppia/android/scripts/common/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -53,4 +53,7 @@ kt_jvm_library(
testonly = True,
srcs = ["RepositoryFile.kt"],
visibility = ["//scripts:oppia_script_library_visibility"],
deps = [
"//third_party:org_jetbrains_kotlin_kotlin-stdlib-jdk8_jar",
],
)
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
package org.oppia.android.scripts.common

import java.io.File
import java.nio.file.Files
import java.nio.file.Path
import kotlin.streams.asSequence

/** Helper class for managing & accessing files within the project repository. */
class RepositoryFile() {
Expand Down Expand Up @@ -41,7 +44,10 @@ class RepositoryFile() {
expectedExtension: String = "",
exemptionsList: List<String> = listOf<String>()
): List<File> {
return File(repoPath).walk().filter { file ->
// Note that Files.walk() is used instead of Kotlin's walk() function since the latter follows
// symbolic links which is almost 10x slower than not following them (due to very deep Bazel
// build directories), and it's not necessary to follow the symlinks.
return Files.walk(File(repoPath).toPath()).asSequence().map(Path::toFile).filter { file ->
val isProhibited = checkIfProhibitedFile(retrieveRelativeFilePath(file, repoPath))
!isProhibited &&
file.isFile &&
Expand Down
32 changes: 31 additions & 1 deletion scripts/src/java/org/oppia/android/scripts/xml/BUILD.bazel
Original file line number Diff line number Diff line change
@@ -1,10 +1,40 @@
"""
Libraries corresponding to XML syntax based check to ensure that all the XML files in the codebase
are syntactically correct.
are syntactically correct and consistent.
"""

load("@io_bazel_rules_kotlin//kotlin:kotlin.bzl", "kt_jvm_library")

kt_jvm_library(
name = "string_language_translation_check_lib",
testonly = True,
srcs = ["StringLanguageTranslationCheck.kt"],
visibility = ["//scripts:oppia_script_binary_visibility"],
deps = [
":string_resource_parser",
],
)

kt_jvm_library(
name = "string_resource_parser",
testonly = True,
srcs = ["StringResourceParser.kt"],
visibility = ["//scripts:oppia_script_test_visibility"],
deps = [
"//scripts/src/java/org/oppia/android/scripts/common:repository_file",
],
)

kt_jvm_library(
name = "string_resource_validation_check_lib",
testonly = True,
srcs = ["StringResourceValidationCheck.kt"],
visibility = ["//scripts:oppia_script_binary_visibility"],
deps = [
":string_resource_parser",
],
)

kt_jvm_library(
name = "xml_syntax_error_handler",
testonly = True,
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
package org.oppia.android.scripts.xml

import java.io.File

/**
* Script for checking if all strings have been translated across all supported languages.
*
* Usage:
* bazel run //scripts:string_language_translation_check -- <path_to_directory_root>
*
* Arguments:
* - path_to_directory_root: directory path to the root of the Oppia Android repository.
*
* Example:
* bazel run //scripts:string_language_translation_check -- $(pwd)
*/
fun main(vararg args: String) {
require(args.isNotEmpty()) {
"Expected: bazel run //scripts:string_language_translation_check -- <repo_path>"
}

// Path of the repo to be analyzed.
val repoPath = "${args[0]}/"

val parser = StringResourceParser(File(repoPath))
val baseTranslations = parser.retrieveBaseStringNames()
val missingTranslations = parser.retrieveAllNonEnglishTranslations().mapValues { (_, xlations) ->
baseTranslations - xlations.strings.keys
}
val missingTranslationCount = missingTranslations.values.sumOf { it.size }
println("$missingTranslationCount translation(s) were found missing.")
if (missingTranslationCount > 0) {
println()
println("Missing translations:")
missingTranslations.forEach { (language, translations) ->
if (translations.isNotEmpty()) {
println("${language.name} (${translations.size}/$missingTranslationCount):")
translations.forEach { translation ->
println("- $translation")
}
println()
}
}
}
}
109 changes: 109 additions & 0 deletions scripts/src/java/org/oppia/android/scripts/xml/StringResourceParser.kt
Original file line number Diff line number Diff line change
@@ -0,0 +1,109 @@
package org.oppia.android.scripts.xml

import org.oppia.android.scripts.common.RepositoryFile
import org.w3c.dom.Node
import org.w3c.dom.NodeList
import java.io.File
import javax.xml.parsers.DocumentBuilderFactory

/**
* Parser and processor for all UI-facing string resources, for use in validation and analysis
* scripts.
*
* @property repoRoot the root of the Oppia Android repository being processed
*/
class StringResourceParser(private val repoRoot: File) {
private val translations by lazy { parseTranslations() }
private val documentBuilderFactory by lazy { DocumentBuilderFactory.newInstance() }

/** Returns the [StringFile] corresponding to the base (i.e. untranslated English) strings. */
fun retrieveBaseStringFile(): StringFile = translations.getValue(TranslationLanguage.ENGLISH)

/** Returns the [Set] of all string keys contained within the base strings file. */
fun retrieveBaseStringNames(): Set<String> = retrieveBaseStringFile().strings.keys

/**
* Returns a map of all [StringFile]s (keyed by their [StringFile.language]) which represent
* actual translations (i.e. all non-base files--see [retrieveBaseStringFile] for the base
* strings).
*/
fun retrieveAllNonEnglishTranslations(): Map<TranslationLanguage, StringFile> =
translations.filter { (language, _) -> language != TranslationLanguage.ENGLISH }

private fun parseTranslations(): Map<TranslationLanguage, StringFile> {
// A list of all XML files in the repo to be analyzed.
val stringFiles = RepositoryFile.collectSearchFiles(
repoPath = repoRoot.absolutePath,
expectedExtension = ".xml"
).filter {
it.toRelativeString(repoRoot).startsWith("app/") && it.nameWithoutExtension == "strings"
}.associateBy {
checkNotNull(it.parentFile?.name?.let(::findTranslationLanguage)) {
"Strings file '${it.toRelativeString(repoRoot)}' does not correspond to a known language:" +
" ${it.parentFile?.name}"
}
}.toSortedMap() // Sorted for consistent output.
val expectedLanguages = TranslationLanguage.values().toSet()
check(expectedLanguages == stringFiles.keys) {
"Missing translation strings for language(s):" +
" ${(expectedLanguages - stringFiles.keys).joinToString() }"
}
return stringFiles.map { (language, file) ->
language to StringFile(language, file, file.parseStrings())
}.toMap()
}

private fun File.parseStrings(): Map<String, String> {
val manifestDocument = documentBuilderFactory.parseXmlFile(this)
val stringsElem = manifestDocument.getChildSequence().single { it.nodeName == "resources" }
val stringElems = stringsElem.getChildSequence().filter { it.nodeName == "string" }
return stringElems.associate {
checkNotNull(it.attributes.getNamedItem("name")?.nodeValue) to checkNotNull(it.textContent)
}
}

/**
* The language given strings have been translated to/are being represented in.
*
* @property valuesDirectoryName the name of the resource values directory that is expected to
* contain a strings.xml file for strings related to this language
*/
enum class TranslationLanguage(val valuesDirectoryName: String) {
/** Corresponds to Arabic (ar) translations. */
ARABIC(valuesDirectoryName = "values-ar"),

/** Corresponds to Brazilian Portuguese (pt-rBR) translations. */
BRAZILIAN_PORTUGUESE(valuesDirectoryName = "values-pt-rBR"),

/** Corresponds to English (en) translations. */
ENGLISH(valuesDirectoryName = "values"),

/** Corresponds to Swahili (sw) translations. */
SWAHILI(valuesDirectoryName = "values-sw");
}

/**
* A record of a specific set of translations corresponding to one language.
*
* @property language the language of this string file
* @property file the direct [File] to the strings.xml containing the translations
* @property strings a map with keys of string names and values of the actual strings retrieved
* from the strings.xml file
*/
data class StringFile(
val language: TranslationLanguage,
val file: File,
val strings: Map<String, String>
)

private companion object {
private fun DocumentBuilderFactory.parseXmlFile(file: File) = newDocumentBuilder().parse(file)

private fun Node.getChildSequence() = childNodes.asSequence()

private fun NodeList.asSequence() = (0 until length).asSequence().map(this::item)

private fun findTranslationLanguage(valuesDirectoryName: String) =
TranslationLanguage.values().find { it.valuesDirectoryName == valuesDirectoryName }
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
package org.oppia.android.scripts.xml

import org.oppia.android.scripts.xml.StringResourceParser.StringFile
import org.oppia.android.scripts.xml.StringResourceParser.TranslationLanguage
import java.io.File

/**
* Script for validating consistency between translated and base string resources.
*
* Usage:
* bazel run //scripts:string_resource_validation_check -- <path_to_directory_root>
*
* Arguments:
* - path_to_directory_root: directory path to the root of the Oppia Android repository.
*
* Example:
* bazel run //scripts:string_resource_validation_check -- $(pwd)
*/
fun main(vararg args: String) {
require(args.isNotEmpty()) {
"Expected: bazel run //scripts:string_resource_validation_check -- <repo_path>"
}

// Path of the repo to be analyzed.
val repoPath = "${args[0]}/"
val repoRoot = File(repoPath)

data class Finding(val language: TranslationLanguage, val file: File, val errorLine: String)
val parser = StringResourceParser(repoRoot)
val baseFile = parser.retrieveBaseStringFile()
val otherTranslations = parser.retrieveAllNonEnglishTranslations()
val inconsistencies = otherTranslations.entries.fold(listOf<Finding>()) { errors, entry ->
val (_, translatedFile) = entry
errors + computeInconsistenciesBetween(baseFile, translatedFile).map { line ->
Finding(translatedFile.language, translatedFile.file, line)
}
}.groupBy(keySelector = { it.language to it.file }, valueTransform = { it.errorLine })

if (inconsistencies.isNotEmpty()) {
println("${inconsistencies.size} language(s) were found with string consistency errors.")
println()

inconsistencies.forEach { (context, errorLines) ->
val (language, file) = context
println(
"${errorLines.size} consistency error(s) were found for ${language.name} strings (file:" +
" ${file.toRelativeString(repoRoot)}):"
)
errorLines.forEach { println("- $it") }
println()
}
throw Exception("STRING RESOURCE VALIDATION CHECKS FAILED")
} else println("STRING RESOURCE VALIDATION CHECKS PASSED")
}

private fun computeInconsistenciesBetween(
baseFile: StringFile,
translatedFile: StringFile
): List<String> {
val commonTranslations = baseFile.strings.intersectWith(translatedFile.strings)

// Check for inconsistent newlines post-translation.
return commonTranslations.mapNotNull { (stringName, stringPair) ->
val (baseString, translatedString) = stringPair
val baseLines = baseString.split("\\n")
val translatedLines = translatedString.split("\\n")
return@mapNotNull if (baseLines.size != translatedLines.size) {
"string $stringName: original translation uses ${baseLines.size} line(s) but translation" +
" uses ${translatedLines.size} line(s). Please remove any extra lines or add any that are" +
" missing."
} else null // The number of lines match.
}
}

private fun Map<String, String>.intersectWith(other: Map<String, String>) =
keys.intersect(other.keys).associateWith { getValue(it) to other.getValue(it) }
35 changes: 34 additions & 1 deletion scripts/src/javatests/org/oppia/android/scripts/xml/BUILD.bazel
Original file line number Diff line number Diff line change
@@ -1,10 +1,43 @@
"""
Tests corresponding to XML syntax based check to ensure that all the XML files in the codebase are
syntactically correct.
syntactically correct and consistent.
"""

load("@io_bazel_rules_kotlin//kotlin:kotlin.bzl", "kt_jvm_test")

kt_jvm_test(
name = "StringLanguageTranslationCheckTest",
srcs = ["StringLanguageTranslationCheckTest.kt"],
deps = [
"//scripts/src/java/org/oppia/android/scripts/xml:string_language_translation_check_lib",
"//testing:assertion_helpers",
"//third_party:com_google_truth_truth",
"//third_party:org_jetbrains_kotlin_kotlin-test-junit",
],
)

kt_jvm_test(
name = "StringResourceParserTest",
srcs = ["StringResourceParserTest.kt"],
deps = [
"//scripts/src/java/org/oppia/android/scripts/xml:string_resource_parser",
"//testing:assertion_helpers",
"//third_party:com_google_truth_truth",
"//third_party:org_jetbrains_kotlin_kotlin-test-junit",
],
)

kt_jvm_test(
name = "StringResourceValidationCheckTest",
srcs = ["StringResourceValidationCheckTest.kt"],
deps = [
"//scripts/src/java/org/oppia/android/scripts/xml:string_resource_validation_check_lib",
"//testing:assertion_helpers",
"//third_party:com_google_truth_truth",
"//third_party:org_jetbrains_kotlin_kotlin-test-junit",
],
)

kt_jvm_test(
name = "XmlSyntaxCheckTest",
srcs = ["XmlSyntaxCheckTest.kt"],
Expand Down
Loading

0 comments on commit 34f9b16

Please sign in to comment.