Google today announced gender-specific translations in Google Translate for English-to-Spanish and for Finnish-, Hungarian-, and Persian-to-English, built on a new paradigm that addresses gender bias by rewriting, or post-editing, the initial translation. The tech giant claims the approach is more scalable than the earlier technique underpinning Google Translate’s gender-specific Turkish-to-English translations, chiefly because it doesn’t rely on a data-intensive gender-neutrality detector.
“We’ve made significant progress since our initial launch by increasing the quality of gender-specific translations and also expanding it to 4 more language-pairs,” wrote Google Research senior software engineer Melvin Johnson. “We are committed to further addressing gender bias in Google Translate and plan to extend this work to document-level translation, as well.”
As Johnson explains, the old approach used for Turkish-to-English gender-specific translations, which relied on a classifier that was laborious to adapt to new languages, produced masculine and feminine translations independently using a neural machine translation (NMT) system. As a result, it couldn’t show gender-specific translations for up to 40% of eligible queries, because the two translations often weren’t exactly equivalent apart from gender-related phenomena.
By contrast, the new rewriting-based method first generates translations and then reviews them to identify instances where a gender-neutral source phrase yielded a gender-specific translation. If so, a sentence-level rewriter produces an alternative gendered translation, and both the first and rewritten translations are checked to ensure gender is the only difference.
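The final check described above, that the two translations differ only in gender, can be illustrated with a minimal sketch. This is not Google's implementation: the word list, the `tokenize` helper, and the `differs_only_in_gender` function are hypothetical names chosen for illustration, and the sketch only handles simple English pronoun swaps.

```python
import re

# Illustrative English pairs only; not Google's actual word list.
_PAIRS = [("he", "she"), ("him", "her"), ("his", "her"),
          ("his", "hers"), ("himself", "herself")]

# Symmetric lookup: each word maps to the set of its opposite-gender forms.
GENDER_SWAPS = {}
for masc, fem in _PAIRS:
    GENDER_SWAPS.setdefault(masc, set()).add(fem)
    GENDER_SWAPS.setdefault(fem, set()).add(masc)


def tokenize(sentence):
    """Lowercase the sentence and split it into words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence.lower())


def differs_only_in_gender(first, rewritten):
    """Return True if the sentences differ, and only by gendered word swaps."""
    a, b = tokenize(first), tokenize(rewritten)
    if len(a) != len(b):
        return False
    swapped = False
    for x, y in zip(a, b):
        if x == y:
            continue
        if y not in GENDER_SWAPS.get(x, ()):
            return False  # a difference that is not a gender swap
        swapped = True
    return swapped


print(differs_only_in_gender("He is a doctor.", "She is a doctor."))
print(differs_only_in_gender("He is a doctor.", "She is a nurse."))
```

A pair that passes this kind of check could be shown side by side as the masculine and feminine options; a pair that fails would fall back to a single translation.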
According to Google, building the rewriter involved generating millions of training examples composed of pairs of phrases, each of which included both masculine and feminine translations. Because the data wasn’t readily available, the Google Translate team had to come up with candidate rewrites by swapping gendered pronouns from masculine to feminine (or the other way around), starting with a large monolingual data set. To this novel corpus of rewrites, they applied an in-house language model trained on millions of English sentences to select the best candidates, which netted training data that went from a masculine input to a feminine output and vice versa.
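The candidate-generation step can be sketched roughly as follows. The swap tables, `candidate_rewrites`, `pick_best`, and especially `toy_score` are assumptions made for illustration; `toy_score` is a trivial placeholder standing in for the in-house language model, whose details the article does not give.

```python
import itertools
import re

# Hypothetical swap tables. Some pronouns are ambiguous ("her" can map to
# "him" or "his"), which is why several candidates are produced and ranked.
FEM_TO_MASC = {"she": ["he"], "her": ["him", "his"],
               "hers": ["his"], "herself": ["himself"]}
MASC_TO_FEM = {"he": ["she"], "him": ["her"],
               "his": ["her", "hers"], "himself": ["herself"]}


def match_case(replacement, original):
    """Capitalize the replacement if the original word was capitalized."""
    return replacement.capitalize() if original[0].isupper() else replacement


def candidate_rewrites(sentence, table):
    """Return every sentence obtainable by swapping each gendered pronoun."""
    tokens = re.findall(r"\w+|\W+", sentence)  # keeps spacing and punctuation
    options = []
    for tok in tokens:
        swaps = table.get(tok.lower())
        if swaps is None:
            options.append([tok])
        else:
            options.append([match_case(s, tok) for s in swaps])
    return ["".join(combo) for combo in itertools.product(*options)]


# Placeholder scorer: counts a hand-picked "fluent" bigram. A real system
# would use a trained language model here instead.
GOOD_BIGRAMS = {("his", "keys")}


def toy_score(sentence):
    words = re.findall(r"\w+", sentence.lower())
    return sum((a, b) in GOOD_BIGRAMS for a, b in zip(words, words[1:]))


def pick_best(candidates, score):
    """Select the candidate the scorer rates highest."""
    return max(candidates, key=score)


candidates = candidate_rewrites("She lost her keys.", FEM_TO_MASC)
print(candidates)
print(pick_best(candidates, toy_score))
```

Pairing each original sentence with its selected rewrite yields training examples in one direction; running the same procedure with `MASC_TO_FEM` covers the other.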
After merging the training data from both directions, the team used it to train a one-layer Transformer-based sequence-to-sequence model. They then introduced punctuation and casing variants in the training data to increase the model’s robustness, such that the final model can reliably produce the requested masculine or feminine rewrites 99% of the time.
The system was evaluated on a Google-developed metric called bias reduction, which measures the relative reduction of bias between the new translation system and the existing system (where “bias” is defined as making a gender choice in translation that’s unspecified in the source). Johnson says the new approach results in a bias reduction of ≥90% for Hungarian-, Finnish-, and Persian-to-English translations. The bias reduction of the existing Turkish-to-English system improved from 60% to 95%, and the system triggers gender-specific translations with an average precision of 97%; that is, when it decides to show gender-specific translations, it’s right 97% of the time.
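The article does not give an exact formula for bias reduction, so the sketch below assumes the most natural reading of “relative reduction”: the drop in the rate of biased translations divided by the baseline rate. The function names and the example numbers are illustrative assumptions, not Google's reported data.

```python
import math


def bias_reduction(baseline_bias_rate, new_bias_rate):
    """Relative reduction in bias versus the baseline (assumed formula)."""
    if baseline_bias_rate <= 0:
        raise ValueError("baseline bias rate must be positive")
    return (baseline_bias_rate - new_bias_rate) / baseline_bias_rate


def precision(correct_triggers, total_triggers):
    """Share of triggered gender-specific translations that were correct."""
    if total_triggers <= 0:
        raise ValueError("total_triggers must be positive")
    return correct_triggers / total_triggers


# Made-up example: the baseline makes an unspecified gender choice in 20%
# of gender-neutral queries, the new system in 2% of them.
reduction = bias_reduction(0.20, 0.02)
print(f"bias reduction: {reduction:.0%}")

# Made-up example: 97 of 100 triggered gender-specific results are correct.
print(f"precision: {precision(97, 100):.0%}")
assert math.isclose(reduction, 0.9)
```

Under this reading, a bias reduction of 90% means the new system makes an unwarranted gender choice one-tenth as often as the system it is compared against.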