-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Budget for Summ AI #3065
base: develop
Are you sure you want to change the base?
Budget for Summ AI #3065
Conversation
Code Climate has analyzed commit 292c552 and detected 0 issues on this pull request. The test coverage on the diff in this pull request is 89.9% (50% is the threshold). This pull request will bring the total coverage in the repository to 82.8% (0.0% change). View more on Code Climate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry, but this needs a major architectural overhaul. Apart from the suggestions below, the main reason I am saying this is the following:
- when
SummAiApiClient.translate_queryset
is called, - a
lambda
function returning a closure callingSummAiApiClient.check_usage
(which overridesMachineTranslationApiClient.check_usage
) is used in the creation of aBudgetHelper
object, - which is used for the instantiation of a
TranslationHelper
object, - which has
TranslationHelper.check_usage
method, - which calls its embedded
BudgetHelper.check_usage
method, - which calls
BudgetHelper._check_usage
, - which is the aforementioned closure.
Now back up the chain:
BudgetHelper.check_usage
is only called byBudgetHelper.allocate
,- which is only called by
TranslationHelper.allocate_budget
, - which is only called by
SummAiApiClient.translate_text_fields
, - which is only called by
SummAiApiClient.translate_queryset
.
Frankly, I question the need for the BudgetEstimate
class, as well as the helper methods on TranslationHelper
.
However, it is difficult to give more directed feedback, simply because of the amount of obfuscation going on due to these enormous call chains.
I have not yet had a chance to examine the async
functions.
if isinstance(translation, AbstractContentTranslation): | ||
attributes = [ | ||
getattr(translation, attr, None) | ||
for attr in ["title", "content", "meta_description"] | ||
] | ||
|
||
content_to_translate = [ | ||
unescape(strip_tags(attr)) for attr in attributes if attr | ||
] | ||
content_to_translate_str = " ".join(content_to_translate) | ||
else: | ||
content_to_translate_str = translation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This errs on nitpick territory, but: I really dislike the isinstance
pattern. What do you think about refactoring this so both this function and check_usage
always get a string as the input?
This would require adding a translateable_content
(or similar) property to the AbstractContentTranslation
model. For cases where the input is string, nothing needs to be changed; whenever a ACT is passed, pass ACT.translateable_content()` instead?
This would also reduce maintanence overhead if we decide to make more fields trasnslatable in the future, since only this new method would need to be tweaked, not multiple functions for multiple translation providers.
This is not 100% necessary, just thought I would put it out there!
|
||
#: cms/models/regions/region.py | ||
msgid "Credits renewal date for simplified language translation" | ||
msgstr "Credits Zurücksetzungsdatum für Übersetzung in Einfache Sprache" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
msgstr "Credits Zurücksetzungsdatum für Übersetzung in Einfache Sprache" | |
msgstr "Zurücksetzungsdatum der Credits für Übersetzung in Einfache Sprache" |
I know, this is copied from the existing translation, I just noticed that this is probably a more understandable translation. Also applies to the other instance of this.
@@ -7841,6 +7865,14 @@ msgstr "Bereits verbraucht" | |||
msgid "Remaining words" | |||
msgstr "Verbleibende Wörter" | |||
|
|||
#: cms/templates/regions/region_form.html | |||
msgid "Currently HIX is globally deactivated" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
msgid "Currently HIX is globally deactivated" | |
msgid "HIX is currently deactivated globally" |
Has nothing to do with this PR, really, just noticed it in the diff. THis is a very German ordering of the words, the suggestion is the far more "natural" sentence in English 😄
def check_usage( | ||
self, | ||
region: Region, | ||
source_translation: str | AbstractContentTranslation, | ||
allocated_budget: int = 0, | ||
) -> tuple[bool, int]: | ||
""" | ||
This function checks if the attempted translation would exceed the region's word limit | ||
|
||
:param region: region for which to check usage | ||
:param source_translation: single content object | ||
:param allocated_budget: how many additional words should be considered already spent | ||
:return: translation would exceed limit, word count of attempted translation | ||
""" | ||
words = word_count(source_translation) | ||
|
||
region.refresh_from_db() | ||
# Allow up to SUMM_AI_SOFT_MARGIN more words than the actual limit | ||
word_count_leeway = max( | ||
1, words + allocated_budget - settings.SUMM_AI_SOFT_MARGIN | ||
) | ||
translation_exceeds_limit = region.summ_ai_budget_remaining < word_count_leeway | ||
|
||
return (translation_exceeds_limit, words) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is almost 1:1 from the MachineTranslationApiClient
. Suggestion to avoid this code duplication:
- add
margin_field
andbudget_field
as class variables ofMachineTranslationApiClient
- use these fields in the
check_usage
method of that class. Then you just need to overwrite the fields inSummAiApiClient
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you either open an issue to refactor this file, or split it into multiple files in this PR? 🙈 660 lines is way too much for a utils.py
😆
@property | ||
def valid(self) -> bool: | ||
""" | ||
Wether or not the translation was successful |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wether or not the translation was successful | |
Whether or not the translation was successful |
@property | ||
def word_count(self) -> int: | ||
""" | ||
How many words need to be translated.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How many words need to be translated.. | |
How many words need to be translated. |
As this needs some major refactoring and time is sparse at the moment, it'll take some time until we will get back to this |
Short description
This PR adds separate budget accounting for Summ AI. Until now the usage of Summ AI was not metered at all by us.
Proposed changes
summ_ai_budget_used
to theRegion
model, as well as various auxiliary properties and budget related changes analogous to machine translationssumm_ai_budget_used
of the region accordingly after a batch of translations was performedThis required a change to how the asynchronous translation flow works:
PatientTaskQueue
from the start, more are added every time the queue gets low enoughBudgetEstimate
class to keep track of "allocated" budget, via an instance shared with eachTranslationHelper
– the helper class representing a single content translation object responsible for splitting it up into single translatable units and splicing it back together after translationTranslationHelper
instead ofTextField
instances around to batch translation tasks per content translationBudgetEstimate
PatientTaskQueue
is initialized empty andworker()
is run directly to fill it before the workers are initializedSide effects
Resolved issues
This fulfills the first half of #2173
Pull Request Review Guidelines