
Use NLLB 200 for translations #80

Merged · merged 1 commit on Nov 23, 2024
Conversation

svenseeberg (Member) commented Nov 22, 2024

Replace the LLM translations with the NLLB-200 3.3B model.

Fix #50
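
For illustration, a minimal sketch of what translating with NLLB-200 3.3B via Hugging Face transformers could look like; the example sentence and the FLORES-200 language codes (deu_Latn, eng_Latn) are assumptions for this sketch, not code from this PR:

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Load the NLLB-200 3.3B checkpoint; src_lang selects the source language token.
    tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-3.3B", src_lang="deu_Latn")
    model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-3.3B")

    inputs = tokenizer("Wie komme ich zum Bahnhof?", return_tensors="pt")

    # Force the first generated token to the target language code.
    translated = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    )
    print(tokenizer.batch_decode(translated, skip_special_tokens=True)[0])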

@svenseeberg force-pushed the feature/nllb-3b-translations branch 2 times, most recently from 69e4ab8 to 8a63d8b on November 22, 2024 20:27
svenseeberg (Member, Author) commented Nov 22, 2024

We can use chunking to work around the token limit:

def split_text(text, max_length=500):
    """Split text into chunks of at most max_length characters on sentence boundaries."""
    sentences = text.split('.')

    chunks = []
    current_chunk = ""

    for sentence in sentences:
        # Skip empty fragments, e.g. after a trailing period.
        if not sentence.strip():
            continue
        sentence = sentence.strip() + "."
        if len(current_chunk) + len(sentence) <= max_length:
            current_chunk += sentence + " "
        else:
            # Close the current chunk (if any) and start a new one.
            if current_chunk.strip():
                chunks.append(current_chunk.strip())
            current_chunk = sentence + " "

    if current_chunk.strip():
        chunks.append(current_chunk.strip())

    return chunks
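
A rough sketch of how the chunks could then be fed through an NLLB-200 translation pipeline; the pipeline setup, language codes, and max_length value below are assumptions for illustration, not part of this PR:

    from transformers import pipeline

    # Sketch: NLLB-200 3.3B as a translation pipeline (FLORES-200 language codes).
    translator = pipeline(
        "translation",
        model="facebook/nllb-200-3.3B",
        src_lang="deu_Latn",
        tgt_lang="eng_Latn",
    )

    def translate_text(text):
        # Translate each chunk separately to stay under the token limit,
        # then join the translated chunks back together.
        chunks = split_text(text, max_length=500)
        translated = [
            translator(chunk, max_length=512)[0]["translation_text"]
            for chunk in chunks
        ]
        return " ".join(translated)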


Successfully merging this pull request may close these issues: Evaluate Translation model performance