# tokenization

Here are 801 public repositories matching this topic...

This project trains a machine learning model on consumer brand data. A preliminary model is developed first and then fine-tuned to improve results. A comprehensive test suite validates the accuracy and reliability of the model's predictions.

  • Updated Feb 8, 2024
  • Jupyter Notebook

The project aims to build a search engine for EncyclEarthpedia by retrieving and processing content from Wikipedia articles, despite the unavailability of their database and API. Key tasks include retrieving Wikipedia content, cleaning and processing the text, tokenizing the content, counting token frequencies, and visualizing the most frequent tokens.

  • Updated Aug 7, 2023
  • Jupyter Notebook
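
The tokenizing and frequency-counting steps named in the description above could look roughly like the sketch below. This is only an illustration using the Python standard library, not code from the repository: the function names `tokenize` and `top_tokens`, the lowercase alphabetic-word tokenization rule, and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens.

    Hypothetical helper: the real project may use a different tokenizer.
    """
    return re.findall(r"[a-z]+", text.lower())


def top_tokens(text, n=3):
    """Return the n most frequent tokens together with their counts."""
    return Counter(tokenize(text)).most_common(n)


# Made-up sample text standing in for cleaned article content.
sample = "The Earth orbits the Sun. The Moon orbits the Earth."
print(top_tokens(sample))
```

A regex-based tokenizer keeps the sketch dependency-free; a real pipeline would likely also remove stop words such as "the" before counting, since they otherwise dominate the frequency table.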
