Tokenization-tutorial-code

This repository contains the code used in my tokenization tutorial video. Tokenization is the process of splitting a piece of text into smaller units called tokens. When the text is split into sentences, it is called sentence tokenization; when it is split into words, it is called word tokenization.
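
As a rough illustration of the two kinds of tokenization described above, here is a minimal sketch that uses only Python's standard `re` module. It is not the code from the video: the function names `sentence_tokenize` and `word_tokenize` and the regular expressions are assumptions chosen for this example, and the splitting rules are deliberately simple.

```python
import re


def sentence_tokenize(text):
    """Split text into sentences (hypothetical helper, not from the video).

    Assumes a sentence ends with '.', '!' or '?' followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop the empty string that results from empty input.
    return [part for part in parts if part]


def word_tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of letters, digits or underscores;
    # [^\w\s] matches one punctuation character.
    return re.findall(r"\w+|[^\w\s]", text)


sample = "Tokenization splits text. It has two common forms!"
print(sentence_tokenize(sample))
print(word_tokenize(sample))
```

This naive approach will mis-split abbreviations such as "Dr. Smith" and will break contractions such as "don't" into three tokens, so a dedicated NLP library is likely a better choice for real text.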

To see the complete video explanation of this topic, check out the following link: https://youtu.be/O2jGzHSJzpg