
Requirements for Text Mining Summer Course (Lab Session)


omarsar/text_mining_lab_2017


Hello Everyone,

Here is the list of packages needed for our Text Mining Lab Session scheduled for 6/29/2017 (2:00-5:00 p.m.)

Updates:


  • I have uploaded some example posters from past students. (Check the posters folder.)
  • For those interested in the Slack community, send your email to ellfae@gmail and I will send you an invite.
  • If you have any other questions or technical problems, feel free to stop by Idea Lab Delta 701. I will be more than happy to assist.
  • I may extend the Python notebook based on the excellent questions you guys asked (e.g., more statistics, visuals, etc.).
  • Lastly, good luck and enjoy your stay here.

Software:


Computing Resources:


  • Operating System: Preferably Linux or macOS (things may break on Windows, but you can try it out)
  • RAM: 4GB
  • Disk Space: 8GB (mostly to store word embeddings)
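
If you want a quick check of your machine against the list above, here is a minimal sketch using only the Python standard library. The function name `check_resources` and the constant `REQUIRED_DISK_GB` are made up for this example, and the script only looks at the operating system and free disk space (RAM is not checked, since the standard library has no portable call for it).

```python
import platform
import shutil

REQUIRED_DISK_GB = 8  # disk space figure from the list above


def check_resources(path=".", required_gb=REQUIRED_DISK_GB):
    """Return (os_name, free_gb, enough_disk) for the drive holding `path`."""
    os_name = platform.system()  # 'Linux', 'Darwin' (macOS) or 'Windows'
    free_gb = shutil.disk_usage(path).free / 1024 ** 3  # bytes -> GiB
    return os_name, free_gb, free_gb >= required_gb


if __name__ == "__main__":
    os_name, free_gb, enough = check_resources()
    print(f"Operating system: {os_name}")
    status = "enough" if enough else "below the suggested 8GB"
    print(f"Free disk space: {free_gb:.1f} GB ({status})")
```

Run it from the folder where you plan to store the word embeddings, so the free space reported is for the right drive.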

Test:


Once you have installed all the necessary packages, you can test whether everything is working by running the following Python code in a Jupyter notebook:

import logging
logging.root.handlers = []  # Jupyter messes up logging so needs a reset
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
from smart_open import smart_open
import pandas as pd
import numpy as np
from numpy import random
import gensim
import nltk
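try:
    from sklearn.model_selection import train_test_split  # scikit-learn 0.18 and later
except ImportError:
    from sklearn.cross_validation import train_test_split  # older scikit-learn releases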
from sklearn import linear_model
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
import matplotlib.pyplot as plt
from gensim.models import Word2Vec
from sklearn.neighbors import KNeighborsClassifier
from nltk.corpus import stopwords
# Jupyter-only magic; leave this line out when running as a plain Python script
%matplotlib inline
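
If one of the imports above fails, it can help to see every missing package at once instead of fixing them one error at a time. The sketch below is one way to do that with the standard library. The helper name `find_missing` and the list `REQUIRED_MODULES` are made up for this example; the module names are simply the top-level packages from the import list above.

```python
import importlib.util

# Top-level module names taken from the import list above.
REQUIRED_MODULES = [
    "smart_open", "pandas", "numpy", "gensim", "nltk", "sklearn", "matplotlib",
]


def find_missing(modules):
    """Return the names in `modules` that the import system cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Note that this only checks that each package can be located; it does not import the packages or check their versions, so the full import test above is still worth running.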

If you have any further questions, please feel free to contact me at [email protected]

Have Fun,

Elvis Saravia (Text Mining TA)