Example of building an NER pipeline with Kinesis, CoreNLP, and DynamoDB

chyikwei/kinesis-ner

kinesis-ner

This is an example of building a Named Entity Recognizer (NER) pipeline that:

  1. fetches data from an Amazon Kinesis stream
  2. processes text fields with Stanford CoreNLP and extracts entities
  3. stores the results in a DynamoDB table
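
The data flow above can be sketched with in-memory stand-ins. This is only an illustration of the three stages, not the project's code: `PipelineSketch`, its `table` map (standing in for the DynamoDB table), and the toy `capitalizedWords` extractor (standing in for CoreNLP) are all hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the pipeline stages; none of these names come from kinesis-ner.
final class PipelineSketch {
    // In-memory stand-in for the DynamoDB table: record uuid -> extracted entities.
    final Map<String, List<String>> table = new HashMap<>();

    // Stand-in for the CoreNLP step: maps a text to a list of entity strings.
    final Function<String, List<String>> extractor;

    PipelineSketch(Function<String, List<String>> extractor) {
        this.extractor = extractor;
    }

    // Handles one fetched record: extract entities (stage 2), then store them (stage 3).
    void process(String uuid, String text) {
        table.put(uuid, extractor.apply(text));
    }

    // Toy extractor that treats capitalized words as entities. This is NOT real NER;
    // it only gives the sketch something to run.
    static List<String> capitalizedWords(String text) {
        List<String> entities = new ArrayList<>();
        for (String word : text.split("\\s+")) {
            if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
                entities.add(word);
            }
        }
        return entities;
    }

    public static void main(String[] args) {
        PipelineSketch pipeline = new PipelineSketch(PipelineSketch::capitalizedWords);
        // Stage 1 (fetching from the stream) is represented by this direct call.
        pipeline.process("id-1", "Obama visited Paris today");
        System.out.println(pipeline.table);
    }
}
```

In the real application the extractor would be backed by a CoreNLP pipeline and the map by DynamoDB writes; the sketch only shows how a record moves between the stages.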

Getting started:

  1. To build, run:
mvn package
  2. Maven will generate a zip file, kinesis-ner-{version}-SNAPSHOT-package.zip, in the target/ folder. The zip file contains all dependencies except the Stanford NLP models jar, which is too large to bundle.

(Note: To include the NLP models in the package, remove the provided scope from the CoreNLP models dependency in pom.xml before running mvn package. You can then skip step 3.)

  3. Download the CoreNLP models.

  4. Unzip the package from step 2.

  5. Make sure your machine has permission to create, read, and write Kinesis streams and DynamoDB tables.

  6. Make sure all jars are on your classpath and run:

java -Xmx1536m -cp {class_path} com.chyikwei.app.KinesisNerApplication

(Note: the process uses about 1 GB of RAM.)

  7. Put some data into the stream. The sample format is JSON with uuid, title, and text fields. Example:
{
  "uuid": "04947df8-0e9e-4471-a2f9-9af509fb5801",
  "title": "news title",
  "text": "news text"
}
  8. Check the entities extracted by CoreNLP. They are stored in the ddb-news-entities table in DynamoDB.

  9. Clean up the AWS resources (Kinesis stream, DynamoDB tables) after testing. (The settings for the stream and table names are here.)
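
For the step that puts data into the stream, a record in the sample format could be built along these lines. This is a minimal sketch using only the JDK: `SampleRecord` and its methods are hypothetical helpers, not part of this project, and the call that actually sends the bytes to Kinesis (for example the PutRecord operation through an AWS SDK client) is deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper, not part of kinesis-ner: builds one record in the sample format.
final class SampleRecord {

    // Escapes the characters that must be escaped inside a JSON string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Serializes the three fields of the sample format into a JSON object.
    static String toJson(String uuid, String title, String text) {
        return "{\"uuid\": \"" + escape(uuid) + "\", "
                + "\"title\": \"" + escape(title) + "\", "
                + "\"text\": \"" + escape(text) + "\"}";
    }

    // Kinesis record payloads are raw bytes; UTF-8 is an assumption made here.
    static byte[] toBytes(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String json = toJson(UUID.randomUUID().toString(), "news title", "news text");
        System.out.println(json);
    }
}
```

The resulting bytes would be the record's data blob. Which partition key to use is a separate choice; using the uuid is one plausible option, not something this README specifies.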
