A search engine written in C++ that runs on Wikipedia data, using the pugixml XML parser.

boredomed/SearchEngine

SearchEngine

A search engine written in C++ that runs on Wikipedia data, using the pugixml XML parser.

Setup

Create a C++ project in Visual Studio (or any IDE of your choice), add the source code, the libraries, and the Wikipedia data, and you are good to go.

Dependencies

pugixml
Wikipedia dump data

Description

Parse the query.
Convert the words into wordIDs.
Tokenize the text data.
Seek to the start of the doclist in the short barrel for every word.
Scan through the doclists until there is a document that matches all the search terms.
Compute the rank of that document for the query.
Sort the documents that have matched by rank and return the top k.
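
The steps above can be sketched in a few dozen lines of C++. This is a minimal illustration, not the repository's actual code: the `Index`, `Posting`, `tokenizeQuery`, and `search` names are hypothetical, doclists are assumed to be sorted by ascending document ID, and the rank is assumed to be the sum of per-term occurrence counts (the README does not say which ranking function is used).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical types; the repository's real data structures are not shown.
using WordId = int;
using DocId = int;

struct Posting {
    DocId doc;  // document that contains the word
    int count;  // occurrences of the word in that document
};
using DocList = std::vector<Posting>;  // assumed sorted by ascending doc
using Hit = std::pair<DocId, int>;     // (document, rank score)

struct Index {
    std::unordered_map<std::string, WordId> lexicon;  // word -> wordID
    std::unordered_map<WordId, DocList> barrel;       // wordID -> doclist
};

// Parse the query: split on non-alphanumeric characters and lowercase.
std::vector<std::string> tokenizeQuery(const std::string& query) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char ch : query) {
        if (std::isalnum(ch)) {
            current.push_back(static_cast<char>(std::tolower(ch)));
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

std::vector<Hit> search(const Index& index, const std::string& query, std::size_t k) {
    // Convert words into wordIDs and locate the start of each doclist.
    std::vector<const DocList*> lists;
    for (const std::string& word : tokenizeQuery(query)) {
        auto id = index.lexicon.find(word);
        if (id == index.lexicon.end()) return {};  // unknown word: nothing matches
        auto list = index.barrel.find(id->second);
        if (list == index.barrel.end()) return {};
        assert(std::is_sorted(list->second.begin(), list->second.end(),
                              [](const Posting& a, const Posting& b) { return a.doc < b.doc; }));
        lists.push_back(&list->second);
    }
    if (lists.empty()) return {};

    // Scan the doclists in step, keeping documents present in every list.
    std::vector<std::size_t> cursor(lists.size(), 0);
    std::vector<Hit> hits;
    for (const Posting& first : *lists[0]) {
        int score = first.count;  // assumed rank: sum of term counts
        bool inAll = true;
        for (std::size_t i = 1; i < lists.size() && inAll; ++i) {
            const DocList& other = *lists[i];
            while (cursor[i] < other.size() && other[cursor[i]].doc < first.doc) ++cursor[i];
            if (cursor[i] == other.size() || other[cursor[i]].doc != first.doc) {
                inAll = false;
            } else {
                score += other[cursor[i]].count;
            }
        }
        if (inAll) hits.emplace_back(first.doc, score);
    }

    // Sort by rank (highest first, ties by document ID) and keep the top k.
    std::sort(hits.begin(), hits.end(), [](const Hit& a, const Hit& b) {
        if (a.second != b.second) return a.second > b.second;
        return a.first < b.first;
    });
    if (hits.size() > k) hits.resize(k);
    return hits;
}
```

Calling `search(index, "search engine", 10)` would return up to ten `(document, score)` pairs for documents that contain both words. Keeping doclists sorted lets the scan advance one cursor per list instead of comparing every pair of postings.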

Specifications

Creating a document object by loading the XML file and building a tree
Tokenizing the text data
Forward indexing (list of terms contained within a particular document) and inverted indexing (list of documents containing a given term)
Processing the query
Traversing the pages, i.e. crawling
Single-word and multi-word search queries
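
To make the two index types concrete, here is a minimal sketch of tokenizing page text and building a forward index and an inverted index from it. The names `tokenize`, `addDocument`, `invert`, `ForwardIndex`, and `InvertedIndex` are hypothetical and not taken from the repository; the page text is assumed to have already been extracted from the XML dump (for example with pugixml), which is not shown here.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

using DocId = int;

// Forward index: document -> terms it contains, in order of appearance.
using ForwardIndex = std::map<DocId, std::vector<std::string>>;
// Inverted index: term -> documents that contain it.
using InvertedIndex = std::map<std::string, std::set<DocId>>;

// Split text into lowercase alphanumeric terms (an assumed tokenization rule).
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> terms;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            current.push_back(static_cast<char>(std::tolower(ch)));
        } else if (!current.empty()) {
            terms.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) terms.push_back(current);
    return terms;
}

// Record the terms of one document; each document is expected to be added once.
void addDocument(DocId id, const std::string& text, ForwardIndex& forward) {
    assert(forward.find(id) == forward.end());
    forward[id] = tokenize(text);
}

// Derive the inverted index by flipping every (document, term) pair.
InvertedIndex invert(const ForwardIndex& forward) {
    InvertedIndex inverted;
    for (const auto& entry : forward) {
        for (const std::string& term : entry.second) {
            inverted[term].insert(entry.first);
        }
    }
    return inverted;
}
```

A single-word query is then a lookup in the inverted index, and a multi-word query is the intersection of the document sets for each of its terms.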
