SCIATICA

Repository for the end-term submission of the Information Retrieval course (CS60092), Spring semester 2023, Department of CSE, IIT Kharagpur.

Research for research papers


Table of Contents

About The Project
Getting Started
- Directory structure
Colab Notebooks

About The Project

This project is an attempt at implementing and improving on the work of Sheshera Mysore, Tim O'Gorman, Andrew McCallum and Hamed Zamani titled "CSFCube - A Test Collection of Computer Science Research Articles for Faceted Query by Example".

The dataset can be found here

The paper describing the dataset can be accessed here

Demo video:

Team members:

Ashwani Kumar Kamal - 20CS10011
Hardik Pravin Soni - 20CS30023
Shiladitya De - 20CS30061
Sourabh Soumyakanta Das - 20CS30051


Getting Started

The minimal setup needed to get the application up and running:

pip install -r requirements.txt
streamlit run deploy.py

Directory Structure

Any .ipynb file that needs to be run must be placed in the root directory, which contains the /data and /Results directories.
The data directory contains the CSFCube dataset:

.
├── abstracts-csfcube-preds.json
├── abstracts-csfcube-preds.jsonl
├── abstracts-csfcube-preds-no-unicode.jsonl
├── evaluation_splits.json
├── test-pid2anns-csfcube-background.json
├── test-pid2anns-csfcube-method.json
├── test-pid2anns-csfcube-result.json
└── test-pid2pool-csfcube.json
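A minimal sketch of reading two of the files above, assuming the field names of the public CSFCube release (`paper_id` in the JSONL abstracts; `cands` and `relevance_adju` in the per-facet annotation files). Check these against your copy of the data; the helper names are hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def load_abstracts(path):
    """Read a JSON Lines file (one JSON object per line) into a dict keyed by paper id.

    Assumes each record has a "paper_id" field, as in the CSFCube release.
    """
    pid2abstract = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            pid2abstract[record["paper_id"]] = record
    return pid2abstract


def load_facet_annotations(data_dir, facet):
    """Load query -> candidate relevance judgements for one facet.

    Assumed file layout: {query_pid: {"cands": [...], "relevance_adju": [...]}}.
    Returns {query_pid: {cand_pid: relevance}}.
    """
    path = Path(data_dir) / f"test-pid2anns-csfcube-{facet}.json"
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    # Pair each candidate with its relevance label, position by position.
    return {
        qpid: dict(zip(anns["cands"], anns["relevance_adju"]))
        for qpid, anns in raw.items()
    }
```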

The Results directory contains the embeddings generated by the models used, along with the ranked output files for each facet (background, method, result):

.
├── alberta
│   ├── all.json
│   ├── background.json
│   ├── method.json
│   ├── result.json
│   ├── test-pid2pool-csfcube-alberta-background-ranked.json
│   ├── test-pid2pool-csfcube-alberta-method-ranked.json
│   └── test-pid2pool-csfcube-alberta-result-ranked.json
├── allenai_specter
│   ├── all.json
│   ├── background.json
│   ├── method.json
│   ├── result.json
│   ├── test-pid2pool-csfcube-allenai_specter-background-ranked.json
│   ├── test-pid2pool-csfcube-allenai_specter-method-ranked.json
│   └── test-pid2pool-csfcube-allenai_specter-result-ranked.json
├── all_mpnet_base_v2
│   ├── all.json
│   ├── background.json
│   ├── method.json
│   ├── result.json
│   ├── test-pid2pool-csfcube-all_mpnet_base_v2-background-ranked.json
│   ├── test-pid2pool-csfcube-all_mpnet_base_v2-method-ranked.json
│   └── test-pid2pool-csfcube-all_mpnet_base_v2-result-ranked.json
├── bert_nli
│   ├── all.json
│   ├── background.json
│   ├── method.json
│   ├── result.json
│   ├── test-pid2pool-csfcube-bert_nli-background-ranked.json
│   ├── test-pid2pool-csfcube-bert_nli-method-ranked.json
│   └── test-pid2pool-csfcube-bert_nli-result-ranked.json
├── bert_pp
│   ├── all.json
│   ├── background.json
│   ├── method.json
│   ├── result.json
│   ├── test-pid2pool-csfcube-bert_pp-background-ranked.json
│   ├── test-pid2pool-csfcube-bert_pp-method-ranked.json
│   └── test-pid2pool-csfcube-bert_pp-result-ranked.json
├── distilbert_nli
│   ├── all.json
│   ├── background.json
│   ├── method.json
│   ├── result.json
│   ├── test-pid2pool-csfcube-distilbert_nli-background-ranked.json
│   ├── test-pid2pool-csfcube-distilbert_nli-method-ranked.json
│   └── test-pid2pool-csfcube-distilbert_nli-result-ranked.json
└── ensemble
    ├── test-pid2pool-csfcube-ensemble-background-ranked.json
    ├── test-pid2pool-csfcube-ensemble-method-ranked.json
    └── test-pid2pool-csfcube-ensemble-result-ranked.json
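The format of the embedding files is not documented here, so the following is only a sketch of how a `*-ranked.json` file could be produced from them. It assumes an embeddings file maps paper ids to vectors and that each query's pool is a plain list of candidate ids (if the pool file nests candidates under a key such as `cands`, extract them first), and it ranks candidates by cosine similarity. The output layout `{query_pid: [[cand_pid, score], ...]}` is likewise an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_pool(pid2emb, pid2pool):
    """Rank each query's candidate pool by cosine similarity to the query embedding.

    pid2emb:  {paper_id: vector}            (hypothetical embeddings layout)
    pid2pool: {query_pid: [cand_pid, ...]}
    Returns {query_pid: [[cand_pid, score], ...]} sorted by descending score.
    """
    ranked = {}
    for qpid, cands in pid2pool.items():
        q = pid2emb[qpid]
        # Candidates without an embedding are skipped rather than raising.
        scored = [[c, cosine(q, pid2emb[c])] for c in cands if c in pid2emb]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        ranked[qpid] = scored
    return ranked
```

Writing the result with `json.dump(ranked, f)` would give one file per model and facet, mirroring the naming pattern in the tree above.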


Colab Notebooks

Base Model

This notebook contains the code for generating embeddings from the base models. Avoid running it, as it takes a long time. The embeddings are already provided in the IR Submission Files Google Drive folder.
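The notebook itself is not reproduced here. As an illustration of how per-facet inputs (matching `background.json`, `method.json`, `result.json` and `all.json` in Results) might be prepared, the sketch below keeps only the abstract sentences whose predicted label matches the facet. It assumes each record has `title`, `abstract` (a list of sentences) and `pred_labels` with values such as `method_label`; the field names, the label strings and the fallback rule are assumptions, not the notebook's code.

```python
# Assumed mapping from facet name to the sentence label used in the data.
FACET_LABELS = {
    "background": "background_label",
    "method": "method_label",
    "result": "result_label",
}


def facet_text(record, facet):
    """Build the text to embed for one facet of a paper.

    facet="all" keeps every sentence. For other facets, only sentences whose
    predicted label matches are kept; if none match, the full abstract is used.
    The title is always prepended.
    """
    sentences = record["abstract"]
    if facet == "all":
        chosen = sentences
    else:
        label = FACET_LABELS[facet]
        chosen = [s for s, lab in zip(sentences, record["pred_labels"]) if lab == label]
        if not chosen:
            chosen = sentences  # fallback so no paper gets an empty input
    return " ".join([record["title"]] + list(chosen))
```

The resulting strings could then be passed to a sentence encoder, for example `SentenceTransformer("all-mpnet-base-v2").encode(texts)` from the sentence-transformers library.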

Fine Tuning DistilBERT (Grid Search)

This notebook fine-tunes the DistilBERT model using a grid search. The results are already present in the notebook. Avoid running it, as it takes a long time.
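A grid search trains one model per combination of hyperparameter values and keeps the combination with the best validation score. Below is a minimal, framework-agnostic sketch; `train_and_evaluate` is a hypothetical callable standing in for fine-tuning DistilBERT and returning a validation score, and the search space is whatever grid you pass in, not the one used in the notebook.

```python
import itertools


def grid_search(param_grid, train_and_evaluate):
    """Try every combination in param_grid and return (best_params, best_score, all_results).

    param_grid: {name: [value, ...]}
    train_and_evaluate: callable(**params) -> validation score (higher is better).
    """
    names = list(param_grid)
    results = []
    best_params, best_score = None, float("-inf")
    # Cartesian product over all value lists, in the order the names were given.
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = train_and_evaluate(**params)
        results.append((params, score))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, results
```

For example, `grid_search({"learning_rate": [2e-5, 5e-5], "epochs": [1, 2]}, train_and_evaluate)` would run four fine-tuning jobs, which is why the cost grows quickly with each added hyperparameter.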

Ensembling models

Run each cell of this Jupyter notebook in order. In the second-to-last cell, change the queries as desired, then run that cell and the last cell to obtain the results.
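The exact ensembling rule used in the notebook is not described here. One common way to combine per-model rankings such as those in Results is reciprocal rank fusion (RRF), sketched below under the assumption that each `*-ranked.json` maps a query id to `[[cand_pid, score], ...]` sorted best first. `k=60` is the constant commonly used with RRF, not a value taken from this project.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several rankings of the same candidates with reciprocal rank fusion.

    rankings: list of candidate-id lists (best first), one per model.
    Each candidate scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Returns [[cand_id, fused_score], ...], best first.
    """
    fused = {}
    for ranking in rankings:
        for rank, cand in enumerate(ranking, start=1):
            fused[cand] = fused.get(cand, 0.0) + 1.0 / (k + rank)
    return sorted(([c, s] for c, s in fused.items()), key=lambda pair: pair[1], reverse=True)


def fuse_ranked_files(model_rankings, k=60):
    """Apply RRF query by query.

    model_rankings: list of {query_pid: [[cand_pid, score], ...]} dicts, one per model.
    Only the order of candidates is used; the per-model scores are ignored.
    """
    queries = set().union(*(m.keys() for m in model_rankings))
    return {
        q: reciprocal_rank_fusion([[c for c, _ in m[q]] for m in model_rankings if q in m], k=k)
        for q in queries
    }
```

RRF only needs rank positions, so it sidesteps the fact that cosine scores from different encoders are not on a common scale; averaging normalized scores is an alternative when the scores are comparable.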

IR Submission Files (Google Drive)

Apart from all this, we are also submitting a zip file containing local copies and reports of the .ipynb files, which can be run locally. [Note] Please change the file directory strings in the notebooks appropriately to avoid path errors.
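One way to avoid editing directory strings throughout each notebook is to define the root once and derive every path from it. In this sketch the `SCIATICA_ROOT` environment variable and the `results_file` helper are hypothetical; the file naming follows the Results tree described under Directory Structure.

```python
import os
from pathlib import Path

# Hypothetical: the directory that contains /data and /Results.
# Setting SCIATICA_ROOT overrides the default (the current directory).
ROOT = Path(os.environ.get("SCIATICA_ROOT", ".")).expanduser()
DATA_DIR = ROOT / "data"
RESULTS_DIR = ROOT / "Results"


def results_file(model, facet):
    """Path of a ranked file, e.g. Results/bert_nli/test-pid2pool-csfcube-bert_nli-method-ranked.json."""
    return RESULTS_DIR / model / f"test-pid2pool-csfcube-{model}-{facet}-ranked.json"
```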

