A repository demonstrating how LLMs can be used for semantic search. The data considered are curated documents targeting closed-domain search. It was created to show how relatively simple it is to apply these methods and increase productivity within an organization.
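Semantic search usually means embedding every document and the query as vectors, then ranking documents by their similarity to the query. The sketch below illustrates that ranking step under stated assumptions. `toy_embed` is a hypothetical bag-of-words stand-in for an LLM embedding model (it is not what this repository uses), and `search` ranks by cosine similarity. A real model returns dense vectors, so `cosine` would then work over lists of floats rather than `Counter`s.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an LLM embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(count * b[token] for token, count in a.items())  # missing tokens count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, embed=toy_embed, top_k=3):
    """Embed every document, then return the top_k (doc_id, score) pairs by similarity."""
    doc_vectors = {doc_id: embed(text) for doc_id, text in docs.items()}
    query_vector = embed(query)
    scored = [(doc_id, cosine(query_vector, vec)) for doc_id, vec in doc_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


docs = {
    "d1": "Graph neural networks for citation recommendation",
    "d2": "A survey of protein folding methods",
    "d3": "Citation prediction with neural language models",
}
print(search("neural citation models", docs, top_k=2))  # d3 ranks first, then d1
```

In practice the document vectors would be computed once and cached, since re-embedding the whole corpus per query is the expensive part.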
The data used here is the SCIDOCS dataset, obtained from the BEIR repository (https://github.com/beir-cellar/beir) and https://github.com/allenai/scidocs. The dataset can be downloaded from https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scidocs.zip. Only the .jsonl file containing the corpus is used. Place the data under src/data/scidocs.
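A minimal sketch of reading the corpus file, assuming it is named `corpus.jsonl` inside src/data/scidocs and follows the BEIR corpus layout of one JSON object per line with `_id`, `title` and `text` fields. The function name `load_corpus` is hypothetical, and joining title and body into one string is a choice of this sketch, not necessarily what main.py does.

```python
import json
from pathlib import Path

# Assumed file name: BEIR archives ship the corpus as corpus.jsonl.
CORPUS_PATH = Path("src/data/scidocs/corpus.jsonl")


def load_corpus(path=CORPUS_PATH):
    """Read a BEIR-style JSONL corpus into {doc_id: text}."""
    corpus = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            title = record.get("title", "")
            text = record.get("text", "")
            # Prepend the title so it contributes to the embedding too
            corpus[record["_id"]] = f"{title}. {text}" if title else text
    return corpus
```

Calling `load_corpus()` after unzipping the archive returns a dictionary that can be passed straight to a search function keyed by document id.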
Then run main.py, which should carry out the semantic search end to end. In essence, the approach should generalize to any semantic search use case. Bring Your Own Data!
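To bring your own data, one option is to write it in the same JSONL layout as the SCIDOCS corpus. The sketch below assumes main.py reads BEIR-style `_id`, `title` and `text` fields; that is an assumption, so check the loader in main.py first. The helper `write_corpus` and the example path are hypothetical.

```python
import json
from pathlib import Path


def write_corpus(documents, path):
    """Write (doc_id, title, text) tuples as BEIR-style JSONL, one object per line.

    Assumption: main.py expects the same "_id"/"title"/"text" fields as SCIDOCS.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create e.g. src/data/mydata
    with open(path, "w", encoding="utf-8") as handle:
        for doc_id, title, text in documents:
            record = {"_id": str(doc_id), "title": title, "text": text}
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```

For example, `write_corpus([(1, "Leave policy", "Leave requests go through HR.")], "src/data/mydata/corpus.jsonl")` produces a one-line corpus; the data path used by main.py would then need to point at that directory.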