Version 0.0.82

@adhityan adhityan released this 01 Jun 19:08
5dbde53

A number of important features and bug fixes make it into this release. Here's a rundown of the top new features -

Loader inference

The library can now infer the loader type automatically. You can pass a string, and it will use the MIME type (detected using magic numbers) and the file extension, if available, to decide which loader to invoke. For example -

.addLoader('https://tesla-info.com/sitemap.xml') // will use sitemap loader
.addLoader('https://en.wikipedia.org/wiki/Tesla,_Inc.') // will use the web loader
.addLoader('s4pVFLUlx8g') // will detect this is a youtube video id and use the video loader
.addLoader('https://lamport.azurewebsites.net/pubs/paxos-simple.pdf') // will use the pdf loader

.addLoader('local/paxos-simple.pdf') // will also use the pdf loader
.addLoader('local/data.csv') // will also use the CSV loader

You can also pass it a local directory name, and it will recursively load all files within it using the most appropriate loader. Note: It will skip files it does not have a loader for.
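To make the idea of inference concrete, here is a minimal sketch of how a string could be mapped to a loader type. It is an illustration only and not the library's implementation: it looks only at the URL shape and file extension and does not inspect magic numbers. The names `guessLoaderType`, `extensionOf` and the `LoaderGuess` labels are hypothetical, and the YouTube check assumes the commonly seen 11-character id format.

```typescript
// Hypothetical labels for illustration; not the library's actual identifiers.
type LoaderGuess = "Sitemap" | "Web" | "Youtube" | "Pdf" | "Csv" | "Unknown";

// Return the lowercase extension of the last path segment, or "" if there is none.
function extensionOf(path: string): string {
    const lastSegment = path.split("/").pop() ?? "";
    const dot = lastSegment.lastIndexOf(".");
    return dot > 0 ? lastSegment.slice(dot + 1).toLowerCase() : "";
}

// Simplified inference: extension first, then URL shape, then a video-id pattern.
function guessLoaderType(input: string): LoaderGuess {
    const isUrl = /^https?:\/\//i.test(input);
    const path = isUrl ? new URL(input).pathname : input;
    const ext = extensionOf(path);

    if (ext === "pdf") return "Pdf";
    if (ext === "csv") return "Csv";
    if (isUrl && path.toLowerCase().endsWith("sitemap.xml")) return "Sitemap";
    if (isUrl) return "Web";
    // Assumption: YouTube video ids are 11 characters from [A-Za-z0-9_-].
    if (/^[A-Za-z0-9_-]{11}$/.test(input)) return "Youtube";
    return "Unknown";
}
```

A real implementation would also need to read the first bytes of a file or response to confirm the MIME type, since extensions can be missing or misleading.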

Alternatively, you can now add loaders by passing in an object with the correct parameters without invoking the loader constructor directly. That is -

// Before
.addLoader(new WebLoader({ urlOrContent: 'https://www.biography.com/business-leaders/steve-jobs' }))

// Now
.addLoader({ type: 'Web', urlOrContent: 'https://www.biography.com/business-leaders/steve-jobs' })

This makes for simpler reading and is consistent across all loaders.
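One way such an object form can be resolved to a loader instance is a discriminated union keyed on `type`. The sketch below is an assumption about the general pattern, not the library's code: `WebLoader` and `CsvLoader` here are stand-in classes, `createLoader` is a hypothetical helper, and the `'Csv'` type name is assumed by analogy with `'Web'`.

```typescript
// Stand-in classes for illustration only; the real loaders ship with the library.
class WebLoader {
    constructor(public readonly options: { urlOrContent: string }) {}
}
class CsvLoader {
    constructor(public readonly options: { filePathOrUrl: string }) {}
}

// Each variant carries the same parameters its constructor expects, plus `type`.
type LoaderConfig =
    | { type: "Web"; urlOrContent: string }
    | { type: "Csv"; filePathOrUrl: string }; // 'Csv' is an assumed type name

// Hypothetical factory: picks the constructor based on the `type` discriminator.
function createLoader(config: LoaderConfig): WebLoader | CsvLoader {
    switch (config.type) {
        case "Web":
            return new WebLoader({ urlOrContent: config.urlOrContent });
        case "Csv":
            return new CsvLoader({ filePathOrUrl: config.filePathOrUrl });
    }
}
```

With this shape, the compiler rejects a config that mixes a `type` with the wrong parameters, which is one reason the object form can stay uniform across loaders.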

List of added loaders

The library now keeps the list of previously added loaders in its cache, so you can get the list of all loaded content even across restarts. This is useful if you want to keep the state of the RAG application within the library itself.

You can get the list of loaders by calling -

await ragApplication.getLoaders()

The list of added loaders will include all loaders, even those that were implicitly invoked by another loader. To understand this better, let's look at the LocalPathLoader. This loader uses the file system API to scan files and directories. Once it infers the file type, it internally calls other loaders to add and process PPT, CSV, HTML, etc. files. When this happens, the getLoaders() method will give you the list of all loaders, including LocalPathLoader, CsvLoader, WebLoader, etc., with metadata about what each loader worked on.

Note: All of this data is recorded in the attached cache. Therefore, this functionality only works when you have a cache set.
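The bookkeeping described above can be pictured with a small in-memory sketch. Everything here is hypothetical and only meant to show why a parent loader and its implicitly invoked children all appear in the list: `LoaderRecord`, `LoaderRegistry` and `addLocalPath` are illustrative names, and the metadata fields are assumptions rather than the library's real schema.

```typescript
// Hypothetical record shape; the library's real metadata fields may differ.
interface LoaderRecord {
    uniqueId: string;
    type: string;
    metadata: Record<string, string>;
}

// Minimal in-memory stand-in for a cache that remembers added loaders.
class LoaderRegistry {
    private readonly records = new Map<string, LoaderRecord>();

    record(entry: LoaderRecord): void {
        this.records.set(entry.uniqueId, entry);
    }

    getLoaders(): LoaderRecord[] {
        return Array.from(this.records.values());
    }
}

// Simulates a path loader that registers itself, then one child loader per CSV file.
function addLocalPath(registry: LoaderRegistry, dir: string, files: string[]): void {
    registry.record({ uniqueId: `LocalPathLoader:${dir}`, type: "LocalPathLoader", metadata: { path: dir } });
    for (const file of files) {
        if (file.endsWith(".csv")) {
            const fullPath = `${dir}/${file}`;
            registry.record({ uniqueId: `CsvLoader:${fullPath}`, type: "CsvLoader", metadata: { filePathOrUrl: fullPath } });
        }
        // Files without a matching loader are skipped, as in the release note above.
    }
}
```

A persistent cache would store the same records outside the process, which is what would let the list survive a restart.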

CSV Loader

You can now add CSV files from both local paths and web URLs using the CSV loader. To add a CSV file (or URL) to your embeddings, use CsvLoader. The library will parse the CSV and add each row to its vector database.

.addLoader(new CsvLoader({ filePathOrUrl: '...' }))

Note: You can control how the CsvLoader parses the file in great detail by passing in the optional csvParseOptions constructor parameter.
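To illustrate what "each row" means in practice, here is a deliberately simplified sketch that turns CSV text into one text chunk per row. It is not how the CsvLoader is implemented: `csvToRowChunks` is a hypothetical helper, it assumes the first line is a header, and it does not handle quoted fields or embedded delimiters, which a real CSV parser would.

```typescript
// Simplified illustration: one text chunk per data row, labelled with header names.
// Assumes a header line and no quoted fields; a real CSV parser handles far more.
function csvToRowChunks(csv: string, delimiter = ","): string[] {
    const lines = csv.split(/\r?\n/).filter((line) => line.trim().length > 0);
    const [headerLine, ...dataLines] = lines;
    if (headerLine === undefined) return [];

    const headers = headerLine.split(delimiter).map((h) => h.trim());
    return dataLines.map((line) => {
        const cells = line.split(delimiter).map((c) => c.trim());
        // Missing cells become empty strings so every chunk has the same fields.
        return headers.map((h, i) => `${h}: ${cells[i] ?? ""}`).join(", ");
    });
}
```

Each returned string could then be embedded and stored as its own entry, so a query can match an individual row rather than the whole file.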

GitHub workflow

The library now uses GitHub Actions to verify that each PR compiles and builds on Node versions 18, 20 and 22. This runs automatically on every PR.