Ollama AI

A Ruby gem for interacting with Ollama's API that allows you to run open source AI LLMs (Large Language Models) locally.

This Gem is designed to provide low-level access to Ollama, enabling people to build abstractions on top of it. If you are interested in more high-level abstractions or more user-friendly tools, you may want to consider Nano Bots 💎 🤖.

TL;DR and Quick Start

gem 'ollama-ai', '~> 1.3.0'

require 'ollama-ai'

client = Ollama.new(
  credentials: { address: 'http://localhost:11434' },
  options: { server_sent_events: true }
)

result = client.generate(
  { model: 'llama2',
    prompt: 'Hi!' }
)

Result:

[{ 'model' => 'llama2',
   'created_at' => '2024-01-07T01:34:02.088810408Z',
   'response' => 'Hello',
   'done' => false },
 { 'model' => 'llama2',
   'created_at' => '2024-01-07T01:34:02.419045606Z',
   'response' => '!',
   'done' => false },
 # ..
 { 'model' => 'llama2',
   'created_at' => '2024-01-07T01:34:07.680049831Z',
   'response' => '?',
   'done' => false },
 { 'model' => 'llama2',
   'created_at' => '2024-01-07T01:34:07.872170352Z',
   'response' => '',
   'done' => true,
   'context' =>
     [518, 25_580,
      # ...
      13_563, 29_973],
   'total_duration' => 11_653_781_127,
   'load_duration' => 1_186_200_439,
   'prompt_eval_count' => 22,
   'prompt_eval_duration' => 5_006_751_000,
   'eval_count' => 25,
   'eval_duration' => 5_453_058_000 }]

Index

{index}

Setup

Installing

gem install ollama-ai -v 1.3.0

gem 'ollama-ai', '~> 1.3.0'

Usage

Client

Create a new client:

require 'ollama-ai'

client = Ollama.new(
  credentials: { address: 'http://localhost:11434' },
  options: { server_sent_events: true }
)

Bearer Authentication

require 'ollama-ai'

client = Ollama.new(
  credentials: {
    address: 'http://localhost:11434',
    bearer_token: 'eyJhbG...Qssw5c'
  },
  options: { server_sent_events: true }
)

Remember that hardcoding your credentials in code is unsafe. It's preferable to use environment variables:

require 'ollama-ai'

client = Ollama.new(
  credentials: {
    address: 'http://localhost:11434',
    bearer_token: ENV['OLLAMA_BEARER_TOKEN']
  },
  options: { server_sent_events: true }
)

Methods

client.generate
client.chat
client.embeddings

client.create
client.tags
client.show
client.copy
client.delete
client.pull
client.push

generate: Generate a completion

API Documentation: https://github.com/jmorganca/ollama/blob/main/docs/api.md#generate-a-completion

Without Streaming Events

API Documentation: https://github.com/jmorganca/ollama/blob/main/docs/api.md#generate-a-completion

result = client.generate(
  { model: 'llama2',
    prompt: 'Hi!',
    stream: false }
)

Result:

[{ 'model' => 'llama2',
   'created_at' => '2024-01-07T01:35:41.951371247Z',
   'response' => "Hi there! It's nice to meet you. How are you today?",
   'done' => true,
   'context' =>
     [518, 25_580,
      # ...
      9826, 29_973],
   'total_duration' => 6_981_097_576,
   'load_duration' => 625_053,
   'prompt_eval_count' => 22,
   'prompt_eval_duration' => 4_075_171_000,
   'eval_count' => 16,
   'eval_duration' => 2_900_325_000 }]

Receiving Stream Events

API Documentation: https://github.com/jmorganca/ollama/blob/main/docs/api.md#generate-a-completion

Ensure that you have enabled Server-Sent Events before using blocks for streaming. stream: true is not necessary, as true is the default:

client.generate(
  { model: 'llama2',
    prompt: 'Hi!' }
) do |event, raw|
  puts event
end

Event:

{ 'model' => 'llama2',
  'created_at' => '2024-01-07T01:36:30.665245712Z',
  'response' => 'Hello',
  'done' => false }

You can get all the receive events at once as an array:

result = client.generate(
  { model: 'llama2',
    prompt: 'Hi!' }
)

Result:

[{ 'model' => 'llama2',
   'created_at' => '2024-01-07T01:36:30.665245712Z',
   'response' => 'Hello',
   'done' => false },
 { 'model' => 'llama2',
   'created_at' => '2024-01-07T01:36:30.927337136Z',
   'response' => '!',
   'done' => false },
 # ...
 { 'model' => 'llama2',
   'created_at' => '2024-01-07T01:36:37.249416767Z',
   'response' => '?',
   'done' => false },
 { 'model' => 'llama2',
   'created_at' => '2024-01-07T01:36:37.44041283Z',
   'response' => '',
   'done' => true,
   'context' =>
     [518, 25_580,
      # ...
      13_563, 29_973],
   'total_duration' => 10_551_395_645,
   'load_duration' => 966_631,
   'prompt_eval_count' => 22,
   'prompt_eval_duration' => 4_034_990_000,
   'eval_count' => 25,
   'eval_duration' => 6_512_954_000 }]

You can mix both as well:

result = client.generate(
  { model: 'llama2',
    prompt: 'Hi!' }
) do |event, raw|
  puts event
end

chat: Generate a chat completion

API Documentation: https://github.com/jmorganca/ollama/blob/main/docs/api.md#generate-a-chat-completion

result = client.chat(
  { model: 'llama2',
    messages: [
      { role: 'user', content: 'Hi! My name is Purple.' }
    ] }
) do |event, raw|
  puts event
end

Event:

{ 'model' => 'llama2',
  'created_at' => '2024-01-07T01:38:01.729897311Z',
  'message' => { 'role' => 'assistant', 'content' => "\n" },
  'done' => false }

Result:

[{ 'model' => 'llama2',
   'created_at' => '2024-01-07T01:38:01.729897311Z',
   'message' => { 'role' => 'assistant', 'content' => "\n" },
   'done' => false },
 { 'model' => 'llama2',
   'created_at' => '2024-01-07T01:38:02.081494506Z',
   'message' => { 'role' => 'assistant', 'content' => '*' },
   'done' => false },
 # ...
 { 'model' => 'llama2',
   'created_at' => '2024-01-07T01:38:17.855905499Z',
   'message' => { 'role' => 'assistant', 'content' => '?' },
   'done' => false },
 { 'model' => 'llama2',
   'created_at' => '2024-01-07T01:38:18.07331245Z',
   'message' => { 'role' => 'assistant', 'content' => '' },
   'done' => true,
   'total_duration' => 22_494_544_502,
   'load_duration' => 4_224_600,
   'prompt_eval_count' => 28,
   'prompt_eval_duration' => 6_496_583_000,
   'eval_count' => 61,
   'eval_duration' => 15_991_728_000 }]

Back-and-Forth Conversations

API Documentation: https://github.com/jmorganca/ollama/blob/main/docs/api.md#generate-a-chat-completion

To maintain a back-and-forth conversation, you need to append the received responses and build a history for your requests:

result = client.chat(
  { model: 'llama2',
    messages: [
      { role: 'user', content: 'Hi! My name is Purple.' },
      { role: 'assistant',
        content: 'Hi, Purple!' },
      { role: 'user', content: "What's my name?" }
    ] }
) do |event, raw|
  puts event
end

Event:

{ 'model' => 'llama2',
  'created_at' => '2024-01-07T01:40:07.352998498Z',
  'message' => { 'role' => 'assistant', 'content' => ' Pur' },
  'done' => false }

Result:

[{ 'model' => 'llama2',
   'created_at' => '2024-01-07T01:40:06.562939469Z',
   'message' => { 'role' => 'assistant', 'content' => 'Your' },
   'done' => false },
 # ...
 { 'model' => 'llama2',
   'created_at' => '2024-01-07T01:40:07.352998498Z',
   'message' => { 'role' => 'assistant', 'content' => ' Pur' },
   'done' => false },
 { 'model' => 'llama2',
   'created_at' => '2024-01-07T01:40:07.545323584Z',
   'message' => { 'role' => 'assistant', 'content' => 'ple' },
   'done' => false },
 { 'model' => 'llama2',
   'created_at' => '2024-01-07T01:40:07.77769408Z',
   'message' => { 'role' => 'assistant', 'content' => '!' },
   'done' => false },
 { 'model' => 'llama2',
   'created_at' => '2024-01-07T01:40:07.974165849Z',
   'message' => { 'role' => 'assistant', 'content' => '' },
   'done' => true,
   'total_duration' => 11_482_012_681,
   'load_duration' => 4_246_882,
   'prompt_eval_count' => 57,
   'prompt_eval_duration' => 10_387_150_000,
   'eval_count' => 6,
   'eval_duration' => 1_089_249_000 }]

embeddings: Generate Embeddings

API Documentation: https://github.com/jmorganca/ollama/blob/main/docs/api.md#generate-embeddings

result = client.embeddings(
  { model: 'llama2',
    prompt: 'Hi!' }
)

Result:

[{ 'embedding' =>
   [0.6970467567443848, -2.248202085494995,
    # ...
    -1.5994540452957153, -0.3464218080043793] }]

Models

create: Create a Model

API Documentation: https://github.com/jmorganca/ollama/blob/main/docs/api.md#create-a-model

result = client.create(
  { name: 'mario',
    modelfile: "FROM llama2\nSYSTEM You are mario from Super Mario Bros." }
) do |event, raw|
  puts event
end

Event:

{ 'status' => 'reading model metadata' }

Result:

[{ 'status' => 'reading model metadata' },
 { 'status' => 'creating system layer' },
 { 'status' =>
   'using already created layer sha256:4eca7304a07a42c48887f159ef5ad82ed5a5bd30fe52db4aadae1dd938e26f70' },
 { 'status' =>
   'using already created layer sha256:876a8d805b60882d53fed3ded3123aede6a996bdde4a253de422cacd236e33d3' },
 { 'status' =>
   'using already created layer sha256:a47b02e00552cd7022ea700b1abf8c572bb26c9bc8c1a37e01b566f2344df5dc' },
 { 'status' =>
   'using already created layer sha256:f02dd72bb2423204352eabc5637b44d79d17f109fdb510a7c51455892aa2d216' },
 { 'status' =>
   'writing layer sha256:1741cf59ce26ff01ac614d31efc700e21e44dd96aed60a7c91ab3f47e440ef94' },
 { 'status' =>
   'writing layer sha256:e8bcbb2eebad88c2fa64bc32939162c064be96e70ff36aff566718fc9186b427' },
 { 'status' => 'writing manifest' },
 { 'status' => 'success' }]

After creation, you can use it:

client.generate(
  { model: 'mario',
    prompt: 'Hi! Who are you?' }
) do |event, raw|
  print event['response']
end

Woah! adjusts sunglasses It's-a me, Mario! winks You must be a new friend I've-a met here in the Mushroom Kingdom. tips top hat What brings you to this neck of the woods? Maybe you're looking for-a some help on your adventure? nods Just let me know, and I'll do my best to-a assist ya! 😃

tags: List Local Models

API Documentation: https://github.com/jmorganca/ollama/blob/main/docs/api.md#list-local-models

result = client.tags

Result:

[{ 'models' =>
   [{ 'name' => 'llama2:latest',
      'modified_at' => '2024-01-06T15:06:23.6349195-03:00',
      'size' => 3_826_793_677,
      'digest' =>
      '78e26419b4469263f75331927a00a0284ef6544c1975b826b15abdaef17bb962',
      'details' =>
      { 'format' => 'gguf',
        'family' => 'llama',
        'families' => ['llama'],
        'parameter_size' => '7B',
        'quantization_level' => 'Q4_0' } },
    { 'name' => 'mario:latest',
      'modified_at' => '2024-01-06T22:41:59.495298101-03:00',
      'size' => 3_826_793_787,
      'digest' =>
      '291f46d2fa687dfaff45de96a8cb6e32707bc16ec1e1dfe8d65e9634c34c660c',
      'details' =>
      { 'format' => 'gguf',
        'family' => 'llama',
        'families' => ['llama'],
        'parameter_size' => '7B',
        'quantization_level' => 'Q4_0' } }] }]

show: Show Model Information

API Documentation: https://github.com/jmorganca/ollama/blob/main/docs/api.md#show-model-information

result = client.show(
  { name: 'llama2' }
)

Result:

[{ 'license' =>
     "LLAMA 2 COMMUNITY LICENSE AGREEMENT\t\n" \
     # ...
     "* Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama..." \
     "\n",
   'modelfile' =>
     "# Modelfile generated by \"ollama show\"\n" \
     # ...
     'PARAMETER stop "<</SYS>>"',
   'parameters' =>
     "stop                           [INST]\n" \
     "stop                           [/INST]\n" \
     "stop                           <<SYS>>\n" \
     'stop                           <</SYS>>',
     'template' =>
     "[INST] <<SYS>>{{ .System }}<</SYS>>\n\n{{ .Prompt }} [/INST]\n",
   'details' =>
     { 'format' => 'gguf',
       'family' => 'llama',
       'families' => ['llama'],
       'parameter_size' => '7B',
       'quantization_level' => 'Q4_0' } }]

copy: Copy a Model

API Documentation: https://github.com/jmorganca/ollama/blob/main/docs/api.md#copy-a-model

result = client.copy(
  { source: 'llama2',
    destination: 'llama2-backup' }
)

Result:

true

If the source model does not exist:

begin
  result = client.copy(
    { source: 'purple',
      destination: 'purple-backup' }
  )
rescue Ollama::Errors::OllamaError => error
  puts error.class # Ollama::Errors::RequestError
  puts error.message # 'the server responded with status 404'

  puts error.payload
  # { source: 'purple',
  #   destination: 'purple-backup',
  #   ...
  # }

  puts error.request.inspect
  # #<Faraday::ResourceNotFound response={:status=>404, :headers...
end

delete: Delete a Model

API Documentation: https://github.com/jmorganca/ollama/blob/main/docs/api.md#delete-a-model

result = client.delete(
  { name: 'llama2' }
)

Result:

true

If the model does not exist:

begin
  result = client.delete(
    { name: 'llama2' }
  )
rescue Ollama::Errors::OllamaError => error
  puts error.class # Ollama::Errors::RequestError
  puts error.message # 'the server responded with status 404'

  puts error.payload
  # { name: 'llama2',
  #   ...
  # }

  puts error.request.inspect
  # #<Faraday::ResourceNotFound response={:status=>404, :headers...
end

pull: Pull a Model

API Documentation: https://github.com/jmorganca/ollama/blob/main/docs/api.md#pull-a-model

result = client.pull(
  { name: 'llama2' }
) do |event, raw|
  puts event
end

Event:

{ 'status' => 'pulling manifest' }

Result:

[{ 'status' => 'pulling manifest' },
 { 'status' => 'pulling 4eca7304a07a',
   'digest' =>
   'sha256:4eca7304a07a42c48887f159ef5ad82ed5a5bd30fe52db4aadae1dd938e26f70',
   'total' => 1_602_463_008,
   'completed' => 1_602_463_008 },
 # ...
 { 'status' => 'verifying sha256 digest' },
 { 'status' => 'writing manifest' },
 { 'status' => 'removing any unused layers' },
 { 'status' => 'success' }]

push: Push a Model

Documentation: API and Publishing Your Model.

You need to create an account at https://ollama.ai and add your Public Key at https://ollama.ai/settings/keys.

Your keys are located in /usr/share/ollama/.ollama/. You may need to copy them to your user directory:

sudo cp /usr/share/ollama/.ollama/id_ed25519 ~/.ollama/
sudo cp /usr/share/ollama/.ollama/id_ed25519.pub ~/.ollama/

Copy your model to your user namespace:

client.copy(
  { source: 'mario',
    destination: 'your-user/mario' }
)

And push it:

result = client.push(
  { name: 'your-user/mario' }
) do |event, raw|
  puts event
end

Event:

{ 'status' => 'retrieving manifest' }

Result:

[{ 'status' => 'retrieving manifest' },
 { 'status' => 'pushing 4eca7304a07a',
   'digest' =>
   'sha256:4eca7304a07a42c48887f159ef5ad82ed5a5bd30fe52db4aadae1dd938e26f70',
   'total' => 1_602_463_008,
   'completed' => 1_602_463_008 },
 # ...
 { 'status' => 'pushing e8bcbb2eebad',
   'digest' =>
   'sha256:e8bcbb2eebad88c2fa64bc32939162c064be96e70ff36aff566718fc9186b427',
   'total' => 555,
   'completed' => 555 },
 { 'status' => 'pushing manifest' },
 { 'status' => 'success' }]

Modes

Text

You can use the generate or chat methods for text.

Image

Courtesy of Unsplash

You need to choose a model that supports images, like LLaVA or bakllava, and encode the image as Base64.

Depending on your hardware, some models that support images can be slow, so you may want to increase the client timeout:

client = Ollama.new(
  credentials: { address: 'http://localhost:11434' },
  options: {
    server_sent_events: true,
    connection: { request: { timeout: 120, read_timeout: 120 } } }
)

Using the generate method:

require 'base64'

client.generate(
  { model: 'llava',
    prompt: 'Please describe this image.',
    images: [Base64.strict_encode64(File.read('piano.jpg'))] }
) do |event, raw|
  print event['response']
end

Output:

The image is a black and white photo of an old piano, which appears to be in need of maintenance. A chair is situated right next to the piano. Apart from that, there are no other objects or people visible in the scene.

Using the chat method:

require 'base64'

result = client.chat(
  { model: 'llava',
    messages: [
      { role: 'user',
        content: 'Please describe this image.',
        images: [Base64.strict_encode64(File.read('piano.jpg'))] }
    ] }
) do |event, raw|
  puts event
end

Output:

The image displays an old piano, sitting on a wooden floor with black keys. Next to the piano, there is another keyboard in the scene, possibly used for playing music.

On top of the piano, there are two mice placed in different locations within its frame. These mice might be meant for controlling the music being played or simply as decorative items. The overall atmosphere seems to be focused on artistic expression through this unique instrument.

Streaming and Server-Sent Events (SSE)

Server-Sent Events (SSE) is a technology that allows certain endpoints to offer streaming capabilities, such as creating the impression that "the model is typing along with you," rather than delivering the entire answer all at once.

You can set up the client to use Server-Sent Events (SSE) for all supported endpoints:

client = Ollama.new(
  credentials: { address: 'http://localhost:11434' },
  options: { server_sent_events: true }
)

Or, you can decide on a request basis:

result = client.generate(
  { model: 'llama2',
    prompt: 'Hi!' },
  server_sent_events: true
) do |event, raw|
  puts event
end

With Server-Sent Events (SSE) enabled, you can use a block to receive partial results via events. This feature is particularly useful for methods that offer streaming capabilities, such as generate: Receiving Stream Events

Server-Sent Events (SSE) Hang

Method calls will hang until the server-sent events finish, so even without providing a block, you can obtain the final results of the received events: Receiving Stream Events

New Functionalities and APIs

Ollama may launch a new endpoint that we haven't covered in the Gem yet. If that's the case, you may still be able to use it through the request method. For example, generate is just a wrapper for api/generate, which you can call directly like this:

result = client.request(
  'api/generate',
  { model: 'llama2',
    prompt: 'Hi!' },
  request_method: 'POST', server_sent_events: true
)

Request Options

Adapter

The gem uses Faraday with the Typhoeus adapter by default.

You can use a different adapter if you want:

require 'faraday/net_http'

client = Ollama.new(
  credentials: { address: 'http://localhost:11434' },
  options: { connection: { adapter: :net_http } }
)

Timeout

You can set the maximum number of seconds to wait for the request to complete with the timeout option:

client = Ollama.new(
  credentials: { address: 'http://localhost:11434' },
  options: { connection: { request: { timeout: 5 } } }
)

You can also have more fine-grained control over Faraday's Request Options if you prefer:

client = Ollama.new(
  credentials: { address: 'http://localhost:11434' },
  options: {
    connection: {
      request: {
        timeout: 5,
        open_timeout: 5,
        read_timeout: 5,
        write_timeout: 5
      }
    }
  }
)

Error Handling

Rescuing

require 'ollama-ai'

begin
  client.chat_completions(
    { model: 'llama2',
      prompt: 'Hi!' }
  )
rescue Ollama::Errors::OllamaError => error
  puts error.class # Ollama::Errors::RequestError
  puts error.message # 'the server responded with status 500'

  puts error.payload
  # { model: 'llama2',
  #   prompt: 'Hi!',
  #   ...
  # }

  puts error.request.inspect
  # #<Faraday::ServerError response={:status=>500, :headers...
end

For Short

require 'ollama-ai/errors'

begin
  client.chat_completions(
    { model: 'llama2',
      prompt: 'Hi!' }
  )
rescue OllamaError => error
  puts error.class # Ollama::Errors::RequestError
end

Errors

OllamaError

BlockWithoutServerSentEventsError

RequestError

Development

bundle
rubocop -A

bundle exec ruby spec/tasks/run-client.rb
bundle exec ruby spec/tasks/test-encoding.rb

Purpose

This Gem is designed to provide low-level access to Ollama, enabling people to build abstractions on top of it. If you are interested in more high-level abstractions or more user-friendly tools, you may want to consider Nano Bots 💎 🤖.

Publish to RubyGems

gem build ollama-ai.gemspec

gem signin

gem push ollama-ai-1.3.0.gem

Updating the README

Install Babashka:

curl -s https://raw.githubusercontent.com/babashka/babashka/master/install | sudo bash

Update the template.md file and then:

bb tasks/generate-readme.clj

Trick for automatically updating the README.md when template.md changes:

sudo pacman -S inotify-tools # Arch / Manjaro
sudo apt-get install inotify-tools # Debian / Ubuntu / Raspberry Pi OS
sudo dnf install inotify-tools # Fedora / CentOS / RHEL

while inotifywait -e modify template.md; do bb tasks/generate-readme.clj; done

Trick for Markdown Live Preview:

pip install -U markdown_live_preview

mlp README.md -p 8076

Resources and References

These resources and references may be useful throughout your learning process:

Ollama Official Website
Ollama GitHub
Ollama API Documentation

Disclaimer

This is not an official Ollama project, nor is it affiliated with Ollama in any way.

This software is distributed under the MIT License. This license includes a disclaimer of warranty. Moreover, the authors assume no responsibility for any damage or costs that may result from using this project. Use the Ollama AI Ruby Gem at your own risk.

Files

template.md

Latest commit

History

template.md

File metadata and controls

Ollama AI

TL;DR and Quick Start

Index

Setup

Installing

Usage

Client

Bearer Authentication

Methods

generate: Generate a completion

Without Streaming Events

Receiving Stream Events

chat: Generate a chat completion

Back-and-Forth Conversations

embeddings: Generate Embeddings

Models

create: Create a Model

tags: List Local Models

show: Show Model Information

copy: Copy a Model

delete: Delete a Model

pull: Pull a Model

push: Push a Model

Modes

Text

Image

Streaming and Server-Sent Events (SSE)

Server-Sent Events (SSE) Hang

New Functionalities and APIs

Request Options

Adapter

Timeout

Error Handling

Rescuing

For Short

Errors

Development

Purpose

Publish to RubyGems

Updating the README

Resources and References

Disclaimer