Harvest

Portable log aggregation tool for middle-scale system operation/troubleshooting.

(screencast)

Harvest provides the hrv command with the following features.

  • Agentless.
  • Portable.
  • Only 1 config file.
  • Fetch various remote/local log data via SSH/exec/Kubernetes API. ( hrv fetch )
  • Output all fetched logs in the order of timestamp. ( hrv cat )
  • Stream various remote/local logs via SSH/exec/Kubernetes API. ( hrv stream )
  • Copy remote/local raw logs via SSH/exec. ( hrv cp )

Quick Start ( for Kubernetes )

$ hrv generate-k8s-config > cluster.yml
$ hrv stream -c cluster.yml --tag='kube_apiserver or coredns' --with-path --with-timestamp
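
The generated cluster.yml lists targetSets for the pods in your Kubernetes contexts. As a hypothetical sketch only, modeled on the k8s targetSet format shown under Usage below (the context, namespace, and pod names here are placeholders, not actual output), it could look like:

---
targetSets:
  -
    description: kube-apiserver
    type: k8s
    sources:
      - 'k8s://my-context/kube-system/kube-apiserver*'
    tags:
      - kube_apiserver
  -
    description: coredns
    type: k8s
    sources:
      - 'k8s://my-context/kube-system/coredns*'
    tags:
      - coredns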

Usage

🪲 Fetch and output remote/local log data

1. Set log sources (and log type) in config.yml

---
targetSets:
  -
    description: webproxy syslog
    type: syslog
    sources:
      - 'ssh://webproxy.example.com/var/log/syslog*'
    tags:
      - webproxy
      - syslog
  -
    description: webproxy NGINX access log
    type: combinedLog
    sources:
      - 'ssh://webproxy.example.com/var/log/nginx/access_log*'
    tags:
      - webproxy
      - nginx
  -
    description: app log
    type: regexp
    regexp: 'time:([^\t]+)'
    timeFormat: 'Jan 02 15:04:05' # Golang time format and 'unixtime'
    timeZone: '+0900'
    sources:
      - 'ssh://app-1.example.com/var/log/ltsv.log*'
      - 'ssh://app-2.example.com/var/log/ltsv.log*'
      - 'ssh://app-3.example.com/var/log/ltsv.log*'
    tags:
      - app
  -
    description: db dump log
    type: regexp
    regexp: '"ts":"([^"]+)"'
    timeFormat: '2006-01-02T15:04:05.999-0700'
    sources:
      - 'ssh://db.example.com/var/log/tcpdp/eth0/dump*'
    tags:
      - db
      - query
  -
    description: PostgreSQL log
    type: regexp
    regexp: '^\[?(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \w{3})'
    timeFormat: '2006-01-02 15:04:05 MST'
    multiLine: true
    sources:
      - 'ssh://db.example.com/var/log/postgresql/postgresql*'
    tags:
      - db
      - postgresql
  -
    description: local Apache access log
    type: combinedLog
    sources:
      - 'file:///path/to/httpd/access.log'
    tags:
      - httpd
  -
    description: api on Kubernetes
    type: k8s
    sources:
      - 'k8s://context-name/namespace/pod-name*'
    tags:
      - api
      - k8s
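
The timeFormat values above use Go's reference-time layout (the special timestamp Mon Jan 2 15:04:05 MST 2006 rewritten in the desired format) or the literal string 'unixtime'. As a minimal sketch, independent of hrv itself, this is how the PostgreSQL layout from the config parses a timestamp in Go:

package main

import (
	"fmt"
	"time"
)

func main() {
	// Layouts are expressed against Go's reference time:
	// Mon Jan 2 15:04:05 MST 2006
	layout := "2006-01-02 15:04:05 MST"
	t, err := time.Parse(layout, "2019-09-24 08:01:00 UTC")
	if err != nil {
		panic(err)
	}
	fmt.Println(t.UTC()) // 2019-09-24 08:01:00 +0000 UTC
}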

You can use hrv configtest to test the config.

$ hrv configtest -c config.yml

2. Fetch target log data via SSH/exec/Kubernetes API ( hrv fetch )

$ hrv fetch -c config.yml --tag=webproxy,db

3. Output log data ( hrv cat )

$ hrv cat harvest-20181215T2338+900.db --with-timestamp --with-host --with-path | less -R

4. Count log data ( hrv count )

$ hrv count harvest-20191015T2338+900.db -g minute -g webproxy -b db
ts      webproxy db
2019-09-24 08:01:00     9618    5910
2019-09-24 08:02:00     9767    5672
2019-09-24 08:03:00     10815   7394
2019-09-24 08:04:00     11782   7109
2019-09-24 08:05:00     9896    6346
[...]
2019-09-24 08:24:00     11619   5646
2019-09-24 08:25:00     10541   6097
2019-09-24 08:26:00     11336   5264
2019-09-24 08:27:00     1102    5261
2019-09-24 08:28:00     1318    6660
2019-09-24 08:29:00     10362   5663
2019-09-24 08:30:00     11136   5373
2019-09-24 08:31:00     1748    1340

🪲 Stream remote/local logs

2. Stream target logs via SSH/exec/Kubernetes API ( hrv stream )

$ hrv stream -c config.yml --with-timestamp --with-host --with-path --with-tag

🪲 Copy remote/local raw logs

2. Copy remote/local raw logs to local directory via SSH/exec ( hrv cp )

$ hrv cp -c config.yml

--tag filter operators

The following operators can be used to filter targets:

not, and, or, !, &&, ||

$ hrv stream -c config.yml --tag='webproxy or db' --with-timestamp --with-host --with-path

, is converted to or

$ hrv stream -c config.yml --tag='webproxy,db'

is converted to

$ hrv stream -c config.yml --tag='webproxy or db'
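
Operators can also be combined. For example, with the tags from the sample config above, the following should stream the db targets while excluding the PostgreSQL log:

$ hrv stream -c config.yml --tag='db and not postgresql'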

--source filter

You can filter targets using a regexp matched against the source.

$ hrv fetch -c config.yml --source='app-[0-9].example'

Architecture

hrv fetch and hrv cat

(architecture diagram)

hrv stream

(architecture diagram)

Installation

$ brew install k1LoW/tap/harvest

or

$ go get github.com/k1LoW/harvest/cmd/hrv

What is "middle-scale system"?

  • < 50 instances
  • < 1 million logs per hrv fetch

What if you are operating a large-scale/super-large-scale/hyper-large-scale system?

Let's consider an agent-based log collector/platform, a service mesh, and a distributed tracing platform!

Internal

Requirements

  • UNIX commands
    • date
    • find
    • grep
    • head
    • ls
    • tail
    • xargs
    • zcat
  • sudo
  • SQLite

WANT

  • tag DAG
  • Viewer / Visualizer

References

  • Hayabusa: A Simple and Fast Full-Text Search Engine for Massive System Log Data
  • Keeps it simple with a combination of commands.
    • Full-Text Search Engine using SQLite FTS.
  • stern: ⎈ Multi pod and container log tailing for Kubernetes
    • Multiple Kubernetes log streaming architecture.