This repository has been archived by the owner on Mar 6, 2024. It is now read-only.

goes-funky/tap-gemini

 
 


Yahoo Gemini Tap

This is a Singer tap that produces JSON-formatted data following the Singer spec.

This tap:

  • Pulls raw data from Yahoo Gemini reporting API
  • Extracts the reporting cubes detailed below in the "Table Schemas" section.
  • Outputs the schema for each resource
  • Incrementally pulls data based on the input state
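As an illustration of the output described above, the sketch below reads Singer messages (one JSON object per line, with a `type` of `SCHEMA`, `RECORD` or `STATE`, as in the Singer spec) and summarises them. The sample lines are invented for this example; in particular, the layout of the `STATE` value is an assumption and is not taken from this tap.

```python
import json
from collections import Counter


def summarize_singer_stream(lines):
    """Count RECORD messages per stream and keep the last STATE value seen."""
    record_counts = Counter()
    last_state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message["type"] == "RECORD":
            record_counts[message["stream"]] += 1
        elif message["type"] == "STATE":
            last_state = message["value"]
    return record_counts, last_state


# Hypothetical sample output; real records carry the report's own columns.
sample_output = [
    '{"type": "SCHEMA", "stream": "keyword_stats", "schema": {"properties": {}}, "key_properties": []}',
    '{"type": "RECORD", "stream": "keyword_stats", "record": {"Day": "2019-01-01"}}',
    '{"type": "RECORD", "stream": "keyword_stats", "record": {"Day": "2019-01-02"}}',
    '{"type": "STATE", "value": {"bookmarks": {"keyword_stats": "2019-01-02"}}}',
]

counts, state = summarize_singer_stream(sample_output)
print(dict(counts))
print(state)
```

The last `STATE` value is what a later run would receive as its input state in order to resume incrementally.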

Connecting

Follow the guidelines below to use this tap in Stitch or in a Python environment.

Stitch

To install tap-gemini in Stitch, you need to create an API application and generate OAuth 2.0 credentials. See the authentication documentation on the Oath website for instructions, including their OAuth 2.0 guide. Enter the client ID as the username and the refresh token as the password.

Python

Follow the instructions below to use the tap as a Python package.

Installation

Create a virtual environment and install the package using pip. These instructions are bash commands that will work on Unix-based platforms.

python3 -m venv ~/.virtualenvs/tap-gemini
source ~/.virtualenvs/tap-gemini/bin/activate
pip install tap-gemini
deactivate

Execution

Use the following command to run the tap with the configuration specified in a JSON file (here ~/config.json):

~/.virtualenvs/tap-gemini/bin/tap-gemini --config ~/config.json

To output the data to a CSV file, pipe the data stream into target-csv:

~/.virtualenvs/tap-gemini/bin/tap-gemini --config ~/config.json | ~/.virtualenvs/target-csv/bin/target-csv

Configuration

The options in the configuration file are described below.

Mandatory settings

These settings must be specified:

  • start_date: The lower bound of the historical load time range
  • username: The OAuth client ID
  • password: The OAuth client secret
  • refresh_token: The OAuth refresh token
  • advertiser_ids: A list of advertiser (account) ID numbers
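The mandatory settings above could be assembled and checked as in the following minimal sketch. All values are placeholders, the timestamp format of start_date is an assumption, and `missing_settings` is a hypothetical helper written for this example rather than part of the tap.

```python
import json

REQUIRED_KEYS = (
    "start_date",
    "username",
    "password",
    "refresh_token",
    "advertiser_ids",
)


def missing_settings(config):
    """Return the mandatory setting names that are absent from config."""
    return [key for key in REQUIRED_KEYS if key not in config]


# Placeholder values only; substitute your own credentials and account IDs.
example_config = {
    "start_date": "2019-01-01T00:00:00Z",  # assumed format, not confirmed by the source
    "username": "YOUR_CLIENT_ID",
    "password": "YOUR_CLIENT_SECRET",
    "refresh_token": "YOUR_REFRESH_TOKEN",
    "advertiser_ids": [123456],
}

problems = missing_settings(example_config)
if problems:
    raise SystemExit("Missing mandatory settings: " + ", ".join(problems))

# Print the JSON document; save it as config.json and pass it with --config.
config_json = json.dumps(example_config, indent=2)
print(config_json)
```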

Optional settings

These additional options are available:

  • api_version: The API version to use
  • session: Options for the HTTP session, such as headers and proxies. These will be passed into requests.Session as keyword arguments.
  • sandbox: Use the API testing environment
  • poll_interval: The number of seconds (minimum: 1.0) between poll attempts when waiting for a report to be ready for download.
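To show what poll_interval controls, here is a rough sketch of a polling loop that enforces the 1.0 second minimum. It does not reproduce the tap's own code: `wait_until_ready`, `check_ready`, `max_attempts` and the injectable `sleep` argument are hypothetical names introduced for illustration.

```python
import time

MIN_POLL_INTERVAL = 1.0  # minimum stated for the poll_interval option


def wait_until_ready(check_ready, poll_interval=1.0, max_attempts=10, sleep=time.sleep):
    """Call check_ready() until it returns True, pausing between attempts.

    Returns the number of attempts that were needed.
    """
    if poll_interval < MIN_POLL_INTERVAL:
        raise ValueError("poll_interval must be at least %.1f seconds" % MIN_POLL_INTERVAL)
    for attempt in range(1, max_attempts + 1):
        if check_ready():
            return attempt
        sleep(poll_interval)
    raise TimeoutError("report not ready after %d attempts" % max_attempts)


# Simulated report status: not ready twice, then ready.
statuses = iter([False, False, True])
attempts = wait_until_ready(
    lambda: next(statuses),
    poll_interval=1.0,
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
print(attempts)
```

Passing the sleep function as an argument keeps the loop easy to test without real delays.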

Replication

Each incremental report run begins at the timestamp when the books were marked closed (i.e. the point after which no further changes to the data are written).

For historic data loads, the reports will run over the largest possible time frame. Some reports have a limited time range as detailed below, where the "Days" column shows the largest available number of days prior to the current calendar date:

Table                    Days
performance_stats        15
slot_performance_stats   15
product_ads              400
site_performance_stats   400
keyword_stats            750
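The effect of these limits on a historic load can be sketched as follows. This is an illustration of the rule described above, not the tap's actual logic; `earliest_start` is a hypothetical helper, and tables absent from the mapping are assumed to have no limit.

```python
from datetime import date, timedelta

# Largest available number of days before the current date, from the table above.
MAX_DAYS = {
    "performance_stats": 15,
    "slot_performance_stats": 15,
    "product_ads": 400,
    "site_performance_stats": 400,
    "keyword_stats": 750,
}


def earliest_start(table, requested_start, today):
    """Clamp the requested start date to the table's available time range."""
    limit = MAX_DAYS.get(table)
    if limit is None:
        # Assumption: tables not listed above have no restriction.
        return requested_start
    return max(requested_start, today - timedelta(days=limit))


today = date(2019, 6, 30)
requested = date(2019, 1, 1)
print(earliest_start("performance_stats", requested, today))
print(earliest_start("keyword_stats", requested, today))
```

With these inputs the first call is clamped to 15 days before the chosen date, while the second returns the requested start date unchanged because it falls inside the 750-day range.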

Table Schemas

Most tables have the following primary key columns:

  • Advertiser ID
  • Day

The table schemas are detailed below.

Reports

The following reporting cubes are implemented:

These cubes are not implemented:

Objects

The following account structure objects are implemented.

The other API objects are implemented in Python but schema and metadata definitions need to be written.

Unsupported fields

Some fields have been excluded from the schema (i.e. the meta-data inclusion is set to unsupported) because they are incompatible with other fields. This could probably be fixed by defining meta-data exclusions that depend on other fields.
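For reference, a Singer catalog marks such fields with a metadata entry whose breadcrumb points at the property and whose inclusion value is unsupported. The sketch below lists those fields for one stream entry; the stream and field names in the sample are invented for illustration and are not taken from this tap's schemas.

```python
def unsupported_fields(stream_entry):
    """Return the property names whose metadata inclusion is 'unsupported'."""
    fields = []
    for item in stream_entry.get("metadata", []):
        breadcrumb = item.get("breadcrumb", [])
        inclusion = item.get("metadata", {}).get("inclusion")
        is_property = len(breadcrumb) == 2 and breadcrumb[0] == "properties"
        if is_property and inclusion == "unsupported":
            fields.append(breadcrumb[1])
    return fields


# Hypothetical catalog entry; the field names are made up for this example.
sample_stream = {
    "tap_stream_id": "example_report",
    "metadata": [
        {"breadcrumb": [], "metadata": {"inclusion": "available"}},
        {"breadcrumb": ["properties", "field_a"], "metadata": {"inclusion": "available"}},
        {"breadcrumb": ["properties", "field_b"], "metadata": {"inclusion": "unsupported"}},
    ],
}

print(unsupported_fields(sample_stream))
```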

Errata

  • All dates and times use the advertiser time zone.

Contributing


Copyright © 2019 Stitch
