Skip to content

Latest commit

 

History

History
111 lines (101 loc) · 3.45 KB

sql-actor.md

File metadata and controls

111 lines (101 loc) · 3.45 KB

The SqlActor is a generic sql actor which can be used to execute any sql-statements, including select, insert, update, merge, delete, create, alter, etc.

The sql statement can be specified by either sqlString or sqlFile property. However, at least and only one of them must be defined.

Actor Class: com.qwshen.etl.common.SqlActor

The definition for the SqlActor:

  • In YAML format
    actor:
      type: sql-actor
      properties:
        sqlString: >
          select
            substr(row_value, 1, 12) as event_id,
            substr(row_value, 13, 16) as event_time,
            substr(row_value, 29, 12) as event_host,
            substr(row_value, 41, 64) as event_location
          from events_raw
          where row_no not in (1, 2)

or

    actor:
      type: sql-actor
      properties:
        sqlFile: scripts/event_raw.sql
  • In JSON format
  {
    "actor": {
      "type": "sql-actor",
      "properties": {
        "sqlString": "select * from events_raw"
      }
    }
  }

or

  {
    "actor": {
      "type": "sql-actor",
      "properties": {
        "sqlFile": "scripts/event_raw.sql"
      }
    }
  }
  • In XML format
    <actor type="sql-actor">
        <properties>
            <sqlString>
                select
                    substr(row_value, 1, 12) as event_id,
                    substr(row_value, 13, 16) as event_time,
                    substr(row_value, 29, 12) as event_host,
                    substr(row_value, 41, 64) as event_location
                from events_raw
                where row_no not in (1, 2)
            </sqlString>
        </properties>
    </actor>

or

    <actor type="sql-actor">
        <properties>
            <sqlFile>scripts/event_raw.sql</sqlFile>
        </properties>
    </actor>

The sql-statement specified by sqlString or from sqlFile can have multiple valid sql-sub-statements separated by semi-colon (;), including set statements. For example:

  set run_date = concat('${runActor}', ' at ', '${runTime}');
  set view_users = (select * from users);
  setrun count_users = (select count(*) from users);
  with t as (
    select distinct
      u.user_id,
      u.gender,
      cast(u.birthyear as int) as birthyear,
      t.timestamp,
      t.interested,
      concat('${application.process_date}', '-', concat('${run_date}', '-', '${count_users}')) as process_date,
      t.event as event_id
    from train t
      left join ${view_users} u on t.user = u.user_id
  )
  select * from t      

In the above example:

  • The ${runActor} and ${runTime} are defined in either application.conf, job-submit arguments or pipeline;
  • The expression - concat('${runActor}', ' at ', '${runTime}') is calculated and assigned to ${run_date}, which is referenced later;
  • The expression - (select * from users) is not executed but as an alias simply assigned to ${view_usres}, which later used for a join.
  • The expression - (select count(*) from users) is calculated, and the value is assigned to ${count_users}.

The above example also shows the difference between set and setrun:

  • setrun statements get evaluated immediately.
  • set statements are lazily evaluated.

Please note:

  • Any variables defined through set or setrun statements can be referenced by any down-stream actors.
  • When a sql-actor handling multiple sql-statement in its sql-string or sql-file, only the result of the last sql-statement is outputted.