Skip to content

Module zillion.report

BaseCombinedResult

class zillion.report.BaseCombinedResult(warehouse, ds_query_results, primary_ds_dimensions, adhoc_datasources=None)

A combination of datasource query results

Parameters:

  • warehouse - (Warehouse) A zillion warehouse
  • ds_query_results - (list) A list of DataSourceQueryResult objects
  • primary_ds_dimensions - (list) A list of dimensions that will be used to create the hash primary key of the combined result table
  • adhoc_datasources - (list, optional) A list of FieldManagers specific to this combined result
add_warning(self, msg, log=True)
clean_up(self)

Clean up any resources that can/should be cleaned up

create_table(self)

Create the combined result table

get_conn(self)

Get a database connection to the combined result database

get_cursor(self, conn)

Get a cursor from a database connection

get_final_result(self, metrics, dimensions, row_filters, rollup, pivot, order_by, limit, limit_first)

Get the final result from the combined result table

ifnull_clause(self, column_clause, ifnull_value)

Produce an ifnull clause specific to this database dialect

load_table(self)

Load the combined result table

DataSourceQuery

Bases: zillion.report.ExecutionStateMixin, tlbx.logging_utils.PrintMixin

class zillion.report.DataSourceQuery(warehouse, metrics, dimensions, criteria, table_set)

Build a query to run against a particular datasource

Parameters:

  • warehouse - (Warehouse) A zillion warehouse
  • metrics - (OrderedDict) An OrderedDict mapping metric names to Metric objects
  • dimensions - (OrderedDict) An OrderedDict mapping dimension names to Dimension objects
  • criteria - (list) A list of criteria to be applied when querying. See the Report docs for more details.
  • table_set - (TableSet) Build the query against this set of tables that supports the requested metrics and grain
add_metric(self, metric)

Add a metric to this query

Parameters:

  • metric - (str) A metric name
covers_field(self, field)

Check whether a field is covered in this query

Parameters:

  • field - (str) A field name

Returns:

(bool) - True if this field is covered in this query

covers_metric(self, metric)

Check whether a metric is covered in this query

Parameters:

  • metric - (str) A metric name

Returns:

(bool) - True if this metric is covered in this query

execute(self, timeout=None, label=None)

Execute the datasource query

Parameters:

  • timeout - (float, optional) A query timeout in seconds
  • label - (str, optional) A label to apply to the SQL query

Returns:

(DataSourceQueryResult) - The result of the SQL query

get_conn(self)

Get a connection to this query's datasource

get_datasource(self)

Get a reference to the datasource for this query

get_datasource_name(self)

Get the name of the datasource used in this query

get_dialect_name(self)

Get the name of the datasource dialect

get_tables(self)

Get a reference to the tables used in this query

kill(self, main_thread=None)

Kill this datasource query

Parameters:

  • main_thread - (Thread, optional) A reference to the thread that started the query. This is used as a backup for dialects that don't have a supported way to kill a query. An exception will be asynchronously raised in this thread. It is not guaranteed to actually interrupt the query.

DataSourceQueryResult

Bases: tlbx.logging_utils.PrintMixin

class zillion.report.DataSourceQueryResult(query, data, duration)

The results for a DataSourceQuery

Parameters:

  • query - (DataSourceQuery) The DataSourceQuery that was executed
  • data - (iterable) The result rows
  • duration - (float) The duration of the query execution in seconds

DataSourceQuerySummary

Bases: tlbx.logging_utils.PrintMixin

class zillion.report.DataSourceQuerySummary(query, data, duration)

A summary of the execution results for a DataSourceQuery

Parameters:

  • query - (DataSourceQuery) The DataSourceQuery that was executed
  • data - (iterable) The result rows
  • duration - (float) The duration of the query execution in seconds
format(self)

Return a formatted summary of the DataSourceQuery results

ExecutionStateMixin

class zillion.report.ExecutionStateMixin()

A mixin to manage the state of a report or query

Report

Bases: zillion.report.ExecutionStateMixin

class zillion.report.Report(warehouse, metrics=None, dimensions=None, criteria=None, row_filters=None, rollup=None, pivot=None, order_by=None, limit=None, limit_first=False, adhoc_datasources=None, allow_partial=False)

Build a report against a warehouse. On init DataSource queries are built, but nothing is executed.

Parameters:

  • warehouse - (Warehouse) A zillion warehouse object to run the report against
  • metrics - (list, optional) A list of metric names, or dicts in the case of AdHocMetrics. These will be the measures of your report, or the statistics you are interested in computing at the given dimension grain.
  • dimensions - (list, optional) A list of dimension names to control the grain of the report. You can think of dimensions similarly to the "group by" in a SQL query.
  • criteria - (list, optional) A list of criteria to be applied when querying. Each criteria in the list is represented by a 3-item list or tuple. See core.CRITERIA_OPERATIONS for all supported operations. Note that some operations, such as "like", have varying behavior by datasource dialect. Some examples:

    • ["field_a", ">", 1]
    • ["field_b", "=", "2020-04-01"]
    • ["field_c", "like", "%example%"]
    • ["field_d", "in", ["a", "b", "c"]]
  • row_filters - (list, optional) A list of criteria to apply at the final step (combined query layer) to filter which rows get returned. The format here is the same as for the criteria arg, though the operations are limited to the values of core.ROW_FILTER_OPERATIONS.

  • rollup - (str or int, optional) Controls how metrics are rolled up / aggregated by dimension depth. If not passed no rollup will be computed. If the special value "totals" is passed, only a final tally rollup row will be added. If an int, then it controls the maximum depth to roll up the data, starting from the most granular (last) dimension of the report. Note that the rollup=3 case is like adding a totals row to the "=2" case, as a totals row is a rollup of all dimension levels. Setting rollup=len(dims) is equivalent to rollup="all". For example, if you ran a report with dimensions ["a", "b", "c"]:

    • rollup="totals" - adds a single, final rollup row
    • rollup="all" - rolls up all dimension levels
    • rollup=1 - rolls up the first dimension only
    • rollup=2 - rolls up the first two dimensions
    • rollup=3 - rolls up all three dimensions
    • Any other non-None value would raise an error
  • pivot - (list, optional) A list of dimensions to pivot to columns

  • order_by - (list, optional) A list of (field, asc/desc) tuples that control the ordering of the returned result
  • limit - (int, optional) A limit on the number of rows returned
  • limit_first - (bool, optional) Whether to apply limits before rollups/ordering
  • adhoc_datasources - (list, optional) A list of FieldManagers specific to this report
  • allow_partial - (boolean, optional) Allow reports where only some metrics can meet the requested grain.

Notes:

The order_by and limit functionality is only applied on the final/combined result, NOT in your DataSource queries. In most cases when you are dealing with DataSource tables that are of a decent size you will want to make sure to include criteria that limit the scope of your query and/or take advantage of underlying table indexing. If you were to use order_by or limit without any criteria or dimensions, you would effectively select all rows from the underlying datasource table into memory (or at least try to).

delete(warehouse, spec_id)

Delete a saved report spec

Parameters:

  • spec_id - (int) The ID of a ReportSpec to delete
execute(self)

Execute the datasource queries, combine the results, and do the final result selection. Save the ReportResult on the result attribute

from_params(warehouse, params, adhoc_datasources=None)

Build a report from a set of report params

Parameters:

  • warehouse - (Warehouse) A zillion warehouse object
  • params - (dict) A dict of Report params
  • adhoc_datasources - (list, optional) A list of FieldManagers
get_dimension_grain(self)

Get the portion of the grain specific to request dimensions

get_grain(self)

Get the grain of this report, which accounts for dimension fields required in the requested dimensions, criteria, and formula-based fields.

get_json(self)

Get a JSON representation of the Report params

get_params(self)

Get a dict of params used to create the Report

kill(self, soft=False, raise_if_failed=False)

Kill a running report

Parameters:

  • soft - (bool, optional) If true, set the report state to killed without attempting to kill any running datasource queries.
  • raise_if_failed - (bool, optional) If true, raise FailedKillException if any exceptions occurred when trying to kill datasource queries. Otherwise a warning will be emitted.
load(warehouse, spec_id, adhoc_datasources=None)

Load a report from a spec ID

Parameters:

  • warehouse - (Warehouse) A zillion warehouse object
  • spec_id - (int) A ReportSpec ID
  • adhoc_datasources - (list, optional) A list of FieldManagers
load_warehouse_id_for_report(spec_id)

Get the Warehouse ID for a particular report spec

Parameters:

  • spec_id - (int) A ReportSpec ID

Returns:

(dict) - A Warehouse ID

save(self, meta=None)

Save the report spec and return the saved spec ID

Parameters:

  • meta - (object, optional) A metadata object to be serialized as JSON and stored with the report

Returns:

(int) - The ID of the saved ReportSpec

ReportResult

Bases: tlbx.logging_utils.PrintMixin

class zillion.report.ReportResult(df, duration, query_summaries, metrics, dimensions, rollup, unsupported_grain_metrics=None, warnings=None)

Encapsulates a report result as well as some additional helpers and summary statistics.

Parameters:

  • df - (DataFrame) The DataFrame containing the final report result
  • duration - (float) The report execution duration in seconds
  • query_summaries - (list of DataSourceQuerySummary) Summaries of the underyling query results.
  • metrics - (OrderedDict) A mapping of requested metrics to Metric objects
  • dimensions - (OrderedDict) A mapping of requested dimensions to Dimension objects
  • rollup - (str or int) See the Report docs for more details.
  • unsupported_grain_metrics - (dict) A dictionary mapping metric names to reasons why they could not meet the requested grain of the Report.

SQLiteMemoryCombinedResult

Bases: zillion.report.BaseCombinedResult

class zillion.report.SQLiteMemoryCombinedResult(warehouse, ds_query_results, primary_ds_dimensions, adhoc_datasources=None)

Combine query results in an in-memory SQLite database

add_warning(self, msg, log=True)
clean_up(self)

Clean up the SQLite combined result table

create_table(self)

Create a table in the SQLite database to store the combined result

get_conn(self)

Get a SQLite memory database connection

get_cursor(self, conn)

Get a SQLite cursor from the connection

get_final_result(self, metrics, dimensions, row_filters, rollup, pivot, order_by, limit, limit_first)

Get the final reseult from the combined result table

Parameters:

  • metrics - (OrderedDict) An OrderedDict mapping metric names to Metric objects
  • dimensions - (OrderedDict) An OrderedDict mapping dimension names to Dimension objects
  • row_filters - (list) A list of criteria to filter which rows get returned
  • rollup - (str or int) Controls how metrics are rolled up / aggregated by dimension. See the Report docs for more details.
  • pivot - (list) A list of dimensions to pivot to columns
  • order_by - (list) A list of (field, asc/desc) tuples that control the ordering of the returned result
  • limit - (int) A limit on the number of rows returned
  • limit_first - (bool, optional) Whether to apply limits before rollups/ordering

Returns:

(DataFrame) - A DataFrame with the final report result

Notes:

The default ordering of operations is meant to roughly parallel that of MySQL's rollup, having, order by and limit behavior. The operations are applied in the following order: technicals, rollups, rounding, order_by, row_filters, limit, pivot. If you set limit_first=True the the row_filter and limit operations are moved ahead of the rollups: technicals, row_filters, limit, rollups, rounding, order_by, pivot.

ifnull_clause(self, column_clause, ifnull_value)

Produce an ifnull clause specific to this database dialect

load_table(self)

Load the combined result table