Module zillion.report¶

BaseCombinedResult ¶

class zillion.report.BaseCombinedResult(warehouse, ds_query_results, primary_ds_dimensions, adhoc_datasources=None)

A combination of datasource query results

Parameters:

warehouse - (Warehouse) A zillion warehouse
ds_query_results - (list) A list of DataSourceQueryResult objects
primary_ds_dimensions - (list) A list of dimensions that will be used to create the hash primary key of the combined result table
adhoc_datasources - (list, optional) A list of FieldManagers specific to this combined result

add_warning(self, msg, log=True)

clean_up(self)

Clean up any resources that can/should be cleaned up

create_table(self)

Create the combined result table

get_conn(self)

Get a database connection to the combined result database

get_cursor(self, conn)

Get a cursor from a database connection

get_final_result(self, metrics, dimensions, row_filters, rollup, pivot, order_by, limit, limit_first)

Get the final result from the combined result table

get_metric_clause(self, metric, has_formula_dims)

Get a select clause for a metric including aggregation

ifnull_clause(self, column_clause, ifnull_value)

Produce an ifnull clause specific to this database dialect

load_table(self)

Load the combined result table

DataSourceQuery ¶

Bases: zillion.report.ExecutionStateMixin, tlbx.logging_utils.PrintMixin

class zillion.report.DataSourceQuery(warehouse, metrics, dimensions, criteria, table_set)

Build a query to run against a particular datasource

Parameters:

warehouse - (Warehouse) A zillion warehouse
metrics - (OrderedDict) An OrderedDict mapping metric names to Metric objects
dimensions - (OrderedDict) An OrderedDict mapping dimension names to Dimension objects
criteria - (list) A list of criteria to be applied when querying. See the Report docs for more details.
table_set - (TableSet) Build the query against this set of tables that supports the requested metrics and grain

add_metric(self, metric, adhoc_fms=None)

Add a metric to this query

Parameters:

metric - (str) A metric name
adhoc_fms - (list, optional) A list of adhoc FieldManagers

covers_field(self, field)

Check whether a field is covered in this query

Parameters:

field - (str) A field name

Returns:

(bool) - True if this field is covered in this query

covers_metric(self, metric)

Check whether a metric is covered in this query

Parameters:

metric - (str) A metric name

Returns:

(bool) - True if this metric is covered in this query

execute(self, timeout=None, label=None)

Execute the datasource query

Parameters:

timeout - (float, optional) A query timeout in seconds
label - (str, optional) A label to apply to the SQL query

Returns:

(DataSourceQueryResult) - The result of the SQL query

get_conn(self)

Get a connection to this query's datasource

get_datasource(self)

Get a reference to the datasource for this query

get_datasource_name(self)

Get the name of the datasource used in this query

get_dialect_name(self)

Get the name of the datasource dialect

get_tables(self)

Get a reference to the tables used in this query

kill(self, main_thread=None)

Kill this datasource query

Parameters:

main_thread - (Thread, optional) A reference to the thread that started the query. This is used as a backup for dialects that don't have a supported way to kill a query. An exception will be asynchronously raised in this thread. It is not guaranteed to actually interrupt the query.

DataSourceQueryResult ¶

Bases: tlbx.logging_utils.PrintMixin

class zillion.report.DataSourceQueryResult(query, data, duration)

The results for a DataSourceQuery

Parameters:

query - (DataSourceQuery) The DataSourceQuery that was executed
data - (iterable) The result rows
duration - (float) The duration of the query execution in seconds

DataSourceQuerySummary ¶

Bases: tlbx.logging_utils.PrintMixin

class zillion.report.DataSourceQuerySummary(query, data, duration)

A summary of the execution results for a DataSourceQuery

Parameters:

query - (DataSourceQuery) The DataSourceQuery that was executed
data - (iterable) The result rows
duration - (float) The duration of the query execution in seconds

format(self)

Return a formatted summary of the DataSourceQuery results

ExecutionStateMixin ¶

class zillion.report.ExecutionStateMixin()

A mixin to manage the state of a report or query

Report ¶

Bases: zillion.report.ExecutionStateMixin

class zillion.report.Report(warehouse, metrics=None, dimensions=None, criteria=None, row_filters=None, rollup=None, pivot=None, order_by=None, limit=None, limit_first=False, adhoc_datasources=None, allow_partial=False, report_depth=1)

Build a report against a warehouse. On init DataSource queries are built, but nothing is executed.

Parameters:

warehouse - (Warehouse) A zillion warehouse object to run the report against
metrics - (list, optional) A list of metric names, or dicts in the case of AdHocMetrics. These will be the measures of your report, or the statistics you are interested in computing at the given dimension grain.
dimensions - (list, optional) A list of dimension names to control the grain of the report. You can think of dimensions similarly to the "group by" in a SQL query.
criteria - (list, optional) A list of criteria to be applied when querying. Each criteria in the list is represented by a 3-item list or tuple. See core.CRITERIA_OPERATIONS for all supported operations. Note that some operations, such as "like", have varying behavior by datasource dialect. Some examples:
- ["field_a", ">", 1]
- ["field_b", "=", "2020-04-01"]
- ["field_c", "like", "%example%"]
- ["field_d", "in", ["a", "b", "c"]]
row_filters - (list, optional) A list of criteria to apply at the final step (combined query layer) to filter which rows get returned. The format here is the same as for the criteria arg, though the operations are limited to the values of core.ROW_FILTER_OPERATIONS.
rollup - (str or int, optional) Controls how metrics are rolled up / aggregated by dimension depth. If not passed no rollup will be computed. If the special value "totals" is passed, only a final tally rollup row will be added. If an int, then it controls the maximum depth to roll up the data, starting from the most granular (last) dimension of the report. Note that the rollup=3 case is like adding a totals row to the "=2" case, as a totals row is a rollup of all dimension levels. Setting rollup=len(dims) is equivalent to rollup="all". For example, if you ran a report with dimensions ["a", "b", "c"]:
- rollup="totals" - adds a single, final rollup row
- rollup="all" - rolls up all dimension levels
- rollup=1 - rolls up the first dimension only
- rollup=2 - rolls up the first two dimensions
- rollup=3 - rolls up all three dimensions
- Any other non-None value would raise an error
pivot - (list, optional) A list of dimensions to pivot to columns
order_by - (list, optional) A list of (field, asc/desc) tuples that control the ordering of the returned result
limit - (int, optional) A limit on the number of rows returned
limit_first - (bool, optional) Whether to apply limits before rollups/ordering
adhoc_datasources - (list, optional) A list of FieldManagers specific to this report
allow_partial - (boolean, optional) Allow reports where only some metrics can meet the requested grain.
report_depth - (int, optional) The depth of subreport recursion we are currently in. This is used to prevent infinite recursion when building subreports.

Notes:

The order_by and limit functionality is only applied on the final/combined result, NOT in your DataSource queries. In most cases when you are dealing with DataSource tables that are of a decent size you will want to make sure to include criteria that limit the scope of your query and/or take advantage of underlying table indexing. If you were to use order_by or limit without any criteria or dimensions, you would effectively select all rows from the underlying datasource table into memory (or at least try to).

delete(warehouse, spec_id)

Delete a saved report spec

Parameters:

spec_id - (int) The ID of a ReportSpec to delete

execute(self)

Execute the datasource queries, combine the results, and do the final result selection. Save the ReportResult on the result attribute

from_params(warehouse, params, adhoc_datasources=None, report_depth=1)

Build a report from a set of report params

Parameters:

warehouse - (Warehouse) A zillion warehouse object
params - (dict) A dict of Report params
adhoc_datasources - (list, optional) A list of FieldManagers
report_depth - (int, optional) The depth of the subreport, used internally to prevent too much recursion.

from_text(warehouse, text, adhoc_datasources=None, allow_partial=False, report_depth=1)

Build a report from a set of report params

Parameters:

warehouse - (Warehouse) A zillion warehouse object
text - (str) A natural language query
adhoc_datasources - (list, optional) A list of FieldManagers
allow_partial - (bool, optional) Allow partial report results
report_depth - (int, optional) The depth of the subreport, used internally to prevent too much recursion.

Returns:

report - (Report) A report object

get_dimension_grain(self)

Get the portion of the grain specific to request dimensions

get_grain(self)

Get the grain of this report, which accounts for dimension fields required in the requested dimensions, criteria, and formula-based fields.

get_json(self)

Get a JSON representation of the Report params

get_params(self)

Get a dict of params used to create the Report

kill(self, soft=False, raise_if_failed=False)

Kill a running report

Parameters:

soft - (bool, optional) If true, set the report state to killed without attempting to kill any running datasource queries.
raise_if_failed - (bool, optional) If true, raise FailedKillException if any exceptions occurred when trying to kill datasource queries. Otherwise a warning will be emitted.

load(warehouse, spec_id, adhoc_datasources=None, report_depth=1)

Load a report from a spec ID

Parameters:

warehouse - (Warehouse) A zillion warehouse object
spec_id - (int) A ReportSpec ID
adhoc_datasources - (list, optional) A list of FieldManagers
report_depth - (int, optional) The depth of the subreport, used internally to prevent too much recursion.

load_warehouse_id_for_report(spec_id)

Get the Warehouse ID for a particular report spec

Parameters:

spec_id - (int) A ReportSpec ID

Returns:

(dict) - A Warehouse ID

save(self, meta=None)

Save the report spec and return the saved spec ID

Parameters:

meta - (object, optional) A metadata object to be serialized as JSON and stored with the report

Returns:

(int) - The ID of the saved ReportSpec

ReportResult ¶

Bases: tlbx.logging_utils.PrintMixin

class zillion.report.ReportResult(df, duration, query_summaries, metrics, dimensions, rollup, unsupported_grain_metrics=None, warnings=None)

Encapsulates a report result as well as some additional helpers and summary statistics.

Parameters:

df - (DataFrame) The DataFrame containing the final report result
duration - (float) The report execution duration in seconds
query_summaries - (list of DataSourceQuerySummary) Summaries of the underyling query results.
metrics - (OrderedDict) A mapping of requested metrics to Metric objects
dimensions - (OrderedDict) A mapping of requested dimensions to Dimension objects
rollup - (str or int) See the Report docs for more details.
unsupported_grain_metrics - (dict) A dictionary mapping metric names to reasons why they could not meet the requested grain of the Report.

SQLiteMemoryCombinedResult ¶

Bases: zillion.report.BaseCombinedResult

class zillion.report.SQLiteMemoryCombinedResult(warehouse, ds_query_results, primary_ds_dimensions, adhoc_datasources=None)

Combine query results in an in-memory SQLite database

add_warning(self, msg, log=True)

clean_up(self)

Clean up the SQLite combined result table

create_table(self)

Create a table in the SQLite database to store the combined result

get_conn(self)

Get a SQLite memory database connection

get_cursor(self, conn)

Get a SQLite cursor from the connection

get_final_result(self, metrics, dimensions, row_filters, rollup, pivot, order_by, limit, limit_first)

Get the final reseult from the combined result table

Parameters:

metrics - (OrderedDict) An OrderedDict mapping metric names to Metric objects
dimensions - (OrderedDict) An OrderedDict mapping dimension names to Dimension objects
row_filters - (list) A list of criteria to filter which rows get returned
rollup - (str or int) Controls how metrics are rolled up / aggregated by dimension. See the Report docs for more details.
pivot - (list) A list of dimensions to pivot to columns
order_by - (list) A list of (field, asc/desc) tuples that control the ordering of the returned result
limit - (int) A limit on the number of rows returned
limit_first - (bool, optional) Whether to apply limits before rollups/ordering

Returns:

(DataFrame) - A DataFrame with the final report result

Notes:

The default ordering of operations is meant to roughly parallel that of MySQL's rollup, having, order by and limit behavior. The operations are applied in the following order: technicals, rollups, rounding, order_by, row_filters, limit, pivot. If you set limit_first=True the the row_filter and limit operations are moved ahead of the rollups: technicals, row_filters, limit, rollups, rounding, order_by, pivot.

get_metric_clause(self, metric, has_formula_dims)

Get a select clause for a metric including aggregation

ifnull_clause(self, column_clause, ifnull_value)

Produce an ifnull clause specific to this database dialect

load_table(self)

Load the combined result table