# Databricks

## Requirements

* The **Account Admin** role in Databricks
* A Teleskope account with the **Admin** role
* A **Teleskope service principal** created in Databricks
* A **personal access token (PAT)** or **OAuth credentials** for API access
* An active **SQL warehouse** in each workspace
  * Use the same warehouse name in every workspace, ideally one dedicated to Teleskope
* Permissions granted to the Teleskope service principal on all target data assets

## Integration

Teleskope integrates with Databricks using Unity Catalog metadata APIs, SQL query execution APIs, and system tables. The connector supports:

#### Metadata Discovery

Teleskope scans Unity Catalog using the following object hierarchy:

```
Workspace → Catalog → Schema → Table → Column
                             ↘︎ Volume (optional)
```

It discovers:

* Table metadata including schema, data types, tags, and masking policies
* Volume objects and the file paths they contain (e.g., CSV, JSON, Parquet)
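
As an illustration of how the hierarchy above maps onto the Unity Catalog REST API, the following sketch builds the list URLs for each level. It is not Teleskope's implementation; the helper names and the example host are hypothetical, and the `/api/2.1/unity-catalog` paths are the publicly documented list endpoints.

```python
from urllib.parse import urlencode

# Documented Unity Catalog REST prefix; everything else here is illustrative.
API_ROOT = "/api/2.1/unity-catalog"


def list_url(host: str, resource: str, **filters: str) -> str:
    """Build a list URL for 'catalogs', 'schemas', 'tables', or 'volumes'."""
    query = f"?{urlencode(filters)}" if filters else ""
    return f"{host.rstrip('/')}{API_ROOT}/{resource}{query}"


def discovery_urls(host: str, catalog: str, schema: str) -> list[str]:
    """Return the URLs a scanner would call, walking from catalog down to volume."""
    return [
        list_url(host, "catalogs"),
        list_url(host, "schemas", catalog_name=catalog),
        list_url(host, "tables", catalog_name=catalog, schema_name=schema),
        list_url(host, "volumes", catalog_name=catalog, schema_name=schema),
    ]


if __name__ == "__main__":
    # Hypothetical workspace host, catalog, and schema names.
    for url in discovery_urls("https://example.cloud.databricks.com", "main", "sales"):
        print(url)
```

Column metadata is returned as part of each table object, so no separate column endpoint appears in the sketch.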

#### Data Sampling

Teleskope executes parameterized SQL queries against the assigned SQL warehouse using:

* `TABLESAMPLE` clause for row-level sampling
* Optional Genie-based sampling (for distribution and profiling of string fields)

#### Tagging and Governance

Using Unity Catalog, Teleskope can:

* Apply governance tags directly to tables and columns (requires the `APPLY TAG` privilege)
* Detect and audit existing column masks using `INFORMATION_SCHEMA.COLUMN_MASKS`
* Track existing masking policies for sensitive data and, optionally, define new ones
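
For illustration, the tagging and mask-audit operations above could be expressed with statements like the ones this sketch generates. In Databricks SQL, tags are set with `ALTER TABLE ... SET TAGS`, while `APPLY TAG` is the privilege that permits it. The helper names, tag keys, and table names are hypothetical.

```python
def _literal(value: str) -> str:
    """Render a simple SQL string literal; reject characters that need escaping."""
    if "'" in value or "\\" in value:
        raise ValueError(f"unsupported character in tag text: {value!r}")
    return f"'{value}'"


def _tag_pairs(tags: dict[str, str]) -> str:
    return ", ".join(f"{_literal(k)} = {_literal(v)}" for k, v in tags.items())


def set_table_tags(table_fqn: str, tags: dict[str, str]) -> str:
    """Statement that tags a whole table."""
    return f"ALTER TABLE {table_fqn} SET TAGS ({_tag_pairs(tags)})"


def set_column_tags(table_fqn: str, column: str, tags: dict[str, str]) -> str:
    """Statement that tags a single column."""
    return f"ALTER TABLE {table_fqn} ALTER COLUMN {column} SET TAGS ({_tag_pairs(tags)})"


def column_masks_query(catalog: str) -> str:
    """Query listing the column masks recorded in a catalog's information schema."""
    return f"SELECT * FROM {catalog}.information_schema.column_masks"


if __name__ == "__main__":
    # Hypothetical table, column, and tag values.
    print(set_table_tags("main.sales.orders", {"sensitivity": "high"}))
    print(set_column_tags("main.sales.orders", "email", {"pii": "email"}))
    print(column_masks_query("main"))
```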

#### Policy Management (optional)

Teleskope can integrate with Policy Maker to:

* Deploy masking policies via SQL commands (`CREATE FUNCTION`, `SET MASK`)
* Automate row-level security using dynamic row filter functions (`SET ROW FILTER`)
* Track and log data access patterns for alerting or escalation
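
The sketch below shows the general shape of the SQL involved in deploying a column mask and a row filter in Unity Catalog. It is an assumption-laden example rather than what Policy Maker actually emits: the function names, the group name, and the redaction string are all hypothetical.

```python
def mask_function_sql(function_fqn: str, allowed_group: str) -> str:
    """CREATE FUNCTION statement that reveals the value only to one account group."""
    if "'" in allowed_group or "\\" in allowed_group:
        raise ValueError("group name contains unsupported characters")
    return (
        f"CREATE OR REPLACE FUNCTION {function_fqn}(val STRING) RETURNS STRING "
        f"RETURN CASE WHEN is_account_group_member('{allowed_group}') "
        f"THEN val ELSE '****' END"
    )


def set_mask_sql(table_fqn: str, column: str, function_fqn: str) -> str:
    """Attach a mask function to a column."""
    return f"ALTER TABLE {table_fqn} ALTER COLUMN {column} SET MASK {function_fqn}"


def set_row_filter_sql(table_fqn: str, function_fqn: str, columns: list[str]) -> str:
    """Attach a row filter function that receives the listed columns as arguments."""
    return f"ALTER TABLE {table_fqn} SET ROW FILTER {function_fqn} ON ({', '.join(columns)})"


if __name__ == "__main__":
    # Hypothetical function, table, and group names.
    print(mask_function_sql("main.gov.mask_email", "pii_readers"))
    print(set_mask_sql("main.sales.orders", "email", "main.gov.mask_email"))
    print(set_row_filter_sql("main.sales.orders", "main.gov.region_filter", ["region"]))
```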

#### Access Monitoring

Databricks system tables allow Teleskope to monitor query and access history:

* `system.query.history` for user-level query logging
* Audit logs for data access, table modifications, and privilege changes (if enabled)
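
As a rough example, monitoring queries against the system tables might look like the following. Databricks documents the query history table as `system.query.history` and the audit log table as `system.access.audit`; the selected columns, time windows, and function names here are assumptions and should be checked against the system table schemas in your account.

```python
def recent_queries_sql(hours: int = 24, limit: int = 100) -> str:
    """Query recent statements and who ran them from the query history table."""
    return (
        "SELECT executed_by, statement_text, start_time "
        "FROM system.query.history "
        f"WHERE start_time >= current_timestamp() - INTERVAL {int(hours)} HOURS "
        "ORDER BY start_time DESC "
        f"LIMIT {int(limit)}"
    )


def recent_audit_events_sql(days: int = 7) -> str:
    """Query recent audit events (requires audit system tables to be enabled)."""
    return (
        "SELECT event_time, user_identity.email, service_name, action_name "
        "FROM system.access.audit "
        f"WHERE event_date >= current_date() - INTERVAL {int(days)} DAYS"
    )


if __name__ == "__main__":
    print(recent_queries_sql(hours=1, limit=10))
    print(recent_audit_events_sql(days=1))
```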

## Enrollment

To enroll Databricks with Teleskope:

{% stepper %}
{% step %}
**Create a Teleskope Service Principal**

* Set up a dedicated [service principal](https://docs.databricks.com/aws/en/admin/users-groups/service-principals) in Databricks for Teleskope access.
* Generate a personal access token (PAT) or configure OAuth credentials.
  {% endstep %}

{% step %}
**Assign the Account Admin Role**

The service principal requires the account admin role so that it can list workspaces.
{% endstep %}

{% step %}
**Assign Required Permissions per Catalog**

The service principal must be granted the following minimum permissions:

| Object Type  | Privileges Needed     |
| ------------ | --------------------- |
| Catalog      | `USE CATALOG`         |
| Schema       | `USE SCHEMA`          |
| Tables/Views | `SELECT`, `APPLY TAG` |
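
The minimum privileges in the table can be granted with standard `GRANT` statements. The sketch below only generates them; it assumes the service principal is referenced by its application ID, and the ID, catalog, schema, and table names shown are placeholders.

```python
def _q(name: str) -> str:
    """Backtick-quote an identifier or principal name."""
    return "`" + name.replace("`", "``") + "`"


def grant_statements(principal: str, catalog: str, schema: str,
                     tables: list[str]) -> list[str]:
    """Return GRANT statements covering one catalog, one schema, and its tables."""
    to = _q(principal)
    stmts = [
        f"GRANT USE CATALOG ON CATALOG {_q(catalog)} TO {to}",
        f"GRANT USE SCHEMA ON SCHEMA {_q(catalog)}.{_q(schema)} TO {to}",
    ]
    for table in tables:
        fqn = f"{_q(catalog)}.{_q(schema)}.{_q(table)}"
        stmts.append(f"GRANT SELECT, APPLY TAG ON TABLE {fqn} TO {to}")
    return stmts


if __name__ == "__main__":
    # Placeholder application ID and object names.
    app_id = "00000000-0000-0000-0000-000000000000"
    for stmt in grant_statements(app_id, "main", "sales", ["orders", "customers"]):
        print(stmt)
```

Privileges granted on a catalog or schema are inherited by the objects inside it, so granting `SELECT` and `APPLY TAG` at a higher level is an alternative to per-table grants when every table is in scope.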

For advanced features such as **Access Monitoring**, also grant:

* Access to `system.query.history` and audit logs (for user activity tracking)

See the [full Python enrollment script](https://docs.teleskope.ai/connectors/saas/databricks/enrollment-script).
{% endstep %}

{% step %}
**Collect Integration Details**

Have the following information available to configure the connector:

* Databricks workspace URL
* Account ID
* SQL Warehouse ID
* Client ID / Client Secret for the Teleskope service principal
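
Before entering these values, it can help to confirm that the client ID and secret work. The sketch below prepares (but does not send) an OAuth client-credentials token request against the workspace's `/oidc/v1/token` endpoint, which is the documented Databricks machine-to-machine flow. The dataclass, its field names, and the example values are hypothetical.

```python
import base64
import urllib.request
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class DatabricksDetails:
    """The integration details listed above (field names are illustrative)."""
    workspace_url: str
    account_id: str
    warehouse_id: str
    client_id: str
    client_secret: str


def token_request(details: DatabricksDetails) -> urllib.request.Request:
    """Build a client-credentials token request for the workspace."""
    credentials = f"{details.client_id}:{details.client_secret}".encode()
    body = urlencode({"grant_type": "client_credentials", "scope": "all-apis"}).encode()
    return urllib.request.Request(
        url=f"{details.workspace_url.rstrip('/')}/oidc/v1/token",
        data=body,
        method="POST",
        headers={
            "Authorization": "Basic " + base64.b64encode(credentials).decode(),
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


if __name__ == "__main__":
    # Placeholder values; substitute real ones and pass the request to
    # urllib.request.urlopen() to actually fetch a token.
    details = DatabricksDetails(
        workspace_url="https://example.cloud.databricks.com",
        account_id="account-id",
        warehouse_id="warehouse-id",
        client_id="client-id",
        client_secret="client-secret",
    )
    print(token_request(details).full_url)
```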
  {% endstep %}
  {% endstepper %}


