Update sharepoint documentation #676

Merged Dec 13, 2024 (1 commit)
`spiceaidocs/docs/components/data-connectors/sharepoint.md`: 108 changes (58 additions, 50 deletions)
@@ -16,55 +16,58 @@ datasets:
sharepoint_tenant_id: ${secrets:SPICE_SHAREPOINT_TENANT_ID}
sharepoint_client_secret: ${secrets:SPICE_SHAREPOINT_CLIENT_SECRET}
```

#### Example

```sql
SELECT * FROM important_documents LIMIT 1
```

Returns:

````json
[
  {
    "created_by_id": "cbccd193-f9f1-4603-b01d-ff6f3e6f2108",
    "created_by_name": "Jack Eadie",
    "created_at": "2024-09-09T04:57:00",
    "c_tag": "\"c:{BD4D130F-2C95-4E59-9F93-85BD0A9E1B19},1\"",
    "e_tag": "\"{BD4D130F-2C95-4E59-9F93-85BD0A9E1B19},1\"",
    "id": "01YRH3MPAPCNG33FJMLFHJ7E4FXUFJ4GYZ",
    "last_modified_by_id": "cbccd193-f9f1-4603-b01d-ff6f3e6f2108",
    "last_modified_by_name": "Jack Eadie",
    "last_modified_at": "2024-09-09T04:57:00",
    "name": "ngx_google_perftools_module.md",
    "size": 959,
    "web_url": "https://spiceai.sharepoint.com/Shared%20Documents/md/ngx_google_perftools_module.md",
    "content": "# Module ngx_google_perftools_module\n\nThe `ngx_google_perftools_module` module (0.6.29) enables profiling of nginx worker processes using [Google Performance Tools](https://github.com/gperftools/gperftools). The module is intended for nginx developers.\n\nThis module is not built by default, it should be enabled with the `--with-google_perftools_module` configuration parameter.\n\n> **Note:** This module requires the [gperftools](https://github.com/gperftools/gperftools) library.\n\n## Example Configuration\n\n```nginx\ngoogle_perftools_profiles /path/to/profile;\n```\n\nProfiles will be stored as `/path/to/profile.<worker_pid>`.\n\n## Directives\n\n### google_perftools_profiles\n\n- **Syntax:** `google_perftools_profiles file;`\n- **Default:** —\n- **Context:** `main`\n\nSets a file name that keeps profiling information of nginx worker process. The ID of the worker process is always a part of the file name and is appended to the end of the file name, after a dot.\n"
  }
]
````
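Each document is returned as a row with the columns shown above, so the dataset can be filtered and projected like any other table. A minimal sketch, assuming the `important_documents` dataset from the example, that lists recently modified Markdown files while leaving the `content` column out of the result:

```sql
-- List the ten most recently modified Markdown documents.
-- Column names are taken from the example output above.
SELECT name, size, last_modified_at, web_url
FROM important_documents
WHERE name LIKE '%.md'
ORDER BY last_modified_at DESC
LIMIT 10;
```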

:::warning[Limitations]
The SharePoint connector does not yet support creating a dataset from a single file (e.g. an Excel spreadsheet). Datasets must be created from a folder of documents (see [Document Support](/components/data-connectors/index.md#document-support)).
:::


## Configuration

### Parameters

| Name | Required? | Description |
| -------------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| `sharepoint_client_id`     | **Yes**   | The client ID of the Azure AD (Entra) application.                                                                                         |
| `sharepoint_tenant_id` | **Yes** | The tenant ID of the Azure AD (Entra) application. |
| `sharepoint_client_secret` | Optional | For service principal authentication. The client secret of the Azure AD (Entra) application. |
| `sharepoint_bearer_token` | Optional | For user authentication. The bearer access token obtained from the OAuth2 flow (see `spice login sharepoint` [docs](/cli/reference/login)). |

:::note
Only one of `sharepoint_client_secret` or `sharepoint_bearer_token` is allowed.
:::

### `from` formats

The `from` field in a SharePoint dataset takes the following format:

```yaml
from: 'sharepoint:<drive_type>:<drive_id>/<subpath_type>:<subpath_value>'
```
@@ -73,15 +76,15 @@ from: 'sharepoint:<drive_type>:<drive_id>/<subpath_type>:<subpath_value>'

`drive_type` in a SharePoint Connector `from` field supports the following types:

| Drive Type | Description | Example |
| ---------- | --------------------------- | ----------------------------------------------------- |
| `drive` | The SharePoint drive's name | `from: sharepoint:drive:Documents/...` |
| `driveId`  | The SharePoint drive's ID   | `from: sharepoint:driveId:b!Mh8opUGD80ec7zGXgX9r/...` |
| `site`     | A SharePoint site's name    | `from: sharepoint:site:MySite/...`                    |
| `siteId`   | A SharePoint site's ID      | `from: sharepoint:siteId:b!Mh8opUGD80ec7zGXgX9r/...`  |
| `group` | A SharePoint group's name | `from: sharepoint:group:MyGroup/...` |
| `groupId` | A SharePoint group's ID | `from: sharepoint:groupId:b!Mh8opUGD80ec7zGXgX9r/...` |
| `me` | A user's OneDrive | `from: sharepoint:me/...` |

:::note
For the `me` drive type, the user is identified by the `sharepoint_bearer_token`, so it cannot be used with `sharepoint_client_secret`.
@@ -93,29 +96,33 @@ For a name-based `drive_id`, the connector will attempt to resolve the name to a

Within a drive, the SharePoint connector can load documents from:

| Description | Example |
| -------------------------------- | ---------------------------------------------------------------------- |
| The root of the drive | `from: sharepoint:me/root` |
| A specific path within the drive | `from: sharepoint:drive:Documents/path:/top_secrets` |
| A specific folder ID | `from: sharepoint:group:MyGroup/id:01QM2NJSNHBISUGQ52P5AJQ3CBNOXDMVNT` |
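The `from` grammar can be illustrated with a small, hypothetical Bash helper. It is not part of the Spice CLI and does not check `drive_type` against the supported list. It splits the value on the first `/`, treats `me` and `root` as the value-less special cases, and splits each remaining part on its first `:` so that path values such as `/top_secrets` stay intact:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Spice CLI): splits a SharePoint
# `from` value into drive_type, drive_id, subpath_type and subpath_value.
parse_sharepoint_from() {
  local from="$1"
  local rest="${from#sharepoint:}"
  if [[ "$rest" == "$from" ]]; then
    echo "error: value must start with 'sharepoint:'" >&2
    return 1
  fi

  # Everything before the first '/' selects the drive; the rest is the subpath.
  local drive_part="${rest%%/*}"
  local subpath="${rest#*/}"
  if [[ "$subpath" == "$rest" || -z "$subpath" ]]; then
    echo "error: missing subpath (e.g. 'root', 'path:...', 'id:...')" >&2
    return 1
  fi

  local drive_type drive_id
  if [[ "$drive_part" == "me" ]]; then
    # `me` (the user's OneDrive) takes no drive ID.
    drive_type="me"
    drive_id=""
  else
    drive_type="${drive_part%%:*}"
    drive_id="${drive_part#*:}"
  fi

  local subpath_type subpath_value
  if [[ "$subpath" == "root" ]]; then
    # `root` takes no value.
    subpath_type="root"
    subpath_value=""
  else
    subpath_type="${subpath%%:*}"
    # Only the first ':' separates type from value, so paths keep their '/'s.
    subpath_value="${subpath#*:}"
  fi

  printf 'drive_type=%s drive_id=%s subpath_type=%s subpath_value=%s\n' \
    "$drive_type" "$drive_id" "$subpath_type" "$subpath_value"
}
```

For example, `parse_sharepoint_from 'sharepoint:drive:Documents/path:/top_secrets'` prints `drive_type=drive drive_id=Documents subpath_type=path subpath_value=/top_secrets`.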

## Authentication

As outlined in the [connector parameters](#parameters), the SharePoint connector supports two types of authentication:

1. Service principal authentication, by setting the `sharepoint_client_secret` parameter.
2. User authentication, by setting the `sharepoint_bearer_token` parameter. The token is typically obtained by running `spice login sharepoint` and completing the OAuth2 flow.
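A minimal sketch of a Spicepod with one dataset per authentication mode, reusing `from` values from the tables above. The dataset names and the `SPICE_SHAREPOINT_CLIENT_ID` secret name are hypothetical; the other secret names follow the examples on this page. Each dataset sets exactly one of `sharepoint_client_secret` or `sharepoint_bearer_token`, and the `me` drive type is paired with the bearer token:

```yaml
datasets:
  # Service principal authentication (application permissions).
  - from: 'sharepoint:drive:Documents/path:/top_secrets'
    name: top_secrets # hypothetical dataset name
    params:
      sharepoint_client_id: ${secrets:SPICE_SHAREPOINT_CLIENT_ID} # assumed secret name
      sharepoint_tenant_id: ${secrets:SPICE_SHAREPOINT_TENANT_ID}
      sharepoint_client_secret: ${secrets:SPICE_SHAREPOINT_CLIENT_SECRET}

  # User authentication (delegated permissions), required for the `me` drive type.
  - from: 'sharepoint:me/root'
    name: my_onedrive # hypothetical dataset name
    params:
      sharepoint_client_id: ${secrets:SPICE_SHAREPOINT_CLIENT_ID} # assumed secret name
      sharepoint_tenant_id: ${secrets:SPICE_SHAREPOINT_TENANT_ID}
      sharepoint_bearer_token: ${secrets:SPICE_SHAREPOINT_BEARER_TOKEN}
```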

### Creating an Enterprise Application

To use the SharePoint connector with service principal authentication, you need to create an Azure AD (Entra) application and grant it the permissions listed below. The same application also supports OAuth2 authentication for users within the tenant (i.e. `sharepoint_bearer_token`).

1. Create a new Azure AD application in the [Azure portal](https://portal.azure.com/#view/Microsoft_AAD_IAM/ActiveDirectoryMenuBlade/~/Overview).
2. Under the application's `API permissions`, add the following permissions: `Sites.Read.All`, `Files.Read.All`, `User.Read`, `GroupMember.Read.All`
   - For service principal authentication, Application permissions are required.
   - For user authentication, only delegated permissions are required.
3. (For user authentication): Under the application's `Authentication`, add `http://localhost` as a redirect URI for the Mobile and desktop applications platform.
4. Add `sharepoint_client_id` (from the `Application (Client) ID` field) and `sharepoint_tenant_id` to the connector configuration.
5. (For service principal authentication): Under the application's `Certificates & secrets`, create a new client secret. Use this for the `sharepoint_client_secret` parameter.

### Default Spice Application

For your convenience, Spice AI maintains a default Entra (Azure AD) application that can be used for authentication against your SharePoint instance. This application requires OAuth2 authentication. To use it:

```yaml
@@ -125,10 +132,11 @@ datasets:
params:
sharepoint_client_id: f2b3116e-b4c4-464f-80ec-73cd9d9886b4
sharepoint_tenant_id: ${env:TENANT_ID}
sharepoint_bearer_token: ${secrets:SPICE_SHAREPOINT_BEARER_TOKEN}
```

And set the `SPICE_SHAREPOINT_BEARER_TOKEN` secret via:

```shell
spice login sharepoint --tenant-id $TENANT_ID --client-id f2b3116e-b4c4-464f-80ec-73cd9d9886b4
```