The azure-search-openai-demo project can set up a full RAG chat app on Azure AI Search and OpenAI so that you can chat on custom data, like internal enterprise data or domain-specific knowledge sets. For full instructions on setting up the project, consult the main README, and then return here for detailed instructions on configuring login and access control.
- Requirements
- Setting up Microsoft Entra applications
- Adding data with document level access control
- Environment variables reference
This guide demonstrates how to add an optional login and document level access control system to the sample. This system can be used to restrict access to indexed data to specific users based on what Microsoft Entra groups they are a part of, or their user object id.
IMPORTANT: In order to add optional login and document level access control, you'll need the following in addition to the normal sample requirements
- Azure account permissions: Your Azure account must have permission to manage applications in Microsoft Entra.
Two Microsoft Entra applications must be registered in order to make the optional login and document level access control system work correctly. One app is for the client UI. The client UI is implemented as a single page application. The other app is for the API server. The API server uses a confidential client to call the Microsoft Graph API.
The easiest way to setup the two apps is to use the azd
CLI. We've written scripts that will automatically create the two apps and configure them for use with the sample. To trigger the automatic setup, run the following commands:
- Run
azd env set AZURE_USE_AUTHENTICATION true
to enable the login UI and use App Service authentication by default. - Ensure access control is enabled on your search index. If your index doesn't exist yet, run prepdocs with
AZURE_USE_AUTHENTICATION
set totrue
. If your index already exists, runpython ./scripts/manageacl.py --acl-action enable_acls
. - (Optional) To require access control when using the app, run
azd env set AZURE_ENFORCE_ACCESS_CONTROL true
. Authentication is always required to search on documents with access control assigned, regardless of if unauthenticated access is enabled or not. - (Optional) To allow authenticated users to search on documents that have no access controls assigned, even when access control is required, run
azd env set AZURE_ENABLE_GLOBAL_DOCUMENT_ACCESS true
. - (Optional) To allow unauthenticated users to use the app, even when access control is enforced, run
azd env set AZURE_ENABLE_UNAUTHENTICATED_ACCESS true
.AZURE_ENABLE_GLOBAL_DOCUMENT_ACCESS
should also be set to true if you want unauthenticated users to be able to search on documents with no access control. - Run
azd env set AZURE_AUTH_TENANT_ID <YOUR-TENANT-ID>
to set the tenant ID associated with authentication. - If your auth tenant ID is different from your currently logged in tenant ID, run
azd auth login --tenant-id <YOUR-TENANT-ID>
to login to the authentication tenant simultaneously. - Run
azd up
to deploy the app.
The following instructions explain how to setup the two apps using the Azure Portal.
-
Sign in to the Azure portal.
-
Select the Microsoft Entra ID service.
-
In the left hand menu, select Application Registrations.
-
Select New Registration.
- In the Name section, enter a meaningful application name. This name will be displayed to users of the app, for example
Azure Search OpenAI Chat API
. - Under Supported account types, select Accounts in this organizational directory only.
- In the Name section, enter a meaningful application name. This name will be displayed to users of the app, for example
-
Select Register to create the application
-
In the app's registration screen, find the Application (client) ID.
- Run the following
azd
command to save this ID:azd env set AZURE_SERVER_APP_ID <Application (client) ID>
.
- Run the following
-
Microsoft Entra supports three types of credentials to authenticate an app using the client credentials: passwords (app secrets), certificates, and federated identity credentials. For a higher level of security, either certificates or federated identity credentials are recommended. This sample currently uses an app secret for ease of provisioning.
-
Select Certificates & secrets in the left hand menu.
-
In the Client secrets section, select New client secret.
- Type a description, for example
Azure Search OpenAI Chat Key
. - Select one of the available key durations.
- The generated key value will be displayed after you select Add.
- Copy the generated key value and run the following
azd
command to save this ID:azd env set AZURE_SERVER_APP_SECRET <generated key value>
.
- Type a description, for example
-
Select API Permissions in the left hand menu. By default, the delegated
User.Read
permission should be present. This permission is required to read the signed-in user's profile to get the security information used for document level access control. If this permission is not present, it needs to be added to the application.- Select Add a permission, and then Microsoft Graph.
- Select Delegated permissions.
- Search for and and select
User.Read
. - Select Add permissions.
-
Select Expose an API in the left hand menu. The server app works by using the On Behalf Of Flow, which requires the server app to expose at least 1 API.
- The application must define a URI to expose APIs. Select Add next to Application ID URI.
- By default, the Application ID URI is set to
api://<application client id>
. Accept the default by selecting Save.
- By default, the Application ID URI is set to
- Under Scopes defined by this API, select Add a scope.
- Fill in the values as indicated:
- For Scope name, use access_as_user.
- For Who can consent?, select Admins and users.
- For Admin consent display name, type Access Azure Search OpenAI Chat API.
- For Admin consent description, type Allows the app to access Azure Search OpenAI Chat API as the signed-in user..
- For User consent display name, type Access Azure Search OpenAI Chat API.
- For User consent description, type Allow the app to access Azure Search OpenAI Chat API on your behalf.
- Leave State set to Enabled.
- Select Add scope at the bottom to save the scope.
- The application must define a URI to expose APIs. Select Add next to Application ID URI.
-
(Optional) Enable group claims. Include which Microsoft Entra groups the user is part of as part of the login in the optional claims. The groups are used for optional security filtering in the search results.
- In the left hand menu, select Token configuration
- Under Optional claims, select Add groups claim
- Select which group types to include in the claim. Note that a overage claim will be emitted if the user is part of too many groups. In this case, the API server will use the Microsoft Graph to list the groups the user is part of instead of relying on the groups in the claim.
- Select Add to save your changes
- Sign in to the Azure portal.
- Select the Microsoft Entra ID service.
- In the left hand menu, select Application Registrations.
- Select New Registration.
- In the Name section, enter a meaningful application name. This name will be displayed to users of the app, for example
Azure Search OpenAI Chat Web App
. - Under Supported account types, select Accounts in this organizational directory only.
- Under
Redirect URI (optional)
section, selectSingle-page application (SPA)
in the combo-box and enter the following redirect URI:- If you are running the sample locally, add the endpoints
http://localhost:50505/redirect
andhttp://localhost:5173/redirect
- If you are running the sample on Azure, add the endpoints provided by
azd up
:https://<your-endpoint>.azurewebsites.net/redirect
. - If you are running the sample from Github Codespaces, add the Codespaces endpoint:
https://<your-codespace>-50505.app.github.dev/redirect
- If you are running the sample locally, add the endpoints
- In the Name section, enter a meaningful application name. This name will be displayed to users of the app, for example
- Select Register to create the application
- In the app's registration screen, find the Application (client) ID.
- Run the following
azd
command to save this ID:azd env set AZURE_CLIENT_APP_ID <Application (client) ID>
.
- Run the following
- In the left hand menu, select Authentication.
- Under Web, add a redirect URI with the endpoint provided by
azd up
:https://<your-endpoint>.azurewebsites.net/.auth/login/aad/callback
. - Under Implicit grant and hybrid flows, select ID Tokens (used for implicit and hybrid flows)
- Select Save
- Under Web, add a redirect URI with the endpoint provided by
- In the left hand menu, select API permissions. You will add permission to access the access_as_user API on the server app. This permission is required for the On Behalf Of Flow to work.
- Select Add a permission, and then My APIs.
- In the list of applications, select your server application Azure Search OpenAI Chat API
- Ensure Delegated permissions is selected.
- In the Select permissions section, select the access_as_user permission
- Select Add permissions.
- Stay in the API permissions section and select Add a permission.
- Select Microsoft Graph.
- Select Delegated permissions.
- Search for and select
User.Read
. - Select Add permissions.
Consent from the user must be obtained for use of the client and server app. The client app can prompt the user for consent through a dialog when they log in. The server app has no ability to show a dialog for consent. Client apps can be added to the list of known clients to access the server app, so a consent dialog is shown for the server app.
- Navigate to the server app registration
- In the left hand menu, select Manifest
- Replace
"knownClientApplications": []
with"knownClientApplications": ["<client application id>"]
- Select Save
If you are running setup for the first time, ensure you have run azd env set AZURE_ADLS_GEN2_STORAGE_ACCOUNT <YOUR-STORAGE_ACCOUNT>
before running azd up
. If you do not set this environment variable, your index will not be initialized with access control support when prepdocs
is run for the first time. To manually enable access control in your index, use the manual setup script.
Ensure you run azd env set AZURE_USE_AUTHENTICATION
to enable the login UI once you have setup the two Microsoft Entra apps before you deploy or run the application. The login UI will not appear unless all required environment variables have been setup.
In both the chat and ask a question modes, under Developer settings optional Use oid security filter and Use groups security filter checkboxes will appear. The oid (User ID) filter maps to the oids
field in the search index and the groups (Group ID) filter maps to the groups
field in the search index. If AZURE_ENFORCE_ACCESS_CONTROL
has been set, then both the Use oid security filter and Use groups security filter options are always enabled and cannot be disabled.
If you want to use the chat endpoint without the UI and still use authentication, you must disable App Service built-in authentication and use only the app's MSAL-based authentication flow. Ensure the AZURE_DISABLE_APP_SERVICES_AUTHENTICATION
environment variable is set before deploying.
Get an access token that can be used for calling the chat API using the following code:
from azure.identity import DefaultAzureCredential
import os
token = DefaultAzureCredential().get_token(f"api://{os.environ['AZURE_SERVER_APP_ID']}/access_as_user", tenant_id=os.getenv('AZURE_AUTH_TENANT_ID', os.getenv('AZURE_TENANT_ID')))
print(token.token)
- If your primary tenant restricts the ability to create Entra applications, you'll need to use a separate tenant to create the Entra applications. You can create a new tenant by following these instructions. Then run
azd env set AZURE_AUTH_TENANT_ID <YOUR-AUTH-TENANT-ID>
before runningazd up
. - If any Entra apps need to be recreated, you can avoid redeploying the app by changing the app settings in the portal. Any of the required environment variables can be changed. Once the environment variables have been changed, restart the web app.
- It's possible a consent dialog will not appear when you log into the app for the first time. If this consent dialog doesn't appear, you will be unable to use the security filters because the API server app does not have permission to read your authorization information. A consent dialog can be forced to appear by adding
"prompt": "consent"
to theloginRequest
property inauthentication.py
- It's possible that your tenant admin has placed a restriction on consent to apps with unverified publishers. In this case, only admins may consent to the client and server apps, and normal user accounts are unable to use the login system until the admin consents on behalf of the entire organization.
- It's possible that your tenant admin requires admin approval of all new apps. Regardless of whether you select the delegated or admin permissions, the app will not work without tenant admin consent. See this guide for granting consent to an app.
The sample supports 2 main strategies for adding data with document level access control.
- Using the Add Documents API. Sample scripts are provided which use the Azure AI Search Service Add Documents API to directly manage access control information on existing documents in the index.
- Using prepdocs and Azure Data Lake Storage Gen 2. Sample scripts are provided which set up an Azure Data Lake Storage Gen 2 account, set the access control information on files and folders stored there, and ingest those documents into the search index with their access control information.
Manually enable document level access control on a search index and manually set access control values using the manageacl.py script.
Prior to running the script:
- Run
azd up
or useazd env set
to manually set theAZURE_SEARCH_SERVICE
andAZURE_SEARCH_INDEX
azd environment variables - Activate the Python virtual environment for your shell session
The script supports the following commands. All commands support -v
for verbose logging.
-
python ./scripts/manageacl.py --acl-action enable_acls
: Creates the requiredoids
(User ID) andgroups
(Group IDs) security filter fields for document level access control on your index, as well as thestorageUrl
field for storing the Blob storage URL. Does nothing if these fields already exist.Example usage:
python ./scripts/manageacl.py -v --acl-action enable_acls
-
python ./scripts/manageacl.py --acl-type [oids or groups]--acl-action view --url [https://url.pdf]
: Prints access control values associated with either User IDs or Group IDs for the document at the specified URL.Example to view all Group IDs:
python ./scripts/manageacl.py -v --acl-type groups --acl-action view --url https://st12345.blob.core.windows.net/content/Benefit_Options.pdf
-
python ./scripts/manageacl.py --acl-type [oids or groups]--acl-action add --acl [ID of group or user] --url [https://url.pdf]
: Adds an access control value associated with either User IDs or Group IDs for the document at the specified URL.Example to add a Group ID:
python ./scripts/manageacl.py -v --acl-type groups --acl-action add --acl xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx --url https://st12345.blob.core.windows.net/content/Benefit_Options.pdf
-
python ./scripts/manageacl.py --acl-type [oids or groups]--acl-action remove_all --url [https://url.pdf]
: Removes all access control values associated with either User IDs or Group IDs for a specific document.Example to remove all Group IDs:
python ./scripts/manageacl.py -v --acl-type groups --acl-action remove_all --url https://st12345.blob.core.windows.net/content/Benefit_Options.pdf
-
python ./scripts/manageacl.py --url [https://url.pdf] --acl-type [oids or groups]--acl-action remove --acl [ID of group or user]
: Removes an access control value associated with either User IDs or Group IDs for a specific document.Example to remove a specific User ID:
python ./scripts/manageacl.py -v --acl-type oids --acl-action remove --acl xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx --url https://st12345.blob.core.windows.net/content/Benefit_Options.pdf
Azure Data Lake Storage Gen2 implements an access control model that can be used for document level access control. The adlsgen2setup.py script uploads the sample data included in the data folder to a Data Lake Storage Gen2 storage account. The Storage Blob Data Owner role is required to use the script.
In order to use this script, an existing Data Lake Storage Gen2 storage account is required. Run azd env set AZURE_ADLS_GEN2_STORAGE_ACCOUNT <your-storage-account>
prior to running the script.
Then run the script inside your Python environment:
python /scripts/adlsgen2setup.py './data/*' --data-access-control './scripts/sampleacls.json' -v
The script performs the following steps:
- Creates example groups listed in the sampleacls.json file.
- Creates a filesystem / container
gptkbcontainer
in the storage account. - Creates the directories listed in the sampleacls.json file.
- Uploads the sample PDFs referenced in the sampleacls.json file into the appropriate directories.
- Recursively sets Access Control Lists (ACLs) using the information from the sampleacls.json file.
In order to use the sample access control, you need to join these groups in your Microsoft Entra tenant.
Note that this optional script may not work in Codespaces if your administrator has applied a Conditional Access policy to your tenant.
Once a Data Lake Storage Gen2 storage account has been setup with sample data and access control lists, prepdocs.py can be used to automatically process PDFs in the storage account and store them with their access control lists in the search index.
To run this script with a Data Lake Storage Gen2 account, first set the following environment variables:
AZURE_ADLS_GEN2_STORAGE_ACCOUNT
: Name of existing Data Lake Storage Gen2 storage account.- (Optional)
AZURE_ADLS_GEN2_FILESYSTEM
: Name of existing Data Lake Storage Gen2 filesystem / container in the storage account. If empty,gptkbcontainer
is used. - (Optional)
AZURE_ADLS_GEN2_FILESYSTEM_PATH
: Specific path in the Data Lake Storage Gen2 filesystem / container to process. Only PDFs contained in this path will be processed.
Once the environment variables are set, run the script using the following command: /scripts/prepdocs.ps1
or /scripts/prepdocs.sh
.
The following environment variables are used to setup the optional login and document level access control:
AZURE_USE_AUTHENTICATION
: Enables Entra ID login and document level access control. Set to true before runningazd up
.AZURE_ENFORCE_ACCESS_CONTROL
: Enforces Entra ID based login and document level access control on documents with access control assigned. Set to true before runningazd up
. IfAZURE_ENFORCE_ACCESS_CONTROL
is enabled andAZURE_ENABLE_UNAUTHENTICATED_ACCESS
is not enabled, then authentication is required to use the app.AZURE_ENABLE_GLOBAL_DOCUMENT_ACCESS
: Allows users to search on documents that have no access controls assignedAZURE_ENABLE_UNAUTHENTICATED_ACCESS
: Allows unauthenticated users to access the chat app, even whenAZURE_ENFORCE_ACCESS_CONTROL
is enabled.AZURE_ENABLE_GLOBAL_DOCUMENT_ACCESS
should be set to true to allow unauthenticated users to search on documents that have no access control assigned. Unauthenticated users cannot search on documents with access control assigned.AZURE_DISABLE_APP_SERVICES_AUTHENTICATION
: Disables use of built-in authentication for App Services. An authentication flow based on the MSAL SDKs is used instead. Useful when you want to provide programmatic access to the chat endpoints with authentication.AZURE_SERVER_APP_ID
: (Required) Application ID of the Microsoft Entra app for the API server.AZURE_SERVER_APP_SECRET
: Client secret used by the API server to authenticate using the Microsoft Entra server app.AZURE_CLIENT_APP_ID
: Application ID of the Microsoft Entra app for the client UI.AZURE_AUTH_TENANT_ID
: Tenant ID associated with the Microsoft Entra tenant used for login and document level access control. Defaults toAZURE_TENANT_ID
if not defined.AZURE_ADLS_GEN2_STORAGE_ACCOUNT
: (Optional) Name of existing Data Lake Storage Gen2 storage account for storing sample data with access control lists. Only used with the optional Data Lake Storage Gen2 setup and prep docs scripts.AZURE_ADLS_GEN2_FILESYSTEM
: (Optional) Name of existing Data Lake Storage Gen2 filesystem for storing sample data with access control lists. Only used with the optional Data Lake Storage Gen2 setup and prep docs scripts.AZURE_ADLS_GEN2_FILESYSTEM_PATH
: (Optional) Name of existing path in a Data Lake Storage Gen2 filesystem for storing sample data with access control lists. Only used with the optional Data Lake Storage Gen2 prep docs script.
This application uses an in-memory token cache. User sessions are only available in memory while the application is running. When the application server is restarted, all users will need to log-in again.
The following table describes the impact of the AZURE_USE_AUTHENTICATION
and AZURE_ENFORCE_ACCESS_CONTROL
variables depending on the environment you are deploying the application in:
AZURE_USE_AUTHENTICATION | AZURE_ENFORCE_ACCESS_CONTROL | Environment | Default Behavior |
---|---|---|---|
True | False | App Services | Use integrated auth Login page blocks access to app User can opt-into access control in developer settings Allows unrestricted access to sources |
True | True | App Services | Use integrated auth Login page blocks access to app User must use access control |
True | False | Local or Codespaces | Do not use integrated auth Can use app without login User can opt-into access control in developer settings Allows unrestricted access to sources |
True | True | Local or Codespaces | Do not use integrated auth Cannot use app without login Behavior is chat box is greyed out with default “Please login message” User must use login button to make chat box usable User must use access control when logged in |
False | False | All | No login or access control |
False | True | All | Invalid setting |