Nondeterministic deploy bug with 5+ different non-descript failures on two different OS's, worked before, support stumped #1306

Open
ZirconCode opened this issue Aug 18, 2023 · 12 comments

ZirconCode commented Aug 18, 2023

Unsure how to title this. It's been a week of debugging and isolating with no results.
Is this issue in the correct repo? Also unsure. Help me out here.
I'll walk you through my journey.

I have a very large code base that runs flawlessly locally. I deployed it without problems for a long time, until recently.
I made a change that added google-cloud-texttospeech to requirements.txt.
Deployment stopped working after this (maybe relevant, maybe not).
Reverting the change, back to the exact same code base as before, does not help: it still fails to deploy.

I get the errors below at random, and I've tried incredibly hard to isolate them.
Both development environments can deploy other things successfully, and push to a new app fine as well.
I have not made any local environment changes at all when it began to fail.

Details:
Azure Functions runtime version 4.24.4.4, Linux premium plan with Elastic Premium EP3.
I deploy directly from the Azure Functions extension for Visual Studio Code.
My Python code follows the folder structure of the v1 programming model (no decorators, lots of folders), although I am on v2 (host.json etc.). This has always worked, and it runs locally of course.
Deploying to python 3.9.7.

First development environment:
Manjaro linux, vscode 1.81.1, azure extension 1.12.3

Errors I've encountered seemingly at random, when trying to deploy:

2:50:41 PM debugApp123: **Deployment successful**. deployer = ms-azuretools-vscode deploymentPath = Functions App ZipDeploy. Extract zip. Remote build.
2:51:07 PM debugApp123: Syncing triggers...
2:51:12 PM debugApp123: Querying triggers...
2:51:18 PM debugApp123: **No HTTP triggers found.**

3:34:11 PM debugApp123: Deployment Failed. deployer = ms-azuretools-vscode deploymentPath = Functions App ZipDeploy. Extract zip. Remote build.
3:34:25 PM debugApp123: Deployment failed.

4:00:49 PM: Error: The operation was aborted.

10:52:10 AM ae-API-compute: Deployment successful. deployer = ms-azuretools-vscode deploymentPath = Functions App ZipDeploy. Extract zip. Remote build.
10:52:42 AM ae-API-compute: Syncing triggers...
10:52:49 AM ae-API-compute: Querying triggers...
10:52:52 AM ae-API-compute: WARNING: Some http trigger urls cannot be displayed in the output window because they require an authentication token. Instead, you may copy them from the Azure Functions explorer.

Except it was not successful: the functions are empty / don't run (though the zip file in webjobstorage does contain them).

11:55:04 AM debugApp123: Syncing triggers...
11:55:45 AM debugApp123: Syncing triggers (Attempt 2/6)...
11:55:56 AM debugApp123: Syncing triggers (Attempt 3/6)...
11:56:17 AM debugApp123: Syncing triggers (Attempt 4/6)...
11:56:59 AM debugApp123: Syncing triggers (Attempt 5/6)...
11:58:20 AM debugApp123: Syncing triggers (Attempt 6/6)...
11:59:12 AM: Error: Encountered an error (ServiceUnavailable) from host runtime.

And a very exciting shiny rare one:

Offset to Central Directory cannot be held in an Int64.

Second development environment:
Windows, visual studio code, azure extension v1.12.4.
I've also used this environment before, and it has worked, same as the above one.

Errors:

One successful deploy, and randomly most of the above (with no changes), as well as a new one:

10:35:34 AM ae-API-compute: Starting deployment...
10:35:34 AM ae-API-compute: Creating zip package...
10:45:11 AM: Error: socket hang up

Some other random things I've tried:

  • different development environments with different OS's, cloning the repo fresh
  • rolling back azure function extension in visual studio code to various versions
  • protobuf==3.20.*
  • setting AzureWebJobsFeatureFlags:EnableWorkerIndexing
  • setting SCM_DO_BUILD_DURING_DEPLOYMENT:1
  • removing type hints
  • restarting

What now?
The errors are not helpful. Logs are missing: I can drill down into events, insights, and many different logs, but they are all over the place and all useless or empty. I've had multiple 2hr+ calls with the technical support team, at ever-increasing levels of escalation, and they are just as stumped as I am (and I am grateful for their efforts).

Any thoughts?

What do I try next? Any information I can provide?

See also (maybe relevant, I don't know at this point):
microsoft/vscode-azurefunctions#3805
microsoft/vscode-azurefunctions#2529
microsoft/Oryx#1774
https://stackoverflow.com/questions/76478668/adding-python-module-google-cloud-storage-is-causing-a-working-azure-function-ap
https://stackoverflow.com/questions/72441758/typeerror-descriptors-cannot-not-be-created-directly
projectkudu/kudu#3348
microsoft/azure-pipelines-tasks#14201

@ZirconCode ZirconCode changed the title from "Deploy Bug with 5+ different non-descript failures on two different OS's in code base which worked before, support calls also stumped" to "(deploy?) Bug with 5+ different non-descript failures on two different OS's, worked before, support stumped" Aug 18, 2023
@ZirconCode ZirconCode changed the title from "(deploy?) Bug with 5+ different non-descript failures on two different OS's, worked before, support stumped" to "Nondeterministic deploy bug with 5+ different non-descript failures on two different OS's, worked before, support stumped" Aug 18, 2023
@bhagyshricompany bhagyshricompany self-assigned this Aug 21, 2023

bhagyshricompany commented Aug 21, 2023

Please share the function name, app name, instance ID, timestamp, region, etc.

@ZirconCode

function name: all, since the deploy doesn't work
app name: as in the logs above, ae-API-compute, but also debugApp123
instance id: azure function instance id? (i.e. ExecutionContext.InvocationId?), not relevant since it is a deploy error
timestamp: for some examples see logs above
region: west europe

etc.: I also wish I could provide the relevant information to isolate the error, however the error messages have not allowed me to do so.

@ZirconCode

Setting PYTHON_ISOLATE_WORKER_DEPENDENCIES:1 also does not resolve the issue.

@bhagyshricompany

@ZirconCode

As mentioned previously, I am already in contact with azure support.


ZirconCode commented Aug 24, 2023

Tried:

  • raising "functionTimeout" to "00:10:00" in host.json, since the randomness made me think of timeout / resource issues. No change in behavior.

Ran into two new bugs while isolating:

  • Functions cannot be disabled in the Linux Azure Functions environment if the function name contains a '-'. There is a warning about this in the documentation (here), but there should also be a warning about this extreme idiosyncrasy when creating a function in Azure, or at least during deployment, or even when running locally. Filed a bug report.
  • A bug during zip creation, as referenced in many places: here, here, and finally forwarded upstream here, where the issue is closed (but seems unresolved to some people) and where the reported environment has nothing in common with mine (for contrast, it occurs for me in a Python project on Manjaro). Log below.
11:35:41 AM ae-api-compute-secondary: Starting deployment...
11:35:41 AM ae-api-compute-secondary: Creating zip package...
11:46:39 AM: Error: socket hang up

I've been using .funcignore and disabling functions to try to isolate the problem within the scope of my larger project, since it was not possible to do so from a clean project upwards. Both these things seem to invite a host of new issues.

The combination of all these issues makes it impossible to work. It is very disappointing.


ZirconCode commented Aug 24, 2023

I have isolated and reproduced reliably one of the vague errors listed above:

4:00:49 PM: Error: The operation was aborted.

This specific reproducible isolated case was fixable with:

  • PYTHON_ISOLATE_WORKER_DEPENDENCIES:1 (couldn't find documentation to link to)
  • Resolving a module name collision: one Azure function contained a local module with the same file name as another file that a different function imported through a local package. This was not an issue before. I assume it has something to do with the v1 or v2 programming model folder structure, and probably with recent undocumented changes in behavior in that area.
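
A minimal sketch for spotting this kind of collision locally, assuming the project is a plain folder tree of `.py` files. The function name `find_duplicate_module_names` and the ignored folder list are illustrative choices, not part of any Azure tooling, and a duplicate name is only a hint, not proof of a conflict:

```python
from collections import defaultdict
from pathlib import Path

# Folders that hold third-party code rather than your own modules (adjust as needed).
IGNORED_DIRS = {".venv", ".python_packages", "__pycache__"}


def find_duplicate_module_names(project_root):
    """Return {module_name: [relative paths]} for names that occur more than once."""
    root = Path(project_root)
    by_name = defaultdict(list)
    for path in sorted(root.rglob("*.py")):
        relative = path.relative_to(root)
        if IGNORED_DIRS.intersection(relative.parts):
            continue
        # Every function folder in the v1 layout has an __init__.py, so skip those.
        if path.name == "__init__.py":
            continue
        by_name[path.stem].append(relative.as_posix())
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}
```

Calling `find_duplicate_module_names(".")` from the project root returns a dict of every module name that appears in more than one folder, which narrows down which files to rename or import through a package path.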

Also a note that all my logfiles are still non-existent on failed deployment. This shouldn't be the case.

@ZirconCode

So, the above error

2:39:54 PM ae-api-compute-secondary: Writing the artifacts to a Zip file
2:40:18 PM: Error: The operation was aborted.

came back when including further pieces of my project.

I isolated it to the line import tempfile. For some reason this causes the error. It worked previously.
This also causes the abort when I have it in a default httptrigger template function by itself.

@ZirconCode

I have discovered a new non-reproducible randomly appearing bug:

2:57:46 PM ae-api-compute-secondary: Starting deployment...
2:57:46 PM ae-api-compute-secondary: Creating zip package...
2:58:02 PM ae-api-compute-secondary: Zip package size: 180 MB
2:58:04 PM ae-api-compute-secondary: Fetching changes.
2:58:06 PM ae-api-compute-secondary: Cleaning up temp folders from previous zip deployments and extracting pushed zip file /tmp/zipdeploy/1455b335-1904-468e-8ed3-3384bf99dbe4.zip (0.00 MB) to /tmp/zipdeploy/extracted
2:58:06 PM ae-api-compute-secondary: Central Directory corrupt.
2:58:13 PM ae-api-compute-secondary: Deployment failed.

I'm not even going to try to figure that one out.
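
For anyone who does want to rule out the local side: the log above reports a 180 MB package locally but a 0.00 MB file on the server. A rough sketch for checking a zip file on disk with Python's standard `zipfile` module follows. The helper name `check_zip` is hypothetical, and the check only covers the local archive; it cannot show whether the upload itself was truncated:

```python
import os
import zipfile


def check_zip(path):
    """Return a list of problems found in the archive; an empty list means none found."""
    if os.path.getsize(path) == 0:
        return ["file is empty (0 bytes)"]
    try:
        with zipfile.ZipFile(path) as archive:
            problems = []
            if not archive.namelist():
                problems.append("archive contains no entries")
            # testzip() reads every member and returns the first one with a bad CRC.
            bad_member = archive.testzip()
            if bad_member is not None:
                problems.append(f"CRC check failed for {bad_member}")
            return problems
    except zipfile.BadZipFile as exc:
        # Raised when the central directory is missing or unreadable.
        return [f"not a valid zip archive: {exc}"]
```

If this returns an empty list for the package that was built locally, the corruption most likely happened after the zip left the machine.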

I isolated the next cause of the abort to including openai in requirements.txt (without importing it). The pip install reports no errors and the deploy seems to be satisfied, but it aborts at the end.

@ZirconCode

So, as a final step, I have solved the deployment issue, at least for me, by using a specific AUR package and deploying from the terminal instead of the Azure extension:

  1. Arch repo for working azure cli (currently): https://aur.archlinux.org/packages/azure-cli
  2. az login
  3. func azure functionapp publish appName --slot slotName

Interestingly, the deployment zip is around 200 MB smaller, though both methods do a remote build.
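
The thread does not establish why the two packages differ in size. One way to look for candidates is to total the on-disk size of each top-level folder in the project; the sketch below does that with the standard library. The function name `size_by_top_level` is made up for illustration, and the result reflects the local tree only: it does not apply .funcignore or reproduce what either deployment tool actually packs:

```python
import os
from collections import Counter


def size_by_top_level(project_root):
    """Sum file sizes in bytes under each top-level entry of project_root."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(project_root):
        rel = os.path.relpath(dirpath, project_root)
        # Files directly in the root are grouped under ".".
        top = "." if rel == "." else rel.split(os.sep)[0]
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            if not os.path.islink(full):
                totals[top] += os.path.getsize(full)
    return totals
```

`size_by_top_level(".").most_common(10)` lists the ten largest top-level folders, which makes it easy to see whether something like a local virtual environment dominates the tree.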

I will keep this issue open because I think it highlights the need for better/existent logging and error feedback in many cases. The above combination of steps got my project deployable again, however I will likely never know what was broken, and why it happened without my agency, and good luck to anyone with similar issues.


jannikmi commented Oct 23, 2023

In case it helps others: Check the environment variable names you are using. They might conflict with Azure-specific variable names and thereby cause errors. In my case no HTTP triggers were found (failing silently) because of the environment variable CONTAINER_NAME.

Very annoying and impossible to debug. Please add verbose error messages to the deployment output!
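
A small sketch of such a check, meant to be run over the names of custom settings your own code defines. Only `CONTAINER_NAME` comes from this thread; the prefix list is an assumption based on commonly seen platform setting names (such as `WEBSITE_...`, `FUNCTIONS_...`, `AzureWebJobs...`, `SCM_...`) and is not an official list of reserved names:

```python
# Reported in this thread as silently breaking trigger discovery.
SUSPECT_NAMES = {"CONTAINER_NAME"}

# Assumption: prefixes that platform settings commonly use; not an official list.
SUSPECT_PREFIXES = ("WEBSITE_", "FUNCTIONS_", "AZUREWEBJOBS", "SCM_")


def find_suspect_settings(setting_names):
    """Return the custom setting names that look like platform-owned names."""
    hits = []
    for name in setting_names:
        upper = name.upper()
        if upper in SUSPECT_NAMES or upper.startswith(SUSPECT_PREFIXES):
            hits.append(name)
    return sorted(hits)
```

For example, `find_suspect_settings(["MY_API_KEY", "CONTAINER_NAME"])` flags only `CONTAINER_NAME`; renaming a flagged variable (for instance with an application-specific prefix) is a cheap experiment before digging further.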


lucazav commented Dec 17, 2023

For almost a year now, there has been a bug in Azure Functions for Python that prevents them from being used productively: the message "No HTTP triggers found" is displayed at the end of a deployment from VS Code, even though the function code works correctly.

At this link you will find the desperate attempt of developers to report the anomaly in an issue in the GitHub repository of Oryx:
microsoft/Oryx#1774

After several reports on this issue, Paul Dorsh rightly responds that the problem is not with Oryx (used to build the code) but with the deployment step to the Azure function:
microsoft/Oryx#1774 (comment)

Paul points to the vscode-azurefunctions repo, where someone picks up the report and fixes the bug, but only for Node.js, not Python:
microsoft/vscode-azurefunctions#3805

So much so that Simon from Zirconcode is urged to open this issue in the azure-functions-python-worker repo: #1306. It was opened in mid-August and, as of today (mid-December), has still not been fixed, despite the severity of the bug.

One of the latest reports, again in the first issue thread on Oryx, says the following:

This bug forced me to remake the entire app for AWS Lambda. Nothing in this thread worked sadly.
I prefer Azure but what can we do when something is just completely broken with no concrete solution.

(from microsoft/Oryx#1774 (comment))

I myself had to implement a feature for a major customer using an Azure function with Python, and I sweated my nuts off trying to figure out what was wrong (what should have been 1 day of implementation took me 5!).

Is it possible that this serious bug cannot be addressed in a reasonable time?

@vrdmr vrdmr assigned vrdmr and unassigned bhagyshricompany Dec 20, 2023