Multiple requests at the same time always throw this error - "4 DEADLINE_EXCEEDED: Deadline exceeded" #397
Comments
@xesunny is this in a cloud function? I think you should instead await the createTask promise before responding. Otherwise the cloud function reclaims resources once the request is served, and you may get odd behavior.
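A rough sketch of awaiting the createTask call inside an HTTP Cloud Function before sending the response (the project, location, queue, and handler path below are hypothetical placeholders):

```js
const {CloudTasksClient} = require('@google-cloud/tasks');
const client = new CloudTasksClient();

exports.enqueue = async (req, res) => {
  // Hypothetical project/location/queue identifiers.
  const parent = client.queuePath('my-project', 'us-central1', 'my-queue');
  const task = {
    appEngineHttpRequest: {httpMethod: 'GET', relativeUri: '/handler'},
  };
  // Await the enqueue before responding, so the function isn't torn down
  // while the createTask request is still in flight.
  const [response] = await client.createTask({parent, task});
  res.status(200).send(response.name);
};
```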
I'm using this code in (A) a Cloud Function and (B) a traditional server-side application, and I face this issue in both. My temporary solution is to submit tasks sequentially. The average time to submit one task is 200ms to 500ms, which causes an unnecessary delay of ~5 seconds. Because of this single issue, I can't use Cloud Tasks much, though I love the product.
I love tasks, but having to create tasks sequentially makes it unusable for me. I need to schedule too many tasks. I've been batching promises and waiting for them to resolve, but I still get the error. Perhaps I can tweak the batch size some, but if it becomes too slow I'll be forced to look for another solution :(
@bcoe - We're getting this error too.
@yossi-eynav @isaiah-coleman I'm not sure what the hard limit is on concurrently creating tasks, but I wouldn't be shocked if one exists. My recommendation would be, if it's not reasonable to enqueue them sequentially, do something like this:

```js
const WORK_SIZE = 16; // or 32, etc.
while (work.length) {
  const chunkOfWork = work.splice(0, WORK_SIZE);
  await Promise.all(chunkOfWork.map((chunk) => {
    return enqueueTask(chunk);
  }));
}
```

☝️ this allows you to manage the number of tasks enqueued concurrently, without doing so completely sequentially.
I'm getting this error dispatching a single task.
@ChrisBeeson are you in a similar position as @xesunny describes, in which you enqueue a single task, but there may be multiple users triggering this action concurrently?
@bcoe no there are no other users
@ChrisBeeson to me this sounds like a different issue than @xesunny and others are describing, which seems to be related to concurrency limits. I think it might be worth opening a new issue, so that I can help debug (I believe something else is happening to you).
@bcoe thanks!
@yossi-eynav could you please share an example of the code you are running, and let me know what environment you're running in, i.e., Cloud Functions? @xesunny reading your original message, the error is happening on your local OSX machine?
@bcoe I'm running on GKE
I had the same problem; I think it's because httpRequest.body expects bytes, not a string. Try changing:
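A sketch of the usual fix, assuming a JSON payload on an HTTP-target task (the URL and payload below are hypothetical): pass the body as base64-encoded bytes rather than a plain string.

```js
const payload = {userId: 123}; // hypothetical payload

const task = {
  httpRequest: {
    httpMethod: 'POST',
    url: 'https://example.com/handler', // hypothetical target
    // body: JSON.stringify(payload),  // plain string: may not satisfy the bytes field
    body: Buffer.from(JSON.stringify(payload)).toString('base64'), // bytes, base64-encoded
    headers: {'Content-Type': 'application/json'},
  },
};
```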
@bcoe
@yossi-eynav a lot of people report the same thing; see grpc/grpc-node#1158, it might help.
@bcoe @isaiah-coleman it seems that the problem is with
@yossi-eynav I'm not sure I can help you. I'm not a contributor on the library, merely sharing what seems to have helped others. Can you explain how you used the
@yossi-eynav Setting `fallback: true` makes the client talk to the API over HTTP/1.1 instead of gRPC. Would it be possible for you to measure the time - how long does it take for it to fail with `DEADLINE_EXCEEDED`?
Yes, it's happening on my Mac OSX machine, but I have gotten this error on a Google Cloud instance and Cloud Run as well.
Can you give an example of how to use `fallback: true`? I don't have much experience with grpc.
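A minimal sketch of how the option is typically passed, assuming it goes through the client constructor options (as with other google-gax based clients); this switches the transport from gRPC to HTTP/1.1:

```js
const {CloudTasksClient} = require('@google-cloud/tasks');

// Pass fallback: true so the client uses HTTP/1.1 instead of gRPC.
const client = new CloudTasksClient({fallback: true});

// The client is then used exactly as before, e.g.:
// const [response] = await client.createTask({parent, task});
```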
@alexander-fenster
Are the requests over HTTP/1 regional? Electing for `{ fallback: true }` gives me the following:
I know this is the correct region; when I switch out the region I receive the following:
For due diligence, this is me checking `ListLocations`.
I'm grateful for any help that can be provided.
@isaiah-coleman I'm on us-central1
Watching here, as I am seeing the exact same issue that @liorschwimmer saw. Could also be user error... haven't ruled that out yet.
Has anyone solved this? I'm facing the same issue. I need to create 500-600 tasks when a user is added. I am using a loop to create them, but I keep hitting this error.
Internal tracking number: 186681285
We had the same issue come up in the past 2-4 days. Any updates on this? This is affecting us in production, and we can't deliver on our SLOs because of this. If there's no resolution (this issue has been open for over a year now), then the only option would be to move to a different infrastructure/cloud provider. Any recommendations?
Not sure how the internal tracking number helps us, sadly, @sofisl. Are there any details from it you can share publicly? To echo the comment above, at a certain point one has to wonder just how important customers facing issues with core libs on GCP are to Google (this being unsolved and sparsely updated for over a year now), and whether this would happen on AWS (I seriously doubt it). I get you're all busy with competing priorities and this may not be a quick fix, but in the meantime we're held hostage and holding the bag. Blunt updates welcomed!
@shayne-lp @superzadeh, since @sofisl opened the ticket, it has been assigned to an engineer and is being investigated. I'm sorry about how long this issue has been affecting folks, and will keep sharing information with the product team.
Any updates @bcoe or @sofisl? Or a workaround maybe? This is really starting to become a problem in our production environment, and I'm also starting to see this in Firestore (via the NPM package).
This all makes it look like I'm running code that is not production ready; maybe there's an earlier version of these NPM packages that is more stable/battle tested?
@bcoe I'd appreciate any quick help on this one. We are using GCP heavily for our infrastructure. Currently, in a new microservice hosted on GCP Kubernetes, we are getting this error many times. Earlier it was happening for Firestore, Google Cloud Storage, and Cloud Tasks requests; we tried everything, updating all library versions and the Node version to 14.x. But currently we are seeing this error for Cloud Tasks requests. These are the numbers for this service:
Cloud Tasks created successfully: 131,148. In case you need anything from our side, let me know; I can provide many examples, traces, and the project ID if required. This is the only error preventing us right now from enabling this feature for everyone, and it is affecting us seriously in production. We have P1 support as well, but the reason to post here is so I can provide a quick trace or any detail you need to check this further.
Very similar symptoms to you @nikkanetiya, and unreliability of most services we use on GCP that rely on gRPC (so, most of them actually). We also ran into https://issuetracker.google.com/issues/158014637?pli=1 last year (which is still unresolved!) and already spent an unnecessary amount of time migrating our code from Cloud Functions to App Engine (which was the only option we had in the time we had). I have the impression that the gRPC/core libs used by GCP services are fundamentally broken. We already spent too much time trying to find workarounds, and the fact that these issues stay open for over a year made us take the decision to move all our stuff back to AWS (we use it for most of our products, and tried GCP/Firebase for a new mobile product ==> we'll migrate it to AWS). It is a significant time investment (again), coming at the worst possible timing (we're fundraising in 3 months), and I would have really preferred to find another resolution. This stuff just doesn't happen on AWS (not affiliated with them, can't say for Azure or other providers, but we're having a much better experience with AWS), and even when it does, you get daily updates on the status of the issue/investigation, and resolution happens much faster. I am mostly sharing this here for transparency towards the community, who might have to make hard decisions in order to deliver the bare minimum of UX to their users by providing reliable infrastructure for their services.
See other replies below for better solutions. This might be helpful for some: I had this issue with creating multiple tasks in the queue, and it looks like I wasn't handling the promises properly. Solution:

```js
import {createTask} from 'wherever/your/cloud-task-function/is';

let arr = [item];
let promises = [];
arr.forEach((item) => {
  const {prop1, prop2 /* , ...other props */} = item;
  promises.push(createTask({prop1, prop2}));
});
return Promise.all(promises).then(() => {
  return res.status(201).send('Task Generator Initialized');
}).catch((err) => {
  return res.status(500).send(err.message);
});
```

Returning Promise.all lets me schedule all of them instead of getting the deadline exceeded error. I've also set the function timeout to the 9 minute max, since my needs require an unknown number of task creations.
@nikkanetiya @superzadeh are you scheduling many tasks concurrently, i.e., performing many createTask calls in parallel? If you have many tasks to enqueue in one job, an approach I take is as follows:

```js
const tasksToCreate = [/* a large number of task requests to enqueue */];
const WORK_SIZE = 32; // some size of work to perform.
while (tasksToCreate.length > 0) {
  const work = tasksToCreate.splice(0, WORK_SIZE).map(() => {
    return createTaskPromise();
  });
  await Promise.all(work);
}
```
@xesunny I believe this approach, in my last comment, of chunking up work will likely work for your use case too. You do not need to perform one task at a time, but you should pick some upper bound on the amount of work that is processed concurrently, and you should await completion, otherwise you can end up with an unhandled rejection.
Thanks for the reply 🙏 That is what we're doing, except we use batch sizes of 100. I'll try to tune it down to 32, but overall this feels pretty limited if even a batch size of 100 causes contention on the system. What I've typically found when using similar queueing services is the ability to queue batches: the client sends X tasks to queue, and the contention/batching is managed by the system itself (in this case, Cloud Tasks). This feels like a critical missing feature that is now being thrown at users to "deal with it". How do you recommend scaling this approach? We have larger batches (currently up to 20k, but we plan to scale that to at least 10x, maybe 50x, towards December this year), and queuing them "32 by 32" is going to be too slow. We are currently queueing the tasks through a fan-out Cloud Scheduler job calling App Engine, which creates the tasks. I fear that with a batch size of 32, the App Engine request will time out before it finishes creating all the tasks. Any ideas?
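One way to squeeze more throughput out of the chunking approach (a sketch, not something from this thread): instead of waiting for each fixed-size batch to fully drain, keep a fixed number of createTask calls in flight at all times with a small worker pool. The CONCURRENCY value below is a hypothetical bound to tune against your queue.

```js
const {CloudTasksClient} = require('@google-cloud/tasks');

const client = new CloudTasksClient();
const CONCURRENCY = 32; // hypothetical upper bound; tune for your queue

async function enqueueAll(requests) {
  let next = 0;
  const failures = [];
  // Start CONCURRENCY workers; each one pulls the next request as soon as
  // it finishes the previous createTask call.
  const workers = Array.from({length: CONCURRENCY}, async () => {
    while (next < requests.length) {
      const request = requests[next++];
      try {
        await client.createTask(request);
      } catch (err) {
        failures.push({request, err}); // collect errors instead of failing the whole run
      }
    }
  });
  await Promise.all(workers);
  return failures;
}
```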
@bcoe Thanks for your reply. We are using this with a worker listening to messages from a Pub/Sub queue, so we are not always adding hundreds of tasks at the same time, which means we won't be able to chunk it; honestly, I think a single pod is currently only processing 100 messages at a time (and not every message creates a task). And just to give you some more context, let's say we will always get
@nikkanetiya @superzadeh have you tried tuning `timeout` in the call options?

```js
const call_options = {
  timeout: 200000 // millis
}
const response = await client.createTask(request, call_options);
```

I believe this can be overridden on a request by request basis, and combined with chunking up your work, you may find that you avoid the DEADLINE_EXCEEDED issue.
I've been calling this method explicitly after task creation (e.g., using
I will give this a try @bcoe, but this raises even more concerns regarding scalability: if we are only creating tasks 32 at a time, with a timeout of up to 200 seconds each, throughput is going to be very limited. Also, if you have any status updates on the internal tracking number, that would be appreciated.
We gave this a shot @bcoe, and when we add the call options, the tasks end up being dispatched as POST instead of GET (details in my next comment). Here's the TS definition I have for call options (from google-gax):

```ts
export interface CallOptions {
  timeout?: number;
  retry?: Partial<RetryOptions> | null;
  autoPaginate?: boolean;
  pageToken?: string;
  pageSize?: number;
  maxResults?: number;
  maxRetries?: number;
  otherArgs?: {
    [index: string]: any;
  };
  bundleOptions?: BundleOptions | null;
  isBundling?: boolean;
  longrunning?: BackoffSettings;
  apiName?: string;
  retryRequestOptions?: RetryRequestOptions;
}
```
Here's how we create the tasks; do you see any reason why they end up being sent as POST?

```ts
constructor() {
  // We only create the task client once
  const client = new CloudTasksClient();
  this.project = process.env.GCLOUD_PROJECT!;
  this.location = process.env.QUEUE_LOCATION!;
  this.taskClient = client;
}

public queueTask = async (uri: string, queueName: string) => {
  // we reuse the task client
  const parent = this.taskClient.queuePath(this.project, this.location, queueName);
  const task: protos.google.cloud.tasks.v2.ITask = {
    appEngineHttpRequest: {
      httpMethod: "GET",
      relativeUri: uri,
    },
  };
  // Send create task request.
  const request: protos.google.cloud.tasks.v2.ICreateTaskRequest = {
    parent,
    task,
  };
  const callOptions = {
    timeout: 200000, // millis
  };
  // Creating the task like this results in the task sent to our AppEngine as POST ❌
  const [responseWithOptions] = await this.taskClient.createTask(request, callOptions);
  // Creating the task like this results in the task sent to our AppEngine as GET ✅
  const [response] = await this.taskClient.createTask(request);
  // ...
};
```
I'm really sorry if I come off as being pushy, but we're still having this error in production. Any updates you can share? Our leadership team will kick off a migration to AWS if I come back to them saying there's still no updates on this.
@superzadeh EDIT: by having this separate ticket to point people towards, I can more easily draw the attention of other people internally.
@xesunny I'm doing some cleanup of issues, and am closing this as I haven't heard back from you in a while. If folks are continuing to bump into problems, please don't hesitate to open a new issue, including a code snippet and your gcloud project ID.
@shayne-lp if you open a new issue with a code snippet that demonstrates the issue you're bumping into, along with your project ID, we can share it with the internal engineering team. If you don't feel comfortable sharing a project ID on GitHub, you can send an email to
People who are still running into this issue: consider the workaround mentioned by @yossi-eynav. This is what you will be enabling: #397 (comment). Currently we're enqueuing batches of 500 tasks concurrently without any problem 👍
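A sketch of what that setup could look like end to end, combining the `fallback: true` client with chunked `Promise.all` batches; the batch size of 500 mirrors the comment above and is worth tuning for your own queue:

```js
const {CloudTasksClient} = require('@google-cloud/tasks');

// HTTP/1.1 fallback transport instead of gRPC.
const client = new CloudTasksClient({fallback: true});

async function enqueueInBatches(requests, batchSize = 500) {
  for (let i = 0; i < requests.length; i += batchSize) {
    const batch = requests.slice(i, i + batchSize);
    // Enqueue one batch concurrently, then move on to the next.
    await Promise.all(batch.map((request) => client.createTask(request)));
  }
}
```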
Environment details
@google-cloud/tasks version: 1.7.2

Steps to reproduce
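A minimal hypothetical reproduction matching the title (many concurrent createTask calls rejecting with "4 DEADLINE_EXCEEDED"); the project, location, and queue names below are placeholders:

```js
const {CloudTasksClient} = require('@google-cloud/tasks');

async function main() {
  const client = new CloudTasksClient();
  // Hypothetical project/location/queue identifiers.
  const parent = client.queuePath('my-project', 'us-central1', 'my-queue');
  const task = {
    appEngineHttpRequest: {httpMethod: 'GET', relativeUri: '/handler'},
  };

  // Fire ~200 createTask calls concurrently, with no batching or chunking.
  const results = await Promise.allSettled(
    Array.from({length: 200}, () => client.createTask({parent, task}))
  );
  const failures = results.filter((r) => r.status === 'rejected');
  console.log(`failed: ${failures.length}`);
  if (failures.length) console.log(failures[0].reason.message);
}

main();
```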
Thanks!