feat: tpu_queued_resources_startup_script/create_network/time_bound #3907

Open: wants to merge 8 commits into main
1 change: 1 addition & 0 deletions CODEOWNERS
@@ -20,6 +20,7 @@ compute @GoogleCloudPlatform/dee-infra @GoogleCloudPlatform/nodejs-samples-revie
iam @GoogleCloudPlatform/dee-infra @GoogleCloudPlatform/nodejs-samples-reviewers @GoogleCloudPlatform/cloud-samples-reviewers
kms @GoogleCloudPlatform/dee-infra @GoogleCloudPlatform/nodejs-samples-reviewers @GoogleCloudPlatform/cloud-samples-reviewers
orgpolicy @GoogleCloudPlatform/dee-infra @GoogleCloudPlatform/nodejs-samples-reviewers @GoogleCloudPlatform/cloud-samples-reviewers
tpu @GoogleCloudPlatform/dee-infra @GoogleCloudPlatform/nodejs-samples-reviewers @GoogleCloudPlatform/cloud-samples-reviewers
recaptcha_enterprise @GoogleCloudPlatform/dee-infra @GoogleCloudPlatform/nodejs-samples-reviewers @GoogleCloudPlatform/cloud-samples-reviewers
recaptcha_enterprise/demosite @GoogleCloudPlatform/dee-infra @GoogleCloudPlatform/recaptcha-customer-obsession-reviewers @GoogleCloudPlatform/nodejs-samples-reviewers @GoogleCloudPlatform/cloud-samples-reviewers
secret-manager @GoogleCloudPlatform/dee-infra @GoogleCloudPlatform/nodejs-samples-reviewers @GoogleCloudPlatform/cloud-samples-reviewers @GoogleCloudPlatform/cloud-secrets-team
123 changes: 123 additions & 0 deletions tpu/queuedResources/createQueuedResourceNetwork.js
@@ -0,0 +1,123 @@
/*
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

'use strict';

async function main(tpuClient) {
// [START tpu_queued_resources_network]
// Import the TpuClient
// TODO(developer): Uncomment below line before running the sample.
// const {TpuClient} = require('@google-cloud/tpu').v2alpha1;
const {Node, NetworkConfig, QueuedResource} =
require('@google-cloud/tpu').protos.google.cloud.tpu.v2alpha1;

// Instantiate a tpuClient
// TODO(developer): Uncomment below line before running the sample.
// tpuClient = new TpuClient();
Comment on lines +28 to +29
Contributor

medium

Consider instantiating the tpuClient here since you're suggesting to import the class above. This makes the example runnable.

Suggested change
// TODO(developer): Uncomment below line before running the sample.
// tpuClient = new TpuClient();
// Instantiate a tpuClient
const tpuClient = new TpuClient();

Contributor Author

It is commented out because the tests pass a mocked tpuClient to this function. If users want to use a real TpuClient, they need to uncomment this line.

Contributor

Understood. For clarity, perhaps a comment indicating that the tpuClient instantiation is for non-testing scenarios would be beneficial. This would help users understand the purpose of the commented-out code more easily.

Suggested change
// TODO(developer): Uncomment below line before running the sample.
// tpuClient = new TpuClient();
// Instantiate a tpuClient (uncomment for non-testing scenarios)
// const tpuClient = new TpuClient();
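The author's reply turns on how `main(tpuClient)` receives its client. The library-free sketch below illustrates the JavaScript rules involved; `FakeTpuClient`, `pickClientByReassignment`, `pickClientWithDefault`, and `redeclarationError` are hypothetical names, not part of `@google-cloud/tpu`. Reassigning the parameter (the sample's `tpuClient = new TpuClient();` form) is legal. The `const tpuClient = new TpuClient();` form in the suggestions above, if uncommented inside `main`, would be rejected at parse time because it redeclares a parameter name in the same function scope. A default parameter is one way to keep an injected mock for tests while still creating a client when none is passed.

```javascript
'use strict';

// Hypothetical stand-in for TpuClient so this sketch runs without @google-cloud/tpu.
class FakeTpuClient {
  constructor(label) {
    this.label = label;
  }
}

// Mirrors the sample: once uncommented, the line reassigns the parameter.
// Assigning to a parameter is legal, but it discards any injected mock.
function pickClientByReassignment(tpuClient) {
  tpuClient = new FakeTpuClient('real');
  return tpuClient;
}

// Alternative: a default parameter builds a client only when none is passed,
// so tests can still inject a mock.
function pickClientWithDefault(tpuClient = new FakeTpuClient('real')) {
  return tpuClient;
}

// Redeclaring a parameter with `const` in the same function body is an early
// SyntaxError. The Function constructor compiles its body at runtime, which
// lets the error be caught and inspected here.
function redeclarationError() {
  try {
    new Function('tpuClient', 'const tpuClient = {}; return tpuClient;');
    return null;
  } catch (err) {
    return err;
  }
}

const mock = new FakeTpuClient('mock');
console.log(pickClientWithDefault(mock).label); // prints "mock"
console.log(pickClientWithDefault().label); // prints "real"
console.log(pickClientByReassignment(mock).label); // prints "real"
console.log(redeclarationError() instanceof SyntaxError); // prints "true"
```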


/**
* TODO(developer): Update/uncomment these variables before running the sample.
*/
// Project ID or project number of the Google Cloud project in which to create the queued resource.
const projectId = await tpuClient.getProjectId();

// The name of the network that you want the node to connect to. The network must belong to this project.
const networkName = 'compute-tpu-network';

// The region of the subnetwork that you want the node to connect to.
const region = 'us-central1';

// The name for your queued resource.
const queuedResourceName = 'queued-resource-1';

// The name for your node.
const nodeName = 'node-name-1';

// The zone in which to create the node.
// For more information about supported TPU types for specific zones,
// see https://cloud.google.com/tpu/docs/regions-zones
const zone = `${region}-a`;

// The accelerator type that specifies the version and size of the node you want to create.
// For more information about supported accelerator types for each TPU version,
// see https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#versions.
const tpuType = 'v2-8';

// Software version that specifies the version of the node runtime to install. For more information,
// see https://cloud.google.com/tpu/docs/runtimes
const tpuSoftwareVersion = 'tpu-vm-tf-2.14.1';

async function callCreateQueuedResourceNetwork() {
// Specify the network and subnetwork that you want to connect your TPU to.
// This sample assumes the subnetwork has the same name as the network.
const networkConfig = new NetworkConfig({
enableExternalIps: true,
network: `projects/${projectId}/global/networks/${networkName}`,
subnetwork: `projects/${projectId}/regions/${region}/subnetworks/${networkName}`,
});

// Create a node
const node = new Node({
name: nodeName,
zone,
acceleratorType: tpuType,
runtimeVersion: tpuSoftwareVersion,
networkConfig,
queuedResource: `projects/${projectId}/locations/${zone}/queuedResources/${queuedResourceName}`,
});

// Define parent for requests
const parent = `projects/${projectId}/locations/${zone}`;

// Create queued resource
const queuedResource = new QueuedResource({
name: queuedResourceName,
tpu: {
nodeSpec: [
{
parent,
node,
nodeId: nodeName,
},
],
},
});

const request = {
parent,
queuedResource,
queuedResourceId: queuedResourceName,
};

const [operation] = await tpuClient.createQueuedResource(request);

// Wait for the create operation to complete.
const [response] = await operation.promise();

// You can wait until the TPU node is READY and check its status
// using getTpuVm() from the `tpu_vm_get` sample.
return response;
}
return await callCreateQueuedResourceNetwork();
// [END tpu_queued_resources_network]
}

module.exports = main;

// TODO(developer): Uncomment below lines before running the sample.
// main(...process.argv.slice(2)).catch(err => {
// console.error(err);
// process.exitCode = 1;
// });
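The sample's closing comment suggests waiting until the TPU node is READY before using it. The sketch below is a minimal, generic poller under stated assumptions: `waitForState` and `fakeStateSource` are hypothetical helpers, the default interval and timeout are arbitrary, and the only requirement is an async `fetchState` function that resolves to the current state string.

```javascript
'use strict';

// Hypothetical helper: polls `fetchState` until it returns `targetState`,
// or rejects once `timeoutMs` has elapsed. The defaults are arbitrary.
async function waitForState(
  fetchState,
  targetState,
  {intervalMs = 10000, timeoutMs = 600000} = {}
) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const state = await fetchState();
    if (state === targetState) {
      return state;
    }
    if (Date.now() >= deadline) {
      throw new Error(
        `Timed out waiting for ${targetState}; last state was ${state}`
      );
    }
    // Sleep between polls so the API is not called in a tight loop.
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical fake for testing: returns the next state on each call and
// repeats the last one once the list is exhausted.
function fakeStateSource(states) {
  let i = 0;
  return async () => states[Math.min(i++, states.length - 1)];
}
```

Against a real client, `fetchState` might call `tpuClient.getNode()` with the node's full resource name (`projects/PROJECT_ID/locations/ZONE/nodes/NODE_NAME`) and return `node.state`. That wiring assumes the v2alpha1 client's `getNode` method and string-valued enum fields, so confirm both in the client reference before relying on it.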
127 changes: 127 additions & 0 deletions tpu/queuedResources/createQueuedResourceStartupScript.js
@@ -0,0 +1,127 @@
/*
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

'use strict';

async function main(tpuClient) {
// [START tpu_queued_resources_startup_script]
// Import the TpuClient
// TODO(developer): Uncomment below line before running the sample.
// const {TpuClient} = require('@google-cloud/tpu').v2alpha1;
Comment on lines +22 to +23
Contributor

medium

Since TpuClient is already imported in the tests, consider importing it here as well instead of relying on the parameter. This makes the code more self-contained and easier to understand.

Suggested change
// TODO(developer): Uncomment below line before running the sample.
// const {TpuClient} = require('@google-cloud/tpu').v2alpha1;
const {TpuClient} = require('@google-cloud/tpu').v2alpha1;
const {Node, NetworkConfig, QueuedResource} = require('@google-cloud/tpu').protos.google.cloud.tpu.v2alpha1;

const {Node, NetworkConfig, QueuedResource} =
require('@google-cloud/tpu').protos.google.cloud.tpu.v2alpha1;

// Instantiate a tpuClient
// TODO(developer): Uncomment below line before running the sample.
// tpuClient = new TpuClient();
Comment on lines +28 to +29
Contributor

medium

Consider instantiating the tpuClient here since you're suggesting to import the class above. This makes the example runnable.

Suggested change
// TODO(developer): Uncomment below line before running the sample.
// tpuClient = new TpuClient();
// Instantiate a tpuClient
const tpuClient = new TpuClient();


/**
* TODO(developer): Update/uncomment these variables before running the sample.
*/
Comment on lines +31 to +33
Contributor

medium

Update the TODO with instructions on how to set up the required environment variables or configuration.

Suggested change
/**
* TODO(developer): Update/uncomment these variables before running the sample.
*/
/**
* TODO(developer): Before running the sample,
* 1. Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of your service account key file.
* 2. Create a network named 'compute-tpu-network' in the 'europe-west4' region.
* 3. Create a subnetwork with the same name as the network in the 'europe-west4' region.
*/

// Project ID or project number of the Google Cloud project in which to create the queued resource.
const projectId = await tpuClient.getProjectId();

// The name of the network that you want the node to connect to. The network must belong to this project.
const networkName = 'compute-tpu-network';

// The region of the subnetwork that you want the node to connect to.
const region = 'us-central1';

// The name for your queued resource.
const queuedResourceName = 'queued-resource-1';

// The name for your node.
const nodeName = 'node-name-1';

// The zone in which to create the node.
// For more information about supported TPU types for specific zones,
// see https://cloud.google.com/tpu/docs/regions-zones
const zone = `${region}-a`;

// The accelerator type that specifies the version and size of the node you want to create.
// For more information about supported accelerator types for each TPU version,
// see https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#versions.
const tpuType = 'v2-8';

// Software version that specifies the version of the node runtime to install. For more information,
// see https://cloud.google.com/tpu/docs/runtimes
const tpuSoftwareVersion = 'tpu-vm-tf-2.14.1';

async function callCreateQueuedResourceStartupScript() {
// Create a node
const node = new Node({
name: nodeName,
zone,
acceleratorType: tpuType,
runtimeVersion: tpuSoftwareVersion,
// Define the network; this sample assumes the subnetwork has the same name as the network.
networkConfig: new NetworkConfig({
enableExternalIps: true,
network: `projects/${projectId}/global/networks/${networkName}`,
subnetwork: `projects/${projectId}/regions/${region}/subnetworks/${networkName}`,
}),
queuedResource: `projects/${projectId}/locations/${zone}/queuedResources/${queuedResourceName}`,
metadata: {
// The script writes "Hello World" to /var/log/hello.log, then upgrades numpy and appends the pip output to the same file.
'startup-script': `#!/bin/bash
echo "Hello World" > /var/log/hello.log
sudo pip3 install --upgrade numpy >> /var/log/hello.log 2>&1`,
},
});

// Define parent for requests
const parent = `projects/${projectId}/locations/${zone}`;

// Create queued resource
const queuedResource = new QueuedResource({
name: queuedResourceName,
tpu: {
nodeSpec: [
{
parent,
node,
nodeId: nodeName,
},
],
},
});

const request = {
parent,
queuedResource,
queuedResourceId: queuedResourceName,
};

const [operation] = await tpuClient.createQueuedResource(request);

// Wait for the create operation to complete.
const [response] = await operation.promise();

// You can wait until the TPU node is READY and check its status
// using getTpuVm() from the `tpu_vm_get` sample.
return response;
}
return await callCreateQueuedResourceStartupScript();
// [END tpu_queued_resources_startup_script]
}

module.exports = main;

// TODO(developer): Uncomment below lines before running the sample.
// main(...process.argv.slice(2)).catch(err => {
// console.error(err);
// process.exitCode = 1;
// });
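The author notes that the tests pass a mocked `tpuClient` into `main`. The sketch below applies that idea to the startup-script sample without requiring `@google-cloud/tpu`. `buildStartupScriptRequest`, `createMockTpuClient`, and `createWithClient` are hypothetical helpers; the request is built from plain objects in place of the `Node`, `NetworkConfig`, and `QueuedResource` classes the sample uses, and the mock only mimics the `[operation]` and `operation.promise()` shape that the sample awaits.

```javascript
'use strict';

// Hypothetical helper mirroring the request built in
// createQueuedResourceStartupScript.js, using plain objects instead of protos.
function buildStartupScriptRequest({
  projectId, region, zone, networkName, queuedResourceName,
  nodeName, tpuType, runtimeVersion, startupScript,
}) {
  const parent = `projects/${projectId}/locations/${zone}`;
  const node = {
    name: nodeName,
    zone,
    acceleratorType: tpuType,
    runtimeVersion,
    networkConfig: {
      enableExternalIps: true,
      network: `projects/${projectId}/global/networks/${networkName}`,
      // As in the sample, the subnetwork is assumed to share the network's name.
      subnetwork: `projects/${projectId}/regions/${region}/subnetworks/${networkName}`,
    },
    queuedResource: `${parent}/queuedResources/${queuedResourceName}`,
    metadata: {'startup-script': startupScript},
  };
  return {
    parent,
    queuedResource: {
      name: queuedResourceName,
      tpu: {nodeSpec: [{parent, node, nodeId: nodeName}]},
    },
    queuedResourceId: queuedResourceName,
  };
}

// Minimal mock: records each request; createQueuedResource() resolves to
// [operation], and operation.promise() resolves to [response].
function createMockTpuClient(projectId) {
  const calls = [];
  return {
    calls,
    async getProjectId() {
      return projectId;
    },
    async createQueuedResource(request) {
      calls.push(request);
      const name = `${request.parent}/queuedResources/${request.queuedResourceId}`;
      return [{promise: async () => [{name}]}];
    },
  };
}

// Same control flow as the sample's inner function, with the client injected.
async function createWithClient(tpuClient, options) {
  const projectId = await tpuClient.getProjectId();
  const request = buildStartupScriptRequest({projectId, ...options});
  const [operation] = await tpuClient.createQueuedResource(request);
  const [response] = await operation.promise();
  return response;
}
```

Keeping request construction separate from the client call lets a test assert on the exact request without network access, which is the same reason the sample takes `tpuClient` as a parameter.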