Datasources instantly reach status "DELETED" after creation #320

Closed
kciredor opened this issue Jul 20, 2015 · 2 comments
Labels
guidance Question that needs advice or information.

Comments

@kciredor

I'm creating two datasources (one for observations, one for evaluations) from a single CSV file in S3, inside a for loop that runs over multiple datasets.

The behaviour appears "random": most of the time, some or all of the newly created datasources report the status "DELETED" as soon as I look up their status after creation.

My code:

    for i := range datasets {
        // Take a pointer to the slice element so the IDs are stored per dataset.
        dataset := &datasets[i]

        // Create observations datasource (first 70% of the CSV).
        srcName := fmt.Sprintf("DS-OBS-%s", dataset.csvFn)
        srcId := fmt.Sprintf("%s-%s", srcName, core.RandSeq(8))

        input := machinelearning.CreateDataSourceFromS3Input{
            ComputeStatistics: aws.Boolean(true),
            DataSourceID:      aws.String(srcId),
            DataSourceName:    aws.String(srcName),
            DataSpec: &machinelearning.S3DataSpec{
                DataLocationS3:    aws.String("s3://input/" + dataset.csvFn),
                DataSchema:        aws.String(dataset.DataSchema),
                DataRearrangement: aws.String(`{"splitting":{"percentBegin":0,"percentEnd":70}}`),
            },
        }

        res, err := awsClient.Ml.CreateDataSourceFromS3(&input)

        if err != nil {
            log.Fatalln(err)
        }

        dataset.ObsDataSourceID = *res.DataSourceID

        // Create evaluations datasource (remaining 30% of the same CSV).
        srcName = fmt.Sprintf("DS-EVL-%s", dataset.csvFn)
        srcId = fmt.Sprintf("%s-%s", srcName, core.RandSeq(8))

        input = machinelearning.CreateDataSourceFromS3Input{
            ComputeStatistics: aws.Boolean(false),
            DataSourceID:      aws.String(srcId),
            DataSourceName:    aws.String(srcName),
            DataSpec: &machinelearning.S3DataSpec{
                DataLocationS3:    aws.String("s3://input/" + dataset.csvFn),
                DataSchema:        aws.String(dataset.DataSchema),
                DataRearrangement: aws.String(`{"splitting":{"percentBegin":70,"percentEnd":100}}`),
            },
        }

        res, err = awsClient.Ml.CreateDataSourceFromS3(&input)

        if err != nil {
            log.Fatalln(err)
        }

        dataset.EvlDataSourceID = *res.DataSourceID
    }
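
The DataRearrangement JSON strings in the loop above could also be generated rather than written by hand, which makes it harder to mistype a split boundary. This is only a minimal sketch: the `splitRearrangement` helper and the two struct types are hypothetical names, not part of the SDK, and the JSON shape is copied from the strings in the loop.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// splitting and rearrangement mirror the JSON shape used in the loop above.
// Both types are hypothetical helpers, not SDK types.
type splitting struct {
	PercentBegin int `json:"percentBegin"`
	PercentEnd   int `json:"percentEnd"`
}

type rearrangement struct {
	Splitting splitting `json:"splitting"`
}

// splitRearrangement returns the JSON string for a percentage split,
// rejecting ranges that are empty or fall outside 0-100.
func splitRearrangement(begin, end int) (string, error) {
	if begin < 0 || end > 100 || begin >= end {
		return "", fmt.Errorf("invalid split range %d-%d", begin, end)
	}
	b, err := json.Marshal(rearrangement{
		Splitting: splitting{PercentBegin: begin, PercentEnd: end},
	})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	obs, _ := splitRearrangement(0, 70)
	evl, _ := splitRearrangement(70, 100)
	fmt.Println(obs)
	fmt.Println(evl)
}
```

The resulting strings could then be passed to `aws.String(...)` in place of the literals.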
@lsegal
Contributor

lsegal commented Jul 20, 2015

@kciredor I'm not sure this is an issue with the AWS SDK for Go. Your calls succeeded and the service initiated the creation, which implies the SDK serialized the requests correctly. The SDK performs no other operation besides the exact operation you called.

As for why the data sources are being marked as DELETED immediately after creation, my best guess is that some type of data validation on the service side is causing Machine Learning to remove the data source, but I could be wrong.

I also don't know all the details of your app, but if the 2nd CreateDataSourceFromS3 call relies on the presence of the first, note that these calls may be "eventually consistent". In other words, the first call has likely not always completed by the time the 2nd begins, which would certainly explain the "randomness" portion of the report.

I would suggest opening a thread on the Amazon Machine Learning forums to ask about your issue, including the input you're providing to the calls. They would likely be able to give more precise feedback about what may be happening on the service side.

As a sidenote, and I'm not sure this is relevant but I thought it might be worth pointing out: it looks like you're setting dataset.ObsDataSourceID and dataset.EvlDataSourceID values based on the returned IDs from the successful calls, but you claim to be running this over multiple datasets. Reading through the code, it looks like it will only set the last ID, since you're only storing a single ID from each of these calls. Again, I don't know the details of your app, but that may not be what you want.
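
Whether only the last ID survives depends on how `dataset` is bound inside the loop. A minimal sketch, assuming `datasets` is a slice of structs (the `dataset` type and the `assignIDs` helper here are hypothetical and only mirror the field names in the report), suggests that taking `&datasets[i]` writes the IDs into each element rather than into one shared variable:

```go
package main

import "fmt"

// dataset is a hypothetical stand-in for the reporter's type.
type dataset struct {
	csvFn           string
	ObsDataSourceID string
	EvlDataSourceID string
}

// assignIDs stores one pair of IDs per element by taking a pointer
// to each slice element, as the loop in the report does.
func assignIDs(datasets []dataset) {
	for i := range datasets {
		d := &datasets[i]
		d.ObsDataSourceID = "DS-OBS-" + d.csvFn
		d.EvlDataSourceID = "DS-EVL-" + d.csvFn
	}
}

func main() {
	datasets := []dataset{{csvFn: "a.csv"}, {csvFn: "b.csv"}}
	assignIDs(datasets)
	for _, d := range datasets {
		fmt.Println(d.csvFn, d.ObsDataSourceID, d.EvlDataSourceID)
	}
}
```

If `dataset` were instead a copy of the element (for example `for _, dataset := range datasets`), the assignments would be lost, so the concern is worth checking against the real code.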

lsegal added the guidance label on Jul 20, 2015
@kciredor
Author

Thanks very much for your swift and kind answer @lsegal - I'll proceed and close this issue.

skotambkar pushed a commit to skotambkar/aws-sdk-go that referenced this issue May 20, 2021
Fixes the JSON unmarshaling of maps of bools. The unmarshal case was
missing the condition for the bool value, in addition to the bool pointer.

Fix aws#319
skotambkar pushed a commit to skotambkar/aws-sdk-go that referenced this issue May 20, 2021
Services
===
* Synced the V2 SDK with latest AWS service API definitions.
* Fixes [aws#341](aws/aws-sdk-go-v2#341)
* Fixes [aws#342](aws/aws-sdk-go-v2#342)

SDK Breaking Changes
===
* `aws`: Add default HTTP client instead of http.DefaultClient/Transport ([aws#315](aws/aws-sdk-go-v2#315))
  * Adds a new BuildableHTTPClient type to the SDK's aws package. The type uses the builder pattern with immutable changes. Modifications to the buildable client create copies of the client. Adds an HTTPClient interface to the aws package that the SDK will use as an abstraction over the specific HTTP client implementation. The SDK will default to the BuildableHTTPClient, but a *http.Client can also be provided for custom configuration. When the SDK's aws.Config.HTTPClient value is a BuildableHTTPClient, the SDK will be able to use API client specific request timeout options.
  * Fixes [aws#279](aws/aws-sdk-go-v2#279)
  * Fixes [aws#269](aws/aws-sdk-go-v2#269)

SDK Enhancements
===
* `service/s3/s3manager`: Update S3 Upload Multipart location ([aws#324](aws/aws-sdk-go-v2#324))
  * Updates the Location returned value of S3 Upload's Multipart UploadOutput type to be consistent with the single part upload URL. This update also brings the multipart upload Location in line with the S3 object URLs created by the SDK.
  * Fixes [aws#323](aws/aws-sdk-go-v2#323)
  * V2 Port [aws#2453](aws#2453)

SDK Bugs
===
* `private/model`: Handles empty map vs unset map behavior in send request ([aws#337](aws/aws-sdk-go-v2#337))
  * Updated shape marshal model to handle the empty map vs nil map behavior. Adding a test case to assert behavior when a user sends an empty map vs nil map.
  * Fix [aws#332](aws/aws-sdk-go-v2#332)
* `service/rds`: Fix presign URL for same region ([aws#331](aws/aws-sdk-go-v2#331))
  * Fixes RDS no-autopresign URL for same region issue for aws-sdk-go-v2. Solves the issue by making sure that the presigned URLs are not created, when the source and destination regions are the same. Added and updated the tests accordingly.
  * Fix [aws#271](aws/aws-sdk-go-v2#271)
* `private/protocol/json/jsonutil`: Fix Unmarshal map[string]bool ([aws#320](aws/aws-sdk-go-v2#320))
  * Fixes the JSON unmarshaling of maps of bools. The unmarshal case was missing the condition for the bool value, in addition to the bool pointer.
  * Fix [aws#319](aws/aws-sdk-go-v2#319)