Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

request to add gcp-implementation-demos to main branch #315

Closed
wants to merge 5 commits into from

Conversation

tgaillard1
Copy link

holt skinner asked if I could add my repo to the main branch of document-ai-samples.

@tgaillard1 tgaillard1 requested a review from a team as a code owner February 3, 2023 19:57
@google-cla
Copy link

google-cla bot commented Feb 3, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@holtskinner
Copy link
Collaborator

Ok, the linter checks are failing. Looks like most are format-related.

You can use the contributing guide for information on structure and linters used
https://github.com/GoogleCloudPlatform/document-ai-samples/blob/main/.github/CONTRIBUTING.md#code-quality-checks

@tgaillard1
Copy link
Author

updated with linter to resolve formatting errors.

Copy link
Collaborator

@holtskinner holtskinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main question I have is why these two separate applications are needed when most of the codebase seems to be the same?

Also follows a very similar structure to these existing projects

I think it would make more sense to create a more general system that could use a schema file or create new fields based on the entities that are extracted.

Then this could be simplified into a single codebase with other applications using the examples.

Comment on lines +1 to +8
export PROJECT_ID=shared-vpc-host-254614
export BUCKET_LOCATION=us-central1
export CLOUD_FUNCTION_LOCATION=us-central1
export INGRESS_SETTINGS=internal-and-gclb
export CLOUD_FUNCTION_SERVICE_ACCOUNT=tg-docai
export BQ_DATASET_NAME=acme_contract_parser_results
export BQ_TABLE_NAME=doc_ai_extracted_entities
export COMPANY_NAME=acme
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't commit project ids or other global identifiers into repositories, replace with a placeholder like YOUR_PROJECT_ID

Comment on lines +57 to +66
source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
ignore_unknown_values=True,
schema=[
bigquery.SchemaField("document_name", "STRING"),
bigquery.SchemaField("agreement_date", "DATE"),
bigquery.SchemaField("arbitration_venue", "STRING"),
bigquery.SchemaField("confidentiality_clause", "STRING"),
bigquery.SchemaField("effective_date", "DATE"),
bigquery.SchemaField("expiration_date", "DATE"),
bigquery.SchemaField("governing_law", "STRING"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to change this to use a schema.json file or something? Seems like this is less reusable

@tgaillard1
Copy link
Author

just verifying if anything is needed from my end to commit this PR

@holtskinner
Copy link
Collaborator

Sorry, I didn't notice that you made an additional comment. I'd recommend combining most of these two demos into a single demo because most of the code base is exactly the same.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants