-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
request to add gcp-implementation-demos to main branch #315
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Ok, the linter checks are failing. Looks like most are format-related. You can use the contributing guide for information on structure and linters used |
updated with linter to resolve formatting errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main question I have is why these two separate applications are needed when most of the codebase seems to be the same?
Also follows a very similar structure to these existing projects
- https://github.com/GoogleCloudPlatform/document-ai-samples/tree/main/community/expense-parser-python
- https://github.com/GoogleCloudPlatform/document-ai-samples/tree/main/fraud-detection-python
I think it would make more sense to create a more general system that could use a schema file or create new fields based on the entities that are extracted.
Then this could be simplified into a single codebase with other applications using the examples.
export PROJECT_ID=shared-vpc-host-254614 | ||
export BUCKET_LOCATION=us-central1 | ||
export CLOUD_FUNCTION_LOCATION=us-central1 | ||
export INGRESS_SETTINGS=internal-and-gclb | ||
export CLOUD_FUNCTION_SERVICE_ACCOUNT=tg-docai | ||
export BQ_DATASET_NAME=acme_contract_parser_results | ||
export BQ_TABLE_NAME=doc_ai_extracted_entities | ||
export COMPANY_NAME=acme |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't commit project ids or other global identifiers into repositories, replace with a placeholder like YOUR_PROJECT_ID
source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON, | ||
ignore_unknown_values=True, | ||
schema=[ | ||
bigquery.SchemaField("document_name", "STRING"), | ||
bigquery.SchemaField("agreement_date", "DATE"), | ||
bigquery.SchemaField("arbitration_venue", "STRING"), | ||
bigquery.SchemaField("confidentiality_clause", "STRING"), | ||
bigquery.SchemaField("effective_date", "DATE"), | ||
bigquery.SchemaField("expiration_date", "DATE"), | ||
bigquery.SchemaField("governing_law", "STRING"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to change this to use a schema.json file or something? Seems like this is less reusable
just verifying if anything is needed from my end to commit this PR |
Sorry, I didn't notice that you made an additional comment. I'd recommend combining most of these two demos into a single demo because most of the code base is exactly the same. |
holt skinner asked if I could add my repo to the main branch of document-ai-samples.