v1.0

@lorischl-otter lorischl-otter released this 12 Oct 22:43
· 99 commits to main since this release
75fdd8f

Data Science Team’s MVP for Story Squad Release Canvas 2

Functional features by category:

Transcription and Moderation

  • Transcription
    • Google Cloud Vision OCR (#10)
      • Connects to Google Cloud Vision API and uses their Optical Character Recognition model to transcribe the handwritten stories uploaded by users.
    • Low confidence flag (#20)
      • During transcription, returns Google Cloud Vision’s confidence in each transcribed submission. Raises a flag if the confidence is below 85%, which signals poor image or handwriting quality and, consequently, possibly inaccurate evaluation metrics.
  • Text Moderation
    • Bad/Inappropriate words filter (#31)
      • Added a method to the Google API service that checks word tokens against a list of known inappropriate words.
  • Image Moderation
    • Safe Search (#10)
      • Connects to Google Cloud Vision API and utilizes their built-in Safe Search service to flag if a user’s uploaded illustration has racy, adult, or violent content.
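
The three checks above can be sketched as small, network-free helpers. This is a minimal illustration, not the project's code: the function names, the placeholder word list, the way a per-submission confidence is obtained, and the `LIKELY` cutoff for Safe Search are all assumptions. Only the 85% threshold and the racy/adult/violent categories come from these notes; the likelihood names are the ones Cloud Vision uses for Safe Search results.

```python
import re

CONFIDENCE_THRESHOLD = 0.85  # from the notes: flag transcriptions below 85%

# Hypothetical placeholder list; the project's real word list is not shown here.
BAD_WORDS = {"badword", "worseword"}

# Cloud Vision Safe Search likelihood names, ordered from lowest to highest.
LIKELIHOODS = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
               "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def is_low_confidence(confidence: float) -> bool:
    """Return True when the transcription confidence is below the threshold."""
    return confidence < CONFIDENCE_THRESHOLD


def contains_bad_words(text: str) -> bool:
    """Return True when any lowercased word token is in the word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in BAD_WORDS for token in tokens)


def is_unsafe_image(ratings: dict, threshold: str = "LIKELY") -> bool:
    """Return True when any watched category meets the (assumed) cutoff.

    `ratings` maps a category name to a likelihood name, e.g.
    {"adult": "UNLIKELY", "racy": "POSSIBLE", "violence": "VERY_UNLIKELY"}.
    """
    cutoff = LIKELIHOODS.index(threshold)
    return any(
        LIKELIHOODS.index(ratings.get(category, "UNKNOWN")) >= cutoff
        for category in ("adult", "racy", "violence")
    )
```

In a real service the `confidence` and `ratings` values would come from the Cloud Vision responses; here they are passed in directly so the logic can be read and tested in isolation.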

Complexity Analysis

  • Complexity Metric - “Squad Score” (#18)
    • Cleans transcribed text and returns a custom complexity score
    • This baseline implementation includes four features generated only with Python and Pandas. It is intended to be iterated upon.
    • Because few labels were available to train a model or formula against, the formula uses only features that appear in validated complexity models or were requested by stakeholders, in the form least susceptible to errors in children's writing, handwriting, and transcription (e.g., measuring length in characters rather than syllables or words).
    • Features:
      • sl: story length (in characters)
      • awl: average word length (in characters)
      • qn: quotes number
      • uw: unique words count (over two characters)
    • Weights:
      • Squad Score is initialized with a weight of 1 for each feature, as there were not enough labeled data to tune the weights in a generalizable way.
      • There is also a standardized “range scaler” of 30, meant to bring the overall Squad Score closer to a 0-100 range, purely for ease of reading the metric.
    • Formula: (sl × 1 × 30) + (awl × 1 × 30) + (qn × 1 × 30) + (uw × 1 × 30)
    • Range: the score bottoms out at 0, but does not have a bounded upper range
    • Metrics:
      • The only labels available at the time of development were a 1-25 ranking of 25 stories from the training set. Applying the Squad Score formula to these 25 stories yielded a correlation coefficient of -0.60 between scores and rankings.
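
As a rough illustration of the formula above, the sketch below extracts the four features from a string and combines them with a weight of 1 and a range scaler of 30. It uses plain Python rather than Pandas, and several details are assumptions not stated in these notes: the function names, how words are tokenized, and counting `qn` as pairs of double quotation marks. The notes also do not say whether features are rescaled before entering the formula, so raw values are used here and the result will not necessarily land near 0-100.

```python
import string

WEIGHT = 1         # every feature starts with a weight of 1
RANGE_SCALER = 30  # the "range scaler" described in the notes


def extract_features(text: str) -> dict:
    """Compute the four Squad Score features (tokenization is an assumption)."""
    # Split on whitespace, strip surrounding punctuation, drop empty tokens.
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    words = [w for w in words if w]
    return {
        "sl": len(text),                                             # story length in characters
        "awl": sum(len(w) for w in words) / len(words) if words else 0.0,  # average word length
        "qn": text.count('"') // 2,                                  # assumed: pairs of double quotes
        "uw": len({w for w in words if len(w) > 2}),                 # unique words over two characters
    }


def score_from_features(features: dict) -> float:
    """Apply weight and range scaler to each feature and sum the terms."""
    return sum(WEIGHT * RANGE_SCALER * value for value in features.values())


def squad_score(text: str) -> float:
    """Convenience wrapper: extract features, then apply the formula."""
    return score_from_features(extract_features(text))
```

Keeping feature extraction separate from the weighted sum means the weights can later be tuned, or a scaling step added, without touching the tokenization logic.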

Deployment

  • API Endpoints
    • Submission/text (#19)
      • REST API endpoint that transcribes a submission, computes its Squad Score, and returns both to the web backend.
    • Submission/illustration (#19)
      • REST API endpoint that submits an illustration to the Google Cloud Vision API’s Safe Search service to flag inappropriate content in user submissions.
  • GitHub Actions
    • Docker image action (#30)
      • Builds the project Docker image and pushes it to the Docker Hub container registry via GitHub Actions.
    • Code Climate upload action (#25)
      • Collects the project's coverage report statistics and uploads them to Code Climate via GitHub Actions.
  • Header Security Token Checking
    • AuthRouteHandler (#27)
      • Checks each request’s headers against a known security token before allowing access to the API endpoints.
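
The header check can be illustrated without any web framework. The sketch below is not the AuthRouteHandler implementation: the function name, the `Authorization` header name, and the way the expected token is supplied are assumptions. It uses the standard library's `hmac.compare_digest` so that the comparison takes the same time regardless of where the strings differ.

```python
import hmac


def is_authorized(headers: dict, expected_token: str,
                  header_name: str = "Authorization") -> bool:
    """Return True only when the request carries the expected token.

    `headers` is a plain mapping of header names to values; the header
    name used here is an assumption, not taken from the project.
    """
    supplied = headers.get(header_name)
    if supplied is None:
        return False
    # Constant-time comparison avoids leaking token contents via timing.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```

A route handler would call a check like this before dispatching the request and respond with an authorization error when it returns False.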