Refactor codebase into classes #118

kouloumos · 2023-11-02T19:34:56Z

All the logic has been moved and slightly refactored into separate
classes to remove duplicate code and scattered logic, and to improve the
readability and maintainability of the codebase by giving the process a
clear flow.

An overview of the classes:

Transcription is the main class that contains Transcripts.
Each Transcript holds a Source that we want to transcribe,
which is either Audio, Video or Playlist.

What the flow looks like:

- We initialize a Transcription object that holds all the
  related configuration for the current transcription process.
- We can add as many sources as we want to the current transcription
  with transcription.add_transcription_source(...).
- When we are ready, we call transcription.start(), which:
  - produces an audio file by processing the source. This step is
    responsible for any downloads or conversions that need to happen.
  - produces the transcription by processing the audio file. This step
    includes any summarization, chapter generation, or diarization that
    we might have configured.
    It can optionally:
    - write the transcription to a markdown file.
    - open a PR to the repo.
    - upload the transcription to an AWS S3 Bucket.
    - push the transcript to a Queuer backend.
    - write the payload of the transcription to a JSON file.
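
The flow above can be sketched roughly as follows. This is only an illustration of the described structure, not the code in this PR: the field names, the `kind` parameter, the arguments of `add_transcription_source` (elided as `...` above) and the placeholder processing steps are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Source:
    """Something we want to transcribe (hypothetical fields)."""
    location: str
    kind: str  # e.g. "audio", "video" or "playlist"

    def process(self) -> str:
        # Stand-in for the download/conversion step; returns an audio path.
        return f"{self.location}.mp3"


@dataclass
class Transcript:
    """Pairs one Source with the artifacts produced for it."""
    source: Source
    audio_file: Optional[str] = None
    text: Optional[str] = None


class Transcription:
    """Holds the configuration and the transcripts for one run."""

    def __init__(self, **config):
        self.config = config  # hypothetical: options for this run
        self.transcripts: List[Transcript] = []

    def add_transcription_source(self, location: str, kind: str = "audio") -> None:
        # Assumed signature; the real method takes arguments not shown here.
        self.transcripts.append(Transcript(Source(location, kind)))

    def start(self) -> List[str]:
        results = []
        for transcript in self.transcripts:
            # Step 1: source -> audio file
            transcript.audio_file = transcript.source.process()
            # Step 2: audio file -> transcription (placeholder text only)
            transcript.text = f"transcript of {transcript.audio_file}"
            results.append(transcript.text)
        return results
```

The optional outputs (markdown file, PR, S3 upload, queue push, JSON payload) would hang off the second step and are left out of the sketch.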

Also, updated the tests based on the new flow of the refactored codebase.
The Deepgram-related methods that are still in application.py can later
become their own class as we start adding more services.

All the logic has been moved and slightly refactored into separate classes for better readability and maintainability of the codebase. (Redundant code will be removed in the next commit, alongside the test updates.)

An overview of the classes:

- `Transcription` is the main class that contains `Transcripts`.
- Each `Transcript` holds a `Source` that we want to transcribe, which is either `Audio` or `Video`.

What the flow looks like:

- We initialize a `Transcription` object that holds all the related configuration for the current transcription process.
- We can add as many sources as we want to the current `transcription` with `transcription.add_transcription_source(...)`.
- When we are ready, we call `transcription.start()`, which:
  - produces an audio file by processing the source. This step is responsible for any downloads or conversions that need to happen.
  - produces the transcription by processing the audio file. This step includes any summarization, chapter generation, or diarization that we might have configured.
    It can optionally:
    - write the transcription to a markdown file.
    - open a PR to the repo.
    - upload the transcription to an AWS S3 Bucket.
    - push the transcript to a Queuer backend.
    - write the payload of the transcription to a JSON file.


kouloumos force-pushed the refactor branch 7 times, most recently from 27a18eb to 9f9e447 on November 8, 2023 14:36

kouloumos mentioned this pull request Nov 9, 2023

As an admin, I can manually upload a transcript to the queue bitcointranscripts/transcription-review-front-end#93

Open

kouloumos force-pushed the refactor branch 5 times, most recently from 62ec45c to 0c30d15 on November 16, 2023 06:45

kouloumos added 4 commits November 16, 2023 12:44

remove redundant code and update tests

96f60df

- The code removed from `application.py` as part of this commit has already been moved and slightly refactored as part of the previous commit.
- Update tests to work with the new flow of the refactored codebase.

configure logging in separate module

128d295

- Move logging configuration into a separate module.
- Alongside console logging, always log to a file in the workdir.
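
A minimal sketch of what such a logging module could look like, using only the standard library. The function name `configure_logger`, the logger name, the log file name `app.log` and the format string are assumptions for illustration, not the names used in this commit.

```python
import logging
import os


def configure_logger(workdir: str, name: str = "transcriber",
                     level: int = logging.INFO) -> logging.Logger:
    """Attach a console handler and a file handler (in workdir) to one logger."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    # Console logging
    console = logging.StreamHandler()
    console.setFormatter(formatter)
    logger.addHandler(console)

    # File logging, always enabled; hypothetical file name
    os.makedirs(workdir, exist_ok=True)
    file_handler = logging.FileHandler(os.path.join(workdir, "app.log"))
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    return logger
```

Keeping this in its own module means the rest of the code only needs `logging.getLogger(...)` with the same name to reuse both handlers.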

configuration changes for cli options & update README

18ba856

- `tags`, `speakers` and `category` must now be passed once for each item (tag, speaker, category) that we want to add to the metadata of the transcript.
- Better wording for the help text of CLI options.
- Update README.
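
To illustrate the "once per item" convention, here is a small sketch using the standard library's `argparse` with `action="append"`. The CLI library actually used by the project may differ, and the flag spellings (`--tags`, `--speakers`, `--category`), the program name and the help texts are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="transcribe-sketch")
    parser.add_argument("source", help="path or URL of the source to transcribe")
    # Each option may be repeated; every occurrence adds one item to the list.
    parser.add_argument("--tags", action="append", default=[],
                        help="add a tag to the transcript metadata (repeatable)")
    parser.add_argument("--speakers", action="append", default=[],
                        help="add a speaker to the transcript metadata (repeatable)")
    parser.add_argument("--category", action="append", default=[],
                        help="add a category to the transcript metadata (repeatable)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.tags, args.speakers, args.category)
```

With this shape, two tags are passed as `--tags bitcoin --tags lightning` rather than as a single delimited string.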

kouloumos force-pushed the refactor branch from 0c30d15 to 18ba856 on November 16, 2023 12:45

kouloumos merged commit 0edce78 into bitcointranscripts:main Nov 16, 2023
1 check passed

kouloumos mentioned this pull request Apr 15, 2024

chore: refactoring of the codebase is needed #99

Closed
