Update modjo import script #33

Open
wants to merge 11 commits into
base: main

stang
Contributor

@stang stang commented Oct 14, 2024

What

We wanted to import Modjo transcripts in Dust and saw this existing script. However, it didn't entirely fit our needs, so we made a few improvements.

This PR includes small fixes:

  • chore(modjo): update Dust datasource endpoint
  • fix(modjo): correctly render AI Summary in Dust document
  • chore(modjo): Use new 'Highlights' API field

Along with additional features:

  • feat(modjo): make recording URL optional
  • feat(modjo): add option to skip ingestion of contacts' details
  • feat(modjo): render conversation tags to dust documents
  • chore(modjo): specify source_url when creating Dust documents

Notes for reviewers

@albandum, `TRANSCRIPTS_SINCE` was hardcoded in the script itself, which was inconvenient when we wanted to tweak its value for a scheduled job.
We made it configurable through an environment variable, and introduced a small breaking change: by default, the script now ingests every transcript since yesterday instead of since 2024-01-01.

I guess it is a matter of preference, so I'll understand if you prefer to keep the existing behavior: I can amend the PR.

  • BREAKING CHANGE - chore(modjo): by default, fetch transcripts since yesterday
  • feat(modjo): make the TRANSCRIPTS_SINCE configurable via env var

Dust has recently changed the endpoint for upserting
documents to the datasource.

This commit uses the new endpoint.

See: https://docs.dust.tt/reference/post_api-v1-w-wid-vaults-vid-data-sources-dsid-documents-documentid

The first time I ran the script, I didn't realize
the `TRANSCRIPTS_SINCE` configuration was a constant
set directly in the script.

Previously, the `TRANSCRIPTS_SINCE` setting was
set directly in the script.

This is not convenient in case you want to schedule
this script on a regular basis.

This commit makes the `TRANSCRIPTS_SINCE` setting
configurable via an environment variable, while
preserving the initial behavior.
…esterday

Previously, if `TRANSCRIPTS_SINCE` was not set,
the default behavior was to fetch all transcripts
starting from 2024-01-01.

This commit changes the default behavior to fetch
transcripts starting from yesterday.

Although this is a breaking change, it is
reasonable to think this won't hurt anyone.
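
For reviewers, here is a minimal sketch of the resolution logic described in the two commits above: read `TRANSCRIPTS_SINCE` from the environment and fall back to "yesterday" when it is unset. The helper name `resolveTranscriptsSince`, the ISO 8601 output format, and the "24 hours before now" reading of "yesterday" are assumptions made for illustration; the script in this PR may differ.

```typescript
// Hypothetical helper, not copied from the script in this PR.
// `env` is expected to look like process.env (string values or undefined).
function resolveTranscriptsSince(
  env: Record<string, string | undefined>,
  now: Date = new Date()
): string {
  const raw = env.TRANSCRIPTS_SINCE;

  if (raw !== undefined && raw.trim() !== "") {
    const parsed = new Date(raw);
    // Fail fast on unparseable values rather than silently using the default.
    if (isNaN(parsed.getTime())) {
      throw new Error("Invalid TRANSCRIPTS_SINCE value: " + raw);
    }
    return parsed.toISOString();
  }

  // Assumption: "yesterday" means exactly 24 hours before `now`.
  const yesterday = new Date(now.getTime() - 24 * 60 * 60 * 1000);
  return yesterday.toISOString();
}
```

In a scheduled job this would presumably be called as `resolveTranscriptsSince(process.env)`. Setting `TRANSCRIPTS_SINCE=2024-01-01` restores the previous default.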
That would allow Dust to link to a specific transcript
in Modjo if needed.
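
A rough sketch of how the new endpoint and the `source_url` field might fit together. The path layout is inferred from the documentation slug linked in the endpoint commit (including the `data_sources` spelling), and the base URL `https://dust.tt`, the helper names, and the `text` body field are assumptions. Please check them against the Dust API reference before relying on this.

```typescript
// Hypothetical types and helpers for illustration only.
interface DustTarget {
  workspaceId: string;
  vaultId: string;
  dataSourceId: string;
}

// Assumption: path shape inferred from the docs slug
// "post_api-v1-w-wid-vaults-vid-data-sources-dsid-documents-documentid".
function buildUpsertUrl(
  target: DustTarget,
  documentId: string,
  baseUrl: string = "https://dust.tt"
): string {
  return (
    baseUrl +
    "/api/v1/w/" + encodeURIComponent(target.workspaceId) +
    "/vaults/" + encodeURIComponent(target.vaultId) +
    "/data_sources/" + encodeURIComponent(target.dataSourceId) +
    "/documents/" + encodeURIComponent(documentId)
  );
}

// Only set source_url when a link back to the Modjo transcript is known.
function buildUpsertBody(
  text: string,
  sourceUrl?: string
): { text: string; source_url?: string } {
  const body: { text: string; source_url?: string } = { text: text };
  if (sourceUrl !== undefined && sourceUrl !== "") {
    body.source_url = sourceUrl;
  }
  return body;
}
```

Keeping URL construction in one place means a future endpoint change only touches a single function.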
Previously, the `AI Summary` field was not correctly
rendered in the Dust document.

It was displaying:

```
AI Summary: [object Object]
```

Now it displays the actual `content` of the AI Summary.
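
To illustrate the bug: interpolating an object into a template string calls its default `toString()`, which yields `[object Object]`. The sketch below assumes the summary is an object with a string `content` property, as the commit message suggests. The interface and function names are hypothetical.

```typescript
// Assumed shape of the summary object; only `content` is mentioned in the commit.
interface AiSummary {
  content: string;
}

// Before: the whole object is interpolated, so toString() is used.
function renderSummaryBroken(summary: AiSummary): string {
  return `AI Summary: ${summary}`;
}

// After: interpolate the text itself, with an empty fallback when it is missing.
function renderSummaryFixed(summary: AiSummary | null | undefined): string {
  const content = summary && summary.content ? summary.content : "";
  return `AI Summary: ${content}`;
}
```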
The 'AI Summary' field is some multiline markdown text.

All 'Speakers', 'AI Summary' and 'Transcript' sections
are multiline blocks, so I'm using markdown section headers
to denote them.
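
A small sketch of the section-header approach, assuming second-level (`##`) headers and that empty sections are skipped. Both choices and the function names are illustrative; the script may use a different heading level or layout.

```typescript
// Hypothetical helpers: render each multiline block under its own markdown header.
function renderSection(title: string, body: string): string {
  return "## " + title + "\n\n" + body.trim() + "\n";
}

// Sections are [title, body] pairs; bodies that are empty or whitespace are dropped.
function renderDocument(sections: Array<[string, string]>): string {
  return sections
    .filter(function (section) {
      return section[1].trim() !== "";
    })
    .map(function (section) {
      return renderSection(section[0], section[1]);
    })
    .join("\n");
}
```

For example, `renderDocument([["Speakers", "Alice\nBob"], ["Transcript", "Alice: Hi"]])` produces two header-delimited blocks separated by a blank line.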
The 'AI Summary' field is deprecated, in favor of
the 'Highlights' field.

See: https://api.modjo.ai/v1#tag/calls/operation/export-calls

In some cases, we might not want to ingest contact
details in Dust documents.

This commit adds an option for this.

If not set, the default behavior is to include
contact details, so we preserve the previous behavior.

This gives the user the option to skip ingesting
the recording URL.
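
The two opt-outs above could be read from the environment along these lines. The variable names `INCLUDE_CONTACT_DETAILS` and `INCLUDE_RECORDING_URL` are hypothetical, and defaulting the recording URL to "included" is an assumption. Only the contact-details default is stated in the commit message.

```typescript
// Hypothetical flag parser: unset or blank values fall back to the default.
function parseBooleanFlag(raw: string | undefined, defaultValue: boolean): boolean {
  if (raw === undefined || raw.trim() === "") {
    return defaultValue;
  }
  const normalized = raw.trim().toLowerCase();
  if (normalized === "true" || normalized === "1" || normalized === "yes") {
    return true;
  }
  if (normalized === "false" || normalized === "0" || normalized === "no") {
    return false;
  }
  throw new Error("Invalid boolean flag value: " + raw);
}

interface IngestionOptions {
  includeContactDetails: boolean;
  includeRecordingUrl: boolean;
}

// Assumed variable names; both default to true so existing behavior is preserved.
function resolveIngestionOptions(
  env: Record<string, string | undefined>
): IngestionOptions {
  return {
    includeContactDetails: parseBooleanFlag(env.INCLUDE_CONTACT_DETAILS, true),
    includeRecordingUrl: parseBooleanFlag(env.INCLUDE_RECORDING_URL, true)
  };
}
```

Rejecting unknown values instead of treating them as false avoids silently dropping contact details because of a typo in the job configuration.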