Added example: meeting conversations extractor #1009

Open
wants to merge 11 commits into base: main from

Ashish-Abraham (Contributor) commented Nov 7, 2024

Context

Valuable business insights are often hidden in daily conversations across organizations, from customer interactions to internal meetings. I had an idea to build something with Indexify that helps extract and use this data effectively.

Here I have added an example use case: a conversation extractor that uses a custom Indexify extraction graph to extract summarized data in a structured format from long meeting audio files.

What

The extractor workflow is as follows:

  1. Audio Processing:

    • Transcription: Converts speech to text using Faster Whisper.
    • Meeting Classification Router: Uses an LLM to determine the meeting type and routes control to the corresponding node of the compute graph.
  2. Content Analysis:
    Based on the meeting type classification, the system generates structured summaries:

    • Strategy Meetings: Key decisions, action items, and strategic initiatives
    • Sales/Marketing/Product Calls: Customer details, pain points, and next steps
    • R&D Brainstorms: Innovative ideas, technical challenges, resource requirements, and potential impacts
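The classification router can be pictured as a lookup from the LLM's label to a per-type summarizer. A minimal sketch, assuming plain Python functions rather than Indexify nodes; `summarize_sales_call`, `summarize_strategy_meeting`, `ROUTES`, and `route` are hypothetical names, and the summarizers are placeholders for the LLM calls. The two labels are the ones that appear in the sample logs below; the R&D label string is not shown in this PR, so it is left out.

```python
from typing import Callable, Dict


def summarize_sales_call(transcript: str) -> dict:
    # Placeholder: a real node would prompt an LLM with the sales-call schema.
    return {"meeting_type": "Sales Call", "chars": len(transcript)}


def summarize_strategy_meeting(transcript: str) -> dict:
    # Placeholder: a real node would prompt an LLM with the strategy-meeting schema.
    return {"meeting_type": "Strategy Meeting", "chars": len(transcript)}


# Labels as they appear in the sample logs. An R&D brainstorm label would be
# registered the same way; its exact string is not given in the PR.
ROUTES: Dict[str, Callable[[str], dict]] = {
    "sales-call": summarize_sales_call,
    "strategy-meeting": summarize_strategy_meeting,
}


def route(label: str, transcript: str) -> dict:
    """Dispatch a transcript to the summarizer registered for its meeting type."""
    # Normalize the LLM output, which may carry stray whitespace or capitals.
    handler = ROUTES.get(label.strip().lower())
    if handler is None:
        # Fail loudly rather than silently applying a schema that does not fit.
        raise ValueError(f"unknown meeting type: {label!r}")
    return handler(transcript)


print(route("sales-call", "Hi, this is Candice...")["meeting_type"])  # Sales Call
```

Raising on an unknown label is a design choice: it surfaces classifier drift instead of producing a summary with the wrong fields.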

You can tweak the fields to extract whatever data you need.
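One way to picture the per-type schemas is as a shared base plus one subclass per meeting type, so tweaking the extracted data means adding or removing a field. A minimal sketch using standard-library dataclasses; the example itself may use a different schema library, and the class names are hypothetical. Field names follow the sample outputs below.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class MeetingBase:
    # Fields shared by every meeting type in the sample outputs.
    meeting_id: str
    date: str
    duration: int  # seconds
    participants: Optional[List[str]] = None
    meeting_type: str = ""


@dataclass
class SalesCall(MeetingBase):
    # Redefining meeting_type keeps its position but changes the default.
    meeting_type: str = "Sales Call"
    customer_pain_points: List[str] = field(default_factory=list)
    proposed_solutions: List[str] = field(default_factory=list)
    objections: List[str] = field(default_factory=list)
    next_steps: List[str] = field(default_factory=list)


@dataclass
class StrategyMeeting(MeetingBase):
    meeting_type: str = "Strategy Meeting"
    key_decisions: List[str] = field(default_factory=list)
    risk_assessments: List[str] = field(default_factory=list)
    strategic_initiatives: List[str] = field(default_factory=list)
    action_items: List[str] = field(default_factory=list)


print([f.name for f in fields(StrategyMeeting)])
```

To extract an extra item, for example a budget figure from sales calls, add one line to `SalesCall` and describe the new field in the LLM prompt.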

Sample Outputs

2024-11-10 22:41:27,465 - INFO - Transcription Classification: sales-call
2024-11-10 22:41:27,471 - INFO - 
Extracted information:
Meeting ID: RD-20241110-224127
Date: 2024-11-10 22:41:27
Duration: 538 seconds
Participants: None
Meeting Type: Sales Call
Customer Pain Points: []
Proposed Solutions: ['Having fiber connection, faster internet speeds for 4K streaming (up to 800 megabits), 100 megabits for only $25 with free installation fee']
Objections: []
Next Steps: ['(Empty response for 1 and 3. Added some data in the 2 and 4 category based on provided transcript response)', "Candice will call back at 7 in the evening and after talking with Vanessa's husband to answer few questions. Candice will send a link to sign up for the 100 megabits plan which Vanessa will fill out to complete the purchase"]

2024-11-10 22:51:40,001 - INFO - Transcription Classification: strategy-meeting
2024-11-10 22:51:40,006 - INFO - 
Extracted information:
Meeting ID: Strategy-20241110-225139
Date: 2024-11-10 22:51:39
Duration: 97 seconds
Participants: None
Meeting Type: Strategy Meeting
Key Decisions: ['Host a pancake breakfast next week to encourage students to come to school on Fridays', "Put up posters with tips on not getting sick since it's almost flu season", 'Refer John Smith to the guidance counselor for support', "Look for free or low-cost community resources to help John Smith's family."]
Risk Assessments: ['Chronically absent students', 'Students getting sick due to the cold weather.']
Strategic Initiatives: ['Improving student attendance', 'Promoting health and hygiene practices among students.']
Action Items: ['Plan and host a pancake breakfast next week', 'Create and put up posters with tips on not getting sick', 'Refer John Smith to the guidance counselor', "Research and share free or low-cost community resources with John Smith's family."]
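In both samples, the Meeting ID appears to be a prefix followed by the Date field formatted as `YYYYMMDD-HHMMSS` (for example, `Strategy-20241110-225139` for `2024-11-10 22:51:39`). A minimal sketch of helpers for that pattern, assuming it holds in general; `make_meeting_id` and `parse_meeting_id` are hypothetical names, not taken from the example's code.

```python
from datetime import datetime
from typing import Tuple


def make_meeting_id(prefix: str, when: datetime) -> str:
    """Build an ID of the form <prefix>-YYYYMMDD-HHMMSS."""
    return f"{prefix}-{when.strftime('%Y%m%d-%H%M%S')}"


def parse_meeting_id(meeting_id: str) -> Tuple[str, datetime]:
    """Split an ID back into its prefix and timestamp."""
    # rsplit keeps any hyphens inside the prefix intact.
    prefix, day, clock = meeting_id.rsplit("-", 2)
    return prefix, datetime.strptime(f"{day}-{clock}", "%Y%m%d-%H%M%S")


ts = datetime(2024, 11, 10, 22, 51, 39)
print(make_meeting_id("Strategy", ts))  # Strategy-20241110-225139
print(ts.strftime("%Y-%m-%d %H:%M:%S"))  # 2024-11-10 22:51:39
```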

Testing

Local Installation - In Process

  1. Clone this repository:

    git clone https://github.com/tensorlakeai/indexify
    cd indexify/examples/conversation_extraction
    
  2. Create a virtual environment and activate it:

    python -m venv venv
    source venv/bin/activate
    
  3. Install the required dependencies:

    pip install -r requirements.txt
    
  4. Run the main script:

    python main.py --mode in-process-run
    

Contribution Checklist

  • If the python-sdk was changed, please run make fmt in python-sdk/.
  • If the server was changed, please run make fmt in server/.
  • Make sure all PR Checks are passing.

Ashish-Abraham changed the title from "Added example meeting conversations extractor" to "Added example: meeting conversations extractor" on Nov 7, 2024
diptanu (Collaborator) commented Nov 8, 2024

@Ashish-Abraham How is this going? Are you blocked on anything?

Ashish-Abraham (Contributor, Author)

No issues, @diptanu. I was a little busy; I will complete it soon. Thanks!

Ashish-Abraham marked this pull request as ready for review on November 9, 2024 18:09
diptanu (Collaborator) commented Nov 9, 2024

@Ashish-Abraham Have you seen our video summarization example (https://github.com/tensorlakeai/indexify/tree/main/examples/video_summarization)? I am wondering how this demo differs from it.

Ashish-Abraham (Contributor, Author) commented Nov 10, 2024

Sorry, I added the wrong file. Here we extract the summary in a structured format defined by the schema for each meeting type. This data structure can be passed to a frontend or processed further as required. Please check.

Should I convert it to JSON or something?

diptanu (Collaborator) commented Nov 10, 2024

@Ashish-Abraham Yeah, if you use JSON it might be easier for people to consume the workflow through the HTTP APIs directly. Add encoder='json' in your decorators and function classes. Also, please add a link to the video you used to test this so that people get the best result first when they try out the example :)

After that, you could do something like this to invoke the workflow:

curl -X POST -H "Content-Type: application/json" http://localhost:8900/namespaces/default/compute_graphs \
  -d '{....}'

and

curl -X GET http://localhost:8900/namespaces/default/compute_graphs/<cg>/invocations/<invocation>/fn/<fn_name>

I don't remember the exact APIs; they are in the code and on our website.
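The two steps in this suggestion, serializing the structured summary as JSON and calling the HTTP endpoints, can be sketched with the Python standard library alone. This does not use Indexify or its `encoder='json'` setting. The endpoint paths are copied from the curl commands above, which the comment itself flags as possibly inaccurate, so check them against the Indexify API docs. The POST body is left as an empty placeholder because the real one is elided above, and `my_graph`, `inv-123`, and `summarize` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Endpoint copied from the curl examples; the exact paths are unverified.
BASE = "http://localhost:8900/namespaces/default/compute_graphs"


@dataclass
class StrategyMeeting:
    # Trimmed-down schema; field names follow the sample output in this PR.
    meeting_id: str
    date: str  # a string, so json.dumps works without a custom encoder
    duration: int
    participants: Optional[List[str]] = None
    action_items: List[str] = field(default_factory=list)


def to_json(record: StrategyMeeting) -> str:
    """Serialize a structured summary to a JSON string."""
    return json.dumps(asdict(record))


def build_post(body: dict) -> urllib.request.Request:
    # Same URL and Content-Type header as the curl POST; body shape is up to the API.
    return urllib.request.Request(
        BASE,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def output_url(graph: str, invocation: str, fn_name: str) -> str:
    # Mirrors the curl GET path; quote() escapes characters unsafe in a path segment.
    parts = [graph, "invocations", invocation, "fn", fn_name]
    return BASE + "/" + "/".join(urllib.parse.quote(p, safe="") for p in parts)


record = StrategyMeeting(
    meeting_id="Strategy-20241110-225139",
    date="2024-11-10 22:51:39",
    duration=97,
    action_items=["Plan and host a pancake breakfast next week"],
)
payload = to_json(record)
assert StrategyMeeting(**json.loads(payload)) == record  # lossless round trip

req = build_post({})  # placeholder body
print(req.get_method(), req.full_url)
print(output_url("my_graph", "inv-123", "summarize"))
# Sending requires a running server, e.g. urllib.request.urlopen(req).
```

Keeping dates as strings in the schema is what lets `json.dumps` handle the record directly; a `datetime` field would need a custom encoder.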

Ashish-Abraham (Contributor, Author)

I can't find the page you are referring to. Is this the page? https://docs.tensorlake.ai/api-reference/documents/extract/extract-file-sync Could you please guide me a bit on how to do this?

Labels
None yet
Projects
None yet

2 participants