This repository has been archived by the owner on May 14, 2023. It is now read-only.

Cleans up caption files for PBS COVE ingestion. Removes characters that are known to cause ingestion failures.


JasonRaveling/pbs-caption-clean


PBS Caption Clean

This is a simple Python script that cleans up caption files for PBS COVE ingestion. It removes characters that are known to cause ingestion failures.

When a caption file ingestion fails, usually the first thing I do is check the caption file for special characters.
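That manual check can be sketched in Python. This is only an illustration and not part of the script: the helper name `find_suspect_characters` is hypothetical, and treating every control or non-ASCII character as "suspect" is an assumption, since the exact characters that break COVE ingestion are not listed here.

```python
import unicodedata


def find_suspect_characters(text):
    """Return (line, column, character, unicode_name) for each suspect character.

    Assumption: "suspect" means a control character (other than tab and
    carriage return) or any character outside the ASCII range.
    """
    hits = []
    # Split on "\n" only, so that unusual separators are reported, not swallowed.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            is_control = unicodedata.category(ch) == "Cc" and ch not in "\t\r"
            if is_control or ord(ch) > 127:
                # Control characters have no Unicode name, hence the default.
                hits.append((line_no, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits
```

Printing the returned tuples for a caption file gives a quick list of positions worth inspecting, for example curly quotes pasted in from a word processor.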

This script only works with plain text caption files. If you're working with caption files that are not stored in plain text, you may want to check out pycaption for converting caption files to PBS' preferred caption format (DFXP).

Usage

./caption-clean [input-file] [output-file]

Below are some examples of how to use this script, assuming you're running Python from the command line.

If no output file is provided, the original file is copied to myCaptionFile.xml.orig as a backup and the cleaned captions keep the original file name.

./caption-clean myCaptionFile.xml

You can also give the modified file a new name.

./caption-clean myCaptionFile.xml newFileName.xml
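The two modes above could be implemented roughly as follows. This is a minimal sketch under stated assumptions, not the script's actual code: the names `clean_text` and `clean_file` are hypothetical, and the character set removed here (control characters that XML 1.0 does not allow) is a stand-in for whatever list the script really uses.

```python
import re
import shutil

# Assumption: strip control characters that XML 1.0 forbids
# (everything below 0x20 except tab, line feed, and carriage return).
ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_text(text):
    """Return text with the assumed problem characters removed."""
    return ILLEGAL_XML_CHARS.sub("", text)


def clean_file(input_path, output_path=None):
    """Clean a plain text caption file and return the path that was written.

    With no output_path, the original is first copied to "<input>.orig"
    and the cleaned text replaces the input file.
    """
    if output_path is None:
        shutil.copy2(input_path, input_path + ".orig")
        output_path = input_path
    # newline="" keeps the file's original line endings untouched.
    with open(input_path, encoding="utf-8", errors="replace", newline="") as src:
        cleaned = clean_text(src.read())
    with open(output_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(cleaned)
    return output_path
```

Calling `clean_file("myCaptionFile.xml")` mirrors the first example, and `clean_file("myCaptionFile.xml", "newFileName.xml")` mirrors the second. Taking the backup before any write means a failed run leaves the original recoverable.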

Found a character that isn't being removed?

If you notice a character that is probably the cause of a failed ingest, feel free to submit a pull request or open an issue on GitHub.
