move input .txt files from 1_to_process to 2_done folder when successful, and 3_failed if problematic #6

Open
turnkit opened this issue Jan 6, 2023 · 1 comment

Comments


turnkit commented Jan 6, 2023

The batch process should move each source file out of the "1_to_process" folder once it has been handled: to a "2_done" folder when it is processed successfully, or to a "3_failed" folder if a problem occurs. That way, if the run is stopped and later restarted, the same files are not reprocessed. (Currently the source files sit directly in the whisp_out folder rather than in whisp_out/1_to_process.)
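A minimal sketch of the requested folder flow, not the actual code in chunk_paragraphs.py. The function name `process_batch` and the `process_one` callback are hypothetical stand-ins for whatever the script does per file (send to the API, write the output); only the three folder names come from this issue.

```python
import shutil
from pathlib import Path
from typing import Callable


def process_batch(root: Path, process_one: Callable[[Path], None]) -> tuple[int, int]:
    """Process every .txt file in root/1_to_process and move it afterwards.

    process_one is a hypothetical callback standing in for the real per-file
    work. Returns (number_succeeded, number_failed).
    """
    to_process = root / "1_to_process"
    done = root / "2_done"
    failed = root / "3_failed"
    for folder in (to_process, done, failed):
        folder.mkdir(parents=True, exist_ok=True)

    succeeded = 0
    failures = 0
    # Snapshot the listing first so that moving files does not disturb iteration.
    for src in sorted(to_process.glob("*.txt")):
        try:
            process_one(src)
        except Exception as exc:
            # Ordinary errors (bad encoding, HTTP failure, ...) park the file
            # in 3_failed so the rest of the batch can continue.
            print(f"[WARN]: {src.name} failed: {exc}")
            shutil.move(str(src), str(failed / src.name))
            failures += 1
        else:
            shutil.move(str(src), str(done / src.name))
            succeeded += 1
    return succeeded, failures
```

Because `KeyboardInterrupt` is not a subclass of `Exception`, pressing Ctrl+C in the middle of a file leaves that file in 1_to_process, so a restart picks up exactly where the previous run stopped.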

Current behavior, for example: a few files were processed, the run was stopped and then restarted, and the same files were processed again:

PS C:\sermonindex_audio> .\do_quint_transcript.bat 192.168.0.62:8000

C:\sermonindex_audio>set INPUT_FOLDER=C:\sermonindex_audio\bak_whisp_out\txt_to_do

C:\sermonindex_audio>set OUTPUT_FOLDER=C:\sermonindex_audio\bak_whisp_out\txt_chunked_out
Input folder is C:\sermonindex_audio\bak_whisp_out\txt_to_do
Output folder is C:\sermonindex_audio\bak_whisp_out\txt_chunked_out
[INFO]: Input is a directory, assuming batch mode
[INFO]: Found 1472 files in C:\sermonindex_audio\bak_whisp_out\txt_to_do
[INFO]: [1/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1166.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1166.mp3.txt_out.txt...
[INFO]: [2/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1167.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1167.mp3.txt_out.txt...
[INFO]: [3/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1168.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1168.mp3.txt_out.txt...
[INFO]: [4/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1169.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1169.mp3.txt_out.txt...
[INFO]: [5/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1170.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1170.mp3.txt_out.txt...
[INFO]: [6/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1171.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1171.mp3.txt_out.txt...
[INFO]: [7/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1172.mp3.txt to API server on 192.168.0.62:8000...
Traceback (most recent call last):
File "C:\sermonindex_audio\chunk_paragraphs.py", line 136, in <module>
main()
File "C:\sermonindex_audio\chunk_paragraphs.py", line 125, in main
chunk_paragraphs_dir(args.i, args.o)
File "C:\sermonindex_audio\chunk_paragraphs.py", line 88, in chunk_paragraphs_dir
r = requests.post("http://" + args.H + "/chunk", json={
File "C:\Users\turnk\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\requests\api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
File "C:\Users\turnk\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\requests\api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\turnk\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\requests\sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\turnk\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\requests\sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "C:\Users\turnk\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\requests\adapters.py", line 489, in send
resp = conn.urlopen(
File "C:\Users\turnk\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\urllib3\connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "C:\Users\turnk\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\urllib3\connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "C:\Users\turnk\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\urllib3\connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2544.0_x64__qbz5n2kfra8p0\lib\http\client.py", line 1374, in getresponse
response.begin()
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2544.0_x64__qbz5n2kfra8p0\lib\http\client.py", line 318, in begin
version, status, reason = self._read_status()
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2544.0_x64__qbz5n2kfra8p0\lib\http\client.py", line 279, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2544.0_x64__qbz5n2kfra8p0\lib\socket.py", line 705, in readinto
return self._sock.recv_into(b)
KeyboardInterrupt
^CTerminate batch job (Y/N)? y
PS C:\sermonindex_audio> .\do_quint_transcript.bat 192.168.0.62:8000

C:\sermonindex_audio>set INPUT_FOLDER=C:\sermonindex_audio\bak_whisp_out\txt_to_do

C:\sermonindex_audio>set OUTPUT_FOLDER=C:\sermonindex_audio\bak_whisp_out\txt_chunked_out
Input folder is C:\sermonindex_audio\bak_whisp_out\txt_to_do
Output folder is C:\sermonindex_audio\bak_whisp_out\txt_chunked_out
[INFO]: Input is a directory, assuming batch mode
[INFO]: Found 1472 files in C:\sermonindex_audio\bak_whisp_out\txt_to_do
[INFO]: [1/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1166.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1166.mp3.txt_out.txt...
[INFO]: [2/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1167.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1167.mp3.txt_out.txt...
[INFO]: [3/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1168.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1168.mp3.txt_out.txt...
[INFO]: [4/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1169.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1169.mp3.txt_out.txt...
[INFO]: [5/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1170.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1170.mp3.txt_out.txt...
[INFO]: [6/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1171.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1171.mp3.txt_out.txt...
[INFO]: [7/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1172.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1172.mp3.txt_out.txt...
[INFO]: [8/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1173.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1173.mp3.txt_out.txt...
[INFO]: [9/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1174.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1174.mp3.txt_out.txt...
[INFO]: [10/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1175.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1175.mp3.txt_out.txt...
[INFO]: [11/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1176.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1176.mp3.txt_out.txt...
[INFO]: [12/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1177.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1177.mp3.txt_out.txt...
[INFO]: [13/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1178.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1178.mp3.txt_out.txt...
[INFO]: [14/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1179.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1179.mp3.txt_out.txt...
[INFO]: [15/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1180.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1180.mp3.txt_out.txt...
[INFO]: [16/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1181.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1181.mp3.txt_out.txt...
[INFO]: [17/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1182.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1182.mp3.txt_out.txt...
[INFO]: [18/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1183.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1183.mp3.txt_out.txt...
[INFO]: [19/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1184.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1184.mp3.txt_out.txt...
[INFO]: [20/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1185.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1185.mp3.txt_out.txt...
[INFO]: [21/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1186.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1186.mp3.txt_out.txt...
[INFO]: [22/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1187.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1187.mp3.txt_out.txt...
[INFO]: [23/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1188.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1188.mp3.txt_out.txt...
[INFO]: [24/1472] Sending file contents of C:\sermonindex_audio\bak_whisp_out\txt_to_do/SID1189.mp3.txt to API server on 192.168.0.62:8000...
[INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1189.mp3.txt_out.txt...
Traceback (most recent call last):
File "C:\sermonindex_audio\chunk_paragraphs.py", line 136, in <module>
main()
File "C:\sermonindex_audio\chunk_paragraphs.py", line 125, in main
chunk_paragraphs_dir(args.i, args.o)
File "C:\sermonindex_audio\chunk_paragraphs.py", line 78, in chunk_paragraphs_dir
_input_contents: str = input_file.read().encode("ascii", errors="ignore").decode().replace("\r\n", " ").replace("\n", " ")
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2544.0_x64__qbz5n2kfra8p0\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 101: character maps to <undefined>
PS C:\sermonindex_audio>
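The second traceback above fails before the ASCII filtering on line 78 ever runs: the file is opened in text mode, so Windows decodes it with cp1252 first, and byte 0x81 has no mapping there. One possible workaround, sketched here under the assumption that dropping non-ASCII bytes is the intent of that line, is to read raw bytes and decode them directly. The helper name `read_ascii_text` is hypothetical.

```python
from pathlib import Path


def read_ascii_text(path: Path) -> str:
    """Read a file as bytes and keep only its ASCII characters.

    Decoding the raw bytes avoids the platform default codec (cp1252 on
    Windows), which is what raised the UnicodeDecodeError.
    """
    raw = path.read_bytes()
    # errors="ignore" silently drops any byte that is not valid ASCII.
    text = raw.decode("ascii", errors="ignore")
    # Same newline flattening as the original line 78.
    return text.replace("\r\n", " ").replace("\n", " ")
```

With a reader like this the file would be processed rather than failing. If silently dropping characters is not acceptable, letting the error propagate and moving the file to 3_failed would be the other option.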

Author

turnkit commented Jan 6, 2023

Note: I am currently launching the processing on Windows 11 with the command below, followed by the contents of the .bat file it runs:

PS C:\sermonindex_audio> .\do_quint_transcript.bat 192.168.0.62:8000

:: do_quint_transcript.bat
:: Batch launcher for Quint application. Quint must be running in an Oracle VM VirtualBox.
:: bundled by Joseph Packard for David Sutherland's TapeArchives.org January 2023

set INPUT_FOLDER=.\txt_to_do
set OUTPUT_FOLDER=.\txt_chunked_out\

@echo off

IF [%1] == [] (
echo No IP:PORT defined: do_quint_transcript.bat HOST:PORT
exit
)
echo Input folder is %INPUT_FOLDER%
echo Output folder is %OUTPUT_FOLDER%
python C:\sermonindex_audio\chunk_paragraphs.py -i %INPUT_FOLDER% -o %OUTPUT_FOLDER% -H %1
