HOWTO convert .docx to plaintext using pandoc
Version used (output of pandoc -v):
pandoc 3.1.13
Features: +server +lua
Scripting engine: Lua 5.4
User data directory: /data/data/com.termux/files/home/.local/share/pandoc
Copyright (C) 2006-2023 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.
The .docx file was converted to plain text with pandoc 3.1.13:
pandoc freePress™_-_Google_Edition.493.3162.02704.bce793ca02283c4b75a261de58c2757c.docx -t plain -o out.txt
Next, remove all line breaks so that the text becomes a single line:
perl -p -i -e 's/\R//g;' filename
GNU nano 8.2 is used to get the word count of filename.
From nano's help:
press Meta(ALT)+D to count the number of lines, words, and characters
[ 1 line, 422 words, 3155 characters ]
Then append those counts (lines.words.characters) to the filename: freePress.1.422.3155
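The counts can also be cross-checked from the shell with wc. This is only a sketch on a made-up file (sample.txt is hypothetical), and wc may not agree with nano on every figure.

```shell
# Hypothetical one-line sample without a trailing newline.
printf 'one two three' > sample.txt

words=$(wc -w < sample.txt | tr -d ' ')
chars=$(wc -m < sample.txt | tr -d ' ')   # characters; depends on the locale
lines=$(wc -l < sample.txt | tr -d ' ')   # counts newline characters only

echo "lines=$lines words=$words chars=$chars"

# Build a name in the same base.lines.words.chars pattern. The line count
# of 1 is written by hand, because wc -l reports 0 for an unterminated line.
newname="sample.1.$words.$chars.txt"
echo "$newname"
```

For this sample the script prints lines=0 words=3 chars=13 and the name sample.1.3.13.txt.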
Note that the file command reports a count that is one less (3154, not 3155):
~/freePress $ file freePress.1.422.3155.txt
freePress.1.422.3155.txt: Unicode text, UTF-8 text, with very long lines (3154), with no line terminators
This may be because nano counts an implied line terminator or trailing whitespace at the end of the file, which file does not include. This still needs to be confirmed.
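One way to narrow this down is to look at the raw bytes. The sketch below uses a made-up file (sample.txt is hypothetical), not the real document, and only shows the kind of checks that might help.

```shell
# Hypothetical sample: one unterminated line containing a multi-byte
# UTF-8 character (the three bytes \342\204\242 encode the trademark sign).
printf 'free\342\204\242 press' > sample.txt

# Byte count versus character count: these differ when the text contains
# multi-byte characters (wc -m depends on the current locale).
bytes=$(wc -c < sample.txt | tr -d ' ')
chars=$(wc -m < sample.txt | tr -d ' ')
echo "bytes: $bytes chars: $chars"

# Dump the last byte; a trailing newline would show up here as \n.
last=$(tail -c 1 sample.txt | od -An -c | tr -d ' ')
echo "last byte: $last"
```

If the last byte of the real file is not a newline and its character count matches what file reports, then the extra character most likely comes from how nano counts, not from the file on disk.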