forked from google/mtail
TODO
Implement a standard library, search path:
Means we can provide standard syslog decorator.
Requires figuring out where we keep standard library definitions, and what the syntax for import looks like.
Can't put trailing newlines in cases in parser test, requires changes to expr stmt
parse tree/ast testing? - expected AST as result from parse/check instead of
merely getting a result. A similar version of this is in codegen_test.go:TestCodeGenFromAST
A mapping between progs and logs to reduce wasted processing (issue #35)
Means we don't fan out log lines to every VM if reading from multiple sources.
Requires figuring out how to provide this configuration. Special syntax in a program? Not very flexible. A real config file? Been trying to avoid that. Commandline flag? Seems difficult to maintain.
bytecode like
[{push 1} {push 0} {cmp 1}
{jm 6} {push 0} {jmp 7} {push 1} {jnm 13}
{setmatched false} {mload 0} {dload 0} {inc <nil>} {setmatched true}]
can be expressed as
[{push 1} {push 0} {cmp 1}
{jm 9}
{setmatched false} {mload 0} {dload 0} {inc <nil>} {setmatched true}]
but jnm 13 is from the condExpr and the previous is from a comparison binary
expr; an optimizer is needed to collapse the bytecode to understand that
cmp, jm, push, jmp, push, jnm in sequence like so is the same as a cmp, jm
and we need to worry about the jump table too
reversed casts: s2i,i2s pairs as well
count stack size and preallocate stack
-> counts of push/pop per instruction
-> test to keep p/p counts updated
: seems like a lot of work for not much return
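A minimal sketch of the push/pop counting idea, to judge how much work it really is. The instr type and the stackEffect table are hypothetical; the opcode names echo the dumps above, but the pop/push counts are assumptions, not the real VM's. It only handles straight-line code and ignores jumps.

```go
package main

import "fmt"

// instr is a hypothetical, simplified instruction: just an opcode name.
type instr struct {
	op string
}

// stackEffect records how many values each opcode pops and pushes.
// These counts are assumptions for illustration only.
var stackEffect = map[string]struct{ pops, pushes int }{
	"push":  {0, 1},
	"mload": {0, 1},
	"cmp":   {2, 1},
	"inc":   {1, 0},
}

// maxStackDepth walks a straight-line program and returns the deepest stack
// it would need, so the stack could be preallocated to that size.
func maxStackDepth(prog []instr) (int, error) {
	depth, deepest := 0, 0
	for i, in := range prog {
		eff, ok := stackEffect[in.op]
		if !ok {
			// This is the check a test would rely on to keep the table updated.
			return 0, fmt.Errorf("instruction %d: no stack effect recorded for %q", i, in.op)
		}
		depth -= eff.pops
		if depth < 0 {
			return 0, fmt.Errorf("instruction %d: %q underflows the stack", i, in.op)
		}
		depth += eff.pushes
		if depth > deepest {
			deepest = depth
		}
	}
	return deepest, nil
}

func main() {
	prog := []instr{{"push"}, {"push"}, {"cmp"}, {"mload"}, {"inc"}}
	depth, err := maxStackDepth(prog)
	fmt.Println(depth, err) // prints "2 <nil>"
}
```

A test that ranges over every known opcode and fails on a missing table entry would cover the "keep p/p counts updated" part.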
# Won't do
X Use capture group references to feed back to declaring regular expression,
X noting unused caprefs,
X possibly flipping back to noncapturing (and renumbering the caprefs?)
X -> unlikely to implement, probably won't impact regexp speed
When using a const by itself as a match condition, then we get warnings about
the capture group names not existing.
const A /(?<a>.*)/
A {
x[$a]++
}
... => $a not defined in scope.
Can't define string constants, like const STRPTIME_FORMAT "Jan _2"
Multiline const can't start with a newline, must be const FOO // +\n..., you might want to do this for long first fragments, e.g. const FOO\n /something/
Can't chain two matches in same expression like getfilename() =~ 'name' &&
EXPR_RE because $0 is redefined
This seems like something you might want to do, and we are unlikely to want to use $0, but this is also true for the first capture group. Do we standardise on "the last pattern match wins"?
Can't set the timestamp when processing one log line and reuse it in another; must use the
caching state metric pattern, hidden gauge time. (I think this shows up in the original mysql example.)
Could one preserve the last parsed timestamp in VM state between runs? How does this interact with programs that never strptime because they have no timestamp in the log? #pragma notimestamp?
Get a list of non-stdlib deps
go list -f "{{if not .Standard}}{{.ImportPath}}{{end}}" $(go list -f '{{join .Deps "\n"}}' ./...)
This is just a neat thing to remember for Go.
Programs may not use mtail_ as a metric prefix. Should just document this.
Theory: Implicitly cast Ints shouldn't get the S2i conversion applied to them. Do we need to name Implicit Int separately from Int and then not create s2i or other conversions for implicits? (And do we need to keep the runtime conversions?)
if you comment out the MATCH_NETWORK clause in dhcpd.mtail it gets 30x faster... because the regexp no longer backtracks... why... hints are that we execute in an NFA regular expression because it's unanchored.
Avoid byte to string conversions in the tailer and vm FindStringSubmatch > https://dave.cheney.net/high-performance-go-workshop/dotgo-paris.html#strings_and_bytes . Pro: speed. Con, not sure how we manage utf-8 in decode.go?
Use FindSubmatchIndex to avoid copies? Not sure if there's a performance win here, but we want to avoid memcpy if we can.
Why is strings.Builder slower than bytes.Buffer when the latter's docstring recommends the former?
ci: rerun failed tests to see if they're flaky.
Find out if OpenTelemetry is better than OpenCensus when creating no-op trace spans.
Test that when path/* is the logpathpattern that we handle log rotation, e.g. log -> log.1
= how can this work, we can't tell the difference between log.1 being a rotation or a new log. This could work if we can have a tailer-level registry of filenames currently with a goroutine. But we don't know the name of the new file when filestream creates a new goroutine for the replacement; fd.Stat() doesn't return the new name of the file.
- Workaround: warn when '*' is the last of a glob pattern.
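The workaround is small enough to sketch. warnTrailingStar is a hypothetical name, and the check is purely textual: it does not understand escaped stars such as `\*`.

```go
package main

import (
	"fmt"
	"strings"
)

// warnTrailingStar returns a warning for log path patterns whose last
// character is '*', or the empty string otherwise.
// Assumption: a plain suffix check is enough; escapes are not handled.
func warnTrailingStar(pattern string) string {
	if strings.HasSuffix(pattern, "*") {
		return fmt.Sprintf("log path pattern %q ends in '*'; a rotated file such as log.1 cannot be told apart from a new log", pattern)
	}
	return ""
}

func main() {
	for _, p := range []string{"/var/log/*", "/var/log/*.log"} {
		if w := warnTrailingStar(p); w != "" {
			fmt.Println("warning:", w)
		}
	}
}
```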
VM profiler, when enabled, times instructions so user gets feedback on where their program is slow.
Can we create a linter that checks for code patterns like 'path.Join' and warns against them? Can govet be made to do this?
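A rough sketch of such a check using only the standard library's go/parser and go/ast. findPathJoin is a hypothetical name. It matches the text path.Join syntactically and does not resolve imports, so an import alias or a local variable named path would fool it; a real check would more likely be an analyzer built on golang.org/x/tools/go/analysis.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// findPathJoin parses src as a Go file and returns the line numbers of
// calls written as path.Join(...).
func findPathJoin(src string) ([]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var lines []int
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		// Purely syntactic match on the package identifier and selector.
		if pkg, ok := sel.X.(*ast.Ident); ok && pkg.Name == "path" && sel.Sel.Name == "Join" {
			lines = append(lines, fset.Position(call.Pos()).Line)
		}
		return true
	})
	return lines, nil
}

func main() {
	src := "package p\n\nimport \"path\"\n\nfunc f() string {\n\treturn path.Join(\"a\", \"b\")\n}\n"
	lines, err := findPathJoin(src)
	fmt.Println(lines, err) // prints "[6] <nil>"
}
```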
Detect when a regular expression compiled doesn't have a onepass program, and report a compile warning. We can't do this today with the regexp API, because it's not an exported field, and the onepass compilation step is not an exported function. If we can do this, we can warn the user that their regular expression has ambiguity and will backtrack. See MATCH_NETWORK above.
Do we have a precision problem that should be solved by using math/big for literals in the AST? Extra credit: find out if the vm runtime should use big internally as well.
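To make the question concrete, a sketch of parsing an integer literal with math/big and then checking whether it would survive being stored as an int64. parseIntLiteral and fitsInt64 are hypothetical helpers, not part of the current parser.

```go
package main

import (
	"fmt"
	"math/big"
)

// parseIntLiteral parses a decimal integer literal with no size limit.
// The bool is false if s is not a valid decimal integer.
func parseIntLiteral(s string) (*big.Int, bool) {
	return new(big.Int).SetString(s, 10)
}

// fitsInt64 reports whether n can be represented as an int64 without loss.
func fitsInt64(n *big.Int) bool {
	return n != nil && n.IsInt64()
}

func main() {
	// The first literal is the largest int64; the second is one more.
	for _, lit := range []string{"9223372036854775807", "9223372036854775808"} {
		n, ok := parseIntLiteral(lit)
		fmt.Println(lit, ok, fitsInt64(n))
	}
}
```

With something like this the checker could report an out-of-range literal at compile time instead of silently wrapping or truncating it.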
regular expression matching is expensive. prefilter on match prefix. for extra credit, filter on all substrings of the expressions, using aho-corasick.
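A sketch of the prefix prefilter using regexp.LiteralPrefix, which returns a literal string that must begin any match. The prefiltered type is hypothetical. Since the expressions are unanchored, the literal may appear anywhere in the line, so the cheap test is strings.Contains. The regexp package may already exploit the prefix internally, so this needs benchmarking before it is worth adopting.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// prefiltered pairs a compiled regexp with the literal prefix that every
// match must start with (empty if there is none).
type prefiltered struct {
	re     *regexp.Regexp
	prefix string
}

func newPrefiltered(expr string) *prefiltered {
	re := regexp.MustCompile(expr)
	prefix, _ := re.LiteralPrefix()
	return &prefiltered{re: re, prefix: prefix}
}

// match skips the regexp entirely when the required literal is absent.
func (p *prefiltered) match(line string) bool {
	if p.prefix != "" && !strings.Contains(line, p.prefix) {
		return false
	}
	return p.re.MatchString(line)
}

func main() {
	p := newPrefiltered(`error: (\d+)`)
	fmt.Println(p.match("daemon error: 42")) // prints "true"
	fmt.Println(p.match("all is well"))      // prints "false"
}
```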
once the vm refactoring has completed, move the VM execute method into per-opcode functions, and use the same state machine function as in lexer.NextToken() to simulate threaded code as we don't get tail recursion in Go. The plan is to see if execution speed is same or better -- expect moving to function calls to be slower unless inlined, but gain in readability and reuse.
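A sketch of the per-opcode state function shape, on a toy machine. Everything here (vm, instr, the two opcodes) is hypothetical and far simpler than the real VM; it only shows each opcode returning the next function to run, with a driver loop standing in for tail calls.

```go
package main

import "fmt"

// instr and vm are hypothetical, minimal stand-ins for the real types.
type instr struct {
	op  string
	arg int
}

type vm struct {
	prog  []instr
	pc    int
	stack []int
}

// stateFn executes one step and returns the function for the next step,
// or nil to halt.
type stateFn func(*vm) stateFn

// ops is filled in init to avoid an initialization cycle with dispatch.
var ops map[string]stateFn

func init() {
	ops = map[string]stateFn{"push": opPush, "add": opAdd}
}

// dispatch looks up the handler for the current instruction. It returns nil
// at the end of the program or for an unknown opcode.
func dispatch(m *vm) stateFn {
	if m.pc >= len(m.prog) {
		return nil
	}
	return ops[m.prog[m.pc].op]
}

func opPush(m *vm) stateFn {
	m.stack = append(m.stack, m.prog[m.pc].arg)
	m.pc++
	return dispatch(m)
}

func opAdd(m *vm) stateFn {
	n := len(m.stack)
	if n < 2 {
		return nil // stack underflow: halt
	}
	m.stack = append(m.stack[:n-2], m.stack[n-2]+m.stack[n-1])
	m.pc++
	return dispatch(m)
}

// run is the driver loop; each iteration is one "threaded" jump.
func (m *vm) run() {
	for state := dispatch(m); state != nil; {
		state = state(m)
	}
}

func main() {
	m := &vm{prog: []instr{{"push", 2}, {"push", 3}, {"add", 0}}}
	m.run()
	fmt.Println(m.stack) // prints "[5]"
}
```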
refactor vm further to replace stack with registers, we need typed registers to remove the pop runtime type cast. new opcodes to do migration from stack to register based ops required
Once the tailer can read from sockets, I'll move it out of `internal/`.
Pass a Logger as an option to tailer and vm.
StatusHTML in vm reads expvars; can we not do that?
Move from expvar to OpenSomething metrics.
Should the exporter move into the metric package?
Should the waker move into the tailer package?
Benchmarks on GHA are too variable. Compute benchmarks old and new in same instance, per guidelines from "Software Microbenchmarking in the Cloud. How Bad is it Really?" Laaber et al.
Move loc and useCurrentYear out of VM and into Runtime env.
Move const folding into parser during AST build.
Const-fold identity functions.
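A sketch of folding at AST build time via a constructor the parser would call. The node types and newBinary are hypothetical. Reading "identity functions" as operations like x + 0 and x * 1 is an assumption about what the note means.

```go
package main

import "fmt"

// Hypothetical, minimal AST node types.
type node interface{}

type intLit struct{ v int64 }

type ident struct{ name string }

type binary struct {
	op       byte
	lhs, rhs node
}

// newBinary is what a parser action could call instead of building a binary
// node directly, so folding happens while the AST is constructed.
func newBinary(op byte, lhs, rhs node) node {
	l, lok := lhs.(intLit)
	r, rok := rhs.(intLit)
	// Both operands constant: compute the result now.
	if lok && rok {
		switch op {
		case '+':
			return intLit{l.v + r.v}
		case '*':
			return intLit{l.v * r.v}
		}
	}
	// Identity operations: x+0, 0+x, x*1, 1*x reduce to x.
	switch {
	case op == '+' && rok && r.v == 0, op == '*' && rok && r.v == 1:
		return lhs
	case op == '+' && lok && l.v == 0, op == '*' && lok && l.v == 1:
		return rhs
	}
	return binary{op, lhs, rhs}
}

func main() {
	fmt.Printf("%+v\n", newBinary('+', intLit{1}, intLit{2}))
	fmt.Printf("%+v\n", newBinary('*', ident{"x"}, intLit{1}))
	fmt.Printf("%+v\n", newBinary('+', ident{"x"}, ident{"y"}))
}
```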
Both tailer and logstream probably don't need to do URL parsing. Tailer could do it on the log path patterns before filling up the poll patterns list. Non-globs don't need repolling, and any with a scheme can already be constructed by TailPattern.
Trim unused string and regexp constants, as e.g. /s/ && 1 gets optimised away.
Collapse duplicate string and regexp constants.
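Collapsing duplicates could be as simple as interning through a map when constants are added. constPool is a hypothetical type, not the existing object code layout.

```go
package main

import "fmt"

// constPool is a hypothetical constant table that hands out one index per
// distinct string, so duplicates share a slot.
type constPool struct {
	values []string
	index  map[string]int
}

func newConstPool() *constPool {
	return &constPool{index: make(map[string]int)}
}

// intern returns the index of s, appending it only if it is not already
// in the pool.
func (p *constPool) intern(s string) int {
	if i, ok := p.index[s]; ok {
		return i
	}
	i := len(p.values)
	p.values = append(p.values, s)
	p.index[s] = i
	return i
}

func main() {
	p := newConstPool()
	a := p.intern("foo")
	b := p.intern("bar")
	c := p.intern("foo")
	fmt.Println(a, b, c, len(p.values)) // prints "0 1 0 2"
}
```

Regexp constants could go through the same pool keyed on the pattern source text.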