Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AOT codegen #624

Merged
merged 31 commits into from
Sep 18, 2020
Merged

AOT codegen #624

merged 31 commits into from
Sep 18, 2020

Conversation

patrickt
Copy link
Contributor

@patrickt patrickt commented Sep 10, 2020

As detailed in #622, compile-time codegen is not a viable path forward for Cabal or Bazel projects. However, we can take advantage of the fact that we can run Template Haskell splices in the IO monad, with some elbow grease, then pretty-print the resulting TH declarations, piping the result through ormolu. We’ll run this executable every time we bump the grammars, which will lead to much better compile-time caching.

To test:

bazel build //semantic-ast/...
bazel run //semantic-ast:generate-ast -- --language JSON --path $PWD/../tree-sitter-json/src/node-types.json

As always, massive props to @aymannadeem for the TH codegen work: there’s absolutely no way this would have been feasible without being able to piggyback on that work.

Stuff left to do:

  • Regenerate all the grammars
  • Provide a script that does step 1 automatically.

Copy link
Contributor Author

@patrickt patrickt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to regenerate all the language AST’s (and no doubt fix whatever bugs I encounter), but this is ready for a look.

@@ -69,14 +77,14 @@ annParameterName :: Name
annParameterName = mkName "a"

-- Auto-generate Haskell datatypes for sums, products and leaf types
syntaxDatatype :: Ptr TS.Language -> [(String, Named)] -> Datatype -> Q [Dec]
syntaxDatatype language allSymbols datatype = skipDefined $ do
syntaxDatatype :: (String -> Q (Maybe Name)) -> Ptr TS.Language -> [(String, Named)] -> Datatype -> Q [Dec]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We needed to add this extra parameter so we don’t try to call lookupTypeName from IO, which doesn’t have any access to type information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call

let traversalInstances = mappend <$> makeStandaloneDerivings (conT name) <*> makeTraversalInstances (conT name)
glue a b c = a : b <> c
name = mkName nameStr
generatedDatatype cons = dataD (cxt []) name [plainTV annParameterName] Nothing cons [deriveStockClause, deriveAnyClassClause]
deriveStockClause = derivClause (Just StockStrategy) [conT ''Generic, conT ''Generic1]
deriveAnyClassClause = derivClause (Just AnyclassStrategy) [conT ''Traversable1 `appT` varT (mkName "someConstraint")]
deriveAnyClassClause = derivClause (Just AnyclassStrategy) [ [t| (forall a. Traversable1 a) |] ]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, this is a little unusual. The problem here was that deriving statements associated with the Traversable1 class weren’t being pretty-printed correctly, to wit:

  deriving anyclass Traversable1 someConstraint

Inserting manual parensT calls into this invocation did not produce a change in behavior: I think there’s some logic in the TH pretty-printer that looks to see if a deriving statement has one and only one child, and elides the parentheses if this is so. That works for zero-argument typeclasses, but not in this case.

An explicit forall prevents the parentheses from being elided. However, weirdly enough, GHC 8.8 rejects this as invalid syntax, whereas 8.10 handles it without an issue. Personally, since @rewinfrey did the legwork to switch our downstream projects to 8.10, we should just cut 8.8 support, but I’m open to other ideas. (Previously, I had been fixing this via invoking sed from generate-ast).

(path, tf) <- openTempFile "/tmp" "generated.hs"
T.hPutStrLn tf programText
hClose tf
callProcess "ormolu" ["--mode", "inplace", path]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could link in the ormolu package, at the cost of a ton of dependencies. It seems to me to be easier just to shell out in this invocation. The most idiomatic way to do this would be to use streaming-process and run ormolu such that it looks from stdin, but who cares? This is fine.

Comment on lines 64 to 83
haskell_binary(
name = "generate-ast",
srcs = glob(["app/**/*.hs"]),
compiler_flags = GHC_FLAGS + EXECUTABLE_FLAGS + ["-XStrictData"],
deps = [
":semantic-ast",
"//semantic-source",
"//:base",
"//:filepath",
"//:process",
"//:template-haskell",
"//:text",
"@stackage//:directory",
"@stackage//:generic-lens",
"@stackage//:lens",
"@stackage//:tree-sitter",
"@stackage//:neat-interpolation",
"@stackage//:optparse-generic",
] + all_ts_deps,
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven’t added this to semantic-ast.cabal, reasoning that we’re not going to run this anywhere but our local machines, that expressing it in the cabal format would be tedious with few upsides, and that I want people to start migrating to Bazel!

Comment on lines 51 to 62
all_ts_deps = ["@tree-sitter-{name}//:tree-sitter-{name}".format(name = name) for name in [
"go",
"java",
"json",
"php",
"python",
"ql",
"ruby",
"rust",
"tsx",
"typescript",
]]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List comprehensions in build files are the bee’s knees. Indeed, for loops are prohibited at the top-level in Starlark files, so list comprehensions are our only way of iteration. Really, Starlark is kind of like a functional subset of Python.

Comment on lines +54 to +64
adjust :: Dec -> Dec
adjust = _InstanceD . typed . mapped %~ (values %~ truncate) . (functions %~ truncate)
where
-- Need to handle functions with no arguments, which are parsed as ValD entities,
-- as well as those with arguments, which are FunD.
values, functions :: Traversal' Dec Name
values = _ValD . typed . _VarP
functions = _FunD . typed

truncate :: Name -> Name
truncate = mkName . nameBase
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As always, there’s a dilemma between tedious/unreadable pattern matching and cryptic/quasi-readable lens operators. I went with the latter, since we already use lens in the project, and it comes with lenses/prisms for the types we need. (TH doesn’t define record selectors for its product types, so you can’t even do record updates.)

@patrickt patrickt requested review from aymannadeem and a team September 10, 2020 14:31
Comment on lines +114 to +128
import qualified AST.Parse
import qualified AST.Token
import qualified AST.Traversable1.Class
import qualified AST.Unmarshal
import qualified Data.Foldable
import qualified Data.List as Data.OldList
import qualified Data.Maybe as GHC.Maybe
import qualified Data.Text.Internal
import qualified Data.Traversable
import qualified GHC.Base
import qualified GHC.Generics
import qualified GHC.Records
import qualified GHC.Show
import qualified Prelude as GHC.Classes
import qualified TreeSitter.Node
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that these imports' assigned names don’t always match up with their actual module names.

Copy link
Contributor

@aymannadeem aymannadeem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work; thanks for all the self-review comments. Excited to finally use Bazel.

@@ -69,14 +77,14 @@ annParameterName :: Name
annParameterName = mkName "a"

-- Auto-generate Haskell datatypes for sums, products and leaf types
syntaxDatatype :: Ptr TS.Language -> [(String, Named)] -> Datatype -> Q [Dec]
syntaxDatatype language allSymbols datatype = skipDefined $ do
syntaxDatatype :: (String -> Q (Maybe Name)) -> Ptr TS.Language -> [(String, Named)] -> Datatype -> Q [Dec]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call

@patrickt patrickt marked this pull request as ready for review September 16, 2020 23:07
@patrickt patrickt merged commit a2902fd into master Sep 18, 2020
@patrickt patrickt deleted the direct-codegen branch September 18, 2020 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants