
petukhovv/tree-ngram-generator


ast-ngram-extractor

Extracts n-grams from abstract syntax trees (ASTs).

Example of use

python3 main.py -i ./ast -o features.json

Parameters

  • -i, --input_folder: input folder with ASTs (in JSON format);
  • -o, --output_file: output file that will contain the extracted features (in JSON format, mapping each feature to the number of times it was found).
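
As a rough illustration of the command-line interface described above, the two options could be declared with argparse as follows. This is only a sketch: the flag names come from this README, while marking both options as required, the help texts, and the build_parser name are assumptions, and the actual main.py may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the two documented options."""
    parser = argparse.ArgumentParser(description="Extract n-grams from ASTs")
    # Assumption: both options are mandatory; the README does not say so.
    parser.add_argument("-i", "--input_folder", required=True,
                        help="input folder with ASTs (in JSON format)")
    parser.add_argument("-o", "--output_file", required=True,
                        help="output JSON file with extracted features")
    return parser


# Same arguments as in the "Example of use" section.
args = build_parser().parse_args(["-i", "./ast", "-o", "features.json"])
print(args.input_folder, args.output_file)
```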

Configuration

Configuration is specified in main.py and contains the following parameters:

  • n: maximum n-gram length (max n);
  • max_distance: maximum distance between neighboring nodes (window);
  • no_normalize: flag that controls normalization of the values (n-gram counts);
  • include: array of arrays with sub-n-grams which should be contained in the found n-grams;
  • include_strict: required n-grams (all other found n-grams are removed);
  • exclude: array of arrays with sub-n-grams which should not be contained in the found n-grams;
  • exclude_strict: n-grams which should be excluded.
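
The four filtering parameters above could behave roughly as in the sketch below. This is not the code from main.py. It assumes that n-grams are keyed as colon-joined strings (as in the output example), that every filter value is an array of arrays of node types, that "contained" means a contiguous subsequence, and that matching any one entry of include is enough. The helper names contains and filter_ngrams are hypothetical.

```python
def contains(ngram, sub):
    """True if the node types in `sub` occur contiguously inside `ngram`."""
    parts = ngram.split(":")
    size = len(sub)
    return any(parts[i:i + size] == list(sub)
               for i in range(len(parts) - size + 1))


def filter_ngrams(counts, include=(), include_strict=(),
                  exclude=(), exclude_strict=()):
    """Apply the four filters to a {"ngram": count} dictionary (assumed semantics)."""
    required = {":".join(g) for g in include_strict}
    banned = {":".join(g) for g in exclude_strict}
    kept = {}
    for ngram, count in counts.items():
        if required and ngram not in required:
            continue  # include_strict: only the listed n-grams survive
        if ngram in banned:
            continue  # exclude_strict: exact n-grams to drop
        if include and not any(contains(ngram, s) for s in include):
            continue  # include: must contain at least one listed sub-n-gram
        if any(contains(ngram, s) for s in exclude):
            continue  # exclude: must not contain any listed sub-n-gram
        kept[ngram] = count
    return kept


counts = {"BLOCK": 1, "BLOCK:LBRACE": 1, "MODIFIER_LIST:override": 1}
print(filter_ngrams(counts, include=[["BLOCK"]]))
```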

AST format

The program expects input ASTs in the following format (example input):

[
   {
      "type":"FUN",
      "chars":"override fun onCreateView(inflater: LayoutInflater?, container: ViewGroup?, savedInstanceState: Bundle?): View? {\n        dialog.window.requestFeature(Window.FEATURE_NO_TITLE)\n\n        DaggerAppComponent.builder()\n                .appModule(AppModule(context))\n                .mainModule((activity.application as MyApplication).mainModule)\n                .build().inject(this)\n\n        var view = inflater?.inflate(R.layout.dialog_signup, container, false)\n\n        ButterKnife.bind(this, view!!)\n\n        return view\n    }",
      "children":[
         {
            "type":"MODIFIER_LIST",
            "chars":"override",
            "children":[
               {
                  "type":"override",
                  "chars":"override"
               }
            ]
         },
         {
            "type":"IDENTIFIER",
            "chars":"onCreateView"
         },
         {
            "type":"VALUE_PARAMETER_LIST",
            "chars":"(inflater: LayoutInflater?, container: ViewGroup?, savedInstanceState: Bundle?)",
            "children":[
               {
                  "type":"LPAR",
                  "chars":"("
               },
               {
                  "type":"VALUE_PARAMETER",
                  "chars":"inflater: LayoutInflater?",
                  "children":[
                     {
                        "type":"IDENTIFIER",
                        "chars":"inflater"
                     }
                  ]
               }
            ]
         }
      ]
   }
]

This is a Kotlin AST, generated by a custom Kotlin compiler.

An AST transformer is also required; it is part of kotlin-source2ast (see lib/helper/AstHelper.py).
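
To show how n-grams relate to a tree in this format, here is a minimal sketch that counts chains of node types along parent-to-child paths. It is an illustration under assumptions, not the extractor's actual algorithm: it only joins directly adjacent nodes (as if max_distance were 1), ignores normalization and filtering, and the function name count_path_ngrams is hypothetical.

```python
from collections import Counter


def count_path_ngrams(node, max_n, ancestors=(), counts=None):
    """Count n-grams (n <= max_n) of node types along root-to-node paths."""
    if counts is None:
        counts = Counter()
    path = ancestors + (node["type"],)
    # Every chain of up to max_n consecutive types that ends at this node.
    for size in range(1, min(max_n, len(path)) + 1):
        counts[":".join(path[-size:])] += 1
    # Leaf nodes have no "children" key in the example input.
    for child in node.get("children", []):
        count_path_ngrams(child, max_n, path, counts)
    return counts


# A fragment of the example input above.
tree = {
    "type": "MODIFIER_LIST",
    "chars": "override",
    "children": [{"type": "override", "chars": "override"}],
}
print(dict(count_path_ngrams(tree, max_n=3)))
```

For this fragment the sketch yields one occurrence each of MODIFIER_LIST, override, and MODIFIER_LIST:override, matching the corresponding entries in the output example.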

Output format

N-grams are written in JSON format.

For example:

{
   "MODIFIER_LIST":1,
   "override":1,
   "MODIFIER_LIST:override":1,
   "WHITE_SPACE":2,
   "fun":1,
   "IDENTIFIER":2,
   "VALUE_PARAMETER_LIST":1,
   "LPAR":11,
   "VALUE_PARAMETER_LIST:LPAR":1,
   "VALUE_PARAMETER":1,
   "VALUE_PARAMETER_LIST:VALUE_PARAMETER":3,
   "VALUE_PARAMETER_LIST:RPAR":1,
   "BLOCK":1,
   "LBRACE":1,
   "BLOCK:LBRACE":1,
   "BLOCK:WHITE_SPACE":11,
   "DOT_QUALIFIED_EXPRESSION":13,
   "BLOCK:DOT_QUALIFIED_EXPRESSION":6,
   "DOT_QUALIFIED_EXPRESSION:DOT_QUALIFIED_EXPRESSION":12,
   "BLOCK:DOT_QUALIFIED_EXPRESSION:DOT_QUALIFIED_EXPRESSION":9,
   "DOT_QUALIFIED_EXPRESSION:REFERENCE_EXPRESSION":29,
   "BLOCK:REFERENCE_EXPRESSION":8,
   "DOT_QUALIFIED_EXPRESSION:DOT_QUALIFIED_EXPRESSION:REFERENCE_EXPRESSION":29,
   "BLOCK:DOT_QUALIFIED_EXPRESSION:REFERENCE_EXPRESSION":14,
   "DOT_QUALIFIED_EXPRESSION:IDENTIFIER":24,
   "DOT_QUALIFIED_EXPRESSION:REFERENCE_EXPRESSION:IDENTIFIER":29
}

The n-gram list can be used in feature-selection or ast2vec.
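
Before passing the features to a downstream tool, it can help to inspect the most frequent n-grams. The sketch below parses a small inline sample taken from the output example instead of reading features.json, so that it runs on its own; the top_features helper is hypothetical and not part of this repository.

```python
import json

# A few entries copied from the output example above.
sample = '{"LPAR": 11, "BLOCK": 1, "DOT_QUALIFIED_EXPRESSION": 13}'


def top_features(features, limit):
    """Return the `limit` most frequent n-grams as (name, count) pairs."""
    ranked = sorted(features.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


features = json.loads(sample)
for name, count in top_features(features, limit=2):
    print(name, count)
```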
