Hundreds of files that are parsed (or rejected) differently than by Python 3.11.2 #66

Open
qarmin opened this issue May 22, 2023 · 7 comments

Comments

@qarmin

qarmin commented May 22, 2023

d23611d

When parsing multiple files with this repo, I found that sometimes Python cannot parse a file that this library can, and vice versa.

Command to check whether Python can parse a file:

python -m py_compile PY_FILE_TEST_427160.py

Code to check whether the parser can parse a file:

let rust_valid = parse(content.as_str(), Mode::Module, "").is_ok();

Only about 3% of the files in the pack can be parsed by Python but not by the RustPython parser.

Pack 644 files - OUTPUT_FILES.zip

Example files and errors

return imprt

SyntaxError: 'return' outside function

\
 import _pl

Sorry: IndentationError: unexpected indent (79285088PY_FILE_TEST_5137609254.py, line 2)

# encoding: ut

SyntaxError: unknown encoding: ut

__p|ersion__ = '2.9.0'

SyntaxError: cannot assign to expression here. Maybe you meant '==' instead of '='?

from __future__ import un

SyntaxError: future feature un is not defined

@DimitrisJim
Member

DimitrisJim commented May 23, 2023

Compilation involves further processing. Something might be valid syntactically but in later stages the compiler might reject it due to additional rules that come into play when bytecode is to be generated.

This comparison might make more sense using the rest of the implementation in RustPython whereby you can actually test compile vs compile instead of compile vs parse.

If you only care about parsing, you could re-do the test using CPython's parser.

I'll close for now, feel free to open again if I missed something or if you re-run the test with CPython's parser and you do actually find discrepancies. Thanks!

@youknowone
Member

@DimitrisJim Because we do have parser incompatibilities, the test set looks very useful.

@qarmin Rather than py_compile, using the ast module would make more sense, because it excludes the compile step.
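
To illustrate the difference, here is a minimal sketch that runs the first example from the report through the parser only and then through the full compile step. The helper names parses and compiles are made up for this illustration; ast.parse and the built-in compile are the standard CPython entry points.

```python
import ast


def parses(source: str) -> bool:
    """True if the parser alone accepts the source (no bytecode generation)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def compiles(source: str) -> bool:
    """True if the source also survives the compile step."""
    try:
        compile(source, "<sample>", "exec")
        return True
    except SyntaxError:
        return False


sample = "return imprt\n"
# The parser accepts a module-level 'return'; the compiler rejects it
# with "'return' outside function".
print(parses(sample), compiles(sample))  # prints: True False
```

A file like this one is valid syntax as far as the parser is concerned, so it should not count as a parser discrepancy.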

@youknowone
Member

@qarmin Do you know the license of the test set? It would be helpful if we could include it in our test suite.

@qarmin
Author

qarmin commented May 24, 2023

I took these files (or minimized parts of them) from the most popular PyPI libraries, so the licenses are mixed.

How do I parse a file in Python?

ChatGPT provided this solution, but I don't know whether it is correct:

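import ast

def parse_python_file(file_path):
    # Read the whole file as text; ast.parse runs only the parser, not the compiler
    with open(file_path, 'r') as file:
        source_code = file.read()

    try:
        ast.parse(source_code)
    except SyntaxError as e:
        print(f"Syntax error in {file_path}: {e}")
        raise  # re-raise the original SyntaxError so the caller sees the failure

parse_python_file('plik.py')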

@qarmin
Author

qarmin commented May 28, 2023

When using ast.parse (from the example above) instead of py_compile, I got a different set of invalid files:
OutputASTInvalid.zip
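
For anyone who wants to reproduce the two sets, a rough sketch of a batch check follows. It sorts every file into one of three buckets: accepted, rejected by the parser, or rejected only at the compile step. The names classify and classify_dir are hypothetical, and this is not the script used to produce the archives above. It reads files as bytes on the assumption that CPython should then apply any encoding declaration itself; reading as text would skip that check.

```python
import ast
from pathlib import Path


def classify(source: bytes, name: str = "<unknown>") -> str:
    """Return 'ok', 'parse-error', or 'compile-error' for one source blob."""
    try:
        # Parser only: builds the AST without generating bytecode.
        tree = ast.parse(source, filename=name)
    except (SyntaxError, ValueError):
        return "parse-error"
    try:
        # Compile the already-parsed tree to catch compile-stage errors.
        compile(tree, name, "exec")
    except (SyntaxError, ValueError):
        return "compile-error"
    return "ok"


def classify_dir(root: str) -> dict[str, list[str]]:
    """Classify every .py file under root, grouped by outcome."""
    buckets: dict[str, list[str]] = {"ok": [], "parse-error": [], "compile-error": []}
    for path in sorted(Path(root).rglob("*.py")):
        buckets[classify(path.read_bytes(), str(path))].append(path.name)
    return buckets
```

Calling classify_dir on the unpacked archive directory returns the file names per bucket. Only the "parse-error" bucket is directly comparable with the result of the Rust parse(...).is_ok() check.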

@youknowone
Member

Thanks a lot! Sorry for the late response. I am also trying to build a tool to make this easier.
