Result parser issue when dealing with results containing a semicolon #3

Open
hecklerponics opened this issue Sep 26, 2019 · 0 comments

There's an issue with parsing returned data with URLs that include a semicolon:

From /python_semrush/semrush.py

     84             result = {}
     85             for i, datum in enumerate(line.split(';')):
---> 86                 result[columns[i]] = datum.strip('"\n\r\t')
     87             results.append(result)

As an example, this URL came back from a call to the organic_phrase function:
http://www.hilton.com/en/hotels/content/SPTSHHF/media/pdf/Tangerine_Bar_2.pdf;jsessionid=DTE5TAZBV525MCSGBI12VCQ

Splitting that line on ';' yields more fields than there are columns, so columns[i] raises a "list index out of range" error. To get around this (in case others hit the same problem), I modified my script to pass export_escape=1 in the arguments, which forces double quotes around each field; I then updated the parser to split on '";"' instead of ';'.

The new code looks like this:
(lines 75-89 of /python_semrush/semrush.py)

    @staticmethod
    def parse_response(data):
        results = []
        data = data.decode('unicode_escape')
        lines = data.split('\r\n')
        lines = list(filter(bool, lines))  # drop empty lines
        columns = lines[0].split(';')

        for line in lines[1:]:
            result = {}
            # With export_escape=1 every field is double-quoted, so split on
            # the quoted delimiter '";"' rather than on a bare ';'.
            for i, datum in enumerate(line.split('";"')):
                result[columns[i]] = datum.strip('"\n\r\t')
            results.append(result)

        return results

I'm sure there is a better way to do this, but in the meantime, this is a workaround that works!
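
One possible "better way" is to let Python's standard csv module handle the quoting instead of splitting by hand. The following is only a sketch, not the library's code: it assumes the request was made with export_escape=1 so that fields are double-quoted, it assumes the body decodes as UTF-8 (the existing parser uses 'unicode_escape'), and the column names and keyword in the sample are made up for illustration.

```python
import csv
import io


def parse_response(data):
    """Parse a semicolon-delimited response (bytes) into a list of dicts."""
    # Assumption: the body is UTF-8; the existing parser uses 'unicode_escape'.
    text = data.decode('utf-8')
    # csv treats a ';' inside a double-quoted field as data, not as a delimiter.
    reader = csv.DictReader(io.StringIO(text, newline=''),
                            delimiter=';', quotechar='"')
    return [dict(row) for row in reader]


# Hypothetical sample: the header names and keyword are illustrative only.
sample = (
    b'Keyword;Url\r\n'
    b'"tangerine bar";"http://www.hilton.com/en/hotels/content/SPTSHHF/'
    b'media/pdf/Tangerine_Bar_2.pdf;jsessionid=DTE5TAZBV525MCSGBI12VCQ"\r\n'
)
rows = parse_response(sample)
print(rows[0]['Url'])
```

This still depends on export_escape=1: if a field containing a semicolon is not quoted, no parser can tell it apart from a delimiter.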
