Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If a column is called "group" (or other reserved words) the example link to the table doesn't work #134

Closed
simonw opened this issue Jun 27, 2021 · 11 comments
Labels
bug Something isn't working

Comments

@simonw
Copy link
Owner

simonw commented Jun 27, 2021

Clicking this link:

SQL__select_count____from_availability_tag_____select_id__name__notes__disabled__previous_names__slug__group_from_availability_tag

Links to this SQL which throws an error:

select id, name, notes, disabled, previous_names, slug, group from availability_tag

Error:

syntax error at or near "group" LINE 1: ... id, name, notes, disabled, previous_names, slug, group from... ^

@simonw simonw added the bug Something isn't working label Jun 27, 2021
@simonw
Copy link
Owner Author

simonw commented Jun 27, 2021

Fix is to wrap the column name in double quotes:

select id, name, notes, disabled, previous_names, slug, "group" from availability_tag

Need a list of reserved SQL words - or some other mechanism for detecting them, maybe psycopg2 has something?

@simonw
Copy link
Owner Author

simonw commented Jun 27, 2021

https://www.psycopg.org/docs/extensions.html#psycopg2.extensions.quote_ident

psycopg2.extensions.``quote_ident(str, scope)

Return quoted identifier according to PostgreSQL quoting rules.

The scope must be a connection or a cursor, the underlying connection encoding is used for any necessary character conversion.

@simonw
Copy link
Owner Author

simonw commented Jun 27, 2021

That quotes identifiers whether or not they need to be quoted, which I find a little ugly. It gave me this result:

select "id", "name", "notes", "disabled", "previous_names", "slug", "group" from availability_tag

I'm optimistic about this to help solve the problem:

select * from pg_get_keywords()

@simonw
Copy link
Owner Author

simonw commented Jun 27, 2021

It looks like the full list of reserved words (that are invalid if used as column names without double quotes) can be found with:

select string_agg(word, ', ') from pg_get_keywords() where catcode = 'R'

It's:

all, analyse, analyze, and, any, array, as, asc, asymmetric, both, case, cast, check, collate, column, constraint, create, current_catalog, current_date, current_role, current_time, current_timestamp, current_user, default, deferrable, desc, distinct, do, else, end, except, false, fetch, for, foreign, from, grant, group, having, in, initially, intersect, into, lateral, leading, limit, localtime, localtimestamp, not, null, offset, on, only, or, order, placing, primary, references, returning, select, session_user, some, symmetric, table, then, to, trailing, true, union, unique, user, using, variadic, when, where, window, with

@simonw
Copy link
Owner Author

simonw commented Jun 27, 2021

I could incorporate that into this SQL query in a clever way:

tables_cursor.execute(
"""
with visible_tables as (
select table_name
from information_schema.tables
where table_schema = 'public'
order by table_name
)
select
information_schema.columns.table_name,
string_agg(column_name, ', ' order by ordinal_position) as columns
from
information_schema.columns
join
visible_tables on
information_schema.columns.table_name = visible_tables.table_name
where
information_schema.columns.table_schema = 'public'
group by
information_schema.columns.table_name
order by
information_schema.columns.table_name
"""
)

Or I could run it once at the start, pull the list of reserved words into Python and apply optional escaping in Python code.

@simonw
Copy link
Owner Author

simonw commented Jun 27, 2021

Doing it with fancy SQL looks like it could work really well:

with visible_tables as ( 
  select table_name 
    from information_schema.tables 
    where table_schema = 'public' 
    order by table_name 
),
reserved_keywords as (
  select word
    from pg_get_keywords()
    where catcode = 'R'
  union select 'id' as word
)
select 
  information_schema.columns.table_name, 
  string_agg(column_name, ', ' order by ordinal_position) as columns,
  array_agg(case when column_name in (select word from reserved_keywords) then '"' || column_name || '"' else column_name end order by ordinal_position) as columns 
from 
  information_schema.columns 
join 
  visible_tables on 
  information_schema.columns.table_name = visible_tables.table_name 
where 
  information_schema.columns.table_schema = 'public' 
group by 
  information_schema.columns.table_name 
order by 
  information_schema.columns.table_name

(I added the union select 'id' as word bit just to ensure I had some columns that were treated as needing quoting)

Output:

SQL__with_visible_tables_as___select_table_name_from_information_schema_tables_where_table_schema____public__order_by_table_name____reserved_keywords_as___select_word_from_pg_get_keywords___where_catcode____R____select_information_schema_co

@simonw
Copy link
Owner Author

simonw commented Jun 27, 2021

I also need to solve this for "sort by column" queries, see #57 - so I'm going to load the results of pg_get_keywords() once (and cache for the lifetime of the server) and implement this in Python instead.

@simonw
Copy link
Owner Author

simonw commented Jun 27, 2021

Python 3.9 introduced a @functools.cache decorator - https://github.com/python/cpython/blob/3.9/Lib/functools.py#L650 - but under the hood it looks like this, and lru_cache has been in Python since 3.2:

def cache(user_function, /):
    'Simple lightweight unbounded cache.  Sometimes called "memoize".'
    return lru_cache(maxsize=None)(user_function)

@simonw
Copy link
Owner Author

simonw commented Jun 27, 2021

In [3]: @functools.lru_cache(maxsize=None)
   ...: def cached():
   ...:     print("called")
   ...:     return 5
   ...: 

In [4]: 

In [4]: cached()
called
Out[4]: 5

In [5]: cached()
Out[5]: 5

@simonw
Copy link
Owner Author

simonw commented Jun 27, 2021

That won't work because the function needs to be passed a connection. I'll roll my own dumb caching mechanism instead.

@simonw
Copy link
Owner Author

simonw commented Jun 27, 2021

_reserved_words = None

def postgresql_reserved_words(connection):
    global _reserved_words
    if _reserved_words is None:
        with connection.cursor() as cursor:
            cursor.execute("select word from pg_get_keywords() where catcode = 'R'")
            _reserved_words = [row[0] for row in cursor.fetchall()]
    return _reserved_words

@simonw simonw closed this as completed in bacf5de Jul 1, 2021
simonw added a commit that referenced this issue Jul 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant