Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create DB tables in advance? #75

Open
nsshah1288 opened this issue May 26, 2020 · 7 comments
Open

Create DB tables in advance? #75

nsshah1288 opened this issue May 26, 2020 · 7 comments

Comments

@nsshah1288
Copy link

nsshah1288 commented May 26, 2020

Hi,

I was trying to use pgcontents to store Jupyter notebook code in PostgreSQL.

I set up a blank PostgresSQL database like it said in instructions and specified it in my jupyter_notebook_config.py:

from pgcontents import PostgresContentsManager c = get_config() # noqa c.NotebookApp.contents_manager_class = PostgresContentsManager c.PostgresContentsManager.db_url = 'postgresql://postgres:[email protected]/pgcontents'

Then I brought up the notebook in EKS and tried to see if it would save my Jupyter notebook code.

I got an error like this:

$ kubectl logs jupyter-deployment-f4846c8db-b7t95 -n jupyter
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1283, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 590, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.UndefinedTable: relation "pgcontents.users" does not exist
LINE 1: INSERT INTO pgcontents.users (id) VALUES ('root')
^

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/bin/jupyter-notebook", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/jupyter_core/application.py", line 270, in launch_instance
return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/traitlets/config/application.py", line 663, in launch_instance
app.initialize(argv)
File "", line 2, in initialize
File "/usr/local/lib/python3.8/dist-packages/traitlets/config/application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/notebook/notebookapp.py", line 1766, in initialize
self.init_configurables()
File "/usr/local/lib/python3.8/dist-packages/notebook/notebookapp.py", line 1383, in init_configurables
self.contents_manager = self.contents_manager_class(
File "/usr/local/lib/python3.8/dist-packages/pgcontents/pgmanager.py", line 99, in init
super(PostgresContentsManager, self).init(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pgcontents/managerbase.py", line 66, in init
self.ensure_user()
File "/usr/local/lib/python3.8/dist-packages/pgcontents/managerbase.py", line 70, in ensure_user
ensure_db_user(db, self.user_id)
File "/usr/local/lib/python3.8/dist-packages/pgcontents/query.py", line 89, in ensure_db_user
db.execute(
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1020, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1133, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1323, in _execute_context
self.handle_dbapi_exception(
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1517, in handle_dbapi_exception
util.raise
(
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/compat.py", line 178, in raise

raise exception
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1283, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 590, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable) relation "pgcontents.users" does not exist
LINE 1: INSERT INTO pgcontents.users (id) VALUES ('root')

Do I need to create the users, directories, files tables beforehand?
I am confused because the directions say:

Prerequisites:
Write access to an empty PostgreSQL database.

Also I was trying to look at the code and see where you actually create the tables, and I couldn't find where you actually do a 'CREATE TABLE' command. @ssanderson Could you please give guidance on this?

Thanks!

@ssanderson
Copy link
Contributor

Hi @nsshah1288,

The tables used by pgcontents are created by the pgcontents init command noted in the README's installation steps. That command uses alembic to run a migration that should set up the database as necessary.

@nsshah1288
Copy link
Author

Thanks @ssanderson for the quick reply!

Right now I have a Dockerfile like this:

FROM ubuntu:latest
WORKDIR /code
RUN apt-get update && apt-get -y upgrade
RUN apt-get install -y build-essential python-dev libpq-dev
RUN apt-get install -y python3.6 python-distribute python3-pip
RUN pip3 -q install pip --upgrade
ADD jupyter_notebook_2_config.py /code
RUN pip3 install jupyter
RUN pip install pgcontents
RUN pgcontents init --db-url postgresql://postgres:postgres@academic-datalake.cluster-cprl6nrsccmr.us-east-1.rds.amazonaws.com/pgcontents --no-prompt
RUN mkdir /notebooks
CMD jupyter notebook --allow-root --no-browser --ip 0.0.0.0 --config=jupyter_notebook_2_config.py --port 8080 /notebooks

When I try to run
docker build -t postgres .

I can't connect to the PostgreSQL DB that I have setup in AWS.

Step 10/12 : RUN pgcontents init --db-url postgresql://postgres:postgres@academic-datalake.cluster-cprl6nrsccmr.us-east-1.rds.amazonaws.com/pgcontents --no-prompt
---> Running in abffc35920e4
Initializing pgcontents...
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 2345, in _wrap_pool_connect
return fn()
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 304, in unique_connection
return _ConnectionFairy._checkout(self)
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 778, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 495, in checkout
rec = pool._do_get()
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/impl.py", line 239, in _do_get
return self._create_connection()
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
return _ConnectionRecord(self)
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 440, in init
self.__connect(first_connect_check=True)
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 661, in connect
pool.logger.debug("Error on connect(): %s", e)
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/langhelpers.py", line 68, in exit
compat.raise
(
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/compat.py", line 178, in raise

raise exception
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 656, in __connect
connection = pool._invoke_creator(self)
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/strategies.py", line 114, in connect
return dialect.connect(*cargs, **cparams)
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 490, in connect
return self.dbapi.connect(*cargs, **cparams)
File "/usr/local/lib/python3.8/dist-packages/psycopg2/init.py", line 127, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: could not connect to server: Connection refused
Is the server running on host "academic-datalake.cluster-cprl6nrsccmr.us-east-1.rds.amazonaws.com" (10.179.158.188) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "academic-datalake.cluster-cprl6nrsccmr.us-east-1.rds.amazonaws.com" (10.179.159.214) and accepting
TCP/IP connections on port 5432?

This error makes sense, but I have not yet figured out how to supply temporary credentials in my jupyter_notebook_2_config.py file yet.

When I used s3contents I was able to specify access key, secret key, and session token, and then it connected to s3. Is there a way to do this using pgcontents?

Something like below?

#c.PostgresContentsManager.access_key_id="blah"
#c.PostgresContentsManager.secret_access_key="blah"
#c.PostgresContentsManager.session_token="blah"

I am guessing I should instead put pgcontents init inside the jupyter_notebook_2_config.py, but I don't see how that will work if I can't get the temporary credentials to work. Any ideas?

@nsshah1288
Copy link
Author

@ssanderson do you have any suggestions?

@ssanderson
Copy link
Contributor

hey @nsshah1288 . You only need to run pgcontents init once. It will create all the tables that will be used. It's usually done manually from the command line.

For specifying your database credentials, you can put those in the jupyter notebook config. There's an example of what that looks like here: https://github.com/quantopian/pgcontents/blob/master/examples/example_jupyter_notebook_config.py.

@nsshah1288
Copy link
Author

nsshah1288 commented Jun 2, 2020

Hi @ssanderson thanks for the reply.

I ranpgcontents initwith a file database URL of
postgresql://{username}:{password}@bd-serverless.cluster-cv3fz240lfke.us-east-1.rds.amazonaws.com:5432/postgres

I provided the real username and password obviously.

I am able to log into the query editor in AWS for this Aurora PostgreSQL serverless DB with the credentials. However, when I run pgcontents init with the same credentials, I get this error:

sh-4.2$ pgcontents init
File Database URL: postgresql://master:postgres@bd-serverless.cluster-cv3fz240lfke.us-east-1.rds.amazonaws.com:5432/postgres
Repeat for confirmation: postgresql://master:postgres@bd-serverless.cluster-cv3fz240lfke.us-east-1.rds.amazonaws.com:5432/postgres
Initializing pgcontents...

About to run schema migrations against supplied database URL. If you have
stored data from a previous pgcontents installation, it may not be
correctly preserved.

It is HIGHLY recommended that you back up stored data before proceeding.

Proceed? [y/N]: y
Traceback (most recent call last):
File "/usr/local/bin/alembic", line 11, in
load_entry_point('alembic==1.4.2', 'console_scripts', 'alembic')()
File "/usr/local/lib/python2.7/site-packages/alembic/config.py", line 577, in main
CommandLine(prog=prog).main(argv=argv)
File "/usr/local/lib/python2.7/site-packages/alembic/config.py", line 571, in main
self.run_cmd(cfg, options)
File "/usr/local/lib/python2.7/site-packages/alembic/config.py", line 551, in run_cmd
**dict((k, getattr(options, k, None)) for k in kwarg)
File "/usr/local/lib/python2.7/site-packages/alembic/command.py", line 298, in upgrade
script.run_env()
File "/usr/local/lib/python2.7/site-packages/alembic/script/base.py", line 489, in run_env
util.load_python_file(self.dir, "env.py")
File "/usr/local/lib/python2.7/site-packages/alembic/util/pyfiles.py", line 98, in load_python_file
module = load_module_py(module_id, path)
File "/usr/local/lib/python2.7/site-packages/alembic/util/compat.py", line 250, in load_module_py
mod = imp.load_source(module_id, path, fp)
File "/usr/local/lib/python2.7/site-packages/pgcontents/alembic/env.py", line 83, in
run_migrations_online()
File "/usr/local/lib/python2.7/site-packages/pgcontents/alembic/env.py", line 65, in run_migrations_online
connection = engine.connect()
File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2272, in connect
return self._connection_cls(self, **kwargs)
File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 104, in init
else engine.raw_connection()
File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2379, in raw_connection
self.pool.unique_connection, _connection
File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2349, in _wrap_pool_connect
e, dialect, self
File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1591, in handle_dbapi_exception_noconnection
sqlalchemy_exception, with_traceback=exc_info[2], from
=e
File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2345, in _wrap_pool_connect
return fn()
File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 304, in unique_connection
return _ConnectionFairy._checkout(self)
File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
rec = pool._do_get()
File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/impl.py", line 239, in _do_get
return self._create_connection()
File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
return _ConnectionRecord(self)
File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 440, in init
self.__connect(first_connect_check=True)
File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
pool.logger.debug("Error on connect(): %s", e)
File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 69, in exit
exc_value, with_traceback=exc_tb,
File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
connection = pool._invoke_creator(self)
File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
return dialect.connect(*cargs, **cparams)
File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 490, in connect
return self.dbapi.connect(*cargs, **cparams)
File "/usr/local/lib64/python2.7/site-packages/psycopg2/init.py", line 127, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: password authentication failed for user "master"
FATAL: password authentication failed for user "master"

(Background on this error at: http://sqlalche.me/e/e3q8)
Traceback (most recent call last):
File "/usr/local/bin/pgcontents", line 201, in
main()
File "/usr/local/lib/python2.7/site-packages/click/core.py", line 829, in call
return self.main(*args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python2.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python2.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/usr/local/bin/pgcontents", line 98, in init
upgrade(db_url, revision)
File "/usr/local/lib/python2.7/site-packages/pgcontents/utils/migrate.py", line 40, in upgrade
['alembic', '-c', alembic_ini, 'upgrade', revision]
File "/usr/lib64/python2.7/subprocess.py", line 190, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['alembic', '-c', '/tmp/tmpf740BC/temp_alembic.ini', 'upgrade', u'head']' returned non-zero exit status 1

Do you know if pgcontents works with Aurora PostgreSQL Serverless? Thanks

@nsshah1288
Copy link
Author

Hi @ssanderson
I tried using a normal PostgreSQL DB and pgcontents init worked. Do you know if pgcontents can work with PostgreSQL serverless?

Thanks!

@jorisgillis
Copy link

I'm having a similar issue as pl31 created (pl31/heroku-jupyter#38).

  • Using pgcontents 0.6.0.
  • Created a database (notebooks) in PostgreSQL 11 (PostgreSQL 14 did not work at all)
  • Ran pgcontents init with the database url pointing to the newly created database: postgresql://<username>:<password>@localhost/notebooks
  • Initialization completes successfully
  • When I log into the database, there are no tables. I do find the pgcontents schema, and the alembic_version table.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants