Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More database problems updating jobs #421

Open
sfayer opened this issue Nov 9, 2018 · 1 comment
Open

More database problems updating jobs #421

sfayer opened this issue Nov 9, 2018 · 1 comment
Assignees
Labels
awaiting input This issue is on hold until more information is received

Comments

@sfayer
Copy link
Collaborator

sfayer commented Nov 9, 2018

We got this stack trace which testing the BY_SIZE algorithm (possibly multiple worker instances updating things at the same time or something like that)?

'2018-11-09 16:20:22,015 pdm.utils.db ERROR Error updating job
Traceback (most recent call last):
  File "/home/pdm/pdm/src/pdm/utils/db.py", line 23, in managed_session
    yield request.db.session
  File "/home/pdm/pdm/src/pdm/workqueue/WorkqueueDB.py", line 256, in update
    session.merge(self)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/scoping.py", line 150, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 1604, in merge
    self._autoflush()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 1198, in _autoflush
    self.flush()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 1919, in flush
    self._flush(objects)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2037, in _flush
    transaction.rollback(_capture_exception=True)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2001, in _flush
    flush_context.execute()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 372, in execute
    rec.execute(self)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 526, in execute
    uow
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 46, in save_obj
    uowtransaction)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 171, in _organize_states_for_save
    state_str(existing)))
FlushError: New instance <JobElement at 0x7f0cb5f5e690> with identity key (<class 'pdm.workqueue.WorkqueueDB.JobElement'>, (1, 417)) conflicts with persistent instance <JobElement at 0x7f0cb6812a90>
'2018-11-09 16:20:22,092 server/workqueue INFO 80242100-9eec-4e48-be95-7d2568163ef7: 146.179.245.46 PUT https://pdm00.grid.hep.ph.ic.ac.uk:5446/workqueue/api/v1.0/worker/jobs/417/elements/0 500 154
'2018-11-09 16:20:22,019 server/workqueue INFO 3ef40f34-5221-45cd-837f-48e06dcb94a7: 146.179.245.46 POST https://pdm00.grid.hep.ph.ic.ac.uk:5446/workqueue/api/v1.0/worker/jobs 404 131
'2018-11-09 16:20:22,020 pdm.utils.db ERROR Error updating job element
Traceback (most recent call last):
  File "/home/pdm/pdm/src/pdm/utils/db.py", line 23, in managed_session
    yield request.db.session
  File "/home/pdm/pdm/src/pdm/workqueue/WorkqueueDB.py", line 349, in update
    session.merge(self)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/scoping.py", line 150, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 1604, in merge
    self._autoflush()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 1208, in _autoflush
    util.raise_from_cause(e)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 1198, in _autoflush
    self.flush()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 1919, in flush
    self._flush(objects)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2037, in _flush
    transaction.rollback(_capture_exception=True)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2001, in _flush
    flush_context.execute()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 372, in execute
    rec.execute(self)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 526, in execute
    uow
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 60, in save_obj
    mapper, table, update)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 518, in _emit_update_statements
    execute(statement, params)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 729, in execute
    return meth(self, multiparams, params)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 322, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 826, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 958, in _execute_context
    context)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1159, in _handle_dbapi_exception
    exc_info
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 951, in _execute_context
    context)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 436, in do_execute
    cursor.execute(statement, parameters)
IntegrityError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (IntegrityError) constraint failed u'UPDATE jobelements SET attempts=?, timestamp=? WHERE jobelements.id = ? AND jobelements.job_id = ?' (3, '2018-11-09 16:20:22.020414', 0, 417)
@alexanderrichards
Copy link
Collaborator

looks like the db is being updated with a value that fails the constraint, specifically the attempts constraint.

CheckConstraint('attempts <= max_tries'),

Where unless specified max_tries defaults to 2:

max_tries = SmartColumn(SmallInteger, nullable=False, default=2, allowed=True)

This could be a race condition... However given we don't see this with the BY_NUMBER algorithm seems that this is to do with the BY_SIZE algorithm. This has been changed extensively in #436 so maybe we should wait and see if this is still an issue after that PR is merged

@alexanderrichards alexanderrichards added the awaiting input This issue is on hold until more information is received label Feb 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting input This issue is on hold until more information is received
Projects
None yet
Development

No branches or pull requests

2 participants