Problem if # commits == 100 or 1000 #139

Open
mfenner1 opened this issue Sep 22, 2011 · 2 comments

@mfenner1

So, if I do a little black magic (editing repositoryhandler.git.log to request only 100 commits, i.e. git log -100 for a "limited history"), the last commit does not get properly inserted into DBTempLog in __writer. In fact, it comes in ill-formed: the revision field is filled in, but there is no data (author, log message, etc.).

Adding a little debugging code to __writer shows the issue:

if n_commits == 50:
    # Debugging: unpickle and dump the first buffered commit before the batch insert
    print str(load(StringIO(commits[0][2])))
    cursor.executemany(statement("INSERT into _temp_log "
                                 "(rev, date, object) values (?, ?, ?)",
                                 self.db.place_holder), commits)

This gives the following for the oldest commit (roughly HEAD~99 on a linear history):

{'committer': None, 'author': None, 'composed_rev': False, 'tags': None, 'commit_date': None, 'actions': [], 'branch': 'master', 'message': '', 'author_date': None, 'revision': '6d7c2b4cfa5cb17c9f84c949bfece17b60b0f929'}

I get the same error with the history limited to 100 or to 1000. With a limit of 10, 50, 105, 200, or 500, it works just fine. I'm guessing that one of the AsyncQueues (maybe in the parser?) isn't flushing out its lines properly and is leaving one at the tail end.

Input is very welcome.

Best,
Mark

@apepper
Contributor

apepper commented Sep 23, 2011

Hello Mark.
Can you give a little more context? What exactly is the problem you observed? Are you talking about a repository that has exactly 100 commits?

Greetings
Alex

@mfenner1
Author

Hi Alex, Sorry for the vagueness. It was the end of a long day.

So, I think there are two ways to recreate this.

The first is how I found it: in repositoryhandler/backends/git.py, in GitRepository.log(), hack cmd to include an argument that limits git log to 100 commits (that is, have GitRepository.log() call git log -100). This is useful if you need to test stuff against a big code base but don't want to wait for lots of commits to be processed while testing.
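The limit described above could be sketched as a small helper. This is only an illustration: `git_log_cmd` is a hypothetical name, and the real cmd list built inside GitRepository.log() (with its format options) is not shown in this thread. The one thing taken from git itself is that `--max-count=<n>` is the long form of `-<n>`.

```python
def git_log_cmd(max_count=None):
    """Build a `git log` argv, optionally limited to the newest max_count commits.

    Hypothetical helper: the real command in GitRepository.log() carries
    additional options that are not reproduced here.
    """
    cmd = ["git", "log"]
    if max_count is not None:
        # --max-count=100 is equivalent to `git log -100`
        cmd.append("--max-count=%d" % max_count)
    return cmd
```

With `max_count=100` this yields the same limit as git log -100; leaving it as None keeps the full history.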

The second would be to create a repository with exactly 100 commits. Ugh.
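Building such a repository by hand is tedious, but it can be scripted. The sketch below is an assumption-laden example, not something from the project: `commit_commands` and `build_repo` are made-up names, the commits are empty (`--allow-empty`), and the identity passed with `-c` is a placeholder. If the tool under test needs commits that touch files, each step would have to write and add a file instead.

```python
import subprocess


def commit_commands(n_commits):
    """Return the git argv lists that create a repo with exactly n_commits commits."""
    # Placeholder identity so the commits work without any global git config
    identity = ["-c", "user.name=Test", "-c", "user.email=test@example.com"]
    cmds = [["git", "init", "--quiet"]]
    for i in range(1, n_commits + 1):
        cmds.append(["git"] + identity +
                    ["commit", "--quiet", "--allow-empty", "-m", "commit %d" % i])
    return cmds


def build_repo(path, n_commits):
    """Run the commands inside the existing, empty directory `path`."""
    for cmd in commit_commands(n_commits):
        subprocess.check_call(cmd, cwd=path)
```

Calling `build_repo(some_empty_dir, 100)` should leave a repository whose git log has exactly 100 entries, which can then be fed to cvsanaly2 without touching repositoryhandler.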

Anyway, once you do that, if you run cvsanaly2, you will get an error in DBContentHandler:

def __get_person(self, person):
    # <snip>
    name = to_utf8(person.name)

Here person.name has the value None, which to_utf8 doesn't like. This only occurs for the last (100th) commit being processed. If you follow the chain of calls backwards, the list of commits (from which person originates) is populated in DBTempLog.

Within DBTempLog, the information about the last commit is missing (see the dictionary above that is mostly filled with values of None).
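To catch the ill-formed entry earlier than the to_utf8 failure, a check like the following could be run on each commit before it is written. It is a sketch under assumptions: it works on the dictionary form printed above, `missing_fields` and `REQUIRED_FIELDS` are hypothetical names, and the choice of which fields count as required is a guess based on that one dump.

```python
# Fields that were None in the broken commit dumped earlier in this issue.
# Treating exactly these as "required" is an assumption.
REQUIRED_FIELDS = ("author", "committer", "commit_date", "author_date")


def missing_fields(commit):
    """Return the names of required fields that are absent or None in a commit dict."""
    return [name for name in REQUIRED_FIELDS if commit.get(name) is None]


# The ill-formed commit as reported above
ill_formed = {
    "committer": None, "author": None, "composed_rev": False, "tags": None,
    "commit_date": None, "actions": [], "branch": "master", "message": "",
    "author_date": None,
    "revision": "6d7c2b4cfa5cb17c9f84c949bfece17b60b0f929",
}
```

Logging the revision whenever `missing_fields` returns a non-empty list would show whether the commit is already incomplete when it reaches __writer, which narrows the search to the parser side.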

My guess is that the last lines of the commit log aren't making it (1) to the parser and/or (2) out of the parser. So, the information doesn't get into DBTempLog.

I'm currently working around the problem by not using 100 or 1000 as my commit limit. But, I'll be happy to help with bug finding, confirming, and squashing.

Best,
Mark
