Top Languages Card not working properly #136

Closed
fluorspar20 opened this issue Jul 21, 2020 · 58 comments · Fixed by #240
Labels
bug Something isn't working. help wanted Extra attention is needed. lang-card Issues related to the language card.

Comments

@fluorspar20

fluorspar20 commented Jul 21, 2020

Describe the bug
I have quite a few repos whose top language is JavaScript. However, the top languages card doesn't show JavaScript at all in the list.

Expected behavior
It should show JavaScript as one of the top languages.

Screenshots / Live demo link
[Screenshot: Screenshot_2020-07-21 fluorspar20 - Overview(1)]

@fluorspar20 fluorspar20 changed the title Top Languages not working properly Top Languages Card not working properly Jul 21, 2020
@rjoydip-zz
Contributor

Same with my profile.

github-readme-stats/top-langs is not showing all of the languages or their percentages.

But github-profile-languages is more appropriate.

@filiptronicek
Contributor

Sure, but I think the GitHub GraphQL API just takes the raw LOC count from all your repos and does the analysis from that. Not 100% sure, though.

@filiptronicek
Contributor

Ok, I was mistaken, it should just fetch the top language of the repo. @anuraghazra WDYT about this?

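const fetcher = (variables, token) => {
  // `request` is the project's GraphQL helper (not shown in this comment).
  return request(
    {
      query: `
      query userInfo($login: String!) {
        user(login: $login) {
          repositories(isFork: false, first: 100) {
            nodes {
              languages(first: 1) {
                edges {
                  size
                  node {
                    color
                    name
                  }
                }
              }
            }
          }
        }
      }
      `,
      // The comment was cut off here; passing `variables` and a
      // bearer-token header is assumed for the rest of the call.
      variables,
    },
    {
      Authorization: `bearer ${token}`,
    },
  );
};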

@anuraghazra
Owner

anuraghazra commented Jul 21, 2020

> Same with my profile.
>
> github-readme-stats/top-langs is not showing all of the languages or their percentages.
>
> But github-profile-languages is more appropriate.

It's working on my side. Which browser are you using?

not working

live demo

@rjoydip-zz
Contributor

rjoydip-zz commented Jul 21, 2020

> It's working on my side. Which browser are you using?

Chrome. But the query below shows Rust and Go as well.

query {
  user(login: "rjoydip") {
    repositories(isFork: false, first: 100) {
      nodes {
        languages(first: 1) {
          edges {
            size
            node {
              color
              name
            }
          }
        }
      }
    }
  }
}

@anuraghazra
Owner

anuraghazra commented Jul 21, 2020

> Ok, I was mistaken, it should just fetch the top language of the repo. @anuraghazra WDYT about this?

Yup it should fetch the correct langs.

@filiptronicek
Contributor

Mine is also weird 😕 @anuraghazra

Actual (pie chart): [Screenshot]
Our implementation: [Screenshot: my langs]

@anuraghazra
Owner

NOTE: Keep in mind that at most 100 repos are fetched, and that it uses the totalSize (in bytes) to calculate how many bytes you have written in each language.
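To make the note concrete, here is a minimal sketch of byte-based aggregation, assuming response nodes shaped like the query above (`languages.edges[]`, each with `size` and `node.name`). The helper name `aggregateLanguageSizes` is hypothetical; this is an illustration, not the card's actual implementation.

```javascript
// Hypothetical helper: sums the bytes reported for each language across
// all fetched repositories and converts the totals into percentages.
function aggregateLanguageSizes(repoNodes) {
  const totals = new Map();
  for (const repo of repoNodes) {
    // Each edge carries the bytes of one language in one repository.
    for (const edge of repo.languages.edges) {
      const name = edge.node.name;
      totals.set(name, (totals.get(name) || 0) + edge.size);
    }
  }
  const grandTotal = [...totals.values()].reduce((a, b) => a + b, 0);
  // Each language's share of all counted bytes.
  return [...totals.entries()].map(([name, size]) => ({
    name,
    size,
    percent: grandTotal === 0 ? 0 : (size / grandTotal) * 100,
  }));
}

// Invented input shaped like the GraphQL response nodes.
const nodes = [
  { languages: { edges: [{ size: 3000, node: { name: "JavaScript" } }] } },
  { languages: { edges: [{ size: 1000, node: { name: "Rust" } }] } },
];
console.log(aggregateLanguageSizes(nodes));
```

With these made-up inputs JavaScript comes out at 75% and Rust at 25%: only the byte totals matter, not how many repos use each language.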

@filiptronicek
Contributor

Is that how we want it to be? Is there not a better implementation?

> NOTE: Keep in mind that at most 100 repos are fetched, and that it uses the totalSize (in bytes) to calculate how many bytes you have written in each language.

@anuraghazra
Owner

> Is that how we want it to be? Is there not a better implementation?
>
> > NOTE: Keep in mind that at most 100 repos are fetched, and that it uses the totalSize (in bytes) to calculate how many bytes you have written in each language.

That's how GitHub calculates it, and it's all fetched from GitHub's API, so there's no way the data is wrong; maybe the data processing is wrong on my side. I have to do some experiments.

@anuraghazra
Owner

I'll look into this tomorrow.

@anuraghazra anuraghazra added the bug Something isn't working. label Jul 21, 2020
@anuraghazra
Owner

> > It's working on my side. Which browser are you using?
>
> Chrome. But the query below shows Rust and Go as well.

Hi @rjoydip, yes, but as you can see:

"edges": [
  {
    "size": 196,
    "node": {
      "color": "#dea584",
      "name": "Rust"
    }
  }
]

There is only one Rust entry in those 100 results, and its size is 196 bytes; I think this is why it's not showing.

@anuraghazra
Owner

Maybe if I change the GQL query to fetch 5 languages from each repo, it would be better, because for now I'm only selecting one language per repo.

query {
  user(login: "rjoydip") {
    repositories(isFork: false, first: 100) {
      nodes {
        languages(first: 5) {
          edges {
            size
            node {
              color
              name
            }
          }
        }
      }
    }
  }
}
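A small sketch of why `first: 1` hides secondary languages, assuming the edges come back largest first (as with the `orderBy: {field: SIZE, direction: DESC}` argument used later in this thread). `totalsWithLimit` is a hypothetical helper that mimics the `first: N` limit on the client, and the byte sizes are invented.

```javascript
// Hypothetical helper: totals bytes per language, but only looks at the
// first `perRepo` language edges of each repository, mimicking the
// `languages(first: N)` argument of the GraphQL query.
function totalsWithLimit(repoNodes, perRepo) {
  const totals = {};
  for (const repo of repoNodes) {
    for (const edge of repo.languages.edges.slice(0, perRepo)) {
      totals[edge.node.name] = (totals[edge.node.name] || 0) + edge.size;
    }
  }
  return totals;
}

// One repo whose edges are ordered by size, largest first (invented sizes).
const repos = [
  {
    languages: {
      edges: [
        { size: 5000, node: { name: "TypeScript" } },
        { size: 1200, node: { name: "Go" } },
        { size: 196, node: { name: "Rust" } },
      ],
    },
  },
];

console.log(totalsWithLimit(repos, 1)); // { TypeScript: 5000 }
console.log(totalsWithLimit(repos, 5)); // { TypeScript: 5000, Go: 1200, Rust: 196 }
```

With a limit of 1, Go and Rust never reach the aggregation at all, no matter how many repos contain them.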

@rjoydip-zz
Contributor

> Maybe if I change the GQL query to fetch 5 languages from each repo, it would be better, because for now I'm only selecting one language per repo.

@anuraghazra Yes, I saw the same thing. It would be better to make isFork and the number of languages dynamic variables.

@anuraghazra
Owner

Not dynamic; making it a max of 5 or 10 would do the job. A repo can't have that many languages anyway.

And isFork should always be false; I don't want to count forked repos. For example, if someone forked reactjs, they would have a lot of JS code.
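The `isFork: false` argument already excludes forks on the API side. As a hypothetical illustration of the same rule applied on the client (for example to nodes that also request `isFork`, as the updated query later in this thread does), a filter could look like this; the function and repository names are made up.

```javascript
// Hypothetical client-side filter: drops forked repositories so that,
// e.g., a fork of a large JavaScript project does not inflate the totals.
function ownRepositories(repoNodes) {
  return repoNodes.filter((repo) => !repo.isFork);
}

// Invented nodes that include the `isFork` field.
const fetched = [
  { nameWithOwner: "someone/react", isFork: true },
  { nameWithOwner: "someone/my-app", isFork: false },
];
console.log(ownRepositories(fetched).map((r) => r.nameWithOwner)); // [ 'someone/my-app' ]
```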

@filiptronicek
Contributor

I am of the opinion that we should count forks too. GitHub also does that, and forks also exist because people use them for projects they work on themselves.

I am not sure whether GitHub provides this (I am not exactly an expert on their v4 API), but the extensions that provide the same solution must be querying it somehow. I'll look into that.

@filiptronicek
Contributor

Just a link for some info on another solution:
https://github.com/freyamade/github-user-languages

@anuraghazra

@rjoydip-zz
Contributor

FYI...

{
  user(login: "rjoydip") {
    repositories(isFork: false, first: 100, orderBy: {field: UPDATED_AT, direction: DESC}) {
      nodes {
        name
        updatedAt
        languages(first: 5, orderBy: {field: SIZE, direction: DESC}) {
          nodes {
            name
          }
        }
        primaryLanguage {
          name
        }
      }
    }
  }
}

@filiptronicek
Contributor

Useful, thanks for this. I think it would be enough to not use the languages, just the primary one. / cc: @anuraghazra @rjoydip

{
  user(login: "rjoydip") {
    repositories(isFork: false, first: 100, orderBy: {field: UPDATED_AT, direction: DESC}) {
      nodes {
        primaryLanguage {
          name
        }
      }
    }
  }
}

That gives out something like this:

{
  "data": {
    "user": {
      "repositories": {
        "nodes": [
          {
            "primaryLanguage": {
              "name": "TypeScript"
            }
          },
          {
            "primaryLanguage": {
              "name": "TypeScript"
            }
          },
          {
            "primaryLanguage": {
              "name": "Java"
            }
          },
          {
            "primaryLanguage": null
          }
        ]
      }
    }
  }
}
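If the card counted primary languages per repo, as suggested here, the aggregation might be sketched like this, assuming nodes shaped like the sample response above. `countPrimaryLanguages` is a hypothetical name, and repos whose `primaryLanguage` is `null` are simply skipped.

```javascript
// Hypothetical helper: counts how many repositories have each language
// as their primaryLanguage; repos with primaryLanguage: null are skipped.
function countPrimaryLanguages(repoNodes) {
  const counts = {};
  for (const repo of repoNodes) {
    if (!repo.primaryLanguage) continue;
    const name = repo.primaryLanguage.name;
    counts[name] = (counts[name] || 0) + 1;
  }
  // Most common language first.
  return Object.entries(counts).sort((a, b) => b[1] - a[1]);
}

// Same shape as the sample response nodes.
const sample = [
  { primaryLanguage: { name: "TypeScript" } },
  { primaryLanguage: { name: "TypeScript" } },
  { primaryLanguage: { name: "Java" } },
  { primaryLanguage: null },
];
console.log(countPrimaryLanguages(sample)); // [ [ 'TypeScript', 2 ], [ 'Java', 1 ] ]
```

This ranks by number of repos rather than bytes, so one huge repo can no longer dominate, but a small repo counts as much as a large one.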

@stemount

> Useful, thanks for this. I think it would be enough to not use the languages, just the primary one. / cc: @anuraghazra @rjoydip

I think this would be good for a "most used language widget", as it is currently an approximation across many repos.

For example, I could have one repo that is just an Express app serving mostly HTML, but it would say 100% TypeScript.

@anuraghazra anuraghazra added the lang-card Issues related to the language card. label Jul 23, 2020
@filiptronicek
Contributor

filiptronicek commented Jul 23, 2020

> > Useful, thanks for this. I think it would be enough to not use the languages, just the primary one. / cc: @anuraghazra @rjoydip
>
> I think this would be good for a "most used language widget", as it is currently an approximation across many repos.
>
> For example, I could have one repo that is just an Express app serving mostly HTML, but it would say 100% TypeScript.

Does this mean that HTML isn't considered a language in this analysis? I am a bit confused. Can you give me an example repo?

@anuraghazra
Owner

I don't think we can do language analysis effectively. For example, take a scenario where someone uploaded node_modules to their GitHub: their JavaScript would be 100% no matter what. Same as #153.

@filiptronicek
Contributor

> I don't think we can do language analysis effectively. For example, take a scenario where someone uploaded node_modules to their GitHub: their JavaScript would be 100% no matter what. Same as #153.

True, but nobody should ever do that (upload their node_modules). If they do, they cannot then be angry at our code, which follows industry best practices; it also affects GitHub's own analysis.

@anuraghazra
Owner

> > I don't think we can do language analysis effectively. For example, take a scenario where someone uploaded node_modules to their GitHub: their JavaScript would be 100% no matter what. Same as #153.
>
> True, but nobody should ever do that (upload their node_modules). If they do, they cannot then be angry at our code, which follows industry best practices; it also affects GitHub's own analysis.

Yup, but that's not my point. There are a lot of scenarios where we cannot evaluate code correctly, and there is no perfect way to do that. Take my website's GitHub repo as an example: it has a develop branch and a master branch, and the master branch holds all the static HTML generated by Gatsby, with huge amounts of metadata, JSON data and JavaScript files.

https://github.com/anuraghazra/anuraghazra.github.io/tree/master

@tobiasvl

I also came here because I saw discrepancies between this tool and my Chrome extension-generated pie chart (linked above).

As an example, this tool says my top language is JavaScript with 46.21% and Lua is second with 29.29%, while the pie chart says I have 5 JavaScript repos and 15 Lua repos. However, if I do a search for JavaScript repos, I only get 1. Not sure what to make of that; presumably this tool counts LOC while the pie chart counts top language per repo, but perhaps the pie chart counts forks too, since it comes up with 5 and not 1?

By the way, not sure if it's relevant, but organization profile pages (like https://github.com/github) actually list the top 5 languages in the org's repos (without bars or percentages or anything fancy). It looks like those are just top languages, not LOC. I might be mistaken though.

@anuraghazra anuraghazra added the help wanted Extra attention is needed. label Jul 24, 2020
@anuraghazra
Owner

And I also like the suggestion of @stemount:

> I think this would be good for a "most used language widget", as it is currently an approximation across many repos.

I think the "Top Languages" label is misleading; it should be "Most Used Languages".

@anuraghazra
Owner

@NikhilCodes I don't know what's wrong with your profile, but I've checked other users' stats with the fix I'm working on, and they are all fine except yours.

[Screenshot: nikhil_lang_stat]

Btw, I've checked the GraphQL request, and it seems you have a very, very large Python repo. It is so huge in bytes that it completely outweighs your Dart stats, so I think the stats are fine.

{
  "nameWithOwner": "NikhilCodes/VirtualBLU",
  "isFork": false,
  "languages": {
    "edges": [
      {
        "size": 78537442,
        "node": {
          "color": "#3572A5",
          "name": "Python"
        }
      }
    ]
  }
}

The updated GQL query looks like this:

query {
  user(login: "NikhilCodes") {
    repositories(ownerAffiliations: OWNER, isFork: false, first: 100) {
      nodes {
        nameWithOwner
        isFork
        languages(first: 10, orderBy: {field: SIZE, direction: DESC}) {
          edges {
            size
            node {
              color
              name
            }
          }
        }
      }
    }
  }
}

@tobiasvl

> @tobiasvl I was experimenting with the API & the code. Are you satisfied with these stats?
>
> [Screenshot: tobais_lang_stat]

Much better! Thank you.

However, I believe there's something wrong with your "tie-break" algorithm, so to speak. The Chrome extension linked earlier lists these as my top languages:

  1. Lua (15 repos)
  2. Python (13 repos)
  3. JavaScript (5 repos)
  4. Assembly (5 repos)
  5. HTML (5 repos)
  6. CSS (5 repos)
  7. Ruby (4 repos)
  8. C (4 repos)

And then a bunch of languages, including Java, with 1 repo each.

As you'll notice, your new code lists Assembly as my third language and then C below that. It seems that it's simply ignoring my HTML and CSS repos, because I have as many of them as I have Assembly repos! That would also explain why Java is on the list at all even though I only have one Java repo: it's ignoring my 4 Ruby repos because I already have 4 C repos on the list.

So, to sum it up, I think your patched code is getting there, but if there are several languages with the same repo count, it only displays one of them and ignores the rest.

@anuraghazra
Owner

anuraghazra commented Jul 28, 2020

@tobiasvl It is not about how many repos you have. You could have 100 JS repos with 10 bytes of code each and 1 Python repo with 20000 bytes of code; Python would be at the top in this case.

> So, to sum it up, I think your patched code is getting there, but if there are several languages with the same repo count, it only displays one of them and ignores the rest.

As you can see, I changed the GQL query to fetch 10 languages from every repo, and I'm counting all of them.
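The 100-small-repos scenario can be worked through in a few lines. The repo list is invented to match the numbers in the comment (100 JavaScript repos of 10 bytes each, one Python repo of 20000 bytes), and it contrasts ranking by repository count with ranking by total bytes.

```javascript
// Invented data matching the comment: 100 tiny JavaScript repos
// and a single large Python repo.
const scenario = [
  ...Array.from({ length: 100 }, () => ({ language: "JavaScript", size: 10 })),
  { language: "Python", size: 20000 },
];

const byCount = {}; // number of repos per language
const byBytes = {}; // total bytes per language
for (const { language, size } of scenario) {
  byCount[language] = (byCount[language] || 0) + 1;
  byBytes[language] = (byBytes[language] || 0) + size;
}

console.log(byCount); // { JavaScript: 100, Python: 1 }
console.log(byBytes); // { JavaScript: 1000, Python: 20000 }
```

By count JavaScript wins; by bytes Python wins. That is why a repo-counting tool and a byte-based card can disagree on the same profile.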

@tobiasvl

Hmm, OK, thanks. If it's intentional then I'm definitely fine with this. I don't want HTML and CSS on my list anyway 😅

@anuraghazra
Owner

Also @tobiasvl

> I believe there's something wrong with your "tie-break" algorithm

Actually, there is no special algorithm in play here; I'm just sorting/manipulating the data coming from GitHub's API and picking the top languages with the largest size.
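A minimal sketch of "sort by size, keep the top N", assuming a plain object of per-language byte totals. `topLanguages` is a hypothetical helper and the totals are invented, not anyone's real stats. Because the ranking looks at bytes only, the number of repos per language plays no role.

```javascript
// Hypothetical helper: ranks languages by total bytes and keeps the first n.
function topLanguages(totals, n) {
  return Object.entries(totals)
    .sort(([, a], [, b]) => b - a) // largest byte total first
    .slice(0, n)
    .map(([name]) => name);
}

// Invented byte totals.
const totals = { Lua: 90000, Python: 70000, HTML: 3000, CSS: 2500, Assembly: 40000, C: 30000 };
console.log(topLanguages(totals, 4)); // [ 'Lua', 'Python', 'Assembly', 'C' ]
```

Here HTML and CSS drop out simply because their byte totals are small, regardless of how many repos contain them.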

@tobiasvl

I see, thanks. Ship it! :shipit:

@anuraghazra
Owner

anuraghazra commented Jul 28, 2020

Oh my god.
I just realized this: why was everyone comparing the stats with http://ionicabizau.github.io/github-profile-languages/ ?????

Because I just checked their source code, and they are calculating "How MANY languages a user has in their profile", while github-readme-stats is calculating "MOST used languages in user's profile".

And they use a package called gh-polyglot which does this:
https://github.com/IonicaBizau/node-gh-polyglot/blob/master/lib/index.js#L102-L106

@VictorNS69

Hi! First of all, thanks for the job you are doing with this repo!

I have a question, that I think fits here.

I have about 15 Python repositories, but I have no stats for Python in "most used languages".
Here you can see the live card:

Also the static card:
[Screenshot]

Maybe Python is not my "Top language with the most size", but it seems strange not to have any %.

@anuraghazra
Owner

anuraghazra commented Aug 11, 2020

> Maybe Python is not my "Top language with the most size", but it seems strange not to have any %.

@VictorNS69
You have so much code in Java & ASP that Python did not make it to the list.

@tobiasvl

Yes, I think it's pertinent to point out that strangely, the percentages seem to be within the top 5. You don't have 1.69% TeX in all your repos, but it makes up 1.69% of all the code within your top 5 languages. (Unless I've misunderstood.)
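If the percentages are normalized over the displayed languages only, as suggested here (the thread does not confirm it), the same byte total yields a larger share than it would over all languages. The helper `percentages` and the totals below are hypothetical.

```javascript
// Hypothetical helper: percentage of each listed language, where the
// denominator is the sum over the listed languages only.
function percentages(totals, names) {
  const denominator = names.reduce((sum, name) => sum + totals[name], 0);
  const result = {};
  for (const name of names) {
    // Rounded to two decimals, like the percentages shown on the card.
    result[name] = Number(((totals[name] / denominator) * 100).toFixed(2));
  }
  return result;
}

// Invented byte totals.
const totals = { Java: 500, Go: 300, TeX: 100, Python: 60, Shell: 40 };
const allLanguages = Object.keys(totals);
const shown = ["Java", "Go", "TeX"]; // e.g. only the top 3 are displayed

console.log(percentages(totals, allLanguages).TeX); // 10
console.log(percentages(totals, shown).TeX); // 11.11
```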

@anuraghazra
Owner

> Yes, I think it's pertinent to point out that strangely, the percentages seem to be within the top 5. You don't have 1.69% TeX in all your repos, but it makes up 1.69% of all the code within your top 5 languages. (Unless I've misunderstood.)

He has one repo with a good amount of TeX https://github.com/VictorNS69/Apuntes-Ciber

@abdullasirajudeen

abdullasirajudeen commented Sep 21, 2020

My profile doesn't show any top languages; they are all empty.
GitHub stats link

@anuraghazra
Owner

anuraghazra commented Sep 21, 2020

@abdullasirajudeen Because there's nothing to count: you have 5 repos, 4 of them are forks (which aren't counted), and the remaining one is your README repo, which does not have any code.

@tobiasvl

@anuraghazra I think you mean @abdullasirajudeen? (I have lots of repos)

@abdullasirajudeen

> @tobiasvl Because there's nothing to count: you have 5 repos, 4 of them are forks (which aren't counted), and the remaining one is your README repo, which does not have any code.

Private repositories are not counted here.

@anuraghazra
Owner

> @anuraghazra I think you mean @abdullasirajudeen? (I have lots of repos)

Oh yeah, sorry for the wrong mention.

@apoorvpandey0

Mine is also working incorrectly.
It's showing JS as my top language, but that's not the case; I use Python.

@aelbenney

Are there any stats for the languages you have contributed to? (Company repositories)

@kavinvalli

> Mine is also working incorrectly.
> It's showing JS as my top language, but that's not the case; I use Python.

Same here. I haven't used Python in a very long time, but it still shows up, and it doesn't show JS and TS, which I use most.

@ramonpaolo

same here

@dancarlton

Mine isn't showing any languages at all in my README, but it is updated in one of my repos. Am I sourcing it incorrectly?

[Screenshot: langallrepo]

[Screenshot: lang1repo]
