This repository has been archived by the owner on Nov 24, 2021. It is now read-only.

Proposal: data mining on positions of users for the 2 repos #1474

Open
KOLANICH opened this issue Mar 24, 2021 · 42 comments

Comments

@KOLANICH
Contributor

KOLANICH commented Mar 24, 2021

  • For each repo of {rms-support-letter/rms-support-letter.github.io, rms-open-letter/rms-open-letter.github.io}:

    • get via GH API nicknames of {stargazers, forks owners}
  • for each user, compute a vector describing their membership in orgs and companies (1 = member, 0 = non-member). Membership can be detected from the user's orgs and from the company field in their profile

  • compute an Euler diagram (a - b, b - a, a ∩ b)

  • for users outside the intersection, compute the correlation between their position (a binary variable: 0 = pro-Stallman, 1 = against-Stallman) and the companies they are members of

  • sort the results and plot them

It may also be possible to train an XGBoost model that predicts the position from memberships in companies and repos, then apply SHAP and visualize the feature importances.
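The three regions of the two-set Euler diagram can be sketched with plain Python sets. This is only an illustration: the function name `euler_regions` and the placeholder identifiers are hypothetical, and no real account data is involved. Note that on Python sets `^` means symmetric difference, so the overlap is written `a & b`.

```python
def euler_regions(a, b):
    """Split two collections into the three regions of a two-set Euler diagram."""
    a, b = set(a), set(b)
    return {
        "only_a": a - b,  # members of a that are not in b
        "only_b": b - a,  # members of b that are not in a
        "both": a & b,    # members present in both sets
    }


# Placeholder identifiers, not real accounts.
group_a = {"user1", "user2", "user3"}
group_b = {"user3", "user4"}

regions = euler_regions(group_a, group_b)
sizes = {name: len(members) for name, members in regions.items()}
```

Only the region sizes are needed to draw the diagram, so the member names can be dropped right after this step.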

Stated companies and locations
query ($cursor: String = null) {
  repository(name: "rms-open-letter.github.io", owner: "rms-open-letter") {
    pullRequests(first: 100, after: $cursor, states: MERGED) {
      pageInfo {
        endCursor
        hasNextPage
      }
      nodes {
        author {
          ... on User {
            company
            location
          }
        }
      }
    }
  }
}
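The query above is cursor-paginated (`pageInfo.endCursor`, `pageInfo.hasNextPage`), so it has to be re-sent with the last cursor until no pages remain. A minimal sketch of that loop follows. It assumes a hypothetical `fetch_page(cursor)` callable that runs the query and returns the `pullRequests` connection as a dict; the HTTP call itself is left out, and the two fake pages exist only to exercise the loop.

```python
from typing import Callable, Iterator, Optional


def iter_nodes(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every node of a cursor-paginated connection, page by page."""
    cursor = None  # the first request is sent without a cursor
    while True:
        connection = fetch_page(cursor)
        yield from connection["nodes"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            break
        cursor = page_info["endCursor"]


# Fake two-page data source standing in for the real API call (hypothetical).
_pages = {
    None: {
        "nodes": [{"author": {"company": "ExampleCorp", "location": None}}],
        "pageInfo": {"endCursor": "c1", "hasNextPage": True},
    },
    "c1": {
        "nodes": [{"author": None}],  # handle a missing author defensively
        "pageInfo": {"endCursor": None, "hasNextPage": False},
    },
}


def fake_fetch(cursor):
    return _pages[cursor]


nodes = list(iter_nodes(fake_fetch))
```

Passing the fetcher in as a parameter keeps the paging logic testable without network access or an API token.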
@13sqrt5
Contributor

13sqrt5 commented Mar 24, 2021

That's highly unethical, I think.

@KOLANICH
Contributor Author

What is unethical? Analysing corporate culture based on information that people themselves willingly made publicly available?

@13sqrt5
Contributor

13sqrt5 commented Mar 24, 2021

Well, yes. You ask people to sign a letter to support Stallman, and then use their personal data for data mining. That's what corporations are doing, and what they are hated for: collecting personal data and then selling/abusing it.

@pizdjuk

pizdjuk commented Mar 25, 2021

I support @13sqrt5 here, while also finding this idea meaningful. So, @KOLANICH, it would be interesting to do this, but discuss it elsewhere, in private, without publicity.

@13sqrt5
Contributor

13sqrt5 commented Mar 25, 2021

But indeed, the idea is cool. I think that's interesting research, and you should do it, but some time later, after the situation with Stallman is resolved.

@Shamar
Contributor

Shamar commented Mar 25, 2021

A more interesting study that I think would have a huge value is to compute the percentage of signers that received money or worked on projects that received money from GAFAM&friends (e.g. GSoC, sponsorship and so on).

This is not something you can automate, but I think that once the signatures settle it could have huge sociological (and journalistic) value.

@purplesyringa
Collaborator

We could probably collect anonymous-ish data, like the percentage of corporate users here vs there and such, but not names, and without building a model. Some simple graphs would be nice to look at and wouldn't harm anyone's privacy.
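A minimal sketch of that kind of aggregate, assuming the input is just a list of free-text company strings with the account names already dropped. The function name `affiliation_shares` and the sample values are hypothetical, and the normalisation (trimming, lower-casing, stripping a leading `@`) is a guess at what such free-text fields need.

```python
from collections import Counter


def affiliation_shares(companies):
    """Return the share of each stated affiliation, ignoring who stated it."""
    # Drop empty or missing values and normalise the free-text strings.
    cleaned = [c.strip().lstrip("@").lower() for c in companies if c and c.strip()]
    counts = Counter(cleaned)
    total = len(cleaned)
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


# Placeholder values, not real profile data.
shares = affiliation_shares(["@ExampleCorp", "examplecorp ", None, "OtherOrg", ""])
```

Only the resulting shares would be plotted; the per-user rows are never kept.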

@nukeop
Member

nukeop commented Mar 25, 2021

Organizations:

FSF: 69
Open Source Initiative: 8
EFF: 6
Collabora: 45
GitHub: 14
Gitlab: 1
Mozilla: 24
Tor: 8
Google: 3
Microsoft: 2
Red Hat: 8

Distro:

Debian: 98
Fedora: 39
Ubuntu: 23
Arch: 23
OpenSUSE: 11
Gentoo: 3
Void: 1
OpenBSD: 1
FreeBSD: 1

DE/WM:

GNOME: 60
KDE: 13
xfce: 0
cinnamon: 1
i3/dwm/sway: 0

Others:

Guix: 4
Libreoffice: 3
Krita: 4

@purplesyringa
Collaborator

Is that from our repo or their repo?

@shenlebantongying
Collaborator

Debian is holding a vote on a general resolution about the letter.

OMG, Debian is corrupted by a small group of people.

General resolution: ratify https://github.com/rms-open-letter/rms-open-letter.github.io
Willingness to share a position statement?

@KOLANICH
Contributor Author

We could probably collect anonymous-ish data, like the percentage of corporate users here vs there and such, but not names, and without building a model. Some simple graphs would be nice to look at and wouldn't harm anyone's privacy.

To collect "anonymous-ish" data you need to collect non-anonymous data first. And I see no issue in processing non-anonymous data: it is already public.

@purplesyringa
Collaborator

My point is: don't list the names of the supporters to blame them or anything like that. This would give SJWs a reason to call out our behavior.

@nukeop
Member

nukeop commented Mar 25, 2021

Is that from our repo or their repo?

it's from open-letter

@jordigh

This comment was marked as abuse.

@purplesyringa
Collaborator

purplesyringa commented Mar 25, 2021

the counterletter is being heavily promoted in Russia

Not really 'promoted', we just have a site that every Russian developer reads--Habr, and the letter was posted there. I think no other region has a similar site, hence the bias.

@shenlebantongying
Collaborator

shenlebantongying commented Mar 25, 2021

I don't think the bias is that extreme. If you check one by one, there are people from all over the world, including friends from Asia, Australia and Africa. Believe it or not, there are free software activities in the third world too.

Besides, due to the pressure from "popular opinion", I think a lot of people actually cannot support us. They might lose their jobs for signing our letter. If someone works for RedHat/SUSE and signs our letter, they will probably get fired for the same reason as RMS. Oops, this kind of feels like playing against the big corps :)

Even though a lot of us don't work for big companies, we are still users and developers for free software.

@nukeop
Member

nukeop commented Mar 25, 2021

If Hacker News hadn't nipped the post about this letter in the bud, we'd easily be at 10k signatures right now

@jordigh

This comment was marked as abuse.

@M-i-k-o-t-o

It's also absolutely obvious that the counterletter is being heavily promoted in Russia and 4chan, hence why almost all of the names in the counterletter are Russian. No need to datamine to figure that out either; you can see it at a glance.

Uh oh, beware of the evil Russian hackers!!

Seriously... Please cool it with your racism.

@jordigh

This comment was marked as abuse.

@RealJTG
Contributor

RealJTG commented Mar 26, 2021

the counterletter is being heavily promoted in Russia

Not really 'promoted', we just have a site that every Russian developer reads--Habr, and the letter was posted there. I think no other region has a similar site, hence the bias.

There is the quite popular Hacker News, but the submission about the RMS support letter got the [flagged] tag for some reason, so it gets a penalty, drops in the ranking, and nobody can actually see it; then it will probably disappear: https://news.ycombinator.com/item?id=26565107

the counterletter is being heavily promoted in Russia

I'd say it's rather that the counterletter is being heavily dis-promoted in the US.

@purplesyringa
Collaborator

There is the quite popular Hacker News, but the submission about the RMS support letter got the [flagged] tag for some reason, so it gets a penalty, drops in the ranking, and nobody can actually see it; then it will probably disappear: https://news.ycombinator.com/item?id=26565107

Oh wow! I posted the link to the support letter on Hackernews too and got flagged too. What a coincidence!

@jordigh

This comment was marked as abuse.

@shenlebantongying
Collaborator

shenlebantongying commented Mar 27, 2021

Some outsiders criticize that some of our signers' accounts are new or don't have much activity.

We have the same number of signers as them now. However, we have 3.1k PRs and 2.7k forks, and the open letter only has 2.3k PRs and 2k forks.

This may imply that a lot of their signers come from email and were added in bulk. That's interesting.

@jordigh

This comment was marked as abuse.

@purplesyringa
Collaborator

I stand for clarity. All signatures we receive have a public source: either it's a comment on a public issue, or a PR, or an email that we publish to #3105. rms-open-letter does not publish the sources of bulk signatures.

@Alessandro-Barbieri
Contributor

Do you want a takedown because of violating GDPR?

@pizdjuk

pizdjuk commented Mar 27, 2021

Hm?

There's nothing evil about Russians, but if you want to figure out who is signing each letter, you don't need to do anything but quickly look at the list of names. It's super-obvious. Or you can even do something as simple as what /g/ did and just grep the original letter. Many of the affiliations are there. I don't know why the counterletter doesn't list affiliations, but almost all of the names are obviously Russian, at least initially.

There has been some outreach outside Russia now, but the counterletter is still mostly Russia.

Not Russia, but Russian-speaking. There is a big difference.

@kchanqvq
Contributor

kchanqvq commented Apr 1, 2021

I stand for clarity. All signatures we receive have a public source: either it's a comment on a public issue, or a PR, or an email that we publish to #3105. rms-open-letter does not publish the sources of bulk signatures.

I'm maintaining a (best-effort) list of the GitHub accounts that signed the against letter here: https://github.com/BlueFlo0d/fashy-detector/blob/main/github-users.txt

@KOLANICH
Contributor Author

KOLANICH commented Apr 1, 2021

@BlueFlo0d, this list may be completely useless. Why? Because the only reliable info on the anti-RMS letter is what can be obtained from GitHub. The rest of the info mined from the list may be complete bullshit and misinformation, and we shouldn't trust it or relay it. In fact, it contained quite a few obvious (to those who know Russian) troll names of non-existent people, such as, literally, "a girl with penis" from "asshole-of-a-homosexualist-labs" and Vlad(imir)Len(in)a (yes, some people really were named in honour of Lenin) daughter-of-sucking-(grammatically a gerund)-of-dicks. The latter is a patronymic (or, in extremely rare circumstances, a matronymic), not a surname; in Russia the respectful way to address a person, especially one who is older or has a higher status in the situation, is <first name> <patronymic>. This at least shows that they haven't done basic checks when accepting signatures. Even if the anti-RMS letter maintainers recorded the signatures faithfully, the fact that anyone can send bullshit there without any difficulty means that we cannot even estimate the amount of bullshit there: it may be 1 record, or it may be the whole base of signatories accepted via email.

@kchanqvq
Contributor

kchanqvq commented Apr 1, 2021

@BlueFlo0d, this list may be completely useless. Why? Because the only reliable info on the anti-RMS letter is what can be obtained from GitHub. The rest of the info mined from the list may be complete bullshit and misinformation, and we shouldn't trust it or relay it. In fact, it contained quite a few obvious (to those who know Russian) troll names of non-existent people, such as, literally, "a girl with penis" from "asshole-of-a-homosexualist-labs" and Vlad(imir)Len(in)a (yes, some people really were named in honour of Lenin) daughter-of-sucking-(grammatically a gerund)-of-dicks. The latter is a patronymic (or, in extremely rare circumstances, a matronymic), not a surname; in Russia the respectful way to address a person, especially one who is older or has a higher status in the situation, is <first name> <patronymic>. This at least shows that they haven't done basic checks when accepting signatures. Even if the anti-RMS letter maintainers recorded the signatures faithfully, the fact that anyone can send bullshit there without any difficulty means that we cannot even estimate the amount of bullshit there: it may be 1 record, or it may be the whole base of signatories accepted via email.

Yes, I'm aware of that, so I'm retrieving information only from the commit history and repository metadata.
Currently no name is ever used.

The name may be used to improve the confidence level of the mined GitHub IDs, though. We know that a name can be nonsense, but if it's the same name as the GitHub account profile shows (and the account has sufficient activity), then it's a good indicator that we have a genuine entry.

@kchanqvq
Contributor

kchanqvq commented Apr 1, 2021

Also, I only aim for a snapshot of the data from before they closed GitHub PRs and switched to email completely.
I agree that newly added entries may contain little to no information.

@jordigh

This comment was marked as abuse.

@shenlebantongying

This comment has been minimized.

@jordigh

This comment was marked as abuse.

@jordigh

This comment was marked as abuse.

@nukeop
Member

nukeop commented Apr 1, 2021

Why does naming them always cause them to lash out?

@jordigh

This comment was marked as abuse.

@KOLANICH
Contributor Author

KOLANICH commented Apr 2, 2021

@nukeop, I don't think he said anything that deserves blocking. I think he should be unblocked.

@jordigh,

Why is the counterletter more reliable?

  1. at least our transparency already makes it more reliable. One can check that the people have really sent PRs to us

  2. All you have to do is create a Github account.

IMHO only votes of people with FOSS contributions should be counted. Making a PR that is merged into real projects is not easy. Making more than 1 merged PR is even more difficult.

That's pretty easy to do and easy to fake too.

Not very easy. Even registering an email address without giving a phone number is not easy in 2021.

And yeah, some mistakes happen, like someone who didn't know Russian didn't realise that the Russian name was a troll.

It is irrelevant. The name just makes it obvious that the anti-RMS letter accepts "signatures" without enough checks. This automatically means that all signatures accepted this way are not valid, at least until the contrary is proved.

I mean, it's not a big surprise that Russia really hates the original letter

You shouldn't speak on behalf of the whole of Russia. Most of the population of the Russian Federation and Russia just don't give a f**k about what is happening with Stallman. They don't even know who Stallman is.

so of course Russia is trolling it and then feeling so happy that Russia managed to get mistakes in before they were noticed.

It shows that the anti-RMS letter doesn't have sufficient checks. If opponents of the anti-RMS letter have managed to send some obviously and explicitly fake signatures without being noticed, how many could the proponents of RMS's removal, who are directly interested in making the numbers as high as possible, have sent?

@nukeop
Member

nukeop commented Apr 2, 2021

@nukeop, I don't think he said anything that deserves blocking. I think he should be unblocked.

He only came here to troll, not participate in the discussion in a meaningful way, so it's just noise.

@6r1d
Member

6r1d commented Apr 2, 2021

@KOLANICH, while I agree with your arguments, I don't think it's time to talk to people who come and shout their point of view in our ears. I already had some spam in many places, was told to avoid some, etc.
If someone comes, listens to our arguments and thinks, it's one thing, but that's what generally leads to people helping us or staying neutral, does it not?

@KOLANICH
Contributor Author

KOLANICH commented Apr 2, 2021

Even if he has an anti-Stallman point, it doesn't mean that he should be blocked.


14 participants