How Git shows the patriarchal nature of the software industry

Git is a Distributed Source Control Manager (DSCM), a piece of software that enables software developers to keep track of the code changes in a project progressively over time without relying on a centralised database. It was created by Linus Torvalds, more famous for his other major creation, Linux. It was a novel Open Source alternative to the existing SCMs, allowing developers from all over the world to collaborate in a new, dynamic fashion, even when they were unable to access the internet. As a result, it and other DSCMs have become the dominant tools of this nature in Open Source projects.

Every time a developer wants to share her bug fix or improvement with others in her team, she makes a commit, which is in effect a description of the differences between it and the last commit. This commit is labelled with a short comment describing in plain English the changes made, for the benefit of the other developers. It is also labelled with the name and email address of the developer who made the commit.

This is where we start running into problems. If you ever want to change your name on your commits, say because you have married, divorced or chosen a name more congruent with your gender identity, you have to make what is called a “destructive action”.

To try and put it simply, the author of a commit is tied into the identity of the commit itself. If you change the author, it’s treated as an entirely new commit. Anyone who has grabbed a copy of your original commit and made subsequent changes on top of it finds themselves orphaned from the history of the project. To use a crude analogy, it’s like ripping the trunk out of a tree while the branches are magically left hanging in the air, connected to nothing and isolated.
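A minimal sketch of why this happens, assuming a repository that uses Git’s default SHA-1 object format: a commit’s ID is a hash over its contents, and the author line is part of those contents. The names, email addresses and timestamp below are made up for illustration, and `commit_id` is a helper written for this example, not a Git API.

```python
import hashlib


def commit_id(tree, parent, author, committer, message):
    """Return the SHA-1 object ID Git would assign to a commit with these fields."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")
    # Git hashes a small header ("commit <size>\0") followed by the body.
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical values; the tree ID is Git's well-known empty tree.
TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
WHEN = "1348876800 +0000"

old = commit_id(TREE, None, f"Old Name <old@example.com> {WHEN}",
                f"Old Name <old@example.com> {WHEN}", "Fix bug")
new = commit_id(TREE, None, f"New Name <new@example.com> {WHEN}",
                f"New Name <new@example.com> {WHEN}", "Fix bug")

print(old)
print(new)
print(old != new)
```

The two IDs differ even though the tree, timestamp and message are identical, which is why a corrected author name produces what Git regards as a different commit.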

In practice this would be almost impossible for you to do. Even if you controlled the project, your contributors would have to rewrite all the changes they had made, potentially taking hundreds of hours depending on the size of the project.

Of course this problem has never been significant for the vast majority of the software developer community. Even in today’s day and age, the majority of people who get a name change are going to be married/divorced women. It has probably never crossed the minds of the creators of Git that this could be such a huge problem to a non-cis or non-male person.

There are a few workarounds to mitigate this problem, the most obvious being to grin and bear it. Unfortunately this isn’t an option for many trans* software developers, who open themselves up to discrimination if they are forced to reveal their trans* status by disclosing their assigned-at-birth name. Women who have gone through divorce may find their surname from their marriage to be upsetting and want to disassociate themselves from it.

Another option is to use a handle/pseudonym. This is also fraught with problems. Even if you are able to predict that you will one day need it, if you are developing in a professional capacity this can come across as unprofessional to many current and future employers.

This has personally affected me. I am now having to distance myself from a large and popular Open Source project that I co-founded and which was at least partly responsible for me getting my current job. Just another example of how a seemingly small, normative assumption can have profound negative effects on minorities.

  • thought

    I can certainly understand why people want to change online records of themselves. I have some very personal stuff on myself up online, not sure if it’s more or less of an issue than being trans (being the thing that I am, and not being trans, I wouldn’t try to compare – they are orthogonal and it wouldn’t be fair to either side to try to contrast them). Regardless, there is no way I can remove that information from the net.

    This isn’t an issue with git or the software industry or with patriarchy though. It’s that information on the internet, once uploaded, can’t be easily modified in all its copies. If someone takes an embarrassing picture of themselves and posts it, there’s generally no way to “remove it from the web”. Worse, trying to remove all the copies can lead to the opposite result, that’s the Streisand effect. Likewise copyrighted material, stuff the government wants to censor, and yes, also names in git commits.

    (Git does let you locally rewrite history. The issue, as you say, is with updating all the other copies on the net. But again, that’s the usual problem of removing all copies of information once it’s posted publicly.)

    Whether we want it to be or not, that’s the world we live in – stuff posted online becomes part of the immutable public record. It can be more troubling for some people (and again, I feel this personally myself), but it’s a fact. Blaming git or the software industry or patriarchy won’t help. The only way it *could* be possible to remove all copies of information that was ever posted is if somebody had that kind of power. Would any of us want to live in a world where a government or a corporation could push a button and remove all memory of something?

    • Megan

      Resigning yourself to it because “it’s the internet” is not an option for many trans* people. It is entirely legal in many first world countries to discriminate on the basis of gender identity, so insulating your past from your professional portfolio is sometimes key to whether you eat or have a home.

      Trying to make this into a wider issue about removing information from the internet is completely disingenuous. It is a correction, not a deletion.

      • thought

        I was hoping for honest discussion here, but since you’re going on the attack (calling me “completely disingenuous”) – I am out.

        Regardless, good luck, as I said before I have a personal understanding of the situation and can sympathize.

      • http://gravatar.com/decklin Decklin

        Unfortunately, I think it really does boil down to a problem of removing information from the internet. Ignore the design of git or the technical details of editing history for the moment.

        At some point, if you’re recording and publishing source code changes, and Alice implements feature X, you’re going to publish the fact that Alice added code to do X at time T. If at a later point, Alice’s name changes to Bob, and you publish the fact that Bob wrote code to do X at time T, you have told (or at least strongly implied to) everyone that “Bob” is equivalent to “Alice”.

        I’m not sure what you want to remediate this. A client that automatically/silently replaces altered commits when pulling? Won’t stop a malicious adversary from keeping good backups and noticing the information leak. What if someone has a public mirror and doesn’t automatically update it? (GitHub has a helpful button to create those…)

        The problem is the same if you have a model that stores changes instead of contents, like Darcs. Even if the “Bob” patch is morally equivalent to the “Alice” patch (unlike in Git), no one else is required to forget that the “Alice” patch existed, even if their client helpfully garbage-collects it for them.

        • Megan

          Sign commits with a digital key->have a centralised database mapping signatures to an identity->only pull down that identity information live->no identity leaks revealed to future employers.

          A malicious human being is always going to be able to subvert this if they knew you in your previous identity, there is little technology can do to fix someone’s memory. However that doesn’t mean we shouldn’t try to change what technology can change. Especially with the repercussions of throwing this in the “too hard” basket. We’re all busy talking about the technology here, don’t forget the people behind the technology.

          • http://gravatar.com/decklin Decklin

            A centralized database would do more harm than help (see my other comment).

            You are essentially asking people to give up distributed, free-as-in-freedom, proven version control (replace it with something that must “phone home” and use a different object model) in order to put a small road block in the way of something that can “always be subverted anyway” (i.e. a non-solution). I sympathize with your motivations, but do you see why this won’t fly with the intended audience?

            I agree that this is not really a technical issue. There are plenty of other careers where reputation and social capital are more important than a list of qualifications, and changing one’s identity is equally hard there.

            • Megan

              It’s not a non-solution. In the majority of situations, if you’re presenting your portfolio of work through Github or something similar, malicious people have no chance to reveal your identity. All anyone would see would be the commits you’ve done under the name you’ve attached to them.

            • vasi

              I think you’re making a jump too far. The “centralized database” isn’t a centralized repository like SVN or CVS. It’s just a way of connecting an ID to a person, the actual repository is still distributed. Think of it like WHOIS or a GPG key-server.

              All we’d be doing is changing what metadata attaches to each git commit. Right now a commit says “This change was made by a person who claims to be named Jane Doe, and claims to have the email address jane AT example DOT com.” Instead, it should just say “This change was made by a person who holds the private key with which it is signed.” Just imagine that git worked exactly as it did now, but names and emails are replaced with an opaque identifier, would that really be “asking people to give up distributed…version control”?

              It’s not even strictly speaking necessary that the ID Database be centralized. There could perhaps be multiple such databases, maybe each project would host one. Or maybe a distributed hash table could be used. Some committers might prefer to remain pseudonymous and not exist in any such database.
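              As a rough illustration of the indirection described in this comment (a sketch under assumptions, not how Git works today): if the hashed commit data carried only an opaque author identifier, the display name could live in a separate, mutable lookup. The `Commit` class, the `directory` mapping and the `key-1f3a` identifier are all hypothetical, and no signature checking is modelled here.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    author_id: str  # opaque identifier, e.g. a key fingerprint (hypothetical)
    message: str

    def object_id(self):
        # Only the opaque ID is hashed; no name or email is involved.
        payload = f"author-id {self.author_id}\n\n{self.message}\n"
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()


# Mutable directory kept outside the hashed commit data.
directory = {"key-1f3a": "Jane Doe"}

commit = Commit(author_id="key-1f3a", message="Fix bug")
id_before = commit.object_id()

directory["key-1f3a"] = "J. Smith"  # the name changes; the commit does not
id_after = commit.object_id()

print(directory[commit.author_id])
print(id_before == id_after)
```

              Whether the directory is one central service, one per project, or something distributed is left open here, as it is in the comment.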

              • http://twitter.com/maradydd Meredith L Patterson (@maradydd)

                Okay, but keyservers don’t function as a mapping from (keyserver-managed) identifiers to keys. A public key includes a (name, email) pair as the user-id along with the modulus and public exponent, and that’s stored on the keyserver. Keys can of course be updated on the server (adding or removing a uid, for instance), but as with a DVCS, there’s not much you can do about people who have and preserve an earlier copy of a key.

              • Decklin

                For our purposes here, something schlepped around with the repo itself, but mutable (like refs) would probably work. `cat .git/authors/$SOME_SHA1`, get name+email+pubkey.

                I just think all it buys is removing some denormalization from the object database; as I have argued elsewhere in the thread, I don’t think it prevents someone from being outed or discriminated against.

          • http://gravatar.com/djnemec djnemec

            If you want something like that, switch to a centralized source control like SVN. By design, git is *decentralized* so there’s no way to put a centralized mapping database into it.

            • http://twitter.com/jmtd Jon Dowland (@jmtd)

              How would you address this problem with SVN?

          • http://netcrusher88.wordpress.com/ Joe

            You know that what you’re talking about is using pseudonyms, right? All you’re adding to what already exists in git is a trusted third party to resolve pseudonyms to real names, which… well, I suppose you could use an email account? Use an email as a pseudonym and send emails from it to prove who you are.

          • http://gravatar.com/zacharyalexstern Zachary

            But what about this as a solution:

            Public/private key signing.

            Distribute the public keys of all committers as part of the repository, e.g. in the .git repository.

            Then, create a git command of some sort that lets you go back and rename your old commits, but make that command only work if you have the appropriate private key, the one that matches the public key those commits were originally signed with.

            There you go. It’s still decentralized, and trans* people can safely go in and change the names on their old commits.

            I wish I had the software development chops to implement it myself :(

            • http://twitter.com/DaveWilkinsonII wilkie (@DaveWilkinsonII)

              Local repositories that other people have would have to be updated as well. This is noticeable and sometimes unexpected. What that means is that there is undue ceremony about the process. Since this is true as a social condition mostly for non-cis-males, this is an exclusionary and non-intersectional solution.

              What you want is a solution that understands the changing of a name as something that may happen, to anybody, equally, at any time, and that this change is about associating one’s past actions with one’s present self without the need for agreement or acknowledgement of everybody involved in any way.

              I certainly think key signing is necessary, and this is a good head-start. With a distributed system, indirection is probably the better means of coming to a compromise, though. My other comment describes such a thing. It differs from yours since it comes with a setup cost that you perform once and then a tiny, insignificant cost to change your name, as opposed to an arbitrary cost to change all instances of the name at the time you change it.

            • Frank Ch. Eigler

              But public keys are themselves associated with names. Then *that* association can be monitored for changes to lose privacy in name changes.

              • http://twitter.com/DaveWilkinsonII wilkie (@DaveWilkinsonII)

                Yeah. That’s true. Such a change could be monitored like that, yes. Read my other responses.

                What is interesting is that such a person would be able to say ‘Hey! That person changed their name, everybody!’ but have no verifiable proof. The burden would be on them to convince somebody else that such a thing happened. Everybody else in the world would go to the identity service and see one name and no history about the change. It’d be essentially conspiracy logic. They’d have evidence that is as good as fabricated.

                Since it deals with human memory at the end of the day, this is very very likely the best we can do.

  • http://interi.org maiki

    Could you please clarify something for me? I think I am missing a nuance, and I want to make sure I understand.

    Is the issue a matter of continuity? As in, if you change your name, and start submitting new commits with it, the history of the project will then have a record of a name change?

    I don’t know much about how git handles the id. I presume that one may change eir name or e-mail address and it sees that person as separate from the prior id.

    It is an interesting problem, and a lot of the language surrounding identity in a project points to how people use it. For instance, there is “blame”, which seems negative to me.

    My first thought was to keep an accurate record, so that how each person identified as they were committing is what stays in the log, but I think that is what you are talking about. Is it a matter of those tell-tale signs that can cause problems for folks who present a different id presently?

    Thanks for bringing this up, I read about it on the GF blog. I am very sensitive to gender fields in software, so this really hit me hard, and if it is a systemic issue I want to make sure as many people as possible are aware of it as we continue to develop software. ^_^

    • Megan

      If you simply start using your new name, the past work you have done is still recorded with your old name. Presenting it as your work is going to reveal your old identity to those who see it, potentially having catastrophic effects.

    • http://gravatar.com/djnemec djnemec

      The authors of the “blame” command know full well its negative connotations. Its original use is for exactly that — if a given commit breaks the build, you want to know who to “blame” for it. Of course it’s also in there somewhat sarcastically. I don’t think the developers intend for the person being blamed to be punished in some way, it’s just how the language evolved.

  • http://gravatar.com/empact Ben Woosley

    Is it valid to criticize a system for faithfully representing the past? Should we aspire to erase all records of our past selves, or work toward a society which respects past and present selves both?

    Found your post via: http://geekfeminism.org/2012/09/29/quick-hit-how-git-shows-the-patriarchal-nature-of-the-software-industry/

    • Megan

      When that past is not accurate or dangerous for its owner, yes I think it is valid to criticise it. Especially when such little thought has been put into the repercussions of the rigid adherence to the “true history”

      • http://matthewjudebrown.com Matthew Jude Brown

        Practically speaking, Ben, it’s not an either/or. Trans people are of course working towards a society that is accepting of what they are, and won’t freak out when it’s discovered someone changed their gender presentation.

        At the same time, they have to live in the here-and-now. Society doesn’t change overnight for the wanting; it changes generationally. Campaign all you like for societal change, but you won’t get it in time to fix today.

        There’s also the question, here, of what’s the actual history. The changes! Not the then-correct contact details of the person who made the change! What good does it do to encode in an unchangeable form details that may, now, be plain WRONG? Surely the point of user information in a DSCM is (a) to tie all changes by the same person together, and (b) to be able to identify the author and contact them, if you wish to and they wish to be contactable.

        Does it do any good to have anyone’s changes associated with a name they no longer use or an email address they no longer have?

    • http://gravatar.com/kalital Kali Tal

      I think it depends on what purpose is served by “faithfully representing the past.” For example, when I changed my name in the U.S., all my official documents were also changed (passport and birth certificate), and there was no representation of my previous name on either document. But I don’t think you can accuse my U.S. documents of “falsifying the past.” I had to swear that I was not changing my name for fraudulent reasons, publish a notice of the intent to change my name in a local newspaper, and go before a judge. But once having done so, my identity of the present was retroactively determined to be my identity of the past — my identity was continuous; only my name had changed.

      The point of this is not to erase all records of oneself, but to retain them, in usable format. In the current system, the past of a person with a name change is effectively erased. When a trans person has to abandon years of work after a name change, that’s hardly a “faithful representation” of their past. I think we can agree that it’s the work that’s important, and that — as long as identity is continuous — the name is simply a marker of convenience. You create a dichotomy where none exists. Changing the name system is part of working towards a society that respects our past and present selves.

  • http://gravatar.com/djnemec djnemec

    There are very, very strong technical reasons behind the identity of commits. The identity of a commit is built through a mathematical function that turns your commit (or more accurately, the bytes of the commit files stored on your hard drive) into a globally unique identifier. This way, even if someone else made the exact same edits to the exact same previous commit it’s easy to tell the two apart. It’s mostly a security mechanism.

    http://www.sbf5.com/~cduan/technical/git/git-1.shtml#commit-objects

    Git makes it very, very clear that there is no “one truth” repository that everyone syncs up with and that makes it virtually impossible to change “history” once it’s been made. This is by design.

    If a solution were created that allowed authors under different names to be identified as the same author, how would one stop people from “stealing” the commits of others by migrating commits from a contributor to themselves? There is no governing body in Git that can manage the mappings.

    It’s about as futile as publishing a book under one name and then coming back years later after changing your name and expecting everyone’s copies to magically change to your new name.

    • Megan

      Could this not be mitigated by changing the name/email of an author/committer to a piece of pure metadata, not tied to the identity of the commit, and instead using a digital signature like a PGP key? This isn’t a new idea and is ultimately more secure.

      • Zygo

        You still have to provide a verifiable audit trail of identity changes. It’s basically the same problem, even worse: somewhere, you’d have to wrap a SHA1 signature around “Jane Smith became John Smith in 2008” and publish it with the git repo.

        • Megan

          No, you’re not understanding the solution. Making the name/email easily changeable on the commit without changing the hash would not require an audit trail. If someone changes the name on a commit to try and claim it as their own, you can easily verify it with the digital signature.

        • Zygo

          Oh never mind, I just figured out how to solve that problem with public key IDs. It’s not necessary to hash a transition message, since anyone who knows the private key will be able to prove that their _current_ ID is the same as the one recorded in Git.

          It would still be possible to observe the Jane to John transition, but you’d have to be watching for PGP key ID changes. On the other hand, I’d expect open source projects to be doing nothing less, in case someone’s private key gets stolen and the thief tries to impersonate someone privileged. Also, everyone involved in the project would be aggressively and persistently caching the ID strings if they want a gitk window to open in less than an hour.

          There’s no change required to Git plumbing to make it work. The existing email field could be used to store the key ID and the address of a HKP server, and porcelain could fetch the public key data for you when you browse commits.

          I’m not sure if all that’s worth the cost of implementing it. The little privacy it affords could be circumvented pretty easily, and it would make Git slower for everyone. Pseudonymous or anonymous email ID’s are a much better solution to this problem, even with all the warts. These would also solve related problems like the email addresses and names embedded in mailing list archives.
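          To make the suggestion above concrete, here is a small sketch of how porcelain might read a key reference out of the existing email field. The `FINGERPRINT@keyserver` layout is an assumption based on this comment, not an existing Git convention; `key_reference`, the fingerprint and `keys.example.org` are invented for the example, and no key is actually fetched.

```python
import re

# Shape of a Git author line: "Name <email> <unix-time> <timezone>".
AUTHOR_LINE = re.compile(
    r"^(?P<name>.*) <(?P<email>[^>]*)> (?P<when>\d+) (?P<tz>[+-]\d{4})$"
)


def key_reference(author_line):
    """Split an author line whose email field holds 'FINGERPRINT@keyserver'."""
    match = AUTHOR_LINE.match(author_line)
    if match is None:
        raise ValueError("not a Git author line")
    # Everything before the first '@' is treated as the key fingerprint.
    fingerprint, _, server = match.group("email").partition("@")
    return fingerprint, server


FINGERPRINT = "0123456789ABCDEF0123456789ABCDEF01234567"  # made-up value
line = f"placeholder <{FINGERPRINT}@keys.example.org> 1348876800 +0000"

print(key_reference(line))
```

          A real implementation would still have to fetch and cache the key, which is where the speed and privacy costs mentioned above come in.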

      • http://gravatar.com/decklin Decklin

        Where does this meta-metadata go? You could, I suppose, not store the author in the commit (so, hash of tree+parents+time only) and use something like git notes or signed tags to attach a name. (Not exactly like notes/tags, but this is no longer exactly git either…)

        Okay, but now what happens if you change your name and someone else has notes identifying your old name as the author of commit X (you can’t force them to delete their old files) and notes identifying your new name as author of commit X? If connecting you to your old name is a problem, this doesn’t help and might even make it easier for someone to “out” you.

        • Megan

          Unfortunately it would probably need a centralised identity database. You don’t really need to know a person’s name/email unless you need to contact them about a commit, which usually requires an internet connection these days. But it’s not really my job to decide on a solution, I’m just making a less than formal bug report here. Trying to promote discussion on the underlying assumptions and issues, not argue about the design of Git.

          • http://gravatar.com/decklin Decklin

            But a centralized identity database would connect your old name to your new name. All someone would have to do for “lulz” is poll it once in a while and see when one name points to an ID that a different name used to point to.
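            A sketch of the polling described here, assuming a hypothetical directory that maps opaque IDs to display names and that anyone can read: comparing two snapshots taken at different times is enough to link an old name to a new one. `detect_renames` and the sample data are invented for illustration.

```python
def detect_renames(earlier, later):
    """Compare two snapshots of an ID -> name directory and report changes."""
    return {
        key: (earlier[key], later[key])
        for key in earlier.keys() & later.keys()
        if earlier[key] != later[key]
    }


# Two snapshots of the same hypothetical directory, taken days apart.
snapshot_monday = {"key-1f3a": "Jane Doe", "key-77c0": "Sam Lee"}
snapshot_friday = {"key-1f3a": "J. Smith", "key-77c0": "Sam Lee"}

print(detect_renames(snapshot_monday, snapshot_friday))
# {'key-1f3a': ('Jane Doe', 'J. Smith')}
```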

            • Megan

              No it wouldn’t, see my other reply to you.

        • Zygo

          If you put a GPG key fingerprint into the email address field, then you could try to get the key from any HKP keyserver. Protocols and servers for that already exist. You could suggest a HKP server after the ‘@’ sign in the address part. The string that Git would display would be found in the latest self-signed non-revoked signature on the key. Git could also integrate PKI tools so you don’t have to go download GnuPG and learn how to use it before you can use Git.

          You can’t delete a PGP key signature from a HKP server, only sign a revocation certificate and upload it, so the old name is still going to be there to find somewhere (like in the HKP server’s data stream to its mirror servers). There will be cryptographic signatures outside of Git explicitly tying the old and new names together (unless you abandon the old ID and start using a new one, which you can do right now with ‘git config’). You could avoid this by running your own HKP server where you can delete any signature you like, but that’s a pretty big burden for most users. It requires the same predictive foresight as an ordinary pseudonym, and fails if anyone uploads your key to a HKP server you don’t control–which is exactly what PGP tools do automatically from time to time. The project could help there–if they’ve got Git hosting, HKP hosting might not be a big additional cost.

          Also, what happens to your online Git persona if you lose your private key, or someone steals it? Those are new bad things that Git doesn’t have now.

          Moving author identity outside of Git is one option, but it’s expensive, brittle, and doesn’t solve the outing problem. You could remove author identity entirely (which you can do right now if you have all contributors ‘git config’ the same address or use a random address for each commit), but that would break Git for projects that rely on being able to recognize distinct contributors. You could weaken author identity integrity to the point where it can be changed silently and retroactively, which is just as bad if not much worse. You could use pseudonyms which identify you only in the context of a project and cannot (trivially) be correlated with personal information. You could use bare hashes as author ID’s, but that’s a special case of project-specific pseudonyms. You could steganographically encode your identity for each commit, so you can later choose to reveal which commits are yours, but I fail to see the relevant use case.

          I think that’s an exhaustive list of options. I’d love to be wrong.

          This is a political problem, not a technical one, so it’s no surprise that solutions based on changing the software seem to suck.

  • http://twitter.com/galtenberg C Galtenberg (@galtenberg)

    This is simply muddled thinking. You have an issue with names, and names being applied to work. If you published books for a living, you’d take your frustration out on the idea of copyright, which would equally be a red herring.

    Scapegoating git is just attention trolling. If you want git commit name changes or PGP identity as a feature, open a pull request and work your case.

    • Megan

      This isn’t analogous to book publishing, digital data is easily changed and corrected.

      Drawing attention to normative assumptions made by a majority cis male culture is important in changing that culture so it isn’t actively harmful to minorities, whether they’re cis women or trans* people. I’m addressing the naive thinking behind the initial decision that exists widely in the culture, not scapegoating Git directly. It is just a good example of it.

      • Zygo

        The essential purpose of Git’s plumbing is to make changing digital data as hard as possible. Git was designed to provide the kind of clear, reliable, robust, distributed, and irrepudiable audit trail of “who committed what, when” that could be used in court to defend against the next SCO lawsuit. I doubt it’s naive thinking–they probably considered the idea of malleable identity at some point, and explicitly rejected it as counterproductive to mission critical goals.

        • Zygo

          There is a Git feature called ‘filter-branch’ which also arose as a result of the SCO lawsuit. It can remove a copyrighted file or change a name in minutes. It is a figurative nuclear weapon, designed to be deployed on the losing end of a court order when the alternative is to destroy the project completely. The project also has to remember to never merge a branch from a repo of the old project, since that would import all the old history with the wrong names or evil files.

  • http://twitter.com/galtenberg C Galtenberg (@galtenberg)

    So a scheme for attribution built for people who need to completely disavow past identities, thereby effectively canceling attribution – that’s what should become the norm, is what I’m hearing you say.

    • Megan

      I don’t see how that would cancel attribution.

  • http://gravatar.com/decklin Decklin

    I seem to be unable to continue other threads (maybe they are nested too deeply).

    If your only concern is putting your own work on GitHub, then why don’t you just rewrite all the old commits to correct your name?

    • Megan

      Because as I have said in the post, that destructively changes the repository and anyone who contributes would have to patch their changes on to my new version of the repository. For the size of the projects I have contributed to, that is effectively impossible. The small personal projects, I have changed using `git filter-branch` but they are a tiny part of the entire body of my work.

      • http://gravatar.com/decklin Decklin

        I think your characterization of rewriting as “destructive” in the post is overstating it. Your collaborators will have to `reset` their tracking branches and `rebase` any unpublished topic branches; no patch IDs change so the rebase knows what it doesn’t have to include. That’s it.

        If I had a magic box that put everyone who had ever cloned your repo to sleep, logged into all of their machines, ran those commands, then `gc`d the old objects (with your old name on them) away, updated their backup tapes, woke them up again, and erased their memory of the incident, would that accomplish what you want? If you are proposing a technical solution, how does it differ in effect from this? How does a technical solution address the problem of having to distance yourself from a particular open source project? If you want a non-technical solution, why are we talking about particular git commands, patch IDs, etc?

        • Megan

          “I think your characterization of rewriting as ‘destructive’ in the post is overstating it.”

          No, no it isn’t. `git filter-branch` changes the hash of every single commit in the entire repository. It is essentially an entirely new repository.
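          A simplified sketch of why a rewrite ripples forward, assuming only that each commit’s ID covers its parent’s ID. The record format below is deliberately cut down and is not Git’s real one; `simplified_commit_id` and `build_history` are helpers made up for this example.

```python
import hashlib


def simplified_commit_id(parent, author, message):
    """Hash a cut-down commit record that names its parent, as Git commits do."""
    record = f"parent {parent}\nauthor {author}\n\n{message}\n".encode("utf-8")
    return hashlib.sha1(record).hexdigest()


def build_history(authors):
    """Return the IDs of a linear history, oldest commit first."""
    ids, parent = [], "none"
    for number, author in enumerate(authors):
        parent = simplified_commit_id(parent, author, f"change {number}")
        ids.append(parent)
    return ids


before = build_history(["Old Name", "Contributor A", "Contributor B"])
first_renamed = build_history(["New Name", "Contributor A", "Contributor B"])
last_renamed = build_history(["Old Name", "Contributor A", "New Name"])

changed = [a != b for a, b in zip(before, first_renamed)]
last_only = [a != b for a, b in zip(before, last_renamed)]
print(changed)    # [True, True, True]
print(last_only)  # [False, False, True]
```

          Renaming the author of the oldest commit changes every later ID, while renaming only the newest leaves the earlier ones untouched. For someone who co-founded a project, the first case is the relevant one.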

          • http://gravatar.com/decklin Decklin

            If you really want to be pedantic, it creates new objects, then changes a ref. Objects are immutable.

            Which is the crux of the issue, right? In the design of Git, objects must be immutable. Therefore, publishing them is sending information out into the world which can’t be taken back. Including author information in commit objects is isomorphic to *not* including author information in commits but also publishing information that links authors and objects (via IDs of the authors, or IDs of the objects). Even if the linkage information is not permanent, the objects still are. So, there is no way to completely redact authorship information.

            Are you saying objects should be mutable, or is there something else in that argument that you don’t think follows?

      • Joel Salomon

        Is there a filter-branch operation that will also change names within files; e.g., for embedded copyright notices or AUTHORS files? (“I’m impressed with this project you wrote, Alice; but why does the copyright read Bob?”) But not to let the perfect be the enemy of the good…

        Dumping to git-fast-import format, running search-and-replace Bob to Alice, may be a surer thing. (And you’ll need to adjust blobs from using length markers to the delimited format, or use ESR’s reposurgeon.) And this definitely touches more than metadata and can only be done on repos you own. But if you’re rewriting history, do it right.

        • Joel Salomon

          Cancel that bit about reposurgeon; it (currently, at least) only handles metadata.

  • http://creativepony.com/ Bluebie

    After thinking about this for a while, I agree with the direction of this article – git is in the wrong, and should be fixed at a software level to be fair to all individuals. How?

    Change the field name from ‘user.name’ to ‘user.pseudonym’, patch software to alias them across, add warning message indicating deprecation. If people want to use their real names, they can, but people shouldn’t feel obligated to do so where not otherwise required to by contract or law.

    The whole issue here is that git software encourages people to commit their present day identities in to immutable records for no clear purpose, and that arbitrary design decision harms a minority group. It’s fair to say these groups can’t reasonably foresee the need for a name change and that this is a software bug in conflict with humanity. For technological and cultural reasons git requires immutable identity. It shouldn’t encourage people to use forms of identity which are harmful to minorities.

    • vasi

      Props for thinking for a while! My first reaction to the article was “OMG WHAT ABOUT THE HISTORIES?!?” but after sitting down and thinking for a bit, it felt obvious that putting Real Name in commits isn’t any more necessary than Favourite Frozen Dessert.

      I think it might be possible to use git hooks and commit signing, maybe together with existing GPG keyservers, to implement an acceptable solution—at least for new projects that are interested in pseudonymous committers. If anybody feels like doing this, I’d be glad to help.

    • http://twitter.com/sethish Seth Woodworth (@sethish)

      This is the take I like the best. I think that it points the finger of blame on the culture around git instead of the technical implementation of immutable histories. I don’t think that user.name need actually be deprecated as much as the expectation be that people use genderless pseudonyms. But does that just take us back to the 4chan problem, “there are no girls on the internet”, assuming that everyone you interact with online is a white cis-gendered male?

  • spacekitteh

    Would tying git commits to OpenIDs fix this? (iirc openid names can be changed, right?)

  • http://twitter.com/DaveWilkinsonII wilkie (@DaveWilkinsonII)

    I’ve worked on distributed systems and applications that handle identities. There is nothing worse, technically and socially, than using a name as *direct* identification. Names change, for whatever reason.

    There is a solution that is better than what git provides, because git was indeed short-sighted about how its metadata would be used. The change in authorship *should* be destructive (it’s complete technical chaos if it isn’t), so we have to come up with a scheme that allows a write-once identifier that doesn’t need to change to reflect changes publicly seen through that identity.

    The theoretical best is indirection which allows self-hosting. Have a centralized server that holds your identity and public key. That way, you can place a signature with each commit and can verify using that key that it was indeed you that wrote the commit. Attach also your display name, which can change at leisure. Place the url for this information into the author field of the commit.

    The centralized nature of the identity server is necessary. It’s not a problem. People can fork, commit, push, etc to a repo without having to talk to it. You only need it in the case where you want authorship information or to verify the commit (such as the one time you are merging those commits to ensure they are authored by the given person.) People may cache it, and there is nothing you can do about that, except apply an expiry (which you need for the public key anyway!) where all identity data will be purged and reacquired the next time it is needed.

    Look at Webfinger http://code.google.com/p/webfinger/ for an example of this type of idea. It expects a handle to be used to give to others, but you can use a url for something like git, since it is a machine and not a person, and we can teach git how to safely get the user information for humans to read in a trivial manner, which shouldn’t add any UI or UX complexity.

    One problem is the lack of a service that allows you to easily create and maintain such hosted identities and the technical obstacle of deploying your own.
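
The indirection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an existing Git feature or a Webfinger client: `IdentityServer`, `CachingResolver`, the URL and the key string are all hypothetical, the server is an in-memory stand-in, and signature verification is left out entirely.

```python
import time
from dataclasses import dataclass


@dataclass
class Identity:
    display_name: str  # may change whenever the owner likes
    public_key: str    # placeholder string; real keys and signing are out of scope


class IdentityServer:
    """In-memory stand-in for the hosted identity service (hypothetical)."""

    def __init__(self):
        self._records = {}

    def publish(self, url, identity):
        self._records[url] = identity

    def fetch(self, url):
        return self._records[url]


class CachingResolver:
    """Resolves an author URL to an identity, caching it until an expiry."""

    def __init__(self, server, ttl_seconds, clock=time.monotonic):
        self._server = server
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # url -> (expires_at, identity)

    def resolve(self, url):
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and cached[0] > now:
            return cached[1]
        # Expired or never seen: purge and reacquire from the server.
        identity = self._server.fetch(url)
        self._cache[url] = (now + self._ttl, identity)
        return identity


# The commit stores only the stable URL, never the name itself.
server = IdentityServer()
author_url = "https://id.example.org/u/42"
server.publish(author_url, Identity("Old Name", "key-1"))

fake_now = [0.0]  # controllable clock so the expiry can be demonstrated
resolver = CachingResolver(server, ttl_seconds=60, clock=lambda: fake_now[0])

before = resolver.resolve(author_url).display_name
server.publish(author_url, Identity("New Name", "key-1"))
cached = resolver.resolve(author_url).display_name  # still served from the cache
fake_now[0] = 61.0
after = resolver.resolve(author_url).display_name   # cache expired, refetched
```

The commit's author field never changes, so no history is rewritten; the old name lingers only in caches until they expire.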

    • http://gravatar.com/decklin Decklin

      But this still introduces a publicly addressable resource that can be observed resolving to one identity before and a different identity now.

      • vasi

        You’re right that it’s not absolute protection. If someone is really determined to cache the names of every committer and continually compare them to the current value to check for changes, there’s no way to stop them.

        But I wouldn’t let the perfect be the enemy of the good. I think the vast majority of people in the vast majority of cases will not go to the extra effort to track name changes, for no appreciable benefit to themselves. This differs from now, where anyone who views the history could become aware of a name change, even if they don’t even realize it’s a possibility.

        • Decklin

          I will offer as one datapoint: I already use a plugin/wrapper that displays changelogs for all automatic (i.e. “normal”, fast-forward) ref updates. I highly suspect that if a mutable indirection was added to commit author info, something similar would come into use.

          Consider the prejudices of the typical Git user: they are cis and male and therefore the most common need for something like this would be changing an email address. They would think “oh, that’s useful, now I know his email is xxx@yyy.zzz from now on if I have to write him”. They wouldn’t think someone might *not* want to publicize an identity change.

      • http://twitter.com/DaveWilkinsonII wilkie (@DaveWilkinsonII)

        Agreed. There is no way to prevent this. Mostly because you are trying to counteract a memory, and you can’t know for sure if somebody read it, remembered it, and noticed the change. So that’s impossible to correct outside of science-fiction movies, right?

        The malicious case would be, however, predicting the name change, and then polling for the name change until it happens. Once it happens, however, there is no way, outside of finding old cached history and somehow destroying it, to determine it was changed. You’d have to proactively check repeatedly. That case is impossible to solve, but completely impractical. It’s like saying keypair encryption is bad because you could break it in theory with luck or even brute force. It’s a true statement, however, it doesn’t mean anything until somebody asserts how probable that would be.

        Another angle is to relate this to normal bureaucracy. To get a name change in a typical US state, you have to get the permission of a judge. The technical solution I’ve proposed is morally better as it does not require announcement of the change nor forces the recording of such history. (although, some people may record it offline, as I mentioned) In a court system, name changes are public record, require notary, and require forced announcement in 2 publications. That is really terrible. And that’s kind of the system git has in place currently. The solution I listed solves these moral issues without losing the worth of the author field. (it strengthens it by using signing)

  • http://jayferd.us https://github.com/jayferd

    $ git config --global user.name https://github.com/$your_username

    • http://gravatar.com/decklin Decklin

      I really don’t think anyone here is unaware of how to do that.

  • http://blog.piechotka.com.pl Maciej Piechotka

    Why not use .mailmap files (which are at least described on the git-blame manpage)? They look like a solution designed to deal with this problem. If some git tools don’t respect them, it looks like a bug in the tools, not a missing feature.

  • un1c0rn

    man git-shortlog, .mailmap section

    • Megan

      .mailmap only affects shortlog output, nothing else.

      • http://blog.piechotka.com.pl/ Maciej Piechotka

        It also affects git-blame, or at least it is included in its man page.

        In any case – the simplest solution might be to file bugs against other tools to use mailmap and the problem will be solved (it is IMHO a more elegant solution than changing history).
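
As a rough illustration of the display-time mapping being discussed, here is a Python sketch that applies a simplified subset of the `.mailmap` format. It is not how Git implements it: it matches on the commit email only, and the names, addresses and the `parse_mailmap` and `display_author` helpers are hypothetical.

```python
import re

# Simplified subset of .mailmap entries:
#   Proper Name <proper@email> Commit Name <commit@email>
#   Proper Name <commit@email>
ENTRY = re.compile(
    r"^(?P<name>[^<]*)<(?P<email1>[^>]+)>"
    r"(?:\s*(?P<old_name>[^<]*)<(?P<email2>[^>]+)>)?\s*$"
)


def parse_mailmap(text):
    """Return {commit_email: (proper_name, proper_email)}."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = ENTRY.match(line)
        if match is None:
            continue  # ignore anything outside the simplified subset
        proper_name = match["name"].strip()
        if match["email2"]:
            proper_email, commit_email = match["email1"], match["email2"]
        else:
            proper_email = commit_email = match["email1"]
        mapping[commit_email.lower()] = (proper_name, proper_email)
    return mapping


def display_author(mapping, name, email):
    """Map a stored author to the name a tool would show; commits are untouched."""
    proper_name, proper_email = mapping.get(email.lower(), (name, email))
    return "%s <%s>" % (proper_name or name, proper_email)


MAILMAP = """\
# old identity on the right, current identity on the left
New Name <new@example.com> Old Name <old@example.com>
"""

mapping = parse_mailmap(MAILMAP)
shown = display_author(mapping, "Old Name", "old@example.com")
unmapped = display_author(mapping, "Someone Else", "else@example.com")
```

Two limits are visible even in the sketch: the stored commits still carry the old name, and the mapping file itself has to list it.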

        • Megan

          Well I guess it’d be a start at least.

  • http://twitter.com/orodu Dana Jansens (@orodu)

    As someone who has come from a similar situation, and since you founded this unnamed open source project and should have some decent authority within the project still.. I hope that you’ll consider rewriting the git history to put your chosen name into all commits in the repository. Make people rebase onto a new tree.. it’s your history to maintain, and in time you might appreciate not orphaning your code.

  • http://kayateia.livejournal.com/141732.html Megan (another one :)

    I wrote a response to this on my LJ (linked in ‘website’). I don’t really have a technical solution for you, not that the Git developers would care if I did. :) But I think that this is really a part of a larger issue with people trying to assign unchanging identifiers (typically legal names) to people. Some people do it out of habit, but I think on a conscious level it’s a means of control.

    • Megan

      I really liked your comments about the larger issue on your LJ, it’s very much a fallacy that people like to hold dear that names are immutable.

  • John Bitme

    Not everything is patriarchy. No-one forced you to use your real name in commits, you made that choice, and now you have to deal with the consequences. No-one else can fix it for you.