Introducing Soft U2F, a software U2F authenticator for macOS


In an effort to increase the adoption of FIDO U2F second factor authentication, we're releasing Soft U2F—a software-based U2F authenticator for macOS.

Soft U2F currently works with Google Chrome and Opera's built-in U2F implementations, as well as with the U2F extensions for Safari and Firefox.

When a site loaded in a U2F-compatible browser attempts to register or authenticate with the software token, you'll see a notification asking you to accept or reject the request. You can experiment on Yubico's U2F demo site:

U2F authentication with Soft U2F

And as of today, you can download the Soft U2F installer and configure it for use with your GitHub account.

Interested in learning more? Head over to our Engineering Blog and read more about the motivations for the project plus the security considerations of hardware vs. software key storage.

Git LFS 2.2.0 released


Git LFS v2.2.0 is now available with the all-new git-lfs-migrate command, making it easier than ever to start using Git LFS in your repository.

For example, if you've tried to push a large file to GitHub without LFS, you might have seen the following error:

$ git push origin master
# ...
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File a.psd is 1.2 GB; this exceeds GitHub's file size limit of 100.00 MB
To github.com:ttaylorr/demo.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'git@github.com:ttaylorr/demo.git'

You can use the git lfs migrate info command to see which files are causing the push failure:

$ git lfs migrate info
*.psd   1.2 GB   27/27 files  100%

Using the information above, you can determine which files to pluck out of your history and store in LFS:

$ git lfs migrate import --include="*.psd"
migrate: Sorting commits: ..., done
migrate: Rewriting commits: 100% (810/810), done
  master        f18bb746d44e8ea5065fc779bb1acdf3cdae7ed8 -> 35b0fe0a7bf3ae6952ec9584895a7fb6ebcd498b
migrate: Updating refs: ..., done

$ git push origin
Git LFS: (1 of 1 files) 1.2 GB / 1.2 GB
# ...
To github.com:ttaylorr/demo.git
 * [new branch]      master -> master

You can also configure the 'import' command to migrate specific filetypes, branches, and more. For a detailed overview, take a look at the man page.
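
For instance, here are two hedged sketches of those options (the ref name and patterns are illustrative; consult the man page for the authoritative flag list):

# Convert only *.psd files, limiting the rewrite to history
# reachable from one branch:
$ git lfs migrate import --include="*.psd" --include-ref=refs/heads/master

# Convert several filetypes in one pass (comma-separated patterns):
$ git lfs migrate import --include="*.psd,*.mov"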


This was a quick look at the migrate command available today in Git LFS v2.2.0. For more on the full release, check out the release notes.

Git 2.13 has been released

The open source Git project has just released Git 2.13.0, with features and bugfixes from over 65 contributors. Before we dig into the new features, we have a brief security announcement.

For those running their own Git hosting server, Git 2.13 fixes a vulnerability in the git shell program in which an untrusted Git user can potentially run shell commands on a remote host. This only affects you if you're running a hosting server and have specifically configured git shell. If none of that makes sense to you, you're probably fine. See this announcement for more details. As neither GitHub.com nor GitHub Enterprise uses git shell, both are unaffected.

Phew. With that out of the way, let's get on to the fun stuff.

SHA-1 collision detection

Did I say fun? Oops, we're not there yet.

You may have heard that researchers recently found the first collision in SHA-1, the hash function Git uses to identify objects. Their techniques may eventually be used to conduct collision-based attacks against Git users. Fortunately those same researchers also provided a way to detect content that is trying to exploit this technique to create collisions. In March, GitHub.com began using that implementation to prevent it being used as a potential platform for conducting collision attacks.

Git 2.13 ships with similar changes, and will detect and reject any objects that show signs of being part of a collision attack. The collision-detecting SHA-1 implementation is now the default. The code is included with Git, so there's no need to install any additional dependencies. Note that this implementation is slower than the alternatives, but in practice this has a negligible effect on the overall time of most Git operations (because Git spends only a small portion of its time computing SHA-1 hashes in the first place).

In other collision detection news, efforts have continued to develop a transition plan and to prepare the code base for handling new hash functions, which will eventually allow the use of stronger hash algorithms in Git.

[collision detection, SHA-1 transition 1, SHA-1 transition 2]

More convenient pathspecs

You've probably passed path arguments to Git before, like:

$ git log foo.c
$ git grep my_pattern program.rb

But you may not have known that to Git, the foo.c and program.rb arguments are actually pathspecs, a Git-specific pattern for matching paths. Pathspecs can be literal paths, prefixes, or wildcards:

$ git log Documentation/      # Everything under the Documentation/ directory
$ git log '*.c'               # C files anywhere in the tree

But they also have a powerful extension syntax. Pathspecs starting with :(magic) enable special matching features. The complete list can be found in the pathspec section of git help glossary, but let's look at a few here.

For instance, you may want to exclude some files from a grep, which you can do with the :(exclude) directive:

$ git grep this.is.a src
src/foo.c:this is a C file
src/foo.rb:this is a ruby file

$ git grep this.is.a -- src ':(exclude)*.c'
src/foo.rb:this is a ruby file

There are a few things to note in that example. The first is that we had to put our pathspec after a -- (double-dash) separator. This is necessary because most Git commands actually take a combination of revisions and pathspecs. The full syntax is [<revisions>] -- [<pathspecs>]. If you omit the double-dash, Git will check each argument to see if it's either a valid object name or a file in the filesystem. But since our exclude pattern is neither, without the double-dash Git would give up and complain (this may change in a future version of Git; wildcards like *.c used to have the same problem, but the rules were recently loosened to resolve them as pathspecs). More information is available via git help cli.
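
For example, without the separator, a command like git log can't resolve the pattern as either a revision or a path (a hedged illustration; the exact wording varies by Git version):

$ git log ':(exclude)*.c'
fatal: ambiguous argument ':(exclude)*.c': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'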

The second thing to note is that typing :(exclude) is a pain, and we have to quote it from the shell. But there's a solution for that: short form pathspec magic. The short form for exclude is ! (exclamation point). This is easy to remember, since it matches the syntax in other parts of Git, like .gitignore files.

$ git grep this.is.a -- src ':!*.c'
src/foo.rb:this is a ruby file

That's shorter than exclude, but we still have to quote, since the exclamation point triggers history expansion in most shells. Git 2.13 adds ^ (caret) as a synonym for the exclamation point, letting you do the same thing without any shell quoting:

$ git grep this.is.a -- src :^*.c
src/foo.rb:this is a ruby file

Ah, much better. Technically we would need to also quote the *.c wildcard from the shell, but in practice it works out. Unless you have a file that starts with :^ and ends in .c, the shell will realize that the wildcard matches nothing and pass it through to Git verbatim.

But wait, there's more! Git 2.13 also adds the attr token, which lets you select files based on their gitattributes values. For instance, if you use Git LFS, you may want to get a list of files which have been configured to use it:

$ git ls-files
.gitattributes
README
video.mp4

$ git ls-files ':(attr:filter=lfs)'
video.mp4

You can even define your own attributes in order to group files. Let's say you frequently want to grep a certain set of files. You can define an attribute, and then select those files using that attribute:

$ echo 'libfoo/** vendored' >>.gitattributes
$ echo 'imported-tool/** vendored' >>.gitattributes
$ git grep -i license -- ':(attr:vendored)'

And if you want to get really fancy, you can combine the attr and exclude tokens:

$ git grep foobar -- ':(exclude,attr:vendored)'

Note that the attr token is not yet supported in all parts of the code. Some commands may report that it cannot be used with them, but this is likely to be expanded in future versions of Git.

[negative pathspecs, attribute pathspecs]

Conditional configuration

Git's configuration system has several levels of priority: you can specify options at the system level, the user level, the repository level, or for an individual command invocation (using git -c). In general, an option found in a more specific location overrides the same option found in a less specific one. Setting user.email in a repository's .git/config file will override the user-level version you may have set in ~/.gitconfig.
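
For example (the repository path and addresses here are made up):

$ git config --global user.email          # the user-level value in ~/.gitconfig
alice@example.com

$ cd ~/src/some-repo
$ git config user.email bob@example.com   # writes the value to .git/config
$ git config user.email                   # the more specific value wins
bob@example.com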

But what if you need to set an option to one value for a group of repositories, and to another value for a different group? For example, you may use one name and email address when making commits for your day job and another when working on open source. You can set the open source identity in the user-level config in your home directory and then override it in the work repositories. But that's tedious to keep up to date, and if you ever forget to configure a new work repository, you'll accidentally make commits with the wrong identity!

Git 2.13 introduces conditional configuration includes. For now, the only supported condition is matching the filesystem path of the repository, but that's exactly what we need in this case. You can configure two conditional includes in your home directory's ~/.gitconfig file:

[includeIf "gitdir:~/work/"]
  path = .gitconfig-work
[includeIf "gitdir:~/play/"]
  path = .gitconfig-play

Now you can put whatever options you want into those files:

$ cat ~/.gitconfig-work
[user]
  name = Serious Q. Programmer
  email = serious.q.programmer@work.example.com

$ cat ~/.gitconfig-play
[user]
  name = Random J. Hacker
  email = random.j.hacker@play.example.com

The appropriate config options will be applied automatically whenever you're in a repository that's inside your work or play directories.
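
You can check that the right identity is picked up (repository paths are hypothetical; the addresses match the example files above):

$ cd ~/work/some-project
$ git config user.email
serious.q.programmer@work.example.com

$ cd ~/play/some-experiment
$ git config user.email
random.j.hacker@play.example.com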

[conditional includes]

Bits and bobs

  • --decorate=auto is now the default for git log. When output is sent to the user's terminal, commits that are pointed to directly by a branch or tag will be "decorated" with the name of the branch. [source]

  • git branch's output routines have been ported to the ref-filter system shared by git for-each-ref and git tag. This means you can now use git branch --format= to get custom output (see the examples after this list). See git help for-each-ref for the list of substitutions. As a side note, these patches are from Karthik Nayak, Git's Google Summer of Code student from 2015. Though his GSoC project to introduce ref-filter was completed almost two years ago, he's continued contributing to the project. Great work! [source]

  • git branch, git tag, and git for-each-ref all learned the --no-contains option to match their existing --contains option. This can let you ask which tags or branches don't have a particular bug (or bugfix). [source]

  • git stash now accepts pathspecs. You can use this to create a stash of part of your working tree, which is handy when picking apart changes to turn into clean commits. [source]

  • The special branch names @{upstream}, @{u}, and @{push} are now case-insensitive. This is especially convenient as both @ and { require holding down the shift key on most keyboards, making it easy to accidentally type a capital U. Now you can hold that shift key AS LONG AS YOU WANT. [source]

  • More commands have learned to recurse into submodules in the past few versions of Git, including checkout, grep, and ls-files. git status --short also now reports more information about submodules. [source, source, source, source]

  • The last few versions of Git have cleaned up many corner cases around repository discovery and initialization. As a final step in that work, Git 2.13 introduced a new assertion to catch any cases that were missed. After being tested for months in development versions, this shouldn't trigger. But it's possible that you may see BUG: setup_git_env called without repository. If you do, please consider making a bug report. [source]
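
Here are quick illustrations of the git branch and git stash changes mentioned above (branch names, commits, and patterns are hypothetical):

# Custom branch output using the for-each-ref format language:
$ git branch --format='%(refname:short) %(objectname:short)'

# Branches that do not yet contain a given commit:
$ git branch --no-contains abc1234

# Stash only the C files, leaving the rest of the working tree alone:
$ git stash push -- '*.c'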

The whole kit and caboodle

That's just a sampling of the changes in Git 2.13, which contains over 700 commits. Check out the full release notes for the complete list.

Protect your organization's repositories with new security settings

Organization owners can now limit the ability to delete repositories. The new repository deletion setting is available for all plans hosted by GitHub and will be coming to GitHub Enterprise soon.

The setting is enabled by default, allowing organization members with admin permissions for a repository to delete it. When the feature is disabled, only organization owners will be able to delete the repository.

how to set repository deletion settings

When deleting a repository, you'll still see Danger Zone warnings. If the repository deletion setting is disabled and you are not an organization admin, you'll get a message letting you know that only owners can delete the repository.


Coming soon in the next GitHub Enterprise release

The next GitHub Enterprise release will include the same organization setting for repository deletion. In addition, there will be an appliance-level override that will limit repository deletion to site administrators only.

The appliance-level setting is enabled by default. This allows users with admin permissions for a repository to delete it and defers to the organization-level repository deletion setting. If disabled, only site administrators will be able to delete the repository.

how to set GitHub Enterprise appliance-level repository deletion settings

When the appliance-level setting is disabled, users who try to delete a repository will see a message similar to the one shown at the organization level.

Documentation and support

For more information about limiting repository deletion for your organization, see our help documentation on deleting repositories. You can learn more about how repository deletion will work on GitHub Enterprise when the next version is released.

If you have any questions, please get in touch with us. We'd be happy to help!

Integrations moves into pre-release with new features

Last September, we launched the Early Access Program of GitHub Integrations, a new option for extending GitHub. We've recently added some new features and moved Integrations into pre-release so you can begin using it within your production workflows. Here's a summary of the latest features. You can learn more about what's changed from our Developer Blog.

Enabling users to log in from your Integration

Users can now log in with your Integration using the OAuth protocol, allowing you to identify users and display data to them from the relevant installations. Additionally, an Integration can now make authorized API requests on behalf of a user; for example, to deploy code or create an issue. Learn more about authenticating as a user via an Integration.

Updating an Integration's permissions

When you create an Integration, you have to specify which permissions it needs; for example, the ability to read issues or create deployments. Now you can update the requested permissions via Settings > Developer settings > Integrations, whenever the needs of your Integration change. Users will be prompted to accept these changes and enable the new permissions on their installation.

Post-installation setup

Finally, you now have the option to configure a Setup URL to which you can redirect users after they install your integration if any additional setup is required on your end.

Resources

More information on getting started can be found on our Developer Blog and in our documentation. We'd also love to hear what you think. Talk to us in the new Platform forum.

Git LFS 2.1.0 released

Today we're announcing the next major release of Git LFS: v2.1.0, including new features, performance improvements, and more.

New features

Status subcommand

With Git LFS 2.1.0, get a more comprehensive look at which files are marked as modified by running the git lfs status command.

Git LFS will now tell you if your large files are being tracked by LFS, stored by Git, or a combination of both. For instance, if LFS sees a file that is stored as a large object in Git, it will convert it to an LFS pointer on checkout, which will mark the file as modified. To diagnose this, run git lfs status for a look at what's going on:

On branch master

Git LFS objects to be committed:

converted-lfs-file.dat (Git: acfe112 -> LFS: acfe112)
new-lfs-file.dat (LFS: eeaf82b)
partially-staged-lfs-file.dat (LFS: a12942e)

Git LFS objects not staged for commit:

unstaged-lfs-file.dat (LFS: 1307990 -> File: 8735179)
partially-staged-lfs-file.dat (File: 0dc8537)

Per-server configuration

Git LFS 2.1.0 introduces support for URL-style configuration via your .gitconfig or .lfsconfig. For settings that apply to URLs, like http.sslCert or lfs.locksverify, you can now scope them to a top-level domain, a root path, or just about anything else.
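
For example, your .gitconfig or .lfsconfig might scope settings like this (host names and paths are illustrative):

# Disable lock verification for a single host:
[lfs "https://lfs.internal.example.com/"]
  locksverify = false

# Present a client certificate only for URLs under one root path:
[http "https://git.example.com/secure/"]
  sslCert = /home/me/certs/client.pem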

Network debugging tools

To better understand and debug network requests made by Git LFS, version 2.1.0 introduces a detailed view via the GIT_LOG_STATS=1 environment variable:

$ GIT_LOG_STATS=1 git lfs pull
Git LFS: (201 of 201 files) 1.17 GB / 1.17 GB

$ cat .git/lfs/objects/logs/http/*.log
concurrent=15 time=1493330448 version=git-lfs/2.1.0 (GitHub; darwin amd64; go 1.8.1; git f9d0c49e)
key=lfs.batch event=request url=https://github.com/user/repo.git/info/lfs/objects/batch method=POST body=8468
key=lfs.batch event=request url=https://github.com/user/repo.git/info/lfs/objects/batch method=POST status=200 body=59345 conntime=47448309 dnstime=2222176 tlstime=163385183 time=491626933
key=lfs.data.download event=request url=https://github-cloud.s3.amazonaws.com/... method=GET status=200 body=4 conntime=58735013 dnstime=6486563 tlstime=120484526 time=258568133
key=lfs.data.download event=request url=https://github-cloud.s3.amazonaws.com/... method=GET status=200 body=4 conntime=58231460 dnstime=6417657 tlstime=122486379 time=261003305

# ...

Relative object expiration

The Git LFS API has long supported an expires_at property in both SSH authentication and Batch API responses. This introduced a number of issues: an out-of-sync system clock could cause LFS to think that objects were expired while they were still valid. Git LFS 2.1.0 adds support for an expires_in property, which specifies a duration, relative to your computer's clock, after which the object expires.
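
As a hedged sketch of the shape (field values are illustrative; see the Batch API specification for the authoritative format), a response using the relative form might look like:

{
  "objects": [
    {
      "oid": "e3b0c442",
      "size": 1048576,
      "actions": {
        "download": {
          "href": "https://lfs-server.example.com/objects/e3b0c442",
          "expires_in": 3600
        }
      }
    }
  ]
}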

What's next?

The LFS team is working on a migration tool to easily migrate your existing Git repositories with large objects into LFS without the need to write a git filter-branch command. We're also still inviting your feedback on our File Locking feature.

In addition, our roadmap is public: comments, questions, and pull requests are welcomed. To learn more about Git LFS, visit the Git LFS website.

That was a quick overview of some of the larger changes included in this release. To get a more detailed look, check out the release notes.

SHA-1 collision detection on GitHub.com

A few weeks ago, researchers announced SHAttered, the first collision of the SHA-1 hash function. Starting today, all SHA-1 computations on GitHub.com will detect and reject any Git content that shows evidence of being part of a collision attack. This ensures that GitHub cannot be used as a platform for performing collision attacks against our users.

This fix will also be included in the next patch releases for the supported versions of GitHub Enterprise.

Why does SHA-1 matter to Git?

Git stores all data in "objects." Each object is named after the SHA-1 hash of its contents, and objects refer to each other by their SHA-1 hashes. If two distinct objects have the same hash, this is known as a collision. Git can only store one half of the colliding pair, and when following a link from one object to the colliding hash name, it can't know which object the name was meant to point to.
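
You can see this naming in action with git hash-object, which computes the object name for given content; the same content always yields the same name:

$ echo 'hello' | git hash-object --stdin
ce013625030ba8dba906f756967f9e9ca394464a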

Two objects colliding accidentally is exceedingly unlikely. If you had five million programmers each generating one commit per second, your chances of generating a single accidental collision before the Sun turns into a red giant and engulfs the Earth are about 50%.

Why do collisions matter for Git's security?

If a Git fetch or push tries to send a colliding object to a repository that already contains the other half of the collision, the receiver can compare the bytes of each object, notice the problem, and reject the new object. Git has implemented this detection since its inception.

However, SHA-1 names can be assigned trust through various mechanisms. For instance, Git allows you to cryptographically sign a commit or tag. Doing so signs only the commit or tag object itself, which in turn points to other objects containing the actual file data by using their SHA-1 names. A collision in those objects could produce a signature which appears valid, but which points to different data than the signer intended. In such an attack the signer only sees one half of the collision, and the victim sees the other half.

What would a collision attack against Git look like?

The recent attack cannot generate a collision against an existing object. It can only generate a colliding pair from scratch, where the two halves of the pair are similar but contain a small section of carefully-selected random data that differs.

An attack therefore would look something like this:

  1. Generate a colliding pair, where one half looks innocent and the other does something malicious. This is best done with binary files where humans are unlikely to notice the difference between the two halves (the recent attack used PDFs for this purpose).

  2. Convince a project to accept your innocent half, and wait for them to sign a tag or commit that contains it.

  3. Distribute a copy of the repository with the malicious half (either by breaking into a hosting server and replacing the innocent object on disk, or hosting it elsewhere and asking people to verify its integrity based on the signatures). Anybody verifying the signature will think the contents match what the project owners signed.

How is GitHub protecting against collision attacks?

Generating a collision via brute-force is computationally too expensive, and will remain so for the foreseeable future. The recent attack uses special techniques to exploit weaknesses in the SHA-1 algorithm that find a collision in much less time. These techniques leave a pattern in the bytes which can be detected when computing the SHA-1 of either half of a colliding pair.

GitHub.com now performs this detection for each SHA-1 it computes, and aborts the operation if there is evidence that the object is half of a colliding pair. That prevents attackers from using GitHub to convince a project to accept the "innocent" half of their collision, as well as preventing them from hosting the malicious half.

The actual detection code is open-source and was written by Marc Stevens (whose work is the basis of the SHAttered attack) and Dan Shumow. We are grateful for their work on that project.

Are there Git collisions?

Not yet. Git's object names take into account not only the raw bytes of the files, but also some Git-specific header information. The PDFs provided by the SHAttered researchers collide in their raw bytes, but not when added to a Git repository. The same technique could be used to generate a Git object collision, but like the generation of the original SHAttered PDFs, it would require spending hundreds of thousands of dollars in computation.
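
You can verify the header's effect yourself: Git's name for the blob containing "hello\n" (shown earlier) is the SHA-1 of "blob 6\0" followed by those six bytes, which differs from the SHA-1 of the raw bytes alone:

$ echo 'hello' | shasum -a 1                 # raw bytes only
f572d396fae9206628714fb2ce00f72e94f2258f  -

$ printf 'blob 6\0hello\n' | shasum -a 1     # Git-style header + bytes
ce013625030ba8dba906f756967f9e9ca394464a  -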

What future work is there?

Blocking collisions that pass through GitHub is only the first step. We've already been working with the Git project to include the collision detection library upstream. Future versions of Git will be able to detect and reject colliding halves no matter how they reach the developer: fetching from other hosting sites, applying patches, or generating objects from local data.

The Git project is also developing a plan to transition away from SHA-1 to another, more secure hash algorithm, while minimizing the disruption to existing repository data. As that work matures, we plan to support it on GitHub.

Bug Bounty third anniversary wrap-up


In honor of our Bug Bounty Program's third birthday, we kicked off a promotional bounty period in January and February. In addition to bonus payouts, the scope of the bug bounty was expanded to include GitHub Enterprise. It may come as no surprise that including a new scope meant that the most severe bugs were all related to the newly included target.

There was no shortage of high-quality reports. Picking winners is always tough, but below are the intrepid researchers receiving extra bounties.

First prize

The first prize bonus of $12,000 goes to @jkakavas for their GitHub Enterprise SAML authentication bypass report which allowed an attacker to construct a SAML response that could arbitrarily set the authenticated user account. You can read more about the story on their blog.

Second prize

The second prize bonus of $8,000 goes to @iblue for their remote code execution bug found in the GitHub Enterprise management console. This was due to a static secret mistakenly being used to cryptographically sign the session cookie. The static secret was intended to only be used for testing and development. However, an unrelated change of file permissions prevented the intended (and randomly generated) session secret from being used. By knowing this secret, an attacker could forge a cookie that is deserialized by Marshal.load, leading to remote code execution.

Third prize

The third prize bonus of $5,000 goes to @soby for their report of another GitHub Enterprise SAML authentication bypass. This attack showed it was possible to replay a SAML response to have our SAML implementation use unsigned data in determining which user account was authenticated.

Best report

The best report bonus of $5,000 goes to @orangetw for their report of Server-Side Request Forgery. This report involved chaining together four vulnerabilities to deliver requests to internal services that end up executing attacker-controlled code. The reporter supplied a clear explanation of the problem through each step and included proof-of-concept scripts for the entire journey. While this report did not earn a prize for being the most severe, it is exactly the type of report we want to encourage reporters to submit.

Other notable reports

We received a wonderful Christmas gift from @orangetw with a SQL Injection bug on GitHub Enterprise. You can read more about how they learned Rails in three days before finding the bug.

A theme we've seen continue over the years is paying out for bugs that aren't in our own code but in browsers. 2016 was no exception with @filedescriptor's report of funky Internet Explorer behavior detailing how a triple-encoded host value in a URL is handled in redirects.

Lastly, Unicode gonna Unicode. @jagracy found a way to exploit the way Unicode was normalized in our code and in our database engine to deliver password reset emails to entirely different addresses than intended.

Some statistics

Our "Two Years of Bounties" post has detailed stats for our submissions during the first two years of the program. These years saw a total payout of $95,300 across 102 submissions out of a total of 7,050 submissions (1.4% validity rate). So far in the third year of the program, we have paid out for 73 submissions for a total of $81,700. Many of these reports fall into our "$200 thank you" bucket. These are issues that we do not consider severe enough for an immediate fix, but that we still want to reward our researchers for. Forty-eight of these issues were deemed high enough risk to warrant a write-up on https://bounty.github.com. We saw a total of 48 out of 795 valid reports, bringing our validity rate to 6%.

In 2016, we saw a slight decrease in the number of reports compared to the average of the previous two years. While we can't know for certain, we suspect this is due to the ever-decreasing presence of "low hanging fruit." Unfortunately, we saw an increase in the number of "critical" reports. All other categories saw a decrease in the number of reports.

Out of the accepted reports, we saw some notable changes in the type of vulnerability reports submitted. Many categories such as XSS, Injection, and CSRF saw a decrease in reports. Notably, the number of valid CSRF reports for 2016 was zero. At the same time, we saw an increase in session handling bugs, sensitive data exposure, and missing function level access controls.

In April of 2016, we transitioned to the HackerOne platform. The graphs will not include all data for the year, unfortunately, but the data is still interesting. Notably, even though we had run a public bounty for almost 2.4 years, we still experienced a large spike upon announcing the transition, just like many programs' initial public launch.

We've also been able to track new data via the platform. For example, our average response time was about 16 hours, and our time to resolution was about 28 days. Tracking this data and keeping it within acceptable bounds will help ensure our program continues to run smoothly and efficiently.

It's more than just a bounty

We continue to see rewards being donated to charities. We absolutely love donating bounties, and we match all contributions. This year saw donations to Doctors without Borders and the Electronic Frontier Foundation (EFF).

We also sponsored an event that was aimed at helping people from under-represented backgrounds participate in bug bounties. The h1-415 event saw attendance from groups like Hack the Hood, Women in Security and Privacy (WISP), FemHacks, Lairon College Cyber Patriots, and /dev/color. Hack the Hood was nice enough to make a video from the event.

In 2016, we learned a lot about running a bug bounty program, and we continued to uncover sharp edges in our services and codebase. Our program will continue to evolve to engage and support the community of bounty hunters that has made this program successful. We look forward to your submissions in our fourth year of the program!

Best regards and happy hacking,

@GitHubSecurity

A formal spec for GitHub Flavored Markdown

Starting today, all Markdown user content hosted on our website, including user comments, wikis, and .md files in repositories, will be parsed and rendered following a formal specification for GitHub Flavored Markdown. We hope that making this spec available will allow third parties to better integrate and preview GFM in their software.

The full details of the specification are available in our Engineering Blog.

This project is based on CommonMark, a joint effort to specify and unify the different Markdown dialects that are currently available. We've updated the original CommonMark spec with formal definitions for the custom Markdown features that are commonly used in GitHub, such as tables, task lists, and autolinking.

Together with the specification, we're also open-sourcing a reference implementation in C, based on the original cmark parser, but with support for all the features that are actively used in GitHub. This is the same implementation we use in our backend.

How will this affect me and my projects?

A lot of care and research has been put into designing the CommonMark spec to make sure it specifies the syntax and semantics of Markdown in a way that represents the existing real-world usage.

Because of this, we expect that the vast majority of Markdown documents on GitHub will not be affected at all by this change. Some documents sporting the most arcane features of Markdown may render differently. Check out our extensive GitHub Engineering blog post for more details on what has changed.

Git LFS 2.0.0 released

Today we're announcing the next major release of Git LFS: v2.0.0.

The official release notes have the complete list of all the new features, performance improvements, and more. In the meantime, here's our look at a few of our newest features:

File locking

With Git LFS 2.0.0 you can now lock files that you're actively working on, preventing others from pushing to the Git LFS server until you unlock the files again.

This will prevent merge conflicts as well as lost work on non-mergeable files at the filesystem level. While it may seem to contradict the distributed and parallel nature of Git, file locking is an important part of many software development workflows—particularly for larger teams working with binary assets.

# This tells LFS to track *.tga files and make them lockable.
# They will appear as read-only on the filesystem.
$ git lfs track "*.tga" --lockable

# To acquire the lock and make foo.tga writeable:
$ git lfs lock foo.tga

# foo.tga is now writeable

$ git add foo.tga
$ git commit ...
$ git push

# Once you're ready to stop work, release the file so others can work on it.
$ git lfs unlock foo.tga

Everything else

Git LFS v2.0.0 also comes with a host of other great features, bug fixes, and changes.

Transfer queue

Our transfer queue, the mechanism responsible for uploading and downloading files, is faster, more efficient, and more resilient to failure.

To dive in, visit our release notes to learn more.

Internals

Git LFS's internals have been tremendously improved, particularly around Git and filesystem operations. push and pull operations have been optimized to run concurrently with the underlying tree scans necessary to detect LFS objects. Repositories with large trees can begin the push or pull operation immediately, while the tree scan takes place, greatly reducing the amount of time it takes to complete these operations.

The mechanism that scans for files tracked by LFS has been enhanced to ignore directories included in your repository's .gitignore, improving the efficiency of these operations.

In Git LFS v1.5.0, we introduced the process filter (along with changes in Git v2.11) to dramatically improve performance across multiple platforms, thanks to contributions from @larsxschneider.

Thank you

Since its release, Git LFS has benefited from the contributions of 81 members of the open source community. There have been 1,008 pull requests, 851 of which have been merged into an official release. Git LFS would not be possible without the gracious efforts of our wonderful contributors. Special thanks to @sinbad, who contributed to our work on file locking.

What's next?

File locking is an early release, so we're eager to hear your feedback and thoughts on how the feature should work.

In addition, our roadmap is public: comments, questions (and pull requests) are welcomed. To learn more about Git LFS, visit the Git LFS website.

Psst! We also just announced the GitHub plugin for Unity, which brings the GitHub workflow to Unity, including support for Git LFS and file locking. Sign up for early access now.

New and improved two-factor lockout recovery process

Starting January 31, 2017, the Delegated Account Recovery feature will let you associate your GitHub account with your Facebook account, giving you a way back into GitHub in certain two-factor authentication lockout scenarios. If you've lost your phone or token and don't have a usable backup, you can recover your account through Facebook and get back to work. See how the new recovery feature works on the GitHub Engineering Blog.

Image of recovery screen on Facebook

Currently, if you lose the ability to authenticate with your phone or token, you have to prove account ownership before we can disable two-factor authentication. Proving ownership requires access to a confirmed email address and a valid SSH private key for a given account. This feature will provide an alternative proof of account ownership that can be used along with these other methods.

To set up the new recovery option, save a token on the security settings page on GitHub. Then confirm that you'd like to store the token. If you get locked out for any reason, you can contact GitHub Support, log in to Facebook, and start the recovery process.

Image of recovery option on GitHub

Bug Bounty anniversary promotion: bigger bounties in January and February


The GitHub Bug Bounty Program is turning three years old. To celebrate, we're offering bigger bounties for the most severe bugs found in January and February.

The bigger the bug, the bigger the prize

The process is the same as always: hackers and security researchers find and report vulnerabilities through our responsible disclosure process. To recognize the effort these researchers put forth, we reward them with actual money. Standard bounties range between $500 and $10,000 USD and are determined at our discretion, based on overall severity. In January and February we're throwing in bonus rewards for standout individual reports in addition to the usual payouts.

Bug bounty prizes are $12,000, $8,000, $5,000 on top of the usual payouts

And t-shirts obviously

In addition to cash prizes, we've also made limited edition t-shirts to thank you for helping us hunt down GitHub bugs. We don't have enough for everyone—just for the 15 submitters with the most severe bugs.

Enterprise bugs count, too

GitHub Enterprise is now included in the bounty program. So go ahead and find some Enterprise bugs. If they're big enough, you'll be eligible for the promotional bounty. Otherwise, rewards are the same as for GitHub.com ($200 to $10,000 USD). For more details, visit our bounty site.

Giving winners some extra cash doesn't mean anyone has to lose. If you find a bug, you'll still receive the standard bounties.

Happy hunting!

Incident Report: Inadvertent Private Repository Disclosure

On Thursday, October 20th, a bug in GitHub’s system exposed a small amount of user data via Git pulls and clones. In total, 156 private repositories of GitHub.com users were affected (including one of GitHub's). We have notified everyone affected by this private repository disclosure, so if you have not heard from us, your repositories were not impacted and there is no ongoing risk to your information.

This was not an attack, and no one was able to retrieve vulnerable data intentionally. There was no outsider involved in exposing this data; this was a programming error that resulted in a small number of Git requests retrieving data from the wrong repositories.

Regardless of whether or not this incident impacted you specifically, we want to sincerely apologize. It’s our responsibility not only to keep your information safe but also to protect the trust you have placed in us. GitHub would not exist without your trust, and we are deeply sorry that this incident occurred.

Below is the technical analysis from our investigation, including a high-level overview of the incident, how we mitigated it, and the specific measures we are taking to prevent incidents like this from happening in the future.

High-level overview

In order to speed up Unicorn worker boot times and simplify the post-fork boot code, we applied the following buggy patch:

[diff of the buggy patch; the change is described below]

The database connections in our Rails application are split into three pools: a read-only group, a group used by Spokes (our distributed Git back-end), and the normal Active Record connection pool. The read-only group and the Spokes group are managed manually, by our own connection-handling code. The new line of code disconnected only the ConnectionPool objects managed by Active Record, whereas the previous snippet disconnected all ConnectionPool objects held in memory. This meant that, with the change deployed, the manually managed pools were shared between all child processes of the Rails application.

The impact of this bug for most queries was a malformed response, which errored and caused a near-immediate rollback. However, a very small percentage of the query responses were interpreted as legitimate data in the form of the file server and disk path where repository data was stored. Some repository requests were therefore routed to the location of another repository. The application could not differentiate these incorrect query results from legitimate ones, and as a result, users received data that they were not meant to receive.

When properly functioning, the system works as sketched out roughly below. However, during this failure window, the MySQL response in step 4 was returning malformed data that would end up causing the git proxy to return data from the wrong file server and path.

System Diagram

Our analysis of the ten-minute window in question uncovered:

  • 17 million requests to our git proxy tier, most of which failed with errors due to the buggy deploy
  • 2.5 million requests successfully reached git-daemon on our file server tier
  • Of the 2.5 million requests that reached our file servers, the vast majority were "already up to date" no-op fetches
  • 40,000 of the 2.5 million requests were non-empty fetches
  • 230 of the 40,000 non-empty requests were susceptible to this bug and served incorrect data
  • This represented 0.0013% of the total operations at the time

Deeper analysis and forensics

After establishing the effects of the bug, we set out to determine which requests were affected in this way for the duration of the deploy. Normally, this would be an easy task, as we have an in-house monitor for Git that logs every repository access. However, those logs contained some of the same faulty data that led to the misrouted requests in the first place. Without accurate usernames or repository names in our primary Git logs, we had to turn to data that our git proxy and git-daemon processes sent to syslog. In short, the goal was to join records from the proxy, to git-daemon, to our primary Git logging, drawing whatever data was accurate from each source. Correlating records across servers and data sources is a challenge because the timestamps differ depending on load, latency, and clock skew. In addition, a given Git request may be rejected at the proxy or by git-daemon before it reaches Git, leaving records in the proxy logs that don’t correlate with any records in the git-daemon or Git logs.

Ultimately, we joined the data from the proxy to our Git logging system using timestamps, client IPs, and the number of bytes transferred and then to git-daemon logs using only timestamps. In cases where a record in one log could join several records in another log, we considered all and took the worst-case choice. We were able to identify cases where the repository a user requested, which was recorded correctly at our git proxy, did not match the repository actually sent, which was recorded correctly by git-daemon.

We further examined the number of bytes sent for a given request. In many cases where incorrect data was sent, the number of bytes was far larger than the on-disk size of the repository that was requested but instead closely matched the size of the repository that was sent. This gave us further confidence that indeed some repositories were disclosed in full to the wrong users.

Although we saw over 100 misrouted fetches and clones, we saw no misrouted pushes, signaling that the integrity of the data was unaffected. This is because a Git push operation takes place in two steps: first, the user uploads a pack file containing files and commits. Then we update the repository’s refs (branch tips) to point to commits in the uploaded pack file. These steps look like a single operation from the user’s point of view, but within our infrastructure, they are distinct. To corrupt a Git push, we would have to misroute both steps to the same place. If only the pack file is misrouted, then no refs will point to it, and git fetch operations will not fetch it. If only the refs update is misrouted, it won’t have any pack file to point to and will fail. In fact, we saw two pack files misrouted during the incident. They were written to a temporary directory in the wrong repositories. However, because the refs-update step wasn’t routed to the same incorrect repository, the stray pack files were never visible to the user and were cleaned up (i.e., deleted) automatically the next time those repositories performed a “git gc” garbage-collection operation. So no permanent or user-visible effect arose from any misrouted push.

A misrouted Git pull or clone operation consists of several steps. First, the user connects to one of our Git proxies, via either SSH or HTTPS (we also support git-protocol connections, but no private data was disclosed that way). The user’s Git client requests a specific repository and provides credentials, an SSH key or an account password, to the Git proxy. The Git proxy checks the user’s credentials and confirms that the user has the ability to read the repository he or she has requested. At this point, if the Git proxy gets an unexpected response from its MySQL connection, the authentication (which user is it?) or authorization (what can they access?) check will simply fail and return an error. Many users were told during the incident that their repository access “was disabled due to excessive resource use.”

In the operations that disclosed repository data, the authentication and authorization step succeeded. Next, the Git proxy performs a routing query to see which file server the requested repository is on, and what its file system path on that server will be. This is the step where incorrect results from MySQL led to repository disclosures. In a small fraction of cases, two or more routing queries ran on the same Git proxy at the same time and received incorrect results. When that happened, the Git proxy got a file server and path intended for another request coming through that same proxy. The request ended up routed to an intact location for the wrong repository. Further, the information that was logged on the repository access was a mix of information from the repository the user requested and the repository the user actually got. These corrupted logs significantly hampered efforts to discover the extent of the disclosures.

Once the Git proxy got the wrong route, it forwarded the user’s request to git-daemon and ultimately Git, running in the directory for someone else’s repository. If the user was retrieving a specific branch, it generally did not exist, and the pull failed. But if the user was pulling or cloning all branches, that is what they received: all the commits and file objects reachable from all branches in the wrong repository. The user (or more often, their build server) might have been expecting to download one day’s commits and instead received some other repository’s entire history.

Users who inadvertently fetched the entire history of some other repository, surprisingly, may not even have noticed. A subsequent “git pull” would almost certainly have been routed to the right place and would have corrected any overwritten branches in the user’s working copy of their Git repository. The unwanted remote references and tags are still there, though. Such a user can delete the remote references, run “git remote prune origin,” and manually delete all the unwanted tags. As a possibly simpler alternative, a user with unwanted repository data can delete that whole copy of the repository and “git clone” it again.
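
In practice, that cleanup might look like this (the tag name is hypothetical):

$ git remote prune origin       # drop stale remote-tracking references
$ git tag -d unwanted-tag       # delete each unwanted tag by name

# Or, more simply, re-clone a fresh copy:
$ git clone git@github.com:user/repo.git fresh-copy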

Next steps

To prevent this from happening again, we will modify the database driver to detect and only interpret responses that match the packet IDs sent by the database. On the application side, we will consolidate the connection pool management so that Active Record's connection pooling will manage all connections. We are following this up by upgrading the application to a newer version of Rails that doesn't suffer from the "connection reuse" problem.

We will continue to analyze the events surrounding this incident and use our investigation to improve the systems and processes that power GitHub. We consider the unauthorized exposure of even a single private repository to be a serious failure, and we sincerely apologize that this incident occurred.

GitHub Extension for Visual Studio 2.0 is now available

We're pleased to announce that version 2.0 of the GitHub Extension for Visual Studio is now available. You can install it directly from the Tools and Extensions gallery in Visual Studio, as well as from our releases page and our website.

This release allows you to list your existing pull requests and create new ones directly from Visual Studio. You can also create gists directly from your code, making it even easier to share your code and collaborate. It also includes a slew of other enhancements and fixes, available in our release notes.

Screenshot of Visual Studio with GitHub pane open

As we focus our efforts on building workflows targeted towards maintainers and contributors, we'd love to hear your thoughts on how we can improve our tools. Let us know what works and what doesn't and how to make your coding and collaboration workflows easier.

GitHub Pages to upgrade to Jekyll 3.2

GitHub Pages will upgrade to Jekyll 3.2 on August 23rd.

The upgrade to Jekyll 3.2 comes with over 100 improvements including the introduction of themes, meaning that soon, you'll be able to create a beautiful site in minutes by simply adding theme: my-awesome-theme to your site's config, without needing to copy styles or templates into your site's repository.

This should be a seamless transition for all GitHub Pages users, but if you have a particularly complex Jekyll site, we recommend building your site locally with the latest version of Jekyll 3.2.x prior to August 23rd to ensure your site continues to build as expected.
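
One way to do that locally (assuming Ruby and RubyGems are installed; the version output is illustrative):

$ gem install jekyll -v '~> 3.2'
$ jekyll --version
jekyll 3.2.1
$ jekyll build    # builds into _site/; review the output for regressions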

For more information, see the Jekyll changelog and if you have any questions, we encourage you to get in touch with us.