Add git-based detection of tags at HEAD to improve PublicRelease detection #876

georg-jung · 2022-12-15T20:13:32Z

Fixes #873.

This implements my second idea described in detail in #873.

My thoughts from there:

It would just work without any configuration needed (one might argue it's a breaking change though; could be opt-in to mitigate, wouldn't just work then).

It fits the notion that git is the only source of truth.

With more complicated pipelines and changing environments (e.g. the nbgv step vs. inside a docker build container), it is still straightforward (if .git is available, it works), and we don't need to worry about whether the env var for the override is available inside the container that performs the build.

It feels natural, because it does what I specified in version.json and what I specified there does work in the typical case, where GITHUB_REF is not stale.

It makes local builds where the tag is checked out equivalent to cloud builds initiated by tag push. Currently "^refs/tags/v\\d+\\.\\d+" doesn't have any effect locally if I get it right - it does in CI environments though.
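As a rough illustration of the last point, the sketch below checks the tags found at HEAD against a publicReleaseRefSpec-style pattern. It is not NB.GV's implementation: IsPublicRelease and its parameters are hypothetical names, and only the regex, the commit id and the tag names are taken from this description.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not NB.GV's actual API): a build counts as a public
// release if the ref being built, or any tag at HEAD, matches one of the
// configured ref spec patterns.
static bool IsPublicRelease(
    string buildingRef,
    IEnumerable<string> tagsAtHead,
    IEnumerable<string> publicReleaseRefSpec)
{
    var candidates = new List<string> { buildingRef };
    candidates.AddRange(tagsAtHead);
    return publicReleaseRefSpec.Any(
        pattern => candidates.Any(candidate => Regex.IsMatch(candidate, pattern)));
}

// The pattern from version.json, after JSON unescaping.
var refSpec = new[] { @"^refs/tags/v\d+\.\d+" };

// Local build with the tag checked out: the building ref is only a commit id,
// so the match has to come from the tags at HEAD.
Console.WriteLine(IsPublicRelease(
    "2e6084bfeb07d7f8e426b52aa51bd7517faf5610",
    new[] { "refs/tags/possiblyASecondTag", "refs/tags/v0.7.3" },
    refSpec)); // prints True
```

Without the tag list, the same call would return false for a local checkout, which is the gap this PR tries to close.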

This comes with parsing support for annotated tags for managed git and LibGit2. Lightweight tags pointing at HEAD are considered as well as annotated tags. Nested annotated tags that indirectly point at HEAD are not considered (intentionally; changing this probably wouldn't be hard). The BuildingRef value is not changed.

Currently, the output of nbgv get-version -f json is extended as follows:

  "VersionHeight": 3,
  "VersionHeightOffset": 0,
  "BuildingRef": "2e6084bfeb07d7f8e426b52aa51bd7517faf5610",
  "BuildingTags": [
    "refs/tags/possiblyASecondTag",
    "refs/tags/v0.7.3",
  ],
  "Version": "0.7.3.11872",
  "CloudBuildAllVarsEnabled": false,
  "CloudBuildAllVars": {
    // ...
    "NBGV_BuildingRef": "2e6084bfeb07d7f8e426b52aa51bd7517faf5610",
    "NBGV_BuildingTags": "System.Collections.Generic.List`1[System.String]",
    "NBGV_Version": "0.7.3.11872",
    // ...
  },
  "CloudBuildVersionVarsEnabled": true,

~~Obviously the NBGV_BuildingTags value isn't here to stay. I thought I'd leave this for discussion, which of these values should be generated and what their value should be.~~ I added an [Ignore] to skip it in a3089b1.

Probably I made some opinionated decisions when putting this together, don't hesitate to change what you don't like.


dnfadmin · 2022-12-15T20:13:46Z

All CLA requirements met.

georg-jung · 2022-12-15T20:39:31Z

In 3c93a89 I added a workaround for GitPack.TryGetObject which kind of violates its implicit contract (it throws if it can not get an object of the requested type). I wasn't sure about the best way to fix this. Also this is kind of unrelated to the rest of this PR.

Throwing was intentional to match HeadCanonicalName behaviour. Throwing fails the tests though. Maybe this is because the tests run in a cloud build environment and HeadCanonicalName isn't used there, given this line: this.BuildingRef = cloudBuild?.BuildingTag ?? cloudBuild?.BuildingBranch ?? context.HeadCanonicalName; If so, the tests would currently fail when executed locally.

Return null if no HEAD could be determined, return an empty collection if there are no matching tags.

src/NerdBank.GitVersioning/VersionOracle.cs

georg-jung · 2023-02-06T23:46:05Z

@AArnott anything I could do from my side to assist/simplify getting this merged?

AArnott · 2023-02-16T23:49:28Z

I'm looking at this now and preparing some local changes to push to your PR.

AArnott

Thanks for putting this together. I have several comments I look forward to discussing with you.

src/NerdBank.GitVersioning/ManagedGit/GitAnnotatedTag.cs

AArnott · 2023-02-17T00:58:25Z

src/NerdBank.GitVersioning/ManagedGit/GitRepository.cs

+            var tagObjId = GitObjectId.Parse(line.Substring(0, 40));
+            var refName = line.Substring(41);
+


This seems to not handle a line pattern that I see in one of my repos:

^3d2137a79ee4e3621696cfbde8d4f1c0e98bc5f6
40614e83b9b3892e05efd6155c3b61035bb5542a refs/tags/drop/a.proddiag/official.00000.00
^2f0a22666a958c15f7e7486a0a14490f7b8e210f
9aa2dc36733d3f3076702b31a3dc754c144a9473 refs/tags/drop/a.proddiag/official.26601.00

See the leading caret? I don't know what that means, but it seems your line parsing assumes no such prefix exists.

The ^ prefix indicates a "peeled line" that lists the object ID referenced by the tag object on the previous line.

git's source supports that: https://github.com/git/git/blob/d9d677b2d8cc5f70499db04e633ba7a400f64cbf/refs/packed-backend.c#L542

A "record" here is one line for the reference itself and zero or one peel lines that start with '^'.

Thus, if I understand correctly, simply skipping these lines would be functionally equivalent to them not existing, because we'd peel the refs in the next step. Having them does give us a way to speed things up, though, and addresses the point @AArnott mentioned here: we could decide in place whether we're interested in this ref. Right?
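To make the record structure concrete, here is a minimal sketch of grouping packed-refs lines into records of one reference line plus an optional peel line. ReadPackedRefs is a hypothetical name, not the method in this PR, and the sketch assumes 40-character SHA-1 object IDs.

```csharp
#nullable enable
using System.Collections.Generic;

// Yields one (ObjectId, RefName, PeeledId) tuple per packed-refs record.
// PeeledId is null when the reference line is not followed by a '^' line.
static IEnumerable<(string ObjectId, string RefName, string? PeeledId)> ReadPackedRefs(
    IEnumerable<string> lines)
{
    string? pendingId = null;
    string? pendingName = null;

    foreach (string line in lines)
    {
        if (line.Length == 0 || line[0] == '#')
        {
            continue; // blank line or header line
        }

        if (line[0] == '^')
        {
            // Peel line: it belongs to the reference line directly before it.
            if (pendingId != null && pendingName != null && line.Length >= 41)
            {
                yield return (pendingId, pendingName, line.Substring(1, 40));
                pendingId = pendingName = null;
            }

            continue;
        }

        // A new reference line means the previous record had no peel line.
        if (pendingId != null && pendingName != null)
        {
            yield return (pendingId, pendingName, (string?)null);
        }

        if (line.Length < 42)
        {
            pendingId = pendingName = null; // malformed line, skipped in this sketch
            continue;
        }

        pendingId = line.Substring(0, 40);
        pendingName = line.Substring(41);
    }

    if (pendingId != null && pendingName != null)
    {
        yield return (pendingId, pendingName, (string?)null);
    }
}
```

A caller that ignores PeeledId gets the same result as if the peel lines were absent. A caller that uses it can skip reading the tag object for that record.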

I tried a tag referencing another tag referencing a commit, and the peeled line after the first tag listed the object ID of the second tag, not the commit ID. So if you implement some optimisation that uses those lines, please consider this scenario.

Good to know, thanks! Actually I think this behaviour makes sense and is also in line with what the code in this PR currently does. My take on "tags at HEAD" behaves like git tag --points-at HEAD would. That is, it includes any lightweight as well as annotated tags pointing at HEAD, but not any tags that point at HEAD transitively. A git show HEAD, on the other hand, does include transitive tags. Not including transitive tags might lead to fewer lookups and thus better performance. In a broken repo, a lookup that includes transitive tags could lead to an infinite loop.

I guess transitive tags are kind of an edge case, so my plan was to just not support them at this point. I think specifying "We do what git tag --points-at HEAD does" makes sense too. If the need to support transitive tags arises, a second PR would probably be easier than this first one. What do you think @AArnott?
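The policy described above can be sketched over a small in-memory model. Everything here is hypothetical and for illustration only: TagsPointingAt, the two dictionaries and the short IDs are not part of NB.GV.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory model:
//   tagRefs:          tag ref name -> id of the object the ref points at
//   annotatedTargets: annotated tag object id -> id of the object it tags
static List<string> TagsPointingAt(
    string headCommitId,
    IReadOnlyDictionary<string, string> tagRefs,
    IReadOnlyDictionary<string, string> annotatedTargets)
{
    var result = new List<string>();
    foreach (KeyValuePair<string, string> tagRef in tagRefs)
    {
        // Lightweight tag: the ref points straight at the commit.
        bool lightweight = tagRef.Value == headCommitId;

        // Annotated tag: follow exactly one level, never a chain of tags.
        bool annotated = annotatedTargets.TryGetValue(tagRef.Value, out var target)
            && target == headCommitId;

        if (lightweight || annotated)
        {
            result.Add(tagRef.Key);
        }
    }

    return result;
}
```

Because each ref is dereferenced at most once, the lookup terminates even if tag objects reference each other in a cycle. A tag that points at another tag which points at HEAD is not reported.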

My main concern is that the behavior should not depend on whether the peeled lines are in packed-refs or not. Whether nbgv recursively follows tags in general is a matter of policy and either choice is fine with me.

Makes sense. You might want to take a look at my two recent commits in that regard.

src/NerdBank.GitVersioning/ManagedGit/GitPack.cs

AArnott · 2023-02-17T01:21:15Z

src/NerdBank.GitVersioning/ManagedGit/GitRepository.cs

+        {
+            if (objectId.Equals(tagObjId))
+            {
+                tags.Add(tagNameCandidate);


Filtering the tags at this point is quite late. NB.GV functions in some repos with tens of thousands of tags, and the code would have parsed and allocated multiple strings for all of them. I think we'll need to optimize this by walking the tags and immediately skipping lines with non-matching object IDs before we allocate anything.
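One way to read this suggestion is to compare the object ID on each line before creating any substring. The sketch below is an assumption about what such a filter could look like, not the code that ended up in the PR. LightweightTagsAt is a hypothetical name, and it only covers tags whose ref points directly at the target (annotated tags would additionally need the peel line or the tag object).

```csharp
using System;
using System.Collections.Generic;

// Returns the names of tag refs whose packed-refs line starts with targetId.
// Lines that do not match are rejected using spans only, so no Substring
// allocation happens for them.
static List<string> LightweightTagsAt(IEnumerable<string> packedRefsLines, string targetId)
{
    const string TagPrefix = "refs/tags/";
    var tags = new List<string>();

    foreach (string line in packedRefsLines)
    {
        ReadOnlySpan<char> span = line.AsSpan();

        // Skip header lines, peel lines and anything too short to be
        // "<40 hex chars> <ref name>".
        if (span.Length < 42 || span[0] == '#' || span[0] == '^')
        {
            continue;
        }

        // Cheap rejection: compare the id in place.
        if (!span.Slice(0, 40).SequenceEqual(targetId.AsSpan()))
        {
            continue;
        }

        ReadOnlySpan<char> refName = span.Slice(41);
        if (refName.StartsWith(TagPrefix.AsSpan(), StringComparison.Ordinal))
        {
            tags.Add(refName.ToString()); // the only allocation, and only for a match
        }
    }

    return tags;
}
```

The input lines are still strings here, so this only removes the per-line Substring calls. Avoiding the line strings themselves would need a reader that works on buffers, which is outside the scope of this sketch.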

I agree that this can be solved without the candidate list and that the number of string allocations can be reduced. However, the current implementation without tag support also allocates multiple strings per packed-refs line, and that doesn't seem to be a performance hit. Or is there some detail I'm missing?

Looking up the target of an annotated tag might lead to an IO operation - read the relevant git pack. Thus, I'm wondering if collecting the candidates first wouldn't have a minor impact compared to opening the file.

I don't have a repo that large at hand. If this turns out to have a substantial negative performance impact, it might be an option to make annotated tag support opt-in/out, as reading the packed-refs file itself (plus the files in refs/tags) is sufficient for lightweight tag lookup. Edit: with the two commits incl. 49b9dc9 we don't need additional IO operations for annotated tags in packed-refs if peel lines are available.

I'm working on a commit to remove the candidate list and reduce the string allocations though. Edit: see a84ad8b

Without this fix we might have considered one level of transitive annotated tags, if a peel line is present in packed-refs. With this fix we also avoid reading the git pack of annotated tags that have peel lines and do not match the object id we are looking for.

KalleOlaviNiemitalo · 2023-02-17T15:25:27Z

I thought there should be something to detect if reftable is used, and fall back to LibGit2Context in that case, but…

Nerdbank.GitVersioning.ManagedGit.GitRepository already reads the packed-refs file before this PR, so it is not a new problem.
libgit2 does not support reftable, either; see "reftable support for libgit2" (libgit2/libgit2#5352).

AArnott · 2023-02-17T15:29:43Z

The build failure reproduces in main, so it's unlikely to be due to this PR. I don't know how that happened, but I'll be investigating it.


AArnott · 2023-02-23T21:17:30Z

/azp run

azure-pipelines · 2023-02-23T21:17:39Z

Azure Pipelines successfully started running 1 pipeline(s).

AArnott · 2023-03-07T20:47:00Z

@georg-jung I was prepared to merge this, but happened to hit an error related to the change. Can you check this out?

[20:42:24] Error: Unhandled exception: Newtonsoft.Json.JsonSerializationException: Error getting value from 'Tags' on 'Nerdbank.GitVersioning.VersionOracle'.
 ---> System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at Nerdbank.GitVersioning.ManagedGit.GitObjectId.ParseHex(ReadOnlySpan`1 value) in /home/vsts/work/1/s/src/NerdBank.GitVersioning/ManagedGit/GitObjectId.cs:line 131
   at Nerdbank.GitVersioning.ManagedGit.GitAnnotatedTagReader.ReadObject(ReadOnlySpan`1 line) in /home/vsts/work/1/s/src/NerdBank.GitVersioning/ManagedGit/GitAnnotatedTagReader.cs:line 100
   at Nerdbank.GitVersioning.ManagedGit.GitAnnotatedTagReader.Read(ReadOnlySpan`1 tag, GitObjectId sha) in /home/vsts/work/1/s/src/NerdBank.GitVersioning/ManagedGit/GitAnnotatedTagReader.cs:line 72
   at Nerdbank.GitVersioning.ManagedGit.GitAnnotatedTagReader.Read(Stream stream, GitObjectId sha) in /home/vsts/work/1/s/src/NerdBank.GitVersioning/ManagedGit/GitAnnotatedTagReader.cs:line 48
   at Nerdbank.GitVersioning.ManagedGit.GitRepository.LookupTags(GitObjectId objectId) in /home/vsts/work/1/s/src/NerdBank.GitVersioning/ManagedGit/GitRepository.cs:line 678
   at Nerdbank.GitVersioning.Managed.ManagedGitContext.get_HeadTags() in /home/vsts/work/1/s/src/NerdBank.GitVersioning/Managed/ManagedGitContext.cs:line 64
   at Newtonsoft.Json.Serialization.ExpressionValueProvider.GetValue(Object target)

georg-jung · 2023-03-10T08:54:53Z

@dotnet-policy-service agree

georg-jung · 2023-03-10T09:33:11Z

Hey @AArnott,

sorry for the delay. As a first measure, I tried to reproduce this and created a branch in my fork that incorporates the recent fixes in main (to make the pipelines in my repo work again) as well as this PR's code.

I wasn't able to reproduce it though. The stacktrace points to a piece of code that wasn't touched by this PR, although I don't really understand the line number in the stacktrace, as that line only reads var objectId = default(GitObjectId);. So I decided to ignore the line number for now and looked at that method as a whole, but I wasn't able to find any error and still came to the conclusion that this piece wasn't touched here.

What are the chances this was just some temporary hiccup with the pipeline? An IO error in the checkout?

Does anything stop us from just running that pipeline again to see if it's reproducible?

AArnott · 2023-03-17T21:33:50Z

> The stacktrace points to a piece of code that wasn't touched by this PR - although I don't really understand the line number in the stacktrace, as that line only reads var objectId = default(GitObjectId);

Are you looking at the same error? This is what I see:

[20:46:40] Error: Unhandled exception: Newtonsoft.Json.JsonSerializationException: Error getting value from 'Tags' on 'Nerdbank.GitVersioning.VersionOracle'.
 ---> System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at Nerdbank.GitVersioning.ManagedGit.GitObjectId.ParseHex(ReadOnlySpan`1 value) in D:\a\1\s\src\NerdBank.GitVersioning\ManagedGit\GitObjectId.cs:line 131
   at Nerdbank.GitVersioning.ManagedGit.GitAnnotatedTagReader.ReadObject(ReadOnlySpan`1 line) in D:\a\1\s\src\NerdBank.GitVersioning\ManagedGit\GitAnnotatedTagReader.cs:line 100
   at Nerdbank.GitVersioning.ManagedGit.GitAnnotatedTagReader.Read(ReadOnlySpan`1 tag, GitObjectId sha) in D:\a\1\s\src\NerdBank.GitVersioning\ManagedGit\GitAnnotatedTagReader.cs:line 72
   at Nerdbank.GitVersioning.ManagedGit.GitAnnotatedTagReader.Read(Stream stream, GitObjectId sha) in D:\a\1\s\src\NerdBank.GitVersioning\ManagedGit\GitAnnotatedTagReader.cs:line 48
   at Nerdbank.GitVersioning.ManagedGit.GitRepository.LookupTags(GitObjectId objectId) in D:\a\1\s\src\NerdBank.GitVersioning\ManagedGit\GitRepository.cs:line 678
   at Nerdbank.GitVersioning.Managed.ManagedGitContext.get_HeadTags() in D:\a\1\s\src\NerdBank.GitVersioning\Managed\ManagedGitContext.cs:line 64
   at Newtonsoft.Json.Serialization.ExpressionValueProvider.GetValue(Object target)

That looks extremely relevant to this PR, considering this PR introduces GitAnnotatedTagReader which makes up several frames in the callstack.

> What are the chances this was just some temporary hiccup with the pipeline? An IO error in the checkout?

Checkout is long behind this failure. And exactly the same error happened on both Windows and Linux, so no, I don't think we can write this one off.

Sometimes a failure can occur because of the particular way a git database is represented on disk, so that can be a challenge to repro on another clone of the repo. But it's not something we want to sweep under the rug, because if it can happen in a PR build, it can certainly happen on a dev box among the many thousands that this code runs on.

AArnott · 2023-03-17T22:11:06Z

I've been looking this over to try to figure out what may be going wrong, and it occurs to me that we don't have a single new test for the tags functionality in this PR. Sorry I missed this before, but we will need some tag tests to accept this PR as well.

AArnott · 2023-03-17T22:20:57Z

I added a test for Tags to your PR. If you can think of anything else to test, you're welcome to. We have the facility to check in whole repo databases in order to be able to test particular git db formats (pack files, peeled lines, etc) too.

AArnott · 2023-03-17T22:35:53Z

... and the re-build failed at exactly the same place, in the same way. So, at least it's reproducible. I can't reproduce it locally (yet).

AArnott · 2023-03-17T22:56:14Z

I pushed a commit that will capture the entire clone, including .git directory, next time it fails. That should help us repro it.

AArnott · 2023-03-20T13:20:47Z

Yay! I have a local repro by downloading the drop I changed the pipeline to produce on failure. I'm investigating now.

AArnott

This is a legitimate failure, introduced by the GitAnnotatedTagReader code, apparently.

You can repro by following these steps:

1. Download the drop artifact from here and use 7-zip to extract it. Complete all the rest of these steps within the unzipped directory.
2. Open VS to the solution within the unzipped directory.
3. Add Debugger.Launch() to the GitObjectId.ParseHex method.
4. Run dotnet publish -o ../nerdbank-gitversioning.npm/out/nbgv.cli/tools/net6.0/any from the src\nbgv directory.
5. Run yarn build from the src\nerdbank-gitversioning.npm directory.
6. When the JIT debugger attach dialog appears, have the VS with your unzipped solution open attach to the debuggee.
7. Configure the debugger to break on first chance exceptions, at least for IndexOutOfRangeException. GitException is thrown a lot due to the issue of relying on exceptions to recognize the wrong object type. You can turn off break on first chance exceptions for this particular exception type.
8. Hit F5 till you encounter a debug assertion failure and/or the IndexOutOfRangeException that causes the ultimate failure.

Can you please look into this?

Also, given the GitException is thrown about 200 times before this failure, this is likely going to need to be addressed. I already have a local code change that I hope will address this, although it causes other failures, so I'll work through that after we get this fatal error resolved.
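For context on the exception issue, the usual alternative to signalling a type mismatch by throwing is a Try-style method that reports it through its return value. The sketch below only illustrates that pattern over a hypothetical in-memory store. It is not the local change mentioned above, and TryGetObject here is not NB.GV's GitPack API.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical object store: object id -> (object type, content).
// Returns false both when the id is unknown and when the stored object has a
// different type than requested, so callers probing for a tag object do not
// pay for an exception on every commit they happen to hit.
static bool TryGetObject(
    IReadOnlyDictionary<string, (string Type, string Content)> store,
    string objectId,
    string expectedType,
    out string? content)
{
    if (store.TryGetValue(objectId, out var entry) && entry.Type == expectedType)
    {
        content = entry.Content;
        return true;
    }

    content = null;
    return false;
}
```

A caller that walks many refs can then write if (TryGetObject(store, id, "tag", out var tag)) { ... } and simply move on when the object turns out to be a commit.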

src/NerdBank.GitVersioning/ManagedGit/GitAnnotatedTagReader.cs

georg-jung added 4 commits December 15, 2022 20:37

Add wip tag lookup

317a142

* Managed git does not support annotated tags yet
* json output reads: "NBGV_BuildingTags": "System.Collections.Generic.List`1[System.String]",

Add managed git support for annotated tags

f0468bc

Fix nullable warnings

df58f36

Fix mistaken comment in GitCommitReader

5f5b002

georg-jung added 2 commits December 15, 2022 21:20

Fix nullability issues and code style

dd9e383

Add workaround for improper GitPack.TryGetValue function

3c93a89

AArnott changed the title ~~Add git-based detection of tags at HEAD to improve RublicRelease detection, fix #873~~ Add git-based detection of tags at HEAD to improve PublicRelease detection, fix #873 Dec 15, 2022

georg-jung added 5 commits December 15, 2022 21:49

Fix xml doc

a5c61c4

Add some minor improvements

3b05482

Fix packed-refs handling

9c7d4c5

Make LibGit2Context behave like ManagedGit

066d82d

Return null if no HEAD could be determined, return an empty collection if there are no matching tags.

ronaldbarendse suggested changes Jan 3, 2023

src/NerdBank.GitVersioning/VersionOracle.cs (outdated)

ronaldbarendse mentioned this pull request Jan 4, 2023

Fix CloudBuildAllVars value formatting #882

Merged

Add [Ignore] to exclude BuildingTags from CloudBuildAllVars

a3089b1

AArnott added 2 commits February 16, 2023 17:31

Cache tags collections

46b3681

Reduce allocations for GitAnnotatedTag equality checks

c321ea2

AArnott requested changes Feb 17, 2023


georg-jung added 4 commits February 17, 2023 13:18

Skip tag candidate collection, reduce string operations

a84ad8b

Fix EnumeratePackedRefsWithPeelLines

82c1751

Clarify EnumeratePackedRefsWithPeelLines

2198efa

georg-jung added 2 commits February 22, 2023 16:25

Speedup: Use packed-refs header, consider records peeled if applicable

41652e2


Fix: read from disposed StreamReader

be0182b

Avoid exploring tags unless version.json needs it

89f29b1

AArnott enabled auto-merge (squash) March 7, 2023 20:41

AArnott disabled auto-merge March 7, 2023 20:46

Merge remote-tracking branch 'origin/main' into georg-jung/main

a39f728

Add first Tags test

cc2111b

AArnott added 2 commits March 17, 2023 16:54

Add annotated tag test

3373263

Capture failure drop in its entirety

cd36215

AArnott added 2 commits March 17, 2023 17:03

Add runtime check on ParseHex argument

544704a

Collect .git directory too

fe34eb2

AArnott requested changes Mar 20, 2023

src/NerdBank.GitVersioning/ManagedGit/GitAnnotatedTagReader.cs

georg-jung mentioned this pull request May 12, 2023

Make GitPackCache include ObjectType #942

Merged

georg-jung and others added 3 commits September 15, 2023 10:49

Merge remote-tracking branch 'upstream/main'

9f52d81

Adapt to changes on main

f0008fe

Touch-ups

60462f2

AArnott approved these changes Sep 15, 2023


AArnott enabled auto-merge September 15, 2023 23:41

AArnott disabled auto-merge September 15, 2023 23:42

AArnott enabled auto-merge (squash) September 15, 2023 23:42

AArnott merged commit 9675c18 into dotnet:main Sep 16, 2023
14 checks passed
